Deep nets with stochastic depth #66
The identity transform in Eq. 2 is the same as in Eq. 1: it is just the shortcut connection of the ResNet. For shortcuts changing the spatial size (dashed arrows in http://arxiv.org/abs/1512.03385, Fig. 3), there are two options, explained in the first paragraph of page 4 of http://arxiv.org/abs/1512.03385. An implementation of that paper, including these two options, is given in https://github.com/Lasagne/Recipes/blob/master/papers/deep_residual_learning/Deep_Residual_Learning_CIFAR-10.py.

On page 8 of the stochastic depth paper, the authors mention that for blocks changing the number of filters and the spatial dimensions, they "replace the identity connections in these blocks by an average pooling layer followed by zero paddings to match the dimensions." This is neither of the two options in the ResNet paper, but it is easy enough to modify the existing Lasagne Recipe accordingly. If in doubt about what they did in the stochastic depth paper, refer to the source code at https://github.com/yueatsprograms/Stochastic_Depth.

Edit: If you manage to reproduce the results of the stochastic depth paper (CIFAR-10 will be the easiest target), we'd appreciate a PR to this repository.
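To make the quoted sentence concrete, here is a minimal numpy sketch of that average-pool-plus-zero-padding shortcut for a block that halves the spatial size and doubles the filter count. It assumes NCHW tensors with even height and width; the function name is illustrative, not from the paper's code:

```python
import numpy as np

def avgpool_zeropad_shortcut(x):
    """Downsample an (N, C, H, W) tensor with 2x2 average pooling at
    stride 2, then zero-pad the channel dimension from C to 2*C, as
    described on page 8 of the stochastic depth paper."""
    n, c, h, w = x.shape
    # 2x2 average pooling, stride 2 (assumes even H and W).
    pooled = x.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))
    # Zero-pad channels: append C all-zero feature maps.
    zeros = np.zeros_like(pooled)
    return np.concatenate([pooled, zeros], axis=1)

x = np.random.randn(8, 16, 32, 32)
y = avgpool_zeropad_shortcut(x)
print(y.shape)  # (8, 32, 16, 16)
```

In Lasagne terms this would correspond to a pooling layer followed by channel padding on the shortcut branch, in place of the projection or padding options from the ResNet recipe.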
Ah, thank you! How did I not notice the implementation details section in the paper... I'll take a crack at this and see what I can come up with!
I hope this is the appropriate venue to post this. I don't have an implementation yet, but maybe this ticket could encourage some work.
I am currently interested in this stochastic depth paper:
http://arxiv.org/pdf/1603.09382v2.pdf
I was going to have a go at implementing this, but I was a bit stumped as to how to handle the identity transform mentioned in equation (2). As you can see, if the next layer and the current layer have different output shapes, you need to linearly project the output of the current layer so that it matches the dimensions of the output of the following layer. I'm not clear on how this is done and am afraid it's blatantly obvious... is your "projection matrix" (or whatever it's called) a matrix (of some appropriate shape) consisting solely of ones? Furthermore, how would we do this for convolutional networks?
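For context, the ResNet paper's option B answers this: the projection is a learned 1x1 convolution (with stride 2 when the spatial size halves), not a matrix of ones. A numpy sketch, with hypothetical names, assuming NCHW tensors:

```python
import numpy as np

rng = np.random.default_rng(0)

def projection_shortcut(x, W):
    """Option B from the ResNet paper: a learned 1x1 convolution with
    stride 2 that both downsamples spatially and changes the channel
    count. With a 1x1 kernel this reduces to a per-pixel matrix
    multiply applied at every other spatial position."""
    strided = x[:, :, ::2, ::2]              # stride-2 subsampling -> (N, C_in, H/2, W/2)
    # Contract the channel axis with the (C_out, C_in) weight matrix.
    return np.einsum('oc,nchw->nohw', W, strided)

x = rng.standard_normal((8, 16, 32, 32))
W = rng.standard_normal((32, 16)) * 0.1      # learned weights, not ones
y = projection_shortcut(x, W)
print(y.shape)  # (8, 32, 16, 16)
```

Note that the stochastic depth paper itself uses neither option; it uses average pooling plus zero padding, per its page 8.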
It seems like that's the only roadblock for me -- the binomial mask is easy to do.
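For completeness, the binomial-mask part of Eq. 2 can be sketched as follows: one Bernoulli draw gates the entire residual branch per block during training, and at test time the branch output is scaled by its survival probability instead. This is a toy numpy sketch under those assumptions; `residual_fn` stands in for the block's conv-BN-ReLU stack and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_depth_block(h_prev, residual_fn, survival_prob, training=True):
    """One residual block with stochastic depth (Eq. 2 of the paper):
    in training, keep the residual branch with probability
    survival_prob; at test time, scale its output by survival_prob."""
    if training:
        b = rng.binomial(1, survival_prob)   # Bernoulli gate for the whole branch
        out = b * residual_fn(h_prev) + h_prev
    else:
        out = survival_prob * residual_fn(h_prev) + h_prev
    return np.maximum(out, 0.0)              # ReLU

# Toy residual branch: a fixed linear map stands in for the conv stack.
W = rng.standard_normal((16, 16)) * 0.1
residual = lambda h: h @ W

h = rng.standard_normal((4, 16))
out = stochastic_depth_block(h, residual, survival_prob=0.8)
print(out.shape)  # (4, 16)
```

Note the gate is a single scalar per block, not an element-wise dropout mask, so when b = 0 the whole block collapses to the identity shortcut.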
Let me know what you think.
PS: Interestingly, I found a post asking how to go about implementing this, but it seems to omit the identity transform:
https://www.reddit.com/r/MachineLearning/comments/4dr998/askreddit_has_anyone_implemented_resnets_with/