Deep nets with stochastic depth #66
The identity transform in Eq. 2 is the same as in Eq. 1: it is just the shortcut connection of the ResNet. For shortcuts changing the spatial size (dashed arrows in http://arxiv.org/abs/1512.03385, Fig. 3), there are two options, explained in the first paragraph of page 4 of http://arxiv.org/abs/1512.03385. An implementation of that paper, including these two options, is given in https://github.com/Lasagne/Recipes/blob/master/papers/deep_residual_learning/Deep_Residual_Learning_CIFAR-10.py.

On page 8 of the stochastic depth paper, the authors mention that for blocks changing the number of filters and the spatial dimensions, they "replace the identity connections in these blocks by an average pooling layer followed by zero paddings to match the dimensions." This is neither of the two options in the ResNet paper, but it is easy enough to modify the existing Lasagne Recipe accordingly. If in doubt about what they did in the stochastic depth paper, refer to the source code at https://github.com/yueatsprograms/Stochastic_Depth.

Edit: If you manage to reproduce the results of the stochastic depth paper (CIFAR-10 will be the easiest target), we'd appreciate a PR to this repository.
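To make the quoted sentence concrete, here is a minimal numpy sketch of that average-pool-plus-zero-padding shortcut for a block that halves the spatial size and doubles the filter count. It assumes NCHW tensors with even height and width; the function name is illustrative, not from the paper's code:

```python
import numpy as np

def avgpool_zeropad_shortcut(x):
    """Downsample an (N, C, H, W) tensor with 2x2 average pooling at
    stride 2, then zero-pad the channel dimension from C to 2*C, as
    described on page 8 of the stochastic depth paper."""
    n, c, h, w = x.shape
    # 2x2 average pooling, stride 2 (assumes even H and W).
    pooled = x.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))
    # Zero-pad channels: append C all-zero feature maps.
    zeros = np.zeros_like(pooled)
    return np.concatenate([pooled, zeros], axis=1)

x = np.random.randn(8, 16, 32, 32)
y = avgpool_zeropad_shortcut(x)
print(y.shape)  # (8, 32, 16, 16)
```

In Lasagne terms this would correspond to a pooling layer followed by channel padding on the shortcut branch, in place of the projection or padding options from the ResNet recipe.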
Ah, thank you! How did I not notice the implementation details section in the paper... I'll take a crack at this and see what I can come up with!
I hope this is the appropriate venue to post this. I don't have an implementation yet, but maybe this ticket could encourage some work.
I am currently interested in this stochastic depth paper:
http://arxiv.org/pdf/1603.09382v2.pdf
I was going to have a go at implementing this, but I was a bit stumped as to how to handle the identity transform mentioned in equation (2). As you can see, if the next layer and the current layer have different output shapes, you need to linearly project the output of the current layer so that it matches the dimensions of the output of the following layer. I'm not clear on how this is done and am afraid it's blatantly obvious... is your "projection matrix" (or whatever it's called) a matrix (of some appropriate shape) consisting solely of ones? Furthermore, how would we do this for convolutional networks?
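For context, the ResNet paper's option B answers this: the projection is a learned 1x1 convolution (with stride 2 when the spatial size halves), not a matrix of ones. A numpy sketch, with hypothetical names, assuming NCHW tensors:

```python
import numpy as np

rng = np.random.default_rng(0)

def projection_shortcut(x, W):
    """Option B from the ResNet paper: a learned 1x1 convolution with
    stride 2 that both downsamples spatially and changes the channel
    count. With a 1x1 kernel this reduces to a per-pixel matrix
    multiply applied at every other spatial position."""
    strided = x[:, :, ::2, ::2]              # stride-2 subsampling -> (N, C_in, H/2, W/2)
    # Contract the channel axis with the (C_out, C_in) weight matrix.
    return np.einsum('oc,nchw->nohw', W, strided)

x = rng.standard_normal((8, 16, 32, 32))
W = rng.standard_normal((32, 16)) * 0.1      # learned weights, not ones
y = projection_shortcut(x, W)
print(y.shape)  # (8, 32, 16, 16)
```

Note that the stochastic depth paper itself uses neither option; it uses average pooling plus zero padding, per its page 8.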
It seems like that's the only roadblock for me -- the binomial mask is easy to do.
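For completeness, the binomial-mask part of Eq. 2 can be sketched as follows: one Bernoulli draw gates the entire residual branch per block during training, and at test time the branch output is scaled by its survival probability instead. This is a toy numpy sketch under those assumptions; `residual_fn` stands in for the block's conv-BN-ReLU stack and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_depth_block(h_prev, residual_fn, survival_prob, training=True):
    """One residual block with stochastic depth (Eq. 2 of the paper):
    in training, keep the residual branch with probability
    survival_prob; at test time, scale its output by survival_prob."""
    if training:
        b = rng.binomial(1, survival_prob)   # Bernoulli gate for the whole branch
        out = b * residual_fn(h_prev) + h_prev
    else:
        out = survival_prob * residual_fn(h_prev) + h_prev
    return np.maximum(out, 0.0)              # ReLU

# Toy residual branch: a fixed linear map stands in for the conv stack.
W = rng.standard_normal((16, 16)) * 0.1
residual = lambda h: h @ W

h = rng.standard_normal((4, 16))
out = stochastic_depth_block(h, residual, survival_prob=0.8)
print(out.shape)  # (4, 16)
```

Note the gate is a single scalar per block, not an element-wise dropout mask, so when b = 0 the whole block collapses to the identity shortcut.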
Let me know what you think.
PS: Interestingly, I found a post asking how to go about implementing this, but it seems to omit the identity transform:
https://www.reddit.com/r/MachineLearning/comments/4dr998/askreddit_has_anyone_implemented_resnets_with/