
Add fuse layers for conv+affine+relu and conv+relu #2842

Merged

Conversation

facug91 (Contributor) commented Jul 28, 2023

This PR extends the fuse_layers method to also fuse convolutional layers followed by relu, and convolutional layers followed by affine followed by relu.
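The conv+affine part of this fusion amounts to a pure weight rewrite: for a per-channel affine y = g·x + b applied after a convolution with weights W and biases c, the fused convolution has weights g·W and biases g·c + b. The relu part cannot be folded into weights, so it has to be handled by the convolution implementation itself (e.g. via cuDNN's fused conv+bias+activation path, cudnnConvolutionBiasActivationForward). A minimal sketch of the affine folding, with hypothetical names rather than the actual dlib code:

```cpp
#include <vector>
#include <cstddef>

// Hypothetical flattened conv parameters, for illustration only.
struct conv_params
{
    std::vector<float> weights;   // num_filters * weights_per_filter
    std::vector<float> biases;    // one bias per filter
    std::size_t weights_per_filter;
};

// Fold a per-channel affine layer (y = gamma*x + beta) into the conv,
// so conv followed by affine becomes a single conv with new parameters.
void fold_affine_into_conv(conv_params& conv,
                           const std::vector<float>& gamma,
                           const std::vector<float>& beta)
{
    for (std::size_t k = 0; k < conv.biases.size(); ++k)
    {
        for (std::size_t i = 0; i < conv.weights_per_filter; ++i)
            conv.weights[k * conv.weights_per_filter + i] *= gamma[k];
        conv.biases[k] = gamma[k] * conv.biases[k] + beta[k];
    }
}
```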

I tested it with examples/dnn_mmod_face_detection_ex.cpp and got exactly the same results on the images shown.
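For anyone who wants to try this, the fusion is applied with a single visitor call. A minimal usage sketch (the network type and model file here are illustrative, not from the PR):

```cpp
#include <dlib/dnn.h>

int main()
{
    using namespace dlib;
    // A toy inference network containing a con -> affine -> relu chain,
    // which the extended fuse_layers can collapse into one convolution.
    using net_type = loss_multiclass_log<fc<10,
        relu<affine<con<16,3,3,1,1, input<matrix<float>>>>>>>;

    net_type net;
    deserialize("model.dat") >> net;  // hypothetical pre-trained model

    // Rewrites the network in place; fused chains then run as one layer.
    fuse_layers(net);
}
```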

Here are also some tests for different neural networks from https://github.com/dlibml/dnn:

| Name | master (w/o fusion) | master (w/ fusion) | this PR (w/o fusion) | this PR (w/ fusion) | comparative1 | comparative2 |
|------|--------------------:|-------------------:|---------------------:|--------------------:|-------------:|-------------:|
| alexnet | 9.53 | 9.63 | 9.55 | 9.45 | 1.84% | 1.10% |
| sqznet1.0 | 6.53 | 6.00 | 6.40 | 4.47 | 25.60% | 30.25% |
| sqznet1.1 | 4.41 | 3.90 | 4.41 | 3.01 | 22.86% | 31.76% |
| vggnet11 | 37.00 | 34.46 | 36.76 | 32.77 | 4.91% | 10.87% |
| vggnet13 | 46.94 | 43.90 | 47.31 | 41.06 | 6.47% | 13.21% |
| vggnet16 | 58.80 | 54.87 | 58.01 | 52.29 | 4.71% | 9.86% |
| vggnet19 | 68.32 | 65.11 | 69.29 | 61.89 | 4.95% | 10.69% |
| googlenet | 9.95 | 9.54 | 10.09 | 8.53 | 10.61% | 15.50% |
| resnet18 | 7.66 | 7.39 | 7.68 | 6.86 | 7.23% | 10.64% |
| resnet34 | 13.90 | 13.67 | 13.90 | 12.77 | 6.57% | 8.10% |
| resnet50 | 22.41 | 20.11 | 22.41 | 18.89 | 6.07% | 15.71% |
| resnet101 | 39.15 | 35.72 | 38.39 | 33.73 | 5.56% | 12.13% |
| resnet152 | 54.98 | 50.76 | 55.03 | 48.44 | 4.57% | 11.98% |
| darknet19 | 13.55 | 12.43 | 13.54 | 12.53 | -0.79% | 7.51% |
| darknet53 | 32.70 | 30.87 | 32.36 | 30.63 | 0.80% | 5.36% |
| darknet53csp | 26.47 | 23.36 | 26.60 | 23.48 | -0.53% | 11.73% |
| densenet121 | 21.57 | 20.33 | 21.57 | 20.12 | 1.03% | 6.71% |
| densenet169 | 28.50 | 27.13 | 28.43 | 25.66 | 5.43% | 9.76% |
| densenet201 | 38.60 | 37.17 | 38.52 | 35.65 | 4.10% | 7.45% |
| densenet265 | 55.85 | 54.43 | 55.81 | 52.72 | 3.13% | 5.53% |
| densenet161 | 49.80 | 47.90 | 49.71 | 45.74 | 4.52% | 7.99% |
| vovnet19s | 6.84 | 6.39 | 6.85 | 5.05 | 20.88% | 26.23% |
| vovnet19 | 14.25 | 13.85 | 14.18 | 11.57 | 16.47% | 18.41% |
| vovnet27s | 8.31 | 7.84 | 8.31 | 6.25 | 20.31% | 24.81% |
| vovnet27 | 18.32 | 17.67 | 18.32 | 15.28 | 13.54% | 16.61% |
| vovnet39 | 23.69 | 23.30 | 23.53 | 20.21 | 13.23% | 14.11% |
| vovnet57 | 31.00 | 30.81 | 31.02 | 27.41 | 11.03% | 11.63% |
| vovnet99 | 56.88 | 56.44 | 56.31 | 51.45 | 8.84% | 8.64% |
| repvgg_a0 | 6.47 | 5.67 | 6.38 | 4.96 | 12.52% | 22.22% |
| repvgg_a1 | 8.01 | 8.06 | 8.04 | 7.23 | 10.30% | 10.14% |
| repvgg_a2 | 19.50 | 19.46 | 19.35 | 18.01 | 7.49% | 6.95% |
| repvgg_b0 | 10.16 | 10.17 | 10.04 | 8.88 | 12.61% | 11.54% |
| repvgg_b1 | 42.13 | 42.18 | 42.12 | 39.34 | 6.72% | 6.60% |
| repvgg_b2 | 59.09 | 59.01 | 58.67 | 55.87 | 5.32% | 4.78% |
| repvgg_b3 | 82.28 | 82.40 | 82.66 | 79.12 | 3.97% | 4.28% |

- comparative1: measures how much faster the model runs with the new fuse_layers compared to the current fuse_layers.
- comparative2: measures how much faster the model runs with the new fuse_layers compared to not using fuse_layers.
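Reconstructing the formulas from the numbers above (they are not stated explicitly):

```
comparative1 = (master w/ fusion  - this PR w/ fusion) / (master w/ fusion)
comparative2 = (this PR w/o fusion - this PR w/ fusion) / (this PR w/o fusion)
```

For example, vggnet11 gives (34.46 − 32.77) / 34.46 ≈ 4.9% and (36.76 − 32.77) / 36.76 ≈ 10.9%, matching the table up to rounding of the displayed timings.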

I left in the measurements taken before fusion to be sure that the different runs used the same convolutional algorithms in cuDNN. Whenever they differed by more than 2% in either direction, I ran those tests again.

facug91 marked this pull request as draft July 28, 2023 19:37
facug91 (Contributor, Author) commented Jul 28, 2023

test_fuse_layers is failing, so I must have done something wrong. I'll review this before taking it out of draft. Maybe @arrufat can help me out here, please? This is the PR I was talking about before 😅

arrufat (Contributor) commented Jul 28, 2023

I'm traveling these days, so it might take some time to have a look. I'll try during this week, though.

I also thought about fusing the relu layer, so it's nice to see it here.

facug91 (Contributor, Author) commented Jul 28, 2023

I thought test_fuse_layers was the test that was failing, but no, it was something else. When I looked up the line in the test file, it was a test of a normal convolution. I had forgotten to update the copy constructor and the assignment operator, and that's why it was failing. The tests on the CPU pass correctly now.
Anyway, I noticed that the tests on the GPU are not passing. But that is not a problem with this PR; they are not passing on master either.
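To illustrate the kind of bug described above (names are hypothetical, not the actual PR code): when a layer gains new state, hand-written copy operations must be updated too, or copies silently drop it.

```cpp
#include <vector>

// Hypothetical layer object that gained a fusion flag.
class con_
{
public:
    con_() = default;

    con_(const con_& item)
        : weights(item.weights),
          relu_fused(item.relu_fused)  // easy to miss when the member is new
    {}

    con_& operator=(const con_& item)
    {
        weights = item.weights;
        relu_fused = item.relu_fused;  // likewise here
        return *this;
    }

private:
    std::vector<float> weights;
    bool relu_fused = false;  // state added by the fusion change
};
```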

facug91 marked this pull request as ready for review July 28, 2023 23:51
facug91 (Contributor, Author) commented Jul 28, 2023

One other thing: while reviewing this problem, I was looking for the documentation of the disable_duplicative_biases function and couldn't find it. It turns out it had been left in the layers_abstract.h file, so I took the opportunity to make a commit moving it to visitors_abstract.h right here.
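For reference, a minimal sketch of how that visitor is used (network type illustrative): disable_duplicative_biases removes the bias from layers whose output feeds a normalization layer, since the normalization's learned shift makes it redundant.

```cpp
#include <dlib/dnn.h>

int main()
{
    using namespace dlib;
    // The conv's bias would be redundant with the shift bn_con learns.
    using net_type = loss_multiclass_log<fc<10,
        relu<bn_con<con<16,3,3,1,1, input<matrix<float>>>>>>>;

    net_type net;
    // Turn off the redundant bias terms before training.
    disable_duplicative_biases(net);
}
```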

davisking (Owner) commented:

> disable_duplicative_biases

Oh yeah, thanks. That fixes the links in the main docs too :)

davisking (Owner) commented:
Huh yeah, the GPU version of layer_norm_ is failing since it's giving out bad derivatives on master. We need a CI test that runs with a GPU :|

Anyway, yeah, looks like your PR is good :)

davisking merged commit be2fa7f into davisking:master Aug 5, 2023
9 of 10 checks passed
arrufat (Contributor) commented Aug 5, 2023

> Huh yeah, the GPU version of layer_norm_ is failing since it's giving out bad derivatives on master. We need a CI test that runs with a GPU :|
>
> Anyway, yeah, looks like your PR is good :)

Oh, I need to check what's going on there

facug91 deleted the fuse-conv-relu-and-conv-affine-relu branch August 5, 2023 20:10