Bilinear ConvNets for Fine-Grained Recognition

This is a PyTorch implementation of Bilinear CNNs as described in the paper Bilinear CNN Models for Fine-Grained Visual Recognition by Tsung-Yu Lin, Aruni Roy Chowdhury, and Subhransu Maji. On the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset, for the task of 200-class fine-grained bird species classification, this implementation reaches:

  • Accuracy of 84.29% using the following training regime
    • Train only new bilinear classifier, keeping pre-trained layers frozen
      • Learning rate: 1e0, Weight Decay: 1e-8, Epochs: 55
    • Finetune all pretrained layers as well as bilinear layer jointly
      • Learning rate: 1e-2, Weight Decay: 1e-5, Epochs: 25
    • Common settings for both training runs
      • Optimizer: SGD, Momentum: 0.9, Batch Size: 64, GPUs: 4
  • These hyperparameters are set as the defaults in the config file.
  • The original paper reports 84.00% accuracy on the CUB-200-2011 dataset using a pretrained VGG-D model, which corresponds to the VGG-16 model used in this implementation.
  • Minor differences exist, e.g. no SVM is used here, and the L2 normalization is done differently.
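The bilinear pooling that gives the model its name, together with the signed square root and L2 normalization mentioned above, can be sketched roughly as follows (a minimal standalone sketch, not the repo's actual bcnn code):

```python
import torch
import torch.nn.functional as F

def bilinear_pool(features: torch.Tensor) -> torch.Tensor:
    """Bilinear (outer-product) pooling of a conv feature map, e.g. VGG-16 conv5_3."""
    n, c, h, w = features.shape
    x = features.view(n, c, h * w)
    # Average of outer products over all spatial positions -> (N, C, C).
    phi = torch.bmm(x, x.transpose(1, 2)) / (h * w)
    phi = phi.view(n, c * c)
    # Signed square root, then L2 normalization, as described in the paper.
    phi = torch.sign(phi) * torch.sqrt(torch.abs(phi) + 1e-12)
    return F.normalize(phi, dim=1)
```

For VGG-16's 512-channel conv features this yields a 512 × 512 = 262,144-dimensional descriptor per image, which is what the classifier is trained on.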

Requirements

  • Python (tested on 3.6.9; requires 3.5.0 or newer because type hints are used).
  • Other dependencies are listed in requirements.txt.
  • Currently tested with PyTorch 1.1.0, but should work fine with newer versions.

Usage

The model class, along with the relevant dataset class and a utility trainer class, is packaged in the bcnn subfolder, from which the relevant modules can be imported. Dataset downloading and preprocessing is done via a shell script, and a Python driver script is provided to run the actual training/testing loop.

  • Run the script scripts/prepareData.sh, which does the following:
    • WARNING: Some of these steps require GNU Parallel, which must be installed separately.
    • Download the CUB-200-2011 dataset and extract it.
    • Preprocess the dataset, i.e. resize the smaller edge of each image to 512 pixels while maintaining the aspect ratio.
    • Also create a copy of the dataset in which images are cropped to their bounding boxes.
  • main.py is the driver script. It imports the relevant modules from the bcnn package, performs the pre-training and fine-tuning of the model, and tests it on the test split. For a list of all command-line arguments, see config.py.
    • Model checkpoints are saved to the ckpt/ directory under the name specified by the command-line argument --savedir.
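The resizing step performed by the shell script can also be expressed in Python; here is a sketch using Pillow (the script's actual implementation may differ):

```python
from PIL import Image

def resize_smaller_edge(img: Image.Image, target: int = 512) -> Image.Image:
    # Scale so the smaller edge becomes `target`, preserving the aspect ratio.
    w, h = img.size
    scale = target / min(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)

resized = resize_smaller_edge(Image.new("RGB", (600, 900)))  # -> 512 x 768
```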

If you have a working Python 3 environment, simply run the following sequence of steps:

```shell
bash scripts/prepareData.sh
pip install -r requirements.txt
export CUDA_VISIBLE_DEVICES=0,1,2,3
python main.py --gpus 1 2 3 4 --savedir ./ckpt/exp_test
```
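The two training stages that main.py runs (frozen-backbone classifier training, then joint fine-tuning) can be sketched as below; the tiny stand-in backbone and variable names are illustrative assumptions, not the repo's actual API:

```python
import torch
import torch.nn as nn

# Tiny stand-in backbone; the real model uses VGG-16's 512-channel conv features.
backbone = nn.Sequential(nn.Conv2d(3, 128, 3, padding=1), nn.ReLU())
classifier = nn.Linear(128 * 128, 200)  # bilinear features -> 200 bird classes

# Stage 1: keep the pretrained layers frozen, train only the new classifier.
for p in backbone.parameters():
    p.requires_grad = False
stage1 = torch.optim.SGD(classifier.parameters(),
                         lr=1e0, momentum=0.9, weight_decay=1e-8)

# Stage 2: unfreeze everything and fine-tune jointly at a lower learning rate.
for p in backbone.parameters():
    p.requires_grad = True
stage2 = torch.optim.SGD(list(backbone.parameters()) + list(classifier.parameters()),
                         lr=1e-2, momentum=0.9, weight_decay=1e-5)
```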

Notes

  • (Oct 12, 2019) GPU memory consumption is not very high, which means the batch size can be increased. However, doing so requires adjusting other hyperparameters such as the learning rate.
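One common heuristic for that adjustment (an assumption on my part, not something this repo prescribes) is to scale the learning rate linearly with the batch size:

```python
# Linear LR scaling heuristic: double the batch size, double the learning rate.
base_lr, base_batch = 1e-2, 64   # fine-tuning defaults from the config above
new_batch = 128
scaled_lr = base_lr * (new_batch / base_batch)  # 0.02
```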

Acknowledgements

Tsung-Yu Lin and Aruni Roy Chowdhury released the original implementation, which was invaluable for understanding the model architecture.
Hao Mood also released a PyTorch implementation, which was critical for finding the right hyperparameters to reach the accuracy reported in the paper.
As usual, a shout-out to the PyTorch team for the incredible library.

Contact

Riddhiman Dasgupta
Please create an issue or submit a PR if you find any bugs!

License

MIT