Implementation of Convolutional Recurrent Neural Nets for zero-shot retrieval of images based on corresponding captions.
CRNNs consist of 1D convolutional blocks followed by an RNN. The convolutions reduce the sequence length of the captions, allowing the RNN to train efficiently.
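As an illustration, here is a minimal PyTorch sketch of that idea (the hyperparameters and layer sizes are hypothetical, not the exact architecture shipped in this repo):

```python
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Conceptual CRNN caption encoder: conv blocks shorten the sequence,
    then an RNN reads the shortened sequence."""

    def __init__(self, vocab_size=70, emb_dim=1024):
        super().__init__()
        # 1D convolutional blocks: each max-pool halves the sequence length,
        # so the RNN sees a much shorter sequence than the raw characters.
        self.conv = nn.Sequential(
            nn.Conv1d(vocab_size, 256, kernel_size=4), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(256, 512, kernel_size=4), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.rnn = nn.RNN(512, emb_dim, batch_first=True)

    def forward(self, x):          # x: (batch, vocab_size, seq_len), one-hot characters
        h = self.conv(x)           # (batch, 512, ~seq_len / 4)
        h = h.transpose(1, 2)      # (batch, short_seq, 512) for the RNN
        out, _ = self.rnn(h)       # (batch, short_seq, emb_dim)
        return out.mean(dim=1)     # (batch, emb_dim) fixed-size caption embedding
```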
To get started, follow the instructions below.
To load and use the pretrained character-level model on CUB:
```python
from crnns4captions.utils import load_best_model, captions_to_tensor

# model is returned in eval mode
model = load_best_model('./models/', './models/experiments.txt', device='cuda:0')

captions = ['This bird has blue wings, a pointed red beak and long legs.',
            'El pollo loco!']
captions_tensor = captions_to_tensor(captions, device='cuda:0')

reprs = model(captions_tensor)  # torch.Size([2, 1024])
```
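The caption embeddings live in a joint caption-image space, so zero-shot retrieval reduces to a nearest-neighbour search. A minimal sketch, assuming `image_embs` is a hypothetical precomputed `(num_images, 1024)` tensor of image embeddings in the same space:

```python
import torch.nn.functional as F

# Hypothetical: image_embs holds precomputed image embeddings, one row per image.
scores = F.normalize(reprs, dim=1) @ F.normalize(image_embs, dim=1).T
best = scores.argmax(dim=1)  # index of the best-matching image for each caption
```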
Alternatively, if you download the model files locally without the rest of the repo, you can adapt `crnns4captions/utils/deploy.py` by pasting the relevant repo code into that file.
After installation:
- Make the scripts executable: `chmod +x scripts/*`
- Configure the paths in `scripts/to_h5py` and execute it to get an h5 file for every t7 file in the CUB dataset (NOTE: do not overwrite the t7 files): `scripts/to_h5py` (see the conversion sketch below)
- Change the hyperparameter configuration in `scripts/grid_search` (defaults: the values suggested in the original paper and the ones used in the pre-trained model) and run the grid search: `scripts/grid_search` (see the grid sketch below)
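For reference, the t7-to-h5 conversion boils down to deserializing each Torch7 file and re-serializing its contents as HDF5. A minimal sketch using the `torchfile` and `h5py` packages, assuming each `.t7` file holds a Lua table of numeric arrays (the paths are illustrative, not the script's actual logic):

```python
import h5py
import torchfile  # pip install torchfile

# Illustrative paths; point these at the actual CUB caption files.
data = torchfile.load('cub/text_c10/001.t7')
with h5py.File('cub/text_c10/001.h5', 'w') as f:
    for key, value in data.items():
        # torchfile returns Lua table keys as bytes; decode them for h5py.
        name = key.decode() if isinstance(key, bytes) else str(key)
        f.create_dataset(name, data=value)
```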
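The grid search itself is just a Cartesian product over hyperparameter values, with one training run per combination. A conceptual Python sketch (the entry point, flags, and values here are hypothetical; the real grid lives in `scripts/grid_search`):

```python
import subprocess
from itertools import product

# Hypothetical grid and CLI; edit scripts/grid_search for the real one.
for lr, rnn_dim in product([1e-3, 5e-4], [512, 1024]):
    subprocess.run(['python', 'train.py',
                    '--lr', str(lr), '--rnn-dim', str(rnn_dim)], check=True)
```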