SOFT: Softmax-free Transformer with Linear Complexity,
Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang
NeurIPS 2021
Softmax-free Linear Transformers,
Jiachen Lu, Junge Zhang, Xiatian Zhu, Jianfeng Feng, Tao Xiang, Li Zhang
IJCV 2024
- We propose a normalized softmax-free self-attention with stronger generalizability.
- SOFT is now available for more vision tasks (object detection and semantic segmentation).
- [2024/02/12] Our journal extension, Softmax-free Linear Transformers, has been accepted by IJCV.
- [2022/07/05] SOFT is now available for downstream tasks! An efficient normalization is applied to SOFT. Please refer to SOFT-Norm.
- timm==0.3.2
- torch>=1.7.0 and a torchvision version that matches the PyTorch installation
- cuda>=10.2

Compilation may fail on CUDA < 10.2. We have compiled it successfully on CUDA 10.2 and CUDA 11.2.
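Before compiling, the package requirements above can be checked from Python. This is a minimal stdlib-only sketch, assuming the packages were installed with pip so that `importlib.metadata` can find them; `parse_version`, `satisfies`, and `check_packages` are hypothetical helpers, not part of SOFT.

```python
import re
from importlib import metadata

# Hypothetical helpers, not part of the SOFT repository.

def parse_version(text):
    """Turn '1.7.0+cu102' or '10.2' into a comparable tuple of ints, e.g. (1, 7, 0)."""
    match = re.match(r"\d+(\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed, op, required):
    """Compare two version strings with '==' or '>=' (the operators used above)."""
    a, b = parse_version(installed), parse_version(required)
    # Pad with zeros so that '10.2' and '10.2.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if op == "==":
        return a == b
    if op == ">=":
        return a >= b
    raise ValueError(f"unsupported operator: {op}")


REQUIREMENTS = [("timm", "==", "0.3.2"), ("torch", ">=", "1.7.0")]


def check_packages():
    """Report each requirement as OK, MISMATCH (with the installed version), or MISSING."""
    report = {}
    for name, op, required in REQUIREMENTS:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "MISSING"
            continue
        report[name] = "OK" if satisfies(installed, op, required) else f"MISMATCH ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_packages().items():
        print(f"{name}: {status}")
```

If PyTorch is installed, `satisfies(torch.version.cuda, ">=", "10.2")` checks the CUDA version PyTorch was built with. Note that `torch.version.cuda` is `None` on CPU-only builds, and the toolkit that compiles the CUDA kernel (`nvcc --version`) may differ from it.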
Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure follows the standard layout expected by torchvision's `datasets.ImageFolder`, with the training and validation data in the `train/` and `val/` folders, respectively:
```
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
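A quick way to catch layout mistakes (such as a misnamed class folder) before launching a long training run is to scan the directory tree. This is a minimal sketch using only the standard library; the accepted image extensions and the helper names are assumptions, not part of SOFT.

```python
from pathlib import Path

# Extensions treated as images here (an assumption; adjust to your copy of ImageNet).
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir):
    """Map each class folder in one split to its number of image files."""
    counts = {}
    for class_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def check_imagenet_layout(root):
    """Return a list of problems under root; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    summaries = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        summaries[split] = summarize_split(split_dir)
        empty = [name for name, n in summaries[split].items() if n == 0]
        if empty:
            problems.append(f"{split}: class folders without images: {empty}")
    # ImageFolder derives label indices from the sorted class-folder names,
    # so both splits need the same class folders for the labels to line up.
    if len(summaries) == 2 and set(summaries["train"]) != set(summaries["val"]):
        problems.append("train/ and val/ contain different class folders")
    return problems
```

For example, `check_imagenet_layout("/path/to/imagenet")` returns `[]` for the layout shown above.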
```
git clone https://github.com/fudan-zvg/SOFT.git
python -m pip install -e SOFT
```
Model | Resolution | Params | FLOPs | Top-1 (%) | Config | Pretrained Model |
---|---|---|---|---|---|---|
SOFT-Tiny | 224 | 13M | 1.9G | 79.3 | SOFT_Tiny.yaml, SOFT_Tiny_cuda.yaml | SOFT_Tiny, SOFT_Tiny_cuda |
SOFT-Small | 224 | 24M | 3.3G | 82.2 | SOFT_Small.yaml, SOFT_Small_cuda.yaml | |
SOFT-Medium | 224 | 45M | 7.2G | 82.9 | SOFT_Meidum.yaml, SOFT_Meidum_cuda.yaml | |
SOFT-Large | 224 | 64M | 11.0G | 83.1 | SOFT_Large.yaml, SOFT_Large_cuda.yaml | |
SOFT-Huge | 224 | 87M | 16.3G | 83.3 | SOFT_Huge.yaml, SOFT_Huge_cuda.yaml | |
SOFT-Tiny-Norm | 224 | 13M | 1.9G | 79.4 | SOFT_Tiny_norm.yaml | SOFT_Tiny_norm |
SOFT-Small-Norm | 224 | 24M | 3.3G | 82.4 | SOFT_Small_norm.yaml | SOFT_Small_norm |
SOFT-Medium-Norm | 224 | 45M | 7.2G | 83.1 | SOFT_Meidum_norm.yaml | SOFT_Medium_norm |
SOFT-Large-Norm | 224 | 64M | 11.0G | 83.3 | SOFT_Large_norm.yaml | SOFT_Large_norm |
SOFT-Huge-Norm | 224 | 87M | 16.3G | 83.4 | SOFT_Huge_norm.yaml | |
Backbone | Method | lr schd | box mAP | mask mAP | Params |
---|---|---|---|---|---|
SOFT-Tiny-Norm | RetinaNet | 1x | 40.0 | - | 23M |
SOFT-Tiny-Norm | Mask R-CNN | 1x | 41.2 | 38.2 | 33M |
SOFT-Small-Norm | RetinaNet | 1x | 42.8 | - | 34M |
SOFT-Small-Norm | Mask R-CNN | 1x | 43.8 | 40.1 | 44M |
SOFT-Medium-Norm | RetinaNet | 1x | 44.3 | - | 55M |
SOFT-Medium-Norm | Mask R-CNN | 1x | 46.6 | 42.0 | 65M |
SOFT-Large-Norm | RetinaNet | 1x | 45.3 | - | 74M |
SOFT-Large-Norm | Mask R-CNN | 1x | 47.0 | 42.2 | 84M |
Backbone | Method | Crop size | lr schd | mIoU | Params |
---|---|---|---|---|---|
SOFT-Small-Norm | UperNet | 512x512 | 1x | 46.2 | 54M |
SOFT-Medium-Norm | UperNet | 512x512 | 1x | 48.0 | 76M |
We provide two implementations of the Gaussian kernel: a PyTorch version and a CUDA implementation of the exact form of the Gaussian function. Config files whose names contain `cuda` use the CUDA implementation. Both implementations yield the same performance. Please install SOFT before running the CUDA version.
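To illustrate what the Gaussian kernel replaces in softmax attention, here is a small NumPy sketch. It assumes the formulation from the SOFT paper as we understand it: queries and keys share one projection, and the similarity is exp(-||q_i - q_j||^2 / (2√d)) with no softmax. It is not the repository's PyTorch or CUDA code: the function names are hypothetical, landmarks are picked by simple average pooling (an assumption), and `np.linalg.pinv` computes the Moore-Penrose inverse directly for simplicity. The low-rank path only forms (n, m) and (m, m) kernel matrices, so for a fixed number of landmarks m its cost grows linearly with the sequence length n.

```python
import numpy as np


def gaussian_kernel(a, b):
    """exp(-||a_i - b_j||^2 / (2 * sqrt(d))) for every row pair of a (n, d) and b (m, d)."""
    d = a.shape[-1]
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        - 2.0 * a @ b.T
        + np.sum(b**2, axis=1)[None, :]
    )
    # Clip tiny negative values caused by floating-point cancellation.
    sq_dist = np.maximum(sq_dist, 0.0)
    return np.exp(-sq_dist / (2.0 * np.sqrt(d)))


def softmax_free_attention_exact(q, v):
    """Quadratic reference: S = kernel(Q, Q), output = S @ V (no softmax)."""
    return gaussian_kernel(q, q) @ v


def softmax_free_attention_nystrom(q, v, pool):
    """Low-rank sketch: S ~= P @ pinv(A) @ P.T with m = n // pool landmarks.

    Landmarks are the means of consecutive groups of `pool` tokens
    (an assumption made for this sketch).
    """
    n, d = q.shape
    assert n % pool == 0, "sequence length must be divisible by pool in this sketch"
    landmarks = q.reshape(n // pool, pool, d).mean(axis=1)  # (m, d)
    p = gaussian_kernel(q, landmarks)                        # (n, m)
    a = gaussian_kernel(landmarks, landmarks)                # (m, m)
    # Multiply right-to-left so that no (n, n) matrix is ever formed.
    return p @ (np.linalg.pinv(a) @ (p.T @ v))
```

With `pool=1` every token is its own landmark and the low-rank product reproduces the exact quadratic result, which makes a convenient sanity check; a larger `pool` trades approximation accuracy for a smaller kernel matrix.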
```
./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE}
# For example, train SOFT-Tiny on the ImageNet training set with 8 GPUs
./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml
```
```
./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE} --eval_checkpoint ${CHECKPOINT_FILE} --eval
# For example, test SOFT-Tiny on the ImageNet validation set with 8 GPUs
./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml --eval_checkpoint ${CHECKPOINT_FILE} --eval
```
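When launching several runs (for example, one per config in the table above), it can help to build these commands programmatically. This is a minimal sketch that uses only the flags documented above (`--data`, `--config`, `--eval_checkpoint`, `--eval`); the helper name and the example paths are hypothetical placeholders.

```python
import shlex


def build_dist_train_cmd(gpu_num, data_path, config_file, checkpoint=None):
    """Assemble a dist_train.sh invocation as an argument list (hypothetical helper).

    Passing `checkpoint` switches to evaluation mode by appending
    --eval_checkpoint <checkpoint> --eval, mirroring the test command.
    """
    cmd = ["./dist_train.sh", str(gpu_num), "--data", str(data_path), "--config", str(config_file)]
    if checkpoint is not None:
        cmd += ["--eval_checkpoint", str(checkpoint), "--eval"]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace with your own.
    for config in ["config/SOFT_Tiny.yaml", "config/SOFT_Small.yaml"]:
        cmd = build_dist_train_cmd(8, "/path/to/imagenet", config)
        # Print a shell-ready string; from the repository root the list could be
        # launched with subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids quoting problems when paths contain spaces; `shlex.join` is used only for display.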
```
@inproceedings{SOFT,
  title={SOFT: Softmax-free Transformer with Linear Complexity},
  author={Lu, Jiachen and Yao, Jinghan and Zhang, Junge and Zhu, Xiatian and Xu, Hang and Gao, Weiguo and Xu, Chunjing and Xiang, Tao and Zhang, Li},
  booktitle={NeurIPS},
  year={2021}
}
@article{Softmax,
  title={Softmax-free Linear Transformers},
  author={Lu, Jiachen and Zhang, Li and Zhang, Junge and Zhu, Xiatian and Feng, Jianfeng and Xiang, Tao},
  journal={International Journal of Computer Vision},
  year={2024}
}
```
Thanks to the following open-source repositories:
- Detectron2
- T2T-ViT
- PVT
- Nystromformer
- pytorch-image-models