- [CUT] Segmentation Assisted U-shaped Multi-scale Transformer for Crowd Counting (BMVC) [paper]
- [MSDTrans] RGB-T Multi-Modal Crowd Counting Based on Transformer (BMVC) [paper] [code]
- [LoViTCrowd] Improving Local Features with Relevant Spatial Information by Vision Transformer for Crowd Counting (BMVC) [paper] [code]
- [CLTR] An End-to-End Transformer Model for Crowd Localization (ECCV) [paper] [code] [project]
- [CrowdFormer] CrowdFormer: An Overlap Patching Vision Transformer for Top-Down Crowd Counting (IJCAI) [paper]
- [SL-ViT] Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead (Neural Networks) [paper]
- [TransCrowd] TransCrowd: Weakly-Supervised Crowd Counting with Transformer (Science China Information Sciences) [paper] [code]
- [CC-AV] Audio-Visual Transformer Based Crowd Counting (ICCVW) [paper]
- [MLSTN] Multi-level feature fusion based Locality-Constrained Spatial Transformer network for video crowd counting (Neurocomputing) [paper] (extension of LSTN)
- [LSTN] Locality-Constrained Spatial Transformer Network for Video Crowd Counting (ICME, oral) [paper]