PySlowFast Model Zoo and Baselines

Kinetics 400 and 600

| architecture | depth | crops x clips | frame length x sample rate | top1 | top5 | model | config | dataset |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C2D | R50 | 3 x 10 | 8 x 8 | 67.2 | 87.8 | link | Kinetics/c2/C2D_NOPOOL_8x8_R50 | K400 |
| I3D | R50 | 3 x 10 | 8 x 8 | 73.5 | 90.8 | link | Kinetics/c2/I3D_8x8_R50 | K400 |
| I3D NLN | R50 | 3 x 10 | 8 x 8 | 74.0 | 91.1 | link | Kinetics/c2/I3D_NLN_8x8_R50 | K400 |
| Slow | R50 | 3 x 10 | 4 x 16 | 72.7 | 90.3 | link | Kinetics/c2/SLOW_4x16_R50 | K400 |
| Slow | R50 | 3 x 10 | 8 x 8 | 74.8 | 91.6 | link | Kinetics/c2/SLOW_8x8_R50 | K400 |
| SlowFast | R50 | 3 x 10 | 4 x 16 | 75.6 | 92.0 | link | Kinetics/c2/SLOWFAST_4x16_R50 | K400 |
| SlowFast | R50 | 3 x 10 | 8 x 8 | 77.0 | 92.6 | link | Kinetics/c2/SLOWFAST_8x8_R50 | K400 |
| MViT | B-Conv | 1 x 5 | 16 x 4 | 78.4 | 93.5 | link | Kinetics/MVIT_B_16x4_CONV | K400 |
| MViT | B-Conv | 1 x 5 | 32 x 3 | 80.4 | 94.8 | link | Kinetics/MVIT_B_32x3_CONV | K400 |
| MViT | B-Conv | 1 x 5 | 32 x 3 | 83.9 | 96.5 | link | Kinetics/MVIT_B_32x3_CONV_K600 | K600 |
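To evaluate one of the entries above, the config name and a downloaded checkpoint have to be combined into a command line. The following is a minimal sketch, not the project's official tooling. It assumes that a config cell such as `Kinetics/c2/SLOWFAST_8x8_R50` maps to `configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml`, that `tools/run_net.py` accepts `--cfg` followed by `KEY VALUE` overrides, and that checkpoints for configs under a `c2` directory are in Caffe2 format. The helper name `build_test_command` is hypothetical; check the option names against your checkout before relying on them.

```python
from pathlib import PurePosixPath


def build_test_command(config_name, checkpoint_path, data_dir):
    """Return an argv list for evaluating one Model Zoo entry (hypothetical helper)."""
    # Assumption: the config column maps to configs/<config_name>.yaml.
    cfg_file = PurePosixPath("configs") / (config_name + ".yaml")
    cmd = [
        "python", "tools/run_net.py",
        "--cfg", str(cfg_file),
        "DATA.PATH_TO_DATA_DIR", data_dir,
        "TEST.CHECKPOINT_FILE_PATH", checkpoint_path,
        "TRAIN.ENABLE", "False",  # evaluation only, no training
    ]
    # Assumption: configs under a "c2" directory pair with Caffe2-format
    # checkpoints, so the checkpoint type is overridden for them.
    if "c2" in PurePosixPath(config_name).parts:
        cmd += ["TEST.CHECKPOINT_TYPE", "caffe2"]
    return cmd


# Placeholder paths; substitute your own checkpoint and dataset locations.
print(" ".join(build_test_command(
    "Kinetics/c2/SLOWFAST_8x8_R50",
    "path_to_your_checkpoint",
    "path_to_your_dataset",
)))
```

The function only assembles the argument list and does not launch anything, so it can be inspected or passed to `subprocess.run` once the paths are real.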

X3D models (details in projects/x3d)

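| architecture | size | pretrain | frame length x sample rate | top1 10-view | top1 30-view | parameters (M) | FLOPs (G) | model | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| X3D | XS | - | 4 x 12 | 68.7 | 69.5 | 3.8 | 0.60 | link | Kinetics/X3D_XS |
| X3D | S | - | 13 x 6 | 73.1 | 73.5 | 3.8 | 1.96 | link | Kinetics/X3D_S |
| X3D | M | - | 16 x 5 | 75.1 | 76.2 | 3.8 | 4.73 | link | Kinetics/X3D_M |
| X3D | L | - | 16 x 5 | 76.9 | 77.5 | 6.2 | 18.37 | link | Kinetics/X3D_L |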

AVA

| architecture | depth | pretrain model | frame length x sample rate | mAP | AVA version | model |
| --- | --- | --- | --- | --- | --- | --- |
| Slow | R50 | Kinetics 400 | 4 x 16 | 19.5 | 2.2 | link |
| SlowFast | R101 | Kinetics 600 | 8 x 8 | 28.2 | 2.1 | link |
| SlowFast | R101 | Kinetics 600 | 8 x 8 | 29.1 | 2.2 | link |
| SlowFast | R101 | Kinetics 600 | 16 x 8 | 29.4 | 2.2 | link |

Multigrid Training

Update (June 2020): Below we provide reimplemented models from the paper "A Multigrid Method for Efficiently Training Video Models". The multigrid method trains about 3-6x faster than the original training schedule on multiple datasets. See projects/multigrid for more information. The tables in this section list models, results, and example config files.

Kinetics:

| architecture | depth | pretrain | frame length x sample rate | training | top1 | top5 | model | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SlowFast | R50 | - | 8 x 8 | Standard | 76.8 | 92.7 | link | Kinetics/SLOWFAST_8x8_R50_stepwise |
| SlowFast | R50 | - | 8 x 8 | Multigrid | 76.6 | 92.7 | link | Kinetics/SLOWFAST_8x8_R50_stepwise_multigrid |

(Both Kinetics runs here use a stepwise learning rate schedule.)

Something-Something V2:

| architecture | depth | pretrain | frame length x sample rate | training | top1 | top5 | model | config |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SlowFast | R50 | Kinetics 400 | 16 x 8 | Standard | 63.0 | 88.5 | link | SSv2/SLOWFAST_16x8_R50 |
| SlowFast | R50 | Kinetics 400 | 16 x 8 | Multigrid | 63.5 | 88.7 | link | SSv2/SLOWFAST_16x8_R50_multigrid |

Charades:

| architecture | depth | pretrain | frame length x sample rate | training | mAP | model | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SlowFast | R50 | Kinetics 400 | 16 x 8 | Standard | 38.9 | link | Charades/SLOWFAST_16x8_R50 |
| SlowFast | R50 | Kinetics 400 | 16 x 8 | Multigrid | 38.6 | link | Charades/SLOWFAST_16x8_R50_multigrid |
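The Standard and Multigrid rows in this section can be compared directly. The short sketch below copies the headline metric for each dataset from the three tables (top1 for Kinetics and Something-Something V2, mAP for Charades) and prints the change from standard to multigrid training. The dictionary layout and the function name `multigrid_deltas` are illustrative choices, not part of PySlowFast.

```python
# (dataset, metric) -> (standard, multigrid), copied from the tables above.
results = {
    ("Kinetics", "top1"): (76.8, 76.6),
    ("Something-Something V2", "top1"): (63.0, 63.5),
    ("Charades", "mAP"): (38.9, 38.6),
}


def multigrid_deltas(table):
    """Return multigrid minus standard for each entry, rounded to one decimal."""
    return {key: round(mg - std, 1) for key, (std, mg) in table.items()}


deltas = multigrid_deltas(results)
for (dataset, metric), delta in deltas.items():
    print(f"{dataset} {metric}: {delta:+.1f}")
```

On these numbers the multigrid models land within half a point of their standard counterparts in either direction.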

ImageNet

We also release ImageNet-pretrained models for cases where fine-tuning from ImageNet is preferred. The reported numbers are top-1 and top-5 error rates (%), obtained by center-crop testing on the validation set.

| architecture | depth | top1 error | top5 error | model | config |
| --- | --- | --- | --- | --- | --- |
| ResNet | R50 | 23.6 | 6.8 | link | ImageNet/RES_R50 |
| MViT | B-16-Conv | 17.1 | 3.7 | link | ImageNet/MVIT_B_16_CONV |
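The values in this table (for example 23.6 and 6.8 for ResNet R50) read most naturally as error rates in percent, not accuracies. Under that assumption, the sketch below converts them to accuracies for comparison with the other tables on this page. The helper name `error_to_accuracy` is illustrative only.

```python
def error_to_accuracy(error_pct):
    """Convert an error rate in percent to an accuracy in percent."""
    return round(100.0 - error_pct, 1)


# (top-1 error, top-5 error) per model, copied from the table above.
# Assumption: these numbers are error rates, not accuracies.
imagenet_errors = {
    "ResNet R50": (23.6, 6.8),
    "MViT B-16-Conv": (17.1, 3.7),
}

imagenet_accuracy = {
    name: (error_to_accuracy(top1), error_to_accuracy(top5))
    for name, (top1, top5) in imagenet_errors.items()
}

for name, (top1_acc, top5_acc) in imagenet_accuracy.items():
    print(f"{name}: top-1 {top1_acc}%, top-5 {top5_acc}%")
```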

PyTorchVideo

We support and benchmark PyTorchVideo models and datasets in PySlowFast. See projects/pytorchvideo for more information about the PyTorchVideo Model Zoo.