PySlowFast Model Zoo and Baselines

Kinetics 400 and 600

architecture	depth	crops x clips	frame length x sample rate	top1	top5	model	config	dataset
C2D	R50	3 x 10	8 x 8	67.2	87.8	`link`	Kinetics/c2/C2D_NOPOOL_8x8_R50	K400
I3D	R50	3 x 10	8 x 8	73.5	90.8	`link`	Kinetics/c2/I3D_8x8_R50	K400
I3D NLN	R50	3 x 10	8 x 8	74.0	91.1	`link`	Kinetics/c2/I3D_NLN_8x8_R50	K400
Slow	R50	3 x 10	4 x 16	72.7	90.3	`link`	Kinetics/c2/SLOW_4x16_R50	K400
Slow	R50	3 x 10	8 x 8	74.8	91.6	`link`	Kinetics/c2/SLOW_8x8_R50	K400
SlowFast	R50	3 x 10	4 x 16	75.6	92.0	`link`	Kinetics/c2/SLOWFAST_4x16_R50	K400
SlowFast	R50	3 x 10	8 x 8	77.0	92.6	`link`	Kinetics/c2/SLOWFAST_8x8_R50	K400
MViT	B-Conv	1 x 5	16 x 4	78.4	93.5	`link`	Kinetics/MVIT_B_16x4_CONV	K400
MViT	B-Conv	1 x 5	32 x 3	80.4	94.8	`link`	Kinetics/MVIT_B_32x3_CONV	K400
MViT	B-Conv	1 x 5	32 x 3	83.9	96.5	`link`	Kinetics/MVIT_B_32x3_CONV_K600	K600
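In the table above, "crops x clips" is the number of spatial crops times the number of temporal clips sampled from each test video, so 3 x 10 means 30 views per video. "frame length x sample rate" means each clip keeps frame-length frames taken every sample-rate frames, so 8 x 8 covers a window of 64 source frames. The minimal sketch below illustrates that arithmetic in plain Python. It is not PySlowFast's data loader: the function names are hypothetical, and the even spacing of test clips is an assumption (the exact rule in the repository's decoder may differ).

```python
def clip_frame_indices(start, frame_length, sample_rate):
    """Indices of the source frames kept for one clip.

    The clip spans frame_length * sample_rate consecutive source frames
    and keeps every sample_rate-th one.
    """
    return [start + i * sample_rate for i in range(frame_length)]


def uniform_clip_starts(num_video_frames, frame_length, sample_rate, num_clips):
    """Evenly spaced clip start positions for multi-clip testing.

    Assumption for illustration only; PySlowFast may space clips differently.
    """
    span = frame_length * sample_rate
    max_start = max(num_video_frames - span, 0)
    if num_clips == 1:
        return [max_start // 2]  # single clip: take the temporal center
    return [round(k * max_start / (num_clips - 1)) for k in range(num_clips)]


def num_views(crops, clips):
    """Total test views per video: every clip is evaluated at every crop."""
    return crops * clips


# Slow / SlowFast R50, 8 x 8, evaluated with 3 x 10 views
print(clip_frame_indices(0, 8, 8))  # [0, 8, 16, 24, 32, 40, 48, 56]
print(num_views(3, 10))             # 30
```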

X3D models (details in projects/x3d)

architecture	size	pretrain	frame length x sample rate	top1 10-view	top1 30-view	parameters (M)	FLOPs (G)	model	config
X3D	XS	-	4 x 12	68.7	69.5	3.8	0.60	`link`	Kinetics/X3D_XS
X3D	S	-	13 x 6	73.1	73.5	3.8	1.96	`link`	Kinetics/X3D_S
X3D	M	-	16 x 5	75.1	76.2	3.8	4.73	`link`	Kinetics/X3D_M
X3D	L	-	16 x 5	76.9	77.5	6.2	18.37	`link`	Kinetics/X3D_L
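The two top1 columns differ only in how many test views are combined into one video-level prediction. A common rule is to average the per-view softmax scores and take the highest-scoring class; the sketch below assumes that rule, although PySlowFast's test loop may combine views differently. All names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one view's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_logits):
    """Average per-view softmax scores into one score per class."""
    probs = [softmax(v) for v in view_logits]
    num_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]


def predict(view_logits):
    """Index of the class with the highest averaged score."""
    scores = aggregate_views(view_logits)
    return max(range(len(scores)), key=scores.__getitem__)


# Three views of one video, three classes: views 1 and 3 favor class 0.
views = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0], [2.0, 0.0, 1.0]]
print(predict(views))  # 0
```

Averaging more views (30 instead of 10) smooths out errors from any single crop or clip, at a proportionally higher inference cost.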

AVA

architecture	depth	pretrain	frame length x sample rate	mAP	AVA version	model
Slow	R50	Kinetics 400	4 x 16	19.5	2.2	`link`
SlowFast	R101	Kinetics 600	8 x 8	28.2	2.1	`link`
SlowFast	R101	Kinetics 600	8 x 8	29.1	2.2	`link`
SlowFast	R101	Kinetics 600	16 x 8	29.4	2.2	`link`
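The AVA numbers are mean average precision (mAP): an average precision (AP) computed per action class, then averaged over classes. The sketch below computes non-interpolated AP from ranked detections that have already been labeled as true or false positives. It is a simplified illustration with hypothetical function names: the official AVA evaluation additionally matches predicted boxes to ground-truth boxes at an IoU threshold of 0.5 and uses its own implementation, neither of which is reproduced here.

```python
def average_precision(scores, labels, num_gt=None):
    """Non-interpolated AP for one class.

    scores: confidence of each detection.
    labels: 1 if that detection is a true positive, else 0.
    num_gt: number of ground-truth instances; defaults to sum(labels),
            i.e. it assumes no ground-truth instance was missed.
    """
    num_pos = sum(labels) if num_gt is None else num_gt
    if num_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_pos


def mean_average_precision(per_class):
    """per_class: list of (scores, labels) pairs, one per action class."""
    aps = [average_precision(s, l) for s, l in per_class]
    return sum(aps) / len(aps)
```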

Multigrid Training

Update June 2020: Below we provide (reimplemented) models from the paper "A Multigrid Method for Efficiently Training Video Models". The multigrid method trains about 3-6x faster than standard training across multiple datasets. See projects/multigrid for more information. The tables below list the models, results, and example config files.
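The core idea behind multigrid training is to vary the mini-batch shape during training, using fewer frames or smaller crops in some iterations, while scaling the batch size so that B x T x H x W (batch size times clip shape) stays roughly constant. The helper below only illustrates that batch-size arithmetic; its name and the example shapes are hypothetical, and the actual long- and short-cycle schedules are defined in projects/multigrid.

```python
def multigrid_batch_size(base_batch, base_shape, new_shape):
    """Scale the batch size so that B * T * H * W stays roughly constant.

    base_shape, new_shape: (T, H, W) clip shapes.
    """
    base_volume = base_shape[0] * base_shape[1] * base_shape[2]
    new_volume = new_shape[0] * new_shape[1] * new_shape[2]
    return max(1, round(base_batch * base_volume / new_volume))


# Hypothetical shapes: halving T, H and W shrinks each clip 8x,
# so the batch can grow 8x for roughly the same cost per iteration.
base = (8, 224, 224)
for shape in [(4, 112, 112), (8, 112, 112), (8, 224, 224)]:
    print(shape, multigrid_batch_size(8, base, shape))
```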

Kinetics:

architecture	depth	pretrain	frame length x sample rate	training	top1	top5	model	config
SlowFast	R50	-	8 x 8	Standard	76.8	92.7	`link`	Kinetics/SLOWFAST_8x8_R50_stepwise
SlowFast	R50	-	8 x 8	Multigrid	76.6	92.7	`link`	Kinetics/SLOWFAST_8x8_R50_stepwise_multigrid

(Here we use a stepwise learning rate schedule.)
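A stepwise schedule keeps the learning rate constant within each stage and drops it to a new value at fixed epochs. The function below is a minimal sketch of that description; the stage boundaries and multipliers in the usage example are hypothetical placeholders, not the values in the SLOWFAST_8x8_R50_stepwise configs, and any warmup phase is omitted.

```python
import bisect


def stepwise_lr(epoch, base_lr, step_epochs, relative_lrs):
    """Piecewise-constant learning rate.

    step_epochs: sorted epochs at which each stage starts, beginning with 0.
    relative_lrs: multiplier of base_lr for each stage (same length).
    """
    stage = bisect.bisect_right(step_epochs, epoch) - 1
    return base_lr * relative_lrs[stage]


# Hypothetical schedule: drop the rate 10x at epochs 44, 88 and 118.
steps = [0, 44, 88, 118]
multipliers = [1.0, 0.1, 0.01, 0.001]
for epoch in (0, 43, 44, 100, 150):
    print(epoch, stepwise_lr(epoch, 0.1, steps, multipliers))
```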

Something-Something V2:

architecture	depth	pretrain	frame length x sample rate	training	top1	top5	model	config
SlowFast	R50	Kinetics 400	16 x 8	Standard	63.0	88.5	`link`	SSv2/SLOWFAST_16x8_R50
SlowFast	R50	Kinetics 400	16 x 8	Multigrid	63.5	88.7	`link`	SSv2/SLOWFAST_16x8_R50_multigrid
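The top1 and top5 columns report the percentage of validation samples whose ground-truth class is, respectively, the single highest-scoring prediction or among the five highest. A plain-Python sketch of that metric (the function name is illustrative):

```python
def topk_accuracy(score_rows, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    correct = 0
    for scores, label in zip(score_rows, labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        correct += label in ranked[:k]
    return 100.0 * correct / len(labels)


rows = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(topk_accuracy(rows, labels, 1))  # one of three samples is top-1 correct
print(topk_accuracy(rows, labels, 2))  # two of three are within the top 2
```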

Charades

architecture	depth	pretrain	frame length x sample rate	training	mAP	model	config
SlowFast	R50	Kinetics 400	16 x 8	Standard	38.9	`link`	Charades/SLOWFAST_16x8_R50
SlowFast	R50	Kinetics 400	16 x 8	Multigrid	38.6	`link`	Charades/SLOWFAST_16x8_R50_multigrid

ImageNet

We also release ImageNet-pretrained models for those who prefer to fine-tune from ImageNet. The reported numbers are top-1 and top-5 error rates (%), obtained with center-crop testing on the validation set.

architecture	depth	top1 error	top5 error	model	config
ResNet	R50	23.6	6.8	`link`	ImageNet/RES_R50
MViT	B-16-Conv	17.1	3.7	`link`	ImageNet/MVIT_B_16_CONV
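The ImageNet numbers above are error rates (lower is better), so accuracy is 100 minus the listed value, for example 76.4% top-1 accuracy for the ResNet R50 model. Center-crop testing typically resizes the shorter image side and then takes one square crop from the middle. In the sketch below, the 256-pixel resize and 224-pixel crop are common ImageNet conventions used as assumptions, not values read from the PySlowFast configs; the function names are hypothetical.

```python
def resize_short_side(height, width, short_side):
    """New (height, width) after scaling so the shorter side equals short_side."""
    scale = short_side / min(height, width)
    return round(height * scale), round(width * scale)


def center_crop_box(height, width, crop):
    """(top, left, bottom, right) of a crop x crop window at the image center."""
    top = (height - crop) // 2
    left = (width - crop) // 2
    return top, left, top + crop, left + crop


def error_to_accuracy(error_pct):
    """Convert an error rate in percent to accuracy in percent."""
    return 100.0 - error_pct


h, w = resize_short_side(375, 500, 256)  # assumed 256-pixel short side
print((h, w), center_crop_box(h, w, 224))  # assumed 224-pixel crop
print(error_to_accuracy(23.6))
```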

PyTorchVideo

We support and benchmark PyTorchVideo models and datasets in PySlowFast. See projects/pytorchvideo for more information about PyTorchVideo Model Zoo.
