ValueError: __len__() should return >= 0 #71

Open
JanineCHEN opened this issue Sep 18, 2020 · 1 comment

@JanineCHEN

Many thanks for open-sourcing the models. I ran into the following error when trying to resume training. I trained the model on my own dataset, and the earlier training run and checkpoint saving completed without any errors:

Traceback (most recent call last):
  File "train.py", line 227, in <module>
    main()
  File "train.py", line 224, in main
    run(config)
  File "train.py", line 171, in run
    for i, (x, y) in enumerate(pbar):
  File "/home/projects/11002043/BIGGAN_archdaily_outdoor_128_bs110x237/utils.py", line 834, in progress
    total = total or len(items)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 316, in __len__
    return len(self._index_sampler)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 212, in __len__
    return (len(self.sampler) + self.batch_size - 1) // self.batch_size
ValueError: __len__() should return >= 0

The configuration is as follows:

$ sh scripts/launch_BigGAN_bs110x237.sh
{'dataset': 'I128_hdf5', 'augment': False, 'num_workers': 4, 'pin_memory': True, 'shuffle': True, 'load_in_mem': True, 'use_multiepoch_sampler': True, 'model': 'BigGAN', 'G_param': 'SN', 'D_param': 'SN', 'G_ch': 32, 'D_ch': 32, 'G_depth': 1, 'D_depth': 1, 'D_wide': True, 'G_shared': True, 'shared_dim': 128, 'dim_z': 120, 'z_var': 1.0, 'hier': True, 'cross_replica': False, 'mybn': False, 'G_nl': 'inplace_relu', 'D_nl': 'inplace_relu', 'G_attn': '32', 'D_attn': '32', 'norm_style': 'bn', 'seed': 0, 'G_init': 'ortho', 'D_init': 'ortho', 'skip_init': False, 'G_lr': 0.0001, 'D_lr': 0.0004, 'G_B1': 0.0, 'D_B1': 0.0, 'G_B2': 0.999, 'D_B2': 0.999, 'batch_size': 110, 'G_batch_size': 0, 'num_G_accumulations': 237, 'num_D_steps': 1, 'num_D_accumulations': 237, 'split_D': False, 'num_epochs': 100, 'parallel': True, 'G_fp16': False, 'D_fp16': False, 'D_mixed_precision': False, 'G_mixed_precision': False, 'accumulate_stats': False, 'num_standing_accumulations': 16, 'G_eval_mode': True, 'save_every': 100, 'num_save_copies': 2, 'num_best_copies': 5, 'which_best': 'FID', 'no_fid': False, 'test_every': 100, 'num_inception_images': 50000, 'hashname': False, 'base_root': '', 'data_root': 'data', 'weights_root': 'weights', 'logs_root': 'logs', 'samples_root': 'samples', 'pbar': 'mine', 'name_suffix': '', 'experiment_name': 'BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema', 'config_from_name': False, 'ema': True, 'ema_decay': 0.9999, 'use_ema': True, 'ema_start': 300, 'adam_eps': 1e-06, 'BN_eps': 1e-05, 'SN_eps': 1e-06, 'num_G_SVs': 1, 'num_D_SVs': 1, 'num_G_SV_itrs': 1, 'num_D_SV_itrs': 1, 'G_ortho': 0.0, 'D_ortho': 0.0, 'toggle_grads': True, 'which_train_fn': 'GAN', 'load_weights': '', 'resume': True, 'logstyle': '%3.3e', 'log_G_spectra': False, 'log_D_spectra': False, 'sv_log_interval': 10}
Skipping initialization for training resumption...
Experiment name is BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema
Adding attention layer in G at resolution 32
Adding attention layer in D at resolution 32
Preparing EMA for G with decay of 0.9999
Adding attention layer in G at resolution 32
Initializing EMA parameters to be source parameters...
Generator(
  (activation): ReLU(inplace=True)
  (shared): Embedding(160, 128)
  (linear): SNLinear(in_features=20, out_features=8192, bias=True)
  (blocks): ModuleList(
    (0): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 512, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=512, bias=False)
          (bias): SNLinear(in_features=148, out_features=512, bias=False)
        )
        (bn2): ccbn(
          out: 512, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=512, bias=False)
          (bias): SNLinear(in_features=148, out_features=512, bias=False)
        )
      )
    )
    (1): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 512, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=512, bias=False)
          (bias): SNLinear(in_features=148, out_features=512, bias=False)
        )
        (bn2): ccbn(
          out: 256, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=256, bias=False)
          (bias): SNLinear(in_features=148, out_features=256, bias=False)
        )
      )
    )
    (2): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 256, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=256, bias=False)
          (bias): SNLinear(in_features=148, out_features=256, bias=False)
        )
        (bn2): ccbn(
          out: 128, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=128, bias=False)
          (bias): SNLinear(in_features=148, out_features=128, bias=False)
        )
      )
      (1): Attention(
        (theta): SNConv2d(128, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (phi): SNConv2d(128, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (g): SNConv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (o): SNConv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
    )
    (3): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 128, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=128, bias=False)
          (bias): SNLinear(in_features=148, out_features=128, bias=False)
        )
        (bn2): ccbn(
          out: 64, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=64, bias=False)
          (bias): SNLinear(in_features=148, out_features=64, bias=False)
        )
      )
    )
    (4): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 64, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=64, bias=False)
          (bias): SNLinear(in_features=148, out_features=64, bias=False)
        )
        (bn2): ccbn(
          out: 32, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=32, bias=False)
          (bias): SNLinear(in_features=148, out_features=32, bias=False)
        )
      )
    )
  )
  (output_layer): Sequential(
    (0): bn()
    (1): ReLU(inplace=True)
    (2): SNConv2d(32, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
)
Discriminator(
  (activation): ReLU(inplace=True)
  (blocks): ModuleList(
    (0): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (1): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
      )
      (1): Attention(
        (theta): SNConv2d(64, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (phi): SNConv2d(64, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (g): SNConv2d(64, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (o): SNConv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
    )
    (2): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (3): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(128, 256, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (4): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(256, 512, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (5): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
  )
  (linear): SNLinear(in_features=512, out_features=1, bias=True)
  (embed): SNEmbedding(160, 512)
)
Number of params in G: 8451140 D: 9694562
Loading weights...
Loading weights from weights/BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema...
Inception Metrics will be saved to logs/BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema_log.jsonl
Training Metrics will be saved to logs/BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema
Using dataset root location data/ILSVRC128.hdf5
Loading data/ILSVRC128.hdf5 into memory...
Using multiepoch sampler from start_itr 200...
Parallelizing Inception module...
Beginning training at epoch 1...

Any idea why this error occurs? Any help would be highly appreciated!
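
For reference, the traceback shows the ValueError being raised when Python's len() is applied to the batch sampler, which in turn calls len() on the underlying sampler. A minimal sketch of one way this can go negative on resume, assuming the multi-epoch sampler derives its remaining length from the dataset size, num_epochs, start_itr, and the effective batch size (batch_size * num_D_accumulations = 110 * 237 = 26070 here). This is my own reconstruction, not the repo's exact code, and dataset_size below is a hypothetical placeholder:

    # Hypothetical numbers illustrating how a resumed multi-epoch sampler's
    # remaining length can underflow. dataset_size is an assumed placeholder;
    # the other values come from the config and log above.
    dataset_size = 50000      # assumed size of the custom dataset
    num_epochs = 100          # 'num_epochs': 100 in the config
    start_itr = 200           # "Using multiepoch sampler from start_itr 200..."
    batch_size = 110 * 237    # batch_size * num_D_accumulations = 26070

    remaining = dataset_size * num_epochs - start_itr * batch_size
    print(remaining)          # -214000; a negative __len__ return makes len()
                              # raise "ValueError: __len__() should return >= 0"

If this is what is happening, the samples already consumed (start_itr times the effective batch size) exceed the total samples scheduled (dataset_size times num_epochs), so the computed remaining length is negative; raising num_epochs or resetting start_itr would presumably make it positive again.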

@szulm

szulm commented Dec 19, 2021

Hello, how did you solve it?
