[Bug]: EfficientAD training on a custom dataset reports an error #1200
-
Describe the bug: I encountered an error while training my own dataset with EfficientAD. I only modified the dataset section of the configuration file provided by the official EfficientAD repository. With the same modifications I was able to train models like CFA and PatchCore successfully, but I hit an error specifically when training EfficientAD. The YAML file for my EfficientAD run is as follows (I only changed the
I have tentatively determined that the cause of my error is the parameter, when I set
Then I referred to the related answer in #1148, where @alexriedel1 explains what it should be set to; when I set
Dataset: Folder
Model: Other (please specify in the field below)

Steps to reproduce the behavior: EfficientAD training on my own dataset reports an error.

OS information: Anomalib 0.6.0

Expected behavior:
Hello @alexriedel1, @nelson1425, as the people most familiar with EfficientAD, can you answer the following questions?
1. What is the reason for this error in EfficientAD, and how should I fix it?
2. Is the performance of EfficientAD really as good as in the paper? In fact, I am more concerned about its speed. The paper mentions that EfficientAD-M reaches 269 FPS and EfficientAD-S reaches 614 FPS. Is that really achievable in real tests? If not, what FPS does your implementation reach for different image sizes? (Although I realize this may be affected and limited by the specific hardware.)
3. What are the advantages of EfficientAD over other models in Anomalib, and in which situations is it more suitable?
Looking forward to your answers, thanks!

Screenshots: No response
Pip/GitHub: pip
What version/branch did you use?: No response
Configuration YAML: -
Logs: -
Code of Conduct
Replies: 21 comments
-
Hello. I can't answer all the questions, but regarding the normalization: as the config says, only
-
Hello @blaz-r, thank you very much for your reply; I wasn't able to follow some of what you said. As I said at the beginning of the question, for other models such as PatchCore and CFA, when I make exactly the same changes to the dataset section of their YAML files (with exactly the same dataset I described in this question), all the other models train correctly and produce results. Why is there still a download problem when my dataset is already local? Also, the fact that the other models' config files train fine with the same dataset should be enough to show that there is no error in my file. Looking forward to another reply from you, thanks a lot!
-
It's not about your dataset, it's the imagenet(te) dataset downloaded by the EfficientAD model, as it uses ImageNet as part of its functionality. It should be located inside datasets/imagenette. So the first thing I would recommend is to just delete that directory and rerun, which will download the entire dataset again, potentially fixing the issue.
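If you prefer doing the reset from code, here is a minimal sketch, assuming training is launched from the repository root so the default download location is `datasets/imagenette`:

```python
from pathlib import Path
import shutil

# Default location of the Imagenette download used by EfficientAD
# (assumption: the script runs from the same directory as training).
imagenette_dir = Path("datasets") / "imagenette"

if imagenette_dir.exists():
    # Remove the possibly corrupted download; the next training run
    # will fetch the whole dataset again.
    shutil.rmtree(imagenette_dir)
    print(f"Removed {imagenette_dir}; rerun training to re-download.")
else:
    print("Nothing to delete; the dataset will be downloaded on the next run.")
```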
-
Hello @blaz-r, thank you very much, I'll give it a try and get back to you!
-
Hello @blaz-r, this does work, thank you very much for your help. I would also like to ask two more questions: 2. If I want to skip ImageNet, i.e. not use it at all, will my EfficientAD still work? Looking forward to another reply from you, thanks a lot!
-
If you want to know exactly how EfficientAD works, I recommend reading the paper. To cite the authors:
So ImageNet is a key component of training, which can't really be skipped.
-
Thank you very much @blaz-r
-
Glad to help 😄. Regarding the other questions you had, I can't answer all of that properly, but I'm sure the contributors of EfficientAD can help.
-
Looking forward to hearing from them, and thank you very much for your patience again! @blaz-r
-
Hello @blaz-r, I'm sorry to bother you again, but I have a new error. Once ImageNet was downloaded, training started, but not long after that it reported an error.
Why am I prompted to
-
This does indeed seem like a bug that was already addressed in one PR, but it seems it was only fixed in the Lightning model. I think this will need to be fixed the same way as it was done in the Lightning model. If you are able to fix this, a PR would be very welcome.
-
You could start by using a train and test batch size of 1, as recommended for training EfficientAD.
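In an Anomalib 0.x config this lives under the `dataset` section; a sketch of the relevant fragment (field names follow the 0.6 config layout and may differ slightly between versions):

```yaml
dataset:
  # EfficientAD is trained with batch size 1, as in the paper.
  train_batch_size: 1
  eval_batch_size: 1
```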
-
@alexriedel1, would it be an idea to hardcode the batch size and remove it from the config file for now?
-
Hello @alexriedel1, @blaz-r, thank you very much for your help and patience in answering; I trained successfully once I set the batch size to 1. But I have a question: if the maximum value is 2**24, then when I start training with batch_size set to 32 and image_size set to 500, logically
-
The best I can think of right now is raising an error if the batch size is different from one. Otherwise it would need to be implemented in the datamodule generator too, I guess.
-
The quantile calculation is not based on the input image but on feature maps from the teacher model. The tensor shape for 500x500 images and batch size 32 is [32, 384, 117, 117] -> 168,210,432 > 2**24.
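The arithmetic is easy to verify; `torch.quantile` refuses inputs larger than 2**24 elements (a known size limit), and the teacher feature maps cross that limit at batch size 32 but not at batch size 1:

```python
# Teacher feature-map shape quoted above: [batch, channels, h, w].
feature_elems = 32 * 384 * 117 * 117   # batch 32, 500x500 input images
quantile_limit = 2 ** 24               # torch.quantile's input-size cap

print(feature_elems)                    # 168210432, well above 2**24
print(feature_elems > quantile_limit)   # True -> quantile() errors out

# With batch size 1 the same feature map stays under the cap:
single = 1 * 384 * 117 * 117
print(single, single > quantile_limit)  # 5256576 False
```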
-
Hello @alexriedel1, thank you very much for your help and patience in answering. I see, but there doesn't seem to be an early-stopping mechanism in EfficientAD. If I set max_epochs to a very large value, EfficientAD may get good results, but will this cause overfitting? How should I set max_epochs reasonably when there is no early-stopping mechanism?
-
You can use early stopping just like in other models: anomalib/src/anomalib/models/cflow/config.yaml, lines 37 to 40 (at 27876c8).
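The referenced cflow fragment enables the Lightning early-stopping callback from the model section of the config; a comparable sketch for an EfficientAD config (the metric name and patience value here are illustrative assumptions, check your config schema for the exact fields):

```yaml
model:
  early_stopping:
    patience: 3          # epochs without improvement before stopping
    metric: pixel_AUROC  # illustrative; pick a metric you actually log
    mode: max
```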
-
@alexriedel1 OK! Thank you very much. At the very beginning of this question I presented my three points of confusion about EfficientAD:
I now have a clearer picture of the first question. Regarding my second and third questions, can you give the appropriate answers? I think you are one of the most knowledgeable people about EfficientAD. Very much looking forward to your answers, and thanks again for your patience and help!
-
I'm getting around 30 FPS on a GTX 1650, but that GPU is nowhere near the one used in the paper.
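For anyone wanting to reproduce such a number, a rough way to measure throughput; the `infer` callable below is a hypothetical stand-in for a single-image forward pass (on GPU you would additionally need to synchronize before reading the clock):

```python
import time

def measure_fps(infer, n_warmup=10, n_runs=100):
    """Estimate frames per second for a single-image inference callable."""
    for _ in range(n_warmup):        # warm-up: caches, lazy allocations
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed

# Dummy stand-in for a model forward pass:
fps = measure_fps(lambda: sum(range(1000)))
print(f"~{fps:.0f} FPS")
```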
-
@alexriedel1 OK, thank you very much.