
The CFG strategy - linear. vs constant #31

Open
yuhuUSTC opened this issue Sep 5, 2024 · 9 comments
Comments

@yuhuUSTC

yuhuUSTC commented Sep 5, 2024

Thanks for the great work!
I ran into a confusing issue when replicating this work. I retrained the model with this code and compared the two CFG strategies provided, linear vs. constant. I find that constant CFG gives a much worse FID but a much better sFID. This contradiction between FID and sFID is confusing. Besides, the IS and Recall also seem to contradict each other.
[Screenshot: results table, 2024-09-05]

@LTH14
Owner

LTH14 commented Sep 5, 2024

Thanks for your interest! Constant CFG typically results in a very high Inception Score (around 500 IS in our final model) but poor FID -- that is also why it achieves very high precision. Linear CFG is used to improve the diversity of the generated images, which improves both FID and recall.
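For anyone comparing the two schedules, here is a minimal sketch of what "constant" vs. "linear" CFG means per decoding step. This is an illustrative reconstruction (function name and exact interpolation are assumptions, not the repository's exact code): constant applies the full guidance scale at every step, while linear ramps the scale up from 1.0 (no guidance) toward the target as more tokens are generated.

```python
def cfg_scale_at_step(schedule: str, cfg: float, step: int, num_steps: int) -> float:
    """Return the guidance scale used at decoding step `step` (0-indexed)."""
    if schedule == "constant":
        # full guidance scale at every step: high fidelity, low diversity
        return cfg
    if schedule == "linear":
        # ramp from ~1.0 (no guidance) at the start to `cfg` at the last step
        return 1.0 + (cfg - 1.0) * (step + 1) / num_steps
    raise ValueError(f"unknown cfg schedule: {schedule}")
```

Under this sketch, early tokens are sampled nearly unguided (preserving diversity) and only late tokens receive the full scale, which is consistent with the FID/recall improvement described above.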

Also, 200 epochs should result in an FID < 3 if you follow our default training setting.

@yuhuUSTC
Author

yuhuUSTC commented Sep 5, 2024

Thanks for the reply!
The table above follows the default training setting, except that the number of sampling steps is 16.
The issue with the table is that FID and sFID strongly contradict each other, which is very rare and confusing.

Another question: for 128x128 generation, the results are the opposite of the table above. At 128x128, constant CFG gives a much better FID but a much worse sFID than linear CFG.
[Screenshot: 128x128 results table, 2024-09-05]
This result is also very confusing and contradicts the 256x256 result.

@LTH14
Owner

LTH14 commented Sep 5, 2024

It would be good to sweep the CFG again if you change other configs, such as resolution and sampling steps. The optimal CFG typically changes if these configs change.
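A sweep like the one suggested here can be scripted by enumerating the launch commands. This is a hypothetical helper, not part of the repository; the flag names follow the evaluation command quoted later in this thread, and the value grids are illustrative only.

```python
import itertools

def sweep_commands(cfg_scales, temperatures, num_iter=16):
    """Build one evaluation command per (cfg, temperature) combination."""
    cmds = []
    for cfg, temp in itertools.product(cfg_scales, temperatures):
        cmds.append(
            "torchrun --nproc_per_node=8 main_mar.py "
            "--model mar_large --diffloss_d 8 --diffloss_w 1280 "
            f"--num_iter {num_iter} --cfg {cfg} --cfg_schedule linear "
            f"--temperature {temp} --num_images 50000 --evaluate"
        )
    return cmds

cmds = sweep_commands([2.5, 3.0, 3.5, 4.0], [0.95, 1.0, 1.05])
```

Each command can then be launched in turn, keeping the best FID over the grid for the chosen step count and resolution.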

Also, even with 16 sampling steps, the FID seems too high -- as shown in Figure 6 of the paper, the FID using 16 sampling steps should be around 3.0 after 400 epochs training.

@yuhuUSTC
Author

yuhuUSTC commented Sep 5, 2024

About the FID value: I also think it is too high. To verify this, I used the provided official MAR-L checkpoint and chose step=16 to generate 50k images and compute FID. All settings exactly follow the official recommendation, without any modifications. After generating the 50k images, I used the evaluation suite provided by guided_diffusion for the FID calculation. However, the FID is 6.23.
Do you have any suggestions for this anomaly?

@LTH14
Owner

LTH14 commented Sep 5, 2024

Can you try step=64 and see if you can reproduce the result? This 6.23 FID is too high. step=64, cfg=3.0 should give you <2.0 FID.

@yuhuUSTC
Author

yuhuUSTC commented Sep 6, 2024

I find that the number of inference steps significantly affects the FID.
[Screenshot: FID vs. inference steps, 2024-09-06]
With step=64, it achieves FID 2.14. This shows that the FID-vs-step curve differs from the paper's at small step counts.

Besides, constant CFG has a very high FID. I am curious whether this finding is consistent with yours.

@LTH14
Owner

LTH14 commented Sep 6, 2024

First of all, your FID is still higher than our results. For instance, at 256 steps it should be 1.78 FID. Besides, for small generation steps, the optimal CFG scale is no longer 3.0 and should typically be larger. In our experiments, we sweep the optimal CFG and temperature to find the best FID, for all generation steps.

Constant CFG, when using the same scale as linear CFG, typically has very high FID and also high IS (>10 FID and >450 IS). This is because it sacrifices diversity for better fidelity.
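For readers of this thread: the scale being swept enters through the standard classifier-free guidance combination. The sketch below is the textbook CFG formula (Ho & Salimans); the repository's diffusion head may differ in details, so treat it as a reference, not the repo's exact code.

```python
def apply_cfg(pred_cond: float, pred_uncond: float, scale: float) -> float:
    """Combine conditional and unconditional predictions with guidance scale."""
    # scale = 1.0 recovers the purely conditional prediction; larger scales
    # push samples toward the condition (higher IS/precision, lower diversity)
    return pred_uncond + scale * (pred_cond - pred_uncond)
```

The fidelity/diversity trade-off discussed above follows directly: a large constant scale at every step over-sharpens toward the condition, which is why IS rises while FID and recall degrade.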

I would suggest first identifying why your 256-step result differs from ours, and then sweeping the generation parameters (cfg_scale and temperature) for smaller generation step counts.

@yuhuUSTC
Author

yuhuUSTC commented Sep 9, 2024

Thanks very much for your valuable suggestions. I did not realize that you sweep the CFG for different inference settings; I will conduct more experiments following your suggestions.

About the 1.78 vs. 1.98 FID: the 1.98 FID is achieved by following the repository exactly, without any modification. The inference setting is therefore the same as the following:

torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 \
main_mar.py \
--model mar_large --diffloss_d 8 --diffloss_w 1280 \
--eval_bsz 256 --num_images 50000 \
--num_iter 256 --num_sampling_steps 100 --cfg 3.0 --cfg_schedule linear --temperature 1.0 \
--output_dir pretrained_models/mar/mar_large \
--resume pretrained_models/mar/mar_large \
--data_path ${IMAGENET_PATH} --evaluate

Do you have any suggestions? Thanks again for your help!

@LTH14
Owner

LTH14 commented Sep 9, 2024

I cannot diagnose your exact issue, but here are some reference points. I just ran an evaluation on 8 A6000 GPUs (using your command above) and here is the result:

[Screenshot: evaluation result]

A typical fluctuation for FID should be smaller than 0.05, and a typical fluctuation for IS should be smaller than 5. I have also validated our model's performance on L40S, H100, A100, and V100 GPUs.

Since we fix the random seed for generation, this result should be exactly reproducible if you use the same eval_bsz and number of GPUs.
