The CFG strategy - linear vs. constant #31
Thanks for the great work!

I ran into a confusing issue while replicating this work: I retrained the model with this code and compared the CFG strategies provided in the code, linear vs. constant. I find that constant CFG has a much worse FID but a much better sFID. The contradiction between FID and sFID is confusing. Besides, the IS and Recall also seem to contradict each other.

Comments
Thanks for your interest! Constant CFG typically results in a very high Inception Score (around 500 IS in our final model) but poor FID -- that is also why it achieves very high precision. Linear CFG is used to improve the diversity of the generated images, so the FID improves as well as the recall. Also, 200 epochs should result in an FID < 3 if you follow our default training setting.
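To make the difference concrete, here is a minimal sketch of the two schedules, assuming a scalar guidance scale applied per sampling step; the function name and the exact form of the linear ramp are illustrative, not the repository's verbatim implementation:

```python
def cfg_scale_at_step(cfg: float, step: int, num_steps: int, schedule: str = "linear") -> float:
    """Guidance scale used at a given sampling step.

    constant: the same scale at every step (high IS / precision, worse FID).
    linear:   ramp from 1.0 (no guidance) at the first step up to `cfg`
              at the last step, trading a little fidelity for diversity
              (better FID and recall).
    """
    if schedule == "constant":
        return cfg
    # linear ramp: 1.0 -> cfg over the course of sampling
    return 1.0 + (cfg - 1.0) * step / max(num_steps - 1, 1)
```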
It would be good to sweep the CFG again if you change other configs, such as resolution and sampling steps; the optimal CFG typically changes when these configs change. Also, even with 16 sampling steps, the FID seems too high -- as shown in Figure 6 of the paper, the FID with 16 sampling steps should be around 3.0 after 400 epochs of training.
About the FID value: I also think it is too high. To verify this, I used the provided official MAR_L checkpoint and chose step=16 to generate 50k images for the FID test. The settings follow the official recommendation exactly, without any modifications. After generating the 50k images, I used the evaluation suite provided by guided_diffusion for the FID calculation. However, the FID is 6.23.
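For anyone reproducing this measurement: the guided_diffusion evaluation suite reads an .npz of samples and compares it against a reference batch. Below is a minimal packing sketch, assuming 256x256 uint8 images; the file names and the helper are illustrative, and the evaluator invocation is the one documented in the guided-diffusion repo:

```python
import numpy as np

def pack_samples_for_evaluator(generated_images, out_path="mar_l_step16_samples.npz"):
    """Pack generated images into the layout guided-diffusion's evaluator
    expects: a single uint8 array of shape [N, H, W, 3] under the key
    "arr_0". `generated_images` is a stand-in for your 50k sampled images.
    """
    samples = np.stack(generated_images).astype(np.uint8)
    assert samples.ndim == 4 and samples.shape[-1] == 3
    np.savez(out_path, arr_0=samples)

# Then, inside the guided-diffusion repo (reference batch name from its README):
#   python evaluations/evaluator.py VIRTUAL_imagenet256_labeled.npz mar_l_step16_samples.npz
# This prints Inception Score, FID, sFID, precision, and recall in one run.
```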
Can you try step=64 and see if you can reproduce the result? This 6.23 FID is too high; step=64 with cfg=3.0 should give you an FID below 2.0.
First of all, your FID is still higher than our results; for instance, at 256 steps it should be 1.78 FID. Besides, for small numbers of generation steps, the optimal CFG scale is no longer 3.0 and should typically be larger. In our experiments, we sweep the CFG and temperature to find the best FID for every number of generation steps. Constant CFG, when using the same scale as linear CFG, typically yields a very high FID along with a high IS (>10 FID and >450 IS), because it sacrifices diversity for better fidelity. I would suggest first identifying why your 256-step result differs from ours, and then sweeping the generation parameters (cfg_scale and temperature) for smaller numbers of generation steps.
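Since the best cfg_scale and temperature shift with the number of sampling steps, a plain grid sweep is the usual recipe. A sketch of such a sweep follows; `eval_fn` is a hypothetical stand-in for your own generate-50k-images-and-compute-FID pipeline, and the grid values are illustrative starting points rather than values from the paper:

```python
import itertools

def sweep_generation_params(eval_fn, num_iter=16):
    """Grid-sweep cfg_scale and temperature at a fixed step count.

    eval_fn(num_iter, cfg_scale, temperature) -> FID (lower is better).
    """
    cfg_scales = [2.5, 3.0, 3.5, 4.0, 4.5]  # larger scales tend to win at fewer steps
    temperatures = [0.9, 0.95, 1.0]
    best = min(
        ((eval_fn(num_iter, c, t), c, t)
         for c, t in itertools.product(cfg_scales, temperatures)),
        key=lambda r: r[0],
    )
    print(f"best FID {best[0]:.2f} at cfg_scale={best[1]}, temperature={best[2]}")
    return best
```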
Thanks very much for your valuable suggestions. I did not realize that you sweep the CFG for different inference settings; I will conduct more experiments following your suggestions. About the 1.78 and 1.98 FID: the 1.98 FID is achieved by following the repository exactly, without any modification. The inference setting is thus the same as the following: torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0
Do you have any suggestions? Thanks again for your help!