
ReFL Training Performance #95

Open

XiaominLi1997 opened this issue Sep 1, 2024 · 2 comments

XiaominLi1997 commented Sep 1, 2024

The performance of my ReFL model after training on refl_data.json (https://github.com/THUDM/ImageReward/blob/main/data/refl_data.json) is significantly worse than that of the untrained SD 1.4 model. The results are far from satisfactory, and I am not sure what is causing this.

Training Settings:
GPUs: 2 × A100
--train_batch_size: 8
--gradient_accumulation_steps: 4
--num_train_epochs: 100
--learning_rate: 1e-5
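
With these settings the effective batch size is 2 GPUs × 8 per-GPU batch × 4 accumulation steps = 64. For reference, here is a minimal sketch of how the accumulation and learning rate map onto a plain PyTorch loop; the model and data below are dummy stand-ins, not names from the repo's training script:

```python
import torch

# Settings from the list above; effective batch size =
# num_gpus * train_batch_size * gradient_accumulation_steps = 2 * 8 * 4 = 64.
learning_rate = 1e-5
gradient_accumulation_steps = 4

# Dummy stand-ins so the accumulation pattern runs end to end; in the real
# script these would be the SD 1.4 UNet and the refl_data.json loader.
model = torch.nn.Linear(16, 16)
dataloader = [torch.randn(8, 16) for _ in range(12)]  # per-GPU batch of 8

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
optimizer.zero_grad()

for step, batch in enumerate(dataloader):
    # Scale the loss so the accumulated gradient matches one large batch.
    loss = model(batch).pow(2).mean() / gradient_accumulation_steps
    loss.backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```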

Given:
seed: 100
prompt: a coffee mug made of cardboard
Result of untrained SD 1.4:
[image: sample generated by untrained SD 1.4]
Result of trained ReFL:
[image: sample generated by the ReFL-trained model]

Could you please explain this phenomenon?

xujz18 (Member) commented Sep 1, 2024

[image: excerpt from the ImageReward paper]

In practice, to avoid rapid overfitting and stabilize the fine-tuning, we re-weight ReFL loss and regularize with pre-training loss.

Our released code only demonstrates the ReFL loss; you will need to add the pre-training loss yourself according to your own settings.
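
A minimal sketch of what this combination might look like in a diffusers-style training loop is below. Everything here is an assumption for illustration, not the released ImageReward code: the object names (`unet`, `noise_scheduler`, `reward_model`), the value of `lambda_refl`, and the choice of φ(r) = ReLU(−r + 2) as the reward-to-loss mapping.

```python
import torch
import torch.nn.functional as F

# Assumed re-weighting factor for the ReFL term; the paper describes
# re-weighting, but this exact value is illustrative.
lambda_refl = 1e-3

def combined_loss(unet, noise_scheduler, reward_model,
                  refl_prompts, refl_images, pretrain_batch):
    """Re-weighted ReFL loss regularized by the diffusion pre-training loss.

    `unet`, `noise_scheduler` (diffusers-style), and `reward_model` are
    placeholders for the fine-tuned SD 1.4 UNet, its noise scheduler, and
    a differentiable ImageReward scorer.
    """
    # ReFL term: score the model's own generations with the reward model.
    # phi(r) = ReLU(-r + 2) is one plausible mapping of rewards to a loss.
    rewards = reward_model(refl_prompts, refl_images)
    refl_loss = F.relu(-rewards + 2.0).mean()

    # Pre-training term: standard epsilon-prediction MSE on a batch drawn
    # from the pre-training set (e.g. the LAION aesthetic subset).
    latents = pretrain_batch["latents"]
    text_embeds = pretrain_batch["text_embeds"]
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    noise_pred = unet(noisy_latents, timesteps,
                      encoder_hidden_states=text_embeds).sample
    pretrain_loss = F.mse_loss(noise_pred.float(), noise.float())

    # Total: the pre-training term anchors the model to its original
    # denoising behavior while the re-weighted ReFL term pushes
    # generations toward higher reward.
    return lambda_refl * refl_loss + pretrain_loss
```

The key point is the regularization: without the pre-training MSE term, nothing stops the model from drifting arbitrarily far toward reward-hacking outputs, which matches the rapid overfitting described above.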

xujz18 added the "about ReFL" label and removed the "about score" label on Sep 1, 2024
XiaominLi1997 (Author) commented

> In practice, to avoid rapid overfitting and stabilize the fine-tuning, we re-weight ReFL loss and regularize with pre-training loss. Our released code only demonstrates the ReFL loss; you will need to add the pre-training loss yourself according to your own settings.

Thanks for your reply.

In this answer (#24 (comment)), you mentioned that 'it is simpler to use ReFL alone directly and to achieve decent results.' According to your statement, using only the ReFL loss should yield reasonably good results, but I am unable to achieve that. It seems that the loss has already converged.
[image: training-loss curve, already converged]

Additionally, the paper mentions: 'the pre-training dataset is from a 625k subset of LAION-5B [50] selected by aesthetic score.' I wonder if you plan to release this part of the dataset.

Thanks again.
