Hi,
Thank you very much for your brilliant work on Adan!
According to Figure 1 in your paper, Adan should reach a lower loss (both train and test) than AdamW. However, I got a higher training loss with Adan than with AdamW on ViT-H:
| Steps | AdamW train loss | Adan train loss |
| ----- | ---------------- | --------------- |
| 200   | 6.9077           | 6.9077          |
| 400   | 6.9074           | 6.9075          |
| 600   | 6.9068           | 6.9073          |
| 800   | 6.9061           | 6.9070          |
| 1000  | 6.9050           | 6.9064          |
| 1200  | 6.9036           | 6.9056          |
| 1400  | 6.9014           | 6.9044          |
| 1600  | 6.8990           | 6.9028          |
| 1800  | 6.8953           | 6.9003          |
| 2000  | 6.8911           | 6.8971          |
| 2200  | 6.8848           | 6.8929          |
| 2400  | 6.8789           | 6.8893          |
| 2600  | 6.8699           | 6.8843          |
| 2800  | 6.8626           | 6.8805          |
| 3000  | 6.8528           | 6.8744          |
| 3200  | 6.8402           | 6.8680          |
| 3400  | 6.8293           | 6.8620          |
| 3600  | 6.8172           | 6.8547          |
| 3800  | 6.7989           | 6.8465          |
| 4000  | 6.7913           | 6.8405          |
I used the same HPs as for AdamW and only changed the betas from (0.9, 0.999) to (0.9, 0.92, 0.999).
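Concretely, the swap looks like this (a minimal sketch; the `Linear` model, `lr`, and `wd` are placeholders for my actual fine-tuning setup, and I assume the `Adan` class shipped in this repo):

```python
import torch
from adan import Adan  # optimizer class from this repo

model = torch.nn.Linear(8, 8)  # stand-in for my ViT-H model
lr, wd = 1e-3, 0.05            # placeholders; same values I used for AdamW

# Before: torch.optim.AdamW(model.parameters(), lr=lr,
#                           betas=(0.9, 0.999), weight_decay=wd)
# After: identical HPs, with betas extended to Adan's three-moment form
optimizer = Adan(model.parameters(), lr=lr,
                 betas=(0.9, 0.92, 0.999), weight_decay=wd)
```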
I only trained for a few steps to see the trend, but the loss gap to AdamW already seems quite large. Should I change other HPs to make better use of Adan? How can I get a lower loss than with AdamW?
I noticed that Adan prefers a large batch size on vision tasks; should we use a larger batch size?
Or should I just train for more steps to see the trend?
Thank you!
@haihai-00 Hi,
I suggest referring to the HPs we use for ViT-B and ViT-S. At the least, you may try the default betas (0.98, 0.92, 0.99) and set the weight decay to 0.02.
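For instance (a minimal sketch; the `Linear` model and `lr` stand in for your own model and learning rate):

```python
import torch
from adan import Adan

model = torch.nn.Linear(8, 8)  # stand-in for your ViT-H model
optimizer = Adan(model.parameters(), lr=1e-3,  # keep your own lr
                 betas=(0.98, 0.92, 0.99),     # default betas
                 weight_decay=0.02)            # suggested wd
```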
To help you make more progress on your task, I will try to train ViT-H myself now. So, please share more details about your HP setting.
Thank you!
We are using the HPs provided in this paper: https://arxiv.org/pdf/2208.06366.pdf, Appendix F, "Hyperparameters for Image Classification Fine-tuning", for ViT-L/16. These HPs are given for ViT-L, but we use them since we did not find officially released HPs for ViT-H.
It seems that you are fine-tuning the model rather than training from scratch, right? Actually, we have provided the results for fine-tuning MAE-ViT-Large here.
Moreover, I also fine-tuned MAE-ViT-H from its official pre-trained model (obtaining 86.9% after 50 epochs). You may add my WeChat: xyxie_joy, and I can send you the log file.