Thank you for this interesting study on vocabulary scaling laws.
I'm curious whether you ran any experiments comparing the performance of Llama 2 models with the larger vocabularies predicted by your approaches: specifically Llama 2 7B with a 57K vocabulary, Llama 2 13B with a 79K vocabulary, and Llama 2 70B with a 216K vocabulary.
If so, how did the results compare to the original Llama 2 models with 32K vocabularies?
If not, do you have plans to conduct such experiments in future work? Is the bottleneck GPU memory? This isn't addressed in the paper, but if memory is the problem, I may be able to help with it.
It would be valuable to see empirical validation of your predictions on these widely-used model scales. Thank you!
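To put a rough number on the memory question above, here is a back-of-envelope sketch of the *extra* embedding memory the predicted vocabularies would add. It assumes untied input/output embedding matrices (as in Llama 2), fp16 weights (2 bytes per parameter), and the standard Llama 2 hidden sizes (4096 / 5120 / 8192); it ignores optimizer states and activations, which dominate during training.

```python
# Back-of-envelope: extra fp16 memory for input + output embeddings
# when growing the vocabulary from Llama 2's 32K to the predicted sizes.

BYTES_FP16 = 2
OLD_VOCAB = 32_000

def extra_embedding_gb(hidden_size: int, new_vocab: int,
                       old_vocab: int = OLD_VOCAB) -> float:
    """Extra fp16 GB for the embedding + unembedding matrices."""
    extra_params = 2 * hidden_size * (new_vocab - old_vocab)  # 2 = embed + unembed
    return extra_params * BYTES_FP16 / 1e9

for name, hidden, vocab in [("7B", 4096, 57_000),
                            ("13B", 5120, 79_000),
                            ("70B", 8192, 216_000)]:
    print(f"Llama 2 {name}: ~{extra_embedding_gb(hidden, vocab):.2f} GB extra")
```

Under these assumptions the weight overhead alone is modest (well under 1 GB for 7B/13B, a few GB for 70B), so the real wall would more likely be the output-projection logits and optimizer states during training rather than the raw embedding weights.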
Hello @wdlctc, thank you for your interest in our work! We appreciate your inquiry regarding experiments on 7B-level models. Due to budget constraints, we haven't been able to conduct these specific experiments yet. However, we will provide more insights on 7B-level models in the camera-ready version of our paper. We'd be very grateful if any sponsorship opportunities arise to support these experiments. Thanks!