Thank you for this interesting study on vocabulary scaling laws.
I'm curious whether you ran any experiments comparing the performance of Llama 2 models with the larger vocabularies predicted by your approaches: specifically Llama 2 7B with a 57K vocabulary, Llama 2 13B with a 79K vocabulary, and Llama 2 70B with a 216K vocabulary.
If so, how did the results compare to the original Llama 2 models with 32K vocabularies?
If not, do you have plans to conduct such experiments in future work? Is the bottleneck GPU memory? This isn't addressed in the paper, but if memory is the problem, I may be able to help with it.
It would be valuable to see empirical validation of your predictions on these widely-used model scales. Thank you!
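To put a rough number on the memory question above, here is a back-of-envelope sketch of the *extra* embedding memory the predicted vocabularies would add. It assumes untied input/output embedding matrices (as in Llama 2), fp16 weights (2 bytes per parameter), and the standard Llama 2 hidden sizes (4096 / 5120 / 8192); it ignores optimizer states and activations, which dominate during training.

```python
# Back-of-envelope: extra fp16 memory for input + output embeddings
# when growing the vocabulary from Llama 2's 32K to the predicted sizes.

BYTES_FP16 = 2
OLD_VOCAB = 32_000

def extra_embedding_gb(hidden_size: int, new_vocab: int,
                       old_vocab: int = OLD_VOCAB) -> float:
    """Extra fp16 GB for the embedding + unembedding matrices."""
    extra_params = 2 * hidden_size * (new_vocab - old_vocab)  # 2 = embed + unembed
    return extra_params * BYTES_FP16 / 1e9

for name, hidden, vocab in [("7B", 4096, 57_000),
                            ("13B", 5120, 79_000),
                            ("70B", 8192, 216_000)]:
    print(f"Llama 2 {name}: ~{extra_embedding_gb(hidden, vocab):.2f} GB extra")
```

Under these assumptions the weight overhead alone is modest (well under 1 GB for 7B/13B, a few GB for 70B), so the real wall would more likely be the output-projection logits and optimizer states during training rather than the raw embedding weights.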
Hello @wdlctc, thank you for your interest in our work! We appreciate your inquiry regarding experiments on 7B-level models. Due to budget constraints, we haven't been able to conduct these specific experiments yet. However, we will provide more insights on 7B-level models in the camera-ready version of our paper. We'd be very grateful if any sponsorship opportunities arise to support these experiments. Thanks!