training with pytorch lightning #230

Status: Open. Wants to merge 71 commits into base: main.

@ekg commented Mar 8, 2024

This WIP PR adds FSDP training using PyTorch Lightning. The current setup trains a byte-level model. I'm not sure it makes sense to merge in this state, but it might be a useful starting point for others who want to implement FSDP training.
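
For readers unfamiliar with the term, a byte-level model treats each UTF-8 byte as one token, so the vocabulary has 256 entries and no learned tokenizer is needed. The sketch below only illustrates that idea; it is not taken from this PR, and the names `encode_bytes`, `decode_bytes`, and `VOCAB_SIZE` are hypothetical.

```python
from typing import List

# Assumption: one token per UTF-8 byte, so token ids fall in the range 0..255.
VOCAB_SIZE = 256


def encode_bytes(text: str) -> List[int]:
    """Map a string to a list of byte-valued token ids."""
    return list(text.encode("utf-8"))


def decode_bytes(ids: List[int]) -> str:
    """Map byte-valued token ids back to a string.

    Invalid byte sequences (for example, from sampling a model) are
    replaced rather than raising an error.
    """
    return bytes(ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    ids = encode_bytes("héllo")
    # Non-ASCII characters expand to several bytes, so len(ids) can
    # exceed the number of characters in the input string.
    print(ids, decode_bytes(ids))
```

One consequence of this design is that sequences are longer than with a subword tokenizer, which matters when comparing throughput numbers across setups.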

@pjj commented Apr 18, 2024

Thanks for this PR. I'm very curious to hear if you can share anything about the performance you get with this FSDP training loop in terms of tokens per second on a particular model config and hardware (single GPU, single node, or cluster).
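
As a rough illustration of how such a number could be reported, the sketch below computes tokens per second from a timed run. It is not part of this PR; the function name and parameters are hypothetical, and it assumes every step processes a full batch of fixed-length sequences on each data-parallel rank.

```python
def tokens_per_second(
    batch_size: int,
    seq_len: int,
    num_steps: int,
    elapsed_seconds: float,
    world_size: int = 1,
) -> float:
    """Aggregate training throughput in tokens per second.

    Assumptions (hypothetical, not from the PR):
    - batch_size is the per-rank batch size,
    - every sequence has exactly seq_len tokens,
    - world_size is the number of data-parallel ranks.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_tokens = batch_size * seq_len * num_steps * world_size
    return total_tokens / elapsed_seconds


if __name__ == "__main__":
    # Example with made-up numbers: 10 steps of 8 x 2048 tokens in 2 seconds.
    print(tokens_per_second(8, 2048, 10, 2.0))
```

When timing GPU work, the elapsed time should be measured after the device has finished the queued steps, and the first few warm-up steps are usually excluded.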

@albertfgu force-pushed the main branch 2 times, most recently from 6d45666 to 41d30ce, on June 3, 2024 12:56