
WIP: Trainer #23

Draft: wants to merge 2 commits into main
Conversation

isamu-isozaki

This PR is a WIP, but it is the conceptual idea for Issue #18.

@isamu-isozaki isamu-isozaki marked this pull request as draft October 26, 2023 03:40
@isamu-isozaki
Author

One issue so far is that for the dataloader, the default transformers Trainer handles it in the get_train_dataloader function, which handles parallelizing via

return self.accelerator.prepare(DataLoader(train_dataset, **dataloader_params))

so we might want to override it for now, but in the future have an API compatible with accelerate.
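A rough sketch of what that override could look like (untested; PipegooseTrainer is just an illustrative name, and the exact Trainer attributes may differ):

```python
from torch.utils.data import DataLoader
from transformers import Trainer


class PipegooseTrainer(Trainer):
    def get_train_dataloader(self) -> DataLoader:
        # Return a plain DataLoader instead of
        # self.accelerator.prepare(DataLoader(...)), so the dataloader
        # is not wrapped by accelerate and pipegoose stays in control.
        return DataLoader(
            self.train_dataset,
            batch_size=self.args.per_device_train_batch_size,
            collate_fn=self.data_collator,
            shuffle=True,
        )
```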

@xrsrke
Owner

xrsrke commented Oct 26, 2023

return self.accelerator.prepare(DataLoader(train_dataset, **dataloader_params))

so we might want to override it for now, but in the future have an API compatible with accelerate.

@isamu-isozaki Thank you so much for the PR. This is also the reason we don't want to use Trainer from transformers: because we implement our own 3D parallelism, we don't want it to be wrapped by accelerate.
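For reference, a minimal sketch of a non-accelerate dataloader, sharding only across the data-parallel process group (the function name and dp_group argument are hypothetical, not pipegoose's actual API):

```python
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler


def build_train_dataloader(train_dataset, batch_size, dp_group=None):
    # Shard across data-parallel ranks ourselves instead of letting
    # accelerate wrap the DataLoader, so 3D parallelism stays in our hands.
    sampler = DistributedSampler(
        train_dataset,
        num_replicas=dist.get_world_size(group=dp_group),
        rank=dist.get_rank(group=dp_group),
        shuffle=True,
    )
    return DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
```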

@isamu-isozaki
Author

isamu-isozaki commented Oct 26, 2023

@xrsrke Sounds good. Let me rewrite it as a minimal training example. In the future, though, it might be better to inherit from the transformers Trainer so that we can use its updated versions without having to maintain compatibility with it ourselves.

@isamu-isozaki
Author

isamu-isozaki commented Oct 26, 2023

Made it a minimal-version proof of concept (not tested yet).
If we want to expand further, the two main options are

  1. Reimplement pretty much all of Trainer in pipegoose.
  2. Override some of the methods in Trainer to make it work.

Overall, this might not be a quick PR if we do either of the two, but ideally, if option 2 works, that would probably be the best choice. A rough sketch of the minimal loop I have in mind follows.
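Sketch of the minimal proof-of-concept loop (illustrative only, not pipegoose's real API; assumes the model returns a HF-style output with a .loss attribute):

```python
class MinimalTrainer:
    def __init__(self, model, optimizer, train_dataloader, num_epochs=1):
        self.model = model
        self.optimizer = optimizer
        self.train_dataloader = train_dataloader
        self.num_epochs = num_epochs

    def train(self):
        self.model.train()
        for _ in range(self.num_epochs):
            for batch in self.train_dataloader:
                self.optimizer.zero_grad()
                outputs = self.model(**batch)
                loss = outputs.loss  # assumes a HF-style ModelOutput
                loss.backward()
                self.optimizer.step()
```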

@xrsrke
Owner

xrsrke commented Oct 26, 2023

@isamu-isozaki, the PR looks great. I think we should prefer option 1, because one potential direction for the future is supporting the parallelization of any arbitrary transformer torch module, not just models from transformers. transformers is a hub where people push trained models, whereas our library is the one people use to start training from scratch. I also recommend checking out the Lightning trainer [link]. They have excellent abstractions, like separating CallbackHandler (the thing that connects the callbacks and the trainer), Callback, and Trainer.

Here are my learning notes on Lightning's trainer: https://projectfoundation.notion.site/Lightning-f027845e720d4f74aa876b045e58669b. They could be helpful for you.
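Roughly, the separation looks like this (an illustrative sketch of the idea only, not Lightning's or pipegoose's actual classes):

```python
class Callback:
    # Hooks that concrete callbacks can override.
    def on_train_start(self, trainer):
        pass

    def on_batch_end(self, trainer, loss):
        pass


class CallbackHandler:
    # Connects the callbacks to the trainer: fans one event out to all of them.
    def __init__(self, callbacks):
        self.callbacks = list(callbacks)

    def fire(self, event, trainer, **kwargs):
        for callback in self.callbacks:
            getattr(callback, event)(trainer, **kwargs)


class Trainer:
    def __init__(self, callbacks=()):
        self.callback_handler = CallbackHandler(callbacks)

    def training_step(self, batch):
        # Concrete trainers implement the forward/backward/step here.
        raise NotImplementedError

    def fit(self, dataloader):
        self.callback_handler.fire("on_train_start", self)
        for batch in dataloader:
            loss = self.training_step(batch)
            self.callback_handler.fire("on_batch_end", self, loss=loss)
```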

@xrsrke
Owner

xrsrke commented Oct 26, 2023

I will assign the task to you! Thank you. Sometimes, we also hold discussions on our Discord. Do you have a Discord account? https://discord.gg/nSyGZB6Gpp

@xrsrke xrsrke assigned xrsrke and isamu-isozaki and unassigned xrsrke Oct 26, 2023
@xrsrke xrsrke self-requested a review October 26, 2023 21:10
@isamu-isozaki
Author

Ah, sounds good. How many features do you want for the initial Trainer? Do you have some tests in mind for preliminary use?
And haha, I think we are already friends on Discord. Let me send a message.
