[zerobubble] rebase main #6075
Commits on Jul 1, 2024
- fp8 operators for compressed communication
cast_to_fp8, cast_from_fp8, all_reduce_fp8
(commit f5a52e1)
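The commit above adds cast_to_fp8, cast_from_fp8 and all_reduce_fp8 to colossalai/quantization/fp8.py. As a rough, hedged illustration of what per-tensor-scaled FP8 casting involves (not the actual implementation; it assumes torch >= 2.1 for the float8 dtypes):

```python
# Illustrative sketch only; the real helpers in colossalai/quantization/fp8.py
# differ in details (chunking, supported formats, NCCL handling).
import torch

def cast_to_fp8(x: torch.Tensor, fp8_dtype=torch.float8_e4m3fn):
    """Quantize to FP8 with a per-tensor scale; returns (payload, scale)."""
    fp8_max = torch.finfo(fp8_dtype).max
    amax = x.abs().max().float().clamp(min=1e-12)
    scale = fp8_max / amax  # map the largest magnitude onto the FP8 range
    payload = (x.float() * scale).clamp(-fp8_max, fp8_max).to(fp8_dtype)
    return payload, scale

def cast_from_fp8(payload: torch.Tensor, scale: torch.Tensor, out_dtype=torch.bfloat16):
    """Dequantize an FP8 payload back to a higher-precision dtype."""
    return (payload.float() / scale).to(out_dtype)
```

Since NCCL cannot reduce FP8 buffers directly, an all_reduce_fp8 built on helpers like these typically ships the payloads and scales (for example via all-gather) and performs the actual summation in higher precision.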
Commits on Jul 4, 2024
- (commit 6991819)
- [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
(commit e17f835)
Commits on Jul 10, 2024
- (commit dbfa7d3)
Commits on Jul 12, 2024
- (commit 1e19594)
- (commit e881901)
- (commit 6601874)
- Merge remote-tracking branch 'origin/feature/fp8_comm' into feature/fp8_comm
# Conflicts:
#   colossalai/quantization/fp8.py
(commit 1f1b856)
- [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
(commit 51f916b)
Commits on Jul 16, 2024
- Merge pull request hpcaitech#5885 from BurkeHulk/feature/fp8_comm
Feature/fp8 comm
(commit 9470701)
- (commit 457a0de)
Commits on Jul 17, 2024
- (commit 5a310b9)
- (commit 6a20f07)
Commits on Jul 18, 2024
- Merge pull request hpcaitech#5899 from BurkeHulk/SP_fp8
[Feature] FP8 communication in ShardFormer
(commit d0bdb51)
- (commit 5b969fd)
- Merge pull request hpcaitech#5921 from BurkeHulk/fp8_fix
[Shardformer] Fix Shardformer FP8 communication training accuracy degradation
(commit 62661cd)
Commits on Jul 24, 2024
- (commit 5fd0592)
Commits on Aug 2, 2024
- (commit ae486ce)
- (commit 91e596d)
- Merge pull request hpcaitech#5961 from ver217/feature/zeor-fp8
[fp8] add fp8 comm for low level zero
(commit c297e21)
Commits on Aug 5, 2024
- [Feature] llama shardformer fp8 support (hpcaitech#5938)
* add llama shardformer fp8 * Llama Shardformer Parity * fix typo * fix all reduce * fix pytest failure * fix reduce op and move function to fp8.py * fix typo
(commit 53cb960)
Commits on Aug 6, 2024
- [FP8] rebase main (hpcaitech#5963)
* add SimPO * fix dataloader * remove debug code * add orpo * fix style * fix colossalai, transformers version * fix colossalai, transformers version * fix colossalai, transformers version * fix torch colossalai version * update transformers version * [shardformer] DeepseekMoE support (hpcaitech#5871) * [Feature] deepseek moe expert parallel implement * [misc] fix typo, remove redundant file (hpcaitech#5867) * [misc] fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Feature] deepseek support & unit test * [misc] remove debug code & useless print * [misc] fix typos (hpcaitech#5872) * [Feature] remove modeling file, use auto config. (hpcaitech#5884) * [misc] fix typos * [Feature] deepseek support via auto model, remove modeling file * [misc] delete useless file * [misc] fix typos * [Deepseek] remove redundant code (hpcaitech#5888) * [misc] fix typos * [Feature] deepseek support via auto model, remove modeling file * [misc] delete useless file * [misc] fix typos * [misc] remove redundant code * [Feature/deepseek] resolve comment. (hpcaitech#5889) * [misc] fix typos * [Feature] deepseek support via auto model, remove modeling file * [misc] delete useless file * [misc] fix typos * [misc] remove redundant code * [misc] mv module replacement into if branch * [misc] add some warning message and modify some code in unit test * [misc] fix typos --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Hoxfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap Co-authored-by: Edenzzzz <[email protected]> * [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (hpcaitech#5838) * Diffusion Model Inference support * Stable Diffusion 3 Support * pixartalpha support * [HotFix] CI,import,requirements-test for hpcaitech#5838 (hpcaitech#5892) * [Hot Fix] CI,import,requirements-test --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Feature] Enable PP + SP for llama (hpcaitech#5868) * fix cross-PP-stage position id length diff bug * fix typo * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use a one cross entropy func for all shardformer models --------- Co-authored-by: Edenzzzz <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (hpcaitech#5897) * add benchmark for sft, dpo, simpo, orpo. Add benchmarking result. Support lora with gradient checkpoint * fix style * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix eval * hotfix citation * [zero] support all-gather overlap (hpcaitech#5898) * [zero] support all-gather overlap * [zero] add overlap all-gather flag * [misc] fix typo * [zero] update api * fix orpo cross entropy loss * [Auto Parallel]: Speed up intra-op plan generation by 44% (hpcaitech#5446) * Remove unnecessary calls to deepcopy * Build DimSpec's difference dict only once This change considerably speeds up construction speed of DimSpec objects. The difference_dict is the same for each DimSpec object, so a single copy of it is enough. 
* Fix documentation of DimSpec's difference method * [ShardFormer] fix qwen2 sp (hpcaitech#5903) * [compatibility] support torch 2.2 (hpcaitech#5875) * Support Pytorch 2.2.2 * keep build_on_pr file and update .compatibility * fix object_to_tensor usage when torch>=2.3.0 (hpcaitech#5820) * [misc] support torch2.3 (hpcaitech#5893) * [misc] support torch2.3 * [devops] update compatibility ci * [devops] update compatibility ci * [devops] add debug * [devops] add debug * [devops] add debug * [devops] add debug * [devops] remove debug * [devops] remove debug * [release] update version (hpcaitech#5912) * [plugin] support all-gather overlap for hybrid parallel (hpcaitech#5919) * [plugin] fixed all-gather overlap support for hybrid parallel * add kto * fix style, add kto data sample * [Examples] Add lazy init to OPT and GPT examples (hpcaitech#5924) Co-authored-by: Edenzzzz <[email protected]> * [ColossalChat] Hotfix for ColossalChat (hpcaitech#5910) * add ignore and tiny llama * fix path issue * run style * fix issue * update bash * add ignore and tiny llama * fix path issue * run style * fix issue * update bash * fix ddp issue * add Qwen 1.5 32B * refactor tokenization * [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (hpcaitech#5931) * cannot access local variable 'default_conversation' where it is not associated with a value set default value for 'default_conversation' * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix test data * refactor evaluation * remove real data path * remove real data path * Add n_fused as an input from native_module (hpcaitech#5894) * [FIX BUG] convert env param to int in (hpcaitech#5934) * [Hotfix] Fix ZeRO typo hpcaitech#5936 Co-authored-by: Edenzzzz <[email protected]> * [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (hpcaitech#5941) * Add a switch to control whether the model checkpoint needs to be saved after each epoch ends * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix style * fix style * fix style * [shardformer] hotfix attn mask (hpcaitech#5945) * [shardformer] hotfix attn mask (hpcaitech#5947) * [Feat] Distrifusion Acceleration Support for Diffusion Inference (hpcaitech#5895) * Distrifusion Support source * comp comm overlap optimization * sd3 benchmark * pixart distrifusion bug fix * sd3 bug fix and benchmark * generation bug fix * naming fix * add docstring, fix counter and shape error * add reference * readme and requirement * [zero] hotfix update master params (hpcaitech#5951) * [release] update version (hpcaitech#5952) * [Chat] Fix lora (hpcaitech#5946) * fix merging * remove filepath * fix style * Update README.md (hpcaitech#5958) * [hotfix] Remove unused plan section (hpcaitech#5957) * remove readme * fix readme * update * [test] add mixtral for sequence classification * [test] add mixtral transformer test * [moe] fix plugin * [test] mixtra pp shard test * [chore] handle non member group * [zero] solve hang * [test] pass mixtral shardformer test * [moe] implement transit between non moe tp and ep * [zero] solve hang * [misc] solve booster hang by rename the variable * solve hang when parallel mode = pp + dp * 
[moe] implement submesh initialization * [moe] add mixtral dp grad scaling when not all experts are activated * [chore] manually revert unintended commit * [chore] trivial fix * [chore] arg pass & remove drop token * [test] add mixtral modelling test * [moe] implement tp * [moe] test deepseek * [moe] clean legacy code * [Feature] MoE Ulysses Support (hpcaitech#5918) * moe sp support * moe sp bug solve * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [chore] minor fix * [moe] init moe plugin comm setting with sp * moe sp + ep bug fix * [moe] finalize test (no pp) * [moe] full test for deepseek and mixtral (pp + sp to fix) * [chore] minor fix after rebase * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [chore] solve moe ckpt test failure and some other arg pass failure * [moe] remove ops * [test] fix test: test_zero1_2 * [bug] fix: somehow logger hangs the program * [moe] deepseek moe sp support * [test] add check * [deepseek] replace attn (a workaround for bug in transformers) * [misc] skip redunant test * [misc] remove debug/print code * [moe] refactor mesh assignment * Revert "[moe] implement submesh initialization" This reverts commit 2f9bce6. * [chore] change moe_pg_mesh to private * [misc] remove incompatible test config * [misc] fix ci failure: change default value to false in moe plugin * [misc] remove useless condition * [chore] docstring * [moe] remove force_overlap_comm flag and add warning instead * [doc] add MoeHybridParallelPlugin docstring * [moe] solve dp axis issue * [chore] remove redundant test case, print string & reduce test tokens * [feat] Dist Loader for Eval (hpcaitech#5950) * support auto distributed data loader * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support auto distributed data loader * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tp error * remove unused parameters * remove unused * update inference * update docs * update inference --------- Co-authored-by: Michelle <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [lora] lora support hybrid parallel plugin (hpcaitech#5956) * lora support hybrid plugin * fix * fix * fix * fix * fp8 operators for compressed communication cast_to_fp8, cast_from_fp8, all_reduce_fp8 * fix scaling algorithm in FP8 casting * support fp8 communication in pipeline parallelism * add fp8_communication flag in the script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * shardformer fp8 * fix rebase * remove all to all * fix shardformer fp8 communication training degradation * [fp8] support all-gather flat tensor (hpcaitech#5932) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * Update low_level_optim.py --------- Co-authored-by: YeAnbang <[email protected]> Co-authored-by: Haze188 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Edenzzzz <[email protected]> Co-authored-by: Edenzzzz <[email protected]> Co-authored-by: Runyu Lu <[email protected]> 
Co-authored-by: Guangyao Zhang <[email protected]> Co-authored-by: YeAnbang <[email protected]> Co-authored-by: Hongxin Liu <[email protected]> Co-authored-by: Stephan Kö <[email protected]> Co-authored-by: アマデウス <[email protected]> Co-authored-by: Tong Li <[email protected]> Co-authored-by: zhurunhua <[email protected]> Co-authored-by: Insu Jang <[email protected]> Co-authored-by: Gao, Ruiyuan <[email protected]> Co-authored-by: hxwang <[email protected]> Co-authored-by: Michelle <[email protected]> Co-authored-by: Wang Binluo <[email protected]> Co-authored-by: HangXu <[email protected]>
(commit 0c10afd)
- [fp8]support all2all fp8 (hpcaitech#5953)
* support all2all fp8 * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * fix * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(commit afb26de)
Commits on Aug 7, 2024
- [fp8] add fp8 linear (hpcaitech#5967)
* [fp8] add fp8 linear * [test] fix fp8 linear test condition * [test] fix fp8 linear test condition * [test] fix fp8 linear test condition
(commit 76ea164)
- [fp8] support fp8 amp for hybrid parallel plugin (hpcaitech#5975)
* [fp8] support fp8 amp for hybrid parallel plugin * [test] add fp8 hook test * [fp8] fix fp8 linear compatibility
(commit ccabcf6)
- (commit 7739629)
Commits on Aug 8, 2024
- [Feature]: support FP8 communication in DDP, FSDP, Gemini (hpcaitech#…
…5928) * support fp8_communication in the Torch DDP grad comm, FSDP grad comm, and FSDP params comm * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * implement communication hook for FSDP params all-gather * added unit test for fp8 operators * support fp8 communication in GeminiPlugin * update training scripts to support fsdp and fp8 communication * fixed some minor bugs observed in unit test * add all_gather_into_tensor_flat_fp8 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add skip the test if torch < 2.2.0 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add skip the test if torch < 2.2.0 * add skip the test if torch < 2.2.0 * add fp8_comm flag * rebase latest fp8 operators * rebase latest fp8 operators * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(commit b480eec)
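For the Torch DDP part of the commit above, one way to picture the gradient path is a communication hook in the spirit of torch's built-in fp16_compress_hook: quantize the gradient bucket to FP8, move the bytes, and reduce in full precision. This is a hypothetical sketch, not the ColossalAI hook, and the byte-level all-gather is an assumption about how the FP8 payload is transported:

```python
# Hypothetical FP8 gradient-compression hook for torch DDP; illustrative only.
import torch
import torch.distributed as dist

def _to_fp8(t: torch.Tensor):
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = fp8_max / t.abs().max().float().clamp(min=1e-12)
    payload = (t.float() * scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return payload, scale.reshape(1)

def fp8_compress_hook(process_group, bucket):
    group = process_group if process_group is not None else dist.group.WORLD
    world_size = dist.get_world_size(group)

    grad = bucket.buffer()
    payload, scale = _to_fp8(grad)

    # NCCL cannot sum FP8 tensors, so gather the raw bytes and reduce locally.
    payloads = [torch.empty_like(payload.view(torch.uint8)) for _ in range(world_size)]
    scales = [torch.empty_like(scale) for _ in range(world_size)]
    dist.all_gather(payloads, payload.view(torch.uint8), group=group)
    dist.all_gather(scales, scale, group=group)

    acc = torch.zeros_like(grad, dtype=torch.float32)
    for p, s in zip(payloads, scales):
        acc += p.view(torch.float8_e4m3fn).float() / s
    result = (acc / world_size).to(grad.dtype)

    fut = torch.futures.Future()
    fut.set_result(result)
    return fut

# ddp_model.register_comm_hook(state=None, hook=fp8_compress_hook)
```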
- (commit 4b9bec8)
Commits on Aug 9, 2024
- [fp8] support gemini plugin (hpcaitech#5978)
* [fp8] refactor hook * [fp8] support gemini plugin * [example] add fp8 option for llama benchmark
(commit 8241c0c)
- [fp8] use torch compile (torch >= 2.3.0) (hpcaitech#5979)
* [fp8] use torch compile (torch >= 2.4.0) * [fp8] set use_fast_accum in linear * [chore] formal version check * [chore] fix sig
(commit e4aadee)
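The compile path added here is only valid on recent torch, hence the "formal version check" mentioned in the commit body. A sketch of that gating pattern; the names are illustrative and it assumes the packaging module is installed:

```python
# Illustrative version gate around torch.compile; not the library's exact code.
import torch
from packaging import version

SUPPORT_TORCH_COMPILE = version.parse(torch.__version__) >= version.parse("2.3.0")

def maybe_compile(fn):
    # Fall back to eager execution on older torch versions.
    return torch.compile(fn) if SUPPORT_TORCH_COMPILE else fn

@maybe_compile
def scaled_matmul(x, w, scale):
    return (x @ w.t()) * scale
```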
- [fp8]Moe support fp8 communication (hpcaitech#5977)
* fix * support moe fp8 * fix * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * fix * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * fix * fix fix fi * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(commit f1a3a32)
Commits on Aug 12, 2024
- [fp8] support hybrid parallel plugin (hpcaitech#5982)
* support fp8 comm for qwen2 model * support fp8 comm for qwen2 model * support fp8 comm for qwen2 model * fp8 * fix * bert and bloom * chatglm and command * gpt2,gptj,bert, falcon,blip2 * mistral,opy,sam,t5,vit,whisper * fix * fix * fix
(commit b2483c8)
Commits on Aug 13, 2024
- [fp8] refactor fp8 linear with compile (hpcaitech#5993)
* [fp8] refactor fp8 linear with compile * [fp8] fix linear test * [fp8] fix linear test
(commit 0978080)
Commits on Aug 14, 2024
- [fp8] support asynchronous FP8 communication (hpcaitech#5997)
* fix * fix * fix * support async all2all * support async op for all gather * fix * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(commit 597b206)
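The asynchronous support in this commit comes down to issuing the collective with async_op=True and postponing the wait so that independent computation overlaps the FP8 transfer. A minimal illustration of the overlap pattern (the actual helpers wrap this with FP8 casting and scale handling):

```python
# Minimal async-collective overlap pattern; not the ColossalAI code path itself.
import torch
import torch.distributed as dist

def all_gather_async(payload: torch.Tensor, group=None):
    world_size = dist.get_world_size(group)
    outputs = [torch.empty_like(payload) for _ in range(world_size)]
    handle = dist.all_gather(outputs, payload, group=group, async_op=True)
    return outputs, handle

# outputs, handle = all_gather_async(fp8_chunk)
# ... run computation that does not touch `outputs` ...
# handle.wait()  # block only when the gathered shards are actually needed
```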
Commits on Aug 15, 2024
- (commit 88fa096)
- (commit 1a2e90d)
- [fp8]update reduce-scatter test (hpcaitech#6002)
* fix * fix * fix * fix
(commit 20722a8)
Commits on Aug 16, 2024
- (commit 3f09a61)
- [fp8] zero support fp8 linear. (hpcaitech#6006)
* fix * fix * fix * zero fp8 * zero fp8 * Update requirements.txt
(commit 0a51319)
Commits on Aug 17, 2024
- (commit 4cf79fa)
- [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
(commit 81272e9)
Commits on Aug 19, 2024
- (commit 02636c5)
- (commit 52289e4)
- (commit 1a5847e)
- (commit 3353042)
- [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
(commit 64aad96)
- (commit 4c82bfc)
- (commit 0d8e82a)
- (commit 12b4401)
- (commit 2eb3683)
- (commit 88b3f06)
- (commit 1f703e0)
Commits on Aug 20, 2024
- (commit 5382311)
- (commit f7acfa1)
- (commit 2ee6235)
- (commit 2e4cbe3)
- (commit 2d362ac)
Commits on Aug 21, 2024
- (commit eb5ba40)
- (commit 193030f)
- (commit 6aface9)
- (commit 698c8b9)
- (commit 8b8e282)
Commits on Aug 22, 2024
- [fp8] Merge feature/fp8_comm to main branch of Colossalai (hpcaitech#…
…6016) * add SimPO * fix dataloader * remove debug code * add orpo * fix style * fix colossalai, transformers version * fix colossalai, transformers version * fix colossalai, transformers version * fix torch colossalai version * update transformers version * [shardformer] DeepseekMoE support (hpcaitech#5871) * [Feature] deepseek moe expert parallel implement * [misc] fix typo, remove redundant file (hpcaitech#5867) * [misc] fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Feature] deepseek support & unit test * [misc] remove debug code & useless print * [misc] fix typos (hpcaitech#5872) * [Feature] remove modeling file, use auto config. (hpcaitech#5884) * [misc] fix typos * [Feature] deepseek support via auto model, remove modeling file * [misc] delete useless file * [misc] fix typos * [Deepseek] remove redundant code (hpcaitech#5888) * [misc] fix typos * [Feature] deepseek support via auto model, remove modeling file * [misc] delete useless file * [misc] fix typos * [misc] remove redundant code * [Feature/deepseek] resolve comment. (hpcaitech#5889) * [misc] fix typos * [Feature] deepseek support via auto model, remove modeling file * [misc] delete useless file * [misc] fix typos * [misc] remove redundant code * [misc] mv module replacement into if branch * [misc] add some warning message and modify some code in unit test * [misc] fix typos --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Hoxfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap Co-authored-by: Edenzzzz <[email protected]> * [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (hpcaitech#5838) * Diffusion Model Inference support * Stable Diffusion 3 Support * pixartalpha support * [HotFix] CI,import,requirements-test for hpcaitech#5838 (hpcaitech#5892) * [Hot Fix] CI,import,requirements-test --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [Feature] Enable PP + SP for llama (hpcaitech#5868) * fix cross-PP-stage position id length diff bug * fix typo * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use a one cross entropy func for all shardformer models --------- Co-authored-by: Edenzzzz <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (hpcaitech#5897) * add benchmark for sft, dpo, simpo, orpo. Add benchmarking result. Support lora with gradient checkpoint * fix style * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix eval * hotfix citation * [zero] support all-gather overlap (hpcaitech#5898) * [zero] support all-gather overlap * [zero] add overlap all-gather flag * [misc] fix typo * [zero] update api * fix orpo cross entropy loss * [Auto Parallel]: Speed up intra-op plan generation by 44% (hpcaitech#5446) * Remove unnecessary calls to deepcopy * Build DimSpec's difference dict only once This change considerably speeds up construction speed of DimSpec objects. The difference_dict is the same for each DimSpec object, so a single copy of it is enough. 
* Fix documentation of DimSpec's difference method * [ShardFormer] fix qwen2 sp (hpcaitech#5903) * [compatibility] support torch 2.2 (hpcaitech#5875) * Support Pytorch 2.2.2 * keep build_on_pr file and update .compatibility * fix object_to_tensor usage when torch>=2.3.0 (hpcaitech#5820) * [misc] support torch2.3 (hpcaitech#5893) * [misc] support torch2.3 * [devops] update compatibility ci * [devops] update compatibility ci * [devops] add debug * [devops] add debug * [devops] add debug * [devops] add debug * [devops] remove debug * [devops] remove debug * [release] update version (hpcaitech#5912) * [plugin] support all-gather overlap for hybrid parallel (hpcaitech#5919) * [plugin] fixed all-gather overlap support for hybrid parallel * add kto * fix style, add kto data sample * [Examples] Add lazy init to OPT and GPT examples (hpcaitech#5924) Co-authored-by: Edenzzzz <[email protected]> * [ColossalChat] Hotfix for ColossalChat (hpcaitech#5910) * add ignore and tiny llama * fix path issue * run style * fix issue * update bash * add ignore and tiny llama * fix path issue * run style * fix issue * update bash * fix ddp issue * add Qwen 1.5 32B * refactor tokenization * [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (hpcaitech#5931) * cannot access local variable 'default_conversation' where it is not associated with a value set default value for 'default_conversation' * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix test data * refactor evaluation * remove real data path * remove real data path * Add n_fused as an input from native_module (hpcaitech#5894) * [FIX BUG] convert env param to int in (hpcaitech#5934) * [Hotfix] Fix ZeRO typo hpcaitech#5936 Co-authored-by: Edenzzzz <[email protected]> * [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (hpcaitech#5941) * Add a switch to control whether the model checkpoint needs to be saved after each epoch ends * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix style * fix style * fix style * [shardformer] hotfix attn mask (hpcaitech#5945) * [shardformer] hotfix attn mask (hpcaitech#5947) * [Feat] Distrifusion Acceleration Support for Diffusion Inference (hpcaitech#5895) * Distrifusion Support source * comp comm overlap optimization * sd3 benchmark * pixart distrifusion bug fix * sd3 bug fix and benchmark * generation bug fix * naming fix * add docstring, fix counter and shape error * add reference * readme and requirement * [zero] hotfix update master params (hpcaitech#5951) * [release] update version (hpcaitech#5952) * [Chat] Fix lora (hpcaitech#5946) * fix merging * remove filepath * fix style * Update README.md (hpcaitech#5958) * [hotfix] Remove unused plan section (hpcaitech#5957) * remove readme * fix readme * update * [test] add mixtral for sequence classification * [test] add mixtral transformer test * [moe] fix plugin * [test] mixtra pp shard test * [chore] handle non member group * [zero] solve hang * [test] pass mixtral shardformer test * [moe] implement transit between non moe tp and ep * [zero] solve hang * [misc] solve booster hang by rename the variable * solve hang when parallel mode = pp + dp * 
[moe] implement submesh initialization * [moe] add mixtral dp grad scaling when not all experts are activated * [chore] manually revert unintended commit * [chore] trivial fix * [chore] arg pass & remove drop token * [test] add mixtral modelling test * [moe] implement tp * [moe] test deepseek * [moe] clean legacy code * [Feature] MoE Ulysses Support (hpcaitech#5918) * moe sp support * moe sp bug solve * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [chore] minor fix * [moe] init moe plugin comm setting with sp * moe sp + ep bug fix * [moe] finalize test (no pp) * [moe] full test for deepseek and mixtral (pp + sp to fix) * [chore] minor fix after rebase * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [chore] solve moe ckpt test failure and some other arg pass failure * [moe] remove ops * [test] fix test: test_zero1_2 * [bug] fix: somehow logger hangs the program * [moe] deepseek moe sp support * [test] add check * [deepseek] replace attn (a workaround for bug in transformers) * [misc] skip redunant test * [misc] remove debug/print code * [moe] refactor mesh assignment * Revert "[moe] implement submesh initialization" This reverts commit 2f9bce6. * [chore] change moe_pg_mesh to private * [misc] remove incompatible test config * [misc] fix ci failure: change default value to false in moe plugin * [misc] remove useless condition * [chore] docstring * [moe] remove force_overlap_comm flag and add warning instead * [doc] add MoeHybridParallelPlugin docstring * [moe] solve dp axis issue * [chore] remove redundant test case, print string & reduce test tokens * [feat] Dist Loader for Eval (hpcaitech#5950) * support auto distributed data loader * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support auto distributed data loader * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tp error * remove unused parameters * remove unused * update inference * update docs * update inference --------- Co-authored-by: Michelle <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [lora] lora support hybrid parallel plugin (hpcaitech#5956) * lora support hybrid plugin * fix * fix * fix * fix * Support overall loss, update KTO logging * [Docs] clarify launch port Co-authored-by: Edenzzzz <[email protected]> * [Hotfix] README link (hpcaitech#5966) * update ignore * update readme * run style * update readme * [Hotfix] Avoid fused RMSnorm import error without apex (hpcaitech#5985) Co-authored-by: Edenzzzz <[email protected]> * [Chat] fix readme (hpcaitech#5989) * fix readme * fix readme, tokenization fully tested * fix readme, tokenization fully tested * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: root <root@notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9-0.notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9.colossal-ai.svc.cluster.local> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix sync condition (hpcaitech#6000) * [plugin] add cast inputs option for zero (hpcaitech#6003) * [pre-commit.ci] pre-commit autoupdate (hpcaitech#5995) updates: - [github.com/psf/black-pre-commit-mirror: 24.4.2 → 
24.8.0](psf/black-pre-commit-mirror@24.4.2...24.8.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [misc] Bypass the huggingface bug to solve the mask mismatch problem (hpcaitech#5991) * [Feature] Zigzag Ring attention (hpcaitech#5905) * halfway * fix cross-PP-stage position id length diff bug * fix typo * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unified cross entropy func for all shardformer models * remove redundant lines * add basic ring attn; debug cross entropy * fwd bwd logic complete * fwd bwd logic complete; add experimental triton rescale * precision tests passed * precision tests passed * fix typos and remove misc files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add sp_mode to benchmark; fix varlen interface * update softmax_lse shape by new interface * change tester name * remove buffer clone; support packed seq layout * add varlen tests * fix typo * all tests passed * add dkv_group; fix mask * remove debug statements --------- Co-authored-by: Edenzzzz <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [misc] update compatibility (hpcaitech#6008) * [misc] update compatibility * [misc] update requirements * [devops] disable requirements cache * [test] fix torch ddp test * [test] fix rerun on address in use * [test] fix lazy init * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the merge * fix the merge * overlap kv comm with output rescale (hpcaitech#6017) Co-authored-by: Edenzzzz <[email protected]> * fix the merge * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the merge * fix * fix * fix the merge * fix * [misc] Use dist logger in plugins (hpcaitech#6011) * use dist logger in plugins * remove trash * print on rank 0 --------- Co-authored-by: Edenzzzz <[email protected]> * fix * fix * fix * fix * fix the merge * fix * fix * fix * fix --------- Co-authored-by: YeAnbang <[email protected]> Co-authored-by: Haze188 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Edenzzzz <[email protected]> Co-authored-by: Edenzzzz <[email protected]> Co-authored-by: Runyu Lu <[email protected]> Co-authored-by: Guangyao Zhang <[email protected]> Co-authored-by: YeAnbang <[email protected]> Co-authored-by: Hongxin Liu <[email protected]> Co-authored-by: Stephan Kö <[email protected]> Co-authored-by: アマデウス <[email protected]> Co-authored-by: Tong Li <[email protected]> Co-authored-by: zhurunhua <[email protected]> Co-authored-by: Insu Jang <[email protected]> Co-authored-by: Gao, Ruiyuan <[email protected]> Co-authored-by: hxwang <[email protected]> Co-authored-by: Michelle <[email protected]> Co-authored-by: root <root@notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9-0.notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9.colossal-ai.svc.cluster.local>
(commit eea37da)
- Merge pull request hpcaitech#6023 from wangbluo/fp8_merge
[fp8] merge
(commit d77e66a)
- (commit 971b16a)
- [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
(commit a292554)
- Merge pull request hpcaitech#6024 from wangbluo/fix_merge
[fp8] merge
(commit afe845f)
- (commit caab4a3)
Commits on Aug 23, 2024
- (commit 0bc9a87)
- [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
(commit 3b0df30)
- (commit 9e76764)
- Merge pull request hpcaitech#6029 from hpcaitech/flybird11111-patch-1
Update train_dpo.py
(commit 0bf46c5)
Commits on Aug 26, 2024
- (commit dae3999)
- [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
(commit 80d24ae)
- Merge pull request hpcaitech#6033 from wangbluo/fix
[fp8] fix the merge
(commit 4a6f31e)
Commits on Aug 27, 2024
- Merge pull request hpcaitech#6012 from hpcaitech/feature/fp8_comm
[fp8] support fp8 communication and fp8 training for Colossalai
(commit 17904cb)
- [CI] Remove triton version for compatibility bug; update req torch >=2.2 (hpcaitech#6018)
* remove triton version * remove torch 2.2 * remove torch 2.1 * debug * remove 2.1 build tests * require torch >=2.2 --------- Co-authored-by: Edenzzzz <[email protected]>
(commit d383449)
Commits on Aug 28, 2024
- [plugin] hotfix zero plugin (hpcaitech#6036)
* [plugin] hotfix zero plugin * [plugin] hotfix zero plugin
(commit cc1b0ef)
- [Colossal-LLaMA] Refactor latest APIs (hpcaitech#6030)
* refactor latest code * update api * add dummy dataset * update Readme * add setup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update files * add PP support * update arguments * update argument * reorg folder * update version * remove IB infor * update utils * update readme * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update save for zero * update save * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add apex * update --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(commit 4a68efb)
- (commit 0d3a85d)
Commits on Aug 29, 2024
- (commit e96a076)
Commits on Sep 2, 2024
- [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (hpcaitech#6020)
* fix bug in load_state_dict_into_model; format error msg * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update utils.py to support checking missing_keys * Update general_checkpoint_io.py fix bug in missing_keys error message * retrigger tests --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(commit e9032fb)
Commits on Sep 3, 2024
- [Hotfix] Remove deprecated install (hpcaitech#6042)
* remove deprecated install * remove unused folder
(commit c650a90)
- [fp8] optimize all-gather (hpcaitech#6043)
* [fp8] optimize all-gather * [fp8] fix all gather fp8 ring * [fp8] enable compile * [fp8] fix all gather fp8 ring
(commit c3b5caf)
- (commit 26e5539)
Commits on Sep 9, 2024
- [fp8] disable all_to_all_fp8 in intranode (hpcaitech#6045)
* enhance all_to_all_fp8 with internode comm control * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disable some fp8 ops due to performance issue * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(commit 5ce6dd7)
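Disabling the FP8 path inside a node, as this commit does for performance, needs a way to tell whether the participating ranks span more than one node. A hedged sketch of such a check; the heuristic below (ranks per node taken from LOCAL_WORLD_SIZE) is an assumption, not the library's actual logic:

```python
# Assumed heuristic: a group larger than one node's worth of ranks is internode.
import os
import torch.distributed as dist

def spans_multiple_nodes(group=None, default_gpus_per_node: int = 8) -> bool:
    ranks_per_node = int(os.environ.get("LOCAL_WORLD_SIZE", default_gpus_per_node))
    return dist.get_world_size(group) > ranks_per_node

# use_fp8_all_to_all = fp8_communication and spans_multiple_nodes(ep_group)
```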
Commits on Sep 10, 2024
- [release] update version (hpcaitech#6041)
* [release] update version * [devops] update comp test * [devops] update comp test debug * [devops] debug comp test * [devops] debug comp test * [devops] debug comp test * [devops] debug comp test * [devops] debug comp test
(commit b3db105)
- [Feature] Split cross-entropy computation in SP (hpcaitech#5959)
* halfway * fix cross-PP-stage position id length diff bug * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unified cross entropy func for all shardformer models * remove redundant lines * add basic ring attn; debug cross entropy * fwd bwd logic complete * fwd bwd logic complete; add experimental triton rescale * precision tests passed * precision tests passed * fix typos and remove misc files * update softmax_lse shape by new interface * change tester name * remove buffer clone; support packed seq layout * add varlen tests * fix typo * all tests passed * add dkv_group; fix mask * remove debug statements * adapt chatglm, command-R, qwen * debug * halfway * fix cross-PP-stage position id length diff bug * fix typo * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unified cross entropy func for all shardformer models * remove redundant lines * add basic ring attn; debug cross entropy * fwd bwd logic complete * fwd bwd logic complete; add experimental triton rescale * precision tests passed * precision tests passed * fix typos and remove misc files * add sp_mode to benchmark; fix varlen interface * update softmax_lse shape by new interface * add varlen tests * fix typo * all tests passed * add dkv_group; fix mask * remove debug statements * add comments * q1 index only once * remove events to simplify stream sync * simplify forward/backward logic * 2d ring forward passed * 2d ring backward passed * fixes * fix ring attn loss * 2D ring backward + llama passed * merge * update logger * fix typo * rebase * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * remove typos * fixes * support GPT --------- Co-authored-by: Edenzzzz <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(commit 8fd25d6)
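Splitting the cross-entropy computation across sequence-parallel ranks, as this commit does, amounts to evaluating a partial loss on each rank's slice of the sequence and reducing the sums. A simplified sketch that assumes labels are sharded the same way as the logits (the real helper also covers vocab-parallel logits and the ring-attention cases listed above):

```python
# Simplified sequence-parallel cross entropy; illustrative only.
import torch
import torch.distributed as dist
import torch.nn.functional as F

def sp_cross_entropy(local_logits, local_labels, sp_group, ignore_index=-100):
    # local_logits: (local_tokens, vocab), local_labels: (local_tokens,)
    loss_sum = F.cross_entropy(
        local_logits.float(), local_labels, ignore_index=ignore_index, reduction="sum"
    )
    num_tokens = (local_labels != ignore_index).sum().float()
    stats = torch.stack([loss_sum, num_tokens])
    dist.all_reduce(stats, group=sp_group)  # sum partial losses and token counts
    return stats[0] / stats[1].clamp(min=1.0)
```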
- [hotfix] moe hybrid parallelism benchmark & follow-up fix (hpcaitech#6048)
* [example] pass use_fp8_comm flag to all plugins * [example] add mixtral benchmark * [moe] refine assertion and check * [moe] fix mixtral & add more tests * [moe] consider checking dp * sp group and moe_dp_group * [mixtral] remove gate tp & add more tests * [deepseek] fix tp & sp for deepseek * [mixtral] minor fix * [deepseek] add deepseek benchmark
(commit c54c4fc)
Commits on Sep 11, 2024
- [fp8] hotfix backward hook (hpcaitech#6053)
* [fp8] hotfix backward hook * [fp8] hotfix pipeline loss accumulation
(commit 13946c4)
- [doc] update sp doc (hpcaitech#6055)
* update sp doc * fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix * fix * fix --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(commit a35a078)
Commits on Sep 13, 2024
- (commit fdd84b9)
- [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
(commit 216d54e)
- (commit 0a01e2a)
- (commit 683179c)
- (commit 6eb8832)
- (commit f393867)
- (commit dc03217)
- [zerobubble]Support ZeroBubble Pipeline (hpcaitech#6034)
* [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble; * [feat] add dw test; * [fix] fix weight not close; * [update] update text; * [feat] add test run_fwd_bwd automatic scheduling; * [feat] split communication and calculation; fix pop empty send_bwd_buffer error; * [feat] add test for p & p grad; * [feat] add comments for ZBV func; * [fix] rm useless assign and comments; * [fix] fix ci test; add pytest; * [feat] add run_fwd_bwd_with_microbatch (replace input) & test; add p&p.grad assert close test & all pass; * [feat] add apply v_schedule graph; p & p.grad assert err exist; * [fix] update * [feat] fix ci; add assert; * [feat] fix poc format * [feat] fix func name & ci; add comments; * [fix] fix poc test; add comments in poc; * [feat] add optim backward_b_by_grad * [feat] fix optimizer bwd b & w; support return accum loss & output * [feat] add fwd_bwd_step, run_fwd_only; * [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict; * [fix] fix communication_map; * [feat] update test; rm comments; * [fix] rm zbv in hybridplugin * [fix] fix optim bwd; * [fix] fix optim bwd; * [fix] rm output.data after send fwd; * [fix] fix bwd step if condition; remove useless comments and format info; * [fix] fix detach output & release output; * [fix] rm requir_grad for output; * [fix] fix requir grad position and detach position and input&output local buffer append position; * [feat] add memory assertation; * [fix] fix mem check; * [fix] mem assertation' * [fix] fix mem assertation * [fix] fix mem; use a new model shape; only assert mem less and equal than theo; * [fix] fix model zoo import; * [fix] fix redundant detach & clone; add buffer assertation in the end; * [fix] add output_obj_grad assert None at bwd b step; replace input_obj.require_grad_ with treemap; * [fix] update optim state dict assert (include param group & state); fix mem assert after add optim; * [fix] add testcase with microbatch 4;
(commit e79d442)
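The scheduler in this commit depends on splitting each stage's backward pass into a B step (input gradients, which the upstream stage is waiting for) and a W step (weight gradients, which can be deferred to fill pipeline bubbles); that is what the backward_b / backward_w items in the commit body refer to. A conceptual sketch of that split, not the ColossalAI scheduler code, assuming the stage output and its incoming gradient are already available:

```python
# Conceptual B/W backward split used by zero-bubble schedules; illustrative only.
import torch

def backward_b(stage_output, output_grad, stage_input):
    """Compute only dL/d(input) so it can be sent upstream immediately."""
    return torch.autograd.grad(
        stage_output, stage_input, grad_outputs=output_grad, retain_graph=True
    )[0]

def backward_w(stage_output, output_grad, params):
    """Compute dL/d(weights) later, e.g. while waiting on communication."""
    grads = torch.autograd.grad(stage_output, params, grad_outputs=output_grad)
    for p, g in zip(params, grads):
        p.grad = g if p.grad is None else p.grad + g
```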
- (commit 696fced)
- (commit 0b14a55)
- (commit 0ad3129)
- (commit b582319)
Commits on Sep 14, 2024
- [fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (hpcaitech#6059)
* all_gather only internode, fix pytest * fix cuda arch <89 compile pytest error * fix pytest failure * disable all_gather_into_tensor_flat_fp8 * fix fp8 format * fix pytest * fix conversations * fix chunk tuple to list
(commit f20b066)
- [doc] FP8 training and communication document (hpcaitech#6050)
* Add FP8 training and communication document * add fp8 docstring for plugins * fix typo * fix typo
(commit bdb125f)
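Per the documentation added here, enabling FP8 communication from a training script should come down to a plugin flag. A usage sketch; the fp8_communication keyword is taken from the commit messages in this PR and should be checked against the plugin docstrings:

```python
# Usage sketch based on the fp8_communication flag mentioned in these commits.
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

plugin = GeminiPlugin(fp8_communication=True)  # assumed keyword; see plugin docstring
booster = Booster(plugin=plugin)
# model, optimizer, criterion, dataloader, scheduler = booster.boost(...)
```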
- (commit 827ef3e)
- Merge pull request hpcaitech#6061 from wangbluo/sp_fix
[sp] : fix the attention kernel for sp
(commit 37e3523)
Commits on Sep 16, 2024
- (commit 10e4f7d)
Commits on Sep 18, 2024
-
Merge pull request hpcaitech#6064 from wangbluo/fix_attn
[sp] : fix the attention kernel for sp
Configuration menu - View commit details
-
Copy full SHA for 63314ce - Browse repository at this point
Copy the full SHA 63314ceView commit details -
Configuration menu - View commit details
-
Copy full SHA for 4fa6b95 - Browse repository at this point
Copy the full SHA 4fa6b95View commit details -
- [ColossalEval] support for vllm (hpcaitech#6056) (commit f9546ba)
  * support vllm
  * modify vllm and update readme
  * run pre-commit
  * remove duplicated lines and refine code
  * update param name
  * refine code
  * update readme
  * refine code
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
  * Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
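For context on what a vLLM backend gives ColossalEval, the fragment below shows plain vLLM batch generation. It is ordinary vLLM usage, not the ColossalEval wrapper API, and the model path is a placeholder.

```python
from vllm import LLM, SamplingParams

# Placeholder model path; any Hugging Face causal LM that vLLM supports works here.
llm = LLM(model="meta-llama/Llama-2-7b-hf")
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Q: What is 2 + 2?\nA:"], params)
print(outputs[0].outputs[0].text)  # first completion for the first prompt
```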
Commits on Sep 19, 2024
- commit dabc2e7
Commits on Sep 29, 2024
- commit b3b3278
- [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble (commit da4595a)
- commit e450dd2
- commit ccc37a4
- commit 228d71e
- commit 8b0ffed
- commit 97f2443
- commit c90bd98
- [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict (commit 5df5965)
- commit 94a12f6
- commit cc5e7dc
- commit ad8ad64
- commit f347591
- commit 4249a36
- commit 497d545
- commit 0825700
- commit ae4cf5b
- commit e80179c
- commit 2683d26
- commit 9094cc3
- commit 3e2f260
- commit 8ce22ae
- commit f8d6f98
- commit 78a439b
- [fix] fix pipeline util func deallocate --> release_tensor_data; fix bwd_b loss bwd branch (commit 8bc8bb0)
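The rename above points at a common pipeline-parallel trick: once an activation has been sent to the next stage, its storage can be released while the tensor object stays alive for autograd. The helper below is a hedged sketch of that pattern under the name used in the commit message; the actual ColossalAI function may differ.

```python
import torch

def release_tensor_data(t):
    """Free the activation's storage but keep the tensor (and its grad_fn) alive."""
    if not isinstance(t, torch.Tensor):
        return
    # Re-pointing .data drops the large buffer once nothing else references it;
    # autograd metadata on `t` is untouched, so backward can still run later.
    t.data = torch.empty((1,), device=t.device, dtype=t.dtype)
```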
- commit a3a797d
- commit 4d3eaee
- [plugin] hybrid support zero bubble pipeline (hpcaitech#6060) (commit 2fd9d3e)
  * hybrid support zbv
  * Update zero_bubble_pp.py
  * includes the squashed [zerobubble]Support ZeroBubble Pipeline (hpcaitech#6034) message, identical to the change list shown for commit e79d442 above
  * plus many small "fix" commits and [pre-commit.ci] auto fixes (see https://pre-commit.ci)
  * Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  * Co-authored-by: duanjunwen <[email protected]>
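If HybridParallelPlugin does end up exposing the zero-bubble schedule directly, usage would presumably look something like the sketch below. Every argument name here (pp_style="zbv", num_model_chunks) is an assumption inferred from the commit titles, not a confirmed signature; consult the plugin once this PR merges.

```python
# Hypothetical configuration sketch; run under torchrun with 4 ranks after
# colossalai.launch_from_torch(). Argument names are assumptions, not verified API.
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=4,
    pp_style="zbv",       # assumed selector for the zero-bubble (V-shaped) schedule
    num_model_chunks=2,   # ZB-V places two model chunks on every pipeline stage
)
booster = Booster(plugin=plugin)
```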
- [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble (commit 21c62b6)
- commit 28ee5a7
- commit d44e7e6
- commit 49d68eb
- commit 0055c47
- commit 21bf510
- commit 93ede6b
- [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict (commit 4ac0d6e)
- commit 262b27e
- commit fe99ca3
- commit 355a3af
- commit 4420dc1
- commit 7ba031d
- commit e666f5c
- commit 93b3604
- commit 78ed432
- commit df12ae7
- commit 9e90356
- commit 993f3db
- [plugin] hybrid support zero bubble pipeline (hpcaitech#6060) (commit 0767948); squashed message identical to commit 2fd9d3e above
- commit 3251e68
- [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci) (commit 797d1ed)