[ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM #5854
Closed
Conversation
* update to fully overlap, still debugging
* improve interface
* fixed deadlock bug
* debug NaN loss
* (experimental) use one comm group for send_fw_recv_fw to fix NaN
* cleaned up interfaces; use one batch p2p for all
* clean up; removed the double p2p batch case
* p2p test passed
* improve overlap: send fwd before backward
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* tentatively use 2 p2p batches
* remove two p2p batches
* fix typos
* remove pp.sh

Co-authored-by: Edenzzzz <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: root <root@notebook-c55824c0-7742-45e8-9591-c855bb77ad29-0.notebook-c55824c0-7742-45e8-9591-c855bb77ad29.colossal-ai.svc.cluster.local>
* [gemini] fix missing return
* [gemini] fix missing arg pass
* [gemini] use gather tensor instead of list
* [test] enable flash attention for benchmark by default

Co-authored-by: genghaozhe <[email protected]>
GuangyaoZhang changed the title from "Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM" to "[ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM" on Jun 26, 2024.
* [zero] use bucket during allgather
* [zero] rename api
* t5 token, still pytest fail
* resolve T5 pytest failure
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* fix typos

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* lazy init support
* lazy init llama support
* lazy init support for baichuan
* align rpc
* add note for baichuan

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* delete xformers
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [moe] removed openmoe-coupled code and rectify mixtral code (hpcaitech#5471)
* [Feature] MoE refactor; integration with Mixtral (hpcaitech#5682)
  - cherry pick from refractor-moe branch
  - tests passed
  - support ep + zero
  Co-authored-by: Edenzzzz <[email protected]>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add mixtral auto policy & move pipeline forward code to modeling folder
* [moe refactor] modify kernel test without Route Class
* [moe refactor] add moe tensor test path environment variable to github workflow
* fix moe test bug due to the code rebase
* [moe refactor] fix moe zero test, and little bug in low level zero
* fix typos & unify global variable XX_AXIS logic without using -1; prettify the code
* remove print code & support zero 2 test
* rename function; further improve the test code
* [moe refactor] change test model from fake moe model to mixtral moe layer and remove useless test
* [moe refactor] skip some unit tests which will be refactored later
* [moe refactor] fix unit import error and circular import issues
* [moe refactor] remove debug code; update github workflow
* [moe/zero] refactor low level optimizer (hpcaitech#5767)
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [Feature] MoE refactor with newest version of ZeRO (hpcaitech#5801)
* [zero] remove redundant members in BucketStore (hpcaitech#5802)
* [zero] align api with previous version
* [Moe/Zero] update MoeHybridParallelPlugin with refactored ZeRO and fix ZeRO bug (hpcaitech#5819)
  - [moe refactor] update unit test with the refactored ZeRO and remove useless test
  - move moe checkpoint to checkpoint folder and exchange global axis to class member
  - update moe hybrid parallel plugin with newest version of zero & fix zero working/master params bug
  - fix zero unit test
  - add an assertion to prevent users from using it incorrectly
* [hotfix] solve the compatibility issue of zero refactor (hpcaitech#5823)
  - modify function parameter names to resolve compatibility issues
* [zero] fix missing hook removal (hpcaitech#5824)
* [MoE] resolve .github conflict (hpcaitech#5829)
* [Fix/Example] fix Llama inference loading data type (hpcaitech#5763)
  - fix llama inference loading dtype
  - revise loading dtype of benchmark llama3
* [release] update version (hpcaitech#5752)
  - [devops] update compatibility test
  - [test] fix ddp plugin test
  - [test] fix gptj and rpc test
  - [devops] fix cuda ext compatibility
  - [inference] fix flash decoding test
  - fix (hpcaitech#5765)
* [test] fix testcases (hpcaitech#5770)
  - fix test_analyzer & test_auto_parallel
  - remove local change about moe
* [Hotfix] add missing init file in inference.executor (hpcaitech#5774)
* [CI/tests] simplify some test cases to reduce testing time (hpcaitech#5755)
  - continue to remove test cases to reduce ci time cost
  - restore some test config
* [misc] update dockerfile (hpcaitech#5776)
* [devops] fix docker ci (hpcaitech#5780)
* [Inference] add Streaming LLM (hpcaitech#5745)
  - add some parameters to llama_generation.py
  - verify streamingllm config
  - add test_streamingllm.py
  - modified according to review comments
  - add citation
  - change _block_tables tolist
* [hotfix] fix llama flash attention forward (hpcaitech#5777)
* [misc] accelerate CI for zero and dist optim (hpcaitech#5758)
  - remove fp16 from lamb
  - remove d2h copy in checking states
  Co-authored-by: Edenzzzz <[email protected]>
* [Test/CI] remove test cases to reduce CI duration (hpcaitech#5753)
  - [test] smaller gpt2 test case
  - [test] reduce test cases in tests/test_zero/test_gemini/test_zeroddp_state_dict.py, test_grad_accum.py and test_optim.py
  - Revert "[test] smaller gpt2 test case" (some tests may depend on the model size, i.e. the number of chunks; reverts commit df705a5)
  - [test] reduce test cases in tests/test_checkpoint_io/test_gemini_checkpoint_io.py
  - [CI] smaller test model for the two modified cases
  - [CI] hardcode gpt model for tests/test_zero/test_gemini/test_search.py since we need a fixed answer there
* [hotfix] fix testcase in test_fx/test_tracer (hpcaitech#5779)
  - fix test_analyzer & test_auto_parallel
  - fix test_deepfm_model & test_dlrf_model
  - fix test_hf_albert & test_hf_gpt
* [gemini] optimize reduce scatter d2h copy (hpcaitech#5760)
  - fix missing reduce variable
  - [refactor] remove legacy async reduce scatter code
  - [gemini] missing sync
  - Revert "[refactor] remove legacy async reduce scatter code" (reverts commit 58ad76d)
  - further optimize with async all reduce
  - pass flag from manager to chunk
* Allow building cuda extension without a device (hpcaitech#5535): added FORCE_CUDA environment variable support, to enable building extensions where a GPU device is not present but cuda libraries are
* [misc] fix dist logger (hpcaitech#5782)
* [install] fix setup (hpcaitech#5786)
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [misc] update requirements (hpcaitech#5787)
* [shardformer] fix import (hpcaitech#5788)
* upgrade colossal-chat: support tp_group > 1, add sp for sft
  - upgrade ppo/dpo/rm scripts
  - update ci tests; sft ci test cases passed, tp failed in generation for ppo, sp is buggy
  - fix training script and ci
  - fix transformers and datasets versions
  - remove duplicated test and models that require huggingface auth from ci
  - remove local data path; remove baichuan from template test due to transformers version conflict
* Refactor modeling by adding attention backend
  - fix tests and naming
  - pass inference model shard configs for module init
  - remove flash attention backend
  - fix readme and test import
  Signed-off-by: char-1ee <[email protected]>
* replace the customized dataloader setup with the built-in one
* update sft training script
* [Inference] refactor baichuan (hpcaitech#5791)
  - remove unused code and add TODO for lazyinit
* [test] fix chatglm test kit (hpcaitech#5793)
* [shardformer] fix modeling of bloom and falcon (hpcaitech#5796)
* [test] fix qwen2 pytest distLarge (hpcaitech#5797)
* [Inference] fix flash-attn import and add model test (hpcaitech#5794)
  - fix torch int32 dtype
  - add generalized model test and remove exposed path to model
  - add default value for use_flash_attn
  Signed-off-by: char-1ee <[email protected]>
* [Gemini] use async stream to prefetch and h2d data moving (hpcaitech#5781)
  - remove redundant code
* [gemini] quick fix on possible async operation (hpcaitech#5803)
* [shardformer] upgrade transformers to 4.39.3 (hpcaitech#5815)
  - [shardformer] upgrade transformers for gpt2/gptj/whisper (hpcaitech#5807): fix modeling of gpt2, gptj and whisper; update requirements
  - [shardformer] upgrade transformers for mistral (hpcaitech#5808)
  - [shardformer] upgrade transformers for llama (hpcaitech#5809)
  - [inference] upgrade transformers (hpcaitech#5810)
  - [gemini] update transformers for gemini (hpcaitech#5814)
  Co-authored-by: ver217 <[email protected]>
* Support 4d parallel + flash attention (hpcaitech#5789)
  - support tp + sp + pp
  - remove comments
  Co-authored-by: Edenzzzz <[email protected]>

Signed-off-by: char-1ee <[email protected]>
Co-authored-by: Yuanheng Zhao <[email protected]>
Co-authored-by: Hongxin Liu <[email protected]>
Co-authored-by: flybird11111 <[email protected]>
Co-authored-by: duanjunwen <[email protected]>
Co-authored-by: yuehuayingxueluo <[email protected]>
Co-authored-by: Edenzzzz <[email protected]>
Co-authored-by: botbw <[email protected]>
Co-authored-by: Charles Coulombe <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: YeAnbang <[email protected]>
Co-authored-by: char-1ee <[email protected]>
Co-authored-by: Runyu Lu <[email protected]>
Co-authored-by: Guangyao Zhang <[email protected]>

* [zero] fix hook bug
* [zero] add low level optimizer back (hpcaitech#5839)
  - [zero] fix param & refactor
  - [zero] add back original low level opt
  - [zero] remove moe related
  - [zero] pass zero tests
  - [chore] add del func back
* [zero] comments and naming (hpcaitech#5840)
* [zero] modify api (hpcaitech#5843)
  - [test] remove _grad_store access in tests
* [test] fix (hpcaitech#5857)
* [CI] skip openmoe CI check
* [CI] fix pre-commit
* [zero] remove redundant member init (hpcaitech#5862)
* [misc] remove useless code, modify the pg mesh implementation
* [misc] use tempfile in test_moe_checkpoint.py
* resolve conflict with main branch
* [misc] remove useless code, add assertion about sequence parallel, move logger into function

Signed-off-by: char-1ee <[email protected]>
Co-authored-by: Frank Lee <[email protected]>
Co-authored-by: Edenzzzz <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: botbw <[email protected]>
Co-authored-by: Yuanheng Zhao <[email protected]>
Co-authored-by: Hongxin Liu <[email protected]>
Co-authored-by: flybird11111 <[email protected]>
Co-authored-by: duanjunwen <[email protected]>
Co-authored-by: yuehuayingxueluo <[email protected]>
Co-authored-by: Charles Coulombe <[email protected]>
Co-authored-by: YeAnbang <[email protected]>
Co-authored-by: char-1ee <[email protected]>
Co-authored-by: Runyu Lu <[email protected]>
Co-authored-by: Guangyao Zhang <[email protected]>
ver217 reviewed on Jul 1, 2024.
Co-authored-by: Edenzzzz <[email protected]>
* [pre-commit.ci] pre-commit autoupdate
  - github.com/PyCQA/autoflake: v2.2.1 → v2.3.1
  - github.com/pycqa/isort: 5.12.0 → 5.13.2
  - github.com/psf/black-pre-commit-mirror: 23.9.1 → 24.4.2
  - github.com/pre-commit/mirrors-clang-format: v13.0.1 → v18.1.7
  - github.com/pre-commit/pre-commit-hooks: v4.3.0 → v4.6.0
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Edenzzzz <[email protected]>
* [quant] fix bitsandbytes version check
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates:
  - github.com/pre-commit/mirrors-clang-format: v18.1.7 → v18.1.8

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [Feature] deepseek moe expert parallel implementation
* [misc] fix typo, remove redundant file (hpcaitech#5867)
* [Feature] deepseek support & unit test
* [misc] remove debug code & useless print
* [misc] fix typos (hpcaitech#5872)
* [Feature] remove modeling file, use auto config (hpcaitech#5884)
  - deepseek support via auto model, remove modeling file
  - [misc] delete useless file
* [Deepseek] remove redundant code (hpcaitech#5888)
* [Feature/deepseek] resolve comments (hpcaitech#5889)
  - [misc] mv module replacement into if branch
  - [misc] add some warning message and modify some code in unit test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Edenzzzz <[email protected]>
* … (hpcaitech#5838)
  - Diffusion Model Inference support
  - Stable Diffusion 3 support
  - PixArt-alpha support
* [Hot Fix] CI, import, requirements-test

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
📌 Checklist before creating the PR
- The title follows the standard format: [doc/gemini/tensor/...]: A concise description
- pre-commit is installed: pip install pre-commit && pre-commit install
🚨 Issue number
resolved #5853
📝 What does this PR do?
Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM
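For reference, below is a minimal sketch of how Ulysses-style sequence parallelism might be enabled for one of these models through the HybridParallelPlugin booster API. The flag names (sp_size, enable_sequence_parallelism, sequence_parallelism_mode="all_to_all") reflect the plugin interface around the time of this PR; the Qwen2 checkpoint name and hyperparameters are illustrative assumptions, not code taken from this PR.

```python
# Sketch: enabling Ulysses (all-to-all) sequence parallelism for Qwen2
# via ColossalAI's HybridParallelPlugin. Flag names follow the plugin
# API around the time of this PR; the checkpoint name is illustrative.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import AutoModelForCausalLM

colossalai.launch_from_torch()  # initialize distributed groups (run under torchrun)

plugin = HybridParallelPlugin(
    tp_size=1,                               # no tensor parallelism
    pp_size=1,                               # no pipeline parallelism
    sp_size=4,                               # Ulysses sequence-parallel group size
    enable_sequence_parallelism=True,
    sequence_parallelism_mode="all_to_all",  # "all_to_all" selects Ulysses SP
    enable_flash_attention=True,
)
booster = Booster(plugin=plugin)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B", torch_dtype=torch.bfloat16
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# boost() shards the model through the ShardFormer policies this PR extends,
# so attention for Command-R/Qwen2/ChatGLM can run with the sequence split
# across the sp group.
model, optimizer, *_ = booster.boost(model, optimizer)
```

Launched with torchrun and 4 processes, and with tp_size=1 and pp_size=1, the only sharding here is the Ulysses all-to-all, which exchanges sequence sharding for attention-head sharding inside attention, so each rank holds roughly seq_len / sp_size of the activations outside the attention blocks.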
💥 Checklist before requesting a review
⭐️ Do you enjoy contributing to Colossal-AI?
Tell us more if you don't enjoy contributing to Colossal-AI.