
[zerobubble] rebase main #6075

Merged

Commits on Jul 1, 2024

  1. fp8 operators for compressed communication

    cast_to_fp8, cast_from_fp8, all_reduce_fp8
    BurkeHulk authored Jul 1, 2024
    f5a52e1
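
This first commit introduces the core FP8 helpers (cast_to_fp8, cast_from_fp8, all_reduce_fp8) used for compressed communication. The snippet below is only a minimal sketch of the per-tensor scaling idea behind such casts; the function names, signatures, and the choice of torch.float8_e4m3fn are illustrative assumptions rather than the actual colossalai/quantization/fp8.py API.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest magnitude representable in torch.float8_e4m3fn

def cast_to_fp8_sketch(x: torch.Tensor):
    """Rescale so the largest value fits the fp8 range, then cast (illustrative)."""
    scale = FP8_E4M3_MAX / x.abs().max().clamp(min=1e-12).float()
    return (x.float() * scale).to(torch.float8_e4m3fn), scale

def cast_from_fp8_sketch(x_fp8: torch.Tensor, scale: torch.Tensor, dtype=torch.float32):
    """Undo the scaling and return to a regular dtype (illustrative)."""
    return (x_fp8.to(torch.float32) / scale).to(dtype)

x = torch.randn(4, 8)
x_fp8, scale = cast_to_fp8_sketch(x)
print((x - cast_from_fp8_sketch(x_fp8, scale, x.dtype)).abs().max())  # small quantization error
```

An all_reduce_fp8 built on such casts ships the scaled fp8 payload over the wire and dequantizes before summation, trading a small quantization error for roughly half (vs. fp16) or a quarter (vs. fp32) of the communication volume.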

Commits on Jul 4, 2024

  1. 6991819
  2. e17f835

Commits on Jul 10, 2024

  1. fix typo

    GuangyaoZhang committed Jul 10, 2024
    dbfa7d3

Commits on Jul 12, 2024

  1. 1e19594
  2. e881901
  3. 6601874
  4. Merge remote-tracking branch 'origin/feature/fp8_comm' into feature/fp8_comm
    
    # Conflicts:
    #	colossalai/quantization/fp8.py
    BurkeHulk committed Jul 12, 2024
    1f1b856
  5. 51f916b

Commits on Jul 16, 2024

  1. 9470701
  2. shardformer fp8

    GuangyaoZhang committed Jul 16, 2024
    457a0de

Commits on Jul 17, 2024

  1. fix rebase

    GuangyaoZhang committed Jul 17, 2024
    5a310b9
  2. remove all to all

    GuangyaoZhang committed Jul 17, 2024
    6a20f07

Commits on Jul 18, 2024

  1. Merge pull request hpcaitech#5899 from BurkeHulk/SP_fp8

    [Feature] FP8 communication in ShardFormer
    GuangyaoZhang authored Jul 18, 2024
    d0bdb51
  2. 5b969fd
  3. Merge pull request hpcaitech#5921 from BurkeHulk/fp8_fix

    [Shardformer] Fix Shardformer FP8 communication training accuracy degradation
    GuangyaoZhang authored Jul 18, 2024
    62661cd

Commits on Jul 24, 2024

  1. 5fd0592

Commits on Aug 2, 2024

  1. ae486ce
  2. [test] add zero fp8 test case

    ver217 committed Aug 2, 2024
    91e596d
  3. Merge pull request hpcaitech#5961 from ver217/feature/zeor-fp8

    [fp8] add fp8 comm for low level zero
    BurkeHulk authored Aug 2, 2024
    c297e21

Commits on Aug 5, 2024

  1. [Feature] llama shardformer fp8 support (hpcaitech#5938)

    * add llama shardformer fp8
    
    * Llama Shardformer Parity
    
    * fix typo
    
    * fix all reduce
    
    * fix pytest failure
    
    * fix reduce op and move function to fp8.py
    
    * fix typo
    GuangyaoZhang authored Aug 5, 2024
    53cb960

Commits on Aug 6, 2024

  1. [FP8] rebase main (hpcaitech#5963)

    * add SimPO
    
    * fix dataloader
    
    * remove debug code
    
    * add orpo
    
    * fix style
    
    * fix colossalai, transformers version
    
    * fix colossalai, transformers version
    
    * fix colossalai, transformers version
    
    * fix torch colossalai version
    
    * update transformers version
    
    * [shardformer] DeepseekMoE support (hpcaitech#5871)
    
    * [Feature] deepseek moe expert parallel implement
    
    * [misc] fix typo, remove redundant file (hpcaitech#5867)
    
    * [misc] fix typo
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [Feature] deepseek support & unit test
    
    * [misc] remove debug code & useless print
    
    * [misc] fix typos (hpcaitech#5872)
    
    * [Feature] remove modeling file, use auto config. (hpcaitech#5884)
    
    * [misc] fix typos
    
    * [Feature] deepseek support via auto model, remove modeling file
    
    * [misc] delete useless file
    
    * [misc] fix typos
    
    * [Deepseek] remove redundant code (hpcaitech#5888)
    
    * [misc] fix typos
    
    * [Feature] deepseek support via auto model, remove modeling file
    
    * [misc] delete useless file
    
    * [misc] fix typos
    
    * [misc] remove redundant code
    
    * [Feature/deepseek] resolve comment. (hpcaitech#5889)
    
    * [misc] fix typos
    
    * [Feature] deepseek support via auto model, remove modeling file
    
    * [misc] delete useless file
    
    * [misc] fix typos
    
    * [misc] remove redundant code
    
    * [misc] mv module replacement into if branch
    
    * [misc] add some warning message and modify some code in unit test
    
    * [misc] fix typos
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (hpcaitech#5838)
    
    * Diffusion Model Inference support
    
    * Stable Diffusion 3 Support
    
    * pixartalpha support
    
    * [HotFix] CI,import,requirements-test for hpcaitech#5838 (hpcaitech#5892)
    
    * [Hot Fix] CI,import,requirements-test
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [Feature] Enable PP + SP for llama (hpcaitech#5868)
    
    * fix cross-PP-stage position id length diff bug
    
    * fix typo
    
    * fix typo
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * use a one cross entropy func for all shardformer models
    
    ---------
    
    Co-authored-by: Edenzzzz <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (hpcaitech#5897)
    
    * add benchmark for sft, dpo, simpo, orpo. Add benchmarking result. Support lora with gradient checkpoint
    
    * fix style
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix eval
    
    * hotfix citation
    
    * [zero] support all-gather overlap (hpcaitech#5898)
    
    * [zero] support all-gather overlap
    
    * [zero] add overlap all-gather flag
    
    * [misc] fix typo
    
    * [zero] update api
    
    * fix orpo cross entropy loss
    
    * [Auto Parallel]: Speed up intra-op plan generation by 44% (hpcaitech#5446)
    
    * Remove unnecessary calls to deepcopy
    
    * Build DimSpec's difference dict only once
    
    This change considerably speeds up construction speed of DimSpec objects. The difference_dict is the same for each DimSpec object, so a single copy of it is enough.
    
    * Fix documentation of DimSpec's difference method
    
    * [ShardFormer] fix qwen2 sp (hpcaitech#5903)
    
    * [compatibility] support torch 2.2 (hpcaitech#5875)
    
    * Support Pytorch 2.2.2
    
    * keep build_on_pr file and update .compatibility
    
    * fix object_to_tensor usage when torch>=2.3.0 (hpcaitech#5820)
    
    * [misc] support torch2.3 (hpcaitech#5893)
    
    * [misc] support torch2.3
    
    * [devops] update compatibility ci
    
    * [devops] update compatibility ci
    
    * [devops] add debug
    
    * [devops] add debug
    
    * [devops] add debug
    
    * [devops] add debug
    
    * [devops] remove debug
    
    * [devops] remove debug
    
    * [release] update version (hpcaitech#5912)
    
    * [plugin] support all-gather overlap for hybrid parallel (hpcaitech#5919)
    
    * [plugin] fixed all-gather overlap support for hybrid parallel
    
    * add kto
    
    * fix style, add kto data sample
    
    * [Examples] Add lazy init to OPT and GPT examples (hpcaitech#5924)
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * [ColossalChat] Hotfix for ColossalChat (hpcaitech#5910)
    
    * add ignore and tiny llama
    
    * fix path issue
    
    * run style
    
    * fix issue
    
    * update bash
    
    * add ignore and tiny llama
    
    * fix path issue
    
    * run style
    
    * fix issue
    
    * update bash
    
    * fix ddp issue
    
    * add Qwen 1.5 32B
    
    * refactor tokenization
    
    * [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (hpcaitech#5931)
    
    * cannot access local variable 'default_conversation' where it is not associated with a value
    
    set default value for 'default_conversation'
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * fix test data
    
    * refactor evaluation
    
    * remove real data path
    
    * remove real data path
    
    * Add n_fused as an input from native_module (hpcaitech#5894)
    
    * [FIX BUG] convert env param to int in (hpcaitech#5934)
    
    * [Hotfix] Fix ZeRO typo hpcaitech#5936
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (hpcaitech#5941)
    
    * Add a switch to control whether the model checkpoint needs to be saved after each epoch ends
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * fix style
    
    * fix style
    
    * fix style
    
    * [shardformer] hotfix attn mask (hpcaitech#5945)
    
    * [shardformer] hotfix attn mask (hpcaitech#5947)
    
    * [Feat] Distrifusion Acceleration Support for Diffusion Inference (hpcaitech#5895)
    
    * Distrifusion Support source
    
    * comp comm overlap optimization
    
    * sd3 benchmark
    
    * pixart distrifusion bug fix
    
    * sd3 bug fix and benchmark
    
    * generation bug fix
    
    * naming fix
    
    * add docstring, fix counter and shape error
    
    * add reference
    
    * readme and requirement
    
    * [zero] hotfix update master params (hpcaitech#5951)
    
    * [release] update version (hpcaitech#5952)
    
    * [Chat] Fix lora (hpcaitech#5946)
    
    * fix merging
    
    * remove filepath
    
    * fix style
    
    * Update README.md (hpcaitech#5958)
    
    * [hotfix] Remove unused plan section (hpcaitech#5957)
    
    * remove readme
    
    * fix readme
    
    * update
    
    * [test] add mixtral for sequence classification
    
    * [test] add mixtral transformer test
    
    * [moe] fix plugin
    
    * [test] mixtral pp shard test
    
    * [chore] handle non member group
    
    * [zero] solve hang
    
    * [test] pass mixtral shardformer test
    
    * [moe] implement transit between non moe tp and ep
    
    * [zero] solve hang
    
    * [misc] solve booster hang by rename the variable
    
    * solve hang when parallel mode = pp + dp
    
    * [moe] implement submesh initialization
    
    * [moe] add mixtral dp grad scaling when not all experts are activated
    
    * [chore] manually revert unintended commit
    
    * [chore] trivial fix
    
    * [chore] arg pass & remove drop token
    
    * [test] add mixtral modelling test
    
    * [moe] implement tp
    
    * [moe] test deepseek
    
    * [moe] clean legacy code
    
    * [Feature] MoE Ulysses Support (hpcaitech#5918)
    
    * moe sp support
    
    * moe sp bug solve
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [chore] minor fix
    
    * [moe] init moe plugin comm setting with sp
    
    * moe sp + ep bug fix
    
    * [moe] finalize test (no pp)
    
    * [moe] full test for deepseek and mixtral (pp + sp to fix)
    
    * [chore] minor fix after rebase
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * [chore] solve moe ckpt test failure and some other arg pass failure
    
    * [moe] remove ops
    
    * [test] fix test: test_zero1_2
    
    * [bug] fix: somehow logger hangs the program
    
    * [moe] deepseek moe sp support
    
    * [test] add check
    
    * [deepseek] replace attn (a workaround for bug in transformers)
    
    * [misc] skip redundant test
    
    * [misc] remove debug/print code
    
    * [moe] refactor mesh assignment
    
    * Revert "[moe] implement submesh initialization"
    
    This reverts commit 2f9bce6.
    
    * [chore] change moe_pg_mesh to private
    
    * [misc] remove incompatible test config
    
    * [misc] fix ci failure: change default value to false in moe plugin
    
    * [misc] remove useless condition
    
    * [chore] docstring
    
    * [moe] remove force_overlap_comm flag and add warning instead
    
    * [doc] add MoeHybridParallelPlugin docstring
    
    * [moe] solve dp axis issue
    
    * [chore] remove redundant test case, print string & reduce test tokens
    
    * [feat] Dist Loader for Eval (hpcaitech#5950)
    
    * support auto distributed data loader
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * support auto distributed data loader
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix tp error
    
    * remove unused parameters
    
    * remove unused
    
    * update inference
    
    * update docs
    
    * update inference
    
    ---------
    
    Co-authored-by: Michelle <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [lora] lora support hybrid parallel plugin (hpcaitech#5956)
    
    * lora support hybrid plugin
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fp8 operators for compressed communication
    
    cast_to_fp8, cast_from_fp8, all_reduce_fp8
    
    * fix scaling algorithm in FP8 casting
    
    * support fp8 communication in pipeline parallelism
    
    * add fp8_communication flag in the script
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix typo
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * shardformer fp8
    
    * fix rebase
    
    * remove all to all
    
    * fix shardformer fp8 communication training degradation
    
    * [fp8] support all-gather flat tensor (hpcaitech#5932)
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * Update low_level_optim.py
    
    ---------
    
    Co-authored-by: YeAnbang <[email protected]>
    Co-authored-by: Haze188 <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Edenzzzz <[email protected]>
    Co-authored-by: Edenzzzz <[email protected]>
    Co-authored-by: Runyu Lu <[email protected]>
    Co-authored-by: Guangyao Zhang <[email protected]>
    Co-authored-by: YeAnbang <[email protected]>
    Co-authored-by: Hongxin Liu <[email protected]>
    Co-authored-by: Stephan Kö <[email protected]>
    Co-authored-by: アマデウス <[email protected]>
    Co-authored-by: Tong Li <[email protected]>
    Co-authored-by: zhurunhua <[email protected]>
    Co-authored-by: Insu Jang <[email protected]>
    Co-authored-by: Gao, Ruiyuan <[email protected]>
    Co-authored-by: hxwang <[email protected]>
    Co-authored-by: Michelle <[email protected]>
    Co-authored-by: Wang Binluo <[email protected]>
    Co-authored-by: HangXu <[email protected]>
    20 people authored Aug 6, 2024
    0c10afd
  2. [fp8] support all2all fp8 (hpcaitech#5953)

    * support all2all fp8
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    flybird11111 and pre-commit-ci[bot] authored Aug 6, 2024
    afb26de

Commits on Aug 7, 2024

  1. [fp8] add fp8 linear (hpcaitech#5967)

    * [fp8] add fp8 linear
    
    * [test] fix fp8 linear test condition
    
    * [test] fix fp8 linear test condition
    
    * [test] fix fp8 linear test condition
    ver217 authored Aug 7, 2024
    76ea164
  2. [fp8] support fp8 amp for hybrid parallel plugin (hpcaitech#5975)

    * [fp8] support fp8 amp for hybrid parallel plugin
    
    * [test] add fp8 hook test
    
    * [fp8] fix fp8 linear compatibility
    ver217 authored Aug 7, 2024
    ccabcf6
  3. fix (hpcaitech#5976)

    flybird11111 authored Aug 7, 2024
    7739629
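
Regarding the "[fp8] add fp8 linear (hpcaitech#5967)" commit above: a real fp8 linear dispatches to an fp8 GEMM kernel, but the rough sketch below (hypothetical, not ColossalAI's implementation) shows the surrounding data flow, i.e. per-tensor quantization of activations and weights followed by rescaling of the product.

```python
import torch
import torch.nn.functional as F

FP8_MAX = 448.0  # float8_e4m3fn max magnitude

def _quantize(t: torch.Tensor):
    scale = FP8_MAX / t.abs().max().clamp(min=1e-12)
    return (t * scale).to(torch.float8_e4m3fn), scale

def fp8_linear_sketch(x, weight, bias=None):
    """Illustrative fp8 linear: quantize inputs and weights, multiply, rescale."""
    x_fp8, sx = _quantize(x)
    w_fp8, sw = _quantize(weight)
    # A production kernel would multiply the fp8 operands directly (e.g. via a
    # scaled GEMM); here we dequantize and use a plain matmul to stay portable.
    out = F.linear(x_fp8.to(torch.float32), w_fp8.to(torch.float32)) / (sx * sw)
    if bias is not None:
        out = out + bias
    return out.to(x.dtype)

print(fp8_linear_sketch(torch.randn(2, 16), torch.randn(32, 16)).shape)  # torch.Size([2, 32])
```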

Commits on Aug 8, 2024

  1. [Feature]: support FP8 communication in DDP, FSDP, Gemini (hpcaitech#5928)
    
    * support fp8_communication in the Torch DDP grad comm, FSDP grad comm, and FSDP params comm
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * implement communication hook for FSDP params all-gather
    
    * added unit test for fp8 operators
    
    * support fp8 communication in GeminiPlugin
    
    * update training scripts to support fsdp and fp8 communication
    
    * fixed some minor bugs observed in unit test
    
    * add all_gather_into_tensor_flat_fp8
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add skip the test if torch < 2.2.0
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add skip the test if torch < 2.2.0
    
    * add skip the test if torch < 2.2.0
    
    * add fp8_comm flag
    
    * rebase latest fp8 operators
    
    * rebase latest fp8 operators
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    BurkeHulk and pre-commit-ci[bot] authored Aug 8, 2024
    b480eec
  2. [test ci] Feature/fp8 comm (hpcaitech#5981)

    * fix
    
    * fix
    
    * fix
    flybird11111 authored Aug 8, 2024
    4b9bec8
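
The "[Feature]: support FP8 communication in DDP, FSDP, Gemini (hpcaitech#5928)" commit above hooks gradient communication so buckets travel as fp8. Below is a minimal sketch of that idea as a DDP communication hook, assuming per-bucket scales and a blocking all_gather for clarity; it is not the ColossalAI code, which overlaps the collectives asynchronously.

```python
import torch
import torch.distributed as dist

E4M3_MAX = 448.0  # max magnitude of torch.float8_e4m3fn

def fp8_allreduce_hook(state, bucket: dist.GradBucket) -> torch.futures.Future[torch.Tensor]:
    """Send each rank's gradient bucket as fp8 bytes, reduce in fp32 locally."""
    group = state if state is not None else dist.group.WORLD
    world_size = dist.get_world_size(group)
    grad = bucket.buffer()

    # Per-bucket, per-rank scale so the largest entry maps near the fp8 max.
    scale = (E4M3_MAX / grad.float().abs().max().clamp(min=1e-12)).reshape(1)
    grad_fp8 = (grad.float() * scale).to(torch.float8_e4m3fn)

    # NCCL cannot sum fp8 directly, so ship raw bytes and reduce after the fact.
    shards = [torch.empty_like(grad_fp8.view(torch.uint8)) for _ in range(world_size)]
    scales = [torch.empty_like(scale) for _ in range(world_size)]
    dist.all_gather(shards, grad_fp8.view(torch.uint8), group=group)  # blocking, for clarity
    dist.all_gather(scales, scale, group=group)

    total = torch.zeros_like(grad, dtype=torch.float32)
    for s, sc in zip(shards, scales):
        total += s.view(torch.float8_e4m3fn).to(torch.float32) / sc
    fut = torch.futures.Future()
    fut.set_result((total / world_size).to(grad.dtype))
    return fut

# ddp_model.register_comm_hook(state=None, hook=fp8_allreduce_hook)
```

Registered this way, each bucket crosses the wire at one byte per element and is averaged back in fp32 on every rank.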

Commits on Aug 9, 2024

  1. [fp8] support gemini plugin (hpcaitech#5978)

    * [fp8] refactor hook
    
    * [fp8] support gemini plugin
    
    * [example] add fp8 option for llama benchmark
    ver217 authored Aug 9, 2024
    8241c0c
  2. [fp8] use torch compile (torch >= 2.3.0) (hpcaitech#5979)

    * [fp8] use torch compile (torch >= 2.4.0)
    
    * [fp8] set use_fast_accum in linear
    
    * [chore] formal version check
    
    * [chore] fix sig
    botbw authored Aug 9, 2024
    e4aadee
  3. [fp8] MoE support fp8 communication (hpcaitech#5977)

    * fix
    
    * support moe fp8
    
    * fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    fix
    
    fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    flybird11111 and pre-commit-ci[bot] authored Aug 9, 2024
    f1a3a32
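
For the "[fp8] use torch compile (torch >= 2.3.0) (hpcaitech#5979)" commit above, the pattern is to fuse the scale-and-cast arithmetic with torch.compile only when the installed PyTorch is new enough. A hedged sketch of that gate follows; the helper name and the use of the packaging library for the version check are assumptions, not the repository's code.

```python
import torch
from packaging import version

def _cast_to_fp8(x: torch.Tensor) -> torch.Tensor:
    scale = 448.0 / x.abs().max().clamp(min=1e-12)
    return (x * scale).to(torch.float8_e4m3fn)

# Fuse the elementwise scale + cast on supported versions, else stay eager.
if version.parse(torch.__version__.split("+")[0]) >= version.parse("2.3.0"):
    cast_to_fp8 = torch.compile(_cast_to_fp8)
else:
    cast_to_fp8 = _cast_to_fp8
```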

Commits on Aug 12, 2024

  1. [fp8] support hybrid parallel plugin (hpcaitech#5982)

    * support fp8 comm for qwen2 model
    
    * support fp8 comm for qwen2 model
    
    * support fp8 comm for qwen2 model
    
    * fp8
    
    * fix
    
    * bert and bloom
    
    * chatglm and command
    
    * gpt2,gptj,bert, falcon,blip2
    
    * mistral,opy,sam,t5,vit,whisper
    
    * fix
    
    * fix
    
    * fix
    wangbluo authored Aug 12, 2024
    b2483c8

Commits on Aug 13, 2024

  1. [fp8] refactor fp8 linear with compile (hpcaitech#5993)

    * [fp8] refactor fp8 linear with compile
    
    * [fp8] fix linear test
    
    * [fp8] fix linear test
    ver217 authored Aug 13, 2024
    0978080

Commits on Aug 14, 2024

  1. [fp8] support asynchronous FP8 communication (hpcaitech#5997)

    * fix
    
    * fix
    
    * fix
    
    * support async all2all
    
    * support async op for all gather
    
    * fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    flybird11111 and pre-commit-ci[bot] authored Aug 14, 2024
    597b206
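
The asynchronous FP8 communication added above lets the compressed collective overlap with computation. Below is a simplified sketch of the pattern using all_gather; the function name and the up-front scale exchange are illustrative assumptions, not the PR's API.

```python
import torch
import torch.distributed as dist

def all_gather_fp8_async(t: torch.Tensor, group=None):
    """Kick off an all_gather of fp8-compressed data; dequantize in finish()."""
    world = dist.get_world_size(group)
    scale = (448.0 / t.abs().max().clamp(min=1e-12)).reshape(1)
    payload = (t * scale).to(torch.float8_e4m3fn).view(torch.uint8)

    # The per-rank scales are tiny, so exchange them up front (blocking).
    scales = [torch.empty_like(scale) for _ in range(world)]
    dist.all_gather(scales, scale, group=group)

    # The bulk payload is gathered asynchronously and can overlap with compute.
    outs = [torch.empty_like(payload) for _ in range(world)]
    handle = dist.all_gather(outs, payload, group=group, async_op=True)

    def finish():
        handle.wait()
        return [o.view(torch.float8_e4m3fn).to(torch.float32) / s
                for o, s in zip(outs, scales)]

    return finish

# fin = all_gather_fp8_async(tensor)   # ...do other work...
# gathered = fin()
```

The caller starts the gather, runs other work, and only calls finish() when the gathered values are actually needed.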

Commits on Aug 15, 2024

  1. 88fa096
  2. [fp8] linear perf enhancement

    botbw committed Aug 15, 2024
    1a2e90d
  3. [fp8] update reduce-scatter test (hpcaitech#6002)

    * fix
    
    * fix
    
    * fix
    
    * fix
    flybird11111 authored Aug 15, 2024
    20722a8

Commits on Aug 16, 2024

  1. 3f09a61
  2. [fp8] zero support fp8 linear. (hpcaitech#6006)

    * fix
    
    * fix
    
    * fix
    
    * zero fp8
    
    * zero fp8
    
    * Update requirements.txt
    flybird11111 authored Aug 16, 2024
    0a51319

Commits on Aug 17, 2024

  1. merge

    wangbluo committed Aug 17, 2024
    4cf79fa
  2. 81272e9

Commits on Aug 19, 2024

  1. fix the merge

    wangbluo committed Aug 19, 2024
    02636c5
  2. 52289e4
  3. fix the merge

    wangbluo committed Aug 19, 2024
    1a5847e
  4. fix the merge

    wangbluo committed Aug 19, 2024
    3353042
  5. 64aad96
  6. fix the merge

    wangbluo committed Aug 19, 2024
    4c82bfc
  7. 0d8e82a
  8. fix

    wangbluo committed Aug 19, 2024
    12b4401
  9. fix

    wangbluo committed Aug 19, 2024
    2eb3683
  10. fix the merge

    wangbluo committed Aug 19, 2024
    88b3f06
  11. fix

    wangbluo committed Aug 19, 2024
    1f703e0

Commits on Aug 20, 2024

  1. fix

    wangbluo committed Aug 20, 2024
    5382311
  2. fix

    wangbluo committed Aug 20, 2024
    f7acfa1
  3. fix

    wangbluo committed Aug 20, 2024
    2ee6235
  4. fix

    wangbluo committed Aug 20, 2024
    2e4cbe3
  5. fix merge

    wangbluo committed Aug 20, 2024
    2d362ac

Commits on Aug 21, 2024

  1. fix the merge

    wangbluo committed Aug 21, 2024
    eb5ba40
  2. fix

    wangbluo committed Aug 21, 2024
    193030f
  3. fix

    wangbluo committed Aug 21, 2024
    6aface9
  4. fix

    wangbluo committed Aug 21, 2024
    698c8b9
  5. fix

    wangbluo committed Aug 21, 2024
    8b8e282

Commits on Aug 22, 2024

  1. [fp8] Merge feature/fp8_comm to main branch of Colossalai (hpcaitech#6016)
    
    * add SimPO
    
    * fix dataloader
    
    * remove debug code
    
    * add orpo
    
    * fix style
    
    * fix colossalai, transformers version
    
    * fix colossalai, transformers version
    
    * fix colossalai, transformers version
    
    * fix torch colossalai version
    
    * update transformers version
    
    * [shardformer] DeepseekMoE support (hpcaitech#5871)
    
    * [Feature] deepseek moe expert parallel implement
    
    * [misc] fix typo, remove redundant file (hpcaitech#5867)
    
    * [misc] fix typo
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [Feature] deepseek support & unit test
    
    * [misc] remove debug code & useless print
    
    * [misc] fix typos (hpcaitech#5872)
    
    * [Feature] remove modeling file, use auto config. (hpcaitech#5884)
    
    * [misc] fix typos
    
    * [Feature] deepseek support via auto model, remove modeling file
    
    * [misc] delete useless file
    
    * [misc] fix typos
    
    * [Deepseek] remove redundant code (hpcaitech#5888)
    
    * [misc] fix typos
    
    * [Feature] deepseek support via auto model, remove modeling file
    
    * [misc] delete useless file
    
    * [misc] fix typos
    
    * [misc] remove redundant code
    
    * [Feature/deepseek] resolve comment. (hpcaitech#5889)
    
    * [misc] fix typos
    
    * [Feature] deepseek support via auto model, remove modeling file
    
    * [misc] delete useless file
    
    * [misc] fix typos
    
    * [misc] remove redundant code
    
    * [misc] mv module replacement into if branch
    
    * [misc] add some warning message and modify some code in unit test
    
    * [misc] fix typos
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (hpcaitech#5838)
    
    * Diffusion Model Inference support
    
    * Stable Diffusion 3 Support
    
    * pixartalpha support
    
    * [HotFix] CI,import,requirements-test for hpcaitech#5838 (hpcaitech#5892)
    
    * [Hot Fix] CI,import,requirements-test
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [Feature] Enable PP + SP for llama (hpcaitech#5868)
    
    * fix cross-PP-stage position id length diff bug
    
    * fix typo
    
    * fix typo
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * use a one cross entropy func for all shardformer models
    
    ---------
    
    Co-authored-by: Edenzzzz <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (hpcaitech#5897)
    
    * add benchmark for sft, dpo, simpo, orpo. Add benchmarking result. Support lora with gradient checkpoint
    
    * fix style
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix eval
    
    * hotfix citation
    
    * [zero] support all-gather overlap (hpcaitech#5898)
    
    * [zero] support all-gather overlap
    
    * [zero] add overlap all-gather flag
    
    * [misc] fix typo
    
    * [zero] update api
    
    * fix orpo cross entropy loss
    
    * [Auto Parallel]: Speed up intra-op plan generation by 44% (hpcaitech#5446)
    
    * Remove unnecessary calls to deepcopy
    
    * Build DimSpec's difference dict only once
    
    This change considerably speeds up construction speed of DimSpec objects. The difference_dict is the same for each DimSpec object, so a single copy of it is enough.
    
    * Fix documentation of DimSpec's difference method
    
    * [ShardFormer] fix qwen2 sp (hpcaitech#5903)
    
    * [compatibility] support torch 2.2 (hpcaitech#5875)
    
    * Support Pytorch 2.2.2
    
    * keep build_on_pr file and update .compatibility
    
    * fix object_to_tensor usage when torch>=2.3.0 (hpcaitech#5820)
    
    * [misc] support torch2.3 (hpcaitech#5893)
    
    * [misc] support torch2.3
    
    * [devops] update compatibility ci
    
    * [devops] update compatibility ci
    
    * [devops] add debug
    
    * [devops] add debug
    
    * [devops] add debug
    
    * [devops] add debug
    
    * [devops] remove debug
    
    * [devops] remove debug
    
    * [release] update version (hpcaitech#5912)
    
    * [plugin] support all-gather overlap for hybrid parallel (hpcaitech#5919)
    
    * [plugin] fixed all-gather overlap support for hybrid parallel
    
    * add kto
    
    * fix style, add kto data sample
    
    * [Examples] Add lazy init to OPT and GPT examples (hpcaitech#5924)
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * [ColossalChat] Hotfix for ColossalChat (hpcaitech#5910)
    
    * add ignore and tiny llama
    
    * fix path issue
    
    * run style
    
    * fix issue
    
    * update bash
    
    * add ignore and tiny llama
    
    * fix path issue
    
    * run style
    
    * fix issue
    
    * update bash
    
    * fix ddp issue
    
    * add Qwen 1.5 32B
    
    * refactor tokenization
    
    * [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (hpcaitech#5931)
    
    * cannot access local variable 'default_conversation' where it is not associated with a value
    
    set default value for 'default_conversation'
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * fix test data
    
    * refactor evaluation
    
    * remove real data path
    
    * remove real data path
    
    * Add n_fused as an input from native_module (hpcaitech#5894)
    
    * [FIX BUG] convert env param to int in (hpcaitech#5934)
    
    * [Hotfix] Fix ZeRO typo hpcaitech#5936
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (hpcaitech#5941)
    
    * Add a switch to control whether the model checkpoint needs to be saved after each epoch ends
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * fix style
    
    * fix style
    
    * fix style
    
    * [shardformer] hotfix attn mask (hpcaitech#5945)
    
    * [shardformer] hotfix attn mask (hpcaitech#5947)
    
    * [Feat] Distrifusion Acceleration Support for Diffusion Inference (hpcaitech#5895)
    
    * Distrifusion Support source
    
    * comp comm overlap optimization
    
    * sd3 benchmark
    
    * pixart distrifusion bug fix
    
    * sd3 bug fix and benchmark
    
    * generation bug fix
    
    * naming fix
    
    * add docstring, fix counter and shape error
    
    * add reference
    
    * readme and requirement
    
    * [zero] hotfix update master params (hpcaitech#5951)
    
    * [release] update version (hpcaitech#5952)
    
    * [Chat] Fix lora (hpcaitech#5946)
    
    * fix merging
    
    * remove filepath
    
    * fix style
    
    * Update README.md (hpcaitech#5958)
    
    * [hotfix] Remove unused plan section (hpcaitech#5957)
    
    * remove readme
    
    * fix readme
    
    * update
    
    * [test] add mixtral for sequence classification
    
    * [test] add mixtral transformer test
    
    * [moe] fix plugin
    
    * [test] mixtral pp shard test
    
    * [chore] handle non member group
    
    * [zero] solve hang
    
    * [test] pass mixtral shardformer test
    
    * [moe] implement transit between non moe tp and ep
    
    * [zero] solve hang
    
    * [misc] solve booster hang by rename the variable
    
    * solve hang when parallel mode = pp + dp
    
    * [moe] implement submesh initialization
    
    * [moe] add mixtral dp grad scaling when not all experts are activated
    
    * [chore] manually revert unintended commit
    
    * [chore] trivial fix
    
    * [chore] arg pass & remove drop token
    
    * [test] add mixtral modelling test
    
    * [moe] implement tp
    
    * [moe] test deepseek
    
    * [moe] clean legacy code
    
    * [Feature] MoE Ulysses Support (hpcaitech#5918)
    
    * moe sp support
    
    * moe sp bug solve
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [chore] minor fix
    
    * [moe] init moe plugin comm setting with sp
    
    * moe sp + ep bug fix
    
    * [moe] finalize test (no pp)
    
    * [moe] full test for deepseek and mixtral (pp + sp to fix)
    
    * [chore] minor fix after rebase
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * [chore] solve moe ckpt test failure and some other arg pass failure
    
    * [moe] remove ops
    
    * [test] fix test: test_zero1_2
    
    * [bug] fix: somehow logger hangs the program
    
    * [moe] deepseek moe sp support
    
    * [test] add check
    
    * [deepseek] replace attn (a workaround for bug in transformers)
    
    * [misc] skip redundant test
    
    * [misc] remove debug/print code
    
    * [moe] refactor mesh assignment
    
    * Revert "[moe] implement submesh initialization"
    
    This reverts commit 2f9bce6.
    
    * [chore] change moe_pg_mesh to private
    
    * [misc] remove incompatible test config
    
    * [misc] fix ci failure: change default value to false in moe plugin
    
    * [misc] remove useless condition
    
    * [chore] docstring
    
    * [moe] remove force_overlap_comm flag and add warning instead
    
    * [doc] add MoeHybridParallelPlugin docstring
    
    * [moe] solve dp axis issue
    
    * [chore] remove redundant test case, print string & reduce test tokens
    
    * [feat] Dist Loader for Eval (hpcaitech#5950)
    
    * support auto distributed data loader
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * support auto distributed data loader
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix tp error
    
    * remove unused parameters
    
    * remove unused
    
    * update inference
    
    * update docs
    
    * update inference
    
    ---------
    
    Co-authored-by: Michelle <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [lora] lora support hybrid parallel plugin (hpcaitech#5956)
    
    * lora support hybrid plugin
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * Support overall loss, update KTO logging
    
    * [Docs] clarify launch port
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * [Hotfix] README link (hpcaitech#5966)
    
    * update ignore
    
    * update readme
    
    * run style
    
    * update readme
    
    * [Hotfix] Avoid fused RMSnorm import error without apex (hpcaitech#5985)
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * [Chat] fix readme (hpcaitech#5989)
    
    * fix readme
    
    * fix readme, tokenization fully tested
    
    * fix readme, tokenization fully tested
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: root <root@notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9-0.notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9.colossal-ai.svc.cluster.local>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * fix sync condition (hpcaitech#6000)
    
    * [plugin] add cast inputs option for zero (hpcaitech#6003)
    
    * [pre-commit.ci] pre-commit autoupdate (hpcaitech#5995)
    
    updates:
    - [github.com/psf/black-pre-commit-mirror: 24.4.2 → 24.8.0](psf/black-pre-commit-mirror@24.4.2...24.8.0)
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [misc] Bypass the huggingface bug to solve the mask mismatch problem (hpcaitech#5991)
    
    * [Feature] Zigzag Ring attention (hpcaitech#5905)
    
    * halfway
    
    * fix cross-PP-stage position id length diff bug
    
    * fix typo
    
    * fix typo
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * unified cross entropy func for all shardformer models
    
    * remove redundant lines
    
    * add basic ring attn; debug cross entropy
    
    * fwd bwd logic complete
    
    * fwd bwd logic complete; add experimental triton rescale
    
    * precision tests passed
    
    * precision tests passed
    
    * fix typos and remove misc files
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add sp_mode to benchmark; fix varlen interface
    
    * update softmax_lse shape by new interface
    
    * change tester name
    
    * remove buffer clone; support packed seq layout
    
    * add varlen tests
    
    * fix typo
    
    * all tests passed
    
    * add dkv_group; fix mask
    
    * remove debug statements
    
    ---------
    
    Co-authored-by: Edenzzzz <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    
    * [misc] update compatibility (hpcaitech#6008)
    
    * [misc] update compatibility
    
    * [misc] update requirements
    
    * [devops] disable requirements cache
    
    * [test] fix torch ddp test
    
    * [test] fix rerun on address in use
    
    * [test] fix lazy init
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix the merge
    
    * fix the merge
    
    * overlap kv comm with output rescale (hpcaitech#6017)
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * fix the merge
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix the merge
    
    * fix
    
    * fix
    
    * fix the merge
    
    * fix
    
    * [misc] Use dist logger in plugins (hpcaitech#6011)
    
    * use dist logger in plugins
    
    * remove trash
    
    * print on rank 0
    
    ---------
    
    Co-authored-by: Edenzzzz <[email protected]>
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix the merge
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    ---------
    
    Co-authored-by: YeAnbang <[email protected]>
    Co-authored-by: Haze188 <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Edenzzzz <[email protected]>
    Co-authored-by: Edenzzzz <[email protected]>
    Co-authored-by: Runyu Lu <[email protected]>
    Co-authored-by: Guangyao Zhang <[email protected]>
    Co-authored-by: YeAnbang <[email protected]>
    Co-authored-by: Hongxin Liu <[email protected]>
    Co-authored-by: Stephan Kö <[email protected]>
    Co-authored-by: アマデウス <[email protected]>
    Co-authored-by: Tong Li <[email protected]>
    Co-authored-by: zhurunhua <[email protected]>
    Co-authored-by: Insu Jang <[email protected]>
    Co-authored-by: Gao, Ruiyuan <[email protected]>
    Co-authored-by: hxwang <[email protected]>
    Co-authored-by: Michelle <[email protected]>
    Co-authored-by: root <root@notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9-0.notebook-8f919155-6035-47b4-9c6f-1be133b9e2c9.colossal-ai.svc.cluster.local>
    19 people authored Aug 22, 2024
    eea37da
  2. d77e66a
  3. fix

    wangbluo committed Aug 22, 2024
    971b16a
  4. a292554
  5. afe845f
  6. caab4a3

Commits on Aug 23, 2024

  1. Update train_dpo.py

    flybird11111 authored Aug 23, 2024
    0bc9a87
  2. 3b0df30
  3. 9e76764
  4. 0bf46c5

Commits on Aug 26, 2024

  1. fix

    wangbluo committed Aug 26, 2024
    dae3999
  2. 80d24ae
  3. Merge pull request hpcaitech#6033 from wangbluo/fix

    [fp8] fix the merge
    wangbluo authored Aug 26, 2024
    4a6f31e

Commits on Aug 27, 2024

  1. Merge pull request hpcaitech#6012 from hpcaitech/feature/fp8_comm

    [fp8] support fp8 communication and fp8 training for Colossalai
    ver217 authored Aug 27, 2024
    17904cb
  2. [CI] Remove triton version for compatibility bug; update req torch >=2.2 (hpcaitech#6018)
    
    * remove triton version
    
    * remove torch 2.2
    
    * remove torch 2.1
    
    * debug
    
    * remove 2.1 build tests
    
    * require torch >=2.2
    
    ---------
    
    Co-authored-by: Edenzzzz <[email protected]>
    Edenzzzz and Edenzzzz authored Aug 27, 2024
    d383449

Commits on Aug 28, 2024

  1. [plugin] hotfix zero plugin (hpcaitech#6036)

    * [plugin] hotfix zero plugin
    
    * [plugin] hotfix zero plugin
    ver217 authored Aug 28, 2024
    cc1b0ef
  2. [Colossal-LLaMA] Refactor latest APIs (hpcaitech#6030)

    * refactor latest code
    
    * update api
    
    * add dummy dataset
    
    * update Readme
    
    * add setup
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * update files
    
    * add PP support
    
    * update arguments
    
    * update argument
    
    * reorg folder
    
    * update version
    
    * remove IB infor
    
    * update utils
    
    * update readme
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * update save for zero
    
    * update save
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add apex
    
    * update
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    TongLi3701 and pre-commit-ci[bot] authored Aug 28, 2024
    4a68efb
  3. 0d3a85d

Commits on Aug 29, 2024

  1. e96a076

Commits on Sep 2, 2024

  1. [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (hpcaitech#6020)
    
    * fix bug in load_state_dict_into_model; format error msg
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Update utils.py
    
    to support checking missing_keys
    
    * Update general_checkpoint_io.py
    
    fix bug in missing_keys error message
    
    * retrigger tests
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    flymin and pre-commit-ci[bot] authored Sep 2, 2024
    e9032fb
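
The checkpoint_io fix above is about reporting missing_keys properly when loading a partial state dict. As a generic illustration of that kind of check (not the repository's checkpoint_io code):

```python
import torch

def load_and_report(model: torch.nn.Module, state_dict: dict):
    """Load non-strictly and surface missing / unexpected keys in the message."""
    result = model.load_state_dict(state_dict, strict=False)
    if result.missing_keys:
        print(f"{len(result.missing_keys)} missing keys, e.g. {result.missing_keys[:3]}")
    if result.unexpected_keys:
        print(f"{len(result.unexpected_keys)} unexpected keys, e.g. {result.unexpected_keys[:3]}")
    return result
```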

Commits on Sep 3, 2024

  1. [Hotfix] Remove deprecated install (hpcaitech#6042)

    * remove deprecated install
    
    * remove unused folder
    TongLi3701 authored Sep 3, 2024
    c650a90
  2. [fp8] optimize all-gather (hpcaitech#6043)

    * [fp8] optimize all-gather
    
    * [fp8] fix all gather fp8 ring
    
    * [fp8] enable compile
    
    * [fp8] fix all gather fp8 ring
    ver217 authored Sep 3, 2024
    c3b5caf
  3. 26e5539
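
The "[fp8] optimize all-gather (hpcaitech#6043)" commit in this group moves the gather onto a single flat destination buffer instead of a Python list of per-rank tensors. The sketch below shows only that difference; the fp8 byte payload (and the per-rank scales needed to dequantize it) are assumed to be produced as in the earlier sketches, and the helper name is hypothetical.

```python
import torch
import torch.distributed as dist

def gather_flat_fp8_bytes(payload_u8: torch.Tensor, group=None) -> torch.Tensor:
    """All-gather equally sized fp8 byte shards into one preallocated flat buffer."""
    world = dist.get_world_size(group)
    out = torch.empty(world * payload_u8.numel(), dtype=torch.uint8, device=payload_u8.device)
    # One fused collective into a single tensor, instead of all_gather into a
    # list of tensors followed by a concatenation.
    dist.all_gather_into_tensor(out, payload_u8, group=group)
    return out.view(torch.float8_e4m3fn)
```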

Commits on Sep 9, 2024

  1. [fp8] disable all_to_all_fp8 in intranode (hpcaitech#6045)

    * enhance all_to_all_fp8 with internode comm control
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * disable some fp8 ops due to performance issue
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    BurkeHulk and pre-commit-ci[bot] authored Sep 9, 2024
    5ce6dd7
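
The commit above disables all_to_all_fp8 inside a single node, where NVLink bandwidth makes the cast overhead a net loss, and keeps it for internode traffic. A hypothetical gate for that decision is sketched below; the LOCAL_WORLD_SIZE heuristic is an assumption (torchrun sets that variable), not the PR's actual control flag.

```python
import os
import torch.distributed as dist

def fp8_all_to_all_enabled() -> bool:
    """Use fp8-compressed all_to_all only when the job spans more than one node."""
    local_world = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))  # set by torchrun
    return dist.is_initialized() and dist.get_world_size() > local_world
```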

Commits on Sep 10, 2024

  1. [release] update version (hpcaitech#6041)

    * [release] update version
    
    * [devops] update comp test
    
    * [devops] update comp test debug
    
    * [devops] debug comp test
    
    * [devops] debug comp test
    
    * [devops] debug comp test
    
    * [devops] debug comp test
    
    * [devops] debug comp test
    ver217 authored Sep 10, 2024
    b3db105
  2. [Feature] Split cross-entropy computation in SP (hpcaitech#5959)

    * halfway
    
    * fix cross-PP-stage position id length diff bug
    
    * fix typo
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * unified cross entropy func for all shardformer models
    
    * remove redundant lines
    
    * add basic ring attn; debug cross entropy
    
    * fwd bwd logic complete
    
    * fwd bwd logic complete; add experimental triton rescale
    
    * precision tests passed
    
    * precision tests passed
    
    * fix typos and remove misc files
    
    * update softmax_lse shape by new interface
    
    * change tester name
    
    * remove buffer clone; support packed seq layout
    
    * add varlen tests
    
    * fix typo
    
    * all tests passed
    
    * add dkv_group; fix mask
    
    * remove debug statements
    
    * adapt chatglm, command-R, qwen
    
    * debug
    
    * halfway
    
    * fix cross-PP-stage position id length diff bug
    
    * fix typo
    
    * fix typo
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * unified cross entropy func for all shardformer models
    
    * remove redundant lines
    
    * add basic ring attn; debug cross entropy
    
    * fwd bwd logic complete
    
    * fwd bwd logic complete; add experimental triton rescale
    
    * precision tests passed
    
    * precision tests passed
    
    * fix typos and remove misc files
    
    * add sp_mode to benchmark; fix varlen interface
    
    * update softmax_lse shape by new interface
    
    * add varlen tests
    
    * fix typo
    
    * all tests passed
    
    * add dkv_group; fix mask
    
    * remove debug statements
    
    * add comments
    
    * q1 index only once
    
    * remove events to simplify stream sync
    
    * simplify forward/backward logic
    
    * 2d ring forward passed
    
    * 2d ring backward passed
    
    * fixes
    
    * fix ring attn loss
    
    * 2D ring backward + llama passed
    
    * merge
    
    * update logger
    
    * fix typo
    
    * rebase
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix typo
    
    * remove typos
    
    * fixes
    
    * support GPT
    
    ---------
    
    Co-authored-by: Edenzzzz <[email protected]>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    3 people authored Sep 10, 2024
    Commit 8fd25d6
  3. [hotfix] moe hybrid parallelism benchmark & follow-up fix (hpcaitech#6048)
    
    * [example] pass use_fp8_comm flag to all plugins
    
    * [example] add mixtral benchmark
    
    * [moe] refine assertion and check
    
    * [moe] fix mixtral & add more tests
    
    * [moe] consider checking dp * sp group and moe_dp_group
    
    * [mixtral] remove gate tp & add more tests
    
    * [deepseek] fix tp & sp for deepseek
    
    * [mixtral] minor fix
    
    * [deepseek] add deepseek benchmark
    botbw authored Sep 10, 2024
    Commit c54c4fc
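
On the cross-entropy split in hpcaitech#5959 above: with sequence parallelism each rank holds only a shard of the logits and labels, so the loss is assembled from a local partial sum plus a globally reduced token count. The sketch below shows that general recipe only; the names and the way ShardFormer actually wires the backward pass and reporting are assumptions, not the merged code:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def sp_split_cross_entropy(local_logits, local_labels, sp_group, ignore_index=-100):
    # Differentiable sum-loss over this rank's shard of the sequence.
    local_loss_sum = F.cross_entropy(
        local_logits.reshape(-1, local_logits.size(-1)).float(),
        local_labels.reshape(-1),
        ignore_index=ignore_index,
        reduction="sum",
    )
    # Global number of non-ignored tokens across the full sequence.
    num_tokens = (local_labels != ignore_index).sum().float()
    dist.all_reduce(num_tokens, group=sp_group)
    num_tokens = num_tokens.clamp(min=1.0)

    # Each rank backpropagates its share of the global mean; the shards add up to the
    # full loss once gradients are synchronized across the sequence-parallel group.
    loss_for_backward = local_loss_sum / num_tokens

    # Detached all-reduce so every rank can log the same global mean loss.
    loss_to_report = loss_for_backward.detach().clone()
    dist.all_reduce(loss_to_report, group=sp_group)
    return loss_for_backward, loss_to_report
```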

Commits on Sep 11, 2024

  1. [fp8] hotfix backward hook (hpcaitech#6053)

    * [fp8] hotfix backward hook
    
    * [fp8] hotfix pipeline loss accumulation
    ver217 authored Sep 11, 2024
    Commit 13946c4
  2. [doc] update sp doc (hpcaitech#6055)

    * update sp doc
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    flybird11111 and pre-commit-ci[bot] authored Sep 11, 2024
    Commit a35a078

Commits on Sep 13, 2024

  1. fix the sp

    wangbluo committed Sep 13, 2024
    Commit fdd84b9
  2. Commit 216d54e
  3. fix the attn

    wangbluo committed Sep 13, 2024
    Commit 0a01e2a
  4. fix

    wangbluo committed Sep 13, 2024
    Commit 683179c
  5. fix

    wangbluo committed Sep 13, 2024
    Commit 6eb8832
  6. fix

    wangbluo committed Sep 13, 2024
    Commit f393867
  7. fix

    wangbluo committed Sep 13, 2024
    Commit dc03217
  8. [zerobubble]Support ZeroBubble Pipeline (hpcaitech#6034)

    * [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble;
    
    * [feat] add dw test;
    
    * [fix] fix weight not close;
    
    * [update] update text;
    
    * [feat] add test run_fwd_bwd automatic scheduling;
    
    * [feat] split communication and calculation; fix pop empty send_bwd_buffer error;
    
    * [feat] add test for p & p grad;
    
    * [feat] add comments for ZBV func;
    
    * [fix] rm useless assign and comments;
    
    * [fix] fix ci test; add pytest;
    
    * [feat] add run_fwd_bwd_with_microbatch  (replace input) & test; add p&p.grad assert close test & all pass;
    
    * [feat] add apply v_schedule graph; p & p.grad assert err exist;
    
    * [fix] update
    
    * [feat] fix ci; add assert;
    
    * [feat] fix poc format
    
    * [feat] fix func name & ci; add comments;
    
    * [fix] fix poc test; add comments in poc;
    
    * [feat] add optim backward_b_by_grad
    
    * [feat] fix optimizer bwd b & w; support return accum loss & output
    
    * [feat] add fwd_bwd_step, run_fwd_only;
    
    * [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict;
    
    * [fix] fix communication_map;
    
    * [feat] update test; rm comments;
    
    * [fix] rm zbv in hybridplugin
    
    * [fix] fix optim bwd;
    
    * [fix] fix optim bwd;
    
    * [fix] rm output.data after send fwd;
    
    * [fix] fix bwd step if condition; remove useless comments and format info;
    
    * [fix] fix detach output & release output;
    
    * [fix] rm requir_grad for output;
    
    * [fix] fix requir grad position and detach position and input&output local buffer append position;
    
    * [feat] add memory assertation;
    
    * [fix] fix mem check;
    
    * [fix] mem assertation'
    
    * [fix] fix mem assertation
    
    * [fix] fix mem; use a new model shape; only assert mem less and equal than theo;
    
    * [fix] fix model zoo import;
    
    * [fix] fix redundant detach & clone; add buffer assertation in the end;
    
    * [fix] add output_obj_grad assert None at bwd b step; replace input_obj.require_grad_ with treemap;
    
    * [fix] update optim state dict assert (include param group & state); fix mem assert after add optim;
    
    * [fix] add testcase with microbatch 4;
    duanjunwen authored and flybird11111 committed Sep 13, 2024
    Commit e79d442
  9. Commit 696fced
  10. fix

    wangbluo committed Sep 13, 2024
    Commit 0b14a55
  11. fix

    wangbluo committed Sep 13, 2024
    Commit 0ad3129
  12. fix

    wangbluo committed Sep 13, 2024
    Commit b582319

Commits on Sep 14, 2024

  1. [fp8] Disable all_gather intranode. Disable Redundant all_gather fp8 (hpcaitech#6059)
    
    * all_gather only internode, fix pytest
    
    * fix cuda arch <89 compile pytest error
    
    * fix pytest failure
    
    * disable all_gather_into_tensor_flat_fp8
    
    * fix fp8 format
    
    * fix pytest
    
    * fix conversations
    
    * fix chunk tuple to list
    GuangyaoZhang authored Sep 14, 2024
    Commit f20b066
  2. [doc] FP8 training and communication document (hpcaitech#6050)

    * Add FP8 training and communication document
    
    * add fp8 docstring for plugins
    
    * fix typo
    
    * fix typo
    GuangyaoZhang authored Sep 14, 2024
    Commit bdb125f
  3. fix

    wangbluo committed Sep 14, 2024
    Commit 827ef3e
  4. Merge pull request hpcaitech#6061 from wangbluo/sp_fix

    [sp] : fix the attention kernel for sp
    wangbluo authored Sep 14, 2024
    Commit 37e3523

Commits on Sep 16, 2024

  1. fix

    wangbluo committed Sep 16, 2024
    Commit 10e4f7d

Commits on Sep 18, 2024

  1. Merge pull request hpcaitech#6064 from wangbluo/fix_attn

    [sp] : fix the attention kernel for sp
    wangbluo authored Sep 18, 2024
    Commit 63314ce
  2. Commit 4fa6b95
  3. [ColossalEval] support for vllm (hpcaitech#6056)

    * support vllm
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * modify vllm and update readme
    
    * run pre-commit
    
    * remove dupilicated lines and refine code
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * update param name
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * refine code
    
    * update readme
    
    * refine code
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Camille7777 and pre-commit-ci[bot] authored Sep 18, 2024
    Commit f9546ba

Commits on Sep 19, 2024

  1. Commit dabc2e7

Commits on Sep 29, 2024

  1. Commit b3b3278
  2. [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble;
    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit da4595a
  3. [update] update text;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit e450dd2
  4. Commit ccc37a4
  5. [feat] fix poc format

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 228d71e
  6. Commit 8b0ffed
  7. Commit 97f2443
  8. Commit c90bd98
  9. [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict;
    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 5df5965
  10. [feat] update test; rm comments;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 94a12f6
  11. [fix] rm zbv in hybridplugin

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit cc5e7dc
  12. [fix] fix optim bwd;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit ad8ad64
  13. [fix] fix optim bwd;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit f347591
  14. Commit 4249a36
  15. Commit 497d545
  16. [fix] fix mem check;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 0825700
  17. [fix] fix mem assertation

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit ae4cf5b
  18. Commit e80179c
  19. [fix] fix model zoo import;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 2683d26
  20. Commit 9094cc3
  21. Commit 3e2f260
  22. Commit 8ce22ae
  23. [fix] fix mem assert;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit f8d6f98
  24. Commit 78a439b
  25. [fix] fix pipeline util func deallocate --> release_tensor_data; fix bwd_b loss bwd branch;
    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 8bc8bb0
  26. Commit a3a797d
  27. [fix] fix test_pipeline_utils ci;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 4d3eaee
  28. [plugin] hybrid support zero bubble pipeline (hpcaitech#6060)

    * hybrid support zbv
    
    * fix
    
    fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * Update zero_bubble_pp.py
    
    * fix
    
    * fix-ci
    
    * fix
    
    [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * [zerobubble]Support ZeroBubble Pipeline (hpcaitech#6034)
    
    * [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble;
    
    * [feat] add dw test;
    
    * [fix] fix weight not close;
    
    * [update] update text;
    
    * [feat] add test run_fwd_bwd automatic scheduling;
    
    * [feat] split communication and calculation; fix pop empty send_bwd_buffer error;
    
    * [feat] add test for p & p grad;
    
    * [feat] add comments for ZBV func;
    
    * [fix] rm useless assign and comments;
    
    * [fix] fix ci test; add pytest;
    
    * [feat] add run_fwd_bwd_with_microbatch  (replace input) & test; add p&p.grad assert close test & all pass;
    
    * [feat] add apply v_schedule graph; p & p.grad assert err exist;
    
    * [fix] update
    
    * [feat] fix ci; add assert;
    
    * [feat] fix poc format
    
    * [feat] fix func name & ci; add comments;
    
    * [fix] fix poc test; add comments in poc;
    
    * [feat] add optim backward_b_by_grad
    
    * [feat] fix optimizer bwd b & w; support return accum loss & output
    
    * [feat] add fwd_bwd_step, run_fwd_only;
    
    * [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict;
    
    * [fix] fix communication_map;
    
    * [feat] update test; rm comments;
    
    * [fix] rm zbv in hybridplugin
    
    * [fix] fix optim bwd;
    
    * [fix] fix optim bwd;
    
    * [fix] rm output.data after send fwd;
    
    * [fix] fix bwd step if condition; remove useless comments and format info;
    
    * [fix] fix detach output & release output;
    
    * [fix] rm requir_grad for output;
    
    * [fix] fix requir grad position and detach position and input&output local buffer append position;
    
    * [feat] add memory assertation;
    
    * [fix] fix mem check;
    
    * [fix] mem assertation'
    
    * [fix] fix mem assertation
    
    * [fix] fix mem; use a new model shape; only assert mem less and equal than theo;
    
    * [fix] fix model zoo import;
    
    * [fix] fix redundant detach & clone; add buffer assertation in the end;
    
    * [fix] add output_obj_grad assert None at bwd b step; replace input_obj.require_grad_ with treemap;
    
    * [fix] update optim state dict assert (include param group & state); fix mem assert after add optim;
    
    * [fix] add testcase with microbatch 4;
    
    * hybrid support zbv
    
    * fix
    
    fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Update zero_bubble_pp.py
    
    * fix
    
    * fix-ci
    
    * fix
    
    [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: duanjunwen <[email protected]>
    3 people committed Sep 29, 2024
    Commit 2fd9d3e
  29. [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble;
    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 21c62b6
  30. [update] update text;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 28ee5a7
  31. Commit d44e7e6
  32. [feat] fix poc format

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 49d68eb
  33. Commit 0055c47
  34. Commit 21bf510
  35. Commit 93ede6b
  36. [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict;
    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 4ac0d6e
  37. [feat] update test; rm comments;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 262b27e
  38. [fix] fix optim bwd;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit fe99ca3
  39. [fix] fix optim bwd;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 355a3af
  40. Commit 4420dc1
  41. Commit 7ba031d
  42. [fix] fix mem check;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit e666f5c
  43. [fix] fix mem assertation

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 93b3604
  44. Commit 78ed432
  45. [fix] fix model zoo import;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit df12ae7
  46. [fix] fix mem assert;

    duanjunwen authored and flybird11111 committed Sep 29, 2024
    Commit 9e90356
  47. Commit 993f3db
  48. [plugin] hybrid support zero bubble pipeline (hpcaitech#6060)

    * hybrid support zbv
    
    * fix
    
    fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * Update zero_bubble_pp.py
    
    * fix
    
    * fix-ci
    
    * fix
    
    [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * [zerobubble]Support ZeroBubble Pipeline (hpcaitech#6034)
    
    * [feat] add zerobubble pp (just a frame now); add POC test for dx_dw; add test for zerobubble;
    
    * [feat] add dw test;
    
    * [fix] fix weight not close;
    
    * [update] update text;
    
    * [feat] add test run_fwd_bwd automatic scheduling;
    
    * [feat] split communication and calculation; fix pop empty send_bwd_buffer error;
    
    * [feat] add test for p & p grad;
    
    * [feat] add comments for ZBV func;
    
    * [fix] rm useless assign and comments;
    
    * [fix] fix ci test; add pytest;
    
    * [feat] add run_fwd_bwd_with_microbatch  (replace input) & test; add p&p.grad assert close test & all pass;
    
    * [feat] add apply v_schedule graph; p & p.grad assert err exist;
    
    * [fix] update
    
    * [feat] fix ci; add assert;
    
    * [feat] fix poc format
    
    * [feat] fix func name & ci; add comments;
    
    * [fix] fix poc test; add comments in poc;
    
    * [feat] add optim backward_b_by_grad
    
    * [feat] fix optimizer bwd b & w; support return accum loss & output
    
    * [feat] add fwd_bwd_step, run_fwd_only;
    
    * [fix] fix optim bwd; add license for v_schedule; remove redundant attributes; fix schedule loop "while"--> "for"; add communication dict;
    
    * [fix] fix communication_map;
    
    * [feat] update test; rm comments;
    
    * [fix] rm zbv in hybridplugin
    
    * [fix] fix optim bwd;
    
    * [fix] fix optim bwd;
    
    * [fix] rm output.data after send fwd;
    
    * [fix] fix bwd step if condition; remove useless comments and format info;
    
    * [fix] fix detach output & release output;
    
    * [fix] rm requir_grad for output;
    
    * [fix] fix requir grad position and detach position and input&output local buffer append position;
    
    * [feat] add memory assertation;
    
    * [fix] fix mem check;
    
    * [fix] mem assertation'
    
    * [fix] fix mem assertation
    
    * [fix] fix mem; use a new model shape; only assert mem less and equal than theo;
    
    * [fix] fix model zoo import;
    
    * [fix] fix redundant detach & clone; add buffer assertation in the end;
    
    * [fix] add output_obj_grad assert None at bwd b step; replace input_obj.require_grad_ with treemap;
    
    * [fix] update optim state dict assert (include param group & state); fix mem assert after add optim;
    
    * [fix] add testcase with microbatch 4;
    
    * hybrid support zbv
    
    * fix
    
    fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Update zero_bubble_pp.py
    
    * fix
    
    * fix-ci
    
    * fix
    
    [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: duanjunwen <[email protected]>
    3 people committed Sep 29, 2024
    Commit 0767948
  49. Commit 3251e68
  50. Commit 797d1ed