feat(fp8): [Work In Progress] enable FP8 training #1183
Triggered via pull request on November 6, 2024 10:31
Status: Failure
Total duration: 2h 6m 49s
Artifacts: none
e2e_test.yaml
on: pull_request
training_4GPU: 1m 31s
training_8GPU_ISP: 1m 24s
training_8GPU_ISP_CKPT: 20m 15s
training_8GPU_4DP2PP_ZB: 50s
Matrix: training_16GPU_4DP2TP2PP_FSP
Matrix: training_16GPU_4DP2TP2PP_MSP
Matrix: training_16GPU_4DP2TP2PP_MTP
Matrix: training_8GPU_4DP2PP
Matrix: training_8GPU_4DP2TP
Matrix: training_8GPU_4DP2TPSP
Matrix: training_internlm2
Matrix: training_llama2
Annotations
25 errors and 15 warnings
training_8GPU_4DP2TP (910B): Process completed with exit code 1.
training_8GPU_4DP2TPSP (910B): unable to access 'https://github.com/InternLM/InternEvo/': GnuTLS recv error (-110): The TLS connection was non-properly terminated.
training_8GPU_4DP2TPSP (910B): unable to access 'https://github.com/InternLM/InternEvo/': Failed to connect to github.com port 443: Connection timed out
training_8GPU_4DP2TPSP (910B): RPC failed; curl 28 Failed to connect to github.com port 443: Connection timed out
training_8GPU_4DP2TPSP (910B): expected 'acknowledgments'
training_8GPU_4DP2TPSP (910B): The process '/usr/local/bin/git' failed with exit code 128
training_4GPU: Process completed with exit code 1.
training_16GPU_4DP2TP2PP_FSP (910B): RPC failed; curl 28 Failed to connect to github.com port 443: Connection timed out
training_16GPU_4DP2TP2PP_FSP (910B): expected 'acknowledgments'
training_16GPU_4DP2TP2PP_FSP (910B): Process completed with exit code 1.
training_16GPU_4DP2TP2PP_MSP (910B): Process completed with exit code 1.
training_16GPU_4DP2TP2PP_MTP (910B): unable to access 'https://github.com/InternLM/InternEvo/': GnuTLS recv error (-110): The TLS connection was non-properly terminated.
training_16GPU_4DP2TP2PP_MTP (910B): RPC failed; curl 56 GnuTLS recv error (-110): The TLS connection was non-properly terminated.
training_16GPU_4DP2TP2PP_MTP (910B): expected 'acknowledgments'
training_16GPU_4DP2TP2PP_MTP (910B): unable to access 'https://github.com/InternLM/InternEvo/': Failed to connect to github.com port 443: Connection timed out
training_16GPU_4DP2TP2PP_MTP (910B): The process '/usr/local/bin/git' failed with exit code 128
training_8GPU_4DP2PP (910B): Process completed with exit code 1.
training_8GPU_4DP2PP (910B): unable to access 'https://github.com/InternLM/InternEvo/': Failed to connect to github.com port 443: Connection timed out
training_internlm2 (910B): Process completed with exit code 1.
training_llama2 (910B): Process completed with exit code 1.
training_llama2 (910B): unable to access 'https://github.com/InternLM/InternEvo/': GnuTLS recv error (-110): The TLS connection was non-properly terminated.
training_8GPU_ISP: Process completed with exit code 143.
training_8GPU_ISP_CKPT: The job running on runner evo_t_cluster_two has exceeded the maximum execution time of 20 minutes.
training_8GPU_ISP_CKPT: The operation was canceled.
training_8GPU_4DP2PP_ZB: Process completed with exit code 143.
training_8GPU_4DP2TP (910B): Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_8GPU_4DP2TPSP (910B): Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_4GPU: Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_4GPU: The Actions runner will no longer support your OS version on November 1, 2024. Please upgrade to a supported version. For information, refer https://github.blog/changelog/2024-08-19-notice-of-upcoming-deprecations-and-breaking-changes-in-github-actions-runners/
training_16GPU_4DP2TP2PP_FSP (910B): Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_16GPU_4DP2TP2PP_MSP (910B): Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_16GPU_4DP2TP2PP_MTP (910B): Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_8GPU_4DP2PP (910B): Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_internlm2 (910B): Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_llama2 (910B): Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_8GPU_ISP: Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_8GPU_ISP: The Actions runner will no longer support your OS version on November 1, 2024. Please upgrade to a supported version. For information, refer https://github.blog/changelog/2024-08-19-notice-of-upcoming-deprecations-and-breaking-changes-in-github-actions-runners/
training_8GPU_ISP_CKPT: The Actions runner will no longer support your OS version on November 1, 2024. Please upgrade to a supported version. For information, refer https://github.blog/changelog/2024-08-19-notice-of-upcoming-deprecations-and-breaking-changes-in-github-actions-runners/
training_8GPU_4DP2PP_ZB: Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
training_8GPU_4DP2PP_ZB: The Actions runner will no longer support your OS version on November 1, 2024. Please upgrade to a supported version. For information, refer https://github.blog/changelog/2024-08-19-notice-of-upcoming-deprecations-and-breaking-changes-in-github-actions-runners/
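The Node.js 16 warnings above name actions/checkout@v3 as the action to update. A minimal sketch of what that change might look like in e2e_test.yaml follows. The actual job definitions are not shown in this run summary, so the runs-on label and the step layout are assumptions; only the job name and the action name come from the annotations. It also assumes the self-hosted runners are able to run Node.js 20, which the separate OS-version warning suggests may need checking first.

```yaml
# Hypothetical fragment of e2e_test.yaml; the real workflow is not shown on this page.
jobs:
  training_4GPU:
    runs-on: [self-hosted]          # assumption: the real runner labels are not listed here
    steps:
      # actions/checkout@v4 runs on Node.js 20; v3 runs on the deprecated Node.js 16
      - uses: actions/checkout@v4
```

The same one-line change would apply to every job that reported the warning. It addresses only the deprecation warnings, not the git connection failures or the exit codes listed under the errors.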