onnxruntime/orttraining/orttraining/python/training
Adam Louly ee74fb6908
Introducing ORTPipelineModule - DeepSpeed Parallel Pipeline Support. (#20287)
### Description
Introduces a new class, `ORTPipelineModule`, that wraps layers for DeepSpeed
pipeline parallelism.

### Motivation and Context
Adds pipeline parallelism support to ORTModule.

This PR provides initial support for DeepSpeed pipeline
parallelism:

- [x] Support pipeline parallelism where the layers are `nn.Module`s in
a `Sequential`
- [ ] Support `LayerSpec` and `TiedLayerSpec`
- [ ] Allow partitioning to accept a `List`
- [ ] Full-GPU Graph Consolidation
- [ ] Subgraph Merging for Inference
2024-04-18 11:30:15 -07:00
| Name | Last commit | Date |
|---|---|---|
| `amp` | [Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789) | 2023-07-21 12:53:41 -07:00 |
| `api` | Introduce a Nominal Checkpoint for On-Device Training (#19232) | 2024-01-30 22:11:25 -08:00 |
| `experimental` | Manage ORTModule configurations consistently (#16396) | 2023-06-27 19:19:36 +08:00 |
| `onnxblock` | Introduce a Nominal Checkpoint for On-Device Training (#19232) | 2024-01-30 22:11:25 -08:00 |
| `optim` | Bump ruff to 0.3.2 and black to 24 (#19878) | 2024-03-13 10:00:32 -07:00 |
| `ort_triton` | Support BFloat16 for Triton Codegen (#20353) | 2024-04-18 17:15:11 +08:00 |
| `ortmodule` | Introducing ORTPipelineModule - DeepSpeed Parallel Pipeline Support. (#20287) | 2024-04-18 11:30:15 -07:00 |
| `utils` | Fix and enable few ORTModule Unit Tests (#19847) | 2024-03-12 10:49:19 +08:00 |
| `__init__.py` | Bump ruff to 0.3.2 and black to 24 (#19878) | 2024-03-13 10:00:32 -07:00 |
| `_utils.py` | Removed all the deprecated python training code and related tests and utils (#18333) | 2023-11-17 18:19:21 -08:00 |
| `artifacts.py` | Add support for SGD optimizer in minimal build (#19901) | 2024-03-14 11:31:20 -07:00 |