onnxruntime/docs
pengwa d90afc697b
Introduce ZeROOffloadSubscriber for ORTModule (#17006)
### Introduce ZeROOffloadSubscriber for ORTModule

As part of the work: integrate ORTModule with DeepSpeed stage3, this PR
mainly focus on moving original PyTorch-based (leveraging hooks) param
partition/offload implementation to ORTModule compatible implementation.

Changes include:
1. Refactor `SubscriberBase`/`SubcriberManager` to support
pre-forward/post_forward hooks.
2. Implement new `ZeROOffloadSubscriber` by re-using DeepSpeed hook
function as much as possible. Since all hook functions are defined in
`DeepSpeedZeRoOffload._register_hooks_recursively` and
`DeepSpeedZeRoOffload.setup_zero_stage3_hooks`, and the good thing is,
the closure is not complex, all hooks are referencing the owning
`DeepSpeedZeRoOffload` instance, so we can create new hook function with
`FunctionType` by binding the owning `DeepSpeedZeRoOffload` instance,
then call the new created function in subscriber's
`pre_forward_module_apply_impl` and `post_forward_module_apply_impl`
interfaces.
3. Monkey patch `DeepSpeedZeRoOffload.setup_zero_stage3_hooks` to
register the `ZeROOffloadSubscriber` for the model, then we don't need
change any code on the DeepSpeed repo (at least so far).
4. Fix the ATen embedding custom symbolic exporter function by
tolerating weights size be (0) (changed by DeepSpeed zero stage 3).

UT will be added once stage3 is fully supported. 

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-08-25 00:15:22 +08:00
..
c_cxx Training Documentation (#15612) 2023-04-25 11:44:12 -07:00
execution_providers/images
images
python Openvino ep ort 5.1 (#17042) 2023-08-09 11:50:10 -07:00
ABI_Dev_Notes.md skip windows GPU check if changes only in doc (#13248) 2022-10-11 13:51:44 +08:00
Android_testing.md
C_API_Guidelines.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
cmake_guideline.md fix some typo in docs (#13212) 2022-10-07 15:58:18 -07:00
Coding_Conventions_and_Standards.md [docs] Specify Objective-C max line length. (#16503) 2023-06-28 16:58:23 -07:00
ContribOperators.md 4b quantization for weights of LLMs (#16833) 2023-08-07 12:23:55 -07:00
FAQ.md Fix typo enviroment => environment (#13195) 2022-10-03 17:02:26 -07:00
How_To_Update_ONNX_Dev_Notes.md Remove exclusions for ONNX model tests that now pass. (#14337) 2023-01-24 08:04:27 +10:00
Memory_Optimizer.md Add guidelines for ORTModule (#13553) 2022-11-04 19:42:10 +08:00
Model_Test.md
NotesOnThreading.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
ONNX_Runtime_Server_Usage.md
onnxruntime_dependencies.dot
onnxruntime_dependencies.png
onnxruntime_extensions.md Remove the extensions submodule (#17097) 2023-08-14 10:16:33 -07:00
OperatorKernels.md To support size opset 19 (#15689) 2023-08-11 14:48:53 -07:00
ORT_Format_Update_in_1.13.md Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413) 2023-01-25 08:23:12 -08:00
ORT_Use_Trtion_Kernel.md [ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196) 2023-07-11 13:55:30 +08:00
ORTMobilePackageOperatorTypeSupport.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
ORTModule_Convergence_Notes.md Introduce ZeROOffloadSubscriber for ORTModule (#17006) 2023-08-25 00:15:22 +08:00
ORTModule_ModuleWithLoss_Wrapper.md add steps to write modulewithloss wrapper (#16486) 2023-07-11 09:07:35 +08:00
ORTModule_Training_Guidelines.md Use full qualified name for PythonOp export (#17021) 2023-08-09 10:58:33 +08:00
PR_Guidelines.md
Privacy.md [C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481) 2020-10-21 10:32:13 -07:00
Python_Dev_Notes.md
Reduced_Operator_Kernel_build.md replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
ReleaseManagement.md
Roadmap.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
Server.md
TVM_EP.md Fix: update hyperlinks to the Jupyter notebooks (#16145) 2023-08-21 09:53:05 -07:00
Versioning.md replace 'master' branch ref to 'main' for onnx repo (#12678) 2022-08-30 13:41:42 -07:00
WinML_principles.md Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00