onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-02 23:39:58 +00:00

pengwa 5eda79bdd3 Improve perf for stage3 training (#18099). First wave: port the existing PythonOp/PythonOpGrad Python runner to C++, and introduce an unsafe run mode that skips on-the-fly detection of in-place updates, save-for-backward tensors, and materialized grads. This reduces the overhead from XX~XXX us to X ~ lower end of XX us. In LLAMA2 7B training with 8x 32G V100, we observed a 6.7% gain over PyTorch (1.59 vs. 1.49 it/s). Peak memory also dropped from 31GB to 28GB.	2023-12-15 13:32:19 +08:00
data	Bump ruff in CI (#15533)	2023-04-17 10:11:44 -07:00
hooks	Improve perf for stage3 training (#18099)	2023-12-15 13:32:19 +08:00
__init__.py	Improve perf for stage3 training (#18099)	2023-12-15 13:32:19 +08:00
ptable.py	Allow layer-wise recompute (#18566)	2023-12-12 08:44:05 +08:00
torch_io_helper.py	Improve perf for stage3 training (#18099)	2023-12-15 13:32:19 +08:00
torch_profile_utils.py	Improve perf for stage3 training (#18099)	2023-12-15 13:32:19 +08:00
torch_type_map.py	Optimize 4bit Qlora training (#18131)	2023-11-02 09:46:11 -07:00