onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-04 23:59:56 +00:00

History

zhijiang 8fadc6c913 Zhijxu/cleanup cached tensors when oom (#19306) In PyTorch, when an OOM occurs during the backward pass, the user can decrease the batch size and rerun without restarting the process. In ORT, however, the intermediate tensors are kept even after an OOM, so rerunning with a smaller batch size still fails. This is the torch run: after the OOM failure, torch releases its tensors before the next step. ![image](https://github.com/microsoft/onnxruntime/assets/43435212/92b8a2e3-454b-448a-a223-17cb91d463c2) This is the ORT run: ORT does not release its tensors after the OOM failure. ![image](https://github.com/microsoft/onnxruntime/assets/43435212/bb6a3882-8e14-4f37-8079-e7f70fc2546b) This is ORT with this PR: the memory is released; the remaining 4 GB is not owned by ORT and will be released by torch at the end. ![image](https://github.com/microsoft/onnxruntime/assets/43435212/7f39d711-4e36-47d5-aecf-3805433a6d01)		2024-02-21 10:41:42 +08:00
..
amp	[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789)	2023-07-21 12:53:41 -07:00
api	Introduce a Nominal Checkpoint for On-Device Training (#19232)	2024-01-30 22:11:25 -08:00
experimental	Manage ORTModule configurations consistently (#16396)	2023-06-27 19:19:36 +08:00
onnxblock	Introduce a Nominal Checkpoint for On-Device Training (#19232)	2024-01-30 22:11:25 -08:00
optim	FP16 optimizer automatically detect DeepSpeed compatibility (#18084)	2023-10-25 15:11:02 +08:00
ort_triton	Bump ruff linter to 0.2.1 (#19471)	2024-02-08 16:08:27 -08:00
ortmodule	Zhijxu/cleanup cached tensors when oom (#19306)	2024-02-21 10:41:42 +08:00
utils	ORTModule memory improvement (#18924)	2024-01-16 08:57:37 +08:00
__init__.py	Removed all the deprecated python training code and related tests and utils (#18333)	2023-11-17 18:19:21 -08:00
_utils.py	Removed all the deprecated python training code and related tests and utils (#18333)	2023-11-17 18:19:21 -08:00
artifacts.py	Introduce a Nominal Checkpoint for On-Device Training (#19232)	2024-01-30 22:11:25 -08:00