onnxruntime/onnxruntime
zhijiang 8fadc6c913
Zhijxu/cleanup cached tensors when oom (#19306)
In PyTorch, when an OOM occurs during the backward pass, the user can
decrease the batch size and rerun without restarting the process.

In ORT, however, the intermediate tensors are kept even after an OOM, so
rerunning with a smaller batch size still fails.
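
The recovery pattern described above can be sketched roughly as follows. This is an illustration only, not code from this PR: `run_step` and `train_with_backoff` are hypothetical names, and the OOM is simulated with Python's built-in `MemoryError` so the sketch runs without a GPU. With PyTorch on CUDA, the exception to catch would be `torch.cuda.OutOfMemoryError`.

```python
# Hypothetical sketch: retry a training step with a smaller batch after OOM.
# `run_step` stands in for one forward/backward pass and simulates the OOM.


def run_step(batch_size, capacity=8):
    """Simulated training step that fails when the batch does not fit."""
    if batch_size > capacity:
        raise MemoryError(f"simulated OOM at batch_size={batch_size}")
    return batch_size


def train_with_backoff(initial_batch_size, step=run_step, min_batch_size=1):
    """Halve the batch size after each OOM until the step succeeds."""
    batch_size = initial_batch_size
    while batch_size >= min_batch_size:
        try:
            return step(batch_size)
        except MemoryError:
            # Recovery only works if the runtime has released the tensors
            # held by the failed step; otherwise the smaller batch fails too.
            batch_size //= 2
    raise MemoryError("step does not fit even at the minimum batch size")


if __name__ == "__main__":
    print(train_with_backoff(32))
```

The retry only helps when the memory held by the failed step is actually freed before the next attempt, which is the behavior this change aims to give ORT.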


This is the torch run: after the OOM failure, torch releases its tensors
before the next step.

![image](https://github.com/microsoft/onnxruntime/assets/43435212/92b8a2e3-454b-448a-a223-17cb91d463c2)

This is the ORT run: ORT does not release its tensors after the OOM
failure.

![image](https://github.com/microsoft/onnxruntime/assets/43435212/bb6a3882-8e14-4f37-8079-e7f70fc2546b)

This is ORT with this PR: the memory is released. **The 4GB of memory is
not owned by ORT and will be released by torch at the end**.

![image](https://github.com/microsoft/onnxruntime/assets/43435212/7f39d711-4e36-47d5-aecf-3805433a6d01)
2024-02-21 10:41:42 +08:00
..
contrib_ops [JS/WebGPU] Add MatMulNBits (#19446) 2024-02-17 09:19:17 -08:00
core Zhijxu/cleanup cached tensors when oom (#19306) 2024-02-21 10:41:42 +08:00
python add option DefaultTensorType to specify the default tensor type to quantize (#19455) 2024-02-20 08:22:44 -08:00
test add option DefaultTensorType to specify the default tensor type to quantize (#19455) 2024-02-20 08:22:44 -08:00
tool/etw
wasm [js/webgpu] Support capture and replay for jsep (#18989) 2024-01-30 18:28:03 -08:00
__init__.py [ORT 1.17.0 release] Bump up version to 1.18.0 (#19170) 2024-01-17 11:18:32 -08:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings