onnxruntime/include
Yuhong Guo 04a8f50674
New configuration to limit the arena extension (#15983)
Add a configuration `max_power_of_two_extend_bytes ` to limit the arena extension size.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
In our real scenario, we observe that if the model is big enough the
BfcArena will extend uncontrollable.
As showed by the following figures, if a model uses more than 16GB
memory, the BfcArena will totally apply for 32GB memory according to the
`kNextPowerOfTwo` strategy. With the new strategy, the extension is
limited. The default maximum extension size is 1GB.

#### Without the new configuration
After loading the model, ORT uses 32G GPU memory.

![image](https://github.com/microsoft/onnxruntime/assets/19584326/42b93c66-b957-4f20-a13b-d34cb390afff)

#### With the new configuration
After loading the model, ORT uses 23G GPU memory.

![image](https://github.com/microsoft/onnxruntime/assets/19584326/5abffeff-9ca3-4187-a262-37fd2764fe1b)

Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2023-05-25 02:19:07 -07:00
..
onnxruntime/core New configuration to limit the arena extension (#15983) 2023-05-25 02:19:07 -07:00