mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-21 21:52:11 +00:00
Add a configuration `max_power_of_two_extend_bytes` to limit the arena extension size.

### Motivation and Context

In our real scenario, we observe that if the model is big enough, the BfcArena extends uncontrollably. As the measurements below show, if a model uses more than 16GB of memory, the BfcArena requests 32GB in total under the `kNextPowerOfTwo` strategy. With the new configuration, the extension size is limited. The default maximum extension size is 1GB.

#### Without the new configuration

After loading the model, ORT uses 32G GPU memory.

#### With the new configuration

After loading the model, ORT uses 23G GPU memory.

Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>

| | | |
|---|---|---|
| alloc_kind.h | ||
| allocator.h | ||
| buffer_deleter.h | ||
| customregistry.h | ||
| data_types.h | ||
| data_types_internal.h | ||
| endian.h | ||
| execution_provider.h | ||
| float16.h | ||
| framework_common.h | ||
| framework_provider_common.h | ||
| func_api.h | ||
| kernel_def_builder.h | ||
| kernel_registry.h | ||
| op_kernel.h | ||
| op_kernel_context.h | ||
| op_kernel_info.h | ||
| op_node_proto_helper.h | ||
| ort_value.h | ||
| ortdevice.h | ||
| ortmemoryinfo.h | ||
| provider_options.h | ||
| provider_options_utils.h | ||
| provider_shutdown.h | ||
| run_options.h | ||
| sparse_tensor.h | ||
| stream_handles.h | ||
| tensor.h | ||
| tensor_shape.h | ||
| to_tensor_proto_element_type.h | ||
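
The effect of capping the power-of-two extension, as described in the commit message above, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the BfcArena implementation: the helper `TotalReservedBytes`, the 1 MiB initial region size, and the rule "double the next region until it reaches the cap" are all hypothetical simplifications chosen for illustration. The model also ignores everything else the real arena does (chunk splitting, rounding, fragmentation), so its numbers will not match the 32G and 23G measurements quoted above.

```cpp
#include <cstdint>
#include <limits>

// Toy model of arena growth. NOT the ONNX Runtime implementation.
constexpr std::uint64_t kMiB = 1024ull * 1024ull;

// Hypothetical sentinel meaning "no cap on the extension size".
constexpr std::uint64_t kNoCap = std::numeric_limits<std::uint64_t>::max();

// Returns the total bytes reserved once at least `needed` bytes are available.
// Each extension reserves one region; the next region is twice as large,
// but never larger than `cap`. Assumes `initial` > 0 and realistic sizes
// (no overflow handling for absurdly large requests).
constexpr std::uint64_t TotalReservedBytes(std::uint64_t needed,
                                           std::uint64_t initial,
                                           std::uint64_t cap) {
  std::uint64_t reserved = 0;
  std::uint64_t next = initial;
  while (reserved < needed) {
    reserved += next;
    // Double the next region, clamped to the cap (written to avoid overflow).
    next = (next > cap / 2) ? cap : next * 2;
  }
  return reserved;
}

// A 17 GiB (17408 MiB) workload, starting from an assumed 1 MiB region.
// Uncapped doubling overshoots to nearly 32 GiB in this model.
static_assert(TotalReservedBytes(17408 * kMiB, kMiB, kNoCap) == 32767 * kMiB,
              "uncapped growth");
// With a 1 GiB cap, growth becomes linear after the first 1 GiB.
static_assert(TotalReservedBytes(17408 * kMiB, kMiB, 1024 * kMiB) == 18431 * kMiB,
              "capped growth");
```

In this model the uncapped strategy reserves almost twice what the workload needs, because the last doubling alone is as large as everything reserved before it. With the cap in place, the overshoot is bounded by roughly one maximum-size extension.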