onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-12 00:59:23 +00:00

Yulong Wang d2a1b7a353 Introduce custom external data loader (#21634)	2024-08-27 12:18:52 -07:00

### Description

This PR introduces support for a custom external data loader. An EP can register a custom external data loader to override the default behavior, making it possible to upload initializers directly to GPU.

### Motivation and Context

- In ONNX Runtime Web, WebAssembly uses 32-bit pointers (`sizeof(size_t)==4`), which imposes a hard 4GB limit on addressable memory. As ONNX models get larger, this becomes a blocker for supporting medium-sized language models.
- ORT runs out of memory because the current code always loads data into CPU memory, including the .onnx file (protobuf) and external data file(s). However, with a GPU EP the large data does not need to stay on the CPU: ORT only loads it into memory, uploads it to the GPU, and then releases it.
- Some platforms offer developers a way to upload data directly to GPU. For example, WebGPU allows uploading from any ArrayBuffer (which can be a side buffer that does not count toward the 4GB limit) directly to GPU. This significantly reduces CPU memory usage.

### Design

Classes `ExternalDataLoader` and `ExternalDataLoaderManager` are introduced. They are similar to `DataTransfer` and `DataTransferManager`. `InferenceSession` owns the manager object, and `SessionState` keeps a reference to it.

A new method `GetExternalDataLoader` is added to `IExecutionProvider`. An EP can override this method to register an instance of a custom external data loader.

The key method of an `ExternalDataLoader` class is `LoadTensor`:

```c++
// the tensor is pre-created using the TensorProto info of the initializer and the MemoryInfo (from allocation plan).
virtual common::Status LoadTensor(const Env& env,
                                  const std::filesystem::path& data_file_path,
                                  FileOffsetType data_offset,
                                  SafeInt<size_t> data_length,
                                  Tensor& tensor) const;
```

A loader registered by an EP passes through a few layers and eventually reaches `DeserializeTensorProto()` in the finalizing stage of session initialization, where initializer tensors are created. The behavior is changed to first look up a registered external data loader that can handle the current memory info. If one is available, the loader is used; otherwise the old code path is followed.
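The lookup-then-fallback behavior described above can be illustrated with a small standalone sketch. It does not use the real onnxruntime headers: every `Sketch*` type, the `CanLoad` and `Find` methods, the device strings, and the simplified `LoadTensor` signature (no `Env`, file path, or offset) are hypothetical stand-ins chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the memory info taken from the allocation plan.
struct SketchMemoryInfo {
  std::string device;  // e.g. "Cpu" or "Gpu" (illustrative values)
};

// Hypothetical stand-in for a pre-created initializer tensor.
struct SketchTensor {
  SketchMemoryInfo location;
  std::size_t byte_size = 0;
  std::string loaded_by;  // records which code path filled the tensor
};

// Simplified loader interface (assumption: a loader can report which targets it handles).
class SketchExternalDataLoader {
 public:
  virtual ~SketchExternalDataLoader() = default;
  virtual bool CanLoad(const SketchMemoryInfo& target) const = 0;
  virtual void LoadTensor(std::size_t data_length, SketchTensor& tensor) const = 0;
};

// A loader an EP might register to fill GPU-resident initializers directly.
class SketchGpuLoader : public SketchExternalDataLoader {
 public:
  bool CanLoad(const SketchMemoryInfo& target) const override {
    return target.device == "Gpu";
  }
  void LoadTensor(std::size_t data_length, SketchTensor& tensor) const override {
    tensor.byte_size = data_length;
    tensor.loaded_by = "custom";
  }
};

// Owns the registered loaders and finds the first one that accepts a target.
class SketchLoaderManager {
 public:
  void Register(std::unique_ptr<SketchExternalDataLoader> loader) {
    loaders_.push_back(std::move(loader));
  }
  const SketchExternalDataLoader* Find(const SketchMemoryInfo& target) const {
    for (const auto& loader : loaders_) {
      if (loader->CanLoad(target)) return loader.get();
    }
    return nullptr;
  }

 private:
  std::vector<std::unique_ptr<SketchExternalDataLoader>> loaders_;
};

// Mirrors the described behavior: try a registered loader first, else use the default path.
SketchTensor DeserializeInitializerSketch(const SketchLoaderManager& manager,
                                          const SketchMemoryInfo& target,
                                          std::size_t data_length) {
  SketchTensor tensor;
  tensor.location = target;
  if (const SketchExternalDataLoader* loader = manager.Find(target)) {
    loader->LoadTensor(data_length, tensor);
    return tensor;
  }
  // Default path: stands in for loading through CPU memory as before.
  tensor.byte_size = data_length;
  tensor.loaded_by = "default";
  return tensor;
}
```

In this sketch, registering a `SketchGpuLoader` and deserializing with a `"Gpu"` target takes the custom path, while a `"Cpu"` target falls through to the default path, since no registered loader accepts it.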
..
alloc_kind.h
allocator.h	Expose Reserve() in OrtAllocator to allow custom allocators to work when session.use_device_allocator_for_initializers is specified. (#19904)	2024-03-28 12:28:37 -07:00
buffer_deleter.h	Multi-stream execution support (#13495)	2022-12-15 07:39:29 -08:00
customregistry.h
data_types.h	Remove core/common/gsl.h (#20894)	2024-07-08 18:09:39 -07:00
data_types_internal.h	[CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362)	2024-05-30 18:56:24 -07:00
endian.h
execution_provider.h	Introduce custom external data loader (#21634)	2024-08-27 12:18:52 -07:00
float8.h	implement isinf20 and isnan20 (#17874)	2023-10-24 10:58:54 -07:00
float16.h	[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506)	2023-07-14 10:46:52 -07:00
framework_common.h
framework_provider_common.h	Add TRT plugins support using custom ops (#13847)	2023-04-18 20:24:32 -07:00
func_api.h	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
int4.h	Remove core/common/gsl.h (#20894)	2024-07-08 18:09:39 -07:00
kernel_def_builder.h	Release backward inputs per static graph ref count (#20804)	2024-06-14 14:33:01 +08:00
kernel_registry.h	Remove onnxruntime_PYBIND_EXPORT_OPSCHEMA definition from onnxruntime (#15776)	2023-05-03 13:08:35 -07:00
op_kernel.h	Fix typos - 1st Wave (#21278)	2024-07-11 13:35:08 +08:00
op_kernel_context.h	ExecutionProvider API refactor - replace OrtMemoryInfo with OrtDevice (#15618)	2023-05-01 10:06:00 -07:00
op_kernel_info.h	Remove core/common/gsl.h (#20894)	2024-07-08 18:09:39 -07:00
op_node_proto_helper.h	Remove core/common/gsl.h (#20894)	2024-07-08 18:09:39 -07:00
ort_value.h	Two fixes involving minimal builds (#17000)	2023-08-23 16:01:22 +10:00
ortdevice.h	ExecutionProvider API refactor - replace OrtMemoryInfo with OrtDevice (#15618)	2023-05-01 10:06:00 -07:00
ortmemoryinfo.h	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
provider_options.h
provider_options_utils.h
provider_shutdown.h
run_options.h	Enable CUDA EP unit testing on Windows (#20039)	2024-03-27 13:32:36 -07:00
sparse_tensor.h	Run clang-format in CI (#15524)	2023-04-18 09:26:58 -07:00
stream_handles.h	Update ruff and clang-format versions (#21479)	2024-07-24 11:50:11 -07:00
tensor.h	Remove core/common/gsl.h (#20894)	2024-07-08 18:09:39 -07:00
tensor_shape.h	Remove core/common/gsl.h (#20894)	2024-07-08 18:09:39 -07:00
to_tensor_proto_element_type.h	[CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362)	2024-05-30 18:56:24 -07:00