onnxruntime/include/onnxruntime/core/framework
Yulong Wang d2a1b7a353
Introduce custom external data loader (#21634)
### Description

This PR introduces support for custom external data loaders. An EP can
register a custom external data loader to override the default behavior,
making it possible to upload initializers directly to the GPU.



### Motivation and Context

- In ONNX Runtime Web, WebAssembly uses 32-bit pointers
(`sizeof(size_t)==4`), which means there is a 4GB hard limit on the
maximum memory. As ONNX models get larger, this becomes a blocker
for supporting medium-sized language models.

- ORT runs out of memory because the current code always loads data into
CPU memory, including the .onnx file (protobuf) and external data
file(s). However, when a GPU EP is used, the large data does not need to
be kept on the CPU, because ORT only loads the data into memory, uploads
it to the GPU and then releases it.

- Some platforms offer developers a way to upload data directly to the
GPU. For example, WebGPU allows uploading from any ArrayBuffer (it can
be a side buffer that does not count toward the 4GB limit) directly to
the GPU. This helps keep CPU memory usage significantly lower.

### Design

Classes `ExternalDataLoader` and `ExternalDataLoaderManager` are
introduced. They are similar to `DataTransfer` and
`DataTransferManager`. `InferenceSession` owns the manager object, and
`SessionState` keeps a reference to it.

Added a new method `GetExternalDataLoader` in `IExecutionProvider`. An
EP can override the method to register an instance of a custom external
data loader.
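
As a rough illustration of the override pattern, here is a minimal
sketch using stand-in types. `MockExternalDataLoader`,
`MockExecutionProvider`, `MockWebGpuLoader`, `MockWebGpuProvider`,
`MockCpuProvider` and `HasCustomLoader` are hypothetical names made up
for this example; they are not the ORT classes, and the return type and
the default of "no loader" are assumptions, not taken from the actual
header.

```c++
#include <cassert>
#include <memory>
#include <string>

// Stand-in for the loader interface (hypothetical, heavily simplified).
struct MockExternalDataLoader {
  virtual ~MockExternalDataLoader() = default;
  // Hypothetical helper: names the device this loader uploads to.
  virtual std::string TargetDevice() const = 0;
};

// Stand-in for IExecutionProvider. Assumption: by default an EP
// provides no custom loader, signalled here by a null pointer.
struct MockExecutionProvider {
  virtual ~MockExecutionProvider() = default;
  virtual std::unique_ptr<MockExternalDataLoader> GetExternalDataLoader() const {
    return nullptr;
  }
};

// A loader that would upload initializers straight to the GPU.
struct MockWebGpuLoader : MockExternalDataLoader {
  std::string TargetDevice() const override { return "gpu"; }
};

// An EP that overrides the method to supply its own loader.
struct MockWebGpuProvider : MockExecutionProvider {
  std::unique_ptr<MockExternalDataLoader> GetExternalDataLoader() const override {
    return std::make_unique<MockWebGpuLoader>();
  }
};

// An EP that keeps the default (no custom loader).
struct MockCpuProvider : MockExecutionProvider {};

// Returns true when the EP supplies a custom loader.
inline bool HasCustomLoader(const MockExecutionProvider& ep) {
  return ep.GetExternalDataLoader() != nullptr;
}
```

In this sketch only `MockWebGpuProvider` reports a custom loader, so a
session collecting loaders from its EPs would register exactly one.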

The key function in an `ExternalDataLoader` class is the `LoadTensor` method:

```c++
  // the tensor is pre-created using the TensorProto info of the initializer and the MemoryInfo (from allocation plan).
  virtual common::Status LoadTensor(const Env& env,
                                    const std::filesystem::path& data_file_path,
                                    FileOffsetType data_offset,
                                    SafeInt<size_t> data_length,
                                    Tensor& tensor) const;
```

This function can be registered by an EP; it passes through a few layers
and eventually reaches `DeserializeTensorProto()` in the finalizing stage
of session initialization. In this step, initializer tensors are
created. The behavior is changed to first look up a registered external
data loader that can handle the current memory info. If such a loader is
available, it is used; otherwise the old code path is followed.
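
The lookup-then-fallback behavior described above can be sketched with
simplified stand-in types. Everything here (`MockMemoryInfo`,
`MockTensor`, `MockLoader`, `MockGpuLoader`, `MockLoaderManager`,
`LoadInitializer`, and the `CanLoad` predicate) is a hypothetical name
chosen for illustration; the real manager and `DeserializeTensorProto()`
are considerably more involved.

```c++
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins; the real ORT types carry far more state.
struct MockMemoryInfo { std::string device; };
struct MockTensor { MockMemoryInfo location; std::string loaded_by; };

struct MockLoader {
  virtual ~MockLoader() = default;
  // Hypothetical predicate: can this loader fill tensors placed at `info`?
  virtual bool CanLoad(const MockMemoryInfo& info) const = 0;
  virtual void LoadTensor(MockTensor& tensor) const = 0;
};

struct MockGpuLoader : MockLoader {
  bool CanLoad(const MockMemoryInfo& info) const override {
    return info.device == "gpu";
  }
  void LoadTensor(MockTensor& tensor) const override {
    tensor.loaded_by = "custom";  // stands in for a direct GPU upload
  }
};

class MockLoaderManager {
 public:
  void RegisterLoader(std::unique_ptr<MockLoader> loader) {
    assert(loader != nullptr);
    loaders_.push_back(std::move(loader));
  }

  // Returns the first loader that accepts `info`, or nullptr if none does.
  const MockLoader* GetLoader(const MockMemoryInfo& info) const {
    for (const auto& loader : loaders_) {
      if (loader->CanLoad(info)) return loader.get();
    }
    return nullptr;
  }

 private:
  std::vector<std::unique_ptr<MockLoader>> loaders_;
};

// Custom loader first; otherwise fall back to the pre-existing path.
inline void LoadInitializer(const MockLoaderManager& manager, MockTensor& tensor) {
  if (const MockLoader* loader = manager.GetLoader(tensor.location)) {
    loader->LoadTensor(tensor);
  } else {
    tensor.loaded_by = "default";  // stands in for the old CPU-side load
  }
}
```

With one `MockGpuLoader` registered, a tensor placed on `"gpu"` goes
through the custom loader, while a tensor placed on `"cpu"` takes the
fallback branch.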
2024-08-27 12:18:52 -07:00
..
alloc_kind.h
allocator.h Expose Reserve() in OrtAllocator to allow custom allocators to work when session.use_device_allocator_for_initializers is specified. (#19904) 2024-03-28 12:28:37 -07:00
buffer_deleter.h Multi-stream execution support (#13495) 2022-12-15 07:39:29 -08:00
customregistry.h
data_types.h Remove core/common/gsl.h (#20894) 2024-07-08 18:09:39 -07:00
data_types_internal.h [CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362) 2024-05-30 18:56:24 -07:00
endian.h
execution_provider.h Introduce custom external data loader (#21634) 2024-08-27 12:18:52 -07:00
float8.h implement isinf20 and isnan20 (#17874) 2023-10-24 10:58:54 -07:00
float16.h [C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) 2023-07-14 10:46:52 -07:00
framework_common.h
framework_provider_common.h Add TRT plugins support using custom ops (#13847) 2023-04-18 20:24:32 -07:00
func_api.h Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
int4.h Remove core/common/gsl.h (#20894) 2024-07-08 18:09:39 -07:00
kernel_def_builder.h Release backward inputs per static graph ref count (#20804) 2024-06-14 14:33:01 +08:00
kernel_registry.h Remove onnxruntime_PYBIND_EXPORT_OPSCHEMA definition from onnxruntime (#15776) 2023-05-03 13:08:35 -07:00
op_kernel.h Fix typos - 1st Wave (#21278) 2024-07-11 13:35:08 +08:00
op_kernel_context.h ExecutionProvider API refactor - replace OrtMemoryInfo with OrtDevice (#15618) 2023-05-01 10:06:00 -07:00
op_kernel_info.h Remove core/common/gsl.h (#20894) 2024-07-08 18:09:39 -07:00
op_node_proto_helper.h Remove core/common/gsl.h (#20894) 2024-07-08 18:09:39 -07:00
ort_value.h Two fixes involving minimal builds (#17000) 2023-08-23 16:01:22 +10:00
ortdevice.h ExecutionProvider API refactor - replace OrtMemoryInfo with OrtDevice (#15618) 2023-05-01 10:06:00 -07:00
ortmemoryinfo.h Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
provider_options.h
provider_options_utils.h
provider_shutdown.h
run_options.h Enable CUDA EP unit testing on Windows (#20039) 2024-03-27 13:32:36 -07:00
sparse_tensor.h Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
stream_handles.h Update ruff and clang-format versions (#21479) 2024-07-24 11:50:11 -07:00
tensor.h Remove core/common/gsl.h (#20894) 2024-07-08 18:09:39 -07:00
tensor_shape.h Remove core/common/gsl.h (#20894) 2024-07-08 18:09:39 -07:00
to_tensor_proto_element_type.h [CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362) 2024-05-30 18:56:24 -07:00