# Motivation

Currently, ORT minimal builds use kernel def hashes to map from nodes to kernels to execute when loading the model. As the kernel def hashes must be known ahead of time, this works for statically registered kernels. This works well for the CPU EP.

For this approach to work, the kernel def hashes must also be known at ORT format model conversion time, which means the EP with statically registered kernels must also be enabled then. This is not an issue for the always-available CPU EP. However, we do not want to require that any EP which statically registers kernels is always available too.

Consequently, we explore another approach to match nodes to kernels that does not rely on kernel def hashes. An added benefit of this is the possibility of moving away from kernel def hashes completely, which would eliminate the maintenance burden of keeping the hashes stable.

# Approach

In a full build, ORT uses some information from the ONNX op schema to match a node to a kernel. We want to avoid including the ONNX op schema in a minimal build to reduce binary size. Essentially, we take the necessary information from the ONNX op schema and make it available in a minimal build.

We decouple the ONNX op schema from the kernel matching logic. The kernel matching logic instead relies on per-op information which can either be obtained from the ONNX op schema or another source.

This per-op information must be available in a minimal build when there are no ONNX op schemas. We put it in the ORT format model.

Existing uses of kernel def hashes to look up kernels are replaced with the updated kernel matching logic. We no longer store kernel def hashes in the ORT format model’s session state and runtime optimization representations. We no longer keep the logic to generate and ensure stability of kernel def hashes.
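As a rough illustration of matching driven by per-op information rather than by hashes or a full op schema, here is a minimal Python sketch. Every name in it (`Node`, `KernelDef`, `node_matches_kernel`, `find_kernel`, the layout of `op_info`) is hypothetical and is not ORT's actual API. It also assumes an exact match on the op's since-version, which is a simplification chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical types for illustration only; none of these are ORT's real classes.
OpId = Tuple[str, str, int]   # (domain, op type, since version)
ArgRef = Tuple[str, int]      # ("input" or "output", index)
# Per-op information: for each op, which args each type string (e.g. "T") refers to.
# In this sketch it plays the role of the data stored in the ORT format model.
OpInfo = Dict[OpId, Dict[str, List[ArgRef]]]


@dataclass
class Node:
    op_id: OpId
    input_types: List[str]
    output_types: List[str]


@dataclass
class KernelDef:
    op_id: OpId
    # type string -> element types this kernel supports for it
    type_constraints: Dict[str, FrozenSet[str]]


def node_matches_kernel(node: Node, kernel: KernelDef, op_info: OpInfo) -> bool:
    """Check a node against a kernel using only per-op info, with no op schema."""
    if node.op_id != kernel.op_id:  # assumption: exact version match
        return False
    type_str_to_args = op_info.get(node.op_id)
    if type_str_to_args is None:  # no per-op info available for this op
        return False
    for type_str, allowed in kernel.type_constraints.items():
        for kind, index in type_str_to_args.get(type_str, []):
            types = node.input_types if kind == "input" else node.output_types
            if index < len(types) and types[index] not in allowed:
                return False
    return True


def find_kernel(node: Node, kernels: List[KernelDef], op_info: OpInfo) -> Optional[KernelDef]:
    """Return the first registered kernel that matches the node, if any."""
    return next((k for k in kernels if node_matches_kernel(node, k, op_info)), None)
```

In this sketch `op_info` is the only schema-derived data the matcher needs, so it could be filled from an op schema in a full build or read from the model file in a minimal build.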
# ORT File Format
This directory contains the ORT file format schema and the generated C++ header file for the ORT file format.
The ORT file format schema uses the FlatBuffers serialization library.
Please do not directly modify the generated C++ header file or the generated Python binding files.
The flatbuffers compiler (flatc) is built as part of an ONNX Runtime build. It is located in the external/flatbuffers subdirectory of the build output directory.
For example:
- Windows Debug build
  - \build\Windows\Debug\external\flatbuffers\Debug\flatc.exe
- Linux Debug build
  - /build/Linux/external/flatbuffers/Debug/flatc
It is possible to use another flatc as well, e.g., from a separate installation. Note that ONNX Runtime uses FlatBuffers 1.12.
To update the ORT file format schema and generated files:
- Modify the ORT file format schema.
- Run compile_schema.py to generate the C++ and Python bindings.

    python onnxruntime/core/flatbuffers/schema/compile_schema.py --flatc <path to flatc>
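If the regeneration step needs to be scripted, the command above can be wrapped in a small helper. The following is a minimal sketch: the function names `build_compile_command` and `regenerate_bindings` are hypothetical, and it assumes the script is run from the repository root with only the `--flatc` option shown above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script path relative to the repository root, as given in the steps above.
COMPILE_SCHEMA = Path("onnxruntime/core/flatbuffers/schema/compile_schema.py")


def build_compile_command(repo_root: Path, flatc: Path) -> List[str]:
    """Return the argument list for running compile_schema.py with a given flatc."""
    return [sys.executable, str(repo_root / COMPILE_SCHEMA), "--flatc", str(flatc)]


def regenerate_bindings(repo_root: Path, flatc: Path) -> None:
    """Run compile_schema.py; raise if flatc is missing or the script fails."""
    if not flatc.is_file():
        raise FileNotFoundError(f"flatc not found at {flatc}")
    subprocess.run(build_compile_command(repo_root, flatc), cwd=repo_root, check=True)
```

Checking for `flatc` before launching the script gives a clearer error when the path points at the wrong build configuration directory.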
## ORT FB format version history
In ort_format_version.h, see IsOrtModelVersionSupported() for the supported versions and
kOrtModelVersion for the current version.
### Version 1
History begins.
Initial support for FlatBuffers, including Model support and Graph support (Attributes, Tensors, Tensor Sequences, Maps and Sequences). Constant initializers are also supported. Constant nodes are converted to constant initializers in the ORT format.
### Version 2
Support for sparse initializers. Sparse initializers are stored within the ORT FlatBuffers format, including sparse initializers converted from a Constant node attribute.
### Version 3
Support for storing the graph_doc_string field in Model (ORT FlatBuffers format).
### Version 4
Update kernel def hashing to not depend on the ordering of type constraint types (NOT BACKWARDS COMPATIBLE).
### Version 5
Deprecate kernel def hashes and add KernelTypeStrResolver info to replace them (NOT BACKWARDS COMPATIBLE). The change to the ORT format does not itself break backwards compatibility, but ORT does not provide backwards compatibility for processing older models that lack KernelTypeStrResolver info.
The motivation for this update is to support additional execution providers with statically registered kernels. The original approach of using kernel def hashes is not as extensible because it requires the execution provider that provides the hashes to be enabled at model conversion time.
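The compatibility notes in the history above can be summarized with a small sketch. It is illustrative only and assumes that a version marked NOT BACKWARDS COMPATIBLE makes every earlier model version unloadable. The authoritative check is IsOrtModelVersionSupported() in ort_format_version.h, and the names `BREAKS_COMPATIBILITY` and `loadable_versions` are hypothetical.

```python
from typing import Dict, List

# Version history as described above: version number -> whether that version
# is marked NOT BACKWARDS COMPATIBLE.
BREAKS_COMPATIBILITY: Dict[int, bool] = {1: False, 2: False, 3: False, 4: True, 5: True}


def loadable_versions(runtime_version: int) -> List[int]:
    """Model versions a runtime could load, under the assumption stated above."""
    # Find the most recent breaking version at or below the runtime's version.
    breaking = [v for v, b in BREAKS_COMPATIBILITY.items() if b and v <= runtime_version]
    # If there is none, everything from the first version onward is loadable.
    oldest = max(breaking, default=min(BREAKS_COMPATIBILITY))
    return list(range(oldest, runtime_version + 1))
```

Under this assumption, a runtime at version 3 could load models from versions 1 through 3, while a runtime at version 5 could load only version 5 models.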