# ORT Format Update in 1.13
In ONNX Runtime 1.13, there was a breaking change to the
[ORT format](https://onnxruntime.ai/docs/reference/ort-format-models.html) (version 5) in order to enable additional
execution providers with statically registered kernels in a minimal build.
More details can be found in the [ORT format schema README](../onnxruntime/core/flatbuffers/schema/README.md#version-5).
## Backwards Compatibility
### ONNX Runtime 1.13
Any older models (prior to ORT format version 5) will no longer work with ONNX Runtime 1.13 and must be re-converted.
### ONNX Runtime 1.14+
ONNX Runtime 1.14+ provides limited backwards compatibility for loading older models (prior to ORT format version 5).
- In a full build, older models may be loaded but any saved runtime optimizations will be ignored.
- In a minimal build, older models cannot be loaded.
An older model may be re-converted.
It is also possible to load an older ORT format model in a full build and then save it back out as an ORT format model.
This process may be used to upgrade an ORT format model. However, any saved runtime optimizations from the older model
will be ignored.
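
The compatibility rules above, and the load-then-save upgrade path, can be sketched in Python roughly as follows. This is a minimal sketch, not part of ONNX Runtime: the helper names `can_load_pre_v5_model` and `upgrade_ort_model` are hypothetical, and the session config keys `session.load_model_format` and `session.save_model_format` are assumed to behave as described in the ONNX Runtime session options documentation. The upgrade helper requires a full build of ONNX Runtime 1.14 or later.

```python
from pathlib import Path


def can_load_pre_v5_model(ort_version: tuple[int, int], minimal_build: bool) -> bool:
    """Return True if this build can load an ORT format model older than version 5.

    Hypothetical helper that only encodes the rules listed in this document.
    """
    if ort_version < (1, 13):
        raise ValueError("the rules here only cover ONNX Runtime 1.13 and later")
    if ort_version == (1, 13):
        # 1.13: older models no longer work and must be re-converted.
        return False
    # 1.14+: only a full build can load older models, and any saved
    # runtime optimizations in them are ignored.
    return not minimal_build


def upgrade_ort_model(old_model: Path, new_model: Path) -> None:
    """Load an older ORT format model in a full build and save it back out."""
    # Imported lazily so the rule helper above works without onnxruntime installed.
    import onnxruntime as ort

    so = ort.SessionOptions()
    # Assumption: these config keys force ORT format for loading and saving.
    so.add_session_config_entry("session.load_model_format", "ORT")
    so.add_session_config_entry("session.save_model_format", "ORT")
    so.optimized_model_filepath = str(new_model)
    # Creating the session loads the model and writes it to optimized_model_filepath.
    ort.InferenceSession(str(old_model), sess_options=so,
                         providers=["CPUExecutionProvider"])
```

For example, `can_load_pre_v5_model((1, 14), minimal_build=True)` returns `False`, which matches the rule that minimal builds cannot load older models.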
## Re-converting an ORT format model
Please refer to the
[conversion instructions](https://onnxruntime.ai/docs/reference/ort-format-models.html#convert-onnx-models-to-ort-format)
for how to convert an ONNX model to ORT format.
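
As a rough illustration, re-conversion can be scripted by invoking the `onnxruntime.tools.convert_onnx_models_to_ort` module that ships with the ONNX Runtime Python package. The sketch below assumes that package (version 1.13 or later) is installed and that the original ONNX model is still available. The function names `build_convert_command` and `reconvert` are hypothetical, and only the required positional argument (a model file or directory) is passed.

```python
import subprocess
import sys
from pathlib import Path


def build_convert_command(onnx_path: Path) -> list[str]:
    """Build the command line that runs the ORT format conversion tool."""
    return [
        sys.executable,
        "-m",
        "onnxruntime.tools.convert_onnx_models_to_ort",
        str(onnx_path),  # an ONNX model file, or a directory containing models
    ]


def reconvert(onnx_path: Path) -> None:
    """Re-convert an ONNX model to the current ORT format."""
    # check=True raises CalledProcessError if the conversion tool fails.
    subprocess.run(build_convert_command(onnx_path), check=True)
```

Calling `reconvert(Path("model.onnx"))` runs the converter with the same Python interpreter as the calling script, so the output uses the ORT format version of the installed ONNX Runtime.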