### Description update with ONNX 1.16.0 branch according to https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md ONNX 1.16.0 release notes: https://github.com/onnx/onnx/releases/tag/v1.16.0 #### Updated ops for CPU EP: - DequantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block dequantization support - QuantizeLinear(21) - Added int16 and uint16 support + various optimizer tests - Missing int4 and uint4 support - Missing block quantization support - Cast(21) - Missing int4 and uint4 support - CastLike(21) - Missing int4 and uint4 support - ConstantOfShape(21) - Missing int4 and uint4 support - Identity(21) - Missing int4 and uint4 support - If(21) - Missing int4 and uint4 support - Loop(21) - Missing int4 and uint4 support - Reshape(21) - Missing int4 and uint4 support - Scan(21) - Missing int4 and uint4 support - Shape(21) - Missing int4 and uint4 support - Size(21) - Missing int4 and uint4 support - Flatten(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Pad(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Squeeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Transpose(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support - Unsqueeze(21) - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support #### Unimplemented opset 21 features/ops - int4 and uint4 data type - QLinearMatMul(21) - GroupNormalization(21) - ai.onnx.ml.TreeEnsemble(5) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> ### Disabled tests #### ORT Training orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py - test_ort_custom_ops: Potential shape inference bug for custom ops #### Python quantization unit tests test/onnx/python/quantization (shape inference bug) - test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16 - test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16 - test_op_gemm.py: test_quantize_qop_gemm_s8s8 - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same - test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3 - test_op_matmul.py: test_quantize_matmul_u8u8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16 - test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy - test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile - test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution - test_op_relu.py: test_quantize_qop_relu_s8s8 #### ONNX tests - test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a maxpool output size bug and added this test. Enable this test when [ORT PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged. Refer to original [ONNX PR](https://github.com/onnx/onnx/pull/5741). - test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op ai.onnx.ml.TreeEnsemble - test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same - test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same - test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same - test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4 yet - test_cast_INT4_to_INT8_cpu: same - test_cast_UINT4_to_FLOAT_cpu: same - test_cast_UINT4_to_UINT8_cpu: same - test_cast_INT4_to_FLOAT_cuda - test_cast_INT4_to_INT8_cuda - test_cast_UINT4_to_FLOAT_cuda - test_cast_UINT4_to_UINT8_cuda - test_constantofshape_float_ones_cuda: ConstantOfShape(21) not implemented for cuda - test_constantofshape_int_shape_zero_cuda: same - test_constantofshape_int_zeros_cuda: same - test_flatten_axis0_cuda: Flatten(21) not implemented for cuda - test_flatten_axis1_cuda: same - test_flatten_axis2_cuda: same - test_flatten_axis3_cuda: same - test_flatten_default_axis_cuda: same - test_flatten_negative_axis1_cuda: same - test_flatten_negative_axis2_cuda: same - test_flatten_negative_axis3_cuda: same - test_flatten_negative_axis4_cuda: same - test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not implemented in ORT yet - test_qlinearmatmul_2D_int8_float32_cpu: same - test_qlinearmatmul_2D_uint8_float16_cpu: same - test_qlinearmatmul_2D_uint8_float32_cpu: same - test_qlinearmatmul_3D_int8_float16_cpu: same - test_qlinearmatmul_3D_int8_float32_cpu: same - test_qlinearmatmul_3D_uint8_float16_cpu: same - test_qlinearmatmul_3D_uint8_float32_cpu: same - test_qlinearmatmul_2D_int8_float16_cuda: same - test_qlinearmatmul_2D_int8_float32_cuda: same - test_qlinearmatmul_2D_uint8_float16_cuda: same - test_qlinearmatmul_2D_uint8_float32_cuda: same - test_qlinearmatmul_3D_int8_float16_cuda: same - test_qlinearmatmul_3D_int8_float32_cuda: same - test_qlinearmatmul_3D_uint8_float16_cuda: same - test_qlinearmatmul_3D_uint8_float32_cuda: same - test_size_cuda: Size(21) not implemented for cuda - test_size_example_cuda: same - test_dequantizelinear_blocked: Missing implementation for block dequant for DequantizeLinear(21) - test_quantizelinear_blocked_asymmetric: Missing implementation for block quant for QuantizeLinear(21) - test_quantizelinear_blocked_symmetric: Missing implementation for block quant for QuantizeLinear(21) --------- Signed-off-by: liqunfu <liqun.fu@microsoft.com> Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com> |
||
|---|---|---|
| .. | ||
| docs | ||
| lib | ||
| script | ||
| test | ||
| .gitignore | ||
| .npmignore | ||
| karma.conf.js | ||
| package-lock.json | ||
| package.json | ||
| README.md | ||
| tsconfig.json | ||
| types.d.ts | ||
ONNX Runtime Web
ONNX Runtime Web is a Javascript library for running ONNX models on browsers and on Node.js.
ONNX Runtime Web has adopted WebAssembly and WebGL technologies for providing an optimized ONNX model inference runtime for both CPUs and GPUs.
Why ONNX models
The Open Neural Network Exchange (ONNX) is an open standard for representing machine learning models. The biggest advantage of ONNX is that it allows interoperability across different open source AI frameworks, which itself offers more flexibility for AI frameworks adoption.
Why ONNX Runtime Web
With ONNX Runtime Web, web developers can score models directly on browsers with various benefits including reducing server-client communication and protecting user privacy, as well as offering install-free and cross-platform in-browser ML experience.
ONNX Runtime Web can run on both CPU and GPU. On CPU side, WebAssembly is adopted to execute the model at near-native speed. ONNX Runtime Web compiles the native ONNX Runtime CPU engine into WebAssembly backend by using Emscripten, so it supports most functionalities native ONNX Runtime offers, including full ONNX operator coverage, multi-threading, ONNX Runtime Quantization as well as ONNX Runtime Mobile. For performance acceleration with GPUs, ONNX Runtime Web leverages WebGL, a popular standard for accessing GPU capabilities. We are keeping improving op coverage and optimizing performance in WebGL backend.
See Compatibility and Operators Supported for a list of platforms and operators ONNX Runtime Web currently supports.
Usage
Refer to ONNX Runtime JavaScript examples for samples and tutorials.
Documents
Development
Refer to the following links for development information:
Compatibility
| OS/Browser | Chrome | Edge | Safari | Electron | Node.js |
|---|---|---|---|---|---|
| Windows 10 | wasm, webgl | wasm, webgl | - | wasm, webgl | wasm |
| macOS | wasm, webgl | wasm, webgl | wasm, webgl | wasm, webgl | wasm |
| Ubuntu LTS 18.04 | wasm, webgl | wasm, webgl | - | wasm, webgl | wasm |
| iOS | wasm, webgl | wasm, webgl | wasm, webgl | - | - |
| Android | wasm, webgl | wasm, webgl | - | - | - |
Operators
WebAssembly backend
ONNX Runtime Web currently support all operators in ai.onnx and ai.onnx.ml.
WebGL backend
ONNX Runtime Web currently supports a subset of operators in ai.onnx operator set. See webgl-operators.md for a complete, detailed list of which ONNX operators are supported by WebGL backend.
WebGPU backend
WebGPU backend is still an experimental feature. See webgpu-operators.md for a detailed list of which ONNX operators are supported by WebGPU backend.
License
License information can be found here.