## Supported Operators and Data Types

*This file is automatically generated from the registered kernels by [this script](https://github.com/microsoft/onnxruntime/blob/main/tools/python/gen_opkernel_doc.py).
Do not modify directly.*

## Execution Providers

- [CPUExecutionProvider](#cpuexecutionprovider)
- [CUDAExecutionProvider](#cudaexecutionprovider)
- [DmlExecutionProvider](#dmlexecutionprovider)
---------------
<a name="cpuexecutionprovider"/>

## Operators implemented by CPUExecutionProvider

| Op Name | Parameters | OpSet Version | Types Supported |
|---------|------------|---------------|-----------------|
|**Operator Domain:** *ai.onnx*||||
|Abs|*in* X:**T**<br> *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Acos|*in* input:**T**<br> *out* output:**T**|7+|**T** = tensor(float)|
|Acosh|*in* input:**T**<br> *out* output:**T**|9+|**T** = tensor(float)|
|Add|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||13|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Affine|*in* X:**T**<br> *out* Y:**T**|1+|**T** = tensor(float)|
|AffineGrid|*in* theta:**T1**<br> *in* size:**T2**<br> *out* grid:**T1**|20+|**T1** = tensor(double), tensor(float)<br/> **T2** = tensor(int64)|
|And|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T1**|7+|**T** = tensor(bool)<br/> **T1** = tensor(bool)|
|ArgMax|*in* data:**T**<br> *out* reduced:**tensor(int64)**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|ArgMin|*in* data:**T**<br> *out* reduced:**tensor(int64)**|13+|**T** = tensor(double), tensor(float), tensor(int32)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32)|
|||[1, 10]|**T** = tensor(float), tensor(int32)|
|Asin|*in* input:**T**<br> *out* output:**T**|7+|**T** = tensor(float)|
|Asinh|*in* input:**T**<br> *out* output:**T**|9+|**T** = tensor(float)|
|Atan|*in* input:**T**<br> *out* output:**T**|7+|**T** = tensor(float)|
|Atanh|*in* input:**T**<br> *out* output:**T**|9+|**T** = tensor(float)|
|AveragePool|*in* X:**T**<br> *out* Y:**T**|19+|**T** = tensor(float)|
|||[11, 18]|**T** = tensor(float)|
|||10|**T** = tensor(float)|
|||[7, 9]|**T** = tensor(float)|
|BatchNormalization|*in* X:**T**<br> *in* scale:**T**<br> *in* B:**T**<br> *in* input_mean:**U**<br> *in* input_var:**U**<br> *out* Y:**T**<br> *out* running_mean:**U**<br> *out* running_var:**U**<br><br> or<br><br> *in* X:**T**<br> *in* scale:**T**<br> *in* B:**T**<br> *in* mean:**T**<br> *in* var:**T**<br> *out* Y:**T**<br> *out* mean:**T**<br> *out* var:**T**<br> *out* saved_mean:**T**<br> *out* saved_var:**T**<br><br> or<br><br> *in* X:**T**<br> *in* scale:**T1**<br> *in* B:**T1**<br> *in* input_mean:**T2**<br> *in* input_var:**T2**<br> *out* Y:**T**<br> *out* running_mean:**T2**<br> *out* running_var:**T2**|15+|**T** = tensor(double), tensor(float)<br/> **T1** = tensor(double), tensor(float)<br/> **T2** = tensor(double), tensor(float)|
|||14|**T** = tensor(double), tensor(float)<br/> **U** = tensor(double), tensor(float)|
|||[9, 13]|**T** = tensor(double), tensor(float)|
|||[7, 8]|**T** = tensor(double), tensor(float)|
|BitShift|*in* X:**T**<br> *in* Y:**T**<br> *out* Z:**T**|11+|**T** = tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseAnd|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseNot|*in* X:**T**<br> *out* Y:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseOr|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseXor|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BlackmanWindow|*in* size:**T1**<br> *out* output:**T2**|17+|**T1** = tensor(int32), tensor(int64)<br/> **T2** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Cast|*in* input:**T1**<br> *out* output:**T2**|21+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[19, 20]|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 18]|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[6, 12]|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Ceil|*in* X:**T**<br> *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Celu|*in* X:**T**<br> *out* Y:**T**|12+|**T** = tensor(float)|
|Clip|*in* input:**T**<br> *in* min:**T**<br> *in* max:**T**<br> *out* output:**T**<br><br> or<br><br> *in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11|**T** = tensor(float)|
|||[6, 10]|**T** = tensor(float)|
|Col2Im|*in* input:**T**<br> *in* image_shape:**tensor(int64)**<br> *in* block_shape:**tensor(int64)**<br> *out* output:**T**|18+|**T** = tensor(float)|
|Compress|*in* input:**T**<br> *in* condition:**T1**<br> *out* output:**T**|11+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T1** = tensor(bool)|
|||[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T1** = tensor(bool)|
|Concat|*in* inputs:**T**<br> *out* concat_result:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[4, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ConcatFromSequence|*in* input_sequence:**S**<br> *out* concat_result:**T**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|ConstantOfShape|*in* input:**T1**<br> *out* output:**T2**|21+|**T1** = tensor(int64)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||20|**T1** = tensor(int64)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 19]|**T1** = tensor(int64)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Conv|*in* X:**T**<br> *in* W:**T**<br> *in* B:**T**<br> *out* Y:**T**|11+|**T** = tensor(float)|
|||[1, 10]|**T** = tensor(float)|
|ConvInteger|*in* x:**T1**<br> *in* w:**T2**<br> *in* x_zero_point:**T1**<br> *in* w_zero_point:**T2**<br> *out* y:**T3**|10+|**T1** = tensor(uint8)<br/> **T2** = tensor(uint8)<br/> **T3** = tensor(int32)|
|ConvTranspose|*in* X:**T**<br> *in* W:**T**<br> *in* B:**T**<br> *out* Y:**T**|11+|**T** = tensor(float)|
|||[1, 10]|**T** = tensor(float)|
|Cos|*in* input:**T**<br> *out* output:**T**|7+|**T** = tensor(float)|
|Cosh|*in* input:**T**<br> *out* output:**T**|9+|**T** = tensor(float)|
|Crop|*in* input:**T**<br> *out* output:**T**|1+|**T** = tensor(float)|
|CumSum|*in* x:**T**<br> *in* axis:**T2**<br> *out* y:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T2** = tensor(int32), tensor(int64)|
|||[11, 13]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T2** = tensor(int32), tensor(int64)|
|DFT|*in* input:**T1**<br> *in* dft_length:**T2**<br> *in* axis:**tensor(int64)**<br> *out* output:**T1**<br><br> or<br><br> *in* input:**T1**<br> *in* dft_length:**T2**<br> *out* output:**T1**|20+|**T1** = tensor(double), tensor(float)<br/> **T2** = tensor(int32), tensor(int64)|
|||[17, 19]|**T1** = tensor(double), tensor(float)<br/> **T2** = tensor(int32), tensor(int64)|
|DepthToSpace|*in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[11, 12]|**T** = tensor(double), tensor(float)|
|||[1, 10]|**T** = tensor(double), tensor(float)|
|DequantizeLinear|*in* x:**T**<br> *in* x_scale:**tensor(float)**<br> *in* x_zero_point:**T**<br> *out* y:**tensor(float)**<br><br> or<br><br> *in* x:**T1**<br> *in* x_scale:**T2**<br> *in* x_zero_point:**T1**<br> *out* y:**T2**|21+|**T1** = tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8)<br/> **T2** = tensor(float), tensor(float16)|
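As an illustration of the signature in the DequantizeLinear row above, per-tensor dequantization maps quantized integers back to floats via `y = (x - x_zero_point) * x_scale`. A minimal pure-Python sketch (not the ORT kernel itself; `dequantize_linear` is a hypothetical helper name):

```python
def dequantize_linear(x, x_scale, x_zero_point=0):
    """Per-tensor DequantizeLinear: y = (x - x_zero_point) * x_scale.

    `x` is a list of quantized integers (e.g. int8/uint8 values); the
    result is a list of floats, mirroring the T1 -> T2 mapping above.
    """
    return [(xi - x_zero_point) * x_scale for xi in x]

# uint8 data with zero point 128 and scale 0.5 maps back to floats:
print(dequantize_linear([126, 128, 130], 0.5, 128))  # → [-1.0, 0.0, 1.0]
```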
- test_quantizelinear_blocked_asymmetric: Missing implementation for
block quant for QuantizeLinear(21)
- test_quantizelinear_blocked_symmetric: Missing implementation for
block quant for QuantizeLinear(21)
---------
Signed-off-by: liqunfu <liqun.fu@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>
2024-04-12 16:46:49 +00:00
|||[19, 20]|**T1** = tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int32), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|||[13, 18]|**T** = tensor(int32), tensor(int8), tensor(uint8)|
|||[10, 12]|**T** = tensor(int32), tensor(int8), tensor(uint8)|
|Det|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(float)|
|Div|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||13|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Dropout|*in* data:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T2**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T1**|13+|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(double), tensor(float)< br /> **T2** = tensor(bool)|
|||12|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(double), tensor(float)< br /> **T2** = tensor(bool)|
|||[10, 11]|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bool)|
|||[7, 9]|**T** = tensor(double), tensor(float), tensor(float16)|
|DynamicQuantizeLinear|*in* x:**T1**< br > *out* y:**T2**< br > *out* y_scale:**tensor(float)**< br > *out* y_zero_point:**T2**|11+|**T2** = tensor(uint8)|
|DynamicSlice|*in* data:**T**< br > *in* starts:**Tind**< br > *in* ends:**Tind**< br > *in* axes:**Tind**< br > *out* output:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|Einsum|*in* Inputs:**T**< br > *out* Output:**T**|12+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Elu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float)|
|Equal|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|19+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string)< br /> **T1** = tensor(bool)|
|||[13, 18]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[11, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[7, 10]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|Erf|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float)|
|||[9, 12]|**T** = tensor(float)|
|Exp|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Expand|*in* input:**T**< br > *in* shape:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[8, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|EyeLike|*in* input:**T1**< br > *out* output:**T2**|9+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint64)< br /> **T2** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint64)|
|Flatten|*in* input:**T**< br > *out* output:**T**|21+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 20]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 8]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Floor|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|GRU|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(int32)|
|Gather|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherElements|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **indices** = tensor(int64)|
|||12|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **indices** = tensor(int64)|
|||11|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **indices** = tensor(int64)|
|Gelu|*in* X:**T**< br > *out* Y:**T**|20+|**T** = tensor(float)|
|Gemm|*in* A:**T**< br > *in* B:**T**< br > *in* C:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[11, 12]|**T** = tensor(double), tensor(float)|
|||[9, 10]|**T** = tensor(double), tensor(float)|
|||[7, 8]|**T** = tensor(double), tensor(float)|
|GlobalAveragePool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|GlobalLpPool|*in* X:**T**< br > *out* Y:**T**|2+|**T** = tensor(float)|
|GlobalMaxPool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Greater|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[7, 8]|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(bool)|
|GreaterOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[12, 15]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|GridSample|*in* X:**T1**< br > *in* grid:**T2**< br > *out* Y:**T1**|20+|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(double), tensor(float)|
|||[16, 19]|**T1** = tensor(float)< br /> **T2** = tensor(float)|
|HammingWindow|*in* size:**T1**< br > *out* output:**T2**|17+|**T1** = tensor(int32), tensor(int64)< br /> **T2** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|HannWindow|*in* size:**T1**< br > *out* output:**T2**|17+|**T1** = tensor(int32), tensor(int64)< br /> **T2** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|HardSigmoid|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float)|
|Hardmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float)|
|||[11, 12]|**T** = tensor(float)|
|||[1, 10]|**T** = tensor(float)|
|Identity|*in* input:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**V**< br > *out* output:**V**|21+|**V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[19, 20]|**V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[14, 15]|**V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|If|*in* cond:**B**< br > *out* outputs:**V**|21+|**B** = tensor(bool)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[19, 20]|**B** = tensor(bool)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**B** = tensor(bool)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 15]|**B** = tensor(bool)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ImageScaler|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|InstanceNormalization|*in* input:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *out* output:**T**|6+|**T** = tensor(float)|
|IsInf|*in* X:**T1**< br > *out* Y:**T2**|20+|**T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)< br /> **T2** = tensor(bool)|
|||[10, 19]|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(bool)|
|IsNaN|*in* X:**T1**< br > *out* Y:**T2**|20+|**T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)< br /> **T2** = tensor(bool)|
|||[13, 19]|**T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||[9, 12]|**T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|LRN|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float)|
|||[1, 12]|**T** = tensor(float)|
|LSTM|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|14+|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(int32)|
|LayerNormalization|*in* X:**T**< br > *in* Scale:**T**< br > *in* B:**T**< br > *out* Y:**T**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* Scale:**V**< br > *in* B:**V**< br > *out* Y:**V**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**|17+|**T** = tensor(double), tensor(float)< br /> **U** = tensor(float)|
|||[1, 16]|**T** = tensor(double), tensor(float)< br /> **U** = tensor(double), tensor(float)< br /> **V** = tensor(double), tensor(float)|
|LeakyRelu|*in* X:**T**< br > *out* Y:**T**|16+|**T** = tensor(float)|
|||[6, 15]|**T** = tensor(float)|
|Less|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[7, 8]|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(bool)|
|LessOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[12, 15]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|Log|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|LogSoftmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[11, 12]|**T** = tensor(double), tensor(float)|
|||[1, 10]|**T** = tensor(double), tensor(float)|
|Loop|*in* M:**I**< br > *in* cond:**B**< br > *in* v_initial:**V**< br > *out* v_final_and_scan_outputs:**V**|21+|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[19, 20]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 15]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|LpNormalization|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float)|
|LpPool|*in* X:**T**< br > *out* Y:**T**|18+|**T** = tensor(float)|
|||[11, 17]|**T** = tensor(float)|
|||[2, 10]|**T** = tensor(float)|
|MatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[1, 8]|**T** = tensor(double), tensor(float)|
|MatMulInteger|*in* A:**T1**< br > *in* B:**T2**< br > *in* a_zero_point:**T1**< br > *in* b_zero_point:**T2**< br > *out* Y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int32)|
|Max|*in* data_0:**T**< br > *out* max:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[8, 11]|**T** = tensor(double), tensor(float)|
|||[6, 7]|**T** = tensor(float)|
|MaxPool|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**< br > *out* Indices:**I**|12+|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(int8), tensor(uint8)|
|||[8, 11]|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float)|
|||[1, 7]|**T** = tensor(float)|
|MaxRoiPool|*in* X:**T**< br > *in* rois:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|MaxUnpool|*in* X:**T1**< br > *in* I:**T2**< br > *in* output_shape:**T2**< br > *out* output:**T1**|11+|**T1** = tensor(float)< br /> **T2** = tensor(int64)|
|||[9, 10]|**T1** = tensor(float)< br /> **T2** = tensor(int64)|
|Mean|*in* data_0:**T**< br > *out* mean:**T**|13+|**T** = tensor(float)|
|||[8, 12]|**T** = tensor(float)|
|||[6, 7]|**T** = tensor(float)|
|MeanVarianceNormalization|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float)|
|||[9, 12]|**T** = tensor(float)|
|||[1, 8]|**T** = tensor(float)|
|MelWeightMatrix|*in* num_mel_bins:**T1**< br > *in* dft_length:**T1**< br > *in* sample_rate:**T1**< br > *in* lower_edge_hertz:**T2**< br > *in* upper_edge_hertz:**T2**< br > *out* output:**T3**|17+|**T1** = tensor(int32), tensor(int64)< br /> **T2** = tensor(float)< br /> **T3** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Min|*in* data_0:**T**< br > *out* min:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[8, 11]|**T** = tensor(double), tensor(float)|
|||[6, 7]|**T** = tensor(float)|
|Mod|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[10, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Mul|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||13|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Multinomial|*in* input:**T1**< br > *out* output:**T2**|7+|**T1** = tensor(float)< br /> **T2** = tensor(int32), tensor(int64)|
|Neg|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8)|
|NonZero|*in* X:**T**< br > *out* Y:**tensor(int64)**|13+|**T** = tensor(bool), tensor(float), tensor(int32), tensor(int64), tensor(uint8)|
|||[9, 12]|**T** = tensor(bool), tensor(float), tensor(int32), tensor(int64), tensor(uint8)|
|Not|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(bool)|
|OneHot|*in* indices:**T1**< br > *in* depth:**T2**< br > *in* values:**T3**< br > *out* output:**T3**|11+|**T1** = tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(float), tensor(int32), tensor(int64)< br /> **T3** = tensor(float), tensor(int32), tensor(int64), tensor(string)|
|||[9, 10]|**T1** = tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(float), tensor(int32), tensor(int64)< br /> **T3** = tensor(float), tensor(int32), tensor(int64), tensor(string)|
|Optional|*in* input:**V**< br > *out* output:**O**|15+|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|OptionalGetElement|*in* input:**O**< br > *out* output:**V**|18+|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[15, 17]|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|OptionalHasElement|*in* input:**O**< br > *out* output:**B**|18+|**B** = tensor(bool)< br /> **O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[15, 17]|**B** = tensor(bool)< br /> **O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))|
|Or|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
|PRelu|*in* X:**T**< br > *in* slope:**T**< br > *out* Y:**T**|16+|**T** = tensor(float)|
|||[9, 15]|**T** = tensor(float)|
|||[7, 8]|**T** = tensor(float)|
|Pad|*in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *in* axes:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|21+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[19, 20]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||18|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 17]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[2, 10]|**T** = tensor(double), tensor(float)|
|ParametricSoftplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Pow|*in* X:**T**< br > *in* Y:**T**< br > *out* Z:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* Y:**T1**< br > *out* Z:**T**|15+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[13, 14]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||12|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 11]|**T** = tensor(double), tensor(float)|
|QLinearConv|*in* x:**T1**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T1**< br > *in* w:**T2**< br > *in* w_scale:**tensor(float)**< br > *in* w_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *in* B:**T4**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)< br /> **T4** = tensor(int32)|
|QLinearMatMul|*in* a:**T1**< br > *in* a_scale:**TS**< br > *in* a_zero_point:**T1**< br > *in* b:**T2**< br > *in* b_scale:**TS**< br > *in* b_zero_point:**T2**< br > *in* y_scale:**TS**< br > *in* y_zero_point:**T3**< br > *out* y:**T3**< br >< br > or< br >< br > *in* a:**T1**< br > *in* a_scale:**tensor(float)**< br > *in* a_zero_point:**T1**< br > *in* b:**T2**< br > *in* b_scale:**tensor(float)**< br > *in* b_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**< br >< br > or< br >< br > *in* x:**T1**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|21+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8)|
- test_qlinearmatmul_2D_uint8_float32_cpu: same
- test_qlinearmatmul_3D_int8_float16_cpu: same
- test_qlinearmatmul_3D_int8_float32_cpu: same
- test_qlinearmatmul_3D_uint8_float16_cpu: same
- test_qlinearmatmul_3D_uint8_float32_cpu: same
- test_qlinearmatmul_2D_int8_float16_cuda: same
- test_qlinearmatmul_2D_int8_float32_cuda: same
- test_qlinearmatmul_2D_uint8_float16_cuda: same
- test_qlinearmatmul_2D_uint8_float32_cuda: same
- test_qlinearmatmul_3D_int8_float16_cuda: same
- test_qlinearmatmul_3D_int8_float32_cuda: same
- test_qlinearmatmul_3D_uint8_float16_cuda: same
- test_qlinearmatmul_3D_uint8_float32_cuda: same
- test_size_cuda: Size(21) not implemented for cuda
- test_size_example_cuda: same
- test_dequantizelinear_blocked: Missing implementation for block
dequant for DequantizeLinear(21)
- test_quantizelinear_blocked_asymmetric: Missing implementation for
block quant for QuantizeLinear(21)
- test_quantizelinear_blocked_symmetric: Missing implementation for
block quant for QuantizeLinear(21)
---------
Signed-off-by: liqunfu <liqun.fu@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>
2024-04-12 16:46:49 +00:00
|||[19, 20]|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int8), tensor(uint8)|
|||[13, 18]|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|||[10, 12]|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|RNN|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(float)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(float)< br /> **T1** = tensor(int32)|
|RandomNormal|*out* output:**T**|1+|**T** = tensor(double), tensor(float)|
|RandomNormalLike|*in* input:**T1**< br > *out* output:**T2**|1+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(double), tensor(float)|
|RandomUniform|*out* output:**T**|1+|**T** = tensor(double), tensor(float)|
|RandomUniformLike|*in* input:**T1**< br > *out* output:**T2**|1+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(double), tensor(float)|
|Range|*in* start:**T**< br > *in* limit:**T**< br > *in* delta:**T**< br > *out* output:**T**|11+|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64)|
|Reciprocal|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|ReduceL1|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int64)|
|ReduceL2|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int64)|
|ReduceLogSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int64)|
|ReduceLogSumExp|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|ReduceMax|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|20+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||[18, 19]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||11|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|ReduceMean|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(int32)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32)|
|ReduceMin|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|20+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||[18, 19]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||11|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|ReduceProd|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int64)|
|ReduceSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|ReduceSumSquare|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|RegexFullMatch|*in* X:**T1**< br > *out* Y:**T2**|20+|**T1** = tensor(string)< br /> **T2** = tensor(bool)|
|Relu|*in* X:**T**< br > *out* Y:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int8)|
|||13|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Reshape|*in* data:**T**< br > *in* shape:**tensor(int64)**< br > *out* reshaped:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reshaped:**T**|21+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[19, 20]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[14, 18]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||13|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[5, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[1, 4]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Resize|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T1**< br > *in* roi:**T2**< br > *in* scales:**tensor(float)**< br > *in* sizes:**tensor(int64)**< br > *out* Y:**T1**|19+|**T1** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||18|**T1** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[13, 17]|**T1** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[11, 12]|**T1** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||10|**T** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|ReverseSequence|*in* input:**T**< br > *in* sequence_lens:**tensor(int64)**< br > *out* Y:**T**|10+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|RoiAlign|*in* X:**T1**< br > *in* rois:**T1**< br > *in* batch_indices:**T2**< br > *out* Y:**T1**|16+|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(int64)|
|||[10, 15]|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(int64)|
|Round|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|STFT|*in* signal:**T1**< br > *in* frame_step:**T2**< br > *in* window:**T1**< br > *in* frame_length:**T2**< br > *out* output:**T1**|17+|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(int32), tensor(int64)|
2021-06-02 07:47:40 +00:00
|Scale|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|ScaledTanh|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|Scan|*in* initial_state_and_scan_inputs:**V**< br > *out* final_state_and_scan_outputs:**V**< br >< br > or< br >< br > *in* sequence_lens:**I**< br > *in* initial_state_and_scan_inputs:**V**< br > *out* final_state_and_scan_outputs:**V**|21+|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[19, 20]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 15]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 10]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8|**I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Scatter|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterElements|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[16, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[13, 15]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *in* updates:**T**< br > *out* output:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 15]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Selu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float)|
|SequenceAt|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* tensor:**T**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceConstruct|*in* inputs:**T**< br > *out* output_sequence:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceEmpty|*out* output:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceErase|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceInsert|*in* input_sequence:**S**< br > *in* tensor:**T**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceLength|*in* input_sequence:**S**< br > *out* length:**I**|11+|**I** = tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|Shape|*in* data:**T**< br > *out* shape:**T1**|21+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[19, 20]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[15, 18]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[13, 14]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Shrink|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sigmoid|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Sign|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 12]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SimplifiedLayerNormalization|*in* X:**T**< br > *in* scale:**V**< br > *out* Y:**V**< br > *out* inv_std_var:**U**|1+|**T** = tensor(double), tensor(float)< br /> **U** = tensor(double), tensor(float)< br /> **V** = tensor(double), tensor(float)|
|Sin|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(double), tensor(float)|
|Sinh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float)|
|Size|*in* data:**T**< br > *out* size:**T1**|21+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[19, 20]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[13, 18]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[1, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Slice|*in* data:**T**< br > *in* starts:**Tind**< br > *in* ends:**Tind**< br > *in* axes:**Tind**< br > *in* steps:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||10|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[1, 9]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Softmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[11, 12]|**T** = tensor(double), tensor(float)|
|||[1, 10]|**T** = tensor(double), tensor(float)|
|Softplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Softsign|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|SpaceToDepth|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[1, 12]|**T** = tensor(double), tensor(float)|
|Split|*in* input:**T**< br > *in* split:**T**< br > *out* outputs...:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* split:**tensor(int64)**< br > *out* outputs:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* outputs:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[2, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SplitToSequence|*in* input:**T**< br > *in* split:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(string)|
|Sqrt|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Squeeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* squeezed:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* squeezed:**T**|21+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 20]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|StringConcat|*in* X:**T**< br > *in* Y:**T**< br > *out* Z:**T**|20+|**T** = tensor(string)|
|StringNormalizer|*in* X:**tensor(string)**< br > *out* Y:**tensor(string)**|10+|**X** = tensor(string)|
|StringSplit|*in* X:**T1**< br > *out* Y:**T2**< br > *out* Z:**T3**|20+|**T1** = tensor(string)< br /> **T2** = tensor(string)< br /> **T3** = tensor(int64)|
|Sub|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||13|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Sum|*in* data_0:**T**< br > *out* sum:**T**|13+|**T** = tensor(double), tensor(float)|
|||[8, 12]|**T** = tensor(double), tensor(float)|
|||[6, 7]|**T** = tensor(double), tensor(float)|
|Tan|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float)|
|Tanh|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|TfIdfVectorizer|*in* X:**T**< br > *out* Y:**T1**|9+|**T** = tensor(int32), tensor(int64), tensor(string)< br /> **T1** = tensor(float)|
|ThresholdedRelu|*in* X:**T**< br > *out* Y:**T**|10+|**T** = tensor(float)|
|||[1, 9]|**T** = tensor(float)|
|Tile|*in* input:**T**< br > *in* repeats:**T1**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* tiles:**T**< br > *in* axis:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[6, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|TopK|*in* X:**T**< br > *in* K:**tensor(int64)**< br > *out* Values:**T**< br > *out* Indices:**I**< br >< br > or< br >< br > *in* X:**T**< br > *out* Values:**T**< br > *out* Indices:**I**|11+|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||10|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float)|
|||[1, 9]|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float)|
|Transpose|*in* data:**T**< br > *out* transposed:**T**|21+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int4), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint4), tensor(uint64), tensor(uint8)|
|||[13, 20]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Trilu|*in* input:**T**< br > *in* k:**tensor(int64)**< br > *out* output:**T**|14+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int64)|
|Unique|*in* X:**T**< br > *out* Y:**T**< br > *out* indices:**tensor(int64)**< br > *out* inverse_indices:**tensor(int64)**< br > *out* counts:**tensor(int64)**|11+|**T** = tensor(double), tensor(float), tensor(int64), tensor(int8), tensor(string)|
|Unsqueeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* expanded:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* expanded:**T**|21+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 20]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Upsample|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**|9|**T** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[7, 8]|**T** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|Where|*in* condition:**B**< br > *in* X:**T**< br > *in* Y:**T**< br > *out* output:**T**|16+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string), tensor(uint8)|
|||[9, 15]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string), tensor(uint8)|
|Xor|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
| |
| |
|**Operator Domain:** *ai.onnx.ml* ||||
|ArrayFeatureExtractor|*in* X:**T**< br > *in* Y:**tensor(int64)**< br > *out* Z:**T**|1+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string)|
|Binarizer|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|CastMap|*in* X:**T1**< br > *out* Y:**T2**|1+|**T1** = map(int64,tensor(float)), map(int64,tensor(string))< br /> **T2** = tensor(float), tensor(int64), tensor(string)|
|CategoryMapper|*in* X:**T1**< br > *out* Y:**T2**|1+|**T1** = tensor(int64), tensor(string)< br /> **T2** = tensor(int64), tensor(string)|
|DictVectorizer|*in* X:**T1**< br > *out* Y:**T2**|1+|**T1** = map(int64,tensor(double)), map(int64,tensor(float)), map(int64,tensor(string)), map(string,tensor(double)), map(string,tensor(float)), map(string,tensor(int64))< br /> **T2** = tensor(double), tensor(float), tensor(int64), tensor(string)|
|FeatureVectorizer|*in* X:**T1**< br > *out* Y:**tensor(float)**|1+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Imputer|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(int64)|
|LabelEncoder|*in* X:**T1**< br > *out* Y:**T2**|4+|**T1** = tensor(double), tensor(float), tensor(int64), tensor(string)< br /> **T2** = tensor(double), tensor(float), tensor(int16), tensor(int64), tensor(string)|
|||[2, 3]|**T1** = tensor(float), tensor(int64), tensor(string)< br /> **T2** = tensor(float), tensor(int64), tensor(string)|
|||1|**T1** = tensor(int64), tensor(string)< br /> **T2** = tensor(int64), tensor(string)|
|LinearClassifier|*in* X:**T1**< br > *out* Y:**T2**< br > *out* Z:**tensor(float)**|1+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(int64), tensor(string)|
|LinearRegressor|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(float)|
|Normalizer|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|OneHotEncoder|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(double), tensor(float), tensor(int64), tensor(string)|
|SVMClassifier|*in* X:**T1**< br > *out* Y:**T2**< br > *out* Z:**tensor(float)**|1+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(int64), tensor(string)|
|SVMRegressor|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(float)|
|Scaler|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|TreeEnsembleClassifier|*in* X:**T1**< br > *out* Y:**T2**< br > *out* Z:**tensor(float)**|3+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(int64), tensor(string)|
|||[1, 2]|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(int64), tensor(string)|
|TreeEnsembleRegressor|*in* X:**T**< br > *out* Y:**tensor(float)**|3+|**T** = tensor(double), tensor(float)|
|||[1, 2]|**T** = tensor(double), tensor(float)|
|ZipMap|*in* X:**tensor(float)**< br > *out* Z:**T**|1+|**T** = seq(map(int64,tensor(float))), seq(map(string,tensor(float)))|
| |
| |
|**Operator Domain:** *com.microsoft* ||||
|Attention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* mask_index:**M**< br > *in* past:**T**< br > *in* relative_position_bias:**T**< br > *in* past_sequence_length:**M**< br > *out* output:**T**< br > *out* present:**T**|1+|**T** = tensor(float)|
|AttnLSTM|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *in* QW:**T**< br > *in* MW:**T**< br > *in* V:**T**< br > *in* M:**T**< br > *in* memory_seq_lens:**T1**< br > *in* AW:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|1+|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(int32)|
|BeamSearch|*in* input_ids:**F**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* num_beams:**I**< br > *in* num_return_sequences:**I**< br > *in* length_penalty:**T**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**M**< br > *in* prefix_vocab_mask:**M**< br > *in* attention_mask:**I**< br > *in* decoder_input_ids:**I**< br > *in* logits_processor:**I**< br > *out* sequences:**I**< br > *out* sequences_scores:**T**< br > *out* scores:**T**|1+|**T** = tensor(float)|
|BiasGelu|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float)|
|BifurcationDetector|*in* src_tokens:**T**< br > *in* cur_tokens:**T**< br > *in* prev_suffix_match_idx:**T**< br > *in* pred_tokens:**T**< br > *out* tokens:**T**< br > *out* suffix_match_idx:**T**|1+|**T** = tensor(int64)|
|CDist|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(double), tensor(float)|
|ConvTransposeWithDynamicPads|*in* X:**T**< br > *in* W:**T**< br > *in* Pads:**tensor(int64)**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|CropAndResize|*in* X:**T1**< br > *in* rois:**T1**< br > *in* batch_indices:**T2**< br > *in* crop_size:**T2**< br > *out* Y:**T1**|1+|**T1** = tensor(float)< br /> **T2** = tensor(int32)|
|DequantizeLinear|*in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|1+|**T1** = tensor(int16), tensor(int32), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8)< br /> **T2** = tensor(float)|
|DynamicQuantizeLSTM|*in* X:**T**< br > *in* W:**T2**< br > *in* R:**T2**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *in* W_scale:**T**< br > *in* W_zero_point:**T2**< br > *in* R_scale:**T**< br > *in* R_zero_point:**T2**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|1+|**T** = tensor(float)< br /> **T1** = tensor(int32)< br /> **T2** = tensor(int8), tensor(uint8)|
|DynamicQuantizeMatMul|*in* A:**T1**< br > *in* B:**T2**< br > *in* b_scale:**T1**< br > *in* b_zero_point:**T2**< br > *in* bias:**T1**< br > *out* Y:**T1**|1+|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|EmbedLayerNormalization|*in* input_ids:**T1**< br > *in* segment_ids:**T1**< br > *in* word_embedding:**T**< br > *in* position_embedding:**T**< br > *in* segment_embedding:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* mask:**T1**< br > *in* position_ids:**T1**< br > *out* output:**T**< br > *out* mask_index:**T1**< br > *out* embedding_sum:**T**|1+|**T** = tensor(float)|
|ExpandDims|*in* X:**T**< br > *in* axis:**tensor(int32)**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **axis** = tensor(int32)|
|FastGelu|*in* X:**T**< br > *in* bias:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|FusedConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *in* Z:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|FusedGemm|*in* A:**T**< br > *in* B:**T**< br > *in* C:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|FusedMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|GatherND|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|Gelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|GreedySearch|*in* input_ids:**I**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**I**< br > *in* prefix_vocab_mask:**I**< br > *in* attention_mask:**I**< br > *out* sequences:**I**|1+|**T** = tensor(float)|
|GridSample|*in* X:**T1**< br > *in* Grid:**T1**< br > *out* Y:**T2**|1+|**T1** = tensor(float)< br /> **T2** = tensor(float)|
|GroupQueryAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *in* seqlens_k:**M**< br > *in* total_sequence_length:**M**< br > *in* cos_cache:**T**< br > *in* sin_cache:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**M** = tensor(int32)< br /> **T** = tensor(float)|
|Inverse|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|MatMulBnb4|*in* A:**T1**< br > *in* B:**T2**< br > *in* absmax:**T1**< br > *out* Y:**T1**|1+|**T1** = tensor(float)< br /> **T2** = tensor(uint8)|
|MatMulFpQ4|*in* A:**T1**< br > *in* B:**T2**< br > *in* B_shape:**T3**< br > *out* Y:**T1**|1+|**T1** = tensor(float)< br /> **T2** = tensor(uint8)< br /> **T3** = tensor(int64)|
|MatMulInteger16|*in* A:**T1**< br > *in* B:**T2**< br > *out* Y:**T3**|1+|**T1** = tensor(int16)< br /> **T2** = tensor(int16)< br /> **T3** = tensor(int32)|
|MatMulIntegerToFloat|*in* A:**T1**< br > *in* B:**T2**< br > *in* a_scale:**T3**< br > *in* b_scale:**T3**< br > *in* a_zero_point:**T1**< br > *in* b_zero_point:**T2**< br > *in* bias:**T3**< br > *out* Y:**T3**|1+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(float)|
|MatMulNBits|*in* A:**T1**< br > *in* B:**T2**< br > *in* scales:**T1**< br > *in* zero_points:**T3**< br > *in* g_idx:**T4**< br > *in* bias:**T1**< br > *out* Y:**T1**|1+|**T1** = tensor(float)< br /> **T2** = tensor(uint8)< br /> **T3** = tensor(float), tensor(uint8)< br /> **T4** = tensor(int32)|
|MaxpoolWithMask|*in* X:**T**< br > *in* M:**tensor(int32)**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|MultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* bias:**T**< br > *in* key_padding_mask:**M**< br > *in* relative_position_bias:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**T** = tensor(float)|
|MurmurHash3|*in* X:**T1**< br > *out* Y:**T2**|1+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(int32), tensor(uint32)|
|NGramRepeatBlock|*in* input_ids:**Tid**< br > *in* scores:**T**< br > *out* scores_out:**T**|1+|**T** = tensor(float)< br /> **Tid** = tensor(int64)|
|NhwcMaxPool|*in* x:**T**< br > *out* y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|Pad|*in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* value:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|QAttention|*in* input:**T1**< br > *in* weight:**T2**< br > *in* bias:**T3**< br > *in* input_scale:**T3**< br > *in* weight_scale:**T3**< br > *in* mask_index:**T4**< br > *in* input_zero_point:**T1**< br > *in* weight_zero_point:**T2**< br > *in* past:**T3**< br > *out* output:**T3**< br > *out* present:**T3**|1+|**T1** = tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(float)< br /> **T4** = tensor(int32)|
|QEmbedLayerNormalization|*in* input_ids:**T1**< br > *in* segment_ids:**T1**< br > *in* word_embedding_quant:**T2**< br > *in* position_embedding_quant:**T2**< br > *in* segment_embedding:**T2**< br > *in* gamma_quant:**T2**< br > *in* beta_quant:**T2**< br > *in* mask:**T1**< br > *in* word_embedding_scale:**T**< br > *in* position_embedding_scale:**T**< br > *in* segment_embedding_scale:**T**< br > *in* gamma_scale:**T**< br > *in* beta_scale:**T**< br > *in* word_embedding_zero_point:**T2**< br > *in* position_embedding_zero_point:**T2**< br > *in* segment_embedding_zero_point:**T2**< br > *in* gamma_zero_point:**T2**< br > *in* beta_zero_point:**T2**< br > *out* layernorm_out:**T**< br > *out* mask_index_out:**T1**|1+|**T** = tensor(float)|
|QGemm|*in* A:**TA**< br > *in* a_scale:**T**< br > *in* a_zero_point:**TA**< br > *in* B:**TB**< br > *in* b_scale:**T**< br > *in* b_zero_point:**TB**< br > *in* C:**TC**< br > *in* y_scale:**T**< br > *in* y_zero_point:**TYZ**< br > *out* Y:**TY**|1+|**T** = tensor(float)< br /> **TA** = tensor(int8), tensor(uint8)< br /> **TB** = tensor(int8), tensor(uint8)< br /> **TC** = tensor(int32)< br /> **TY** = tensor(float), tensor(int8), tensor(uint8)< br /> **TYZ** = tensor(int8), tensor(uint8)|
|QLinearAdd|*in* A:**T**< br > *in* A_scale:**tensor(float)**< br > *in* A_zero_point:**T**< br > *in* B:**T**< br > *in* B_scale:**tensor(float)**< br > *in* B_zero_point:**T**< br > *in* C_scale:**tensor(float)**< br > *in* C_zero_point:**T**< br > *out* C:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearConv|*in* x:**T1**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T1**< br > *in* w:**T2**< br > *in* w_scale:**tensor(float)**< br > *in* w_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *in* B:**T4**< br > *out* y:**T3**|1+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)< br /> **T4** = tensor(int32)|
|QLinearLeakyRelu|*in* X:**T**< br > *in* X_scale:**tensor(float)**< br > *in* X_zero_point:**T**< br > *in* Y_scale:**tensor(float)**< br > *in* Y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearMul|*in* A:**T**< br > *in* A_scale:**tensor(float)**< br > *in* A_zero_point:**T**< br > *in* B:**T**< br > *in* B_scale:**tensor(float)**< br > *in* B_zero_point:**T**< br > *in* C_scale:**tensor(float)**< br > *in* C_zero_point:**T**< br > *out* C:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearSigmoid|*in* X:**T**< br > *in* X_scale:**tensor(float)**< br > *in* X_zero_point:**T**< br > *in* Y_scale:**tensor(float)**< br > *in* Y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearSoftmax|*in* X:**T**< br > *in* X_scale:**tensor(float)**< br > *in* x_zero_point:**T**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearWhere|*in* condition:**B**< br > *in* X:**T**< br > *in* x_scale:**TF**< br > *in* x_zero_point:**T**< br > *in* Y:**T**< br > *in* y_scale:**TF**< br > *in* y_zero_point:**T**< br > *in* z_scale:**TF**< br > *in* z_zero_point:**T**< br > *out* Z:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|1+|**T1** = tensor(float)< br /> **T2** = tensor(int16), tensor(int4), tensor(int8), tensor(uint16), tensor(uint4), tensor(uint8)|
|QuickGelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Range|*in* start:**T**< br > *in* limit:**T**< br > *in* delta:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64)|
|RotaryEmbedding|*in* input:**T**< br > *in* position_ids:**M**< br > *in* cos_cache:**T**< br > *in* sin_cache:**T**< br > *out* output:**T**|1+|**M** = tensor(int64)< br /> **T** = tensor(float)|
|SampleOp|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Sampling|*in* input_ids:**I**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**I**< br > *in* prefix_vocab_mask:**I**< br > *in* attention_mask:**I**< br > *in* presence_mask:**I**< br > *in* seed:**I**< br > *out* sequences:**I**< br > *out* filtered_logits:**T**|1+|**T** = tensor(float)|
|SkipLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(double), tensor(float)|
|SkipSimplifiedLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(double), tensor(float)|
|SparseAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *in* block_row_indices:**M**< br > *in* block_col_indices:**M**< br > *in* total_sequence_length:**M**< br > *in* key_total_sequence_lengths:**M**< br > *in* cos_cache:**T**< br > *in* sin_cache:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**M** = tensor(int32)< br /> **T** = tensor(float)|
|SparseToDenseMatMul|*in* A:**T**< br > *in* B:**T1**< br > *out* Y:**T1**|1+|**T** = sparse_tensor(double), sparse_tensor(float), sparse_tensor(int32), sparse_tensor(int64), sparse_tensor(uint32), sparse_tensor(uint64)< br /> **T1** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Tokenizer|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(string)|
|TransposeMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Trilu|*in* X:**T**< br > *in* k:**tensor(int64)**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(int64)|
|Unique|*in* x:**T**< br > *out* y:**T**< br > *out* idx:**tensor(int64)**< br > *out* counts:**tensor(int64)**|1+|**T** = tensor(float)|
|WhisperBeamSearch|*in* input_ids:**F**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* num_beams:**I**< br > *in* num_return_sequences:**I**< br > *in* length_penalty:**T**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**M**< br > *in* prefix_vocab_mask:**M**< br > *in* attention_mask:**I**< br > *in* decoder_input_ids:**I**< br > *in* logits_processor:**I**< br > *in* cross_qk_layer_head:**I**< br > *in* extra_decoding_ids:**I**< br > *in* temperature:**T**< br > *out* sequences:**I**< br > *out* sequences_scores:**T**< br > *out* scores:**T**< br > *out* cross_qk:**V**< br > *out* non_speech_probs:**T**|1+|**T** = tensor(float)|
|WordConvEmbedding|*in* Sequence:**T**< br > *in* W:**T1**< br > *in* B:**T1**< br > *in* C:**T1**< br > *out* Y:**T1**|1+|**T** = tensor(int32)< br /> **T1** = tensor(float)|
| |
| |
|**Operator Domain:** *com.microsoft.nchwc* ||||
|AveragePool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Conv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *in* Sum:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|GlobalAveragePool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|GlobalMaxPool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|MaxPool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|ReorderInput|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|ReorderOutput|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Upsample|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
| |
| |
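Each "Types Supported" cell above packs an operator's type constraints into one field, e.g. `**T1** = tensor(float)<br/>**T2** = tensor(uint8)`: type variables separated by `<br/>`, each mapped to a comma-separated list of concrete tensor types. As a minimal sketch for consuming this generated table programmatically (the function name and usage are illustrative, not part of the generated file), such a cell can be parsed into a mapping from type variable to its allowed types:

```python
import re

def parse_type_constraints(cell: str) -> dict:
    """Parse a 'Types Supported' cell into {type variable: [tensor types]}.

    Example cell: '**T1** = tensor(float)<br/>**T2** = tensor(uint8)'
    (Illustrative helper, not part of the generated documentation.)
    """
    constraints = {}
    # Entries are separated by <br/> tags; each entry is '**VAR** = type, type, ...'
    for entry in re.split(r"<br\s*/?>", cell):
        m = re.match(r"\s*\*\*(\w+)\*\*\s*=\s*(.+)", entry)
        if m:
            var, types = m.groups()
            constraints[var] = [t.strip() for t in types.split(",")]
    return constraints

cell = "**T1** = tensor(float)<br/>**T2** = tensor(int8), tensor(uint8)"
print(parse_type_constraints(cell))
# → {'T1': ['tensor(float)'], 'T2': ['tensor(int8)', 'tensor(uint8)']}
```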
2021-06-02 07:47:40 +00:00
< a name = "cudaexecutionprovider" / >
## Operators implemented by CUDAExecutionProvider
| Op Name | Parameters | OpSet Version | Types Supported |
|---------|------------|---------------|-----------------|
|**Operator Domain:** *ai.onnx* ||||
|Abs|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Add|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Affine|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|And|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
|ArgMax|*in* data:**T**< br > *out* reduced:**tensor(int64)**|[1, 11]|**T** = tensor(double), tensor(float), tensor(float16)|
|ArgMin|*in* data:**T**< br > *out* reduced:**tensor(int64)**|[1, 11]|**T** = tensor(double), tensor(float), tensor(float16)|
|AveragePool|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|||10|**T** = tensor(double), tensor(float), tensor(float16)|
|||[7, 9]|**T** = tensor(double), tensor(float), tensor(float16)|
|BatchNormalization|*in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* input_mean:**U**< br > *in* input_var:**U**< br > *out* Y:**T**< br > *out* running_mean:**U**< br > *out* running_var:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* mean:**T**< br > *in* var:**T**< br > *out* Y:**T**< br > *out* mean:**T**< br > *out* var:**T**< br > *out* saved_mean:**T**< br > *out* saved_var:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* scale:**T1**< br > *in* B:**T1**< br > *in* input_mean:**T2**< br > *in* input_var:**T2**< br > *out* Y:**T**< br > *out* running_mean:**T2**< br > *out* running_var:**T2**|15+|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(double), tensor(float), tensor(float16)|
|||14|**T** = tensor(double), tensor(float), tensor(float16)< br /> **U** = tensor(double), tensor(float), tensor(float16)|
|||[9, 13]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|Cast|*in* input:**T1**< br > *out* output:**T2**|19+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 18]|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 12]|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[6, 8]|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Ceil|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Clip|*in* input:**T**< br > *in* min:**T**< br > *in* max:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int64), tensor(int8), tensor(uint64), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int64), tensor(int8), tensor(uint64), tensor(uint8)|
|||11|**T** = tensor(float)|
|||[6, 10]|**T** = tensor(float)|
|Compress|*in* input:**T**< br > *in* condition:**T1**< br > *out* output:**T**|11+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|Concat|*in* inputs:**T**< br > *out* concat_result:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[4, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ConcatFromSequence|*in* input_sequence:**S**< br > *out* concat_result:**T**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|ConstantOfShape|*in* input:**T1**< br > *out* output:**T2**|9+|**T1** = tensor(int64)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Conv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|ConvTranspose|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|Cos|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(double), tensor(float), tensor(float16)|
|Crop|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|CumSum|*in* x:**T**< br > *in* axis:**T2**< br > *out* y:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(int32), tensor(int64)|
|||[11, 13]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(int32), tensor(int64)|
|DepthToSpace|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|DequantizeLinear|*in* x:**T**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T**< br > *out* y:**tensor(float)**< br >< br > or< br >< br > *in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|19+|**T1** = tensor(float8e4m3fn), tensor(float8e5m2), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|||[13, 18]|**T** = tensor(int8), tensor(uint8)|
|||[10, 12]|**T** = tensor(int8), tensor(uint8)|
|Div|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Dropout|*in* data:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T2**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T1**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||12|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||[10, 11]|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bool)|
|||[7, 9]|**T** = tensor(double), tensor(float), tensor(float16)|
|DynamicSlice|*in* data:**T**< br > *in* starts:**Tind**< br > *in* ends:**Tind**< br > *in* axes:**Tind**< br > *out* output:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|Einsum|*in* Inputs:**T**< br > *out* Output:**T**|12+|**T** = tensor(double), tensor(float), tensor(float16)|
|Elu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(double), tensor(float), tensor(float16)|
|Equal|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[11, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 10]|**T** = tensor(bool), tensor(int32), tensor(int64)|
|Erf|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Exp|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Expand|*in* input:**T**< br > *in* shape:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[8, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|EyeLike|*in* input:**T1**< br > *out* output:**T2**|9+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint64)< br /> **T2** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint64)|
|Flatten|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 8]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Floor|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|GRU|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|Gather|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherElements|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int64)< br /> **indices** = tensor(int64)|
|||12|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int64)< br /> **indices** = tensor(int64)|
|||11|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int64)< br /> **indices** = tensor(int64)|
|Gelu|*in* X:**T**< br > *out* Y:**T**|20+|**T** = tensor(double), tensor(float), tensor(float16)|
|Gemm|*in* A:**T**< br > *in* B:**T**< br > *in* C:**T**< br > *out* Y:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[9, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|GlobalAveragePool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|GlobalMaxPool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Greater|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|GreaterOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[12, 15]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|GridSample|*in* X:**T1**< br > *in* grid:**T2**< br > *out* Y:**T1**|16+|**T1** = tensor(float)< br /> **T2** = tensor(float)|
|HardSigmoid|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(double), tensor(float), tensor(float16)|
|Identity|*in* input:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**V**< br > *out* output:**V**|19+|**V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[14, 18]|**V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|If|*in* cond:**B**< br > *out* outputs:**V**|19+|**B** = tensor(bool)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 18]|**B** = tensor(bool)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ImageScaler|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|InstanceNormalization|*in* input:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *out* output:**T**|6+|**T** = tensor(double), tensor(float), tensor(float16)|
|IsInf|*in* X:**T1**< br > *out* Y:**T2**|20+|**T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)< br /> **T2** = tensor(bool)|
|||[10, 19]|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(bool)|
|IsNaN|*in* X:**T1**< br > *out* Y:**T2**|20+|**T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)< br /> **T2** = tensor(bool)|
|||[13, 19]|**T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||[9, 12]|**T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|LRN|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|LSTM|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|LayerNormalization|*in* X:**T**< br > *in* Scale:**T**< br > *in* B:**T**< br > *out* Y:**T**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* Scale:**V**< br > *in* B:**V**< br > *out* Y:**V**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**|17+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **U** = tensor(float)|
|||[1, 16]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **U** = tensor(double), tensor(float)< br /> **V** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|LeakyRelu|*in* X:**T**< br > *out* Y:**T**|16+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 15]|**T** = tensor(double), tensor(float), tensor(float16)|
|Less|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|LessOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[12, 15]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|Log|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|LogSoftmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|Loop|*in* M:**I**< br > *in* cond:**B**< br > *in* v_initial:**V**< br > *out* v_final_and_scan_outputs:**V**|19+|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 18]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|MatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|MatMulInteger|*in* A:**T1**< br > *in* B:**T2**< br > *in* a_zero_point:**T1**< br > *in* b_zero_point:**T2**< br > *out* Y:**T3**|10+|**T1** = tensor(int8)< br /> **T2** = tensor(int8)< br /> **T3** = tensor(int32)|
|Max|*in* data_0:**T**< br > *out* max:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||12|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[6, 11]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|MaxPool|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**< br > *out* Indices:**I**|12+|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||11|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16)|
|||10|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16)|
|||[8, 9]|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 7]|**T** = tensor(double), tensor(float), tensor(float16)|
|MemcpyFromHost|*in* X:**T**< br > *out* Y:**T**|1+|**T** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|MemcpyToHost|*in* X:**T**< br > *out* Y:**T**|1+|**T** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Min|*in* data_0:**T**< br > *out* min:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||12|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[6, 11]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|Mod|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[10, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Mul|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Neg|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|NonZero|*in* X:**T**< br > *out* Y:**tensor(int64)**|13+|**T** = tensor(bool), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint8)|
|||[9, 12]|**T** = tensor(bool), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint8)|
|Not|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(bool)|
|OneHot|*in* indices:**T1**< br > *in* depth:**T2**< br > *in* values:**T3**< br > *out* output:**T3**|11+|**T1** = tensor(int32), tensor(int64)< br /> **T2** = tensor(int32), tensor(int64)< br /> **T3** = tensor(float), tensor(float16), tensor(int64)|
|Or|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
|PRelu|*in* X:**T**< br > *in* slope:**T**< br > *out* Y:**T**|16+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[9, 15]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|Pad|*in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *in* axes:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|18+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16)|
|||[13, 17]|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[2, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|ParametricSoftplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Pow|*in* X:**T**< br > *in* Y:**T**< br > *out* Z:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* Y:**T1**< br > *out* Z:**T**|15+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[13, 14]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[7, 11]|**T** = tensor(double), tensor(float), tensor(float16)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**< br >< br > or< br >< br > *in* x:**T1**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|19+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(float8e4m3fn), tensor(float8e5m2), tensor(int8), tensor(uint8)|
|||[13, 18]|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|||[10, 12]|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|RNN|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|RandomNormal|*out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|RandomNormalLike|*in* input:**T1**< br > *out* output:**T2**|1+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(double), tensor(float), tensor(float16)|
|RandomUniform|*out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|RandomUniformLike|*in* input:**T1**< br > *out* output:**T2**|1+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(double), tensor(float), tensor(float16)|
|Range|*in* start:**T**< br > *in* limit:**T**< br > *in* delta:**T**< br > *out* output:**T**|11+|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64)|
|Reciprocal|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|ReduceL1|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[1, 17]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|ReduceL2|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[1, 17]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|ReduceLogSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 17]|**T** = tensor(double), tensor(float), tensor(float16)|
|ReduceLogSumExp|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 17]|**T** = tensor(double), tensor(float), tensor(float16)|
|ReduceMax|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[1, 17]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|ReduceMean|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[1, 17]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|ReduceMin|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||[1, 17]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|ReduceProd|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[1, 17]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|ReduceSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[1, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|ReduceSumSquare|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 17]|**T** = tensor(double), tensor(float), tensor(float16)|
|Relu|*in* X:**T**< br > *out* Y:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Reshape|*in* data:**T**< br > *in* shape:**tensor(int64)**< br > *out* reshaped:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reshaped:**T**|19+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[14, 18]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||13|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[5, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[1, 4]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Resize|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T1**< br > *in* roi:**T2**< br > *in* scales:**tensor(float)**< br > *in* sizes:**tensor(int64)**< br > *out* Y:**T1**|18+|**T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|||[13, 17]|**T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|||[11, 12]|**T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|||10|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|ReverseSequence|*in* input:**T**< br > *in* sequence_lens:**tensor(int64)**< br > *out* Y:**T**|10+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|RoiAlign|*in* X:**T1**< br > *in* rois:**T1**< br > *in* batch_indices:**T2**< br > *out* Y:**T1**|10+|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(int64)|
|Round|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|ScaledTanh|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Scan|*in* initial_state_and_scan_inputs:**V**< br > *out* final_state_and_scan_outputs:**V**< br >< br > or< br >< br > *in* sequence_lens:**I**< br > *in* initial_state_and_scan_inputs:**V**< br > *out* final_state_and_scan_outputs:**V**|19+|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 15]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 10]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8|**I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Scatter|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterElements|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[16, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[13, 15]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *in* updates:**T**< br > *out* output:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 15]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Selu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(double), tensor(float), tensor(float16)|
|SequenceAt|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* tensor:**T**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceConstruct|*in* inputs:**T**< br > *out* output_sequence:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceEmpty|*out* output:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceErase|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceInsert|*in* input_sequence:**S**< br > *in* tensor:**T**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceLength|*in* input_sequence:**S**< br > *out* length:**I**|11+|**I** = tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|Shape|*in* data:**T**< br > *out* shape:**T1**|19+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[15, 18]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[13, 14]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Shrink|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sigmoid|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Sign|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SimplifiedLayerNormalization|*in* X:**T**< br > *in* scale:**V**< br > *out* Y:**V**< br > *out* inv_std_var:**U**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **U** = tensor(double), tensor(float)< br /> **V** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|Sin|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(double), tensor(float), tensor(float16)|
|Size|*in* data:**T**< br > *out* size:**T1**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Slice|*in* data:**T**< br > *in* starts:**Tind**< br > *in* ends:**Tind**< br > *in* axes:**Tind**< br > *in* steps:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||10|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[1, 9]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Softmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|Softplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Softsign|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|SpaceToDepth|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Split|*in* input:**T**< br > *in* split:**T**< br > *out* outputs...:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* split:**tensor(int64)**< br > *out* outputs:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* outputs:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[2, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sqrt|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Squeeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* squeezed:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* squeezed:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sub|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Sum|*in* data_0:**T**< br > *out* sum:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[8, 12]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 7]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|Tanh|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|ThresholdedRelu|*in* X:**T**< br > *out* Y:**T**|10+|**T** = tensor(double), tensor(float), tensor(float16)|
|||1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Tile|*in* input:**T**< br > *in* repeats:**T1**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* tiles:**T**< br > *in* axis:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(int64)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(int64)|
|TopK|*in* X:**T**< br > *in* K:**tensor(int64)**< br > *out* Values:**T**< br > *out* Indices:**I**< br >< br > or< br >< br > *in* X:**T**< br > *out* Values:**T**< br > *out* Indices:**I**|11+|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||10|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[1, 9]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|Transpose|*in* data:**T**< br > *out* transposed:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Trilu|*in* input:**T**< br > *in* k:**tensor(int64)**< br > *out* output:**T**|14+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Unsqueeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* expanded:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* expanded:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Upsample|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**|9|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|Where|*in* condition:**B**< br > *in* X:**T**< br > *in* Y:**T**< br > *out* output:**T**|16+|**B** = tensor(bool)< br /> **T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint8)|
|||[9, 15]|**B** = tensor(bool)< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint8)|
|Xor|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
| |
| |
|**Operator Domain:** *com.microsoft* ||||
|Attention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* mask_index:**M**< br > *in* past:**T**< br > *in* relative_position_bias:**T**< br > *in* past_sequence_length:**M**< br > *out* output:**T**< br > *out* present:**T**|1+|**T** = tensor(float), tensor(float16)|
|BeamSearch|*in* input_ids:**F**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* num_beams:**I**< br > *in* num_return_sequences:**I**< br > *in* length_penalty:**T**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**M**< br > *in* prefix_vocab_mask:**M**< br > *in* attention_mask:**I**< br > *in* decoder_input_ids:**I**< br > *in* logits_processor:**I**< br > *out* sequences:**I**< br > *out* sequences_scores:**T**< br > *out* scores:**T**|1+|**T** = tensor(float), tensor(float16)|
|BiasAdd|*in* X:**T**< br > *in* bias:**T**< br > *in* skip:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|BiasDropout|*in* data:**T**< br > *in* bias:**T**< br > *in* residual:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T2**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|BiasGelu|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|BiasSoftmax|*in* data:**T**< br > *in* bias:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|BiasSplitGelu|*in* X:**T**< br > *in* bias:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|BitmaskBiasDropout|*in* data:**T**< br > *in* bias:**T**< br > *in* residual:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T3**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)< br /> **T3** = tensor(uint32)|
|BitmaskDropout|*in* data:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T3**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)< br /> **T3** = tensor(uint32)|
|ComplexMul|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float), tensor(float16)|
|ComplexMulConj|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float), tensor(float16)|
|ConvTransposeWithDynamicPads|*in* X:**T**< br > *in* W:**T**< br > *in* Pads:**tensor(int64)**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|DecoderAttention|*in* query:**T**< br > *in* key:**T**< br > *in* q_weight:**T**< br > *in* kv_weight:**T**< br > *in* bias:**T**< br > *in* key_padding_mask:**B**< br > *in* key_cache:**T**< br > *in* value_cache:**T**< br > *in* static_kv:**B**< br > *in* use_past:**B**< br > *in* has_layer_state:**B**< br > *in* has_key_padding_mask:**B**< br > *out* output:**T**< br > *out* new_key_cache:**T**< br > *out* new_value_cache:**T**|1+|**T** = tensor(float), tensor(float16)|
|DecoderMaskedMultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* mask_index:**M**< br > *in* relative_position_bias:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *in* past_sequence_length:**M**< br > *in* beam_width:**M**< br > *in* cache_indirection:**M**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**< br > *out* qk:**V**|1+|**T** = tensor(float), tensor(float16)|
|DecoderMaskedSelfAttention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* mask_index:**M**< br > *in* past:**T**< br > *in* relative_position_bias:**T**< br > *in* past_sequence_length:**M**< br > *in* beam_width:**M**< br > *in* cache_indirection:**M**< br > *out* output:**T**< br > *out* present:**T**|1+|**T** = tensor(float), tensor(float16)|
|DequantizeLinear|*in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|1+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(float16)|
|DequantizeWithOrder|*in* input:**Q**< br > *in* scale_input:**S**< br > *out* output:**F**|1+|**F** = tensor(float), tensor(float16)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|DynamicTimeWarping|*in* input:**F**< br > *out* output:**I**|1+|**F** = tensor(float)< br /> **I** = tensor(int32)|
|EmbedLayerNormalization|*in* input_ids:**T1**< br > *in* segment_ids:**T1**< br > *in* word_embedding:**T**< br > *in* position_embedding:**T**< br > *in* segment_embedding:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* mask:**T1**< br > *in* position_ids:**T1**< br > *out* output:**T**< br > *out* mask_index:**T1**< br > *out* embedding_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
|FastGelu|*in* X:**T**< br > *in* bias:**T**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(float), tensor(float16)|
|FusedConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *in* Z:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|FusedMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|GatedRelativePositionBias|*in* query_layer:**T**< br > *in* query_bias:**T**< br > *in* rel_pos:**T**< br > *in* weight:**T**< br > *in* bias:**T**< br > *in* eco_a:**T**< br > *in* token_offset:**M**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|Gelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|GemmFloat8|*in* A:**TA**< br > *in* B:**TB**< br > *in* C:**TC**< br > *in* scaleA:**TS**< br > *in* scaleB:**TS**< br > *in* scaleY:**TS**< br > *out* Y:**TR**|1+|**TA** = tensor(bfloat16), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2)< br /> **TB** = tensor(bfloat16), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2)< br /> **TR** = tensor(bfloat16), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2)< br /> **TS** = tensor(float)|
|GemmaRotaryEmbedding|*in* emb:**U**< br > *in* q:**T**< br > *in* q_rot:**T**< br > *in* k:**T**< br > *in* k_rot:**T**< br > *out* output1:**T**< br > *out* output2:**T**|1+|**T** = tensor(float16)< br /> **U** = tensor(float)|
|GreedySearch|*in* input_ids:**I**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**I**< br > *in* prefix_vocab_mask:**I**< br > *in* attention_mask:**I**< br > *out* sequences:**I**|1+|**T** = tensor(float), tensor(float16)|
|GridSample|*in* X:**T1**< br > *in* Grid:**T1**< br > *out* Y:**T2**|1+|**T1** = tensor(float)< br /> **T2** = tensor(float)|
|GroupNorm|*in* X:**T**< br > *in* gamma:**M**< br > *in* beta:**M**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|GroupQueryAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *in* seqlens_k:**M**< br > *in* total_sequence_length:**M**< br > *in* cos_cache:**T**< br > *in* sin_cache:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**M** = tensor(int32)< br /> **T** = tensor(bfloat16), tensor(float16)|
|Inverse|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Irfft|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|LongformerAttention|*in* input:**T**< br > *in* weight:**T**< br > *in* bias:**T**< br > *in* mask:**T**< br > *in* global_weight:**T**< br > *in* global_bias:**T**< br > *in* global:**G**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|MatMulBnb4|*in* A:**T1**< br > *in* B:**T2**< br > *in* absmax:**T1**< br > *out* Y:**T1**|1+|**T1** = tensor(bfloat16), tensor(float), tensor(float16)< br /> **T2** = tensor(uint8)|
|MatMulNBits|*in* A:**T1**< br > *in* B:**T2**< br > *in* scales:**T1**< br > *in* zero_points:**T3**< br > *in* g_idx:**T4**< br > *in* bias:**T1**< br > *out* Y:**T1**|1+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(uint8)|
|MoE|*in* input:**T**< br > *in* router_probs:**T**< br > *in* fc1_experts_weights:**T**< br > *in* fc1_experts_bias:**T**< br > *in* fc2_experts_weights:**T**< br > *in* fc2_experts_bias:**T**< br > *in* fc3_experts_weights:**T**< br > *in* fc3_experts_bias:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|MultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* bias:**T**< br > *in* key_padding_mask:**M**< br > *in* relative_position_bias:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**T** = tensor(float), tensor(float16)|
|NGramRepeatBlock|*in* input_ids:**Tid**< br > *in* scores:**T**< br > *out* scores_out:**T**|1+|**T** = tensor(float)< br /> **Tid** = tensor(int64)|
|NhwcConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|PackedAttention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* token_offset:**M**< br > *in* cumulative_sequence_length:**M**< br > *in* relative_position_bias:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|PackedMultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* bias:**T**< br > *in* token_offset:**M**< br > *in* cumulative_sequence_length:**M**< br > *in* relative_position_bias:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|QAttention|*in* input:**T1**< br > *in* weight:**T2**< br > *in* bias:**T3**< br > *in* input_scale:**T3**< br > *in* weight_scale:**T3**< br > *in* mask_index:**T4**< br > *in* input_zero_point:**T1**< br > *in* weight_zero_point:**T2**< br > *in* past:**T3**< br > *out* output:**T3**< br > *out* present:**T3**|1+|**T1** = tensor(int8)< br /> **T2** = tensor(int8)< br /> **T3** = tensor(float), tensor(float16)< br /> **T4** = tensor(int32)|
|QMoE|*in* input:**T**< br > *in* router_probs:**T**< br > *in* fc1_experts_weights:**T1**< br > *in* fc1_scales:**T**< br > *in* fc1_experts_bias:**T**< br > *in* fc2_experts_weights:**T1**< br > *in* fc2_scales:**T**< br > *in* fc2_experts_bias:**T**< br > *in* fc3_experts_weights:**T1**< br > *in* fc3_scales:**T**< br > *in* fc3_experts_bias:**T**< br > *out* output:**T**|1+|**T** = tensor(float16)< br /> **T1** = tensor(uint8)|
|QOrderedAttention|*in* input:**Q**< br > *in* scale_input:**S**< br > *in* scale_Q_gemm:**S**< br > *in* scale_K_gemm:**S**< br > *in* scale_V_gemm:**S**< br > *in* Q_weight:**Q**< br > *in* K_weight:**Q**< br > *in* V_weight:**Q**< br > *in* scale_Q_weight:**S**< br > *in* scale_K_weight:**S**< br > *in* scale_V_weight:**S**< br > *in* Q_bias:**S**< br > *in* K_bias:**S**< br > *in* V_bias:**S**< br > *in* scale_QKT_gemm:**S**< br > *in* scale_QKT_softmax:**S**< br > *in* scale_values_gemm:**S**< br > *in* mask_index:**G**< br > *in* past:**Q**< br > *in* relative_position_bias:**S**< br > *out* output:**Q**|1+|**G** = tensor(int32)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|QOrderedGelu|*in* X:**Q**< br > *in* scale_X:**S**< br > *in* scale_Y:**S**< br > *out* Y:**Q**|1+|**Q** = tensor(int8)< br /> **S** = tensor(float)|
|QOrderedLayerNormalization|*in* X:**Q**< br > *in* scale_X:**S**< br > *in* scale:**F**< br > *in* B:**F**< br > *in* scale_Y:**S**< br > *out* Y:**Q**|1+|**F** = tensor(float), tensor(float16)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|QOrderedLongformerAttention|*in* input:**Q**< br > *in* scale_input:**S**< br > *in* weight:**Q**< br > *in* scale_weight:**S**< br > *in* bias:**S**< br > *in* scale_bias:**S**< br > *in* scale_qkv_gemm:**S**< br > *in* mask:**F**< br > *in* global_weight:**Q**< br > *in* scale_global_weight:**S**< br > *in* global_bias:**S**< br > *in* scale_global_gemm:**S**< br > *in* global:**G**< br > *in* scale_output:**S**< br > *out* output:**Q**|1+|**F** = tensor(float16)< br /> **G** = tensor(int32)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|QOrderedMatMul|*in* A:**Q**< br > *in* scale_A:**S**< br > *in* B:**Q**< br > *in* scale_B:**S**< br > *in* scale_Y:**S**< br > *in* bias:**S**< br > *in* C:**Q**< br > *in* scale_C:**S**< br > *out* Y:**Q**|1+|**Q** = tensor(int8)< br /> **S** = tensor(float)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|1+|**T1** = tensor(float16)< br /> **T2** = tensor(int8), tensor(uint8)|
|QuantizeWithOrder|*in* input:**F**< br > *in* scale_input:**S**< br > *out* output:**Q**|1+|**F** = tensor(float), tensor(float16)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|QuickGelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|RelativePositionBias|*in* bias_table:**T**< br > *in* query_length:**U**< br > *in* key_length:**U**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|RemovePadding|*in* input:**T**< br > *in* sequence_token_count:**M**< br > *out* output:**T**< br > *out* token_offset:**M**< br > *out* cumulated_seq_len:**M**< br > *out* max_seq_len:**M**|1+|**T** = tensor(float), tensor(float16)|
|RestorePadding|*in* input:**T**< br > *in* token_offset:**M**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|Rfft|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|RotaryEmbedding|*in* input:**T**< br > *in* position_ids:**M**< br > *in* cos_cache:**T**< br > *in* sin_cache:**T**< br > *out* output:**T**|1+|**M** = tensor(int64)< br /> **T** = tensor(bfloat16), tensor(float), tensor(float16)|
|Sampling|*in* input_ids:**I**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**I**< br > *in* prefix_vocab_mask:**I**< br > *in* attention_mask:**I**< br > *in* presence_mask:**I**< br > *in* seed:**I**< br > *out* sequences:**I**< br > *out* filtered_logits:**T**|1+|**T** = tensor(float), tensor(float16)|
|SkipGroupNorm|*in* X:**T**< br > *in* gamma:**M**< br > *in* beta:**M**< br > *in* skip:**T**< br > *in* bias:**T**< br > *out* Y:**T**< br > *out* S:**T**|1+|**T** = tensor(float), tensor(float16)|
|SkipLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
|SkipSimplifiedLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
|SparseAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *in* block_row_indices:**M**< br > *in* block_col_indices:**M**< br > *in* total_sequence_length:**M**< br > *in* key_total_sequence_lengths:**M**< br > *in* cos_cache:**T**< br > *in* sin_cache:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**M** = tensor(int32)< br /> **T** = tensor(bfloat16), tensor(float16)|
|TransposeMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|Trilu|*in* X:**T**< br > *in* k:**tensor(int64)**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|UnfoldTensor|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|WhisperBeamSearch|*in* input_ids:**F**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* num_beams:**I**< br > *in* num_return_sequences:**I**< br > *in* length_penalty:**T**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**M**< br > *in* prefix_vocab_mask:**M**< br > *in* attention_mask:**I**< br > *in* decoder_input_ids:**I**< br > *in* logits_processor:**I**< br > *in* cross_qk_layer_head:**I**< br > *in* extra_decoding_ids:**I**< br > *in* temperature:**T**< br > *out* sequences:**I**< br > *out* sequences_scores:**T**< br > *out* scores:**T**< br > *out* cross_qk:**V**< br > *out* non_speech_probs:**T**|1+|**T** = tensor(float), tensor(float16)|
| |
| |
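Rows in the tables above follow a regular pipe-delimited layout (op name, parameters, opset version, supported types), so they can be machine-parsed. The following is an illustrative sketch only, not part of the generator; the `parse_op_row` helper and its output shape are assumptions for this example:

```python
import re

def parse_op_row(row: str) -> dict:
    """Split one pipe-delimited table row into its four columns and
    extract every tensor(...) type from the 'Types Supported' column."""
    # Drop the leading/trailing pipes, then split on the remaining ones.
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    name, params, opset, types = cells[:4]
    supported = re.findall(r"tensor\(\w+\)", types)
    return {"op": name, "opset": opset, "types": supported}

# Example: the Xor row from the CUDA provider table.
row = "|Xor|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T1**|7+|**T** = tensor(bool)<br/>**T1** = tensor(bool)|"
info = parse_op_row(row)
```

Rows that continue a previous operator (those beginning `|||`) have an empty name cell, so a full parser would carry the last non-empty op name forward.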
< a name = "dmlexecutionprovider" / >
## Operators implemented by DmlExecutionProvider
| Op Name | Parameters | OpSet Version | Types Supported |
|---------|------------|---------------|-----------------|
|**Operator Domain:** *ai.onnx* ||||
|Abs|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|||6+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|Acos|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Acosh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|Add|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Affine|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|And|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)|
|ArgMax|*in* data:**T**< br > *out* reduced:**tensor(int64)**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ArgMin|*in* data:**T**< br > *out* reduced:**tensor(int64)**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Asin|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Asinh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|Atan|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Atanh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|AveragePool|*in* X:**T**< br > *out* Y:**T**|19+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||10+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|BatchNormalization|*in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* input_mean:**U**< br > *in* input_var:**U**< br > *out* Y:**T**< br > *out* running_mean:**U**< br > *out* running_var:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* mean:**T**< br > *in* var:**T**< br > *out* Y:**T**< br > *out* mean:**T**< br > *out* var:**T**< br > *out* saved_mean:**T**< br > *out* saved_var:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* scale:**T1**< br > *in* B:**T1**< br > *in* input_mean:**T2**< br > *in* input_var:**T2**< br > *out* Y:**T**< br > *out* running_mean:**T2**< br > *out* running_var:**T2**|15+|**T** = tensor(float), tensor(float16)|
|||14+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|BitShift|*in* X:**T**< br > *in* Y:**T**< br > *out* Z:**T**|11+|**T** = tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseAnd|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseNot|*in* X:**T**< br > *out* Y:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseOr|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseXor|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Cast|*in* input:**T1**< br > *out* output:**T2**|19+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||6+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|CastLike|*in* input:**T1**< br > *in* target_type:**T2**< br > *out* output:**T2**|19+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||15+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Ceil|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Celu|*in* X:**T**< br > *out* Y:**T**|12+|**T** = tensor(float), tensor(float16)|
|Clip|*in* input:**T**< br > *in* min:**T**< br > *in* max:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Col2Im|*in* input:**T**< br > *in* image_shape:**tensor(int64)**< br > *in* block_shape:**tensor(int64)**< br > *out* output:**T**|18+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Concat|*in* inputs:**T**< br > *out* concat_result:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||4+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ConcatFromSequence|*in* input_sequence:**S**< br > *out* concat_result:**T**|11+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ConstantOfShape|*in* input:**T1**< br > *out* output:**T2**|9+|**T1** = tensor(int64)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Conv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ConvInteger|*in* x:**T1**< br > *in* w:**T2**< br > *in* x_zero_point:**T1**< br > *in* w_zero_point:**T2**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int32)|
|ConvTranspose|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|Cos|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Cosh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|Crop|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|CumSum|*in* x:**T**< br > *in* axis:**T2**< br > *out* y:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|DFT|*in* input:**T1**< br > *in* dft_length:**T2**< br > *in* axis:**tensor(int64)**< br > *out* output:**T1**< br >< br > or< br >< br > *in* input:**T1**< br > *in* dft_length:**T2**< br > *out* output:**T1**|20+|**T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(int32), tensor(int64)|
|||17+|**T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(int32), tensor(int64)|
|DepthToSpace|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|DequantizeLinear|*in* x:**T**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T**< br > *out* y:**tensor(float)**< br >< br > or< br >< br > *in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|19+|**T1** = tensor(int32), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|||13+|**T** = tensor(int32), tensor(int8), tensor(uint8)|
|||10+|**T** = tensor(int32), tensor(int8), tensor(uint8)|
|Div|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Dropout|*in* data:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T2**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T1**|7+|**T** = tensor(float), tensor(float16)|
|DynamicQuantizeLinear|*in* x:**T1**< br > *out* y:**T2**< br > *out* y_scale:**tensor(float)**< br > *out* y_zero_point:**T2**|11+|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|Einsum|*in* Inputs:**T**< br > *out* Output:**T**|12+|**T** = tensor(float), tensor(float16)|
|Elu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float), tensor(float16)|
|Equal|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|19+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||7+|**T** = tensor(float), tensor(float16)< br /> **T1** = tensor(bool)|
|Erf|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|Exp|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Expand|*in* input:**T**< br > *in* shape:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|EyeLike|*in* input:**T1**< br > *out* output:**T2**|9+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Flatten|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Floor|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|GRU|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|Gather|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherElements|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Gemm|*in* A:**T**< br > *in* B:**T**< br > *in* C:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|GlobalAveragePool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|GlobalLpPool|*in* X:**T**< br > *out* Y:**T**|2+|**T** = tensor(float), tensor(float16)|
|GlobalMaxPool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|Greater|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||7+|**T** = tensor(float), tensor(float16)< br /> **T1** = tensor(bool)|
|GreaterOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|GridSample|*in* X:**T1**< br > *in* grid:**T2**< br > *out* Y:**T1**|16+|**T1** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|HardSigmoid|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float), tensor(float16)|
|Hardmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|Identity|*in* input:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**V**< br > *out* output:**V**|19+|**V** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||16+|**V** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||14+|**V** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|If|*in* cond:**B**< br > *out* outputs:**V**|19+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||16+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ImageScaler|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|InstanceNormalization|*in* input:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *out* output:**T**|6+|**T** = tensor(float), tensor(float16)|
|IsInf|*in* X:**T1**< br > *out* Y:**T2**|20+|**T1** = tensor(float)< br /> **T2** = tensor(bool)|
|||10+|**T1** = tensor(float)< br /> **T2** = tensor(bool)|
|IsNaN|*in* X:**T1**< br > *out* Y:**T2**|20+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||13+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||9+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|LRN|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|LSTM|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|14+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|LayerNormalization|*in* X:**T**< br > *in* Scale:**T**< br > *in* B:**T**< br > *out* Y:**T**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* Scale:**V**< br > *in* B:**V**< br > *out* Y:**V**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**|17+|**T** = tensor(float), tensor(float16)< br /> **U** = tensor(float)|
|||1+|**T** = tensor(float), tensor(float16)< br /> **U** = tensor(float), tensor(float16)< br /> **V** = tensor(float), tensor(float16)|
|LeakyRelu|*in* X:**T**< br > *out* Y:**T**|16+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Less|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||7+|**T** = tensor(float), tensor(float16)< br /> **T1** = tensor(bool)|
2022-12-21 17:05:12 +00:00
|LessOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
2022-09-09 17:21:25 +00:00
|Log|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|LogSoftmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|LpNormalization|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|LpPool|*in* X:**T**< br > *out* Y:**T**|18+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||2+|**T** = tensor(float), tensor(float16)|
|MatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|MatMulInteger|*in* A:**T1**< br > *in* B:**T2**< br > *in* a_zero_point:**T1**< br > *in* b_zero_point:**T2**< br > *out* Y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int32)|
|Max|*in* data_0:**T**< br > *out* max:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|MaxPool|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**< br > *out* Indices:**I**|12+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||11+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||10+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||8+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||1+|**T** = tensor(float), tensor(float16)|
|MaxRoiPool|*in* X:**T**< br > *in* rois:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|MaxUnpool|*in* X:**T1**< br > *in* I:**T2**< br > *in* output_shape:**T2**< br > *out* output:**T1**|11+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int64)|
|||9+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int64)|
|Mean|*in* data_0:**T**< br > *out* mean:**T**|13+|**T** = tensor(float), tensor(float16)|
|||8+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|MeanVarianceNormalization|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|MemcpyFromHost|*in* X:**T**< br > *out* Y:**T**|1+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|MemcpyToHost|*in* X:**T**< br > *out* Y:**T**|1+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Min|*in* data_0:**T**< br > *out* min:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Mod|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||10+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|Mul|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Neg|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|||6+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|NonZero|*in* X:**T**< br > *out* Y:**tensor(int64)**|13+|**T** = tensor(bool), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||9+|**T** = tensor(bool), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|Not|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(bool)|
|OneHot|*in* indices:**T1**< br > *in* depth:**T2**< br > *in* values:**T3**< br > *out* output:**T3**|11+|**T1** = tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T3** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**T1** = tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T3** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|OptionalGetElement|*in* input:**O**< br > *out* output:**V**|18+|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||15+|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|OptionalHasElement|*in* input:**O**< br > *out* output:**B**|18+|**B** = tensor(bool)< br /> **O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||15+|**B** = tensor(bool)< br /> **O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))|
|Or|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)|
|PRelu|*in* X:**T**< br > *in* slope:**T**< br > *out* Y:**T**|16+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8)|
|||9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8)|
|||7+|**T** = tensor(float), tensor(float16)|
|Pad|*in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *in* axes:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|19+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||18+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||2+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ParametricSoftplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|Pow|*in* X:**T**< br > *in* Y:**T**< br > *out* Z:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* Y:**T1**< br > *out* Z:**T**|15+|**T** = tensor(float), tensor(float16), tensor(int32)< br /> **T1** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int32)< br /> **T1** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int32)< br /> **T1** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16)|
|QLinearConv|*in* x:**T1**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T1**< br > *in* w:**T2**< br > *in* w_scale:**tensor(float)**< br > *in* w_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *in* B:**T4**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)< br /> **T4** = tensor(int32)|
|QLinearMatMul|*in* a:**T1**< br > *in* a_scale:**TS**< br > *in* a_zero_point:**T1**< br > *in* b:**T2**< br > *in* b_scale:**TS**< br > *in* b_zero_point:**T2**< br > *in* y_scale:**TS**< br > *in* y_zero_point:**T3**< br > *out* y:**T3**< br >< br > or< br >< br > *in* a:**T1**< br > *in* a_scale:**tensor(float)**< br > *in* a_zero_point:**T1**< br > *in* b:**T2**< br > *in* b_scale:**tensor(float)**< br > *in* b_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**< br >< br > or< br >< br > *in* x:**T1**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|19+|**T1** = tensor(float), tensor(float16), tensor(int32)< br /> **T2** = tensor(int8), tensor(uint8)|
|||13+|**T1** = tensor(float), tensor(int32)< br /> **T2** = tensor(int8), tensor(uint8)|
|||10+|**T1** = tensor(float), tensor(int32)< br /> **T2** = tensor(int8), tensor(uint8)|
|RNN|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|Range|*in* start:**T**< br > *in* limit:**T**< br > *in* delta:**T**< br > *out* output:**T**|11+|**T** = tensor(float), tensor(int16), tensor(int32), tensor(int64)|
|Reciprocal|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|ReduceL1|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|ReduceL2|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceLogSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceLogSumExp|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceMax|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|20+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceMean|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceMin|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|20+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceProd|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|ReduceSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|ReduceSumSquare|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Relu|*in* X:**T**< br > *out* Y:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8)|
|||13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Reshape|*in* data:**T**< br > *in* shape:**tensor(int64)**< br > *out* reshaped:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reshaped:**T**|19+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||14+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||5+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Resize|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T1**< br > *in* roi:**T2**< br > *in* scales:**tensor(float)**< br > *in* sizes:**tensor(int64)**< br > *out* Y:**T1**|19+|**T1** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|||18+|**T1** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|||13+|**T1** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|||11+|**T1** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|||10+|**T** = tensor(float), tensor(float16)|
|ReverseSequence|*in* input:**T**< br > *in* sequence_lens:**tensor(int64)**< br > *out* Y:**T**|10+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|RoiAlign|*in* X:**T1**< br > *in* rois:**T1**< br > *in* batch_indices:**T2**< br > *out* Y:**T1**|16+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int32), tensor(int64)|
|||10+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int32), tensor(int64)|
|Round|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(float), tensor(float16)|
|STFT|*in* signal:**T1**< br > *in* frame_step:**T2**< br > *in* window:**T1**< br > *in* frame_length:**T2**< br > *out* output:**T1**|17+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int32), tensor(int64)|
|ScaledTanh|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|Scatter|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||9+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterElements|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|16+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *in* updates:**T**< br > *out* output:**T**|16+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Selu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float), tensor(float16)|
|SequenceAt|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* tensor:**T**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceConstruct|*in* inputs:**T**< br > *out* output_sequence:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceEmpty|*out* output:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceErase|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceInsert|*in* input_sequence:**S**< br > *in* tensor:**T**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceLength|*in* input_sequence:**S**< br > *out* length:**I**|11+|**I** = tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|Shape|*in* data:**T**< br > *out* shape:**T1**|19+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||15+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||13+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||1+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Shrink|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|Sigmoid|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Sign|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SimplifiedLayerNormalization|*in* X:**T**< br > *in* scale:**V**< br > *out* Y:**V**< br > *out* inv_std_var:**U**|1+|**T** = tensor(float), tensor(float16)< br /> **U** = tensor(float), tensor(float16)< br /> **V** = tensor(float), tensor(float16)|
|Sin|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Sinh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|Size|*in* data:**T**< br > *out* size:**T1**|19+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||13+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||1+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Slice|*in* data:**T**< br > *in* starts:**Tind**< br > *in* ends:**Tind**< br > *in* axes:**Tind**< br > *in* steps:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||10+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Softmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|Softplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|Softsign|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|SpaceToDepth|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Split|*in* input:**T**< br > *in* split:**T**< br > *out* outputs...:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* split:**tensor(int64)**< br > *out* outputs:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* outputs:**T**|18+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||2+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sqrt|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Squeeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* squeezed:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* squeezed:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sub|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sum|*in* data_0:**T**< br > *out* sum:**T**|13+|**T** = tensor(float), tensor(float16)|
|||8+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Tan|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Tanh|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|ThresholdedRelu|*in* X:**T**< br > *out* Y:**T**|10+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|Tile|*in* input:**T**< br > *in* repeats:**T1**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* tiles:**T**< br > *in* axis:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||6+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|TopK|*in* X:**T**< br > *in* K:**tensor(int64)**< br > *out* Values:**T**< br > *out* Indices:**I**< br >< br > or< br >< br > *in* X:**T**< br > *out* Values:**T**< br > *out* Indices:**I**|11+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||10+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Transpose|*in* data:**T**< br > *out* transposed:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Trilu|*in* input:**T**< br > *in* k:**tensor(int64)**< br > *out* output:**T**|14+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Unsqueeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* expanded:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* expanded:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Upsample|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**|10+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|Where|*in* condition:**B**< br > *in* X:**T**< br > *in* Y:**T**< br > *out* output:**T**|16+|**B** = tensor(bool)< br /> **T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**B** = tensor(bool)< br /> **T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Xor|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)|
| |
| |
|**Operator Domain:** *com.microsoft* ||||
|Attention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* mask_index:**M**< br > *in* past:**T**< br > *in* relative_position_bias:**T**< br > *in* past_sequence_length:**M**< br > *out* output:**T**< br > *out* present:**T**|1+|**M** = tensor(int32)< br /> **T** = tensor(float), tensor(float16)|
|BiasAdd|*in* X:**T**< br > *in* bias:**T**< br > *in* skip:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|BiasGelu|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float), tensor(float16)|
|BiasSplitGelu|*in* X:**T**< br > *in* bias:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|ConvTransposeWithDynamicPads|*in* X:**T**< br > *in* W:**T**< br > *in* Pads:**tensor(int64)**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DequantizeLinear|*in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|1+|**T1** = tensor(int32), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|DynamicQuantizeMatMul|*in* A:**T1**< br > *in* B:**T2**< br > *in* b_scale:**T1**< br > *in* b_zero_point:**T2**< br > *in* bias:**T1**< br > *out* Y:**T1**|1+|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|EmbedLayerNormalization|*in* input_ids:**T1**< br > *in* segment_ids:**T1**< br > *in* word_embedding:**T**< br > *in* position_embedding:**T**< br > *in* segment_embedding:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* mask:**T1**< br > *in* position_ids:**T1**< br > *out* output:**T**< br > *out* mask_index:**T1**< br > *out* embedding_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
|FastGelu|*in* X:**T**< br > *in* bias:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|FusedMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|FusedMatMulActivation|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|Gelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|GroupNorm|*in* X:**T**< br > *in* gamma:**M**< br > *in* beta:**M**< br > *out* Y:**T**|1+|**M** = tensor(float), tensor(float16)< br /> **T** = tensor(float), tensor(float16)|
|GroupQueryAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *in* seqlens_k:**M**< br > *in* total_sequence_length:**M**< br > *in* cos_cache:**T**< br > *in* sin_cache:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**M** = tensor(int32)< br /> **T** = tensor(float), tensor(float16)|
|MatMulIntegerToFloat|*in* A:**T1**< br > *in* B:**T2**< br > *in* a_scale:**T3**< br > *in* b_scale:**T3**< br > *in* a_zero_point:**T1**< br > *in* b_zero_point:**T2**< br > *in* bias:**T3**< br > *out* Y:**T3**|1+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(float), tensor(float16)|
|MatMulNBits|*in* A:**T1**< br > *in* B:**T2**< br > *in* scales:**T1**< br > *in* zero_points:**T3**< br > *in* g_idx:**T4**< br > *in* bias:**T1**< br > *out* Y:**T1**|1+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(uint8)|
|MultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* bias:**T**< br > *in* key_padding_mask:**M**< br > *in* relative_position_bias:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**M** = tensor(int32)< br /> **T** = tensor(float), tensor(float16)|
|NhwcConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|QAttention|*in* input:**T1**< br > *in* weight:**T2**< br > *in* bias:**T3**< br > *in* input_scale:**T3**< br > *in* weight_scale:**T3**< br > *in* mask_index:**T4**< br > *in* input_zero_point:**T1**< br > *in* weight_zero_point:**T2**< br > *in* past:**T3**< br > *out* output:**T3**< br > *out* present:**T3**|1+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(float), tensor(float16)< br /> **T4** = tensor(int32)|
|QLinearAdd|*in* A:**T**< br > *in* A_scale:**tensor(float)**< br > *in* A_zero_point:**T**< br > *in* B:**T**< br > *in* B_scale:**tensor(float)**< br > *in* B_zero_point:**T**< br > *in* C_scale:**tensor(float)**< br > *in* C_zero_point:**T**< br > *out* C:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearAveragePool|*in* X:**T**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearConcat|*in* Y_scale:**TF**< br > *in* Y_zero_point:**T8**< br > *in* inputs:**TV**< br > *out* Y:**T8**|1+|**T8** = tensor(int8), tensor(uint8)< br /> **TF** = tensor(float)< br /> **TV** = tensor(float), tensor(int8), tensor(uint8)|
|QLinearGlobalAveragePool|*in* X:**T**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearSigmoid|*in* X:**T**< br > *in* X_scale:**tensor(float)**< br > *in* X_zero_point:**T**< br > *in* Y_scale:**tensor(float)**< br > *in* Y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|1+|**T1** = tensor(float), tensor(float16), tensor(int32)< br /> **T2** = tensor(int8), tensor(uint8)|
|QuickGelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|RotaryEmbedding|*in* input:**T**< br > *in* position_ids:**M**< br > *in* cos_cache:**T**< br > *in* sin_cache:**T**< br > *out* output:**T**|1+|**M** = tensor(int64)< br /> **T** = tensor(float), tensor(float16)|
|SkipLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
|SkipSimplifiedLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
| |
| |
|**Operator Domain:** *com.microsoft.dml* ||||
|DmlFusedAdd|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedBatchNormalization|*in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* mean:**T**< br > *in* var:**T**< br > *out* Y:**T**< br > *out* mean:**T**< br > *out* var:**T**< br > *out* saved_mean:**T**< br > *out* saved_var:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedConvTranspose|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedGemm|*in* A:**T**< br > *in* B:**T**< br > *in* C:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedInstanceNormalization|*in* input:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedMeanVarianceNormalization|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedSum|*in* data_0:**T**< br > *out* sum:**T**|1+|**T** = tensor(float), tensor(float16)|
| |
| |