## Supported Operators and Data Types
*This file is automatically generated from the registered kernels by [this script](https://github.com/microsoft/onnxruntime/blob/main/tools/python/gen_opkernel_doc.py).
Do not modify directly.*

## Execution Providers

- [CPUExecutionProvider](#cpuexecutionprovider)
- [CUDAExecutionProvider](#cudaexecutionprovider)
- [DmlExecutionProvider](#dmlexecutionprovider)

---------------

<a name="cpuexecutionprovider"/>

## Operators implemented by CPUExecutionProvider
| Op Name | Parameters | OpSet Version | Types Supported |
|---------|------------|---------------|-----------------|
|**Operator Domain:** *ai.onnx* ||||
|Abs|*in* X:**T**<br> *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Acos|*in* input:**T**<br> *out* output:**T**|7+|**T** = tensor(float)|
|Acosh|*in* input:**T**<br> *out* output:**T**|9+|**T** = tensor(float)|
|Add|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||13|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Affine|*in* X:**T**<br> *out* Y:**T**|1+|**T** = tensor(float)|
|And|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T1**|7+|**T** = tensor(bool)<br/> **T1** = tensor(bool)|
|ArgMax|*in* data:**T**<br> *out* reduced:**tensor(int64)**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|ArgMin|*in* data:**T**<br> *out* reduced:**tensor(int64)**|13+|**T** = tensor(double), tensor(float), tensor(int32)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32)|
|||[1, 10]|**T** = tensor(float), tensor(int32)|
|Asin|*in* input:**T**<br> *out* output:**T**|7+|**T** = tensor(float)|
|Asinh|*in* input:**T**<br> *out* output:**T**|9+|**T** = tensor(float)|
|Atan|*in* input:**T**<br> *out* output:**T**|7+|**T** = tensor(float)|
|Atanh|*in* input:**T**<br> *out* output:**T**|9+|**T** = tensor(float)|
|AveragePool|*in* X:**T**<br> *out* Y:**T**|19+|**T** = tensor(float)|
|||[11, 18]|**T** = tensor(float)|
|||10|**T** = tensor(float)|
|||[7, 9]|**T** = tensor(float)|
|BatchNormalization|*in* X:**T**<br> *in* scale:**T**<br> *in* B:**T**<br> *in* input_mean:**U**<br> *in* input_var:**U**<br> *out* Y:**T**<br> *out* running_mean:**U**<br> *out* running_var:**U**<br><br>or<br><br> *in* X:**T**<br> *in* scale:**T**<br> *in* B:**T**<br> *in* mean:**T**<br> *in* var:**T**<br> *out* Y:**T**<br> *out* mean:**T**<br> *out* var:**T**<br> *out* saved_mean:**T**<br> *out* saved_var:**T**<br><br>or<br><br> *in* X:**T**<br> *in* scale:**T1**<br> *in* B:**T1**<br> *in* input_mean:**T2**<br> *in* input_var:**T2**<br> *out* Y:**T**<br> *out* running_mean:**T2**<br> *out* running_var:**T2**|15+|**T** = tensor(double), tensor(float)<br/> **T1** = tensor(double), tensor(float)<br/> **T2** = tensor(double), tensor(float)|
|||14|**T** = tensor(double), tensor(float)<br/> **U** = tensor(double), tensor(float)|
|||[9, 13]|**T** = tensor(double), tensor(float)|
|||[7, 8]|**T** = tensor(double), tensor(float)|
|BitShift|*in* X:**T**<br> *in* Y:**T**<br> *out* Z:**T**|11+|**T** = tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseAnd|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseNot|*in* X:**T**<br> *out* Y:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseOr|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseXor|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BlackmanWindow|*in* size:**T1**<br> *out* output:**T2**|17+|**T1** = tensor(int32), tensor(int64)<br/> **T2** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Cast|*in* input:**T1**<br> *out* output:**T2**|19+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 18]|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[6, 12]|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Ceil|*in* X:**T**<br> *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Celu|*in* X:**T**<br> *out* Y:**T**|12+|**T** = tensor(float)|
|Clip|*in* input:**T**<br> *in* min:**T**<br> *in* max:**T**<br> *out* output:**T**<br><br>or<br><br> *in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11|**T** = tensor(float)|
|||[6, 10]|**T** = tensor(float)|
|Col2Im|*in* input:**T**<br> *in* image_shape:**tensor(int64)**<br> *in* block_shape:**tensor(int64)**<br> *out* output:**T**|18+|**T** = tensor(float)|
|Compress|*in* input:**T**<br> *in* condition:**T1**<br> *out* output:**T**|11+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T1** = tensor(bool)|
|||[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **T1** = tensor(bool)|
|Concat|*in* inputs:**T**<br> *out* concat_result:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[4, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ConcatFromSequence|*in* input_sequence:**S**<br> *out* concat_result:**T**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|ConstantOfShape|*in* input:**T1**<br> *out* output:**T2**|20+|**T1** = tensor(int64)<br/> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 19]|**T1** = tensor(int64)<br/> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Conv|*in* X:**T**<br> *in* W:**T**<br> *in* B:**T**<br> *out* Y:**T**|11+|**T** = tensor(float)|
|||[1, 10]|**T** = tensor(float)|
|ConvInteger|*in* x:**T1**<br> *in* w:**T2**<br> *in* x_zero_point:**T1**<br> *in* w_zero_point:**T2**<br> *out* y:**T3**|10+|**T1** = tensor(uint8)<br/> **T2** = tensor(uint8)<br/> **T3** = tensor(int32)|
|ConvTranspose|*in* X:**T**<br> *in* W:**T**<br> *in* B:**T**<br> *out* Y:**T**|11+|**T** = tensor(float)|
|||[1, 10]|**T** = tensor(float)|
|Cos|*in* input:**T**<br> *out* output:**T**|7+|**T** = tensor(float)|
|Cosh|*in* input:**T**<br> *out* output:**T**|9+|**T** = tensor(float)|
|Crop|*in* input:**T**<br> *out* output:**T**|1+|**T** = tensor(float)|
|CumSum|*in* x:**T**<br> *in* axis:**T2**<br> *out* y:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T2** = tensor(int32), tensor(int64)|
|||[11, 13]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T2** = tensor(int32), tensor(int64)|
|DFT|*in* input:**T1**<br> *in* dft_length:**T2**<br> *in* axis:**tensor(int64)**<br> *out* output:**T1**<br><br>or<br><br> *in* input:**T1**<br> *in* dft_length:**T2**<br> *out* output:**T1**|17+|**T1** = tensor(double), tensor(float)<br/> **T2** = tensor(int32), tensor(int64)|
|DepthToSpace|*in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[11, 12]|**T** = tensor(double), tensor(float)|
|||[1, 10]|**T** = tensor(double), tensor(float)|
|DequantizeLinear|*in* x:**T**<br> *in* x_scale:**tensor(float)**<br> *in* x_zero_point:**T**<br> *out* y:**tensor(float)**<br><br>or<br><br> *in* x:**T1**<br> *in* x_scale:**T2**<br> *in* x_zero_point:**T1**<br> *out* y:**T2**|19+|**T1** = tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int32), tensor(int8), tensor(uint8)<br/> **T2** = tensor(float), tensor(float16)|
|||[13, 18]|**T** = tensor(int32), tensor(int8), tensor(uint8)|
|||[10, 12]|**T** = tensor(int32), tensor(int8), tensor(uint8)|
|Det|*in* X:**T**<br> *out* Y:**T**|11+|**T** = tensor(float)|
|Div|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||13|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Dropout|*in* data:**T**<br> *in* ratio:**T1**<br> *in* training_mode:**T2**<br> *out* output:**T**<br> *out* mask:**T2**<br><br>or<br><br> *in* data:**T**<br> *out* output:**T**<br> *out* mask:**T**<br><br>or<br><br> *in* data:**T**<br> *out* output:**T**<br> *out* mask:**T1**|13+|**T** = tensor(double), tensor(float)<br/> **T1** = tensor(double), tensor(float)<br/> **T2** = tensor(bool)|
|||12|**T** = tensor(double), tensor(float)<br/> **T1** = tensor(double), tensor(float)<br/> **T2** = tensor(bool)|
|||[10, 11]|**T** = tensor(double), tensor(float), tensor(float16)<br/> **T1** = tensor(bool)|
|||[7, 9]|**T** = tensor(double), tensor(float), tensor(float16)|
|DynamicQuantizeLinear|*in* x:**T1**<br> *out* y:**T2**<br> *out* y_scale:**tensor(float)**<br> *out* y_zero_point:**T2**|11+|**T2** = tensor(uint8)|
|DynamicSlice|*in* data:**T**<br> *in* starts:**Tind**<br> *in* ends:**Tind**<br> *in* axes:**Tind**<br> *out* output:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **Tind** = tensor(int32), tensor(int64)|
|Einsum|*in* Inputs:**T**<br> *out* Output:**T**|12+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Elu|*in* X:**T**<br> *out* Y:**T**|6+|**T** = tensor(float)|
|Equal|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T1**|19+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string)<br/> **T1** = tensor(bool)|
|||[13, 18]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T1** = tensor(bool)|
|||[11, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T1** = tensor(bool)|
|||[7, 10]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T1** = tensor(bool)|
|Erf|*in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(float)|
|||[9, 12]|**T** = tensor(float)|
|Exp|*in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Expand|*in* input:**T**<br> *in* shape:**tensor(int64)**<br> *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[8, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|EyeLike|*in* input:**T1**<br> *out* output:**T2**|9+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint64)<br/> **T2** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint64)|
|Flatten|*in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 8]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Floor|*in* X:**T**<br> *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|GRU|*in* X:**T**<br> *in* W:**T**<br> *in* R:**T**<br> *in* B:**T**<br> *in* sequence_lens:**T1**<br> *in* initial_h:**T**<br> *out* Y:**T**<br> *out* Y_h:**T**|14+|**T** = tensor(double), tensor(float)<br/> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float)<br/> **T1** = tensor(int32)|
|Gather|*in* data:**T**<br> *in* indices:**Tind**<br> *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **Tind** = tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **Tind** = tensor(int32), tensor(int64)|
|GatherElements|*in* data:**T**<br> *in* indices:**Tind**<br> *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **Tind** = tensor(int32), tensor(int64)|
|GatherND|*in* data:**T**<br> *in* indices:**tensor(int64)**<br> *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **indices** = tensor(int64)|
|||12|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **indices** = tensor(int64)|
|||11|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)<br/> **indices** = tensor(int64)|
|Gemm|*in* A:**T**<br> *in* B:**T**<br> *in* C:**T**<br> *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[11, 12]|**T** = tensor(double), tensor(float)|
|||[9, 10]|**T** = tensor(double), tensor(float)|
|||[7, 8]|**T** = tensor(double), tensor(float)|
|GlobalAveragePool|*in* X:**T**<br> *out* Y:**T**|1+|**T** = tensor(float)|
|GlobalLpPool|*in* X:**T**<br> *out* Y:**T**|2+|**T** = tensor(float)|
|GlobalMaxPool|*in* X:**T**<br> *out* Y:**T**|1+|**T** = tensor(float)|
|Greater|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T1**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T1** = tensor(bool)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T1** = tensor(bool)|
|||[7, 8]|**T** = tensor(double), tensor(float)<br/> **T1** = tensor(bool)|
|GreaterOrEqual|*in* A:**T**<br> *in* B:**T**<br> *out* C:**T1**|16+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T1** = tensor(bool)|
|||[12, 15]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)<br/> **T1** = tensor(bool)|
|GridSample|*in* X:**T1**<br> *in* grid:**T2**<br> *out* Y:**T1**|16+|**T1** = tensor(float)<br/> **T2** = tensor(float)|
|HammingWindow|*in* size:**T1**<br> *out* output:**T2**|17+|**T1** = tensor(int32), tensor(int64)<br/> **T2** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|HannWindow|*in* size:**T1**<br> *out* output:**T2**|17+|**T1** = tensor(int32), tensor(int64)<br/> **T2** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|HardSigmoid|*in* X:**T**<br> *out* Y:**T**|6+|**T** = tensor(float)|
|Hardmax|*in* input:**T**<br> *out* output:**T**|13+|**T** = tensor(float)|
|||[11, 12]|**T** = tensor(float)|
|||[1, 10]|**T** = tensor(float)|
|Identity|*in* input:**T**<br> *out* output:**T**<br><br>or<br><br> *in* input:**V**<br> *out* output:**V**|19+|**V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[14, 15]|**V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|If|*in* cond:**B**< br > *out* outputs:**V**|19+|**B** = tensor(bool)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**B** = tensor(bool)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 15]|**B** = tensor(bool)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ImageScaler|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|InstanceNormalization|*in* input:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *out* output:**T**|6+|**T** = tensor(float)|
|IsInf|*in* X:**T1**< br > *out* Y:**T2**|10+|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(bool)|
|IsNaN|*in* X:**T1**< br > *out* Y:**T2**|13+|**T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||[9, 12]|**T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|LRN|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float)|
|||[1, 12]|**T** = tensor(float)|
|LSTM|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|14+|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(int32)|
|LayerNormalization|*in* X:**T**< br > *in* Scale:**T**< br > *in* B:**T**< br > *out* Y:**T**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* Scale:**V**< br > *in* B:**V**< br > *out* Y:**V**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**|17+|**T** = tensor(double), tensor(float)< br /> **U** = tensor(float)|
|||[1, 16]|**T** = tensor(double), tensor(float)< br /> **U** = tensor(double), tensor(float)< br /> **V** = tensor(double), tensor(float)|
|LeakyRelu|*in* X:**T**< br > *out* Y:**T**|16+|**T** = tensor(float)|
|||[6, 15]|**T** = tensor(float)|
|Less|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[7, 8]|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(bool)|
|LessOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|||[12, 15]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(bool)|
|Log|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|LogSoftmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[11, 12]|**T** = tensor(double), tensor(float)|
|||[1, 10]|**T** = tensor(double), tensor(float)|
|Loop|*in* M:**I**< br > *in* cond:**B**< br > *in* v_initial:**V**< br > *out* v_final_and_scan_outputs:**V**|19+|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 15]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|LpNormalization|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float)|
|LpPool|*in* X:**T**< br > *out* Y:**T**|18+|**T** = tensor(float)|
|||[11, 17]|**T** = tensor(float)|
|||[2, 10]|**T** = tensor(float)|
|MatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[1, 8]|**T** = tensor(double), tensor(float)|
|MatMulInteger|*in* A:**T1**< br > *in* B:**T2**< br > *in* a_zero_point:**T1**< br > *in* b_zero_point:**T2**< br > *out* Y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int32)|
|Max|*in* data_0:**T**< br > *out* max:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[8, 11]|**T** = tensor(double), tensor(float)|
|||[6, 7]|**T** = tensor(float)|
|MaxPool|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**< br > *out* Indices:**I**|12+|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(int8), tensor(uint8)|
|||[8, 11]|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float)|
|||[1, 7]|**T** = tensor(float)|
|MaxRoiPool|*in* X:**T**< br > *in* rois:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|MaxUnpool|*in* X:**T1**< br > *in* I:**T2**< br > *in* output_shape:**T2**< br > *out* output:**T1**|11+|**T1** = tensor(float)< br /> **T2** = tensor(int64)|
|||[9, 10]|**T1** = tensor(float)< br /> **T2** = tensor(int64)|
|Mean|*in* data_0:**T**< br > *out* mean:**T**|13+|**T** = tensor(float)|
|||[8, 12]|**T** = tensor(float)|
|||[6, 7]|**T** = tensor(float)|
|MeanVarianceNormalization|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float)|
|||[9, 12]|**T** = tensor(float)|
|||[1, 8]|**T** = tensor(float)|
|MelWeightMatrix|*in* num_mel_bins:**T1**< br > *in* dft_length:**T1**< br > *in* sample_rate:**T1**< br > *in* lower_edge_hertz:**T2**< br > *in* upper_edge_hertz:**T2**< br > *out* output:**T3**|17+|**T1** = tensor(int32), tensor(int64)< br /> **T2** = tensor(float)< br /> **T3** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Min|*in* data_0:**T**< br > *out* min:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[8, 11]|**T** = tensor(double), tensor(float)|
|||[6, 7]|**T** = tensor(float)|
|Mod|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[10, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Mul|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||13|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Multinomial|*in* input:**T1**< br > *out* output:**T2**|7+|**T1** = tensor(float)< br /> **T2** = tensor(int32), tensor(int64)|
|Neg|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8)|
|NonZero|*in* X:**T**< br > *out* Y:**tensor(int64)**|13+|**T** = tensor(bool), tensor(float), tensor(int32), tensor(int64), tensor(uint8)|
|||[9, 12]|**T** = tensor(bool), tensor(float), tensor(int32), tensor(int64), tensor(uint8)|
|Not|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(bool)|
|OneHot|*in* indices:**T1**< br > *in* depth:**T2**< br > *in* values:**T3**< br > *out* output:**T3**|11+|**T1** = tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(float), tensor(int32), tensor(int64)< br /> **T3** = tensor(float), tensor(int32), tensor(int64), tensor(string)|
|||[9, 10]|**T1** = tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(float), tensor(int32), tensor(int64)< br /> **T3** = tensor(float), tensor(int32), tensor(int64), tensor(string)|
|Optional|*in* input:**V**< br > *out* output:**O**|15+|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|OptionalGetElement|*in* input:**O**< br > *out* output:**V**|18+|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[15, 17]|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|OptionalHasElement|*in* input:**O**< br > *out* output:**B**|18+|**B** = tensor(bool)< br /> **O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[15, 17]|**B** = tensor(bool)< br /> **O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))|
|Or|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
|PRelu|*in* X:**T**< br > *in* slope:**T**< br > *out* Y:**T**|16+|**T** = tensor(float)|
|||[9, 15]|**T** = tensor(float)|
|||[7, 8]|**T** = tensor(float)|
|Pad|*in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *in* axes:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|19+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||18|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 17]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[2, 10]|**T** = tensor(double), tensor(float)|
|ParametricSoftplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Pow|*in* X:**T**< br > *in* Y:**T**< br > *out* Z:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* Y:**T1**< br > *out* Z:**T**|15+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[13, 14]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||12|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 11]|**T** = tensor(double), tensor(float)|
|QLinearConv|*in* x:**T1**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T1**< br > *in* w:**T2**< br > *in* w_scale:**tensor(float)**< br > *in* w_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *in* B:**T4**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)< br /> **T4** = tensor(int32)|
|QLinearMatMul|*in* a:**T1**< br > *in* a_scale:**tensor(float)**< br > *in* a_zero_point:**T1**< br > *in* b:**T2**< br > *in* b_scale:**tensor(float)**< br > *in* b_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**< br >< br > or< br >< br > *in* x:**T1**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|19+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int8), tensor(uint8)|
|||[13, 18]|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|||[10, 12]|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|RNN|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(float)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(float)< br /> **T1** = tensor(int32)|
|RandomNormal|*out* output:**T**|1+|**T** = tensor(double), tensor(float)|
|RandomNormalLike|*in* input:**T1**< br > *out* output:**T2**|1+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(double), tensor(float)|
|RandomUniform|*out* output:**T**|1+|**T** = tensor(double), tensor(float)|
|RandomUniformLike|*in* input:**T1**< br > *out* output:**T2**|1+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(double), tensor(float)|
|Range|*in* start:**T**< br > *in* limit:**T**< br > *in* delta:**T**< br > *out* output:**T**|11+|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64)|
|Reciprocal|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|ReduceL1|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int64)|
|ReduceL2|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int64)|
|ReduceLogSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int64)|
|ReduceLogSumExp|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|ReduceMax|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||11|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|ReduceMean|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(int32)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32)|
|ReduceMin|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||11|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|ReduceProd|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(float), tensor(int32), tensor(int64)|
|ReduceSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|ReduceSumSquare|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[13, 17]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Relu|*in* X:**T**< br > *out* Y:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int8)|
|||13|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Reshape|*in* data:**T**< br > *in* shape:**tensor(int64)**< br > *out* reshaped:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reshaped:**T**|19+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[14, 18]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||13|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[5, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[1, 4]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Resize|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T1**< br > *in* roi:**T2**< br > *in* scales:**tensor(float)**< br > *in* sizes:**tensor(int64)**< br > *out* Y:**T1**|19+|**T1** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||18|**T1** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[13, 17]|**T1** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[11, 12]|**T1** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||10|**T** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|ReverseSequence|*in* input:**T**< br > *in* sequence_lens:**tensor(int64)**< br > *out* Y:**T**|10+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|RoiAlign|*in* X:**T1**< br > *in* rois:**T1**< br > *in* batch_indices:**T2**< br > *out* Y:**T1**|16+|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(int64)|
|||[10, 15]|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(int64)|
|Round|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|STFT|*in* signal:**T1**< br > *in* frame_step:**T2**< br > *in* window:**T1**< br > *in* frame_length:**T2**< br > *out* output:**T1**|17+|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(int32), tensor(int64)|
|Scale|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|ScaledTanh|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|Scan|*in* initial_state_and_scan_inputs:**V**< br > *out* final_state_and_scan_outputs:**V**< br >< br > or< br >< br > *in* sequence_lens:**I**< br > *in* initial_state_and_scan_inputs:**V**< br > *out* final_state_and_scan_outputs:**V**|19+|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 15]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 10]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8|**I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Scatter|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterElements|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[16, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[13, 15]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *in* updates:**T**< br > *out* output:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 15]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Selu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float)|
|SequenceAt|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* tensor:**T**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceConstruct|*in* inputs:**T**< br > *out* output_sequence:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceEmpty|*out* output:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceErase|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceInsert|*in* input_sequence:**S**< br > *in* tensor:**T**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceLength|*in* input_sequence:**S**< br > *out* length:**I**|11+|**I** = tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|Shape|*in* data:**T**< br > *out* shape:**T1**|19+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[15, 18]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[13, 14]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Shrink|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sigmoid|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Sign|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 12]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SimplifiedLayerNormalization|*in* X:**T**< br > *in* scale:**V**< br > *out* Y:**V**< br > *out* inv_std_var:**U**|1+|**T** = tensor(double), tensor(float)< br /> **U** = tensor(double), tensor(float)< br /> **V** = tensor(double), tensor(float)|
|Sin|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(double), tensor(float)|
|Sinh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float)|
|Size|*in* data:**T**< br > *out* size:**T1**|19+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[13, 18]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[1, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Slice|*in* data:**T**< br > *in* starts:**Tind**< br > *in* ends:**Tind**< br > *in* axes:**Tind**< br > *in* steps:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||10|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[1, 9]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Softmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[11, 12]|**T** = tensor(double), tensor(float)|
|||[1, 10]|**T** = tensor(double), tensor(float)|
|Softplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Softsign|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|SpaceToDepth|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[1, 12]|**T** = tensor(double), tensor(float)|
|Split|*in* input:**T**< br > *in* split:**T**< br > *out* outputs...:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* split:**tensor(int64)**< br > *out* outputs:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* outputs:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[2, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SplitToSequence|*in* input:**T**< br > *in* split:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string)|
|Sqrt|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|Squeeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* squeezed:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* squeezed:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|StringNormalizer|*in* X:**tensor(string)**< br > *out* Y:**tensor(string)**|10+|**X** = tensor(string)|
|Sub|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||13|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Sum|*in* data_0:**T**< br > *out* sum:**T**|13+|**T** = tensor(double), tensor(float)|
|||[8, 12]|**T** = tensor(double), tensor(float)|
|||[6, 7]|**T** = tensor(double), tensor(float)|
|Tan|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float)|
|Tanh|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float)|
|||[6, 12]|**T** = tensor(double), tensor(float)|
|TfIdfVectorizer|*in* X:**T**< br > *out* Y:**T1**|9+|**T** = tensor(int32), tensor(int64), tensor(string)< br /> **T1** = tensor(float)|
|ThresholdedRelu|*in* X:**T**< br > *out* Y:**T**|10+|**T** = tensor(float)|
|||[1, 9]|**T** = tensor(float)|
|Tile|*in* input:**T**< br > *in* repeats:**T1**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* tiles:**T**< br > *in* axis:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[6, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|TopK|*in* X:**T**< br > *in* K:**tensor(int64)**< br > *out* Values:**T**< br > *out* Indices:**I**< br >< br > or< br >< br > *in* X:**T**< br > *out* Values:**T**< br > *out* Indices:**I**|11+|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|||10|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float)|
|||[1, 9]|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float)|
|Transpose|*in* data:**T**< br > *out* transposed:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Trilu|*in* input:**T**< br > *in* k:**tensor(int64)**< br > *out* output:**T**|14+|**T** = tensor(double), tensor(float), tensor(int64)|
|Unique|*in* X:**T**< br > *out* Y:**T**< br > *out* indices:**tensor(int64)**< br > *out* inverse_indices:**tensor(int64)**< br > *out* counts:**tensor(int64)**|11+|**T** = tensor(double), tensor(float), tensor(int64), tensor(int8), tensor(string)|
|Unsqueeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* expanded:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* expanded:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Upsample|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**|9|**T** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|||[7, 8]|**T** = tensor(float), tensor(int32), tensor(int8), tensor(uint8)|
|Where|*in* condition:**B**< br > *in* X:**T**< br > *in* Y:**T**< br > *out* output:**T**|16+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string), tensor(uint8)|
|||[9, 15]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string), tensor(uint8)|
|Xor|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
| |
| |
|**Operator Domain:** *ai.onnx.ml* ||||
|ArrayFeatureExtractor|*in* X:**T**< br > *in* Y:**tensor(int64)**< br > *out* Z:**T**|1+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string)|
|Binarizer|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|CastMap|*in* X:**T1**< br > *out* Y:**T2**|1+|**T1** = map(int64,tensor(float)), map(int64,tensor(string))< br /> **T2** = tensor(float), tensor(int64), tensor(string)|
|CategoryMapper|*in* X:**T1**< br > *out* Y:**T2**|1+|**T1** = tensor(int64), tensor(string)< br /> **T2** = tensor(int64), tensor(string)|
|DictVectorizer|*in* X:**T1**< br > *out* Y:**T2**|1+|**T1** = map(int64,tensor(double)), map(int64,tensor(float)), map(int64,tensor(string)), map(string,tensor(double)), map(string,tensor(float)), map(string,tensor(int64))< br /> **T2** = tensor(double), tensor(float), tensor(int64), tensor(string)|
|FeatureVectorizer|*in* X:**T1**< br > *out* Y:**tensor(float)**|1+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|Imputer|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(int64)|
|LabelEncoder|*in* X:**T1**< br > *out* Y:**T2**|2+|**T1** = tensor(float), tensor(int64), tensor(string)< br /> **T2** = tensor(float), tensor(int64), tensor(string)|
|||1|**T1** = tensor(int64), tensor(string)< br /> **T2** = tensor(int64), tensor(string)|
|LinearClassifier|*in* X:**T1**< br > *out* Y:**T2**< br > *out* Z:**tensor(float)**|1+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(int64), tensor(string)|
|LinearRegressor|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(float)|
|Normalizer|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|OneHotEncoder|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(double), tensor(float), tensor(int64), tensor(string)|
|SVMClassifier|*in* X:**T1**< br > *out* Y:**T2**< br > *out* Z:**tensor(float)**|1+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(int64), tensor(string)|
|SVMRegressor|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(float)|
|Scaler|*in* X:**T**< br > *out* Y:**tensor(float)**|1+|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64)|
|TreeEnsembleClassifier|*in* X:**T1**< br > *out* Y:**T2**< br > *out* Z:**tensor(float)**|3+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(int64), tensor(string)|
|||[1, 2]|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64)< br /> **T2** = tensor(int64), tensor(string)|
|TreeEnsembleRegressor|*in* X:**T**< br > *out* Y:**tensor(float)**|3+|**T** = tensor(double), tensor(float)|
|||[1, 2]|**T** = tensor(double), tensor(float)|
|ZipMap|*in* X:**tensor(float)**< br > *out* Z:**T**|1+|**T** = seq(map(int64,tensor(float))), seq(map(string,tensor(float)))|
| |
| |
|**Operator Domain:** *com.microsoft* ||||
|Attention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* mask_index:**M**< br > *in* past:**T**< br > *in* relative_position_bias:**T**< br > *in* past_sequence_length:**M**< br > *out* output:**T**< br > *out* present:**T**|1+|**T** = tensor(float)|
|AttnLSTM|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *in* QW:**T**< br > *in* MW:**T**< br > *in* V:**T**< br > *in* M:**T**< br > *in* memory_seq_lens:**T1**< br > *in* AW:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|1+|**T** = tensor(double), tensor(float)< br /> **T1** = tensor(int32)|
|BeamSearch|*in* input_ids:**F**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* num_beams:**I**< br > *in* num_return_sequences:**I**< br > *in* length_penalty:**T**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**M**< br > *in* prefix_vocab_mask:**M**< br > *in* attention_mask:**I**< br > *in* decoder_input_ids:**I**< br > *in* logits_processor:**I**< br > *out* sequences:**I**< br > *out* sequences_scores:**T**< br > *out* scores:**T**|1+|**T** = tensor(float)|
|BiasGelu|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float)|
|BifurcationDetector|*in* src_tokens:**T**< br > *in* cur_tokens:**T**< br > *in* prev_suffix_match_idx:**T**< br > *in* pred_tokens:**T**< br > *out* tokens:**T**< br > *out* suffix_match_idx:**T**|1+|**T** = tensor(int64)|
|CDist|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(double), tensor(float)|
|ConvTransposeWithDynamicPads|*in* X:**T**< br > *in* W:**T**< br > *in* Pads:**tensor(int64)**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|CropAndResize|*in* X:**T1**< br > *in* rois:**T1**< br > *in* batch_indices:**T2**< br > *in* crop_size:**T2**< br > *out* Y:**T1**|1+|**T1** = tensor(float)< br /> **T2** = tensor(int32)|
|DequantizeLinear|*in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|1+|**T1** = tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint8)< br /> **T2** = tensor(float)|
|DynamicQuantizeLSTM|*in* X:**T**< br > *in* W:**T2**< br > *in* R:**T2**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *in* W_scale:**T**< br > *in* W_zero_point:**T2**< br > *in* R_scale:**T**< br > *in* R_zero_point:**T2**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|1+|**T** = tensor(float)< br /> **T1** = tensor(int32)< br /> **T2** = tensor(int8), tensor(uint8)|
|DynamicQuantizeMatMul|*in* A:**T1**< br > *in* B:**T2**< br > *in* b_scale:**T1**< br > *in* b_zero_point:**T2**< br > *in* bias:**T1**< br > *out* Y:**T1**|1+|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|EmbedLayerNormalization|*in* input_ids:**T1**< br > *in* segment_ids:**T1**< br > *in* word_embedding:**T**< br > *in* position_embedding:**T**< br > *in* segment_embedding:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* mask:**T1**< br > *in* position_ids:**T1**< br > *out* output:**T**< br > *out* mask_index:**T1**< br > *out* embedding_sum:**T**|1+|**T** = tensor(float)|
|ExpandDims|*in* X:**T**< br > *in* axis:**tensor(int32)**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **axis** = tensor(int32)|
|FastGelu|*in* X:**T**< br > *in* bias:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|FusedConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *in* Z:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|FusedGemm|*in* A:**T**< br > *in* B:**T**< br > *in* C:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|FusedMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|GatherND|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|Gelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|GreedySearch|*in* input_ids:**I**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**I**< br > *in* prefix_vocab_mask:**I**< br > *in* attention_mask:**I**< br > *out* sequences:**I**|1+|**T** = tensor(float)|
|GridSample|*in* X:**T1**< br > *in* Grid:**T1**< br > *out* Y:**T2**|1+|**T1** = tensor(float)< br /> **T2** = tensor(float)|
|Inverse|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|MatMulFpQ4|*in* A:**T1**< br > *in* B:**T2**< br > *in* B_shape:**T3**< br > *out* Y:**T1**|1+|**T1** = tensor(float)< br /> **T2** = tensor(uint8)< br /> **T3** = tensor(int64)|
|MatMulInteger16|*in* A:**T1**< br > *in* B:**T2**< br > *out* Y:**T3**|1+|**T1** = tensor(int16)< br /> **T2** = tensor(int16)< br /> **T3** = tensor(int32)|
|MatMulIntegerToFloat|*in* A:**T1**< br > *in* B:**T2**< br > *in* a_scale:**T3**< br > *in* b_scale:**T3**< br > *in* a_zero_point:**T1**< br > *in* b_zero_point:**T2**< br > *in* bias:**T3**< br > *out* Y:**T3**|1+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(float)|
|MaxpoolWithMask|*in* X:**T**< br > *in* M:**tensor(int32)**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|MultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* bias:**T**< br > *in* key_padding_mask:**M**< br > *in* relative_position_bias:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**T** = tensor(float)|
|MurmurHash3|*in* X:**T1**< br > *out* Y:**T2**|1+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(int32), tensor(uint32)|
|NGramRepeatBlock|*in* input_ids:**Tid**< br > *in* scores:**T**< br > *out* scores_out:**T**|1+|**T** = tensor(float)< br /> **Tid** = tensor(int64)|
|NhwcMaxPool|*in* x:**T**< br > *out* y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|Pad|*in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* value:**T**< br > *out* output:**T**|1+|**T** = tensor(float)|
|QAttention|*in* input:**T1**< br > *in* weight:**T2**< br > *in* bias:**T3**< br > *in* input_scale:**T3**< br > *in* weight_scale:**T3**< br > *in* mask_index:**T4**< br > *in* input_zero_point:**T1**< br > *in* weight_zero_point:**T2**< br > *in* past:**T3**< br > *out* output:**T3**< br > *out* present:**T3**|1+|**T1** = tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(float)< br /> **T4** = tensor(int32)|
|QEmbedLayerNormalization|*in* input_ids:**T1**< br > *in* segment_ids:**T1**< br > *in* word_embedding_quant:**T2**< br > *in* position_embedding_quant:**T2**< br > *in* segment_embedding:**T2**< br > *in* gamma_quant:**T2**< br > *in* beta_quant:**T2**< br > *in* mask:**T1**< br > *in* word_embedding_scale:**T**< br > *in* position_embedding_scale:**T**< br > *in* segment_embedding_scale:**T**< br > *in* gamma_scale:**T**< br > *in* beta_scale:**T**< br > *in* word_embedding_zero_point:**T2**< br > *in* position_embedding_zero_point:**T2**< br > *in* segment_embedding_zero_point:**T2**< br > *in* gamma_zero_point:**T2**< br > *in* beta_zero_point:**T2**< br > *out* layernorm_out:**T**< br > *out* mask_index_out:**T1**|1+|**T** = tensor(float)|
|QGemm|*in* A:**TA**< br > *in* a_scale:**T**< br > *in* a_zero_point:**TA**< br > *in* B:**TB**< br > *in* b_scale:**T**< br > *in* b_zero_point:**TB**< br > *in* C:**TC**< br > *in* y_scale:**T**< br > *in* y_zero_point:**TYZ**< br > *out* Y:**TY**|1+|**T** = tensor(float)< br /> **TA** = tensor(int8), tensor(uint8)< br /> **TB** = tensor(int8), tensor(uint8)< br /> **TC** = tensor(int32)< br /> **TY** = tensor(float), tensor(int8), tensor(uint8)< br /> **TYZ** = tensor(int8), tensor(uint8)|
|QLinearAdd|*in* A:**T**< br > *in* A_scale:**tensor(float)**< br > *in* A_zero_point:**T**< br > *in* B:**T**< br > *in* B_scale:**tensor(float)**< br > *in* B_zero_point:**T**< br > *in* C_scale:**tensor(float)**< br > *in* C_zero_point:**T**< br > *out* C:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearConv|*in* x:**T1**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T1**< br > *in* w:**T2**< br > *in* w_scale:**tensor(float)**< br > *in* w_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *in* B:**T4**< br > *out* y:**T3**|1+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)< br /> **T4** = tensor(int32)|
|QLinearLeakyRelu|*in* X:**T**< br > *in* X_scale:**tensor(float)**< br > *in* X_zero_point:**T**< br > *in* Y_scale:**tensor(float)**< br > *in* Y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearMul|*in* A:**T**< br > *in* A_scale:**tensor(float)**< br > *in* A_zero_point:**T**< br > *in* B:**T**< br > *in* B_scale:**tensor(float)**< br > *in* B_zero_point:**T**< br > *in* C_scale:**tensor(float)**< br > *in* C_zero_point:**T**< br > *out* C:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearSigmoid|*in* X:**T**< br > *in* X_scale:**tensor(float)**< br > *in* X_zero_point:**T**< br > *in* Y_scale:**tensor(float)**< br > *in* Y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearSoftmax|*in* X:**T**< br > *in* X_scale:**tensor(float)**< br > *in* x_zero_point:**T**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearWhere|*in* condition:**B**< br > *in* X:**T**< br > *in* x_scale:**TF**< br > *in* x_zero_point:**T**< br > *in* Y:**T**< br > *in* y_scale:**TF**< br > *in* y_zero_point:**T**< br > *in* z_scale:**TF**< br > *in* z_zero_point:**T**< br > *out* Z:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|1+|**T1** = tensor(float)< br /> **T2** = tensor(int16), tensor(int8), tensor(uint16), tensor(uint8)|
|QuickGelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Range|*in* start:**T**< br > *in* limit:**T**< br > *in* delta:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64)|
|SampleOp|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Sampling|*in* input_ids:**I**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**I**< br > *in* prefix_vocab_mask:**I**< br > *in* attention_mask:**I**< br > *in* presence_mask:**I**< br > *in* seed:**I**< br > *out* sequences:**I**< br > *out* filtered_logits:**T**|1+|**T** = tensor(float)|
|SkipLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(double), tensor(float)|
|SparseToDenseMatMul|*in* A:**T**< br > *in* B:**T1**< br > *out* Y:**T1**|1+|**T** = sparse_tensor(double), sparse_tensor(float), sparse_tensor(int32), sparse_tensor(int64), sparse_tensor(uint32), sparse_tensor(uint64)< br /> **T1** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Tokenizer|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(string)|
|TransposeMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Trilu|*in* X:**T**< br > *in* k:**tensor(int64)**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(int64)|
|Unique|*in* x:**T**< br > *out* y:**T**< br > *out* idx:**tensor(int64)**< br > *out* counts:**tensor(int64)**|1+|**T** = tensor(float)|
|WordConvEmbedding|*in* Sequence:**T**< br > *in* W:**T1**< br > *in* B:**T1**< br > *in* C:**T1**< br > *out* Y:**T1**|1+|**T** = tensor(int32)< br /> **T1** = tensor(float)|
| |
| |
|**Operator Domain:** *com.microsoft.nchwc* ||||
|AveragePool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Conv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *in* Sum:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|GlobalAveragePool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|GlobalMaxPool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|MaxPool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|ReorderInput|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|ReorderOutput|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|Upsample|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
| |
| |
<a name="cudaexecutionprovider"/>
## Operators implemented by CUDAExecutionProvider
| Op Name | Parameters | OpSet Version | Types Supported |
|---------|------------|---------------|-----------------|
|**Operator Domain:** *ai.onnx* ||||
|Abs|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Add|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Affine|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|And|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
|ArgMax|*in* data:**T**< br > *out* reduced:**tensor(int64)**|11|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|ArgMin|*in* data:**T**< br > *out* reduced:**tensor(int64)**|11|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|AveragePool|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|||10|**T** = tensor(double), tensor(float), tensor(float16)|
|||[7, 9]|**T** = tensor(double), tensor(float), tensor(float16)|
|BatchNormalization|*in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* input_mean:**U**< br > *in* input_var:**U**< br > *out* Y:**T**< br > *out* running_mean:**U**< br > *out* running_var:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* mean:**T**< br > *in* var:**T**< br > *out* Y:**T**< br > *out* mean:**T**< br > *out* var:**T**< br > *out* saved_mean:**T**< br > *out* saved_var:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* scale:**T1**< br > *in* B:**T1**< br > *in* input_mean:**T2**< br > *in* input_var:**T2**< br > *out* Y:**T**< br > *out* running_mean:**T2**< br > *out* running_var:**T2**|15+|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(double), tensor(float), tensor(float16)|
|||14|**T** = tensor(double), tensor(float), tensor(float16)< br /> **U** = tensor(double), tensor(float), tensor(float16)|
|||[9, 13]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|Cast|*in* input:**T1**< br > *out* output:**T2**|19+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 18]|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 12]|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[6, 8]|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e5m2), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Ceil|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Clip|*in* input:**T**< br > *in* min:**T**< br > *in* max:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int64), tensor(int8), tensor(uint64), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int64), tensor(int8), tensor(uint64), tensor(uint8)|
|||11|**T** = tensor(float)|
|||[6, 10]|**T** = tensor(float)|
|Compress|*in* input:**T**< br > *in* condition:**T1**< br > *out* output:**T**|11+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|Concat|*in* inputs:**T**< br > *out* concat_result:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[4, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ConcatFromSequence|*in* input_sequence:**S**< br > *out* concat_result:**T**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|ConstantOfShape|*in* input:**T1**< br > *out* output:**T2**|9+|**T1** = tensor(int64)< br /> **T2** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Conv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|ConvTranspose|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|Cos|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(double), tensor(float), tensor(float16)|
|Crop|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|CumSum|*in* x:**T**< br > *in* axis:**T2**< br > *out* y:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(int32), tensor(int64)|
|||[11, 13]|**T** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(int32), tensor(int64)|
|DepthToSpace|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|DequantizeLinear|*in* x:**T**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T**< br > *out* y:**tensor(float)**< br >< br > or< br >< br > *in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|19+|**T1** = tensor(float8e4m3fn), tensor(float8e5m2), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|||[13, 18]|**T** = tensor(int8), tensor(uint8)|
|||[10, 12]|**T** = tensor(int8), tensor(uint8)|
|Div|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Dropout|*in* data:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T2**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T1**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||12|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||[10, 11]|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bool)|
|||[7, 9]|**T** = tensor(double), tensor(float), tensor(float16)|
|DynamicSlice|*in* data:**T**< br > *in* starts:**Tind**< br > *in* ends:**Tind**< br > *in* axes:**Tind**< br > *out* output:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|Einsum|*in* Inputs:**T**< br > *out* Output:**T**|12+|**T** = tensor(double), tensor(float), tensor(float16)|
|Elu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(double), tensor(float), tensor(float16)|
|Equal|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[11, 12]|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 10]|**T** = tensor(bool), tensor(int32), tensor(int64)|
|Erf|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Exp|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Expand|*in* input:**T**< br > *in* shape:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[8, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|EyeLike|*in* input:**T1**< br > *out* output:**T2**|9+|**T1** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint64)< br /> **T2** = tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(uint64)|
|Flatten|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 8]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Floor|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|GRU|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|Gather|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherElements|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int64)< br /> **indices** = tensor(int64)|
|||12|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int64)< br /> **indices** = tensor(int64)|
|||11|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int64)< br /> **indices** = tensor(int64)|
|Gemm|*in* A:**T**< br > *in* B:**T**< br > *in* C:**T**< br > *out* Y:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[9, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|GlobalAveragePool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|GlobalMaxPool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Greater|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|GreaterOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[12, 15]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|HardSigmoid|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(double), tensor(float), tensor(float16)|
|Identity|*in* input:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**V**< br > *out* output:**V**|19+|**V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[14, 18]|**V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|If|*in* cond:**B**< br > *out* outputs:**V**|19+|**B** = tensor(bool)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 18]|**B** = tensor(bool)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ImageScaler|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|InstanceNormalization|*in* input:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *out* output:**T**|6+|**T** = tensor(double), tensor(float), tensor(float16)|
|LRN|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|LSTM|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|LayerNormalization|*in* X:**T**< br > *in* Scale:**T**< br > *in* B:**T**< br > *out* Y:**T**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* Scale:**V**< br > *in* B:**V**< br > *out* Y:**V**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**|17+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **U** = tensor(float)|
|||[1, 16]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **U** = tensor(double), tensor(float)< br /> **V** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|LeakyRelu|*in* X:**T**< br > *out* Y:**T**|16+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 15]|**T** = tensor(double), tensor(float), tensor(float16)|
|Less|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|LessOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|||[12, 15]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T1** = tensor(bool)|
|Log|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|LogSoftmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|Loop|*in* M:**I**< br > *in* cond:**B**< br > *in* v_initial:**V**< br > *out* v_final_and_scan_outputs:**V**|19+|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 18]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**B** = tensor(bool)< br /> **I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|MatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[9, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|MatMulInteger|*in* A:**T1**< br > *in* B:**T2**< br > *in* a_zero_point:**T1**< br > *in* b_zero_point:**T2**< br > *out* Y:**T3**|10+|**T1** = tensor(int8)< br /> **T2** = tensor(int8)< br /> **T3** = tensor(int32)|
|Max|*in* data_0:**T**< br > *out* max:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||12|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[6, 11]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|MaxPool|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**< br > *out* Indices:**I**|12+|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||11|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16)|
|||10|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16)|
|||[8, 9]|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 7]|**T** = tensor(double), tensor(float), tensor(float16)|
|MemcpyFromHost|*in* X:**T**< br > *out* Y:**T**|1+|**T** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|MemcpyToHost|*in* X:**T**< br > *out* Y:**T**|1+|**T** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(float8e4m3fn)), seq(tensor(float8e4m3fnuz)), seq(tensor(float8e5m2)), seq(tensor(float8e5m2fnuz)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Min|*in* data_0:**T**< br > *out* min:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||12|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[6, 11]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|Mod|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[10, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Mul|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Neg|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|NonZero|*in* X:**T**< br > *out* Y:**tensor(int64)**|13+|**T** = tensor(bool), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint8)|
|||[9, 12]|**T** = tensor(bool), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint8)|
|Not|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(bool)|
|OneHot|*in* indices:**T1**< br > *in* depth:**T2**< br > *in* values:**T3**< br > *out* output:**T3**|11+|**T1** = tensor(int32), tensor(int64)< br /> **T2** = tensor(int32), tensor(int64)< br /> **T3** = tensor(float), tensor(float16), tensor(int64)|
|Or|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
|PRelu|*in* X:**T**< br > *in* slope:**T**< br > *out* Y:**T**|16+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[9, 15]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16)|
|Pad|*in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *in* axes:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[2, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|ParametricSoftplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Pow|*in* X:**T**< br > *in* Y:**T**< br > *out* Z:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* Y:**T1**< br > *out* Z:**T**|15+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[13, 14]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[7, 11]|**T** = tensor(double), tensor(float), tensor(float16)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**< br >< br > or< br >< br > *in* x:**T1**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|19+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(float8e4m3fn), tensor(float8e5m2), tensor(int8), tensor(uint8)|
|||[13, 18]|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|||[10, 12]|**T1** = tensor(float)< br /> **T2** = tensor(int8), tensor(uint8)|
|RNN|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|||[7, 13]|**T** = tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(int32)|
|RandomNormal|*out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|RandomNormalLike|*in* input:**T1**< br > *out* output:**T2**|1+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(double), tensor(float), tensor(float16)|
|RandomUniform|*out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|RandomUniformLike|*in* input:**T1**< br > *out* output:**T2**|1+|**T1** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(double), tensor(float), tensor(float16)|
|Range|*in* start:**T**< br > *in* limit:**T**< br > *in* delta:**T**< br > *out* output:**T**|11+|**T** = tensor(double), tensor(float), tensor(int16), tensor(int32), tensor(int64)|
|Reciprocal|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|ReduceL1|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|ReduceL2|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|ReduceLogSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|ReduceLogSumExp|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|ReduceMax|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||11|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|ReduceMean|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|ReduceMin|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|14+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||13|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||12|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(int8), tensor(uint8)|
|||11|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|ReduceProd|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32)|
|ReduceSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|ReduceSumSquare|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|Relu|*in* X:**T**< br > *out* Y:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Reshape|*in* data:**T**< br > *in* shape:**tensor(int64)**< br > *out* reshaped:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reshaped:**T**|19+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[14, 18]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||13|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[5, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **shape** = tensor(int64)|
|||[1, 4]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Resize|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T1**< br > *in* roi:**T2**< br > *in* scales:**tensor(float)**< br > *in* sizes:**tensor(int64)**< br > *out* Y:**T1**|13+|**T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|||[11, 12]|**T1** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|||10|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|ReverseSequence|*in* input:**T**< br > *in* sequence_lens:**tensor(int64)**< br > *out* Y:**T**|10+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|RoiAlign|*in* X:**T1**< br > *in* rois:**T1**< br > *in* batch_indices:**T2**< br > *out* Y:**T1**|10+|**T1** = tensor(double), tensor(float)< br /> **T2** = tensor(int64)|
|Round|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(double), tensor(float), tensor(float16)|
|ScaledTanh|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Scan|*in* initial_state_and_scan_inputs:**V**< br > *out* final_state_and_scan_outputs:**V**< br >< br > or< br >< br > *in* sequence_lens:**I**< br > *in* initial_state_and_scan_inputs:**V**< br > *out* final_state_and_scan_outputs:**V**|19+|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[16, 18]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 15]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[9, 10]|**V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8|**I** = tensor(int64)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Scatter|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|[9, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterElements|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *in* updates:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Selu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(double), tensor(float), tensor(float16)|
|SequenceAt|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* tensor:**T**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceConstruct|*in* inputs:**T**< br > *out* output_sequence:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceEmpty|*out* output:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceErase|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceInsert|*in* input_sequence:**S**< br > *in* tensor:**T**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceLength|*in* input_sequence:**S**< br > *out* length:**I**|11+|**I** = tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|Shape|*in* data:**T**< br > *out* shape:**T1**|19+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[15, 18]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[13, 14]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Shrink|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sigmoid|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Sign|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SimplifiedLayerNormalization|*in* X:**T**< br > *in* scale:**V**< br > *out* Y:**V**< br > *out* inv_std_var:**U**|1+|**T** = tensor(double), tensor(float), tensor(float16)< br /> **U** = tensor(double), tensor(float)< br /> **V** = tensor(double), tensor(float), tensor(float16)|
|Sin|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(double), tensor(float), tensor(float16)|
|Size|*in* data:**T**< br > *out* size:**T1**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Slice|*in* data:**T**< br > *in* starts:**Tind**< br > *in* ends:**Tind**< br > *in* axes:**Tind**< br > *in* steps:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||10|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||[1, 9]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Softmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[11, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 10]|**T** = tensor(double), tensor(float), tensor(float16)|
|Softplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Softsign|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|SpaceToDepth|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[1, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Split|*in* input:**T**< br > *in* split:**T**< br > *out* outputs...:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* split:**tensor(int64)**< br > *out* outputs:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* outputs:**T**|18+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[13, 17]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[2, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sqrt|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|Squeeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* squeezed:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* squeezed:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sub|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||[7, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Sum|*in* data_0:**T**< br > *out* sum:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[8, 12]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 7]|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|Tanh|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16)|
|ThresholdedRelu|*in* X:**T**< br > *out* Y:**T**|10+|**T** = tensor(double), tensor(float), tensor(float16)|
|||1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Tile|*in* input:**T**< br > *in* repeats:**T1**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* tiles:**T**< br > *in* axis:**T**< br > *out* output:**T**|13+|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(int64)|
|||[6, 12]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)< br /> **T1** = tensor(int64)|
|TopK|*in* X:**T**< br > *in* K:**tensor(int64)**< br > *out* Values:**T**< br > *out* Indices:**I**< br >< br > or< br >< br > *in* X:**T**< br > *out* Values:**T**< br > *out* Indices:**I**|11+|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||10|**I** = tensor(int64)< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|||[1, 9]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64)|
|Transpose|*in* data:**T**< br > *out* transposed:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Trilu|*in* input:**T**< br > *in* k:**tensor(int64)**< br > *out* output:**T**|14+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Unsqueeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* expanded:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* expanded:**T**|13+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[11, 12]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||[1, 10]|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Upsample|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**|9|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|||[7, 8]|**T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(uint8)|
|Where|*in* condition:**B**< br > *in* X:**T**< br > *in* Y:**T**< br > *out* output:**T**|16+|**B** = tensor(bool)< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint8)|
|||[9, 15]|**B** = tensor(bool)< br /> **T** = tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint8)|
|Xor|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)< br /> **T1** = tensor(bool)|
| |
| |
|**Operator Domain:** *com.microsoft* ||||
|Attention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* mask_index:**M**< br > *in* past:**T**< br > *in* relative_position_bias:**T**< br > *in* past_sequence_length:**M**< br > *out* output:**T**< br > *out* present:**T**|1+|**T** = tensor(float), tensor(float16)|
|BeamSearch|*in* input_ids:**F**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* num_beams:**I**< br > *in* num_return_sequences:**I**< br > *in* length_penalty:**T**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**M**< br > *in* prefix_vocab_mask:**M**< br > *in* attention_mask:**I**< br > *in* decoder_input_ids:**I**< br > *in* logits_processor:**I**< br > *out* sequences:**I**< br > *out* sequences_scores:**T**< br > *out* scores:**T**|1+|**T** = tensor(float), tensor(float16)|
|BiasAdd|*in* X:**T**< br > *in* bias:**T**< br > *in* skip:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|BiasDropout|*in* data:**T**< br > *in* bias:**T**< br > *in* residual:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T2**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|BiasGelu|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|BiasSoftmax|*in* data:**T**< br > *in* bias:**T**< br > *out* output:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|BiasSplitGelu|*in* X:**T**< br > *in* bias:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|BitmaskBiasDropout|*in* data:**T**< br > *in* bias:**T**< br > *in* residual:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T3**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)< br /> **T3** = tensor(uint32)|
|BitmaskDropout|*in* data:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T3**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T1** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)< br /> **T2** = tensor(bool)< br /> **T3** = tensor(uint32)|
|ComplexMul|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float), tensor(float16)|
|ComplexMulConj|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float), tensor(float16)|
|ConvTransposeWithDynamicPads|*in* X:**T**< br > *in* W:**T**< br > *in* Pads:**tensor(int64)**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|DecoderAttention|*in* query:**T**< br > *in* key:**T**< br > *in* q_weight:**T**< br > *in* kv_weight:**T**< br > *in* bias:**T**< br > *in* key_padding_mask:**B**< br > *in* key_cache:**T**< br > *in* value_cache:**T**< br > *in* static_kv:**B**< br > *in* use_past:**B**< br > *in* has_layer_state:**B**< br > *in* has_key_padding_mask:**B**< br > *out* output:**T**< br > *out* new_key_cache:**T**< br > *out* new_value_cache:**T**|1+|**T** = tensor(float), tensor(float16)|
|DecoderMaskedMultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* mask_index:**M**< br > *in* relative_position_bias:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *in* past_sequence_length:**M**< br > *in* beam_width:**M**< br > *in* cache_indirection:**M**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**T** = tensor(float), tensor(float16)|
|DecoderMaskedSelfAttention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* mask_index:**M**< br > *in* past:**T**< br > *in* relative_position_bias:**T**< br > *in* past_sequence_length:**M**< br > *in* beam_width:**M**< br > *in* cache_indirection:**M**< br > *out* output:**T**< br > *out* present:**T**|1+|**T** = tensor(float), tensor(float16)|
|DequantizeLinear|*in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|1+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(float16)|
|DequantizeWithOrder|*in* input:**Q**< br > *in* scale_input:**S**< br > *out* output:**F**|1+|**F** = tensor(float), tensor(float16)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|EmbedLayerNormalization|*in* input_ids:**T1**< br > *in* segment_ids:**T1**< br > *in* word_embedding:**T**< br > *in* position_embedding:**T**< br > *in* segment_embedding:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* mask:**T1**< br > *in* position_ids:**T1**< br > *out* output:**T**< br > *out* mask_index:**T1**< br > *out* embedding_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
|FastGelu|*in* X:**T**< br > *in* bias:**T**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(float), tensor(float16)|
|FusedConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *in* Z:**T**< br > *out* Y:**T**|1+|**T** = tensor(float)|
|FusedMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|GatedRelativePositionBias|*in* query_layer:**T**< br > *in* query_bias:**T**< br > *in* rel_pos:**T**< br > *in* weight:**T**< br > *in* bias:**T**< br > *in* eco_a:**T**< br > *in* token_offset:**M**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|Gelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|GreedySearch|*in* input_ids:**I**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**I**< br > *in* prefix_vocab_mask:**I**< br > *in* attention_mask:**I**< br > *out* sequences:**I**|1+|**T** = tensor(float), tensor(float16)|
|GridSample|*in* X:**T1**< br > *in* Grid:**T1**< br > *out* Y:**T2**|1+|**T1** = tensor(float)< br /> **T2** = tensor(float)|
|GroupNorm|*in* X:**T**< br > *in* gamma:**M**< br > *in* beta:**M**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|Inverse|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Irfft|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|LongformerAttention|*in* input:**T**< br > *in* weight:**T**< br > *in* bias:**T**< br > *in* mask:**T**< br > *in* global_weight:**T**< br > *in* global_bias:**T**< br > *in* global:**G**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|MultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* bias:**T**< br > *in* key_padding_mask:**M**< br > *in* relative_position_bias:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**T** = tensor(float), tensor(float16)|
|NGramRepeatBlock|*in* input_ids:**Tid**< br > *in* scores:**T**< br > *out* scores_out:**T**|1+|**T** = tensor(float)< br /> **Tid** = tensor(int64)|
|NhwcConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|PackedAttention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* token_offset:**M**< br > *in* cumulative_sequence_length:**M**< br > *in* relative_position_bias:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|PackedMultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* bias:**T**< br > *in* token_offset:**M**< br > *in* cumulative_sequence_length:**M**< br > *in* relative_position_bias:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|QAttention|*in* input:**T1**< br > *in* weight:**T2**< br > *in* bias:**T3**< br > *in* input_scale:**T3**< br > *in* weight_scale:**T3**< br > *in* mask_index:**T4**< br > *in* input_zero_point:**T1**< br > *in* weight_zero_point:**T2**< br > *in* past:**T3**< br > *out* output:**T3**< br > *out* present:**T3**|1+|**T1** = tensor(int8)< br /> **T2** = tensor(int8)< br /> **T3** = tensor(float), tensor(float16)< br /> **T4** = tensor(int32)|
|QOrderedAttention|*in* input:**Q**< br > *in* scale_input:**S**< br > *in* scale_Q_gemm:**S**< br > *in* scale_K_gemm:**S**< br > *in* scale_V_gemm:**S**< br > *in* Q_weight:**Q**< br > *in* K_weight:**Q**< br > *in* V_weight:**Q**< br > *in* scale_Q_weight:**S**< br > *in* scale_K_weight:**S**< br > *in* scale_V_weight:**S**< br > *in* Q_bias:**S**< br > *in* K_bias:**S**< br > *in* V_bias:**S**< br > *in* scale_QKT_gemm:**S**< br > *in* scale_QKT_softmax:**S**< br > *in* scale_values_gemm:**S**< br > *in* mask_index:**G**< br > *in* past:**Q**< br > *in* relative_position_bias:**S**< br > *out* output:**Q**|1+|**G** = tensor(int32)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|QOrderedGelu|*in* X:**Q**< br > *in* scale_X:**S**< br > *in* scale_Y:**S**< br > *out* Y:**Q**|1+|**Q** = tensor(int8)< br /> **S** = tensor(float)|
|QOrderedLayerNormalization|*in* X:**Q**< br > *in* scale_X:**S**< br > *in* scale:**F**< br > *in* B:**F**< br > *in* scale_Y:**S**< br > *out* Y:**Q**|1+|**F** = tensor(float), tensor(float16)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|QOrderedLongformerAttention|*in* input:**Q**< br > *in* scale_input:**S**< br > *in* weight:**Q**< br > *in* scale_weight:**S**< br > *in* bias:**S**< br > *in* scale_bias:**S**< br > *in* scale_qkv_gemm:**S**< br > *in* mask:**F**< br > *in* global_weight:**Q**< br > *in* scale_global_weight:**S**< br > *in* global_bias:**S**< br > *in* scale_global_gemm:**S**< br > *in* global:**G**< br > *in* scale_output:**S**< br > *out* output:**Q**|1+|**F** = tensor(float16)< br /> **G** = tensor(int32)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|QOrderedMatMul|*in* A:**Q**< br > *in* scale_A:**S**< br > *in* B:**Q**< br > *in* scale_B:**S**< br > *in* scale_Y:**S**< br > *in* bias:**S**< br > *in* C:**Q**< br > *in* scale_C:**S**< br > *out* Y:**Q**|1+|**Q** = tensor(int8)< br /> **S** = tensor(float)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|1+|**T1** = tensor(float16)< br /> **T2** = tensor(int8), tensor(uint8)|
|QuantizeWithOrder|*in* input:**F**< br > *in* scale_input:**S**< br > *out* output:**Q**|1+|**F** = tensor(float), tensor(float16)< br /> **Q** = tensor(int8)< br /> **S** = tensor(float)|
|QuickGelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|RelativePositionBias|*in* bias_table:**T**< br > *in* query_length:**U**< br > *in* key_length:**U**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|RemovePadding|*in* input:**T**< br > *in* sequence_token_count:**M**< br > *out* output:**T**< br > *out* token_offset:**M**< br > *out* cumulated_seq_len:**M**< br > *out* max_seq_len:**M**|1+|**T** = tensor(float), tensor(float16)|
|RestorePadding|*in* input:**T**< br > *in* token_offset:**M**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|Rfft|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(double), tensor(float), tensor(float16)|
|Sampling|*in* input_ids:**I**< br > *in* max_length:**I**< br > *in* min_length:**I**< br > *in* repetition_penalty:**T**< br > *in* vocab_mask:**I**< br > *in* prefix_vocab_mask:**I**< br > *in* attention_mask:**I**< br > *in* presence_mask:**I**< br > *in* seed:**I**< br > *out* sequences:**I**< br > *out* filtered_logits:**T**|1+|**T** = tensor(float), tensor(float16)|
|SkipLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
|SkipSimplifiedLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
|TransposeMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(double), tensor(float), tensor(float16)|
|Trilu|*in* X:**T**< br > *in* k:**tensor(int64)**< br > *out* Y:**T**|1+|**T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
| |
| |
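
The *OpSet Version* column in these tables uses three notations: `N+` (registered from opset N onward), `[A, B]` (opsets A through B inclusive), and a bare `N` (that opset only). The sketch below shows one way to find the row that applies to a given opset. The helper names `parse_opset_range` and `pick_row` are hypothetical and are not part of ONNX Runtime, and the rule that the row with the highest starting version wins when several open-ended rows overlap is an assumption.

```python
import re
from typing import Iterable, Optional, Tuple


def parse_opset_range(cell: str) -> Tuple[int, Optional[int]]:
    """Parse an 'OpSet Version' cell into (first, last).

    `last` is None for the open-ended 'N+' notation.
    """
    cell = cell.strip()
    closed = re.fullmatch(r"\[(\d+),\s*(\d+)\]", cell)
    if closed:
        return int(closed.group(1)), int(closed.group(2))
    open_ended = re.fullmatch(r"(\d+)\+", cell)
    if open_ended:
        return int(open_ended.group(1)), None
    if re.fullmatch(r"\d+", cell):
        return int(cell), int(cell)
    raise ValueError(f"unrecognized OpSet Version cell: {cell!r}")


def pick_row(cells: Iterable[str], opset: int) -> Optional[str]:
    """Return the cell covering `opset`, or None if no row covers it.

    Assumption: when several rows cover the opset, the one with the
    highest starting version applies.
    """
    best_cell = None
    best_first = -1
    for cell in cells:
        first, last = parse_opset_range(cell)
        covered = first <= opset and (last is None or opset <= last)
        if covered and first > best_first:
            best_cell, best_first = cell, first
    return best_cell


# The four Slice rows listed above, in table order.
slice_rows = ["13+", "[11, 12]", "10", "[1, 9]"]
print(pick_row(slice_rows, 12))  # -> [11, 12]
print(pick_row(slice_rows, 17))  # -> 13+
```

A result of `None` means no listed row covers that opset, as with opset 1 for Split, whose earliest row is `[2, 10]`.
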
<a name="dmlexecutionprovider"/>
## Operators implemented by DmlExecutionProvider
| Op Name | Parameters | OpSet Version | Types Supported |
|---------|------------|---------------|-----------------|
|**Operator Domain:** *ai.onnx* ||||
|Abs|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|||6+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|Acos|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Acosh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|Add|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Affine|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|And|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)|
|ArgMax|*in* data:**T**< br > *out* reduced:**tensor(int64)**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ArgMin|*in* data:**T**< br > *out* reduced:**tensor(int64)**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Asin|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Asinh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|Atan|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Atanh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|AveragePool|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(float), tensor(float16)|
|||10+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|BatchNormalization|*in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* input_mean:**U**< br > *in* input_var:**U**< br > *out* Y:**T**< br > *out* running_mean:**U**< br > *out* running_var:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* mean:**T**< br > *in* var:**T**< br > *out* Y:**T**< br > *out* mean:**T**< br > *out* var:**T**< br > *out* saved_mean:**T**< br > *out* saved_var:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* scale:**T1**< br > *in* B:**T1**< br > *in* input_mean:**T2**< br > *in* input_var:**T2**< br > *out* Y:**T**< br > *out* running_mean:**T2**< br > *out* running_var:**T2**|15+|**T** = tensor(float), tensor(float16)|
|||14+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|BitShift|*in* X:**T**< br > *in* Y:**T**< br > *out* Z:**T**|11+|**T** = tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseAnd|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseNot|*in* X:**T**< br > *out* Y:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseOr|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|BitwiseXor|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|18+|**T** = tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Cast|*in* input:**T1**< br > *out* output:**T2**|13+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||6+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|CastLike|*in* input:**T1**< br > *in* target_type:**T2**< br > *out* output:**T2**|15+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Ceil|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Celu|*in* X:**T**< br > *out* Y:**T**|12+|**T** = tensor(float), tensor(float16)|
|Clip|*in* input:**T**< br > *in* min:**T**< br > *in* max:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Concat|*in* inputs:**T**< br > *out* concat_result:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||4+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ConcatFromSequence|*in* input_sequence:**S**< br > *out* concat_result:**T**|11+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ConstantOfShape|*in* input:**T1**< br > *out* output:**T2**|9+|**T1** = tensor(int64)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Conv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ConvInteger|*in* x:**T1**< br > *in* w:**T2**< br > *in* x_zero_point:**T1**< br > *in* w_zero_point:**T2**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int32)|
|ConvTranspose|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|Cos|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Cosh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|Crop|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|CumSum|*in* x:**T**< br > *in* axis:**T2**< br > *out* y:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|DFT|*in* input:**T1**< br > *in* dft_length:**T2**< br > *in* axis:**tensor(int64)**< br > *out* output:**T1**< br >< br > or< br >< br > *in* input:**T1**< br > *in* dft_length:**T2**< br > *out* output:**T1**|17+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int64)|
|DepthToSpace|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|DequantizeLinear|*in* x:**T**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T**< br > *out* y:**tensor(float)**< br >< br > or< br >< br > *in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|13+|**T** = tensor(int32), tensor(int8), tensor(uint8)|
|||10+|**T** = tensor(int32), tensor(int8), tensor(uint8)|
|Div|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Dropout|*in* data:**T**< br > *in* ratio:**T1**< br > *in* training_mode:**T2**< br > *out* output:**T**< br > *out* mask:**T2**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**< br > *out* mask:**T1**|7+|**T** = tensor(float), tensor(float16)|
|DynamicQuantizeLinear|*in* x:**T1**< br > *out* y:**T2**< br > *out* y_scale:**tensor(float)**< br > *out* y_zero_point:**T2**|11+|**T1** = tensor(float)< br /> **T2** = tensor(uint8)|
|Einsum|*in* Inputs:**T**< br > *out* Output:**T**|12+|**T** = tensor(float), tensor(float16)|
|Elu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float), tensor(float16)|
|Equal|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||7+|**T** = tensor(float), tensor(float16)< br /> **T1** = tensor(bool)|
|Erf|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|Exp|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Expand|*in* input:**T**< br > *in* shape:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|EyeLike|*in* input:**T1**< br > *out* output:**T2**|9+|**T1** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Flatten|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Floor|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|GRU|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|Gather|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherElements|*in* data:**T**< br > *in* indices:**Tind**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|GatherND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Gemm|*in* A:**T**< br > *in* B:**T**< br > *in* C:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|GlobalAveragePool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|GlobalLpPool|*in* X:**T**< br > *out* Y:**T**|2+|**T** = tensor(float), tensor(float16)|
|GlobalMaxPool|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|Greater|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||7+|**T** = tensor(float), tensor(float16)< br /> **T1** = tensor(bool)|
|GreaterOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|GridSample|*in* X:**T1**< br > *in* grid:**T2**< br > *out* Y:**T1**|16+|**T1** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|HardSigmoid|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float), tensor(float16)|
|Hardmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|Identity|*in* input:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**V**< br > *out* output:**V**|16+|**V** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||14+|**V** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|If|*in* cond:**B**< br > *out* outputs:**V**|19+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||16+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**B** = tensor(bool)< br /> **V** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ImageScaler|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|InstanceNormalization|*in* input:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *out* output:**T**|6+|**T** = tensor(float), tensor(float16)|
|IsInf|*in* X:**T1**< br > *out* Y:**T2**|10+|**T1** = tensor(float)< br /> **T2** = tensor(bool)|
|IsNaN|*in* X:**T1**< br > *out* Y:**T2**|13+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|||9+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(bool)|
|LRN|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|LSTM|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *in* initial_c:**T**< br > *in* P:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**< br > *out* Y_c:**T**|14+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|LayerNormalization|*in* X:**T**< br > *in* Scale:**T**< br > *in* B:**T**< br > *out* Y:**T**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**< br >< br > or< br >< br > *in* X:**T**< br > *in* Scale:**V**< br > *in* B:**V**< br > *out* Y:**V**< br > *out* Mean:**U**< br > *out* InvStdDev:**U**|17+|**T** = tensor(float), tensor(float16)< br /> **U** = tensor(float)|
|||1+|**T** = tensor(float), tensor(float16)< br /> **V** = tensor(float), tensor(float16)|
|LeakyRelu|*in* X:**T**< br > *out* Y:**T**|16+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Less|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||7+|**T** = tensor(float), tensor(float16)< br /> **T1** = tensor(bool)|
|LessOrEqual|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|16+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(bool)|
|Log|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|LogSoftmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|LpNormalization|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|LpPool|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(float), tensor(float16)|
|||2+|**T** = tensor(float), tensor(float16)|
|MatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|MatMulInteger|*in* A:**T1**< br > *in* B:**T2**< br > *in* a_zero_point:**T1**< br > *in* b_zero_point:**T2**< br > *out* Y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int32)|
|Max|*in* data_0:**T**< br > *out* max:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|MaxPool|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**< br > *out* Indices:**I**|12+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||11+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||10+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||8+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int8), tensor(uint8)|
|||1+|**T** = tensor(float), tensor(float16)|
|MaxRoiPool|*in* X:**T**< br > *in* rois:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|MaxUnpool|*in* X:**T1**< br > *in* I:**T2**< br > *in* output_shape:**T2**< br > *out* output:**T1**|11+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int64)|
|||9+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int64)|
|Mean|*in* data_0:**T**< br > *out* mean:**T**|13+|**T** = tensor(float), tensor(float16)|
|||8+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|MeanVarianceNormalization|*in* X:**T**< br > *out* Y:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|MemcpyFromHost|*in* X:**T**< br > *out* Y:**T**|1+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|MemcpyToHost|*in* X:**T**< br > *out* Y:**T**|1+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Min|*in* data_0:**T**< br > *out* min:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||8+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Mod|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||10+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|Mul|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Neg|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|||6+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8)|
|NonZero|*in* X:**T**< br > *out* Y:**tensor(int64)**|13+|**T** = tensor(bool), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||9+|**T** = tensor(bool), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|Not|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(bool)|
|OneHot|*in* indices:**T1**< br > *in* depth:**T2**< br > *in* values:**T3**< br > *out* output:**T3**|11+|**T1** = tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T3** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**T1** = tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)< br /> **T2** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T3** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|OptionalGetElement|*in* input:**O**< br > *out* output:**V**|18+|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||15+|**O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))< br /> **V** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|OptionalHasElement|*in* input:**O**< br > *out* output:**B**|18+|**B** = tensor(bool)< br /> **O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8)), seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(string)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||15+|**B** = tensor(bool)< br /> **O** = optional(seq(tensor(bfloat16))), optional(seq(tensor(bool))), optional(seq(tensor(double))), optional(seq(tensor(float))), optional(seq(tensor(float16))), optional(seq(tensor(int16))), optional(seq(tensor(int32))), optional(seq(tensor(int64))), optional(seq(tensor(int8))), optional(seq(tensor(string))), optional(seq(tensor(uint16))), optional(seq(tensor(uint32))), optional(seq(tensor(uint64))), optional(seq(tensor(uint8))), optional(tensor(bfloat16)), optional(tensor(bool)), optional(tensor(double)), optional(tensor(float)), optional(tensor(float16)), optional(tensor(int16)), optional(tensor(int32)), optional(tensor(int64)), optional(tensor(int8)), optional(tensor(string)), optional(tensor(uint16)), optional(tensor(uint32)), optional(tensor(uint64)), optional(tensor(uint8))|
|Or|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)|
|PRelu|*in* X:**T**< br > *in* slope:**T**< br > *out* Y:**T**|16+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8)|
|||9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8)|
|||7+|**T** = tensor(float), tensor(float16)|
|Pad|*in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *in* axes:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *in* pads:**tensor(int64)**< br > *in* constant_value:**T**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|18+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||2+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|ParametricSoftplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|Pow|*in* X:**T**< br > *in* Y:**T**< br > *out* Z:**T**< br >< br > or< br >< br > *in* X:**T**< br > *in* Y:**T1**< br > *out* Z:**T**|15+|**T** = tensor(float), tensor(float16), tensor(int32)< br /> **T1** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int32)< br /> **T1** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int32)< br /> **T1** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16)|
|QLinearConv|*in* x:**T1**< br > *in* x_scale:**tensor(float)**< br > *in* x_zero_point:**T1**< br > *in* w:**T2**< br > *in* w_scale:**tensor(float)**< br > *in* w_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *in* B:**T4**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)< br /> **T4** = tensor(int32)|
|QLinearMatMul|*in* a:**T1**< br > *in* a_scale:**tensor(float)**< br > *in* a_zero_point:**T1**< br > *in* b:**T2**< br > *in* b_scale:**tensor(float)**< br > *in* b_zero_point:**T2**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T3**< br > *out* y:**T3**|10+|**T1** = tensor(int8), tensor(uint8)< br /> **T2** = tensor(int8), tensor(uint8)< br /> **T3** = tensor(int8), tensor(uint8)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**< br >< br > or< br >< br > *in* x:**T1**< br > *in* y_scale:**tensor(float)**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|13+|**T1** = tensor(float), tensor(int32)< br /> **T2** = tensor(int8), tensor(uint8)|
|||10+|**T1** = tensor(float), tensor(int32)< br /> **T2** = tensor(int8), tensor(uint8)|
|RNN|*in* X:**T**< br > *in* W:**T**< br > *in* R:**T**< br > *in* B:**T**< br > *in* sequence_lens:**T1**< br > *in* initial_h:**T**< br > *out* Y:**T**< br > *out* Y_h:**T**|14+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|Range|*in* start:**T**< br > *in* limit:**T**< br > *in* delta:**T**< br > *out* output:**T**|11+|**T** = tensor(float), tensor(int16), tensor(int32), tensor(int64)|
|Reciprocal|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|ReduceL1|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|ReduceL2|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceLogSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceLogSumExp|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceMax|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceMean|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceMin|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||12+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|ReduceProd|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|ReduceSum|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|ReduceSumSquare|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* reduced:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reduced:**T**|18+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||11+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|||1+|**T** = tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64)|
|Relu|*in* X:**T**< br > *out* Y:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8)|
|||13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Reshape|*in* data:**T**< br > *in* shape:**tensor(int64)**< br > *out* reshaped:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* reshaped:**T**|14+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||5+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Resize|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T1**< br > *in* roi:**T2**< br > *in* scales:**tensor(float)**< br > *in* sizes:**tensor(int64)**< br > *out* Y:**T1**|13+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(float), tensor(float16)|
|||11+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(float), tensor(float16)|
|||10+|**T** = tensor(float), tensor(float16)|
|ReverseSequence|*in* input:**T**< br > *in* sequence_lens:**tensor(int64)**< br > *out* Y:**T**|10+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|RoiAlign|*in* X:**T1**< br > *in* rois:**T1**< br > *in* batch_indices:**T2**< br > *out* Y:**T1**|16+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int32), tensor(int64)|
|||10+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int32), tensor(int64)|
|Round|*in* X:**T**< br > *out* Y:**T**|11+|**T** = tensor(float), tensor(float16)|
|STFT|*in* signal:**T1**< br > *in* frame_step:**T2**< br > *in* window:**T1**< br > *in* frame_length:**T2**< br > *out* output:**T1**|17+|**T1** = tensor(float), tensor(float16)< br /> **T2** = tensor(int32), tensor(int64)|
|ScaledTanh|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|Scatter|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||9+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterElements|*in* data:**T**< br > *in* indices:**Tind**< br > *in* updates:**T**< br > *out* output:**T**|16+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|ScatterND|*in* data:**T**< br > *in* indices:**tensor(int64)**< br > *in* updates:**T**< br > *out* output:**T**|16+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Selu|*in* X:**T**< br > *out* Y:**T**|6+|**T** = tensor(float), tensor(float16)|
|SequenceAt|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* tensor:**T**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceConstruct|*in* inputs:**T**< br > *out* output_sequence:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))< br /> **T** = tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|SequenceEmpty|*out* output:**S**|11+|**S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceErase|*in* input_sequence:**S**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceInsert|*in* input_sequence:**S**< br > *in* tensor:**T**< br > *in* position:**I**< br > *out* output_sequence:**S**|11+|**I** = tensor(int32), tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|SequenceLength|*in* input_sequence:**S**< br > *out* length:**I**|11+|**I** = tensor(int64)< br /> **S** = seq(tensor(bfloat16)), seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8))|
|Shape|*in* data:**T**< br > *out* shape:**T1**|15+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||13+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||1+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Shrink|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint8)|
|Sigmoid|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Sign|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sin|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Sinh|*in* input:**T**< br > *out* output:**T**|9+|**T** = tensor(float), tensor(float16)|
|Size|*in* data:**T**< br > *out* size:**T1**|13+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|||1+|**T** = seq(tensor(bool)), seq(tensor(double)), seq(tensor(float)), seq(tensor(float16)), seq(tensor(int16)), seq(tensor(int32)), seq(tensor(int64)), seq(tensor(int8)), seq(tensor(uint16)), seq(tensor(uint32)), seq(tensor(uint64)), seq(tensor(uint8)), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **T1** = tensor(int64)|
|Slice|*in* data:**T**< br > *in* starts:**Tind**< br > *in* ends:**Tind**< br > *in* axes:**Tind**< br > *in* steps:**Tind**< br > *out* output:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||10+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)< br /> **Tind** = tensor(int32), tensor(int64)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Softmax|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||11+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|Softplus|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|Softsign|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|SpaceToDepth|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Split|*in* input:**T**< br > *in* split:**T**< br > *out* outputs...:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* split:**tensor(int64)**< br > *out* outputs:**T**< br >< br > or< br >< br > *in* input:**T**< br > *out* outputs:**T**|18+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||2+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sqrt|*in* X:**T**< br > *out* Y:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Squeeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* squeezed:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* squeezed:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sub|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|14+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||13+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||7+|**T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Sum|*in* data_0:**T**< br > *out* sum:**T**|13+|**T** = tensor(float), tensor(float16)|
|||8+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|Tan|*in* input:**T**< br > *out* output:**T**|7+|**T** = tensor(float), tensor(float16)|
|Tanh|*in* input:**T**< br > *out* output:**T**|13+|**T** = tensor(float), tensor(float16)|
|||6+|**T** = tensor(float), tensor(float16)|
|ThresholdedRelu|*in* X:**T**< br > *out* Y:**T**|10+|**T** = tensor(float), tensor(float16)|
|||1+|**T** = tensor(float), tensor(float16)|
|Tile|*in* input:**T**< br > *in* repeats:**T1**< br > *out* output:**T**< br >< br > or< br >< br > *in* input:**T**< br > *in* tiles:**T**< br > *in* axis:**T**< br > *out* output:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||6+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|TopK|*in* X:**T**< br > *in* K:**tensor(int64)**< br > *out* Values:**T**< br > *out* Indices:**I**< br >< br > or< br >< br > *in* X:**T**< br > *out* Values:**T**< br > *out* Indices:**I**|11+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||10+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**I** = tensor(int64)< br /> **T** = tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Transpose|*in* data:**T**< br > *out* transposed:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Trilu|*in* input:**T**< br > *in* k:**tensor(int64)**< br > *out* output:**T**|14+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Unsqueeze|*in* data:**T**< br > *in* axes:**tensor(int64)**< br > *out* expanded:**T**< br >< br > or< br >< br > *in* data:**T**< br > *out* expanded:**T**|13+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||11+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||1+|**T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Upsample|*in* X:**T**< br > *in* scales:**tensor(float)**< br > *out* Y:**T**< br >< br > or< br >< br > *in* X:**T**< br > *out* Y:**T**|10+|**T** = tensor(float), tensor(float16)|
|||9+|**T** = tensor(float), tensor(float16)|
|||7+|**T** = tensor(float), tensor(float16)|
|Where|*in* condition:**B**< br > *in* X:**T**< br > *in* Y:**T**< br > *out* output:**T**|16+|**B** = tensor(bool)< br /> **T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|||9+|**B** = tensor(bool)< br /> **T** = tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8)|
|Xor|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T1**|7+|**T** = tensor(bool)|
| |
| |
|**Operator Domain:** *com.microsoft* ||||
|Attention|*in* input:**T**< br > *in* weights:**T**< br > *in* bias:**T**< br > *in* mask_index:**M**< br > *in* past:**T**< br > *in* relative_position_bias:**T**< br > *in* past_sequence_length:**M**< br > *out* output:**T**< br > *out* present:**T**|1+|**M** = tensor(int32)< br /> **T** = tensor(float), tensor(float16)|
|BiasAdd|*in* X:**T**< br > *in* bias:**T**< br > *in* skip:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|BiasGelu|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float), tensor(float16)|
|BiasSplitGelu|*in* X:**T**< br > *in* bias:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|ConvTransposeWithDynamicPads|*in* X:**T**< br > *in* W:**T**< br > *in* Pads:**tensor(int64)**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DequantizeLinear|*in* x:**T1**< br > *in* x_scale:**T2**< br > *in* x_zero_point:**T1**< br > *out* y:**T2**|1+|**T1** = tensor(int32), tensor(int8), tensor(uint8)< br /> **T2** = tensor(float), tensor(float16)|
|EmbedLayerNormalization|*in* input_ids:**T1**< br > *in* segment_ids:**T1**< br > *in* word_embedding:**T**< br > *in* position_embedding:**T**< br > *in* segment_embedding:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* mask:**T1**< br > *in* position_ids:**T1**< br > *out* output:**T**< br > *out* mask_index:**T1**< br > *out* embedding_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
|FusedMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|FusedMatMulActivation|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|Gelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|GroupNorm|*in* X:**T**< br > *in* gamma:**M**< br > *in* beta:**M**< br > *out* Y:**T**|1+|**M** = tensor(float), tensor(float16)< br /> **T** = tensor(float), tensor(float16)|
|MultiHeadAttention|*in* query:**T**< br > *in* key:**T**< br > *in* value:**T**< br > *in* bias:**T**< br > *in* key_padding_mask:**M**< br > *in* relative_position_bias:**T**< br > *in* past_key:**T**< br > *in* past_value:**T**< br > *out* output:**T**< br > *out* present_key:**T**< br > *out* present_value:**T**|1+|**M** = tensor(int32)< br /> **T** = tensor(float), tensor(float16)|
|NhwcConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|QLinearAdd|*in* A:**T**< br > *in* A_scale:**tensor(float)**< br > *in* A_zero_point:**T**< br > *in* B:**T**< br > *in* B_scale:**tensor(float)**< br > *in* B_zero_point:**T**< br > *in* C_scale:**tensor(float)**< br > *in* C_zero_point:**T**< br > *out* C:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QLinearSigmoid|*in* X:**T**< br > *in* X_scale:**tensor(float)**< br > *in* X_zero_point:**T**< br > *in* Y_scale:**tensor(float)**< br > *in* Y_zero_point:**T**< br > *out* Y:**T**|1+|**T** = tensor(int8), tensor(uint8)|
|QuantizeLinear|*in* x:**T1**< br > *in* y_scale:**T1**< br > *in* y_zero_point:**T2**< br > *out* y:**T2**|1+|**T1** = tensor(float), tensor(float16), tensor(int32)< br /> **T2** = tensor(int8), tensor(uint8)|
|QuickGelu|*in* X:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|SkipLayerNormalization|*in* input:**T**< br > *in* skip:**T**< br > *in* gamma:**T**< br > *in* beta:**T**< br > *in* bias:**T**< br > *out* output:**T**< br > *out* mean:**U**< br > *out* inv_std_var:**U**< br > *out* input_skip_bias_sum:**T**|1+|**T** = tensor(float), tensor(float16)|
| |
| |
|**Operator Domain:** *com.microsoft.dml* ||||
|DmlFusedAdd|*in* A:**T**< br > *in* B:**T**< br > *out* C:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedBatchNormalization|*in* X:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *in* mean:**T**< br > *in* var:**T**< br > *out* Y:**T**< br > *out* mean:**T**< br > *out* var:**T**< br > *out* saved_mean:**T**< br > *out* saved_var:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedConv|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedConvTranspose|*in* X:**T**< br > *in* W:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedGemm|*in* A:**T**< br > *in* B:**T**< br > *in* C:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedInstanceNormalization|*in* input:**T**< br > *in* scale:**T**< br > *in* B:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedMatMul|*in* A:**T**< br > *in* B:**T**< br > *out* Y:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedMeanVarianceNormalization|*in* input:**T**< br > *out* output:**T**|1+|**T** = tensor(float), tensor(float16)|
|DmlFusedSum|*in* data_0:**T**< br > *out* sum:**T**|1+|**T** = tensor(float), tensor(float16)|
| |
| |