Contrib Operator Schemas

This file is automatically generated from the registered contrib operator schemas by this script. Do not modify directly.

com.microsoft

com.microsoft.Attention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

do_rotary : int
Whether to use rotary position embedding. Default value is 0.
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
past_present_share_buffer : int
If set, the corresponding past and present share the same tensor, whose size is (2, batch_size, num_heads, max_sequence_length, head_size).
qkv_hidden_sizes : list of ints
Hidden dimension of Q, K, V: hidden_size, hidden_size and v_hidden_size
rotary_embedding_dim : int
Dimension of rotary embedding. Limited to 32, 64 or 128. Default value is head_size
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional : int
Whether every token can only attend to previous tokens. Default value is 0.
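
The following is a minimal pure-Python sketch of how `scale`, `mask_filter_value` and `unidirectional` could interact for a single head. It is not the operator's implementation: the function name `attention_scores`, the 0/1 `mask` layout, and the choice to add `mask_filter_value` to masked logits are assumptions made for illustration.

```python
import math

def attention_scores(q, k, mask=None, scale=None,
                     mask_filter_value=-10000.0, unidirectional=False):
    """Softmax attention probabilities for one head (illustrative only).

    q, k: lists of rows with shape (sequence_length, head_size).
    mask: optional list of 0/1 flags per key position (1 = attend); assumed layout.
    """
    head_size = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(head_size)  # documented default for `scale`
    probs = []
    for i, q_row in enumerate(q):
        logits = []
        for j, k_row in enumerate(k):
            s = scale * sum(a * b for a, b in zip(q_row, k_row))
            if mask is not None and mask[j] == 0:
                s += mask_filter_value  # assumption: filter value is added to masked logits
            if unidirectional and j > i:
                s += mask_filter_value  # assumption: future tokens use the same filter value
            logits.append(s)
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs
```

With `unidirectional=True`, the first query row places essentially all of its probability on the first key, since later positions receive the filter value.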

Inputs (2 - 7)

input : T
weights : T
bias (optional) : T
mask_index (optional) : M
past (optional) : T
attention_bias (optional) : T
past_sequence_length (optional) : M

Outputs (1 - 2)

output : T
present (optional) : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain mask index to integer types

com.microsoft.AttnLSTM

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation_alpha : list of floats
Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. For example, with LeakyRelu, the default alpha is 0.01.
activation_beta : list of floats
Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators.
activations : list of strings
A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.
clip : float
Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
direction : string
Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
hidden_size : int
Number of neurons in the hidden layer.
input_forget : int
Couple the input and forget gates if 1, default 0.

Inputs (3 - 14)

X : T
W : T
R : T
B (optional) : T
sequence_lens (optional) : T1
initial_h (optional) : T
initial_c (optional) : T
P (optional) : T
QW (optional) : T
MW (optional) : T
V (optional) : T
M (optional) : T
memory_seq_lens (optional) : T1
AW (optional) : T

Outputs (0 - 3)

Y (optional) : T
Y_h (optional) : T
Y_c (optional) : T

Type Constraints

T : tensor(float), tensor(double)
Constrain input and output types to float tensors.
T1 : tensor(int32)
Constrain seq_lens to integral tensors.

com.microsoft.BeamSearch

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

decoder : graph (required)
Decoder subgraph to execute in a loop.
decoder_start_token_id : int
The id of the token that indicates decoding starts.
early_stopping : int
Whether to stop early.
encoder : graph
The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
eos_token_id : int (required)
The id of the end-of-sequence token
init_decoder : graph
The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
model_type : int
model type: 0 for GPT-2; 1 for encoder decoder like T5
no_repeat_ngram_size : int
Size of the n-grams that must not be repeated.
pad_token_id : int (required)
The id of the padding token
vocab_size : int
Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
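
As an illustration of what `no_repeat_ngram_size` is commonly understood to control, the sketch below computes the tokens that would complete an n-gram already present in a sequence. The helper name `banned_next_tokens` is hypothetical, and the exact behaviour of the operator is an assumption here, not something this schema states.

```python
def banned_next_tokens(sequence, no_repeat_ngram_size):
    """Return the tokens that would repeat an existing n-gram if appended.

    Illustrative only; assumes the usual "no repeated n-gram" rule.
    """
    n = no_repeat_ngram_size
    if n <= 0 or len(sequence) < n - 1:
        return set()
    # The last n-1 tokens form the prefix of the n-gram being generated.
    prefix = tuple(sequence[len(sequence) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(sequence) - n + 1):
        ngram = tuple(sequence[i:i + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned
```

A decoding loop could then mask the logits of the returned tokens before selecting the next token for each beam.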

Inputs (5 - 12)

input_ids : F
max_length : I
min_length (optional) : I
num_beams : I
num_return_sequences : I
length_penalty (optional) : T
repetition_penalty (optional) : T
vocab_mask (optional) : M
prefix_vocab_mask (optional) : M
attention_mask (optional) : I
decoder_input_ids (optional) : I
logits_processor (optional) : I

Outputs (1 - 3)

sequences : I
sequences_scores (optional) : T
scores (optional) : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain to float tensors.
F : tensor(float), tensor(int32), tensor(float16)
Constrain input type to float or int tensors.
I : tensor(int32)
Constrain to integer types
M : tensor(int32)
Constrain mask to integer types

com.microsoft.BiasAdd

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

X : T
bias : T
skip : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float)
Constrain input and output types to float tensors.

com.microsoft.BiasDropout

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

seed : int
(Optional) Seed for the random generator. If not specified, one is generated automatically.

Inputs (2 - 5)

data : T
bias : T
residual (optional) : T
ratio (optional) : T1
training_mode (optional) : T2

Outputs (1 - 2)

output : T
mask (optional) : T2

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input 'ratio' types to float tensors.
T2 : tensor(bool)
Constrain output 'mask' types to boolean tensors.

com.microsoft.BiasGelu

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

A : T
B : T

Outputs

C : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.

com.microsoft.BiasSoftmax

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

axis : int
apply softmax to elements for dimensions axis or higher
is_inner_broadcast : int (required)
true if broadcast bias across input for dimensions broadcast_axis to axis-1, otherwise broadcast bias across input for dimensions 0 to broadcast_axis - 1

Inputs

data : T
bias : T

Outputs

output : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.

com.microsoft.BiasSplitGelu

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

X : T
bias : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float)
Constrain input X and output Y types to float tensors.

com.microsoft.BifurcationDetector

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

max_ngram_size : int
The maximum NGram size for suffix matching.
min_ngram_size : int
The minimum NGram size for suffix matching.

Inputs (3 - 4)

src_tokens : T
cur_tokens : T
prev_suffix_match_idx : T
pred_tokens (optional) : T

Outputs

tokens : T
suffix_match_idx : T

Type Constraints

T : tensor(int64)
Constrain to integer types.

com.microsoft.BitmaskBiasDropout

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

seed : int
(Optional) Seed for the random generator. If not specified, one is generated automatically.

Inputs (2 - 5)

data : T
bias : T
residual (optional) : T
ratio (optional) : T1
training_mode (optional) : T2

Outputs (1 - 2)

output : T
mask (optional) : T3

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input 'ratio' types to float tensors.
T2 : tensor(bool)
Constrain input 'training_mode' types to boolean tensors.
T3 : tensor(uint32)
Constrain output 'mask' types to uint32 tensors.

com.microsoft.BitmaskDropout

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

seed : int
(Optional) Seed for the random generator. If not specified, one is generated automatically.

Inputs (1 - 3)

data : T
ratio (optional) : T1
training_mode (optional) : T2

Outputs (1 - 2)

output : T
mask (optional) : T3

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input 'ratio' types to float tensors.
T2 : tensor(bool)
Constrain 'training_mode' to boolean tensor.
T3 : tensor(uint32)
Constrain output 'mask' types to bit-packed uint32 tensor.
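
A bit-packed uint32 mask stores 32 keep/drop flags per element. The sketch below shows one possible packing; the helper names and the least-significant-bit-first ordering are assumptions, since the schema does not specify the bit order.

```python
def pack_mask(keep_flags):
    """Pack a flat list of booleans into 32-bit words (LSB first, assumed order)."""
    words = [0] * ((len(keep_flags) + 31) // 32)
    for i, keep in enumerate(keep_flags):
        if keep:
            words[i // 32] |= 1 << (i % 32)
    return words

def unpack_mask(words, count):
    """Recover the first `count` flags from the packed words."""
    return [bool((words[i // 32] >> (i % 32)) & 1) for i in range(count)]
```

Under this layout, a tensor with N elements needs ceil(N / 32) mask words instead of N boolean values.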

com.microsoft.CDist

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

metric : string
The distance metric to use. If a string, the distance function can be "braycurtis", "canberra", "chebyshev", "cityblock", "correlation", "cosine", "dice", "euclidean", "hamming", "jaccard", "jensenshannon", "kulsinski", "mahalanobis", "matching", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "wminkowski", "yule".

Inputs

A : T
B : T

Outputs

C : T

Type Constraints

T : tensor(float), tensor(double)
Constrains input to only numeric types.
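
For reference, a minimal sketch of pairwise distances for two of the listed metrics follows. It assumes that A has shape (M, K), B has shape (N, K), and C has shape (M, N), in the style of SciPy's `cdist`; the function below is illustrative and does not cover the other metrics.

```python
import math

def cdist(a, b, metric):
    """Pairwise distances between rows of a (M x K) and rows of b (N x K)."""
    if metric not in ("euclidean", "sqeuclidean"):
        raise NotImplementedError(f"metric {metric!r} is not covered by this sketch")
    result = []
    for row_a in a:
        out_row = []
        for row_b in b:
            sq = sum((x - y) ** 2 for x, y in zip(row_a, row_b))
            out_row.append(math.sqrt(sq) if metric == "euclidean" else float(sq))
        result.append(out_row)
    return result
```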

com.microsoft.ComplexMul

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

A : T
B : T

Outputs

C : T

Type Constraints

T : tensor(float), tensor(double), tensor(float16)
Constrain input and output types to float or half tensors.

com.microsoft.ComplexMulConj

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

A : T
B : T

Outputs

C : T

Type Constraints

T : tensor(float), tensor(double), tensor(float16)
Constrain input and output types to float or half tensors.

com.microsoft.ConvTransposeWithDynamicPads

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
output_padding : list of ints
strides : list of ints

Inputs (2 - 4)

X : T
W : T
Pads (optional) : tensor(int64)
B (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors

com.microsoft.CropAndResize

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

extrapolation_value : float
Value used for extrapolation, when applicable. Default is 0.0f.
mode : string
The pooling method. Two modes are supported: 'bilinear' and 'nearest'. Default is 'bilinear'.

Inputs

X : T1
rois : T1
batch_indices : T2
crop_size : T2

Outputs

Y : T1

Type Constraints

T1 : tensor(float16), tensor(float), tensor(double)
Constrain types to float tensors.
T2 : tensor(int32)
Constrain types to int tensors.

com.microsoft.DecoderAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads

Inputs

query : T
key : T
q_weight : T
kv_weight : T
bias : T
key_padding_mask (optional) : B
key_cache (optional) : T
value_cache (optional) : T
static_kv : B
use_past : B
has_layer_state : B
has_key_padding_mask : B

Outputs (1 - 3)

output : T
new_key_cache (optional) : T
new_value_cache (optional) : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float and float16 tensors.
B : tensor(bool)
Constrain key_padding_mask to bool tensors.

com.microsoft.DecoderMaskedMultiHeadAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
output_qk : int
Whether to output the cross-attention MatMul(Q, K).
past_present_share_buffer : int
If set, the corresponding past and present share the same tensor, whose size is (batch_size, num_heads, max_sequence_length, head_size).
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)

Inputs (1 - 11)

query : T
key (optional) : T
value (optional) : T
mask_index (optional) : M
attention_bias (optional) : T
past_key (optional) : T
past_value (optional) : T
past_sequence_length (optional) : M
beam_width (optional) : M
cache_indirection (optional) : M
bias (optional) : T

Outputs (1 - 4)

output : T
present_key (optional) : T
present_value (optional) : T
qk (optional) : V

Type Constraints

V : tensor(float)
Constrain qk output types to float32 tensors.
T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain mask index to integer types

com.microsoft.DecoderMaskedSelfAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

do_rotary : int
Whether to use rotary position embedding. Default value is 0.
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
past_present_share_buffer : int
If set, the corresponding past and present share the same tensor, whose size is (2, batch_size, num_heads, max_sequence_length, head_size).
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)

Inputs (7 - 9)

input : T
weights : T
bias : T
mask_index (optional) : M
past : T
attention_bias (optional) : T
past_sequence_length : M
beam_width (optional) : M
cache_indirection (optional) : M

Outputs

output : T
present : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain mask index to integer types

com.microsoft.DequantizeBFP

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

bfp_type : int (required)
The type of BFP - must match with the BFPType enum
block_dim : int
Each bounding box spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension.
dtype : int
The datatype to dequantize to.

Inputs

x : T1
shape : T2
strides : T2

Outputs

y : T3

Type Constraints

T1 : tensor(uint8)
Constrain the input to uint8.
T2 : tensor(int64)
Constrain shape and strides to int64.
T3 : tensor(float), tensor(float16), tensor(bfloat16)
Constrain y to float, float16 and bfloat16.

com.microsoft.DequantizeLinear

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

axis : int
The axis along which the same quantization parameters are applied. It's optional. If it's not specified, it means per-tensor quantization, and inputs 'x_scale' and 'x_zero_point' must be scalars. If it's specified, it means per-'axis' quantization, and inputs 'x_scale' and 'x_zero_point' must be 1-D tensors.

Inputs (2 - 3)

x : T1
x_scale : T2
x_zero_point (optional) : T1

Outputs

y : T2

Type Constraints

T1 : tensor(int8), tensor(uint8), tensor(int16), tensor(uint16), tensor(int32), tensor(int4), tensor(uint4)
Constrain 'x' and 'x_zero_point' to 4-bit integer tensors, 8-bit integer tensors, 16-bit integer tensors, or 32-bit signed integer tensors.
T2 : tensor(float16), tensor(float)
Constrain 'y', 'x_scale' to float tensors.
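
The per-tensor and per-axis cases can be sketched in pure Python using the standard linear dequantization formula y = (x - x_zero_point) * x_scale. The function names are hypothetical, and the per-axis variant below assumes a 2-D input with axis 0 for brevity.

```python
def dequantize_linear(x, x_scale, x_zero_point=0):
    """Per-tensor case: x_scale and x_zero_point are scalars."""
    return [(v - x_zero_point) * x_scale for v in x]

def dequantize_linear_axis0(rows, x_scale, x_zero_point=None):
    """Per-axis case on axis 0 of a 2-D input: one scale/zero point per row."""
    if x_zero_point is None:
        x_zero_point = [0] * len(rows)  # assumption: a missing zero point means 0
    return [
        [(v - zp) * s for v in row]
        for row, s, zp in zip(rows, x_scale, x_zero_point)
    ]
```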

com.microsoft.DequantizeWithOrder

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

order_input : int (required)
cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
order_output : int (required)
cublasLt order of output matrix
to : int (required)
The output data type, only support TensorProto_DataType_FLOAT (1) and TensorProto_DataType_FLOAT16 (10)

Inputs

input : Q
scale_input : S

Outputs

output : F

Type Constraints

Q : tensor(int8)
Constrain input and output types to int8 tensors.
F : tensor(float16), tensor(float)
Constrain to float types
S : tensor(float)
Constrain Scale to float32 types

com.microsoft.DynamicQuantizeLSTM

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation_alpha : list of floats
Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. For example, with LeakyRelu, the default alpha is 0.01.
activation_beta : list of floats
Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators.
activations : list of strings
A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.
clip : float
Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
direction : string
Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
hidden_size : int
Number of neurons in the hidden layer
input_forget : int
Couple the input and forget gates if 1.

Inputs

X : T
W : T2
R : T2
B (optional) : T
sequence_lens (optional) : T1
initial_h (optional) : T
initial_c (optional) : T
P (optional) : T
W_scale : T
W_zero_point : T2
R_scale : T
R_zero_point : T2

Outputs (0 - 3)

Y (optional) : T
Y_h (optional) : T
Y_c (optional) : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors.
T1 : tensor(int32)
Constrain seq_lens to integer tensor.
T2 : tensor(uint8), tensor(int8)
Constrain weights types to 8 bit tensors.

com.microsoft.DynamicQuantizeMatMul

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (3 - 5)

A : T1
B : T2
b_scale : T1
b_zero_point (optional) : T2
bias (optional) : T1

Outputs

Y : T1

Type Constraints

T1 : tensor(float)
Constrain input A, b_scale and output Y data type as float tensor.
T2 : tensor(int8), tensor(uint8)
Constrain input B data type to 8-bit integer tensor.

com.microsoft.DynamicTimeWarping

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

input : F

Outputs

output : I

Type Constraints

F : tensor(float)
Constrain to float tensors.
I : tensor(int32)
Constrain to integer types.

com.microsoft.EPContext

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

embed_mode : int
1: ep_cache_context is the context content. 0: ep_cache_context is the file path to the context content. The path is relative to this ONNX file. Default is 1.
ep_cache_context : string
payload of the execution provider context if embed_mode=1, or path to the context file if embed_mode=0.
ep_sdk_version : string
(Optional) SDK version used to convert the model.
hardware_architecture : string
(Optional) Hardware architecture.
main_context : int
Usually each EPContext node is associated with one graph partition. In some cases, such as QNN, a single EPContext contains all partitions. In that case, the node with ep_cache_context should set main_context=1; other nodes set main_context=0 and skip ep_cache_context. The path is relative to this ONNX file. Default is 1.
max_size : int
Maximum size in the context. Usage depends on the EP.
notes : string
(Optional) Some notes for the model
onnx_model_filename : string
(Optional) Filename of the original ONNX model.
partition_name : string
(Optional) partitioned graph name.
source : string
(Optional) the source used to generate the engine/context cache file. Ort EP or native SDK tool chain
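
The `embed_mode` rule above can be sketched as a small resolver. The function `resolve_ep_context` and its return convention are hypothetical and only illustrate how a consumer might interpret `ep_cache_context`; they are not part of any ONNX Runtime API.

```python
import os

def resolve_ep_context(onnx_model_path, ep_cache_context, embed_mode=1):
    """Return ("embedded", payload) or ("file", path) for an EPContext node.

    Illustrative only: embed_mode=1 means the attribute holds the context
    content; embed_mode=0 means it holds a path relative to the ONNX file.
    """
    if embed_mode == 1:
        return ("embedded", ep_cache_context)
    model_dir = os.path.dirname(onnx_model_path)
    return ("file", os.path.join(model_dir, ep_cache_context))
```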

Inputs (1 - ∞)

inputs (variadic, heterogeneous) : T

Outputs (1 - ∞)

outputs (variadic, heterogeneous) : T

Type Constraints

T : tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bool), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types.

com.microsoft.EmbedLayerNormalization

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

epsilon : float
The epsilon value to use to avoid division by zero.
mask_index_type : int
The mask index tensor type for shape inference (0: None, 1: 1D mask_index)

Inputs (7 - 9)

input_ids : T1
segment_ids (optional) : T1
word_embedding : T
position_embedding : T
segment_embedding (optional) : T
gamma : T
beta : T
mask (optional) : T1
position_ids (optional) : T1

Outputs (1 - 3)

output : T
mask_index (optional) : T1
embedding_sum (optional) : T

Type Constraints

T1 : tensor(int32)
Constrain input and output integer tensors types
T : tensor(float), tensor(float16)
Constrain input and output float tensors types.

com.microsoft.ExpandDims

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

X : T
axis : tensor(int32)

Outputs

Y : T

Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.

com.microsoft.FastGelu

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (1 - 2)

X : T
bias (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output types to float or half tensors.
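
For orientation, the sketch below compares the widely used tanh approximation of GELU with the exact erf-based form. Treating FastGelu as this approximation, with the optional `bias` added to `X` first, is an assumption here; the constants are those of the standard tanh approximation.

```python
import math

def gelu(x):
    """Exact GELU using the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def fast_gelu(x, bias=0.0):
    """Tanh approximation of GELU; bias is added to x first (assumed)."""
    v = x + bias
    inner = math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)
    return 0.5 * v * (1.0 + math.tanh(inner))
```

For typical activations the two forms differ only slightly, which is why the approximation is often used where `tanh` is cheaper than `erf`.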

com.microsoft.FusedConv

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation : string
activation_params : list of floats
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints

Inputs (2 - 4)

X : T
W : T
B (optional) : T
Z (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors

com.microsoft.FusedGemm

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation : string
activation_alpha : float
activation_beta : float
activation_gamma : float
alpha : float
Scalar multiplier for the product of input tensors A * B.
beta : float
Scalar multiplier for input tensor C.
transA : int
Whether A should be transposed
transB : int
Whether B should be transposed

Inputs (2 - 3)

A : T
B : T
C (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(uint32), tensor(uint64), tensor(int32), tensor(int64)
Constrain input and output types to float/int tensors.
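
A rough pure-Python sketch of the computation these attributes suggest, Y = activation(alpha * A' * B' + beta * C), follows. The defaults of 1.0 for `alpha` and `beta`, the requirement that `C` already has the output shape, and the support for only the "Relu" activation are simplifying assumptions of this example.

```python
def transpose(m):
    """Transpose a 2-D list of lists."""
    return [list(row) for row in zip(*m)]

def fused_gemm(a, b, c=None, alpha=1.0, beta=1.0,
               trans_a=0, trans_b=0, activation=None):
    """Gemm followed by an optional activation (illustrative only)."""
    a = transpose(a) if trans_a else a
    b = transpose(b) if trans_b else b
    cols = transpose(b)
    y = [[alpha * sum(x * w for x, w in zip(row, col)) for col in cols]
         for row in a]
    if c is not None:
        # Assumption: c has the same shape as y (no broadcasting in this sketch).
        y = [[v + beta * cv for v, cv in zip(y_row, c_row)]
             for y_row, c_row in zip(y, c)]
    if activation == "Relu":
        y = [[max(0.0, v) for v in row] for row in y]
    return y
```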

com.microsoft.FusedMatMul

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

alpha : float
Scalar multiplier for the product of the input tensors.
transA : int
Whether A should be transposed on the last two dimensions before doing multiplication
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication
transBatchA : int
Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
transBatchB : int
Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication

Inputs

A : T
B : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.

com.microsoft.FusedMatMulActivation

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation : string (required)
activation_alpha : float
activation_axis : int
activation_beta : float
activation_gamma : float
alpha : float
Scalar multiplier for the product of the input tensors.
transA : int
Whether A should be transposed on the last two dimensions before doing multiplication
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication
transBatchA : int
Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
transBatchB : int
Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication

Inputs

A : T
B : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.

com.microsoft.GatedRelativePositionBias

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

num_heads : int (required)
Number of attention heads

Inputs (6 - 7)

query_layer : T
query_bias : T
rel_pos : T
weight : T
bias : T
eco_a : T
token_offset (optional) : M

Outputs

output : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain token_offset to integer types

com.microsoft.GatherBlockQuantized

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

block_size : int
(Optional) block size used for weight quantization. It needs to be a power of 2 and not smaller than 16.
gather_axis : int
(Optional) Which axis to gather on. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data).
quantize_axis : int
(Optional) Which axis to block-wise quantize. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data).
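
The attribute constraints above can be checked with a few lines of Python. Both helper names are hypothetical; they only restate the documented rules (axis in [-r, r-1], block size a power of 2 and at least 16).

```python
def normalize_axis(axis, rank):
    """Map an axis in [-rank, rank-1] to its non-negative equivalent."""
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    return axis + rank if axis < 0 else axis

def is_valid_block_size(block_size):
    """True if block_size is a power of 2 and not smaller than 16."""
    return block_size >= 16 and (block_size & (block_size - 1)) == 0
```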

Inputs (3 - 4)

data : T1
indices : Tind
scales : T2
zero_points (optional) : T1

Outputs

output : T2

Type Constraints

T1 : tensor(int4), tensor(uint4)
Constrain quantized types.
T2 : tensor(float), tensor(float16), tensor(bfloat16)
Constrain dequantized types.
Tind : tensor(int32), tensor(int64)
Constrain indices to integer types.

com.microsoft.GatherND

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

data : T
indices : Tind

Outputs

output : T

Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input and output types to any tensor type.
Tind : tensor(int32), tensor(int64)
Constrain indices type to int32 or int64

com.microsoft.Gelu

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.

com.microsoft.GemmFastGelu

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (2 - 3)

X : T
W : T
bias (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output types to float or half tensors.

com.microsoft.GemmFloat8

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation : string
Activation function, RELU or GELU or NONE (default).
alpha : float
Scalar multiplier for the product of input tensors A * B.
beta : float
Scalar multiplier for the product of input bias C.
dtype : int
Output Type. Same definition as attribute 'to' for operator Cast.
transA : int
Whether A should be transposed. Float 8 only supports transA=0.
transB : int
Whether B should be transposed. Float 8 only supports transB=1.

Inputs (2 - 6)

A : TA
B : TB
C (optional) : TC
scaleA (optional) : TS
scaleB (optional) : TS
scaleY (optional) : TS

Outputs

Y : TR

Type Constraints

TA : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float)
Constrain type to input A.
TB : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float)
Constrain type to input B.
TC : tensor(float16), tensor(bfloat16), tensor(float)
Constrain type to input C.
TR : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float)
Constrain type to result type.
TS : tensor(float)
Constrain type for all input scales (scaleA, scaleB, scaleY).

com.microsoft.GemmaRotaryEmbedding

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

emb : U
q : T
q_rot : T
k : T
k_rot : T

Outputs

output1 : T
output2 : T

Type Constraints

T : tensor(float16)
Constrain input and output types to float16 tensors.
U : tensor(float)
Constrain input 0 type to float tensors

com.microsoft.GreedySearch

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

decoder : graph (required)
Decoder subgraph to execute in a loop.
decoder_start_token_id : int
The id of the token that indicates decoding starts.
encoder : graph
The subgraph for initialization of encoder and decoder. It will be called once before `decoder` subgraph.
eos_token_id : int (required)
The id of the end-of-sequence token
init_decoder : graph
The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
model_type : int
model type: 0 for decoder only like GPT-2; 1 for encoder decoder like Bart
no_repeat_ngram_size : int
Size of the n-grams that must not be repeated.
pad_token_id : int (required)
The id of the padding token
vocab_size : int
Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape

Inputs (2 - 7)

input_ids : I
max_length : I
min_length (optional) : I
repetition_penalty (optional) : T
vocab_mask (optional) : I
prefix_vocab_mask (optional) : I
attention_mask (optional) : I

Outputs

sequences : I

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors.
I : tensor(int32)
Constrain to integer types

com.microsoft.GridSample

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

align_corners : int
If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
mode : string
Three interpolation modes: bilinear (default), nearest and bicubic.
padding_mode : string
Support padding modes for outside grid values: `zeros`(default), `border`, `reflection`. zeros: use 0 for out-of-bound grid locations, border: use border values for out-of-bound grid locations, reflection: use values at locations reflected by the border for out-of-bound grid locations.

Inputs

X : T1
Grid : T1

Outputs

Y : T2

Type Constraints

T1 : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input types to all tensor types.
T2 : tensor(float16), tensor(float), tensor(double)
Constrain output types to float tensors.
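
The effect of `align_corners` can be illustrated with a minimal sketch of the coordinate mapping along one axis. The function name `unnormalize` is hypothetical, and the formulas follow the common grid-sampling convention rather than the operator's actual kernel code.

```python
def unnormalize(coord: float, size: int, align_corners: int) -> float:
    """Map a normalized grid coordinate in [-1, 1] to a pixel index along one axis."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer corners of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


width = 4
print(unnormalize(-1.0, width, 1), unnormalize(1.0, width, 1))  # 0.0 3.0
print(unnormalize(-1.0, width, 0), unnormalize(1.0, width, 0))  # -0.5 3.5
```

With `align_corners=0` the extrema land half a pixel outside the pixel centers, which is why `padding_mode` matters for grid values near the border.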

com.microsoft.GroupNorm

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation : int (required)
Activation after group normalization: 0 for None, 1 for SiLU
channels_last : int
1 if the input and output are in the NHWC layout, 0 if they are in the NCHW layout. Defaults to 1.
epsilon : float
The epsilon value to use to avoid division by zero
groups : int (required)
The number of groups of channels. It should be a divisor of the number of channels C

Inputs

X : T
gamma : M
beta : M

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float)
Constrain input X and output Y types to float tensors.
M : tensor(float16), tensor(float)
Constrain gamma and beta to float tensors.
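
As a rough reference for the attributes above, the following sketch normalizes one sample whose channels are given as flat lists of spatial values. The function name `group_norm` and the default `epsilon=1e-5` are assumptions made for illustration; the real kernel operates on NHWC or NCHW tensors.

```python
import math


def group_norm(x, gamma, beta, groups, epsilon=1e-5, activation=0):
    """x: list of C channels, each a flat list of spatial values (one sample)."""
    channels = len(x)
    if channels % groups != 0:
        raise ValueError("groups must be a divisor of the number of channels")
    per_group = channels // groups
    out = []
    for g in range(groups):
        block = x[g * per_group:(g + 1) * per_group]
        values = [v for ch in block for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + epsilon)
        for offset, ch in enumerate(block):
            c = g * per_group + offset
            # Per-channel scale (gamma) and shift (beta) after normalization.
            normed = [(v - mean) * inv_std * gamma[c] + beta[c] for v in ch]
            if activation == 1:
                # SiLU: y * sigmoid(y)
                normed = [y / (1.0 + math.exp(-y)) for y in normed]
            out.append(normed)
    return out


y = group_norm([[1.0, 3.0], [5.0, 7.0]], [1.0, 1.0], [0.0, 0.0], groups=2)
print(y)  # each channel is normalized to roughly [-1, 1]
```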

com.microsoft.GroupQueryAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

do_rotary : int
Whether to use rotary position embedding. Default value is 0.
kv_num_heads : int (required)
Number of attention heads for k and v
local_window_size : int
left_window_size for local attention (like Mistral). Default value is -1 meaning unused.
num_heads : int (required)
Number of attention heads for q
rotary_interleaved : int
Rotate using interleaved pattern. Default value is 0 (False).
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
smooth_softmax : int
Use a smooth factor in softmax.
softcap : float
Softcap value for attention weights. Default value is 0.

Inputs (7 - 9)

query : T
key (optional) : T
value (optional) : T
past_key (optional) : T
past_value (optional) : T
seqlens_k : M
total_sequence_length : M
cos_cache (optional) : T
sin_cache (optional) : T

Outputs

output : T
present_key : T
present_value : T

Type Constraints

T : tensor(float16), tensor(bfloat16), tensor(float)
Constrain input and output to float tensors.
M : tensor(int32)
Constrain mask to int tensor.
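
The relationship between `num_heads` and `kv_num_heads` can be sketched as below. The helper names are hypothetical, and the mapping assumes the common convention that consecutive query heads share one key/value head; the schema text above does not spell this out.

```python
import math


def kv_head_for_query_head(q_head: int, num_heads: int, kv_num_heads: int) -> int:
    """Return the key/value head index that a given query head attends with."""
    if num_heads % kv_num_heads != 0:
        raise ValueError("num_heads must be a multiple of kv_num_heads")
    group_size = num_heads // kv_num_heads
    return q_head // group_size


def default_scale(head_size: int) -> float:
    """Scale used when the `scale` attribute is not specified."""
    return 1.0 / math.sqrt(head_size)


print([kv_head_for_query_head(h, 8, 2) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
print(default_scale(64))  # 0.125
```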

com.microsoft.Inverse

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.

com.microsoft.Irfft

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

normalized : int
must be 0, normalization currently not supported
onesided : int
must be 1, only one sided FFTs supported
signal_ndim : int (required)
number of dimensions comprising the signal

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float), tensor(double), tensor(float16)
Constrain input and output types to float or half tensors.

com.microsoft.LongformerAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

num_heads : int (required)
Number of attention heads
window : int (required)
One sided attention windows length W, or half of total window length

Inputs

input : T
weight : T
bias : T
mask : T
global_weight : T
global_bias : T
global : G

Outputs

output : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
G : tensor(int32)
Constrain to integer types

com.microsoft.MatMulBnb4

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

K : int (required)
size of each input feature
N : int (required)
size of each output feature
block_size : int (required)
Group size used for weight quantization. It must be a power of 2 and no smaller than 16.
quant_type : int (required)
quantization data type. 0 for FP4, 1 for NF4.
training_mode : int
Indicates whether the op runs in training mode. Defaults to False.
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication. Defaults to 1.

Inputs

A : T1
B : T2
absmax : T1

Outputs

Y : T1

Type Constraints

T1 : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output types to float/half_float/brain_float tensors.
T2 : tensor(uint8)
Constrain quantized weight types to uint8.

com.microsoft.MatMulFpQ4

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

blk_quant_type : int
Quantization type

Inputs

A : T1
B : T2
B_shape : T3

Outputs

Y : T1

Type Constraints

T1 : tensor(float)
Constrain input matrix data types as single precision float tensor
T2 : tensor(uint8)
Constrain input B data types as data blob
T3 : tensor(int64)
Constrain the shape of B to an int64 tensor.

com.microsoft.MatMulInteger16

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

A : T1
B : T2

Outputs

Y : T3

Type Constraints

T1 : tensor(int16), tensor(uint16)
Constrain input A data types as 16-bit integer tensor
T2 : tensor(int16), tensor(uint16)
Constrain input B data types as 16-bit integer tensor
T3 : tensor(int32), tensor(uint32)
Constrain output Y data types as 32-bit integer tensor. T3 must be tensor(uint32) when both T1 and T2 are tensor(uint16), or must be tensor(int32) when either T1 or T2 is tensor(int16).
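
The T3 rule can be written as a small lookup. This is only an illustration of the constraint as stated; the function name `matmul_integer16_output_type` is hypothetical.

```python
def matmul_integer16_output_type(a_type: str, b_type: str) -> str:
    """Return the element type of Y given the element types of A and B."""
    allowed = {"int16", "uint16"}
    if a_type not in allowed or b_type not in allowed:
        raise ValueError("A and B must be int16 or uint16 tensors")
    if a_type == "uint16" and b_type == "uint16":
        return "uint32"
    # Any signed operand makes the result signed.
    return "int32"


print(matmul_integer16_output_type("uint16", "uint16"))  # uint32
print(matmul_integer16_output_type("int16", "uint16"))   # int32
```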

com.microsoft.MatMulIntegerToFloat

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (4 - 7)

A : T1
B : T2
a_scale : T3
b_scale : T3
a_zero_point (optional) : T1
b_zero_point (optional) : T2
bias (optional) : T3

Outputs

Y : T3

Type Constraints

T1 : tensor(int8), tensor(uint8)
Constrain input A data type to 8-bit integer tensor.
T2 : tensor(int8), tensor(uint8)
Constrain input B data type to 8-bit integer tensor.
T3 : tensor(float), tensor(float16)
Constrain input a_scale, b_scale and output Y data type as float tensor.

com.microsoft.MatMulNBits

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

K : int (required)
size of each input feature
N : int (required)
size of each output feature
accuracy_level : int
The minimum accuracy level of input A, can be: 0(unset), 1(fp32), 2(fp16), 3(bf16), or 4(int8) (default unset). It is used to control how input A is quantized or downcast internally while doing computation, for example: 0 means input A will not be quantized or downcast while doing computation. 4 means input A can be quantized with the same block_size to int8 internally from type T1.
bits : int (required)
number of bits used for weight quantization (default 4)
block_size : int (required)
Group size used for weight quantization (default 128). It must be a power of 2 and no smaller than 16.

Inputs (3 - 6)

A : T1
B : T2
scales : T1
zero_points (optional) : T3
g_idx (optional) : T4
bias (optional) : T1

Outputs

Y : T1

Type Constraints

T1 : tensor(float), tensor(float16)
Constrain input and output types to float/half_float tensors.
T2 : tensor(uint8), tensor(int32)
Constrain quantized weight types to uint8/int32.
T3 : tensor(uint8), tensor(int32), tensor(float16), tensor(float)
Constrain quantized zero point types to uint8/int32/float16/float.
T4 : tensor(int32)
the index tensor.
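
The `block_size` constraint and its relation to `K` can be checked with a sketch like the one below. The helper names are hypothetical, and the block count assumes that each group of `block_size` weights along K gets its own scale, with a partial last block rounded up.

```python
def check_block_size(block_size: int) -> None:
    """Raise if block_size is not a power of 2 or is smaller than 16."""
    is_power_of_two = block_size > 0 and (block_size & (block_size - 1)) == 0
    if not is_power_of_two or block_size < 16:
        raise ValueError("block_size must be a power of 2 and at least 16")


def blocks_per_column(k: int, block_size: int) -> int:
    """Number of quantization blocks along K (assumed: ceil(K / block_size))."""
    check_block_size(block_size)
    return (k + block_size - 1) // block_size


print(blocks_per_column(4096, 128))  # 32
print(blocks_per_column(100, 32))    # 4
```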

com.microsoft.MaxpoolWithMask

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

auto_pad : string
kernel_shape : list of ints
pads : list of ints
storage_order : int
strides : list of ints

Inputs

X : T
M : tensor(int32)

Outputs

Y : T

Type Constraints

T : tensor(float)
Constrain input0 and output types to float tensors

com.microsoft.MoE

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation_type : string
Activation function to use. Choose from relu, gelu, silu and identity. Default is relu
k : int
Number of top experts to select from expert pool
normalize_routing_weights : int
Whether to normalize routing weights
use_sparse_mixer : int
Whether to use sparse mixer

Inputs (5 - 8)

input : T
router_probs : T
fc1_experts_weights : T
fc1_experts_bias (optional) : T
fc2_experts_weights : T
fc2_experts_bias (optional) : T
fc3_experts_weights (optional) : T
fc3_experts_bias (optional) : T

Outputs

output : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float or float16 tensors.
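
The interaction of `k` and `normalize_routing_weights` can be illustrated for a single token. This sketch assumes the router values are turned into probabilities with a softmax before the top `k` experts are selected; the function name `route` is hypothetical and the real kernel may differ in detail.

```python
import math


def route(router_logits, k, normalize_routing_weights=False):
    """Return (expert_index, weight) pairs for the top-k experts of one token."""
    peak = max(router_logits)
    exps = [math.exp(v - peak) for v in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    if normalize_routing_weights:
        # Rescale so the selected experts' weights sum to 1.
        selected_sum = sum(weights)
        weights = [w / selected_sum for w in weights]
    return list(zip(top, weights))


print(route([1.0, 3.0, 2.0], k=2, normalize_routing_weights=True))
```

The token's output would then be the weighted sum of the selected experts' outputs.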

com.microsoft.MulInteger

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (3 - 4)

A : T
A_zero_point (optional) : T
B : T
B_zero_point (optional) : T

Outputs

C : T1

Type Constraints

T : tensor(uint8), tensor(int8)
Constrain input types to 8 bit signed and unsigned tensors.
T1 : tensor(int32)
Constrain output types to 32 bit tensors.

com.microsoft.MultiHeadAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional : int
Whether every token can only attend to previous tokens. Default value is 0.

Inputs (1 - 8)

query : T
key (optional) : T
value (optional) : T
bias (optional) : T
key_padding_mask (optional) : M
attention_bias (optional) : T
past_key (optional) : T
past_value (optional) : T

Outputs (1 - 3)

output : T
present_key (optional) : T
present_value (optional) : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output to float tensors.
M : tensor(int32)
Constrain mask to integer types
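
A minimal single-head sketch of how `scale`, `unidirectional` and `mask_filter_value` could enter the attention weights. It assumes the filter value is added to the scores of masked positions before the softmax; the function name `attention_weights` is hypothetical.

```python
import math


def attention_weights(q, k, scale=None, unidirectional=0, mask_filter_value=-10000.0):
    """q, k: lists of vectors for one head, each of shape (seq_len, head_size)."""
    head_size = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(head_size)  # documented default
    rows = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            s = scale * sum(a * b for a, b in zip(qi, kj))
            if unidirectional and j > i:
                # Future position: push its score far down so softmax ignores it.
                s += mask_filter_value
            scores.append(s)
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


w = attention_weights([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], unidirectional=1)
print(w[0])  # the first token attends only to itself
```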

com.microsoft.MurmurHash3

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

positive : int
If value is 1, output type is uint32_t, else int32_t. Default value is 1.
seed : int
Seed for the hashing algorithm, unsigned 32-bit integer, default to 0.

Inputs

X : T1

Outputs

Y : T2

Type Constraints

T1 : tensor(uint32), tensor(int32), tensor(uint64), tensor(int64), tensor(float), tensor(double), tensor(string)
Constrain input type to unsigned or signed 32-bit or 64-bit integer, float, double, or string tensors. Strings should be UTF-8 encoded if they contain Unicode.
T2 : tensor(uint32), tensor(int32)
Constrain output type to unsigned and signed 32-bit integer tensor.
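
For reference, a pure-Python sketch of the 32-bit MurmurHash3 (x86 variant) applied to raw bytes, with `seed` and `positive` handled as described above. The function name `murmur3_32` is hypothetical, and how the operator serializes each element into bytes (for example little-endian integers or UTF-8 strings) is an assumption not covered here.

```python
def murmur3_32(data: bytes, seed: int = 0, positive: int = 1) -> int:
    """MurmurHash3 x86 32-bit of `data`; signed result when positive == 0."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k

    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16

    if positive:
        return h  # interpreted as uint32
    return h - (1 << 32) if h >= (1 << 31) else h  # interpreted as int32


print(hex(murmur3_32(b"", seed=1)))  # 0x514e28b7
```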

com.microsoft.NGramRepeatBlock

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

ngram_size : int (required)
The NGram size.

Inputs

input_ids : Tid
scores : T

Outputs

scores_out : T

Type Constraints

Tid : tensor(int64)
Constrain indices to integer types
T : tensor(float)
Constrain scores input and output types to float tensors.
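
The idea behind n-gram repeat blocking can be sketched for a single sequence: any token that would complete an n-gram already present in `input_ids` has its score suppressed. The helper names are hypothetical, and using negative infinity as the blocked score is an assumption.

```python
def banned_tokens(input_ids, ngram_size):
    """Tokens that would repeat an existing n-gram if generated next."""
    if ngram_size <= 0 or len(input_ids) < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(input_ids[len(input_ids) - ngram_size + 1:])
    banned = set()
    for start in range(len(input_ids) - ngram_size + 1):
        ngram = tuple(input_ids[start:start + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


def block_repeats(input_ids, scores, ngram_size):
    """Return a copy of scores with banned tokens set to -inf."""
    out = list(scores)
    for token in banned_tokens(input_ids, ngram_size):
        out[token] = float("-inf")
    return out


print(banned_tokens([1, 2, 3, 1, 2], 3))  # {3}
```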

com.microsoft.NhwcConv

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

auto_pad : string
dilations : list of ints
Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group : int
number of groups input channels and output channels are divided into.
kernel_shape : list of ints
The shape of the convolution kernel. If not present, should be inferred from input W.
pads : list of ints
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.

Inputs (2 - 3)

X : T
W : T
B (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.

com.microsoft.NhwcFusedConv

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation : string
activation_params : list of floats
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints

Inputs (2 - 4)

X : T
W : T
B (optional) : T
Z (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(float16)
Constrain input and output types to float tensors

com.microsoft.NhwcMaxPool

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

auto_pad : string
ceil_mode : int
dilations : list of ints
kernel_shape : list of ints (required)
pads : list of ints
strides : list of ints

Inputs

x : T

Outputs

y : T

Type Constraints

T : tensor(int8), tensor(uint8)

com.microsoft.PackedAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

num_heads : int (required)
Number of attention heads
qkv_hidden_sizes : list of ints
Hidden dimension of Q, K, V: hidden_size, hidden_size and v_hidden_size
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)

Inputs (5 - 6)

input : T
weights : T
bias : T
token_offset : M
cumulative_sequence_length : M
attention_bias (optional) : T

Outputs

output : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain mask index to integer types

com.microsoft.PackedMultiHeadAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)

Inputs (6 - 7)

query : T
key (optional) : T
value (optional) : T
bias (optional) : T
token_offset : M
cumulative_sequence_length : M
attention_bias (optional) : T

Outputs

output : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output to float tensors.
M : tensor(int32)
Constrain mask, offset and sequence length to integer types

com.microsoft.Pad

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

mode : string
Three modes: `constant`(default) - pads with a given constant value, `reflect` - pads with the reflection of the vector mirrored on the first and last values of the vector along each axis, `edge` - pads with the edge values of array

Inputs (2 - 3)

data : T
pads : tensor(int64)
value (optional) : T

Outputs

output : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
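
The three `mode` values can be compared on a one-dimensional example. This sketch uses a hypothetical helper `pad_1d` and handles a single axis only; the operator applies the same idea per axis according to `pads`.

```python
def pad_1d(data, before, after, mode="constant", value=0.0):
    """Pad a 1-D list with `before` leading and `after` trailing elements."""
    n = len(data)
    if mode == "constant":
        return [value] * before + list(data) + [value] * after
    if mode == "edge":
        return [data[0]] * before + list(data) + [data[-1]] * after
    if mode == "reflect":
        if before >= n or after >= n:
            raise ValueError("reflect padding must be smaller than the axis length")
        # Mirror around the first and last values without repeating them.
        left = [data[i] for i in range(before, 0, -1)]
        right = [data[n - 2 - i] for i in range(after)]
        return left + list(data) + right
    raise ValueError(f"unknown mode: {mode}")


x = [1.0, 2.0, 3.0]
print(pad_1d(x, 2, 2, "constant"))  # [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0]
print(pad_1d(x, 2, 2, "reflect"))   # [3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0]
print(pad_1d(x, 2, 2, "edge"))      # [1.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0]
```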

com.microsoft.QAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

do_rotary : int
Whether to use rotary position embedding. Default value is 0.
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
past_present_share_buffer : int
Corresponding past and present are same tensor, its shape is (2, batch_size, num_heads, max_sequence_length, head_size)
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional : int
Whether every token can only attend to previous tokens. Default value is 0.

Inputs (5 - 9)

input : T1
weight : T2
bias : T3
input_scale : T3
weight_scale : T3
mask_index (optional) : T4
input_zero_point (optional) : T1
weight_zero_point (optional) : T2
past (optional) : T3

Outputs (1 - 2)

output : T3
present (optional) : T3

Type Constraints

T1 : tensor(int8), tensor(uint8)
Constrain input and output types to int8 tensors.
T2 : tensor(int8), tensor(uint8)
Constrain input and output types to int8 tensors.
T3 : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
T4 : tensor(int32)
Constrain mask index to integer types

com.microsoft.QGemm

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

alpha : float
Scalar multiplier for the product of input tensors A * B.
transA : int
Whether A should be transposed
transB : int
Whether B should be transposed

Inputs (6 - 9)

A : TA
a_scale : T
a_zero_point : TA
B : TB
b_scale : T
b_zero_point : TB
C (optional) : TC
y_scale (optional) : T
y_zero_point (optional) : TYZ

Outputs

Y : TY

Type Constraints

T : tensor(float)
Constrain scale types to float tensors.
TA : tensor(uint8), tensor(int8)
Constrain input A and its zero point types to 8 bit tensors.
TB : tensor(uint8), tensor(int8)
Constrain input B and its zero point types to 8 bit tensors.
TC : tensor(int32)
Constrain input C to 32 bit integer tensors.
TYZ : tensor(uint8), tensor(int8)
Constrain output zero point types to 8 bit tensors.
TY : tensor(float), tensor(uint8), tensor(int8)
Constrain output type to float32 or 8 bit tensors.

com.microsoft.QLinearAdd

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (7 - 8)

A : T
A_scale : tensor(float)
A_zero_point (optional) : T
B : T
B_scale : tensor(float)
B_zero_point (optional) : T
C_scale : tensor(float)
C_zero_point (optional) : T

Outputs

C : T

Type Constraints

T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit signed and unsigned tensors.
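
A minimal element-wise sketch of quantized addition, assuming the usual linear quantization scheme: dequantize both inputs, add, then requantize with the output scale and zero point and saturate to the 8-bit range. The function name `qlinear_add` and the `signed` flag are hypothetical, and broadcasting is omitted.

```python
def qlinear_add(a, a_scale, a_zp, b, b_scale, b_zp, c_scale, c_zp, signed=False):
    """Add two quantized 1-D lists and return the quantized result."""
    lo, hi = (-128, 127) if signed else (0, 255)
    out = []
    for qa, qb in zip(a, b):
        real = a_scale * (qa - a_zp) + b_scale * (qb - b_zp)
        q = round(real / c_scale) + c_zp
        out.append(max(lo, min(hi, q)))  # saturate to the 8-bit range
    return out


print(qlinear_add([10, 200], 0.5, 0, [20, 250], 0.5, 0, 0.5, 0))  # [30, 255]
```

The second element saturates at 255 because the real-valued sum exceeds what the output scale can represent in uint8.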

com.microsoft.QLinearAveragePool

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

auto_pad : string
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER means the input is padded so that the output spatial size matches the input. If the total padding is odd, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding.
ceil_mode : int
Whether to use ceil or floor (default) to compute the output shape.
channels_last : int
Whether the operator works on the NHWC layout. Defaults to 0 (not NHWC).
count_include_pad : int
Whether to include pad pixels when calculating values for the edges. Default is 0, which excludes pad pixels.
kernel_shape : list of ints (required)
The size of the kernel along each axis.
pads : list of ints
Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end of the corresponding axis. The `pads` format should be as follows: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.

Inputs (4 - 5)

X : T
x_scale : tensor(float)
x_zero_point (optional) : T
y_scale : tensor(float)
y_zero_point (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit tensors.
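
The `pads` layout and the effect of `ceil_mode` can be sketched as follows. The helper names are hypothetical, and the output-size formula is the standard pooling formula without dilation, assumed to apply here as it does for the non-quantized AveragePool.

```python
import math


def split_pads(pads):
    """Turn [x1_begin, x2_begin, ..., x1_end, x2_end, ...] into (begin, end) pairs."""
    half = len(pads) // 2
    return list(zip(pads[:half], pads[half:]))


def pooled_size(in_size, kernel, stride=1, pad_begin=0, pad_end=0, ceil_mode=0):
    """Output length along one spatial axis."""
    span = in_size + pad_begin + pad_end - kernel
    steps = math.ceil(span / stride) if ceil_mode else span // stride
    return steps + 1


print(split_pads([1, 2, 3, 4]))                        # [(1, 3), (2, 4)]
print(pooled_size(8, kernel=3, stride=2))              # 3
print(pooled_size(8, kernel=3, stride=2, ceil_mode=1)) # 4
```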

com.microsoft.QLinearConcat

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

axis : int (required)
Which axis to concat on

Inputs (3 - ∞)

Y_scale : TF
Y_zero_point : T8
inputs (variadic, heterogeneous) : TV

Outputs

Y : T8

Type Constraints

T8 : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit signed and unsigned tensors.
TF : tensor(float)
Constrain scale types to any float tensor type.
TV : tensor(uint8), tensor(int8), tensor(float)
Sequence of (Tensor, Scale, ZeroPoint) tuples. The type is sequence of (T8, TF, T8).

com.microsoft.QLinearConv

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

auto_pad : string
channels_last : int
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints

Inputs (8 - 9)

x : T1
x_scale : tensor(float)
x_zero_point : T1
w : T2
w_scale : tensor(float)
w_zero_point : T2
y_scale : tensor(float)
y_zero_point : T3
B (optional) : T4

Outputs

y : T3

Type Constraints

T1 : tensor(int8), tensor(uint8)
T2 : tensor(int8), tensor(uint8)
T3 : tensor(int8), tensor(uint8)
T4 : tensor(int32)

com.microsoft.QLinearGlobalAveragePool

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

channels_last : int

Inputs

X : T
x_scale : tensor(float)
x_zero_point : T
y_scale : tensor(float)
y_zero_point : T

Outputs

Y : T

Type Constraints

T : tensor(uint8), tensor(int8)
Constrain input and output types to signed/unsigned int8 tensors.

com.microsoft.QLinearLeakyRelu

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

alpha : float
Coefficient of leakage.

Inputs (4 - 5)

X : T
X_scale : tensor(float)
X_zero_point (optional) : T
Y_scale : tensor(float)
Y_zero_point (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit tensors.

com.microsoft.QLinearMul

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (7 - 8)

A : T
A_scale : tensor(float)
A_zero_point (optional) : T
B : T
B_scale : tensor(float)
B_zero_point (optional) : T
C_scale : tensor(float)
C_zero_point (optional) : T

Outputs

C : T

Type Constraints

T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit signed and unsigned tensors.

com.microsoft.QLinearReduceMean

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

axes : list of ints (required)
A list of integers, along which to reduce. The default is to reduce over all the dimensions of the input tensor.
keepdims : int (required)
Whether to keep the reduced dimension. The default of 1 means the reduced dimension is kept.

Inputs (4 - 5)

data : T
data_scale : tensor(float)
data_zero_point (optional) : T
reduced_scale : tensor(float)
reduced_zero_point (optional) : T

Outputs

reduced : T

Type Constraints

T : tensor(uint8), tensor(int8)
Constrain input types to 8 bit signed and unsigned tensors.

com.microsoft.QLinearSigmoid

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (4 - 5)

X : T
X_scale : tensor(float)
X_zero_point (optional) : T
Y_scale : tensor(float)
Y_zero_point (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit tensors.

com.microsoft.QLinearSoftmax

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

axis : int
Apply softmax to the elements along dimension `axis`, or to all dimensions starting at `axis`, depending on the opset version.
opset : int (required)
opset version of corresponding SoftMax.

Inputs

X : T
X_scale : tensor(float)
x_zero_point (optional) : T
y_scale : tensor(float)
y_zero_point : T

Outputs

Y : T

Type Constraints

T : tensor(uint8), tensor(int8)
Constrain input and output types to signed/unsigned int8 tensors.

com.microsoft.QLinearWhere

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

condition : B
X : T
x_scale : TF
x_zero_point : T
Y : T
y_scale : TF
y_zero_point : T
z_scale : TF
z_zero_point : T

Outputs

Z : T

Type Constraints

B : tensor(bool)
Constrain the condition input to boolean tensors.
TF : tensor(float)
Constrain scale types to any float tensor type.
T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit signed and unsigned tensors.

com.microsoft.QMoE

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation_type : string
Activation function to use. Choose from relu, gelu, silu and identity. Default is relu
expert_weight_bits : int
Number of bits used in quantized weights. Default is 4 bits
k : int
Number of top experts to select from expert pool
normalize_routing_weights : int
Whether to normalize routing weights
use_sparse_mixer : int
Whether to use sparse mixer

Inputs (7 - 11)

input : T
router_probs : T
fc1_experts_weights : T1
fc1_scales : T
fc1_experts_bias (optional) : T
fc2_experts_weights : T1
fc2_scales : T
fc2_experts_bias (optional) : T
fc3_experts_weights (optional) : T1
fc3_scales (optional) : T
fc3_experts_bias (optional) : T

Outputs

output : T

Type Constraints

T : tensor(float16)
Constrain input and output types to float16 tensors.
T1 : tensor(uint8)
Constrain weights type to uint8 tensors.

com.microsoft.QOrderedAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

num_heads : int (required)
Number of attention heads
order_input : int (required)
cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
order_output : int (required)
cublasLt order of global bias
order_weight : int (required)
cublasLt order of weight matrix
qkv_hidden_sizes : list of ints
Hidden layer sizes of Q, K, V paths in Attention
unidirectional : int
Whether every token can only attend to previous tokens. Default value is 0.

Inputs (17 - 20)

input : Q
scale_input : S
scale_Q_gemm : S
scale_K_gemm : S
scale_V_gemm : S
Q_weight : Q
K_weight : Q
V_weight : Q
scale_Q_weight : S
scale_K_weight : S
scale_V_weight : S
Q_bias : S
K_bias : S
V_bias : S
scale_QKT_gemm (optional) : S
scale_QKT_softmax (optional) : S
scale_values_gemm : S
mask_index (optional) : G
past (optional) : Q
attention_bias (optional) : S

Outputs

output : Q

Type Constraints

Q : tensor(int8)
Constrain input and output types to int8 tensors.
S : tensor(float)
Constrain scales to float32 tensors.
G : tensor(int32)
Constrain to integer types

com.microsoft.QOrderedGelu

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

order_X : int
cublasLt order of input X. Optional. See the schema of QuantizeWithOrder for order definition.
order_Y : int
cublasLt order of matrix Y, must be same as order_X if specified together. Optional.

Inputs

X : Q
scale_X : S
scale_Y : S

Outputs

Y : Q

Type Constraints

Q : tensor(int8)
Constrain input and output types to int8 tensors.
S : tensor(float)
Constrain scales to float32

com.microsoft.QOrderedLayerNormalization

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

axis : int
The first normalization dimension: normalization will be performed along dimensions axis : rank(inputs).
epsilon : float
The epsilon value to use to avoid division by zero.
order_X : int
cublasLt order of input X. Default is ROW MAJOR. See the schema of QuantizeWithOrder for order definition.
order_Y : int
cublasLt order of matrix Y, must be same as order_X. Default is ROW MAJOR.

Inputs

X : Q
scale_X : S
scale : F
B (optional) : F
scale_Y : S

Outputs

Y : Q

Type Constraints

F : tensor(float16), tensor(float)
Constrain input gamma and bias to float16 or float tensors. float may give better precision; float16 runs faster.
S : tensor(float)
quantization scale must be float tensors.
Q : tensor(int8)
quantization tensor must be int8 tensors.

com.microsoft.QOrderedLongformerAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

num_heads : int (required)
Number of attention heads
order_global_weight : int (required)
cublasLt order of weight matrix
order_input : int (required)
cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
order_output : int (required)
cublasLt order of global bias
order_weight : int (required)
cublasLt order of weight matrix
window : int (required)
One sided attention windows length W, or half of total window length

Inputs

input : Q
scale_input : S
weight : Q
scale_weight : S
bias : S
scale_bias : S
scale_qkv_gemm : S
mask : F
global_weight : Q
scale_global_weight : S
global_bias : S
scale_global_gemm : S
global : G
scale_output : S

Outputs

output : Q

Type Constraints

Q : tensor(int8)
Constrain input and output types to int8 tensors.
S : tensor(float)
Constrain scales to float32 tensors.
G : tensor(int32)
Constrain to integer types
F : tensor(float16)
Be compatible with float version.

com.microsoft.QOrderedMatMul

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

order_A : int (required)
cublasLt order of matrix A. See the schema of QuantizeWithOrder for order definition.
order_B : int (required)
cublasLt order of matrix B
order_Y : int (required)
cublasLt order of matrix Y and optional matrix C

Inputs (5 - 8)

A : Q
scale_A : S
B : Q
scale_B : S
scale_Y : S
bias (optional) : S
C (optional) : Q
scale_C (optional) : S

Outputs

Y : Q

Type Constraints

Q : tensor(int8)
Constrain input and output types to int8 tensors.
S : tensor(float)
Constrain bias and scales to float32

com.microsoft.QuantizeBFP

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

bfp_type : int (required)
The type of BFP - must match with the BFPType enum
block_dim : int
Each bounding box spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension.

Inputs

x : T1

Outputs

y : T2
shape : T3
strides : T3

Type Constraints

T1 : tensor(float), tensor(float16), tensor(bfloat16)
Constrain the input to float and bfloat.
T2 : tensor(uint8)
Constrain y to uint8.
T3 : tensor(int64)
Constrain shape and strides to int64.

com.microsoft.QuantizeLinear

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

axis : int
The axis along which the same quantization parameters are applied. It's optional. If it's not specified, it means per-tensor quantization and inputs 'y_scale' and 'y_zero_point' must be scalars. If it's specified, it means per-'axis' quantization and inputs 'y_scale' and 'y_zero_point' must be 1-D tensors.
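
The per-tensor and per-axis cases can be sketched in plain Python. This is a minimal illustration, not the ONNX Runtime kernel: it assumes the usual linear quantization formula `y = saturate(round(x / y_scale) + y_zero_point)` with round-half-to-even and a uint8 output range, and the function names are hypothetical.

```python
def quantize_per_tensor(x, y_scale, y_zero_point=0, qmin=0, qmax=255):
    """Quantize a flat list of floats with one scalar scale and zero point."""
    # Python's round() rounds halves to the nearest even integer.
    return [min(qmax, max(qmin, round(v / y_scale) + y_zero_point)) for v in x]


def quantize_per_axis0(x, y_scale, y_zero_point, qmin=0, qmax=255):
    """Quantize a 2-D list where row i uses y_scale[i] and y_zero_point[i] (axis=0)."""
    return [
        quantize_per_tensor(row, scale, zero_point, qmin, qmax)
        for row, scale, zero_point in zip(x, y_scale, y_zero_point)
    ]
```

With `axis` unset, the scale and zero point are scalars as in `quantize_per_tensor`; with `axis=0`, each slice along that axis gets its own pair, as in `quantize_per_axis0`.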

Inputs (2 - 3)

x : T1
y_scale : T1
y_zero_point (optional) : T2

Outputs

y : T2

Type Constraints

T1 : tensor(float16), tensor(float)
Constrain 'x', 'y_scale' to float tensors.
T2 : tensor(int8), tensor(uint8), tensor(int16), tensor(uint16), tensor(int4), tensor(uint4)
Constrain 'y_zero_point' and 'y' to 4-bit, 8-bit, and 16-bit integer tensors.

com.microsoft.QuantizeWithOrder

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

order_input : int (required)
cublasLt order of input matrix. ORDER_COL = 0, ORDER_ROW = 1, ORDER_COL32 = 2, ORDER_COL4_4R2_8C = 3, ORDER_COL32_2R_4R4 = 4. Please refer to https://docs.nvidia.com/cuda/cublas/index.html#cublasLtOrder_t for their meaning.
order_output : int (required)
cublasLt order of output matrix.
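
For readability, the integer order values listed above can be wrapped in an enum. The sketch below is illustrative only: the class and helper names are hypothetical, and the offset helper covers just the plain column-major and row-major layouts, not the tiled ones.

```python
from enum import IntEnum


class CublasLtOrder(IntEnum):
    """Integer values accepted by order_input / order_output (hypothetical wrapper)."""
    ORDER_COL = 0
    ORDER_ROW = 1
    ORDER_COL32 = 2
    ORDER_COL4_4R2_8C = 3
    ORDER_COL32_2R_4R4 = 4


def linear_offset(order, row, col, ld):
    """Element offset of (row, col) for the two non-tiled layouts."""
    if order == CublasLtOrder.ORDER_COL:
        return col * ld + row  # column-major: ld is the stride between columns
    if order == CublasLtOrder.ORDER_ROW:
        return row * ld + col  # row-major: ld is the stride between rows
    raise NotImplementedError(f"{CublasLtOrder(order).name} is a tiled layout")
```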

Inputs

input : F
scale_input : S

Outputs

output : Q

Type Constraints

Q : tensor(int8)
Constrain input and output types to int8 tensors.
F : tensor(float16), tensor(float)
Constrain to float types
S : tensor(float)
Constrain Scale to float32 types

com.microsoft.QuickGelu

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

alpha : float
Alpha value.
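
As a rough illustration of how `alpha` enters the computation, the sketch below assumes QuickGelu computes `x * sigmoid(alpha * x)`. The default of 1.702 used here is an assumption, not something stated in this schema, and the function names are hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def quick_gelu(x, alpha=1.702):
    """Scalar QuickGelu sketch: x * sigmoid(alpha * x). The default alpha is assumed."""
    return x * _sigmoid(alpha * x)
```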

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.

com.microsoft.Range

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (2 - 3)

start : T
limit : T
delta (optional) : T
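
The three inputs can be read like the arguments of a half-open arithmetic sequence. The sketch below assumes the same semantics as the standard ONNX Range operator (element count `max(ceil((limit - start) / delta), 0)`, with `delta` defaulting to 1); the function name is hypothetical.

```python
import math


def range_op(start, limit, delta=1):
    """Values start, start + delta, ... up to but excluding limit."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    count = max(math.ceil((limit - start) / delta), 0)
    return [start + i * delta for i in range(count)]
```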

Outputs

Y : T

Type Constraints

T : tensor(float), tensor(double), tensor(int16), tensor(int32), tensor(int64)
Constrain input and output types.

com.microsoft.ReduceSumInteger

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

axes : list of ints (required)
A list of integers, along which to reduce. The default is to reduce over all the dimensions of the input tensor.
keepdims : int (required)
Whether to keep the reduced dimension. The default of 1 means the reduced dimension is kept.

Inputs

data : T1

Outputs

reduced : T2

Type Constraints

T1 : tensor(int8), tensor(uint8)
Constrain input type to 8-bit integer tensor.
T2 : tensor(int32), tensor(uint32)
Constrain output data type to 32-bit integer tensor. T2 must be tensor(uint32) when T1 is tensor(uint8), or must be tensor(int32) when T1 is tensor(int8).

com.microsoft.RelativePositionBias

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

is_bidirectional : int
Default value is 0.
max_distance : int (required)
Max distance

Inputs

bias_table : T
query_length : U
key_length : U

Outputs

output : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float or half tensors.
U : tensor(int64)
Constrain sequence_length to int tensors.

com.microsoft.RemovePadding

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

input : T
sequence_token_count : M

Outputs

output : T
token_offset : M
cumulated_seq_len : M
max_seq_len : M

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain sequence_token_count and token_offset to integer types

com.microsoft.RestorePadding

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

input : T
token_offset : M

Outputs

output : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain token_offset to integer types

com.microsoft.Rfft

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

normalized : int
must be 0, normalization currently not supported
onesided : int
must be 1, only one sided FFTs supported
signal_ndim : int
number of dimensions comprising the signal, collected in reverse order (e.g. 1 = last dimension is the signal)

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float), tensor(double), tensor(float16)
Constrain input and output types to float or half tensors.

com.microsoft.RotaryEmbedding

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

interleaved : int
Rotate using interleaved pattern. Default value is 0 (False).
is_packed_batching : int
ragged batch inputs or not. Default value is 0
num_heads : int
Number of attention heads. Default value is 0. Must be used with rotary_embedding_dim.
rotary_embedding_dim : int
Rotary embedding dimension. Default value is 0.
scale : float
Custom scale will be used if specified. Default value is 1.0
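
The difference between the interleaved and non-interleaved patterns can be sketched for a single head vector. This is a minimal illustration assuming the common rotary formulation, in which element pairs are rotated by per-pair cos/sin values; the function name and argument layout are hypothetical, and `scale`, `position_ids`, and cache lookup are left out.

```python
def rotate(x, cos, sin, interleaved=False):
    """Rotate one head vector x (even length d) using d/2 cos/sin values."""
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        # Interleaved pairs neighbours (0,1), (2,3), ...;
        # otherwise element i is paired with element i + d/2.
        a, b = (2 * i, 2 * i + 1) if interleaved else (i, i + half)
        out[a] = x[a] * cos[i] - x[b] * sin[i]
        out[b] = x[b] * cos[i] + x[a] * sin[i]
    return out
```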

Inputs

input : T
position_ids : M
cos_cache : T
sin_cache : T

Outputs

output : T

Type Constraints

T : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output types to float tensors.
M : tensor(int64)
Constrain input and output types to integer tensors

com.microsoft.SampleOp

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(uint32), tensor(uint64), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double)
Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.

com.microsoft.Sampling

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

custom : int
If 1, use custom sampling logic.
decoder : graph (required)
Decoder subgraph to execute in a loop.
decoder_start_token_id : int
The id of the token that indicates decoding starts.
encoder : graph
The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
eos_token_id : int (required)
The id of the end-of-sequence token
filter_value : float
All filtered values will be set to this float value.
init_decoder : graph
The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
min_tokens_to_keep : int
Minimum number of tokens we keep per batch example in the output.
model_type : int
Model type: 0 for decoder only like GPT-2; 1 for encoder decoder like Bart
no_repeat_ngram_size : int
no repeat ngrams size
pad_token_id : int (required)
The id of the padding token
presence_penalty : float
Presence penalty for custom sampling
temperature : float
The value used to modulate the next token probabilities.
top_p : float
If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
vocab_size : int
Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
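
How `temperature`, `top_p`, `min_tokens_to_keep`, and `filter_value` interact can be sketched for one row of logits. This is an illustrative nucleus-filtering routine under the usual definition, not the operator's implementation; the function name and the default values shown are assumptions.

```python
import math


def top_p_filter(logits, top_p=0.9, temperature=1.0,
                 min_tokens_to_keep=1, filter_value=-math.inf):
    """Keep the smallest set of most probable tokens whose mass reaches top_p."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for rank, i in enumerate(order):
        # Stop once enough mass is covered and the minimum count is met.
        if cumulative >= top_p and rank >= min_tokens_to_keep:
            break
        keep.add(i)
        cumulative += probs[i]
    return [s if i in keep else filter_value for i, s in enumerate(scaled)]
```

Tokens outside the kept set receive `filter_value`, so a later softmax assigns them zero probability when `filter_value` is negative infinity.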

Inputs (2 - 9)

input_ids : I
max_length : I
min_length (optional) : I
repetition_penalty (optional) : T
vocab_mask (optional) : I
prefix_vocab_mask (optional) : I
attention_mask (optional) : I
presence_mask (optional) : I
seed (optional) : I

Outputs (1 - 2)

sequences : I
filtered_logits (optional) : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors.
I : tensor(int32)
Constrain to integer types

com.microsoft.SkipGroupNorm

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

activation : int (required)
Activation after group normalization: 0 for None, 1 for SiLU
channels_last : int
1 if the input and output are in the NHWC layout, 0 if it is in the NCHW layout. Defaults to 1.
epsilon : float
The epsilon value to use to avoid division by zero
groups : int (required)
The number of groups of channels. It should be a divisor of the number of channels C

Inputs (4 - 5)

X : T
gamma : M
beta : M
skip : T
bias (optional) : T

Outputs (1 - 2)

Y : T
S (optional) : T

Type Constraints

T : tensor(float16), tensor(float)
Constrain input X, skip, bias and output Y, S types to float tensors.
M : tensor(float16), tensor(float)
Constrain gamma and beta to float tensors.

com.microsoft.SkipLayerNormalization

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

epsilon : float
The epsilon value to use to avoid division by zero.

Inputs (3 - 5)

input : T
skip : T
gamma : T
beta (optional) : T
bias (optional) : T

Outputs (1 - 4)

output : T
mean (optional) : U
inv_std_var (optional) : U
input_skip_bias_sum (optional) : T
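
The relationship between the inputs and the four outputs can be sketched for a single row of hidden values. This assumes the operator normalizes `input + skip + bias` and then applies `gamma` and `beta`, which is what the output names suggest; the function name and the default `epsilon` are hypothetical.

```python
import math


def skip_layer_norm(x, skip, gamma, beta=None, bias=None, epsilon=1e-12):
    """Return (output, mean, inv_std_var, input_skip_bias_sum) for one row."""
    n = len(x)
    bias = bias or [0.0] * n
    beta = beta or [0.0] * n
    summed = [a + b + c for a, b, c in zip(x, skip, bias)]
    mean = sum(summed) / n
    variance = sum((v - mean) ** 2 for v in summed) / n
    inv_std = 1.0 / math.sqrt(variance + epsilon)
    output = [(v - mean) * inv_std * g + b for v, g, b in zip(summed, gamma, beta)]
    return output, mean, inv_std, summed
```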

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float or half tensors.
U : tensor(float)
Constrain mean and inv_std_var to float tensors.

com.microsoft.SkipSimplifiedLayerNormalization

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

epsilon : float
The epsilon value to use to avoid division by zero.

Inputs (3 - 4)

input : T
skip : T
gamma : T
bias (optional) : T

Outputs (1 - 4)

output : T
mean (optional) : U
inv_std_var (optional) : U
input_skip_bias_sum (optional) : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain input and output types to float or half tensors.
U : tensor(float)
Constrain mean and inv_std_var to float tensors.

com.microsoft.Snpe

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

DLC : string (required)
payload of the SNPE DLC file.
notes : string
(Optional) Some notes for the model
snpe_version : string
(Optional) SNPE version used to convert the model.
target_device : string
(Optional) Target device like CPU, DSP, etc.

Inputs (1 - ∞)

inputs (variadic) : T

Outputs (1 - ∞)

outputs (variadic) : T

Type Constraints

T : tensor(uint8), tensor(uint16), tensor(float)
Constrain input and output types to uint8, uint16, float tensors.

com.microsoft.SparseAttention

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

do_rotary : int
Whether to use rotary position embedding. Default value is 0.
kv_num_heads : int (required)
Number of attention heads for key and value
num_heads : int (required)
Number of attention heads for query
rotary_interleaved : int
Rotary use interleaved pattern or not. Default value is 0.
scale : float
Scaling factor applied prior to softmax. The default value is 1/sqrt(head_size)
sparse_block_size : int (required)
Number of tokens per sparse block. Choices: 16, 32, 64, 128

Inputs (9 - 11)

query : T
key (optional) : T
value (optional) : T
past_key : T
past_value : T
block_row_indices : M
block_col_indices : M
total_sequence_length : M
key_total_sequence_lengths : M
cos_cache (optional) : T
sin_cache (optional) : T

Outputs

output : T
present_key : T
present_value : T

Type Constraints

T : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output to float tensors.
M : tensor(int32)
Constrain integer type.

com.microsoft.SparseToDenseMatMul

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

alpha : float
Scalar multiplier for the product of the input tensors.
transA : int
Whether A should be transposed on the last two dimensions before doing multiplication
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication

Inputs

A : T
B : T1

Outputs

Y : T1

Type Constraints

T : sparse_tensor(float), sparse_tensor(double), sparse_tensor(int64), sparse_tensor(int32), sparse_tensor(uint64), sparse_tensor(uint32)
Constrain input and output types to float tensors.
T1 : tensor(float), tensor(double), tensor(int64), tensor(int32), tensor(uint64), tensor(uint32)
Constrain input and output types to float tensors.

com.microsoft.Tokenizer

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

mark : int (required)
Boolean whether to mark the beginning/end character with start of text character (0x02)/end of text character (0x03).
mincharnum : int (required)
Minimum number of characters allowed in the output. For example, if mincharnum is 2, tokens such as "A" and "B" would be ignored
pad_value : string (required)
The string used to pad output tensors when the number of tokens extracted doesn't match the maximum number of tokens found. If start/end markers are needed, padding will appear outside the markers.
separators : list of strings
An optional list of strings that contains the separators, given as regular expressions. Two consecutive segments in X connected by a separator would be divided into two tokens. For example, if the input is "Hello World!" and this attribute contains only one space character, the corresponding output would be ["Hello", "World!"]. To achieve character-level tokenization, one should set the 'separators' to [""], which contains an empty string.
tokenexp : string
An optional string. Token's regular expression in basic POSIX format (pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03). If set, tokenizer may produce tokens matching the specified pattern. Note that one and only one of 'tokenexp' and 'separators' should be set.
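
The separator mode described above can be approximated in a few lines. The sketch uses Python's `re` module rather than the POSIX engine the operator refers to, ignores `mark` and `tokenexp`, and the function name is hypothetical.

```python
import re


def tokenize_batch(strings, separators, mincharnum, pad_value):
    """Split each string on the separators, drop short tokens, pad to equal width."""
    rows = []
    for text in strings:
        if separators == [""]:
            tokens = list(text)  # character-level tokenization
        else:
            pattern = "|".join(f"(?:{sep})" for sep in separators)
            tokens = re.split(pattern, text)
        # Empty pieces and tokens shorter than mincharnum are discarded.
        rows.append([t for t in tokens if len(t) >= max(mincharnum, 1)])
    width = max((len(row) for row in rows), default=0)
    return [row + [pad_value] * (width - len(row)) for row in rows]
```

For example, `tokenize_batch(["Hello World!", "Hi"], [" "], 1, "#")` pads the second row so that both rows have two entries.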

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(string)
Input/Output is a string tensor

com.microsoft.TorchEmbedding

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs (2 - 4)

weight : T
indices : tensor(int64)
padding_idx (optional) : tensor(int64)
scale_grad_by_freq (optional) : tensor(bool)

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64)
Constrain input and output types to all numeric tensors.

com.microsoft.TransposeMatMul

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

alpha : float
Scalar multiplier for the product of the input tensors.
transA : int
Whether A should be transposed on the last two dimensions before doing multiplication
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication
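
For 2-D inputs, the three attributes combine as `Y = alpha * op(A) @ op(B)`, where `op` is an optional transpose. A minimal pure-Python sketch follows; the function name is hypothetical, and batched inputs (where only the last two dimensions are transposed) are not covered.

```python
def transpose_matmul(a, b, alpha=1.0, trans_a=False, trans_b=False):
    """Compute alpha * op(a) @ op(b) for 2-D lists of numbers."""
    if trans_a:
        a = [list(row) for row in zip(*a)]
    if trans_b:
        b = [list(row) for row in zip(*b)]
    columns_b = list(zip(*b))
    return [
        [alpha * sum(x * y for x, y in zip(row, col)) for col in columns_b]
        for row in a
    ]
```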

Inputs

A : T
B : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.

com.microsoft.Trilu

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

upper : int
Boolean. Indicates whether upper or lower part of matrix is retained. Default is true.

Inputs (1 - 2)

X : T
k (optional) : tensor(int64)
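
The effect of `upper` and the optional diagonal offset `k` can be sketched for a single 2-D matrix. This assumes the same convention as the standard ONNX Trilu operator (`k = 0` is the main diagonal, positive `k` moves above it); the function name is hypothetical.

```python
def trilu(x, k=0, upper=True):
    """Keep the upper (j - i >= k) or lower (j - i <= k) part; zero the rest."""
    return [
        [
            value if ((j - i >= k) if upper else (j - i <= k)) else 0
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(x)
    ]
```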

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bool)
Constrain input and output types to all numeric tensors and bool tensors.

com.microsoft.UnfoldTensor

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

dim : int
specify the dimension to unfold
size : int (required)
specify the size
step : int
specify the step.
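
A 1-D sketch of what `size` and `step` control, assuming semantics similar to `torch.Tensor.unfold` (sliding windows of length `size`, advanced by `step` along `dim`). The schema above does not spell this out, so treat the behaviour and the function name as assumptions.

```python
def unfold_1d(values, size, step=1):
    """Return all full windows of length size taken every step elements."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    count = (len(values) - size) // step + 1
    return [values[i * step : i * step + size] for i in range(max(count, 0))]
```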

Inputs

input : T

Outputs

output : T

Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Allow inputs and outputs to be any kind of tensor.

com.microsoft.Unique

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Inputs

x : T

Outputs

y : T
idx : tensor(int64)
counts : tensor(int64)
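
The three outputs can be illustrated on a 1-D input. The sketch assumes that `y` lists unique values in order of first occurrence, `idx` maps each input element to its position in `y`, and `counts` gives the number of occurrences of each entry of `y`. The ordering convention is an assumption, and the function name is hypothetical.

```python
def unique(x):
    """Return (y, idx, counts) for a flat list of hashable values."""
    position = {}
    y, idx, counts = [], [], []
    for value in x:
        if value not in position:
            position[value] = len(y)
            y.append(value)
            counts.append(0)
        pos = position[value]
        idx.append(pos)
        counts[pos] += 1
    return y, idx, counts
```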

Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Input can be of any tensor type.

com.microsoft.WhisperBeamSearch

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

beginning_timestamp_token_id : int
The id of the first timestamp
decoder : graph (required)
Decoder subgraph to execute in a loop.
decoder_output_cross_qk : int
If nonzero, decoder subgraph contains output Q*K from cross attentions. Default 0.
decoder_start_token_id : int
The id of the token that indicates decoding starts (i.e. the start of transcription token id)
early_stopping : int
early stop or not
encoder : graph
The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
eos_token_id : int (required)
The id of the end-of-sequence token
init_decoder : graph
The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
model_type : int
Must be 2 for whisper
no_repeat_ngram_size : int
no repeat ngrams size
no_speech_token_id : int
The token in the Whisper model that marks the whole sequence as empty. With this token, Whisper can output no_speech_prob afterwards. Default -1.
no_timestamps_token_id : int
The id of the token that indicates no timestamps
pad_token_id : int (required)
The id of the padding token
start_of_lm_token_id : int
The id of the token that indicates LM starts
transcribe_token_id : int
The id of the transcribe task
translate_token_id : int
The id of the translate task
vocab_size : int
Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape

Inputs (5 - 15)

input_ids : F
max_length : I
min_length (optional) : I
num_beams : I
num_return_sequences : I
length_penalty (optional) : T
repetition_penalty (optional) : T
vocab_mask (optional) : M
prefix_vocab_mask (optional) : M
attention_mask (optional) : I
decoder_input_ids (optional) : I
logits_processor (optional) : I
cross_qk_layer_head (optional) : I
extra_decoding_ids (optional) : I
temperature (optional) : T

Outputs (1 - 5)

sequences : I
sequences_scores (optional) : T
scores (optional) : T
cross_qk (optional) : V
non_speech_probs (optional) : T

Type Constraints

T : tensor(float), tensor(float16)
Constrain to float tensors.
F : tensor(float), tensor(int32), tensor(float16)
Constrain input type to float or int tensors.
I : tensor(int32)
Constrain to integer types
M : tensor(int32)
Constrain mask to integer types
V : tensor(float)
Constrain cross_qk to float32 tensors.

com.microsoft.WordConvEmbedding

Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

Attributes

char_embedding_size : int
Integer representing the embedding vector size for each char. If not provided, use the char embedding size of the embedding vector.
conv_window_size : int
This operator applies convolution to each word from left to right with a window equal to conv_window_size and a stride of 1. Take the word 'example' for example: with conv_window_size equal to 2, conv is applied to [ex], [xa], [am], [mp]... If not provided, use the first dimension of the conv kernel shape.
embedding_size : int
Integer representing the embedding vector size for each word. If not provided, use the filter size of the conv weight.
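
The window enumeration described for conv_window_size can be reproduced directly. The helper below is only an illustration of which character spans the convolution visits (stride 1); its name is hypothetical and it does not compute any embedding.

```python
def conv_windows(word, conv_window_size):
    """List the character windows visited left to right with stride 1."""
    last_start = len(word) - conv_window_size
    return [word[i : i + conv_window_size] for i in range(last_start + 1)]
```

For the word 'example' and a window of 2 this yields `ex`, `xa`, `am`, `mp`, `pl`, `le`.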

Inputs

Sequence : T
W : T1
B : T1
C : T1

Outputs

Y : T1

Type Constraints

T : tensor(int32)
Constrain to tensor(int32).
T1 : tensor(float)
Constrain to tensor(float).

experimental com.microsoft.IsAllFinite

Version

No versioning maintained for experimental ops.

Attributes

isinf_only : int
If true, check only for Inf, -Inf.
isnan_only : int
If true, check only for NaN.
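
The two flags narrow what counts as a non-finite value. A minimal sketch over lists of floats follows; it assumes the output is true only when no offending value is found in any input, and that setting both flags at once is invalid. The function name is hypothetical.

```python
import math


def is_all_finite(tensors, isinf_only=False, isnan_only=False):
    """Return True if no input contains Inf/NaN (or only Inf, or only NaN)."""
    if isinf_only and isnan_only:
        # Assumption: the two flags are mutually exclusive.
        raise ValueError("isinf_only and isnan_only cannot both be set")
    for tensor in tensors:
        for value in tensor:
            if isinf_only:
                bad = math.isinf(value)
            elif isnan_only:
                bad = math.isnan(value)
            else:
                bad = not math.isfinite(value)
            if bad:
                return False
    return True
```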

Inputs (1 - ∞)

input (variadic) : V

Outputs

output : T

Type Constraints

V : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T : tensor(bool)
Constrain the output to a boolean tensor.

experimental com.microsoft.QEmbedLayerNormalization

Version

No versioning maintained for experimental ops.

Attributes

epsilon : float
The epsilon value to use to avoid division by zero.

Inputs

input_ids : T1
segment_ids (optional) : T1
word_embedding_quant : T2
position_embedding_quant : T2
segment_embedding (optional) : T2
gamma_quant : T2
beta_quant : T2
mask (optional) : T1
word_embedding_scale : T
position_embedding_scale : T
segment_embedding_scale (optional) : T
gamma_scale : T
beta_scale : T
word_embedding_zero_point : T2
position_embedding_zero_point : T2
segment_embedding_zero_point (optional) : T2
gamma_zero_point : T2
beta_zero_point : T2

Outputs

layernorm_out : T
mask_index_out : T1

Type Constraints

T1 : tensor(int32)
Constrain mask index to integer types
T2 : tensor(int8), tensor(uint8)
Constrain input and output types to int8 tensors.
T : tensor(float)
Constrain input and output types to float32 tensors.

com.microsoft.nchwc

com.microsoft.nchwc.AveragePool

Version

This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.

Attributes

auto_pad : string
ceil_mode : int
count_include_pad : int
dilations : list of ints
kernel_shape : list of ints (required)
pads : list of ints
strides : list of ints

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors

com.microsoft.nchwc.Conv

Version

This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.

Attributes

activation : string
activation_params : list of floats
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints

Inputs (2 - 4)

X : T
W : T
B (optional) : T
Sum (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors

com.microsoft.nchwc.GlobalAveragePool

Version

This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors

com.microsoft.nchwc.GlobalMaxPool

Version

This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors

com.microsoft.nchwc.MaxPool

Version

This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.

Attributes

auto_pad : string
ceil_mode : int
dilations : list of ints
kernel_shape : list of ints (required)
pads : list of ints
storage_order : int
strides : list of ints

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors

com.microsoft.nchwc.ReorderInput

Version

This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.

Attributes

channels_last : int

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors

com.microsoft.nchwc.ReorderOutput

Version

This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.

Attributes

channels : int
channels_last : int

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors

com.microsoft.nchwc.Upsample

Version

This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.

Attributes

coordinate_transformation_mode : string
mode : string
scales : list of ints

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float)
Constrain input and output types to float tensors

com.ms.internal.nhwc

com.ms.internal.nhwc.BatchNormalization

Version

This version of the operator has been available since version 15 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.BatchNormalization-7, com.ms.internal.nhwc.BatchNormalization-9, com.ms.internal.nhwc.BatchNormalization-14

Attributes

activation : string
activation_params : list of floats
epsilon : float
The epsilon value to use to avoid division by zero.
momentum : float
Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum).
training_mode : int
If set to true, it indicates BatchNormalization is being used for training, and outputs 1 and 2 are to be computed.
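
The momentum update quoted above can be written out per channel. This is a direct transcription of that formula for illustration; the function name is hypothetical and the default of 0.9 is an assumption.

```python
def update_running_stat(running, batch_stat, momentum=0.9):
    """Apply running = running * momentum + batch_stat * (1 - momentum) per channel."""
    return [r * momentum + b * (1.0 - momentum) for r, b in zip(running, batch_stat)]
```

The same update applies to the running variance, using the batch variance in place of the batch mean.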

Inputs

X : T
scale : T1
B : T1
input_mean : T2
input_var : T2

Outputs (1 - 3)

Y : T
running_mean (optional) : T2
running_var (optional) : T2

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain scale and bias types to float tensors.
T2 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain mean and variance types to float tensors.

com.ms.internal.nhwc.ConvTranspose

Version

This version of the operator has been available since version 11 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.ConvTranspose-1

Attributes

activation : string
activation_params : list of floats
auto_pad : string
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that `output_shape[i] = input_shape[i] * strides[i]` for each axis `i`. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In case the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER.
dilations : list of ints
dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group : int
number of groups input channels and output channels are divided into.
kernel_shape : list of ints
The shape of the convolution kernel. If not present, should be inferred from input W.
output_padding : list of ints
Additional elements added to the side with higher coordinate indices in the output. Each padding value in "output_padding" must be less than the corresponding stride/dilation dimension. By default, this attribute is a zero vector. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If "output_shape" is explicitly provided, "output_padding" does not contribute additional size to "output_shape" but participates in the computation of the needed padding amount. This is also called adjs or adjustment in some frameworks.
output_shape : list of ints
The shape of the output can be explicitly set which will cause pads values to be auto generated. If output_shape is specified pads values are ignored. See doc for details for equations to generate pads. Note that the output_shape attribute value should not include dimensions for batch size and channels, which are automatically inferred.
pads : list of ints
Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end part of the corresponding axis. The `pads` format should be as follows: [x1_begin, x2_begin...x1_end, x2_end,...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
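
When `output_shape` is not given, the spatial output size follows from the attributes above. The sketch assumes the formula from the standard ONNX ConvTranspose specification, `stride * (input - 1) + output_padding + ((kernel - 1) * dilation + 1) - pad_begin - pad_end`, and covers spatial axes only; the function name is hypothetical.

```python
def conv_transpose_output_shape(input_shape, kernel_shape, strides=None,
                                dilations=None, pads=None, output_padding=None):
    """Spatial output shape of a transposed convolution (no batch/channel dims)."""
    n = len(input_shape)
    strides = strides or [1] * n
    dilations = dilations or [1] * n
    pads = pads or [0] * (2 * n)  # [x1_begin, x2_begin, ..., x1_end, x2_end, ...]
    output_padding = output_padding or [0] * n
    return [
        strides[i] * (input_shape[i] - 1)
        + output_padding[i]
        + (kernel_shape[i] - 1) * dilations[i] + 1
        - pads[i] - pads[i + n]
        for i in range(n)
    ]
```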

Inputs (2 - 3)

X : T
W : T
B (optional) : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.

com.ms.internal.nhwc.DepthToSpace

Version

This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.DepthToSpace-1, com.ms.internal.nhwc.DepthToSpace-11

Attributes

blocksize : int (required)
Blocks of [blocksize, blocksize] are moved.
mode : string
DCR (default) for depth-column-row order re-arrangement. Use CRD for column-row-depth order.

Inputs

input : T

Outputs

output : T

Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input and output types to all tensor types.

com.ms.internal.nhwc.GlobalLpPool

Version

This version of the operator has been available since version 2 of the 'com.ms.internal.nhwc' operator set.

Attributes

p : int
p value of the Lp norm used to pool over the input data.
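
For one channel, pooling with the Lp norm reduces all spatial values to a single number. The sketch assumes the usual definition `(sum |x|^p)^(1/p)`; the default of `p = 2` and the function name are assumptions.

```python
def global_lp_pool(values, p=2):
    """Lp norm over all spatial values of one channel."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)
```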

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(bfloat16), tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.

com.ms.internal.nhwc.InstanceNormalization

Version

This version of the operator has been available since version 6 of the 'com.ms.internal.nhwc' operator set.

Attributes

activation : string
activation_params : list of floats
epsilon : float
The epsilon value to use to avoid division by zero.

Inputs

input : T
scale : T
B : T

Outputs

output : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.

com.ms.internal.nhwc.LRN

Version

This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.LRN-1

Attributes

alpha : float
Scaling parameter.
beta : float
The exponent.
bias : float
size : int (required)
The number of channels to sum over

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.

com.ms.internal.nhwc.LpPool

Version

This version of the operator has been available since version 18 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.LpPool-11

Attributes

auto_pad : string
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that `output_shape[i] = ceil(input_shape[i] / strides[i])` for each axis `i`. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In case the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER.
ceil_mode : int
Whether to use ceil or floor (default) to compute the output shape.
dilations : list of ints
Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
kernel_shape : list of ints (required)
The size of the kernel along each axis.
p : int
p value of the Lp norm used to pool over the input data.
pads : list of ints
Padding for the beginning and ending along each spatial axis. Each value must be greater than or equal to 0 and gives the number of pixels added to the beginning or end of the corresponding axis. The `pads` format is [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
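
The SAME_UPPER / SAME_LOWER rule and the `pads` layout described above can be sketched for a single spatial axis as follows. The helper names are hypothetical, and the total-padding expression is the usual one derived from `output_shape[i] = ceil(input_shape[i] / strides[i])`, not code taken from the operator.

```python
import math


def same_auto_pad(input_size, kernel, stride=1, dilation=1, mode="SAME_UPPER"):
    """Return (pad_begin, pad_end) for one spatial axis (hypothetical helper)."""
    output_size = math.ceil(input_size / stride)
    effective_kernel = (kernel - 1) * dilation + 1
    total = max((output_size - 1) * stride + effective_kernel - input_size, 0)
    small, large = total // 2, total - total // 2
    # An odd total puts the extra pixel at the end for SAME_UPPER,
    # at the beginning for SAME_LOWER.
    return (small, large) if mode == "SAME_UPPER" else (large, small)


def to_pads_attribute(per_axis):
    """Flatten [(begin, end), ...] into [x1_begin, x2_begin, ..., x1_end, x2_end, ...]."""
    return [b for b, _ in per_axis] + [e for _, e in per_axis]


upper = same_auto_pad(5, kernel=2, stride=1, mode="SAME_UPPER")
lower = same_auto_pad(5, kernel=2, stride=1, mode="SAME_LOWER")
pads = to_pads_attribute([upper, same_auto_pad(7, kernel=3, stride=2)])
```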

Inputs

X : T

Outputs

Y : T

Type Constraints

T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.

com.ms.internal.nhwc.MaxUnpool

Version

This version of the operator has been available since version 11 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.MaxUnpool-9

Attributes

activation : string
activation_params : list of floats
kernel_shape : list of ints (required)
The size of the kernel along each axis.
pads : list of ints
Padding for the beginning and ending along each spatial axis. Each value must be greater than or equal to 0 and gives the number of pixels added to the beginning or end of the corresponding axis. The `pads` format is [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.

Inputs (2 - 3)

X : T1
I : T2
output_shape (optional) : T2
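
How the `X` and `I` inputs relate can be sketched as a scatter into a zero-filled output. This assumes, as in the standard ONNX MaxUnpool, that each entry of `I` is a flat offset into the output tensor. How those offsets map onto the internal NHWC layout is not stated here, so the example works on a flattened output only. The function name is hypothetical.

```python
def max_unpool_flat(values, indices, output_size):
    """Scatter pooled values back to their recorded flat positions.

    Hypothetical helper operating on a flattened output of `output_size` elements.
    """
    out = [0.0] * output_size  # positions not referenced by `indices` stay zero
    for value, index in zip(values, indices):
        out[index] = value
    return out


restored = max_unpool_flat([5.0, 7.0], [1, 3], output_size=4)
```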

Outputs

output : T1

Type Constraints

T1 : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
T2 : tensor(int64)
Constrain index tensor to int64

com.ms.internal.nhwc.QLinearConvTranspose

Version

This version of the operator has been available since version 1 of the 'com.ms.internal.nhwc' operator set.

Attributes

auto_pad : string
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET.
dilations : list of ints
dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group : int
number of groups input channels and output channels are divided into.
kernel_shape : list of ints
The shape of the convolution kernel. If not present, should be inferred from input W.
output_padding : list of ints
Additional elements added to the side with higher coordinate indices in the output. Each padding value in "output_padding" must be less than the corresponding stride/dilation dimension. By default, this attribute is a zero vector. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If "output_shape" is explicitly provided, "output_padding" does not contribute additional size to "output_shape" but participates in the computation of the needed padding amount. This is also called adjs or adjustment in some frameworks.
output_shape : list of ints
The shape of the output can be set explicitly, which causes the pads values to be generated automatically. If output_shape is specified, pads values are ignored. See doc for the equations used to generate pads.
pads : list of ints
Padding for the beginning and ending along each spatial axis
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
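
The interaction between `pads`, `output_padding` and `output_shape` can be sketched per spatial axis as follows. The equations are those of the standard ONNX ConvTranspose operator, which this schema is assumed to share. The function names are hypothetical.

```python
def conv_transpose_output_size(input_size, kernel, stride=1, dilation=1,
                               pad_begin=0, pad_end=0, output_padding=0):
    """Output length of one spatial axis when pads are given explicitly."""
    return (stride * (input_size - 1) + output_padding
            + (kernel - 1) * dilation + 1 - pad_begin - pad_end)


def total_padding_for(output_size, input_size, kernel, stride=1, dilation=1,
                      output_padding=0):
    """Total padding needed on one axis when output_shape is given explicitly."""
    return (stride * (input_size - 1) + output_padding
            + (kernel - 1) * dilation + 1 - output_size)


plain = conv_transpose_output_size(3, kernel=3, stride=2)
extended = conv_transpose_output_size(3, kernel=3, stride=2, output_padding=1)
needed = total_padding_for(6, input_size=3, kernel=3, stride=2)
```

In the second function, `output_padding` adds no size to the requested output but increases the padding that must be generated, which matches the attribute description above.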

Inputs (8 - 9)

x : T1
x_scale : tensor(float)
x_zero_point : T1
w : T2
w_scale : tensor(float)
w_zero_point : T2
y_scale : tensor(float)
y_zero_point : T3
B (optional) : T4
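
Each scale and zero point pair above describes a linear mapping between 8-bit values and real values. A minimal sketch, assuming the usual ONNX linear quantization convention `real = scale * (q - zero_point)`. The helper names and the uint8 range used as default are assumptions for illustration.

```python
def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a real value."""
    return scale * (q - zero_point)


def quantize(real, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to a quantized integer, saturating to [qmin, qmax]."""
    # Python's round() rounds halfway cases to the nearest even integer.
    q = round(real / scale) + zero_point
    return max(qmin, min(qmax, q))


real_value = dequantize(130, scale=0.5, zero_point=128)
round_trip = quantize(real_value, scale=0.5, zero_point=128)
saturated = quantize(1000.0, scale=0.5, zero_point=128)
```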

Outputs

y : T3

Type Constraints

T1 : tensor(int8), tensor(uint8)
Constrain input type to 8-bit integer tensor.
T2 : tensor(int8), tensor(uint8)
Constrain filter type to 8-bit integer tensor.
T3 : tensor(int8), tensor(uint8)
Constrain output type to 8-bit integer tensor.
T4 : tensor(int32)
Constrain bias type to 32-bit integer tensor.

com.ms.internal.nhwc.Resize

Version

This version of the operator has been available since version 19 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.Resize-11, com.ms.internal.nhwc.Resize-13, com.ms.internal.nhwc.Resize-18

Attributes

antialias : int
If set to 1, "linear" and "cubic" interpolation modes will use an antialiasing filter when downscaling. Antialiasing is achieved by stretching the resampling filter by a factor max(1, 1 / scale), which means that when downsampling, more input pixels contribute to an output pixel.
axes : list of ints
If provided, it specifies a subset of axes that 'roi', 'scales' and 'sizes' refer to. If not provided, all axes are assumed [0, 1, ..., r-1], where r = rank(data). Non-specified dimensions are interpreted as non-resizable. Negative value means counting dimensions from the back. Accepted range is [-r, r-1], where r = rank(data). Behavior is undefined if an axis is repeated.
coordinate_transformation_mode : string
This attribute describes how to transform the coordinate in the resized tensor to the coordinate in the original tensor.

The coordinate of each dimension is transformed individually. Let's describe a case using axis x as an example. Denote x_resized as the coordinate of axis x in the resized tensor, x_original as the coordinate of axis x in the original tensor, length_original as the length of the original tensor in axis x, length_resized as the length of the resized tensor in axis x, scale = length_resized / length_original, output_width the target length on the axis x which can be a fractional number when it is calculated out of a scale factor, and output_width_int the effective output width as an integer.

if coordinate_transformation_mode is "half_pixel",

x_original = (x_resized + 0.5) / scale - 0.5

if coordinate_transformation_mode is "half_pixel_symmetric",

adjustment = output_width_int / output_width
center = length_original / 2
offset = center * (1 - adjustment)
x_original = offset + (x_resized + 0.5) / scale - 0.5

if coordinate_transformation_mode is "pytorch_half_pixel",

x_original = length_resized > 1 ? (x_resized + 0.5) / scale - 0.5 : 0

if coordinate_transformation_mode is "align_corners",

x_original = x_resized * (length_original - 1) / (length_resized - 1)

if coordinate_transformation_mode is "asymmetric",

x_original = x_resized / scale

if coordinate_transformation_mode is "tf_crop_and_resize",

x_original = length_resized > 1 ? start_x * (length_original - 1) + x_resized * (end_x - start_x) * (length_original - 1) / (length_resized - 1) : 0.5 * (start_x + end_x) * (length_original - 1)
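
The formulas above can be collected into one function for a single axis. This is a sketch, not the operator's implementation. The function name is hypothetical, and in the "half_pixel_symmetric" branch `input_width` is read as `length_original` and `output_width_int` as `length_resized`, which is an interpretation of the definitions above.

```python
def to_original(x_resized, length_original, length_resized,
                mode="half_pixel", scale=None, start_x=0.0, end_x=1.0):
    """Map a coordinate in the resized tensor to the original tensor (one axis)."""
    if scale is None:
        scale = length_resized / length_original
    if mode == "half_pixel":
        return (x_resized + 0.5) / scale - 0.5
    if mode == "half_pixel_symmetric":
        output_width = scale * length_original  # may be fractional
        adjustment = length_resized / output_width
        offset = (length_original / 2) * (1 - adjustment)
        return offset + (x_resized + 0.5) / scale - 0.5
    if mode == "pytorch_half_pixel":
        return (x_resized + 0.5) / scale - 0.5 if length_resized > 1 else 0.0
    if mode == "align_corners":
        # Not defined for length_resized == 1 (division by zero).
        return x_resized * (length_original - 1) / (length_resized - 1)
    if mode == "asymmetric":
        return x_resized / scale
    if mode == "tf_crop_and_resize":
        if length_resized > 1:
            return (start_x * (length_original - 1)
                    + x_resized * (end_x - start_x) * (length_original - 1)
                    / (length_resized - 1))
        return 0.5 * (start_x + end_x) * (length_original - 1)
    raise ValueError(f"unknown coordinate_transformation_mode: {mode}")


# Upscaling an axis of length 4 to length 8 (scale = 2).
first_half_pixel = to_original(0, 4, 8, mode="half_pixel")
last_align_corners = to_original(7, 4, 8, mode="align_corners")
```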

cubic_coeff_a : float
The coefficient 'a' used in cubic interpolation. Two common choices are -0.5 (in some cases of TensorFlow) and -0.75 (in PyTorch). Check out Equation (4) in https://ieeexplore.ieee.org/document/1163711 for the details. This attribute is valid only if mode is "cubic".
exclude_outside : int
If set to 1, the weight of sampling locations outside the tensor will be set to 0 and the weight will be renormalized so that their sum is 1.0. The default value is 0.
extrapolation_value : float
When coordinate_transformation_mode is "tf_crop_and_resize" and x_original is outside the range [0, length_original - 1], this value is used as the corresponding output value. Default is 0.0f.
keep_aspect_ratio_policy : string
This attribute describes how to interpret the `sizes` input with regard to keeping the original aspect ratio of the input, and it is not applicable when the `scales` input is used.

In the cases below, the sizes are associated with a subset of axes (explicitly provided or default), and d = axes[i], where i is the index into the provided sizes.

If keep_aspect_ratio_policy is "stretch", the original aspect ratio is disregarded, and the input is resized to the specified size: out_size[d] = sizes[i]

If keep_aspect_ratio_policy is "not_larger", the sizes are adjusted so that no extent of the output is larger than the specified size, while keeping the original aspect ratio:

scale = Min(sizes[i] / in_size[d])
out_size[d] = round_int(scale * in_size[d])

If keep_aspect_ratio_policy is "not_smaller", the sizes are adjusted so that no extent of the output is smaller than the specified size, while keeping the original aspect ratio:

scale = Max(sizes[i] / in_size[d])
out_size[d] = round_int(scale * in_size[d])

For non-resizable axes (those not specified in axes), the output size will be equal to the input size.

Note: round_int stands for computing the nearest integer value, rounding halfway cases up.
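
The three policies can be sketched as a small shape calculation. The function names are hypothetical, and `round_int` is implemented as `floor(value + 0.5)` to round halfway cases up, following the note above.

```python
import math


def round_int(value):
    """Nearest integer, rounding halfway cases up."""
    return math.floor(value + 0.5)


def resized_shape(in_size, sizes, axes=None, policy="stretch"):
    """Compute the output shape for a given keep_aspect_ratio_policy."""
    rank = len(in_size)
    axes = list(range(rank)) if axes is None else [a % rank for a in axes]
    out = list(in_size)  # axes not listed keep their input size
    if policy == "stretch":
        for i, d in enumerate(axes):
            out[d] = sizes[i]
        return out
    if policy not in ("not_larger", "not_smaller"):
        raise ValueError(f"unknown keep_aspect_ratio_policy: {policy}")
    ratios = [sizes[i] / in_size[d] for i, d in enumerate(axes)]
    scale = min(ratios) if policy == "not_larger" else max(ratios)
    for d in axes:
        out[d] = round_int(scale * in_size[d])
    return out


# A 100 x 50 image (axes 1 and 2 of an NHWC tensor) resized towards 50 x 50.
shape = [1, 100, 50, 3]
stretched = resized_shape(shape, [50, 50], axes=[1, 2], policy="stretch")
not_larger = resized_shape(shape, [50, 50], axes=[1, 2], policy="not_larger")
not_smaller = resized_shape(shape, [50, 50], axes=[1, 2], policy="not_smaller")
```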

mode : string
Three interpolation modes: "nearest" (default), "linear" and "cubic". The "linear" mode includes linear interpolation for 1D tensor and N-linear interpolation for N-D tensor (for example, bilinear interpolation for 2D tensor). The "cubic" mode includes cubic interpolation for 1D tensor and N-cubic interpolation for N-D tensor (for example, bicubic interpolation for 2D tensor).
nearest_mode : string
Four modes: "round_prefer_floor" (default, also known as round half down), "round_prefer_ceil" (also known as round half up), "floor", "ceil". Only used by nearest interpolation. It indicates how to get the "nearest" pixel in the input tensor from x_original, so this attribute is valid only if "mode" is "nearest".
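
The four rounding rules differ only in how they treat fractional coordinates, and the first two differ only on exact halves. A minimal sketch with a hypothetical function name. Clamping the result to the valid index range is left out.

```python
import math


def nearest_index(x_original, mode="round_prefer_floor"):
    """Pick the input index for nearest-neighbour interpolation."""
    if mode == "round_prefer_floor":
        return math.ceil(x_original - 0.5)   # 2.5 -> 2
    if mode == "round_prefer_ceil":
        return math.floor(x_original + 0.5)  # 2.5 -> 3
    if mode == "floor":
        return math.floor(x_original)
    if mode == "ceil":
        return math.ceil(x_original)
    raise ValueError(f"unknown nearest_mode: {mode}")


half_down = nearest_index(2.5, "round_prefer_floor")
half_up = nearest_index(2.5, "round_prefer_ceil")
```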

Inputs (1 - 4)

X : T1
roi (optional) : T2
scales (optional) : tensor(float)
sizes (optional) : tensor(int64)

Outputs

Y : T1

Type Constraints

T1 : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input 'X' and output 'Y' to all tensor types.
T2 : tensor(float16), tensor(float), tensor(double)
Constrain roi type to float16, float or double.

com.ms.internal.nhwc.SpaceToDepth

Version

This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.SpaceToDepth-1

Attributes

blocksize : int (required)
Blocks of [blocksize, blocksize] are moved.
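
The block rearrangement can be sketched for a single channels-last image as follows. The ordering of values within the output depth (block row, then block column, then input channel) follows the standard ONNX SpaceToDepth definition and is an assumption for this internal NHWC variant. The function name is hypothetical.

```python
def space_to_depth_hwc(image, blocksize):
    """Move each [blocksize, blocksize] spatial block into the channel axis.

    `image` is a nested list indexed as image[h][w][c].
    """
    height, width = len(image), len(image[0])
    if height % blocksize or width % blocksize:
        raise ValueError("height and width must be multiples of blocksize")
    out = []
    for out_h in range(height // blocksize):
        row = []
        for out_w in range(width // blocksize):
            depth = []
            for block_h in range(blocksize):
                for block_w in range(blocksize):
                    pixel = image[out_h * blocksize + block_h][out_w * blocksize + block_w]
                    depth.extend(pixel)  # append all channels of this pixel
            row.append(depth)
        out.append(row)
    return out


# A 2 x 2 image with 1 channel becomes a 1 x 1 image with 4 channels.
packed = space_to_depth_hwc([[[1], [2]], [[3], [4]]], blocksize=2)
```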

Inputs

input : T

Outputs

output : T

Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input and output types to all tensor types.