do_rotary : int: Whether to use rotary position embedding. Default value is 0.
mask_filter_value : float: The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required): Number of attention heads
past_present_share_buffer : int: Whether the corresponding past and present share the same tensor, of size (2, batch_size, num_heads, max_sequence_length, head_size)
qkv_hidden_sizes : list of ints: Hidden dimension of Q, K, V: hidden_size, hidden_size and v_hidden_size
rotary_embedding_dim : int: Dimension of rotary embedding. Limited to 32, 64 or 128. Default value is head_size
scale : float: Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional : int: Whether every token can only attend to previous tokens. Default value is 0.

#### Inputs (2 - 7)

input : T
weights : T
bias (optional) : T
mask_index (optional) : M
past (optional) : T
attention_bias (optional) : T
past_sequence_length (optional) : M

#### Outputs (1 - 2)

output : T
present (optional) : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float tensors.
M : tensor(int32): Constrain mask index to integer types
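
The `scale`, `unidirectional`, and `mask_filter_value` attributes above can be illustrated with a small pure-Python sketch. The helper names are hypothetical, and replacing masked scores with `mask_filter_value` is an assumption made for illustration; an actual kernel may instead add the value to the scores before the softmax.

```python
import math


def default_scale(head_size: int) -> float:
    """Scale used when the `scale` attribute is not specified: 1/sqrt(head_size)."""
    return 1.0 / math.sqrt(head_size)


def apply_unidirectional_mask(scores, mask_filter_value=-10000.0):
    """Hypothetical helper: position i may only attend to positions j <= i.

    Scores for later positions (j > i) are replaced by mask_filter_value
    (an assumption; see the note above).
    """
    return [
        [s if j <= i else mask_filter_value for j, s in enumerate(row)]
        for i, row in enumerate(scores)
    ]


scores = [[0.5, 0.2, 0.1], [0.3, 0.4, 0.6], [0.9, 0.8, 0.7]]
masked = apply_unidirectional_mask(scores)
scale = default_scale(64)
```

With `head_size` 64 the default scale is 0.125, and the first row of `masked` keeps only its first score.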

### **com.microsoft.AttnLSTM**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

activation_alpha : list of floats: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as of corresponding ONNX operators. For example, with LeakyRelu, the default alpha is 0.01.
activation_beta : list of floats: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as of corresponding ONNX operators.
activations : list of strings: A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.
clip : float: Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
direction : string: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
hidden_size : int: Number of neurons in the hidden layer.
input_forget : int: Couple the input and forget gates if 1, default 0.

#### Inputs (3 - 14)

X : T
W : T
R : T
B (optional) : T
sequence_lens (optional) : T1
initial_h (optional) : T
initial_c (optional) : T
P (optional) : T
QW (optional) : T
MW (optional) : T
V (optional) : T
M (optional) : T
memory_seq_lens (optional) : T1
AW (optional) : T

#### Outputs (0 - 3)

Y (optional) : T
Y_h (optional) : T
Y_c (optional) : T

#### Type Constraints

T : tensor(float), tensor(double): Constrain input and output types to float tensors.
T1 : tensor(int32): Constrain seq_lens to integral tensors.
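
As a rough illustration of the `clip` and `activation_alpha` attributes described above, the sketch below bounds values to [-threshold, +threshold] and applies LeakyRelu with the default alpha of 0.01. The function names are hypothetical and this is not the operator's implementation.

```python
def clip(values, threshold=None):
    """Bound each element to [-threshold, +threshold]; no clipping if threshold is None."""
    if threshold is None:
        return list(values)
    return [max(-threshold, min(threshold, v)) for v in values]


def leaky_relu(x, alpha=0.01):
    """LeakyRelu; alpha is the value an activation_alpha entry would override."""
    return x if x >= 0 else alpha * x


clipped = clip([-5.0, 0.5, 3.0], threshold=2.0)
unclipped = clip([-5.0, 0.5, 3.0])
activated = [leaky_relu(v) for v in clipped]
```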

### **com.microsoft.BeamSearch**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

decoder : graph (required): Decoder subgraph to execute in a loop.
decoder_start_token_id : int: The id of the token that indicates decoding starts.
early_stopping : int: Whether to stop early.
encoder : graph: The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
eos_token_id : int (required): The id of the end-of-sequence token
init_decoder : graph: The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
model_type : int: model type: 0 for GPT-2; 1 for encoder decoder like T5
no_repeat_ngram_size : int: Size of n-grams that must not repeat.
pad_token_id : int (required): The id of the padding token
vocab_size : int: Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape

#### Inputs (5 - 12)

input_ids : F
max_length : I
min_length (optional) : I
num_beams : I
num_return_sequences : I
length_penalty (optional) : T
repetition_penalty (optional) : T
vocab_mask (optional) : M
prefix_vocab_mask (optional) : M
attention_mask (optional) : I
decoder_input_ids (optional) : I
logits_processor (optional) : I

#### Outputs (1 - 3)

sequences : I
sequences_scores (optional) : T
scores (optional) : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain to float tensors.
F : tensor(float), tensor(int32), tensor(float16): Constrain input type to float or int tensors.
I : tensor(int32): Constrain to integer types
M : tensor(int32): Constrain mask to integer types

### **com.microsoft.BiasAdd**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

X : T
bias : T
skip : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float): Constrain input and output types to float tensors.

### **com.microsoft.BiasDropout**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

seed : int: (Optional) Seed for the random generator. If not specified, one is generated automatically.

#### Inputs (2 - 5)

data : T
bias : T
residual (optional) : T
ratio (optional) : T1
training_mode (optional) : T2

#### Outputs (1 - 2)

output : T
mask (optional) : T2

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input 'ratio' types to float tensors.
T2 : tensor(bool): Constrain output 'mask' types to boolean tensors.

### **com.microsoft.BiasGelu**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

A : T
B : T

#### Outputs

C : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.

### **com.microsoft.BiasSoftmax**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

axis : int: apply softmax to elements for dimensions axis or higher
is_inner_broadcast : int (required): true if broadcast bias across input for dimensions broadcast_axis to axis-1, otherwise broadcast bias across input for dimensions 0 to broadcast_axis - 1

#### Inputs

data : T
bias : T

#### Outputs

output : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors.
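
The `axis` attribute above says softmax is applied over the elements of dimensions `axis` and higher. One way to read that is sketched below on a flat row-major buffer: the trailing dimensions are treated as a single group and each group is normalized independently. The function name is hypothetical, and the sketch assumes `bias` has already been broadcast to the shape of `data`, so it does not model `is_inner_broadcast`.

```python
import math


def bias_softmax(data, bias, shape, axis):
    """Add bias, then softmax over the flattened dimensions shape[axis:].

    data and bias are flat row-major lists of equal length (assumption:
    bias is already broadcast to data's shape).
    """
    group = math.prod(shape[axis:])  # number of elements normalized together
    biased = [d + b for d, b in zip(data, bias)]
    out = []
    for start in range(0, len(biased), group):
        chunk = biased[start:start + group]
        peak = max(chunk)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in chunk]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out


rows = bias_softmax([0.0, 0.0, 1.0, 1.0], [0.0] * 4, shape=(2, 2), axis=1)
whole = bias_softmax([0.0, 0.0, 1.0, 1.0], [0.0] * 4, shape=(2, 2), axis=0)
```

With `axis=1` each row of the 2x2 input is normalized on its own; with `axis=0` all four elements form one group.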

### **com.microsoft.BiasSplitGelu**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

X : T
bias : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float): Constrain input X and output Y types to float tensors.

### **com.microsoft.BifurcationDetector**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

max_ngram_size : int: The maximum NGram size for suffix matching.
min_ngram_size : int: The minimum NGram size for suffix matching.

#### Inputs (3 - 4)

src_tokens : T
cur_tokens : T
prev_suffix_match_idx : T
pred_tokens (optional) : T

#### Outputs

tokens : T
suffix_match_idx : T

#### Type Constraints

T : tensor(int64): Constrain to integer types.

### **com.microsoft.BitmaskBiasDropout**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

seed : int: (Optional) Seed for the random generator. If not specified, one is generated automatically.

#### Inputs (2 - 5)

data : T
bias : T
residual (optional) : T
ratio (optional) : T1
training_mode (optional) : T2

#### Outputs (1 - 2)

output : T
mask (optional) : T3

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input 'ratio' types to float tensors.
T2 : tensor(bool): Constrain input 'training_mode' types to boolean tensors.
T3 : tensor(uint32): Constrain output 'mask' types to uint32 tensors.

### **com.microsoft.BitmaskDropout**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

seed : int: (Optional) Seed for the random generator. If not specified, one is generated automatically.

#### Inputs (1 - 3)

data : T
ratio (optional) : T1
training_mode (optional) : T2

#### Outputs (1 - 2)

output : T
mask (optional) : T3

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input 'ratio' types to float tensors.
T2 : tensor(bool): Constrain 'training_mode' to boolean tensor.
T3 : tensor(uint32): Constrain output 'mask' types to bit-packed uint32 tensor.
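
The `mask` output above is a bit-packed uint32 tensor, meaning each 32-bit word carries 32 keep/drop flags. The sketch below shows one possible packing. The bit order (element `i` stored in bit `i % 32` of word `i // 32`) and the function names are assumptions for illustration; the source does not specify the layout.

```python
def pack_mask(bits):
    """Pack booleans into 32-bit words (assumed layout: least significant bit first)."""
    words = [0] * ((len(bits) + 31) // 32)
    for i, keep in enumerate(bits):
        if keep:
            words[i // 32] |= 1 << (i % 32)
    return words


def unpack_mask(words, count):
    """Recover the first `count` booleans from the packed words."""
    return [bool((words[i // 32] >> (i % 32)) & 1) for i in range(count)]


small = pack_mask([True, False, True])
large = pack_mask([True] * 33)
round_trip = unpack_mask(small, 3)
```

A mask of 33 elements needs two words, which is why the packed form is roughly 8 times smaller than a one-byte-per-element boolean mask.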

### **com.microsoft.CDist**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

metric : string: The distance metric to use. If a string, the distance function can be "braycurtis", "canberra", "chebyshev", "cityblock", "correlation", "cosine", "dice", "euclidean", "hamming", "jaccard", "jensenshannon", "kulsinski", "mahalanobis", "matching", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "wminkowski", "yule".

#### Inputs

A : T
B : T

#### Outputs

C : T

#### Type Constraints

T : tensor(float), tensor(double): Constrains input to only numeric types.
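
To make the `metric` attribute concrete, here is a minimal sketch of a pairwise distance computation for two of the listed metrics, "euclidean" and "sqeuclidean". It assumes `A` has shape (M, K), `B` has shape (N, K), and the result has shape (M, N); the function name and these shapes are illustrative assumptions, not taken from the source.

```python
import math


def cdist(a, b, metric="euclidean"):
    """Pairwise distances between the rows of a and the rows of b."""
    if metric not in ("euclidean", "sqeuclidean"):
        raise ValueError(f"metric not covered by this sketch: {metric}")
    out = []
    for row_a in a:
        out_row = []
        for row_b in b:
            squared = sum((x - y) ** 2 for x, y in zip(row_a, row_b))
            out_row.append(squared if metric == "sqeuclidean" else math.sqrt(squared))
        out.append(out_row)
    return out


dist = cdist([[0.0, 0.0]], [[3.0, 4.0], [0.0, 0.0]])
sq_dist = cdist([[0.0, 0.0]], [[3.0, 4.0]], metric="sqeuclidean")
```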

### **com.microsoft.ComplexMul**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

A : T
B : T

#### Outputs

C : T

#### Type Constraints

T : tensor(float), tensor(double), tensor(float16): Constrain input and output types to float or half tensors.

### **com.microsoft.ComplexMulConj**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

A : T
B : T

#### Outputs

C : T

#### Type Constraints

T : tensor(float), tensor(double), tensor(float16): Constrain input and output types to float or half tensors.

### **com.microsoft.ConvTransposeWithDynamicPads**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
output_padding : list of ints
strides : list of ints

#### Inputs (2 - 4)

X : T
W : T
Pads (optional) : tensor(int64)
B (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors

### **com.microsoft.CropAndResize**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

extrapolation_value : float: Value used for extrapolation, when applicable. Default is 0.0f.
mode : string: The pooling method. Two modes are supported: 'bilinear' and 'nearest'. Default is 'bilinear'.

#### Inputs

X : T1
rois : T1
batch_indices : T2
crop_size : T2

#### Outputs

Y : T1

#### Type Constraints

T1 : tensor(float16), tensor(float), tensor(double): Constrain types to float tensors.
T2 : tensor(int32): Constrain types to int tensors.

### **com.microsoft.DecoderAttention**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

mask_filter_value : float: The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required): Number of attention heads

#### Inputs

query : T
key : T
q_weight : T
kv_weight : T
bias : T
key_padding_mask (optional) : B
key_cache (optional) : T
value_cache (optional) : T
static_kv : B
use_past : B
has_layer_state : B
has_key_padding_mask : B

#### Outputs (1 - 3)

output : T
new_key_cache (optional) : T
new_value_cache (optional) : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float and float16 tensors.
B : tensor(bool): Constrain key_padding_mask to bool tensors.

### **com.microsoft.DecoderMaskedMultiHeadAttention**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

mask_filter_value : float: The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required): Number of attention heads
output_qk : int: Whether to output the cross-attention MatMul(Q, K)
past_present_share_buffer : int: Whether the corresponding past and present share the same tensor, of size (batch_size, num_heads, max_sequence_length, head_size)
scale : float: Custom scale will be used if specified. Default value is 1/sqrt(head_size)

#### Inputs (1 - 11)

query : T
key (optional) : T
value (optional) : T
mask_index (optional) : M
attention_bias (optional) : T
past_key (optional) : T
past_value (optional) : T
past_sequence_length (optional) : M
beam_width (optional) : M
cache_indirection (optional) : M
bias (optional) : T

#### Outputs (1 - 4)

output : T
present_key (optional) : T
present_value (optional) : T
qk (optional) : V

#### Type Constraints

V : tensor(float): Constrain qk output types to float32 tensors.
T : tensor(float), tensor(float16): Constrain input and output types to float tensors.
M : tensor(int32): Constrain mask index to integer types

### **com.microsoft.DecoderMaskedSelfAttention**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

do_rotary : int: Whether to use rotary position embedding. Default value is 0.
mask_filter_value : float: The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required): Number of attention heads
past_present_share_buffer : int: Whether the corresponding past and present share the same tensor, of size (2, batch_size, num_heads, max_sequence_length, head_size)
scale : float: Custom scale will be used if specified. Default value is 1/sqrt(head_size)

#### Inputs (7 - 9)

input : T
weights : T
bias : T
mask_index (optional) : M
past : T
attention_bias (optional) : T
past_sequence_length : M
beam_width (optional) : M
cache_indirection (optional) : M

#### Outputs

output : T
present : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float tensors.
M : tensor(int32): Constrain mask index to integer types

### **com.microsoft.DequantizeBFP**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

bfp_type : int (required): The type of BFP - must match with the BFPType enum
block_dim : int: Each bounding box spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension.
dtype : int: The datatype to dequantize to.

#### Inputs

x : T1
shape : T2
strides : T2

#### Outputs

y : T3

#### Type Constraints

T1 : tensor(uint8): Constrain the input to uint8.
T2 : tensor(int64): Constrain shape and strides to int64.
T3 : tensor(float), tensor(float16), tensor(bfloat16): Constrain y to float and bfloat16.

### **com.microsoft.DequantizeLinear**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

axis : int: (Optional) The axis along which the same quantization parameters are applied. If not specified, quantization is per-tensor and the inputs 'x_scale' and 'x_zero_point' must be scalars. If specified, quantization is per 'axis' and the inputs 'x_scale' and 'x_zero_point' must be 1-D tensors.

#### Inputs (2 - 3)

x : T1
x_scale : T2
x_zero_point (optional) : T1

#### Outputs

y : T2

#### Type Constraints

T1 : tensor(int8), tensor(uint8), tensor(int16), tensor(uint16), tensor(int32), tensor(int4), tensor(uint4): Constrain 'x' and 'x_zero_point' to 4-bit, 8-bit, or 16-bit integer tensors, or 32-bit signed integer tensors.
T2 : tensor(float16), tensor(float): Constrain 'y', 'x_scale' to float tensors.
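
The per-tensor and per-axis cases described for `axis` can be sketched in pure Python using the standard linear dequantization formula `y = (x - x_zero_point) * x_scale`. The function names are hypothetical, inputs are limited to 2-D nested lists, and a missing zero point is treated as 0 for the purposes of this sketch.

```python
def dequantize_per_tensor(x, x_scale, x_zero_point=0):
    """Per-tensor: one scalar scale and zero point for every element."""
    return [[(v - x_zero_point) * x_scale for v in row] for row in x]


def dequantize_per_axis(x, x_scale, x_zero_point=None, axis=1):
    """Per-axis: x_scale and x_zero_point are 1-D, indexed along `axis` of a 2-D x."""
    size = len(x) if axis == 0 else len(x[0])
    if x_zero_point is None:
        x_zero_point = [0] * size
    if len(x_scale) != size or len(x_zero_point) != size:
        raise ValueError("x_scale and x_zero_point must match the size of the chosen axis")
    out = []
    for i, row in enumerate(x):
        out_row = []
        for j, v in enumerate(row):
            k = i if axis == 0 else j  # index into the 1-D parameters
            out_row.append((v - x_zero_point[k]) * x_scale[k])
        out.append(out_row)
    return out


per_tensor = dequantize_per_tensor([[130, 128]], 0.5, 128)
per_column = dequantize_per_axis([[2, 4], [6, 8]], [0.5, 2.0], axis=1)
per_row = dequantize_per_axis([[2, 4], [6, 8]], [0.5, 2.0], axis=0)
```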

### **com.microsoft.DequantizeWithOrder**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

order_input : int (required): cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
order_output : int (required): cublasLt order of output matrix
to : int (required): The output data type. Only TensorProto_DataType_FLOAT (1) and TensorProto_DataType_FLOAT16 (10) are supported.

#### Inputs

input : Q
scale_input : S

#### Outputs

output : F

#### Type Constraints

Q : tensor(int8): Constrain input and output types to int8 tensors.
F : tensor(float16), tensor(float): Constrain to float types
S : tensor(float): Constrain Scale to float32 types

### **com.microsoft.DynamicQuantizeLSTM**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

activation_alpha : list of floats: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as of corresponding ONNX operators. For example, with LeakyRelu, the default alpha is 0.01.
activation_beta : list of floats: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as of corresponding ONNX operators.
activations : list of strings: A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.
clip : float: Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
direction : string: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
hidden_size : int: Number of neurons in the hidden layer
input_forget : int: Couple the input and forget gates if 1.

#### Inputs

X : T
W : T2
R : T2
B (optional) : T
sequence_lens (optional) : T1
initial_h (optional) : T
initial_c (optional) : T
P (optional) : T
W_scale : T
W_zero_point : T2
R_scale : T
R_zero_point : T2

#### Outputs (0 - 3)

Y (optional) : T
Y_h (optional) : T
Y_c (optional) : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors.
T1 : tensor(int32): Constrain seq_lens to integer tensor.
T2 : tensor(uint8), tensor(int8): Constrain weights types to 8 bit tensors.

### **com.microsoft.DynamicQuantizeMatMul**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs (3 - 5)

A : T1
B : T2
b_scale : T1
b_zero_point (optional) : T2
bias (optional) : T1

#### Outputs

Y : T1

#### Type Constraints

T1 : tensor(float): Constrain input A, b_scale and output Y data type as float tensor.
T2 : tensor(int8), tensor(uint8): Constrain input B data type to 8-bit integer tensor.

### **com.microsoft.DynamicTimeWarping**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

input : F

#### Outputs

output : I

#### Type Constraints

F : tensor(float): Constrain to float tensors.
I : tensor(int32): Constrain to integer types.

### **com.microsoft.EPContext**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

embed_mode : int: 1 indicates that ep_cache_context is the context content. 0 indicates that ep_cache_context is the file path to the context content. The path is relative to this ONNX file. Default is 1.
ep_cache_context : string: payload of the execution provider context if embed_mode=1, or path to the context file if embed_mode=0.
ep_sdk_version : string: (Optional) SDK version used to convert the model.
hardware_architecture : string: (Optional) Hardware architecture.
main_context : int: Usually each EPContext node is associated with one graph partition. In some cases, such as QNN, a single EPContext contains all partitions. In that case, the node with ep_cache_context should set main_context=1; other nodes set main_context=0 and skip ep_cache_context. The path is relative to this ONNX file. Default is 1.
max_size : int: Max size in the context. Usage depends on the EP.
notes : string: (Optional) Some notes for the model
onnx_model_filename : string: (Optional) Filename of the original ONNX model.
partition_name : string: (Optional) partitioned graph name.
source : string: (Optional) the source used to generate the engine/context cache file. Ort EP or native SDK tool chain

#### Inputs (1 - ∞)

inputs (variadic, heterogeneous) : T

#### Outputs (1 - ∞)

outputs (variadic, heterogeneous) : T

#### Type Constraints

T : tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bool), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types.
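
The two `embed_mode` cases described above can be sketched as follows: with `embed_mode=1` the attribute itself is the payload, and with `embed_mode=0` it is a path resolved relative to the ONNX model file. The function name `load_ep_context` and the file names are hypothetical; this is not an ONNX Runtime API.

```python
import tempfile
from pathlib import Path


def load_ep_context(embed_mode, ep_cache_context, model_path):
    """Return the context payload as bytes (hypothetical helper)."""
    if embed_mode == 1:
        # The attribute holds the context content itself.
        if isinstance(ep_cache_context, bytes):
            return ep_cache_context
        return ep_cache_context.encode("utf-8")
    if embed_mode == 0:
        # The attribute is a path relative to the directory of the ONNX file.
        return (Path(model_path).parent / ep_cache_context).read_bytes()
    raise ValueError(f"unsupported embed_mode: {embed_mode}")


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model_ctx.onnx"  # hypothetical model file name
    (Path(tmp) / "model_ctx.bin").write_bytes(b"\x01\x02")
    from_file = load_ep_context(0, "model_ctx.bin", model_path)

embedded = load_ep_context(1, b"\x01\x02", "model_ctx.onnx")
```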

### **com.microsoft.EmbedLayerNormalization**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

epsilon : float: The epsilon value to use to avoid division by zero.
mask_index_type : int: The mask index tensor type for shape inference (0: None, 1: 1D mask_index)

#### Inputs (7 - 9)

input_ids : T1
segment_ids (optional) : T1
word_embedding : T
position_embedding : T
segment_embedding (optional) : T
gamma : T
beta : T
mask (optional) : T1
position_ids (optional) : T1

#### Outputs (1 - 3)

output : T
mask_index (optional) : T1
embedding_sum (optional) : T

#### Type Constraints

T1 : tensor(int32): Constrain input and output integer tensors types
T : tensor(float), tensor(float16): Constrain input and output float tensors types.

### **com.microsoft.ExpandDims**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

X : T
axis : tensor(int32)

#### Outputs

Y : T

#### Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128): Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.

### **com.microsoft.FastGelu**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs (1 - 2)

X : T
bias (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float), tensor(float16), tensor(bfloat16): Constrain input and output types to float or half tensors.

### **com.microsoft.FusedConv**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

activation : string
activation_params : list of floats
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints

#### Inputs (2 - 4)

X : T
W : T
B (optional) : T
Z (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors

### **com.microsoft.FusedGemm**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

activation : string
activation_alpha : float
activation_beta : float
activation_gamma : float
alpha : float: Scalar multiplier for the product of input tensors A * B.
beta : float: Scalar multiplier for input tensor C.
transA : int: Whether A should be transposed
transB : int: Whether B should be transposed

#### Inputs (2 - 3)

A : T
B : T
C (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(uint32), tensor(uint64), tensor(int32), tensor(int64): Constrain input and output types to float/int tensors.
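
The roles of `alpha`, `beta`, `transA`, and `transB` can be shown with a small sketch that computes `activation(alpha * A' * B' + beta * C)` on nested lists. The function name is hypothetical, the defaults of 1.0 for `alpha` and `beta` are assumptions (the attribute list above does not state them), and `C` is assumed to already have the shape of the product, so no broadcasting is modelled.

```python
def transpose(m):
    """Transpose a 2-D nested list."""
    return [list(col) for col in zip(*m)]


def fused_gemm(a, b, c=None, alpha=1.0, beta=1.0, trans_a=0, trans_b=0, activation=None):
    """Compute activation(alpha * A' @ B' + beta * C) for 2-D nested lists."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    b_cols = transpose(b)  # columns of B', for row-by-column dot products
    y = [[alpha * sum(x * w for x, w in zip(row, col)) for col in b_cols] for row in a]
    if c is not None:
        y = [[v + beta * c[i][j] for j, v in enumerate(row)] for i, row in enumerate(y)]
    if activation is not None:
        y = [[activation(v) for v in row] for row in y]
    return y


def relu(v):
    return max(0.0, v)


plain = fused_gemm([[1.0, 2.0]], [[3.0], [4.0]])
transposed_b = fused_gemm([[1.0, 2.0]], [[3.0, 4.0]], trans_b=1)
fused = fused_gemm([[1.0, 2.0]], [[3.0], [4.0]], c=[[1.0]], alpha=2.0, beta=-30.0, activation=relu)
```

In the last call the pre-activation value is 2 * 11 - 30 = -8, which Relu maps to 0.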

### **com.microsoft.FusedMatMul**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

alpha : float: Scalar multiplier for the product of the input tensors.
transA : int: Whether A should be transposed on the last two dimensions before doing multiplication
transB : int: Whether B should be transposed on the last two dimensions before doing multiplication
transBatchA : int: Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
transBatchB : int: Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication

#### Inputs

A : T
B : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.

### **com.microsoft.FusedMatMulActivation**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

activation : string (required)
activation_alpha : float
activation_axis : int
activation_beta : float
activation_gamma : float
alpha : float: Scalar multiplier for the product of the input tensors.
transA : int: Whether A should be transposed on the last two dimensions before doing multiplication
transB : int: Whether B should be transposed on the last two dimensions before doing multiplication
transBatchA : int: Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
transBatchB : int: Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication

#### Inputs

A : T
B : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.

### **com.microsoft.GatedRelativePositionBias**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

num_heads : int (required): Number of attention heads

#### Inputs (6 - 7)

query_layer : T
query_bias : T
rel_pos : T
weight : T
bias : T
eco_a : T
token_offset (optional) : M

#### Outputs

output : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float tensors.
M : tensor(int32): Constrain token_offset to integer types

### **com.microsoft.GatherBlockQuantized**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

block_size : int: (Optional) block size used for weight quantization. It needs to be a power of 2 and not smaller than 16.
gather_axis : int: (Optional) Which axis to gather on. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data).
quantize_axis : int: (Optional) Which axis to block-wise quantize. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data).

#### Inputs (3 - 4)

data : T1
indices : Tind
scales : T2
zero_points (optional) : T1

#### Outputs

output : T2

#### Type Constraints

T1 : tensor(int4), tensor(uint4): Constrain quantized types.
T2 : tensor(float), tensor(float16), tensor(bfloat16): Constrain dequantized types.
Tind : tensor(int32), tensor(int64): Constrain indices to integer types.
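
The constraints stated for `block_size`, `gather_axis`, and `quantize_axis` are easy to check before building a node. The sketch below uses hypothetical helper names and only encodes what the attribute descriptions say: `block_size` is a power of 2 that is at least 16, and an axis lies in [-r, r-1] with negative values counted from the back.

```python
def is_valid_block_size(block_size):
    """True if block_size is a power of 2 and not smaller than 16."""
    return block_size >= 16 and (block_size & (block_size - 1)) == 0


def normalize_axis(axis, rank):
    """Map an axis in [-rank, rank - 1] to its non-negative equivalent."""
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} is outside [-{rank}, {rank - 1}]")
    return axis + rank if axis < 0 else axis


valid = [size for size in (8, 16, 48, 128) if is_valid_block_size(size)]
last_axis = normalize_axis(-1, rank=3)
```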

### **com.microsoft.GatherND**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

data : T
indices : Tind

#### Outputs

output : T

#### Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128): Constrain input and output types to any tensor type.
Tind : tensor(int32), tensor(int64): Constrain indices type to int32 or int64

### **com.microsoft.Gelu**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.

### **com.microsoft.GemmFastGelu**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs (2 - 3)

X : T
W : T
bias (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float), tensor(float16), tensor(bfloat16): Constrain input and output types to float or half tensors.

### **com.microsoft.GemmFloat8**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

activation : string: Activation function, RELU or GELU or NONE (default).
alpha : float: Scalar multiplier for the product of input tensors A * B.
beta : float: Scalar multiplier for the product of input bias C.
dtype : int: Output Type. Same definition as attribute 'to' for operator Cast.
transA : int: Whether A should be transposed. Float 8 only supports transA=0.
transB : int: Whether B should be transposed. Float 8 only supports transB=1.

#### Inputs (2 - 6)

A : TA
B : TB
C (optional) : TC
scaleA (optional) : TS
scaleB (optional) : TS
scaleY (optional) : TS

#### Outputs

Y : TR

#### Type Constraints

TA : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float): Constrain type to input A.
TB : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float): Constrain type to input B.
TC : tensor(float16), tensor(bfloat16), tensor(float): Constrain type to input C.
TR : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float): Constrain type to result type.
TS : tensor(float): Constrain type for all input scales (scaleA, scaleB, scaleY).

### **com.microsoft.GemmaRotaryEmbedding**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs

emb : U
q : T
q_rot : T
k : T
k_rot : T

#### Outputs

output1 : T
output2 : T

#### Type Constraints

T : tensor(float16): Constrain input and output types to float16 tensors.
U : tensor(float): Constrain input 0 type to float tensors

### **com.microsoft.GreedySearch**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes

decoder : graph (required): Decoder subgraph to execute in a loop.
decoder_start_token_id : int: The id of the token that indicates decoding starts.
encoder : graph: The subgraph for initialization of encoder and decoder. It will be called once before `decoder` subgraph.
eos_token_id : int (required): The id of the end-of-sequence token
init_decoder : graph: The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
model_type : int: model type: 0 for decoder only like GPT-2; 1 for encoder decoder like Bart
no_repeat_ngram_size : int: no repeat ngrams size
pad_token_id : int (required): The id of the padding token
vocab_size : int: Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape

#### Inputs (2 - 7)

input_ids : I
max_length : I
min_length (optional) : I
repetition_penalty (optional) : T
vocab_mask (optional) : I
prefix_vocab_mask (optional) : I
attention_mask (optional) : I

#### Outputs

sequences : I

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors.
I : tensor(int32): Constrain to integer types
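
The decoding loop implied by the `decoder`, `eos_token_id` and `pad_token_id` attributes can be sketched in plain Python. This is only an outline under stated assumptions: `next_token_logits` is a hypothetical callable standing in for the decoder subgraph, a single sequence is decoded instead of a batch, `min_length`, `repetition_penalty` and the vocabulary masks are ignored, and padding the tail with `pad_token_id` after EOS is assumed rather than taken from the schema.

```python
def greedy_search(input_ids, max_length, eos_token_id, pad_token_id,
                  next_token_logits):
    """Greedy decoding for one sequence.

    next_token_logits(seq) is a hypothetical stand-in for the decoder
    subgraph; it returns one score per vocabulary id.
    """
    seq = list(input_ids)
    while len(seq) < max_length:
        logits = next_token_logits(seq)
        # Greedy step: pick the highest-scoring token id.
        token = max(range(len(logits)), key=logits.__getitem__)
        seq.append(token)
        if token == eos_token_id:
            break
    # Assumption: the remainder of the output is filled with pad_token_id.
    seq += [pad_token_id] * (max_length - len(seq))
    return seq


def toy_decoder(seq):
    """Toy 4-token model that always prefers (last token + 1) mod 4."""
    nxt = (seq[-1] + 1) % 4
    return [1.0 if i == nxt else 0.0 for i in range(4)]


result = greedy_search([0], max_length=6, eos_token_id=3, pad_token_id=0,
                       next_token_logits=toy_decoder)
print(result)  # [0, 1, 2, 3, 0, 0]
```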

### **com.microsoft.GridSample** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

align_corners : int: If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
mode : string: Three interpolation modes: bilinear (default), nearest and bicubic.
padding_mode : string: Support padding modes for outside grid values: `zeros`(default), `border`, `reflection`. zeros: use 0 for out-of-bound grid locations, border: use border values for out-of-bound grid locations, reflection: use values at locations reflected by the border for out-of-bound grid locations.

#### Inputs

X : T1
Grid : T1

#### Outputs

Y : T2

#### Type Constraints

T1 : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128): Constrain input types to all tensor types.
T2 : tensor(float16), tensor(float), tensor(double): Constrain output types to float tensors.
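
The effect of `align_corners` is easiest to see in the mapping from a normalized grid coordinate in [-1, 1] to a pixel coordinate. The sketch below uses the commonly used formulas for this mapping; `unnormalize` is a hypothetical helper name, and the exact arithmetic used by the kernel is an assumption.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate."""
    if align_corners:
        # -1 and 1 refer to the centers of the corner pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the corner pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


for flag in (1, 0):
    print(flag, unnormalize(-1.0, 4, flag), unnormalize(1.0, 4, flag))
```

For a width of 4, `align_corners=1` maps the extrema to 0.0 and 3.0 (pixel centers), while `align_corners=0` maps them to -0.5 and 3.5 (pixel edges).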

### **com.microsoft.GroupNorm** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

activation : int (required): Activation after group normalization: 0 for None, 1 for SiLU
channels_last : int: 1 if the input and output are in the NHWC layout, 0 if it is in the NCHW layout. Defaults to 1.
epsilon : float: The epsilon value to use to avoid division by zero
groups : int (required): The number of groups of channels. It should be a divisor of the number of channels C

#### Inputs

X : T
gamma : M
beta : M

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float): Constrain input X and output Y types to float tensors.
M : tensor(float16), tensor(float): Constrain gamma and beta to float tensors.
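
A minimal sketch of the computation for a single sample follows, assuming standard group normalization: statistics are taken over all channels and spatial positions in a group, then `gamma` and `beta` are applied per channel, with optional SiLU. The input here is a channel-major list of flattened spatial values for readability, not the NHWC layout; `group_norm` and the `epsilon` default of 1e-5 are assumptions.

```python
import math


def group_norm(x, gamma, beta, groups, epsilon=1e-5, activation=0):
    """x: list of C channels, each a flat list of spatial values."""
    channels = len(x)
    if channels % groups != 0:
        raise ValueError("groups must divide the number of channels")
    per_group = channels // groups
    out = []
    for g in range(groups):
        chans = range(g * per_group, (g + 1) * per_group)
        values = [v for c in chans for v in x[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + epsilon)
        for c in chans:
            row = [(v - mean) * inv_std * gamma[c] + beta[c] for v in x[c]]
            if activation == 1:
                # SiLU: v * sigmoid(v)
                row = [v / (1.0 + math.exp(-v)) for v in row]
            out.append(row)
    return out


y = group_norm([[1.0, 3.0], [5.0, 7.0], [2.0, 2.0], [4.0, 4.0]],
               gamma=[1.0] * 4, beta=[0.0] * 4, groups=2)
print(y)
```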

### **com.microsoft.GroupQueryAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

do_rotary : int: Whether to use rotary position embedding. Default value is 0.
kv_num_heads : int (required): Number of attention heads for k and v
local_window_size : int: left_window_size for local attention (like Mistral). Default value is -1 meaning unused.
num_heads : int (required): Number of attention heads for q
rotary_interleaved : int: Rotate using interleaved pattern. Default value is 0 (False).
scale : float: Custom scale will be used if specified. Default value is 1/sqrt(head_size)
smooth_softmax : int: Use a smooth factor in softmax.
softcap : float: Softcap value for attention weights. Default value is 0.

#### Inputs (7 - 9)

query : T
key (optional) : T
value (optional) : T
past_key (optional) : T
past_value (optional) : T
seqlens_k : M
total_sequence_length : M
cos_cache (optional) : T
sin_cache (optional) : T

#### Outputs

output : T
present_key : T
present_value : T

#### Type Constraints

T : tensor(float16), tensor(bfloat16), tensor(float): Constrain input and output to float tensors.
M : tensor(int32): Constrain mask to int tensor.
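
The relationship between `num_heads` and `kv_num_heads` can be illustrated with the head mapping commonly used in grouped-query attention, where consecutive query heads share one key/value head. The contiguous grouping shown here is an assumption about the kernel, and `kv_head_for_query_head` is a hypothetical helper.

```python
def kv_head_for_query_head(q_head, num_heads, kv_num_heads):
    """Return the key/value head index shared by a given query head."""
    if num_heads % kv_num_heads != 0:
        raise ValueError("num_heads must be a multiple of kv_num_heads")
    group_size = num_heads // kv_num_heads
    return q_head // group_size


mapping = [kv_head_for_query_head(h, num_heads=8, kv_num_heads=2)
           for h in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```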

### **com.microsoft.Inverse** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors.

### **com.microsoft.Irfft** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

normalized : int: must be 0, normalization currently not supported
onesided : int: must be 1, only one sided FFTs supported
signal_ndim : int (required): number of dimensions comprising the signal

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float), tensor(double), tensor(float16): Constrain input and output types to float or half tensors.

### **com.microsoft.LongformerAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

num_heads : int (required): Number of attention heads
window : int (required): One sided attention windows length W, or half of total window length

#### Inputs

input : T
weight : T
bias : T
mask : T
global_weight : T
global_bias : T
global : G

#### Outputs

output : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float tensors.
G : tensor(int32): Constrain to integer types
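
To make the `window` attribute concrete, the sketch below builds a boolean attention pattern in which each token attends to the W tokens on either side of it, and tokens flagged as global attend to (and are attended by) every position. The inclusive window bounds and the handling of global tokens follow the general Longformer scheme and are assumptions about this operator; `longformer_mask` is a hypothetical helper.

```python
def longformer_mask(seq_len, window, global_tokens=()):
    """mask[i][j] is True when query position i may attend to key j."""
    g = set(global_tokens)
    return [
        [abs(i - j) <= window or i in g or j in g for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = longformer_mask(5, window=1, global_tokens=[0])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```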

### **com.microsoft.MatMulBnb4** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

K : int (required): size of each input feature
N : int (required): size of each output feature
block_size : int (required): group size used for weight quantization. It must be a power of 2 and not smaller than 16.
quant_type : int (required): quantization data type. 0 for FP4, 1 for NF4.
training_mode : int: Indicate if the ops run in training_mode, by default, False.
transB : int: Whether B should be transposed on the last two dimensions before doing multiplication. Default to be 1.

#### Inputs

A : T1
B : T2
absmax : T1

#### Outputs

Y : T1

#### Type Constraints

T1 : tensor(float), tensor(float16), tensor(bfloat16): Constrain input and output types to float/half_float/brain_float tensors.
T2 : tensor(uint8): Constrain quantized weight types to uint8.

### **com.microsoft.MatMulFpQ4** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

blk_quant_type : int: Quantization type

#### Inputs

A : T1
B : T2
B_shape : T3

#### Outputs

Y : T1

#### Type Constraints

T1 : tensor(float): Constrain input matrix data types as single precision float tensor
T2 : tensor(uint8): Constrain input B data types as data blob
T3 : tensor(int64): Constrain the shape of B to an int64 tensor.

### **com.microsoft.MatMulInteger16** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs

A : T1
B : T2

#### Outputs

Y : T3

#### Type Constraints

T1 : tensor(int16), tensor(uint16): Constrain input A data types as 16-bit integer tensor
T2 : tensor(int16), tensor(uint16): Constrain input B data types as 16-bit integer tensor
T3 : tensor(int32), tensor(uint32): Constrain output Y data types as 32-bit integer tensor. T3 must be tensor(uint32) when both T1 and T2 are tensor(uint16), or must be tensor(int32) when either T1 or T2 is tensor(int16).
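
The output type rule stated for T3 can be written as a small lookup. `matmul_integer16_output_type` is a hypothetical helper that uses plain strings for the element types.

```python
def matmul_integer16_output_type(t1, t2):
    """Return the element type of Y given the element types of A and B."""
    allowed = {"int16", "uint16"}
    if t1 not in allowed or t2 not in allowed:
        raise ValueError("inputs must be int16 or uint16")
    # uint32 only when both inputs are unsigned; otherwise int32.
    return "uint32" if t1 == "uint16" and t2 == "uint16" else "int32"


print(matmul_integer16_output_type("uint16", "uint16"))  # uint32
print(matmul_integer16_output_type("int16", "uint16"))   # int32
```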

### **com.microsoft.MatMulIntegerToFloat** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs (4 - 7)

A : T1
B : T2
a_scale : T3
b_scale : T3
a_zero_point (optional) : T1
b_zero_point (optional) : T2
bias (optional) : T3

#### Outputs

Y : T3

#### Type Constraints

T1 : tensor(int8), tensor(uint8): Constrain input A data type to 8-bit integer tensor.
T2 : tensor(int8), tensor(uint8): Constrain input B data type to 8-bit integer tensor.
T3 : tensor(float), tensor(float16): Constrain input a_scale, b_scale and output Y data type as float tensor.
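
One plausible reading of the inputs is that the operator multiplies the zero-point-adjusted integer matrices and rescales the integer result to float. The sketch below assumes exactly that, with per-tensor (scalar) scales and zero points and a per-column bias; `matmul_integer_to_float` is a hypothetical reference function, not the kernel.

```python
def matmul_integer_to_float(a, b, a_scale, b_scale,
                            a_zero_point=0, b_zero_point=0, bias=None):
    """Y = (A - a_zp) @ (B - b_zp) * a_scale * b_scale + bias (assumed)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum((a[i][k] - a_zero_point) * (b[k][j] - b_zero_point)
                      for k in range(inner))
            val = acc * a_scale * b_scale
            if bias is not None:
                val += bias[j]
            row.append(val)
        out.append(row)
    return out


y = matmul_integer_to_float([[130, 126]], [[2], [4]],
                            a_scale=0.5, b_scale=0.25,
                            a_zero_point=128, bias=[1.0])
print(y)  # [[0.5]]
```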

### **com.microsoft.MatMulNBits** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

K : int (required): size of each input feature
N : int (required): size of each output feature
accuracy_level : int: The minimum accuracy level of input A, can be: 0(unset), 1(fp32), 2(fp16), 3(bf16), or 4(int8) (default unset). It is used to control how input A is quantized or downcast internally while doing computation, for example: 0 means input A will not be quantized or downcast while doing computation. 4 means input A can be quantized with the same block_size to int8 internally from type T1.
bits : int (required): number of bits used for weight quantization (default 4)
block_size : int (required): group size used for weight quantization (default 128). It must be a power of 2 and not smaller than 16.

#### Inputs (3 - 6)

A : T1
B : T2
scales : T1
zero_points (optional) : T3
g_idx (optional) : T4
bias (optional) : T1

#### Outputs

Y : T1

#### Type Constraints

T1 : tensor(float), tensor(float16): Constrain input and output types to float/half_float tensors.
T2 : tensor(uint8), tensor(int32): Constrain quantized weight types to uint8/int32.
T3 : tensor(uint8), tensor(int32), tensor(float16), tensor(float): Constrain quantized zero point types to uint8/int32/float16/float.
T4 : tensor(int32): the index tensor.
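
The `block_size` constraint and the role of `scales` and `zero_points` can be sketched for the 4-bit case. The nibble order (low nibble first), the one-dimensional layout of a single weight row, and the `(q - zero_point) * scale` formula are assumptions made for illustration; both function names are hypothetical.

```python
def is_valid_block_size(block_size):
    """True when block_size is a power of 2 and not smaller than 16."""
    return block_size >= 16 and (block_size & (block_size - 1)) == 0


def dequantize_4bit_row(packed, scales, zero_points, block_size, k):
    """Dequantize one row of k 4-bit weights packed two per byte.

    Assumption: the low nibble of each byte holds the earlier element.
    """
    values = []
    for idx in range(k):
        byte = packed[idx // 2]
        q = (byte >> 4) if idx % 2 else (byte & 0x0F)
        block = idx // block_size  # one scale and zero point per block
        values.append((q - zero_points[block]) * scales[block])
    return values


row = dequantize_4bit_row(bytes([0x98] * 16), scales=[0.5, 2.0],
                          zero_points=[8, 8], block_size=16, k=32)
print(row[:2], row[16:18])  # [0.0, 0.5] [0.0, 2.0]
```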

### **com.microsoft.MaxpoolWithMask** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

auto_pad : string
kernel_shape : list of ints
pads : list of ints
storage_order : int
strides : list of ints

#### Inputs

X : T
M : tensor(int32)

#### Outputs

Y : T

#### Type Constraints

T : tensor(float): Constrain input0 and output types to float tensors

### **com.microsoft.MoE** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

activation_type : string: Activation function to use. Choose from relu, gelu, silu and identity. Default is relu
k : int: Number of top experts to select from expert pool
normalize_routing_weights : int: Whether to normalize routing weights
use_sparse_mixer : int: Whether to use sparse mixer

#### Inputs (5 - 8)

input : T
router_probs : T
fc1_experts_weights : T
fc1_experts_bias (optional) : T
fc2_experts_weights : T
fc2_experts_bias (optional) : T
fc3_experts_weights (optional) : T
fc3_experts_bias (optional) : T

#### Outputs

output : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float or float16 tensors.
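
The interaction of `k` and `normalize_routing_weights` can be sketched for a single token. The example treats `router_probs` as already-normalized scores, which is an assumption (the kernel may apply its own softmax), and `select_experts` is a hypothetical helper; `use_sparse_mixer` is not modeled.

```python
def select_experts(router_probs, k, normalize_routing_weights=0):
    """Pick the top-k experts for one token and return (ids, weights)."""
    top = sorted(range(len(router_probs)),
                 key=lambda i: router_probs[i], reverse=True)[:k]
    weights = [router_probs[i] for i in top]
    if normalize_routing_weights:
        # Rescale the selected weights so they sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    return top, weights


experts, weights = select_experts([0.1, 0.5, 0.25, 0.15], k=2,
                                  normalize_routing_weights=1)
print(experts, weights)
```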

### **com.microsoft.MulInteger** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs (3 - 4)

A : T
A_zero_point (optional) : T
B : T
B_zero_point (optional) : T

#### Outputs

C : T1

#### Type Constraints

T : tensor(uint8), tensor(int8): Constrain input types to 8 bit signed and unsigned tensors.
T1 : tensor(int32): Constrain output types to 32 bit tensors.

### **com.microsoft.MultiHeadAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

mask_filter_value : float: The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required): Number of attention heads
scale : float: Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional : int: Whether every token can only attend to previous tokens. Default value is 0.

#### Inputs (1 - 8)

query : T
key (optional) : T
value (optional) : T
bias (optional) : T
key_padding_mask (optional) : M
attention_bias (optional) : T
past_key (optional) : T
past_value (optional) : T

#### Outputs (1 - 3)

output : T
present_key (optional) : T
present_value (optional) : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output to float tensors.
M : tensor(int32): Constrain mask to integer types
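
The `scale`, `unidirectional` and `mask_filter_value` attributes can be illustrated with a single-head sketch on Python lists. It assumes self-attention without `past_key`/`past_value`, and it assumes masked positions have `mask_filter_value` added to their scores before softmax; `attention_single_head` is a hypothetical reference, not the fused kernel.

```python
import math


def attention_single_head(q, k, v, scale=None, unidirectional=0,
                          mask_filter_value=-10000.0):
    """softmax(scale * Q K^T + mask) V for one head."""
    head_size = len(q[0])
    # Default scale is 1/sqrt(head_size), as stated for the attribute.
    scale = 1.0 / math.sqrt(head_size) if scale is None else scale
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            s = scale * sum(a * b for a, b in zip(qi, kj))
            if unidirectional and j > i:
                s += mask_filter_value  # assumed: additive masking
            scores.append(s)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        out.append([sum(p * vj[d] for p, vj in zip(probs, v))
                    for d in range(len(v[0]))])
    return out


eye = [[1.0, 0.0], [0.0, 1.0]]
causal = attention_single_head(eye, eye, eye, unidirectional=1)
print(causal[0])  # the first token can only attend to itself
```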

### **com.microsoft.MurmurHash3** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

positive : int: If value is 1, output type is uint32_t, else int32_t. Default value is 1.
seed : int: Seed for the hashing algorithm, unsigned 32-bit integer, default to 0.

#### Inputs

X : T1

#### Outputs

Y : T2

#### Type Constraints

T1 : tensor(uint32), tensor(int32), tensor(uint64), tensor(int64), tensor(float), tensor(double), tensor(string): Constrain input type to unsigned or signed 32-bit integer tensor, or string tensor. It should be utf-8 encoded if using unicode.
T2 : tensor(uint32), tensor(int32): Constrain output type to unsigned and signed 32-bit integer tensor.
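
For reference, a pure-Python sketch of the 32-bit MurmurHash3 algorithm is shown below, together with the signed or unsigned reinterpretation that `positive` selects. How the operator serializes numeric inputs into bytes is not covered here; the example hashes a UTF-8 encoded string only. `murmur3_32` and `as_output` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit hash, returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    body_len = len(data) & ~3
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[body_len:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def as_output(h, positive=1):
    """Reinterpret the hash as uint32 (positive=1) or int32 (otherwise)."""
    if positive or h < (1 << 31):
        return h
    return h - (1 << 32)


unsigned = murmur3_32("hello".encode("utf-8"), seed=0)
signed = as_output(unsigned, positive=0)
print(unsigned, signed)
```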

### **com.microsoft.NGramRepeatBlock** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

ngram_size : int (required): The NGram size.

#### Inputs

input_ids : Tid
scores : T

#### Outputs

scores_out : T

#### Type Constraints

Tid : tensor(int64): Constrain indices to integer types
T : tensor(float): Constrain scores input and output types to float tensors.
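
A common way to block repeated n-grams is to find every earlier occurrence of the last `ngram_size - 1` tokens and suppress the token that followed it. The sketch below does this for a single sequence and writes negative infinity into the blocked scores; the single-sequence shape and the choice of negative infinity are assumptions, and `block_repeated_ngrams` is a hypothetical helper.

```python
def block_repeated_ngrams(input_ids, scores, ngram_size):
    """Return a copy of scores with n-gram-repeating tokens suppressed."""
    out = list(scores)
    # The n-1 most recent tokens form the prefix of the next n-gram.
    prefix = tuple(input_ids[len(input_ids) - (ngram_size - 1):])
    for start in range(len(input_ids) - ngram_size + 1):
        ngram = input_ids[start:start + ngram_size]
        if tuple(ngram[:-1]) == prefix:
            out[ngram[-1]] = float("-inf")
    return out


scores_out = block_repeated_ngrams([1, 2, 3, 1, 2], [0.0] * 5, ngram_size=3)
print(scores_out)  # token 3 would repeat the trigram (1, 2, 3)
```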

### **com.microsoft.NhwcConv** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

auto_pad : string
dilations : list of ints: dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group : int: number of groups input channels and output channels are divided into.
kernel_shape : list of ints: The shape of the convolution kernel. If not present, should be inferred from input W.
pads : list of ints
strides : list of ints: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.

#### Inputs (2 - 3)

X : T
W : T
B (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors.

### **com.microsoft.NhwcFusedConv** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

activation : string
activation_params : list of floats
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints

#### Inputs (2 - 4)

X : T
W : T
B (optional) : T
Z (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16): Constrain input and output types to float tensors

### **com.microsoft.NhwcMaxPool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

auto_pad : string
ceil_mode : int
dilations : list of ints
kernel_shape : list of ints (required)
pads : list of ints
strides : list of ints

#### Inputs

x : T

#### Outputs

y : T

#### Type Constraints

T : tensor(int8), tensor(uint8)

### **com.microsoft.PackedAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

num_heads : int (required): Number of attention heads
qkv_hidden_sizes : list of ints: Hidden dimension of Q, K, V: hidden_size, hidden_size and v_hidden_size
scale : float: Custom scale will be used if specified. Default value is 1/sqrt(head_size)

#### Inputs (5 - 6)

input : T
weights : T
bias : T
token_offset : M
cumulative_sequence_length : M
attention_bias (optional) : T

#### Outputs

output : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float tensors.
M : tensor(int32): Constrain mask index to integer types

### **com.microsoft.PackedMultiHeadAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

mask_filter_value : float: The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required): Number of attention heads
scale : float: Custom scale will be used if specified. Default value is 1/sqrt(head_size)

#### Inputs (6 - 7)

query : T
key (optional) : T
value (optional) : T
bias (optional) : T
token_offset : M
cumulative_sequence_length : M
attention_bias (optional) : T

#### Outputs

output : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output to float tensors.
M : tensor(int32): Constrain mask, offset and sequence length to integer types

### **com.microsoft.Pad** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

mode : string: Three modes: `constant`(default) - pads with a given constant value, `reflect` - pads with the reflection of the vector mirrored on the first and last values of the vector along each axis, `edge` - pads with the edge values of array

#### Inputs (2 - 3)

data : T
pads : tensor(int64)
value (optional) : T

#### Outputs

output : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors.
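
The three `mode` values can be compared on a one-dimensional list. This is a sketch of the padding behavior described above for a single axis; `pad_1d` is a hypothetical helper and does not reproduce the layout of the `pads` input tensor.

```python
def pad_1d(data, before, after, mode="constant", value=0.0):
    """Pad a list with `before` leading and `after` trailing elements."""
    n = len(data)
    if mode == "constant":
        return [value] * before + list(data) + [value] * after
    if mode == "edge":
        return [data[0]] * before + list(data) + [data[-1]] * after
    if mode == "reflect":
        if before >= n or after >= n:
            raise ValueError("reflect padding must be smaller than the axis")
        # Mirror around the first and last elements without repeating them.
        left = [data[i] for i in range(before, 0, -1)]
        right = [data[n - 2 - i] for i in range(after)]
        return left + list(data) + right
    raise ValueError("unknown mode: " + mode)


for mode in ("constant", "reflect", "edge"):
    print(mode, pad_1d([1, 2, 3, 4], 2, 1, mode=mode, value=0))
```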

### **com.microsoft.QAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

do_rotary : int: Whether to use rotary position embedding. Default value is 0.
mask_filter_value : float: The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required): Number of attention heads
past_present_share_buffer : int: Corresponding past and present are same tensor, its shape is (2, batch_size, num_heads, max_sequence_length, head_size)
scale : float: Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional : int: Whether every token can only attend to previous tokens. Default value is 0.

#### Inputs (5 - 9)

input : T1
weight : T2
bias : T3
input_scale : T3
weight_scale : T3
mask_index (optional) : T4
input_zero_point (optional) : T1
weight_zero_point (optional) : T2
past (optional) : T3

#### Outputs (1 - 2)

output : T3
present (optional) : T3

#### Type Constraints

T1 : tensor(int8), tensor(uint8): Constrain input and output types to int8 tensors.
T2 : tensor(int8), tensor(uint8): Constrain input and output types to int8 tensors.
T3 : tensor(float), tensor(float16): Constrain input and output types to float tensors.
T4 : tensor(int32): Constrain mask index to integer types

### **com.microsoft.QGemm** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

alpha : float: Scalar multiplier for the product of input tensors A * B.
transA : int: Whether A should be transposed
transB : int: Whether B should be transposed

#### Inputs (6 - 9)

A : TA
a_scale : T
a_zero_point : TA
B : TB
b_scale : T
b_zero_point : TB
C (optional) : TC
y_scale (optional) : T
y_zero_point (optional) : TYZ

#### Outputs

Y : TY

#### Type Constraints

T : tensor(float): Constrain scale types to float tensors.
TA : tensor(uint8), tensor(int8): Constrain input A and its zero point types to 8 bit tensors.
TB : tensor(uint8), tensor(int8): Constrain input B and its zero point types to 8 bit tensors.
TC : tensor(int32): Constrain input C to 32 bit integer tensors.
TYZ : tensor(uint8), tensor(int8): Constrain output zero point types to 8 bit tensors.
TY : tensor(float), tensor(uint8), tensor(int8): Constrain output type to float32 or 8 bit tensors.

### **com.microsoft.QLinearAdd** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs (7 - 8)

A : T
A_scale : tensor(float)
A_zero_point (optional) : T
B : T
B_scale : tensor(float)
B_zero_point (optional) : T
C_scale : tensor(float)
C_zero_point (optional) : T

#### Outputs

C : T

#### Type Constraints

T : tensor(uint8), tensor(int8): Constrain input and output types to 8 bit signed and unsigned tensors.
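
Conceptually, a quantized add dequantizes both inputs, adds them in real arithmetic, and requantizes with the output scale and zero point. The sketch below assumes that model with per-tensor parameters, uint8 saturation, and round-half-to-even; the actual kernel may use a different but equivalent fixed-point formulation. `quantize` and `qlinear_add` are hypothetical helpers.

```python
def quantize(x, scale, zero_point, lo=0, hi=255):
    """Quantize a real value to an integer in [lo, hi] (uint8 by default)."""
    # Python's round() rounds halves to even.
    return max(lo, min(hi, round(x / scale) + zero_point))


def qlinear_add(a, a_scale, a_zp, b, b_scale, b_zp, c_scale, c_zp):
    """Element-wise C = quantize(dequantize(A) + dequantize(B))."""
    return [
        quantize((x - a_zp) * a_scale + (y - b_zp) * b_scale, c_scale, c_zp)
        for x, y in zip(a, b)
    ]


c = qlinear_add([130, 255], 0.5, 128, [10, 200], 1.0, 0, 1.0, 0)
print(c)  # [11, 255]; the second element saturates at 255
```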

### **com.microsoft.QLinearAveragePool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

auto_pad : string: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER means the input is padded so that the output spatial size matches the input. If the total padding is odd, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding.
ceil_mode : int: Whether to use ceil or floor (default) to compute the output shape.
channels_last : int: Whether the operator works on the NHWC layout. By default it does not.
count_include_pad : int: Whether to include pad pixels when calculating values for the edges. Default is 0, which excludes pad pixels.
kernel_shape : list of ints (required): The size of the kernel along each axis.
pads : list of ints: Padding for the beginning and ending along each spatial axis. It can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end part of the corresponding axis. The `pads` format should be as follows: [x1_begin, x2_begin...x1_end, x2_end,...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along start and end of each spatial axis.
strides : list of ints: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.

#### Inputs (4 - 5)

X : T
x_scale : tensor(float)
x_zero_point (optional) : T
y_scale : tensor(float)
y_zero_point (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(uint8), tensor(int8): Constrain input and output types to 8 bit tensors.
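
The way `kernel_shape`, `strides`, `pads`, `ceil_mode` and `auto_pad` determine the output size along one spatial axis can be sketched with the usual pooling arithmetic. These formulas follow the standard ONNX pooling conventions and are assumed to apply here; both helper names are hypothetical.

```python
def pool_output_size(in_size, kernel, stride=1, pad_begin=0, pad_end=0,
                     ceil_mode=0):
    """Output length along one axis for explicit padding."""
    span = in_size + pad_begin + pad_end - kernel
    # Integer floor or ceiling division, depending on ceil_mode.
    steps = -(-span // stride) if ceil_mode else span // stride
    return steps + 1


def same_upper_pads(in_size, kernel, stride=1):
    """(pad_begin, pad_end) for SAME_UPPER: extra padding goes at the end."""
    out_size = -(-in_size // stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    return total // 2, total - total // 2


print(pool_output_size(8, 3, stride=2, ceil_mode=0))  # 3
print(pool_output_size(8, 3, stride=2, ceil_mode=1))  # 4
print(same_upper_pads(5, 4))                          # (1, 2)
```

For SAME_LOWER the two returned values would be swapped, so the extra pixel lands at the beginning.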

### **com.microsoft.QLinearConcat** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

axis : int (required): Which axis to concat on

#### Inputs (3 - ∞)

Y_scale : TF
Y_zero_point : T8
inputs (variadic, heterogeneous) : TV

#### Outputs

Y : T8

#### Type Constraints

T8 : tensor(uint8), tensor(int8): Constrain input and output types to 8 bit signed and unsigned tensors.
TF : tensor(float): Constrain scale types to any float tensor type.
TV : tensor(uint8), tensor(int8), tensor(float): Sequence of (Tensor, Scale, ZeroPoint) tuples. The type is sequence of (T8, TF, T8).

### **com.microsoft.QLinearConv** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

auto_pad : string
channels_last : int
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints

#### Inputs (8 - 9)

x : T1
x_scale : tensor(float)
x_zero_point : T1
w : T2
w_scale : tensor(float)
w_zero_point : T2
y_scale : tensor(float)
y_zero_point : T3
B (optional) : T4

#### Outputs

y : T3

#### Type Constraints

T1 : tensor(int8), tensor(uint8)
T2 : tensor(int8), tensor(uint8)
T3 : tensor(int8), tensor(uint8)
T4 : tensor(int32)

### **com.microsoft.QLinearGlobalAveragePool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

channels_last : int

#### Inputs

X : T
x_scale : tensor(float)
x_zero_point : T
y_scale : tensor(float)
y_zero_point : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(uint8), tensor(int8): Constrain input and output types to signed/unsigned int8 tensors.

### **com.microsoft.QLinearLeakyRelu** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

alpha : float: Coefficient of leakage.

#### Inputs (4 - 5)

X : T
X_scale : tensor(float)
X_zero_point (optional) : T
Y_scale : tensor(float)
Y_zero_point (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(uint8), tensor(int8): Constrain input and output types to 8 bit tensors.

### **com.microsoft.QLinearMul** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs (7 - 8)

A : T
A_scale : tensor(float)
A_zero_point (optional) : T
B : T
B_scale : tensor(float)
B_zero_point (optional) : T
C_scale : tensor(float)
C_zero_point (optional) : T

#### Outputs

C : T

#### Type Constraints

T : tensor(uint8), tensor(int8): Constrain input and output types to 8 bit signed and unsigned tensors.

### **com.microsoft.QLinearReduceMean** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

axes : list of ints (required): A list of integers, along which to reduce. The default is to reduce over all the dimensions of the input tensor.
keepdims : int (required): Whether to keep the reduced dimension. The default value 1 means the reduced dimension is kept.

#### Inputs (4 - 5)

data : T
data_scale : tensor(float)
data_zero_point (optional) : T
reduced_scale : tensor(float)
reduced_zero_point (optional) : T

#### Outputs

reduced : T

#### Type Constraints

T : tensor(uint8), tensor(int8): Constrain input types to 8 bit signed and unsigned tensors.

### **com.microsoft.QLinearSigmoid** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs (4 - 5)

X : T
X_scale : tensor(float)
X_zero_point (optional) : T
Y_scale : tensor(float)
Y_zero_point (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(uint8), tensor(int8): Constrain input and output types to 8 bit tensors.

### **com.microsoft.QLinearSoftmax** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

axis : int: Apply softmax to the elements along dimension axis, or to all dimensions starting at axis, depending on the op version.
opset : int (required): opset version of corresponding SoftMax.

#### Inputs

X : T
X_scale : tensor(float)
x_zero_point (optional) : T
y_scale : tensor(float)
y_zero_point : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(uint8), tensor(int8): Constrain input and output types to signed/unsigned int8 tensors.

### **com.microsoft.QLinearWhere** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs

condition : B
X : T
x_scale : TF
x_zero_point : T
Y : T
y_scale : TF
y_zero_point : T
z_scale : TF
z_zero_point : T

#### Outputs

Z : T

#### Type Constraints

B : tensor(bool): Constrain the condition input to boolean tensors.
TF : tensor(float): Constrain scale types to any float tensor type.
T : tensor(uint8), tensor(int8): Constrain input and output types to 8 bit signed and unsigned tensors.

### **com.microsoft.QMoE** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

activation_type : string: Activation function to use. Choose from relu, gelu, silu and identity. Default is relu
expert_weight_bits : int: Number of bits used in quantized weights. Default is 4 bits
k : int: Number of top experts to select from expert pool
normalize_routing_weights : int: Whether to normalize routing weights
use_sparse_mixer : int: Whether to use sparse mixer

#### Inputs (7 - 11)

input : T
router_probs : T
fc1_experts_weights : T1
fc1_scales : T
fc1_experts_bias (optional) : T
fc2_experts_weights : T1
fc2_scales : T
fc2_experts_bias (optional) : T
fc3_experts_weights (optional) : T1
fc3_scales (optional) : T
fc3_experts_bias (optional) : T

#### Outputs

output : T

#### Type Constraints

T : tensor(float16): Constrain input and output types to float16 tensors.
T1 : tensor(uint8): Constrain weights type to uint8 tensors.

### **com.microsoft.QOrderedAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

num_heads : int (required): Number of attention heads
order_input : int (required): cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
order_output : int (required): cublasLt order of global bias
order_weight : int (required): cublasLt order of weight matrix
qkv_hidden_sizes : list of ints: Hidden layer sizes of Q, K, V paths in Attention
unidirectional : int: Whether every token can only attend to previous tokens. Default value is 0.

#### Inputs (17 - 20)

input : Q
scale_input : S
scale_Q_gemm : S
scale_K_gemm : S
scale_V_gemm : S
Q_weight : Q
K_weight : Q
V_weight : Q
scale_Q_weight : S
scale_K_weight : S
scale_V_weight : S
Q_bias : S
K_bias : S
V_bias : S
scale_QKT_gemm (optional) : S
scale_QKT_softmax (optional) : S
scale_values_gemm : S
mask_index (optional) : G
past (optional) : Q
attention_bias (optional) : S

#### Outputs

output : Q

#### Type Constraints

Q : tensor(int8): Constrain input and output types to int8 tensors.
S : tensor(float): Constrain scales to float32 tensors.
G : tensor(int32): Constrain to integer types

### **com.microsoft.QOrderedGelu** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

order_X : int: cublasLt order of input X. Optional. See the schema of QuantizeWithOrder for order definition.
order_Y : int: cublasLt order of matrix Y, must be same as order_X if specified together. Optional.

#### Inputs

X : Q
scale_X : S
scale_Y : S

#### Outputs

Y : Q

#### Type Constraints

Q : tensor(int8): Constrain input and output types to int8 tensors.
S : tensor(float): Constrain scales to float32

### **com.microsoft.QOrderedLayerNormalization** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

axis : int: The first normalization dimension: normalization will be performed along dimensions axis : rank(inputs).
epsilon : float: The epsilon value to use to avoid division by zero.
order_X : int: cublasLt order of input X. Default is ROW MAJOR. See the schema of QuantizeWithOrder for order definition.
order_Y : int: cublasLt order of matrix Y, must be same as order_X. Default is ROW MAJOR.

#### Inputs

X : Q
scale_X : S
scale : F
B (optional) : F
scale_Y : S

#### Outputs

Y : Q

#### Type Constraints

F : tensor(float16), tensor(float): Constrain input gamma and bias to float16 or float tensors. float may give better precision; float16 runs faster.
S : tensor(float): quantization scale must be float tensors.
Q : tensor(int8): quantization tensor must be int8 tensors.

### **com.microsoft.QOrderedLongformerAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

num_heads : int (required): Number of attention heads
order_global_weight : int (required): cublasLt order of weight matrix
order_input : int (required): cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
order_output : int (required): cublasLt order of global bias
order_weight : int (required): cublasLt order of weight matrix
window : int (required): One-sided attention window length W, or half of the total window length

#### Inputs

input : Q
scale_input : S
weight : Q
scale_weight : S
bias : S
scale_bias : S
scale_qkv_gemm : S
mask : F
global_weight : Q
scale_global_weight : S
global_bias : S
scale_global_gemm : S
global : G
scale_output : S

#### Outputs

output : Q

#### Type Constraints

Q : tensor(int8): Constrain input and output types to int8 tensors.
S : tensor(float): Constrain scales to float32 tensors.
G : tensor(int32): Constrain to integer types
F : tensor(float16): Be compatible with float version.

### **com.microsoft.QOrderedMatMul** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

order_A : int (required): cublasLt order of matrix A. See the schema of QuantizeWithOrder for order definition.
order_B : int (required): cublasLt order of matrix B
order_Y : int (required): cublasLt order of matrix Y and optional matrix C

#### Inputs (5 - 8)

A : Q
scale_A : S
B : Q
scale_B : S
scale_Y : S
bias (optional) : S
C (optional) : Q
scale_C (optional) : S

#### Outputs

Y : Q

#### Type Constraints

Q : tensor(int8): Constrain input and output types to int8 tensors.
S : tensor(float): Constrain bias and scales to float32

### **com.microsoft.QuantizeBFP** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

bfp_type : int (required): The type of BFP - must match with the BFPType enum
block_dim : int: Each bounding box spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension.

#### Inputs

x : T1

#### Outputs

y : T2
shape : T3
strides : T3

#### Type Constraints

T1 : tensor(float), tensor(float16), tensor(bfloat16): Constrain the input to float and bfloat.
T2 : tensor(uint8): Constrain y to uint8.
T3 : tensor(int64): Constrain shape and strides to int64.

### **com.microsoft.QuantizeLinear** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

axis : int: The axis along which the same quantization parameters are applied. Optional. If it is not specified, quantization is per-tensor and inputs 'y_scale' and 'y_zero_point' must be scalars. If it is specified, quantization is per 'axis' and inputs 'y_scale' and 'y_zero_point' must be 1-D tensors.

#### Inputs (2 - 3)

x : T1
y_scale : T1
y_zero_point (optional) : T2

#### Outputs

y : T2

#### Type Constraints

T1 : tensor(float16), tensor(float): Constrain 'x', 'y_scale' to float tensors.
T2 : tensor(int8), tensor(uint8), tensor(int16), tensor(uint16), tensor(int4), tensor(uint4): Constrain 'y_zero_point' and 'y' to 4-bit, 8-bit and 16-bit integer tensors.
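
The difference between per-tensor and per-axis quantization described for `axis` can be sketched in plain Python. This is an illustration only, assuming the usual linear quantization formula `saturate(round(x / y_scale) + y_zero_point)` with round-half-to-even and an int8 target range. The helper names `quantize_linear` and `quantize_linear_per_axis` are hypothetical and not part of the operator.

```python
def quantize_linear(x, y_scale, y_zero_point=0, qmin=-128, qmax=127):
    """Per-tensor sketch: one scalar scale and zero point for a flat list of floats."""
    out = []
    for v in x:
        # Python's round() rounds halves to even, matching the assumed rounding mode.
        q = round(v / y_scale) + y_zero_point
        # Saturate to the target integer range (int8 by default, an assumption).
        out.append(max(qmin, min(qmax, q)))
    return out


def quantize_linear_per_axis(rows, y_scales, y_zero_points, qmin=-128, qmax=127):
    """Per-axis sketch for axis=0 of a 2-D list: one (scale, zero point) pair per row."""
    return [
        quantize_linear(row, scale, zero_point, qmin, qmax)
        for row, scale, zero_point in zip(rows, y_scales, y_zero_points)
    ]
```

In the per-axis case the 1-D `y_scales` and `y_zero_points` lists have one entry per slice along the chosen axis, which is why the schema requires 1-D tensors there.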

### **com.microsoft.QuantizeWithOrder** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

order_input : int (required): cublasLt order of input matrix. ORDER_COL = 0, ORDER_ROW = 1, ORDER_COL32 = 2, ORDER_COL4_4R2_8C = 3, ORDER_COL32_2R_4R4 = 4. Please refer to https://docs.nvidia.com/cuda/cublas/index.html#cublasLtOrder_t for their meaning.
order_output : int (required): cublasLt order of output matrix.

#### Inputs

input : F
scale_input : S

#### Outputs

output : Q

#### Type Constraints

Q : tensor(int8): Constrain input and output types to int8 tensors.
F : tensor(float16), tensor(float): Constrain to float types
S : tensor(float): Constrain Scale to float32 types
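
Because the `order_*` attributes are plain integers, a small enum can make node-building code easier to read. The sketch below only mirrors the values listed for `order_input`; `CublasLtOrder` and `order_attributes` are hypothetical names used for illustration.

```python
from enum import IntEnum


class CublasLtOrder(IntEnum):
    """Integer codes listed in the QuantizeWithOrder schema."""
    ORDER_COL = 0
    ORDER_ROW = 1
    ORDER_COL32 = 2
    ORDER_COL4_4R2_8C = 3
    ORDER_COL32_2R_4R4 = 4


def order_attributes(order_input, order_output):
    """Return the two required attributes as plain ints, ready to attach to a node."""
    return {
        "order_input": int(CublasLtOrder(order_input)),
        "order_output": int(CublasLtOrder(order_output)),
    }
```

Passing a value outside 0..4 raises `ValueError` when the enum is constructed, which catches typos before a model is built.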

### **com.microsoft.QuickGelu** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

alpha : float: Alpha value.

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.
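
As a minimal sketch of what `alpha` controls, assuming QuickGelu computes `x * sigmoid(alpha * x)`: both that formula and the default of 1.702 used below are assumptions not stated in this section, and `quick_gelu` is a hypothetical scalar helper.

```python
import math


def quick_gelu(x, alpha=1.702):
    """Scalar sketch of x * sigmoid(alpha * x); the default alpha is an assumption."""
    z = alpha * x
    if z >= 0:
        # Standard form, safe because exp(-z) <= 1 here.
        return x / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in exp().
    ez = math.exp(z)
    return x * ez / (1.0 + ez)
```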

### **com.microsoft.Range** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs (2 - 3)

start : T
limit : T
delta (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float), tensor(double), tensor(int16), tensor(int32), tensor(int64): Constrain input and output types.
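
The three inputs can be illustrated with a short sketch, assuming the operator generates values from `start` in steps of `delta` up to but not including `limit`, with `delta` defaulting to 1 when omitted. Those semantics are an assumption here, and `range_op` is a hypothetical helper.

```python
import math


def range_op(start, limit, delta=1):
    """Sketch: start, start + delta, ... stopping before limit (assumed semantics)."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    # Number of elements; negative results collapse to an empty sequence.
    count = max(0, math.ceil((limit - start) / delta))
    return [start + i * delta for i in range(count)]
```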

### **com.microsoft.ReduceSumInteger** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

axes : list of ints (required): A list of integers, along which to reduce. The default is to reduce over all the dimensions of the input tensor.
keepdims : int (required): Whether to keep the reduced dimension. The default of 1 means the reduced dimension is kept.

#### Inputs

data : T1

#### Outputs

reduced : T2

#### Type Constraints

T1 : tensor(int8), tensor(uint8): Constrain input type to 8-bit integer tensor.
T2 : tensor(int32), tensor(uint32): Constrain output data type to 32-bit integer tensor. T2 must be tensor(uint32) when T1 is tensor(uint8), or tensor(int32) when T1 is tensor(int8).

### **com.microsoft.RelativePositionBias** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

is_bidirectional : int: Default value is 0.
max_distance : int (required): Max distance

#### Inputs

bias_table : T
query_length : U
key_length : U

#### Outputs

output : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float or half tensors.
U : tensor(int64): Constrain sequence_length to int tensors.

### **com.microsoft.RemovePadding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs

input : T
sequence_token_count : M

#### Outputs

output : T
token_offset : M
cumulated_seq_len : M
max_seq_len : M

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float tensors.
M : tensor(int32): Constrain sequence_token_count and token_offset to integer types

### **com.microsoft.RestorePadding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs

input : T
token_offset : M

#### Outputs

output : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float tensors.
M : tensor(int32): Constrain token_offset to integer types

### **com.microsoft.Rfft** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

normalized : int: must be 0, normalization currently not supported
onesided : int: must be 1, only one sided FFTs supported
signal_ndim : int: number of dimensions comprising the signal, collected in reverse order (e.g. 1 = last dimension is the signal)

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float), tensor(double), tensor(float16): Constrain input and output types to float or half tensors.

### **com.microsoft.RotaryEmbedding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

interleaved : int: Rotate using interleaved pattern. Default value is 0 (False).
is_packed_batching : int: ragged batch inputs or not. Default value is 0
num_heads : int: Number of attention heads. Default value is 0. Must be used together with rotary_embedding_dim
rotary_embedding_dim : int: Rotary embedding dimension. Default value is 0.
scale : float: Custom scale will be used if specified. Default value is 1.0

#### Inputs

input : T
position_ids : M
cos_cache : T
sin_cache : T

#### Outputs

output : T

#### Type Constraints

T : tensor(float), tensor(float16), tensor(bfloat16): Constrain input and output types to float tensors.
M : tensor(int64): Constrain input and output types to integer tensors

### **com.microsoft.SampleOp** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(uint32), tensor(uint64), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double): Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.

### **com.microsoft.Sampling** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

custom : int: If 1, use custom sampling logic
decoder : graph (required): Decoder subgraph to execute in a loop.
decoder_start_token_id : int: The id of the token that indicates decoding starts.
encoder : graph: The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
eos_token_id : int (required): The id of the end-of-sequence token
filter_value : float: All filtered values will be set to this float value.
init_decoder : graph: The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
min_tokens_to_keep : int: Minimum number of tokens we keep per batch example in the output.
model_type : int: Model type: 0 for decoder only like GPT-2; 1 for encoder decoder like Bart
no_repeat_ngram_size : int: no repeat ngrams size
pad_token_id : int (required): The id of the padding token
presence_penalty : float: Presence penalty for custom sampling
temperature : float: The value used to modulate the next token probabilities.
top_p : float: If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
vocab_size : int: Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape

#### Inputs (2 - 9)

input_ids : I
max_length : I
min_length (optional) : I
repetition_penalty (optional) : T
vocab_mask (optional) : I
prefix_vocab_mask (optional) : I
attention_mask (optional) : I
presence_mask (optional) : I
seed (optional) : I

#### Outputs (1 - 2)

sequences : I
filtered_logits (optional) : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors.
I : tensor(int32): Constrain to integer types
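
The interaction of `top_p`, `filter_value` and `min_tokens_to_keep` can be sketched as a nucleus filter over one row of logits. This is a simplified illustration, not the operator's implementation: `top_p_filter` is a hypothetical helper, and the `-inf` default for `filter_value` is an assumption.

```python
import math


def top_p_filter(logits, top_p, filter_value=-math.inf, min_tokens_to_keep=1):
    """Keep the smallest set of most probable tokens whose probabilities reach top_p."""
    # Softmax over the logits, shifted by the maximum for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit tokens from most to least probable.
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for rank, index in enumerate(order):
        # Stop once the kept mass reaches top_p, but never keep fewer than the minimum.
        if rank >= min_tokens_to_keep and cumulative >= top_p:
            break
        keep.add(index)
        cumulative += probs[index]

    # Every token outside the kept set is overwritten with filter_value.
    return [v if i in keep else filter_value for i, v in enumerate(logits)]
```

With probabilities 0.5, 0.3 and 0.2 and `top_p = 0.7`, the first two tokens are kept because 0.5 alone falls short of 0.7, and the third is set to `filter_value`.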

### **com.microsoft.SkipGroupNorm** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

activation : int (required): Activation after group normalization: 0 for None, 1 for SiLU
channels_last : int: 1 if the input and output are in the NHWC layout, 0 if it is in the NCHW layout. Defaults to 1.
epsilon : float: The epsilon value to use to avoid division by zero
groups : int (required): The number of groups of channels. It should be a divisor of the number of channels C

#### Inputs (4 - 5)

X : T
gamma : M
beta : M
skip : T
bias (optional) : T

#### Outputs (1 - 2)

Y : T
S (optional) : T

#### Type Constraints

T : tensor(float16), tensor(float): Constrain input X, skip, bias and output Y, S types to float tensors.
M : tensor(float16), tensor(float): Constrain gamma and beta to float tensors.

### **com.microsoft.SkipLayerNormalization** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

epsilon : float: The epsilon value to use to avoid division by zero.

#### Inputs (3 - 5)

input : T
skip : T
gamma : T
beta (optional) : T
bias (optional) : T

#### Outputs (1 - 4)

output : T
mean (optional) : U
inv_std_var (optional) : U
input_skip_bias_sum (optional) : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float or half tensors.
U : tensor(float): Constrain mean and inv_std_var to float tensors.
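
How the inputs and the four outputs relate can be sketched for a single hidden vector. The sketch assumes the operator adds `input`, `skip` and the optional `bias`, then applies standard layer normalization with `gamma` and the optional `beta`. That reading, the `epsilon` default, and the helper name `skip_layer_norm` are assumptions made for illustration.

```python
import math


def skip_layer_norm(x, skip, gamma, beta=None, bias=None, epsilon=1e-12):
    """Sketch for one hidden vector; returns (output, mean, inv_std_var, input_skip_bias_sum)."""
    n = len(x)
    beta = beta if beta is not None else [0.0] * n
    bias = bias if bias is not None else [0.0] * n

    # Residual sum that is normalized and also exposed as the fourth output.
    summed = [a + s + b for a, s, b in zip(x, skip, bias)]

    mean = sum(summed) / n
    variance = sum((v - mean) ** 2 for v in summed) / n
    inv_std = 1.0 / math.sqrt(variance + epsilon)

    output = [(v - mean) * inv_std * g + b for v, g, b in zip(summed, gamma, beta)]
    return output, mean, inv_std, summed
```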

### **com.microsoft.SkipSimplifiedLayerNormalization** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

epsilon : float: The epsilon value to use to avoid division by zero.

#### Inputs (3 - 4)

input : T
skip : T
gamma : T
bias (optional) : T

#### Outputs (1 - 4)

output : T
mean (optional) : U
inv_std_var (optional) : U
input_skip_bias_sum (optional) : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain input and output types to float or half tensors.
U : tensor(float): Constrain mean and inv_std_var to float tensors.

### **com.microsoft.Snpe** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

DLC : string (required): payload of the SNPE DLC file.
notes : string: (Optional) Some notes for the model
snpe_version : string: (Optional) SNPE version used to convert the model.
target_device : string: (Optional) Target device like CPU, DSP, etc.

#### Inputs (1 - ∞)

inputs (variadic) : T

#### Outputs (1 - ∞)

outputs (variadic) : T

#### Type Constraints

T : tensor(uint8), tensor(uint16), tensor(float): Constrain input and output types to uint8, uint16, float tensors.

### **com.microsoft.SparseAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

do_rotary : int: Whether to use rotary position embedding. Default value is 0.
kv_num_heads : int (required): Number of attention heads for key and value
num_heads : int (required): Number of attention heads for query
rotary_interleaved : int: Rotary use interleaved pattern or not. Default value is 0.
scale : float: Scaling factor applied prior to softmax. The default value is 1/sqrt(head_size)
sparse_block_size : int (required): Number of tokens per sparse block. Choices: 16, 32, 64, 128

#### Inputs (9 - 11)

query : T
key (optional) : T
value (optional) : T
past_key : T
past_value : T
block_row_indices : M
block_col_indices : M
total_sequence_length : M
key_total_sequence_lengths : M
cos_cache (optional) : T
sin_cache (optional) : T

#### Outputs

output : T
present_key : T
present_value : T

#### Type Constraints

T : tensor(float), tensor(float16), tensor(bfloat16): Constrain input and output to float tensors.
M : tensor(int32): Constrain integer type.

### **com.microsoft.SparseToDenseMatMul** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

alpha : float: Scalar multiplier for the product of the input tensors.
transA : int: Whether A should be transposed on the last two dimensions before doing multiplication
transB : int: Whether B should be transposed on the last two dimensions before doing multiplication

#### Inputs

A : T
B : T1

#### Outputs

Y : T1

#### Type Constraints

T : sparse_tensor(float), sparse_tensor(double), sparse_tensor(int64), sparse_tensor(int32), sparse_tensor(uint64), sparse_tensor(uint32): Constrain input and output types to float tensors.
T1 : tensor(float), tensor(double), tensor(int64), tensor(int32), tensor(uint64), tensor(uint32): Constrain input and output types to float tensors.

### **com.microsoft.Tokenizer** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

mark : int (required): Boolean whether to mark the beginning/end character with start of text character (0x02)/end of text character (0x03).
mincharnum : int (required): Minimum number of characters allowed in the output. For example, if mincharnum is 2, tokens such as "A" and "B" would be ignored
pad_value : string (required): The string used to pad output tensors when the tokens extracted don't match the maximum number of tokens found. If start/end markers are needed, padding will appear outside the markers.
separators : list of strings: An optional list of strings containing separators, given as regular expressions that match separators. Two consecutive segments in X connected by a separator are divided into two tokens. For example, if the input is "Hello World!" and this attribute contains only one space character, the corresponding output would be ["Hello", "World!"]. To achieve character-level tokenization, one should set 'separators' to [""], which contains an empty string.
tokenexp : string: An optional string. Token's regular expression in basic POSIX format (pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03). If set, tokenizer may produce tokens matching the specified pattern. Note that one and only one of 'tokenexp' and 'separators' should be set.

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(string): Input/Output is a string tensor
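
The `separators` and `mincharnum` behaviour described above can be sketched for a single input string. Python's `re` syntax stands in for the operator's regular-expression dialect, which is an assumption. Markers and padding are left out, and `tokenize` is a hypothetical helper.

```python
import re


def tokenize(text, separators, mincharnum=1):
    """Split one string on separator patterns and drop tokens shorter than mincharnum."""
    if separators == [""]:
        # A single empty separator requests character-level tokenization.
        tokens = list(text)
    else:
        # Combine all separator patterns into one alternation.
        pattern = "|".join(f"(?:{sep})" for sep in separators)
        tokens = [tok for tok in re.split(pattern, text) if tok]
    return [tok for tok in tokens if len(tok) >= mincharnum]
```

For the example in the attribute description, `tokenize("Hello World!", [" "])` yields `["Hello", "World!"]`.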

### **com.microsoft.TorchEmbedding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs (2 - 4)

weight : T
indices : tensor(int64)
padding_idx (optional) : tensor(int64)
scale_grad_by_freq (optional) : tensor(bool)

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64): Constrain input and output types to all numeric tensors.

### **com.microsoft.TransposeMatMul** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

alpha : float: Scalar multiplier for the product of the input tensors.
transA : int: Whether A should be transposed on the last two dimensions before doing multiplication
transB : int: Whether B should be transposed on the last two dimensions before doing multiplication

#### Inputs

A : T
B : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.
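
For 2-D inputs, the three attributes combine as `Y = alpha * op(A) @ op(B)`, where `op` transposes its argument when the matching flag is set. The sketch below uses nested lists. The defaults (`alpha = 1.0`, both flags 0) are assumptions, and `transpose_matmul` is a hypothetical helper.

```python
def transpose(matrix):
    """Transpose a 2-D nested list."""
    return [list(column) for column in zip(*matrix)]


def transpose_matmul(a, b, alpha=1.0, trans_a=0, trans_b=0):
    """Sketch of alpha * op(A) @ op(B) for 2-D nested lists."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [alpha * sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
        for row in a
    ]
```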

### **com.microsoft.Trilu** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

upper : int: Boolean. Indicates whether upper or lower part of matrix is retained. Default is true.

#### Inputs (1 - 2)

X : T
k (optional) : tensor(int64)

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bool): Constrain input and output types to all numeric tensors and bool tensors.
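
The effect of `upper` and the optional `k` input can be sketched on a single 2-D matrix, assuming `k` is a diagonal offset (positive values lie above the main diagonal) and that elements outside the retained triangle become zero. `trilu` here is a hypothetical helper, not the operator itself.

```python
def trilu(matrix, k=0, upper=1):
    """Keep the upper (j - i >= k) or lower (j - i <= k) triangle of a 2-D nested list."""
    result = []
    for i, row in enumerate(matrix):
        if upper:
            result.append([v if j - i >= k else 0 for j, v in enumerate(row)])
        else:
            result.append([v if j - i <= k else 0 for j, v in enumerate(row)])
    return result
```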

### **com.microsoft.UnfoldTensor** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

dim : int: specify the dimension to unfold
size : int (required): specify the size
step : int: specify the step.

#### Inputs

input : T

#### Outputs

output : T

#### Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128): Allow inputs and outputs to be any kind of tensor.
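
A one-dimensional sketch of `size` and `step`, assuming the operator extracts every window of length `size` along `dim`, advancing by `step` between windows. That behaviour and the `step` default of 1 are assumptions here, and `unfold_1d` is a hypothetical helper.

```python
def unfold_1d(values, size, step=1):
    """Return all complete windows of length `size`, taken every `step` elements."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    last_start = len(values) - size
    return [values[start:start + size] for start in range(0, last_start + 1, step)]
```

A trailing partial window is dropped, so seven elements with `size = 2` and `step = 2` give three windows.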

### **com.microsoft.Unique** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs

x : T

#### Outputs

y : T
idx : tensor(int64)
counts : tensor(int64)

#### Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128): Input can be of any tensor type.
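
One plausible reading of the three outputs, shown as a sketch for a flat input: `y` holds the distinct values in order of first occurrence, `idx` maps each input element to its position in `y`, and `counts` gives the number of occurrences of each entry of `y`. This interpretation is an assumption, since the section lists only the output names, and `unique` is a hypothetical helper.

```python
def unique(x):
    """Return (y, idx, counts) for a flat list under the assumed semantics."""
    y, idx, counts = [], [], []
    position = {}  # value -> index into y
    for value in x:
        if value not in position:
            position[value] = len(y)
            y.append(value)
            counts.append(0)
        idx.append(position[value])
        counts[position[value]] += 1
    return y, idx, counts
```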

### **com.microsoft.WhisperBeamSearch** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

beginning_timestamp_token_id : int: The id of the first timestamp
decoder : graph (required): Decoder subgraph to execute in a loop.
decoder_output_cross_qk : int: If nonzero, decoder subgraph contains output Q*K from cross attentions. Default 0.
decoder_start_token_id : int: The id of the token that indicates decoding starts (i.e. the start of transcription token id)
early_stopping : int: early stop or not
encoder : graph: The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
eos_token_id : int (required): The id of the end-of-sequence token
init_decoder : graph: The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
model_type : int: Must be 2 for whisper
no_repeat_ngram_size : int: no repeat ngrams size
no_speech_token_id : int: The token in the Whisper model that marks the entire sequence as empty. When it is set, Whisper can output no_speech_prob. Default -1.
no_timestamps_token_id : int: The id of the token that indicates no timestamps
pad_token_id : int (required): The id of the padding token
start_of_lm_token_id : int: The id of the token that indicates LM starts
transcribe_token_id : int: The id of the transcribe task
translate_token_id : int: The id of the translate task
vocab_size : int: Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape

#### Inputs (5 - 15)

input_ids : F
max_length : I
min_length (optional) : I
num_beams : I
num_return_sequences : I
length_penalty (optional) : T
repetition_penalty (optional) : T
vocab_mask (optional) : M
prefix_vocab_mask (optional) : M
attention_mask (optional) : I
decoder_input_ids (optional) : I
logits_processor (optional) : I
cross_qk_layer_head (optional) : I
extra_decoding_ids (optional) : I
temperature (optional) : T

#### Outputs (1 - 5)

sequences : I
sequences_scores (optional) : T
scores (optional) : T
cross_qk (optional) : V
non_speech_probs (optional) : T

#### Type Constraints

T : tensor(float), tensor(float16): Constrain to float tensors.
F : tensor(float), tensor(int32), tensor(float16): Constrain input type to float or int tensors.
I : tensor(int32): Constrain to integer types
M : tensor(int32): Constrain mask to integer types
V : tensor(float): Constrain cross_qk to float32 tensors.

### **com.microsoft.WordConvEmbedding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes

char_embedding_size : int: Integer representing the embedding vector size for each char. If not provided, use the char embedding size of embedding vector.
conv_window_size : int: This operator applies convolution to a word from left to right with a window equal to conv_window_size and a stride of 1. Take the word 'example' for example: with conv_window_size equal to 2, conv is applied to [ex], [xa], [am], [mp]... If not provided, use the first dimension of conv kernel shape.
embedding_size : int: Integer representing the embedding vector size for each word. If not provided, use the filter size of conv weight

#### Inputs

Sequence : T
W : T1
B : T1
C : T1

#### Outputs

Y : T1

#### Type Constraints

T : tensor(int32): Constrain to tensor(int32).
T1 : tensor(float): Constrain to tensor(float).
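
The windowing described for `conv_window_size` can be reproduced with a short sketch that lists the character windows visited with a stride of 1. It shows only which characters fall into each window, not the convolution itself, and `conv_windows` is a hypothetical helper.

```python
def conv_windows(word, conv_window_size):
    """List the character windows visited left to right with stride 1."""
    last_start = len(word) - conv_window_size
    return [word[i:i + conv_window_size] for i in range(last_start + 1)]
```

For the word 'example' and a window of 2, this gives the sequence started in the attribute description and completes it with the last two windows, `pl` and `le`.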

### _experimental **com.microsoft.IsAllFinite** #### Version No versioning maintained for experimental ops. #### Attributes

isinf_only : int: If true, check only for Inf, -Inf.
isnan_only : int: If true, check only for NaN.

#### Inputs (1 - ∞)

input (variadic) : V

#### Outputs

output : T

#### Type Constraints

V : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.
T : tensor(bool): Constrain the output to a boolean tensor.
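
The two flags can be sketched as selecting which non-finite values are checked. The sketch assumes the output is true when no checked value is found in any input, and that setting both flags at once is invalid. Both points are assumptions, and `is_all_finite` is a hypothetical helper operating on flat lists.

```python
import math


def is_all_finite(tensors, isinf_only=0, isnan_only=0):
    """Return True when none of the inputs contains a value of the checked kind."""
    if isinf_only and isnan_only:
        raise ValueError("isinf_only and isnan_only cannot both be set (assumed)")
    for tensor in tensors:
        for value in tensor:
            if isinf_only:
                bad = math.isinf(value)
            elif isnan_only:
                bad = math.isnan(value)
            else:
                bad = not math.isfinite(value)
            if bad:
                return False
    return True
```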

### _experimental **com.microsoft.QEmbedLayerNormalization** #### Version No versioning maintained for experimental ops. #### Attributes

epsilon : float: The epsilon value to use to avoid division by zero.

#### Inputs

input_ids : T1
segment_ids (optional) : T1
word_embedding_quant : T2
position_embedding_quant : T2
segment_embedding (optional) : T2
gamma_quant : T2
beta_quant : T2
mask (optional) : T1
word_embedding_scale : T
position_embedding_scale : T
segment_embedding_scale (optional) : T
gamma_scale : T
beta_scale : T
word_embedding_zero_point : T2
position_embedding_zero_point : T2
segment_embedding_zero_point (optional) : T2
gamma_zero_point : T2
beta_zero_point : T2

#### Outputs

layernorm_out : T
mask_index_out : T1

#### Type Constraints

T1 : tensor(int32): Constrain mask index to integer types
T2 : tensor(int8), tensor(uint8): Constrain input and output types to int8 tensors.
T : tensor(float): Constrain input and output types to float32 tensors.

## com.microsoft.nchwc ### **com.microsoft.nchwc.AveragePool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes

auto_pad : string
ceil_mode : int
count_include_pad : int
dilations : list of ints
kernel_shape : list of ints (required)
pads : list of ints
strides : list of ints

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors

### **com.microsoft.nchwc.Conv** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes

activation : string
activation_params : list of floats
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints

#### Inputs (2 - 4)

X : T
W : T
B (optional) : T
Sum (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors

### **com.microsoft.nchwc.GlobalAveragePool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors

### **com.microsoft.nchwc.GlobalMaxPool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors

### **com.microsoft.nchwc.MaxPool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes

auto_pad : string
ceil_mode : int
dilations : list of ints
kernel_shape : list of ints (required)
pads : list of ints
storage_order : int
strides : list of ints

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors

### **com.microsoft.nchwc.ReorderInput** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes

channels_last : int

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors

### **com.microsoft.nchwc.ReorderOutput** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes

channels : int
channels_last : int

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors

### **com.microsoft.nchwc.Upsample** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes

coordinate_transformation_mode : string
mode : string
scales : list of ints

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float): Constrain input and output types to float tensors

## com.ms.internal.nhwc ### **com.ms.internal.nhwc.BatchNormalization** #### Version This version of the operator has been available since version 15 of the 'com.ms.internal.nhwc' operator set. Other versions of this operator: com.ms.internal.nhwc.BatchNormalization-7, com.ms.internal.nhwc.BatchNormalization-9, com.ms.internal.nhwc.BatchNormalization-14 #### Attributes

activation : string
activation_params : list of floats
epsilon : float: The epsilon value to use to avoid division by zero.
momentum : float: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum).
training_mode : int: If set to true, it indicates BatchNormalization is being used for training, and outputs 1 and 2 are to be computed.

#### Inputs

X : T
scale : T1
B : T1
input_mean : T2
input_var : T2

#### Outputs (1 - 3)

Y : T
running_mean (optional) : T2
running_var (optional) : T2

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain scale and bias types to float tensors.
T2 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain mean and variance types to float tensors.
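
The running-statistics update quoted for `momentum` can be written out directly. The sketch applies the formula from the attribute description to scalars. The default of 0.9 is an assumption, and `update_running_stat` is a hypothetical helper.

```python
def update_running_stat(running, batch, momentum=0.9):
    """running * momentum + batch * (1 - momentum), as in the attribute description."""
    return running * momentum + batch * (1.0 - momentum)


def run_updates(initial, batch_values, momentum=0.9):
    """Apply the update once per batch statistic and return the final running value."""
    running = initial
    for batch in batch_values:
        running = update_running_stat(running, batch, momentum)
    return running
```

A momentum close to 1 makes the running value change slowly, because each new batch statistic contributes only a fraction of `1 - momentum`.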

### **com.ms.internal.nhwc.ConvTranspose** #### Version This version of the operator has been available since version 11 of the 'com.ms.internal.nhwc' operator set. Other versions of this operator: com.ms.internal.nhwc.ConvTranspose-1 #### Attributes

activation : string
activation_params : list of floats
auto_pad : string: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER means the input is padded so that `output_shape[i] = input_shape[i] * strides[i]` for each axis `i`. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In case the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER.
dilations : list of ints: dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group : int: number of groups input channels and output channels are divided into.
kernel_shape : list of ints: The shape of the convolution kernel. If not present, should be inferred from input W.
output_padding : list of ints: Additional elements added to the side with higher coordinate indices in the output. Each padding value in "output_padding" must be less than the corresponding stride/dilation dimension. By default, this attribute is a zero vector. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If "output_shape" is explicitly provided, "output_padding" does not contribute additional size to "output_shape" but participates in the computation of the needed padding amount. This is also called adjs or adjustment in some frameworks.
output_shape : list of ints: The shape of the output can be explicitly set which will cause pads values to be auto generated. If output_shape is specified pads values are ignored. See doc for details for equations to generate pads. Note that the output_shape attribute value should not include dimensions for batch size and channels, which are automatically inferred.
pads : list of ints: Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end part of the corresponding axis. The `pads` format should be as follows: [x1_begin, x2_begin...x1_end, x2_end,...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
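
The interaction between `pads`, `output_padding` and `output_shape` described above can be sketched per spatial axis as follows. This is a minimal sketch that assumes the size equations of the standard ONNX ConvTranspose operator also apply here; the function names are hypothetical and not part of any API.

```python
def conv_transpose_output_size(input_size, stride, kernel, dilation=1,
                               pad_begin=0, pad_end=0, output_padding=0):
    """Output extent along one spatial axis when `pads` is given explicitly."""
    effective_kernel = (kernel - 1) * dilation + 1
    return (stride * (input_size - 1) + output_padding
            + effective_kernel - pad_begin - pad_end)


def conv_transpose_total_padding(input_size, output_size, stride, kernel,
                                 dilation=1, output_padding=0):
    """Total padding (begin + end) along one axis when `output_shape` is given.

    `output_padding` does not enlarge `output_size`; it only changes how much
    padding is required to reach it.
    """
    effective_kernel = (kernel - 1) * dilation + 1
    return (stride * (input_size - 1) + output_padding
            + effective_kernel - output_size)
```

For example, an axis of length 3 with stride 2 and a kernel of size 3 yields 7 output elements without padding, and 5 when one pixel of padding is applied on each side.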

#### Inputs (2 - 3)

X : T
W : T
B (optional) : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors.

### **com.ms.internal.nhwc.DepthToSpace**

#### Version

This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.DepthToSpace-1, com.ms.internal.nhwc.DepthToSpace-11

#### Attributes

blocksize : int (required): Blocks of [blocksize, blocksize] are moved.
mode : string: DCR (default) for depth-column-row order re-arrangement. Use CRD for column-row-depth order.
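
The difference between the DCR and CRD modes is easiest to see in a NumPy sketch. The version below follows the reshape/transpose definition of the standard ONNX DepthToSpace operator and assumes NCHW layout for readability; the NHWC variant documented here is assumed to compute the same rearrangement on channels-last data. The function name is hypothetical.

```python
import numpy as np


def depth_to_space_nchw(x, blocksize, mode="DCR"):
    """Rearrange depth into spatial blocks for a 4-D NCHW array (sketch)."""
    b, c, h, w = x.shape
    bs = blocksize
    if mode == "DCR":
        # Depth is split as (block_row, block_col, new_channels).
        tmp = x.reshape(b, bs, bs, c // (bs * bs), h, w)
        tmp = tmp.transpose(0, 3, 4, 1, 5, 2)
    elif mode == "CRD":
        # Depth is split as (new_channels, block_row, block_col).
        tmp = x.reshape(b, c // (bs * bs), bs, bs, h, w)
        tmp = tmp.transpose(0, 1, 4, 2, 5, 3)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return tmp.reshape(b, c // (bs * bs), h * bs, w * bs)
```

With 8 input channels and `blocksize=2`, DCR interleaves channels across the two output channels, while CRD assigns the first four input channels to the first output channel.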

#### Inputs

input : T

#### Outputs

output : T

#### Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128): Constrain input and output types to all tensor types.

### **com.ms.internal.nhwc.GlobalLpPool**

#### Version

This version of the operator has been available since version 2 of the 'com.ms.internal.nhwc' operator set.

#### Attributes

p : int: p value of the Lp norm used to pool over the input data.
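
As an illustration of how `p` enters the computation, the sketch below pools each channel over all spatial positions using the Lp norm. It assumes a channels-last input of shape (N, spatial..., C), as the operator set name suggests; the function name is hypothetical.

```python
import numpy as np


def global_lp_pool_nhwc(x, p):
    """Lp-norm pooling over all spatial axes of a channels-last array (sketch)."""
    # Every axis except the first (batch) and the last (channels) is spatial.
    spatial_axes = tuple(range(1, x.ndim - 1))
    return np.sum(np.abs(x) ** p, axis=spatial_axes, keepdims=True) ** (1.0 / p)
```

With `p = 2` this is the Euclidean norm of each channel's spatial values, and with `p = 1` it is the sum of their absolute values.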

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(bfloat16), tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors.

### **com.ms.internal.nhwc.InstanceNormalization**

#### Version

This version of the operator has been available since version 6 of the 'com.ms.internal.nhwc' operator set.

#### Attributes

activation : string
activation_params : list of floats
epsilon : float: The epsilon value to use to avoid division by zero.
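
The role of `epsilon` can be seen in the following sketch of instance normalization. It assumes a channels-last input and the usual definition `y = scale * (x - mean) / sqrt(variance + epsilon) + B`, with mean and variance computed per instance and per channel. The `activation` attributes are not modeled, and the function name and the default value used for `epsilon` are illustrative assumptions.

```python
import numpy as np


def instance_norm_nhwc(x, scale, bias, epsilon=1e-5):
    """Instance normalization for a channels-last array (sketch).

    `scale` and `bias` are 1-D arrays of length C and broadcast over the
    last axis of `x`.
    """
    spatial_axes = tuple(range(1, x.ndim - 1))
    mean = x.mean(axis=spatial_axes, keepdims=True)
    var = x.var(axis=spatial_axes, keepdims=True)
    # epsilon keeps the denominator away from zero for constant inputs.
    return scale * (x - mean) / np.sqrt(var + epsilon) + bias
```

A constant input has zero variance, so without `epsilon` the division would be undefined; with it, the output simply equals `B` for that channel.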

#### Inputs

input : T
scale : T
B : T

#### Outputs

output : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors.

### **com.ms.internal.nhwc.LRN**

#### Version

This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.LRN-1

#### Attributes

alpha : float: Scaling parameter.
beta : float: The exponent.
bias : float
size : int (required): The number of channels to sum over
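
How the four attributes combine can be sketched as below. The sketch follows the standard ONNX LRN definition, `Y = X / (bias + alpha / size * square_sum) ** beta`, where `square_sum` adds the squared values of up to `size` neighboring channels, and it assumes a channels-last input. The function name and the default argument values are illustrative assumptions.

```python
import math

import numpy as np


def lrn_nhwc(x, size, alpha=1e-4, beta=0.75, bias=1.0):
    """Local response normalization across the last (channel) axis (sketch)."""
    channels = x.shape[-1]
    square_sum = np.zeros(x.shape, dtype=np.float64)
    below = (size - 1) // 2            # channels included before the current one
    above = math.ceil((size - 1) / 2)  # channels included after the current one
    for ch in range(channels):
        lo = max(0, ch - below)
        hi = min(channels - 1, ch + above)
        square_sum[..., ch] = np.sum(x[..., lo:hi + 1] ** 2, axis=-1)
    return x / (bias + alpha / size * square_sum) ** beta
```

Channels near the edge of the channel axis see a truncated window, so their normalization term sums fewer squared values.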

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16): Constrain input and output types to float tensors.

### **com.ms.internal.nhwc.LpPool**

#### Version

This version of the operator has been available since version 18 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.LpPool-11

#### Attributes

auto_pad : string: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER means the input is padded so that `output_shape[i] = ceil(input_shape[i] / strides[i])` for each axis `i`. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER.
ceil_mode : int: Whether to use ceil or floor (default) to compute the output shape.
dilations : list of ints: Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
kernel_shape : list of ints (required): The size of the kernel along each axis.
p : int: p value of the Lp norm used to pool over the input data.
pads : list of ints: Padding for the beginning and ending along each spatial axis. Each value must be greater than or equal to 0 and gives the number of pixels added to the beginning or end of the corresponding axis. The `pads` format is [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
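
The `auto_pad` and `ceil_mode` rules above can be sketched per spatial axis as follows. The SAME padding split follows the description of `auto_pad`; the explicit-padding size formula is the plain pooling formula and is an assumption here, since this section does not state it. Function names are hypothetical.

```python
import math


def same_pads(input_size, kernel, stride=1, dilation=1, auto_pad="SAME_UPPER"):
    """Return (output_size, (pad_begin, pad_end)) for SAME_UPPER / SAME_LOWER."""
    output_size = math.ceil(input_size / stride)
    effective_kernel = (kernel - 1) * dilation + 1
    total = max((output_size - 1) * stride + effective_kernel - input_size, 0)
    small, large = total // 2, total - total // 2
    if auto_pad == "SAME_UPPER":
        return output_size, (small, large)   # extra pixel goes at the end
    if auto_pad == "SAME_LOWER":
        return output_size, (large, small)   # extra pixel goes at the beginning
    raise ValueError(f"unsupported auto_pad: {auto_pad}")


def pooled_size(input_size, kernel, stride=1, dilation=1,
                pad_begin=0, pad_end=0, ceil_mode=0):
    """Output extent along one axis with explicit padding (NOTSET)."""
    effective_kernel = (kernel - 1) * dilation + 1
    span = input_size + pad_begin + pad_end - effective_kernel
    rounder = math.ceil if ceil_mode else math.floor
    return rounder(span / stride) + 1
```

For an axis of length 5 with a kernel of size 2 and stride 1, one pixel of padding is needed in total: SAME_UPPER places it at the end and SAME_LOWER at the beginning.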

#### Inputs

X : T

#### Outputs

Y : T

#### Type Constraints

T : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors.

### **com.ms.internal.nhwc.MaxUnpool**

#### Version

This version of the operator has been available since version 11 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.MaxUnpool-9

#### Attributes

activation : string
activation_params : list of floats
kernel_shape : list of ints (required): The size of the kernel along each axis.
pads : list of ints: Padding for the beginning and ending along each spatial axis. Each value must be greater than or equal to 0 and gives the number of pixels added to the beginning or end of the corresponding axis. The `pads` format is [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.

#### Inputs (2 - 3)

X : T1
I : T2
output_shape (optional) : T2

#### Outputs

output : T1

#### Type Constraints

T1 : tensor(float16), tensor(float), tensor(double): Constrain input and output types to float tensors.
T2 : tensor(int64): Constrain index tensor to int64

### **com.ms.internal.nhwc.QLinearConvTranspose**

#### Version

This version of the operator has been available since version 1 of the 'com.ms.internal.nhwc' operator set.

#### Attributes

auto_pad : string: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET.
dilations : list of ints: dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group : int: number of groups input channels and output channels are divided into.
kernel_shape : list of ints: The shape of the convolution kernel. If not present, should be inferred from input W.
output_padding : list of ints: Additional elements added to the side with higher coordinate indices in the output. Each padding value in "output_padding" must be less than the corresponding stride/dilation dimension. By default, this attribute is a zero vector. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If "output_shape" is explicitly provided, "output_padding" does not contribute additional size to "output_shape" but participates in the computation of the needed padding amount. This is also called adjs or adjustment in some frameworks.
output_shape : list of ints: The shape of the output can be set explicitly, which causes the pads values to be generated automatically. If output_shape is specified, pads values are ignored. See the operator documentation for the equations used to generate pads.
pads : list of ints: Padding for the beginning and ending along each spatial axis.
strides : list of ints: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.

#### Inputs (8 - 9)

x : T1
x_scale : tensor(float)
x_zero_point : T1
w : T2
w_scale : tensor(float)
w_zero_point : T2
y_scale : tensor(float)
y_zero_point : T3
B (optional) : T4

#### Outputs

y : T3

#### Type Constraints

T1 : tensor(int8), tensor(uint8): Constrain input type to 8-bit integer tensor.
T2 : tensor(int8), tensor(uint8): Constrain filter type to 8-bit integer tensor.
T3 : tensor(int8), tensor(uint8): Constrain output type to 8-bit integer tensor.
T4 : tensor(int32): Constrain bias type to 32-bit integer tensor.
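
The scale and zero-point inputs relate 8-bit values to real values through linear quantization, `real = scale * (q - zero_point)`. The sketch below shows that mapping in both directions. It assumes round-half-to-even and saturation to the target integer range, as in the standard ONNX QuantizeLinear operator; it is not the operator's actual kernel, and the function names are hypothetical.

```python
import numpy as np


def dequantize(q, scale, zero_point):
    """Map 8-bit integers to real values: scale * (q - zero_point)."""
    # Widen first so the subtraction cannot wrap around in 8 bits.
    return scale * (q.astype(np.int32) - zero_point)


def quantize(real, scale, zero_point, dtype=np.uint8):
    """Map real values to 8-bit integers with rounding and saturation."""
    info = np.iinfo(dtype)
    q = np.rint(np.asarray(real) / scale) + zero_point  # rint rounds half to even
    return np.clip(q, info.min, info.max).astype(dtype)
```

Conceptually, `x` and `w` are dequantized with their own scale and zero point, the transposed convolution is evaluated on real values, and the result is quantized with `y_scale` and `y_zero_point`.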

### **com.ms.internal.nhwc.Resize**

#### Version

This version of the operator has been available since version 19 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.Resize-11, com.ms.internal.nhwc.Resize-13, com.ms.internal.nhwc.Resize-18

#### Attributes

antialias : int: If set to 1, "linear" and "cubic" interpolation modes will use an antialiasing filter when downscaling. Antialiasing is achieved by stretching the resampling filter by a factor max(1, 1 / scale), which means that when downsampling, more input pixels contribute to an output pixel.
axes : list of ints: If provided, it specifies a subset of axes that 'roi', 'scales' and 'sizes' refer to. If not provided, all axes are assumed [0, 1, ..., r-1], where r = rank(data). Non-specified dimensions are interpreted as non-resizable. Negative value means counting dimensions from the back. Accepted range is [-r, r-1], where r = rank(data). Behavior is undefined if an axis is repeated.
coordinate_transformation_mode : string: This attribute describes how to transform the coordinate in the resized tensor to the coordinate in the original tensor. The coordinate of each dimension is transformed individually. Let's describe a case using axis x as an example. Denote `x_resized` as the coordinate of axis x in the resized tensor, `x_original` as the coordinate of axis x in the original tensor, `length_original` as the length of the original tensor in axis x, `length_resized` as the length of the resized tensor in axis x, `scale = length_resized / length_original`, `output_width` the target length on the axis x which can be a fractional number when it is calculated out of a scale factor, and `output_width_int` the effective output width as an integer.

If coordinate_transformation_mode is `"half_pixel"`,
```
x_original = (x_resized + 0.5) / scale - 0.5
```

If coordinate_transformation_mode is `"half_pixel_symmetric"`,
```
adjustment = output_width_int / output_width
center = input_width / 2
offset = center * (1 - adjustment)
x_ori = offset + (x + 0.5) / scale - 0.5
```

If coordinate_transformation_mode is `"pytorch_half_pixel"`,
```
x_original = length_resized > 1 ? (x_resized + 0.5) / scale - 0.5 : 0
```

If coordinate_transformation_mode is `"align_corners"`,
```
x_original = x_resized * (length_original - 1) / (length_resized - 1)
```

If coordinate_transformation_mode is `"asymmetric"`,
```
x_original = x_resized / scale
```

If coordinate_transformation_mode is `"tf_crop_and_resize"`,
```
x_original = length_resized > 1 ? start_x * (length_original - 1) + x_resized * (end_x - start_x) * (length_original - 1) / (length_resized - 1) : 0.5 * (start_x + end_x) * (length_original - 1)
```

cubic_coeff_a : float: The coefficient 'a' used in cubic interpolation. Two common choices are -0.5 (in some cases of TensorFlow) and -0.75 (in PyTorch). Check out Equation (4) in https://ieeexplore.ieee.org/document/1163711 for the details. This attribute is valid only if mode is "cubic".
exclude_outside : int: If set to 1, the weight of sampling locations outside the tensor will be set to 0 and the weight will be renormalized so that their sum is 1.0. The default value is 0.
extrapolation_value : float: When coordinate_transformation_mode is "tf_crop_and_resize" and x_original is outside the range [0, length_original - 1], this value is used as the corresponding output value. Default is 0.0f.
keep_aspect_ratio_policy : string: This attribute describes how to interpret the `sizes` input with regard to keeping the original aspect ratio of the input, and it is not applicable when the `scales` input is used. Given a set of `sizes`, associated with a subset of `axes` (explicitly provided or default), and assuming `d = axes[i]`, with `i` being the index of the provided `sizes`.

If `keep_aspect_ratio_policy` is `"stretch"`, the original aspect ratio is disregarded, and the input is resized to the specified size: `out_size[d] = sizes[i]`

If `keep_aspect_ratio_policy` is `"not_larger"`, the sizes are adjusted so that no extent of the output is larger than the specified size, while keeping the original aspect ratio:
```
scale = Min(sizes[i] / in_size[d])
out_size[d] = round_int(scale * in_size[d])
```

If `keep_aspect_ratio_policy` is `"not_smaller"`, the sizes are adjusted so that no extent of the output is smaller than the specified size, while keeping the original aspect ratio:
```
scale = Max(sizes[i] / in_size[d])
out_size[d] = round_int(scale * in_size[d])
```

For non-resizable axes (those not specified in `axes`), the output size will be equal to the input size. Note: `round_int` stands for computing the nearest integer value, rounding halfway cases up.
mode : string: Three interpolation modes: "nearest" (default), "linear" and "cubic". The "linear" mode includes linear interpolation for 1D tensor and N-linear interpolation for N-D tensor (for example, bilinear interpolation for 2D tensor). The "cubic" mode includes cubic interpolation for 1D tensor and N-cubic interpolation for N-D tensor (for example, bicubic interpolation for 2D tensor).
nearest_mode : string: Four modes: "round_prefer_floor" (default, also known as round half down), "round_prefer_ceil" (also known as round half up), "floor", "ceil". Only used by nearest interpolation. It indicates how to get the "nearest" pixel in the input tensor from x_original, so this attribute is valid only if "mode" is "nearest".
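
The coordinate transformation and nearest-pixel rules described by `coordinate_transformation_mode` and `nearest_mode` can be sketched for a single axis as below. The formulas are taken from the attribute descriptions; `half_pixel_symmetric` is omitted because it needs the fractional `output_width`, the guard for `align_corners` with a single output element is an added assumption, and index clamping to the valid range is not shown. Function names are hypothetical.

```python
import math


def to_original(x_resized, length_original, length_resized,
                mode="half_pixel", start_x=0.0, end_x=1.0):
    """Map a coordinate in the resized tensor back to the original tensor."""
    scale = length_resized / length_original
    if mode == "half_pixel":
        return (x_resized + 0.5) / scale - 0.5
    if mode == "pytorch_half_pixel":
        return (x_resized + 0.5) / scale - 0.5 if length_resized > 1 else 0.0
    if mode == "align_corners":
        if length_resized == 1:  # assumption: avoid dividing by zero
            return 0.0
        return x_resized * (length_original - 1) / (length_resized - 1)
    if mode == "asymmetric":
        return x_resized / scale
    if mode == "tf_crop_and_resize":
        if length_resized > 1:
            return (start_x * (length_original - 1)
                    + x_resized * (end_x - start_x) * (length_original - 1)
                    / (length_resized - 1))
        return 0.5 * (start_x + end_x) * (length_original - 1)
    raise ValueError(f"unsupported mode: {mode}")


def nearest_index(x_original, nearest_mode="round_prefer_floor"):
    """Pick the input pixel used by nearest interpolation."""
    if nearest_mode == "round_prefer_floor":   # round half down
        return math.ceil(x_original - 0.5)
    if nearest_mode == "round_prefer_ceil":    # round half up
        return math.floor(x_original + 0.5)
    if nearest_mode == "floor":
        return math.floor(x_original)
    if nearest_mode == "ceil":
        return math.ceil(x_original)
    raise ValueError(f"unsupported nearest_mode: {nearest_mode}")
```

When an axis is upscaled from 2 to 4 elements, output coordinate 3 maps to 1.25 under `half_pixel`, 1.5 under `asymmetric` and 1.0 under `align_corners`, which is why the choice of mode visibly shifts the result.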

#### Inputs (1 - 4)

X : T1
roi (optional) : T2
scales (optional) : tensor(float)
sizes (optional) : tensor(int64)
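
How `keep_aspect_ratio_policy` turns the `sizes` input into an output shape can be sketched as follows. The sketch scales each resized axis by its own input extent and rounds halfway cases up, as `round_int` is described; the function names are hypothetical.

```python
import math


def round_int(value):
    """Nearest integer, rounding halfway cases up."""
    return math.floor(value + 0.5)


def resolve_sizes(in_shape, sizes, axes=None, policy="stretch"):
    """Compute the output shape implied by `sizes` for the given policy."""
    rank = len(in_shape)
    if axes is None:
        axes = list(range(rank))
    axes = [a % rank for a in axes]  # negative axes count from the back
    out = list(in_shape)             # unspecified axes keep their input size
    if policy == "stretch":
        for i, d in enumerate(axes):
            out[d] = sizes[i]
        return out
    ratios = [sizes[i] / in_shape[d] for i, d in enumerate(axes)]
    if policy == "not_larger":
        scale = min(ratios)
    elif policy == "not_smaller":
        scale = max(ratios)
    else:
        raise ValueError(f"unsupported policy: {policy}")
    for d in axes:
        out[d] = round_int(scale * in_shape[d])
    return out
```

For a 100 x 200 image resized with `sizes = [50, 50]` on its two spatial axes, `"not_larger"` gives 25 x 50 and `"not_smaller"` gives 50 x 100, while `"stretch"` gives exactly 50 x 50.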

#### Outputs

Y : T1

#### Type Constraints

T1 : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128): Constrain input 'X' and output 'Y' to all tensor types.
T2 : tensor(float16), tensor(float), tensor(double): Constrain roi type to float or double.

### **com.ms.internal.nhwc.SpaceToDepth**

#### Version

This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.

Other versions of this operator: com.ms.internal.nhwc.SpaceToDepth-1

#### Attributes

blocksize : int (required): Blocks of [blocksize, blocksize] are moved.
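
The block movement controlled by `blocksize` can be illustrated with a NumPy sketch. It follows the reshape/transpose definition of the standard ONNX SpaceToDepth operator and assumes NCHW layout for readability; the NHWC variant documented here is assumed to perform the same rearrangement on channels-last data. The function name is hypothetical.

```python
import numpy as np


def space_to_depth_nchw(x, blocksize):
    """Move each [blocksize, blocksize] spatial block into depth (sketch)."""
    b, c, h, w = x.shape
    bs = blocksize
    # Split each spatial axis into (coarse position, offset inside the block).
    tmp = x.reshape(b, c, h // bs, bs, w // bs, bs)
    # Bring the two in-block offsets in front of the channel axis.
    tmp = tmp.transpose(0, 3, 5, 1, 2, 4)
    return tmp.reshape(b, c * bs * bs, h // bs, w // bs)
```

Each output channel then holds the input sampled at one fixed offset inside every block, so a 4 x 4 single-channel input with `blocksize=2` becomes four 2 x 2 channels.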

#### Inputs

input : T

#### Outputs

output : T

#### Type Constraints

T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128): Constrain input and output types to all tensor types.