## Contrib Operator Schemas

*This file is automatically generated from the registered contrib operator schemas by [this script](https://github.com/microsoft/onnxruntime/blob/main/tools/python/gen_contrib_doc.py). Do not modify directly.*

* com.microsoft
  * com.microsoft.Attention
  * com.microsoft.AttnLSTM
  * com.microsoft.BeamSearch
  * com.microsoft.BiasAdd
  * com.microsoft.BiasDropout
  * com.microsoft.BiasGelu
  * com.microsoft.BiasSoftmax
  * com.microsoft.BiasSplitGelu
  * com.microsoft.BifurcationDetector
  * com.microsoft.BitmaskBiasDropout
  * com.microsoft.BitmaskDropout
  * com.microsoft.CDist
  * com.microsoft.ComplexMul
  * com.microsoft.ComplexMulConj
  * com.microsoft.ConvTransposeWithDynamicPads
  * com.microsoft.CropAndResize
  * com.microsoft.DecoderAttention
  * com.microsoft.DecoderMaskedMultiHeadAttention
  * com.microsoft.DecoderMaskedSelfAttention
  * com.microsoft.DequantizeBFP
  * com.microsoft.DequantizeLinear
  * com.microsoft.DequantizeWithOrder
  * com.microsoft.DynamicQuantizeLSTM
  * com.microsoft.DynamicQuantizeMatMul
  * com.microsoft.DynamicTimeWarping
  * com.microsoft.EPContext
  * com.microsoft.EmbedLayerNormalization
  * com.microsoft.ExpandDims
  * com.microsoft.FastGelu
  * com.microsoft.FusedConv
  * com.microsoft.FusedGemm
  * com.microsoft.FusedMatMul
  * com.microsoft.FusedMatMulActivation
  * com.microsoft.GatedRelativePositionBias
  * com.microsoft.GatherBlockQuantized
  * com.microsoft.GatherND
  * com.microsoft.Gelu
  * com.microsoft.GemmFastGelu
  * com.microsoft.GemmFloat8
  * com.microsoft.GemmaRotaryEmbedding
  * com.microsoft.GreedySearch
  * com.microsoft.GridSample
  * com.microsoft.GroupNorm
  * com.microsoft.GroupQueryAttention
  * com.microsoft.Inverse
  * com.microsoft.Irfft
  * com.microsoft.LongformerAttention
  * com.microsoft.MatMulBnb4
  * com.microsoft.MatMulFpQ4
  * com.microsoft.MatMulInteger16
  * com.microsoft.MatMulIntegerToFloat
  * com.microsoft.MatMulNBits
  * com.microsoft.MaxpoolWithMask
  * com.microsoft.MoE
  * com.microsoft.MulInteger
  * com.microsoft.MultiHeadAttention
  * com.microsoft.MurmurHash3
  * com.microsoft.NGramRepeatBlock
  * com.microsoft.NhwcConv
  * com.microsoft.NhwcFusedConv
  * com.microsoft.NhwcMaxPool
  * com.microsoft.PackedAttention
  * com.microsoft.PackedMultiHeadAttention
  * com.microsoft.Pad
  * com.microsoft.QAttention
  * com.microsoft.QGemm
  * com.microsoft.QLinearAdd
  * com.microsoft.QLinearAveragePool
  * com.microsoft.QLinearConcat
  * com.microsoft.QLinearConv
  * com.microsoft.QLinearGlobalAveragePool
  * com.microsoft.QLinearLeakyRelu
  * com.microsoft.QLinearMul
  * com.microsoft.QLinearReduceMean
  * com.microsoft.QLinearSigmoid
  * com.microsoft.QLinearSoftmax
  * com.microsoft.QLinearWhere
  * com.microsoft.QMoE
  * com.microsoft.QOrderedAttention
  * com.microsoft.QOrderedGelu
  * com.microsoft.QOrderedLayerNormalization
  * com.microsoft.QOrderedLongformerAttention
  * com.microsoft.QOrderedMatMul
  * com.microsoft.QuantizeBFP
  * com.microsoft.QuantizeLinear
  * com.microsoft.QuantizeWithOrder
  * com.microsoft.QuickGelu
  * com.microsoft.Range
  * com.microsoft.ReduceSumInteger
  * com.microsoft.RelativePositionBias
  * com.microsoft.RemovePadding
  * com.microsoft.RestorePadding
  * com.microsoft.Rfft
  * com.microsoft.RotaryEmbedding
  * com.microsoft.SampleOp
  * com.microsoft.Sampling
  * com.microsoft.SkipGroupNorm
  * com.microsoft.SkipLayerNormalization
  * com.microsoft.SkipSimplifiedLayerNormalization
  * com.microsoft.Snpe
  * com.microsoft.SparseAttention
  * com.microsoft.SparseToDenseMatMul
  * com.microsoft.Tokenizer
  * com.microsoft.TorchEmbedding
  * com.microsoft.TransposeMatMul
  * com.microsoft.Trilu
  * com.microsoft.UnfoldTensor
  * com.microsoft.Unique
  * com.microsoft.WhisperBeamSearch
  * com.microsoft.WordConvEmbedding
  * experimental com.microsoft.IsAllFinite
  * experimental com.microsoft.QEmbedLayerNormalization
* com.microsoft.nchwc
  * com.microsoft.nchwc.AveragePool
  * com.microsoft.nchwc.Conv
  * com.microsoft.nchwc.GlobalAveragePool
  * com.microsoft.nchwc.GlobalMaxPool
  * com.microsoft.nchwc.MaxPool
  * com.microsoft.nchwc.ReorderInput
  * com.microsoft.nchwc.ReorderOutput
  * com.microsoft.nchwc.Upsample
* com.ms.internal.nhwc
  * com.ms.internal.nhwc.BatchNormalization
  * com.ms.internal.nhwc.ConvTranspose
  * com.ms.internal.nhwc.DepthToSpace
  * com.ms.internal.nhwc.GlobalLpPool
  * com.ms.internal.nhwc.InstanceNormalization
  * com.ms.internal.nhwc.LRN
  * com.ms.internal.nhwc.LpPool
  * com.ms.internal.nhwc.MaxUnpool
  * com.ms.internal.nhwc.QLinearConvTranspose
  * com.ms.internal.nhwc.Resize
  * com.ms.internal.nhwc.SpaceToDepth

## com.microsoft

### **com.microsoft.Attention**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
do_rotary : int
Whether to use rotary position embedding. Default value is 0.
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
past_present_share_buffer : int
The corresponding past and present are the same tensor; its size is (2, batch_size, num_heads, max_sequence_length, head_size)
qkv_hidden_sizes : list of ints
Hidden dimension of Q, K, V: hidden_size, hidden_size and v_hidden_size
rotary_embedding_dim : int
Dimension of rotary embedding. Limited to 32, 64 or 128. Default value is head_size
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional : int
Whether every token can only attend to previous tokens. Default value is 0.
#### Inputs (2 - 7)
input : T
weights : T
bias (optional) : T
mask_index (optional) : M
past (optional) : T
attention_bias (optional) : T
past_sequence_length (optional) : M
#### Outputs (1 - 2)
output : T
present (optional) : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain mask index to integer types
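
The interaction of `scale`, `unidirectional`, and `mask_filter_value` can be illustrated with a single-head sketch in plain Python. This is not the operator's kernel: the helper name `attention_probs` is hypothetical, and adding `mask_filter_value` to masked scores before the softmax is an assumption about how the filter value is used.

```python
import math

def attention_probs(q, k, scale=None, unidirectional=0, mask_filter_value=-10000.0):
    """Single-head attention probabilities; q and k are lists of row vectors."""
    head_size = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(head_size)  # documented default for `scale`
    probs = []
    for i, qi in enumerate(q):
        row = []
        for j, kj in enumerate(k):
            s = scale * sum(a * b for a, b in zip(qi, kj))
            if unidirectional and j > i:
                # Assumption: future positions are suppressed additively.
                s += mask_filter_value
            row.append(s)
        m = max(row)  # subtract the row maximum for a numerically stable softmax
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
causal = attention_probs(q, k, unidirectional=1)
```

With `unidirectional=1`, the first token places all of its probability mass on itself, because the score for the later position is pushed far below the others.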
### **com.microsoft.AttnLSTM**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
activation_alpha : list of floats
Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. For example, with LeakyRelu, the default alpha is 0.01.
activation_beta : list of floats
Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators.
activations : list of strings
A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.
clip : float
Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
direction : string
Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
hidden_size : int
Number of neurons in the hidden layer.
input_forget : int
Couple the input and forget gates if 1, default 0.
#### Inputs (3 - 14)
X : T
W : T
R : T
B (optional) : T
sequence_lens (optional) : T1
initial_h (optional) : T
initial_c (optional) : T
P (optional) : T
QW (optional) : T
MW (optional) : T
V (optional) : T
M (optional) : T
memory_seq_lens (optional) : T1
AW (optional) : T
#### Outputs (0 - 3)
Y (optional) : T
Y_h (optional) : T
Y_c (optional) : T
#### Type Constraints
T : tensor(float), tensor(double)
Constrain input and output types to float tensors.
T1 : tensor(int32)
Constrain seq_lens to integral tensors.
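
The `clip` attribute described above bounds values to [-threshold, +threshold] before they reach the activations. A minimal sketch of that bounding step, using a hypothetical helper name `clip_values`:

```python
def clip_values(values, threshold=None):
    """Bound each value to [-threshold, +threshold]; no clipping when threshold is None."""
    if threshold is None:
        return list(values)
    return [max(-threshold, min(threshold, v)) for v in values]

clipped = clip_values([-5.0, 0.5, 3.0], threshold=2.0)
```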
### **com.microsoft.BeamSearch**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
decoder : graph (required)
Decoder subgraph to execute in a loop.
decoder_start_token_id : int
The id of the token that indicates decoding starts.
early_stopping : int
Whether to stop early
encoder : graph
The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
eos_token_id : int (required)
The id of the end-of-sequence token
init_decoder : graph
The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
model_type : int
model type: 0 for GPT-2; 1 for encoder decoder like T5
no_repeat_ngram_size : int
No-repeat n-gram size
pad_token_id : int (required)
The id of the padding token
vocab_size : int
Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
#### Inputs (5 - 12)
input_ids : F
max_length : I
min_length (optional) : I
num_beams : I
num_return_sequences : I
length_penalty (optional) : T
repetition_penalty (optional) : T
vocab_mask (optional) : M
prefix_vocab_mask (optional) : M
attention_mask (optional) : I
decoder_input_ids (optional) : I
logits_processor (optional) : I
#### Outputs (1 - 3)
sequences : I
sequences_scores (optional) : T
scores (optional) : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain to float tensors.
F : tensor(float), tensor(int32), tensor(float16)
Constrain input type to float or int tensors.
I : tensor(int32)
Constrain to integer types
M : tensor(int32)
Constrain mask to integer types
### **com.microsoft.BiasAdd**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs
X : T
bias : T
skip : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float)
Constrain input and output types to float tensors.
### **com.microsoft.BiasDropout**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
seed : int
(Optional) Seed for the random generator. If not specified, one is generated automatically.
#### Inputs (2 - 5)
data : T
bias : T
residual (optional) : T
ratio (optional) : T1
training_mode (optional) : T2
#### Outputs (1 - 2)
output : T
mask (optional) : T2
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input 'ratio' types to float tensors.
T2 : tensor(bool)
Constrain output 'mask' types to boolean tensors.
### **com.microsoft.BiasGelu**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs
A : T
B : T
#### Outputs
C : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
### **com.microsoft.BiasSoftmax**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
axis : int
apply softmax to elements for dimensions axis or higher
is_inner_broadcast : int (required)
true if broadcast bias across input for dimensions broadcast_axis to axis-1, otherwise broadcast bias across input for dimensions 0 to broadcast_axis - 1
#### Inputs
data : T
bias : T
#### Outputs
output : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
### **com.microsoft.BiasSplitGelu**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs
X : T
bias : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float)
Constrain input X and output Y types to float tensors.
### **com.microsoft.BifurcationDetector**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
max_ngram_size : int
The maximum NGram size for suffix matching.
min_ngram_size : int
The minimum NGram size for suffix matching.
#### Inputs (3 - 4)
src_tokens : T
cur_tokens : T
prev_suffix_match_idx : T
pred_tokens (optional) : T
#### Outputs
tokens : T
suffix_match_idx : T
#### Type Constraints
T : tensor(int64)
Constrain to integer types.
### **com.microsoft.BitmaskBiasDropout**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
seed : int
(Optional) Seed for the random generator. If not specified, one is generated automatically.
#### Inputs (2 - 5)
data : T
bias : T
residual (optional) : T
ratio (optional) : T1
training_mode (optional) : T2
#### Outputs (1 - 2)
output : T
mask (optional) : T3
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input 'ratio' types to float tensors.
T2 : tensor(bool)
Constrain input 'training_mode' types to boolean tensors.
T3 : tensor(uint32)
Constrain output 'mask' types to uint32 tensors.
### **com.microsoft.BitmaskDropout**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
seed : int
(Optional) Seed for the random generator. If not specified, one is generated automatically.
#### Inputs (1 - 3)
data : T
ratio (optional) : T1
training_mode (optional) : T2
#### Outputs (1 - 2)
output : T
mask (optional) : T3
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input 'ratio' types to float tensors.
T2 : tensor(bool)
Constrain 'training_mode' to boolean tensor.
T3 : tensor(uint32)
Constrain output 'mask' types to bit-packed uint32 tensor.
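
The bit-packed `mask` output stores one keep/drop flag per element inside uint32 words. The sketch below shows one way such packing can work. The helper names are hypothetical, and the bit order (element `i` in bit `i % 32` of word `i // 32`, least significant bit first) is an assumption rather than something this schema states.

```python
def pack_mask(bits):
    """Pack a list of booleans into 32-bit words, least significant bit first (assumed layout)."""
    words = [0] * ((len(bits) + 31) // 32)
    for i, keep in enumerate(bits):
        if keep:
            words[i // 32] |= 1 << (i % 32)
    return words

def unpack_mask(words, count):
    """Recover the first `count` booleans from the packed words."""
    return [bool((words[i // 32] >> (i % 32)) & 1) for i in range(count)]

mask = [True, False, True] + [False] * 30 + [True]  # 34 elements span two words
packed = pack_mask(mask)
```

Packing cuts mask storage from one byte per element (for a boolean tensor) to one bit per element.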
### **com.microsoft.CDist**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
metric : string
The distance metric to use. If a string, the distance function can be "braycurtis", "canberra", "chebyshev", "cityblock", "correlation", "cosine", "dice", "euclidean", "hamming", "jaccard", "jensenshannon", "kulsinski", "mahalanobis", "matching", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "wminkowski", "yule".
#### Inputs
A : T
B : T
#### Outputs
C : T
#### Type Constraints
T : tensor(float), tensor(double)
Constrains input to only numeric types.
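
As an illustration of what a pairwise distance computation looks like for two of the listed metrics, here is a plain-Python sketch. It assumes `A` holds m row vectors and `B` holds n row vectors of the same length, and that `C` has shape (m, n). The schema above does not state these shapes, and the helper name `cdist` is only illustrative.

```python
import math

def cdist(a, b, metric="euclidean"):
    """Pairwise distances between the rows of a and the rows of b."""
    def sq(ra, rb):
        return sum((x - y) ** 2 for x, y in zip(ra, rb))
    if metric == "sqeuclidean":
        return [[sq(ra, rb) for rb in b] for ra in a]
    if metric == "euclidean":
        return [[math.sqrt(sq(ra, rb)) for rb in b] for ra in a]
    raise ValueError(f"metric not covered by this sketch: {metric}")

distances = cdist([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0]])
```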
### **com.microsoft.ComplexMul**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs
A : T
B : T
#### Outputs
C : T
#### Type Constraints
T : tensor(float), tensor(double), tensor(float16)
Constrain input and output types to float or half tensors.
### **com.microsoft.ComplexMulConj**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs
A : T
B : T
#### Outputs
C : T
#### Type Constraints
T : tensor(float), tensor(double), tensor(float16)
Constrain input and output types to float or half tensors.
### **com.microsoft.ConvTransposeWithDynamicPads**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
output_padding : list of ints
strides : list of ints
#### Inputs (2 - 4)
X : T
W : T
Pads (optional) : tensor(int64)
B (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors
### **com.microsoft.CropAndResize**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
extrapolation_value : float
Value used for extrapolation, when applicable. Default is 0.0f.
mode : string
The pooling method. Two modes are supported: 'bilinear' and 'nearest'. Default is 'bilinear'.
#### Inputs
X : T1
rois : T1
batch_indices : T2
crop_size : T2
#### Outputs
Y : T1
#### Type Constraints
T1 : tensor(float16), tensor(float), tensor(double)
Constrain types to float tensors.
T2 : tensor(int32)
Constrain types to int tensors.
### **com.microsoft.DecoderAttention**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
#### Inputs
query : T
key : T
q_weight : T
kv_weight : T
bias : T
key_padding_mask (optional) : B
key_cache (optional) : T
value_cache (optional) : T
static_kv : B
use_past : B
has_layer_state : B
has_key_padding_mask : B
#### Outputs (1 - 3)
output : T
new_key_cache (optional) : T
new_value_cache (optional) : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float and float16 tensors.
B : tensor(bool)
Constrain key_padding_mask to bool tensors.
### **com.microsoft.DecoderMaskedMultiHeadAttention**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
output_qk : int
Whether to output the cross-attention MatMul(Q, K)
past_present_share_buffer : int
The corresponding past and present are the same tensor; its size is (batch_size, num_heads, max_sequence_length, head_size)
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
#### Inputs (1 - 11)
query : T
key (optional) : T
value (optional) : T
mask_index (optional) : M
attention_bias (optional) : T
past_key (optional) : T
past_value (optional) : T
past_sequence_length (optional) : M
beam_width (optional) : M
cache_indirection (optional) : M
bias (optional) : T
#### Outputs (1 - 4)
output : T
present_key (optional) : T
present_value (optional) : T
qk (optional) : V
#### Type Constraints
V : tensor(float)
Constrain qk output types to float32 tensors.
T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain mask index to integer types
### **com.microsoft.DecoderMaskedSelfAttention**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
do_rotary : int
Whether to use rotary position embedding. Default value is 0.
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
past_present_share_buffer : int
The corresponding past and present are the same tensor; its size is (2, batch_size, num_heads, max_sequence_length, head_size)
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
#### Inputs (7 - 9)
input : T
weights : T
bias : T
mask_index (optional) : M
past : T
attention_bias (optional) : T
past_sequence_length : M
beam_width (optional) : M
cache_indirection (optional) : M
#### Outputs
output : T
present : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain mask index to integer types
### **com.microsoft.DequantizeBFP**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
bfp_type : int (required)
The type of BFP - must match with the BFPType enum
block_dim : int
Each bounding box spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension.
dtype : int
The datatype to dequantize to.
#### Inputs
x : T1
shape : T2
strides : T2
#### Outputs
y : T3
#### Type Constraints
T1 : tensor(uint8)
Constrain the input to uint8.
T2 : tensor(int64)
Constrain shape and strides to int64.
T3 : tensor(float), tensor(float16), tensor(bfloat16)
Constrain y to float, float16, and bfloat16.
### **com.microsoft.DequantizeLinear**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
axis : int
The axis along which the same quantization parameters are applied. It is optional. If it is not specified, quantization is per-tensor and inputs 'x_scale' and 'x_zero_point' must be scalars. If it is specified, quantization is per-'axis' and inputs 'x_scale' and 'x_zero_point' must be 1-D tensors.
#### Inputs (2 - 3)
x : T1
x_scale : T2
x_zero_point (optional) : T1
#### Outputs
y : T2
#### Type Constraints
T1 : tensor(int8), tensor(uint8), tensor(int16), tensor(uint16), tensor(int32), tensor(int4), tensor(uint4)
Constrain 'x' and 'x_zero_point' to 4-bit, 8-bit, or 16-bit integer tensors, or 32-bit signed integer tensors.
T2 : tensor(float16), tensor(float)
Constrain 'y', 'x_scale' to float tensors.
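
The per-tensor and per-axis modes described for `axis` can be sketched with the standard linear dequantization formula `y = (x - x_zero_point) * x_scale`. The example below handles only 2-D inputs given as nested lists. The helper name and the treatment of a missing zero point as 0 are assumptions made for illustration.

```python
def dequantize_linear(x, x_scale, x_zero_point=None, axis=None):
    """Dequantize a 2-D list of ints.

    axis=None: per-tensor, x_scale and x_zero_point are scalars.
    axis=0 or 1: per-axis, x_scale and x_zero_point are 1-D lists along that axis.
    """
    out = []
    for r, row in enumerate(x):
        out_row = []
        for c, value in enumerate(row):
            if axis is None:
                scale = x_scale
                zero_point = 0 if x_zero_point is None else x_zero_point
            else:
                idx = r if axis == 0 else c
                scale = x_scale[idx]
                zero_point = 0 if x_zero_point is None else x_zero_point[idx]
            out_row.append((value - zero_point) * scale)
        out.append(out_row)
    return out

x = [[10, 20], [30, 40]]
per_tensor = dequantize_linear(x, 0.5, 10)
per_column = dequantize_linear(x, [1.0, 2.0], [0, 20], axis=1)
```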
### **com.microsoft.DequantizeWithOrder**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
order_input : int (required)
cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
order_output : int (required)
cublasLt order of output matrix
to : int (required)
The output data type. Only TensorProto_DataType_FLOAT (1) and TensorProto_DataType_FLOAT16 (10) are supported.
#### Inputs
input : Q
scale_input : S
#### Outputs
output : F
#### Type Constraints
Q : tensor(int8)
Constrain input and output types to int8 tensors.
F : tensor(float16), tensor(float)
Constrain to float types
S : tensor(float)
Constrain Scale to float32 types
### **com.microsoft.DynamicQuantizeLSTM**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
activation_alpha : list of floats
Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. For example, with LeakyRelu, the default alpha is 0.01.
activation_beta : list of floats
Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators.
activations : list of strings
A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.
clip : float
Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
direction : string
Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
hidden_size : int
Number of neurons in the hidden layer
input_forget : int
Couple the input and forget gates if 1.
#### Inputs
X : T
W : T2
R : T2
B (optional) : T
sequence_lens (optional) : T1
initial_h (optional) : T
initial_c (optional) : T
P (optional) : T
W_scale : T
W_zero_point : T2
R_scale : T
R_zero_point : T2
#### Outputs (0 - 3)
Y (optional) : T
Y_h (optional) : T
Y_c (optional) : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors.
T1 : tensor(int32)
Constrain seq_lens to integer tensor.
T2 : tensor(uint8), tensor(int8)
Constrain weights types to 8 bit tensors.
### **com.microsoft.DynamicQuantizeMatMul**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs (3 - 5)
A : T1
B : T2
b_scale : T1
b_zero_point (optional) : T2
bias (optional) : T1
#### Outputs
Y : T1
#### Type Constraints
T1 : tensor(float)
Constrain input A, b_scale and output Y data type as float tensor.
T2 : tensor(int8), tensor(uint8)
Constrain input B data type to 8-bit integer tensor.
### **com.microsoft.DynamicTimeWarping**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs
input : F
#### Outputs
output : I
#### Type Constraints
F : tensor(float)
Constrain to float tensors.
I : tensor(int32)
Constrain to integer types.
### **com.microsoft.EPContext**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
embed_mode : int
1: ep_cache_context holds the context content itself. 0: ep_cache_context is the file path to the context content; the path is relative to this ONNX file. Default is 1.
ep_cache_context : string
payload of the execution provider context if embed_mode=1, or path to the context file if embed_mode=0.
ep_sdk_version : string
(Optional) SDK version used to convert the model.
hardware_architecture : string
(Optional) Hardware architecture.
main_context : int
Usually each EPContext node is associated with one graph partition. In some cases, such as QNN, a single EPContext contains all partitions. In that case, the node that carries ep_cache_context should set main_context=1; the other nodes set main_context=0 and skip ep_cache_context. The path is relative to this ONNX file. Default is 1.
max_size : int
Maximum size in the context. Usage depends on the EP.
notes : string
(Optional) Some notes for the model
onnx_model_filename : string
(Optional) Filename of the original ONNX model.
partition_name : string
(Optional) partitioned graph name.
source : string
(Optional) The source used to generate the engine/context cache file: ORT EP or native SDK toolchain.
#### Inputs (1 - ∞)
inputs (variadic, heterogeneous) : T
#### Outputs (1 - ∞)
outputs (variadic, heterogeneous) : T
#### Type Constraints
T : tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bool), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types.
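
The two `embed_mode` settings determine how `ep_cache_context` should be read. A small sketch of that decision follows; the helper name `resolve_ep_context` and the example file names are hypothetical, and only the "relative to the ONNX file" rule comes from the attribute description above.

```python
from pathlib import Path

def resolve_ep_context(model_path, ep_cache_context, embed_mode=1):
    """Return ("embedded", payload) or ("file", path resolved against the model's folder)."""
    if embed_mode == 1:
        # The attribute value is the context content itself.
        return ("embedded", ep_cache_context)
    # embed_mode == 0: the attribute value is a path relative to the ONNX file.
    return ("file", Path(model_path).parent / ep_cache_context)

kind, location = resolve_ep_context("models/net_ctx.onnx", "net_ctx.bin", embed_mode=0)
```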
### **com.microsoft.EmbedLayerNormalization**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
epsilon : float
The epsilon value to use to avoid division by zero.
mask_index_type : int
The mask index tensor type for shape inference (0: None, 1: 1D mask_index)
#### Inputs (7 - 9)
input_ids : T1
segment_ids (optional) : T1
word_embedding : T
position_embedding : T
segment_embedding (optional) : T
gamma : T
beta : T
mask (optional) : T1
position_ids (optional) : T1
#### Outputs (1 - 3)
output : T
mask_index (optional) : T1
embedding_sum (optional) : T
#### Type Constraints
T1 : tensor(int32)
Constrain input and output integer tensors types
T : tensor(float), tensor(float16)
Constrain input and output float tensors types.
### **com.microsoft.ExpandDims**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs
X : T
axis : tensor(int32)
#### Outputs
Y : T
#### Type Constraints
T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.
### **com.microsoft.FastGelu**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Inputs (1 - 2)
X : T
bias (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output types to float or half tensors.
### **com.microsoft.FusedConv**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
activation : string
activation_params : list of floats
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints
#### Inputs (2 - 4)
X : T
W : T
B (optional) : T
Z (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors
### **com.microsoft.FusedGemm**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
activation : string
activation_alpha : float
activation_beta : float
activation_gamma : float
alpha : float
Scalar multiplier for the product of input tensors A * B.
beta : float
Scalar multiplier for input tensor C.
transA : int
Whether A should be transposed
transB : int
Whether B should be transposed
#### Inputs (2 - 3)
A : T
B : T
C (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(uint32), tensor(uint64), tensor(int32), tensor(int64)
Constrain input and output types to float/int tensors.
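
The attributes above combine as a Gemm followed by an activation. A minimal sketch under stated assumptions: the activation is applied to `alpha * A' * B' + beta * C`, `alpha` and `beta` default to 1.0 as in the standard ONNX Gemm, `C` already has the output shape (no broadcasting), and only `"Relu"` is handled. The function name `fused_gemm` is illustrative.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def fused_gemm(a, b, c=None, alpha=1.0, beta=1.0, trans_a=0, trans_b=0, activation=None):
    """Compute activation(alpha * A' @ B' + beta * C) on nested lists."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    y = [[alpha * v for v in row] for row in matmul(a, b)]
    if c is not None:
        y = [[v + beta * cv for v, cv in zip(row, crow)] for row, crow in zip(y, c)]
    if activation == "Relu":
        y = [[max(0.0, v) for v in row] for row in y]
    elif activation is not None:
        raise ValueError(f"activation not covered by this sketch: {activation}")
    return y

a = [[1.0, 2.0]]
b = [[1.0], [-3.0]]
c = [[4.0]]
plain = fused_gemm(a, b, c, alpha=2.0)
fused = fused_gemm(a, b, c, alpha=2.0, activation="Relu")
```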
### **com.microsoft.FusedMatMul**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
alpha : float
Scalar multiplier for the product of the input tensors.
transA : int
Whether A should be transposed on the last two dimensions before doing multiplication
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication
transBatchA : int
Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
transBatchB : int
Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
#### Inputs
A : T
B : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
### **com.microsoft.FusedMatMulActivation**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
activation : string (required)
activation_alpha : float
activation_axis : int
activation_beta : float
activation_gamma : float
alpha : float
Scalar multiplier for the product of the input tensors.
transA : int
Whether A should be transposed on the last two dimensions before doing multiplication
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication
transBatchA : int
Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
transBatchB : int
Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
#### Inputs
A : T
B : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
### **com.microsoft.GatedRelativePositionBias**

#### Version

This version of the operator has been available since version 1 of the 'com.microsoft' operator set.

#### Attributes
num_heads : int (required)
Number of attention heads
#### Inputs (6 - 7)
query_layer : T
query_bias : T
rel_pos : T
weight : T
bias : T
eco_a : T
token_offset (optional) : M
#### Outputs
output : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain token_offset to integer types
### **com.microsoft.GatherBlockQuantized**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
block_size : int
(Optional) block size used for weight quantization. It needs to be a power of 2 and not smaller than 16.
gather_axis : int
(Optional) Which axis to gather on. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data).
quantize_axis : int
(Optional) Which axis to block-wise quantize. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data).
#### Inputs (3 - 4)
data : T1
indices : Tind
scales : T2
zero_points (optional) : T1
#### Outputs
output : T2
#### Type Constraints
T1 : tensor(int4), tensor(uint4)
Constrain quantized types.
T2 : tensor(float), tensor(float16), tensor(bfloat16)
Constrain dequantized types.
Tind : tensor(int32), tensor(int64)
Constrain indices to integer types.
### **com.microsoft.GatherND**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs
data : T
indices : Tind
#### Outputs
output : T
#### Type Constraints
T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input and output types to any tensor type.
Tind : tensor(int32), tensor(int64)
Constrain indices type to int32 or int64.
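As a rough illustration of how `indices` selects from `data`, the following pure-Python sketch handles the simple case where `indices` is a 2-D list of index tuples and `data` is a nested list. The function name is hypothetical and batch dimensions are not modeled.

```python
def gather_nd(data, indices):
    """For each index tuple, walk into `data` one axis at a time."""
    out = []
    for index in indices:
        item = data
        for i in index:
            item = item[i]
        out.append(item)
    return out


data = [[0, 1], [2, 3]]
elements = gather_nd(data, [[0, 0], [1, 1]])  # full-rank tuples pick scalars
rows = gather_nd(data, [[1], [0]])            # shorter tuples pick slices
```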
### **com.microsoft.Gelu**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
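For orientation, here is a minimal scalar sketch of the Gaussian Error Linear Unit, assuming the exact erf-based definition `0.5 * x * (1 + erf(x / sqrt(2)))` applied element-wise. The function name is illustrative only.

```python
import math


def gelu(x):
    """Exact (erf-based) GELU for a single float."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


values = [gelu(v) for v in (-1.0, 0.0, 1.0)]
```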
### **com.microsoft.GemmFastGelu**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs (2 - 3)
X : T
W : T
bias (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output types to float or half tensors.
### **com.microsoft.GemmFloat8**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
activation : string
Activation function, RELU or GELU or NONE (default).
alpha : float
Scalar multiplier for the product of input tensors A * B.
beta : float
Scalar multiplier for the product of input bias C.
dtype : int
Output Type. Same definition as attribute 'to' for operator Cast.
transA : int
Whether A should be transposed. Float 8 only supports transA=0.
transB : int
Whether B should be transposed. Float 8 only supports transB=1.
#### Inputs (2 - 6)
A : TA
B : TB
C (optional) : TC
scaleA (optional) : TS
scaleB (optional) : TS
scaleY (optional) : TS
#### Outputs
Y : TR
#### Type Constraints
TA : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float)
Constrain type to input A.
TB : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float)
Constrain type to input B.
TC : tensor(float16), tensor(bfloat16), tensor(float)
Constrain type to input C.
TR : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float)
Constrain type to result type.
TS : tensor(float)
Constrain type for all input scales (scaleA, scaleB, scaleY).
### **com.microsoft.GemmaRotaryEmbedding**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs
emb : U
q : T
q_rot : T
k : T
k_rot : T
#### Outputs
output1 : T
output2 : T
#### Type Constraints
T : tensor(float16)
Constrain input and output types to float16 tensors.
U : tensor(float)
Constrain input 0 type to float tensors
### **com.microsoft.GreedySearch**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
decoder : graph (required)
Decoder subgraph to execute in a loop.
decoder_start_token_id : int
The id of the token that indicates decoding starts.
encoder : graph
The subgraph for initialization of encoder and decoder. It will be called once before `decoder` subgraph.
eos_token_id : int (required)
The id of the end-of-sequence token
init_decoder : graph
The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
model_type : int
model type: 0 for decoder only like GPT-2; 1 for encoder decoder like Bart
no_repeat_ngram_size : int
Size of n-grams that must not repeat in the generated sequence.
pad_token_id : int (required)
The id of the padding token
vocab_size : int
Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
#### Inputs (2 - 7)
input_ids : I
max_length : I
min_length (optional) : I
repetition_penalty (optional) : T
vocab_mask (optional) : I
prefix_vocab_mask (optional) : I
attention_mask (optional) : I
#### Outputs
sequences : I
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors.
I : tensor(int32)
Constrain to integer types
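The roles of `max_length`, `eos_token_id` and `pad_token_id` can be pictured with a small greedy-decoding loop. This is only a sketch under stated assumptions: `next_token_logits` is a hypothetical stand-in for the `decoder` subgraph, all prompts are assumed to have the same length, and finished sequences are assumed to be filled with the padding token while the rest of the batch keeps decoding.

```python
def greedy_search(input_ids, next_token_logits, max_length, eos_token_id, pad_token_id):
    """Append the arg-max token to each sequence until EOS or max_length."""
    sequences = [list(ids) for ids in input_ids]
    finished = [False] * len(sequences)
    while len(sequences[0]) < max_length and not all(finished):
        for b, seq in enumerate(sequences):
            if finished[b]:
                seq.append(pad_token_id)  # keep the batch rectangular
                continue
            logits = next_token_logits(seq)
            token = max(range(len(logits)), key=logits.__getitem__)
            seq.append(token)
            if token == eos_token_id:
                finished[b] = True
    return sequences


def toy_logits(seq, vocab_size=4):
    """Hypothetical decoder: always prefers (last token + 1) mod vocab_size."""
    target = (seq[-1] + 1) % vocab_size
    return [1.0 if t == target else 0.0 for t in range(vocab_size)]


result = greedy_search([[0], [2]], toy_logits, max_length=5,
                       eos_token_id=3, pad_token_id=0)
```

Optional inputs such as `min_length`, `repetition_penalty` and the vocabulary masks would adjust the logits before the arg-max step and are not modeled here.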
### **com.microsoft.GridSample**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
align_corners : int
If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
mode : string
Three interpolation modes: bilinear (default), nearest and bicubic.
padding_mode : string
Support padding modes for outside grid values: `zeros`(default), `border`, `reflection`. zeros: use 0 for out-of-bound grid locations, border: use border values for out-of-bound grid locations, reflection: use values at locations reflected by the border for out-of-bound grid locations.
#### Inputs
X : T1
Grid : T1
#### Outputs
Y : T2
#### Type Constraints
T1 : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input types to all tensor types.
T2 : tensor(float16), tensor(float), tensor(double)
Constrain output types to float tensors.
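The effect of `align_corners` described above comes down to how a normalized grid coordinate in [-1, 1] maps to a pixel coordinate. A minimal sketch for one axis, with an illustrative function name:

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate along one axis."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


width = 4
centers = [unnormalize(c, width, align_corners=1) for c in (-1.0, 0.0, 1.0)]
edges = [unnormalize(c, width, align_corners=0) for c in (-1.0, 0.0, 1.0)]
```

With `align_corners=0` the extrema land half a pixel outside the first and last pixel centers, which is where `padding_mode` starts to matter.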
### **com.microsoft.GroupNorm**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
activation : int (required)
Activation after group normalization: 0 for None, 1 for SiLU
channels_last : int
1 if the input and output are in the NHWC layout, 0 if they are in the NCHW layout. Defaults to 1.
epsilon : float
The epsilon value to use to avoid division by zero
groups : int (required)
The number of groups of channels. It should be a divisor of the number of channels C
#### Inputs
X : T
gamma : M
beta : M
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float)
Constrain input X and output Y types to float tensors.
M : tensor(float16), tensor(float)
Constrain gamma and beta to float tensors.
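To make the interaction of `groups`, `epsilon`, `gamma`, `beta` and the optional SiLU activation concrete, here is a pure-Python sketch for a single sample in channels-last form. The function name, the flattened pixel-list layout and the `epsilon=1e-5` default are assumptions made for illustration.

```python
import math


def group_norm_nhwc(pixels, gamma, beta, groups, epsilon=1e-5, silu=False):
    """pixels: list of H*W pixels, each a list of C channel values (one sample)."""
    channels = len(gamma)
    if channels % groups != 0:
        raise ValueError("groups must divide the number of channels")
    per_group = channels // groups
    out = [[0.0] * channels for _ in pixels]
    for g in range(groups):
        idx = range(g * per_group, (g + 1) * per_group)
        # Statistics are shared by all pixels and channels of the group.
        values = [p[c] for p in pixels for c in idx]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + epsilon)
        for i, p in enumerate(pixels):
            for c in idx:
                y = (p[c] - mean) * inv_std * gamma[c] + beta[c]
                if silu:
                    y = y / (1.0 + math.exp(-y))  # SiLU: y * sigmoid(y)
                out[i][c] = y
    return out


x = [[1.0, 3.0, 10.0, 10.0]]  # one pixel, four channels
plain = group_norm_nhwc(x, [1.0] * 4, [0.0] * 4, groups=2)
activated = group_norm_nhwc(x, [1.0] * 4, [0.0] * 4, groups=2, silu=True)
```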
### **com.microsoft.GroupQueryAttention**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
do_rotary : int
Whether to use rotary position embedding. Default value is 0.
kv_num_heads : int (required)
Number of attention heads for k and v
local_window_size : int
Left window size for local attention (as in Mistral). Default value is -1, meaning unused.
num_heads : int (required)
Number of attention heads for q
rotary_interleaved : int
Rotate using interleaved pattern. Default value is 0 (False).
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
smooth_softmax : int
Use a smooth factor in softmax.
softcap : float
Softcap value for attention weights. Default value is 0.
#### Inputs (7 - 9)
query : T
key (optional) : T
value (optional) : T
past_key (optional) : T
past_value (optional) : T
seqlens_k : M
total_sequence_length : M
cos_cache (optional) : T
sin_cache (optional) : T
#### Outputs
output : T
present_key : T
present_value : T
#### Type Constraints
T : tensor(float16), tensor(bfloat16), tensor(float)
Constrain input and output to float tensors.
M : tensor(int32)
Constrain mask to int tensor.
### **com.microsoft.Inverse**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
### **com.microsoft.Irfft**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
normalized : int
must be 0, normalization currently not supported
onesided : int
must be 1, only one sided FFTs supported
signal_ndim : int (required)
number of dimensions comprising the signal
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float), tensor(double), tensor(float16)
Constrain input and output types to float or half tensors.
### **com.microsoft.LongformerAttention**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
num_heads : int (required)
Number of attention heads
window : int (required)
One-sided attention window length W, that is, half of the total window length.
#### Inputs
input : T
weight : T
bias : T
mask : T
global_weight : T
global_bias : T
global : G
#### Outputs
output : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
G : tensor(int32)
Constrain to integer types
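The `window` attribute is one-sided, so a local token can be pictured as attending to `window` positions on each side of itself. The sketch below only illustrates that span for a non-global token. The function name is hypothetical, and global attention (driven by the `global` input) is not modeled.

```python
def local_attention_span(position, sequence_length, window):
    """Positions visible to a non-global token: `window` on each side plus itself."""
    start = max(0, position - window)
    stop = min(sequence_length, position + window + 1)
    return list(range(start, stop))


first = local_attention_span(0, 8, 2)
middle = local_attention_span(4, 8, 2)
last = local_attention_span(7, 8, 2)
```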
### **com.microsoft.MatMulBnb4**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
K : int (required)
size of each input feature
N : int (required)
size of each output feature
block_size : int (required)
Group size used for weight quantization. It must be a power of 2 and not smaller than 16.
quant_type : int (required)
quantization data type. 0 for FP4, 1 for NF4.
training_mode : int
Indicates whether the op runs in training mode. Defaults to false.
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication. Defaults to 1.
#### Inputs
A : T1
B : T2
absmax : T1
#### Outputs
Y : T1
#### Type Constraints
T1 : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output types to float/half_float/brain_float tensors.
T2 : tensor(uint8)
Constrain quantized weight types to uint8.
### **com.microsoft.MatMulFpQ4**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
blk_quant_type : int
Quantization type
#### Inputs
A : T1
B : T2
B_shape : T3
#### Outputs
Y : T1
#### Type Constraints
T1 : tensor(float)
Constrain input matrix data types as single precision float tensor
T2 : tensor(uint8)
Constrain input B data types as data blob
T3 : tensor(int64)
Constrain the shape of B to an int64 tensor.
### **com.microsoft.MatMulInteger16**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs
A : T1
B : T2
#### Outputs
Y : T3
#### Type Constraints
T1 : tensor(int16), tensor(uint16)
Constrain input A data types as 16-bit integer tensor
T2 : tensor(int16), tensor(uint16)
Constrain input B data types as 16-bit integer tensor
T3 : tensor(int32), tensor(uint32)
Constrain output Y data types as 32-bit integer tensor. T3 must be tensor(uint32) when both T1 and T2 are tensor(uint16), or must be tensor(int32) when either T1 or T2 is tensor(int16).
### **com.microsoft.MatMulIntegerToFloat**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs (4 - 7)
A : T1
B : T2
a_scale : T3
b_scale : T3
a_zero_point (optional) : T1
b_zero_point (optional) : T2
bias (optional) : T3
#### Outputs
Y : T3
#### Type Constraints
T1 : tensor(int8), tensor(uint8)
Constrain input A data type to 8-bit integer tensor.
T2 : tensor(int8), tensor(uint8)
Constrain input B data type to 8-bit integer tensor.
T3 : tensor(float), tensor(float16)
Constrain input a_scale, b_scale and output Y data type as float tensor.
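The inputs listed above suggest the usual dequantized product, `Y = (A - a_zero_point) @ (B - b_zero_point) * a_scale * b_scale + bias`. The following pure-Python sketch assumes that formula with per-tensor (scalar) scales and zero points. The function name is illustrative and per-column quantization is not covered.

```python
def matmul_integer_to_float(a, b, a_scale, b_scale,
                            a_zero_point=0, b_zero_point=0, bias=None):
    """Integer matmul on zero-point-adjusted values, then scale to float."""
    inner, n = len(b), len(b[0])
    out = []
    for row in a:
        out_row = []
        for j in range(n):
            acc = sum((row[k] - a_zero_point) * (b[k][j] - b_zero_point)
                      for k in range(inner))  # exact integer accumulation
            y = acc * a_scale * b_scale
            if bias is not None:
                y += bias[j]
            out_row.append(y)
        out.append(out_row)
    return out


A = [[130, 128]]      # uint8 values with zero point 128 -> [2, 0]
B = [[3], [5]]
Y = matmul_integer_to_float(A, B, a_scale=0.5, b_scale=0.25,
                            a_zero_point=128, bias=[1.0])
```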
### **com.microsoft.MatMulNBits**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
K : int (required)
size of each input feature
N : int (required)
size of each output feature
accuracy_level : int
The minimum accuracy level of input A, can be: 0(unset), 1(fp32), 2(fp16), 3(bf16), or 4(int8) (default unset). It is used to control how input A is quantized or downcast internally while doing computation, for example: 0 means input A will not be quantized or downcast while doing computation. 4 means input A can be quantized with the same block_size to int8 internally from type T1.
bits : int (required)
number of bits used for weight quantization (default 4)
block_size : int (required)
Group size used for weight quantization (default 128). It must be a power of 2 and not smaller than 16.
#### Inputs (3 - 6)
A : T1
B : T2
scales : T1
zero_points (optional) : T3
g_idx (optional) : T4
bias (optional) : T1
#### Outputs
Y : T1
#### Type Constraints
T1 : tensor(float), tensor(float16)
Constrain input and output types to float/half_float tensors.
T2 : tensor(uint8), tensor(int32)
Constrain quantized weight types to uint8/int32.
T3 : tensor(uint8), tensor(int32), tensor(float16), tensor(float)
Constrain quantized zero point types to uint8/int32/float16/float.
T4 : tensor(int32)
the index tensor.
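To show how `bits`, `block_size` and `scales` fit together, here is a sketch of block-wise dequantization for one row of 4-bit weights. Several details are assumptions rather than facts taken from this page: two 4-bit values per byte with the low nibble first, one scale per block, and a zero point of 8 when `zero_points` is not supplied. The function name is hypothetical.

```python
def dequantize_4bit_row(packed, scales, block_size, zero_point=8):
    """Unpack 4-bit weights and apply (q - zero_point) * scale per block."""
    values = []
    for byte in packed:
        values.append(byte & 0x0F)  # low nibble first (assumed layout)
        values.append(byte >> 4)
    if len(values) != block_size * len(scales):
        raise ValueError("expected one scale per block of weights")
    return [(q - zero_point) * scales[i // block_size]
            for i, q in enumerate(values)]


# Two blocks of 16 weights: bytes 0x98 hold (8, 9), bytes 0x0F hold (15, 0).
packed_row = bytes([0x98] * 8 + [0x0F] * 8)
weights = dequantize_4bit_row(packed_row, scales=[0.5, 2.0], block_size=16)
```

The dequantized rows would then be multiplied with `A` as in an ordinary matrix product; `g_idx` and per-block zero points are outside the scope of this sketch.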
### **com.microsoft.MaxpoolWithMask**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
auto_pad : string
kernel_shape : list of ints
pads : list of ints
storage_order : int
strides : list of ints
#### Inputs
X : T
M : tensor(int32)
#### Outputs
Y : T
#### Type Constraints
T : tensor(float)
Constrain input0 and output types to float tensors
### **com.microsoft.MoE**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
activation_type : string
Activation function to use. Choose from relu, gelu, silu and identity. Default is relu
k : int
Number of top experts to select from expert pool
normalize_routing_weights : int
Whether to normalize routing weights
use_sparse_mixer : int
Whether to use sparse mixer
#### Inputs (5 - 8)
input : T
router_probs : T
fc1_experts_weights : T
fc1_experts_bias (optional) : T
fc2_experts_weights : T
fc2_experts_bias (optional) : T
fc3_experts_weights (optional) : T
fc3_experts_bias (optional) : T
#### Outputs
output : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float or float16 tensors.
### **com.microsoft.MulInteger**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs (3 - 4)
A : T
A_zero_point (optional) : T
B : T
B_zero_point (optional) : T
#### Outputs
C : T1
#### Type Constraints
T : tensor(uint8), tensor(int8)
Constrain input types to 8 bit signed and unsigned tensors.
T1 : tensor(int32)
Constrain output types to 32 bit tensors.
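Given 8-bit inputs with optional zero points and a 32-bit output, the computation can be sketched as an element-wise product of zero-point-adjusted values. The function name is illustrative, zero points are assumed to be scalars defaulting to 0, and broadcasting is not modeled.

```python
def mul_integer(a, b, a_zero_point=0, b_zero_point=0):
    """Element-wise (A - A_zero_point) * (B - B_zero_point) on equal-length lists."""
    if len(a) != len(b):
        raise ValueError("this sketch expects inputs of equal length")
    return [(x - a_zero_point) * (y - b_zero_point) for x, y in zip(a, b)]


C = mul_integer([10, 0], [3, 4], a_zero_point=5, b_zero_point=1)
```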
### **com.microsoft.MultiHeadAttention**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional : int
Whether every token can only attend to previous tokens. Default value is 0.
#### Inputs (1 - 8)
query : T
key (optional) : T
value (optional) : T
bias (optional) : T
key_padding_mask (optional) : M
attention_bias (optional) : T
past_key (optional) : T
past_value (optional) : T
#### Outputs (1 - 3)
output : T
present_key (optional) : T
present_value (optional) : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output to float tensors.
M : tensor(int32)
Constrain mask to integer types
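The `scale` default of 1/sqrt(head_size) and the `unidirectional` flag can be illustrated with scaled dot-product attention for a single head. This is a sketch under assumptions: no bias, padding mask, attention bias or past state is modeled, the function name is hypothetical, and masked positions are simply skipped rather than filled with `mask_filter_value`.

```python
import math


def attention_single_head(q, k, v, scale=None, unidirectional=False):
    """softmax(scale * Q K^T) V for one head; inputs are lists of row vectors."""
    head_size = len(q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(head_size)
    out = []
    for i, q_row in enumerate(q):
        # With unidirectional=True, token i only sees keys 0..i.
        limit = i + 1 if unidirectional else len(k)
        scores = [scale * sum(a * b for a, b in zip(q_row, k[j]))
                  for j in range(limit)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w / total * v[j][d] for j, w in enumerate(weights))
                    for d in range(len(v[0]))])
    return out


q = [[0.0, 0.0], [0.0, 0.0]]  # zero queries give uniform attention weights
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [3.0, 2.0]]
full = attention_single_head(q, k, v)
causal = attention_single_head(q, k, v, unidirectional=True)
```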
### **com.microsoft.MurmurHash3**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
positive : int
If value is 1, output type is uint32_t, else int32_t. Default value is 1.
seed : int
Seed for the hashing algorithm, unsigned 32-bit integer, default to 0.
#### Inputs
X : T1
#### Outputs
Y : T2
#### Type Constraints
T1 : tensor(uint32), tensor(int32), tensor(uint64), tensor(int64), tensor(float), tensor(double), tensor(string)
Constrain input type to unsigned or signed 32-bit or 64-bit integer, float, double, or string tensors. Strings should be UTF-8 encoded if they contain Unicode.
T2 : tensor(uint32), tensor(int32)
Constrain output type to unsigned and signed 32-bit integer tensor.
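For reference, the 32-bit x86 variant of MurmurHash3 can be written in a few lines of pure Python. The sketch below hashes a byte string and mirrors the `seed` and `positive` attributes. How the operator serializes numeric inputs into bytes is not described here, so feeding it anything other than raw bytes is an assumption.

```python
def murmurhash3_32(data: bytes, seed: int = 0, positive: bool = True) -> int:
    """MurmurHash3 x86_32 over `data`; signed result when positive is False."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x, r):
        return ((x << r) | (x >> (32 - r))) & mask

    def mix(k):
        k = (k * c1) & mask
        k = rotl(k, 15)
        return (k * c2) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    for i in range(n_blocks):  # body: 4-byte little-endian blocks
        h ^= mix(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask
    tail = data[4 * n_blocks:]
    if tail:  # remaining 1 to 3 bytes
        h ^= mix(int.from_bytes(tail, "little"))
    h ^= len(data) & mask
    h ^= h >> 16  # final avalanche
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    if not positive and h >= 0x80000000:
        h -= 1 << 32  # reinterpret as int32
    return h


unsigned_hash = murmurhash3_32("test".encode("utf-8"), seed=0)
signed_hash = murmurhash3_32("test".encode("utf-8"), seed=0, positive=False)
```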
### **com.microsoft.NGramRepeatBlock**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
ngram_size : int (required)
The NGram size.
#### Inputs
input_ids : Tid
scores : T
#### Outputs
scores_out : T
#### Type Constraints
Tid : tensor(int64)
Constrain indices to integer types
T : tensor(float)
Constrain scores input and output types to float tensors.
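A plain-Python sketch of n-gram blocking may help when reading `ngram_size`: for each sequence, any token that would complete an n-gram already present in `input_ids` has its score pushed to negative infinity. The function name and the use of `-inf` as the blocking value are assumptions for illustration.

```python
def ngram_repeat_block(input_ids, scores, ngram_size):
    """Return a copy of `scores` with tokens banned that would repeat an n-gram."""
    out = [row[:] for row in scores]
    for seq, row in zip(input_ids, out):
        if len(seq) < ngram_size:
            continue
        # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
        prefix = seq[len(seq) - (ngram_size - 1):] if ngram_size > 1 else []
        for start in range(len(seq) - ngram_size + 1):
            if seq[start:start + ngram_size - 1] == prefix:
                row[seq[start + ngram_size - 1]] = float("-inf")
    return out


ids = [[1, 2, 3, 1, 2]]
blocked = ngram_repeat_block(ids, [[0.0] * 5], ngram_size=3)
```

In the example, the sequence ends in `1, 2` and the trigram `1, 2, 3` has already occurred, so token 3 is the only one blocked.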
### **com.microsoft.NhwcConv**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
auto_pad : string
dilations : list of ints
Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group : int
number of groups input channels and output channels are divided into.
kernel_shape : list of ints
The shape of the convolution kernel. If not present, should be inferred from input W.
pads : list of ints
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
#### Inputs (2 - 3)
X : T
W : T
B (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
### **com.microsoft.NhwcFusedConv**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
activation : string
activation_params : list of floats
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints
#### Inputs (2 - 4)
X : T
W : T
B (optional) : T
Z (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16)
Constrain input and output types to float tensors
### **com.microsoft.NhwcMaxPool**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
auto_pad : string
ceil_mode : int
dilations : list of ints
kernel_shape : list of ints (required)
pads : list of ints
strides : list of ints
#### Inputs
x : T
#### Outputs
y : T
#### Type Constraints
T : tensor(int8), tensor(uint8)
### **com.microsoft.PackedAttention**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
num_heads : int (required)
Number of attention heads
qkv_hidden_sizes : list of ints
Hidden dimension of Q, K, V: hidden_size, hidden_size and v_hidden_size
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
#### Inputs (5 - 6)
input : T
weights : T
bias : T
token_offset : M
cumulative_sequence_length : M
attention_bias (optional) : T
#### Outputs
output : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain mask index to integer types
### **com.microsoft.PackedMultiHeadAttention**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
#### Inputs (6 - 7)
query : T
key (optional) : T
value (optional) : T
bias (optional) : T
token_offset : M
cumulative_sequence_length : M
attention_bias (optional) : T
#### Outputs
output : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output to float tensors.
M : tensor(int32)
Constrain mask, offset and sequence length to integer types
### **com.microsoft.Pad**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
mode : string
Three modes: `constant`(default) - pads with a given constant value, `reflect` - pads with the reflection of the vector mirrored on the first and last values of the vector along each axis, `edge` - pads with the edge values of array
#### Inputs (2 - 3)
data : T
pads : tensor(int64)
value (optional) : T
#### Outputs
output : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
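The three `mode` values can be compared on a single axis with a small sketch. The function name is illustrative. `begin` and `end` stand for the two entries of `pads` that belong to one axis, and `value` is only used in `constant` mode.

```python
def pad_1d(data, begin, end, mode="constant", value=0.0):
    """Pad one axis of a list using the constant, reflect or edge rule."""
    n = len(data)
    if mode == "constant":
        return [value] * begin + list(data) + [value] * end
    if mode == "edge":
        return [data[0]] * begin + list(data) + [data[-1]] * end
    if mode == "reflect":
        if begin >= n or end >= n:
            raise ValueError("reflect padding must be smaller than the axis length")
        left = [data[i] for i in range(begin, 0, -1)]         # mirror around data[0]
        right = [data[n - 1 - i] for i in range(1, end + 1)]  # mirror around data[-1]
        return left + list(data) + right
    raise ValueError("unknown mode: " + mode)


x = [1.0, 2.0, 3.0, 4.0]
constant = pad_1d(x, 2, 1, mode="constant", value=9.0)
reflect = pad_1d(x, 2, 2, mode="reflect")
edge = pad_1d(x, 1, 2, mode="edge")
```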
### **com.microsoft.QAttention**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
do_rotary : int
Whether to use rotary position embedding. Default value is 0.
mask_filter_value : float
The value to be filled in the attention mask. Default value is -10000.0f
num_heads : int (required)
Number of attention heads
past_present_share_buffer : int
The corresponding past and present are the same tensor, with shape (2, batch_size, num_heads, max_sequence_length, head_size)
scale : float
Custom scale will be used if specified. Default value is 1/sqrt(head_size)
unidirectional : int
Whether every token can only attend to previous tokens. Default value is 0.
#### Inputs (5 - 9)
input : T1
weight : T2
bias : T3
input_scale : T3
weight_scale : T3
mask_index (optional) : T4
input_zero_point (optional) : T1
weight_zero_point (optional) : T2
past (optional) : T3
#### Outputs (1 - 2)
output : T3
present (optional) : T3
#### Type Constraints
T1 : tensor(int8), tensor(uint8)
Constrain input and output types to int8 tensors.
T2 : tensor(int8), tensor(uint8)
Constrain input and output types to int8 tensors.
T3 : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
T4 : tensor(int32)
Constrain mask index to integer types
### **com.microsoft.QGemm**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
alpha : float
Scalar multiplier for the product of input tensors A * B.
transA : int
Whether A should be transposed
transB : int
Whether B should be transposed
#### Inputs (6 - 9)
A : TA
a_scale : T
a_zero_point : TA
B : TB
b_scale : T
b_zero_point : TB
C (optional) : TC
y_scale (optional) : T
y_zero_point (optional) : TYZ
#### Outputs
Y : TY
#### Type Constraints
T : tensor(float)
Constrain scale types to float tensors.
TA : tensor(uint8), tensor(int8)
Constrain input A and its zero point types to 8 bit tensors.
TB : tensor(uint8), tensor(int8)
Constrain input B and its zero point types to 8 bit tensors.
TC : tensor(int32)
Constrain input C to 32 bit integer tensors.
TYZ : tensor(uint8), tensor(int8)
Constrain output zero point types to 8 bit tensors.
TY : tensor(float), tensor(uint8), tensor(int8)
Constrain output type to float32 or 8 bit tensors.
### **com.microsoft.QLinearAdd**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs (7 - 8)
A : T
A_scale : tensor(float)
A_zero_point (optional) : T
B : T
B_scale : tensor(float)
B_zero_point (optional) : T
C_scale : tensor(float)
C_zero_point (optional) : T
#### Outputs
C : T
#### Type Constraints
T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit signed and unsigned tensors.
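Quantized element-wise operators such as this one can be read as dequantize, compute, requantize. The sketch below assumes `C = saturate(round((A - A_zero_point) * A_scale + (B - B_zero_point) * B_scale) / C_scale) + C_zero_point)` for uint8 data with scalar scales and zero points. The helper names and the round-half-to-even behavior (Python's built-in `round`) are assumptions.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to the uint8 range with saturation."""
    q = round(x / scale) + zero_point  # Python rounds halves to even
    return max(qmin, min(qmax, q))


def qlinear_add(a, a_scale, a_zp, b, b_scale, b_zp, c_scale, c_zp):
    """Dequantize both inputs, add in real space, requantize the sum."""
    out = []
    for qa, qb in zip(a, b):
        real = (qa - a_zp) * a_scale + (qb - b_zp) * b_scale
        out.append(quantize(real, c_scale, c_zp))
    return out


C = qlinear_add([10, 200], 0.5, 0, [4, 100], 0.25, 0, c_scale=0.5, c_zp=10)
```

The second element saturates at 255 because the real sum (125.0) does not fit the output range implied by `C_scale` and `C_zero_point`.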
### **com.microsoft.QLinearAveragePool**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
auto_pad : string
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER means the input is padded so that the output spatial size matches the input. If the total padding is odd, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding.
ceil_mode : int
Whether to use ceil or floor (default) to compute the output shape.
channels_last : int
Whether the operator works on the NHWC layout. Defaults to 0 (not NHWC).
count_include_pad : int
Whether to include pad pixels when calculating values for the edges. Default is 0, which excludes pad pixels.
kernel_shape : list of ints (required)
The size of the kernel along each axis.
pads : list of ints
Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end of the corresponding axis. The `pads` format should be as follows: [x1_begin, x2_begin...x1_end, x2_end,...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
#### Inputs (4 - 5)
X : T
x_scale : tensor(float)
x_zero_point (optional) : T
y_scale : tensor(float)
y_zero_point (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit tensors.
### **com.microsoft.QLinearConcat**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
axis : int (required)
Which axis to concat on
#### Inputs (3 - ∞)
Y_scale : TF
Y_zero_point : T8
inputs (variadic, heterogeneous) : TV
#### Outputs
Y : T8
#### Type Constraints
T8 : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit signed and unsigned tensors.
TF : tensor(float)
Constrain scale types to any float tensor type.
TV : tensor(uint8), tensor(int8), tensor(float)
Sequence of (Tensor, Scale, ZeroPoint) tuples. The type is sequence of (T8, TF, T8).
### **com.microsoft.QLinearConv**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
auto_pad : string
channels_last : int
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints
#### Inputs (8 - 9)
x : T1
x_scale : tensor(float)
x_zero_point : T1
w : T2
w_scale : tensor(float)
w_zero_point : T2
y_scale : tensor(float)
y_zero_point : T3
B (optional) : T4
#### Outputs
y : T3
#### Type Constraints
T1 : tensor(int8), tensor(uint8)
T2 : tensor(int8), tensor(uint8)
T3 : tensor(int8), tensor(uint8)
T4 : tensor(int32)
### **com.microsoft.QLinearGlobalAveragePool**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
channels_last : int
#### Inputs
X : T
x_scale : tensor(float)
x_zero_point : T
y_scale : tensor(float)
y_zero_point : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(uint8), tensor(int8)
Constrain input and output types to signed/unsigned int8 tensors.
### **com.microsoft.QLinearLeakyRelu**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
alpha : float
Coefficient of leakage.
#### Inputs (4 - 5)
X : T
X_scale : tensor(float)
X_zero_point (optional) : T
Y_scale : tensor(float)
Y_zero_point (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit tensors.
### **com.microsoft.QLinearMul**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs (7 - 8)
A : T
A_scale : tensor(float)
A_zero_point (optional) : T
B : T
B_scale : tensor(float)
B_zero_point (optional) : T
C_scale : tensor(float)
C_zero_point (optional) : T
#### Outputs
C : T
#### Type Constraints
T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit signed and unsigned tensors.
### **com.microsoft.QLinearReduceMean**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
axes : list of ints (required)
A list of integers, along which to reduce. The default is to reduce over all the dimensions of the input tensor.
keepdims : int (required)
Whether to keep the reduced dimension. The default, 1, keeps the reduced dimension.
#### Inputs (4 - 5)
data : T
data_scale : tensor(float)
data_zero_point (optional) : T
reduced_scale : tensor(float)
reduced_zero_point (optional) : T
#### Outputs
reduced : T
#### Type Constraints
T : tensor(uint8), tensor(int8)
Constrain input types to 8 bit signed and unsigned tensors.
### **com.microsoft.QLinearSigmoid**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs (4 - 5)
X : T
X_scale : tensor(float)
X_zero_point (optional) : T
Y_scale : tensor(float)
Y_zero_point (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit tensors.
### **com.microsoft.QLinearSoftmax**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
axis : int
Apply softmax to elements along dimension `axis`, or to all dimensions from `axis` onward, depending on the opset version.
opset : int (required)
opset version of corresponding SoftMax.
#### Inputs
X : T
X_scale : tensor(float)
x_zero_point (optional) : T
y_scale : tensor(float)
y_zero_point : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(uint8), tensor(int8)
Constrain input and output types to signed/unsigned int8 tensors.
### **com.microsoft.QLinearWhere**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Inputs
condition : B
X : T
x_scale : TF
x_zero_point : T
Y : T
y_scale : TF
y_zero_point : T
z_scale : TF
z_zero_point : T
#### Outputs
Z : T
#### Type Constraints
B : tensor(bool)
Constrain condition to boolean tensors.
TF : tensor(float)
Constrain scale types to any float tensor type.
T : tensor(uint8), tensor(int8)
Constrain input and output types to 8 bit signed and unsigned tensors.
### **com.microsoft.QMoE**
#### Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
#### Attributes
activation_type : string
Activation function to use. Choose from relu, gelu, silu and identity. Default is relu
expert_weight_bits : int
Number of bits used in quantized weights. Default is 4 bits
k : int
Number of top experts to select from expert pool
normalize_routing_weights : int
Whether to normalize routing weights
use_sparse_mixer : int
Whether to use sparse mixer
#### Inputs (7 - 11)
input : T
router_probs : T
fc1_experts_weights : T1
fc1_scales : T
fc1_experts_bias (optional) : T
fc2_experts_weights : T1
fc2_scales : T
fc2_experts_bias (optional) : T
fc3_experts_weights (optional) : T1
fc3_scales (optional) : T
fc3_experts_bias (optional) : T
#### Outputs
output : T
#### Type Constraints
T : tensor(float16)
Constrain input and output types to float16 tensors.
T1 : tensor(uint8)
Constrain weights type to uint8 tensors.
### **com.microsoft.QOrderedAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
num_heads : int (required)
Number of attention heads
order_input : int (required)
cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
order_output : int (required)
cublasLt order of global bias
order_weight : int (required)
cublasLt order of weight matrix
qkv_hidden_sizes : list of ints
Hidden layer sizes of Q, K, V paths in Attention
unidirectional : int
Whether every token can only attend to previous tokens. Default value is 0.
#### Inputs (17 - 20)
input : Q
scale_input : S
scale_Q_gemm : S
scale_K_gemm : S
scale_V_gemm : S
Q_weight : Q
K_weight : Q
V_weight : Q
scale_Q_weight : S
scale_K_weight : S
scale_V_weight : S
Q_bias : S
K_bias : S
V_bias : S
scale_QKT_gemm (optional) : S
scale_QKT_softmax (optional) : S
scale_values_gemm : S
mask_index (optional) : G
past (optional) : Q
attention_bias (optional) : S
#### Outputs
output : Q
#### Type Constraints
Q : tensor(int8)
Constrain input and output types to int8 tensors.
S : tensor(float)
Constrain scales to float32 tensors.
G : tensor(int32)
Constrain to integer types
### **com.microsoft.QOrderedGelu** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
order_X : int
cublasLt order of input X. Optional. See the schema of QuantizeWithOrder for order definition.
order_Y : int
cublasLt order of matrix Y, must be same as order_X if specified together. Optional.
#### Inputs
X : Q
scale_X : S
scale_Y : S
#### Outputs
Y : Q
#### Type Constraints
Q : tensor(int8)
Constrain input and output types to int8 tensors.
S : tensor(float)
Constrain scales to float32
### **com.microsoft.QOrderedLayerNormalization** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
axis : int
The first normalization dimension: normalization will be performed along dimensions axis : rank(inputs).
epsilon : float
The epsilon value to use to avoid division by zero.
order_X : int
cublasLt order of input X. Default is ROW MAJOR. See the schema of QuantizeWithOrder for order definition.
order_Y : int
cublasLt order of matrix Y, must be same as order_X. Default is ROW MAJOR.
#### Inputs
X : Q
scale_X : S
scale : F
B (optional) : F
scale_Y : S
#### Outputs
Y : Q
#### Type Constraints
F : tensor(float16), tensor(float)
Constrain gamma and bias to float16 or float tensors. float may give better precision; float16 runs faster.
S : tensor(float)
quantization scale must be float tensors.
Q : tensor(int8)
quantization tensor must be int8 tensors.
### **com.microsoft.QOrderedLongformerAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
num_heads : int (required)
Number of attention heads
order_global_weight : int (required)
cublasLt order of weight matrix
order_input : int (required)
cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
order_output : int (required)
cublasLt order of global bias
order_weight : int (required)
cublasLt order of weight matrix
window : int (required)
One-sided attention window length W, or half of the total window length
#### Inputs
input : Q
scale_input : S
weight : Q
scale_weight : S
bias : S
scale_bias : S
scale_qkv_gemm : S
mask : F
global_weight : Q
scale_global_weight : S
global_bias : S
scale_global_gemm : S
global : G
scale_output : S
#### Outputs
output : Q
#### Type Constraints
Q : tensor(int8)
Constrain input and output types to int8 tensors.
S : tensor(float)
Constrain scales to float32 tensors.
G : tensor(int32)
Constrain to integer types
F : tensor(float16)
Be compatible with float version.
### **com.microsoft.QOrderedMatMul** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
order_A : int (required)
cublasLt order of matrix A. See the schema of QuantizeWithOrder for order definition.
order_B : int (required)
cublasLt order of matrix B
order_Y : int (required)
cublasLt order of matrix Y and optional matrix C
#### Inputs (5 - 8)
A : Q
scale_A : S
B : Q
scale_B : S
scale_Y : S
bias (optional) : S
C (optional) : Q
scale_C (optional) : S
#### Outputs
Y : Q
#### Type Constraints
Q : tensor(int8)
Constrain input and output types to int8 tensors.
S : tensor(float)
Constrain bias and scales to float32
### **com.microsoft.QuantizeBFP** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
bfp_type : int (required)
The type of BFP - must match with the BFPType enum
block_dim : int
Each bounding box spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension.
#### Inputs
x : T1
#### Outputs
y : T2
shape : T3
strides : T3
#### Type Constraints
T1 : tensor(float), tensor(float16), tensor(bfloat16)
Constrain the input to float and bfloat.
T2 : tensor(uint8)
Constrain y to uint8.
T3 : tensor(int64)
Constrain shape and strides to int64.
### **com.microsoft.QuantizeLinear** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
axis : int
The axis along which the same quantization parameters are applied. It's optional. If it's not specified, it means per-tensor quantization and input 'y_scale' and 'y_zero_point' must be scalars. If it's specified, it means per 'axis' quantization and input 'y_scale' and 'y_zero_point' must be 1-D tensors.
#### Inputs (2 - 3)
x : T1
y_scale : T1
y_zero_point (optional) : T2
#### Outputs
y : T2
#### Type Constraints
T1 : tensor(float16), tensor(float)
Constrain 'x', 'y_scale' to float tensors.
T2 : tensor(int8), tensor(uint8), tensor(int16), tensor(uint16), tensor(int4), tensor(uint4)
Constrain 'y_zero_point' and 'y' to 4-bit, 8-bit and 16-bit integer tensors.
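The difference between per-tensor and per-axis quantization can be sketched as follows. This is a minimal illustration, assuming the standard linear quantization formula y = saturate(round(x / y_scale) + y_zero_point) with round-half-to-even and a uint8 output range; the function names are hypothetical.

```python
def quantize_linear(x, y_scale, y_zero_point=0, qmin=0, qmax=255):
    """Per-tensor quantization: one scalar scale and zero point for all values."""
    return [max(qmin, min(qmax, round(v / y_scale) + y_zero_point)) for v in x]

def quantize_linear_per_axis(rows, y_scales, y_zero_points, qmin=0, qmax=255):
    """Per-axis quantization along axis 0 of a 2-D list.

    y_scales and y_zero_points are 1-D, with one entry per row.
    """
    return [quantize_linear(row, s, zp, qmin, qmax)
            for row, s, zp in zip(rows, y_scales, y_zero_points)]

per_tensor = quantize_linear([0.0, 1.0, 2.5, 1000.0], y_scale=0.5, y_zero_point=10)
per_axis = quantize_linear_per_axis([[1.0, 2.0], [1.0, 2.0]], [1.0, 0.5], [0, 0])
```

Values outside the representable range saturate, which is why 1000.0 maps to the upper bound of the output type.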
### **com.microsoft.QuantizeWithOrder** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
order_input : int (required)
cublasLt order of input matrix. ORDER_COL = 0, ORDER_ROW = 1, ORDER_COL32 = 2, ORDER_COL4_4R2_8C = 3, ORDER_COL32_2R_4R4 = 4. Please refer to https://docs.nvidia.com/cuda/cublas/index.html#cublasLtOrder_t for their meaning.
order_output : int (required)
cublasLt order of output matrix.
#### Inputs
input : F
scale_input : S
#### Outputs
output : Q
#### Type Constraints
Q : tensor(int8)
Constrain input and output types to int8 tensors.
F : tensor(float16), tensor(float)
Constrain to float types
S : tensor(float)
Constrain Scale to float32 types
### **com.microsoft.QuickGelu** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
alpha : float
Alpha value.
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
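The schema above only names the `alpha` attribute. A minimal sketch follows, assuming the commonly used quick-GELU approximation Y = X * sigmoid(alpha * X) and a default alpha of 1.702; both the formula and the default are assumptions not stated in this section.

```python
import math

def quick_gelu(x, alpha=1.702):
    """Element-wise x * sigmoid(alpha * x), written as x / (1 + exp(-alpha * x))."""
    return [v / (1.0 + math.exp(-alpha * v)) for v in x]

y = quick_gelu([-1.0, 0.0, 1.0])
```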
### **com.microsoft.Range** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs (2 - 3)
start : T
limit : T
delta (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float), tensor(double), tensor(int16), tensor(int32), tensor(int64)
Constrain input and output types.
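A small sketch of the expected output, assuming this operator follows the same semantics as the standard ONNX Range operator (similar to `numpy.arange`) and that `delta` defaults to 1 when omitted; the helper name `range_ref` is hypothetical.

```python
import math

def range_ref(start, limit, delta=1):
    """Values start, start + delta, ... up to but excluding limit."""
    count = max(math.ceil((limit - start) / delta), 0)
    return [start + i * delta for i in range(count)]

ascending = range_ref(0, 5)
descending = range_ref(10, 0, -3)
```

An empty result is produced when `limit` cannot be reached from `start` in the direction of `delta`.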
### **com.microsoft.ReduceSumInteger** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
axes : list of ints (required)
A list of integers, along which to reduce. The default is to reduce over all the dimensions of the input tensor.
keepdims : int (required)
Keep the reduced dimension or not. The default of 1 means the reduced dimension is kept.
#### Inputs
data : T1
#### Outputs
reduced : T2
#### Type Constraints
T1 : tensor(int8), tensor(uint8)
Constrain input type to 8-bit integer tensor.
T2 : tensor(int32), tensor(uint32)
Constrain output data type to 32-bit integer tensor. T2 must be tensor(uint32) when T1 is tensor(uint8), or must be tensor(int32) when T1 is tensor(int8).
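The reason for the wider output type can be seen in a short sketch: sums of 8-bit values quickly exceed the 8-bit range. The example below reduces a 2-D list along one axis; the function name is hypothetical and only axes 0 and 1 of a 2-D input are handled.

```python
def reduce_sum_integer(data, axis, keepdims=1):
    """Sum a 2-D list of 8-bit integers along axis 0 or 1."""
    if axis == 0:
        sums = [sum(col) for col in zip(*data)]
        return [sums] if keepdims else sums
    sums = [sum(row) for row in data]
    return [[s] for s in sums] if keepdims else sums

data = [[127, 127, 127], [127, 127, 1]]
row_sums = reduce_sum_integer(data, axis=1)              # shape (2, 1)
col_sums = reduce_sum_integer(data, axis=0, keepdims=0)  # shape (3,)
```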
### **com.microsoft.RelativePositionBias** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
is_bidirectional : int
Default value is 0.
max_distance : int (required)
Max distance
#### Inputs
bias_table : T
query_length : U
key_length : U
#### Outputs
output : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float or half tensors.
U : tensor(int64)
Constrain sequence_length to int tensors.
### **com.microsoft.RemovePadding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs
input : T
sequence_token_count : M
#### Outputs
output : T
token_offset : M
cumulated_seq_len : M
max_seq_len : M
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain sequence_token_count and token_offset to integer types
### **com.microsoft.RestorePadding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs
input : T
token_offset : M
#### Outputs
output : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float tensors.
M : tensor(int32)
Constrain token_offset to integer types
### **com.microsoft.Rfft** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
normalized : int
must be 0, normalization currently not supported
onesided : int
must be 1, only one sided FFTs supported
signal_ndim : int
number of dimensions comprising the signal, collected in reverse order (e.g. 1 = last dimension is the signal)
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float), tensor(double), tensor(float16)
Constrain input and output types to float or half tensors.
### **com.microsoft.RotaryEmbedding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
interleaved : int
Rotate using interleaved pattern. Default value is 0 (False).
is_packed_batching : int
ragged batch inputs or not. Default value is 0
num_heads : int
Number of attention heads. Default value is 0. Must be used with rotary_embedding_dim
rotary_embedding_dim : int
Rotary embedding dimension. Default value is 0.
scale : float
Custom scale will be used if specified. Default value is 1.0
#### Inputs
input : T
position_ids : M
cos_cache : T
sin_cache : T
#### Outputs
output : T
#### Type Constraints
T : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output types to float tensors.
M : tensor(int64)
Constrain input and output types to integer tensors
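The effect of the `interleaved` attribute can be sketched for a single head vector at a single position. This is an illustration under assumptions: `cos` and `sin` stand for the rows of `cos_cache` and `sin_cache` selected by one position id, each holding half as many entries as the rotated vector, and the `scale` and partial `rotary_embedding_dim` handling are left out. The function name is hypothetical.

```python
def rotary_embed(x, cos, sin, interleaved=False):
    """Rotate pairs of elements of x by the angles encoded in cos and sin.

    interleaved=False pairs element i with element i + half.
    interleaved=True pairs element 2i with element 2i + 1.
    """
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        a, b = (2 * i, 2 * i + 1) if interleaved else (i, i + half)
        out[a] = x[a] * cos[i] - x[b] * sin[i]
        out[b] = x[b] * cos[i] + x[a] * sin[i]
    return out

x = [1.0, 2.0, 3.0, 4.0]
split_halves = rotary_embed(x, cos=[0.0, 1.0], sin=[1.0, 0.0])
adjacent_pairs = rotary_embed(x, cos=[0.0, 1.0], sin=[1.0, 0.0], interleaved=True)
```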
### **com.microsoft.SampleOp** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(uint32), tensor(uint64), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double)
Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.
### **com.microsoft.Sampling** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
custom : int
If 1, use custom sampling logic
decoder : graph (required)
Decoder subgraph to execute in a loop.
decoder_start_token_id : int
The id of the token that indicates decoding starts.
encoder : graph
The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
eos_token_id : int (required)
The id of the end-of-sequence token
filter_value : float
All filtered values will be set to this float value.
init_decoder : graph
The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
min_tokens_to_keep : int
Minimum number of tokens we keep per batch example in the output.
model_type : int
Model type: 0 for decoder only like GPT-2; 1 for encoder decoder like Bart
no_repeat_ngram_size : int
no repeat ngrams size
pad_token_id : int (required)
The id of the padding token
presence_penalty : float
Presence penalty for custom sampling
temperature : float
The value used to modulate the next token probabilities.
top_p : float
If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
vocab_size : int
Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
#### Inputs (2 - 9)
input_ids : I
max_length : I
min_length (optional) : I
repetition_penalty (optional) : T
vocab_mask (optional) : I
prefix_vocab_mask (optional) : I
attention_mask (optional) : I
presence_mask (optional) : I
seed (optional) : I
#### Outputs (1 - 2)
sequences : I
filtered_logits (optional) : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors.
I : tensor(int32)
Constrain to integer types
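The interaction of `top_p`, `filter_value` and `min_tokens_to_keep` described above can be sketched as a simple nucleus filter. This is a simplified illustration that works directly on a list of probabilities for one batch entry; the operator itself applies its filtering inside the decoding loop, and the function name is hypothetical.

```python
def top_p_filter(probs, top_p, filter_value=0.0, min_tokens_to_keep=1):
    """Keep the smallest set of most probable tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for rank, i in enumerate(order):
        # Stop once enough probability mass is covered, but never before
        # min_tokens_to_keep tokens have been kept.
        if rank >= min_tokens_to_keep and cumulative >= top_p:
            break
        kept.add(i)
        cumulative += probs[i]
    return [p if i in kept else filter_value for i, p in enumerate(probs)]

filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.7)
```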
### **com.microsoft.SkipGroupNorm** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
activation : int (required)
Activation after group normalization: 0 for None, 1 for SiLU
channels_last : int
1 if the input and output are in the NHWC layout, 0 if it is in the NCHW layout. Defaults to 1.
epsilon : float
The epsilon value to use to avoid division by zero
groups : int (required)
The number of groups of channels. It should be a divisor of the number of channels C
#### Inputs (4 - 5)
X : T
gamma : M
beta : M
skip : T
bias (optional) : T
#### Outputs (1 - 2)
Y : T
S (optional) : T
#### Type Constraints
T : tensor(float16), tensor(float)
Constrain input X, skip, bias and output Y, S types to float tensors.
M : tensor(float16), tensor(float)
Constrain gamma and beta to float tensors.
### **com.microsoft.SkipLayerNormalization** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
epsilon : float
The epsilon value to use to avoid division by zero.
#### Inputs (3 - 5)
input : T
skip : T
gamma : T
beta (optional) : T
bias (optional) : T
#### Outputs (1 - 4)
output : T
mean (optional) : U
inv_std_var (optional) : U
input_skip_bias_sum (optional) : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float or half tensors.
U : tensor(float)
Constrain mean and inv_std_var to float tensors.
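A minimal sketch of the computation for a single row, assuming the operator normalizes the sum of `input`, `skip` and the optional `bias`, then applies `gamma` and the optional `beta`. The default epsilon used here and the function name are assumptions for illustration only.

```python
import math

def skip_layer_norm(x, skip, gamma, beta=None, bias=None, epsilon=1e-12):
    """Layer normalization of (x + skip + bias) over one hidden vector.

    Returns (output, mean, inv_std_var, input_skip_bias_sum).
    """
    n = len(x)
    summed = [x[i] + skip[i] + (bias[i] if bias else 0.0) for i in range(n)]
    mean = sum(summed) / n
    var = sum((v - mean) ** 2 for v in summed) / n
    inv_std = 1.0 / math.sqrt(var + epsilon)
    out = [(summed[i] - mean) * inv_std * gamma[i] + (beta[i] if beta else 0.0)
           for i in range(n)]
    return out, mean, inv_std, summed

out, mean, inv_std, summed = skip_layer_norm(
    [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], gamma=[1.0, 1.0, 1.0])
```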
### **com.microsoft.SkipSimplifiedLayerNormalization** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
epsilon : float
The epsilon value to use to avoid division by zero.
#### Inputs (3 - 4)
input : T
skip : T
gamma : T
bias (optional) : T
#### Outputs (1 - 4)
output : T
mean (optional) : U
inv_std_var (optional) : U
input_skip_bias_sum (optional) : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain input and output types to float or half tensors.
U : tensor(float)
Constrain mean and inv_std_var to float tensors.
### **com.microsoft.Snpe** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
DLC : string (required)
payload of the SNPE DLC file.
notes : string
(Optional) Some notes for the model
snpe_version : string
(Optional) SNPE version used to convert the model.
target_device : string
(Optional) Target device like CPU, DSP, etc.
#### Inputs (1 - ∞)
inputs (variadic) : T
#### Outputs (1 - ∞)
outputs (variadic) : T
#### Type Constraints
T : tensor(uint8), tensor(uint16), tensor(float)
Constrain input and output types to uint8, uint16, float tensors.
### **com.microsoft.SparseAttention** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
do_rotary : int
Whether to use rotary position embedding. Default value is 0.
kv_num_heads : int (required)
Number of attention heads for key and value
num_heads : int (required)
Number of attention heads for query
rotary_interleaved : int
Rotary use interleaved pattern or not. Default value is 0.
scale : float
Scaling factor applied prior to softmax. The default value is 1/sqrt(head_size)
sparse_block_size : int (required)
Number of tokens per sparse block. Choices: 16, 32, 64, 128
#### Inputs (9 - 11)
query : T
key (optional) : T
value (optional) : T
past_key : T
past_value : T
block_row_indices : M
block_col_indices : M
total_sequence_length : M
key_total_sequence_lengths : M
cos_cache (optional) : T
sin_cache (optional) : T
#### Outputs
output : T
present_key : T
present_value : T
#### Type Constraints
T : tensor(float), tensor(float16), tensor(bfloat16)
Constrain input and output to float tensors.
M : tensor(int32)
Constrain integer type.
### **com.microsoft.SparseToDenseMatMul** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
alpha : float
Scalar multiplier for the product of the input tensors.
transA : int
Whether A should be transposed on the last two dimensions before doing multiplication
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication
#### Inputs
A : T
B : T1
#### Outputs
Y : T1
#### Type Constraints
T : sparse_tensor(float), sparse_tensor(double), sparse_tensor(int64), sparse_tensor(int32), sparse_tensor(uint64), sparse_tensor(uint32)
Constrain input and output types to float tensors.
T1 : tensor(float), tensor(double), tensor(int64), tensor(int32), tensor(uint64), tensor(uint32)
Constrain input and output types to float tensors.
### **com.microsoft.Tokenizer** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
mark : int (required)
Boolean whether to mark the beginning/end character with start of text character (0x02)/end of text character (0x03).
mincharnum : int (required)
Minimum number of characters allowed in the output. For example, if mincharnum is 2, tokens such as "A" and "B" would be ignored
pad_value : string (required)
The string used to pad output tensors when the number of tokens extracted doesn't match the maximum number of tokens found. If start/end markers are needed, padding will appear outside the markers.
separators : list of strings
An optional list of strings that contains the separators, given as regular expressions to match. Two consecutive segments in X connected by a separator would be divided into two tokens. For example, if the input is "Hello World!" and this attribute contains only one space character, the corresponding output would be ["Hello", "World!"]. To achieve character-level tokenization, one should set the 'separators' to [""], which contains an empty string.
tokenexp : string
An optional string. Token's regular expression in basic POSIX format (pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03). If set, tokenizer may produce tokens matching the specified pattern. Note that one and only one of 'tokenexp' and 'separators' should be set.
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(string)
Input/Output is a string tensor
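The separator, `mincharnum` and `pad_value` behavior described above can be sketched in a few lines. This is an approximation: it uses Python's `re` module rather than basic POSIX regular expressions, it leaves out the start/end markers controlled by `mark`, and the function names are hypothetical.

```python
import re

def tokenize(text, separators, mincharnum=1):
    """Split one string on any of the separator patterns."""
    if separators == [""]:
        tokens = list(text)  # character-level tokenization
    else:
        pattern = "|".join(f"(?:{s})" for s in separators)
        tokens = [t for t in re.split(pattern, text) if t]
    # Drop tokens shorter than mincharnum characters.
    return [t for t in tokens if len(t) >= mincharnum]

def tokenize_batch(texts, separators, pad_value, mincharnum=1):
    """Tokenize several strings and pad each row to the longest token count."""
    rows = [tokenize(t, separators, mincharnum) for t in texts]
    width = max(len(r) for r in rows)
    return [r + [pad_value] * (width - len(r)) for r in rows]

words = tokenize("Hello World!", [" "])
padded = tokenize_batch(["a b c", "d"], [" "], pad_value="#")
```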
### **com.microsoft.TorchEmbedding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs (2 - 4)
weight : T
indices : tensor(int64)
padding_idx (optional) : tensor(int64)
scale_grad_by_freq (optional) : tensor(bool)
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64)
Constrain input and output types to all numeric tensors.
### **com.microsoft.TransposeMatMul** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
alpha : float
Scalar multiplier for the product of the input tensors.
transA : int
Whether A should be transposed on the last two dimensions before doing multiplication
transB : int
Whether B should be transposed on the last two dimensions before doing multiplication
#### Inputs
A : T
B : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
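How `alpha`, `transA` and `transB` combine can be sketched for plain 2-D matrices. This is an illustration only: batched inputs, where the transpose applies to the last two dimensions, are not handled, and the function name is hypothetical.

```python
def transpose_matmul(a, b, alpha=1.0, trans_a=False, trans_b=False):
    """Compute alpha * op(a) @ op(b) for 2-D lists, where op is an optional transpose."""
    if trans_a:
        a = [list(r) for r in zip(*a)]
    if trans_b:
        b = [list(r) for r in zip(*b)]
    return [[alpha * sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
plain = transpose_matmul(a, b)
a_transposed = transpose_matmul(a, b, trans_a=True)
```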
### **com.microsoft.Trilu** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
upper : int
Boolean. Indicates whether upper or lower part of matrix is retained. Default is true.
#### Inputs (1 - 2)
X : T
k (optional) : tensor(int64)
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bool)
Constrain input and output types to all numeric tensors and bool tensors.
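A short sketch of how `upper` and `k` select elements, assuming the same convention as the standard ONNX Trilu operator: with `upper` set, elements on and above the k-th diagonal are retained, otherwise elements on and below it. The function handles a single 2-D matrix and its name is hypothetical.

```python
def trilu(x, k=0, upper=True):
    """Zero out everything outside the selected triangle of a 2-D list."""
    def keep(i, j):
        return (j - i >= k) if upper else (j - i <= k)
    return [[v if keep(i, j) else 0 for j, v in enumerate(row)]
            for i, row in enumerate(x)]

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
upper_part = trilu(x)
strict_upper = trilu(x, k=1)
lower_part = trilu(x, upper=False)
```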
### **com.microsoft.UnfoldTensor** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
dim : int
specify the dimension to unfold
size : int (required)
specify the size
step : int
specify the step.
#### Inputs
input : T
#### Outputs
output : T
#### Type Constraints
T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Allow inputs and outputs to be any kind of tensor.
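The roles of `size` and `step` can be sketched for a 1-D input, assuming the operator extracts sliding slices in the manner of `torch.Tensor.unfold`. The helper name is hypothetical and the `dim` attribute is not modeled.

```python
def unfold_1d(values, size, step=1):
    """Return every full window of length size, advancing by step each time."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]

windows = unfold_1d([1, 2, 3, 4, 5], size=2, step=2)
overlapping = unfold_1d([1, 2, 3, 4, 5], size=3, step=1)
```

Trailing elements that do not fill a complete window are dropped in this sketch.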
### **com.microsoft.Unique** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Inputs
x : T
#### Outputs
y : T
idx : tensor(int64)
counts : tensor(int64)
#### Type Constraints
T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Input can be of any tensor type.
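A sketch of how the three outputs relate, assuming `y` lists the distinct values in order of first occurrence, `idx` maps every input element to its position in `y`, and `counts` holds the number of occurrences of each entry of `y`. These semantics are an assumption, since this section lists only the output names.

```python
def unique(x):
    """Return (y, idx, counts) for a flat list of hashable values."""
    position = {}          # value -> index in y
    y, idx, counts = [], [], []
    for v in x:
        if v not in position:
            position[v] = len(y)
            y.append(v)
            counts.append(0)
        idx.append(position[v])
        counts[position[v]] += 1
    return y, idx, counts

y, idx, counts = unique([2, 1, 1, 3, 4, 3])
```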
### **com.microsoft.WhisperBeamSearch** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
beginning_timestamp_token_id : int
The id of the first timestamp
decoder : graph (required)
Decoder subgraph to execute in a loop.
decoder_output_cross_qk : int
If nonzero, decoder subgraph contains output Q*K from cross attentions. Default 0.
decoder_start_token_id : int
The id of the token that indicates decoding starts (i.e. the start of transcription token id)
early_stopping : int
early stop or not
encoder : graph
The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
eos_token_id : int (required)
The id of the end-of-sequence token
init_decoder : graph
The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
model_type : int
Must be 2 for whisper
no_repeat_ngram_size : int
no repeat ngrams size
no_speech_token_id : int
The token in the Whisper model that marks the whole sequence as empty. With this token set, Whisper can output no_speech_prob afterwards. Default -1.
no_timestamps_token_id : int
The id of the token that indicates no timestamps
pad_token_id : int (required)
The id of the padding token
start_of_lm_token_id : int
The id of the token that indicates LM starts
transcribe_token_id : int
The id of the transcribe task
translate_token_id : int
The id of the translate task
vocab_size : int
Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
#### Inputs (5 - 15)
input_ids : F
max_length : I
min_length (optional) : I
num_beams : I
num_return_sequences : I
length_penalty (optional) : T
repetition_penalty (optional) : T
vocab_mask (optional) : M
prefix_vocab_mask (optional) : M
attention_mask (optional) : I
decoder_input_ids (optional) : I
logits_processor (optional) : I
cross_qk_layer_head (optional) : I
extra_decoding_ids (optional) : I
temperature (optional) : T
#### Outputs (1 - 5)
sequences : I
sequences_scores (optional) : T
scores (optional) : T
cross_qk (optional) : V
non_speech_probs (optional) : T
#### Type Constraints
T : tensor(float), tensor(float16)
Constrain to float tensors.
F : tensor(float), tensor(int32), tensor(float16)
Constrain input type to float or int tensors.
I : tensor(int32)
Constrain to integer types
M : tensor(int32)
Constrain mask to integer types
V : tensor(float)
Constrain cross_qk to float32 tensors.
### **com.microsoft.WordConvEmbedding** #### Version This version of the operator has been available since version 1 of the 'com.microsoft' operator set. #### Attributes
char_embedding_size : int
Integer representing the embedding vector size for each char. If not provided, use the char embedding size of embedding vector.
conv_window_size : int
This operator applies convolution to a word from left to right with a window equal to conv_window_size and a stride of 1. Take the word 'example' for example: with conv_window_size equal to 2, conv is applied to [ex], [xa], [am], [mp]... If not provided, use the first dimension of conv kernel shape.
embedding_size : int
Integer representing the embedding vector size for each word. If not provided, use the filter size of conv weight
#### Inputs
Sequence : T
W : T1
B : T1
C : T1
#### Outputs
Y : T1
#### Type Constraints
T : tensor(int32)
Constrain to tensor(int32).
T1 : tensor(float)
Constrain to tensor(float).
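The window enumeration in the `conv_window_size` description can be reproduced with a small sketch. It only lists the character windows that a stride-1 convolution visits and does not compute any embedding; the function name is hypothetical.

```python
def conv_windows(word, conv_window_size):
    """List the character windows visited from left to right with stride 1."""
    return [word[i:i + conv_window_size]
            for i in range(len(word) - conv_window_size + 1)]

windows = conv_windows("example", 2)
```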
### experimental **com.microsoft.IsAllFinite** #### Version No versioning maintained for experimental ops. #### Attributes
isinf_only : int
If true, check only for Inf, -Inf.
isnan_only : int
If true, check only for NaN.
#### Inputs (1 - ∞)
input (variadic) : V
#### Outputs
output : T
#### Type Constraints
V : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T : tensor(bool)
Constrain the output to a boolean tensor.
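The two attributes narrow what counts as a non-finite value. A minimal sketch, assuming the output is true when none of the inputs contains an offending value and that at most one of the two flags is set; the function name is hypothetical.

```python
import math

def is_all_finite(tensors, isinf_only=False, isnan_only=False):
    """Check a list of flat float lists for Inf and/or NaN values."""
    if isinf_only:
        offending = math.isinf
    elif isnan_only:
        offending = math.isnan
    else:
        offending = lambda v: not math.isfinite(v)
    return not any(offending(v) for t in tensors for v in t)

ok = is_all_finite([[1.0, 2.0], [3.0]])
has_nan = is_all_finite([[1.0, float("nan")]])
```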
### experimental **com.microsoft.QEmbedLayerNormalization** #### Version No versioning maintained for experimental ops. #### Attributes
epsilon : float
The epsilon value to use to avoid division by zero.
#### Inputs
input_ids : T1
segment_ids (optional) : T1
word_embedding_quant : T2
position_embedding_quant : T2
segment_embedding (optional) : T2
gamma_quant : T2
beta_quant : T2
mask (optional) : T1
word_embedding_scale : T
position_embedding_scale : T
segment_embedding_scale (optional) : T
gamma_scale : T
beta_scale : T
word_embedding_zero_point : T2
position_embedding_zero_point : T2
segment_embedding_zero_point (optional) : T2
gamma_zero_point : T2
beta_zero_point : T2
#### Outputs
layernorm_out : T
mask_index_out : T1
#### Type Constraints
T1 : tensor(int32)
Constrain mask index to integer types
T2 : tensor(int8), tensor(uint8)
Constrain input and output types to int8 tensors.
T : tensor(float)
Constrain input and output types to float32 tensors.
## com.microsoft.nchwc ### **com.microsoft.nchwc.AveragePool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes
auto_pad : string
ceil_mode : int
count_include_pad : int
dilations : list of ints
kernel_shape : list of ints (required)
pads : list of ints
strides : list of ints
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors
### **com.microsoft.nchwc.Conv** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes
activation : string
activation_params : list of floats
auto_pad : string
dilations : list of ints
group : int
kernel_shape : list of ints
pads : list of ints
strides : list of ints
#### Inputs (2 - 4)
X : T
W : T
B (optional) : T
Sum (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors
### **com.microsoft.nchwc.GlobalAveragePool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors
### **com.microsoft.nchwc.GlobalMaxPool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors
### **com.microsoft.nchwc.MaxPool** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes
auto_pad : string
ceil_mode : int
dilations : list of ints
kernel_shape : list of ints (required)
pads : list of ints
storage_order : int
strides : list of ints
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors
### **com.microsoft.nchwc.ReorderInput** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes
channels_last : int
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors
### **com.microsoft.nchwc.ReorderOutput** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes
channels : int
channels_last : int
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors
### **com.microsoft.nchwc.Upsample** #### Version This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set. #### Attributes
coordinate_transformation_mode : string
mode : string
scales : list of ints
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float)
Constrain input and output types to float tensors
## com.ms.internal.nhwc ### **com.ms.internal.nhwc.BatchNormalization** #### Version This version of the operator has been available since version 15 of the 'com.ms.internal.nhwc' operator set. Other versions of this operator: com.ms.internal.nhwc.BatchNormalization-7, com.ms.internal.nhwc.BatchNormalization-9, com.ms.internal.nhwc.BatchNormalization-14 #### Attributes
activation : string
activation_params : list of floats
epsilon : float
The epsilon value to use to avoid division by zero.
momentum : float
Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum).
training_mode : int
If set to true, it indicates BatchNormalization is being used for training, and outputs 1 and 2 are to be computed.
#### Inputs
X : T
scale : T1
B : T1
input_mean : T2
input_var : T2
#### Outputs (1 - 3)
Y : T
running_mean (optional) : T2
running_var (optional) : T2
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain scale and bias types to float tensors.
T2 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain mean and variance types to float tensors.
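The `momentum` update quoted above can be sketched in plain Python. This is only an illustration of the stated formula; the function name `update_running_stat` and the sample values are hypothetical and not part of the operator schema.

```python
def update_running_stat(running, batch_stat, momentum):
    """Blend the previous running statistic with the current batch statistic.

    Mirrors the formula in the attribute description:
    running = running * momentum + batch_stat * (1 - momentum)
    """
    return running * momentum + batch_stat * (1.0 - momentum)


# Illustrative values only: a momentum close to 1 keeps most of the old estimate.
running_mean = update_running_stat(running=0.0, batch_stat=2.0, momentum=0.9)
print(running_mean)
```

With `momentum` equal to 1 the running value never changes, and with `momentum` equal to 0 it is replaced by the batch statistic on every step.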
### **com.ms.internal.nhwc.ConvTranspose**
#### Version
This version of the operator has been available since version 11 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.ConvTranspose-1
#### Attributes
activation : string
activation_params : list of floats
auto_pad : string
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER means the input is padded so that `output_shape[i] = input_shape[i] * strides[i]` for each axis `i`. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER.
dilations : list of ints
Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group : int
Number of groups that the input channels and output channels are divided into.
kernel_shape : list of ints
The shape of the convolution kernel. If not present, should be inferred from input W.
output_padding : list of ints
Additional elements added to the side with higher coordinate indices in the output. Each padding value in "output_padding" must be less than the corresponding stride/dilation dimension. By default, this attribute is a zero vector. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If "output_shape" is explicitly provided, "output_padding" does not contribute additional size to "output_shape" but participates in the computation of the needed padding amount. This is also called adjs or adjustment in some frameworks.
output_shape : list of ints
The shape of the output can be explicitly set which will cause pads values to be auto generated. If output_shape is specified pads values are ignored. See doc for details for equations to generate pads. Note that the output_shape attribute value should not include dimensions for batch size and channels, which are automatically inferred.
pads : list of ints
Padding for the beginning and ending along each spatial axis. Each value must be greater than or equal to 0 and represents the number of pixels added to the beginning or end of the corresponding axis. `pads` should be formatted as [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
#### Inputs (2 - 3)
X : T
W : T
B (optional) : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
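The SAME_UPPER / SAME_LOWER split and the `pads` layout described in the attributes above can be sketched as follows. This assumes the total padding per axis is already known (the equations that derive it are not reproduced in this document); `split_padding` and `build_pads` are hypothetical helper names.

```python
def split_padding(total, auto_pad):
    """Split one axis's total padding into (begin, end).

    The halves are equal when `total` is even; when it is odd the extra
    element goes to the end for SAME_UPPER and to the beginning for SAME_LOWER.
    """
    small = total // 2
    large = total - small
    if auto_pad == "SAME_UPPER":
        return small, large
    if auto_pad == "SAME_LOWER":
        return large, small
    raise ValueError(f"split_padding only handles SAME_UPPER/SAME_LOWER, got {auto_pad}")


def build_pads(totals, auto_pad):
    """Lay out per-axis padding as [x1_begin, x2_begin, ..., x1_end, x2_end, ...]."""
    pairs = [split_padding(total, auto_pad) for total in totals]
    begins = [begin for begin, _ in pairs]
    ends = [end for _, end in pairs]
    return begins + ends


# Two spatial axes needing 3 and 2 padding elements in total.
print(build_pads([3, 2], "SAME_UPPER"))  # [1, 1, 2, 1]
print(build_pads([3, 2], "SAME_LOWER"))  # [2, 1, 1, 1]
```

Note that all begin values come first and all end values second, rather than being interleaved per axis.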
### **com.ms.internal.nhwc.DepthToSpace**
#### Version
This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.DepthToSpace-1, com.ms.internal.nhwc.DepthToSpace-11
#### Attributes
blocksize : int (required)
Blocks of [blocksize, blocksize] are moved.
mode : string
DCR (default) for depth-column-row order re-arrangement. Use CRD for column-row-depth order.
#### Inputs
input : T
#### Outputs
output : T
#### Type Constraints
T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input and output types to all tensor types.
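The difference between the DCR and CRD modes comes down to which input channel feeds a given position inside each output block. The sketch below follows the standard ONNX DepthToSpace definition and assumes it carries over unchanged to this NHWC-domain variant; `source_channel` is a hypothetical helper, not an ONNX Runtime API.

```python
def source_channel(mode, out_channel, block_row, block_col, blocksize, out_channels):
    """Return the input channel that supplies one element of an output block.

    out_channel  : channel index in the output (0 .. out_channels - 1)
    block_row/col: position inside the [blocksize, blocksize] block
    out_channels : input channels divided by blocksize ** 2
    """
    if mode == "DCR":
        # Depth-column-row: the block position varies slowest, the channel fastest.
        return (block_row * blocksize + block_col) * out_channels + out_channel
    if mode == "CRD":
        # Column-row-depth: the channel varies slowest, the block position fastest.
        return out_channel * blocksize * blocksize + block_row * blocksize + block_col
    raise ValueError(f"unknown mode: {mode}")


# 8 input channels, blocksize 2, so 2 output channels.
print(source_channel("DCR", 1, 1, 0, blocksize=2, out_channels=2))  # 5
print(source_channel("CRD", 1, 1, 0, blocksize=2, out_channels=2))  # 6
```

Both modes visit every input channel exactly once; only the ordering differs.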
### **com.ms.internal.nhwc.GlobalLpPool**
#### Version
This version of the operator has been available since version 2 of the 'com.ms.internal.nhwc' operator set.
#### Attributes
p : int
p value of the Lp norm used to pool over the input data.
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(bfloat16), tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
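As a rough illustration of what `p` controls, the sketch below computes an Lp norm over a flat list of values, assuming the usual definition `(sum(|x| ** p)) ** (1 / p)` taken over all spatial positions of one channel. The helper name `lp_norm` is hypothetical.

```python
def lp_norm(values, p):
    """Lp norm of a flat sequence: (sum(|v| ** p)) ** (1 / p)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


# All spatial values of a single channel, flattened.
print(lp_norm([3.0, -4.0], p=2))       # 5.0
print(lp_norm([1.0, -2.0, 3.0], p=1))  # 6.0
```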
### **com.ms.internal.nhwc.InstanceNormalization**
#### Version
This version of the operator has been available since version 6 of the 'com.ms.internal.nhwc' operator set.
#### Attributes
activation : string
activation_params : list of floats
epsilon : float
The epsilon value to use to avoid division by zero.
#### Inputs
input : T
scale : T
B : T
#### Outputs
output : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
### **com.ms.internal.nhwc.LRN**
#### Version
This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.LRN-1
#### Attributes
alpha : float
Scaling parameter.
beta : float
The exponent.
bias : float
size : int (required)
The number of channels to sum over
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
Constrain input and output types to float tensors.
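To show how `alpha`, `beta`, `bias` and `size` interact, here is a sketch for the channel vector of a single pixel. It assumes the standard ONNX LRN formula, `y = x / (bias + alpha / size * square_sum) ** beta`, with the window centred on each channel and clipped at the edges; this document itself does not spell the formula out, and `lrn_channel_vector` is a hypothetical name.

```python
import math


def lrn_channel_vector(x, size, alpha, beta, bias):
    """Apply local response normalization across the channels of one pixel."""
    num_channels = len(x)
    before = (size - 1) // 2           # floor((size - 1) / 2) channels before c
    after = math.ceil((size - 1) / 2)  # ceil((size - 1) / 2) channels after c
    out = []
    for c, value in enumerate(x):
        lo = max(0, c - before)
        hi = min(num_channels - 1, c + after)
        square_sum = sum(v * v for v in x[lo:hi + 1])
        out.append(value / (bias + alpha / size * square_sum) ** beta)
    return out


# Illustrative parameters chosen to keep the arithmetic easy to follow.
print(lrn_channel_vector([1.0, 2.0, 3.0], size=3, alpha=3.0, beta=1.0, bias=1.0))
```

With these parameters the denominators are 6, 15 and 14, so the result is approximately `[0.1667, 0.1333, 0.2143]`.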
### **com.ms.internal.nhwc.LpPool**
#### Version
This version of the operator has been available since version 18 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.LpPool-11
#### Attributes
auto_pad : string
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER means the input is padded so that `output_shape[i] = ceil(input_shape[i] / strides[i])` for each axis `i`. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER.
ceil_mode : int
Whether to use ceil or floor (default) to compute the output shape.
dilations : list of ints
Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
kernel_shape : list of ints (required)
The size of the kernel along each axis.
p : int
p value of the Lp norm used to pool over the input data.
pads : list of ints
Padding for the beginning and ending along each spatial axis. Each value must be greater than or equal to 0 and represents the number of pixels added to the beginning or end of the corresponding axis. `pads` should be formatted as [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
#### Inputs
X : T
#### Outputs
Y : T
#### Type Constraints
T : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
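The effect of `ceil_mode` and `auto_pad` on the output extent of one spatial axis can be sketched as below. The SAME_* branch uses the formula quoted in the `auto_pad` description; the explicit-padding branch assumes the usual ONNX pooling convention and is not stated in this document. `pooled_size` is a hypothetical helper, and VALID is left out of the sketch.

```python
import math


def pooled_size(in_size, kernel, stride=1, dilation=1, pad_begin=0, pad_end=0,
                ceil_mode=False, auto_pad="NOTSET"):
    """Output length along one spatial axis of a pooling operator."""
    if auto_pad in ("SAME_UPPER", "SAME_LOWER"):
        # From the attribute text: output_shape[i] = ceil(input_shape[i] / strides[i]).
        return math.ceil(in_size / stride)
    if auto_pad != "NOTSET":
        raise ValueError(f"auto_pad mode not covered by this sketch: {auto_pad}")
    # Assumed convention: a dilated kernel covers (kernel - 1) * dilation + 1 inputs.
    effective_kernel = (kernel - 1) * dilation + 1
    span = in_size + pad_begin + pad_end - effective_kernel
    rounder = math.ceil if ceil_mode else math.floor
    return rounder(span / stride) + 1


print(pooled_size(8, kernel=3, stride=2))                     # 3
print(pooled_size(8, kernel=3, stride=2, ceil_mode=True))     # 4
print(pooled_size(7, kernel=3, stride=2, auto_pad="SAME_UPPER"))  # 4
```

`ceil_mode` only changes the result when the stride does not evenly divide the remaining span, as in the second call.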
### **com.ms.internal.nhwc.MaxUnpool**
#### Version
This version of the operator has been available since version 11 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.MaxUnpool-9
#### Attributes
activation : string
activation_params : list of floats
kernel_shape : list of ints (required)
The size of the kernel along each axis.
pads : list of ints
Padding for the beginning and ending along each spatial axis. Each value must be greater than or equal to 0 and represents the number of pixels added to the beginning or end of the corresponding axis. `pads` should be formatted as [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
#### Inputs (2 - 3)
X : T1
I : T2
output_shape (optional) : T2
#### Outputs
output : T1
#### Type Constraints
T1 : tensor(float16), tensor(float), tensor(double)
Constrain input and output types to float tensors.
T2 : tensor(int64)
Constrain index tensor to int64
### **com.ms.internal.nhwc.QLinearConvTranspose**
#### Version
This version of the operator has been available since version 1 of the 'com.ms.internal.nhwc' operator set.
#### Attributes
auto_pad : string
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET.
dilations : list of ints
Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
group : int
Number of groups that the input channels and output channels are divided into.
kernel_shape : list of ints
The shape of the convolution kernel. If not present, should be inferred from input W.
output_padding : list of ints
Additional elements added to the side with higher coordinate indices in the output. Each padding value in "output_padding" must be less than the corresponding stride/dilation dimension. By default, this attribute is a zero vector. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If "output_shape" is explicitly provided, "output_padding" does not contribute additional size to "output_shape" but participates in the computation of the needed padding amount. This is also called adjs or adjustment in some frameworks.
output_shape : list of ints
The shape of the output can be explicitly set, which will cause pads values to be auto generated. If output_shape is specified, pads values are ignored. See doc for details for equations to generate pads.
pads : list of ints
Padding for the beginning and ending along each spatial axis.
strides : list of ints
Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
#### Inputs (8 - 9)
x : T1
x_scale : tensor(float)
x_zero_point : T1
w : T2
w_scale : tensor(float)
w_zero_point : T2
y_scale : tensor(float)
y_zero_point : T3
B (optional) : T4
#### Outputs
y : T3
#### Type Constraints
T1 : tensor(int8), tensor(uint8)
Constrain input type to 8-bit integer tensor.
T2 : tensor(int8), tensor(uint8)
Constrain filter type to 8-bit integer tensor.
T3 : tensor(int8), tensor(uint8)
Constrain output type to 8-bit integer tensor.
T4 : tensor(int32)
Constrain bias type to 32-bit integer tensor.
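The scale and zero-point inputs pair up with `x`, `w` and `y`. As a rough sketch of how such pairs are typically interpreted, the code below assumes the common affine quantization scheme `real = scale * (q - zero_point)`; this document does not state the scheme, and the helper names `dequantize` and `quantize` are hypothetical.

```python
def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a real value (assumed affine scheme)."""
    return scale * (q - zero_point)


def quantize(real, scale, zero_point, qmin, qmax):
    """Map a real value to a quantized integer, saturating to [qmin, qmax]."""
    q = round(real / scale) + zero_point
    return max(qmin, min(qmax, q))


# uint8 example: the representable range is [0, 255].
print(dequantize(130, scale=0.5, zero_point=128))                  # 1.0
print(quantize(1.0, scale=0.5, zero_point=128, qmin=0, qmax=255))  # 130
```

Values that fall outside the representable range saturate at `qmin` or `qmax` rather than wrapping around.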
### **com.ms.internal.nhwc.Resize**
#### Version
This version of the operator has been available since version 19 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.Resize-11, com.ms.internal.nhwc.Resize-13, com.ms.internal.nhwc.Resize-18
#### Attributes
antialias : int
If set to 1, "linear" and "cubic" interpolation modes will use an antialiasing filter when downscaling. Antialiasing is achieved by stretching the resampling filter by a factor max(1, 1 / scale), which means that when downsampling, more input pixels contribute to an output pixel.
axes : list of ints
If provided, it specifies a subset of axes that 'roi', 'scales' and 'sizes' refer to. If not provided, all axes are assumed [0, 1, ..., r-1], where r = rank(data). Non-specified dimensions are interpreted as non-resizable. Negative value means counting dimensions from the back. Accepted range is [-r, r-1], where r = rank(data). Behavior is undefined if an axis is repeated.
coordinate_transformation_mode : string
This attribute describes how to transform the coordinate in the resized tensor to the coordinate in the original tensor. The coordinate of each dimension is transformed individually. Let's describe a case using axis x as an example. Denote `x_resized` as the coordinate of axis x in the resized tensor, `x_original` as the coordinate of axis x in the original tensor, `length_original` as the length of the original tensor in axis x, `length_resized` as the length of the resized tensor in axis x, `scale = length_resized / length_original`, `output_width` the target length on the axis x which can be a fractional number when it is calculated out of a scale factor, and `output_width_int` the effective output width as an integer.

If coordinate_transformation_mode is `"half_pixel"`,
```
x_original = (x_resized + 0.5) / scale - 0.5
```
If coordinate_transformation_mode is `"half_pixel_symmetric"`,
```
adjustment = output_width_int / output_width
center = input_width / 2
offset = center * (1 - adjustment)
x_ori = offset + (x + 0.5) / scale - 0.5
```
If coordinate_transformation_mode is `"pytorch_half_pixel"`,
```
x_original = length_resized > 1 ? (x_resized + 0.5) / scale - 0.5 : 0
```
If coordinate_transformation_mode is `"align_corners"`,
```
x_original = x_resized * (length_original - 1) / (length_resized - 1)
```
If coordinate_transformation_mode is `"asymmetric"`,
```
x_original = x_resized / scale
```
If coordinate_transformation_mode is `"tf_crop_and_resize"`,
```
x_original = length_resized > 1 ? start_x * (length_original - 1) + x_resized * (end_x - start_x) * (length_original - 1) / (length_resized - 1) : 0.5 * (start_x + end_x) * (length_original - 1)
```
cubic_coeff_a : float
The coefficient 'a' used in cubic interpolation. Two common choices are -0.5 (in some cases of TensorFlow) and -0.75 (in PyTorch). Check out Equation (4) in https://ieeexplore.ieee.org/document/1163711 for the details. This attribute is valid only if mode is "cubic".
exclude_outside : int
If set to 1, the weight of sampling locations outside the tensor will be set to 0 and the weight will be renormalized so that their sum is 1.0. The default value is 0.
extrapolation_value : float
When coordinate_transformation_mode is "tf_crop_and_resize" and x_original is outside the range [0, length_original - 1], this value is used as the corresponding output value. Default is 0.0f.
keep_aspect_ratio_policy : string
This attribute describes how to interpret the `sizes` input with regard to keeping the original aspect ratio of the input, and it is not applicable when the `scales` input is used. Given a set of `sizes`, associated with a subset of `axes` (explicitly provided or default), and assuming `d = axes[i]`, with `i` being the index of the provided `sizes`.

If `keep_aspect_ratio_policy` is `"stretch"`, the original aspect ratio is disregarded, and the input is resized to the specified size: `out_size[d] = sizes[i]`

If `keep_aspect_ratio_policy` is `"not_larger"`, the sizes are adjusted so that no extent of the output is larger than the specified size, while keeping the original aspect ratio:
```
scale = Min(sizes[i] / in_size[d])
out_size[d] = round_int(scale * in_size[d])
```
If `keep_aspect_ratio_policy` is `"not_smaller"`, the sizes are adjusted so that no extent of the output is smaller than the specified size, while keeping the original aspect ratio:
```
scale = Max(sizes[i] / in_size[d])
out_size[d] = round_int(scale * in_size[d])
```
For non-resizable axes (those not specified in `axes`), the output size will be equal to the input size. Note: `round_int` stands for computing the nearest integer value, rounding halfway cases up.
mode : string
Three interpolation modes: "nearest" (default), "linear" and "cubic". The "linear" mode includes linear interpolation for 1D tensor and N-linear interpolation for N-D tensor (for example, bilinear interpolation for 2D tensor). The "cubic" mode includes cubic interpolation for 1D tensor and N-cubic interpolation for N-D tensor (for example, bicubic interpolation for 2D tensor).
nearest_mode : string
Four modes: "round_prefer_floor" (default, also known as round half down), "round_prefer_ceil" (also known as round half up), "floor", "ceil". Only used by nearest interpolation. It indicates how to get the "nearest" pixel in the input tensor from x_original, so this attribute is valid only if "mode" is "nearest".
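Several of the `coordinate_transformation_mode` formulas and the four `nearest_mode` roundings can be tried out with a small sketch. It covers only the modes whose formulas use `x_resized`, `length_original` and `length_resized`; `half_pixel_symmetric` and `tf_crop_and_resize` are omitted, and no clamping to the valid index range is done. The function names `to_original` and `nearest_index` are hypothetical.

```python
import math


def to_original(x_resized, length_original, length_resized, mode):
    """Map a coordinate in the resized tensor back to the original tensor."""
    scale = length_resized / length_original
    if mode == "half_pixel":
        return (x_resized + 0.5) / scale - 0.5
    if mode == "pytorch_half_pixel":
        return (x_resized + 0.5) / scale - 0.5 if length_resized > 1 else 0.0
    if mode == "align_corners":
        # Assumes length_resized > 1; otherwise the denominator is zero.
        return x_resized * (length_original - 1) / (length_resized - 1)
    if mode == "asymmetric":
        return x_resized / scale
    raise ValueError(f"mode not covered by this sketch: {mode}")


def nearest_index(x_original, nearest_mode):
    """Pick the input index used by nearest interpolation."""
    if nearest_mode == "round_prefer_floor":
        return math.ceil(x_original - 0.5)   # round half down
    if nearest_mode == "round_prefer_ceil":
        return math.floor(x_original + 0.5)  # round half up
    if nearest_mode == "floor":
        return math.floor(x_original)
    if nearest_mode == "ceil":
        return math.ceil(x_original)
    raise ValueError(f"unknown nearest_mode: {nearest_mode}")


# Upscaling an axis from 4 to 8 elements and looking up output index 3.
x = to_original(3, length_original=4, length_resized=8, mode="asymmetric")
print(x, nearest_index(x, "round_prefer_floor"), nearest_index(x, "round_prefer_ceil"))
```

For output index 3 the `asymmetric` mode lands exactly halfway between input indices 1 and 2, which is the case where the two rounding modes disagree.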
#### Inputs (1 - 4)
X : T1
roi (optional) : T2
scales (optional) : tensor(float)
sizes (optional) : tensor(int64)
#### Outputs
Y : T1
#### Type Constraints
T1 : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input 'X' and output 'Y' to all tensor types.
T2 : tensor(float16), tensor(float), tensor(double)
Constrain roi type to float or double.
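The three `keep_aspect_ratio_policy` values can be compared with a short sketch. It reads the input extent as `in_size[d]` with `d = axes[i]`, takes `Min`/`Max` over the provided axes, and assumes non-negative axis indices; `resolve_sizes` is a hypothetical helper, and the NHWC shape below is only an example.

```python
import math


def round_int(value):
    """Nearest integer, rounding halfway cases up."""
    return math.floor(value + 0.5)


def resolve_sizes(in_size, sizes, axes, policy):
    """Compute the output shape for the given `sizes` and aspect-ratio policy."""
    out_size = list(in_size)  # axes not listed in `axes` keep their input size
    if policy == "stretch":
        for i, d in enumerate(axes):
            out_size[d] = sizes[i]
        return out_size
    ratios = [sizes[i] / in_size[d] for i, d in enumerate(axes)]
    if policy == "not_larger":
        scale = min(ratios)
    elif policy == "not_smaller":
        scale = max(ratios)
    else:
        raise ValueError(f"unknown policy: {policy}")
    for d in axes:
        out_size[d] = round_int(scale * in_size[d])
    return out_size


shape = [1, 100, 200, 3]  # N, H, W, C
for policy in ("stretch", "not_larger", "not_smaller"):
    print(policy, resolve_sizes(shape, sizes=[50, 50], axes=[1, 2], policy=policy))
```

For a 100 x 200 input and a requested 50 x 50 size, `stretch` yields 50 x 50, `not_larger` yields 25 x 50, and `not_smaller` yields 50 x 100.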
### **com.ms.internal.nhwc.SpaceToDepth**
#### Version
This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.SpaceToDepth-1
#### Attributes
blocksize : int (required)
Blocks of [blocksize, blocksize] are moved.
#### Inputs
input : T
#### Outputs
output : T
#### Type Constraints
T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
Constrain input and output types to all tensor types.