Contrib Operator Schemas
This file is automatically generated from the registered contrib operator schemas by this script. Do not modify directly.
- com.microsoft
- com.microsoft.Attention
- com.microsoft.AttnLSTM
- com.microsoft.BeamSearch
- com.microsoft.BiasAdd
- com.microsoft.BiasDropout
- com.microsoft.BiasGelu
- com.microsoft.BiasSoftmax
- com.microsoft.BiasSplitGelu
- com.microsoft.BifurcationDetector
- com.microsoft.BitmaskBiasDropout
- com.microsoft.BitmaskDropout
- com.microsoft.CDist
- com.microsoft.ComplexMul
- com.microsoft.ComplexMulConj
- com.microsoft.ConvTransposeWithDynamicPads
- com.microsoft.CropAndResize
- com.microsoft.DecoderAttention
- com.microsoft.DecoderMaskedMultiHeadAttention
- com.microsoft.DecoderMaskedSelfAttention
- com.microsoft.DequantizeBFP
- com.microsoft.DequantizeLinear
- com.microsoft.DequantizeWithOrder
- com.microsoft.DynamicQuantizeLSTM
- com.microsoft.DynamicQuantizeMatMul
- com.microsoft.DynamicTimeWarping
- com.microsoft.EPContext
- com.microsoft.EmbedLayerNormalization
- com.microsoft.ExpandDims
- com.microsoft.FastGelu
- com.microsoft.FusedConv
- com.microsoft.FusedGemm
- com.microsoft.FusedMatMul
- com.microsoft.FusedMatMulActivation
- com.microsoft.GatedRelativePositionBias
- com.microsoft.GatherBlockQuantized
- com.microsoft.GatherND
- com.microsoft.Gelu
- com.microsoft.GemmFastGelu
- com.microsoft.GemmFloat8
- com.microsoft.GemmaRotaryEmbedding
- com.microsoft.GreedySearch
- com.microsoft.GridSample
- com.microsoft.GroupNorm
- com.microsoft.GroupQueryAttention
- com.microsoft.Inverse
- com.microsoft.Irfft
- com.microsoft.LongformerAttention
- com.microsoft.MatMulBnb4
- com.microsoft.MatMulFpQ4
- com.microsoft.MatMulInteger16
- com.microsoft.MatMulIntegerToFloat
- com.microsoft.MatMulNBits
- com.microsoft.MaxpoolWithMask
- com.microsoft.MoE
- com.microsoft.MulInteger
- com.microsoft.MultiHeadAttention
- com.microsoft.MurmurHash3
- com.microsoft.NGramRepeatBlock
- com.microsoft.NhwcConv
- com.microsoft.NhwcFusedConv
- com.microsoft.NhwcMaxPool
- com.microsoft.PackedAttention
- com.microsoft.PackedMultiHeadAttention
- com.microsoft.Pad
- com.microsoft.QAttention
- com.microsoft.QGemm
- com.microsoft.QLinearAdd
- com.microsoft.QLinearAveragePool
- com.microsoft.QLinearConcat
- com.microsoft.QLinearConv
- com.microsoft.QLinearGlobalAveragePool
- com.microsoft.QLinearLeakyRelu
- com.microsoft.QLinearMul
- com.microsoft.QLinearReduceMean
- com.microsoft.QLinearSigmoid
- com.microsoft.QLinearSoftmax
- com.microsoft.QLinearWhere
- com.microsoft.QMoE
- com.microsoft.QOrderedAttention
- com.microsoft.QOrderedGelu
- com.microsoft.QOrderedLayerNormalization
- com.microsoft.QOrderedLongformerAttention
- com.microsoft.QOrderedMatMul
- com.microsoft.QuantizeBFP
- com.microsoft.QuantizeLinear
- com.microsoft.QuantizeWithOrder
- com.microsoft.QuickGelu
- com.microsoft.Range
- com.microsoft.ReduceSumInteger
- com.microsoft.RelativePositionBias
- com.microsoft.RemovePadding
- com.microsoft.RestorePadding
- com.microsoft.Rfft
- com.microsoft.RotaryEmbedding
- com.microsoft.SampleOp
- com.microsoft.Sampling
- com.microsoft.SkipGroupNorm
- com.microsoft.SkipLayerNormalization
- com.microsoft.SkipSimplifiedLayerNormalization
- com.microsoft.Snpe
- com.microsoft.SparseAttention
- com.microsoft.SparseToDenseMatMul
- com.microsoft.Tokenizer
- com.microsoft.TorchEmbedding
- com.microsoft.TransposeMatMul
- com.microsoft.Trilu
- com.microsoft.UnfoldTensor
- com.microsoft.Unique
- com.microsoft.WhisperBeamSearch
- com.microsoft.WordConvEmbedding
- experimental com.microsoft.IsAllFinite
- experimental com.microsoft.QEmbedLayerNormalization
- com.microsoft.nchwc
- com.ms.internal.nhwc
- com.ms.internal.nhwc.BatchNormalization
- com.ms.internal.nhwc.ConvTranspose
- com.ms.internal.nhwc.DepthToSpace
- com.ms.internal.nhwc.GlobalLpPool
- com.ms.internal.nhwc.InstanceNormalization
- com.ms.internal.nhwc.LRN
- com.ms.internal.nhwc.LpPool
- com.ms.internal.nhwc.MaxUnpool
- com.ms.internal.nhwc.QLinearConvTranspose
- com.ms.internal.nhwc.Resize
- com.ms.internal.nhwc.SpaceToDepth
com.microsoft
com.microsoft.Attention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- do_rotary : int
- Whether to use rotary position embedding. Default value is 0.
- mask_filter_value : float
- The value to be filled in the attention mask. Default value is -10000.0f
- num_heads : int (required)
- Number of attention heads
- past_present_share_buffer : int
- Corresponding past and present are the same tensor; its size is (2, batch_size, num_heads, max_sequence_length, head_size)
- qkv_hidden_sizes : list of ints
- Hidden dimension of Q, K, V: hidden_size, hidden_size and v_hidden_size
- rotary_embedding_dim : int
- Dimension of rotary embedding. Limited to 32, 64 or 128. Default value is head_size
- scale : float
- Custom scale will be used if specified. Default value is 1/sqrt(head_size)
- unidirectional : int
- Whether every token can only attend to previous tokens. Default value is 0.
Inputs (2 - 7)
- input : T
- weights : T
- bias (optional) : T
- mask_index (optional) : M
- past (optional) : T
- attention_bias (optional) : T
- past_sequence_length (optional) : M
Outputs (1 - 2)
- output : T
- present (optional) : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float tensors.
- M : tensor(int32)
- Constrain mask index to integer types
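The two shape-related defaults stated in the Attention attributes can be sketched in plain Python. This is only an illustration of the documented values; the helper names `default_attention_scale` and `shared_past_present_shape` are hypothetical and are not part of any ONNX Runtime API.

```python
import math


def default_attention_scale(head_size: int) -> float:
    """Value used when the `scale` attribute is not specified: 1/sqrt(head_size)."""
    return 1.0 / math.sqrt(head_size)


def shared_past_present_shape(batch_size, num_heads, max_sequence_length, head_size):
    """Buffer size documented for `past_present_share_buffer`.

    The leading 2 is taken directly from the documented size tuple.
    """
    return (2, batch_size, num_heads, max_sequence_length, head_size)


if __name__ == "__main__":
    print(default_attention_scale(64))
    print(shared_past_present_shape(1, 12, 1024, 64))
```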
com.microsoft.AttnLSTM
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation_alpha : list of floats
- Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. For example, with LeakyRelu, the default alpha is 0.01.
- activation_beta : list of floats
- Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators.
- activations : list of strings
- A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.
- clip : float
- Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
- direction : string
- Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
- hidden_size : int
- Number of neurons in the hidden layer.
- input_forget : int
- Couple the input and forget gates if 1, default 0.
Inputs (3 - 14)
- X : T
- W : T
- R : T
- B (optional) : T
- sequence_lens (optional) : T1
- initial_h (optional) : T
- initial_c (optional) : T
- P (optional) : T
- QW (optional) : T
- MW (optional) : T
- V (optional) : T
- M (optional) : T
- memory_seq_lens (optional) : T1
- AW (optional) : T
Outputs (0 - 3)
- Y (optional) : T
- Y_h (optional) : T
- Y_c (optional) : T
Type Constraints
- T : tensor(float), tensor(double)
- Constrain input and output types to float tensors.
- T1 : tensor(int32)
- Constrain seq_lens to integral tensors.
com.microsoft.BeamSearch
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- decoder : graph (required)
- Decoder subgraph to execute in a loop.
- decoder_start_token_id : int
- The id of the token that indicates decoding starts.
- early_stopping : int
- Whether to stop early
- encoder : graph
- The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
- eos_token_id : int (required)
- The id of the end-of-sequence token
- init_decoder : graph
- The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
- model_type : int
- Model type: 0 for GPT-2; 1 for encoder-decoder models such as T5
- no_repeat_ngram_size : int
- No-repeat n-gram size
- pad_token_id : int (required)
- The id of the padding token
- vocab_size : int
- Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
Inputs (5 - 12)
- input_ids : F
- max_length : I
- min_length (optional) : I
- num_beams : I
- num_return_sequences : I
- length_penalty (optional) : T
- repetition_penalty (optional) : T
- vocab_mask (optional) : M
- prefix_vocab_mask (optional) : M
- attention_mask (optional) : I
- decoder_input_ids (optional) : I
- logits_processor (optional) : I
Outputs (1 - 3)
- sequences : I
- sequences_scores (optional) : T
- scores (optional) : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain to float tensors.
- F : tensor(float), tensor(int32), tensor(float16)
- Constrain input type to float or int tensors.
- I : tensor(int32)
- Constrain to integer types
- M : tensor(int32)
- Constrain mask to integer types
com.microsoft.BiasAdd
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- X : T
- bias : T
- skip : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float)
- Constrain input and output types to float tensors.
com.microsoft.BiasDropout
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- seed : int
- (Optional) Seed for the random generator. If not specified, one is generated automatically.
Inputs (2 - 5)
- data : T
- bias : T
- residual (optional) : T
- ratio (optional) : T1
- training_mode (optional) : T2
Outputs (1 - 2)
- output : T
- mask (optional) : T2
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
- T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input 'ratio' types to float tensors.
- T2 : tensor(bool)
- Constrain output 'mask' types to boolean tensors.
com.microsoft.BiasGelu
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- A : T
- B : T
Outputs
- C : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
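The schema above lists only the inputs and output of BiasGelu. A minimal sketch of the computation, assuming the operator applies the erf-based Gelu to the sum of `A` and the bias `B` (this semantics is an assumption here, as is treating `B` as a 1-D bias matching one row of `A`), could look like this. The function names are hypothetical.

```python
import math


def gelu(x: float) -> float:
    # Exact (erf-based) Gelu: 0.5 * x * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def bias_gelu(a_row, b):
    # Assumed semantics: add the bias element-wise, then apply Gelu.
    return [gelu(x + bias) for x, bias in zip(a_row, b)]


if __name__ == "__main__":
    print(bias_gelu([0.0, 1.0, 2.0], [0.5, 0.5, 0.5]))
```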
com.microsoft.BiasSoftmax
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- axis : int
- Apply softmax to elements for dimensions axis or higher
- is_inner_broadcast : int (required)
- True if the bias is broadcast across the input for dimensions broadcast_axis to axis-1; otherwise the bias is broadcast across the input for dimensions 0 to broadcast_axis-1
Inputs
- data : T
- bias : T
Outputs
- output : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors.
com.microsoft.BiasSplitGelu
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- X : T
- bias : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float)
- Constrain input X and output Y types to float tensors.
com.microsoft.BifurcationDetector
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- max_ngram_size : int
- The maximum NGram size for suffix matching.
- min_ngram_size : int
- The minimum NGram size for suffix matching.
Inputs (3 - 4)
- src_tokens : T
- cur_tokens : T
- prev_suffix_match_idx : T
- pred_tokens (optional) : T
Outputs
- tokens : T
- suffix_match_idx : T
Type Constraints
- T : tensor(int64)
- Constrain to integer types.
com.microsoft.BitmaskBiasDropout
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- seed : int
- (Optional) Seed for the random generator. If not specified, one is generated automatically.
Inputs (2 - 5)
- data : T
- bias : T
- residual (optional) : T
- ratio (optional) : T1
- training_mode (optional) : T2
Outputs (1 - 2)
- output : T
- mask (optional) : T3
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
- T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input 'ratio' types to float tensors.
- T2 : tensor(bool)
- Constrain input 'training_mode' types to boolean tensors.
- T3 : tensor(uint32)
- Constrain output 'mask' types to uint32 tensors.
com.microsoft.BitmaskDropout
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- seed : int
- (Optional) Seed for the random generator. If not specified, one is generated automatically.
Inputs (1 - 3)
- data : T
- ratio (optional) : T1
- training_mode (optional) : T2
Outputs (1 - 2)
- output : T
- mask (optional) : T3
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
- T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input 'ratio' types to float tensors.
- T2 : tensor(bool)
- Constrain 'training_mode' to boolean tensor.
- T3 : tensor(uint32)
- Constrain output 'mask' types to bit-packed uint32 tensor.
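The `mask` output of BitmaskDropout is described as a bit-packed uint32 tensor. The sketch below shows one way a boolean keep-mask could be packed into 32-bit words. The bit order (element `i` stored in bit `i % 32` of word `i // 32`) is an assumption for illustration; the schema does not state the layout, and `pack_mask` is a hypothetical helper.

```python
def pack_mask(bits):
    """Pack a sequence of booleans into a list of 32-bit unsigned words.

    Assumed layout: element i goes to bit (i % 32) of word (i // 32).
    """
    words = [0] * ((len(bits) + 31) // 32)
    for i, keep in enumerate(bits):
        if keep:
            words[i // 32] |= 1 << (i % 32)
    return words


if __name__ == "__main__":
    print(pack_mask([True, False, True]))
```

Packing reduces the mask to one bit per element instead of one byte per element as with a `tensor(bool)` mask.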
com.microsoft.CDist
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- metric : string
- The distance metric to use. If a string, the distance function can be "braycurtis", "canberra", "chebyshev", "cityblock", "correlation", "cosine", "dice", "euclidean", "hamming", "jaccard", "jensenshannon", "kulsinski", "mahalanobis", "matching", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "wminkowski", "yule".
Inputs
- A : T
- B : T
Outputs
- C : T
Type Constraints
- T : tensor(float), tensor(double)
- Constrains input to only numeric types.
com.microsoft.ComplexMul
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- A : T
- B : T
Outputs
- C : T
Type Constraints
- T : tensor(float), tensor(double), tensor(float16)
- Constrain input and output types to float or half tensors.
com.microsoft.ComplexMulConj
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- A : T
- B : T
Outputs
- C : T
Type Constraints
- T : tensor(float), tensor(double), tensor(float16)
- Constrain input and output types to float or half tensors.
com.microsoft.ConvTransposeWithDynamicPads
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- auto_pad : string
- dilations : list of ints
- group : int
- kernel_shape : list of ints
- output_padding : list of ints
- strides : list of ints
Inputs (2 - 4)
- X : T
- W : T
- Pads (optional) : tensor(int64)
- B (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors
com.microsoft.CropAndResize
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- extrapolation_value : float
- Value used for extrapolation, when applicable. Default is 0.0f.
- mode : string
- The pooling method. Two modes are supported: 'bilinear' and 'nearest'. Default is 'bilinear'.
Inputs
- X : T1
- rois : T1
- batch_indices : T2
- crop_size : T2
Outputs
- Y : T1
Type Constraints
- T1 : tensor(float16), tensor(float), tensor(double)
- Constrain types to float tensors.
- T2 : tensor(int32)
- Constrain types to int tensors.
com.microsoft.DecoderAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- mask_filter_value : float
- The value to be filled in the attention mask. Default value is -10000.0f
- num_heads : int (required)
- Number of attention heads
Inputs
- query : T
- key : T
- q_weight : T
- kv_weight : T
- bias : T
- key_padding_mask (optional) : B
- key_cache (optional) : T
- value_cache (optional) : T
- static_kv : B
- use_past : B
- has_layer_state : B
- has_key_padding_mask : B
Outputs (1 - 3)
- output : T
- new_key_cache (optional) : T
- new_value_cache (optional) : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float and float16 tensors.
- B : tensor(bool)
- Constrain key_padding_mask to bool tensors.
com.microsoft.DecoderMaskedMultiHeadAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- mask_filter_value : float
- The value to be filled in the attention mask. Default value is -10000.0f
- num_heads : int (required)
- Number of attention heads
- output_qk : int
- Whether to output the cross-attention MatMul(Q, K)
- past_present_share_buffer : int
- Corresponding past and present are the same tensor; its size is (batch_size, num_heads, max_sequence_length, head_size)
- scale : float
- Custom scale will be used if specified. Default value is 1/sqrt(head_size)
Inputs (1 - 11)
- query : T
- key (optional) : T
- value (optional) : T
- mask_index (optional) : M
- attention_bias (optional) : T
- past_key (optional) : T
- past_value (optional) : T
- past_sequence_length (optional) : M
- beam_width (optional) : M
- cache_indirection (optional) : M
- bias (optional) : T
Outputs (1 - 4)
- output : T
- present_key (optional) : T
- present_value (optional) : T
- qk (optional) : V
Type Constraints
- V : tensor(float)
- Constrain qk output types to float32 tensors.
- T : tensor(float), tensor(float16)
- Constrain input and output types to float tensors.
- M : tensor(int32)
- Constrain mask index to integer types
com.microsoft.DecoderMaskedSelfAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- do_rotary : int
- Whether to use rotary position embedding. Default value is 0.
- mask_filter_value : float
- The value to be filled in the attention mask. Default value is -10000.0f
- num_heads : int (required)
- Number of attention heads
- past_present_share_buffer : int
- Corresponding past and present are the same tensor; its size is (2, batch_size, num_heads, max_sequence_length, head_size)
- scale : float
- Custom scale will be used if specified. Default value is 1/sqrt(head_size)
Inputs (7 - 9)
- input : T
- weights : T
- bias : T
- mask_index (optional) : M
- past : T
- attention_bias (optional) : T
- past_sequence_length : M
- beam_width (optional) : M
- cache_indirection (optional) : M
Outputs
- output : T
- present : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float tensors.
- M : tensor(int32)
- Constrain mask index to integer types
com.microsoft.DequantizeBFP
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- bfp_type : int (required)
- The type of BFP - must match with the BFPType enum
- block_dim : int
- Each bounding box spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension.
- dtype : int
- The datatype to dequantize to.
Inputs
- x : T1
- shape : T2
- strides : T2
Outputs
- y : T3
Type Constraints
- T1 : tensor(uint8)
- Constrain the input to uint8.
- T2 : tensor(int64)
- Constrain shape and strides to int64.
- T3 : tensor(float), tensor(float16), tensor(bfloat16)
- Constrain y to float, float16, and bfloat16.
com.microsoft.DequantizeLinear
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- axis : int
- The axis along which the same quantization parameters are applied. It's optional. If it's not specified, it means per-tensor quantization and input 'x_scale' and 'x_zero_point' must be scalars. If it's specified, it means per 'axis' quantization and input 'x_scale' and 'x_zero_point' must be 1-D tensors.
Inputs (2 - 3)
- x : T1
- x_scale : T2
- x_zero_point (optional) : T1
Outputs
- y : T2
Type Constraints
- T1 : tensor(int8), tensor(uint8), tensor(int16), tensor(uint16), tensor(int32), tensor(int4), tensor(uint4)
- Constrain 'x' and 'x_zero_point' to 4-bit integer tensors, 8-bit integer tensors, 16-bit integer tensors, or 32-bit signed integer tensors.
- T2 : tensor(float16), tensor(float)
- Constrain 'y', 'x_scale' to float tensors.
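The per-tensor and per-axis cases described by the `axis` attribute can be illustrated with the standard linear dequantization formula `y = (x - x_zero_point) * x_scale`. This is a plain-Python sketch on nested lists, not the runtime's kernel; the function names are hypothetical, the per-axis variant only covers a 2-D input with `axis=0`, and an omitted zero point is assumed to be 0.

```python
def dequantize_per_tensor(x, x_scale, x_zero_point=0):
    """Per-tensor case: scalar scale and zero point shared by every element."""
    return [(v - x_zero_point) * x_scale for v in x]


def dequantize_per_axis_rows(x, x_scale, x_zero_point=None):
    """Per-axis case for a 2-D input with axis=0: one (scale, zero point) per row.

    x_scale and x_zero_point are 1-D sequences with one entry per row.
    """
    if x_zero_point is None:
        x_zero_point = [0] * len(x_scale)  # assumed default when the input is omitted
    return [
        [(v - zp) * s for v in row]
        for row, s, zp in zip(x, x_scale, x_zero_point)
    ]


if __name__ == "__main__":
    print(dequantize_per_tensor([0, 128, 255], 0.5, 128))
    print(dequantize_per_axis_rows([[2, 4], [10, 20]], [0.5, 2.0], [0, 10]))
```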
com.microsoft.DequantizeWithOrder
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- order_input : int (required)
- cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
- order_output : int (required)
- cublasLt order of output matrix
- to : int (required)
- The output data type. Only TensorProto_DataType_FLOAT (1) and TensorProto_DataType_FLOAT16 (10) are supported.
Inputs
- input : Q
- scale_input : S
Outputs
- output : F
Type Constraints
- Q : tensor(int8)
- Constrain input type to int8 tensors.
- F : tensor(float16), tensor(float)
- Constrain to float types
- S : tensor(float)
- Constrain Scale to float32 types
com.microsoft.DynamicQuantizeLSTM
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation_alpha : list of floats
- Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. For example, with LeakyRelu, the default alpha is 0.01.
- activation_beta : list of floats
- Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators.
- activations : list of strings
- A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: See the equations for default if not specified.
- clip : float
- Cell clip threshold. Clipping bounds the elements of a tensor in the range of [-threshold, +threshold] and is applied to the input of activations. No clip if not specified.
- direction : string
- Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional.
- hidden_size : int
- Number of neurons in the hidden layer
- input_forget : int
- Couple the input and forget gates if 1.
Inputs
- X : T
- W : T2
- R : T2
- B (optional) : T
- sequence_lens (optional) : T1
- initial_h (optional) : T
- initial_c (optional) : T
- P (optional) : T
- W_scale : T
- W_zero_point : T2
- R_scale : T
- R_zero_point : T2
Outputs (0 - 3)
- Y (optional) : T
- Y_h (optional) : T
- Y_c (optional) : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors.
- T1 : tensor(int32)
- Constrain seq_lens to integer tensor.
- T2 : tensor(uint8), tensor(int8)
- Constrain weights types to 8 bit tensors.
com.microsoft.DynamicQuantizeMatMul
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (3 - 5)
- A : T1
- B : T2
- b_scale : T1
- b_zero_point (optional) : T2
- bias (optional) : T1
Outputs
- Y : T1
Type Constraints
- T1 : tensor(float)
- Constrain input A, b_scale and output Y data type as float tensor.
- T2 : tensor(int8), tensor(uint8)
- Constrain input B data type to 8-bit integer tensor.
com.microsoft.DynamicTimeWarping
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- input : F
Outputs
- output : I
Type Constraints
- F : tensor(float)
- Constrain to float tensors.
- I : tensor(int32)
- Constrain to integer types.
com.microsoft.EPContext
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- embed_mode : int
- 1: indicates that ep_cache_context is the context content. 0: indicates that ep_cache_context is the file path to the context content. The path is relative to this ONNX file. Default is 1.
- ep_cache_context : string
- payload of the execution provider context if embed_mode=1, or path to the context file if embed_mode=0.
- ep_sdk_version : string
- (Optional) SDK version used to convert the model.
- hardware_architecture : string
- (Optional) Hardware architecture.
- main_context : int
- Usually each EPContext node is associated with one graph partition. In some cases, such as QNN, a single EPContext contains all partitions. In that case, the node with ep_cache_context should set main_context=1; other nodes set main_context=0 and skip ep_cache_context. The path is relative to this ONNX file. Default is 1.
- max_size : int
- Maximum size in the context. Usage depends on the EP.
- notes : string
- (Optional) Some notes for the model
- onnx_model_filename : string
- (Optional) Filename of the original ONNX model.
- partition_name : string
- (Optional) partitioned graph name.
- source : string
- (Optional) The source used to generate the engine/context cache file: an ORT EP or a native SDK toolchain.
Inputs (1 - ∞)
- inputs (variadic, heterogeneous) : T
Outputs (1 - ∞)
- outputs (variadic, heterogeneous) : T
Type Constraints
- T : tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bool), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types.
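The interaction between `embed_mode` and `ep_cache_context` can be sketched as a small resolver: with `embed_mode=1` the attribute is the payload itself, and with `embed_mode=0` it is a path interpreted relative to the ONNX file. The function `resolve_ep_cache_context` and its return convention are hypothetical and only illustrate the attribute descriptions above.

```python
from pathlib import Path


def resolve_ep_cache_context(embed_mode: int, ep_cache_context: str, onnx_model_path: str):
    """Return ("embedded", payload) or ("file", resolved_path).

    embed_mode=1: ep_cache_context holds the context content directly.
    embed_mode=0: ep_cache_context is a path relative to the ONNX file.
    """
    if embed_mode == 1:
        return ("embedded", ep_cache_context)
    if embed_mode == 0:
        return ("file", Path(onnx_model_path).parent / ep_cache_context)
    raise ValueError(f"unsupported embed_mode: {embed_mode}")


if __name__ == "__main__":
    print(resolve_ep_cache_context(0, "ctx.bin", "models/net.onnx"))
```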
com.microsoft.EmbedLayerNormalization
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- epsilon : float
- The epsilon value to use to avoid division by zero.
- mask_index_type : int
- The mask index tensor type for shape inference (0: None, 1: 1D mask_index)
Inputs (7 - 9)
- input_ids : T1
- segment_ids (optional) : T1
- word_embedding : T
- position_embedding : T
- segment_embedding (optional) : T
- gamma : T
- beta : T
- mask (optional) : T1
- position_ids (optional) : T1
Outputs (1 - 3)
- output : T
- mask_index (optional) : T1
- embedding_sum (optional) : T
Type Constraints
- T1 : tensor(int32)
- Constrain input and output integer tensor types.
- T : tensor(float), tensor(float16)
- Constrain input and output float tensor types.
com.microsoft.ExpandDims
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- X : T
- axis : tensor(int32)
Outputs
- Y : T
Type Constraints
- T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
- Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.
com.microsoft.FastGelu
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (1 - 2)
- X : T
- bias (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float), tensor(float16), tensor(bfloat16)
- Constrain input and output types to float or half tensors.
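The schema above does not spell out the FastGelu formula. A minimal sketch, assuming the commonly used tanh approximation `Y = 0.5 * X * (1 + tanh(0.797885 * X + 0.035677 * X^3))` with the optional `bias` added to `X` first, is shown below. Both the formula and the scalar helper `fast_gelu` are assumptions for illustration.

```python
import math


def fast_gelu(x: float, bias: float = 0.0) -> float:
    """Tanh-based Gelu approximation; the optional bias is added before activation."""
    v = x + bias
    return 0.5 * v * (1.0 + math.tanh(0.797885 * v + 0.035677 * v ** 3))


if __name__ == "__main__":
    for value in (-1.0, 0.0, 1.0):
        print(value, fast_gelu(value))
```

The result stays close to the erf-based Gelu while avoiding the `erf` evaluation.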
com.microsoft.FusedConv
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation : string
- activation_params : list of floats
- auto_pad : string
- dilations : list of ints
- group : int
- kernel_shape : list of ints
- pads : list of ints
- strides : list of ints
Inputs (2 - 4)
- X : T
- W : T
- B (optional) : T
- Z (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors
com.microsoft.FusedGemm
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation : string
- activation_alpha : float
- activation_beta : float
- activation_gamma : float
- alpha : float
- Scalar multiplier for the product of input tensors A * B.
- beta : float
- Scalar multiplier for input tensor C.
- transA : int
- Whether A should be transposed
- transB : int
- Whether B should be transposed
Inputs (2 - 3)
- A : T
- B : T
- C (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(uint32), tensor(uint64), tensor(int32), tensor(int64)
- Constrain input and output types to float/int tensors.
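The attributes above suggest Gemm semantics followed by an activation, roughly `Y = activation(alpha * A' * B' + beta * C)`. The sketch below illustrates that reading on nested lists. It assumes defaults of 1.0 for `alpha` and `beta`, requires `C` to already have the output shape (no broadcasting), and supports only a `"Relu"` activation; the function names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def fused_gemm(a, b, c=None, alpha=1.0, beta=1.0, trans_a=0, trans_b=0, activation=None):
    """Compute activation(alpha * A' @ B' + beta * C) on lists of rows."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    b_cols = transpose(b)  # iterate over columns of B'
    y = [[alpha * sum(x * w for x, w in zip(row, col)) for col in b_cols] for row in a]
    if c is not None:
        y = [[v + beta * cv for v, cv in zip(y_row, c_row)] for y_row, c_row in zip(y, c)]
    if activation == "Relu":
        y = [[max(0.0, v) for v in row] for row in y]
    elif activation is not None:
        raise ValueError(f"activation not covered by this sketch: {activation}")
    return y


if __name__ == "__main__":
    print(fused_gemm([[1.0, 2.0]], [[3.0], [-4.0]], c=[[1.0]], alpha=2.0, beta=3.0))
```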
com.microsoft.FusedMatMul
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- alpha : float
- Scalar multiplier for the product of the input tensors.
- transA : int
- Whether A should be transposed on the last two dimensions before doing multiplication
- transB : int
- Whether B should be transposed on the last two dimensions before doing multiplication
- transBatchA : int
- Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
- transBatchB : int
- Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
Inputs
- A : T
- B : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
com.microsoft.FusedMatMulActivation
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation : string (required)
- activation_alpha : float
- activation_axis : int
- activation_beta : float
- activation_gamma : float
- alpha : float
- Scalar multiplier for the product of the input tensors.
- transA : int
- Whether A should be transposed on the last two dimensions before doing multiplication
- transB : int
- Whether B should be transposed on the last two dimensions before doing multiplication
- transBatchA : int
- Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
- transBatchB : int
- Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication
Inputs
- A : T
- B : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
com.microsoft.GatedRelativePositionBias
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- num_heads : int (required)
- Number of attention heads
Inputs (6 - 7)
- query_layer : T
- query_bias : T
- rel_pos : T
- weight : T
- bias : T
- eco_a : T
- token_offset (optional) : M
Outputs
- output : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float tensors.
- M : tensor(int32)
- Constrain token_offset to integer types
com.microsoft.GatherBlockQuantized
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- block_size : int
- (Optional) block size used for weight quantization. It needs to be a power of 2 and not smaller than 16.
- gather_axis : int
- (Optional) Which axis to gather on. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data).
- quantize_axis : int
- (Optional) Which axis to block-wise quantize. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data).
Inputs (3 - 4)
- data : T1
- indices : Tind
- scales : T2
- zero_points (optional) : T1
Outputs
- output : T2
Type Constraints
- T1 : tensor(int4), tensor(uint4)
- Constrain quantized types.
- T2 : tensor(float), tensor(float16), tensor(bfloat16)
- Constrain dequantized types.
- Tind : tensor(int32), tensor(int64)
- Constrain indices to integer types.
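The attribute constraints above (a power-of-2 block size of at least 16, and axes in the range [-r, r-1]) can be checked before a model is built. This is a minimal sketch; the helper names `is_valid_block_size` and `normalize_axis` are hypothetical and not part of any runtime API.

```python
def is_valid_block_size(block_size: int) -> bool:
    """Return True if block_size is a power of 2 and not smaller than 16."""
    return block_size >= 16 and (block_size & (block_size - 1)) == 0


def normalize_axis(axis: int, rank: int) -> int:
    """Map an axis in [-rank, rank - 1] to its non-negative equivalent."""
    if not -rank <= axis <= rank - 1:
        raise ValueError(f"axis {axis} is outside [-{rank}, {rank - 1}]")
    # Negative values count dimensions from the back.
    return axis + rank if axis < 0 else axis
```

For example, `normalize_axis(-1, 3)` refers to the last axis of a rank-3 tensor.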
com.microsoft.GatherND
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- data : T
- indices : Tind
Outputs
- output : T
Type Constraints
- T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
- Constrain input and output types to any tensor type.
- Tind : tensor(int32), tensor(int64)
- Constrain indices type to int32 or int64
com.microsoft.Gelu
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
com.microsoft.GemmFastGelu
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (2 - 3)
- X : T
- W : T
- bias (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float), tensor(float16), tensor(bfloat16)
- Constrain input and output types to float or half tensors.
com.microsoft.GemmFloat8
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation : string
- Activation function, RELU or GELU or NONE (default).
- alpha : float
- Scalar multiplier for the product of input tensors A * B.
- beta : float
- Scalar multiplier for input bias C.
- dtype : int
- Output Type. Same definition as attribute 'to' for operator Cast.
- transA : int
- Whether A should be transposed. Float 8 only supports transA=0.
- transB : int
- Whether B should be transposed. Float 8 only supports transB=1.
Inputs (2 - 6)
- A : TA
- B : TB
- C (optional) : TC
- scaleA (optional) : TS
- scaleB (optional) : TS
- scaleY (optional) : TS
Outputs
- Y : TR
Type Constraints
- TA : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float)
- Constrain type to input A.
- TB : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float)
- Constrain type to input B.
- TC : tensor(float16), tensor(bfloat16), tensor(float)
- Constrain type to input C.
- TR : tensor(float8e4m3fn), tensor(float8e5m2), tensor(float16), tensor(bfloat16), tensor(float)
- Constrain type to result type.
- TS : tensor(float)
- Constrain type for all input scales (scaleA, scaleB, scaleY).
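The way the attributes above combine can be sketched in plain Python as `activation(alpha * A' * B' + beta * C)`. This sketch makes several assumptions not stated in the schema: inputs are 2-D nested lists, C has the same shape as the output, `alpha` and `beta` default to 1.0, and the scale inputs and float 8 rounding are ignored. The function name `gemm_reference` is hypothetical.

```python
import math


def transpose(matrix):
    """Swap rows and columns of a 2-D nested list."""
    return [list(column) for column in zip(*matrix)]


def gemm_reference(a, b, c=None, alpha=1.0, beta=1.0,
                   trans_a=0, trans_b=0, activation="NONE"):
    """Compute activation(alpha * A' @ B' + beta * C) on nested lists."""
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    rows, inner, cols = len(a), len(b), len(b[0])
    # alpha scales the matrix product.
    y = [[alpha * sum(a[i][k] * b[k][j] for k in range(inner))
          for j in range(cols)] for i in range(rows)]
    # beta scales the optional bias C (assumed to match the output shape).
    if c is not None:
        y = [[y[i][j] + beta * c[i][j] for j in range(cols)]
             for i in range(rows)]
    if activation == "RELU":
        return [[max(0.0, v) for v in row] for row in y]
    if activation == "GELU":
        return [[0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row]
                for row in y]
    if activation != "NONE":
        raise ValueError(f"unsupported activation: {activation}")
    return y
```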
com.microsoft.GemmaRotaryEmbedding
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- emb : U
- q : T
- q_rot : T
- k : T
- k_rot : T
Outputs
- output1 : T
- output2 : T
Type Constraints
- T : tensor(float16)
- Constrain input and output types to float16 tensors.
- U : tensor(float)
- Constrain input 0 type to float tensors
com.microsoft.GreedySearch
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- decoder : graph (required)
- Decoder subgraph to execute in a loop.
- decoder_start_token_id : int
- The id of the token that indicates decoding starts.
- encoder : graph
- The subgraph for initialization of encoder and decoder. It will be called once before `decoder` subgraph.
- eos_token_id : int (required)
- The id of the end-of-sequence token
- init_decoder : graph
- The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
- model_type : int
- model type: 0 for decoder only like GPT-2; 1 for encoder decoder like Bart
- no_repeat_ngram_size : int
- Size of n-grams that must not repeat
- pad_token_id : int (required)
- The id of the padding token
- vocab_size : int
- Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
Inputs (2 - 7)
- input_ids : I
- max_length : I
- min_length (optional) : I
- repetition_penalty (optional) : T
- vocab_mask (optional) : I
- prefix_vocab_mask (optional) : I
- attention_mask (optional) : I
Outputs
- sequences : I
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors.
- I : tensor(int32)
- Constrain to integer types
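The decoding loop that the `decoder`, `eos_token_id` and `pad_token_id` attributes describe can be illustrated with a small greedy loop. This is only a sketch of the general technique, not the operator's implementation: `next_token_logits` is a hypothetical callable standing in for the decoder subgraph, all input rows are assumed to have equal length, and filling finished rows with the padding token is an assumption.

```python
def greedy_search(input_ids, max_length, eos_token_id, pad_token_id,
                  next_token_logits):
    """Extend each sequence with its highest-scoring token until EOS.

    next_token_logits(sequence) is a hypothetical stand-in for the decoder
    subgraph; it returns one score per vocabulary entry.
    """
    sequences = [list(ids) for ids in input_ids]
    finished = [False] * len(sequences)
    while sequences and len(sequences[0]) < max_length and not all(finished):
        for row, sequence in enumerate(sequences):
            if finished[row]:
                # Keep rows aligned once a sequence has emitted EOS.
                sequence.append(pad_token_id)
                continue
            logits = next_token_logits(sequence)
            token = max(range(len(logits)), key=logits.__getitem__)
            sequence.append(token)
            finished[row] = token == eos_token_id
    return sequences
```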
com.microsoft.GridSample
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- align_corners : int
- If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.
- mode : string
- Three interpolation modes: bilinear (default), nearest and bicubic.
- padding_mode : string
- Supported padding modes for outside grid values: `zeros` (default), `border`, `reflection`. zeros: use 0 for out-of-bound grid locations; border: use border values for out-of-bound grid locations; reflection: use values at locations reflected by the border for out-of-bound grid locations.
Inputs
- X : T1
- Grid : T1
Outputs
- Y : T2
Type Constraints
- T1 : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
- Constrain input types to all tensor types.
- T2 : tensor(float16), tensor(float), tensor(double)
- Constrain output types to float tensors.
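The effect of `align_corners` on how a normalized grid coordinate in [-1, 1] maps to a pixel position can be sketched as follows. The formulas are the conventional ones for grid sampling and are assumed here to match this operator; the function name `unnormalize` is hypothetical.

```python
def unnormalize(coord: float, size: int, align_corners: int) -> float:
    """Map a normalized coordinate in [-1, 1] to a pixel index along one axis."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0
```

With `size=4`, the extrema map to pixel centers 0 and 3 when `align_corners=1`, and to the half-pixel edges -0.5 and 3.5 when `align_corners=0`.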
com.microsoft.GroupNorm
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation : int (required)
- Activation after group normalization: 0 for None, 1 for SiLU
- channels_last : int
- 1 if the input and output are in the NHWC layout, 0 if it is in the NCHW layout. Defaults to 1.
- epsilon : float
- The epsilon value to use to avoid division by zero
- groups : int (required)
- The number of groups of channels. It should be a divisor of the number of channels C
Inputs
- X : T
- gamma : M
- beta : M
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float)
- Constrain input X and output Y types to float tensors.
- M : tensor(float16), tensor(float)
- Constrain gamma and beta to float tensors.
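A minimal sketch of group normalization with the optional SiLU activation, for a single sample. It assumes the standard group normalization formula, an input given as a list of C channels that each hold a flat list of spatial values, and an `epsilon` default of 1e-5; none of these details are confirmed by the schema above.

```python
import math


def group_norm(x, gamma, beta, groups, epsilon=1e-5, activation=0):
    """Normalize channels group by group, then apply gamma, beta and SiLU."""
    channels = len(x)
    if channels % groups != 0:
        raise ValueError("groups must be a divisor of the number of channels")
    per_group = channels // groups
    out = []
    for g in range(groups):
        members = range(g * per_group, (g + 1) * per_group)
        values = [v for c in members for v in x[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + epsilon)
        for c in members:
            row = [(v - mean) * inv_std * gamma[c] + beta[c] for v in x[c]]
            if activation == 1:
                # SiLU: v * sigmoid(v)
                row = [v / (1.0 + math.exp(-v)) for v in row]
            out.append(row)
    return out
```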
com.microsoft.GroupQueryAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- do_rotary : int
- Whether to use rotary position embedding. Default value is 0.
- kv_num_heads : int (required)
- Number of attention heads for k and v
- local_window_size : int
- left_window_size for local attention (like Mistral). Default value is -1 meaning unused.
- num_heads : int (required)
- Number of attention heads for q
- rotary_interleaved : int
- Rotate using interleaved pattern. Default value is 0 (False).
- scale : float
- Custom scale will be used if specified. Default value is 1/sqrt(head_size)
- smooth_softmax : int
- Use a smooth factor in softmax.
- softcap : float
- Softcap value for attention weights. Default value is 0.
Inputs (7 - 9)
- query : T
- key (optional) : T
- value (optional) : T
- past_key (optional) : T
- past_value (optional) : T
- seqlens_k : M
- total_sequence_length : M
- cos_cache (optional) : T
- sin_cache (optional) : T
Outputs
- output : T
- present_key : T
- present_value : T
Type Constraints
- T : tensor(float16), tensor(bfloat16), tensor(float)
- Constrain input and output to float tensors.
- M : tensor(int32)
- Constrain mask to int tensor.
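The default `scale` and the relationship between `num_heads` and `kv_num_heads` can be sketched as below. The contiguous mapping of query heads to key/value heads is the usual grouped-query attention convention and is an assumption here; both function names are hypothetical.

```python
import math


def default_scale(head_size: int) -> float:
    """Default attention scale, 1/sqrt(head_size)."""
    return 1.0 / math.sqrt(head_size)


def kv_head_for_query_head(q_head: int, num_heads: int, kv_num_heads: int) -> int:
    """Index of the k/v head shared by a given query head (assumed layout)."""
    if num_heads % kv_num_heads != 0:
        raise ValueError("num_heads must be a multiple of kv_num_heads")
    group_size = num_heads // kv_num_heads
    return q_head // group_size
```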
com.microsoft.Inverse
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors.
com.microsoft.Irfft
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- normalized : int
- must be 0, normalization currently not supported
- onesided : int
- must be 1, only one sided FFTs supported
- signal_ndim : int (required)
- number of dimensions comprising the signal
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float), tensor(double), tensor(float16)
- Constrain input and output types to float or half tensors.
com.microsoft.LongformerAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- num_heads : int (required)
- Number of attention heads
- window : int (required)
- One-sided attention window length W, or half of the total window length
Inputs
- input : T
- weight : T
- bias : T
- mask : T
- global_weight : T
- global_bias : T
- global : G
Outputs
- output : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float tensors.
- G : tensor(int32)
- Constrain to integer types
com.microsoft.MatMulBnb4
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- K : int (required)
- size of each input feature
- N : int (required)
- size of each output feature
- block_size : int (required)
- Group size used for weight quantization. It needs to be a power of 2 and not smaller than 16.
- quant_type : int (required)
- quantization data type. 0 for FP4, 1 for NF4.
- training_mode : int
- Indicate if the ops run in training_mode, by default, False.
- transB : int
- Whether B should be transposed on the last two dimensions before doing multiplication. Default to be 1.
Inputs
- A : T1
- B : T2
- absmax : T1
Outputs
- Y : T1
Type Constraints
- T1 : tensor(float), tensor(float16), tensor(bfloat16)
- Constrain input and output types to float/half_float/brain_float tensors.
- T2 : tensor(uint8)
- Constrain quantized weight types to uint8.
com.microsoft.MatMulFpQ4
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- blk_quant_type : int
- Quantization type
Inputs
- A : T1
- B : T2
- B_shape : T3
Outputs
- Y : T1
Type Constraints
- T1 : tensor(float)
- Constrain input matrix data types as single precision float tensor
- T2 : tensor(uint8)
- Constrain input B data types as data blob
- T3 : tensor(int64)
- Constrain shape of B to an int64 tensor.
com.microsoft.MatMulInteger16
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- A : T1
- B : T2
Outputs
- Y : T3
Type Constraints
- T1 : tensor(int16), tensor(uint16)
- Constrain input A data types as 16-bit integer tensor
- T2 : tensor(int16), tensor(uint16)
- Constrain input B data types as 16-bit integer tensor
- T3 : tensor(int32), tensor(uint32)
- Constrain output Y data types as 32-bit integer tensor. T3 must be tensor(uint32) when both T1 and T2 are tensor(uint16), or must be tensor(int32) when either T1 or T2 is tensor(int16).
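The output type rule for T3 can be written as a small lookup. The function name and the use of plain strings for element types are illustrative choices, not part of the schema.

```python
def matmul_integer16_output_type(t1: str, t2: str) -> str:
    """Return the required output element type for the given input types."""
    allowed = {"int16", "uint16"}
    if t1 not in allowed or t2 not in allowed:
        raise ValueError("inputs must be int16 or uint16")
    # uint32 only when both inputs are unsigned; int32 otherwise.
    return "uint32" if t1 == t2 == "uint16" else "int32"
```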
com.microsoft.MatMulIntegerToFloat
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (4 - 7)
- A : T1
- B : T2
- a_scale : T3
- b_scale : T3
- a_zero_point (optional) : T1
- b_zero_point (optional) : T2
- bias (optional) : T3
Outputs
- Y : T3
Type Constraints
- T1 : tensor(int8), tensor(uint8)
- Constrain input A data type to 8-bit integer tensor.
- T2 : tensor(int8), tensor(uint8)
- Constrain input B data type to 8-bit integer tensor.
- T3 : tensor(float), tensor(float16)
- Constrain input a_scale, b_scale and output Y data type as float tensor.
com.microsoft.MatMulNBits
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- K : int (required)
- size of each input feature
- N : int (required)
- size of each output feature
- accuracy_level : int
- The minimum accuracy level of input A, can be: 0(unset), 1(fp32), 2(fp16), 3(bf16), or 4(int8) (default unset). It is used to control how input A is quantized or downcast internally while doing computation, for example: 0 means input A will not be quantized or downcast while doing computation. 4 means input A can be quantized with the same block_size to int8 internally from type T1.
- bits : int (required)
- number of bits used for weight quantization (default 4)
- block_size : int (required)
- Group size used for weight quantization (default 128). It needs to be a power of 2 and not smaller than 16.
Inputs (3 - 6)
- A : T1
- B : T2
- scales : T1
- zero_points (optional) : T3
- g_idx (optional) : T4
- bias (optional) : T1
Outputs
- Y : T1
Type Constraints
- T1 : tensor(float), tensor(float16)
- Constrain input and output types to float/half_float tensors.
- T2 : tensor(uint8), tensor(int32)
- Constrain quantized weight types to uint8/int32.
- T3 : tensor(uint8), tensor(int32), tensor(float16), tensor(float)
- Constrain quantized zero point types to uint8/int32/float16/float.
- T4 : tensor(int32)
- the index tensor.
com.microsoft.MaxpoolWithMask
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- auto_pad : string
- kernel_shape : list of ints
- pads : list of ints
- storage_order : int
- strides : list of ints
Inputs
- X : T
- M : tensor(int32)
Outputs
- Y : T
Type Constraints
- T : tensor(float)
- Constrain input0 and output types to float tensors
com.microsoft.MoE
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation_type : string
- Activation function to use. Choose from relu, gelu, silu and identity. Default is relu
- k : int
- Number of top experts to select from expert pool
- normalize_routing_weights : int
- Whether to normalize routing weights
- use_sparse_mixer : int
- Whether to use sparse mixer
Inputs (5 - 8)
- input : T
- router_probs : T
- fc1_experts_weights : T
- fc1_experts_bias (optional) : T
- fc2_experts_weights : T
- fc2_experts_bias (optional) : T
- fc3_experts_weights (optional) : T
- fc3_experts_bias (optional) : T
Outputs
- output : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float or float16 tensors.
com.microsoft.MulInteger
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (3 - 4)
- A : T
- A_zero_point (optional) : T
- B : T
- B_zero_point (optional) : T
Outputs
- C : T1
Type Constraints
- T : tensor(uint8), tensor(int8)
- Constrain input types to 8 bit signed and unsigned tensors.
- T1 : tensor(int32)
- Constrain output types to 32 bit tensors.
com.microsoft.MultiHeadAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- mask_filter_value : float
- The value to be filled in the attention mask. Default value is -10000.0f
- num_heads : int (required)
- Number of attention heads
- scale : float
- Custom scale will be used if specified. Default value is 1/sqrt(head_size)
- unidirectional : int
- Whether every token can only attend to previous tokens. Default value is 0.
Inputs (1 - 8)
- query : T
- key (optional) : T
- value (optional) : T
- bias (optional) : T
- key_padding_mask (optional) : M
- attention_bias (optional) : T
- past_key (optional) : T
- past_value (optional) : T
Outputs (1 - 3)
- output : T
- present_key (optional) : T
- present_value (optional) : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output to float tensors.
- M : tensor(int32)
- Constrain mask to integer types
com.microsoft.MurmurHash3
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- positive : int
- If value is 1, output type is uint32_t, else int32_t. Default value is 1.
- seed : int
- Seed for the hashing algorithm, unsigned 32-bit integer, default to 0.
Inputs
- X : T1
Outputs
- Y : T2
Type Constraints
- T1 : tensor(uint32), tensor(int32), tensor(uint64), tensor(int64), tensor(float), tensor(double), tensor(string)
- Constrain input type to unsigned or signed 32-bit or 64-bit integer, float, double, or string tensors. String input should be UTF-8 encoded if using Unicode.
- T2 : tensor(uint32), tensor(int32)
- Constrain output type to unsigned and signed 32-bit integer tensor.
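For orientation, here is a sketch of the public MurmurHash3 x86 32-bit algorithm together with the effect of the `positive` attribute. Whether the operator matches this byte for byte, and how it serializes non-string inputs, is not stated above and is assumed; the helper `as_output` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k: int) -> int:
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit of data, returned as an unsigned integer."""
    h = seed & MASK32
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        h ^= _mix_k(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[4 * n_blocks:]
    if tail:
        h ^= _mix_k(int.from_bytes(tail, "little"))
    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def as_output(h: int, positive: int = 1) -> int:
    """Interpret the 32-bit hash as uint32 (positive=1) or int32 (otherwise)."""
    if positive == 1 or h < 0x80000000:
        return h
    return h - 0x100000000
```

A string input would be hashed as `murmur3_32(text.encode("utf-8"), seed)`.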
com.microsoft.NGramRepeatBlock
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- ngram_size : int (required)
- The NGram size.
Inputs
- input_ids : Tid
- scores : T
Outputs
- scores_out : T
Type Constraints
- Tid : tensor(int64)
- Constrain indices to integer types
- T : tensor(float)
- Constrain scores input and output types to float tensors.
com.microsoft.NhwcConv
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- auto_pad : string
- dilations : list of ints
- Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
- group : int
- number of groups input channels and output channels are divided into.
- kernel_shape : list of ints
- The shape of the convolution kernel. If not present, should be inferred from input W.
- pads : list of ints
- strides : list of ints
- Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs (2 - 3)
- X : T
- W : T
- B (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors.
com.microsoft.NhwcFusedConv
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation : string
- activation_params : list of floats
- auto_pad : string
- dilations : list of ints
- group : int
- kernel_shape : list of ints
- pads : list of ints
- strides : list of ints
Inputs (2 - 4)
- X : T
- W : T
- B (optional) : T
- Z (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16)
- Constrain input and output types to float tensors
com.microsoft.NhwcMaxPool
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- auto_pad : string
- ceil_mode : int
- dilations : list of ints
- kernel_shape : list of ints (required)
- pads : list of ints
- strides : list of ints
Inputs
- x : T
Outputs
- y : T
Type Constraints
- T : tensor(int8), tensor(uint8)
com.microsoft.PackedAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- num_heads : int (required)
- Number of attention heads
- qkv_hidden_sizes : list of ints
- Hidden dimension of Q, K, V: hidden_size, hidden_size and v_hidden_size
- scale : float
- Custom scale will be used if specified. Default value is 1/sqrt(head_size)
Inputs (5 - 6)
- input : T
- weights : T
- bias : T
- token_offset : M
- cumulative_sequence_length : M
- attention_bias (optional) : T
Outputs
- output : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float tensors.
- M : tensor(int32)
- Constrain mask index to integer types
com.microsoft.PackedMultiHeadAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- mask_filter_value : float
- The value to be filled in the attention mask. Default value is -10000.0f
- num_heads : int (required)
- Number of attention heads
- scale : float
- Custom scale will be used if specified. Default value is 1/sqrt(head_size)
Inputs (6 - 7)
- query : T
- key (optional) : T
- value (optional) : T
- bias (optional) : T
- token_offset : M
- cumulative_sequence_length : M
- attention_bias (optional) : T
Outputs
- output : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output to float tensors.
- M : tensor(int32)
- Constrain mask, offset and sequence length to integer types
com.microsoft.Pad
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- mode : string
- Three modes: `constant`(default) - pads with a given constant value, `reflect` - pads with the reflection of the vector mirrored on the first and last values of the vector along each axis, `edge` - pads with the edge values of array
Inputs (2 - 3)
- data : T
- pads : tensor(int64)
- value (optional) : T
Outputs
- output : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors.
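The three `mode` values can be illustrated on a 1-D list. This is a sketch under stated assumptions: the function name `pad_1d` is hypothetical, the constant value defaults to 0, and the `reflect` behavior follows the common convention of mirroring without repeating the edge element.

```python
def pad_1d(data, before, after, mode="constant", value=0.0):
    """Pad a 1-D list with `before` leading and `after` trailing elements."""
    n = len(data)
    if mode == "constant":
        return [value] * before + list(data) + [value] * after
    if mode == "edge":
        return [data[0]] * before + list(data) + [data[-1]] * after
    if mode == "reflect":
        if before >= n or after >= n:
            raise ValueError("reflect padding must be smaller than the input")
        # Mirror around the first and last values without repeating them.
        left = [data[i] for i in range(before, 0, -1)]
        right = [data[n - 2 - i] for i in range(after)]
        return left + list(data) + right
    raise ValueError(f"unknown mode: {mode}")
```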
com.microsoft.QAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- do_rotary : int
- Whether to use rotary position embedding. Default value is 0.
- mask_filter_value : float
- The value to be filled in the attention mask. Default value is -10000.0f
- num_heads : int (required)
- Number of attention heads
- past_present_share_buffer : int
- Corresponding past and present are same tensor, its shape is (2, batch_size, num_heads, max_sequence_length, head_size)
- scale : float
- Custom scale will be used if specified. Default value is 1/sqrt(head_size)
- unidirectional : int
- Whether every token can only attend to previous tokens. Default value is 0.
Inputs (5 - 9)
- input : T1
- weight : T2
- bias : T3
- input_scale : T3
- weight_scale : T3
- mask_index (optional) : T4
- input_zero_point (optional) : T1
- weight_zero_point (optional) : T2
- past (optional) : T3
Outputs (1 - 2)
- output : T3
- present (optional) : T3
Type Constraints
- T1 : tensor(int8), tensor(uint8)
- Constrain input and output types to int8 tensors.
- T2 : tensor(int8), tensor(uint8)
- Constrain input and output types to int8 tensors.
- T3 : tensor(float), tensor(float16)
- Constrain input and output types to float tensors.
- T4 : tensor(int32)
- Constrain mask index to integer types
com.microsoft.QGemm
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- alpha : float
- Scalar multiplier for the product of input tensors A * B.
- transA : int
- Whether A should be transposed
- transB : int
- Whether B should be transposed
Inputs (6 - 9)
- A : TA
- a_scale : T
- a_zero_point : TA
- B : TB
- b_scale : T
- b_zero_point : TB
- C (optional) : TC
- y_scale (optional) : T
- y_zero_point (optional) : TYZ
Outputs
- Y : TY
Type Constraints
- T : tensor(float)
- Constrain scale types to float tensors.
- TA : tensor(uint8), tensor(int8)
- Constrain input A and its zero point types to 8 bit tensors.
- TB : tensor(uint8), tensor(int8)
- Constrain input B and its zero point types to 8 bit tensors.
- TC : tensor(int32)
- Constrain input C to 32 bit integer tensors.
- TYZ : tensor(uint8), tensor(int8)
- Constrain output zero point types to 8 bit tensors.
- TY : tensor(float), tensor(uint8), tensor(int8)
- Constrain output type to float32 or 8 bit tensors.
com.microsoft.QLinearAdd
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (7 - 8)
- A : T
- A_scale : tensor(float)
- A_zero_point (optional) : T
- B : T
- B_scale : tensor(float)
- B_zero_point (optional) : T
- C_scale : tensor(float)
- C_zero_point (optional) : T
Outputs
- C : T
Type Constraints
- T : tensor(uint8), tensor(int8)
- Constrain input and output types to 8 bit signed and unsigned tensors.
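How the scale and zero-point inputs relate to the quantized result can be sketched element-wise: dequantize A and B, add, then requantize with the output parameters. The formula `C = (A_scale * (A - A_zero_point) + B_scale * (B - B_zero_point)) / C_scale + C_zero_point`, the round-half-to-even rounding, and the saturation to the 8-bit range are assumptions based on the usual linear quantization scheme; broadcasting is not handled.

```python
def qlinear_add(a, a_scale, a_zp, b, b_scale, b_zp, c_scale, c_zp, signed=False):
    """Element-wise quantized add over two equally sized lists of integers."""
    lo, hi = (-128, 127) if signed else (0, 255)
    out = []
    for qa, qb in zip(a, b):
        # Dequantize both operands and add in real-valued space.
        real = a_scale * (qa - a_zp) + b_scale * (qb - b_zp)
        # Requantize; Python's round() rounds halves to the nearest even value.
        q = round(real / c_scale) + c_zp
        out.append(max(lo, min(hi, q)))
    return out
```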
com.microsoft.QLinearAveragePool
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- auto_pad : string
- auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER means pad the input so that the output spatial size matches the input. If the total padding is odd, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding.
- ceil_mode : int
- Whether to use ceil or floor (default) to compute the output shape.
- channels_last : int
- Whether to operate on the NHWC layout. Default is 0 (not NHWC).
- count_include_pad : int
- Whether to include pad pixels when calculating values for the edges. Default is 0, which does not count pad pixels.
- kernel_shape : list of ints (required)
- The size of the kernel along each axis.
- pads : list of ints
- Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. Each value represents the number of pixels added to the beginning or end of the corresponding axis. The `pads` format should be [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
- strides : list of ints
- Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs (4 - 5)
- X : T
- x_scale : tensor(float)
- x_zero_point (optional) : T
- y_scale : tensor(float)
- y_zero_point (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(uint8), tensor(int8)
- Constrain input and output types to 8 bit tensors.
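The `pads` layout and the role of `ceil_mode` can be made concrete with two small helpers. The output-size formula is the conventional pooling formula and is assumed to apply here; both function names are hypothetical.

```python
import math


def split_pads(pads):
    """Turn [x1_begin, x2_begin, ..., x1_end, x2_end, ...] into (begin, end) pairs."""
    if len(pads) % 2 != 0:
        raise ValueError("pads must hold a begin and an end value per axis")
    half = len(pads) // 2
    return list(zip(pads[:half], pads[half:]))


def pooled_size(in_size, kernel, stride=1, pad_begin=0, pad_end=0, ceil_mode=0):
    """Output length along one spatial axis."""
    span = in_size + pad_begin + pad_end - kernel
    steps = math.ceil(span / stride) if ceil_mode else span // stride
    return steps + 1
```

For a 2-D input, `split_pads([1, 2, 3, 4])` gives one `(begin, end)` pair per spatial axis.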
com.microsoft.QLinearConcat
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- axis : int (required)
- Which axis to concat on
Inputs (3 - ∞)
- Y_scale : TF
- Y_zero_point : T8
- inputs (variadic, heterogeneous) : TV
Outputs
- Y : T8
Type Constraints
- T8 : tensor(uint8), tensor(int8)
- Constrain input and output types to 8 bit signed and unsigned tensors.
- TF : tensor(float)
- Constrain scale types to any float tensor type.
- TV : tensor(uint8), tensor(int8), tensor(float)
- Sequence of (Tensor, Scale, ZeroPoint) tuples. The type is sequence of (T8, TF, T8).
com.microsoft.QLinearConv
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- auto_pad : string
- channels_last : int
- dilations : list of ints
- group : int
- kernel_shape : list of ints
- pads : list of ints
- strides : list of ints
Inputs (8 - 9)
- x : T1
- x_scale : tensor(float)
- x_zero_point : T1
- w : T2
- w_scale : tensor(float)
- w_zero_point : T2
- y_scale : tensor(float)
- y_zero_point : T3
- B (optional) : T4
Outputs
- y : T3
Type Constraints
- T1 : tensor(int8), tensor(uint8)
- T2 : tensor(int8), tensor(uint8)
- T3 : tensor(int8), tensor(uint8)
- T4 : tensor(int32)
com.microsoft.QLinearGlobalAveragePool
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- channels_last : int
Inputs
- X : T
- x_scale : tensor(float)
- x_zero_point : T
- y_scale : tensor(float)
- y_zero_point : T
Outputs
- Y : T
Type Constraints
- T : tensor(uint8), tensor(int8)
- Constrain input and output types to signed/unsigned int8 tensors.
com.microsoft.QLinearLeakyRelu
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- alpha : float
- Coefficient of leakage.
Inputs (4 - 5)
- X : T
- X_scale : tensor(float)
- X_zero_point (optional) : T
- Y_scale : tensor(float)
- Y_zero_point (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(uint8), tensor(int8)
- Constrain input and output types to 8 bit tensors.
com.microsoft.QLinearMul
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (7 - 8)
- A : T
- A_scale : tensor(float)
- A_zero_point (optional) : T
- B : T
- B_scale : tensor(float)
- B_zero_point (optional) : T
- C_scale : tensor(float)
- C_zero_point (optional) : T
Outputs
- C : T
Type Constraints
- T : tensor(uint8), tensor(int8)
- Constrain input and output types to 8 bit signed and unsigned tensors.
com.microsoft.QLinearReduceMean
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- axes : list of ints (required)
- A list of integers, along which to reduce. The default is to reduce over all the dimensions of the input tensor.
- keepdims : int (required)
- Keep the reduced dimension or not; the default of 1 means keep the reduced dimension.
Inputs (4 - 5)
- data : T
- data_scale : tensor(float)
- data_zero_point (optional) : T
- reduced_scale : tensor(float)
- reduced_zero_point (optional) : T
Outputs
- reduced : T
Type Constraints
- T : tensor(uint8), tensor(int8)
- Constrain input types to 8 bit signed and unsigned tensors.
com.microsoft.QLinearSigmoid
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (4 - 5)
- X : T
- X_scale : tensor(float)
- X_zero_point (optional) : T
- Y_scale : tensor(float)
- Y_zero_point (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(uint8), tensor(int8)
- Constrain input and output types to 8 bit tensors.
com.microsoft.QLinearSoftmax
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- axis : int
- Apply softmax to elements along the dimension axis, or to all dimensions from axis onward, depending on the opset version
- opset : int (required)
- opset version of corresponding SoftMax.
Inputs
- X : T
- X_scale : tensor(float)
- x_zero_point (optional) : T
- y_scale : tensor(float)
- y_zero_point : T
Outputs
- Y : T
Type Constraints
- T : tensor(uint8), tensor(int8)
- Constrain input and output types to signed/unsigned int8 tensors.
com.microsoft.QLinearWhere
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- condition : B
- X : T
- x_scale : TF
- x_zero_point : T
- Y : T
- y_scale : TF
- y_zero_point : T
- z_scale : TF
- z_zero_point : T
Outputs
- Z : T
Type Constraints
- B : tensor(bool)
- Constrain condition input to boolean tensors.
- TF : tensor(float)
- Constrain scale types to any float tensor type.
- T : tensor(uint8), tensor(int8)
- Constrain input and output types to 8 bit signed and unsigned tensors.
com.microsoft.QMoE
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation_type : string
- Activation function to use. Choose from relu, gelu, silu and identity. Default is relu
- expert_weight_bits : int
- Number of bits used in quantized weights. Default is 4 bits
- k : int
- Number of top experts to select from expert pool
- normalize_routing_weights : int
- Whether to normalize routing weights
- use_sparse_mixer : int
- Whether to use sparse mixer
Inputs (7 - 11)
- input : T
- router_probs : T
- fc1_experts_weights : T1
- fc1_scales : T
- fc1_experts_bias (optional) : T
- fc2_experts_weights : T1
- fc2_scales : T
- fc2_experts_bias (optional) : T
- fc3_experts_weights (optional) : T1
- fc3_scales (optional) : T
- fc3_experts_bias (optional) : T
Outputs
- output : T
Type Constraints
- T : tensor(float16)
- Constrain input and output types to float16 tensors.
- T1 : tensor(uint8)
- Constrain weights type to uint8 tensors.
com.microsoft.QOrderedAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- num_heads : int (required)
- Number of attention heads
- order_input : int (required)
- cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
- order_output : int (required)
- cublasLt order of output matrix
- order_weight : int (required)
- cublasLt order of weight matrix
- qkv_hidden_sizes : list of ints
- Hidden layer sizes of Q, K, V paths in Attention
- unidirectional : int
- Whether every token can only attend to previous tokens. Default value is 0.
Inputs (17 - 20)
- input : Q
- scale_input : S
- scale_Q_gemm : S
- scale_K_gemm : S
- scale_V_gemm : S
- Q_weight : Q
- K_weight : Q
- V_weight : Q
- scale_Q_weight : S
- scale_K_weight : S
- scale_V_weight : S
- Q_bias : S
- K_bias : S
- V_bias : S
- scale_QKT_gemm (optional) : S
- scale_QKT_softmax (optional) : S
- scale_values_gemm : S
- mask_index (optional) : G
- past (optional) : Q
- attention_bias (optional) : S
Outputs
- output : Q
Type Constraints
- Q : tensor(int8)
- Constrain input and output types to int8 tensors.
- S : tensor(float)
- Constrain scales to float32 tensors.
- G : tensor(int32)
- Constrain to integer types
com.microsoft.QOrderedGelu
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- order_X : int
- cublasLt order of input X. Optional. See the schema of QuantizeWithOrder for order definition.
- order_Y : int
- cublasLt order of matrix Y, must be same as order_X if specified together. Optional.
Inputs
- X : Q
- scale_X : S
- scale_Y : S
Outputs
- Y : Q
Type Constraints
- Q : tensor(int8)
- Constrain input and output types to int8 tensors.
- S : tensor(float)
- Constrain scales to float32
com.microsoft.QOrderedLayerNormalization
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- axis : int
- The first normalization dimension: normalization will be performed along dimensions axis : rank(inputs).
- epsilon : float
- The epsilon value to use to avoid division by zero.
- order_X : int
- cublasLt order of input X. Default is ROW MAJOR. See the schema of QuantizeWithOrder for order definition.
- order_Y : int
- cublasLt order of matrix Y, must be same as order_X. Default is ROW MAJOR.
Inputs
- X : Q
- scale_X : S
- scale : F
- B (optional) : F
- scale_Y : S
Outputs
- Y : Q
Type Constraints
- F : tensor(float16), tensor(float)
- Constrain input gamma and bias to float16 or float tensors. float may give better precision; float16 runs faster.
- S : tensor(float)
- quantization scale must be float tensors.
- Q : tensor(int8)
- quantization tensor must be int8 tensors.
com.microsoft.QOrderedLongformerAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- num_heads : int (required)
- Number of attention heads
- order_global_weight : int (required)
- cublasLt order of weight matrix
- order_input : int (required)
- cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
- order_output : int (required)
- cublasLt order of output matrix
- order_weight : int (required)
- cublasLt order of weight matrix
- window : int (required)
- One sided attention windows length W, or half of total window length
Inputs
- input : Q
- scale_input : S
- weight : Q
- scale_weight : S
- bias : S
- scale_bias : S
- scale_qkv_gemm : S
- mask : F
- global_weight : Q
- scale_global_weight : S
- global_bias : S
- scale_global_gemm : S
- global : G
- scale_output : S
Outputs
- output : Q
Type Constraints
- Q : tensor(int8)
- Constrain input and output types to int8 tensors.
- S : tensor(float)
- Constrain scales to float32 tensors.
- G : tensor(int32)
- Constrain to integer types
- F : tensor(float16)
- Be compatible with float version.
com.microsoft.QOrderedMatMul
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- order_A : int (required)
- cublasLt order of matrix A. See the schema of QuantizeWithOrder for order definition.
- order_B : int (required)
- cublasLt order of matrix B
- order_Y : int (required)
- cublasLt order of matrix Y and optional matrix C
Inputs (5 - 8)
- A : Q
- scale_A : S
- B : Q
- scale_B : S
- scale_Y : S
- bias (optional) : S
- C (optional) : Q
- scale_C (optional) : S
Outputs
- Y : Q
Type Constraints
- Q : tensor(int8)
- Constrain input and output types to int8 tensors.
- S : tensor(float)
- Constrain bias and scales to float32
com.microsoft.QuantizeBFP
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- bfp_type : int (required)
- The type of BFP - must match with the BFPType enum
- block_dim : int
- Each bounding box spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension.
Inputs
- x : T1
Outputs
- y : T2
- shape : T3
- strides : T3
Type Constraints
- T1 : tensor(float), tensor(float16), tensor(bfloat16)
- Constrain the input to float and bfloat.
- T2 : tensor(uint8)
- Constrain y to uint8.
- T3 : tensor(int64)
- Constrain shape and strides to int64.
com.microsoft.QuantizeLinear
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- axis : int
- The axis along which the same quantization parameters are applied. It's optional. If it's not specified, it means per-tensor quantization and inputs 'y_scale' and 'y_zero_point' must be scalars. If it's specified, it means per 'axis' quantization and inputs 'y_scale' and 'y_zero_point' must be 1-D tensors.
Inputs (2 - 3)
- x : T1
- y_scale : T1
- y_zero_point (optional) : T2
Outputs
- y : T2
Type Constraints
- T1 : tensor(float16), tensor(float)
- Constrain 'x', 'y_scale' to float tensors.
- T2 : tensor(int8), tensor(uint8), tensor(int16), tensor(uint16), tensor(int4), tensor(uint4)
- Constrain 'y_zero_point' and 'y' to 4-bit, 8-bit and 16-bit integer tensors.
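The per-tensor and per-axis modes described for the axis attribute can be sketched in plain Python. This is a minimal illustration, assuming the operator uses the same arithmetic as the standard ONNX QuantizeLinear (round half to even, add the zero point, then saturate). The helper name `quantize_linear`, the uint8 range defaults, and the fixed axis of 1 are assumptions made for the example, not part of the schema.

```python
def quantize_linear(x, y_scale, y_zero_point=0, qmin=0, qmax=255):
    """Quantize a 2-D list of floats (illustrative, uint8 range by default).

    y_scale / y_zero_point are scalars for per-tensor quantization, or
    lists with one entry per column for per-axis quantization (axis=1).
    """
    def param(p, j):
        # Pick the per-column value if a list was given, else the scalar.
        return p[j] if isinstance(p, (list, tuple)) else p

    out = []
    for row in x:
        q_row = []
        for j, v in enumerate(row):
            # Python's round() rounds half to even, like the ONNX definition.
            q = round(v / param(y_scale, j)) + param(y_zero_point, j)
            q_row.append(max(qmin, min(qmax, q)))  # saturate to the target range
        out.append(q_row)
    return out


x = [[0.0, 1.0], [2.0, 300.0]]
per_tensor = quantize_linear(x, 0.5, 10)             # scalar parameters
per_axis = quantize_linear(x, [0.5, 2.0], [10, 0])   # one parameter per column
```

In the per-tensor call the value 300.0 saturates at 255; in the per-axis call each column uses its own scale and zero point.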
com.microsoft.QuantizeWithOrder
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- order_input : int (required)
- cublasLt order of input matrix. ORDER_COL = 0, ORDER_ROW = 1, ORDER_COL32 = 2, ORDER_COL4_4R2_8C = 3, ORDER_COL32_2R_4R4 = 4. Please refer to https://docs.nvidia.com/cuda/cublas/index.html#cublasLtOrder_t for their meaning.
- order_output : int (required)
- cublasLt order of output matrix.
Inputs
- input : F
- scale_input : S
Outputs
- output : Q
Type Constraints
- Q : tensor(int8)
- Constrain input and output types to int8 tensors.
- F : tensor(float16), tensor(float)
- Constrain to float types
- S : tensor(float)
- Constrain Scale to float32 types
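The integer values listed for order_input can be kept readable with a small enum when building attribute dictionaries for the QOrdered* operators. The class name `CublasLtOrder` and the helper `order_attrs` are hypothetical; only the five name/value pairs come from the attribute description above.

```python
from enum import IntEnum


class CublasLtOrder(IntEnum):
    """Order values as listed in the order_input attribute description."""
    ORDER_COL = 0
    ORDER_ROW = 1
    ORDER_COL32 = 2
    ORDER_COL4_4R2_8C = 3
    ORDER_COL32_2R_4R4 = 4


def order_attrs(order_input, order_output):
    # Attributes are plain ints in the schema, so convert the enum members.
    return {"order_input": int(order_input), "order_output": int(order_output)}


attrs = order_attrs(CublasLtOrder.ORDER_ROW, CublasLtOrder.ORDER_COL32)
```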
com.microsoft.QuickGelu
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- alpha : float
- Alpha value.
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
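A scalar sketch of the activation, assuming QuickGelu computes x * sigmoid(alpha * x). The default alpha of 1.702 used here is an assumption for illustration; the schema above only says "Alpha value."

```python
import math


def sigmoid(z):
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def quick_gelu(x, alpha=1.702):
    # Assumed definition: x * sigmoid(alpha * x).
    return x * sigmoid(alpha * x)
```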
com.microsoft.Range
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (2 - 3)
- start : T
- limit : T
- delta (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float), tensor(double), tensor(int16), tensor(int32), tensor(int64)
- Constrain input and output types.
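The inputs suggest a sequence generator similar to NumPy's arange. The sketch below assumes that `limit` is exclusive and that `delta` defaults to 1 when omitted; both are assumptions, since the schema above does not spell them out. The name `contrib_range` is hypothetical.

```python
import math


def contrib_range(start, limit, delta=1):
    """Values start, start + delta, ... up to but not including limit."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    # Number of elements; zero when the range is empty or runs the wrong way.
    n = max(0, math.ceil((limit - start) / delta))
    return [start + i * delta for i in range(n)]
```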
com.microsoft.ReduceSumInteger
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- axes : list of ints (required)
- A list of integers, along which to reduce. The default is to reduce over all the dimensions of the input tensor.
- keepdims : int (required)
- Keep the reduced dimension or not. The default, 1, means keep the reduced dimension.
Inputs
- data : T1
Outputs
- reduced : T2
Type Constraints
- T1 : tensor(int8), tensor(uint8)
- Constrain input type to 8-bit integer tensor.
- T2 : tensor(int32), tensor(uint32)
- Constrain output data type to 32-bit integer tensor. T2 must be tensor(uint32) when T1 is tensor(uint8), or must be tensor(int32) when T1 is tensor(int8).
com.microsoft.RelativePositionBias
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- is_bidirectional : int
- Default value is 0.
- max_distance : int (required)
- Max distance
Inputs
- bias_table : T
- query_length : U
- key_length : U
Outputs
- output : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float or half tensors.
- U : tensor(int64)
- Constrain sequence_length to int tensors.
com.microsoft.RemovePadding
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- input : T
- sequence_token_count : M
Outputs
- output : T
- token_offset : M
- cumulated_seq_len : M
- max_seq_len : M
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float tensors.
- M : tensor(int32)
- Constrain sequence_token_count and token_offset to integer types
com.microsoft.RestorePadding
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- input : T
- token_offset : M
Outputs
- output : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float tensors.
- M : tensor(int32)
- Constrain token_offset to integer types
com.microsoft.Rfft
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- normalized : int
- must be 0, normalization currently not supported
- onesided : int
- must be 1, only one sided FFTs supported
- signal_ndim : int
- number of dimensions comprising the signal, collected in reverse order (e.g. 1 = last dimension is the signal)
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float), tensor(double), tensor(float16)
- Constrain input and output types to float or half tensors.
com.microsoft.RotaryEmbedding
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- interleaved : int
- Rotate using interleaved pattern. Default value is 0 (False).
- is_packed_batching : int
- ragged batch inputs or not. Default value is 0
- num_heads : int
- Number of attention heads. Default value is 0. Must use with rotary_embedding_dim
- rotary_embedding_dim : int
- Rotary embedding dimension. Default value is 0.
- scale : float
- Custom scale will be used if specified. Default value is 1.0
Inputs
- input : T
- position_ids : M
- cos_cache : T
- sin_cache : T
Outputs
- output : T
Type Constraints
- T : tensor(float), tensor(float16), tensor(bfloat16)
- Constrain input and output types to float tensors.
- M : tensor(int64)
- Constrain input and output types to integer tensors
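The effect of the interleaved attribute can be sketched on a single head vector. The pairing used here (element i with element i + d/2 when interleaved is 0, adjacent elements 2i and 2i + 1 when it is 1) follows common rotary embedding implementations and is an assumption about this operator. The helper `rotate` is hypothetical and takes the per-pair cos/sin values directly instead of looking them up in cos_cache and sin_cache via position_ids.

```python
def rotate(x, cos, sin, interleaved=False):
    """Rotate one head vector x (even length d) with d/2 cos/sin values."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        # Choose which two elements form the i-th rotated pair.
        a, b = (2 * i, 2 * i + 1) if interleaved else (i, i + half)
        out[a] = x[a] * cos[i] - x[b] * sin[i]
        out[b] = x[b] * cos[i] + x[a] * sin[i]
    return out


x = [1.0, 2.0, 3.0, 4.0]
quarter_turn = ([0.0, 0.0], [1.0, 1.0])  # cos = 0, sin = 1 for every pair
split_halves = rotate(x, *quarter_turn)                   # [-3.0, -4.0, 1.0, 2.0]
adjacent = rotate(x, *quarter_turn, interleaved=True)     # [-2.0, 1.0, -4.0, 3.0]
```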
com.microsoft.SampleOp
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(uint32), tensor(uint64), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double)
- Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.
com.microsoft.Sampling
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- custom : int
- If 1, use custom sampling logic
- decoder : graph (required)
- Decoder subgraph to execute in a loop.
- decoder_start_token_id : int
- The id of the token that indicates decoding starts.
- encoder : graph
- The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
- eos_token_id : int (required)
- The id of the end-of-sequence token
- filter_value : float
- All filtered values will be set to this float value.
- init_decoder : graph
- The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
- min_tokens_to_keep : int
- Minimum number of tokens we keep per batch example in the output.
- model_type : int
- Model type: 0 for decoder only like GPT-2; 1 for encoder decoder like Bart
- no_repeat_ngram_size : int
- no repeat ngrams size
- pad_token_id : int (required)
- The id of the padding token
- presence_penalty : float
- Presence penalty for custom sampling
- temperature : float
- The value used to modulate the next token probabilities.
- top_p : float
- If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
- vocab_size : int
- Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
Inputs (2 - 9)
- input_ids : I
- max_length : I
- min_length (optional) : I
- repetition_penalty (optional) : T
- vocab_mask (optional) : I
- prefix_vocab_mask (optional) : I
- attention_mask (optional) : I
- presence_mask (optional) : I
- seed (optional) : I
Outputs (1 - 2)
- sequences : I
- filtered_logits (optional) : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors.
- I : tensor(int32)
- Constrain to integer types
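How top_p, filter_value and min_tokens_to_keep interact can be sketched for a single row of logits. This is an illustration of nucleus filtering as the attribute descriptions state it, not the operator's implementation. The function name `top_p_filter` and the default filter value of negative infinity are assumptions.

```python
import math


def top_p_filter(logits, top_p, filter_value=float("-inf"), min_tokens_to_keep=1):
    """Keep the smallest set of most probable tokens whose mass reaches top_p."""
    # Softmax over the logits (shifted by the max for stability).
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least probable, accumulating probability.
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for rank, i in enumerate(order):
        if cumulative >= top_p and rank >= min_tokens_to_keep:
            break
        keep.add(i)
        cumulative += probs[i]

    # Everything outside the kept set is replaced by filter_value.
    return [v if i in keep else filter_value for i, v in enumerate(logits)]


logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
filtered = top_p_filter(logits, top_p=0.7)  # keeps the two most probable tokens
```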
com.microsoft.SkipGroupNorm
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- activation : int (required)
- Activation after group normalization: 0 for None, 1 for SiLU
- channels_last : int
- 1 if the input and output are in the NHWC layout, 0 if it is in the NCHW layout. Defaults to 1.
- epsilon : float
- The epsilon value to use to avoid division by zero
- groups : int (required)
- The number of groups of channels. It should be a divisor of the number of channels C
Inputs (4 - 5)
- X : T
- gamma : M
- beta : M
- skip : T
- bias (optional) : T
Outputs (1 - 2)
- Y : T
- S (optional) : T
Type Constraints
- T : tensor(float16), tensor(float)
- Constrain input X, skip, bias and output Y, S types to float tensors.
- M : tensor(float16), tensor(float)
- Constrain gamma and beta to float tensors.
com.microsoft.SkipLayerNormalization
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- epsilon : float
- The epsilon value to use to avoid division by zero.
Inputs (3 - 5)
- input : T
- skip : T
- gamma : T
- beta (optional) : T
- bias (optional) : T
Outputs (1 - 4)
- output : T
- mean (optional) : U
- inv_std_var (optional) : U
- input_skip_bias_sum (optional) : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float or half tensors.
- U : tensor(float)
- Constrain mean and inv_std_var to float tensors.
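The four outputs can be related to the inputs with a sketch over one hidden vector. It assumes the operator adds input, skip and the optional bias, then applies standard layer normalization with gamma and the optional beta. The function name and the epsilon default are assumptions for the example.

```python
import math


def skip_layer_norm(x, skip, gamma, beta=None, bias=None, epsilon=1e-5):
    """Return (output, mean, inv_std_var, input_skip_bias_sum) for one vector."""
    n = len(x)
    # Residual sum that the fused operator normalizes.
    s = [x[i] + skip[i] + (bias[i] if bias else 0.0) for i in range(n)]
    mean = sum(s) / n
    var = sum((v - mean) ** 2 for v in s) / n
    inv_std = 1.0 / math.sqrt(var + epsilon)
    out = [(s[i] - mean) * inv_std * gamma[i] + (beta[i] if beta else 0.0)
           for i in range(n)]
    return out, mean, inv_std, s


out, mean, inv_std, total = skip_layer_norm(
    [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], gamma=[1.0] * 4)
```

Under the same assumptions, SkipSimplifiedLayerNormalization would skip the mean subtraction and scale by the root mean square instead.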
com.microsoft.SkipSimplifiedLayerNormalization
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- epsilon : float
- The epsilon value to use to avoid division by zero.
Inputs (3 - 4)
- input : T
- skip : T
- gamma : T
- bias (optional) : T
Outputs (1 - 4)
- output : T
- mean (optional) : U
- inv_std_var (optional) : U
- input_skip_bias_sum (optional) : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain input and output types to float or half tensors.
- U : tensor(float)
- Constrain mean and inv_std_var to float tensors.
com.microsoft.Snpe
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- DLC : string (required)
- payload of the SNPE DLC file.
- notes : string
- (Optional) Some notes for the model
- snpe_version : string
- (Optional) SNPE version used to convert the model.
- target_device : string
- (Optional) Target device like CPU, DSP, etc.
Inputs (1 - ∞)
- inputs (variadic) : T
Outputs (1 - ∞)
- outputs (variadic) : T
Type Constraints
- T : tensor(uint8), tensor(uint16), tensor(float)
- Constrain input and output types to uint8, uint16, float tensors.
com.microsoft.SparseAttention
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- do_rotary : int
- Whether to use rotary position embedding. Default value is 0.
- kv_num_heads : int (required)
- Number of attention heads for key and value
- num_heads : int (required)
- Number of attention heads for query
- rotary_interleaved : int
- Rotary use interleaved pattern or not. Default value is 0.
- scale : float
- Scaling factor applied prior to softmax. The default value is 1/sqrt(head_size)
- sparse_block_size : int (required)
- Number of tokens per sparse block. Choices: 16, 32, 64, 128
Inputs (9 - 11)
- query : T
- key (optional) : T
- value (optional) : T
- past_key : T
- past_value : T
- block_row_indices : M
- block_col_indices : M
- total_sequence_length : M
- key_total_sequence_lengths : M
- cos_cache (optional) : T
- sin_cache (optional) : T
Outputs
- output : T
- present_key : T
- present_value : T
Type Constraints
- T : tensor(float), tensor(float16), tensor(bfloat16)
- Constrain input and output to float tensors.
- M : tensor(int32)
- Constrain integer type.
com.microsoft.SparseToDenseMatMul
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- alpha : float
- Scalar multiplier for the product of the input tensors.
- transA : int
- Whether A should be transposed on the last two dimensions before doing multiplication
- transB : int
- Whether B should be transposed on the last two dimensions before doing multiplication
Inputs
- A : T
- B : T1
Outputs
- Y : T1
Type Constraints
- T : sparse_tensor(float), sparse_tensor(double), sparse_tensor(int64), sparse_tensor(int32), sparse_tensor(uint64), sparse_tensor(uint32)
- Constrain input and output types to float tensors.
- T1 : tensor(float), tensor(double), tensor(int64), tensor(int32), tensor(uint64), tensor(uint32)
- Constrain input and output types to float tensors.
com.microsoft.Tokenizer
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- mark : int (required)
- Boolean whether to mark the beginning/end character with start of text character (0x02)/end of text character (0x03).
- mincharnum : int (required)
- Minimum number of characters allowed in the output. For example, if mincharnum is 2, tokens such as "A" and "B" would be ignored
- pad_value : string (required)
- The string used to pad output tensors when the number of tokens extracted doesn't match the maximum number of tokens found. If start/end markers are needed, padding will appear outside the markers.
- separators : list of strings
- An optional list of strings that contains a list of separators: regular expressions to match separators. Two consecutive segments in X connected by a separator would be divided into two tokens. For example, if the input is "Hello World!" and this attribute contains only one space character, the corresponding output would be ["Hello", "World!"]. To achieve character-level tokenization, one should set the 'separators' to [""], which contains an empty string.
- tokenexp : string
- An optional string. Token's regular expression in basic POSIX format (pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03). If set, tokenizer may produce tokens matching the specified pattern. Note that one and only one of 'tokenexp' and 'separators' should be set.
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(string)
- Input/Output is a string tensor
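The separators and mincharnum behaviour described in the attributes can be sketched for a single string. This uses Python's `re` module, whose syntax differs from the POSIX format the schema refers to, and it leaves out padding and start/end markers. The function name `tokenize` is hypothetical.

```python
import re


def tokenize(text, separators, mincharnum=1):
    """Split text on separator patterns and drop tokens that are too short."""
    if separators == [""]:
        # A single empty separator means character-level tokenization.
        tokens = list(text)
    else:
        pattern = "|".join(f"(?:{s})" for s in separators)
        tokens = [t for t in re.split(pattern, text) if t]
    return [t for t in tokens if len(t) >= mincharnum]


words = tokenize("Hello World!", [" "])   # ["Hello", "World!"]
chars = tokenize("abc", [""])             # ["a", "b", "c"]
```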
com.microsoft.TorchEmbedding
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs (2 - 4)
- weight : T
- indices : tensor(int64)
- padding_idx (optional) : tensor(int64)
- scale_grad_by_freq (optional) : tensor(bool)
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64)
- Constrain input and output types to all numeric tensors.
com.microsoft.TransposeMatMul
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- alpha : float
- Scalar multiplier for the product of the input tensors.
- transA : int
- Whether A should be transposed on the last two dimensions before doing multiplication
- transB : int
- Whether B should be transposed on the last two dimensions before doing multiplication
Inputs
- A : T
- B : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
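For 2-D inputs the attributes combine as Y = alpha * op(A) x op(B), where op transposes its argument when the matching flag is set. The sketch below covers only that 2-D case with nested lists; the helper names and the alpha default of 1.0 are assumptions.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def transpose_matmul(a, b, alpha=1.0, transA=0, transB=0):
    """alpha * op(a) @ op(b) for 2-D nested lists."""
    if transA:
        a = transpose(a)
    if transB:
        b = transpose(b)
    columns = transpose(b)  # iterate over columns of b for the dot products
    return [[alpha * sum(p * q for p, q in zip(row, col)) for col in columns]
            for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
plain = transpose_matmul(a, b)                        # [[19.0, 22.0], [43.0, 50.0]]
scaled = transpose_matmul(a, b, alpha=0.5, transA=1)  # [[13.0, 15.0], [19.0, 22.0]]
```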
com.microsoft.Trilu
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- upper : int
- Boolean. Indicates whether upper or lower part of matrix is retained. Default is true.
Inputs (1 - 2)
- X : T
- k (optional) : tensor(int64)
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16), tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bool)
- Constrain input and output types to all numeric tensors and bool tensors.
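A 2-D sketch of the upper attribute and the optional k input, assuming the same convention as the standard ONNX Trilu and NumPy's triu/tril: with upper set, elements on or above the k-th diagonal are kept; otherwise elements on or below it are kept. The function name `trilu` is used only for the example.

```python
def trilu(x, k=0, upper=1):
    """Zero out the part of a 2-D nested list outside the retained triangle."""
    result = []
    for i, row in enumerate(x):
        new_row = []
        for j, v in enumerate(row):
            keep = (j - i >= k) if upper else (j - i <= k)
            new_row.append(v if keep else 0)
        result.append(new_row)
    return result


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
upper_part = trilu(m)            # [[1, 2, 3], [0, 5, 6], [0, 0, 9]]
lower_part = trilu(m, upper=0)   # [[1, 0, 0], [4, 5, 0], [7, 8, 9]]
```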
com.microsoft.UnfoldTensor
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- dim : int
- specify the dimension to unfold
- size : int (required)
- specify the size
- step : int
- specify the step.
Inputs
- input : T
Outputs
- output : T
Type Constraints
- T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
- Allow inputs and outputs to be any kind of tensor.
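The size and step attributes can be illustrated on a 1-D sequence, assuming the operator mirrors the sliding-window behaviour of PyTorch's `Tensor.unfold` along the chosen dim. That correspondence, the step default of 1, and the helper name `unfold_1d` are assumptions.

```python
def unfold_1d(values, size, step=1):
    """Return all windows of length `size`, starting every `step` elements."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]


windows = unfold_1d([1, 2, 3, 4, 5, 6, 7], size=3, step=2)
# windows == [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
```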
com.microsoft.Unique
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Inputs
- x : T
Outputs
- y : T
- idx : tensor(int64)
- counts : tensor(int64)
Type Constraints
- T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
- Input can be of any tensor type.
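The relationship between the three outputs can be sketched for a 1-D input. It assumes y lists unique values in order of first occurrence, idx maps every input element to its position in y, and counts gives the number of occurrences of each entry of y. The ordering is an assumption, as the schema above does not state it.

```python
def unique(x):
    """Return (y, idx, counts) for a 1-D list."""
    y, idx, counts = [], [], []
    position = {}  # value -> index in y
    for v in x:
        if v not in position:
            position[v] = len(y)
            y.append(v)
            counts.append(0)
        idx.append(position[v])
        counts[position[v]] += 1
    return y, idx, counts


y, idx, counts = unique([2, 1, 1, 3, 4, 3])
# y == [2, 1, 3, 4], idx == [0, 1, 1, 2, 3, 2], counts == [1, 2, 2, 1]
```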
com.microsoft.WhisperBeamSearch
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- beginning_timestamp_token_id : int
- The id of the first timestamp
- decoder : graph (required)
- Decoder subgraph to execute in a loop.
- decoder_output_cross_qk : int
- If nonzero, decoder subgraph contains output Q*K from cross attentions. Default 0.
- decoder_start_token_id : int
- The id of the token that indicates decoding starts (i.e. the start of transcription token id)
- early_stopping : int
- early stop or not
- encoder : graph
- The subgraph for initialization of encoder and decoder. It will be called once before decoder subgraph.
- eos_token_id : int (required)
- The id of the end-of-sequence token
- init_decoder : graph
- The subgraph for the first decoding run. It will be called once before `decoder` subgraph. This is relevant only for the GPT2 model. If this attribute is missing, the `decoder` subgraph will be used for all decoding runs
- model_type : int
- Must be 2 for whisper
- no_repeat_ngram_size : int
- no repeat ngrams size
- no_speech_token_id : int
- The id of the token in the Whisper model that marks the whole sequence as empty. With this token, Whisper can output no_speech_prob afterwards. Default is -1.
- no_timestamps_token_id : int
- The id of the token that indicates no timestamps
- pad_token_id : int (required)
- The id of the padding token
- start_of_lm_token_id : int
- The id of the token that indicates LM starts
- transcribe_token_id : int
- The id of the transcribe task
- translate_token_id : int
- The id of the translate task
- vocab_size : int
- Size of the vocabulary. If not provided, it will be inferred from the decoder subgraph's output shape
Inputs (5 - 15)
- input_ids : F
- max_length : I
- min_length (optional) : I
- num_beams : I
- num_return_sequences : I
- length_penalty (optional) : T
- repetition_penalty (optional) : T
- vocab_mask (optional) : M
- prefix_vocab_mask (optional) : M
- attention_mask (optional) : I
- decoder_input_ids (optional) : I
- logits_processor (optional) : I
- cross_qk_layer_head (optional) : I
- extra_decoding_ids (optional) : I
- temperature (optional) : T
Outputs (1 - 5)
- sequences : I
- sequences_scores (optional) : T
- scores (optional) : T
- cross_qk (optional) : V
- non_speech_probs (optional) : T
Type Constraints
- T : tensor(float), tensor(float16)
- Constrain to float tensors.
- F : tensor(float), tensor(int32), tensor(float16)
- Constrain input type to float or int tensors.
- I : tensor(int32)
- Constrain to integer types
- M : tensor(int32)
- Constrain mask to integer types
- V : tensor(float)
- Constrain cross_qk to float32 tensors.
com.microsoft.WordConvEmbedding
Version
This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
Attributes
- char_embedding_size : int
- Integer representing the embedding vector size for each char. If not provided, use the char embedding size of the embedding vector.
- conv_window_size : int
- This operator applies convolution to a word from left to right with a window equal to conv_window_size and a stride of 1. Take the word 'example' for example: with conv_window_size equal to 2, conv is applied to [ex], [xa], [am], [mp]... If not provided, use the first dimension of the conv kernel shape.
- embedding_size : int
- Integer representing the embedding vector size for each word. If not provided, use the filter size of the conv weight.
Inputs
- Sequence : T
- W : T1
- B : T1
- C : T1
Outputs
- Y : T1
Type Constraints
- T : tensor(int32)
- Constrain to tensor(int32).
- T1 : tensor(float)
- Constrain to tensor(float).
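The windowing in the conv_window_size description can be reproduced directly. The sketch lists only the character windows the convolution would see with stride 1; it does not perform the embedding lookup or the convolution itself. The helper name `conv_windows` is hypothetical.

```python
def conv_windows(word, conv_window_size):
    """Character windows visited left to right with stride 1."""
    return [word[i:i + conv_window_size]
            for i in range(len(word) - conv_window_size + 1)]


windows = conv_windows("example", 2)
# windows == ["ex", "xa", "am", "mp", "pl", "le"]
```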
experimental com.microsoft.IsAllFinite
Version
No versioning maintained for experimental ops.
Attributes
- isinf_only : int
- If true, check only for Inf, -Inf.
- isnan_only : int
- If true, check only for NaN.
Inputs (1 - ∞)
- input (variadic) : V
Outputs
- output : T
Type Constraints
- V : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
- T : tensor(bool)
- Constrain the output to a boolean tensor.
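The two attributes narrow which non-finite values are checked. A sketch over lists of floats follows, assuming the output is true only when none of the checked values is found in any input, and assuming the two flags are mutually exclusive. Both points are assumptions about behaviour the schema does not spell out.

```python
import math


def is_all_finite(tensors, isinf_only=0, isnan_only=0):
    """Return True when no checked non-finite value occurs in any input."""
    if isinf_only and isnan_only:
        raise ValueError("set at most one of isinf_only and isnan_only")
    for tensor in tensors:
        for v in tensor:
            if isinf_only:
                bad = math.isinf(v)      # only Inf / -Inf count
            elif isnan_only:
                bad = math.isnan(v)      # only NaN counts
            else:
                bad = not math.isfinite(v)
            if bad:
                return False
    return True
```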
experimental com.microsoft.QEmbedLayerNormalization
Version
No versioning maintained for experimental ops.
Attributes
- epsilon : float
- The epsilon value to use to avoid division by zero.
Inputs
- input_ids : T1
- segment_ids (optional) : T1
- word_embedding_quant : T2
- position_embedding_quant : T2
- segment_embedding (optional) : T2
- gamma_quant : T2
- beta_quant : T2
- mask (optional) : T1
- word_embedding_scale : T
- position_embedding_scale : T
- segment_embedding_scale (optional) : T
- gamma_scale : T
- beta_scale : T
- word_embedding_zero_point : T2
- position_embedding_zero_point : T2
- segment_embedding_zero_point (optional) : T2
- gamma_zero_point : T2
- beta_zero_point : T2
Outputs
- layernorm_out : T
- mask_index_out : T1
Type Constraints
- T1 : tensor(int32)
- Constrain mask index to integer types
- T2 : tensor(int8), tensor(uint8)
- Constrain input and output types to int8 tensors.
- T : tensor(float)
- Constrain input and output types to float32 tensors.
com.microsoft.nchwc
com.microsoft.nchwc.AveragePool
Version
This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.
Attributes
- auto_pad : string
- ceil_mode : int
- count_include_pad : int
- dilations : list of ints
- kernel_shape : list of ints (required)
- pads : list of ints
- strides : list of ints
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors
com.microsoft.nchwc.Conv
Version
This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.
Attributes
- activation : string
- activation_params : list of floats
- auto_pad : string
- dilations : list of ints
- group : int
- kernel_shape : list of ints
- pads : list of ints
- strides : list of ints
Inputs (2 - 4)
- X : T
- W : T
- B (optional) : T
- Sum (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors
com.microsoft.nchwc.GlobalAveragePool
Version
This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors
com.microsoft.nchwc.GlobalMaxPool
Version
This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors
com.microsoft.nchwc.MaxPool
Version
This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.
Attributes
- auto_pad : string
- ceil_mode : int
- dilations : list of ints
- kernel_shape : list of ints (required)
- pads : list of ints
- storage_order : int
- strides : list of ints
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors
com.microsoft.nchwc.ReorderInput
Version
This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.
Attributes
- channels_last : int
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors
com.microsoft.nchwc.ReorderOutput
Version
This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.
Attributes
- channels : int
- channels_last : int
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors
com.microsoft.nchwc.Upsample
Version
This version of the operator has been available since version 1 of the 'com.microsoft.nchwc' operator set.
Attributes
- coordinate_transformation_mode : string
- mode : string
- scales : list of ints
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float)
- Constrain input and output types to float tensors
com.ms.internal.nhwc
com.ms.internal.nhwc.BatchNormalization
Version
This version of the operator has been available since version 15 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.BatchNormalization-7, com.ms.internal.nhwc.BatchNormalization-9, com.ms.internal.nhwc.BatchNormalization-14
Attributes
- activation : string
- activation_params : list of floats
- epsilon : float
- The epsilon value to use to avoid division by zero.
- momentum : float
- Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum).
- training_mode : int
- If set to true, it indicates BatchNormalization is being used for training, and outputs 1 and 2 are to be computed.
Inputs
- X : T
- scale : T1
- B : T1
- input_mean : T2
- input_var : T2
Outputs (1 - 3)
- Y : T
- running_mean (optional) : T2
- running_var (optional) : T2
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
- T1 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain scale and bias types to float tensors.
- T2 : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain mean and variance types to float tensors.
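The momentum rule quoted for BatchNormalization can be sketched in NumPy as follows. This is a minimal illustration, not the operator's implementation: the function names are hypothetical, the channels-last layout (channel as the trailing axis) is an assumption taken from the domain name, the default values of `epsilon` and `momentum` are chosen for illustration, and the normalization formula is the usual batch-normalization definition rather than something this section states.

```python
import numpy as np

def normalize_nhwc(x, scale, bias, mean, var, epsilon=1e-5):
    """Normalize a channels-last tensor with per-channel statistics (assumed layout)."""
    # mean, var, scale and bias have shape (C,) and broadcast over the trailing axis.
    return (x - mean) / np.sqrt(var + epsilon) * scale + bias

def update_running_stats(x, running_mean, running_var, momentum=0.9):
    """Apply running = running * momentum + batch * (1 - momentum) per channel."""
    # Reduce over every axis except the trailing channel axis.
    axes = tuple(range(x.ndim - 1))
    batch_mean = x.mean(axis=axes)
    batch_var = x.var(axis=axes)  # population variance of the batch
    new_mean = running_mean * momentum + batch_mean * (1 - momentum)
    new_var = running_var * momentum + batch_var * (1 - momentum)
    return new_mean, new_var
```

With `training_mode` unset, only the first function would be needed; the second corresponds to the optional `running_mean` and `running_var` outputs.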
com.ms.internal.nhwc.ConvTranspose
Version
This version of the operator has been available since version 11 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.ConvTranspose-1
Attributes
- activation : string
- activation_params : list of floats
- auto_pad : string
- auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER pad the input so that `output_shape[i] = input_shape[i] * strides[i]` for each axis `i`. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER.
- dilations : list of ints
- dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
- group : int
- number of groups input channels and output channels are divided into.
- kernel_shape : list of ints
- The shape of the convolution kernel. If not present, should be inferred from input W.
- output_padding : list of ints
- Additional elements added to the side with higher coordinate indices in the output. Each padding value in "output_padding" must be less than the corresponding stride/dilation dimension. By default, this attribute is a zero vector. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If "output_shape" is explicitly provided, "output_padding" does not contribute additional size to "output_shape" but participates in the computation of the needed padding amount. This is also called adjs or adjustment in some frameworks.
- output_shape : list of ints
- The shape of the output can be explicitly set which will cause pads values to be auto generated. If output_shape is specified pads values are ignored. See doc for details for equations to generate pads. Note that the output_shape attribute value should not include dimensions for batch size and channels, which are automatically inferred.
- pads : list of ints
- Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end of the corresponding axis. The `pads` format should be as follows: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
- strides : list of ints
- Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs (2 - 3)
- X : T
- W : T
- B (optional) : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors.
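The ConvTranspose attributes above interact through a per-axis size equation. The sketch below uses the equations from the standard ONNX ConvTranspose definition, which this section only refers to as "See doc for details"; the function names are hypothetical, and the total padding is assumed to be non-negative.

```python
def conv_transpose_output_size(input_size, stride, kernel, dilation=1,
                               pad_begin=0, pad_end=0, output_padding=0):
    """Output extent along one spatial axis when explicit pads are given."""
    effective_kernel = (kernel - 1) * dilation + 1
    return (stride * (input_size - 1) + output_padding
            + effective_kernel - pad_begin - pad_end)

def pads_for_output_shape(input_size, output_size, stride, kernel,
                          dilation=1, output_padding=0, auto_pad="NOTSET"):
    """(begin, end) pads generated when output_shape is set explicitly."""
    effective_kernel = (kernel - 1) * dilation + 1
    total = (stride * (input_size - 1) + output_padding
             + effective_kernel - output_size)
    half = total // 2
    if auto_pad == "SAME_UPPER":
        # The extra element of an odd total goes at the end.
        return half, total - half
    # Otherwise the extra element goes at the beginning.
    return total - half, half
```

For example, an axis of length 3 with stride 2 and a kernel of 3 yields 7 output elements without padding; requesting an output of 6 (that is, `input_shape[i] * strides[i]`) needs one element of padding, placed at the end for SAME_UPPER.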
com.ms.internal.nhwc.DepthToSpace
Version
This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.DepthToSpace-1, com.ms.internal.nhwc.DepthToSpace-11
Attributes
- blocksize : int (required)
- Blocks of [blocksize, blocksize] are moved.
- mode : string
- DCR (default) for depth-column-row order re-arrangement. Use CRD for column-row-depth order.
Inputs
- input : T
Outputs
- output : T
Type Constraints
- T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
- Constrain input and output types to all tensor types.
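The difference between the DCR and CRD modes of DepthToSpace is easiest to see as a reshape and transpose. The NCHW function below follows the reference formulation of the standard ONNX DepthToSpace operator; the NHWC wrapper is a hypothetical helper that assumes the internal channels-last variant produces the same values after a layout transposition.

```python
import numpy as np

def depth_to_space_nchw(x, blocksize, mode="DCR"):
    """Move blocks of the channel axis into the spatial axes (NCHW layout)."""
    b, c, h, w = x.shape
    c_out = c // (blocksize ** 2)
    if mode == "DCR":
        # Depth-column-row: the block offsets are the outermost channel factors.
        tmp = x.reshape(b, blocksize, blocksize, c_out, h, w)
        tmp = tmp.transpose(0, 3, 4, 1, 5, 2)
    elif mode == "CRD":
        # Column-row-depth: the block offsets are the innermost channel factors.
        tmp = x.reshape(b, c_out, blocksize, blocksize, h, w)
        tmp = tmp.transpose(0, 1, 4, 2, 5, 3)
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return tmp.reshape(b, c_out, h * blocksize, w * blocksize)

def depth_to_space_nhwc(x, blocksize, mode="DCR"):
    """Assumed channels-last equivalent: transpose, reuse the NCHW path, transpose back."""
    y = depth_to_space_nchw(x.transpose(0, 3, 1, 2), blocksize, mode)
    return y.transpose(0, 2, 3, 1)
```

The two modes only differ when the output has more than one channel, since the ordering affects which input channels end up in the same output channel.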
com.ms.internal.nhwc.GlobalLpPool
Version
This version of the operator has been available since version 2 of the 'com.ms.internal.nhwc' operator set.
Attributes
- p : int
- p value of the Lp norm used to pool over the input data.
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(bfloat16), tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors.
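As a rough illustration of the `p` attribute of GlobalLpPool, the following sketch computes the Lp norm over all spatial positions of each channel. The function name is hypothetical, the channels-last layout is an assumption based on the domain name, and `p=2` is used only as an illustrative default.

```python
import numpy as np

def global_lp_pool_nhwc(x, p=2):
    """Lp norm over the spatial axes of a channels-last tensor (assumed layout)."""
    # Spatial axes are everything between the batch axis and the channel axis.
    spatial_axes = tuple(range(1, x.ndim - 1))
    pooled = np.sum(np.abs(x) ** p, axis=spatial_axes, keepdims=True)
    return pooled ** (1.0 / p)
```

Each spatial axis of the result has length 1, so an input of shape (N, H, W, C) gives an output of shape (N, 1, 1, C).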
com.ms.internal.nhwc.InstanceNormalization
Version
This version of the operator has been available since version 6 of the 'com.ms.internal.nhwc' operator set.
Attributes
- activation : string
- activation_params : list of floats
- epsilon : float
- The epsilon value to use to avoid division by zero.
Inputs
- input : T
- scale : T
- B : T
Outputs
- output : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors.
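The role of `epsilon` in InstanceNormalization is clearer with the normalization written out. The sketch below assumes the usual instance-normalization definition, `y = scale * (x - mean) / sqrt(variance + epsilon) + B`, with statistics taken per sample and per channel; the function name, the channels-last layout, and the default `epsilon` are assumptions, and the optional fused activation is not modeled.

```python
import numpy as np

def instance_norm_nhwc(x, scale, bias, epsilon=1e-5):
    """Normalize each (sample, channel) slice over its spatial positions."""
    # Spatial axes sit between the batch axis and the trailing channel axis.
    spatial_axes = tuple(range(1, x.ndim - 1))
    mean = x.mean(axis=spatial_axes, keepdims=True)
    var = x.var(axis=spatial_axes, keepdims=True)
    # epsilon keeps the denominator away from zero for constant slices.
    return scale * (x - mean) / np.sqrt(var + epsilon) + bias
```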
com.ms.internal.nhwc.LRN
Version
This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.LRN-1
Attributes
- alpha : float
- Scaling parameter.
- beta : float
- The exponent.
- bias : float
- size : int (required)
- The number of channels to sum over
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
- Constrain input and output types to float tensors.
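How `alpha`, `beta`, `bias` and `size` combine in LRN can be sketched as below. The formula follows the standard ONNX LRN definition, in which each element is divided by `(bias + alpha / size * square_sum) ** beta` and `square_sum` covers a window of `size` neighbouring channels; the function name, the channels-last layout, and the default values are assumptions for illustration.

```python
import math
import numpy as np

def lrn_nhwc(x, size, alpha=1e-4, beta=0.75, bias=1.0):
    """Local response normalization across the trailing channel axis (assumed layout)."""
    channels = x.shape[-1]
    below = (size - 1) // 2             # floor((size - 1) / 2) channels before c
    above = math.ceil((size - 1) / 2)   # ceil((size - 1) / 2) channels after c
    square_sum = np.zeros_like(x, dtype=np.float64)
    for c in range(channels):
        start = max(0, c - below)
        stop = min(channels - 1, c + above)
        # Sum of squares over the channel window, clipped at the tensor edges.
        square_sum[..., c] = np.sum(x[..., start:stop + 1] ** 2, axis=-1)
    return x / (bias + alpha / size * square_sum) ** beta
```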
com.ms.internal.nhwc.LpPool
Version
This version of the operator has been available since version 18 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.LpPool-11
Attributes
- auto_pad : string
- auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER pad the input so that `output_shape[i] = ceil(input_shape[i] / strides[i])` for each axis `i`. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER.
- ceil_mode : int
- Whether to use ceil or floor (default) to compute the output shape.
- dilations : list of ints
- dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
- kernel_shape : list of ints (required)
- The size of the kernel along each axis.
- p : int
- p value of the Lp norm used to pool over the input data.
- pads : list of ints
- Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end of the corresponding axis. The `pads` format should be as follows: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
- strides : list of ints
- Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs
- X : T
Outputs
- Y : T
Type Constraints
- T : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors.
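The effect of `ceil_mode`, `pads` and the SAME_* values of `auto_pad` on the LpPool output shape can be worked through per axis. The sketch below follows the pooling shape equations of the standard ONNX definition; the function names are hypothetical, and clamping a negative total padding to zero is an assumption of the sketch.

```python
import math

def lp_pool_output_size(input_size, kernel, stride=1, dilation=1,
                        pad_begin=0, pad_end=0, ceil_mode=0):
    """Output extent along one spatial axis with explicit padding."""
    effective_kernel = dilation * (kernel - 1) + 1
    span = input_size + pad_begin + pad_end - effective_kernel
    rounding = math.ceil if ceil_mode else math.floor
    return rounding(span / stride) + 1

def same_pads(input_size, kernel, stride=1, dilation=1, auto_pad="SAME_UPPER"):
    """(begin, end) pads so that the output size is ceil(input_size / stride)."""
    effective_kernel = dilation * (kernel - 1) + 1
    output_size = math.ceil(input_size / stride)
    total = max((output_size - 1) * stride + effective_kernel - input_size, 0)
    small, large = total // 2, total - total // 2
    # An odd total puts the extra element at the end for SAME_UPPER,
    # and at the beginning for SAME_LOWER.
    return (small, large) if auto_pad == "SAME_UPPER" else (large, small)
```

For an axis of length 5 with a kernel of 2 and a stride of 2, floor rounding gives 2 outputs and ceil rounding gives 3; SAME_UPPER reaches 3 outputs by adding a single padding element at the end.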
com.ms.internal.nhwc.MaxUnpool
Version
This version of the operator has been available since version 11 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.MaxUnpool-9
Attributes
- activation : string
- activation_params : list of floats
- kernel_shape : list of ints (required)
- The size of the kernel along each axis.
- pads : list of ints
- Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end of the corresponding axis. The `pads` format should be as follows: [x1_begin, x2_begin, ..., x1_end, x2_end, ...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end is the number of pixels added at the end of axis `i`. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
- strides : list of ints
- Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs (2 - 3)
- X : T1
- I : T2
- output_shape (optional) : T2
Outputs
- output : T1
Type Constraints
- T1 : tensor(float16), tensor(float), tensor(double)
- Constrain input and output types to float tensors.
- T2 : tensor(int64)
- Constrain index tensor to int64
com.ms.internal.nhwc.QLinearConvTranspose
Version
This version of the operator has been available since version 1 of the 'com.ms.internal.nhwc' operator set.
Attributes
- auto_pad : string
- auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET.
- dilations : list of ints
- dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
- group : int
- number of groups input channels and output channels are divided into.
- kernel_shape : list of ints
- The shape of the convolution kernel. If not present, should be inferred from input W.
- output_padding : list of ints
- Additional elements added to the side with higher coordinate indices in the output. Each padding value in "output_padding" must be less than the corresponding stride/dilation dimension. By default, this attribute is a zero vector. Note that this attribute doesn't directly affect the computed output values. It only controls the selection of the computed values, so changing this attribute only adds or removes output elements. If "output_shape" is explicitly provided, "output_padding" does not contribute additional size to "output_shape" but participates in the computation of the needed padding amount. This is also called adjs or adjustment in some frameworks.
- output_shape : list of ints
- The shape of the output can be explicitly set, which will cause pads values to be auto generated. If output_shape is specified, pads values are ignored. See doc for details for equations to generate pads.
- pads : list of ints
- Padding for the beginning and ending along each spatial axis.
- strides : list of ints
- Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs (8 - 9)
- x : T1
- x_scale : tensor(float)
- x_zero_point : T1
- w : T2
- w_scale : tensor(float)
- w_zero_point : T2
- y_scale : tensor(float)
- y_zero_point : T3
- B (optional) : T4
Outputs
- y : T3
Type Constraints
- T1 : tensor(int8), tensor(uint8)
- Constrain input type to 8-bit integer tensor.
- T2 : tensor(int8), tensor(uint8)
- Constrain filter type to 8-bit integer tensor.
- T3 : tensor(int8), tensor(uint8)
- Constrain output type to 8-bit integer tensor.
- T4 : tensor(int32)
- Constrain bias type to 32-bit integer tensor.
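QLinearConvTranspose pairs each quantized tensor with a scale and a zero point. This section does not spell out the mapping, so the sketch below assumes the usual linear quantization convention, `real = scale * (q - zero_point)`, with rounding to nearest (ties to even) and saturation on the way back; the function names are hypothetical.

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Map 8-bit integers to real values: scale * (q - zero_point)."""
    # Widen first so the subtraction cannot wrap around in 8 bits.
    return (q.astype(np.int32) - int(zero_point)) * scale

def quantize(real, scale, zero_point, dtype=np.uint8):
    """Map real values to 8-bit integers, saturating to the range of dtype."""
    info = np.iinfo(dtype)
    q = np.rint(np.asarray(real, dtype=np.float64) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)
```

Under this assumption, `x` and `w` would be dequantized with their own scale and zero point, the transposed convolution evaluated on real values, and the result quantized with `y_scale` and `y_zero_point`.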
com.ms.internal.nhwc.Resize
Version
This version of the operator has been available since version 19 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.Resize-11, com.ms.internal.nhwc.Resize-13, com.ms.internal.nhwc.Resize-18
Attributes
- antialias : int
- If set to 1, "linear" and "cubic" interpolation modes will use an antialiasing filter when downscaling. Antialiasing is achieved by stretching the resampling filter by a factor max(1, 1 / scale), which means that when downsampling, more input pixels contribute to an output pixel.
- axes : list of ints
- If provided, it specifies a subset of axes that 'roi', 'scales' and 'sizes' refer to. If not provided, all axes are assumed [0, 1, ..., r-1], where r = rank(data). Non-specified dimensions are interpreted as non-resizable. Negative value means counting dimensions from the back. Accepted range is [-r, r-1], where r = rank(data). Behavior is undefined if an axis is repeated.
- coordinate_transformation_mode : string
- This attribute describes how to transform the coordinate in the resized tensor to the coordinate in the original tensor.
The coordinate of each dimension is transformed individually. Let's describe a case using axis x as an example. Denote `x_resized` as the coordinate of axis x in the resized tensor, `x_original` as the coordinate of axis x in the original tensor, `length_original` as the length of the original tensor in axis x, `length_resized` as the length of the resized tensor in axis x, `scale = length_resized / length_original`, `output_width` the target length on the axis x which can be a fractional number when it is calculated out of a scale factor, and `output_width_int` the effective output width as an integer.
If coordinate_transformation_mode is "half_pixel", `x_original = (x_resized + 0.5) / scale - 0.5`.
If coordinate_transformation_mode is "half_pixel_symmetric", `adjustment = output_width_int / output_width`, `center = input_width / 2`, `offset = center * (1 - adjustment)`, `x_ori = offset + (x + 0.5) / scale - 0.5`.
If coordinate_transformation_mode is "pytorch_half_pixel", `x_original = length_resized > 1 ? (x_resized + 0.5) / scale - 0.5 : 0`.
If coordinate_transformation_mode is "align_corners", `x_original = x_resized * (length_original - 1) / (length_resized - 1)`.
If coordinate_transformation_mode is "asymmetric", `x_original = x_resized / scale`.
If coordinate_transformation_mode is "tf_crop_and_resize", `x_original = length_resized > 1 ? start_x * (length_original - 1) + x_resized * (end_x - start_x) * (length_original - 1) / (length_resized - 1) : 0.5 * (start_x + end_x) * (length_original - 1)`.
- cubic_coeff_a : float
- The coefficient 'a' used in cubic interpolation. Two common choices are -0.5 (in some cases of TensorFlow) and -0.75 (in PyTorch). Check out Equation (4) in https://ieeexplore.ieee.org/document/1163711 for the details. This attribute is valid only if mode is "cubic".
- exclude_outside : int
- If set to 1, the weight of sampling locations outside the tensor will be set to 0 and the weight will be renormalized so that their sum is 1.0. The default value is 0.
- extrapolation_value : float
- When coordinate_transformation_mode is "tf_crop_and_resize" and x_original is outside the range [0, length_original - 1], this value is used as the corresponding output value. Default is 0.0f.
- keep_aspect_ratio_policy : string
- This attribute describes how to interpret the `sizes` input with regard to keeping the original aspect ratio of the input, and it is not applicable when the `scales` input is used.
Given a set of `sizes`, associated with a subset of `axes` (explicitly provided or default), and assuming `d = axes[i]`, with `i` being the index of the provided `sizes`:
If `keep_aspect_ratio_policy` is "stretch", the original aspect ratio is disregarded, and the input is resized to the specified size: `out_size[d] = sizes[i]`.
If `keep_aspect_ratio_policy` is "not_larger", the sizes are adjusted so that no extent of the output is larger than the specified size, while keeping the original aspect ratio: `scale = Min(sizes[i] / in_size[d])`, `out_size[d] = round_int(scale * in_size[d])`.
If `keep_aspect_ratio_policy` is "not_smaller", the sizes are adjusted so that no extent of the output is smaller than the specified size, while keeping the original aspect ratio: `scale = Max(sizes[i] / in_size[d])`, `out_size[d] = round_int(scale * in_size[d])`.
For non-resizable axes (those not specified in `axes`), the output size will be equal to the input size.
Note: `round_int` stands for computing the nearest integer value, rounding halfway cases up.
- mode : string
- Three interpolation modes: "nearest" (default), "linear" and "cubic". The "linear" mode includes linear interpolation for 1D tensor and N-linear interpolation for N-D tensor (for example, bilinear interpolation for 2D tensor). The "cubic" mode includes cubic interpolation for 1D tensor and N-cubic interpolation for N-D tensor (for example, bicubic interpolation for 2D tensor).
- nearest_mode : string
- Four modes: "round_prefer_floor" (default, also known as round half down), "round_prefer_ceil" (also known as round half up), "floor", "ceil". Only used by nearest interpolation. It indicates how to get the "nearest" pixel in the input tensor from x_original, so this attribute is valid only if "mode" is "nearest".
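The coordinate_transformation_mode formulas and the nearest_mode rounding rules listed for Resize can be transcribed almost directly. The sketch below covers one axis; the function names are hypothetical, "half_pixel_symmetric" is left out because it needs the fractional `output_width`, "align_corners" assumes `length_resized > 1`, and clamping the resulting index to the valid range is not shown.

```python
import math

def to_original_coordinate(x_resized, length_original, length_resized,
                           mode="half_pixel", start_x=0.0, end_x=1.0):
    """Map a coordinate in the resized tensor back to the original tensor."""
    scale = length_resized / length_original
    if mode == "half_pixel":
        return (x_resized + 0.5) / scale - 0.5
    if mode == "pytorch_half_pixel":
        return (x_resized + 0.5) / scale - 0.5 if length_resized > 1 else 0.0
    if mode == "align_corners":
        # Assumes length_resized > 1; otherwise the denominator is zero.
        return x_resized * (length_original - 1) / (length_resized - 1)
    if mode == "asymmetric":
        return x_resized / scale
    if mode == "tf_crop_and_resize":
        if length_resized > 1:
            return (start_x * (length_original - 1)
                    + x_resized * (end_x - start_x) * (length_original - 1)
                    / (length_resized - 1))
        return 0.5 * (start_x + end_x) * (length_original - 1)
    raise ValueError(f"unsupported mode: {mode}")

def nearest_index(x_original, nearest_mode="round_prefer_floor"):
    """Pick the input index used by nearest interpolation."""
    if nearest_mode == "round_prefer_floor":
        return math.ceil(x_original - 0.5)   # round half down
    if nearest_mode == "round_prefer_ceil":
        return math.floor(x_original + 0.5)  # round half up
    if nearest_mode == "floor":
        return math.floor(x_original)
    if nearest_mode == "ceil":
        return math.ceil(x_original)
    raise ValueError(f"unsupported nearest_mode: {nearest_mode}")
```

When upscaling an axis from 4 to 8 elements, for instance, output position 0 maps to -0.25 under "half_pixel" and to 0.0 under "asymmetric", which is why the two modes can select different source pixels near the borders.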
Inputs (1 - 4)
- X : T1
- roi (optional) : T2
- scales (optional) : tensor(float)
- sizes (optional) : tensor(int64)
Outputs
- Y : T1
Type Constraints
- T1 : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
- Constrain input 'X' and output 'Y' to all tensor types.
- T2 : tensor(float16), tensor(float), tensor(double)
- Constrain roi type to float or double.
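The three `keep_aspect_ratio_policy` values for Resize can be compared on a small example. The sketch below is a hypothetical helper that computes only the output shape; it assumes the input extent is indexed by the axis `d = axes[i]` in both the scale and the output-size expressions, and it normalizes negative axes by counting from the back.

```python
import math

def round_int(value):
    """Nearest integer, rounding halfway cases up."""
    return math.floor(value + 0.5)

def resized_shape(in_size, sizes, axes=None, policy="stretch"):
    """Output shape for the given sizes under a keep_aspect_ratio_policy."""
    rank = len(in_size)
    axes = list(range(rank)) if axes is None else [a % rank for a in axes]
    out_size = list(in_size)  # non-resizable axes keep their input size
    if policy == "stretch":
        for i, d in enumerate(axes):
            out_size[d] = sizes[i]
        return out_size
    ratios = [sizes[i] / in_size[d] for i, d in enumerate(axes)]
    if policy == "not_larger":
        scale = min(ratios)
    elif policy == "not_smaller":
        scale = max(ratios)
    else:
        raise ValueError(f"unsupported policy: {policy}")
    # A single scale is shared by all resized axes to keep the aspect ratio.
    for d in axes:
        out_size[d] = round_int(scale * in_size[d])
    return out_size
```

For an input of shape [1, 100, 200, 3] resized on axes [1, 2] with sizes [50, 50], "stretch" gives 50 x 50, "not_larger" gives 25 x 50, and "not_smaller" gives 50 x 100.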
com.ms.internal.nhwc.SpaceToDepth
Version
This version of the operator has been available since version 13 of the 'com.ms.internal.nhwc' operator set.
Other versions of this operator: com.ms.internal.nhwc.SpaceToDepth-1
Attributes
- blocksize : int (required)
- Blocks of [blocksize, blocksize] are moved.
Inputs
- input : T
Outputs
- output : T
Type Constraints
- T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(bfloat16), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
- Constrain input and output types to all tensor types.
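SpaceToDepth moves each [blocksize, blocksize] spatial block into the channel axis. The NCHW function below follows the reference formulation of the standard ONNX SpaceToDepth operator; the NHWC wrapper is a hypothetical helper that assumes the internal channels-last variant yields the same values after a layout transposition.

```python
import numpy as np

def space_to_depth_nchw(x, blocksize):
    """Fold blocksize x blocksize spatial blocks into channels (NCHW layout)."""
    b, c, h, w = x.shape
    tmp = x.reshape(b, c, h // blocksize, blocksize, w // blocksize, blocksize)
    # Bring the two in-block offsets in front of the original channel axis.
    tmp = tmp.transpose(0, 3, 5, 1, 2, 4)
    return tmp.reshape(b, c * blocksize ** 2, h // blocksize, w // blocksize)

def space_to_depth_nhwc(x, blocksize):
    """Assumed channels-last equivalent: transpose, reuse the NCHW path, transpose back."""
    y = space_to_depth_nchw(x.transpose(0, 3, 1, 2), blocksize)
    return y.transpose(0, 2, 3, 1)
```

Both spatial extents are assumed to be divisible by `blocksize`; the output has `blocksize ** 2` times as many channels and spatial extents reduced by a factor of `blocksize`.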