* Make QuantizeLinear support half
* remove unnecessary type constraint
* refine kernel definition
* add fp16 support for dequantizelinear
* disable QuantizeLinear_per_tensor_half_int8 for tensorrt
* refine unit test and fix saturate issue for MSDomain QuantizeLinear
* fix build break
* include tensorrt for half_uint8 test
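
For reference, the saturating behaviour mentioned above can be sketched in plain Python. This is an illustrative sketch, not the kernel code: the function names, the int8 default range, and the list-based interface are assumptions made for the example. It assumes the usual QuantizeLinear definition (round to nearest even, add the zero point, then clamp to the target range).

```python
def quantize_linear(values, scale, zero_point=0, qmin=-128, qmax=127):
    """Quantize floats to integers, clamping to [qmin, qmax] instead of wrapping.

    Hypothetical helper for illustration; the int8 range is an assumed default.
    """
    out = []
    for x in values:
        # Python's round() rounds halves to the nearest even integer.
        q = round(x / scale) + zero_point
        # Saturate: out-of-range values are clamped rather than overflowing.
        out.append(max(qmin, min(qmax, q)))
    return out


def dequantize_linear(qvalues, scale, zero_point=0):
    """Map quantized integers back to floats."""
    return [(q - zero_point) * scale for q in qvalues]
```

With a scale of 0.5, an input of 1000.0 would map to 2000 before clamping, so the sketch returns 127 for it rather than a wrapped value.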
* Migrate winml to Microsoft Namespace (packaging changes are pending)
* add ns_prefix toggle
* fix packaging
* Users/sheilk/add missing raw header (#3484)
* add dualapipartition
* wrong variable for repo root
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* remove existence check to force failures
* extra paren
* dualapipartition needs to be referenced from the source
* add microsoft.ai.machinelearning.dll to the output dir
* rename the idl file so that assembly info is correctly added into the winmd
* fix namespaces
* update namespaces
* default to microsoft, and add namespace override as build argument
* update CMakeSettings.json as well
* remove from cmakelists.txt
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
* Fixed corner cases for acl ep gemm implementation by setting fully connected as the main layer
* Introduced versioned build for the acl ep. ACL versions supported are 1902, 1905 and 1908
* Added convolution-activation fusion optimization for acl ep. We see improvements of 12% for mobilenetv2 and 4% for resnet50
Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
1. Fix static analysis warnings found by VC++
2. Add a new pipeline for static analysis
3. Merge all the Windows CI builds into one single YAML file (easier to queue them all).
4. Make DNNL build faster by disabling building the tests and examples.
5. Enable custom op unit test.
* Add int64 input type
* Fix for cuda
* Fix linking
* Cuda
* Fixed missing registration
* Fix registration for opsets 1-11
* Adding reduce_matrix_rows for int64
* Update reduction_functions.cu
* Revert cuda
Warn when initializers also appear in graph inputs
Provide a tool to move initializers out of graph inputs
Motivation and Context
From IR_VERSION 4 onward, an ONNX model treats only those initializers that also appear in graph inputs as non-constant. This can prevent some graph optimizations, such as constant folding and operator fusion. Warn about this case and provide a tool to fix it.
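
The idea behind such a tool can be sketched as follows. This is a minimal illustration, not the tool added in this change: the function name is hypothetical, and it only assumes that the model exposes `graph.input` and `graph.initializer` sequences whose items carry a `name`, as an ONNX `ModelProto` does.

```python
def remove_initializers_from_inputs(model):
    """Drop graph inputs that share a name with an initializer.

    Hypothetical helper for illustration. Returns the names that were removed
    so the caller can log them.
    """
    graph = model.graph
    initializer_names = {init.name for init in graph.initializer}
    # Collect first, then remove, so the sequence is not mutated while iterating.
    duplicated = [inp for inp in graph.input if inp.name in initializer_names]
    for inp in duplicated:
        graph.input.remove(inp)
    return [inp.name for inp in duplicated]
```

With the `onnx` package this would typically be wrapped in `onnx.load(path)` and `onnx.save(model, path)`. Inputs that are real runtime feeds are left untouched because their names do not match any initializer.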
* Add flag to enable automatic generation of input for models with tensor inputs
* change wording of variable
* Naming convention changes to variables
* Handle free dimensions
* Comment with default allocator
* variable rename
* Remove input_count
* Cast to size_t to avoid warning
Co-authored-by: Ryan Lai <ryalai96@gamil.com>
This is required for running multithreaded with multiple GPUs. Without it, a worker thread defaults to GPU 0 even when the CUDAExecutionProvider is assigned to another GPU. That can crash CUDA when resources that belong to GPU 0 are used on GPU N>0.
* Make CDist faster via Eigen squaredNorm and GEMM.
* Add call to abs() as the GEMM output may differ slightly due to floating point accuracy and result in a negative distance which returns NaN if sqrt() is applied to it.
* Update math::Gemm to use the type for alpha and beta instead of hardcoding to float. Matches the GemmEx definition.
* Provide Eigen based replication of the GEMM call on x86 if T=double.
* Make test model data deterministic.
* Do the GEMM first so we can avoid potentially subtracting two numbers that are very close to each other.
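
The approach described above relies on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, where the cross term is what a GEMM computes for all pairs at once. The following plain-Python sketch illustrates the identity and the abs() guard only; the function name is hypothetical, and it does not reproduce the Eigen/GEMM implementation or its exact order of operations.

```python
import math


def pairwise_euclidean(a, b):
    """Euclidean distance between every row of a and every row of b.

    Hypothetical helper for illustration, using squared norms plus a
    dot-product cross term instead of subtracting coordinates directly.
    """
    a_sq = [sum(x * x for x in row) for row in a]  # squared norm of each row of a
    b_sq = [sum(x * x for x in row) for row in b]  # squared norm of each row of b
    dist = []
    for i, row_a in enumerate(a):
        out_row = []
        for j, row_b in enumerate(b):
            # Cross term; a GEMM would produce this for all (i, j) in one call.
            dot = sum(x * y for x, y in zip(row_a, row_b))
            squared = a_sq[i] + b_sq[j] - 2.0 * dot
            # Rounding error can make `squared` slightly negative, and
            # sqrt of a negative number is not a valid distance, so take abs().
            out_row.append(math.sqrt(abs(squared)))
        dist.append(out_row)
    return dist
```

For nearly identical rows the squared distance is the difference of two almost equal quantities, which is where the small negative values come from.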
* update C# API to optimize inference latency
* rename PinnedOnnxValue to FixedBufferOnnxValue and fix build break
* add more test cases
* add conditions on string tensors for pre-allocated outputs
* change to random inputs
* fix spelling
* resolve comments
* resolve comments
* remove FixedBufferOnnxValueTests.cs
* fix trivial typos in doc
* wangye/pivot (#3432)
* check in
* work version
* add ForecastingPivot kernel
* fix mac os and linux build error
* update FeaturizerLibrary Version
* resolve comments
* remove changes
* Add Kernel for LagLeadOperator & RollingWindowFeaturizer (#3434)
* update
* update todo
* resolve comments
* relax eps for TruncatedSVD transformer
* mute TruncatedSVD_transformer due to nondeterministic test result
* resolve comments
* update
* test
* update
* fix