* [ROCm] Add InstanceNormalization Op
* Enable InstanceNormBatch1_fp16 and InstanceNormBatch2_fp16 for ROCm
* [ROCm] Add BatchNormalization for fp32 and fp16
* Enable BatchNormTest for ROCm
* [ROCm] Add LRN Op
* [ROCm] Replace miCompat functions with helper functions
* MatMulInteger + post op fusion
This fuses MatMulInteger with up to 32 binary/elementwise
operators if running on the oneDNN execution provider.
Signed-off-by: George Nash <george.nash@intel.com>
* Remove the un-needed transformer
The MatMulIntegerToFloat transformer is not needed since
the transform done is handled by the MatMulIntegerBinaryEltwise
transformer code.
Signed-off-by: George Nash <george.nash@intel.com>
* Refactor of the post op transformer code
This separates the code that finds the post op
nodes for MatMul and MatMulInteger to reduce code
repetition.
Signed-off-by: George Nash <george.nash@intel.com>
* Minor cleanup based on cpplint
resolved unused-variable build failure
Signed-off-by: George Nash <george.nash@intel.com>
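As an illustration of the post-op fusion described in the commits above, here is a minimal sketch of collecting a chain of fusable nodes after a MatMulInteger node, capped at 32. The graph representation, the `FUSABLE_OPS` set, and the `collect_post_ops` helper are hypothetical and are not the oneDNN execution provider's actual code; only the limit of 32 comes from the commit message.

```python
# Hypothetical sketch: walk the single-consumer chain after a MatMul-like node
# and collect up to MAX_POST_OPS fusable binary/elementwise nodes.
MAX_POST_OPS = 32  # limit stated in the commit message

# Assumed set of fusable op types, for illustration only.
FUSABLE_OPS = {"Add", "Mul", "Sub", "Div", "Relu", "Sigmoid", "Tanh"}


def collect_post_ops(op_types, consumers, start):
    """Return the names of nodes that could be fused after `start`.

    op_types:  dict mapping node name -> op type string
    consumers: dict mapping node name -> list of consumer node names
    """
    chain = []
    current = start
    while len(chain) < MAX_POST_OPS:
        next_nodes = consumers.get(current, [])
        # Stop when the output fans out or the graph ends.
        if len(next_nodes) != 1:
            break
        candidate = next_nodes[0]
        if op_types[candidate] not in FUSABLE_OPS:
            break
        chain.append(candidate)
        current = candidate
    return chain


op_types = {"mm": "MatMulInteger", "add": "Add", "relu": "Relu", "sm": "Softmax"}
consumers = {"mm": ["add"], "add": ["relu"], "relu": ["sm"]}
fused = collect_post_ops(op_types, consumers, "mm")
```

Here `fused` contains the `add` and `relu` nodes; the walk stops at `sm` because Softmax is not in the assumed fusable set.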
Loosen the following test timeouts:
1. "Test Web Multi-Browsers" stage in "ONNX Runtime Web CI Pipeline": 30min -> 60min
2. Node.js binding default per-case timeout: 30 sec -> 90 sec
Using ensureSymlinkSync with the 'dir' type might cause permission issues, so the type was changed to 'junction' to avoid this.
If the folder generation fails, it will cause the test to fail as well.
* Make multiple-level nested control flow op model work
* find correct input index
* find correct input index (cont.)
* enable nested layer unit tests for TRT EP
* add comment
* add Scan op to current workaround support of control flow op
With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration.
More details here: https://github.com/android/ndk/issues/1740
Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21.
This change is a workaround to manually add `-O3` for "Release" Android builds.
1. Delete the build scripts that were copied from manylinux project. Use "git checkout" instead.
2. Update manylinux version to get Python 3.11. Related issue: Python 3.11 support #12343
3. Change the CUDA version of the Linux GPU build job in the NuGet packaging pipeline from CUDA 11.4 to CUDA 11.6 to match the TRT job within the same pipeline. (A lot of other places need to be updated as well, but I'd prefer to put them in another PR.)
4. Make dockerfile names static. For example, replace tools/ci_build/github/linux/docker/$(DockerFile) with tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cpu. The former relies on a runtime variable, $(DockerFile), but template parameters are expanded early in processing a pipeline run, when most variables are not yet available. It is like C++ macros vs. variables.
* make memory profiler work with multiple session runs.
(cherry picked from commit 5b636b4dd6fe91b75c063696dc73eda33ec36c8d)
* minor fix
* fix build
* fix Windows build
* 1. fix cpplint issues;
2. give a unique file name to each session's profiler result.
Add csharp\sample\InferenceSample\Microsoft.ML.OnnxRuntime.InferenceSample.Maui so we have an equivalent setup for MAUI as for the other platforms.
This provides a setup to do some basic local testing of using an InferenceSession in a MAUI app.
* update to 2022
* Update the VS version
* Rolling back to gcc 10
* Rolling back
* Update cuda home
* remove "CMAKE_CUDA_ARCHITECTURES=52"
* update CUDA architecture to 70
* Delete cuda 10.2 training pipeline
* rolling back a mistake
* Update win-gpu-reduce-op-ci-pipeline.yml
* Update win-gpu-reduce-op-ci-pipeline.yml
* Update win-gpu-reduce-op-ci-pipeline.yml
* Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.10.0_cu10.2 directory
* Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_cu10.2 directory
Current builds use an NDK version that happens to be on the build machine. The build machine environment may change in ways that are outside of our control.
This change installs a specific version of NDK (the current LTS version 25.0.8775105) and uses it.
Convert DQ node with const weight tensor int8 to uint8
This is a follow-up to #12088, where we convert the weight tensor from int8 to uint8. Here we do the same thing in the DequantizeLinear node, so that we don't have to make the same changes for every future operator.
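To illustrate why such a conversion can preserve results, here is a minimal sketch that assumes the int8 values and the zero point are both shifted by 128. The helper names are hypothetical, and the actual transformer may implement the conversion differently.

```python
# Sketch under an assumption: int8 -> uint8 is done by adding 128 to every
# quantized value and to the zero point. Dequantized results are unchanged
# because (q + 128) - (zp + 128) == q - zp.

def dequantize(values, scale, zero_point):
    """Plain DequantizeLinear arithmetic: (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in values]


def int8_to_uint8(values, zero_point):
    """Shift an int8 tensor and its zero point into the uint8 range."""
    shifted = [v + 128 for v in values]
    return shifted, zero_point + 128


weights_i8 = [-128, -3, 0, 5, 127]
scale, zp_i8 = 0.5, -2

weights_u8, zp_u8 = int8_to_uint8(weights_i8, zp_i8)

before = dequantize(weights_i8, scale, zp_i8)
after = dequantize(weights_u8, scale, zp_u8)
```

In this sketch `before` and `after` are identical, and every value in `weights_u8` falls within 0..255.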
* op version check during fusion
* Revert "op version check during fusion"
This reverts commit cacc8f50ea36b08b73a98a81ca0ae3f84782435f.
* add pow-15 for fusion
Description: Reduce CI noise from Python lint
Motivation and Context
Disable "missing-docstring" in pylint. This is usually noisy in tests.
Show only added lint messages for pyright.