With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration.
More details here: https://github.com/android/ndk/issues/1740
Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21.
This change is a workaround to manually add `-O3` for "Release" Android builds.
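The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the actual change: the helper name `android_release_cflags` and its parameters are hypothetical, and the real fix may set the flag in CMake rather than in a script.

```python
def android_release_cflags(config, is_android, existing_flags=""):
    """Return compiler flags, adding -O3 for Android Release builds.

    Hypothetical helper: the flag is only appended when no -O level
    is already present, so an explicit choice is never overridden.
    """
    flags = existing_flags.split()
    has_opt_level = any(flag.startswith("-O") for flag in flags)
    if is_android and config == "Release" and not has_opt_level:
        flags.append("-O3")
    return " ".join(flags)


if __name__ == "__main__":
    print(android_release_cflags("Release", True, "-fPIC"))
    print(android_release_cflags("Debug", True, "-fPIC"))
```

Checking for an existing `-O` flag first keeps the workaround harmless if a later NDK version starts passing the optimization level again.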
1. Delete the build scripts that were copied from the manylinux project. Use "git checkout" instead.
2. Update manylinux version to get python 3.11. Related issue: Python 3.11 support #12343
3. Change the CUDA version of the Linux GPU build job in the NuGet packaging pipeline from CUDA 11.4 to CUDA 11.6 to match the TRT job in the same pipeline. (Many other places need to be updated as well, but I'd prefer to do that in another PR.)
4. Make Dockerfile names static. For example, replace tools/ci_build/github/linux/docker/$(DockerFile) with tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cpu. The former relies on a runtime variable, $(DockerFile), but template parameters are expanded early in the processing of a pipeline run, when most variables are not yet available. It is like C++ macros versus variables.
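To illustrate the ordering problem in item 4, here is a small two-phase expansion model. It is only an analogy and not how the pipeline service is implemented; the functions `expand_template` and `expand_runtime` are hypothetical, and they mimic the `${{ parameters.x }}` and `$(name)` syntaxes.

```python
import re


def expand_template(text, parameters):
    """Phase 1: substitute ${{ parameters.x }} before the run starts."""
    pattern = r"\$\{\{\s*parameters\.(\w+)\s*\}\}"
    return re.sub(pattern, lambda m: str(parameters[m.group(1)]), text)


def expand_runtime(text, variables):
    """Phase 2: substitute $(name) at run time; unknown names stay as-is."""
    pattern = r"\$\((\w+)\)"
    return re.sub(pattern, lambda m: str(variables.get(m.group(1), m.group(0))), text)


if __name__ == "__main__":
    path = "tools/ci_build/github/linux/docker/$(DockerFile)"
    # During template expansion the runtime variable is still unresolved.
    print(expand_template(path, {}))
    # Only at run time does the variable receive a value.
    print(expand_runtime(path, {"DockerFile": "Dockerfile.manylinux2014_cpu"}))
```

Anything that must be known during the first phase therefore cannot depend on `$(DockerFile)`, which is why a static file name is the simpler choice.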
* make memory profiler work with multiple session runs.
(cherry picked from commit 5b636b4dd6fe91b75c063696dc73eda33ec36c8d)
* minor fix
* fix build
* fix Windows build
* 1. fix cpplint issues;
2. give a unique file name to each session's profiler result.
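One way to give each session its own profiler output file is a per-process counter, sketched below. The class name `ProfileNamer`, the `memory_profile` prefix, and the `.json` extension are assumptions for illustration; the naming scheme used by the actual change is not stated here.

```python
import itertools


class ProfileNamer:
    """Hand out a distinct file name for each session's profiler result."""

    def __init__(self, prefix="memory_profile"):
        self._prefix = prefix
        self._ids = itertools.count()  # 0, 1, 2, ...

    def next_name(self):
        # Each call consumes one id, so no two sessions share a file.
        return f"{self._prefix}_session{next(self._ids)}.json"


if __name__ == "__main__":
    namer = ProfileNamer()
    print(namer.next_name())
    print(namer.next_name())
```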
Add csharp\sample\InferenceSample\Microsoft.ML.OnnxRuntime.InferenceSample.Maui so we have an equivalent setup for MAUI as for the other platforms.
This provides a setup to do some basic local testing of using an InferenceSession in a MAUI app.
* update to 2022
* Update the VS version
* Rolling back to gcc 10
* Rolling back
* Update cuda home
* remove "CMAKE_CUDA_ARCHITECTURES=52"
* update CUDA architecture to 70
* Delete cuda 10.2 training pipeline
* rolling back a mistake
* Update win-gpu-reduce-op-ci-pipeline.yml
* Update win-gpu-reduce-op-ci-pipeline.yml
* Update win-gpu-reduce-op-ci-pipeline.yml
* Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.10.0_cu10.2 directory
* Delete tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_cu10.2 directory
Current builds use an NDK version that happens to be on the build machine. The build machine environment may change in ways that are outside of our control.
This change installs a specific version of NDK (the current LTS version 25.0.8775105) and uses it.
Convert DQ node with const weight tensor int8 to uint8
This is a follow-up to #12088, which converted weight tensors from int8 to uint8. Here we do the same thing in the DequantizeLinear node, so that we don't have to make the same change for every future operator.
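The arithmetic behind such a conversion can be sketched as follows. This is a minimal illustration assuming the usual linear quantization formula `(q - zero_point) * scale`; the function names are hypothetical and the actual graph transformer works on tensors, not Python lists.

```python
def int8_to_uint8(q_int8, zero_point_int8):
    """Shift int8 values and their zero point by 128 into the uint8 range."""
    shifted = [q + 128 for q in q_int8]
    return shifted, zero_point_int8 + 128


def dequantize(q, zero_point, scale):
    """Linear dequantization: (q - zero_point) * scale."""
    return [(value - zero_point) * scale for value in q]


if __name__ == "__main__":
    weights = [-128, -1, 0, 127]
    zero_point, scale = -3, 0.5
    u8_weights, u8_zero_point = int8_to_uint8(weights, zero_point)
    # Both representations dequantize to the same real values.
    print(dequantize(weights, zero_point, scale))
    print(dequantize(u8_weights, u8_zero_point, scale))
```

Because the weights and the zero point move by the same offset, the difference `q - zero_point` is unchanged, so the dequantized values are identical.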
* op version check during fusion
* Revert "op version check during fusion"
This reverts commit cacc8f50ea36b08b73a98a81ca0ae3f84782435f.
* add pow-15 for fusion
Description: Reduce CI noise from Python lint
Motivation and Context
Disable "missing-docstring" in pylint. This is usually noisy in tests.
Show only added lint messages for pyright.
Description: Reinstate #11127 with Cuda fix.
Motivation and Context
Fixes Inference on Onnx with external data not working since PR 11320 (location planning logic) #11511
* Remove hand written add_.Tensor as it can now be generated.
* Generate .out for tensor version of basic math ops. Add.out testing added too.
* Remove sin tests as they are covered by parameterized tests. Also, moved all parameterized tests to the end in their own section.
* Add binary ops tests for tensors. Scalar tests are calling the aten .out which is for tensor.
* Add support for scalar input to add, div, mul, and sub.
* Apply project formatting rules to ort_aten.cpp
Formatting applied by formatting the file in VS Code.
This file is under active development and the inconsistent formatting
was causing friction due to:
1. cpplint job on Pipeline was flagging a lot of style issues,
resulting in a lot of noisy annotations.
2. local edits would result in changes that are not part of the core change.
While there are other files in this part of the source tree with
inconsistent formatting, this file was causing the most friction. We can
come back and address the other files later, which would be a much
larger change.
* Apply consistent pattern for invoker.Invoke(...)
* cpu adamwoptimizer implementation
* unit tests for cpu kernel pass
* refine based on comments
* parallelize the weights loop in PrepareForCompute.
* fix wrong test data path
* fix kernel hash
* fix rocm ci pipeline
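For readers unfamiliar with the optimizer that the CPU kernel above implements, here is a scalar sketch of one textbook AdamW step with decoupled weight decay. It is an assumption-laden illustration: the function name, default hyperparameters, and update order are not taken from the kernel, which may differ (for example in how weight decay or bias correction is applied).

```python
import math


def adamw_step(param, grad, m, v, step,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One textbook AdamW update for a single scalar parameter."""
    # Decoupled weight decay: shrink the parameter directly.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages (step starts at 1).
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    p, m, v = 1.0, 0.0, 0.0
    for t in range(1, 4):
        p, m, v = adamw_step(p, grad=0.5, m=m, v=v, step=t)
    print(p)
```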
Initialize generated tensor data in onnxruntime_perf_test to zeroes instead of leaving it uninitialized. String tensors were already being initialized.