* Force Windows AI Nuget pipeline to use 19041 Windows SDK as 22000 casues a downlevel regression by importing LoadLibraryW
* move into quotes
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Add full iOS job in package pipeline (#9036)
* Add full ios xcframework job
* create zip file of the xcframework
* Bump up TVM version to avoid conflict with existing one (#9159)
* Bump up tvm version
* Bump up onnxruntime-tvm version
There are some c++17 related fixes in TVM
Co-authored-by: KeDengMS <kedeng@microsoft.com>
* fix bug introduced by PR9130 (#9166)
* make uwp store apps link to statically-linked crt desktop builds (#9182)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* #9182 removed the `--is_store_build` option but one place where that was used was missed. (#9219)
This should fix the relevant packaging pipelines.
* DirectML.dll load fails when executable path contains Non-English characters (#9229)
* enable unicode dml
* add wide string L prefix
* Add Fail Fast back
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Fix Android build break after Virtual Environment update to 20210919 (#9163)
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: ke1337 <22626095+ke1337@users.noreply.github.com>
Co-authored-by: KeDengMS <kedeng@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
* Revert "Fix nightly CI pipeline to generate ROCm 4.2 wheels and add ROCm 4.3.1 wheels (#9101)"
This reverts commit 47888392ab.
* Add BatchNorm kernel for ROCm (#9014)
* Add BatchNorm kernel for ROCm, update BN test
* correct epsilon_ setting; limit min epsilon
* Upgrade ROCm CI pipeline for ROCm 4.3.1 and permit run inside container (#9070)
* try to run inside 4.3.1 container
* no \ in container run command
* remove networking options
* try with adding video render groups
* add job to build docker image
* try without 1st stage
* change alpha, beta to float
* try adding service connection
* retain huggingface directory
* static video and render gid
* use runtime expression for variables
* install torch-ort
* pin sacrebleu==1.5.1
* update curves for rocm 4.3.1
* try again
* disable determinism and only check tail of loss curve and with a much larger threshold of 0.05
* disable RoBERTa due to high run variablity on ROCm 4.3.1
* put reduction unit tests back in
* Fix nightly CI pipeline to generate ROCm 4.2 wheels and add ROCm 4.3.1 wheels (#9101)
* make work for both rocm 4.2 and rocm 4.3.1
* fix rocm 4.3.1 docker image reference
* fix CUDA_VERSION to ROCM_VERSION
* fix ReduceConsts conflict def
* add ifdef to miopen_common.h as well
* trailing ws
Co-authored-by: wangye <wangye@microsoft.com>
Co-authored-by: mindest <30493312+mindest@users.noreply.github.com>
* Fixing MORE mlas unittest failures in POWER (#8673)
* Ensure ms-experimental domain Audio Ops build in mac pipeline (#8857)
* Globally enable ms-experimental ops
* change meaning of ms_experimental to mean *all* ms_experimental ops. Some experimental ops will still be enabled globally without this flag like audio ops.
* add cmath
* add cmath to signal_defs.cc
* move audio back into experimental, verify on mac
* remove experimental from mac builds
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Remove cpuinfo from WCOS builds (#9076)
* Fix a bug for Openvino Python binding (#9130)
* Fix default initialization value in C API header (#9126)
* fix default initialization value in C API header
* Fix conflicts
* Nits
* Do not generate nuget symbol packages on Linux
* fix name conflict in 1.9 for Fix default initialization value in C API header
* Fix nightly CI pipeline to generate ROCm 4.2 wheels and add ROCm 4.3.1 wheels (#9101)
* make work for both rocm 4.2 and rocm 4.3.1
* fix rocm 4.3.1 docker image reference
* fix CUDA_VERSION to ROCM_VERSION
* fix ReduceConsts conflict def
* add ifdef to miopen_common.h as well
* trailing ws
* remove OrtCUDAProviderOptions() and simply set value
* revert to use custom ctor and fix tests
Co-authored-by: austinpagan <fossum@us.ibm.com>
Co-authored-by: Sheil Kumar <smk2007@gmail.com>
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Tiago Koji Castro Shibata <ticastro@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
* Adding async fetching for webgl backend (#8951)
* Adding async fetching for webgl backend
* fix PR comments and CI failure.
* fixing a bug
* adding a flag
* Enable linking in exception throwing support library when build onnxruntime wasm. (#8973)
* Enable linking in exception throwing support library when build onnxruntime webassembly containing onnxruntime-extensions.
* Add flag in build.py to enable linking exceptions throwing library.
* Update onnxruntime-extensions document and bind custom_ops build flag with use_extensions.
* Update doc.
* Update cgmanifest.json.
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
* Remove document text from error message in a couple of ops (#9003)
* do not add pkg wheel entry to the index html file if it already exists (#9004)
* do not add pkg wheel entry to the index html file if it already exists
* [js/web] fix ort web e2e test (#9025)
* Fix cmake POWER10 detection
Recent commit 60c98a8 changed variable mlas_common_srcs which affects
POWER10 detection.
* Fix Where op type reduction processing (#9033)
* Update type reduction script to track Where Op's second input type.
* Clean up op_kernel_type_control.h includes.
* Use more maintainable include.
* Fix ROCm wheels CI pipeline break by installing latest protobuf from source (#9047)
* install protobuf from source
* fix rm command in Dockerfile
* fix options on rm command
* fix cd into protobuf source directory
* try again
* remove strip step
* debug list the files
* ls on /usr
* more debug
* more debug
* adjust LD_LIBRARY_PATH
* try remove protobuf before ORT build
* [js/web] a bugfix and add tests for wasm proxy worker (#9048)
* [js/web] add tests for wasm proxy worker
* fix script src override
* Set onnxruntime_DISABLE_RTTI to default OFF (#9049)
Co-authored-by: Du Li <duli1@microsoft.com>
Co-authored-by: Zuwei Zhao <4123666+Zuwei-Zhao@users.noreply.github.com>
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: liqun Fu <liqfu@microsoft.com>
Co-authored-by: Yulong Wang <yulongw@microsoft.com>
Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
* fast reduction for reducemean (#8976)
* Adding preprocessor checks for torch version during torch cpp extensions compilation (#8989)
* custom autograd func memory refinement (#8993)
* Release torch tensor referenced by torch gradient graph (created in PythonOp)
* Update orttraining/orttraining/python/training/ortmodule/torch_cpp_extensions/torch_interop_utils/torch_interop_utils.cc
* refine with comments
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
* Fix issues in TensorRT EP (#8996)
* fix big engine load issue and add cuda_cpu_alloc
* remove redundancy
* fix minor issues
* [js/web] fix karma launch with chrome headless (#8998)
* Update Nuget Packge Pipline to CUDA11.4 and TensorRT8 on Windows (#9000)
* Update to CUDA11.4 and TensorRT-8.0.3.4
* update trt pool, remove cudnn from setup_env_gpu.bat
* revert pool
* test gpu package pipeline on t4
* back out changes
* back out changes
Co-authored-by: George Wu <jywu@microsoft.com>
* Fix fuzz testing build blocking release. (#9008)
* add model local function support (#8540)
* updates for picking pnnx commit
* add tests filter to c# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* draft - enable model local function
* enable model local functions in ORT
* update to latest rel onnx commit
* plus tests
* plus more updates
* plus updates
* test updates
* Fix for nested functions + shape inference
* plus bug fix and updates per review
* plus fixes per review
* plus test updates
* plus updates per review
* plus fixes
* fix a test
Co-authored-by: Vincent Wang <wangwchpku@outlook.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Yulong Wang <yulongw@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
* Add netstandard2.0 to nuget managed package.
Re-does PR that was backed out due to packaging pipeline changes.
Allows deprecation of netstandard1.1 in the following release as netstandard2 is the preferred lowest level framework.
* copy changes from trt_and_mem
* second edits
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* change to cuda 11.4
* build with cuda 11.4
* Update Dockerfile.ubuntu_cuda11_1_tensorrt7_2
* add cmake extra defines
* cmake architectures
* fix cmake arch
* Delete ubuntu-18.04.Dockerfile
* Rename Dockerfile.ubuntu_cuda11_1_tensorrt7_2 to Dockerfile.ubuntu_cuda11_4_tensorrt7_2
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* removing previous ort args
* rename to cuda 11.4
* remove cuda 10_2
* delete trt 7.1
* remove 7.1
* Passing in cuda architecture to reduce build time
* always add submodule sync due to recursive cloning
* fix run command
* add and
* take away unused arms and share python installation script
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml
* Update Dockerfile.tensorrt
* cleanup file
* install python directly on dockerfile - move to scripts in future
* Update Dockerfile.custom-trt-perf
* adding cuda 11.1 for missing Libnvrtc.so.11.1
* Delete install_python.sh
* test running hf bert-large
* try again
* try again
* include other models
* correct names
* disable deberta-v2-xxlarge
* avoid torch.distributed
* add compare json loss and perf for bert-large to test
* fix sed expression
* remove pytest
* add more models
* move unit tests u
* display samples/sec
The previous attempt to enable static analysis (#8842) didn't actually run the static analysis checks.
- Run clang-tidy directly.
- Address static analysis warnings.
* Expose symbols in onnx and protobuf namespaces in python when building with --enable_external_custom_op_schemas
* Add external onnx and protobuf files to wheel
* Added an example to demonstrate external custom ops use-case
* Added a Linux build pipeline to test external custom ops
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* prepare for PR
* Rename cuda directory to gpu directory in tarball
* Fix gpu java package
* fix bug
* fix small bug
* Add onnxruntime_providers_shared.dll into gpu nuget package
* Modify for test
* Temporarily remove for test
* Modify for test
* Modify for test
* Test packging Windows combined GPU
* Test packging Windows combined GPU
* Test packging Windows combined GPU
* Test packging Windows combined GPU
* modify for test
* modify for test
* fix bug
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Prepare for PR
* Prepare for PR
* Code refactor
* Rename proper Artifact name
* Rename intermediate Artifact names
* Revert Artifact Names
* Rename Artifact Names
* Modify Artifact name
* Modify Artifact name
* Modify Artifact name
* Update Java package
* Update Java package
* fix bug to change artifact name
* Fix bug for the wrong file path
* Fix no fetching correct artifact and test
* temporarily modify for test
* undo the change for test
* additional changes
* test package run
* minor fix
* minor fix
* minor fix
* Get around no arm64 simulator
* fix objc pod build failure
* downgrade_eigen
* update objc podspec template
* fix build - python.h not found
* disable --build_shared_lib for ortmodule tests
* fix
* fix the build flag
* disable --build_shared_lib for training path (not only for ortmodule)
* fix missing test model files
* disable test CApiTest.test_custom_op_library when ENABLE_TRAINING_TORCH_INTEROP is ON
* enable custom_op_library build
* fix build
* fix
* merge master and fix build failure
* build onnx_test_runner when onnxruntime_ENABLE_TRAINING_TORCH_INTEROP is ON
* resolve comments
* use --enable_training_torch_interop to replace "onnxruntime_ENABLE_TRAINING_TORCH_INTEROP=ON"
* Merge CPU/GPU nuget pipeline
* Include TensorRT EP libraries into existing GPU nuget package pipeline
* modify to use correct YAML
* Modify for test
* modify for test
* Add depedance
* Add depedance (cont.)
* modify for test
* Add create TensorRT nuget package
* modify for test
* modify for test
* Merge CPU/GPU nuget pipeline
* Include TensorRT EP libraries into existing GPU nuget package pipeline
* modify to use correct YAML
* Modify for test
* modify for test
* Add depedance
* Add depedance (cont.)
* modify for test
* Add create TensorRT nuget package
* modify for test
* fix merge bug
* code refactor
* code refactor
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* modify for test
* cleanup
* modify for test
* fix bug
* modify for test
* refactor
* fix bug and test
* Modify for test
* Modify for test
* Modify for test
* Modify for test
* Prepare for PR
* Prepare for PR
* code refacotr from review
* Remove naming 'Microsoft.ML.OnnxRuntime.TensorRT' to avoid confusion
* Add linux TensorRT libraries
* Remove redundant variable in YMAL
* revert file
* undo revert file
* Modify regular expression so that it can capture the correct file
* Remove newline at end of file
* small fix
* Revert to CUDA11.1 on Windows
* Add unit tests for nuget package on Linux
Co-authored-by: Changming Sun <chasun@microsoft.com>
* initial update from 11.1 to 11.4
* change 11.4.1 to 11.4.0
* adjusting to match nvidia/cuda image tags
* adjusting to match nvidia/cuda image tags centos7
* correction to 11.4.0
* correction to 11.4.0
* update to cuda 11.4
* change training back to 11.1
* change training back to 11.1
* point to correct nvcr.io/nvidia/cuda 11.4.1 image
* change centos8 to centos7
* correct cudnn path
* Update linux-gpu-ci-pipeline.yml for Azure Pipelines
* Update c-api-noopenmp-packaging-pipelines.yml
* need to resolve centos images but remove space and change to 11.4
* Update linux-gpu-ci-pipeline.yml
* add cudnn to docker image
* bump devtoolset to 10
* revert cuda 11.4 change to setup_env_trt
* orttraining back to 11.1
* use nvcr.io
* Fix previous change back to cuda 11.1
* update cudnn path
* use cudnn image (revert if failure)