Working on JNI refactor for OnnxTensor.
Simplifying the error handling logic in createTensor.
Collapsing casting branches and migrating to ONNX element type enum.
Disable cpplint for JNI C files.
* adding conditional variable again
* Adding split test cases in python
* Adding python cases for split
* Enable s8s8 split
* Optimize input
* Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"
This reverts commit d5e34acb
* Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)""
This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad.
* format file
* Update c-api-linux-cpu.yml
* Update c-api-linux-cpu.yml
* Update c-api-linux-cpu.yml
* Reformat file
* Reformat file
* format file
* Optimize input
* Remove unused import
* Remove useless init
* Format split.py with black
* set zero point to 0 if all values are 0.0
* fix bug: numpy.finfo in older versions of numpy doesn't have smallest_subnormal
* check scale to make sure it is not subnormal
* Workaround false positive error produced by clang
ROCm's hip clang complains that "use 'template' keyword to treat 'Foo' as a dependent template name"
even though Foo is not a dependent template name. Avoiding the auto keyword fixes the error
here.
* Split GemmBase RocBlasGemm
* Add composable kernel GEMM baseline
* Make linter happy
* Address review comment
* Update bert cases with batchsize
* Adjust includes to fix IWYU lint
* Only build and link the ck kernels that are used, to improve build time
* Remove warmup run on SelectImpl
* Add comment to utility function
* Mute cpplint
* Make RocBlasGemm<T>::SelectImpl semantically correct
* Add reduced basic test cases for ck gemm
* More robust gemm testing
* Fix warnings
* Fix grammar
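The quantization entries above ("set zero point to 0 if all values are 0.0" and "check scale to make sure it is not subnormal") describe a guard that can be sketched roughly as follows. This is a minimal illustration, not the code from the change: the helper name `compute_scale_zero_point`, the uint8 range, and the hard-coded float32 threshold are all assumptions. The sketch avoids NumPy entirely, so it sidesteps the `numpy.finfo` compatibility issue rather than showing how it was fixed.

```python
# Smallest positive *normal* float32 value (2**-126). Assumption: scales are
# stored as float32, so any positive value below this would be subnormal.
FLOAT32_SMALLEST_NORMAL = 2.0 ** -126


def compute_scale_zero_point(rmin, rmax, qmin=0, qmax=255):
    """Hypothetical helper: asymmetric quantization parameters for [rmin, rmax]."""
    # The real-valued range must contain 0.0 so that zero quantizes exactly.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)

    scale = (rmax - rmin) / (qmax - qmin)
    if scale < FLOAT32_SMALLEST_NORMAL:
        # Covers an all-zero tensor (scale == 0.0) as well as subnormal
        # scales: fall back to a neutral scale and a zero point of 0.
        return 1.0, 0

    zero_point = round(qmin - rmin / scale)
    # Clamp into the quantized range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point
```

With these assumptions, an all-zero input such as `compute_scale_zero_point(0.0, 0.0)` takes the fallback branch instead of dividing by a zero scale.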
Fix comparison of path characters when checking for ".ort" suffix.
Some clean up of InferenceSession Load functions.
- Reduce duplication between std::string/std::wstring versions.
- Renaming for clarity.
* first draft
* plus fixes
* plus more links
* Plus updates per review
* plus more clarifications
* plus updates
* plus more nit fixes
* plus some additions
* Update to handle multiline declarations for the kernels, which are typical these days.
* Update to new path for the cpu contrib_op kernel registrations.
* Update tools/python/find_optimizer_opset_version_updates_required.py
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
* [ROCm] Add InstanceNormalization Op
* Enable InstanceNormBatch1_fp16 and InstanceNormBatch2_fp16 for ROCm
* [ROCm] Add BatchNormalization for fp32 and fp16
* Enable BatchNormTest for ROCm
* [ROCm] Add LRN Op
* [ROCm] replace miCompat functions with Helper functions
* MatMulInteger + post op fusion
This fuses MatMulInteger with up to 32 binary/elementwise
operators if running on the oneDNN execution provider.
Signed-off-by: George Nash <george.nash@intel.com>
* Remove the un-needed transformer
The MatMulIntegerToFloat transformer is not needed since
the transform it performs is handled by the MatMulIntegerBinaryEltwise
transformer code.
Signed-off-by: George Nash <george.nash@intel.com>
* Refactor of the post op transformer code
This separates the code that finds the post op
nodes for MatMul and MatMulInteger to reduce code
repetition.
Signed-off-by: George Nash <george.nash@intel.com>
* Minor cleanup based on cpplint
resolved unused-variable build failure
Signed-off-by: George Nash <george.nash@intel.com>
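The post-op search described in the MatMulInteger fusion entries above can be pictured as a walk along a single-consumer chain. The sketch below is only an illustration under stated assumptions: the graph representation (plain dicts), the function name `find_post_ops`, and the `FUSABLE_OPS` allow-list are hypothetical, and the cap of 32 is taken from the "up to 32" wording in the log. It is not the transformer code from the change.

```python
# Cap on fused post-ops, taken from the "up to 32" wording in the log.
MAX_POST_OPS = 32

# Hypothetical allow-list of binary/elementwise op types; the real list may differ.
FUSABLE_OPS = {"Add", "Mul", "Sub", "Div", "Relu", "Sigmoid", "Tanh"}


def find_post_ops(start, nodes, consumers):
    """Collect the chain of fusable nodes that follows `start`.

    nodes:     dict mapping node name -> op type
    consumers: dict mapping node name -> list of node names consuming its output
    """
    chain = []
    current = start
    while len(chain) < MAX_POST_OPS:
        next_nodes = consumers.get(current, [])
        # Stop when the output ends or fans out: fusing would hide a value
        # that another node still needs.
        if len(next_nodes) != 1:
            break
        candidate = next_nodes[0]
        if nodes[candidate] not in FUSABLE_OPS:
            break
        chain.append(candidate)
        current = candidate
    return chain


# Example graph: MatMulInteger -> Add -> Relu -> Softmax
nodes = {"mm": "MatMulInteger", "add": "Add", "relu": "Relu", "softmax": "Softmax"}
consumers = {"mm": ["add"], "add": ["relu"], "relu": ["softmax"]}
post_ops = find_post_ops("mm", nodes, consumers)
```

Sharing one such helper between MatMul and MatMulInteger is one way to read the "reduce code repetition" refactor, though the actual structure of the refactored code is not shown in this log.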
Loosen the following test timeouts:
1. "Test Web Multi-Browsers" stage in "ONNX Runtime Web CI Pipeline": 30min -> 60min
2. Node.js binding default per-case timeout: 30 sec -> 90 sec
Using ensureSymlinkSync with 'dir' might have issues with permissions - changed to 'junction' to avoid this.
If the folder generation fails, it will cause the test to fail as well.
* Make multiple-level nested control flow op model work
* find correct input index
* find correct input index (cont.)
* enable nested layer unit tests for TRT EP
* add comment
* add Scan op to the current workaround for control flow op support
With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration.
More details here: https://github.com/android/ndk/issues/1740
Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21.
This change is a workaround to manually add `-O3` for "Release" Android builds.
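As a rough picture of the workaround, a build script could add the optimization flag only for Android "Release" builds. The sketch below is hypothetical: the function name `extra_cmake_args` and the choice to pass the flag through CMake's `CMAKE_C_FLAGS_RELEASE` and `CMAKE_CXX_FLAGS_RELEASE` variables on the command line are assumptions, and the actual change may instead append to the flags inside the CMake files.

```python
def extra_cmake_args(config, is_android):
    """Hypothetical helper returning CMake arguments for one build configuration."""
    args = ["-DCMAKE_BUILD_TYPE=" + config]
    if is_android and config == "Release":
        # Workaround: pass -O3 explicitly, because the NDK toolchain is not
        # adding an optimization level for Release builds.
        # Note: this sets the variables wholesale instead of appending to them.
        args.append("-DCMAKE_C_FLAGS_RELEASE=-O3")
        args.append("-DCMAKE_CXX_FLAGS_RELEASE=-O3")
    return args
```

Other configurations and non-Android builds are left untouched, which keeps the workaround easy to remove once the NDK issue is resolved.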
1. Delete the build scripts that were copied from manylinux project. Use "git checkout" instead.
2. Update manylinux version to get python 3.11. Related issue: Python 3.11 support #12343
3. Change the CUDA version of the Linux GPU build job of the NuGet packaging pipeline from CUDA 11.4 to CUDA 11.6 to match the TRT job within the same pipeline. (Many other places need to be updated as well, but I'd prefer to put them in another PR.)
4. Make Dockerfile names static. For example, replace tools/ci_build/github/linux/docker/$(DockerFile) with tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cpu. The former relies on a runtime variable, $(DockerFile), but template parameters are expanded early in processing a pipeline run, when most variables are not yet available. It is like C++ macros versus variables.