Commit graph

7166 commits

Author SHA1 Message Date
Adam Pocock
8a86b346a5
[Java] JNI refactor for ONNX Tensor (#12281)
Working on JNI refactor for OnnxTensor.
  Simplifying the error handling logic in createTensor.
  Collapsing casting branches and migrating to ONNX element type enum.
  Disable cpplint for JNI C files.
2022-08-08 12:48:30 -07:00
Jian Chen
8c5c283471
new quantized operators split (#12495)
* adding conditional variable again

* Adding split test cases in python

* Adding python cases for split

* Enable s8s8 split

* Optimize input

* Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"

This reverts commit d5e34acb

* Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)""

This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad.

* format file

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Reformat file

* Reformat file

* format file

* Optimize input

* Remove unused import

* Remove useless init

* Format split.py with black
2022-08-08 15:12:09 -04:00
cloudhan
9c05577021
Fix various warnings in kernel explorer (#12501)
Fix various warnings
2022-08-08 11:15:41 -07:00
Yufeng Li
bdd6b00c9a
set zero point to 0 if all values are 0.0 (#12470)
* set zero point to 0 if all values are 0.0

* fix bug: lower version of numpy.finfo doesn't have smallest_subnormal

* check scale to make sure it is not subnormal
2022-08-07 21:34:58 -07:00
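
A minimal sketch of the idea described in the commit above, not the actual onnxruntime code: the function name `compute_scale_zero_point` and its parameters are hypothetical. It assumes an affine quantization scheme and uses `np.finfo(np.float32).tiny` (the smallest positive normal float32) as the threshold, since the commit notes that older NumPy versions lack `smallest_subnormal`.

```python
import numpy as np


def compute_scale_zero_point(rmin, rmax, qmin, qmax):
    """Hypothetical helper: derive (scale, zero_point) for affine quantization."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)

    scale = (rmax - rmin) / float(qmax - qmin)

    # A zero scale (all values are 0.0) or a subnormal scale would make the
    # division below meaningless, so fall back to scale 1.0, zero point 0.
    # finfo.tiny exists in old NumPy releases, unlike smallest_subnormal.
    if scale < np.finfo(np.float32).tiny:
        return 1.0, 0

    zero_point = int(round(qmin - rmin / scale))
    # Clamp into the representable integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point
```

With an all-zero tensor (`rmin == rmax == 0.0`) the helper returns a zero point of 0 instead of dividing by a zero scale.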
cloudhan
ddea1e48df
Avoid false-positive dependent name lookup error by not depending on auto keyword (#12483)
* Workaround false positive error produced by clang

ROCm's hip clang complains that "use 'template' keyword to treat 'Foo' as a dependent template name"
where Foo is not a dependent template name. Avoiding the auto keyword fixes the error
here.
2022-08-08 10:32:01 +08:00
Dwayne Robinson
eb90b52a75
DML EP fix training build error (#12461)
Fix onnxruntime_training.cmake missing linkage issue
2022-08-05 16:01:25 -07:00
Vincent Wang
e85e31ee80
Update ORTModule Default Opset Version to 15 (#12419)
* update ortmodule opset to 15

* update torch version

* fix ut

* fix ut

* rollback

* rollback for orttrainer
2022-08-05 16:55:04 +08:00
Baiju Meswani
a7d6290774
CUDA kernel for ClipGradNorm for TensorSeq gradients (#12412)
2022-08-04 22:28:28 -07:00
PeixuanZuo
3e1b0ac4b3
[DELETE] delete python package rocm4.3.1 (#12480)
[delete] delete rocm4.3.1
2022-08-05 13:27:42 +08:00
ytaous
b879dca51c
Fix Python Packaging CI (Rocm) (#12477)
Fix Python Packaging CI

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-04 20:40:09 -07:00
Scott McKay
8d830adf24
Rework parts of Graph::Resolve to reduce memory usage (#12176)
* Rework some aspects of Graph::Resolve to reduce memory usage.
2022-08-05 13:20:25 +10:00
cloudhan
f39354d7cb
Add composable kernel GEMM baseline for kernel explorer (#12364)
* Split GemmBase RocBlasGemm

* Add composable kernel GEMM baseline

* Make linter happy

* Address review comment

* Update bert cases with batchsize

* Adjust includes to fix IWYU lint

* Only builds and links used ck kernels to improve building time

* Remove warmup run on SelectImpl

* Add comment to utility function

* Mute cpplint

* Make RocBlasGemm<T>::SelectImpl semantically correct

* Add reduced basic test cases for ck gemm

* More robust gemm testing

* Fix warnings

* Fix grammar
2022-08-04 17:32:20 -07:00
Vincent Wang
37995a7245
[CUDA] BiasSoftmax Supporting New Pattern (#12361)
2022-08-05 06:59:24 +08:00
LironKesem
d452462b5e
Lironkesem/unsqueeze_and_squeeze (#12421)
2022-08-04 15:12:34 -04:00
Dmitri Smirnov
a4ef0e7f7b
Remove dynamic allocation for ThreadPool ParallelSection (#12429)
Use InlinedVector in a TP
Store per thread parallel section in std::optional and avoid memory allocation
2022-08-04 09:46:16 -07:00
Yufeng Li
ac10f33d2d
Enable quant op to share quantization parameter between input and output (#12408)
* share quant param between tensors
2022-08-03 21:25:35 -07:00
Ryan Hill
52d4699788
Minor doc fixes (#12388)
2022-08-03 19:47:36 -07:00
Edward Chen
3efd9a73bb
Refactor InferenceSession Load member functions. (#12430)
Fix comparison of path characters when checking for ".ort" suffix.

Some clean up of InferenceSession Load functions.
- Reduce duplication between std::string/std::wstring versions.
- Renaming for clarity.
2022-08-03 16:28:26 -07:00
Ashwini Khade
97268e023c
dev notes for layout transformer (#12396)
* first draft

* plus fixes

* plus more links

* Plus updates per review

* plus more clarifications

* plus updates

* plus more nit fixes

* plus some additions
2022-08-03 15:15:59 -07:00
Scott McKay
a3de1bbf7d
Update script to find optimizers that potentially need supported opset updates (#12330)
* Update to handle multiline declarations for the kernels which are typical these days.
* Update to new path for the cpu contrib_op kernel registrations.
* Update tools/python/find_optimizer_opset_version_updates_required.py

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2022-08-04 07:37:27 +10:00
Xinya Zhang
77cab7a3a5
[ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops (#11968)
* [ROCm] disable expected failure tests PoolTest.MaxPool_10_DilationPadding_?d

* [ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops

* (To squash after review) Replace rocm/nn/pool.cc with amd_hipify.py changes

* [ROCM] Replace miCompat with Helper functions

* (to squash) fix the compiling error of SetPoolingNdDescriptorHelper
2022-08-03 14:36:36 -07:00
Erick Muñoz
d1497bdf62
[oneDNN EP] Optimized DynamicQuantizeLinear operator (#12403)
* Removed unnecessary reorders
* Removed unnecessary element wise clip
2022-08-03 12:36:42 -07:00
Baiju Meswani
7f58bd7236
Perform graph transformations during offline tooling (#12422)
2022-08-03 11:27:12 -07:00
Dmitri Smirnov
dc984a03d5
Container and memory allocation guidelines (#12387)
Container and memory allocation guidelines
  Re-org and add code samples
  Clarify the wording on returning gsl::span
2022-08-03 10:31:59 -07:00
Tianlei Wu
97a340bf48
Fix integer overflow in LongformerAttention (#12435)
fix integer overflow
2022-08-03 10:29:07 -07:00
Changming Sun
44ec2cf088
Update publish-python-apidocs.yml (#12433)
2022-08-03 10:17:00 -07:00
Ye Wang
b622e5fa9b
Support vocab_mask/prefix_vocab_mask/no_repeat_number in greedysearch op (#12327)
* support more inputs for greedy search

* fix docs

* refactor test

* lint

* review comments
2022-08-03 10:10:08 -07:00
Xinya Zhang
01f3a197d7
[ROCm] InstanceNormalization, BatchNormalization and LRN Ops (#11972)
* [ROCm] Add InstanceNormalization Op

* Enable InstanceNormBatch1_fp16 and InstanceNormBatch2_fp16 for ROCm

* [ROCm] Add BatchNormalization for fp32 and fp16

* Enable BatchNormTest for ROCm

* [ROCm] Add LRN Op

* [ROCM] replace miCompat functions with Helper functions
2022-08-02 23:14:26 -07:00
Vincent Wang
99d2a63e1a
Set Fixed Seed For SoftmaxCrossEntropyLoss Related UTs (#12432)
add seed
2022-08-03 13:29:30 +08:00
George Nash
26dc09417b
[oneDNN ep] matmulinteger postop fusion (#12354)
* MatMulInteger + post op fusion

This fuses MatMulInteger with up to 32 binary/elementwise
operators if running on the oneDNN execution provider.

Signed-off-by: George Nash <george.nash@intel.com>

* Remove the un-needed transformer

The MatMulIntegerToFloat transformer is not needed since
the transform done is handled by the MatMulIntegerBinaryEltwise
transformer code.

Signed-off-by: George Nash <george.nash@intel.com>

* Refactor of the post op transformer code

This separates the code that finds the post op
nodes for MatMul and MatMulInteger to reduce code
repetition.

Signed-off-by: George Nash <george.nash@intel.com>

* Minor cleanup based on cpplint

resolved unused-variable build failure

Signed-off-by: George Nash <george.nash@intel.com>
2022-08-02 20:42:34 -07:00
Changming Sun
5d610bc8eb
Disable CG task in PR pipelines (#12426)
2022-08-02 19:01:41 -07:00
Yulong Wang
feed5da435
[js] loosen test timeout (#12427)
Loosen the following test timeouts:

1. "Test Web Multi-Browsers" stage in "ONNX Runtime Web CI Pipeline": 30min -> 60min
2. Node.js binding default per-case timeout: 30 sec -> 90 sec
2022-08-02 19:01:19 -07:00
smrkatte
54d5e86981
Add cast before copy for dissimilar scalar type (#12391)
* Add proper cast/copy callflow for ORT and non-ORT devices
2022-08-02 18:32:58 -07:00
Yulong Wang
c9e0d0f8b6
[js/node] upgrade terser version (#12351)
2022-08-02 15:50:44 -07:00
Changming Sun
1a64b94f60
Fix a small issue in nuget packaging pipeline (#12405)
In #12358 I typed a wrong path in the yaml file.
2022-08-02 15:44:43 -07:00
Dmitri Smirnov
eebaf5f270
Adjust and fix abseil-cpp debugging visualization (#12415)
Move abseil-cpp.natvis file, add it to PDB, adjust visualization
2022-08-02 15:08:17 -07:00
shalvamist
ca6b4221fe
[js] Bug fix - permission issue with ensureSymlinkSync (#12369)
Using ensureSymlinkSync with the 'dir' type might run into permission issues; changed to 'junction' to avoid this.
If the folder generation fails, it will cause the test to fail as well.
2022-08-02 12:21:31 -07:00
Chi Lo
b39257a5e6
Enable support of multi-level nested control flow ops model for TRT EP (#12147)
* Make multiple-level nested control flow op model work

* find correct input index

* find correct input index (cont.)

* enable nested layer unit tests for TRT EP

* add comment

* add Scan op to current workaround support of control flow op
2022-08-01 23:57:30 -07:00
Chi Lo
de3a91d85d
Revert TRT EP cache refactoring (#12376)
* revert cache refactor

* fix conflicts when reverting
2022-08-01 23:57:05 -07:00
Yi Zhang
5d1173fe68
Run IOS pipeline concurrently (#12400)
split ios pipelines
2022-08-02 11:07:17 +08:00
Yi Zhang
63d64636f6
Add the comment linking to wiki (#12398)
add the comment
2022-08-02 10:09:16 +08:00
LironKesem
315e006532
adding a comment on nll_loss_forward.output that can not be implemented (#12406)
adding a comment on nll_loss_forward.output that can not be implemented
2022-08-01 19:12:35 -04:00
msftlincoln
62922f4c3c
Eager Mode generator: add comments, rename functions (#12385)
* eager generator: add comments, rename functions

* lint
2022-08-01 15:52:47 -04:00
Edward Chen
f77ab4fea6
Manually add optimization flag for Android Release builds. (#12390)
With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration.
More details here: https://github.com/android/ndk/issues/1740

Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21.

This change is a workaround to manually add `-O3` for "Release" Android builds.
2022-08-01 12:49:03 -07:00
George Wu
6bb807ef74
add cuda compute 8.7 to Cmakelists.txt to support Nvidia Orin devices (#12377)
* add cuda arch 8.7 to cmakelists.txt to support Nvidia Orin devices

* add cuda version >= 11 check for orin support
2022-08-01 09:45:58 -07:00
Cheng
3f66297499
code clean (#12392)
* code clean

* misspelling fix
2022-08-01 14:12:35 +08:00
Valery Chernov
1a4868e5c4
[TVM EP] Hot fix of build on Windows of TVM EP with ipp-crypto (#12381)
fix of build on Windows with ipp-crypto. cmake warnings fix

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-07-31 14:36:54 +02:00
Yi Zhang
8b4ad77ea2
pipeline can use last run's artifacts (#12379)
* first step

* depends on stage

* temp change

* specific

* runId

* parameters

* fix typo

* fix typo

* add nnapi

* add nnapi

* fix typo

* minor fix

* condition on stage

* format

* format
2022-07-30 21:34:57 +08:00
pengwa
6d1eb9509e
Refine gradient accumulation (on device training) (#12363)
* a

(cherry picked from commit 43909cdd6e3daf30a82d584292286806d1172a0b)

* optimize inplace accumulator a bit

* fix inputs

* revert logging

* minor fix

* tune perf and resolve comments

* typo

* fix

* fix tests

* move threshold to constexpr.
2022-07-30 10:24:01 +08:00
Changming Sun
7b4ce0c1e1
Delete the build scripts that were copied from manylinux project (#12358)
1. Delete the build scripts that were copied from manylinux project. Use "git checkout" instead.
2. Update manylinux version to get python 3.11. Related issue: Python 3.11 support #12343
3. Change the cuda version of the linux gpu build job of the nuget packaging pipeline from cuda 11.4 to cuda 11.6 to match the TRT job within the same pipeline. (A lot of other places need to be updated as well, but I'd prefer to put them in another PR)
4. Make dockerfile names static. For example, replace tools/ci_build/github/linux/docker/$(DockerFile) with tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cpu . The former relies on a runtime variable, $(DockerFile), but template parameters are expanded early in processing a pipeline run, when most variables are not available. It is like C++ macros vs. variables.
2022-07-29 18:24:19 -07:00
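
Point 4 of the commit above can be illustrated with a toy two-phase expansion. This is only a sketch under stated assumptions, not how the pipeline service is implemented: the function names are hypothetical, and the model assumes that `${{ parameters.name }}` expressions are substituted in an early phase while `$(Name)` variables are substituted only once the run has started.

```python
import re


def expand_template_parameters(text, parameters):
    """Phase 1 (early): substitute ${{ parameters.name }} expressions.

    Runtime variables are not known yet, so $(Name) is left untouched.
    """
    return re.sub(
        r"\$\{\{\s*parameters\.(\w+)\s*\}\}",
        lambda m: parameters[m.group(1)],
        text,
    )


def expand_runtime_variables(text, variables):
    """Phase 2 (late): substitute $(Name) once the run has started."""
    return re.sub(r"\$\((\w+)\)", lambda m: variables[m.group(1)], text)


def is_resolved(path):
    """A path is usable only if no runtime variable reference remains."""
    return "$(" not in path


DOCKER_DIR = "tools/ci_build/github/linux/docker/"
dynamic_path = DOCKER_DIR + "$(DockerFile)"
static_path = DOCKER_DIR + "Dockerfile.manylinux2014_cpu"

# After the early phase, only the static name is a concrete file path.
dynamic_after_phase1 = expand_template_parameters(dynamic_path, {})
static_after_phase1 = expand_template_parameters(static_path, {})
```

In this model, anything that must read the Dockerfile during the early phase can only work with `static_path`; `dynamic_path` becomes concrete only after `expand_runtime_variables` runs.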