Commit graph

7169 commits

Author SHA1 Message Date
pengwa
a2dc3e9eac
Improve the compilation speed when compiling for multiple architectures. (#12490)
* improve the compilation speed when compiling for multiple architectures.

* formatting

* fix

* use 0 by default

* fix comments
2022-08-09 11:52:26 +08:00
Scott McKay
56bd96a3f5
Incrementally free initializers while saving to OrtValue instances (#12485)
* Free initializer TensorProto instances as they're converted to OrtValue to reduce peak memory usage.

Co-authored-by: Pranav Sharma <prs@microsoft.com>
2022-08-09 10:59:10 +10:00
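The idea in the commit above is to release each source object as soon as it has been converted, so that the full set of sources and the full set of converted copies are never alive at the same time. It can be sketched in Python as follows. This is an illustration only, not the ONNX Runtime implementation: `convert_and_release`, `sources`, and `convert` are hypothetical names, and plain `bytes` objects stand in for the TensorProto and OrtValue instances.

```python
def convert_and_release(sources, convert):
    """Convert every entry of `sources`, emptying the dict as it goes.

    `sources` maps a name to a raw object; `convert` is any callable that
    builds the converted form. Both are placeholders for this sketch.
    """
    converted = {}
    while sources:
        # popitem() removes the entry from the dict, so the dict no longer
        # keeps the raw object alive after this iteration.
        name, raw = sources.popitem()
        converted[name] = convert(raw)
    return converted


raw_initializers = {"weight": b"\x01\x02", "bias": b"\x03"}
values = convert_and_release(raw_initializers, list)
```

After the call, `raw_initializers` is empty and `values` holds the converted entries. Converting everything first and clearing the source container afterwards would give the same result but a higher peak memory footprint.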
Hector Li
730240d2a5
remove the link the comments (#12510)
2022-08-08 15:20:40 -07:00
Adam Pocock
8a86b346a5
[Java] JNI refactor for ONNX Tensor (#12281)
Working on JNI refactor for OnnxTensor.
  Simplifying the error handling logic in createTensor.
  Collapsing casting branches and migrating to ONNX element type enum.
  Disable cpplint for JNI C files.
2022-08-08 12:48:30 -07:00
Jian Chen
8c5c283471
new quantized operators split (#12495)
* adding conditional variable again

* Adding split test cases in python

* Adding python cases for split

* Enable s8s8 split

* Optimize input

* Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"

This reverts commit d5e34acb

* Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)""

This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad.

* format file

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Reformat file

* Reformat file

* format file

* Optimize input

* Remove unused import

* Remove useless init

* Format split.py with black
2022-08-08 15:12:09 -04:00
cloudhan
9c05577021
Fix various warnings in kernel explorer (#12501)
Fix various warnings
2022-08-08 11:15:41 -07:00
Yufeng Li
bdd6b00c9a
set zero point to 0 if all values are 0.0 (#12470)
* set zero point to 0 if all values are 0.0

* fix bug: older versions of numpy.finfo don't have smallest_subnormal

* check scale to make sure it is not subnormal
2022-08-07 21:34:58 -07:00
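The three steps in the commit above can be sketched as a small scale/zero-point helper. This is a minimal illustration under stated assumptions, not the code from the quantization tool: the function name `compute_scale_zero_point`, its parameters, and the fallback scale of 1.0 are hypothetical. The threshold is the smallest normal float32 value, written out as a constant so the sketch needs only the standard library. It equals `numpy.finfo(numpy.float32).tiny`, which does not depend on the `smallest_subnormal` attribute that the commit notes is missing from older NumPy versions.

```python
# Smallest positive *normal* float32 value (2**-126); anything below it is
# subnormal or zero. Same value as numpy.finfo(numpy.float32).tiny.
FLOAT32_TINY = 2.0 ** -126


def compute_scale_zero_point(rmin, rmax, qmin, qmax):
    """Return (zero_point, scale) for mapping [rmin, rmax] onto [qmin, qmax].

    Hypothetical helper for illustration only.
    """
    # The representable range must contain 0.0.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)

    scale = (rmax - rmin) / float(qmax - qmin)
    if scale < FLOAT32_TINY:
        # All values are 0.0 (or the range is so small that the scale is
        # subnormal): dividing by the scale is not meaningful, so use a
        # zero point of 0 and an assumed fallback scale of 1.0.
        return 0, 1.0

    zero_point = int(round(qmin - rmin / scale))
    return zero_point, scale
```

For example, `compute_scale_zero_point(0.0, 0.0, 0, 255)` takes the guarded branch and returns a zero point of 0, while a range such as `(-1.0, 3.0)` takes the regular path.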
cloudhan
ddea1e48df
Avoid false-positive dependent name lookup error by not depending on auto keyword (#12483)
* Workaround false positive error produced by clang

ROCm's hip clang complains: "use 'template' keyword to treat 'Foo' as a dependent template name",
where Foo is not a dependent template name. Avoiding the auto keyword fixes the error
here.
2022-08-08 10:32:01 +08:00
Dwayne Robinson
eb90b52a75
DML EP fix training build error (#12461)
Fix onnxruntime_training.cmake missing linkage issue
2022-08-05 16:01:25 -07:00
Vincent Wang
e85e31ee80
Update ORTModule Default Opset Version to 15 (#12419)
* update ortmodule opset to 15

* update torch version

* fix ut

* fix ut

* rollback

* rollback for orttrainer
2022-08-05 16:55:04 +08:00
Baiju Meswani
a7d6290774
CUDA kernel for ClipGradNorm for TensorSeq gradients (#12412)
2022-08-04 22:28:28 -07:00
PeixuanZuo
3e1b0ac4b3
[DELETE] delete python package rocm4.3.1 (#12480)
[delete] delete rocm4.3.1
2022-08-05 13:27:42 +08:00
ytaous
b879dca51c
Fix Python Packaging CI (Rocm) (#12477)
Fix Python Packaging CI

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-08-04 20:40:09 -07:00
Scott McKay
8d830adf24
Rework parts of Graph::Resolve to reduce memory usage (#12176)
* Rework some aspects of Graph::Resolve to reduce memory usage.
2022-08-05 13:20:25 +10:00
cloudhan
f39354d7cb
Add composable kernel GEMM baseline for kernel explorer (#12364)
* Split GemmBase RocBlasGemm

* Add composable kernel GEMM baseline

* Make linter happy

* Address review comment

* Update bert cases with batchsize

* Adjust includes to fix IWYU lint

* Only builds and links used ck kernels to improve building time

* Remove warmup run on SelectImpl

* Add comment to utility function

* Mute cpplint

* Make RocBlasGemm<T>::SelectImpl semantically correct

* Add reduced basic test cases for ck gemm

* More robust gemm testing

* Fix warnings

* Fix grammar
2022-08-04 17:32:20 -07:00
Vincent Wang
37995a7245
[CUDA] BiasSoftmax Supporting New Pattern (#12361)
2022-08-05 06:59:24 +08:00
LironKesem
d452462b5e
Lironkesem/unsqueeze_and_squeeze (#12421)
2022-08-04 15:12:34 -04:00
Dmitri Smirnov
a4ef0e7f7b
Remove dynamic allocation for ThreadPool ParallelSection (#12429)
Use InlinedVector in a TP
Store per thread parallel section in std::optional and avoid memory allocation
2022-08-04 09:46:16 -07:00
Yufeng Li
ac10f33d2d
Enable quant op to share quantization parameter between input and output (#12408)
* share quant param between tensors
2022-08-03 21:25:35 -07:00
Ryan Hill
52d4699788
Minor doc fixes (#12388)
2022-08-03 19:47:36 -07:00
Edward Chen
3efd9a73bb
Refactor InferenceSession Load member functions. (#12430)
Fix comparison of path characters when checking for ".ort" suffix.

Some clean up of InferenceSession Load functions.
- Reduce duplication between std::string/std::wstring versions.
- Renaming for clarity.
2022-08-03 16:28:26 -07:00
Ashwini Khade
97268e023c
dev notes for layout transformer (#12396)
* first draft

* plus fixes

* plus more links

* Plus updates per review

* plus more clarifications

* plus updates

* plus more nit fixes

* plus some additions
2022-08-03 15:15:59 -07:00
Scott McKay
a3de1bbf7d
Update script to find optimizers that potentially need supported opset updates (#12330)
* Update to handle multiline declarations for the kernels which are typical these days.
* Update to new path for the cpu contrib_op kernel registrations.
* Update tools/python/find_optimizer_opset_version_updates_required.py

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2022-08-04 07:37:27 +10:00
Xinya Zhang
77cab7a3a5
[ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops (#11968)
* [ROCm] disable expected failure tests PoolTest.MaxPool_10_DilationPadding_?d

* [ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops

* (To squash after review) Replace rocm/nn/pool.cc with amd_hipify.py changes

* [ROCM] Replace miCompat with Helper functions

* (to squash) fix the compiling error of SetPoolingNdDescriptorHelper
2022-08-03 14:36:36 -07:00
Erick Muñoz
d1497bdf62
[oneDNN EP] Optimized DynamicQuantizeLinear operator (#12403)
* Removed unnecessary reorders
* Removed unnecessary element wise clip
2022-08-03 12:36:42 -07:00
Baiju Meswani
7f58bd7236
Perform graph transformations during offline tooling (#12422)
2022-08-03 11:27:12 -07:00
Dmitri Smirnov
dc984a03d5
Container and memory allocation guidelines (#12387)
Container and memory allocation guidelines
  Re-org and add code samples
  Clarify the wording on returning gsl::span
2022-08-03 10:31:59 -07:00
Tianlei Wu
97a340bf48
Fix integer overflow in LongformerAttention (#12435)
fix integer overflow
2022-08-03 10:29:07 -07:00
Changming Sun
44ec2cf088
Update publish-python-apidocs.yml (#12433)
2022-08-03 10:17:00 -07:00
Ye Wang
b622e5fa9b
Support vocab_mask/prefix_vocab_mask/no_repeat_number in greedysearch op (#12327)
* support more inputs for greedy search

* fix docs

* refactor test

* lint

* review comments
2022-08-03 10:10:08 -07:00
Xinya Zhang
01f3a197d7
[ROCm] InstanceNormalization, BatchNormalization and LRN Ops (#11972)
* [ROCm] Add InstanceNormalization Op

* Enable InstanceNormBatch1_fp16 and InstanceNormBatch2_fp16 for ROCm

* [ROCm] Add BatchNormalization for fp32 and fp16

* Enable BatchNormTest for ROCm

* [ROCm] Add LRN Op

* [ROCM] replace miCompat functions with Helper functions
2022-08-02 23:14:26 -07:00
Vincent Wang
99d2a63e1a
Set Fixed Seed For SoftmaxCrossEntropyLoss Related UTs (#12432)
add seed
2022-08-03 13:29:30 +08:00
George Nash
26dc09417b
[oneDNN ep] matmulinteger postop fusion (#12354)
* MatMulInteger + post op fusion

This fuses MatMulInteger with up to 32 binary/elementwise
operators if running on the oneDNN execution provider.

Signed-off-by: George Nash <george.nash@intel.com>

* Remove the un-needed transformer

The MatMulIntegerToFloat transformer is not needed since
the transform done is handled by the MatMulIntegerBinaryEltwise
transformer code.

Signed-off-by: George Nash <george.nash@intel.com>

* Refactor of the post op transformer code

This separates the code that finds the post op
nodes for MatMul and MatMulInteger to reduce code
repetition.

Signed-off-by: George Nash <george.nash@intel.com>

* Minor cleanup based on cpplint

resolved unused-variable build failure

Signed-off-by: George Nash <george.nash@intel.com>
2022-08-02 20:42:34 -07:00
Changming Sun
5d610bc8eb
Disable CG task in PR pipelines (#12426)
2022-08-02 19:01:41 -07:00
Yulong Wang
feed5da435
[js] loosen test timeout (#12427)
Loosen the following test timeouts:

1. "Test Web Multi-Browsers" stage in "ONNX Runtime Web CI Pipeline": 30min -> 60min
2. Node.js binding default per-case timeout: 30 sec -> 90 sec
2022-08-02 19:01:19 -07:00
smrkatte
54d5e86981
Add cast before copy for dissimilar scalar type (#12391)
* Add proper cast/copy callflow for ORT and non-ORT devices
2022-08-02 18:32:58 -07:00
Yulong Wang
c9e0d0f8b6
[js/node] upgrade terser version (#12351)
2022-08-02 15:50:44 -07:00
Changming Sun
1a64b94f60
Fix a small issue in nuget packaging pipeline (#12405)
In #12358 I typed a wrong path in the yaml file.
2022-08-02 15:44:43 -07:00
Dmitri Smirnov
eebaf5f270
Adjust and fix abseil-cpp debugging visualization (#12415)
Move abseil-cpp.natvis file, add it to PDB, adjust visualization
2022-08-02 15:08:17 -07:00
shalvamist
ca6b4221fe
[js] Bug fix - permission issue with ensureSymlinkSync (#12369)
Using ensureSymlinkSync might have issues with permissions when using 'dir'; changed to 'junction' to avoid this.
If the folder generation fails, it will cause the test to fail as well.
2022-08-02 12:21:31 -07:00
Chi Lo
b39257a5e6
Enable support of multi-level nested control flow ops model for TRT EP (#12147)
* Make multiple-level nested control flow op model work

* find correct input index

* find correct input index (cont.)

* enable nested layer unit tests for TRT EP

* add comment

* add Scan op to current workaround support of control flow op
2022-08-01 23:57:30 -07:00
Chi Lo
de3a91d85d
Revert TRT EP cache refactoring (#12376)
* revert cache refactor

* fix conflicts when reverting
2022-08-01 23:57:05 -07:00
Yi Zhang
5d1173fe68
Run IOS pipeline concurrently (#12400)
split ios pipelines
2022-08-02 11:07:17 +08:00
Yi Zhang
63d64636f6
Add the comment linking to wiki (#12398)
add the comment
2022-08-02 10:09:16 +08:00
LironKesem
315e006532
adding a comment on nll_loss_forward.output that can not be implemented (#12406)
adding a comment on nll_loss_forward.output that can not be implemented
2022-08-01 19:12:35 -04:00
msftlincoln
62922f4c3c
Eager Mode generator: add comments, rename functions (#12385)
* eager generator: add comments, rename functions

* lint
2022-08-01 15:52:47 -04:00
Edward Chen
f77ab4fea6
Manually add optimization flag for Android Release builds. (#12390)
With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration.
More details here: https://github.com/android/ndk/issues/1740

Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21.

This change is a workaround to manually add `-O3` for "Release" Android builds.
2022-08-01 12:49:03 -07:00
George Wu
6bb807ef74
add cuda compute 8.7 to CMakeLists.txt to support Nvidia Orin devices (#12377)
* add cuda arch 8.7 to CMakeLists.txt to support Nvidia Orin devices

* add cuda version >= 11 check for orin support
2022-08-01 09:45:58 -07:00
Cheng
3f66297499
code clean (#12392)
* code clean

* misspelling fix
2022-08-01 14:12:35 +08:00
Valery Chernov
1a4868e5c4
[TVM EP] Hot fix of build on Windows of TVM EP with ipp-crypto (#12381)
fix of build on Windows with ipp-crypto. cmake warnings fix

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-07-31 14:36:54 +02:00