Commit graph

7355 commits

Author SHA1 Message Date
cloudhan
10f9a69707
Use CMake EXCLUDE_FROM_ALL for composable kernels to avoid building of conv related kernels (#12855) 2022-09-14 22:11:31 -07:00
Chun-Wei Chen
d819b56fba
Consume ONNX 1.12.1 to prevent vulnerability issue while loading external file (#12915)
* consume ONNX 1.12.1 to prevent vulnerability issue while loading external tensors

* update ONNX 1.12.1

* test updated PR

* use official rel-1.12.1 commit
2022-09-14 21:10:24 -07:00
PeixuanZuo
3f456a1847
[Update] update rocm5.2.3 (#12942)
* [Update] update rocm5.2.3

* [Update] use rocm docker image as base
2022-09-15 10:41:49 +08:00
Cassie Breviu
5099dda42f
Lint updates csharp docs (#12962)
* fix lint issues on docfx.vendor.js file

* fix ci

* remove submodule

* fix ci

* Update var name to AcceptedList

* remove test branch from ci
2022-09-14 17:56:41 -05:00
Dmitri Smirnov
bc2df1bf95
Remove previously deprecated API (#12935)
Remove previously deprecated API
Format JS code, address review comments
NPM Formatting
2022-09-14 10:58:03 -07:00
Yi Zhang
1ef1029163
Skip 2 tests in windows gpu workflow (#12956) 2022-09-14 09:43:38 -07:00
cloudhan
b8e34fbd91
Split topk implementation into per-type translation units to speed up compilation (#12861) 2022-09-14 19:36:54 +08:00
Vincent Wang
da07c83948
SoftmaxCrossEntropyLossInternalGrad and Sum Fusion (#12746)
* fuse scegrad and sum

* add yield output shapes to value_info

* resolve comments

* fix merge main
2022-09-14 14:45:51 +08:00
Dwayne Robinson
568950e28c
Warn on node EP silent fallback from preferred provider (#10831)
* Warn on node EP fallback from preferred provider
* Clarify with comment
* Update to ORT's weird coding style for ragged parameter wrap
* Android build error: unused parameter ‘providers’
* Update logic to be more robust
* Updates from Pranav's feedback about messaging to rerun with verbose and respecting explicit vs implicit EP addition. Also merge from main.
* brace style patch up
* Update with feedback from Pranav and Scott McKay
* Restore node_placement_set after realizing it only applies when is_verbose is true
* Fix build warning on Android
* Renamed to node_placement_provider_set per Pranav's suggestion
2022-09-13 15:53:17 -07:00
Yulong Wang
78bc53f91d
fix prefast:Warning C26814 in non_max_suppression.cc (#12934) 2022-09-13 15:22:55 -07:00
Changming Sun
bb98922cc8
Delete nuphar docker file (#12944) 2022-09-13 15:22:07 -07:00
Tianlei Wu
95c4fc6877
[CUDA] Add TensorRT fused attention fp16 v2 kernels (#12814)
* Add TensorRT fused attention fp16 kernels
* drop sm 72; seq 512 for sm75; and head_size 32 kernels
* Add env variable ORT_DISABLE_FUSED_ATTENTION
* exclude files in hipify
* update AttentionPastState_dynamic test threshold
* fix --use_mask_index in benchmark
2022-09-13 15:16:12 -07:00
Scott McKay
1016c33519
Fix prefast warning in upsample.cc. (#12938)
* Fix prefast warning.
* Fix some other static analysis warnings.
2022-09-14 08:14:33 +10:00
Changming Sun
626d94aa23
Refactor python packaging pipeline and nuget packaging pipeline (#12945)
1. Move the Linux ARM64 part of python packaging pipeline to a real ARM64 machine pool
2. Refactor the Linux CPU build jobs of python packaging pipeline into two parts: build and test. The test part will be exempted from Cyber EO compliance requirements as it won't affect the final bits we publish. This refactoring is to reduce dependencies in the build part. For example, this PR removes PyTorch from the build dependencies.
3. Combine DML nuget packaging pipeline with "Zip-Nuget-Java-Nodejs Packaging Pipeline" as they all produce ORT nuget packages. Also, publish DML nuget packages and ORT GPU nuget packages to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly feed.
2022-09-13 14:50:31 -07:00
Hariharan Seshadri
9edc9465f0
Fix some prefast warnings (#12936) 2022-09-13 13:04:37 -07:00
RandySheriffH
64466c2d62
Remove nuphar provider folder (#12939) 2022-09-13 09:10:52 -07:00
madurais
28e27ee7f7
Changes for AIX compilation to get CPU of running thread. hz is inter… (#12744)
* Changes for AIX compilation to get CPU of running thread. hz is an internal variable in AIX, so it is renamed to hz1 in window_functions.cc so that the code builds on all OSes

Co-authored-by: madurais <root@telesto10.in.ibm.com>
Co-authored-by: tvkai <vamshikrishna@in.ibm.com>
2022-09-13 11:06:35 +10:00
Edward Chen
31a1403e06
Add --output_dir option to convert_onnx_models_to_ort.py. (#12844)
Add --output_dir option to convert_onnx_models_to_ort.py.
Allows one to optionally specify an output directory for the converted model files.
2022-09-12 15:36:03 -07:00
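As a rough illustration of the optional output directory described in the commit above, here is a minimal sketch. It is not the actual code of convert_onnx_models_to_ort.py: the functions `parse_args` and `converted_model_path`, the positional `model_path` argument, and the fallback to the model's own directory are all assumptions made for the example. Only the `--output_dir` flag name comes from the commit message.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Hypothetical CLI; only the --output_dir flag name is from the commit.
    parser = argparse.ArgumentParser(description="Illustrative converter CLI.")
    parser.add_argument("model_path", type=Path, help="Path to the input .onnx model.")
    parser.add_argument(
        "--output_dir",
        type=Path,
        default=None,
        help="Optional directory for converted files (assumed default: next to the input).",
    )
    return parser.parse_args(argv)


def converted_model_path(model_path: Path, output_dir: Optional[Path]) -> Path:
    # Assumption: without --output_dir the converted file sits beside the input model.
    target_dir = output_dir if output_dir is not None else model_path.parent
    return target_dir / model_path.with_suffix(".ort").name
```

Passing an argument list to `parse_args` keeps the sketch testable without touching `sys.argv`.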
Joseph Groenenboom
a433f22f17
Softmax interface update (#12469)
* Template datatype for SoftmaxWithRawMaskSmallKernel in ROCm EP

* Remove valid_items usage from SoftmaxWithRawMaskSmallKernel for ROCm EP

The kernel already masks off invalid items and this gives a much
faster implementation in hipCUB.

* Update accumulator type in ROCm EP for SoftmaxWithRawMaskSmallKernel

Hard code accumulator to fp32 for hipCUB in indicated kernel.

* Reset casting to old behavior

* Document steps to optimize SoftMax kernel on ROCm EP

Usage of the hipCUB valid_items interface on reduction operations
has a significant performance impact. Masking all thread data to
avoid need to use the valid_items interface to hipCUB.
2022-09-12 13:02:31 -07:00
Tianlei Wu
30ebc9e00a
Useless Cast removal after converting model from float32 to float16 (#12871) 2022-09-12 11:07:33 -07:00
Yi Zhang
d8636c2be8
Add enable_onnx_tests in windows nuget test step (#12926) 2022-09-12 10:08:24 -07:00
Tianlei Wu
1e34440c37
Fix ORT crash when loading BeamSearch model (#12872)
* add subgraph verification in VerifyNodeAndOpMatch

* add regression tests

* update comments

* update test
2022-09-09 12:48:32 -07:00
Scott McKay
022d9e2d0c
Get files for XNNPACK wasm build from BUILD.bazel. (#12892)
Get files for wasm build from BUILD.bazel.
2022-09-09 12:38:57 -07:00
Jian Chen
e561a7cf29
Adding QuantConfig Class (#12810)
* Initial commit for testing

* Adding DynamicQuantConfig

* Adding DynamicQuantConfig

* Format file

* Adding Default configuration placeholder.

* Update onnxruntime/python/tools/quantization/quantize.py

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>

* Reformat file

* Reformat Rest Docstring style to google

* Update set to frozenset

* Update Quant Config

* Updates Quant Config

* Update enum comparison

* Update onnxruntime/python/tools/quantization/quantize.py

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>

* Update

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2022-09-09 14:08:47 -04:00
Dwayne Robinson
8e4eb24648
Update operator kernel table to include DML operators (#12887)
* Fix bug in pybind get_all_operator_schema due to premature reference dropping
* Add updated operator kernels markdown table
* Update build.py to include documentation generation for DML operators too
* Update GPU pipeline to include DML in the build so operators can be generated.
* Use a separate pipeline stage, feedback from Changming and Scott
* Appease annoying Python linter
* Add onnxruntime_BUILD_UNIT_TESTS=OFF and remove stale --use_dml in cuda stage
2022-09-09 10:21:25 -07:00
Hariharan Seshadri
0b235b2763
Disable QOrderedMatMul with bias tests on Windows (#12901) 2022-09-08 17:57:37 -07:00
pengwa
b5327595f3
Fix [prefast:Warning]: C26814 (#12897)
fix C26814
2022-09-09 08:26:48 +08:00
Adam Pocock
5d55b0730e
[Java] JNI refactor for OrtJniUtil (#12516)
Refactoring more JNI methods in OrtJniUtil.
Make the strings const.
Removing unnecessary use of OrtAllocator.
2022-09-08 17:04:42 -07:00
Scott McKay
60e4d012e0
Fix unused variable warning from reduced ops build (#12889) 2022-09-09 08:08:56 +10:00
Wei-Sheng Chin
28f2e57de5
Use CUDA callback to release deferred-release buffers (#12883)
* Use CUDA callback to release deferred-release buffers

Polishment

* Minor improvements.
1. Reorder an if-else so that frequent cases are checked first.
2. More documentation.

* Fix tests.
Previously, in CUDAExecutionProvider::OnRunStart, we call
GetPerThreadContext in

  auto& current_deferred_release_event = GetPerThreadContext().GetCurrentDeferredReleaseEvent();

so that a CUDAExecutionProvider always owns an active PerThreadContext
and the ReleasePerThreadContext in CUDAExecutionProvider::OnRunEnd
is always valid. However, this is no longer true after dropping the
event-based deferred-release code, so we need to check whether
CUDAExecutionProvider really owns a PerThreadContext and call
ReleasePerThreadContext only if it does.

* Follow up for AMD GPU and improve CUDA part's return value.
2022-09-08 14:23:48 -07:00
Thiago Crepaldi
55c745eefd
Add support for ORTModule Torch cpp CUDA extension build within docker (#12868)
Currently, CUDA hardware cannot be used by the build during
`docker build`. Because of that, extensions built there would not
have CUDA support, even when the target hardware is CUDA capable.

This PR adds an env var, ONNXRUNTIME_FORCE_CUDA, that allows CUDA
extensions to be compiled even when CUDA support is not detected.
2022-09-08 15:30:44 -04:00
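The override described in the commit above could look roughly like the sketch below. This is not the ORTModule build code: the function `should_build_cuda_extensions` is hypothetical, and treating the value "1" as "force" is an assumption. Only the variable name ONNXRUNTIME_FORCE_CUDA comes from the commit message.

```python
import os
from typing import Mapping, Optional

FORCE_CUDA_ENV = "ONNXRUNTIME_FORCE_CUDA"  # name taken from the commit message


def should_build_cuda_extensions(
    cuda_detected: bool, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Decide whether CUDA extensions should be compiled (illustrative only)."""
    env = os.environ if env is None else env
    # Assumption: the value "1" forces the CUDA build; the real accepted
    # values may differ.
    forced = env.get(FORCE_CUDA_ENV, "0") == "1"
    return cuda_detected or forced
```

Inside a `docker build`, where no GPU is visible, `cuda_detected` would be false, so the environment variable is the only way for this check to return true.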
pallavides
6ebb7b91eb
Re-apply fix for mkl issue for eager mode (#12881)
* reapply fix for mkl issue for eager mode
* add comment, update link libs
2022-09-08 12:29:24 -07:00
Changming Sun
ff52d6a6bf
Delete Dockerfile.ubuntu (#12888)
The file was solely for Nuphar.
2022-09-08 10:26:40 -07:00
Changming Sun
a811c7629f
Remove "Build Python Documentation" from py-packaging-stage.yml (#12890)
Remove "Build Python Documentation" from py-packaging-stage.yml because the task has been moved to GitHub Actions by @natke in PR #10116.
2022-09-08 09:56:54 -07:00
sophies927
b1984278d9
Enable blank issues (#12885) 2022-09-07 23:28:17 -07:00
guyang3532
4765e5c382
Using ORTModule to wrap an evaluation model should not change the mode (#12747)
Using ORTModule to wrap an evaluation model should not change the mode of the model
2022-09-08 10:54:59 +08:00
RandySheriffH
d3b684cd9e
Drop nuphar (#11555)
* drop nuphar code and configs

* refactor test case

* format python

* remove nuphar from training test

* remove commented-out nuphar logic

* restore llvm setting

* drop nuphar ci

* fix compile err

* fix compile err

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-09-07 15:11:18 -07:00
Jian Chen
acc8bdc6c5
Splitting quantize_tensor and quantize_input (#12873)
* Splitting quantize_tensor and quantize_input

* Reformat code

* Reformat code

* Update is_input_a_weight to is_input_a_initializer
2022-09-07 18:05:42 -04:00
Sheil Kumar
535b0835f2
User/sheilk/dft fixes (#12862)
* DirectML DFT Tests and Fixes

* Dynamically allocate temporaries using the allocator...

* Allocate during compute

* wrong dims

* CR feedback
2022-09-07 13:21:56 -07:00
sophies927
f63bd0765d
New GitHub templates (#12777)
* Create 01-build.yml

* Create 02-documentation.yml

* Create 03-mobile.yml

* Create 04-web.yml

* Create 05-performance.yml

* Create 06-training.yml

* Create 07-feature_request.yml

* Create 08-general.yml

* Create config.yml

* Delete bug-performance-issue.md

* Delete feature_request.md

* Create labeler.yml

* Create labeler.yml

* Update Performance template to make model info optional.

* Update feature request description placeholder
2022-09-07 11:59:56 -07:00
Hariharan Seshadri
ad69aac491
Introduce ordered quantization ops for the CUDA EP [1/n] (#12582)
Initial core small set for the ordered quantization ops for cuda EP.
2022-09-07 11:58:15 -07:00
petermcaughan
69f7cc6494
Add pybind support for all memory config options in OrtArenaCfg (#12658)
* Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind

* Add overloaded constructor for KVP, UT still in progress

* Fix class member access in pybind, fix unit test

* Resolve linter warnings

* Improve formatting

* Simplify UT

* Fix linter formatting

Co-authored-by: Peter Mcaughan <petermca@microsoft.com>
2022-09-07 11:15:00 -07:00
Chen Fu
8004db4bf1
fix python import sequence warning (#12864)
fix python import sequence warning
2022-09-07 09:53:39 -07:00
Xavier Dupré
400195a10a
Raise an exception when TreeEnsemble requests a feature out of bounds (#12859)
* Catch a potential error when the number of features is lower than the features referenced in TreeEnsemble

* add unit test

* remove extra spaces
2022-09-07 10:05:32 +02:00
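The kind of bounds check described in the commit above can be sketched as follows. This is an illustration in Python, not the C++ code of the TreeEnsemble kernel: the function `check_feature_ids` and its error message are hypothetical.

```python
from typing import Sequence


def check_feature_ids(feature_ids: Sequence[int], n_features: int) -> None:
    """Raise if any tree node references a feature the input does not have."""
    if not feature_ids:
        return  # nothing to validate for an empty ensemble
    max_id = max(feature_ids)
    # Feature ids are assumed to be zero-based column indices of the input.
    if max_id >= n_features:
        raise ValueError(
            f"Tree references feature {max_id}, "
            f"but the input only has {n_features} features."
        )
```

Checking once before evaluation, rather than at every node visit, keeps the validation out of the hot path.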
Guenther Schmuelling
f856be162e
fix xnnpack wasm build (#12845) 2022-09-06 19:20:07 -07:00
Jan Tilly
437409c343
Add DONT_VECTORIZE flag to cmake (#12169)
Add DONT_VECTORIZE flag.
2022-09-07 12:14:14 +10:00
Scott McKay
706e03c63d
Add azp run helper (#12832)
* Add helper to add azp run comments to a PR.
2022-09-07 11:48:31 +10:00
Yi Zhang
c571b99336
Refactor setup_test_data (#12818)
* refactor setup_test_data

* mv setup test data to test stage

* model link for C# test

* add comment
2022-09-07 08:33:27 +08:00
Yulong Wang
726251609a
increase max memory to 4G for wasm (#12798) 2022-09-06 17:07:13 -07:00
Tianlei Wu
d19955fd89
fix transformers script issues (#12802)
Fix a few obvious issues:
(1) bert_perf_test.py creates a session without a provider in line 65.
(2) compare_bert_results.py misses a parameter in create_session in line 37.
(3) onnx_exporter.py has a return value mismatch in lines 667, 690.
(4) remove some imports not used in the scripts.
(5) fusion_utils need not print "Removed 0 cast nodes" or "Removed 0 Identity nodes"...
(6) update requirements for numpy version since the gpt2 parity tool uses equal_nan in numpy v1.19+
2022-09-06 16:15:16 -07:00