Commit graph

7337 commits

Author SHA1 Message Date
Joseph Groenenboom
a433f22f17
Softmax interface update (#12469)
* Template datatype for SoftmaxWithRawMaskSmallKernel in ROCm EP

* Remove valid_items usage from SoftmaxWithRawMaskSmallKernel for ROCm EP

The kernel already masks off invalid items and this gives a much
faster implementation in hipCUB.

* Update accumulator type in ROCm EP for SoftmaxWithRawMaskSmallKernel

Hard code accumulator to fp32 for hipCUB in indicated kernel.

* Reset casting to old behavior

* Document steps to optimize SoftMax kernel on ROCm EP

Usage of the hipCUB valid_items interface on reduction operations
has a significant performance impact. Mask all thread data to
avoid the need to use the valid_items interface to hipCUB.
2022-09-12 13:02:31 -07:00
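
The masking idea described in the Softmax interface update (#12469) can be illustrated outside of HIP. The following is only a sketch in plain Python, not the ROCm kernel: the function name `masked_softmax_padded` and the parameter `width` (standing in for the number of lanes in a block) are hypothetical. It shows why padding invalid lanes with negative infinity lets a full-width reduction replace a reduction limited to `valid_items`.

```python
import math

NEG_INF = float("-inf")


def masked_softmax_padded(values, valid_items, width):
    """Softmax over the first `valid_items` entries using full-width reductions.

    Hypothetical helper for illustration; `width` plays the role of the
    block size, and lanes at index >= valid_items are treated as invalid.
    """
    if not 1 <= valid_items <= min(len(values), width):
        raise ValueError("valid_items must be in [1, min(len(values), width)]")
    # Invalid lanes hold -inf: the identity for max, and exp(-inf) == 0.0,
    # so they contribute nothing to the sum either.
    lanes = [values[i] if i < valid_items else NEG_INF for i in range(width)]
    row_max = max(lanes)  # reduce over every lane, no valid_items needed
    exps = [math.exp(x - row_max) for x in lanes]
    total = sum(exps)     # again a full-width reduction
    return [e / total for e in exps[:valid_items]]
```

Because the padded lanes are neutral for both reductions, the result matches a softmax computed over the valid entries only.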
Tianlei Wu
30ebc9e00a
Useless Cast removal after converting model from float32 to float16 (#12871) 2022-09-12 11:07:33 -07:00
Yi Zhang
d8636c2be8
Add enable_onnx_tests in windows nuget test step (#12926) 2022-09-12 10:08:24 -07:00
Tianlei Wu
1e34440c37
Fix ORT crash when loading BeamSearch model (#12872)
* add subgraph verification in VerifyNodeAndOpMatch

* add regression tests

* update comments

* update test
2022-09-09 12:48:32 -07:00
Scott McKay
022d9e2d0c
Get files for XNNPACK wasm build from BUILD.bazel. (#12892)
Get files for wasm build from BUILD.bazel.
2022-09-09 12:38:57 -07:00
Jian Chen
e561a7cf29
Adding QuantConfig Class (#12810)
* Initial commit for testing

* Adding DynamicQuantConfig

* Adding DynamicQuantConfig

* Format file

* Adding Default configuration placeholder.

* Update onnxruntime/python/tools/quantization/quantize.py

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>

* Reformat file

* Reformat Rest Docstring style to google

* Update set to frozenset

* Update Quant Config

* Updates Quant Config

* Update enum comparison

* Update onnxruntime/python/tools/quantization/quantize.py

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>

* Update

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2022-09-09 14:08:47 -04:00
Dwayne Robinson
8e4eb24648
Update operator kernel table to include DML operators (#12887)
* Fix bug in pybind get_all_operator_schema due to premature reference dropping
* Add updated operator kernels markdown table
* Update build.py to include documentation generation for DML operators too
* Update GPU pipeline to include DML in the build so that operators can be generated.
* Use a separate pipeline stage, feedback from Changming and Scott
* Appease annoying Python linter
* Add onnxruntime_BUILD_UNIT_TESTS=OFF and remove stale --use_dml in cuda stage
2022-09-09 10:21:25 -07:00
Hariharan Seshadri
0b235b2763
Disable QOrderedMatMul with bias tests on Windows (#12901) 2022-09-08 17:57:37 -07:00
pengwa
b5327595f3
Fix [prefast:Warning]: C26814 (#12897)
fix C26814
2022-09-09 08:26:48 +08:00
Adam Pocock
5d55b0730e
[Java] JNI refactor for OrtJniUtil (#12516)
Refactoring more JNI methods in OrtJniUtil.
Make the strings const.
Removing unnecessary use of OrtAllocator.
2022-09-08 17:04:42 -07:00
Scott McKay
60e4d012e0
Fix unused variable warning from reduced ops build (#12889) 2022-09-09 08:08:56 +10:00
Wei-Sheng Chin
28f2e57de5
Use CUDA callback to release deferred-release buffers (#12883)
* Use CUDA callback to release deferred-release buffers

Polish

* Minor improvements.
1. Reorder an if-else so that frequent cases are checked first.
2. More documentation.

* Fix tests.
Previously, in CUDAExecutionProvider::OnRunStart, we call
GetPerThreadContext in

  auto& current_deferred_release_event = GetPerThreadContext().GetCurrentDeferredReleaseEvent();

so that a CUDAExecutionProvider always owns an active PerThreadContext
and the ReleasePerThreadContext in CUDAExecutionProvider::OnRunEnd
is always valid. However, this isn't true after we drop the event-based
deferred-release code, so we need to check whether
CUDAExecutionProvider really owns a PerThreadContext and call
ReleasePerThreadContext only if it does.

* Follow up for AMD GPU and improve CUDA part's return value.
2022-09-08 14:23:48 -07:00
Thiago Crepaldi
55c745eefd
Add support for ORTModule Torch cpp CUDA extension build within docker (#12868)
Currently, CUDA hardware cannot be used by the build during
`docker build`. Because of that, builds for CUDA-capable hardware would
not have CUDA support.

This PR adds an env var, ONNXRUNTIME_FORCE_CUDA, which allows CUDA
extensions to be compiled even when CUDA support is not detected.
2022-09-08 15:30:44 -04:00
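
The decision described in #12868 can be sketched as a small helper. This is an assumption-laden illustration, not the code from the PR: the function name `should_build_cuda_extensions`, the `cuda_detected` flag, and the rule that any value other than empty or "0" enables the override are all hypothetical. Only the variable name ONNXRUNTIME_FORCE_CUDA comes from the commit message.

```python
import os


def should_build_cuda_extensions(cuda_detected, env=None):
    """Return True if CUDA extensions should be compiled.

    Hypothetical helper: `cuda_detected` is whatever the build found at
    build time (typically False inside `docker build`).
    """
    env = os.environ if env is None else env
    # Assumption: any non-empty value other than "0" forces the CUDA build.
    forced = env.get("ONNXRUNTIME_FORCE_CUDA", "") not in ("", "0")
    return cuda_detected or forced
```

Passing the environment as a parameter keeps the sketch easy to test without modifying the real process environment.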
pallavides
6ebb7b91eb
Re-apply fix for mkl issue for eager mode (#12881)
* reapply fix for mkl issue for eager mode
* add comment, update link libs
2022-09-08 12:29:24 -07:00
Changming Sun
ff52d6a6bf
Delete Dockerfile.ubuntu (#12888)
The file was solely for Nuphar.
2022-09-08 10:26:40 -07:00
Changming Sun
a811c7629f
Remove "Build Python Documentation" from py-packaging-stage.yml (#12890)
Remove "Build Python Documentation" from py-packaging-stage.yml because the task has been moved to Github actions by @natke in PR #10116.
2022-09-08 09:56:54 -07:00
sophies927
b1984278d9
Enable blank issues (#12885) 2022-09-07 23:28:17 -07:00
guyang3532
4765e5c382
Using ORTModule to wrap an evaluation model should not change the mode (#12747)
Using ORTModule to wrap an evaluation model should not change the mode of the model
2022-09-08 10:54:59 +08:00
RandySheriffH
d3b684cd9e
Drop nuphar (#11555)
* drop nuphar code and configs

* refactor test case

* format python

* remove nuphar from training test

* remove commented nuphar logics

* restore llvm setting

* drop nuphar ci

* fix compile err

* fix compile err

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-09-07 15:11:18 -07:00
Jian Chen
acc8bdc6c5
Splitting quantize_tensor and quantize_input (#12873)
* Splitting quantize_tensor and quantize_input

* Reformat code

* Reformat code

* Update is_input_a_weight to is_input_a_initializer
2022-09-07 18:05:42 -04:00
Sheil Kumar
535b0835f2
User/sheilk/dft fixes (#12862)
* DirectML DFT Tests and Fixes

* Dynamically allocate temporaries using the allocator...

* Allocate during compute

* wrong dims

* CR feedback
2022-09-07 13:21:56 -07:00
sophies927
f63bd0765d
New GitHub templates (#12777)
* Create 01-build.yml

* Create 02-documentation.yml

* Create 03-mobile.yml

* Create 04-web.yml

* Create 05-performance.yml

* Create 06-training.yml

* Create 07-feature_request.yml

* Create 08-general.yml

* Create config.yml

* Delete bug-performance-issue.md

* Delete feature_request.md

* Create labeler.yml

* Create labeler.yml

* Update Performance template to make model info optional.

* Update feature request description placeholder
2022-09-07 11:59:56 -07:00
Hariharan Seshadri
ad69aac491
Introduce ordered quantization ops for the CUDA EP [1/n] (#12582)
Initial core small set for the ordered quantization ops for cuda EP.
2022-09-07 11:58:15 -07:00
petermcaughan
69f7cc6494
Add pybind support for all memory config options in OrtArenaCfg (#12658)
* Add support for initial_growth_chunk_size_bytes setting in OrtArenaCfg pybind

* Add overloaded constructor for KVP, UT still in progress

* Fix class member access in pybind, fix unit test

* Resolve linter warnings

* Improve formatting

* Simplify UT

* Fix linter formatting

Co-authored-by: Peter Mcaughan <petermca@microsoft.com>
2022-09-07 11:15:00 -07:00
Chen Fu
8004db4bf1
fix python import sequence warning (#12864)
fix python import sequence warning
2022-09-07 09:53:39 -07:00
Xavier Dupré
400195a10a
raise an exception when TreeEnsemble requests a feature out of bounds (#12859)
* Catch a potential error when the number of features is lower than the features referenced in TreeEnsemble

* add unit test

* remove extra spaces
2022-09-07 10:05:32 +02:00
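
The check described in #12859 amounts to comparing the largest feature index used by the tree nodes against the width of the input. A minimal sketch follows, assuming feature references are zero-based integer indices into an input row. The name `check_feature_bounds` and its parameters are hypothetical and do not reflect the actual ONNX Runtime implementation.

```python
def check_feature_bounds(node_feature_ids, n_features):
    """Raise ValueError if any tree node references a missing feature.

    Hypothetical helper: `node_feature_ids` holds the zero-based feature
    index tested by each node, `n_features` is the input row width.
    """
    max_id = max(node_feature_ids, default=-1)
    if max_id >= n_features:
        raise ValueError(
            f"tree references feature index {max_id}, "
            f"but the input only has {n_features} features"
        )
```

Running the check once before evaluation turns an out-of-range read into a clear error.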
Guenther Schmuelling
f856be162e
fix xnnpack wasm build (#12845) 2022-09-06 19:20:07 -07:00
Jan Tilly
437409c343
Add DONT_VECTORIZE flag to cmake (#12169)
Add DONT_VECTORIZE flag.
2022-09-07 12:14:14 +10:00
Scott McKay
706e03c63d
Add azp run helper (#12832)
* Add helper to add azp run comments to a PR.
2022-09-07 11:48:31 +10:00
Yi Zhang
c571b99336
Refactor setup_test_data (#12818)
* refactor setup_test_data

* mv setup test data to test stage

* model link for C# test

* add comment
2022-09-07 08:33:27 +08:00
Yulong Wang
726251609a
increase max memory to 4G for wasm (#12798) 2022-09-06 17:07:13 -07:00
Tianlei Wu
d19955fd89
fix transformers script issues (#12802)
Fix a few obvious issues:
(1) bert_perf_test.py creates a session without a provider on line 65.
(2) compare_bert_results.py misses a parameter in create_session on line 37.
(3) onnx_exporter.py has a return value mismatch on lines 667 and 690.
(4) remove some imports not used in the scripts.
(5) fusion_utils need not print "Removed 0 cast nodes" or "Removed 0 Identity nodes"...
(6) update requirements for numpy version since the gpt2 parity tool uses equal_nan in numpy v1.19+
2022-09-06 16:15:16 -07:00
Xavier Dupré
54360c88d2
Disable two warnings raised by tensorboard on Visual Studio (#12773) 2022-09-06 20:42:52 +02:00
Chen Fu
9ad5b95e4f
Fix math domain error with log10 (#12841)
fix math domain error with log10
2022-09-06 08:54:41 -07:00
Cheng
8cedafe250
[xnnpack] Have Initializer in Mobile related EPs in Minimal_build and creating EP specific dynamic-schema (#12555)
* Remove the dependence of Qlinearsoftmax schema

* refactor initializerview &&  create shared schema

* Dynamic Create EP specific schema

* Have Initializer in minimal_build

* address comments

* remove CancelFuseSubGraph
2022-09-06 14:32:15 +08:00
Scott McKay
ac4f1bf960
Update max opset for NNAPI and CoreML. (#12831)
Update max opset for NNAPI and CoreML. Changes in opsets 16 and 17 don't require any updates.
2022-09-05 09:37:14 +10:00
Baiju Meswani
9e47eb68e0
Remove unused orttraining amd dockerfiles and scripts (#12707) 2022-09-02 18:43:21 -07:00
Cheng
76d17b0f48
Add java API for xnnpack (#12788)
* Add java API for xnnpack

* provider option support

* a more general interface for creating EP
2022-09-03 08:29:40 +08:00
Baiju Meswani
295bd26980
Remove orttraining-distributed CI pipeline (#12738) 2022-09-02 14:34:26 -07:00
ashbhandare
27dde0b51f
Csharp bindings for on-device training APIs (#12404) 2022-09-02 13:13:48 -07:00
Jian Chen
2fe919c3ad
Adding Split Fusion (#12732)
* Adding Split Fusion

* Make changes to comments

* Format files and change typo

* Format files and change typo

* Format files and change typo

* Format files and change typo

* Format file

* Format files

* Format files

* Format files

* Format files
2022-09-02 14:17:10 -04:00
Baiju Meswani
56bae3b196
Use InplaceClipGradNorm for offline processing for on-device training (#12603) 2022-09-02 07:47:17 -07:00
Cassie Breviu
98b2b7f5bb
Update csharp documentation (#12830) 2022-09-01 22:14:03 -07:00
sophies927
548938fb97
Update stale.yml (#12813)
* Update stale.yml

Change the number of days of inactivity before an issue becomes stale from 60 to 5 and the number of days of inactivity before a stale issue is closed from 7 to 5. Update the exempt labels based on the redefined set of GH labels.

* Implement stale.yml feedback.
2022-09-01 20:50:46 -07:00
Changming Sun
ca5af24765
Update Sdl.ruleset to remove C26812 from the rules (#12695) 2022-09-01 20:05:20 -07:00
Hariharan Seshadri
931c8b0147
Resolve GH issue 12706 (#12815) 2022-09-01 18:30:57 -07:00
Justin Chu
6fe712b587
Create codeql.yml to replace LGTM (#12790)
**Description**: Create codeql.yml to replace LGTM

**Motivation and Context**

LGTM.com is shutting down and moving to GitHub code scanning. This PR enables GitHub code scanning.

C++ and C# support will be added in a separate PR.
2022-09-01 16:37:43 -07:00
ashbhandare
349469c381
Enable way to extract all parameters to and from a contiguous buffer. (#12674)
* implementation

* review comments

* review comment

* lint error
2022-09-01 15:23:30 -07:00
Hariharan Seshadri
52ce6a90b4
Props file cleanup (#12782) 2022-09-01 11:05:46 -07:00
George Nash
0125e15281
Fix include order build failure training build (#12425)
Signed-off-by: George Nash <george.nash@intel.com>
2022-09-01 10:48:40 -07:00