Commit graph

4745 commits

Author SHA1 Message Date
Yulong Wang
add4e4225b
[js/web] fix package metadata of onnxruntime-web (#7543) 2021-05-02 13:26:07 -07:00
Yulong Wang
97de078c24
[js/node] fix package metadata of onnxruntime-node (#7542) 2021-05-02 11:17:32 -07:00
Yulong Wang
79dc7d3e50
[js/common] revise TSDoc of some interfaces (#7541) 2021-05-01 22:20:22 -07:00
Pranav Prakash
8ba6ed953f
Fix batch norm training op on CPU (#6946)
* Fix batch norm training op on CPU

* Add BatchNorm 14 Op Support

* Update hashes for BN

* Exclude TRT and OpenVINO for BatchNorm training test
2021-05-01 11:25:19 -07:00
Sheil Kumar
94c4c44bfc
Enable Microsoft.Ai.MachineLearning package to work on .NET5 down to 17763 Windows SDK (#7522)
* upgrade cswinrt and downgrade target framework

* fix sdk version references

* cswinrt 1.1.0

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-05-01 00:56:36 -07:00
satyajandhyala
9f1e61be92
Check whether nvcc supports -Wstrict-aliasing before adding the flag. (#7509)
* Check whether nvcc supports -Wstrict-aliasing before adding the compiler flag in CMakeLists.txt.

* Removed reinterpret_cast to not cause strict aliasing violation errors or require -Wno-strict-aliasing when it is not available.
2021-05-01 00:14:50 -07:00
Changming Sun
00882ce495
Set CMAKE_CUDA_STANDARD to 14 because we are using std::make_unique (#7534) 2021-04-30 20:20:00 -07:00
Dwayne Robinson
93e93d0851
Merge pull request #7519 from microsoft/user/dwayner/ort1.8dml1.5.1
Update DML EP changes related to DML 1.5.1 for ORT 1.8
2021-04-30 19:25:16 -07:00
Sherlock
668a65f1a7
Complete GetGlobalAveragePoolGradient (#7514)
* Improve GetGlobalAveragePoolGradient

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-30 18:04:01 -07:00
Tim Harris
9c1900866a
Revert ""Sticky" allocation of worker threads (#7372)"
This reverts commit 3d92723d1c.
2021-04-30 14:39:58 -07:00
Thiago Crepaldi
9ba9da0c95
Fix unused registered buffers issue on ORTModule (#7525) 2021-04-30 13:50:23 -07:00
Tang, Cheng
54db6648af
kernel invoker API for eager mode (#7473)
* initial draft for kernel invoke api

* initial implementation of kernel invoker

* [eager] fix build on Mac

* [eager] increment input name in kernel invoker

* temp fix for type in eager mode

* use global default log manager

* rollback the previous commit since it breaks the Linux build

* Revert "rollback the previous commit since it breaks the Linux build"

This reverts commit 58c2c3423a.

* Eager Mode: fix linking on macOS

* optimizer_execution_frame: ignore unused lambda capture (model_path)

* fix link issue

* ORTInvoker: set correct input argument tensor element proto types

Do not set a type proto on output arguments to allow ORT to deduce them

* ORTInvoker: create only one logging manager

* Minor fix to set execution provider type correctly. (#7000)

Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>

* training fix

* support config output ml values in frame, so we can use it to implement inplace update

* Fix range loop error while building. (#7087)

Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>

* Conditionally link with nsync_cpp if not windows. (#7151)

Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>

* Fixed initialization order in ORT kernel invoker (#7342)

* Updated constructor of ort_kernel_invoker to take a logger.

* Changed linking order.

* Updated test.

* add inplace ut

* add build option

* Update include/onnxruntime/core/eager/ort_kernel_invoker.h

Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>

* resolve comments in pr

* fix build break; merge from master

* fix build break

Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Aaron Bockover <abock@microsoft.com>
Co-authored-by: Chandru Ramakrishnan <41447659+chandru-r@users.noreply.github.com>
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
2021-04-30 13:33:58 -07:00
Ori Levari
dfca1a09d5
Add Thread Spinning Session Option in WinML (#7498)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-04-30 11:44:58 -07:00
Weixing Zhang
e6f66f660c
missed change for external allocator in ROCm EP. (#7505) 2021-04-30 09:53:05 -07:00
Jeff Bloomfield
10e67b7340
Merged PR 5918130: Add activation fusions missing in newer opsets
Related work items: #32473540
2021-04-30 05:15:01 -07:00
Dwayne Robinson
06a2b0401a
Merged PR 5873494: Resize support nearest_mode floor in DML EP
Resize support nearest_mode floor in DML EP.

Related work items: #32221069
2021-04-30 05:14:53 -07:00
Jeff Bloomfield
e6f35cc132
Merged PR 5866812: Decompose unsupported QLinearSigmoid operation in DML EP
Related work items: #32220862
2021-04-30 05:14:32 -07:00
Jeff Bloomfield
f87527c0df
Merged PR 5861108: Allow nodes in DML graph partitions with empty shapes on constant CPU inputs
Resize is spec'd to ignore the "roi" tensor in certain modes.  For some reason, converters are specifying an arbitrary value for this tensor, even though it's optional.

This makes the graph partitioner skip the check for empty shape dimensions on tensors such as this one, which the DML kernel registers as CPU inputs. Otherwise, the node is not included in DML graph partitions, because the DML graph doesn't handle empty dimensions.

Related work items: #32221164
2021-04-30 05:14:13 -07:00
Adrian Tsai
915931384a
Merged PR 5807585: Remove support for strided 64-bit emulation in DML's Cast kernel
A model from one of our partners regressed with a failure to evaluate due to the addition of strided 64-bit emulation in the DML EP for the Cast operator. Specifically, the model uses a Cast from int32 to int64 to produce the input shape to a Reshape node. When supplied with a shape dimension of -1 (int32 0xffffffff), the strided emulation in Cast ends up producing an int64 result of 0x00000000ffffffff. This is then fed into the Reshape operator, where it produces an incorrect tensor shape and a failure during evaluation.

Generally speaking we assume that using strided 64-bit emulation is safe if a node's inputs came from the DML EP itself. This isn't true in the general case for Cast, however - casting negative signed values can and will produce incorrect outputs with strided emulation.

After this change, Cast nodes with 64-bit types will fall back to CPU unless running on a GPU that natively supports 64-bit datatypes.

Related work items: #31768166
2021-04-30 05:13:44 -07:00
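
The Cast regression described in the commit above can be illustrated with a small sketch. It assumes the emulated widening behaves like zero-extension (the upper 32 bits stay zero), which is consistent with the reported result of 0x00000000ffffffff but is not taken from the DML EP source; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_int32_bits(value: int) -> int:
    """Return the 32-bit two's-complement bit pattern of a signed int32 value."""
    return value & MASK32


def cast_zero_extend(value: int) -> int:
    """Assumed model of the emulated cast: keep the low 32 bits, leave the high word zero."""
    return to_int32_bits(value)


def cast_sign_extend(value: int) -> int:
    """Correct int32 -> int64 cast: replicate the sign bit into the high word."""
    bits = to_int32_bits(value)
    return bits - (1 << 32) if bits & 0x80000000 else bits


# A Reshape dimension of -1 means "infer this dimension".
wrong = cast_zero_extend(-1)  # becomes a huge positive dimension
right = cast_sign_extend(-1)  # stays -1
print(hex(wrong), right)
```

Non-negative values give the same answer under both functions, which is why the problem only shows up for negative signed inputs such as the -1 shape dimension.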
Adrian Tsai
70e67ddd2b
Update DirectML version to 1.5.1 and enable ARM/ARM64 builds with DML (#7511)
* Update DirectML to version 1.5.1
* Enable --use_dml with ARM and ARM64
* Add ARM/ARM64 binaries to nuget packages
2021-04-30 00:49:30 -07:00
Yulong Wang
00aaa6dabb
update CI for onnxruntime-web (#7497) 2021-04-29 22:22:52 -07:00
Changming Sun
0d107bbb73
Fix CUDA 10.2 pipeline (#7508) 2021-04-29 22:22:35 -07:00
Scott McKay
d6df5764d7
Android package infrastructure (#7430)
* Include ORT format model conversion scripts and infrastructure in ORT python package.
  - tweak existing script setup so it can be easily run directly and from the ORT python package
Add config file and readme for Android minimal build package
Update ORT Mobile documentation
Disable warning if 'all' optimizations are enabled but NCHWc transformer is excluded (device specific optimizations don't apply in this scenario so the warning is moot).

* Address PR comments
2021-04-30 14:23:54 +10:00
Tim Harris
3d92723d1c
"Sticky" allocation of worker threads (#7372)
* Sticky thread allocation

* Test sticky thread assignment

* Test sticky thread assignment

* Test sticky thread assignment

* Expose control over additional worker assignment stats

* Sticky thread allocation

* Test sticky thread assignment

* Test sticky thread assignment

* Test sticky thread assignment

* Expose control over additional worker assignment stats

* Merge

* Merge

* Merge

* Fix Windows build

* Fix windows build 2

* Build Python 3.8 Windows CPU only

* Add env var to override binding

* Build Python 3.8 Windows CPU only

* Fix windows build

* Remove thread affinity override

* Remove goodworker

* Remove Python build settings

* Remove unneeded changes

* Remove unneeded changes

* Remove unneeded changes

* Remove unneeded changes

* Remove unneeded changes

* Remove unneeded changes

* Tidy

* Tidy

* Avoid race on preferred_worker vector

* Improve assertions

* Improve assertions

* Enum for PushBackWithTag result

* Remove unused field

* Update comments

* Extra debugging

* Extra debugging

* Extra debugging

* Support varying thread pool sizes

* Improve comments

* Remove requirement for thread local to be trivially destructible

* Use unsigned consistently for thread counts, removing casting

* Remove debug code

* Fix webassembly build

* Merge

* Merge

* Merge

* Remove unused code

* Fix build

* Extra test case for varying loop sizes

* Clean variable names

* Clean variable names

* Clean variable names

* Remove unneeded include, fix build

* Fix profiling

* Update from review comments
2021-04-29 20:42:14 -07:00
Edward Chen
ec04b6203b
Remove conditional compilation of std::is_trivially_copyable since we are no longer supporting GCC 4. (#7504) 2021-04-29 19:13:09 -07:00
Changming Sun
1012535dab
Change onnxruntime::make_unique to std::make_unique (#7502)
1. Change onnxruntime::make_unique to std::make_unique
2. Add "-std=c++14" to ROCM EP's build flags.
2021-04-29 17:04:53 -07:00
Yufeng Li
d337fa90e7
Propagate QDQ only when scale and zp are scalar (#7492)
Fix crash when DeQuantizeLinear's output is a graph output.
Propagate only when scale and zp are scalar.
Fix bug in is_modified = is_modified || TryCancelOutDQQPair(graph, dq_node, q_node); where TryCancelOutDQQPair wouldn't be invoked if is_modified was already true.
2021-04-29 14:40:41 -07:00
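
The short-circuit bug mentioned in the commit above is easy to reproduce in isolation. The sketch below uses Python's `or`, which short-circuits the same way as `||` in C++; the helper names are hypothetical, and reordering the operands is only one possible fix, not necessarily the one the commit applied.

```python
def make_canceller(log):
    """Build a stand-in for TryCancelOutDQQPair that records every call."""
    def try_cancel(pair):
        log.append(pair)
        return True  # pretend each pair was cancelled successfully
    return try_cancel


def propagate_buggy(pairs, try_cancel):
    is_modified = False
    for pair in pairs:
        # Once is_modified is True, the right-hand side is never evaluated.
        is_modified = is_modified or try_cancel(pair)
    return is_modified


def propagate_fixed(pairs, try_cancel):
    is_modified = False
    for pair in pairs:
        # Calling the function first guarantees it runs for every pair.
        is_modified = try_cancel(pair) or is_modified
    return is_modified


buggy_log, fixed_log = [], []
propagate_buggy(["a", "b", "c"], make_canceller(buggy_log))
propagate_fixed(["a", "b", "c"], make_canceller(fixed_log))
print(buggy_log, fixed_log)
```

In the buggy variant only the first pair is ever processed; the fixed variant visits all three.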
Scott McKay
e255506bcd
Add another input validation to ReverseSequence (#7445)
* Add another input validation to ReverseSequence

* Limit the bad length test to the CPU EP
2021-04-30 07:24:32 +10:00
Xiaoyu Liu
994c2ed420
GPT2 one step beam search update with configuration support (#7425)
* check in early stop search as separate type
* rename to beam search configurations
* update do sample configuration flag help
* rename to configurable search step
* add option groups
* add more unit tests

Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>
2021-04-29 13:19:56 -07:00
Ilya Lavrenov
6358e96b63
Added OpenVINO 2021.4 support (#7470)
* Added OpenVINO 2021.4 support

* Added OPENVINO_2021_4 handling
2021-04-29 12:25:04 -07:00
Changming Sun
7b003967b1
Add static code analyzer to Windows CPU/GPU CI builds and fix the warnings (#7489) 2021-04-29 11:54:57 -07:00
Tracy Sharpe
2b0bbfd1a8
MLAS: add SSE 4.1 u8s8 kernel (#7490) 2021-04-29 11:12:32 -07:00
Tang, Cheng
e73c3e0651
rollback the GetRuntimePath impl for linux (#7488)
* rollback the GetRuntimePath impl for linux; limit the dynamic ep load ut for win

* remove the override
2021-04-29 09:11:23 -07:00
Chi Lo
0dbe51b002
Enable TRT EP for C# (#7482)
* enabled TRT EP for C#

* Fix potential leak
2021-04-29 04:56:40 -07:00
RajalakshmiSR
3c7c728989
cmake: Add regex pattern for POWER architecture (#7494)
This patch helps to set architecture as power, when processor
check output matches ppc64le*.

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2021-04-28 22:23:14 -07:00
Adrian Tsai
f13b378995
Re-disable tests (#7495) 2021-04-28 21:50:22 -07:00
sabreshao
e6a3308db7
Optimize cuComputeGradInput performance. (#7479)
Move the checking of gamma to the host and specialize both cases through a template.
2021-04-28 17:08:31 -07:00
Chandru Ramakrishnan
6773b4f5dd
Fix implicit-exception-spec-mismatch warning. (#7481) (#7483)
* Fix implicit-exception-spec-mismatch warning. (#7481)

* Suppress implicit-exception-spec-mismatch warning.

* Updated to noexcept.

* Unconditionally use noexcept.
2021-04-28 19:17:39 -04:00
Thiago Crepaldi
3ee63beafa
Fix user input order before ORTModule feed it to backend (#7456) 2021-04-28 14:33:40 -07:00
Changming Sun
d68cedfa85
Fix some C/C++ warnings in the jni part (#7385) 2021-04-28 14:25:58 -07:00
Lifu Huang
ab373d6f03
Lifhuan/force trt sequential (#7440)
* Support sequential TensorRT engine build.

* Add documentation.

* Add tests and fix typos.

* Fix missing field in pybind_state.
2021-04-28 13:59:37 -07:00
Bowen Bao
c584d48283
Add sequence identity for opset 14 & fix sequence insert (#7335)
**Description**: 
- Fix SequenceInsert with last position, which is equal to the current sequence length.
- Implement Identity to support sequence input for opset 14.

**Motivation and Context**
- Required to export Huggingface/transformers T5 with beam search.
2021-04-28 13:26:57 -07:00
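
The SequenceInsert edge case fixed in the commit above can be sketched on a plain Python list. This is an illustration only, not the ORT kernel: the function name is hypothetical, and the accepted range of [-n, n] for the position (with a missing position meaning "append") is an assumption based on my reading of the ONNX operator description.

```python
def sequence_insert(seq, tensor, position=None):
    """Return a new sequence with `tensor` inserted at `position`.

    Assumed semantics: position may range over [-n, n] for a sequence of
    length n, and position == n (or no position) appends at the end.
    """
    n = len(seq)
    if position is None:
        position = n
    if not -n <= position <= n:
        raise IndexError(f"position {position} is outside [-{n}, {n}]")
    if position < 0:
        position += n  # negative positions count from the back
    return seq[:position] + [tensor] + seq[position:]


# Inserting at the current length is valid and behaves like an append.
print(sequence_insert(["t0", "t1", "t2"], "new", 3))
```

The boundary check uses `<=` on the upper end; writing `<` there would reject the "last position" case that the commit describes.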
thilow
22d7cde725
Fix a 'Squeeze' related issue in symbolic_shape_infer.py (#7380)
* Update symbolic_shape_infer.py

don't rely on static code infer in _infer_Squeeze_

* checking if dropped axes might be != 1

* Checking opset. Logging assumption that symbolic dimensions are unequal to 1.

* more checks
2021-04-28 13:13:04 -07:00
Maajid khan
674915208a
Fixes RelWithDebInfo build issue on windows for OV-EP (#7471)
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
2021-04-28 10:44:05 -07:00
G. Ramalingam
044c78f089
Add function body to LayerNorm (#7378)
* LayerNorm function body v1

* LayerNorm function body

* layernorm function test

* Minor fixes

* Fix signed unsigned comparison

* Move contrib ops test

* Handle optional output parameters

* Add test case for optional outputs

* Handle float16 random generation

* Address PR feedback
2021-04-28 09:31:53 -07:00
Pranav Sharma
da5c9263e9
Add log to allow serving platforms to quantify ORT usage. (#7476) 2021-04-28 08:20:02 -07:00
KeDengMS
8e21329206
Update nuphar notebook model download url (#7475) 2021-04-27 21:18:06 -07:00
liqunfu
196e6702ad
to support multiple cuda versions in published onnxruntime-training package (#7468)
to support multiple CUDA versions in published onnxruntime-training package
2021-04-27 17:15:33 -07:00
Zhang Lei
e64e30ee0d
Improve ConvTranspose by transposing const filter during prepacking. (#7388)
* Improve ConvTranspose by transposing const filter during prepacking.

* Fix CI build break for OpenVINO, which cannot load such an ONNX model now.
2021-04-27 16:49:03 -07:00
Edward Chen
d21304ceb0
Initial Objective-C API (#7366)
Initial implementation of an Objective-C API.
2021-04-27 10:06:30 -07:00