
4772 commits

Author SHA1 Message Date
Sergii Dymchenko
a647da3e1a
Fix 2 input Gemm grad (#7561)
* Add test for 2 input Gemm grad.

* Fix 2 input Gemm grad.
2021-05-04 12:00:14 -07:00
harshithapv
d812354ebd
Tile grad fix (#7556)
* tile grad fix

* code clean up
2021-05-04 11:16:26 -07:00
Guoyu Wang
e05528a365
Update Android AAR packaging pipeline script (#7559)
* update android package pipeline

* update shell script

* update script

* add kMSExperimentalDomain to reduction
2021-05-04 11:13:33 -07:00
Guoyu Wang
71ff6ff2ec
Disable NNAPI support for dynamic input shape, add warning logs (#7439)
* disable nnapi for graph with dynamic input shape

* Add warning for multiple partitions

* minor update

* update the message logging

* Fix coreml ci failure
2021-05-04 11:09:23 -07:00
Fanny Nina Paravecino
c3c4db2c1b
Upgrade GIST memory compression nodes, kernels, optimizer rule, and cli (#6262)
* Add gist nodes, kernels, optimizer rule, and cli

* Add Gist CUDA kernels

* Added/updated gist compression cli to bert, gpt2, mnist

* Fix decode priority generator for large models

* Fix hardcoded decode priority generator, update gist training test

* Fix incomplete if/else sequence for CI build

* Added MSFP15 for gist compression type

* fix Msfp15 bug

* Resolved azure pipeline errors - unsupported ORT_RETURN macro format, cudastream argument

* Resolved hardcoded cudastream argument, Pack8 zero error

* Resolved PR comments - except gist tests

* Added TypeInference to Gist Nodes, To attribute to Gist Decoder, Updated Gist Test Cases

* Reverted error in merge commit

* Updated logger usage in Gist rule, Updated GistPackMSFP15 compressed tensor's explanation

* Converted onnxruntime::make_unique to std::make_unique based on PR 7502

Co-authored-by: Fanny Nina Paravecino <faninapa@microsoft.com>
Co-authored-by: Aayush Ankit <aayushankit@microsoft.com>
Co-authored-by: Aayush Ankit <Aayush-Ankit@users.noreply.github.com>
Co-authored-by: Fanny Nina Paravecino <fanny.nina@microsoft.com>
2021-05-04 10:33:35 -07:00
Sherlock
c1ed647170
ORTModule enable run_symbolic_shape_infer by default (#7423)
* ORTModule enable run_symbolic_shape_infer by default

* Fix UTs by replacing Relu with Softmax

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-04 10:08:14 -07:00
Chi Lo
a94a893d5e
Update SessionOptions.cs (#7540)
Fix compile warning
2021-05-04 01:51:35 -07:00
Scott McKay
594dde2647
Validate that the conversion script from the python package can be used to convert models. (#7517) 2021-05-04 16:25:04 +10:00
sfatimar
898fff702c
compatibility was broken for myriad config parameter (#7349)
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
2021-05-03 21:13:12 -07:00
Tianlei Wu
3c9ece4a11
[transformers optimizer] catch symbolic shape inference exception and clean up (#7560)
catch symbolic shape inference exception.
do not prune the graph when there is an inner graph (Loop/If/Scan)
add a wrapper for numpy_helper.to_array so that we can debug an onnx graph without external data
remove fuse_mask, which is no longer used, from onnx_model_bert_tf.py
2021-05-03 20:42:13 -07:00
George Wu
faea7a222d
linux trt package pipeline (#7537) 2021-05-03 19:14:20 -07:00
Yulong Wang
8eaa4c33e2
[js] fix library bundling and some trivial improvement (#7550)
* [js] fix library bundling

* fix filename in code comments
2021-05-03 18:31:55 -07:00
Tianlei Wu
731f9e5033
Fix symbolic shape inference for Unsqueeze (#7555)
* fix Unsqueeze shape inference
* add tests
2021-05-03 18:06:59 -07:00
Yulong Wang
418623355a
disable logging for WASM in inference session ctor (#7545) 2021-05-03 16:01:41 -07:00
Dmitri Smirnov
8b6602ae68
Refactor provider test utils to prepare for expansion. (#7538)
Refactor provider test utils to prepare for expansion.
  Do not sort Tensors in place. This is reasonably rare.
2021-05-03 15:56:07 -07:00
baijumeswani
cab84d902e
Install and use conda on ortmodule CI pipelines (#7530)
* Install and use conda on ortmodule CI pipelines

* Update build script to install onnxruntime wheel before running unit tests

* Remove python 3.5 from install_python_deps

* Pinning deepspeed version to 0.3.15
2021-05-03 15:52:22 -07:00
Yufeng Li
ad15811ade
Add QDQ support for MatMulIntegerToFloat, Gather and Transpose (#7500)
* add QDQ support for MatMulIntegerToFloat, Gather and Transpose
2021-05-03 15:51:25 -07:00
Scott McKay
830d9e54dd
Add script to dump initializer, NodeArg, Node and subgraph info from an ORT format model (#7516) 2021-05-04 08:34:35 +10:00
Yulong Wang
3600c3e66e
[js/web] integrate latest changes from onnxjs (#7535)
* [js/web] integrate latest changes from onnxjs

* apply ESLint rules: filename-case and header

* remove filename-case rule for wasm .d.ts
2021-05-03 15:03:25 -07:00
sumitsays
9c0e5954cb
Output Tensor Shape Validation b/w ONNX inference and ORT (#7252)
* Adding Output Shape Validation for ORT-CPU execution flow

* Skipping validation check in case the output is not a tensor. Fixed conv_transpose test. Ignoring pad and reduction tests

* Comparison b/w signed and unsigned int. Removed const for a primitive variable

* Commented out the unused test function signature

* Replaced the exception with a logged warning, because many ORT tests fail this validation

* Fixed warning condition and test

* Fixed test and addressed comment on the PR

* Output shape verification will happen only for final output nodes of the model

* Changed variable name from camel case to underscore style

* Enable the tests, as a validation failure now logs a warning instead of throwing an exception

* Resolve merge conflict

* Remove duplicate function "GetLogger()"

* Fixed typo in method name "TestConvTransposeOpInitializer"

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2021-05-03 12:56:09 -07:00
sumitsays
e344a583b0
updated sampleTolerance of model fp16_inception_v1 for GPU execution provider (#7533)
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2021-05-03 12:08:31 -07:00
G. Ramalingam
b0a3b501fe
Add function body for Gelu and FastGelu (#7496)
* LayerNorm function body v1

* LayerNorm function body

* layernorm function test

* Minor fixes

* Fix signed unsigned comparison

* Move contrib ops test

* Handle optional output parameters

* Add test case for optional outputs

* Handle float16 random generation

* Add function body to Gelu and FastGelu

* Add FastGelu test

* Fix comments

* Include cmath
2021-05-03 12:00:37 -07:00
Yulong Wang
7079dfb93d
[wasm] fix and unify webassembly target name (#7549) 2021-05-03 10:37:25 -07:00
Tim Harris
2e09d9921a
"Sticky" allocation of worker threads (#7551)
[ PR previously merged as https://github.com//pull/7372, then reverted pending investigation of a lost-wake-up issue seen with ParallelExecutor. The issue was a missing test for new work being pushed to a thread concurrently with a worker blocking. The change from 7372 is the addition of: https://github.com/microsoft/onnxruntime/blob/tiharr/dev-sticky-4/include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h#L1473-L1492 ]

Description: This change updates the heuristics used when a thread selects which worker threads to push work to on entering a parallel loop. Previously, worker threads would maintain a best-effort bitmap of "good worker hints" indicating the threads that were likely to be spinning waiting for work. This change uses a simpler heuristic where a thread records which workers ran its previous loop, and then re-submits its next loop to those same workers. The aim is to retain affinity between a thread and a set of workers, and to avoid maintaining the "good worker hints" bitmaps.
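The "sticky" heuristic described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in EigenNonBlockingThreadPool.h: the class name `StickyWorkerPicker`, its methods, and the choice to fill any extra slots with the lowest-numbered free workers are all assumptions. In a real pool, one such record would presumably live per submitting thread (for example as a `thread_local`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of "sticky" worker selection: a submitting thread
// remembers which workers ran its previous parallel loop and offers its
// next loop to those same workers first. Not the ORT implementation.
class StickyWorkerPicker {
 public:
  explicit StickyWorkerPicker(std::size_t num_workers) : num_workers_(num_workers) {}

  // Choose up to n distinct workers: the workers that ran the previous
  // loop come first, then any remaining workers in index order.
  std::vector<std::size_t> Pick(std::size_t n) const {
    std::vector<std::size_t> chosen;
    std::vector<bool> taken(num_workers_, false);
    for (std::size_t w : preferred_) {
      if (chosen.size() == n) break;
      if (!taken[w]) {
        taken[w] = true;
        chosen.push_back(w);
      }
    }
    for (std::size_t w = 0; w < num_workers_ && chosen.size() < n; ++w) {
      if (!taken[w]) {
        taken[w] = true;
        chosen.push_back(w);
      }
    }
    return chosen;
  }

  // Remember which workers actually ran the loop; the next Pick() reuses them.
  void RecordRan(const std::vector<std::size_t>& ran) {
    for (std::size_t w : ran) assert(w < num_workers_);
    preferred_ = ran;
  }

 private:
  std::size_t num_workers_;
  std::vector<std::size_t> preferred_;  // workers that ran the previous loop
};
```

The only state is a short per-thread list rather than a shared bitmap that every worker must keep up to date, which is the trade-off the description above motivates.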

Motivation and Context: Profiling suggested that maintaining the "good worker hints" was taking unexpected time, particularly on NUMA systems. In addition, when running many concurrent workloads, the hints provided no way to retain the locality of workers, and hence of data in caches. Testing was done to confirm there are no regressions on the microbenchmark (./build/Linux/Release/onnxruntime_benchmark --benchmark_filter=BM_ThreadPoolParallelFor) and on mobilenet_v1_1.0_224.onnx on Linux, comparing p50 and p99 latency with vs. without this change:

1 concurrent:
p50 0.0172s vs 0.0181s
p99 0.0204s vs 0.0216s

2 concurrent:
p50 0.0172s vs 0.0181s
p99 0.0213s vs 0.0221s
2021-05-03 18:28:13 +01:00
Sherlock
6714f2f85d
Improve tol value logging in ORTModule test (#7544)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-03 09:43:40 -07:00
Ryota Tomioka
d1cb8c9dc9
Support negative indices and fix bound checking in symbolic shape inference for Slice (#7401)
* Use positivity everywhere; handle negative index in Slice

* limit positivity to inputs

* make handle_negative_index private

* strengthen sympy comparison

* further strengthen comparison and a minor refactoring

* Add flip test

* Fall through if -int_max in handle_negative_index()

* minor fix for infer_Concat to include initializers

* Add more tests

* use simplify

* more tests
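For reference, the per-axis rules that Slice shape inference has to respect (ONNX Slice, opset 10 and later) are: a negative start or end has the axis size added to it; then, for a positive step, both are clamped to [0, dim], and for a negative step, start is clamped to [0, dim - 1] and end to [-1, dim - 1]. The sketch below applies those rules to concrete integers only. The actual change lives in ORT's Python symbolic shape inference and reasons about sympy expressions, which this does not attempt; the helper names `Clamp` and `SlicedLength` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: clamp v into [lo, hi].
constexpr std::int64_t Clamp(std::int64_t v, std::int64_t lo, std::int64_t hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

// Hypothetical helper: length of one sliced axis under ONNX Slice rules,
// for concrete values. Precondition: step != 0 and step != INT64_MIN.
constexpr std::int64_t SlicedLength(std::int64_t dim, std::int64_t start,
                                    std::int64_t end, std::int64_t step) {
  if (dim <= 0) return 0;  // an empty axis stays empty
  // Negative indices count back from the end of the axis.
  if (start < 0) start += dim;
  if (end < 0) end += dim;
  std::int64_t distance = 0;
  if (step > 0) {
    start = Clamp(start, 0, dim);
    end = Clamp(end, 0, dim);
    distance = end - start;
  } else {
    start = Clamp(start, 0, dim - 1);
    end = Clamp(end, -1, dim - 1);  // -1 lets a reversed slice reach index 0
    distance = start - end;
    step = -step;
  }
  // ceil(distance / step) for distance > 0, without overflow; else empty.
  return distance <= 0 ? 0 : (distance - 1) / step + 1;
}
```

A full reversal such as `SlicedLength(5, -1, INT64_MIN, -1)` yields 5: start becomes 4 and the very negative end clamps to -1.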
2021-05-03 09:07:55 -07:00
Sheil Kumar
8e3cdf0452
Use unicode apis for loadlibrary (#7523)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-05-03 07:24:40 -07:00
Yulong Wang
add4e4225b
[js/web] fix package metadata of onnxruntime-web (#7543) 2021-05-02 13:26:07 -07:00
Yulong Wang
97de078c24
[js/node] fix package metadata of onnxruntime-node (#7542) 2021-05-02 11:17:32 -07:00
Yulong Wang
79dc7d3e50
[js/common] revise TSDoc of some interfaces (#7541) 2021-05-01 22:20:22 -07:00
Pranav Prakash
8ba6ed953f
Fix batch norm training op on CPU (#6946)
* Fix batch norm training op on CPU

* Add BatchNorm 14 Op Support

* Update hashes for BN

* Exclude TRT and OpenVINO for BatchNorm training test
2021-05-01 11:25:19 -07:00
Sheil Kumar
94c4c44bfc
Enable Microsoft.Ai.MachineLearning package to work on .NET 5 down to Windows SDK 17763 (#7522)
* upgrade cswinrt and downgrade target framework

* fix sdk version references

* cswinrt 1.1.0

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-05-01 00:56:36 -07:00
satyajandhyala
9f1e61be92
Check whether nvcc supports -Wstrict-aliasing before adding the flag. (#7509)
* Check whether nvcc supports -Wstrict-aliasing before adding the compiler flag in CMakeLists.txt.

* Removed a reinterpret_cast so that the code neither triggers strict-aliasing violation errors nor requires -Wno-strict-aliasing when that flag is not available.
2021-05-01 00:14:50 -07:00
Changming Sun
00882ce495
Set CMAKE_CUDA_STANDARD to 14 because we are using std::make_unique (#7534) 2021-04-30 20:20:00 -07:00
Dwayne Robinson
93e93d0851
Merge pull request #7519 from microsoft/user/dwayner/ort1.8dml1.5.1
Update DML EP changes related to DML 1.5.1 for ORT 1.8
2021-04-30 19:25:16 -07:00
Sherlock
668a65f1a7
Complete GetGlobalAveragePoolGradient (#7514)
* Improve GetGlobalAveragePoolGradient

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-30 18:04:01 -07:00
Tim Harris
9c1900866a Revert ""Sticky" allocation of worker threads (#7372)"
This reverts commit 3d92723d1c.
2021-04-30 14:39:58 -07:00
Thiago Crepaldi
9ba9da0c95
Fix unused registered buffers issue on ORTModule (#7525) 2021-04-30 13:50:23 -07:00
Tang, Cheng
54db6648af
kernel invoker API for eager mode (#7473)
* initial draft for kernel invoke api

* initial implementation of kernel invoker

* [eager] fix build on Mac

* [eager] increment input name in kernel invoker

* temp fix for type in eager mode

* use global default log manager

* rollback the previous commit since it break linux build

* Revert "rollback the previous commit since it break linux build"

This reverts commit 58c2c3423a.

* Eager Mode: fix linking on macOS

* optimizer_execution_frame: ignore unused lambda capture (model_path)

* fix link issue

* ORTInvoker: set correct input argument tensor element proto types

Do not set a type proto on output arguments to allow ORT to deduce them

* ORTInvoker: create only one logging manager

* Minor fix to set execution provider type correctly. (#7000)

Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>

* training fix

* support config output ml values in frame, so we can use it to implement inplace update

* Fix range loop error while building. (#7087)

Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>

* Conditionally link with nsync_cpp if not windows. (#7151)

Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>

* Fixed initialization order in ORT kernel invoker (#7342)

* Updated constructor of ort_kernel_invoker to take a logger.

* Changed linking order.

* Updated test.

* add inplace ut

* add build option

* Update include/onnxruntime/core/eager/ort_kernel_invoker.h

Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>

* resolve comments in pr

* fix build break; merge from master

* fix build break

Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Aaron Bockover <abock@microsoft.com>
Co-authored-by: Chandru Ramakrishnan <41447659+chandru-r@users.noreply.github.com>
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
2021-04-30 13:33:58 -07:00
Ori Levari
dfca1a09d5
Add Thread Spinning Session Option in WinML (#7498)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-04-30 11:44:58 -07:00
Weixing Zhang
e6f66f660c
missed change for external allocator in ROCm EP. (#7505) 2021-04-30 09:53:05 -07:00
Jeff Bloomfield
10e67b7340 Merged PR 5918130: Add activation fusions missing in newer opsets
Related work items: #32473540
2021-04-30 05:15:01 -07:00
Dwayne Robinson
06a2b0401a Merged PR 5873494: Resize support nearest_mode floor in DML EP
Resize support nearest_mode floor in DML EP.

Related work items: #32221069
2021-04-30 05:14:53 -07:00
Jeff Bloomfield
e6f35cc132 Merged PR 5866812: Decompose unsupported QLinearSigmoid operation in DML EP
Related work items: #32220862
2021-04-30 05:14:32 -07:00
Jeff Bloomfield
f87527c0df Merged PR 5861108: Allow nodes in DML graph partitions with empty shapes on constant CPU inputs
Resize is spec'd to ignore the "roi" tensor in certain modes. For some reason, converters are specifying an arbitrary value for this tensor, even though it's optional.

This makes the graph partitioner skip the check for empty shape dimensions on tensors like this one, which the DML kernel registers as CPU inputs. Otherwise, the node is not included in DML graph partitions, because the DML graph doesn't handle empty dimensions.

Related work items: #32221164
2021-04-30 05:14:13 -07:00
Adrian Tsai
915931384a Merged PR 5807585: Remove support for strided 64-bit emulation in DML's Cast kernel
A model from one of our partners regressed with a failure to evaluate due to the addition of strided 64-bit emulation in the DML EP for the Cast operator. Specifically, the model uses a Cast from int32 to int64 to produce the input shape to a Reshape node. When supplied with a shape dimension of -1 (int32 0xffffffff), the strided emulation in Cast ends up producing an int64 result of 0x00000000ffffffff. This is then fed into the Reshape operator, where it produces an incorrect tensor shape and a failure during evaluation.
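The arithmetic behind this failure is the difference between sign extension and zero extension. A minimal C++ sketch, not DML code: a correct int32 to int64 conversion replicates the sign bit into the upper 32 bits, whereas a result whose upper 32 bits are zero (what the message reports the strided emulation producing) turns -1 into 0x00000000ffffffff, i.e. 4294967295. The helper names are hypothetical.

```cpp
#include <cstdint>

// Correct int32 -> int64 conversion: the sign bit is copied into the upper
// 32 bits, so -1 stays -1 (0xffffffffffffffff).
constexpr std::int64_t SignExtend(std::int32_t v) {
  return static_cast<std::int64_t>(v);
}

// Hypothetical model of the faulty result: the low 32 bits hold the int32
// bit pattern and the high 32 bits are zero.
constexpr std::int64_t ZeroExtend(std::int32_t v) {
  return static_cast<std::int64_t>(static_cast<std::uint32_t>(v));
}
```

Non-negative values come out identical either way, so the emulation only goes wrong on negative inputs, such as the -1 placeholder that Reshape interprets as "infer this dimension".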

Generally speaking we assume that using strided 64-bit emulation is safe if a node's inputs came from the DML EP itself. This isn't true in the general case for Cast, however - casting negative signed values can and will produce incorrect outputs with strided emulation.

After this change, Cast nodes with 64-bit types will fall back to CPU unless running on a GPU that natively supports 64-bit datatypes.

Related work items: #31768166
2021-04-30 05:13:44 -07:00
Adrian Tsai
70e67ddd2b
Update DirectML version to 1.5.1 and enable ARM/ARM64 builds with DML (#7511)
* Update DirectML to version 1.5.1
* Enable --use_dml with ARM and ARM64
* Add ARM/ARM64 binaries to nuget packages
2021-04-30 00:49:30 -07:00
Yulong Wang
00aaa6dabb
update CI for onnxruntime-web (#7497) 2021-04-29 22:22:52 -07:00
Changming Sun
0d107bbb73
Fix CUDA 10.2 pipeline (#7508) 2021-04-29 22:22:35 -07:00
Scott McKay
d6df5764d7
Android package infrastructure (#7430)
* Include ORT format model conversion scripts and infrastructure in ORT python package.
  - tweak existing script setup so it can be easily run directly and from the ORT python package
Add config file and readme for Android minimal build package
Update ORT Mobile documentation
Disable warning if 'all' optimizations are enabled but NCHWc transformer is excluded (device specific optimizations don't apply in this scenario so the warning is moot).

* Address PR comments
2021-04-30 14:23:54 +10:00