2970 commits

Author SHA1 Message Date
gwang-msft
0933148fc3
[Android NNAPI EP] change most of the exceptions to return Status (#4701)
* change some function from throw to return status

* move more functions to return status

* move most of the exception to return status

* move logsink on android from CLogSink to AndroidLogSink

* addressed comments

* add type attr to check if the return status is used in compile
2020-08-04 21:37:27 -07:00
Changming Sun
1e054739b8
Remove the requirement of CUDA's version.txt (#4706)
Sometimes there is a file named "version.txt" in your CUDA installation dir, and sometimes there isn't. I couldn't figure out why, but the latest CUDA 11 on our CI build machines doesn't have this file. Since the file is not needed for building onnxruntime, I removed the check.
2020-08-04 17:03:40 -07:00
edgchen1
9d7284fc3b
Enable MatMul + Scale fusion (#4669)
Update TransposeMatMul to support scaling of the matrix product by a constant scalar value (analogous to the GEMM alpha parameter). Rename TransposeMatMul to TransposeScaleMatMul.
Fuse MatMul with surrounding Mul/Div with constant scalar into TransposeScaleMatMul.
2020-08-04 16:27:22 -07:00
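A rough illustration of why the MatMul + Scale fusion in #4669 is valid, written as a plain-Python sketch and not as the onnxruntime implementation. The helper names `matmul` and `scaled_matmul` are hypothetical. Scaling the matrix product by a constant scalar afterwards gives the same result as folding that scalar into the product as an alpha factor, as GEMM does.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scaled_matmul(a, b, alpha=1.0):
    """Matrix product with the scale folded in (hypothetical stand-in for the fused op)."""
    return [[alpha * sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

# Unfused: MatMul followed by Mul with a constant scalar.
unfused = [[0.5 * v for v in row] for row in matmul(a, b)]
# Fused: a single call carrying the scalar as alpha.
fused = scaled_matmul(a, b, alpha=0.5)
```

A surrounding Div by a constant `c` would map to `alpha = 1 / c` in the same way.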
Ryan Lai
f9bd52f852
Log telemetry for WinML Native API for setting intra op num usage (#4700)
Co-authored-by: Ryan Lai <ryalai96@gmail.com>
2020-08-04 09:44:23 -07:00
Tim Harris
4bd9e8d05c
Stress-test and fix thread pool when work queues are full (#4690)
While investigating an unrelated issue, I noticed that the thread pool may drop tasks when a burst of 1024+ tasks is submitted by a thread from inside the pool. Today, in general, we execute work synchronously in this case. However, there is a bug where work submitted by a thread already inside the pool will be discarded instead of executed. Currently the only scenario where I can see this occurring is when the parallel executor is used with a model in which such a large number of nodes become eligible to run all at once. This PR fixes the underlying issue and adds a test case for burst-submission of work.
2020-08-04 10:19:49 +01:00
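The behaviour that #4690 aims for (run work synchronously when the queue is full, never discard it) can be sketched with a toy bounded queue. This is an assumption-laden illustration, not the onnxruntime thread pool: `BoundedPool`, its `capacity` parameter, and `drain` are hypothetical names, and the worker threads are replaced by an explicit drain step.

```python
import queue


class BoundedPool:
    """Toy single-queue pool; the capacity is an illustrative assumption."""

    def __init__(self, capacity):
        self._tasks = queue.Queue(maxsize=capacity)

    def submit(self, task):
        try:
            self._tasks.put_nowait(task)
        except queue.Full:
            # Queue is full: run the task inline rather than discarding it.
            task()

    def drain(self):
        """Run everything still queued (stands in for the worker threads)."""
        while True:
            try:
                task = self._tasks.get_nowait()
            except queue.Empty:
                return
            task()


results = []
pool = BoundedPool(capacity=4)
for i in range(10):  # a burst larger than the queue
    pool.submit(lambda i=i: results.append(i))
pool.drain()
```

Every task in the burst runs exactly once: the overflow runs inline at submission time and the rest runs when the queue is drained.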
Changming Sun
d0297f8d24
Add 'Install ONNX' step to Windows GPU pipeline (#4696)
Previously this was not a problem because the onnxruntime Python package explicitly declared a dependency on ONNX, so ONNX was installed whenever we tested onnxruntime. However, that dependency was removed in #4073.
2020-08-03 18:51:24 -07:00
Hariharan Seshadri
49febba3c2
Support int32 and int64 types for Tile CUDA kernel (#4684)
* Support int32 and int64 types for Tile CUDA kernel

* Fix build
2020-08-03 17:47:37 -07:00
Ori Levari
e6ef3653a7
Add Named Dimension Override API to LearningModelSessionOptions (WinML) (#4606)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2020-08-03 16:04:21 -07:00
Dmitri Smirnov
bb9b452a88
resolves #3101 - fix nuget package restore for sdk-style projects (#4680)
Co-authored-by: Christof Senn <christof.senn@gmail.com>
2020-08-03 15:27:48 -07:00
Hariharan Seshadri
0828a900e1
Support easy way to request verbose logging in test runner (#4676)
2020-08-03 15:25:55 -07:00
Changming Sun
01ca6392cb
Avoid building ONNX of every history ONNX versions in our CI (#4678)
1. Avoid building ONNX for every historical ONNX version in our CI; it is costly and fails easily.
2. Run the docker command without sudo. Previously the user was not in the docker group; Azure DevOps Service has now added it.
2020-08-03 10:18:10 -07:00
Tianlei Wu
e70e9e2f67
refine machine_info and output onnxruntime_tools version (#4679)
* output onnxruntime_tools version
* change get_machine_info return data type to string
2020-08-02 18:20:59 -07:00
Boris Fomitchev
6958f49dae
Added Dockerfile and build instructions for Jetson. Also set CUDA arch set automatically. (#4637)
* Revert "Remove docstrigs if __ONNX_NO_DOC_STRINGS" (#4495)

This reverts commit bb4d331fa7bf1fe8d68b1527dda56e4739c80800.

* Bump version to 1.4.0 (#4496)

* Create N-1 threads in intra-op pool, given main thread now active (#4493)

Create N-1 threads in a thread pool when configured with intra-op parallelism of N. This ensures we have N active threads, given that the main thread also runs work. To avoid ambiguity on the value returned, rename ThreadPool::NumThreads method to ThreadPool::DegreeOfParallelism, and make corresponding updates in MLAS and operators.

* Conditionally compile without std::is_trivially_copyable to satisfy old GCC versions. (#4510)

* Adding CUDA arch flags for NVIDIA Jetson

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Added Dockerfile for Jetson and instructions to build wheel and image

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Removing guess about nvcc location

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Restoring pip3 setuptools install order

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Updated README with links and notes re NVIDIA Docker runtime

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Added mention of nvidia-docker

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressing code review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressing code review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: Tiago Koji Castro Shibata <ticastro@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Tim Harris <tiharr@microsoft.com>
Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com>
2020-07-31 23:49:23 -07:00
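The "N-1 threads" idea from #4493, carried in the squashed message above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the ThreadPool code: `parallel_for` and its chunking scheme are hypothetical. With a degree of parallelism of N, only N-1 worker threads are created because the calling thread runs a share of the work itself.

```python
import threading


def parallel_for(items, fn, degree_of_parallelism):
    """Run fn over items using N-1 worker threads plus the calling thread."""
    n = max(1, degree_of_parallelism)
    chunks = [items[i::n] for i in range(n)]  # one chunk per participant
    results = {}
    lock = threading.Lock()

    def work(chunk):
        for item in chunk:
            value = fn(item)
            with lock:
                results[item] = value

    # Spawn threads for all chunks except the first one.
    workers = [threading.Thread(target=work, args=(c,)) for c in chunks[1:]]
    for t in workers:
        t.start()
    work(chunks[0])  # the calling thread takes the first chunk itself
    for t in workers:
        t.join()
    return results, len(workers)


squares, spawned = parallel_for(list(range(8)), lambda x: x * x, degree_of_parallelism=4)
```

Here `spawned` counts the threads actually created, which is one fewer than the requested degree of parallelism.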
Ye Wang
b1bfff34e0
Support distill-bert fusion in transformers tool (#4631)
* checkin attention

* checkin embedlayer but cause invalid onnx model

* resolve comments

* fix comments

* check return values

* add version limit

* fix comments

* add warning
2020-07-31 17:57:54 -07:00
Ye Wang
8cf2c1c410
Modify EmbedLayerNorm to support distill-bert (#4666)
* modify cpu op

* modify cuda ops

* change is_distill to has_segment
2020-07-31 14:37:58 -07:00
Brian Martin
1eadec0eea
Update Versioning.md for Windows 10 and Microsoft.AI.MachineLearning NuGet versions (#4659)
* Update Versioning.md

Update documentation to cover latest Windows 10 release (Vb) and the NuGet packages.

* PR feedback.

* readability changes

* spell out Windows ML Availability
2020-07-31 07:58:51 -07:00
Wei-Sheng Chin
e9d20e9dba
Revise Send and Recv (#4547)
* Add ability to retrieve inferred shapes when executing a kernel.
This ability helps Recv to know its output shapes without doing
actual communication. Of course, if the output shapes cannot be
inferred, Recv still needs to do communication to get shapes from
Send.

* Avoid communicating shape information when it can be inferred statically

* Replace unordered_map with thread-safe wrapper.
We don't want to have race conditions and undefined behavior
when using the parallel executor.

* Remove cout

* Add missing file

* Address comments

* Check dim_value. -1 means missing

* lock properly

* Address comments (remove thread-safe map)

* Remove poc header

* Replace Stream with DeferredReleaseCPUPtr
2020-07-30 23:02:45 -07:00
Tianlei Wu
3588c5b545
Add GPT-2 test generation to convert_to_onnx.py (#4670)
* add gpt2 tester
* add an option to include output latency.
2020-07-30 21:03:53 -07:00
RandySheriffH
1fcd3eb376
cancel night build on pyop (#4673)
2020-07-30 19:51:52 -07:00
Changming Sun
f9f25c5559
Remove featurizer from CI build (#4661)
2020-07-30 18:37:55 -07:00
Ryan Lai
5ce675c3b9
Expose Onnxruntime Intra Op thread controls through WinML Native API (#4638)
* Register ILearningModelSessionOptionsNative interface

* Threading options exposed

* Add interrogator for Session options

* Add test

* Polish test

* PR comments

* Set intra op threads

* Add adapter api to grab intra op threads

* Add adapter test for getting intraop num threads

* Make ILearningModelSessionNative and update winml api test

* Make it required when building engine to set the intraop num threads

* Make test more pretty

* Change naming of idl function

* Revert "Change naming of idl function"

This reverts commit c06916aa5bf94e3bf233ed281e508b935fc8638d.

* PR comment on naming

* Skip the test because it's influenced if it's built with openmp

Co-authored-by: Ryan Lai <ryalai96@gamil.com>
2020-07-30 17:55:26 -07:00
gwang-msft
de0b04b971
[Android NNAPI EP] Add support for dynamic output (#4650)
* add dynamic output shape support

* fix bugs associated with scalar inputs

* addressed comments, fixed an issue where the output buffer size was not set correctly, refactored shaper class

* split the execution logic from nnapi::Model into nnapi::Execution

* update comments for certain scenarios, 1. dynamic output buffer size, 2. ONNX scalar input

* move ctor of nnapi::Execution to public
2020-07-30 16:42:17 -07:00
gwang-msft
282975aefb
[Android NNAPI EP] Add QLinearAdd op Support, move some throw with return status (#4607)
* remove dependency of external jd-dnnlibrary

* add qlinearadd support

* combine some qlinear ops logics, move some throw into return status

* merge master

* minor bug fixes

* addressed comments
2020-07-30 11:45:11 -07:00
Changming Sun
51332e3c81
Change Linux CI build time out value to 3 hours (#4664)
Because the build often needs more than 1 hour 55 minutes, increase the value so that pipeline failures from timeouts are less likely.
2020-07-30 02:52:05 -07:00
George Wu
319d30e50e
dnnl only run opset 10 model tests to reduce footprint/runtime (#4663)
2020-07-30 17:25:53 +08:00
Sheil Kumar
0a8bfb10fa
Inbox WinML tests fail because Inbox loads binaries from system32 (#4660)
* make dml and onnxruntime system32 only when winml and onnxruntime is loaded from system32

* use __ImageBase as that will not incur the unsupported store api call into GetModuleHandleEx

* remove accidental comment

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-07-30 00:09:18 -07:00
Hariharan Seshadri
382f94c95c
Fix regression introduced in cast transformer (#4658)
2020-07-29 23:32:01 -07:00
Tixxx
f90a2d46ae
Changes to support TNLRV3 fine-tuning (#4639)
* added reducesumlogexp gradient
* added test
* fixed type mismatch when calling cudnnreduce kernel
* fixed python frontend to remove redundant states to match pytorch state dict
2020-07-29 19:17:59 -07:00
Faith Xu
d8f3e46d45
Readme updates (#4448)
* Readme updates

* Update package repo table
2020-07-29 13:25:36 -07:00
gwang-msft
db475c4f35
Add option for onnx_test_runner can pause after launch, make create_test_dir work on non-windows os (#4618)
* minor fix for test dir util

* add pause option for onnx_test_runner

* add flush std to show pause prompt text

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2020-07-29 11:47:01 -07:00
Tianlei Wu
326cc686df
Update notebook: disable GPU for tensorflow (#4649)
2020-07-29 10:09:06 -07:00
S. Manohar Karlapalem
623dd53eb7
Rename inner-scoped variable to avoid MSVC warning (#4587)
2020-07-28 18:03:57 -07:00
RRRachelllll555
f3fc8ca954
Add input tensor calibration (#4619)
* add input tensor calibration

* set default fusions to be true

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-07-28 14:04:41 -07:00
ashbhandare
d4983f83ff
Shape independent gradient builder for ops requiring broadcast (#4586)
* Adding CPU implementation of BroadcastGradientArgs op

Modify to take shape as input instead of tensor

Cleanup

Correct schema

Corrected kernel, added tests, addressed review comments.

Initial change, to add ReduceSumTraining cpu op

cpu support

Initial changes to gradient builder

Non-empty reduction case passing.

Added exception and test for invalid broadcast, addressed review comments.

Initial change, to add ReduceSumTraining cpu op

cpu support

cuda support + more UTs

on comments + UT

no op support for {} axes with new attr - noop_with_empty_axes

Add noop attribute to ReduceSumTraining use

Add testing for no-shape graph, modify AddSub grad builder, logging.

MulGrad support

Div support

Expand support

Gemm support

MatMul grad change

Transpose Grad change

BiasGeluGrad change.

Fixes after squash

* Remove logging, add specific exception for shape inference error

* fix build

* Review comments

* Review comments

* Fix windows build

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-07-28 13:01:07 -07:00
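For context on #4586: a gradient builder for a broadcasting op has to know which axes of the output gradient to sum over for each input. The sketch below is a simplified, hypothetical Python version of that computation, assuming NumPy-style broadcasting rules. It is not the BroadcastGradientArgs kernel from the commit, and the function name and return format are assumptions. It takes shapes as input, not tensors, and raises on an invalid broadcast.

```python
def broadcast_gradient_args(a_shape, b_shape):
    """Return the output axes to sum over for each input's gradient."""
    rank = max(len(a_shape), len(b_shape))
    # Left-pad the shorter shape with 1s, as in NumPy-style broadcasting.
    a = [1] * (rank - len(a_shape)) + list(a_shape)
    b = [1] * (rank - len(b_shape)) + list(b_shape)
    a_axes, b_axes = [], []
    for axis, (da, db) in enumerate(zip(a, b)):
        if da == db:
            continue  # no broadcast along this axis
        if da == 1:
            a_axes.append(axis)  # a was stretched here, so reduce its gradient
        elif db == 1:
            b_axes.append(axis)  # b was stretched here, so reduce its gradient
        else:
            raise ValueError(f"cannot broadcast {a_shape} with {b_shape}")
    return a_axes, b_axes


a_axes, b_axes = broadcast_gradient_args((2, 3, 4), (3, 1))
```

In this simplified form, axes where both padded dimensions are 1 are not reported; a real kernel may treat that case differently.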
RandySheriffH
948a33bdfc
FixPyOpSegFault&MakeItStaticLib (#4600)
* remove pyop wrapper

* add py threading logic

* fix doc

* fix doc

* fix doc

* format doc

* format doc

* format doc

* reenable test

Co-authored-by: RandySheriffH <rashuai@microsoft.com>
2020-07-28 11:45:25 -07:00
Bowen Bao
6c2bd127ba
more types for comparison ops (#4634)
2020-07-28 09:53:19 -07:00
M. Zeeshan Siddiqui
73ad92e773
Change ignore_index to 0 in Bert-Loss. (#4640)
2020-07-28 04:37:11 -07:00
Yufeng Li
a06cf6a3b3
Show quantization model size in benchmark of transformer (#4626)
* Show quantization model size in benchmark of transformer

* refine model size calculation
2020-07-27 23:56:33 -07:00
Tiago Koji Castro Shibata
73c99f8269
Set WINVER (#4636)
2020-07-27 20:24:11 -07:00
Xiang Zhang
d73e01e5b9
remove ENABLE_TELEMETRY macro (#4633)
2020-07-27 20:06:11 -07:00
albertomagni-ms
c08e5f55e9
Fix misleading indentation (#4629)
* Adjust indentation of statement, without this fix GCC 7.5 errors
out with:
"this ‘if’ clause does not guard this statement, but the
latter is misleadingly indented as if it were guarded by the ‘if’"

* Add braces around the if-statement for improved clarity.

Co-authored-by: Alberto Magni <alberto.magni@microsoft.com>
2020-07-27 14:42:58 -07:00
Sheil Kumar
efa393e596
WinML should dynamically link against onnxruntime.dll and only system32 for inbox builds (#4615)
* Dynamically link onnxruntime.dll

* fixes

* add preceding backslash to onnxruntime.dll for inbox builds

* remove /d

* loadlibrary -> loadlibraryex

* use loadlibrary system32 option

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-07-27 09:56:49 -07:00
Sheil Kumar
222fd08f20
DirectML.dll is loaded via LoadLibraryW but should use LoadLibraryExA (#4616)
* create dml device via loadlibraryexa

* add build_INBOX flag to adapter

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-07-25 21:29:46 -07:00
Alisha Sonawalla
1e67fff93c
Add GetStringTensorElement, GetStringTensorElementLength and FillStringTensorElement API (#4374)
Add new string tensor APIs and unit tests
2020-07-24 21:35:46 -07:00
Sheil Kumar
c361a59cff
disable gpu timeouts in winml (#4604)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-07-24 13:44:44 -07:00
Tiago Koji Castro Shibata
48d969f4bf
Constexpr CreateFeatureValueFromInspectable (#4460)
2020-07-24 13:08:14 -07:00
Hariharan Seshadri
9510f26744
[Python] Support more APIs for the SessionOptions class (#4596)
2020-07-24 12:56:54 -07:00
ytaous
9888c9e944
SplitTraining op to support split as input (#4597)
* SplitTraining op to support split as input

* on comments and minor refactor

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-07-24 12:49:19 -07:00
Sherlock
aa328c2c20
Update GatherGrad to accumulate in fp32 (#4601)
2020-07-24 10:54:31 -07:00
Yufeng Li
9c75c29403
refine opset version getter (#4602)
2020-07-24 10:34:56 -07:00