Commit graph

11997 commits

Author SHA1 Message Date
Scott McKay
2127a229d7
The IndexedSubGraph is used to create the Function body, but after that is invalid as the nodes it referred to have been removed from the main Graph. As such there's no need to store it in the FunctionImpl instance. (#5669) 2020-11-05 17:21:56 +10:00
Ryan Hill
941e3a69f9
Test a build break fix (#5706) 2020-11-04 21:15:38 -08:00
ashbhandare
6d8e81cb08
Update Squeeze, Unsqueeze, Split and ReduceSum kernel for Opset13 (#5691)
* Split  change

* ReduceSum and Split change

* Other op changes, Grad builder, tests, registering required opset 13 ops

* Rebase fixes

* Fix tests, add some more

* Review changes, rebase

* Fix windows build

* Disable new tests for TensorRT EP

* Disable unsupported for OpenVINO

Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-04 20:00:27 -08:00
Dmitri Smirnov
830f567be8
Add C API Guidelines document (#5686)
Add C API Guidelines document
Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>
2020-11-04 18:50:31 -08:00
alexzakv
8bae883d3e
User/alexzak/win ml principles (#5453)
* Contributing page change

* Update WinML_principles.md

* Update WinML_principles.md

* Update WinML_principles.md

* Updated

* Update WinML_principles.md

* Update WinML_principles.md

* Update WinML_principles.md
2020-11-04 13:35:40 -08:00
wezuo
62a99824cb
Wezuo/priority in nodedef (#5692)
* set the priority in nodedef

* remove debugging stmts

* revoke zero builder

* remove unnecessary namespace comment

Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-04 12:40:37 -08:00
S. Manohar Karlapalem
e49f7a8b71
Fix build error due to unused variable (#5698)
Fixes build error due to unused variable when building with
OpenVINO 2020.2 and 2020.3.
2020-11-04 12:12:16 -08:00
Changming Sun
0b9f7bb1b0 Update InferenceTest.cs 2020-11-04 11:39:49 -08:00
Changming Sun
0445473dc1 Add ssd to x86_disabled_tests 2020-11-04 11:39:49 -08:00
Guoyu Wang
a2b551ff08
Add runtime options for NNAPI EP (#5576)
* Add options for nnapi ep

* Add nnapi flags test

* add comments

* Add flag comments

* Make the flags bitset const

* Fix build break

* Add stub changes to java and c# api

* Fix java related build break

* Fix java build break

* Switch to bit flags instead of bitset
2020-11-04 10:08:43 -08:00
Guoyu Wang
2ad7bcb766
NNAPI add opset version check (#5687)
* nnapi add opset support
2020-11-04 21:48:00 +10:00
edgchen1
07bd4ef470
Upgrade optional implementation to https://github.com/martinmoene/optional-lite. (#5563) 2020-11-03 15:27:47 -08:00
Changming Sun
67d7e3967d Disable some model tests 2020-11-03 14:42:45 -08:00
Hector Li
b6eeadf420
Enable OpenVino build on Arm64 platform (#5682) 2020-11-03 13:55:34 -08:00
Scott McKay
c9f44276da
Add ability to filter GraphViewer using IndexedSubGraph. (#5614)
* Add ability to filter GraphViewer using IndexedSubGraph. This is to support compiling execution providers in a minimal build.
2020-11-04 07:08:18 +10:00
Changming Sun
357a51c75c
Update python packaging pipeline's docker image (#5680) 2020-11-03 12:01:36 -08:00
Hariharan Seshadri
db9c1308a5
Fix Resize kernel registration (#5677) 2020-11-03 10:43:41 -08:00
edgchen1
28f1e32898
Loosen tolerance of CudaKernelTest.ReduceSum_MidTensor, allow test random seed to be regenerated within a test run. (#5675) 2020-11-03 10:37:00 -08:00
Ye Wang
a028ca41ec
Optimize flaubert (#5651)
* optimize flaubert

* fix an issue and format

* revert non-relevant change

* review comments
2020-11-03 09:51:42 -08:00
M. Zeeshan Siddiqui
9b010963b7
Turn off peak memory logging and fix memory pattern generation bug. (#5676)
* Turn off peak memory log lines and fix memory pattern generation bug.

* Turn off peak memory log lines and fix memory pattern generation bug.

Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-03 08:44:15 -08:00
Dmitri Smirnov
5d66cf017c
Register Clip for OpSet13 (#5671)
Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>
2020-11-03 07:07:28 -08:00
Wei-Sheng Chin
8856c2595b
Sync the two IDs in OrtMemoryInfo when calling ctor (#5663)
* Sync the two IDs in OrtMemoryInfo when calling ctor

* Also fix the same problem for output
2020-11-02 23:22:47 -08:00
Changming Sun
4936e10e22
Disable some model tests (#5664)
These are new models added by the WinML team, but some of our EPs can't pass some of the tests.
2020-11-02 22:01:35 -08:00
Tracy Sharpe
182d9c48e4
Merge u8u8/u8s8 QLinearConv implementations (#5662)
Combine the u8u8/u8s8 implementations for x86/x64 builds and add special case handling for 1D convolutions.
2020-11-02 21:38:39 -08:00
ashbhandare
c875fe0919
Add option to dump activations on all ranks (#5455)
* Add option to dump activations on all ranks

* address review comments

* review comments

* Fix review comment

Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-02 18:03:05 -08:00
Changming Sun
87e1063e19
Revert "Update Squeeze, Unsqueeze, Split and ReduceSum kernel for Opset13 (#5488)" (#5668)
This reverts commit db63c5d10f.
2020-11-02 16:09:22 -08:00
Tianlei Wu
2c02530603
Bert Model Profiling Tool (#5654)
* Add profiler tool for BERT models
2020-11-02 13:47:37 -08:00
Jesse Benson
1495f737ca Use cudaMemsetAsync and add checks on CUDA calls. 2020-11-02 11:25:13 -08:00
ashbhandare
db63c5d10f
Update Squeeze, Unsqueeze, Split and ReduceSum kernel for Opset13 (#5488)
* Split  change

* ReduceSum and Split change

* Other op changes, Grad builder, tests, registering required opset 13 ops

* Rebase fixes

* Fix tests, add some more

* Review changes, rebase

* Fix windows build

Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-02 10:51:48 -08:00
Wenbing Li
5b44982971
Change the OrtCustomOp invocation as a constant. (#5506)
* Change the OrtCustomOp invocation as a constant.

* fix build on macos

* build fixing
2020-11-02 10:38:07 -08:00
Derek Murray
ff538b8d3a
Minor fixes in BERT Inference notebook (#5637)
Add missing commas to the code example.
2020-11-02 09:49:23 -08:00
Ashwini Khade
1cca903680
update onnx commit id (#5594)
* update onnx commit id

* update onnx commit for docker images

* update docker images
2020-11-02 09:46:36 -08:00
M. Zeeshan Siddiqui
f2168cef29
Misc. cleanup. (#5659)
Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-02 07:05:28 -08:00
M. Zeeshan Siddiqui
9af0d48524
Memory planner and pattern generation enhancements. (#4443)
* static allocation.

* changes.

* contiguous dynamic allocation.

* contiguous dynamic allocation.

* fix bugs.

* fix bug.

* build errors.

* PR feedback.

* PR feedback.

* Update Graph builder for nccl_allreduce, mps.

* misc.

* fix windows build break.

* changes.

* fine-grained memory-time scheduling.

* merge.

* fix misc stuff.

* fix windows build.

* fix windows build.

* fix merge bug.

* merge conflicts.

* revert onnx-tensorrt submodule commit.

* fix submodule commit.

* misc.

* merge conflicts.

* Revert "merge conflicts."

This reverts commit 319a071a6e.

* merge conflict.

* merge conflict.

* merge conflicts.

* fixes.

* PR feedback.

* build break.

* build break.

* Add asserts.

* Add asserts.

* asserts.

* asserts.

* asserts.

* asserts.

* asserts.

* fixes.

* fixes.

Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: root <root@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-11-01 23:05:46 -08:00
Maajid khan
d98062da0c
[OpenVINO-EP] Hetero support (#5627)
* Implement Hetero in UEP
* Added security checks to take valid Hetero combinations
  as device type
* Integrating Hetero features
* Get the statistics Report in Debug Mode

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Passing right device type for vadm_backend

Added simple fix to pick the right device type
when using vadm_backend with Hetero as well.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed batching logic for 2020.4 and above

* Fixed flake8 PEP8 errors

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor Fixes Added
*Added security checks for device_type passed
in for Hetero build during run time
*code cleanup

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor changes Added
*Fixed batch_size bug in vadm_backend
*code cleanup
*Documentation updated for Hetero

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
2020-10-30 22:35:08 -07:00
Changming Sun
d9293f38e6 Revert "Custom Op on GPU (#5620)"
This reverts commit 2c63196600.
2020-10-30 21:23:51 -07:00
Changming Sun
7948a4b0bc Revert "add header (#5648)"
This reverts commit d7f3baed18.
2020-10-30 21:23:51 -07:00
KeDengMS
32bf6390ad
Some fixes to symbolic shape inference (#5642)
* Some fixes to symbolic shape inference

1. Topological sort before iteration in graph
2. Fix a case in slice: start=100000, end=-100000, step=-1, dim=2
3. Fix Nuphar Gemm test's random seed
4. Slice opset 1 axes is optional
2020-10-30 19:28:47 -07:00
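The slice case named in item 2 above (start=100000, end=-100000, step=-1, dim=2) can be checked by hand. The sketch below is not the code from this commit; it is a hypothetical helper, `sliced_dim`, written under the assumption that the ONNX Slice operator clamps out-of-range bounds the way its specification describes (for negative steps, start into [0, dim-1] and end into [-1, dim-1]).

```python
import math


def sliced_dim(start: int, end: int, step: int, dim: int) -> int:
    """Return the output length of one sliced axis (illustrative helper)."""
    if step == 0:
        raise ValueError("step must be non-zero")
    # Negative indices count from the end of the axis.
    if start < 0:
        start += dim
    if end < 0:
        end += dim
    if step > 0:
        # Forward slice: both bounds are clamped into [0, dim].
        start = min(max(start, 0), dim)
        end = min(max(end, 0), dim)
    else:
        # Backward slice: start into [0, dim - 1], end into [-1, dim - 1].
        start = min(max(start, 0), dim - 1)
        end = min(max(end, -1), dim - 1)
    return max(0, math.ceil((end - start) / step))


# The case from the commit message: the whole axis is kept, in reverse order.
print(sliced_dim(100000, -100000, -1, 2))
```

For this particular input the result agrees with Python's own slicing, where `list(range(2))[100000:-100000:-1]` also has two elements.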
Hariharan Seshadri
7a80a4b526
Support more C# APIs (#5608) 2020-10-30 19:19:50 -07:00
Zhang Lei
17bce6f07e
Implement Im2colNd NHWC and related qlinearconv logic for u8s8. (#5612)
Implement Im2colNd NHWC and related qlinearconv logic for u8s8, and training.
2020-10-30 15:28:30 -07:00
RandySheriffH
d7f3baed18
add header (#5648)
Co-authored-by: RandySheriffH <rashuai@microsoft.com>
2020-10-30 14:26:10 -07:00
Changming Sun
3e71e8bd7e
Revert "[CUDA EP] remove per-thread allocator (#5415)" (#5647)
This reverts commit b4869926d3 because it broke our multiple GPU test pipeline.
2020-10-30 13:58:33 -07:00
RandySheriffH
2c63196600
Custom Op on GPU (#5620)
* add case for cpu custom op on gpu

* format doc

* restrict GPU custom op on Linux GPU CI only

* separate cu file to an independent project

* fix typo

Co-authored-by: RandySheriffH <rashuai@microsoft.com>
2020-10-30 12:25:44 -07:00
S. Manohar Karlapalem
aa38893afb
[OpenVINO-EP] Add Dockerfile with C# API bindings (#5633)
* Update Dockerfile README with C# info

* Add OpenVINO EP dockerfile with C# APIs
2020-10-30 11:27:15 -07:00
Weixing Zhang
aec4cb489e
ROCm EP for AMD GPU (#5480)
The ROCm EP is designed and implemented based on AMD GPU software stack named ROCm. Here is the link for the details about ROCm: https://rocmdocs.amd.com/en/latest/

ROCm EP was created based on the following things:
1. AMD GPU programming language: HIP
2. AMD GPU HIP language runtime: amdhip64
3. BLAS: rocBLAS, hipBLAS
4. DNN: MIOpen
5. Collective Communication library: RCCL
6. cub: hipCub
7. …

Current status:
BERT-L and GPT2 training can be run on AMD GPU with data parallelism.

Next:
1. Make more GPU code shareable between ROCm EP and CUDA EP, since the HIP language and HIP runtime API are very close to CUDA.
2. Continue improving the implementation.
3. Continue GPU kernel optimization.
4. Support model parallelism on ROCm EP.
……

The ROCm kernels have been removed from this commit and will be in a separate PR. Since the original PR was too big (~180 files), it was suggested to split the PR into two parts: one for the ROCm kernels, the other for the non-kernel changes.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: sabreshao <sabre.shao@amd.com>
Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2020-10-29 17:13:04 -07:00
Dmitri Smirnov
742ffb860c
Allow kernels to refer to some attribute data directly in the protobuf (#5624)
* Introduce OpKernelInfo GetAttrAsSpan() for float and int attribute proto arrays
  and GetAttrsStringRefs() to return a vector of string references.
  These new APIs let kernels avoid copying attribute arrays, especially large ones,
  and refer directly to the data in AttributeProto instead,
  which saves memory.
  Modify TfIdfVectorizer to take advantage of the new API.

Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>
2020-10-29 16:12:54 -07:00
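The difference between copying an attribute array and referring to it in place can be shown with a small Python analogy. This is not the C++ API added by the commit: `attr_values` is a hypothetical stand-in for an integer array held by an attribute, and `memoryview` plays the role of a span.

```python
from array import array

# Hypothetical stand-in for a large int attribute array owned by the model.
attr_values = array("q", range(1_000))

# Copying duplicates every element into new storage.
copied = list(attr_values)

# A view refers to the original storage and allocates no element copies.
view = memoryview(attr_values)

# Mutating the owner shows which of the two shares its storage.
attr_values[10] = -1
print(copied[10])  # the copy still holds the old value
print(view[10])    # the view reflects the change
```

The trade-off is the usual one for non-owning views: the view is only valid while the owning object stays alive and unchanged in size.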
Vincent Wang
1fa1c51544
bug fix for name of gradient constant (#5626)
Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>
2020-10-30 07:08:19 +08:00
KeDengMS
b4869926d3
[CUDA EP] remove per-thread allocator (#5415)
Now that we are using legacy default stream, which is shared among all inference threads,
there is no need to have per-thread allocator.

In the past, a race could happen when two threads ran concurrently on the GPU:
thread1: allocA->copyA->computeA->freeA
thread2: allocB->copyB->computeB->freeB

Note that freeA/B only means the buffer is available for reallocation from the CPU's point of view, while the corresponding
operation on the GPU may not have finished yet. It is possible for thread1/2 to use the same buffer when the
alloc/free pairs are not interleaved (note that alloc/free is thread-safe).

If the GPU commands run in separate per-thread default streams, there's a chance that copyA/computeA
are interleaved with copyB/computeB, even when the order of CPU execution is not interleaved. This
would cause incorrect results if computeB uses copyA's results.

By using one legacy default stream, the CPU execution order matches the GPU execution order, so
if A and B use the same buffer from alloc, the corresponding copy/compute won't be interleaved. If
the copy/compute is indeed interleaved, then allocA and allocB would return different buffers, so
there is no race either.
2020-10-29 11:33:05 -07:00
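The ordering argument in this commit message can be modelled with a toy simulation. It makes no CUDA calls: the function `run` and the two schedules are hypothetical, and the model assumes a single buffer that the allocator has handed to both threads in turn.

```python
def run(schedule):
    """Replay GPU operations in order against one shared buffer."""
    buffer = None  # the device buffer reused by both threads
    seen = {}      # which thread's data each compute actually read
    for op, owner in schedule:
        if op == "copy":
            buffer = owner          # copyX writes thread X's input
        elif op == "compute":
            seen[owner] = buffer    # computeX reads whatever is there
    return seen


# One legacy default stream: GPU order equals CPU submission order.
single_stream = [
    ("copy", "A"), ("compute", "A"),
    ("copy", "B"), ("compute", "B"),
]

# Per-thread streams: each thread's own order holds, but the two
# streams may interleave on the GPU.
per_thread_streams = [
    ("copy", "B"), ("copy", "A"),
    ("compute", "B"), ("compute", "A"),
]

print(run(single_stream))
print(run(per_thread_streams))
```

In the second schedule computeB reads the data written by copyA, which is the incorrect result the message describes.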
Sergii Dymchenko
2e1fa3ccb7
Fix GeluRecompute for 2 inputs case. (#5573)
* Add test for FastGelu + GeluRecompute.

* Fix GeluRecompute for 2 inputs case.

* Fix test for BiasGelu + GeluRecompute.

* Copy all inputs to Gelu, not just 2.

* Move GeluRecompute test to training-specific file.
2020-10-29 00:07:13 -07:00
Dwayne Robinson
b85e7a19ea
isalnum is not defined - include cctype (#5623) 2020-10-28 23:31:34 -07:00