Commit graph

4342 commits

Author SHA1 Message Date
Shucai Xiao
c588d5d13a
Add rocm execution provider to provider_list (#6306)
* code changes to add rocm ep to ep_list
2021-03-12 07:51:08 -08:00
Alberto Magni
031587814b
Add support to save onnx graph with external initializers file. (#6911)
Add functionality to the Graph class so it can be dumped to protobuf using an external binary file for the float initializers.

This change is meant to avoid hitting the 2GB protobuf limit when dumping large graphs.
This limit was particularly easy to exceed when dumping graphs after auto-diff.
The use of the external file is limited to initializers larger than a user-specified threshold.
This lets users keep small shape constants in the onnx file itself, such as those consumed by Reshape and Transpose and needed by shape inference.
2021-03-12 09:15:25 +00:00
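The size-threshold policy described in this commit can be sketched in plain Python. This is an illustrative model only, not the actual `Graph` API: the function name and the dict-of-bytes representation are hypothetical.

```python
# Hypothetical sketch of the size-threshold policy described above:
# initializers at or above the threshold go to an external binary file,
# smaller ones (e.g. shape constants for Reshape/Transpose) stay inline
# in the onnx file, keeping the protobuf under the 2GB limit.

def split_initializers(initializers, threshold_bytes):
    """Partition {name: raw_bytes} into (inline, external) dicts."""
    inline, external = {}, {}
    for name, raw in initializers.items():
        if len(raw) >= threshold_bytes:
            external[name] = raw
        else:
            inline[name] = raw
    return inline, external

inits = {
    "reshape_target_shape": b"\x02\x00\x00\x00",  # tiny shape constant
    "fc_weight": b"\x00" * 1024,                  # large float initializer
}
inline, external = split_initializers(inits, threshold_bytes=128)
```

With a 128-byte threshold the shape constant stays inline while the weight tensor is routed to the external file.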
Hariharan Seshadri
12b5ab3bab
Update CUDA custom op unit tests to account for recent ORT change (#6971) 2021-03-11 22:22:45 -08:00
Xavier Dupré
694389a85d
Automate generation of python documentation (#6909)
Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2021-03-11 19:02:45 -08:00
baijumeswani
f7df2f805b
Resolve HTTP Error 503: Service Unavailable for MNIST dataset 2021-03-11 13:53:54 -08:00
Edward Chen
aa60a8368f
Update type reduction operator type usage processors set. (#6976) 2021-03-11 09:22:53 -08:00
Ye Wang
b57a85d863
Support symbolic shape infer in transformers tool (#6899)
* fusion support runtime edge shape checking

* trim ctor

* add test

* fix

* Update test_shape_infer_helper.py

* use torch input size as dynamic axis hints

* check dir

* update

* support longformerattention

* update and add support for bert ops

* trim

* review comments

* review comments
2021-03-10 21:37:12 -08:00
Edward Chen
f4796e1953
Enable type reduction for Range, ReverseSequence, ScatterND, Split, and Unique CPU kernels. (#6963) 2021-03-10 16:20:25 -08:00
Chen Fu
4a4488baae
Release buffers for prepacked tensors (#6820)
Unsolved problems:

1. One test failure was caused by a bug in the Cudnn rnn kernels: they can allocate a buffer and only partially initialize it, and the garbage data near the tail of the buffer caused problems on some hardware. To attack this problem more broadly, should we add code to our allocators so that, during a memory fuzzing test, an allocated buffer is filled with garbage before being returned to the caller?


2. Prepacking is used more widely than we know. For instance, the Cudnn rnn kernels also cache their weights. They mix several weight tensors together into a single buffer and never touch the original weight tensors again. This is the same idea as pre-pack, but they didn't override the virtual function and never tried to release those weight tensors, leading to memory waste. It also seems to me that some other kernels have similar behavior. I wonder how much memory we could save if we cleaned those up too.

3. Turning off memory pattern planning does increase memory fragmentation, leading to out-of-memory errors in some training test cases. Perhaps we can revisit the idea of pushing the kernel-creation stage earlier, so that during initializer deserialization we only avoid tracing the initializers that will be prepacked.
2021-03-10 14:07:20 -08:00
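The "fill with garbage during a fuzzing test" idea from point 1 above can be sketched as a debug allocator that poisons every new buffer with a sentinel byte, so kernels that read uninitialized memory fail deterministically instead of intermittently. The class and interface here are illustrative, not ORT's actual allocator API.

```python
# Illustrative-only sketch: a debug allocator that poisons each buffer
# with a recognizable sentinel pattern before handing it to the caller,
# so reads of uninitialized memory produce consistent, detectable garbage.

class FuzzingAllocator:
    SENTINEL = 0xCD  # recognizable garbage byte

    def alloc(self, size):
        # Instead of returning zeroed or stale memory, fill with sentinel.
        return bytearray([self.SENTINEL] * size)

buf = FuzzingAllocator().alloc(8)
```

A kernel that only partially initializes `buf` would then leave the 0xCD pattern in the uninitialized tail, making the bug reproducible.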
Guoyu Wang
2f307dd223
Fix possible fd leak in NNAPI (#6966) 2021-03-10 11:20:08 -08:00
Ori Levari
9f84819f32
Update onnxruntime_perf_test.exe to accept free dimension overrides (#6962)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-03-10 10:45:19 -08:00
David Medine
f723ff2285
fixed type to experimental session constructor (#6950)
* fixed type to experimental session constructor

Co-authored-by: David Medine <david.medine@brainproducts.com>
2021-03-10 10:18:27 -08:00
Tianlei Wu
4884eee642
Attention fusion detect num_heads and hidden_size automatically (#6920) 2021-03-10 10:17:00 -08:00
Zhang Lei
acfe7ac4ce
Implement QLinearAveragePool with unit tests. (#6896)
Implement QLinearAveragePool with unit tests.
2021-03-10 10:02:01 -08:00
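The arithmetic a QLinearAveragePool kernel performs can be shown in a minimal 1D sketch (this is not ORT's implementation): dequantize the window with the input scale/zero-point, average, then requantize and saturate with the output scale/zero-point.

```python
# Minimal sketch of quantized average pooling over a 1D input:
# dequantize -> average -> requantize -> saturate to uint8.

def qlinear_avg_pool_1d(x_q, x_scale, x_zp, y_scale, y_zp, kernel):
    out = []
    for i in range(len(x_q) - kernel + 1):
        window = x_q[i:i + kernel]
        # Dequantize each element, take the mean in float.
        mean = sum(x_scale * (v - x_zp) for v in window) / kernel
        # Requantize with the output quantization parameters.
        q = round(mean / y_scale) + y_zp
        out.append(max(0, min(255, q)))
    return out

y = qlinear_avg_pool_1d([10, 20, 30, 40], x_scale=0.1, x_zp=0,
                        y_scale=0.1, y_zp=0, kernel=2)
```

With identical input and output quantization parameters the result is simply the rounded mean of each quantized window.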
Tracy Sharpe
a8b897f710
MLAS: quantized GEMM update (#6916)
Various updates to the int8_t GEMMs:

1) Add ARM64 udot kernel to take advantage of dot product instructions available in newer cores. Some models run 4x faster than the stock implementation we used before.
2) Refactor the x64 kernels to share common code for AVX2(u8u8/u8s8/avxvnni) vs AVX512(u8u8/u8s8/avx512vnni) to reduce binary size.
3) Extend kernels to support per-column zero points for matrix B. This is not currently wired to an operator.
2021-03-10 09:54:43 -08:00
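Item 3 above (per-column zero points for matrix B) can be illustrated with a plain-Python reference GEMM. MLAS does this with vectorized int8 kernels; the function below is only a readability sketch with hypothetical names.

```python
# Reference sketch of a quantized GEMM where B has a per-column zero
# point: C[i][j] = sum_k (A[i][k] - a_zp) * (B[k][j] - b_zps[j]),
# accumulated in int32.

def qgemm_per_column_zp(A, B, a_zp, b_zps):
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            C[i][j] = sum((A[i][k] - a_zp) * (B[k][j] - b_zps[j])
                          for k in range(inner))
    return C

C = qgemm_per_column_zp(A=[[130, 132]], B=[[3, 10], [5, 20]],
                        a_zp=128, b_zps=[4, 15])
```

Each column of B is re-centered by its own zero point before the dot product, which is what the note "not currently wired to an operator" refers to exposing later.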
Edward Chen
bc319bd7aa
Fix warning from setting multiple MSVC warning level options. (#6917)
Fix warning from setting multiple MSVC warning level options. Replace an existing /Wn flag instead of always appending a new one.
2021-03-10 09:27:54 -08:00
Edward Chen
d5ed3e7fba
Enable type reduction in EyeLike, Mod, random.cc CPU kernels. (#6960)
* Update EyeLike CPU kernel.

* Update Mod CPU kernel.

* Update Multinomial CPU kernel.

* Slight improvement to Pad CPU kernel binary size.

* Update RandomNormal[Like], RandomUniform[Like] CPU kernels.
2021-03-10 15:32:56 +10:00
Tianlei Wu
89916fdb05
fix stream sync issue (#6954) 2021-03-09 20:57:18 -08:00
Wei-Sheng Chin
bdaea1d9ae
Update baseline due to loss scale fix (#6948) 2021-03-10 09:46:15 +08:00
Raduan Al-Shedivat
743a93faf3
Fix broken link in server usage and remove absolute path from dockerfiles readme (#6926) 2021-03-09 11:54:21 -08:00
George Nash
ba51774a1f
Add GPU support for DNNL endpoint (#6741)
* Added code for Relugrad with GPU support.

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Add GPU support for DNNL ConvGrad

Signed-off-by: George Nash <george.nash@intel.com>

* Add GPU support for DNNL MaxPoolGrad

Updates to MaxPool for training with GPU
Update oneDNN to version 1.8.1

Signed-off-by: George Nash <george.nash@intel.com>

* Fixed issues found during code review

- error in code comment
- using auto when the direct type would have been better
- removed ternary operators that were returning bool values

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
2021-03-09 09:40:42 -08:00
Hariharan Seshadri
c8e2e3191b
Support parsing an array of values stored as an attribute in a custom op (#6878) 2021-03-08 23:49:58 -08:00
Guoyu Wang
e64eff1f13
Enable build with bitcode for iOS (#6905)
* Enable build with bitcode for iOS

* minor format update

* Minor format update

* Addressed CR comments
2021-03-08 22:56:13 -08:00
Edward Chen
73fe1f2deb
Rename op kernel type control 'supported types' to 'default types'. (#6886)
Cleaning up some naming in the op kernel type control infrastructure.
"Supported types" was a bit semantically overloaded. Renamed it to "default types". They are the types that are supported by default.
2021-03-08 18:33:27 -08:00
Sheil Kumar
67c67408c4
Only set _native folder for Microsoft.AI.MachineLearning package (#6939)
* only set _native folder for Microsoft.AI.MachineLearning package

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-03-08 15:27:11 -08:00
Tracy Sharpe
bc27652188
MLAS: workaround LLVM x86 assembler (#6922)
Implement an alternate workaround for the LLVM x86 problem described in PR #5088. That change made the x86 assembly files build with the GNU assembler by using -fno-integrated-as.
2021-03-08 14:18:49 -08:00
Tianlei Wu
b89f52c277
Add tests of Attention and QAttention for pruned model (#6914) 2021-03-08 11:56:31 -08:00
Denny Abraham Cheriyan
f2f60eed59
Fix broken Java API link (#6826) 2021-03-08 11:28:41 -08:00
Edward Chen
15d81fb63a
Enable type reduction for Clip, MaxPool, and Pad CPU kernels. (#6918) 2021-03-08 08:25:43 -08:00
Edward Chen
b6c4a7ac54
Support required types when excluding typed registrations (#6871) 2021-03-08 08:22:07 -08:00
Wei-Sheng Chin
de6e66f3d4
Fix loss scaling when running ORTTrainer with BERT under mixed-precision mode (#6932)
* Fix missed Loss scale

* not to dump
2021-03-08 21:12:33 +08:00
George Wu
601e04fb27
update Readme (#6903) 2021-03-05 16:29:04 -08:00
Funtowicz Morgan
9126faa35b
Ability to fuse non-square (pruned) attention weights for BERT-like models (#6850) 2021-03-04 17:08:08 -08:00
RandySheriffH
f986ffcb5f
move pipeline file and change relative path (#6882)
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-03-04 15:31:42 -08:00
Reuben Zotz-Wilson
107c9672fd
No such file or directory with --use_external_data_format and int8 (#6867)
Implemented the following change to avoid the error when using both --use_external_data_format and --precision int8 with GPT2LMHeadModel, which results in
line 161, in save_external_data; open(external_data_file_path, 'ab').close()
FileNotFoundError: [Errno 2] No such file or directory:
This may also be related to the identified bug #6047.
2021-03-04 15:14:23 -08:00
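The failure mode in this commit is easy to reproduce in isolation: `open(path, 'ab')` does not create missing parent directories, so writing external data into a not-yet-created directory raises `FileNotFoundError`. Creating the directory first is one plausible fix; the paths below are illustrative.

```python
# Reproduction sketch: 'ab' mode appends/creates the file, but will not
# create missing parent directories, matching the traceback above.
import os
import tempfile

root = tempfile.mkdtemp()
external_data_file_path = os.path.join(root, "missing_dir", "model.onnx.data")

try:
    open(external_data_file_path, "ab").close()  # parent dir absent -> error
    failed = False
except FileNotFoundError:
    failed = True

# Fix: ensure the parent directory exists before opening for append.
os.makedirs(os.path.dirname(external_data_file_path), exist_ok=True)
open(external_data_file_path, "ab").close()
```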
RandySheriffH
679718b12f
Configure session thread pool spinning preference (#6895)
* add config allow_spinning

* add config allow_spinning

* set true as default

* split configures for inter and intra ops

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-03-04 14:54:58 -08:00
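As a hedged usage sketch, the spinning preference added here would be set through session config entries. The key strings below are assumed from the PR description ("allow_spinning", split per inter/intra op); treat this as a config fragment, not a confirmed API contract for this ORT version.

```python
# Hedged config sketch (requires onnxruntime installed; key strings
# assumed from the PR description, with spinning enabled by default).
import onnxruntime as ort

so = ort.SessionOptions()
# Disable busy-wait spinning for both thread pools ("1" = spin, default).
so.add_session_config_entry("session.intra_op.allow_spinning", "0")
so.add_session_config_entry("session.inter_op.allow_spinning", "0")
# sess = ort.InferenceSession("model.onnx", so)
```

Disabling spinning trades a little latency for lower idle CPU usage when the session is not continuously fed requests.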
Tianlei Wu
8f1786d5d2
Save output tensors in bert_test_data tool (#6872) 2021-03-04 13:09:05 -08:00
Tiago Koji Castro Shibata
fa8d1b44b8
Fix app packaging in UWP (#6804)
* Change msbuild condition for UAP

* update .netcore target as well

* create nuget packages with _native path

* validate path under _native directory for windowsai package

* pep8

* add diagnostic error message

* pep8

* use basename

* lib\uap10.0

* uap10

* build\\uap10.0

* Manually binplace winmds into appx when PackageReference is used.

* always binplace winmd regardless of packagereference since c# should work with packages.config also

* resolve all paths to full paths to avoid some reference warnings

* move winmds out of lib folder to prevent automatic component registration

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-03-04 11:16:25 -08:00
Suffian Khan
7915b6709a
Revert Gather Grad optimization in PR 6381 targeted for Rocm (#6880)
* revert gather_grad_impl.cu

* put stream changes back in

* restrict changes to commenting launch of optimized version
2021-03-04 10:21:49 -08:00
Scott McKay
54cdb6af71
Add check that the first 2 Loop subgraph inputs have a shape (could be explicit or inferred), as we need to know the rank the subgraph expects. Other inputs to the subgraph are more opaque, so we can just pass them through. (#6891) 2021-03-04 20:42:40 +10:00
RandySheriffH
d01006fc22
Move constants from heap to stack to avoid randomness on cudnn function (#6869)
* move const from heap to stack

* add namespace

* add base prefix

* define local type
2021-03-03 20:18:21 -08:00
baijumeswani
ed1883a97c
Workaround for HTTP Error 403: Forbidden for MNIST dataset (#6885) 2021-03-03 18:59:48 -08:00
Guoyu Wang
fedb68429c
[NNAPI EP] Add per-tensor u8s8 support for Qlinear[Conv/MatMul] (#6818)
* NNAPI Add per-tensor u8s8 support

* Update some comments

* Address CR comments

* Address CR comments
2021-03-03 15:44:49 -08:00
Guoyu Wang
3c5d811e77
[CoreML EP] Add [Average/Max]Pool support (#6870) 2021-03-03 14:32:39 -08:00
Hariharan Seshadri
9a9e741a8c
Support optional inputs/outputs in custom op development (#6727) 2021-03-03 05:59:23 -08:00
jingyanwangms
f22f04a109
Add comment (#6860)
Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-02 18:54:25 -08:00
Faith Xu
6285ee2398
Reroute quantization tool readme to /docs page (#6854) 2021-03-02 13:49:42 -08:00
Ye Wang
9073f7a5c3
support opset13 in embednorm (#6866) 2021-03-02 12:33:40 -08:00
Ryan Hill
0d0eb2c85c
Change OpKernel class to be shared with shared providers (#6837)
In the previous shared providers there aren't many OpKernel classes, and the existing Provider_OpKernel wrapper was fine. With the possibility of making Cuda a shared provider, having this be changed per OpKernel adds a lot of complexity.

It was fairly straightforward to make OpKernel work with shared providers with minimal changes.

In this change, the ONNX_OPERATOR_* macros can also be shared with the shared providers.
2021-03-02 00:53:48 -08:00
Hariharan Seshadri
38796ad451
Refine force CPU fallback logic in the CUDA EP (#6849) 2021-03-01 19:59:07 -08:00