4579 commits

Author SHA1 Message Date
Jeff Bloomfield
057de97d92 Merged PR 5866812: Decompose unsupported QLinearSigmoid operation in DML EP
Related work items: #32220862
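A minimal sketch of what such a decomposition can look like. It assumes QLinearSigmoid is rewritten as DequantizeLinear → Sigmoid → QuantizeLinear over uint8 tensors; the PR text does not spell out the exact subgraph, and all function names here are hypothetical.

```python
import math

def dequantize_linear(q, scale, zero_point):
    # ONNX DequantizeLinear: y = (x - zero_point) * scale
    return (q - zero_point) * scale

def quantize_linear_uint8(x, scale, zero_point):
    # ONNX QuantizeLinear: round half to even (Python's round), add zero point,
    # then saturate to the uint8 range [0, 255]
    return max(0, min(255, round(x / scale) + zero_point))

def qlinear_sigmoid_decomposed(qx, x_scale, x_zp, y_scale, y_zp):
    # Hypothetical decomposition: dequantize, apply float Sigmoid, requantize
    out = []
    for q in qx:
        x = dequantize_linear(q, x_scale, x_zp)
        y = 1.0 / (1.0 + math.exp(-x))
        out.append(quantize_linear_uint8(y, y_scale, y_zp))
    return out

print(qlinear_sigmoid_decomposed([0, 128, 255], 0.1, 128, 1 / 256, 0))  # [0, 128, 255]
```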
2021-04-01 00:24:38 +00:00
Jeff Bloomfield
56d2c4baa2 Merged PR 5861108: Allow nodes in DML graph partitions with empty shapes on constant CPU inputs
Resize is specified to ignore the "roi" tensor in certain modes. Some converters nevertheless supply an arbitrary value for this tensor, even though it's optional.

This makes the graph partitioner skip the check for empty shape dimensions on tensors such as this one, which the DML kernel registers as CPU inputs. Otherwise, the node is not included in DML graph partitions, because the DML graph doesn't handle empty dimensions.
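A hypothetical sketch of the relaxed check described above: inputs that the kernel consumes on the CPU are exempt from the empty-dimension test. The function and parameter names are illustrative only, not the actual DML EP partitioner API.

```python
from typing import Sequence, Set

def can_join_dml_partition(input_shapes: Sequence[Sequence[int]],
                           cpu_input_indices: Set[int]) -> bool:
    """Return True if the node may be placed in a DML graph partition."""
    for index, shape in enumerate(input_shapes):
        if index in cpu_input_indices:
            # Constant CPU inputs (e.g. Resize's "roi") are read on the CPU,
            # so an empty dimension never reaches the DML graph.
            continue
        if any(dim == 0 for dim in shape):
            # The DML graph cannot represent tensors with empty dimensions.
            return False
    return True

# Resize inputs X, roi (empty), scales; roi and scales treated as CPU inputs here.
print(can_join_dml_partition([[1, 3, 4, 4], [0], [4]], cpu_input_indices={1, 2}))  # True
```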

Related work items: #32221164
2021-03-31 19:06:08 +00:00
Adrian Tsai
a8f0ab9c5f Merged PR 5846998: Fix warnings level for DML EP
Apparently ORT has a new, rather unusual way of setting the warning level. This change resets our warning level back to W3 for the DML EP.
2021-03-26 22:55:33 +00:00
Adrian Tsai
39bd192d33 Merged PR 5837692: Merge latest from upstream 2021-03-25 16:21:56 +00:00
Adrian Tsai
293774fbeb Merge remote-tracking branch 'upstream/master' into p/adtsai/merge
# Conflicts:
#	onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc
2021-03-24 19:48:01 -07:00
jingyanwangms
cd67f12add
Move IOBinding and RunOptions to ctx (#7028)
* Liqun/ort module perf1 (#6806)

add mysql script to log perf data
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Resolve HTTP Error 503: Service Unavailable for MNIST dataset (#6989)

* Reduce logging for ORTModule for the end user (#6982)

* Support none types in forward output (#7001)

* Missed test case for none type output (#7014)

* save iobinding to ctx

* save run_options to ctx

* remove debug tests

* PR comments and clean up

* add RunStateInfo

* remove whitespace edits

* PR comments

* remove test changes

* fix test failure

* Fix unit test test_nesting_forward_backward_calls

Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-24 17:51:00 -07:00
Changming Sun
2e3bbad19f
Move TensorRT Windows CI build to the machine pool (#7127) 2021-03-24 14:28:25 -07:00
Guoyu Wang
1c04eec2b1
[NNAPI EP] Fix error for QLinearAdd with an initializer as input (#7093)
* Fix the issue where input to qlinearadd is an initializer

* Add UT

* Address CR comments
2021-03-24 11:56:53 -07:00
harshithapv
540eac253e
Deepspeed pipeline parallel and fairscale sharded optimizer test samples with ORTModule (#7078)
* adding samples for Deepspeed pipeline parallel and fairscale sharded optimizer with ortmodule

* fixed typo in args

* addressed Thiago's comments

* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_deepspeed_pipeline_parallel.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-03-24 09:43:05 -07:00
KeDengMS
6987106bf5
Add missing Python dependencies for ORT training (#7104)
* Add missing Python dependencies for training

cerberus - option parsing
h5py - checkpoint
onnx - model proto
packaging/sympy - symbolic shape inference

* Separate requirements.txt for inference and training Python packages.
2021-03-23 18:43:19 -07:00
Yufeng Li
fffe16cb43
Fix a bug in quant GEMM and add a unit test (#7111) 2021-03-23 16:39:35 -07:00
Changming Sun
b07e168a2b
Delete an unused file: download_test_data.py (#7109) 2021-03-23 14:49:26 -07:00
Suffian Khan
5cb8934459
update Dockerfile for workaround for issue in RCCL for rocm4.0 (#7108) 2021-03-23 13:36:04 -07:00
Suffian Khan
c0994fdfbb
Update ORTTrainer to permit Rocm and permit export of opset 13 (#7059)
* update orttrainer to permit rocm and allow export for opset 13

* wrap rocm check in try-except block
2021-03-23 11:09:48 -07:00
Edward Chen
53392664d3
Enable type reduction for Shrink, Sign, SplitToSequence CPU kernels (#7090)
Enable type reduction for Shrink, Sign, SplitToSequence CPU kernels.
Some other type reduction changes including refactoring to specify element types in a single place.
2021-03-23 09:57:33 -07:00
baijumeswani
c3310efdcd
Support for models having partially non trainable parameters (#7058)
* Support for models having partially non trainable parameters
2021-03-23 09:41:16 -07:00
baijumeswani
a7a2a16edd
Pass arguments to azure_scale_set_vm_mount_test_data from perf test ci pipeline (#7094) 2021-03-22 21:48:32 -07:00
Yufeng Li
c965878a69
fix a bug in global average pool and add unit test (#6913)
* fix bug in QGlobalAveragePool

* add unit test for quant GlobalAveragePool

* Do not run quantization tests if disable_contrib_ops is enabled
2021-03-22 20:01:27 -07:00
Aaron Boxer
230c137460
cmake: support install target with generated pkg-config file (#7076) 2021-03-22 19:36:31 -07:00
liqunfu
309885b08d
upload ort-gpu-training python nightly package to azure feed (#6998) 2021-03-22 18:44:54 -07:00
Tracy Sharpe
416ee3c4d2
MLAS: add 32-bit transpose support (#7092) 2021-03-22 16:20:31 -07:00
Sherlock
5ec0e71542
ORTModule support non-differentiable module output (#7048)
* Handle non-differentiable module output

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-22 15:46:11 -07:00
Changming Sun
be45a59d99
Make our CUDA code compatible with the latest VS2019 update (#7062) 2021-03-22 14:39:45 -07:00
Thiago Crepaldi
df6a68f59c
Fix fallback providers for InferenceSession (#7091) 2021-03-22 13:38:58 -07:00
RandySheriffH
529da3b003
Thread pool profiler (#6748)
* add profiler

* add thread id

* refactoring

* switch to vector

* add override keyword

* fix comments

* renaming

* add revoke time

* restore statics

* restore enable flag

* fix end error

* fix comments

* add comment

* add comments

* make profiler thread-safe

* switch to shared_lock

* switch to shared_timed_mutex

* switch to OrtMutex

* add per child thread counters

* switch to vector

* refactor LogCore

* fix comments

* cancel spin and block counter to reduce overhead

* fix minor format issue

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-03-22 10:49:57 -07:00
Thiago Crepaldi
867804bea1
Add auto doc gen for ORTModule API during CI build (#7046)
In addition to ORTModule auto documentation during packaging, this PR also updates golden numbers to fix CI.
2021-03-22 10:20:33 -07:00
Dmitri Smirnov
3b58fc7b97
Add types support for Sparse Initializer in Onnxruntime (#7004)
Add types support for DenseToSparse and SparseToDense conversions.
  Address the case of empty sparse values and indices when the initializer does
  not contain any non-zero values (NNZ == 0).
  Add sparsify script.
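A minimal sketch of the two conversions using flat (linearized) indices, one of the index layouts ONNX `SparseTensorProto` allows. The function names are hypothetical. The all-zero input yields empty values and indices, which is the NNZ == 0 case the message refers to.

```python
from typing import List, Sequence, Tuple

def dense_to_sparse(dense: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Keep only non-zero values and their flat (linearized) indices."""
    values = [v for v in dense if v != 0]
    indices = [i for i, v in enumerate(dense) if v != 0]
    return values, indices

def sparse_to_dense(values: Sequence[float], indices: Sequence[int],
                    shape: Sequence[int]) -> List[float]:
    """Scatter values back into a zero-filled buffer of the given shape."""
    size = 1
    for dim in shape:
        size *= dim
    dense = [0.0] * size
    for v, i in zip(values, indices):
        dense[i] = v
    return dense

print(dense_to_sparse([0.0, 2.5, 0.0, 1.0]))  # ([2.5, 1.0], [1, 3])
print(dense_to_sparse([0.0, 0.0]))            # ([], []) when NNZ == 0
```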
2021-03-22 10:06:11 -07:00
Olivia Jain
4a3d1176d7
adding ngraph_DIR to fix build (#6975) 2021-03-22 09:43:02 -07:00
Edward Chen
4cbb8e166a
Update kernel def hashing (#7019)
Update the kernel def hashing in ORT format models. The new hashing logic ignores the ordering of type constraint types.
This is a backward compatibility breaking change, but we don't guarantee backward compatibility yet.
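A sketch of order-insensitive hashing, assuming a kernel definition is reduced to an op type, domain, since-version and per-constraint type lists. These fields and the use of SHA-256 are assumptions for illustration; this is not ORT's actual hashing code. Sorting each type list before hashing makes the result independent of declaration order.

```python
import hashlib
from typing import Dict, List

def kernel_def_hash(op_type: str, domain: str, since_version: int,
                    type_constraints: Dict[str, List[str]]) -> str:
    h = hashlib.sha256()

    def feed(text: str) -> None:
        h.update(text.encode("utf-8"))
        h.update(b"\0")  # separator so "ab" + "c" hashes differently from "a" + "bc"

    feed(op_type)
    feed(domain)
    feed(str(since_version))
    # Sort constraint names and the types within each constraint so that
    # declaration order does not affect the hash.
    for name in sorted(type_constraints):
        types = sorted(type_constraints[name])
        feed(name)
        feed(str(len(types)))  # length prefix keeps constraint boundaries unambiguous
        for type_name in types:
            feed(type_name)
    return h.hexdigest()

a = kernel_def_hash("Sign", "", 13, {"T": ["float", "int32_t"]})
b = kernel_def_hash("Sign", "", 13, {"T": ["int32_t", "float"]})
print(a == b)  # True
```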
2021-03-22 09:28:27 -07:00
Brian Martin
06df28748f
Change tabs to spaces in Windows.AI.MachineLearning.idl (#7088)
Noticed in a recent PR that this file has some tabs that should be spaces.
2021-03-22 09:23:18 -07:00
raviskolli
79ba045d74
Enabled rocm support for graph transformations (#7057) 2021-03-22 09:02:10 -07:00
Scott McKay
b2c6617b0f
Use 'as_scalar' when checking the 'cond' value of 'If' (#7063)
#6884
2021-03-22 18:04:38 +10:00
Vincent Wang
cec919bae9
handle 8 bit uint dlpack tensor (#7069) 2021-03-20 08:00:49 +08:00
Edward Chen
8d5bfdeb47
Increase timeout for Android CI pipeline by 30 minutes. (#7065) 2021-03-19 08:03:22 -07:00
Chi Lo
8c3b59a026
Quantization calibration refactor (#6893)
* Code refactor

* Modify code to tackle OOM when calibrating on a large dataset

* Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax

* Add COCO val 2017 annotation

* Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax

* Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers"

* Check and install flatbuffers module

* Add script to download coco dataset images and refactor example

* Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers"

* Add CalTableFlatBuffers as module

* Remove annotation; users can download it themselves.

* Uncomment code

* Add back instances_val2017.json

* Make sure flatbuffers installed when ORT is installed

* Refactor code to call coco api

* Enable FP16 for example
2021-03-19 01:09:11 -07:00
Changming Sun
701e73b5b8
Move Linux minimal build CI pipeline to the new Linux machine pool (#7050) 2021-03-18 12:09:12 -07:00
satyajandhyala
8bc275e93f
Enhance Transpose, Cast and MatMul fusion when Cast and/or Transpose feeds multiple nodes. (#7021)
* Added new Transpose+Cast+MatMul => Cast+FusedMatMul test scenarios.

* The Cast node may feed more than one node.

* Transpose node may feed multiple nodes and still may be fused with MatMul nodes.
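The rewrite relies on the identity MatMul(Cast(Transpose(A)), B) == FusedMatMul(Cast(A), B) with transA set, which holds because Cast is elementwise and therefore commutes with Transpose. A small pure-Python check for 2D inputs; the helper names are hypothetical, and FusedMatMul is modeled here only by its transA behavior.

```python
from typing import List

Matrix = List[List[float]]

def transpose(m) -> Matrix:
    return [list(row) for row in zip(*m)]

def cast_to_float(m) -> Matrix:
    # Elementwise Cast: each element is converted independently,
    # so applying it before or after Transpose gives the same result.
    return [[float(x) for x in row] for row in m]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def fused_matmul(a: Matrix, b: Matrix, trans_a: bool = False) -> Matrix:
    # Models only the transA attribute: the transpose happens inside the op.
    return matmul(transpose(a) if trans_a else a, b)

A = [[1, 2, 3], [4, 5, 6]]    # 2x3 integer input
B = [[1.0, 0.0], [0.0, 1.0]]  # 2x2
original = matmul(cast_to_float(transpose(A)), B)        # Transpose -> Cast -> MatMul
fused = fused_matmul(cast_to_float(A), B, trans_a=True)  # Cast -> FusedMatMul
print(original == fused)  # True
```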
2021-03-18 11:41:58 -07:00
Suffian Khan
1a1dd4843d
Enable opset 13 for Rocm (#7047)
* enable opset13

* import cuda changes for opset 13 softmax to rocm as well
2021-03-18 10:09:45 -07:00
Guoyu Wang
7c7d6debe6
[CoreML EP] Add Resize Support (#7015)
* code placeholders

* Add previously missing comments

* [CoreML EP] Add Resize Support
2021-03-17 23:27:41 -07:00
Adrian Tsai
3d37a3c1d3 Merged PR 5807585: Remove support for strided 64-bit emulation in DML's Cast kernel
A model from one of our partners regressed with a failure to evaluate due to the addition of strided 64-bit emulation in the DML EP for the Cast operator. Specifically, the model uses a Cast from int32 to int64 to produce the input shape to a Reshape node. When supplied with a shape dimension of -1 (int32 0xffffffff), the strided emulation in Cast ends up producing an int64 result of 0x00000000ffffffff. This is then fed into the Reshape operator, where it produces an incorrect tensor shape and a failure during evaluation.

Generally speaking we assume that using strided 64-bit emulation is safe if a node's inputs came from the DML EP itself. This isn't true in the general case for Cast, however - casting negative signed values can and will produce incorrect outputs with strided emulation.
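The failure mode can be reproduced with a small model of strided emulation. This sketch assumes the emulation writes only the low 32-bit word of each int64 element and leaves the high word zero; it illustrates the effect described above and is not DML's actual implementation.

```python
import struct

def strided_emulated_cast(values):
    """Model of strided 64-bit emulation: each int64 element spans two 32-bit
    words, but only the low word is written; the high word stays zero."""
    buf = bytearray(8 * len(values))  # zero-initialized int64 storage
    for i, v in enumerate(values):
        struct.pack_into("<i", buf, 8 * i, v)  # low word only, little-endian
    return list(struct.unpack(f"<{len(values)}q", bytes(buf)))

def native_cast(values):
    """A true int32 -> int64 cast sign-extends, so -1 stays -1."""
    return [int(v) for v in values]

shape = [-1, 4]  # Reshape target shape with an inferred dimension
print(native_cast(shape))            # [-1, 4]
print(strided_emulated_cast(shape))  # [4294967295, 4]
```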

After this change, Cast nodes with 64-bit types will fall back to CPU unless running on a GPU that natively supports 64-bit datatypes.

Related work items: #31768166
2021-03-18 00:42:32 +00:00
Jeff Bloomfield
897a0b9839 Merged PR 5807395: Add DML kernels for QLinearAdd (com.microsoft namespace) and DynamicQuantizeLinear
Related work items: #32165595
2021-03-18 00:15:46 +00:00
Xavier Dupré
514444d820
Fix pipeline generating python documentation (#7027)
Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2021-03-17 16:57:51 -07:00
Thiago Crepaldi
c60ef62190
Update ORTModule feature with remaining PRs from feature branch (#7040)
* Liqun/ort module perf1 (#6806)

add mysql script to log perf data
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Resolve HTTP Error 503: Service Unavailable for MNIST dataset (#6989)

* Reduce logging for ORTModule for the end user (#6982)

* Support none types in forward output (#7001)

* Missed test case for none type output (#7014)

* Fix code style according to autopep8

Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
2021-03-17 16:32:32 -07:00
Cecilia Liu
4fd9fef9ee
Support HuggingFace Models Converted From tf2onnx in Python Script (#6985)
Support tf2onnx huggingface models in python script
2021-03-17 15:33:57 -07:00
Tiago Koji Castro Shibata
934bb52cfb Merged PR 5805461: Add ARM64X forwarder libs
Add ARM64X implementation libs, to be forwarded to by the ARM64X lib.

From Ben Niu:

For system dlls that are built outside of the Windows repo and ingested through vpacks or binary check-ins, we always start by trying to port them to ARM64X. However, because Visual Studio 2019's support for building ARM64X is still immature, porting dlls to ARM64X can be an uphill battle.

When that happens, there is an alternative that avoids porting the dll to ARM64X: we build a pure ARM64X forwarder from the Windows repo, for example, onnxruntime.dll. That forwarder does nothing but forward all ARM64 API calls to a native ARM64 onnxruntime_arm64.dll, and all x64 API calls to a native x64 onnxruntime_amd64.dll. Please see here for an example: 29ae6ca516

At load time, applications still load the ARM64X forwarder onnxruntime.dll. In an ARM64 process, the forwarder dll then loads the native ARM64 onnxruntime_arm64.dll; otherwise, it loads the x64 onnxruntime_amd64.dll. Either way, the process ends up calling the native dll built for its own architecture.

The onnxruntime_arm64.dll and onnxruntime_amd64.dll are essentially aliases of their native counterparts, but we cannot directly rename existing native dlls in the Windows build. The reason is PDB binplacing. If you simply rename a dll, the PDB name embedded in the dll is unchanged. So if we just renamed the native dlls in the ARM64 Windows build, there would be two renamed native dlls, onnxruntime_arm64.dll and onnxruntime_amd64.dll, sharing the same PDB name onnxruntime.pdb. When binplacing happens (basically moving dll and pdb from os\obj to os\bin), one PDB overwrites the other. As a result, we lose the PDB for either the ARM64 dll or the x64 dll.

That’s why we are asking to change the build pipeline to run the link commands two extra times, producing onnxruntime_arm64.dll and onnxruntime_amd64.dll with different PDB names. The compilation does not need to run twice; only the link step does. See here for an example: https://microsoft.visualstudio.com/DefaultCollection/Xbox/_git/Xbox.ShaderCompiler.WinTools/pullrequest/5291717

Related work items: #31925159
2021-03-17 18:43:21 +00:00
Tiago Koji Castro Shibata
bc3aea4be0 Capitalize DLL name 2021-03-17 11:01:14 -07:00
Tiago Koji Castro Shibata
4f5d6a0e4d Add ARM64X forwarder libs 2021-03-17 10:35:37 -07:00
Thiago Crepaldi
335edaa2c4
Merge pull request #6973 from microsoft/thiagofc/merge-ortmodule-into-master
Introduce ORTModule training API to ONNX Runtime
2021-03-17 10:30:06 -07:00
Chen Fu
03885af5a0
Adding prepacking to QLinearMatMul (#6980)
Reuse the prepacking logic from MatMulInteger to enable prepacking weights for QLinearMatMul. Currently only 2D matrix weights are prepacked.
2021-03-17 09:28:24 -07:00
Tracy Sharpe
90642e7eac
MLAS: more code cleanup (#7036)
Change int32_t->ptrdiff_t when interacting with the threadpool.
Migrate more code from MlasMaskMoveAvx->MlasMaskMoveTableAvx.
Update more code to use FUNCTION_ENTRY macro.
2021-03-17 09:22:55 -07:00