4585 commits

Author SHA1 Message Date
ashbhandare
2aa89989c4
Not-where fusion (#7182)
* Not-where fusion

* Change to rewrite rule

* Add to inference transforms

* Support multiple where consumers

* review comments
2021-04-06 16:12:26 -07:00
Yufeng Li
790fc11e60
QDQ: type conversion and more ops support (#7243)
* QDQ: add int8_t to uint8_t conversion and Relu/AveragePool support
2021-04-06 15:30:31 -07:00
raviskolli
5d759e182b
Allocate external Rocm allocator via PyBind (#7148)
* Enabled rocm support for graph transformations

* Support for external Hip allocator

* Added const_cast to reinterpret_cast to fix compiler issue

* Another crack at fixing the compile error

* More compilation fixes

* Added compilation flags to load_inline extension

* Added ROCM, ROCM_PINNED constants

* Changes to address PR comments

* Changed gpu identifier from ROCM to CUDA

* Added HIP compilation flag for torch inline functions

* Fixed a typo in header allocator string formatting

* Fix for runtime error with external_cuda_allocator

* Removed cuda/rocm specific code paths for allocators

* More name changes to generic gpu from rocm/cuda

* Removed duplicate allocator creation

* Rename cuda_external_ config options as gpu_external_

* Rename hip_mem_limit to gpu_mem_limit

* Rename cuda_mem_limit to gpu_mem_limit
2021-04-06 15:23:51 -07:00
Derek Murray
6308e709cc
Update opset for other training graphs to 12. (#7259)
Co-authored-by: Derek Murray <demurra@microsoft.com>
2021-04-06 13:02:59 -07:00
G. Ramalingam
a9ff4c29e5
Add function body to GeluGrad schema (#7190)
* Add GeluGrad function definition

* complete gelugrad function definition

* add opset to function definition
2021-04-06 12:40:59 -07:00
Zhang Lei
dbcfc4bee6
Add mlas_bench tools. Starting with sconv bench and sgemm bench. (#7139)
* Add mlas_bench tools. Starting with sconv bench and sgemm bench.

* Some build-related updates.
2021-04-06 10:30:18 -07:00
ashari4
56b22c1c6b
Fix assert that the tensor's device type is 'cpu' #7248 2021-04-06 09:08:32 -07:00
ashbhandare
e9ffcfa247
Add cuda kernels for GreaterOrEqual, LessOrEqual, Where; modify Clip to avoid memcpy (#7187)
* Where and Clip cuda kernel support

* GreaterOrEqual and LessOrEqual cuda kernels

* Clip input GPU mem

* review comments

* Add CPU kernel as well

* review comment

* Add kernel def hash for new op kernels

* Fix CI
2021-04-06 09:04:38 -07:00
Derek Murray
c85657cfd7
Update test_training_model.onnx to opset 12. (#7251)
Co-authored-by: Derek Murray <demurra@microsoft.com>
2021-04-06 07:49:58 -07:00
Tracy Sharpe
a9dbb511fb
MLAS: fix qgemm bus error with Android + ARM32 (#7250) 2021-04-05 22:46:04 -07:00
Olivia Jain
fb40602ea2
Mem trt (#6868)
* adding trt comparison and memory consumption

* creating separate docker file
2021-04-05 22:16:12 -07:00
Changming Sun
2fcd69d644
Cleanup build.py (#7245) 2021-04-05 18:49:29 -07:00
Changming Sun
5bd192c439
Update ContribOperators.md (#7246) 2021-04-05 17:11:33 -07:00
Pranav Prakash
3b16afc0db
Make dW optional for convgrad (#7083) 2021-04-05 17:05:20 -07:00
Guoyu Wang
c5973fbbac
Update the build script for Android AAR package (#7229)
* Update the build script for Android AAR package

* Address CR comments
2021-04-05 16:37:22 -07:00
Suffian Khan
9f14af9809
Add BERT-L perf regression test on MI100 and re-enable batch size test (#7240)
* restore bs test and add perf test

* update perf number and fix path to results
2021-04-05 15:51:52 -07:00
Ryan Lai
10102c09b6
Add better model test error messaging (#7239) 2021-04-05 14:59:19 -07:00
Ashwini Khade
e7c5dcd572
Fix Zip-Nuget-Java Packaging Pipeline (#7208)
* Ignore test failures due to opset support

* skip identity sequence test

* plus fixes
2021-04-05 10:58:13 -07:00
Chun-Wei Chen
3ee9b0ec4d
Add detailed assertion error message (#7232) 2021-04-05 10:05:40 -07:00
Marek Šuppa
008065aab1
Update README.md (#7043)
* Fix the precision type (switch from nonexistent `int32` to `fp32`).
2021-04-05 10:03:14 -07:00
ashbhandare
2b8513539e
Div mul fusion (#7183)
* Div mul fusion

* Change to rewrite rule

* Add to inference transformers
2021-04-05 09:35:30 -07:00
Weixing Zhang
74ee24cf7f
rename cuda_mem_limit and hip_mem_limit to gpu_mem_limit for both CUDA EP and ROCm EP (#7226)
With this change, differentiating CUDA EP and ROCm EP is not needed in training script when mem_limit option needs to be set.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-04-05 09:04:04 -07:00
baijumeswani
68b12a6179
Support for saving and loading pytorch compatible state dictionaries (#7220)
* Override methods on torch.nn.Module to get direct access to the methods on the original module.
2021-04-05 03:40:41 -07:00
Yufeng Li
8d737f9770
handle optional input in quant topo sort (#7223) 2021-04-02 20:42:48 -07:00
Weixing Zhang
59b57d8322
HSA_NO_SCRATCH_RECLAIM and RCCL_ALLTOALL_KERNEL_DISABLE are not needed for ROCm 4.1 (#7224)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-04-02 18:19:11 -07:00
Ahmad Zakaria
ba5f056b09
move trt_profile to TensorrtFuncState and reuse it (#7195)
use unordered_set instead of unordered_map to keep track of dynamic shape tensors with shape updates

fix: insert input_name in the set of input_names

move trt_profile to TensorrtFuncState and reuse it
2021-04-02 17:09:03 -07:00
Weixing Zhang
ef88dc912c
enable more unit tests for ROCM EP (#7222) 2021-04-02 15:57:08 -07:00
Guoyu Wang
afbbeaa30a
[NNAPI/CoreML EP] Add Onnx opset 14 support (#7211)
* Add opset 14 support for nnapi/coreml ep

* Address CR comments
2021-04-02 13:18:47 -07:00
Sherlock
a98c2ebb8c
Enable saving optimized models in OrtModule (#7214)
* Enable saving optimized models in OrtModule

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-02 12:37:05 -07:00
RandySheriffH
ebde320950
Add cupti path for python gpu packaging pipeline (#7200)
* add cupti dll path for py3.8

* correct path

* add prints

* replace path join

* add all path

* restore pipeline

* format

* expand path only for python 38&39

* add all cupti path

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-04-02 12:12:46 -07:00
Weixing Zhang
2d352056cf
Support SkipLayerNorm for ROCm EP (#7210)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-04-02 09:03:30 -07:00
Weixing Zhang
a3f17c8b0d
update lamb and GatherGrad kernel for ROCm EP (#7184)
With ROCm4.1, the CUDA implementation of Lamb and GatherGrad can be
utilized for ROCm EP.
2021-04-02 09:02:49 -07:00
Weixing Zhang
17f91ff410
remove un-needed header file. (#7193)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-04-01 21:05:58 -07:00
Ryan Hill
5a6d477625
Make IDataTransfer be directly shared with shared providers (#7215) 2021-04-01 20:39:16 -07:00
Edward Chen
0ebeaf529d
Check kernel def hashes (#7120)
Add unit test for verifying kernel def hashes.
Add way to add new types to kernel definition without changing hash.
2021-04-01 17:42:58 -07:00
ashbhandare
15c67ddbf0
Make output 1 of ConcatTraining Optional and place on CPU (#7199)
* Optional input 1 on CPU ConcatTraining

* Rename output_1
2021-04-01 16:05:17 -07:00
Jesse Benson
4543459984
MIOpen supports MIOPEN_REDUCE_TENSOR_AVG now. 2021-04-01 16:00:34 -07:00
Yufeng Li
34a8b22186
disable prepacking in training (#7201)
* disable prepacking in training
2021-04-01 14:03:47 -07:00
sfatimar
52bcef4d4f
Openvino ep 2021.3 (#7180)
* Integrate openvino-ep-2021.3

* operators type

* changed the myriad as it is case sensitive

* logging information for openvino-ep-2021.3

* Unit test fix

* Resize operator added for myriad

* Fixed python tests for CPU and GPU

* data commit for loop tile and gatherelements failure

* adding checks for Where

* fixing gatherelements and loop tests

* disabling instance normalization test for now as there seems to be a
myriad bug, putting loop in ops supported only because all the tests
fail

* gather elements op test taking care of warning message

* condition needs to be an initializer

* Disabled python test for Myriad

* Disable compilation warning for MSVC windows compiler

* softmax_test, threedimaxis0 and 1 test give accuracy mismatch
tensoroptest disables test gives accuracy mismatch
gather test gives accuracy mismatch

* Updated with ov version 2021.3

* Updated with ov version 2021.3

* Updated README

* Disabling python tests for cpu

* Disabling python tests with accuracy mismatch on cpu

* Added fix for Linux CI Pipeline failure

-> Disabled tests that were throwing segfault

Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
2021-04-01 11:28:54 -07:00
baijumeswani
249a2c14ef
Pin version of pytorch to 1.8.1 for ORTModule CI pipeline (#7167)
* Pin version of pytorch to 1.8.1 for ORTModule CI pipeline
* Use pytorch-lightning stable version 1.2.5
* Revert to cuda 10.1
2021-04-01 09:37:47 -07:00
George Wu
fc6ac5bfac
dnnl fixes (#7202) 2021-04-01 07:34:18 -07:00
Scott McKay
329fd03bb4
Add int32_t as required type to some operators (#7192)
* Updates to some operators to always support int32 and int64 based on testing of Android package build config with a minimal build.

If an operator can be used for shape manipulation (int64) it is frequently used for indices manipulation (int32), so we enable both types for that set of ops.
  - e.g. BERT models take indices as input
  - Scatter/Gather ops utilize indices

Misc. fix to python bindings to exclude call that fails in a minimal build.
2021-04-01 19:32:34 +10:00
Edward Chen
04679e31ab
Specify CUDA compute capability 7.5 in Linux GPU build (#7203)
Recently a build agent pool was changed to use T4 GPUs (CUDA compute capability 7.5). Updating some CUDA build options to accommodate that.
2021-03-31 18:51:44 -07:00
Hariharan Seshadri
0e0dd50e39
Support int32 type for TopK CPU op (#7089) 2021-03-31 18:08:21 -07:00
Xavier Dupré
b370ddbf5e
Removes unnecessary transpose in operator Einsum (#7141)
* remove one unnecessary transpose
* add more unit test
2021-03-31 09:59:08 +02:00
Guoyu Wang
d500c5952b
Add Android AAR packaging script for ORT-Mobile (#7138)
* Add Android aar packaging script for ORT-Mobile

* Address CR comments
2021-03-30 18:42:18 -07:00
Yulong Wang
0fdef1bf47
[Node.js binding] upgrade y18n to v4.0.1 (#7185) 2021-03-30 16:09:04 -07:00
Negin Raoof
45cb0cae8c
Adding TorchEmbedding contrib op (#7136)
* Adding TorchEmbedding contrib op

* Update contrib_defs.cc

* Shape fix

* Update shape_inference_test_helper.h

* Fix typo

* Fix test

* Fix for test code

* Merge

* Fix CI

* Fix for CI

* Fix CI no-contrib
2021-03-30 14:33:25 -07:00
liqunfu
e545604499
. (#7165) 2021-03-30 13:58:30 -07:00
RandySheriffH
d880578537
Exclude cpuid.h from Mac non x86 arch (#7166)
* add ifdef to exclude inclusion from non x86 arch

* exclude calling of __cpuid_count

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-03-30 11:50:42 -07:00