11997 commits

Author SHA1 Message Date
RandySheriffH
aeca7c2940
Cuda Profiler (#7110)
* implement cuda profiler

* add counters

* downgrade cupti kernel version

* move mutex

* add cupti to path

* fix win gpu build err

* add path for cuda10

* fix linux com err

* extend include path

* add init flag

* fix test case

* fix tensorrt pipeline

* add UT

Co-authored-by: Ubuntu <randysheriff@rashuai-linux-gpu-3.3cfnmjowvu4e5bidlsmcxsmzwg.xx.internal.cloudapp.net>
2021-03-29 12:04:36 -07:00
Ashwini Khade
b22e60bd44
pull onnx latest commit (#7102)
* update onnx commit

* fix test scripts to remove deprecated call

* update filters

* add registration for relu and cumsum ver 14

* add promote trilu to onnx domain

* update onnx-tensorrt submodule

* update flag

* update flag

* update dependencies

* fix android ci failure
2021-03-29 11:00:38 -07:00
Scott McKay
9297527b7a
Enable NHWC transformer when generating ORT format model (#7126)
* Allow specific optimizers to be disabled.
  - replace unused ability to specify just the optimizers to run
    - never used so not needed
Allow the disabled list to be specified via the python bindings
  - expected usage is internal, so using kwargs for that so as not to pollute the documentation with stuff no user is likely to need
Update the ORT format model conversion script to disable NCHWc transformer when level is 'all'
  - currently there aren't any known use cases where we'd want the NCHWc transformations to run as they create a device specific model and aren't used on ARM
    - the ORT format model is not expected to be generated on the target device (e.g. generate on Windows/Linux/macOS to deploy to Android/iOS), so there's a good chance we'd generate a useless/invalid model
  - default to 'all' as ARM and MLAS prefer NHWC and the NHWC transformer runs at that level
* Add matching changes to optimizer generation in training code
2021-03-29 18:39:48 +10:00
satyajandhyala
90294b9c43
Fix Transpose and MatMul fusion code to check the input datatypes as … (#7147)
* Fix Transpose and MatMul fusion code to check the input datatypes as FusedMatMul only supports floating point datatypes.

* Added test cases to make sure that the int32/int64 datatypes prevent Transpose-MatMul fusion.
2021-03-28 09:24:12 -07:00
Jeff Daily
65ce5f07b3
add Dockerfile.rocm4.1.pytorch (#7152) 2021-03-26 21:40:10 -07:00
Suffian Khan
f27835c4de
Disable batch size test for AMD CI pipeline after agent upgrade to Rocm 4.1 (#7153)
* disable batch size test for rocm 4.1 until resolved

* Update orttraining-pai-ci-pipeline.yml

Forgot to modify both pipelines
2021-03-26 22:32:39 -05:00
Changming Sun
f365f1d967
Resize_impl.cu: Change _Round to roundf (#7140)
This keeps the change minimal and makes it work exactly as it did before.
2021-03-26 18:29:21 -07:00
Edward Chen
63d9d5afd3
Fix Pad and Gather incorrect usage of HasType helpers. (#7146) 2021-03-26 17:36:31 -07:00
Sherlock
ab86634c36
Address comments from ORTModule master merge (#7101)
* Address ortmodule merge master comments

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-26 16:26:42 -07:00
Adrian Tsai
a8f0ab9c5f Merged PR 5846998: Fix warnings level for DML EP
Apparently ORT has a new, rather unusual way of setting the warning level. This change resets our warning level back to W3 for the DML EP.
2021-03-26 22:55:33 +00:00
Thiago Crepaldi
a01f15198c
Add support for large models (#7113)
* Add support for large models

* Handle models with registered buffers
2021-03-26 14:08:46 -07:00
Suffian Khan
2b31b80b1f
increase timeout (#7145) 2021-03-26 11:26:18 -07:00
Yufeng Li
3771e0bf10
update bert quantization notebook (#7137) 2021-03-25 18:12:53 -07:00
KeDengMS
c9b29fbd06
Disable MatmulTransposeFusion for CPU EP (#7135)
It causes a convergence issue in BERT on CPU.
2021-03-25 17:16:58 -07:00
Dmitri Smirnov
2bf54bcaa2
Fix bugs in sparsify script (#7134)
Fix type and check.
2021-03-25 14:53:52 -07:00
G. Ramalingam
cc0e7bee76
Add function-body to SoftmaxGrad (#6988)
* Add function body to SoftmaxGrad schema

* Add type context and cleanup

* Add test case with symbolic dimensions

* Add opset specification to function

* handle opset dependence

* Exclude from minimal build
2021-03-25 11:34:06 -07:00
Tianlei Wu
53c123dcee
Add session option configuration to enable GeluApproximation (#7131) 2021-03-25 11:32:36 -07:00
Adrian Tsai
39bd192d33 Merged PR 5837692: Merge latest from upstream 2021-03-25 16:21:56 +00:00
Yufeng Li
8e54b76e2d
QDQ implementation (#7033)
* Add QDQ basic implementation
2021-03-25 09:17:23 -07:00
RandySheriffH
865c67611c
Exclude profiler from minimal build (#7115)
* Exclude TP profiler from minimal build

* fix typo

* remove Clock

* fix comments

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-03-25 09:06:14 -07:00
Vincent Wang
fda0470683
Add New AllocKind for YieldOp Outputs, Run YieldOp with InferenceSession in UT (#7125)
* new allockind, add ut

* change macro

* fix win build

* rename alloc kind

* fix mem leak
2021-03-25 15:18:51 +08:00
Sherlock
1c8d874412
Promote BiasDropout from orttraining to onnxruntime (#7116)
* Promote BiasDropout from orttraining to onnxruntime

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-24 20:42:42 -07:00
Adrian Tsai
293774fbeb Merge remote-tracking branch 'upstream/master' into p/adtsai/merge
# Conflicts:
#	onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc
2021-03-24 19:48:01 -07:00
jingyanwangms
cd67f12add
Move IOBinding and RunOptions to ctx (#7028)
* Liqun/ort module perf1 (#6806)

add mysql script to log perf data
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Resolve HTTP Error 503: Service Unavailable for MNIST dataset (#6989)

* Reduce logging for ORTModule for the end user (#6982)

* Support none types in forward output (#7001)

* Missed test case for none type output (#7014)

* save iobinding to ctx

* save run_options to ctx

* remove debug tests

* PR comments and clean up

* add RunStateInfo

* remove whitespace edits

* PR comments

* remove test changes

* fix test failure

* Fix unit test test_nesting_forward_backward_calls

Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-24 17:51:00 -07:00
Changming Sun
2e3bbad19f
Move TensorRT Windows CI build to the machine pool (#7127) 2021-03-24 14:28:25 -07:00
Guoyu Wang
1c04eec2b1
[NNAPI EP] Fix error for QLinearAdd with an initializer as input (#7093)
* Fix the issue where input to qlinearadd is an initializer

* Add UT

* Address CR comments
2021-03-24 11:56:53 -07:00
harshithapv
540eac253e
Deepspeed pipeline parallel and fairscale sharded optimizer test samples with ORTModule (#7078)
* adding samples for Deepspeed pipeline parallel and fairscale sharded optimizer with ortmodule

* fixed typo in args

* addressed Thiago's comments

* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_deepspeed_pipeline_parallel.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-03-24 09:43:05 -07:00
KeDengMS
6987106bf5
Add missing Python dependencies for ORT training (#7104)
* Add missing Python dependencies for training

cerberus - option parsing
h5py - checkpoint
onnx - model proto
packaging/sympy - symbolic shape inference

* Separate requirements.txt for inference and training Python packages.
2021-03-23 18:43:19 -07:00
Yufeng Li
fffe16cb43
Fix a bug in quant GEMM and add a unit test (#7111) 2021-03-23 16:39:35 -07:00
Changming Sun
b07e168a2b
Delete an unused file: download_test_data.py (#7109) 2021-03-23 14:49:26 -07:00
Suffian Khan
5cb8934459
update Dockerfile for workaround for issue in RCCL for rocm4.0 (#7108) 2021-03-23 13:36:04 -07:00
Suffian Khan
c0994fdfbb
Update ORTTrainer to permit Rocm and permit export of opset 13 (#7059)
* update orttrainer to permit rocm and allow export for opset 13

* wrap rocm check in try-except block
2021-03-23 11:09:48 -07:00
Edward Chen
53392664d3
Enable type reduction for Shrink, Sign, SplitToSequence CPU kernels (#7090)
Enable type reduction for Shrink, Sign, SplitToSequence CPU kernels.
Some other type reduction changes including refactoring to specify element types in a single place.
2021-03-23 09:57:33 -07:00
baijumeswani
c3310efdcd
Support for models having partially non trainable parameters (#7058)
* Support for models having partially non trainable parameters
2021-03-23 09:41:16 -07:00
baijumeswani
a7a2a16edd
Pass arguments to azure_scale_set_vm_mount_test_data from perf test ci pipeline (#7094) 2021-03-22 21:48:32 -07:00
Yufeng Li
c965878a69
fix a bug in global average pool and add unit test (#6913)
* fix bug in QGlobalAveragePool

* add unit test for quant GlobalAveragePool

* do not run quantization tests if disable_contrib_ops is enabled
2021-03-22 20:01:27 -07:00
Aaron Boxer
230c137460
cmake: support install target with generated pkg-config file (#7076) 2021-03-22 19:36:31 -07:00
liqunfu
309885b08d
upload ort-gpu-training python nightly package to azure feed (#6998) 2021-03-22 18:44:54 -07:00
Tracy Sharpe
416ee3c4d2
MLAS: add 32-bit transpose support (#7092) 2021-03-22 16:20:31 -07:00
Sherlock
5ec0e71542
ORTModule support non-differentiable module output (#7048)
* Handle non-differentiable module output

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-03-22 15:46:11 -07:00
Changming Sun
be45a59d99
Make our CUDA code compatible with the latest VS2019 update (#7062) 2021-03-22 14:39:45 -07:00
Thiago Crepaldi
df6a68f59c
Fix fallback providers for InferenceSession (#7091) 2021-03-22 13:38:58 -07:00
RandySheriffH
529da3b003
Thread pool profiler (#6748)
* add profiler

* add thread id

* refactoring

* switch to vector

* add override keyword

* fix comments

* renaming

* add revoke time

* restore statics

* restore enable flag

* fix end error

* fix comments

* add comment

* add comments

* make profiler thread-safe

* switch to shared_lock

* switch to shared_timed_mutex

* switch to OrtMutex

* add per child thread counters

* switch to vector

* refactor LogCore

* fix comments

* cancel spin and block counter to reduce overhead

* fix minor format issue

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2021-03-22 10:49:57 -07:00
Thiago Crepaldi
867804bea1
Add auto doc gen for ORTModule API during CI build (#7046)
In addition to ORTModule auto documentation during packaging, this PR also updates golden numbers to fix CI.
2021-03-22 10:20:33 -07:00
Dmitri Smirnov
3b58fc7b97
Add types support for Sparse Initializer in Onnxruntime (#7004)
Add types support for DenseToSparse and SparseToDense conversions
  Address the case of empty sparse values and indices when the initializer does
  not contain any NNZ.
  Add sparsify script.
2021-03-22 10:06:11 -07:00
Olivia Jain
4a3d1176d7
adding ngraph_DIR to fix build (#6975) 2021-03-22 09:43:02 -07:00
Edward Chen
4cbb8e166a
Update kernel def hashing (#7019)
Update the kernel def hashing in ORT format models. The new hashing logic ignores the ordering of type constraint types.
This is a backward compatibility breaking change, but we don't guarantee backward compatibility yet.
2021-03-22 09:28:27 -07:00
Brian Martin
06df28748f
Change tabs to spaces in Windows.AI.MachineLearning.idl (#7088)
Noticed this in a recent PR: this file has some tabs that should be spaces.
2021-03-22 09:23:18 -07:00
raviskolli
79ba045d74
Enabled rocm support for graph transformations (#7057) 2021-03-22 09:02:10 -07:00
Scott McKay
b2c6617b0f
Use 'as_scalar' when checking the 'cond' value of 'If' (#7063)
#6884
2021-03-22 18:04:38 +10:00