Commit graph

5687 commits

Author SHA1 Message Date
baijumeswani
5da4e07daa
Make FusedAdam mathematically equivalent to Transformers AdamW (#9343) 2021-10-18 16:03:18 -07:00
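For reference, the decoupled AdamW update (Loshchilov & Hutter; the form used by Transformers' AdamW) that FusedAdam is being matched to — written here from the standard published formulation, not extracted from the PR itself:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t = \theta_{t-1} - \eta\left(\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon} + \lambda\, \theta_{t-1}\right)
```

The key property is that the weight-decay term \(\lambda\,\theta_{t-1}\) is applied outside the adaptive rescaling, unlike classic Adam-with-L2.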
Yulong Wang
5b65f1cb44
fixes SDL Native Rules warning in Node.js binding CI (#9402) 2021-10-18 13:05:46 -07:00
Jingqiao Fu
f60e603022
Add support for DmlExecutionProvider for transformer profiler tool (#9380)
* fixed a profiler.py bug

* Add dml support for profiler

* Remove commented line

* improve syntax
2021-10-18 12:31:29 -07:00
Ye Wang
0824207c0f
Add Dev Guide to transformer optimizer (#9329)
* a

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Add files via upload

* Update Dev_Guide.md

* Create Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md
2021-10-18 12:27:26 -07:00
Changming Sun
6ecb990fae Update win-ci-pipeline.yml 2021-10-18 10:43:19 -07:00
Tracy Sharpe
b130a7b715
fix MSVC micro benchmark build warnings (#9373) 2021-10-15 11:35:02 -07:00
Guoyu Wang
59dfab59dc
Fix integer overflow for large step for Slice OP (#9376) 2021-10-15 09:42:53 -07:00
Yulong Wang
901c7de918
[js/web] remove webgl from default fallback list (#9374) 2021-10-14 21:46:22 -07:00
pengwa
f05c285a58
Exception when duplicated autograd.Function name detected (#9351)
* Exception when duplicated autograd.Function name detected

* reorder a bit for a little bit better perf

* fix a bug in previous PR :(

* correct the error message a bit
2021-10-15 12:23:13 +08:00
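The commit above raises instead of silently overwriting when two autograd.Function classes register under the same name. A minimal sketch of that check, assuming a simple registry dict — the function and registry names are illustrative, not ORT's actual implementation:

```python
# Hypothetical registry for exported autograd.Function classes.
_registered_functions = {}

def register_custom_function(fn_cls):
    """Register fn_cls by name; raise on a duplicate name from a
    different class, since exported names must stay unique."""
    name = fn_cls.__name__
    existing = _registered_functions.get(name)
    if existing is not None and existing is not fn_cls:
        raise RuntimeError(
            f"Duplicated autograd.Function name detected: '{name}'. "
            "Rename one of the classes so exported names stay unique.")
    _registered_functions[name] = fn_cls
    return fn_cls
```

Re-registering the same class is allowed; only a different class reusing the name triggers the exception.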
Sunghoon
74eaaad768
[js/web] Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast and clip (#9249)
* Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast, clip

* merge master and update operators.md

* resolve comment. revise pool and cast kernel implementation.

* skip fusion when clip min and max are not in an initializer
2021-10-14 16:29:37 -07:00
Jeff Daily
c8789d3047
[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877)
* re-hipify all rocm EP sources

* fix all other files affected by re-hipify

* add cuda_provider_factory.h to amd_hipify.py

* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration

* Fix ReduceConsts template specialization introduced in #9101.

Fixes the error when building for ROCm 4.3.1:

error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)

* fix flake8 error in amd_hipify.py

* speed up hipify with concurrent.futures

* flake8 fix in amd_hipify.py
2021-10-14 15:15:51 -07:00
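One bullet above mentions speeding up hipify with concurrent.futures. A sketch of that idea — running a per-file translation pass concurrently — where `hipify_file` is a stand-in for the real CUDA-to-HIP source translation in amd_hipify.py, not its actual API:

```python
import concurrent.futures

def hipify_file(path):
    # Placeholder for the real CUDA -> HIP source translation.
    return path.replace("cuda", "rocm")

def hipify_all(paths):
    # Translate files concurrently; pool.map preserves input order.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(hipify_file, paths))
```

Because the work is per-file and independent, the futures-based fan-out needs no shared state.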
Abhishek Jindal
87e726d1a0
Abjindal/merge eager with external custom ops (#8986)
* switching to pytorch nightly build

* adding eager mode

* enable pybind and remove install step

* removing auditwheel repair process

* installing package

* adding auditwheel back

* disabling auditwheel repair for eager mode

* typo correction
2021-10-14 13:19:45 -07:00
Abhishek Jindal
23700a15a0
Abjindal/eager windows build (#9326)
* removing warnings which are causing errors from torch and changing flags for Windows

* adding MKL library resolution and comments

* cleaning up the code

* fixing onnxruntime_python file for windows build

* fix the include order to avoid the python_d.lib issue on win debug build

* changes for warnings, typos and other comments

* merge conflict

* adding fix for mkl library error

* Revert "adding fix for mkl library error"

This reverts commit 73b87c73c2.

* fix for dll path for windows

* typo for dll path

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-10-14 12:54:49 -07:00
Jeff Daily
3e879aab6b
work around ucx in rocm ci Dockerfile (#9360) 2021-10-14 09:49:31 -07:00
Xavier Dupré
11f0081c1e
Remove tensorflow, tf2onnx from the list of dependencies for the documentation (#9221)
* Remove tensorflow, tf2onnx from the list of dependencies for the documentation
* improve documentation
* update API
2021-10-14 18:07:35 +02:00
Xavier Dupré
22e3f8bf54
Refactor TrainingManager.forward (#9354)
* Refactor TrainingManager.forward
2021-10-14 12:54:31 +02:00
sumitsays
851554536c
[DML EP] ConstantOfShape - Empty Output and EinSum - Optional Parameter (#9361)
* Added a null check before filling the tensor with a value. Pass the optional parameter for EinSum in the MatMul case

* Addressed comment on the PR

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2021-10-13 23:37:10 -07:00
pengwa
5ee47e3ffa
legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184)
* wrap megatron-lm FP16_Optimizer; make model parallelism aggregation optional

* add deepspeed zero1 and zero2 - checkoverflow & clip norm

* re-structure code and add the copyright

* update the document

* refine the code after validation
2021-10-14 09:01:23 +08:00
Viswanath Boga
4771256be3
fix to avoid quantizing attention with varied q,k,v sizes (#9357)
* fix to avoid quantizing attention with varied q,k,v sizes

* updated the changes to address the comments
2021-10-13 16:25:34 -07:00
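The fix above skips quantization when the attention Q, K and V sizes differ. A hedged sketch of such a guard, with an illustrative helper name and shape tuples (the real check lives in the quantization/fusion tooling and is more involved):

```python
def can_quantize_attention(q_shape, k_shape, v_shape):
    """Only fuse/quantize the attention pattern when the Q, K and V
    weight shapes match; varied sizes break the packed-QKV assumption."""
    return q_shape == k_shape == v_shape
```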
Chandru Ramakrishnan
ba0cca96f0
Hooked up eager logging to ORT default logger. (#9340)
* Hooked up eager logging to ORT default logger.
2021-10-13 18:10:32 -04:00
groenenboomj
905fe36599
Add Conv and ConvTrans to ROCm EP (#9338)
Added support for the Conv and ConvTrans operators
in the ROCm execution provider. Doubles are not
currently supported.
2021-10-13 14:18:08 -07:00
Arthur Meyre
bccd09c688
Serialize model only once to reduce backend preparation overhead (#8270)
* The serialization can be very heavy for large models
* Only use the serialized model check on compatible onnx versions
* onnx version >= 1.10.0 supports serialized model check
Signed-off-by: IceTDrinker <49040125+IceTDrinker@users.noreply.github.com>
2021-10-13 13:58:22 -07:00
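The commit above avoids re-serializing a large model for every compatibility check. A minimal sketch of the serialize-once idea — cache the bytes on first use — with an illustrative wrapper class, not the PR's actual code:

```python
class ModelWrapper:
    """Caches the serialized form of a model so serialization
    happens at most once, however many checks consume it."""

    def __init__(self, model):
        self._model = model
        self._serialized = None  # filled lazily, at most once

    def serialized(self):
        if self._serialized is None:
            self._serialized = self._model.SerializeToString()
        return self._serialized
```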
George Nash
e8ba5145ce
Add Transpose, Reshape, Pow and LeakyRelu ops to DNNL execution provider (#9180)
* Transpose for DNNL EP

Transpose reorders the memory to the right format but leaves the
wrong dimensions and memory::format, so a new memory descriptor is
created that points to the reordered memory. However, that memory
is in a different location than the output expects. An extra
parameter was added to SetMemory to specify that memory must be
copied when it is output from the subgraph.

Signed-off-by: George Nash <george.nash@intel.com>

* Implementation of Reshape op for dnnl ep

Signed-off-by: George Nash <george.nash@intel.com>

* Add Pow op to dnnl execution provider

This Pow is limited: the exponent must be a scalar or a one-dimensional
tensor with only a single element. The exponent must also be a constant
initializer since it is only read when the primitive is created; OneDNN
does not provide any way to change the exponent after the primitive is created.

The GraphViewer is now passed into the NodeCapability code since the GraphViewer
is needed to find out if an input is a constant initializer.

The unit tests for "Pow" did not make the exponent a constant initializer. To
help verify the DNNL execution provider's Pow function, a version of the Pow
unit tests was created for the DNNL execution provider that makes the exponent
a constant initializer.

Signed-off-by: George Nash <george.nash@intel.com>

* Add LeakyRelu to DNNL execution provider

LeakyRelu was added to the dnnl elementwise ops.

In the elementwise op the GetAlpha method was modified
to take the default value for Alpha as a parameter instead
of reading it from a member variable. This felt like it would
be less likely to cause programmer error.

Signed-off-by: George Nash <george.nash@intel.com>

* Switch dnnl_code_capability DataTypes from strings to enums

Signed-off-by: George Nash <george.nash@intel.com>

* Update DnnlSubgraphPrimitive.GetMemory function input

This updates the GetMemory member function to take a DnnlTensor
instead of a string. This was done for two reasons. First, every
call was already made using DnnlTensor.Name() and never with a
saved string, so this reduces code repetition. Second, it makes
the function inputs more closely match the GetMemoryAndReshape
function, leaving fewer differences between member functions.

Signed-off-by: George Nash <george.nash@intel.com>
2021-10-13 10:20:07 -07:00
Yulong Wang
1527af3e30
[js/web] deduplicate test cases between opsets (#9327)
* [js/web] deduplicate test cases between opsets

* fix eslint error
2021-10-12 22:37:19 -07:00
TomWildenhain-Microsoft
fb31701f7e
Fix bug in determining default slice axes (#9328) 2021-10-12 16:17:11 -07:00
Moshe David
510b747821
w (#9319)
Co-authored-by: modav <modav@microsoft.com>
2021-10-12 16:02:40 -07:00
Tang, Cheng
f0bc35c4ba
fix a hardcode type (#9337) 2021-10-12 13:44:46 -07:00
Hariharan Seshadri
d5c5c4fa50
Handle implicit subgraph inputs required on different devices in Memcpy transformer (#9299) 2021-10-12 11:21:17 -07:00
Tang, Cheng
48737091c0
resolve the provider options before creating the training session in orttrainer (#9199)
* resolve the provider options before creating the training session in orttrainer

* Update orttraining/orttraining/python/orttraining_pybind_common.h

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* support clear the training ep instance pool

* fix status error

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-10-12 09:30:45 -07:00
ashbhandare
52c021d1f3
Fix export of aten op for Max and Avg Pool 2D (#9330) 2021-10-12 09:03:14 -07:00
mindest
f9cf62912a
Add same_shape case for BiasDropout (#9188)
* bias dropout improvement

* add transform case for same shape case

* combine kernel

* merge with vectorized kernel

* use "has_same_shape_bias"

* minor: a "N % 4 != 0" case

* add op UT for has_same_shape_bias

* address comments; add param case for 1d bias;
add param case tests for 1d and same-shape bias

* rewrite logic condition

Co-authored-by: Peng Wang <pengwa@microsoft.com>
2021-10-12 19:57:38 +08:00
Sunghoon
2f1204a5d5
[js/web] Enable wasm profiling and preserve function names in profiling (#9314)
* add p50 in test

* allow WebAssembly profiling and preserve function names

Co-authored-by: Yulong Wang <yulongw@microsoft.com>
2021-10-11 22:04:50 -07:00
Ye Wang
787dcb7dbc
Support extra addition before softmax in attention cuda kernel (#9205)
* checkin qk_add in cuda ep

* enable test

* added todo

* review comments
2021-10-11 15:31:31 -07:00
Jiaxu Dong
03276527b3
Fix typing error (#9316) 2021-10-09 14:39:11 -07:00
Edward Chen
79e736ed25
Make onnxruntime::Status nodiscard (#9279)
Mark onnxruntime::Status class with [[nodiscard]] attribute.
Fix existing warnings.
2021-10-08 17:10:31 -07:00
TomWildenhain-Microsoft
da56f01ac2
Fix bug in ReduceSum with noop_with_empty_axes (#9301) 2021-10-08 13:33:24 -07:00
Dmitri Smirnov
7b61bca6df
Fix inclusive sum overflow when applied to an int8_t buffer in Compress (#9295)
Use thrust::transform_iterator when feeding input to cub::DeviceScan::InclusiveScan() to make sure the accumulator type is wide enough not to overflow.
2021-10-08 11:29:28 -07:00
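A host-side illustration of the failure mode the fix above addresses: an inclusive scan over small int8 values quickly wraps unless the accumulator is widened first, which is what the thrust::transform_iterator achieves on the GPU side. The int8 wraparound is emulated here in pure Python; this is a sketch of the concept, not the CUB-based kernel:

```python
def inclusive_scan(values, widen):
    """Inclusive prefix sums; with widen=False, emulate a too-narrow
    int8 accumulator that wraps modulo 256."""
    out, total = [], 0
    for v in values:
        total += v
        if not widen:
            # emulate signed 8-bit wraparound of the running sum
            total = (total + 128) % 256 - 128
        out.append(total)
    return out
```

With four inputs of 100, the widened scan yields 100, 200, 300, 400, while the int8 accumulator wraps after the second element.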
satyajandhyala
29379db432
Added SigmoidGrad schema and kernels. (#9244)
* Added SigmoidGrad schema and kernels.

* Added test_sigmoid_grad function.
2021-10-08 11:03:28 -07:00
Vincent Wang
cd65a8089e
Optimize Variadic Elementwise Ops (#9186)
* optimize variadic elementwise ops

* remove nvvp file

* correct comment

* resolve comments
2021-10-08 13:45:54 +08:00
Hariharan Seshadri
5f5f28bf14
Fix bug in allocation planner while planning location for initializers (#9306) 2021-10-07 19:05:07 -07:00
Tang, Cheng
68601fc296
error handling for eager mode's data transfer (#9261) 2021-10-07 17:16:33 -07:00
Suffian Khan
70cf61fa84
disable bart-l for now (#9305) 2021-10-07 16:55:54 -07:00
Maajid khan
72c4cea9e6
[OpenVINO-EP] V3.2 Release (#9232)
* model caching changes for 2021.4

Signed-off-by: Your Name <you@example.com>

* changed the ov version check

* Minor changes added

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added support for external data format

Starting from the OpenVINO 2021.4 version, OpenVINO-EP
will support onnx models with weights saved in an
external file location.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Introduced Hetero/Multi options for perf_test

Enabled use of the HETERO/MULTI device feature of
OpenVINO-EP from the onnxruntime_perf_test tool.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* cleaned up CMake code for older OV version support

OV 2020.3 is no longer supported by OpenVINO-EP,
so this check is not required now.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add option to disable graph partitioning

Added an option to disable graph partitioning
at build time for OpenVINO-EP.

With this option, when the model is not fully
supported on OpenVINO-EP, the model falls back
entirely to the default CPU EP (MLAS).

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Changed the flag for disabling graph partitioning

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes the flake8 check error

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added changes for disable graph partition option

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed flake8 indentation error

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: Your Name <you@example.com>
2021-10-07 16:02:19 -07:00
ytaous
7166586d7e
Enable SkipCheck by default (#9215)
* Enable SkipCheck by default

* fix UTs

* fix UT

* fix UTs

* fix UTs

* address comments

* fix UT

* enable skipchecks

* move _SkipCheck back

* move _SkipCheck back

* move _SkipCheck back

* Update orttraining/orttraining/python/training/ortmodule/_inference_manager.py

* Update orttraining/orttraining/python/training/ortmodule/_utils.py

Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-10-07 15:47:14 -07:00
Yulong Wang
88d5023885
[js/web] always use new data dir for ort web E2E karma tests (#9303)
* [js/web] always use new data dir for ort web E2E karma tests

* fix
2021-10-07 15:27:12 -07:00
Tang, Cheng
c002dc86a3
set mpi group init flag after add group (#9293) 2021-10-07 10:09:16 -07:00
Changming Sun
4f4875b0e8 Add "workspace: clean: all" to anybuild build yaml file 2021-10-06 22:49:37 -07:00
Gary Miguel
e2b1852eec
Build: respect onnxruntime_PREFER_SYSTEM_LIB for more things (#9181)
This is based on a patch applied locally by
https://github.com/conda-forge/onnxruntime-feedstock. Having this in
master seems useful.
2021-10-06 13:49:28 -07:00
Thiago Crepaldi
52d067402a
Fix all-or-nothing fallback for bad ORTModule init (#9277)
* Fix all-or-nothing fallback for bad ORTModule init

* Address comments
2021-10-06 15:12:27 -04:00
Suffian Khan
510b58c877
Increase AMD CI pipeline timeout to 120 min (#9280)
* increase timeout

* add timeout

* add timeout

* rename
2021-10-06 10:43:09 -07:00