Commit graph

5716 commits

Author SHA1 Message Date
Sherlock
3ed8ade675
Use SafeInt for malloc related computation (#9503)
* Use SafeInt for malloc related computation
2021-10-22 16:42:12 -07:00
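The idea behind 3ed8ade675 can be sketched in Python, under stated assumptions. SafeInt is a C++ library for overflow-checked integer arithmetic; Python integers never overflow, so the sketch emulates a 64-bit size_t limit. `SIZE_MAX`, `checked_mul`, and `alloc_size` are hypothetical names chosen for illustration and are not ONNX Runtime APIs.

```python
SIZE_MAX = 2**64 - 1  # assumption: a 64-bit size_t


def checked_mul(a: int, b: int) -> int:
    """Multiply two non-negative sizes, raising instead of silently wrapping."""
    if a < 0 or b < 0:
        raise ValueError("sizes must be non-negative")
    result = a * b
    if result > SIZE_MAX:
        raise OverflowError(f"{a} * {b} does not fit in size_t")
    return result


def alloc_size(num_elements: int, element_size: int) -> int:
    """Byte count for an allocation, computed with the overflow check."""
    return checked_mul(num_elements, element_size)
```

In C++ an unchecked multiplication would wrap around and request a buffer that is too small; the checked version turns that case into an error before the allocation happens.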
Wei-Sheng Chin
beddbdec5a
Fix PythonOp exporter (#9318)
Register PythonOp exporter with the right symbol.
2021-10-22 10:45:45 -07:00
stevenlix
5adf175847
pad shape 0 is not allowed in edge mode to comply with latest numpy (#9488) 2021-10-22 10:42:51 -07:00
Wei-Sheng Chin
d2d480a0db
Allow None As Autograd Context (#9315)
* Allow none ctx

* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_autograd.py

Co-authored-by: pengwa <pengwa@microsoft.com>

* Address a comment

Co-authored-by: pengwa <pengwa@microsoft.com>
2021-10-21 20:37:36 -07:00
Guoyu Wang
b64b2d48f3
Move iOS e2e test to XCUITest (#9422)
* Move iOS test to user UITest

* minor update

* Update readme

* update test's ios deployment target

* address cr comments
2021-10-21 18:51:13 -07:00
Ginés Hidalgo
7f2f56633c
Fixed implicit conversion warnings (#9481) 2021-10-21 16:13:28 -07:00
Stella Stamenova
49b66c7486
NFC: Normalize whitespace around if statements in CMakeLists.txt (#9464)
Always add a space after if to make the file consistent
2021-10-21 15:35:58 -07:00
Jeff Daily
ca7116ca3e
CUDA EP's ResizeImpl now uses functors, hipify for ROCm EP (#9466)
Support for device function pointers is not yet available for ROCm.
Instead, the device function pointers were converted to device functors.
Case statements, lambdas, and macros are used for dispatch; as a result,
all combinations of kernels are compiled with inlined functors. The
basis of this approach can be found in PyTorch.

Lastly, hipify and register Resize and Upsample for ROCm EP.
2021-10-21 15:02:41 -07:00
Jeff Daily
66ceb6926d
rehipify ROCm EP files under orttraining (#9443)
* rehipify rocm ep files under orttraining committed to source control

* fix flake8 error
2021-10-21 13:36:21 -07:00
Sherlock
ff23b9ff55
Avoid cudaStreamSync at the end of Forward/Backward (#9470)
* Skip cudaStreamSynchronize at the end of fw 

* skip sync stream for end of backward
2021-10-21 11:28:25 -07:00
Xavier Dupré
5797bd6db3
Remove one unnecessary deepcopy in unflatten_user_output (#9353)
* Removes one unnecessary deepcopy
2021-10-21 10:44:27 +02:00
Sunghoon
4028e51e7e
Update the compatibility of ONNX Runtime Web (#9444) 2021-10-20 18:03:12 -07:00
George Nash
1249c7c29e
Resolve issue when running Yolov4 on DNNL EP (#9355)
The dnnl_binary ops need the memory format to match the format expected by
Onnxruntime. If the memory formats of the inputs do not match each other,
there will be an error in the calculated results.

Additionally, since the code manually pads the tensor dimensions for broadcasting,
the inputs are expected to be in Onnxruntime's format.

Since detecting and reordering the memory to Ort format matches what was previously
done for the Reshape op, the code was moved from dnnl_reshape to
dnnl_subgraph_primitive under the name GetMemoryInOrtFormat.

One small additional change was made to the capability code log so that it also prints the
percentage of nodes run by the dnnl execution provider.

Signed-off-by: George Nash <george.nash@intel.com>
2021-10-20 13:10:31 -07:00
Stella Stamenova
9fc53df33a
Only add aliasing to targets if the corresponding package was found (#9404) 2021-10-20 11:32:08 -07:00
Nick Kreeger
f1123c2fb3
Fix whitespace and style in concat.cc (#9452) 2021-10-20 12:43:46 -05:00
Jeff Daily
89a22fb641
Add TopK to ROCm EP (#9391)
* Add TopK to ROCm EP

* flake8 fix
2021-10-20 10:39:44 -07:00
Jeff Daily
f8acc6d0e8
Add NonMaxSuppression and RoiAlign to ROCm EP (#9394) 2021-10-20 10:38:45 -07:00
Jeff Daily
c33391329a
Add QuantizeLinear and DequantizeLinear to ROCm EP (#9401) 2021-10-20 10:37:58 -07:00
Changming Sun
406f1629c1
Remove Featurizers code (#9300) 2021-10-20 10:20:35 -07:00
Bowen Bao
e983f37121
Bifurcation detector for aggressive decoding (#9432)
```
Component for aggressive decoding. Find the bifurcation index of predicted tokens, between source tokens,
starting from previous suffix match index, and predicted tokens.
Concat predicted tokens, starting from bifurcation index, to the back
of current tokens. This forms the output tokens.
Detect suffix match index in source tokens, between source tokens and output tokens.
Detection is based on finding the appearances of last n-gram in output tokens
in source tokens.
A match is considered found if source tokens contain a single matching n-gram.
Return the index of the start of the n-gram in source tokens.
No match is found if source tokens contain multiple or zero matching n-grams. Return -1.
```
2021-10-19 19:53:56 -07:00
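The two detection steps described in e983f37121 can be sketched in Python as follows. This is one reading of the commit message, not the operator's actual implementation: the function names, the single fixed `ngram_size` parameter, and the choice to return the first mismatching position as the bifurcation index are all assumptions. The concatenation step is left out because the message does not pin down the exact slice bounds.

```python
def find_bifurcation_index(src_tokens, pred_tokens, prev_suffix_match_idx):
    """Count how many predicted tokens agree with the source tokens,
    comparing from prev_suffix_match_idx onward (assumed semantics)."""
    i = 0
    while (
        i < len(pred_tokens)
        and prev_suffix_match_idx + i < len(src_tokens)
        and pred_tokens[i] == src_tokens[prev_suffix_match_idx + i]
    ):
        i += 1
    return i


def find_suffix_match_index(src_tokens, output_tokens, ngram_size):
    """Look for the last n-gram of output_tokens inside src_tokens.

    Returns the start index of the n-gram in src_tokens when it occurs
    exactly once, and -1 when it occurs zero or multiple times.
    """
    if ngram_size <= 0 or len(output_tokens) < ngram_size:
        return -1
    ngram = list(output_tokens[-ngram_size:])
    starts = [
        i
        for i in range(len(src_tokens) - ngram_size + 1)
        if list(src_tokens[i:i + ngram_size]) == ngram
    ]
    return starts[0] if len(starts) == 1 else -1
```

Requiring a unique occurrence keeps the detector conservative: when the n-gram is ambiguous in the source, it reports no match rather than guessing which occurrence to continue from.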
baijumeswani
20eaed43e5
Ignore all string inputs to ORTModule AB#1310803 (#9344) 2021-10-19 16:34:47 -07:00
Hariharan Seshadri
4698b73725
Fix output shape description of Attention op's schema (#9406) 2021-10-19 15:56:35 -07:00
George Wu
3873885316
add missing atomic include (#9440) 2021-10-19 14:42:50 -07:00
Jeff Daily
52c53e396d
hipify tensor/gather_nd_impl.cu (#9392) 2021-10-19 14:15:49 -07:00
Jeff Daily
a2ba923ac7
hipify fast_divmod.h (#9400) 2021-10-19 12:34:46 -07:00
Jeff Daily
a8e2e8d76a
hipify tensor/transpose.cc and tensor/transpose.h (#9397) 2021-10-19 12:27:36 -07:00
baijumeswani
757bc66720
Set cuda version to be None instead of an empty string (#9435) 2021-10-19 11:10:52 -04:00
Sherlock
e22920d954
Update ORTTraining frontend codeowner (#9427) 2021-10-18 23:56:21 -07:00
Yufeng Li
da3dd398c5
Kernels for QLinearConv with symmetrically quantized filter (#9323)
Add kernels for QLinearConv with a symmetrically quantized filter, e.g., the filter type is int8 and the zero point of the filter is 0. This PR includes kernels for avx2, avxvnni, avx512 and avx512 vnni. Kernels for ARM64 will be added in a following PR.

The kernels use the input buffer directly for pointwise conv, and an indirect buffer for depthwise and non-group conv.

The advantages of these new kernels are:

* no need to compute the sum of each pixel output image, and the sum/offset of the filter can be combined with the bias.
* with the indirect buffer, im2col returns an array of buffer pointers instead of memcpy'ing the original data. This saves memcpy time and reduces the size of the intermediate buffer needed to hold the im2col transform. In the future, im2col will be computed ahead of time for inputs with a fixed size.
2021-10-18 19:40:18 -07:00
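The first advantage listed in da3dd398c5 can be illustrated with a small Python sketch. It reflects one interpretation of the commit message: when the filter zero point is 0, the input zero-point correction depends only on the filter, so it can be folded into the bias once, ahead of time. The function names and the scalar dot-product framing are hypothetical, and the real kernels are vectorized.

```python
def conv_dot_naive(x, w, x_zero_point, bias):
    """Reference: subtract the input zero point from every element.
    The filter zero point is assumed to be 0 (symmetric quantization)."""
    return sum((xi - x_zero_point) * wi for xi, wi in zip(x, w)) + bias


def fold_bias(w, x_zero_point, bias):
    """Precompute bias - x_zero_point * sum(w); depends only on the filter."""
    return bias - x_zero_point * sum(w)


def conv_dot_folded(x, w, folded_bias):
    """Per-output work is now a plain dot product plus the folded bias."""
    return sum(xi * wi for xi, wi in zip(x, w)) + folded_bias


x = [130, 12, 255, 0]   # uint8 activations
w = [-3, 7, 1, -128]    # int8 filter, zero point 0
assert conv_dot_naive(x, w, 128, 10) == conv_dot_folded(x, w, fold_bias(w, 128, 10))
```

The identity used is sum((x - z) * w) = sum(x * w) - z * sum(w). With a nonzero filter zero point there would be an extra term involving the sum of the inputs, which has to be computed per output pixel.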
baijumeswani
5da4e07daa
Make FusedAdam mathematically equivalent to Transformers AdamW (#9343) 2021-10-18 16:03:18 -07:00
Yulong Wang
5b65f1cb44
fixes SDL Native Rules warning in Node.js binding CI (#9402) 2021-10-18 13:05:46 -07:00
Jingqiao Fu
f60e603022
Add support for DmlExecutionProvider for transformer profiler tool (#9380)
* fixed a profiler.py bug

* Add dml support for profiler

* Remove commented line

* improve syntax
2021-10-18 12:31:29 -07:00
Ye Wang
0824207c0f
Add Dev Guide to transformer optimizer (#9329)
* a

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md

* Add files via upload

* Update Dev_Guide.md

* Create Dev_Guide.md

* Update Dev_Guide.md

* Update Dev_Guide.md
2021-10-18 12:27:26 -07:00
Changming Sun
6ecb990fae
Update win-ci-pipeline.yml 2021-10-18 10:43:19 -07:00
Tracy Sharpe
b130a7b715
fix MSVC micro benchmark build warnings (#9373) 2021-10-15 11:35:02 -07:00
Guoyu Wang
59dfab59dc
Fix integer overflow for large step for Slice OP (#9376) 2021-10-15 09:42:53 -07:00
Yulong Wang
901c7de918
[js/web] remove webgl from default fallback list (#9374) 2021-10-14 21:46:22 -07:00
pengwa
f05c285a58
Exception when duplicated autograd.Function name detected (#9351)
* Exception when duplicated autograd.Function name detected

* reorder a bit for a little bit better perf

* fix a bug in previous PR :(

* correct the error message a bit
2021-10-15 12:23:13 +08:00
Sunghoon
74eaaad768
[js/web] Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast and clip (#9249)
* Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast, clip

* merge master and update a operators.md

* resolve comment. revise pool and cast kernel implementation.

* skip fusion when clip min and max is not in initializer
2021-10-14 16:29:37 -07:00
Jeff Daily
c8789d3047
[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877)
* re-hipify all rocm EP sources

* fix all other files affected by re-hipify

* add cuda_provider_factory.h to amd_hipify.py

* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration

* Fix ReduceConsts template specialization introduced in #9101.

Fixes the error when building for ROCm 4.3.1:

error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)

* fix flake8 error in amd_hipify.py

* speed up hipify with concurrent.futures

* flake8 fix in amd_hipify.py
2021-10-14 15:15:51 -07:00
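The "speed up hipify with concurrent.futures" step in c8789d3047 suggests running the per-file conversion in parallel. A minimal sketch of that pattern follows. It is not the code in amd_hipify.py: `hipify_text` is a hypothetical stand-in for the real conversion, and a thread pool is used to keep the example self-contained (a process pool may suit CPU-bound work better).

```python
from concurrent.futures import ThreadPoolExecutor


def hipify_text(text: str) -> str:
    """Hypothetical stand-in for the real per-file CUDA-to-HIP rewrite."""
    return text.replace("cuda", "hip")


def hipify_all(sources: dict) -> dict:
    """Convert every file's contents concurrently.

    `sources` maps a file name to its text; the result maps the same
    names to converted text. Executor.map preserves input order.
    """
    names = list(sources)
    with ThreadPoolExecutor() as pool:
        converted = list(pool.map(hipify_text, (sources[n] for n in names)))
    return dict(zip(names, converted))
```

Because each file is converted independently, the work splits cleanly across workers with no shared state.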
Abhishek Jindal
87e726d1a0
Abjindal/merge eager with external custom ops (#8986)
* switching to pytorch nightly build

* adding eager mode

* enable pybind and remove install step

* removing auditwheel repair process

* installing package

* adding auditwheel back

* disabling auditwheel repair for eager mode

* typo correction
2021-10-14 13:19:45 -07:00
Abhishek Jindal
23700a15a0
Abjindal/eager windows build (#9326)
* removing warnings which are causing errors from torch and changing flags for Windows

* adding MKL library resolution and comments

* cleaning up the code

* fixing onnxruntime_python file for windows build

* fix the include order to avoid the python_d.lib issue on win debug build

* changes for warnings, typos and other comments

* merge conflict

* adding fix for mkl library error

* Revert "adding fix for mkl library error"

This reverts commit 73b87c73c2.

* fix for dll path for windows

* typo for dll path

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-10-14 12:54:49 -07:00
Jeff Daily
3e879aab6b
work around ucx in rocm ci Dockerfile (#9360) 2021-10-14 09:49:31 -07:00
Xavier Dupré
11f0081c1e
Remove tensorflow, tf2onnx from the list of dependencies for the documentation (#9221)
* Remove tensorflow, tf2onnx from the list of dependencies for the documentation
* improve documentation
* update API
2021-10-14 18:07:35 +02:00
Xavier Dupré
22e3f8bf54
Refactor TrainingManager.forward (#9354)
* Refactor TrainingManager.forward
2021-10-14 12:54:31 +02:00
sumitsays
851554536c
[DML EP] ConstantsOfShape - Empty Output and EinSum - Optional Parameter (#9361)
* Added null check before filling tensor with a value. Passing optional parameter for EinSum in case of MatMul type

* Addressed comment on the PR

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2021-10-13 23:37:10 -07:00
pengwa
5ee47e3ffa
legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184)
* megatron-lm FP16_Optimizer Wrap, allow model parallelism aggregation optional

* add deepspeed zero1 and zero2 - checkoverflow & clip norm

* re-structure code and add the copyright

* update the document

* refine the code after validation
2021-10-14 09:01:23 +08:00
Viswanath Boga
4771256be3
fix to avoid quantizing attention with varied q,k,v sizes (#9357)
* fix to avoid quantizing attention with varied q,k,v sizes

* updated the changes to address the comments
2021-10-13 16:25:34 -07:00
Chandru Ramakrishnan
ba0cca96f0
Hooked up eager logging to ORT default logger. (#9340)
* Hooked up eager logging to ORT default logger.
2021-10-13 18:10:32 -04:00
groenenboomj
905fe36599
Add Conv and ConvTrans to ROCm EP (#9338)
Added support for Conv and ConvTrans operators
in the ROCm execution provider. Doubles are not
currently supported.
2021-10-13 14:18:08 -07:00