Commit graph

5767 commits

Author SHA1 Message Date
Guoyu Wang
fa4658e8a9
Move to XCode new build system if building on Mac using XCode (#9617)
* Use xcode new build system

* Address cr comments
2021-10-29 18:44:55 -07:00
Guoyu Wang
57491b6f93
Add App Center test for iOS package (#9605)
* Add app center test for iOS package

* fix flake8

* fix yml templates path

* Address CR comments
2021-10-29 15:23:01 -07:00
Hariharan Seshadri
b5f7bb7d10
Update ONNX (#9462) 2021-10-29 10:33:40 -07:00
sumitsays
7744cc1013
[DmlEp] Make DmlEp compatible with Clang for EPIC (#9585)
* Make DmlEp Clang compatible for EPIC

* Fix build issues that occurred when engine/lotus points to ORT Github latest

* Fix more build errors

* Fixed one build issue and removed temporary changes for Clang

* Addressed comments on the PR.

* Style fixes

* Fix unreachable code

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2021-10-29 03:19:35 -07:00
Scott McKay
eb2612b588
Remove netcoreapp2.1 target as it is EOL and out of support. Attempting to use it with VS now causes unit test run failures. (#9603) 2021-10-29 11:11:22 +10:00
Changming Sun
173e538b80 Update mac-ios-packaging-pipeline.yml 2021-10-28 14:25:29 -07:00
Changming Sun
cc73bcc243 Suppress component governance component warnings for ios 2021-10-28 14:25:29 -07:00
Ginés Hidalgo
1731f0080a
Update attention_cpu_base.h to suppress static analysis warning 2021-10-28 13:35:57 -07:00
Xavier Dupré
9c15c68ed4
Enable fallback when forward fails due to non contiguous tensor (#9369) 2021-10-28 13:04:54 -07:00
Tianlei Wu
a01a3f2552
Add more statistics in transformer profiler (#9578)
* add statistics of cuda kernel
* grouping by provider + operator
* add --input to import profiling result
2021-10-28 11:35:03 -07:00
Viswanath Boga
85874bb315
embed layer fusion gpt2 (#9336)
* Changes to fuse embed layer for gpt2, kernel changes pending

* verified add output and regular add match

* Test added for additional output embedlayernorm, working on CUDA

* Test passing on CPU

* updated convert_to_onnx tool to check parity correctly

* removed some debugs

* couple of TODO left as in optimizer.py

* removed changes to optimizer.py

* fixing build

* fixing build

* updated order of initialization

* added a test case for float16

* updating the docs

* updating tests failing due to embed layer fusion

* update unit tests

* updating CUDA documentation in operatorkernels.md

* addressing comments

* OperatorKernels.md updated with CUDA

* adding TODO to qembed_layer

* minor edit

* updated docs

* addressing comments

* adding position ids to embed layer gpt2

* updating fused gpt2 model

* added extra test

* remove comments

* addressing comments

* contrib_defs.cc updated

* all tests passing

* fixing a typo

* minor edit

* trigger build

* qembedlayernorm checkinputs updated

* fixing build error

* fixing build error

* fixing build error
2021-10-28 11:06:26 -07:00
Tianlei Wu
a555740708
Attention fusion: update uint8 tensor parsing for ONNX upgrade (#9564)
* use UnpackTensor to parse uint8 tensor
* address review feedback
2021-10-28 10:38:10 -07:00
Sunghoon
17cf39a964
Clean up unnecessary codes in softmax and hardmax kernel (#9580)
* add p50 in test

* remove unnecessary codes from softmax

* remove unnecessary codes from hardmax

Co-authored-by: Yulong Wang <yulongw@microsoft.com>
2021-10-28 10:01:46 -07:00
TomWildenhain-Microsoft
e8268c9a18
Add Transpose Optimizer and modify nhwc optimizer to use it. (#9284)
* Add Transpose Optimizer and modify nhwc optimizer to use it.

* Fix casts

* Fix casts2

* Fix move

* Add tests

* Add headers

* Fixes and tests

* Remove explicit template instantiation

* Fix build warning

* Name unit tests

* Code review fixes

* Add some comments

* Fix some casts

* Make optimization slightly less aggressive

* Some unit test fixes

* Update Attention pattern to work with transpose optimizer

* Update attention fuser

* Fix attention fusion python script

* Improve transpose optimizer documentation

* Create OptimizerCtx struct

* Disable Slice handler for testing

* Implement Slice int32

* Only push transposes leading up to other transposes

* Improve optimization heuristic

* Add exemption for MaxPool

* Document transpose optimizer api.h

* Revert fusion tests to master

* Remove temp files

* Replace typedef with using

* Trim trailing whitespace

* Move class declarations from api_impl.h to api_impl.cc

* Remove copy constructors and move allocator

* Alphabetize headers

* Add override keyword

* Comments for nhwc_transformer

* Rename OrtGraph to ApiGraph, etc.

* Wrap line

* Remove extra qualifier on ApiGraph

* Refactor attention fusion

* Remove c-style casts from api_impl.cc

* Improve documentation

* Avoid printing vector in ORT_ENSURES

* Revert attention fusion refactor

* Remove duplicate cost heuristics and improve documentation

* Fix size_t casts

* Fixes from Scott's review

* Unrevert attention refactor and more updates from Scott's review

* Revert api_impl.cc ValueInfo change

* only optimize first transpose input

* Unrevert api_impl.cc changes

* Make vector call reserve

* transpose_optimizer.cc update from Scott's comments

* Rename api::Graph to api::GraphRef etc.

* Consider domains 'onnx.ai' and '' equal

* Replace AddInput with SetInput

* Improve tests

* quantization and heuristic tests

* Comments for tests

* Replace const string_view with string_view and update tests

* Fixes requested by Edward

* Fix std::string to string_view conversion

* Add <string> to includes

* Fix bug for broadcasting ops with unknown rank. Slight safety improvements

* Changes requested by Edward

* Fix formatting

* Improve description of cost metric
2021-10-27 22:10:39 -07:00
Changming Sun
87b1fddd97
Add Linux/MacOS ARM64 support to nuget packaging pipeline (#9570) 2021-10-27 19:00:43 -07:00
Ginés Hidalgo
2d44bd525b
DML functions always returning a value (#9485)
* Always return a value
* @fdwr advice added
2021-10-27 15:21:32 -07:00
Scott McKay
a2b3e6bb23
Remove pointless assert. (#9571) 2021-10-28 07:33:40 +10:00
Dmitri Smirnov
4e76360261
Prevent PySparseTensor from being garbage collected if we have an outstanding OrtValue (#9540)
Prevent PySparseTensor from being garbage collected if we have an outstanding OrtValue
  Improve comments.
2021-10-27 11:28:37 -07:00
Changming Sun
aa76520e60
Update macOS build agents to macOS 11 (#9562) 2021-10-27 10:00:04 -07:00
Thiago Crepaldi
5d5c03bcdc
Fix opset version change by not using copy of global constant (#9393) 2021-10-27 12:42:06 -04:00
Scott McKay
b5a652c578
Add Xamarin support (#9436)
Add Xamarin support to the ORT nuget packages.
  - Update C# code to support Xamarin builds for iOS and Android
  - refactor some things to split out common code
  - include iOS and Android ORT native shared library in native nuget package
2021-10-27 20:07:07 +10:00
Ginés Hidalgo
12f216aab5
Bug in DmlOperatorResize.cpp with m_inputDimensions (#9456) 2021-10-27 02:50:54 -07:00
Ginés Hidalgo
9639eded4b
Missing #pragma once in dml_provider_factory.h (#9457) 2021-10-27 02:49:52 -07:00
Ginés Hidalgo
1efdbff1a3
Fixed compiler error in Clang (for Win64) for ExecutionProvider (#9482) 2021-10-27 02:47:22 -07:00
Yi-Hong Lyu
0301f401ee
Cleanup unnecessary opset_version arguments (#9558) 2021-10-27 02:25:54 -07:00
Sunghoon
c79307e7b4
[js/web] support opset-13 of softmax (#9493)
* add p50 in test

* support opset-13 of softmax

* update a operators.md

* resolve comments

* fix lint and format

Co-authored-by: Yulong Wang <yulongw@microsoft.com>
2021-10-26 23:58:50 -07:00
Ginés Hidalgo
d079e0d48f
Fixed Clang (on Windows) compiler error with #pragma's (#9484) 2021-10-26 21:31:45 -07:00
RajalakshmiSR
c54ad0dd0b
POWER: Add Dgemm kernel for POWER processor (#9459)
* POWER: Add Dgemm kernel for POWER processor

This patch adds new dgemm kernel specific to POWER processor.

* POWER: Restrict new functions to VSX in header

* Remove warning check in header

* POWER: Dgemm Adjust indentation

Fixing indentation based on review comments.

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2021-10-26 20:27:24 -07:00
Yulong Wang
90555bf96d
[node.js binding] enable CI for macOS arm64 (#9532)
* nodejs aggr

* add dependency

* no unzip

* fix aggregation

* add arm64 for mac

* mac arm64 build

* fix commandline

* add check for multi-CMAKE_OSX_ARCHITECTURES

* fix
2021-10-26 16:42:19 -07:00
Zhang Lei
c1b0f924b7
quantization tool better support operator when subgraph is enabled (#9463)
* Fix is_valid_quantize_weight recursive issue when enable subgraph.

* some clear
2021-10-26 15:36:19 -07:00
Zhang Lei
33ef1d7700
disable inner parallel for global avg pool as normally they are small (#9487)
* Using cost model's thread count rather than max number of threads when parallel tasks.

* according to perf test result, decrease parallel on channels.

* Seems no use on parallel channels for qavg_pool according to several models, remove it.

* Revert "Using cost model's thread count rather than max number of threads when"

This reverts commit 5fa47cd5b5ddbaa4e5ef97ccbc53200324379544.
2021-10-26 15:35:49 -07:00
Changming Sun
df7a5342a5
Upgrade com.diffplug.spotless to 5.17.0 (#9546) 2021-10-26 14:29:46 -07:00
Changming Sun
f39821adbc
Fix a bug in CMakeLists.txt when handling NO RTTI (#9547) 2021-10-26 14:29:29 -07:00
Jingqiao Fu
da15f5fc2f
change cmake condition to prevent WCOS from linking advapi32 (#9500)
* change condition to prevent WCOS from linking advapi32.dll

* Remove linkage to advapi32.lib
2021-10-26 12:16:49 -07:00
Stella Stamenova
542f1a9737
Cleanup some whitespace and capitalization for set (#9504) 2021-10-26 12:02:07 -07:00
Ginés Hidalgo
a036cc6d4b
Fixing bugs in ORT_NO_EXCEPTIONS (#9479)
ORT_NO_EXCEPTIONS is not working after the latest changes in:

onnxruntime/core/graph/function.cc
onnxruntime/core/graph/graph.cc
2021-10-26 10:50:32 -07:00
Ginés Hidalgo
1aabba7120
Avoided warning C4458: declaration of 'X' hides class member. (#9541) 2021-10-26 10:49:24 -07:00
satyajandhyala
f29057c7c0
Added TanhGrad. (#9507)
* Added TanhGrad.
2021-10-26 09:10:03 -07:00
pengwa
b125446f9c
Optimize python overhead of APEX amp (#9447)
* optimize python overhead of _post_amp_backward

* overwrite apex amp's zero_grad for faster implementation

* move unscale_fp16_grads_into_fp32_grads into C++ impl

* improve the efficiency further, reducing 3.5ms to 1.7ms for unilm.

* unilm 1.7ms to 338us: 1) optimize python list <==> std::vector copy, 2) launch the kernels as long as num_elem reaches the threshold. This helps reduce the CUDA idle time.

* refine the logic a bit after validating

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2021-10-26 13:13:49 +08:00
Yi-Hong Lyu
27ad20df23
Add QDQ support of Resize to be able to fuse it into a quantized Resize (#9476) 2021-10-25 21:48:15 -07:00
ashbhandare
0270ff7951
Minor import fix (#9538) 2021-10-25 21:29:31 -07:00
Changming Sun
f92b8e2ac8
Clean up optional-lite references (#9534) 2021-10-25 21:05:45 -07:00
Yulong Wang
bf4c3fa3d6
[node.js binding] aggregate binaries for multiple platforms in single NPM package (#9501) 2021-10-25 20:16:10 -07:00
Vincent Wang
fb4f7dbbb7
Call ATenOp for ReduceSum on ORTModule (#9471)
* call ATenOp for ReduceSum

* Enable ReduceSum ATenOp for training only

* always load extension
2021-10-26 09:48:57 +08:00
marcusfreisleben
651955d3c9
CUDA: Enable parallel compilation (#8974)
* Pass on parallel option to nvcc

* Fixed build.py

* Added missing string conversion

* Addressed review points
2021-10-25 16:42:58 -07:00
Scott McKay
39d1b9e1c1
Fix bug in Slice helper when dim value is zero (#9492)
* Don't clamp if dim_value is zero as that will set `step` to an invalid value.
2021-10-25 17:39:01 +10:00
Ginés Hidalgo
dbe1b57a71 Update thread_utils.cc 2021-10-22 16:59:09 -07:00
Ginés Hidalgo
a79d375d24 Added fixes for Clang on Win64 2021-10-22 16:59:09 -07:00
Ginés Hidalgo
9335cf102a Deleted duplicated "core/graph/function.h"
"core/graph/function.h" appears twice:
- `include/onnxruntime/core/graph/function.h`
- `onnxruntime/core/graph/function.h` --> This one is redundant and not used anywhere
2021-10-22 16:58:29 -07:00
Stella Stamenova
d608504438
Don't use legacy mode for protobuf (#9498) 2021-10-22 16:50:29 -07:00