Commit graph

6493 commits

Author SHA1 Message Date
Changming Sun
2d2eebb844 Correct a comment
"WINVER=0x0602" means Windows 8. Source: https://docs.microsoft.com/en-us/cpp/porting/modifying-winver-and-win32-winnt?view=msvc-170
2022-03-11 11:42:41 -08:00
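The WINVER value in the commit above packs the Windows major version in the high byte and the minor version in the low byte, so 0x0602 is version 6.2, which is Windows 8. A minimal sketch of that decoding follows; `decode_winver`, `describe_winver`, and `KNOWN_WINDOWS_VERSIONS` are hypothetical helper names used for illustration and are not part of onnxruntime.

```python
from typing import Tuple

# (major, minor) pairs for a few well-known WINVER / _WIN32_WINNT values.
KNOWN_WINDOWS_VERSIONS = {
    (6, 0): "Windows Vista",
    (6, 1): "Windows 7",
    (6, 2): "Windows 8",
    (6, 3): "Windows 8.1",
    (10, 0): "Windows 10",
}


def decode_winver(winver: int) -> Tuple[int, int]:
    """Split a WINVER value into (major, minor): high byte, then low byte."""
    return (winver >> 8) & 0xFF, winver & 0xFF


def describe_winver(winver: int) -> str:
    """Return the marketing name for a WINVER value, or 'unknown'."""
    return KNOWN_WINDOWS_VERSIONS.get(decode_winver(winver), "unknown")


print(describe_winver(0x0602))  # prints "Windows 8"
```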
Ryan Lai
2e7592ddf8
avoid using LocalFree on FormatMessageW buffer (#10796)
* remove local free

* Remove local free from onnxruntime

* don't allocate

* Change to use constexpr to satisfy CPU build warning
2022-03-11 11:11:40 -08:00
Kotaro Yamamoto
64556888a1
add python binding for RunOptions config entry (#10694) 2022-03-11 08:49:22 -08:00
pengwa
d478a53d43
don't clear grad_fns & add test (#10671) 2022-03-11 14:31:54 +08:00
Edward Chen
1a62306db7
Use separate build directories for full and mobile iOS packages. (#10835) 2022-03-10 19:33:06 -08:00
Chun-Wei Chen
5202efd11e
remove unused six in code and CIs (#10832) 2022-03-10 15:38:44 -08:00
Changming Sun
f87a06cd96
Patch absl so that it doesn't disable important VC++ warnings (#10836)
This PR only makes onnxruntime pass the BinSkim rules.

Here is how I made the patch:

Clone the absl repo with git and check out the version we are using.
Apply our patch file.
Make the modifications.
Regenerate the patch file with "git diff > C:\src\onnxruntime\cmake\patch\xxx.patch".
Submit the change to our repo.
Repeat these steps whenever you need to advance the absl commit or add more changes to it.
2022-03-10 15:35:39 -08:00
Pranav Sharma
97ae44d060
Mark end of version 11 C API. (#10803)
* Mark end of version 11 C API

* Add static_assert
2022-03-10 15:11:02 -08:00
Abhishek Jindal
3ae2bfaefe
Abjindal/torch api change gelu (#10833)
* changing gelu backward op and adding required files

* cleaning up file and adding comments

* version comparison issue
2022-03-10 11:56:30 -08:00
Dmitri Smirnov
1d545dfe87
Address performance issue with abseil flat_hash_table. (#10819)
When the hash table is returned by value in a cross-DLL call, it
still contains all the original entries but cannot find at least
some of them. Reverting to std::unordered_set
pending further investigation.
2022-03-10 09:49:55 -08:00
Hariharan Seshadri
e80ff63274
Fix bug in MemcpyToHost (#10816) 2022-03-10 07:02:27 -08:00
Ryan Hill
9853eaa14f
Detect runtime CUDA JIT and warn the user (#10781)
* Use cudaMalloc vs cudaDeviceSynchronize and show the total time
2022-03-09 19:15:16 -08:00
Changming Sun
cc3a3476ed
Uninstall onnxruntime-training before running local tests (#10827)
* Uninstall onnxruntime-training before running local tests
2022-03-09 18:45:04 -08:00
zhangyaobit
9cbcc93e03
Add micro-benchmarks for Attention and SkipLayerNormalization ops. (#10798)
* Add micro-benchmarks for Attention and SkipLayerNormalization ops.

* Add choices for argument provider and precision.

* Automatically select CUDA or ROCM execution provider.
2022-03-09 18:18:51 -08:00
Abhishek Jindal
1c313f4476
changing gelu backward op and adding required files (#10813)
* changing gelu backward op and adding required files

* cleaning up file and adding comments
2022-03-09 16:54:51 -08:00
Edward Chen
0293e525ea
Make QDQSelectorActionTransformer() is_int8_allowed parameter required. (#10820)
Make QDQSelectorActionTransformer() is_int8_allowed parameter required.
Set it to QDQIsInt8Allowed() in places it was previously set to false.
2022-03-09 16:19:43 -08:00
Changming Sun
cc6bc34c8c
Update protobuf submodule (#10801) 2022-03-09 09:37:58 -08:00
Dmitri Smirnov
58521fb822
Make training CUDA kernels adhere to established code structure patterns (#10735)
Current training optimizer kernels include CPU headers,
  which limits the changes that we can make in the CPU code with a C++14 compiler and
  in other refactoring efforts. Rearrange the kernels according to the established patterns
  and do not include headers that are not needed.
2022-03-09 09:06:45 -08:00
Adam Pocock
4ef81b142d
Making the Java tests faster by optionally disabling ones which require running multiple JVMs. (#10811) 2022-03-08 22:19:37 -08:00
Hariharan Seshadri
ae97ecf05b
Fix CPU, CUDA Selu activation logic (#10771) 2022-03-08 19:53:27 -08:00
Edward Chen
c147c9dda6
Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD. (#10778)
Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD as it is now implied by ORT_EXTENDED_MINIMAL_BUILD.
Remove related CMake option.
2022-03-08 16:18:49 -08:00
George Wu
769aa8363d
update onnx-tensorrt to bring in https://github.com/onnx/onnx-tensorrt/pull/812 (#10810) 2022-03-08 14:51:07 -08:00
Jingqiao Fu
f4fd67cc2c
Revert "add load from buffer (#10162)" (#10590)
This reverts commit 5cd57bb726.
2022-03-08 13:35:23 -08:00
dependabot[bot]
7e04dccca7
Bump numpy in /tools/ci_build/github/linux/docker/scripts (#10385)
Bumps [numpy](https://github.com/numpy/numpy) from 1.16.6 to 1.21.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt)
- [Commits](https://github.com/numpy/numpy/compare/v1.16.6...v1.21.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-08 11:02:36 -08:00
Sunghoon
68c8f5a1ef
Change a pipeline vmImage from windows-latest to windows-2019 (#10804) 2022-03-08 10:49:59 -08:00
Yufeng Li
33c6819196
add qdq support of Sigmoid (#10800) 2022-03-08 10:29:15 -08:00
Changming Sun
6260733533
Fix eager mode pipeline (#10802)
It was still using python 3.6
2022-03-08 09:26:20 -08:00
Hariharan Seshadri
a9d9c6b486
Register CPU, CUDA and ROCM opset-16 kernels for some operators (#10643) 2022-03-08 09:18:39 -08:00
Changming Sun
ce07dc30fd
Change how we apply patches to absl (#10799) 2022-03-08 02:03:06 -08:00
George Wu
1e4a4bfe58
update onnx-tensorrt reference. (#10795) 2022-03-07 21:45:46 -08:00
liqun Fu
da885a72e8
update with onnx 1.11 release (#10441) 2022-03-07 21:10:55 -08:00
Yulong Wang
80917342b7
[js] upgrade mocha@8.2.1 to 9.2.1 (#10793) 2022-03-07 20:40:24 -08:00
dependabot[bot]
4d943c9bd3
Bump numpy from 1.16.6 to 1.21.0 in /tools/ci_build/github/linux/docker/scripts/manylinux (#10387)
* Bump numpy in /tools/ci_build/github/linux/docker/scripts/manylinux
2022-03-07 20:39:49 -08:00
PeixuanZuo
c07a27a008
[FIX] delete python3.6 from AMD python package docker image builder (#10790)
* [UPDATE] delete python3.6 to be compatible with numpy==1.21.0

* [UPDATE] delete python3.6 to be compatible with numpy==1.21.0
2022-03-07 18:21:43 -08:00
Vincent Wang
4a38f9e31d
enable strided tensor for training only (#10748) 2022-03-08 08:31:28 +08:00
zhangyaobit
b7f00b9682
Refactor the common code per operator into an abstract base class. (#10785) 2022-03-07 13:15:49 -08:00
Daigo HIROOKA
a08036da09
correct symbolic name of GridSample operation (#10782)
Function name needs to match PyTorch ATen op name, which is `aten::grid_sampler`.
2022-03-07 12:49:12 -08:00
dependabot[bot]
3e54f94bb0 Bump karma from 6.3.14 to 6.3.16 in /js/web
Bumps [karma](https://github.com/karma-runner/karma) from 6.3.14 to 6.3.16.
- [Release notes](https://github.com/karma-runner/karma/releases)
- [Changelog](https://github.com/karma-runner/karma/blob/master/CHANGELOG.md)
- [Commits](https://github.com/karma-runner/karma/compare/v6.3.14...v6.3.16)

---
updated-dependencies:
- dependency-name: karma
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-07 11:47:23 -08:00
Yulong Wang
25fdcfbd14
[js/web] allow multiple inference session creating concurrently (#10784)
* test case

* bugfix

* fix

* support multi session init
2022-03-07 11:35:06 -08:00
RandySheriffH
a4b5fa334a
Add type and shape information to profiled numbers (#10773)
* add func to collect type shape

* reformat

* refactor perf view

* remove obsolete
2022-03-07 10:17:58 -08:00
Changming Sun
d8bf9a479b
Remove python 3.6 from training pipelines (#10780)
The numpy version we use doesn't support Python 3.6, and the inference pipelines have already removed Python 3.6.
2022-03-07 09:57:24 -08:00
Hariharan Seshadri
9d30262422
Fix AMD training pipeline (#10788) 2022-03-07 08:53:08 -08:00
Chen Fu
50a6f095cd
Symmetric QGEMM kernel for ARMv8 A55 chip (#10754)
The ARM A55 micro-architecture (with dot product instructions), like the A53, is widely used for the little cores in big.LITTLE configurations. The A55 has narrower memory load/store hardware: a 128-bit load instruction blocks the pipeline for 2 whole cycles, during which no other instructions can execute. A 64-bit load instruction, on the other hand, can be dual-issued with many other instructions.

This change adds a Symmetric QGEMM kernel for the A55 micro-architecture, where we replace

ldr q4,[x1],#16

with

ldr d4,[x1],#8
ldr x11,[x1],#8
ins v4.d[1],x11

so that we can try to hide the memory load cycles behind computing cycles in the kernel.

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-03-07 08:41:13 -08:00
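The instruction replacement described in the A55 commit above relies on the fact that two 64-bit loads plus a lane insert produce the same 128-bit register value as a single 128-bit load. The sketch below models only that equivalence on a byte buffer, assuming little-endian memory; it says nothing about timing or instruction issue, and the function names are hypothetical.

```python
def load_q(mem: bytes, offset: int) -> int:
    """Model `ldr q4,[x1],#16`: one 128-bit little-endian load."""
    return int.from_bytes(mem[offset:offset + 16], "little")


def load_d_then_insert(mem: bytes, offset: int) -> int:
    """Model the three-instruction replacement sequence."""
    # ldr d4,[x1],#8   -> lower 64 bits of v4 (upper 64 bits are zeroed)
    low = int.from_bytes(mem[offset:offset + 8], "little")
    # ldr x11,[x1],#8  -> next 64 bits into a general-purpose register
    high = int.from_bytes(mem[offset + 8:offset + 16], "little")
    # ins v4.d[1],x11  -> insert x11 into the upper 64-bit lane of v4
    return low | (high << 64)


buffer = bytes(range(32))
assert load_q(buffer, 0) == load_d_then_insert(buffer, 0)
assert load_q(buffer, 16) == load_d_then_insert(buffer, 16)
```

Both paths also advance the pointer by 16 bytes in total (one post-increment of 16 versus two of 8), so the surrounding loop structure does not need to change.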
PeixuanZuo
55af7a96a7
update the amd ci pipeline (#10723)
* [TEST] test to get amd pipeline information

* [FIX] lower the threshold

* [UPDATE] add retry task

* [UPDATE] add retry task

* [ERROR] error to occur retry

* [FIX] error

* [UPDATE] update retryCountOnTaskFailure to 1 time

* [UPDATE] add showmeminfo
2022-03-07 18:39:42 +08:00
Fei Hu
60acfd3dd8
Support CUDA Graph in the CUDA EP (#9978) 2022-03-06 20:47:31 -08:00
Tianlei Wu
0e335aba37
Update BeamSearch operator spec to support t5 (#10777)
* change BeamSearch op to support encoder decoder model

* check model_type and decoder attribute

* fix

* update comments

* warn shape inference issue with onnx v1.11 or T5

* skip parity test when temperature != 1.0

* fix build
2022-03-04 21:52:45 -08:00
George Nash
6be5185088
Update dnnl Add, Mul, Sub, Div ops to handle scalar values (#10756)
* Update dnnl Add, Mul, Sub, Div ops to handle scalar values

Signed-off-by: George Nash <george.nash@intel.com>

* Add additional scalar support for dnnl execution provider

This will add scalar support for:
Eltwise operators: Abs, Elu, Exp, LeakyRelu, Log, Relu, Round,
                   Sigmoid, Softplus, Sqrt, and Tanh
Gelu operators: BiasGelu, FastGelu, and Gelu
Softmax operator

Signed-off-by: George Nash <george.nash@intel.com>
2022-03-04 19:28:25 -08:00
Ye Wang
259ade2557
Add ability to modify num_hidden_layers from benchmark script (#10760)
* add ability to modify num_hidden_layers from benchmark script

* comment

* Revert "comment"

This reverts commit 28794b0e4f86506dcc937738894fcef97fc84e48.

* Revert "add ability to modify num_hidden_layers from benchmark script"

This reverts commit 96f36ed7f751721bcf4e3ab8748a715f19a4e044.

* review comments

Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-03-04 18:28:51 -08:00
Ella Charlaix
fde847473b
Add min max moving average calibration method (#10753)
* Add min max moving average calibration method

* Modify the calibration extra options dictionary creation
2022-03-04 14:55:31 -08:00
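The calibration commit above names a min max moving average method without spelling out the update rule. One common form is an exponential moving average of the per-batch minimum and maximum, sketched below. This is an illustration under assumptions, not the code from the PR: the function name, the `averaging_constant` parameter, and its default value are all hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple


def moving_average_min_max(
    batches: Iterable[Sequence[float]],
    averaging_constant: float = 0.01,  # assumed smoothing factor
) -> Optional[Tuple[float, float]]:
    """Return a smoothed (min, max) range over a stream of batches.

    The first batch initializes the range; each later batch moves the
    range toward that batch's own min and max by `averaging_constant`.
    Returns None if no batches are given.
    """
    smoothed: Optional[Tuple[float, float]] = None
    for batch in batches:
        lo, hi = min(batch), max(batch)
        if smoothed is None:
            smoothed = (lo, hi)
        else:
            smoothed = (
                smoothed[0] + averaging_constant * (lo - smoothed[0]),
                smoothed[1] + averaging_constant * (hi - smoothed[1]),
            )
    return smoothed
```

Compared with taking the global minimum and maximum, a smoothed range is less sensitive to a single outlier batch, which usually matters when the range is later used to pick quantization parameters.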
Maxiwell
43ff27c7c8
ppc64le: optimizing the MlasQuantizeLinear() with VSX (#10644)
This code is valid only when -mcpu is set to utilize POWER9 technology
or above. Compatible code for POWER8 was created as well, but it
was not tuned for performance.
2022-03-04 14:54:56 -08:00