Commit graph

6455 commits

Author SHA1 Message Date
Yulong Wang
25fdcfbd14
[js/web] allow multiple inference session creating concurrently (#10784)
* test case

* bugfix

* fix

* support multi session init
2022-03-07 11:35:06 -08:00
RandySheriffH
a4b5fa334a
Add type and shape information to profiled numbers (#10773)
* add func to collect type shape

* reformat

* refactor perf view

* remove obsolete
2022-03-07 10:17:58 -08:00
Changming Sun
d8bf9a479b
Remove python 3.6 from training pipelines (#10780)
The numpy version we use doesn't support Python 3.6, and the inference pipelines have already removed Python 3.6.
2022-03-07 09:57:24 -08:00
Hariharan Seshadri
9d30262422
Fix AMD training pipeline (#10788) 2022-03-07 08:53:08 -08:00
Chen Fu
50a6f095cd
Symmetric QGEMM kernel for ARMv8 A55 chip (#10754)
The ARM A55 micro-architecture (with dot product instructions), like the A53, is widely used for the little cores in big.LITTLE configurations. The A55 has narrower memory load/store hardware: a 128-bit load instruction blocks the pipeline for 2 whole cycles, during which no other instructions can be executed. A 64-bit load instruction, on the other hand, can be dual-issued with many other instructions.

This change adds a Symmetric QGEMM kernel for the A55 micro-architecture, where we replace

ldr q4,[x1],#16

with

ldr d4,[x1],#8
ldr x11,[x1],#8
ins v4.d[1],x11

so that we can try to hide the memory load cycles behind computing cycles in the kernel.

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-03-07 08:41:13 -08:00
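The load substitution described in the A55 commit above can be checked for equivalence with a small model. This is only an illustrative sketch in Python, not the kernel code: it assumes little-endian memory, models the vector register as a list of two 64-bit lanes, and uses the hypothetical helper names `load_128` and `load_2x64`. It shows that the three-instruction sequence leaves the same register contents and the same post-incremented address as the single 128-bit load. It says nothing about cycle counts.

```python
import struct

def load_128(mem, addr):
    # Models `ldr q4,[x1],#16`: one 128-bit load, post-incrementing x1 by 16.
    lo, hi = struct.unpack_from("<QQ", mem, addr)
    return [lo, hi], addr + 16

def load_2x64(mem, addr):
    # Models `ldr d4,[x1],#8`: load the low 64 bits of v4 (upper lane zeroed).
    (lo,) = struct.unpack_from("<Q", mem, addr)
    v4 = [lo, 0]
    addr += 8
    # Models `ldr x11,[x1],#8`: load the next 64 bits into a general register.
    (x11,) = struct.unpack_from("<Q", mem, addr)
    addr += 8
    # Models `ins v4.d[1],x11`: insert x11 into the upper 64-bit lane of v4.
    v4[1] = x11
    return v4, addr

mem = bytes(range(32))
# Both sequences produce the same lanes and the same final address.
assert load_128(mem, 0) == load_2x64(mem, 0)
```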
PeixuanZuo
55af7a96a7
update the amd ci pipeline (#10723)
* [TEST] test to get amd pipeline information

* [FIX] lower the threshold

* [UPDATE] add retry task

* [ERROR] error to occur retry

* [FIX] error

* [UPDATE] update retryCountOnTaskFailure to 1 time

* [UPDATE] add showmeminfo
2022-03-07 18:39:42 +08:00
Fei Hu
60acfd3dd8
Support CUDA Graph in the CUDA EP (#9978) 2022-03-06 20:47:31 -08:00
Tianlei Wu
0e335aba37
Update BeamSearch operator spec to support t5 (#10777)
* change BeamSearch op to support encoder decoder model

* check model_type and decoder attribute

* fix

* update comments

* warn shape inference issue with onnx v1.11 or T5

* skip parity test when temperature != 1.0

* fix build
2022-03-04 21:52:45 -08:00
George Nash
6be5185088
Update dnnl Add, Mul, Sub, Div ops to handle scalar values (#10756)
* Update dnnl Add, Mul, Sub, Div ops to handle scalar values

Signed-off-by: George Nash <george.nash@intel.com>

* Add additional scalar support for dnnl execution provider

This will add scalar support for:
Eltwise operators: Abs, Elu, Exp, LeakyRelu, Log, Relu, Round,
                   Sigmoid, Softplus, Sqrt, and Tanh
Gelu operators: BiasGelu, FastGelu, and Gelu
Softmax operator

Signed-off-by: George Nash <george.nash@intel.com>
2022-03-04 19:28:25 -08:00
Ye Wang
259ade2557
Add ability to modify num_hidden_layers from benchmark script (#10760)
* add ability to modify num_hidden_layers from benchmark script

* comment

* Revert "comment"

This reverts commit 28794b0e4f86506dcc937738894fcef97fc84e48.

* Revert "add ability to modify num_hidden_layers from benchmark script"

This reverts commit 96f36ed7f751721bcf4e3ab8748a715f19a4e044.

* review comments

Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-03-04 18:28:51 -08:00
Ella Charlaix
fde847473b
Add min max moving average calibration method (#10753)
* Add min max moving average calibration method

* Modify the calibration extra options dictionary creation
2022-03-04 14:55:31 -08:00
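For readers unfamiliar with the idea named in the calibration commit above, a moving average min/max range can be sketched as follows. The commit message does not give the formula, so this assumes the common exponential-moving-average form. The function name `moving_average_minmax` and the parameter `averaging_constant` are hypothetical and are not taken from the onnxruntime calibration API.

```python
def moving_average_minmax(batches, averaging_constant=0.01):
    # Track a smoothed (min, max) range over a sequence of calibration batches.
    # Assumption: an exponential moving average, seeded with the first batch.
    running_min = running_max = None
    for batch in batches:
        batch_min, batch_max = min(batch), max(batch)
        if running_min is None:
            running_min, running_max = float(batch_min), float(batch_max)
        else:
            # Move each bound a fraction of the way toward the new batch's bound.
            running_min += averaging_constant * (batch_min - running_min)
            running_max += averaging_constant * (batch_max - running_max)
    return running_min, running_max

# With a constant of 0.5, the second batch pulls each bound halfway.
print(moving_average_minmax([[0, 1], [-2, 3]], averaging_constant=0.5))
```

Compared with a plain min/max over all batches, a smoothed range of this kind is less sensitive to a single outlier batch.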
Maxiwell
43ff27c7c8
ppc64le: optimizing the MlasQuantizeLinear() with VSX (#10644)
This code is valid only when -mcpu is set to utilize POWER9 technology
or above. A compatible code for POWER8 was created as well, but it
was not tuned for performance.
2022-03-04 14:54:56 -08:00
Tianlei Wu
379b3cdef6
T5 to ONNX conversion script (#10766)
* T5 onnx conversion script
2022-03-04 14:42:04 -08:00
Olivia Jain
12eb660415
Compare TRT vs ORT-TRT Accurately (#10565)
* get inputs independently for trtexec

* track one process only

* remove engine and profile files

* change time to commit time

* add runtime option for io binding

* move to commit date

* fixes

* add option for graph optimization

* cleanup docker script

* include remaining changes

* choose graph optimization option

* add space in option
2022-03-04 10:14:18 -08:00
dependabot[bot]
e3c85d4262 Bump numpy
Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt)
- [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-04 09:51:32 -08:00
dependabot[bot]
b780a3784e Bump numpy in /tools/ci_build/github/linux/docker/scripts/training
Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt)
- [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-04 09:38:38 -08:00
dependabot[bot]
0b0e8ccf92 Bump numpy
Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt)
- [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-04 09:34:58 -08:00
Changming Sun
283d0c47b4
Update our absl cmake files (#10762) 2022-03-04 09:28:04 -08:00
zhangyaobit
4c88fa5971
Add micro-benchmark for FastGelu (#10744)
* Add micro-benchmark for FastGelu

* Delete the bert-base case, as it is very similar to the bert-large one.

* Add argument parsing and more user-friendly provider type assertion.
2022-03-04 08:51:15 -08:00
Valery Chernov
46d0b20ac2
upstream TVM. small code cleaning (#10515)
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-03-04 12:15:29 +01:00
Edward Chen
395a7242d6
[iOS packaging] Minor updates. (#10755)
* Change storage container, simplify build definition parameters.
* Remove explicit version from Objective-C docs.
* Increase timeout.
* Use real storage account.
* Get static website URL with az cli.
2022-03-04 16:02:53 +10:00
Scott McKay
e337f5faf3
Enable QDQ cleanup and NHWC optimizers in an extended minimal build. (#10729)
* Enable QDQ cleanup and NHWC optimizers in an extended minimal build.
2022-03-04 15:45:42 +10:00
Guoyu Wang
7aa706854f
Pipeline changes to build full ORT package for Android (#10654)
* Add android package build settings for full build
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2022-03-04 15:35:54 +10:00
Scott McKay
6072c6b65e
Simplify QLinearConv registration so type reduction works with it. (#10747)
* Simplify QLinearConv registration so type reduction works with it.
* Update QLinearMatMul registration to be a standard typed registration
2022-03-04 14:06:04 +10:00
Abhishek Kulkarni
c2c85dd6b1
Add an option to export ONNX graphs in ORTModule tests (#10579)
Co-authored-by: Abhishek Kulkarni <abkulkarni@microsoft.com>
2022-03-03 16:56:19 -08:00
Yulong Wang
745fa5885f
optimize web assembly build flags for multi-thread (#10759) 2022-03-03 16:44:14 -08:00
Edward Chen
c8ec7782bd
Fix unused variable warning, move variable definitions closer to usages. (#10757) 2022-03-04 09:18:33 +10:00
Olivia Jain
ed87e1b721
Change axis to 0D in cumsum tests. (#10715)
* changing axis to 0

* if def for openvino

* removing extra header

* include changes

* pass in 0D scalar

* Add comment explaining change.
2022-03-03 10:44:46 -08:00
Changming Sun
b3e96d6195
A new pipeline to replace the existing WindowsAI packaging pipeline (#10646) 2022-03-03 08:56:49 -08:00
Hubert Lu
fe8d867efa
Optimize BinaryElementWise and BiasGeluGrad kernels for AMD (#10594)
* Optimize elementwise and biasgelugrad kernels for AMD

* Clean up for BiasGeluGradDxKernel
2022-03-03 08:07:15 -08:00
cloudhan
4c20f6863d
Fix build with gcc 7.5 (#10567) 2022-03-03 18:29:02 +08:00
Fei Hu
75160d6779
Add the missing status return in beam search (#10738) 2022-03-03 01:24:44 -08:00
Rachel Guo
a9dc50ba8b
Add option to force QDQIsInt8Allowed to return true when exporting to ORT format (#10719)
* wip

* save

* minor update

* fix

* fix

* Revert "fix"

This reverts commit a76f364b2d.

* revert

* revert

* revert submodule removal

* address pr comments

* minor fix

* address cr comments

* fix format

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-03-02 23:26:14 -08:00
Ye Wang
44d08d80a0
Add restriction to first usage in allocation planner (#10724)
* Add restriction to first usage in allocation planner

* change phrases

* add UT

Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-03-02 22:03:50 -08:00
Tianlei Wu
47ab0c2006
Auto mixed precision conversion of GPT-2 onnx model (#10711)
* add auto mixed precision
* Add float_to_float16_max_diff, update fp16 constants
* remove cascaded Cast nodes
2022-03-02 21:08:51 -08:00
Olivia Jain
7ebff2b273
add missing link to openvino (#10737) 2022-03-02 15:10:59 -08:00
Baiju Meswani
f9b6eef05f
orttraining packaging pipeline for rocm 5.0.1 (#10725) 2022-03-02 12:32:14 -08:00
Yufeng Li
7ab0c607b4
add qdq support of (un)squeeze and GlobalAveragePool (#10721) 2022-03-02 10:58:35 -08:00
Numfor Tiapo
9ad95bf068
Skip SetName test on inbox build (#10699) 2022-03-02 10:28:58 -08:00
RajalakshmiSR
5d8c5409ab
POWER10: QGEMM optimization (#10642)
* POWER10: QGEMM optimization

This patch makes use of POWER10 MMA feature for QGEMM function.
This optimization includes signed and unsigned cases. Tested with
gcc11 and clang-14; there are no new failures.

* Changes as per review comments

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2022-03-02 08:36:26 -08:00
Funtowicz Morgan
e5c6dc1fc8
Add ability to save calibration augmented models through external data format when model size exceeds 2GB. (#10695) 2022-03-02 08:35:30 -08:00
Valery Chernov
62cc981599
[TVM EP] support of TVM Virtual Machine (#10341)
* add executor option (vm or graph) and support virtual machine methods

* nullptr check for compile and run methods (see also PR#10211 from microsoft:onnxruntime)

* get output shapes for VM

* remove run_with_benchmark. remove run methods from python api, get it from native side

* get outputs method for VM was implemented

* support multiple input for VM

* update python logging and exception

* small fix

* update tvm with patch for VM API

* update nhwc transformations for TVM EP

* add data alignment check and support set_input_zero_copy for GE in TVM EP

* fix logger name

* return back to apache/tvm with VM fixes instead of local dev branch

* hide customized tvm logger while issue is not resolved. fix tvm warning related to target_host

* flake8 fix

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-03-02 11:02:33 +01:00
Sunghoon
a7f6442c45
[js] release pipeline for web and react native (#10656)
* skip browserstack test at release pipeline

* Update web-ci-pipeline.yml for Azure Pipelines

* pool name as a parameter to run at lotus

* create a packaging pipeline for web

* Update web-packaging-pipeline.yml for Azure Pipelines

* make web-ci-pipeline as a template

* change a parameter name checking a pipeline

* make a pool name changeable for react native pipeline

* disable code sign validation for react native

* fix react native package.json publish

* fix indentation

* remove unnecessary comment

* test onnxruntime-common package publish

* ts and js files use lf as eol for windows

* use Linux style of ending line break

* change newLine at only tsconfig.json

* restore a commented code

* fix git restore directory for npm packaging

* fix a typo

* force eol to lf on windows for js directory in CI
2022-03-01 21:38:33 -08:00
Edward Chen
9e7d7a9e97
Convert ConvActivationFusion transformer to a selector action transformer. (#10687) 2022-03-02 13:47:55 +10:00
Tianlei Wu
fa9090f259
check gpt-2 graph in converting beam search (#10712) 2022-03-01 19:04:34 -08:00
Edward Chen
d07a2377b1
Fix race condition in CUDA, ROCm, and TensorRT EP GetKernelRegistry() implementations. (#10200)
Make GetKernelRegistry() kernel registry initialization thread-safe.
2022-03-01 17:53:58 -08:00
Tianlei Wu
2fb2dae42f
Print tensor snippet in dumping node Inputs/Outputs to StdOut (#10707)
* dump tensor snippet
2022-03-01 16:59:12 -08:00
zhangyaobit
a7738b52c5
Add microbench to benchmark single operators. (#10678)
* Add microbench to benchmark single operators.

* Move to tool directory; separate data generation from io binding.

* Refactor.

* Clean up.

* Use precision instead for extensibility.

* Refactor the create_io_binding function to take in torch tensors
instead of numpy arrays; this reflects more accurately what
the function does, because it is torch tensors that get bound.
2022-03-01 16:00:16 -08:00
Guoyu Wang
19464614e7
[NNAPI QDQ] Add QDQ Concat (#10666)
* add qdq concat

Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-03-02 09:08:36 +10:00
Bowen Bao
6448ca64e6
Fix reshape allowzero with unknowndim (#10665) 2022-03-01 10:47:48 -08:00