Commit graph

5427 commits

Author SHA1 Message Date
Yulong Wang
e8564d6597
[js/web] update emsdk to v2.0.26 (#8653)
* update emsdk to v2.0.26

* fix pooling build warning

* fix build break

* use pragma diagnostic semantic only when __GNUC__ is defined

* fix build break

* disable AttentionPastState_dynamic
2021-08-26 15:31:34 -07:00
Sunghoon
a16c681103
[js/web] Prepare to integrate ONNX Runtime Web CI with BrowserStack (#8843)
* Integrate BrowserStack with ONNX Runtime Web CI pipeline

* Change to Linux command for BrowserStack CI

* Set preferTriggeringPipeline as true

* Fix a commit fetching script

* Remove wasm binary download from the latest build

* Use release build of WebAssembly

* Disable check-out of commit for testing

* Use commit of WebAssembly build CI pipeline

* Need to issue two PRs to prevent build failure
2021-08-26 11:57:31 -07:00
Chi Lo
eb8f84e2a2
Fix issue of GPU tarball/zip/java package (#8850)
* modify for test

* modify for test

* modify for test

* modify for test

* modify for test

* modify for test

* prepare for PR

* Rename cuda directory to gpu directory in tarball

* Fix gpu java package

* fix bug

* fix small bug
2021-08-26 10:16:16 -07:00
Edward Chen
0cfc4ec09d
[Objective-C] Enable static analysis (#8842)
Add Objective-C API static analysis pipeline.
2021-08-26 09:13:52 -07:00
Sherlock
c325207f7a
Optimize MatmulGrad (#8846)
Optimize two special cases of MatmulGrad using FusedMatMul.
2021-08-25 23:36:40 -07:00
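Background for the MatmulGrad change above: for Y = A @ B the gradients are dA = dY @ B^T and dB = A^T @ dY. A matmul that takes transpose flags (FusedMatMul exposes transA/transB attributes) can therefore produce both gradients without a separate Transpose node. The plain-Python sketch below only illustrates that identity. `fused_matmul` and `matmul_grad` are hypothetical names, and the message does not say which two special cases were optimized.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def fused_matmul(a, b, trans_a=False, trans_b=False):
    """Compute op(a) @ op(b), where op() optionally transposes its argument.

    A real kernel would fold the transpose into the GEMM call; it is done
    explicitly here only to keep the sketch readable.
    """
    if trans_a:
        a = transpose(a)
    if trans_b:
        b = transpose(b)
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def matmul_grad(a, b, dy):
    """For Y = A @ B: dA = dY @ B^T and dB = A^T @ dY, each as one flagged matmul."""
    da = fused_matmul(dy, b, trans_b=True)
    db = fused_matmul(a, dy, trans_a=True)
    return da, db
```

With transpose flags on the matmul itself, the backward graph needs two matmul nodes and no Transpose nodes.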
Changming Sun
ced2d8e597
Clean up TRT docker files (#8847) 2021-08-25 22:26:31 -07:00
Changming Sun
9cd7d836f7
Delete Dockerfile.ubuntu_for_android (#8848) 2021-08-25 22:25:14 -07:00
Scott McKay
b21ea00020
Cleanup C# bindings to add EP (#8810)
Fix C# add EP bindings.
Add stubs to ORT so that if an EP is not included in the build, a graceful error message is returned.
Move the declarations of the stubs into the C API and out of the EP so they're in one place and are easier to use (no extra header required in the C/C++ world and consistent with the CUDA EP setup).
Fix inconsistency in ROCM EP.
Cleanup a few other things.
2021-08-26 13:59:40 +10:00
Guoyu Wang
613a600471
relax android ci timeout to 180 minutes (#8844) 2021-08-25 19:59:48 -07:00
Chi Lo
32ecbf4691
Create combined GPU tarball and zip file package (#8827)
* Add onnxruntime_providers_shared.dll into gpu nuget package

* Modify for test

* Temporarily remove for test

* Modify for test

* Modify for test

* Test packaging Windows combined GPU

* Test packaging Windows combined GPU

* Test packaging Windows combined GPU

* Test packaging Windows combined GPU

* modify for test

* modify for test

* fix bug

* Modify for test

* Modify for test

* Modify for test

* Modify for test

* Modify for test

* Modify for test

* Modify for test

* Modify for test

* Prepare for PR

* Prepare for PR

* Code refactor

* Rename proper Artifact name

* Rename intermediate Artifact names

* Revert Artifact Names

* Rename Artifact Names

* Modify Artifact name

* Modify Artifact name

* Modify Artifact name

* Update Java package

* Update Java package

* fix bug to change artifact name

* Fix bug for the wrong file path

* Fix not fetching the correct artifact and test

* temporarily modify for test

* undo the change for test
2021-08-25 13:51:18 -07:00
Hariharan Seshadri
cee79526fd
Add opset 15 kernels for Pow, BatchNorm, and Shape (#8442) 2021-08-25 12:04:20 -07:00
Rajalakshmi Srinivasaraghavan
33a97e995b
POWER: Fix compilation issues with clang
This patch fixes some compilation errors when using
clang11 on POWER processors.
2021-08-25 11:40:29 -07:00
Sherlock
73fe7bfa0f
Add ATenOp at::diagonal (#8838)
* Register at::diagonal for ATenOp
2021-08-25 09:45:53 -07:00
Tianlei Wu
237076a660
Add option to disable FastGelu half2 cuda kernel (#8819)
Allow FastGelu half2 kernel to build without --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=xx
Add environment variable ORT_TRANSFORMER_OPTIONS=4 to disable the half2 FastGelu kernel for testing purposes
Test parity of FastGelu operator with fp16 inputs.
2021-08-25 08:37:41 -07:00
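A minimal sketch of how a test might check the ORT_TRANSFORMER_OPTIONS switch from #8819, assuming the variable is parsed as an integer bitmask in which the bit with value 4 selects the non-half2 FastGelu path. The helper name `half2_fastgelu_disabled` is hypothetical and is not part of the ONNX Runtime API.

```python
import os

# Assumption: ORT_TRANSFORMER_OPTIONS is an integer bitmask and the bit with
# value 4 requests that the half2 FastGelu kernel be skipped.
DISABLE_HALF2_BIT = 4


def half2_fastgelu_disabled(env=None):
    """Hypothetical helper: True if the half2 FastGelu kernel should be skipped."""
    env = os.environ if env is None else env
    raw = env.get("ORT_TRANSFORMER_OPTIONS", "0")
    try:
        value = int(raw)
    except ValueError:
        value = 0  # ignore malformed values and keep the default kernel
    return bool(value & DISABLE_HALF2_BIT)
```

Passing a plain dict instead of `os.environ` keeps the check easy to unit-test.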
Chandru Ramakrishnan
98ed235fc7
Removed MSNPU code from eager. (#8832) 2021-08-25 09:40:25 -04:00
ashari4
4251e04eae
Removed assert (#8779) 2021-08-24 20:26:08 -07:00
Ye Wang
56b37e55e5
Add new transformers model type: Bart (#8698)
* update

* bart-base encoder attention fusion

* update

* update

* update

* update

* update

* yapf

* review comments
2021-08-24 18:13:46 -07:00
Changming Sun
3837027506
Remove pyopenssl from installation (#8830) 2021-08-24 17:07:22 -07:00
KeDengMS
ddd4586a2f
[Symbolic Shape Infer] add more ops for auto merge (#8824)
As Less/Equal/Greater/LessOrEqual/GreaterOrEqual ops can broadcast
2021-08-24 16:33:23 -07:00
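The comparison ops listed in #8824 use the same multidirectional (NumPy-style) broadcasting as elementwise Add, so their output shape can be merged from both input shapes. The sketch below shows that merge on shapes whose dims are ints or symbolic names. `broadcast_shapes` is a hypothetical helper, and the policy for two different symbols (keep the first) is an illustrative assumption, not the logic in ORT's symbolic shape inference script.

```python
def broadcast_shapes(a, b):
    """Multidirectional (NumPy-style) broadcast of two shapes.

    Each dim is an int or a symbolic name (str). Shapes are right-aligned,
    and a dim of 1 stretches to match the other side.
    """
    rank = max(len(a), len(b))
    # Right-align by left-padding the shorter shape with 1s.
    a = [1] * (rank - len(a)) + list(a)
    b = [1] * (rank - len(b)) + list(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        elif isinstance(da, str) and isinstance(db, str):
            out.append(da)  # two different symbols: keep the first (assumption)
        elif isinstance(da, str):
            out.append(db)  # a concrete dim is more informative than a symbol
        elif isinstance(db, str):
            out.append(da)
        else:
            raise ValueError(f"incompatible dims {da} and {db}")
    return out
```

For example, a `Less` between shapes `["batch", 1, 4]` and `[3, 1]` would broadcast to `["batch", 3, 4]`.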
ashari4
7f1e880649
Reorder ORT eager headers (#8813) 2021-08-24 14:48:43 -07:00
Guoyu Wang
8992e31c85
Move iOS package from framework to xcframework (#8805)
* additional changes

* test package run

* minor fix

* minor fix

* minor fix

* Work around missing arm64 simulator

* fix objc pod build failure

* downgrade_eigen

* update objc podspec template
2021-08-24 13:38:14 -07:00
Yufeng Li
e25986781f
Fall back to default quantization if quantization params are not found (#8788) 2021-08-24 11:20:19 -07:00
Hariharan Seshadri
17b0664e34
Optimize sequence type usage on CUDA [2/n] (#8720) 2021-08-24 10:40:28 -07:00
Jorn Tuyls
9053e1522d
Check for Python_EXECUTABLE in pyxir.cmake to fix Vitis AI EP build (#8631)
Co-authored-by: Jorn Tuyls <jornt.tuyls@gmail.com>
2021-08-24 08:39:50 -07:00
Changming Sun
4bfff45859
Downgrade Eigen (#8817) 2021-08-23 18:06:23 -07:00
Chandru Ramakrishnan
2693af9799
Ported changes / bug fixes from torch/ort. (#8784)
* Ported changes / bug fixes from torch/ort.

* Fixed formatting

* Renamed function

* Renamed module_ to module.

* Revert "Renamed module_ to module."

This reverts commit b17fc114b3db20d174283811d90592b5b8154c19.

* Include pybind common header to fix linker errors on windows debug.

* Fix generation of more than one custom op.

Co-authored-by: Ashwin Hari <ashari@microsoft.com>
2021-08-23 17:45:40 -04:00
Chandru Ramakrishnan
f51f2bad66
Fix for doxygen doc errors. (#8814) 2021-08-23 15:52:15 -04:00
Tiago Koji Castro Shibata
62c0d24340
Fix Windows Store build (#8753)
* Remove APIs unavailable in Store in #8349, #8178, #8065

* Add UWP stubs of C runtime functions

* Remove UWP incompatible tests from UWP build

* Remove incompatible tests from Store

* Use UWP stubs in store only

* Skip partition check outside of Windows

* Remove unused WRL include

* Work around Windows header not including what it uses

* Fix precompiled header name clash

* Work around SDK bugs

* DXCore workaround in Win7

* Fix warning

* Fix more warnings

* Bump WinML to target Windows 8

* Fix more warnings

* Remove unnecessary workarounds

* Remove Desktop only APIs from DML adapter
2021-08-23 11:19:03 -07:00
Edward Chen
ea68955c71
Add more info to kernel registry manager hash lookup error message. (#8801) 2021-08-23 11:09:30 -07:00
George Nash
d4a88cfe3f
Add Gemm op to DNNL Execution provider (#8799)
* Implement Gemm op for DNNL execution provider

Signed-off-by: George Nash <george.nash@intel.com>

* Remove KernelRegistry and Gemm op for dnnl ep

The KernelRegistry for the dnnl execution provider only
registered a Gemm op that, as best we can tell, was never
actually used and also was not using the dnnl library.

We have implemented a Gemm op in the DNNL execution provider
subgraph code and thus are removing the unused Gemm op that
was in the dnnl KernelRegistry.

Signed-off-by: George Nash <george.nash@intel.com>

* Fix duplicated output and kernelshape inference

fix getcapability to make sure subgraph outputs do not have duplicates

fix kernelshape inference in pool

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Removed most dnnl specialized ifdefs from gradient_ops_test code

Re-enable GlobalAveragePoolGrad test for dnnl ep

The bugs that were exposed by the GlobalAveragePoolGrad test have
been fixed and this test no longer needs to be disabled for DNNL.

Removed the ReluGradDnnl test. We are getting the testing from the
already existing ReluGrad test.

MaxPoolGrad test no longer has specialized execution provider
enabling for DNNL execution provider. It will now run without
the extra enabling.

ConvGrad is the only test that still has dnnl specialized ifdefs.
However, the ConvGrad code was not being executed unless it
was listed first in the list of execution providers.

Signed-off-by: George Nash <george.nash@intel.com>

* Fix transpose issue on Gemm

When transposing square matrices, getmemoryandreshape fails to reshape

fix by adding a bool

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Save memory space by reusing internal tensor for output

The intermediate matmul output tensor can be used as the output
tensor for the binary calculation.

Remove the unused IsAttributeSupported from the
DnnlGemmNodeCapability class since we now support all of the
Gemm attributes in our implementation.

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Wang <zhaoyang.wang@intel.com>
2021-08-23 08:45:34 -07:00
Guoyu Wang
89656bb712
[CoreML/NNAPI EPs] Move direct use of initializer data to unpacked tensor data (#8780) 2021-08-21 14:58:41 -07:00
KeDengMS
0c5a305742
Bump up Nuphar cache version (#8806)
To avoid confusion with 2.3.0
2021-08-21 12:05:05 -07:00
Suffian Khan
9fa0d8392a
Extend node debugging utilities to push tensors and node placement to SQL database (#8672)
* adding support for tracing to sqldb instead of files

* use compiled statements

* script to pull tensors from db

* link sqlite3

* remove node info redundant with onnx graph

* addressing PR comments

* address PR comments and include program counter

* third party notice

* use find_package

* add to cgmanifests.json

* address thread safety and add pid suffix

* build fix

* python script to select on devicetype

* remove unpopulated and redundant Shape and Type fields

* comment

* comment

* PR comments

* add graph execution counter to session state

* move increment to inference session

* std::endl to \n

* ifdef on graph execution counter

* add ifdef to inference session

* move DEBUG_NODE_INPUTS_OUTPUTS to CMakeLists.txt
2021-08-21 00:40:12 -07:00
Olivia Jain
4666a49106
Add Component Governance (#8794)
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
2021-08-20 17:41:18 -07:00
XiyinOSS
19b82b438b
GridSample OP implementation for CPU and CUDA (#8551)
* GridSample OP implementation for CPU and CUDA

**Description**: This change contains an implementation of the torch grid_sample op.
The CUDA implementation contains contributions from Muscle Wu.

* Use interpolation for out-of-bound points in zero padding mode

Out-of-bound points in zeros padding mode changed from a constant 0 to
interpolation of the surrounding pixels. This aligns with the PyTorch implementation.

A bug in CUDA batch offset calculation is fixed.

Custom op exporter type is added.

* Fix nearest bug in CPU

* Update per CI build finding and review comments

* Force float to avoid potential integer T issue

* Style update

* PR update

* Remove c++17 feature from cuda code
2021-08-20 12:37:38 -07:00
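To illustrate the zeros-padding change in #8551: an out-of-bound sample point is still bilinearly interpolated, but any neighbor pixel that lies outside the image contributes 0. The single-channel, single-point sketch below assumes `align_corners=False` and bilinear mode. The function names are hypothetical, and this is not the ORT CPU or CUDA kernel.

```python
import math


def unnormalize(coord, size):
    """Map a normalized coordinate in [-1, 1] to pixel space (align_corners=False)."""
    return ((coord + 1) * size - 1) / 2


def pixel_or_zero(img, y, x):
    """Zeros padding: any pixel outside the image contributes 0."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return 0.0


def grid_sample_bilinear_zeros(img, gx, gy):
    """Bilinearly sample a 2-D image at normalized (gx, gy) with zeros padding."""
    x = unnormalize(gx, len(img[0]))
    y = unnormalize(gy, len(img))
    x0, y0 = math.floor(x), math.floor(y)
    wx, wy = x - x0, y - y0  # fractional offsets = weights of the far neighbors
    top = (1 - wx) * pixel_or_zero(img, y0, x0) + wx * pixel_or_zero(img, y0, x0 + 1)
    bottom = (1 - wx) * pixel_or_zero(img, y0 + 1, x0) + wx * pixel_or_zero(img, y0 + 1, x0 + 1)
    return (1 - wy) * top + wy * bottom
```

A point just off the left edge, such as (gx, gy) = (-1.0, -0.5) on a 2x2 image, mixes one real pixel with one zero and returns half that pixel's value instead of a constant 0.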
Thiago Crepaldi
6f2f4721ec
Update Python setuptools classifiers to remove windows and mac (#8776) 2021-08-20 08:53:25 -07:00
Chen Fu
c117ac57b7
New S8S8 Neon kernel for arm64 only (#8783)
Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-08-19 15:20:57 -07:00
Edward Chen
94c3e2048b
[convert_onnx_models_to_ort.py] Add option to specify NNAPI EP partitioning stop ops. (#8668)
Add option to specify NNAPI EP partitioning stop ops from the ORT format model conversion script.
2021-08-19 13:02:28 -07:00
Sherlock
81889a1cf6
Invertible ReluGrad (#8773)
* Invertible Relu Grad
2021-08-19 11:29:05 -07:00
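One common way to make an activation gradient independent of the forward input is to compute it from the output instead. For ReLU, y > 0 exactly when x > 0, so the input need not be kept for the backward pass. Whether this is exactly what #8773 implements is an assumption; the list-based sketch and its function names are hypothetical.

```python
def relu(xs):
    """Elementwise ReLU on a list of floats."""
    return [max(x, 0.0) for x in xs]


def relu_grad_from_input(dy, x):
    """Standard ReLU gradient: pass dy through where the input was positive."""
    return [g if xi > 0 else 0.0 for g, xi in zip(dy, x)]


def relu_grad_from_output(dy, y):
    """Same gradient computed from the forward output instead of the input.

    Valid because relu(x) > 0 exactly when x > 0, so only the output
    needs to be kept alive for the backward pass.
    """
    return [g if yi > 0 else 0.0 for g, yi in zip(dy, y)]
```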
liqun Fu
2beb873c6b
move training CI agent pools to 1ES hosted (#8775) 2021-08-18 18:36:19 -07:00
pengwa
39059f2539
enable torch interop build (#8493)
* fix build - Python.h not found

* disable --build_shared_lib for ortmodule tests

* fix

* fix the build flag

* disable --build_shared_lib for training path (not only for ortmodule)

* fix missing test model files

* disable test CApiTest.test_custom_op_library when ENABLE_TRAINING_TORCH_INTEROP is ON

* enable custom_op_library build

* fix build

* fix

* merge master and fix build failure

* build onnx_test_runner when onnxruntime_ENABLE_TRAINING_TORCH_INTEROP is ON

* resolve comments

* use --enable_training_torch_interop to replace "onnxruntime_ENABLE_TRAINING_TORCH_INTEROP=ON"
2021-08-19 09:16:32 +08:00
Chi Lo
51152e1aaa
Integrate TensorRT EP libs into existing GPU Nuget Package (Approach#1) (#8727)
* Merge CPU/GPU nuget pipeline

* Include TensorRT EP libraries into existing GPU nuget package pipeline

* modify to use correct YAML

* Modify for test

* modify for test

* Add dependency

* Add dependency (cont.)

* modify for test

* Add create TensorRT nuget package

* modify for test

* modify for test

* Merge CPU/GPU nuget pipeline

* Include TensorRT EP libraries into existing GPU nuget package pipeline

* modify to use correct YAML

* Modify for test

* modify for test

* Add dependency

* Add dependency (cont.)

* modify for test

* Add create TensorRT nuget package

* modify for test

* fix merge bug

* code refactor

* code refactor

* modify for test

* modify for test

* modify for test

* modify for test

* modify for test

* modify for test

* cleanup

* modify for test

* fix bug

* modify for test

* refactor

* fix bug and test

* Modify for test

* Modify for test

* Modify for test

* Modify for test

* Prepare for PR

* Prepare for PR

* code refactor from review

* Remove naming 'Microsoft.ML.OnnxRuntime.TensorRT' to avoid confusion

* Add linux TensorRT libraries

* Remove redundant variable in YAML

* revert file

* undo revert file

* Modify regular expression so that it can capture the correct file

* Remove newline at end of file

* small fix

* Revert to CUDA11.1 on Windows

* Add unit tests for nuget package on Linux

Co-authored-by: Changming Sun <chasun@microsoft.com>
2021-08-18 17:26:34 -07:00
Dmitri Smirnov
fe5046f48e
Add SparseToDenseMatMul to the list of ops required by tests (#8774) 2021-08-18 14:08:07 -07:00
liqun Fu
fa9fcb5634
fixed the link (#8757) 2021-08-18 11:45:42 -07:00
Aaron Bockover
b2813656f5
eager: fix build against latest PyTorch master (#8745)
Improve README as well.
2021-08-18 14:27:21 -04:00
Yulong Wang
cb67fca738
[js/web] enable 'use_ort_model_bytes_directly' by default (#8734) 2021-08-18 11:18:58 -07:00
Changming Sun
401de5911b
Remove CUDNN dir from setup_env_cuda.bat (#8762) 2021-08-18 10:32:57 -07:00
KeDengMS
b0c707caa8
[Nuphar] Do not handle MatMulInteger with zero-points (#8760)
MatMulInteger can take zero-points as input, and Nuphar
does not handle that yet. Fall back to CPU EP in that case.
2021-08-18 10:32:42 -07:00
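For reference on why #8760 needs a fallback: MatMulInteger computes (A - a_zero_point) @ (B - b_zero_point) with integer accumulation, so a kernel that ignores the zero points produces different results whenever they are non-zero. The sketch below handles scalar zero points only (the ONNX spec also allows per-row zero points for A and per-column zero points for B). `matmul_integer` is a hypothetical name.

```python
def matmul_integer(a, b, a_zero_point=0, b_zero_point=0):
    """Reference MatMulInteger: (A - a_zp) @ (B - b_zp), accumulated as integers.

    Scalar (per-tensor) zero points only; this sketch does not handle the
    per-row / per-column zero points that the ONNX spec also permits.
    """
    inner = len(b)
    return [[sum((a[i][k] - a_zero_point) * (b[k][j] - b_zero_point) for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]
```

With zero points of 128, inputs of `[[130, 128]]` and `[[129], [127]]` give `[[2]]`, while ignoring the zero points would give `[[33026]]`.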
Chen Fu
00b345eb7b
ARM Neon S8S8 kernel for QGemm (#8695)
Using signed int8, the qgemm kernel avoids extending uint8 to int16 while computing the matrix multiplication, achieving higher performance. We also found that using only the lower 64 bits of the vector registers to load the A and B matrices gives further performance improvements. We also experimented with using ldp to load two 64-bit values in one shot versus using two ldr instructions to load one 64-bit value at a time; on both big and little cores there is no noticeable difference.

Submitting the LDP version. At this point we don't need to choose the kernel based on microarchitecture.

Inference time of ResNet50, thread count 2

Big Core on Pixel 3a
Current master: 292.947 ms
First iteration S8S8: 188.239 ms
LDP load two 64b reg: 178.715 ms
LDR load one 64b reg: 179.536 ms

Little Core
Master: 546.317 ms
S8S8: 513.332 ms
LDP: 489.19 ms
LDR: 497.865 ms

Raspberry Pi 3B+
Master: 660.08 ms
S8S8: 608.577 ms
LDP: 603.675 ms
LDR: 602.075 ms
2021-08-18 09:58:47 -07:00
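The timings above can be turned into relative speedups with a little arithmetic. The sketch uses only the Master and LDP rows (LDP being the version submitted); the dictionary keys and the `speedup` helper are illustrative names.

```python
# ResNet50 inference times in ms (2 threads), copied from the commit message.
timings = {
    "Big core (Pixel 3a)": {"master": 292.947, "ldp": 178.715},
    "Little core": {"master": 546.317, "ldp": 489.19},
    "Raspberry Pi 3B+": {"master": 660.08, "ldp": 603.675},
}


def speedup(baseline_ms, new_ms):
    """How many times faster the new kernel is than the baseline."""
    return baseline_ms / new_ms


for device, t in timings.items():
    factor = speedup(t["master"], t["ldp"])
    saved_pct = 100 * (1 - t["ldp"] / t["master"])  # share of inference time removed
    print(f"{device}: {factor:.2f}x faster, {saved_pct:.1f}% less time")
```

On these numbers the LDP kernel is about 1.64x faster on the big core, 1.12x on the little core, and 1.09x on the Raspberry Pi 3B+.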
Rachel Guo
78759059f1
[CoreML EP] Make CoreML EP build on non-macOS platforms (#8677)
* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* clean

* remove unused defs

* correct typo

* remove onnxruntime_coreml_proto

* cr comments

* enable nnapi/coreml in minimal build

* enable nnapi/coreml in one build

* refine dependencies

* fix nnapi build failure and remove onnxruntime_coreml_proto dependencies in unit tests cmake files

* small fix

* fix

* fix build

* revert

* fix build

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2021-08-18 09:35:32 -07:00