Commit graph

1070 commits

Author SHA1 Message Date
daquexian
3cbbf9dcae
Fix wasm static lib in sub-project (#11671)
* wasm_static_lib_global

Signed-off-by: daquexian <daquexian566@gmail.com>

* make wasm static lib global

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix the property

Signed-off-by: daquexian <daquexian566@gmail.com>

* add code missing after merge

Signed-off-by: daquexian <daquexian566@gmail.com>
2022-06-14 15:18:11 -07:00
Gary Miguel
e8b0d24071
Support per-test tolerances for ONNX tests (#11775)
Prior to this every test shared the same tolerances. This meant
that if an ONNX test failed due to a small but acceptable difference in
output, the only alternative was to disable the test entirely.

In op set 17, the DFT operator is being added. Without this change, the
tests for that operator fail because the output is off by about 5e-5.
It's better to keep test coverage for this new op rather than disable
the test entirely.

Also prior to this change, the global tolerances were not shared between
C++, JavaScript, and Python tests. Now they are.

Also fix various minor issues raised by linters.

Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
2022-06-14 15:12:23 -07:00
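For the per-test tolerance change above (#11775), a minimal sketch of the idea: global defaults that a single test can override, so a test with a small but acceptable difference stays enabled. The names `DEFAULT_TOLERANCES`, `PER_TEST_OVERRIDES`, `test_dft`, and all numeric values are hypothetical and are not taken from the onnxruntime test configuration. `math.isclose` is used only for illustration and may combine tolerances differently from the real test runners.

```python
import math

# Hypothetical global defaults shared by every test.
DEFAULT_TOLERANCES = {"atol": 1e-5, "rtol": 1e-3}

# Hypothetical per-test overrides; only the listed keys replace the defaults.
PER_TEST_OVERRIDES = {"test_dft": {"atol": 1e-4}}


def tolerances_for(test_name):
    """Return the defaults merged with any override for this test."""
    merged = dict(DEFAULT_TOLERANCES)
    merged.update(PER_TEST_OVERRIDES.get(test_name, {}))
    return merged


def outputs_match(test_name, expected, actual):
    """Compare two flat lists of floats using the tolerances for this test."""
    tol = tolerances_for(test_name)
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=tol["rtol"], abs_tol=tol["atol"])
        for e, a in zip(expected, actual)
    )
```

With these made-up numbers, an output that is off by 5e-5 fails under the defaults but passes for the test that carries a looser override.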
Scott McKay
6bf6bac1fd
Add patching of xnnpack CMakeLists.txt to allow building with Emscripten. (#11829) 2022-06-14 09:31:17 +10:00
Hector Li
7582644f57
cmake changes for SNPE EP (#11821)
* move code used to find the SNPE libs to a separate cmake file

* Roll back the change for libc++_shared. It's the one from the SNPE SDK; otherwise it will cause an uncaught exception of type std::bad_cast because of a conflict
2022-06-13 08:15:37 -07:00
Guenther Schmuelling
d4ea59654c
make xnnpack build for ort-web (#11745)
* make xnnpack build for ort-web

* make ci happy
2022-06-10 08:47:57 -07:00
Vincent Wang
5ecfaef042
ATen Fallback for Inference (#11597)
* aten op for inference

* fix build error

* move some code to training only

* remove domain from operator name

* move aten_op_executor ext out from ortmodule

* add pipeline

* add exec mode

* fix script

* fix ut script

* fix test pipeline

* failure test

* rollback

* bugfix

* resolve comments

* enable aten for python build only

* fix win build

* use target_compile_definitions

* support io binding

* turn off aten by default

* fix ut

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: zhijxu <zhijxu@microsoft.com>
2022-06-09 16:07:30 +08:00
Alex Fuller
8156b9370c
[Abseil] Adding URL_HASH so that an existing archive can be used from disk (#11690) 2022-06-08 17:12:59 -07:00
Changming Sun
eeeb249a27
Update onnxruntime_providers.cmake to remove the reference of "onnxruntime_tvm_dependencies" (#11780) 2022-06-08 09:06:00 -07:00
Valery Chernov
4296968f20
[TVM EP] update set input method for VirtualMachine (#11674)
* update TVM

* get alignment constant from TVM

* update TVM_VM_SetInputs to upstream with TVM API

* fix CI issue: update TVM EP dependencies

* add sudo

* revert changes needed to install missing package

* add package for TVM EP CI

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-06-04 09:31:01 +02:00
Hector Li
95a16c1ffe
Snpe ep (#11665)
* Initiate Ort SNPE EP
* fix the snpe ep windows build break caused by the utility method (ToUTF8String) name change on master
* correct the source path for libonnxruntime.so while building the android package
* add AdditionalDependencies for arm64
* On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given.
* fix build failure if snpe is not enabled
* update doc for contrib op
* separate out snpe ep settings to onnxruntime_snpe_provider.cmake
* renaming according to review comments
* update according to review comments
2022-06-03 14:10:02 -07:00
Scott McKay
4445dd6bc1
XNNPACK EP (#11445)
* Implement XNNPACK support via an EP.
  * Layout transform uses the GraphPartitioner infrastructure.
  * Node fusion is supported.
  * Conv and MaxPool implementations were ported from Changming's PR.
  * Added optional mutex in InferenceSession::Run as we only want to allow sequential calls if xnnpack is enabled
2022-06-03 20:22:34 +10:00
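The XNNPACK EP commit above (#11445) mentions an optional mutex in InferenceSession::Run so that calls are sequential only when xnnpack is enabled. The following is a rough Python illustration of that pattern, not the C++ implementation. The class `SketchSession`, its `serialize_runs` flag, and the in-flight counter are all hypothetical names introduced for this sketch.

```python
import contextlib
import threading
import time


class SketchSession:
    """Hypothetical session whose run() is serialized only on request."""

    def __init__(self, serialize_runs):
        # A real lock when serialization is wanted, otherwise a no-op context.
        self._run_lock = threading.Lock() if serialize_runs else contextlib.nullcontext()
        self._stats_lock = threading.Lock()
        self._in_flight = 0
        self.max_in_flight = 0  # highest number of overlapping run() calls seen

    def run(self, value):
        with self._run_lock:
            with self._stats_lock:
                self._in_flight += 1
                self.max_in_flight = max(self.max_in_flight, self._in_flight)
            time.sleep(0.01)  # stand-in for the actual inference work
            result = value * 2
            with self._stats_lock:
                self._in_flight -= 1
            return result
```

The design choice is that callers who do not need serialization pay no locking cost, while the same `run` body serves both modes.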
Scott McKay
4fabc400de
Fix CUDA 11.6 build error on Windows (#11578)
* Avoid windows header that defines 'small'
2022-05-28 08:04:46 +10:00
Yi Zhang
a3f05da338
Revert "[TVM EP] update set input to remove excess copying inside TVM (#11247)" (#11504)
This reverts commit 5ae461ec0a.
2022-05-13 02:27:36 +08:00
Tianlei Wu
ece1274ffa
revert safeint version (#11500) 2022-05-12 11:24:43 -07:00
Tianlei Wu
f5473596fa
Change longformer default kernel (#11470)
* change default to compact memory kernel
* Remove a cuda stream synchronize that is not needed
* Update longformer benchmark tool
2022-05-11 10:54:59 -07:00
symphonylyh
c2de603c10
Contrib ops for TRT plugin: Disentangled Attention Plugin (#11287)
* Add disentangled attention TRT plugin as contrib op

* update plugin name & remove null character

* update onnx-tensorrt submodule with my beta version

* use suggested plugin name & simpler shape propagation

* update onnx-tensorrt gitsubmodule to temporary fork

* update onnx-tensorrt to temporary commit

* redirect submodule back to latest 8.2-GA release of onnx-tensorrt repo

Co-authored-by: HHH-ComputeLab <haohangh@nvidia.com>
2022-05-08 15:25:25 -07:00
Dwayne Robinson
69b2fab810
Update DirectML from 1.8.0 to 1.8.2 (#11459) 2022-05-06 17:52:52 -07:00
Tang, Cheng
3f3c5fcd68
Unify the Compile API for mobile build and normal build (#10632)
* use the lightweight compile api as default; use dnnl ep for testing

* apply to tensorrt ep

* fix the missing files

* fix build

* fix the copy issue on linux

* migrate migraphx and openvino ep

* fix openvino build break

* fix linux build

* fix unused parameter

* fix coreml build

* use graph view's filtered initializers

* fix openvino break

* fix tvm compile api

* fix tvm / rknpu / vitisai ep build

* add IsInitializedTensor in graph_viewer; fix nuphar build

* use serializer directly as tvm ep is still static lib

* fix the type mismatch

* fix the type mismatch

* fix merge conflict

* add a comment

* fix minimal build

* fix the DML EP's legacy approach

* save type/shape in dnnl IR

* fix linux break

* fix tvm failure

* dnnl ep: move initializer referenced out of dnnl subgraph

* Revert "add IsInitializedTensor in graph_viewer; fix nuphar build"

This reverts commit 1cc3c7f08c16fee4fe3309a67209eb769d479587.

* add IsInitializedTensor to graph viewer

* add the legacy code for nuphar build to temporarily make nuphar build work

* ignore internal test for nuphar

* remove the out of date tests

* keep the legacy API in EP for a while

* turn serializer into a static function

* update comments

* fix tvm build

* Update include/onnxruntime/core/framework/execution_provider.h

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update include/onnxruntime/core/framework/execution_provider.h

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* Update onnxruntime/core/framework/execution_provider.cc

Co-authored-by: Pranav Sharma <prs@microsoft.com>

* update comments; add warning message for legacy compile call

* add a flag to control out of scope arg in serialization

* fix trt  build; improve the test

* resolve merge errors

* fix a typo

Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
2022-05-05 08:30:07 -07:00
Valery Chernov
5ae461ec0a
[TVM EP] update set input to remove excess copying inside TVM (#11247)
* update TVM

* small fixes

* update TVM with new set_input and NDArray API

* use set_input instead of set_one_input

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-05-05 14:25:02 +02:00
Tang, Cheng
4b875e3543
Re-implement the function support in onnxruntime (#11167)
* initial fix

* refactor the function handle

* update the implementation

* fix linux build break

* fix training build

* fix minimal build

* fix gradient checker

* deprecate the local function members in graph. host it in model

* fix changming's comments

* fix comments about inlined containers

* fix a missed inlined container

* fix training build

* avoid const for std string_view

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2022-04-29 10:15:58 -07:00
Edward Chen
e194a01787
Update SafeInt version. (#11379) 2022-04-28 10:51:59 -07:00
Dmitri Smirnov
a7d0158c24
Introduce a way to disable Abseil library (#11353)
Introduce a way to disable Abseil library.
Use cmake extra args, no new build switch.
2022-04-27 08:57:52 -07:00
Scott McKay
63d4f45186
Add stub implementation of the NNAPI interface (#11288)
* Add stub implementation of the NNAPI interface so that model builder code can be unit tested on all platforms.

Needed to fix a lot of type mismatch warnings. As these don't occur on Android builds, static_cast was used for simplicity.
2022-04-27 15:39:09 +10:00
Tianlei Wu
1d96cbec73
Move gpt2 script to models\gpt2 sub-directory (#11256)
* move gpt-2 scripts to models\gpt2
* change gpt2 beam search helper to make test_gpt2 pass
2022-04-20 11:09:26 -07:00
cloudhan
013306c940
[MinBuild] 132KB minimal build binary size reduction via dummy __cxa_demangle (#11071)
Minimal build binary size reduction via dummy __cxa_demangle
2022-04-21 00:11:10 +08:00
Lukas Berbuer
efb0928e2b
Fix find_package for benchmark 2022-04-18 15:25:43 -07:00
George Nash
d9eeb48393
One dnn v2.6 update (#11220)
* Disable training code in DNNL LayerNorm code

The capability code already does not claim the LayerNorm and
SkipLayerNorm that require more than one output. However,
building with training enabled was causing issues.

The training specific code has been removed even when building with
training enabled.

Signed-off-by: George Nash <george.nash@intel.com>

* Fix for DNNL FusedMatMul op.
The bug was in the transpose code.

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Use agreed upon memory format type when running Pooling Gradient in dnnl ep

The dnnl ep does not currently have a way to pass memory_format information
between the forward pooling primitive to the backward pooling primitive.

This change explicitly sets the memory_format to match that of Onnxruntime
for both the forward and backward pooling code. This prevents using an unmatched
memory format that could result in an `unimplemented` error from dnnl ep.

Signed-off-by: George Nash <george.nash@intel.com>

* Update dnnl ep to use OneDNN v2.6

Do not run ReduceInfLogSum on the kDnnlExecutionProvider due to a
calculation bug when doing Log or infinity values. The fix for this
issue will be part of the next OneDNN release.

Signed-off-by: George Nash <george.nash@intel.com>

* Update PrintMemory function in dnnl ep

This modification can be used to enable/disable memory printing
for dnnl ep developers.  This is considered a developer only feature
and is disabled by default. It must be enabled and code recompiled
to use.

Even if it is enabled it will not actually print any memory because
the developer needs to take the extra step of specifying the memory
that will be printed to the screen.

Signed-off-by: George Nash <george.nash@intel.com>

* Update binary ops to run on intel GPU when using dnnl ep

Binary ops (i.e. Add, Div, Mul, and Sub) were updated to no longer
call GetMemoryAndReshape. In the past this call would move the memory from
CPU to the GPU. This extra call is no longer needed since it is taken
care of by the GetMemoryInOrtFormat call. Removing the GetMemoryAndReshape
call prevents copying the memory to the GPU twice.

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
2022-04-15 12:51:11 -07:00
Xavier Dupré
833f5d5604
Remove dependency on EP TVM in unit test project (#11170) 2022-04-12 09:03:57 +02:00
Yi-Hong Lyu
749c0ddd1e
Upsample support NHWC (#10824)
This patch implements bilinear interpolation for Upsample/Resize of 4-D input with
the outermost and innermost scales (usually the channel of NHWC) equal to 1. It is
parallelized over output_height * output_width instead of over one dimension only.

Besides, I also revert the HandleResize back to the original implementation for
TransposeOptimizerTests.TestResize* tests.

Finally, I add microbenchmark BM_NhwcUpsampleBilinear.
2022-04-11 11:39:17 -07:00
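To make the NHWC upsample commit above (#10824) concrete, here is a small pure-Python sketch of bilinear interpolation over one H x W x C sample, with the batch and channel scales implicitly 1. It is an illustration only, not the onnxruntime kernel. The function name `upsample_bilinear_nhwc` is hypothetical, and the coordinate mapping (source = output / scale, clamped to the last row or column) is an assumption chosen for simplicity; the real operator supports several coordinate transformation modes.

```python
def upsample_bilinear_nhwc(image, scale_h, scale_w):
    """Bilinearly upsample one H x W x C sample given as nested lists."""
    in_h, in_w, channels = len(image), len(image[0]), len(image[0][0])
    out_h, out_w = int(in_h * scale_h), int(in_w * scale_w)
    output = []
    # Every (y, x) output pixel is independent of the others, so the
    # out_h * out_w iteration space could be divided among worker threads.
    for y in range(out_h):
        src_y = min(y / scale_h, in_h - 1)  # assumed coordinate mapping
        y0 = int(src_y)
        y1 = min(y0 + 1, in_h - 1)
        wy = src_y - y0
        row = []
        for x in range(out_w):
            src_x = min(x / scale_w, in_w - 1)
            x0 = int(src_x)
            x1 = min(x0 + 1, in_w - 1)
            wx = src_x - x0
            pixel = []
            # Channels are innermost in NHWC, so they are contiguous per pixel.
            for c in range(channels):
                top = image[y0][x0][c] * (1 - wx) + image[y0][x1][c] * wx
                bottom = image[y1][x0][c] * (1 - wx) + image[y1][x1][c] * wx
                pixel.append(top * (1 - wy) + bottom * wy)
            row.append(pixel)
        output.append(row)
    return output
```

Because the channel loop is innermost, all channels of a pixel share the same four neighbours and weights, which is what makes the NHWC layout convenient for this operation.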
Tianlei Wu
00b595e389
move longformer and t5 to models subdirectory (#11161)
* move longformer scripts to models subdirectory
* Copy transformers\models\t5 to python package as well
2022-04-09 22:35:14 -07:00
Lukas
4c37f15c1b
Find boost, nsync, json, cpuinfo system libs with CMake option onnxruntime_PREFER_SYSTEM_LIB (#11146) 2022-04-08 00:11:02 -07:00
Lukas
1b664e6d4c
Link cpuinfo only if supported (#11147)
* Remove unnecessary target_include_directories for cpuinfo

Headers already exposed as public by CMake target: 5916273f79/CMakeLists.txt (L213)

* Link to cpuinfo library only if supported
2022-04-07 21:32:12 -07:00
Justin Stoecker
7609694464
Enable building with a GDK (#11126) 2022-04-07 15:06:31 -07:00
Maajid khan
81fa28bc56
OpenVINO-EP v4.0 Release PR with OpenVINO 2022.1 (#11025)
* Enabling ov-ep for 2022.1 Release

->Added ov-ep 2022.1 flow
->Validated CPU Unit tests with OV
Master using onnxruntime_test_all unit
tests.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix for output mismatch b/w OpenVINO and ONNX

Refer:
https://jira.devtools.intel.com/browse/CVS-60310

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enabling Adobe ops

->Enable Resize op for iGPU
->Enable Add op for iGPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing irrelevant conditions

->Removing some conditions from
GetCapability() which are now not
required. (Removed conditions for
OV version support less than 2021.2)

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable upsample op

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable Adobe proxy-e model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing any extra conditions for Opset13 ops

* Opset13 changes

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Exception handling for devices

* Added comments

* Implement GPU Throttling feature

*Added GPU Throttling feature for iGPU's.
when user enables it as a runtime option,
it helps in reducing overall CPU usage
of the application

*Added changes to exercise this option
using onnxruntime_perf_test application.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Renaming the runtime config option

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added the user to video and users group

* Handling_GPU.0_GPU.1

* Handling special conditions

->Handling corner cases for
device_type checks

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Modification to include new api 2.0 changes in the code

* Added opset13 changes

->Enabled Few ops
->Added Debug info for case 3b in getcapability()

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Log comments updated

* Changes to enable 2.0 api

* Fix build issue

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes issues

*Fixes compiler warnings c4458 on windows.
*Fixes the bug in device_type check logic
*Adds print info for enable_opencl_throttling
option in onnxruntime_perf_test

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* commit to make openvino_2021.4 compatible

* Fixed IO Buffer Optimization

* Fix output names issue

* Fix 2021.3 branch

* Bug Fix for Multiple inputs/outputs

- Assigns the right output_name and
input_name for the graph when
returned by CompiledModel::inputs()
OV function.

- Also takes care of output mismatch
issue b/w openvino output and onnx
output

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add comments for the changes made

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* IO Buffer Changes

* Commit for Disabling GPU Throttling for 2021.4

* Updated branch

* Fix windows build

->Fixed windows build in debug mode
->Disabled scatternd3_tensor_int64

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed CPP Unit tests for CPU

-Fixed shrink, MVN, ReduceL2, Maxpool,
upsample, scatter, slice, reshape,
unsqueeze.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed first set of GPU Tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed additional failing tests on GPU

->Added conditions to disable certain ops
under certain conditions

->Disabled certain tests

->Added some op supports for no_dimension
supported

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added Expand op support for CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added condition for squeeze op

->Shape can't have empty axes attribute

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add support for LessOrEqual op function

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* OV Interface wait for replaced by indefinite wait call

* use names from ONNX model to access OV tensors

This change is to use the input/output names
retrieved from original onnx model to access
OV tensors and to check if there's any input
or output names mismatch b/w ONNX naming
and OV naming.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes Myriad unit tests and other issues

->Fixes Myriad CPP unit tests
->Fixes output mismatch issue with models with
sub graph partitioning

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix segfault issue

->Fixed case 3b condition in get_capability()
which was causing the segfault issue

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed build issue with ov 2021.4 with I/O buffer

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disables performance counters for I/O Buffer

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed inputs/outputs mismatch for HDDL with 2022.1

Signed-off-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>

* Fix to enable GPU FP16

* Enabled mlperf_ssd_mobilenet_300 model fully on CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added ov version specific dll packaging for nuget

* Fixed conditions for few ops

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Dockerfile updates

* Updated License Info

-Updated the copyrights License Info
-modified FP16 transformations with OV 2022.1

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disabling mlperf_ssd_mobilenet_300 model

->Disabled this model for openvino. The
test is failing in Internal_CI pipelines.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disabling failing python CPU Tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed flake8 python errors

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: hdgx <harinix.d.g@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>
Co-authored-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>
2022-04-06 13:30:33 -07:00
Ben Niu
20fbf603d3
Fix ARM64EC build breaks (#11111)
Apply this 4c015dbb49 to fix ARM64EC build breaks.
2022-04-05 10:00:42 -07:00
Jack·Boos·Yu
01631893cd
[cmake] Re-factor pre-compile header usage (#11093) 2022-04-04 16:28:34 -07:00
Jack·Boos·Yu
ea004e953f
[cmake] Export multi targets in static build (#11063)
* [cmake] Export multi targets in static build

* Install more components in static build, format some code

* Fix code pos
2022-04-03 22:37:18 -07:00
Jack·Boos·Yu
2dfd81b9bb
[cmake] Add option onnxruntime_ENABLE_CPUINFO (#11084) 2022-04-01 22:29:27 -07:00
Chen Fu
dc72159105
Symmetric Quant indirect Conv kernel for ARMv8 A55 chip (#10862)
The ARM a55 micro-architecture (with dot product instructions), similar to a53, is widely used for the little cores in big.LITTLE configurations. A55 has narrower memory load/store hardware, where a 128b load instruction blocks the pipeline for 2 whole cycles, during which no other instructions can be executed. On the other hand, a 64b load instruction can be dual issued with many other instructions.

This change adds a Symmetric Quant indirect Conv kernel for a55 micro-architecture, where we replace

ldr q4,[x1],

with

ldr d4,[x1],
ldr x11,[x1],
ins v4.d[1],x11

so that we can try to hide the memory load cycles behind computing cycles in the kernel.

With this new kernel, cartoongan model shows significant perf improvement on Pixel5a little cores (2 threads running on two little cores):

new kernel: 2188.59 ms
old kernel: 2360.61 ms
2022-03-25 17:10:47 -07:00
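As a quick check on the A55 kernel timings quoted above (#10862), the relative improvement can be worked out from the two reported latencies. This is plain arithmetic on the numbers in the commit message; the variable names are introduced here for illustration.

```python
# Latencies reported in the commit message (Pixel5a little cores, 2 threads).
old_ms = 2360.61
new_ms = 2188.59

# Fraction of the old latency that was saved, as a percentage.
reduction_pct = (old_ms - new_ms) / old_ms * 100

# How many times faster the new kernel is than the old one.
speedup = old_ms / new_ms

print(f"latency reduction: {reduction_pct:.1f}%")
print(f"speedup: {speedup:.3f}x")
```

That works out to roughly a 7.3% reduction in latency.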
leqiao-1
8ddc45f52d
Add linux and macos arm64 java artifacts (#10981) 2022-03-25 16:23:17 -07:00
Jack·Boos·Yu
d1be71eaa3
[cmake] Add keyword STATIC to add_library in function onnxruntime_add_static_library (#10998) 2022-03-25 16:19:36 -07:00
pengwa
89ef987ab1
Improve NonZero on CUDA/ROCM (#10307)
* improve NonZero

* fix megatron_fp16 optimizer, fix the doc

* multi_tensor_applier

* resolve comment

* fix building warning

* fix build error when enabling training and use tensorrt
2022-03-25 07:35:45 +08:00
Shucai Xiao
7ee52fb8a0
amdmigraphx_ep-add ops to be supported by migraphx and fixed a bug in check ops to be supported (#10496)
* backup debugging information related to debugging a jira ticket

* fixed a bug in checking whether an input can be constant folded

* added more operators that are supported by migraphx

* revert unnecessary changes

* remove unused logger parameter

* rename function to make name style consistent

* backup code changes

* fix review comments

* refactor graph utility functions to add unit tests

* backup additional changes

* fixed a link error in build migraphx_basic_test

* add unit test for some migraphx utility functions

* add more supported ops in migraphx
2022-03-23 19:17:19 -07:00
Baiju Meswani
565318ce86
Support ORT WASM compilation with the training flag (#10973)
* Add training support for ORT web assembly compilation

* Use wrapper for eigen includes in training
2022-03-22 16:13:35 -07:00
Yulong Wang
dce5d719c5
add build flag for emscripten settings (#10963)
* allows multiple '--cmake_extra_defines' flags

* fix flake8 error

* Add build flag for emscripten settings

* remove "emscripten_settings" in generate_build_tree()

* format code
2022-03-22 11:55:45 -07:00
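The first bullet of the commit above (#10963) allows the '--cmake_extra_defines' flag to be given more than once. One way such behaviour can be expressed with Python's argparse is sketched below. This is an assumption about the general technique, not a copy of the project's build script, and the helper name `parse_defines` is hypothetical.

```python
import argparse


def parse_defines(argv):
    """Collect every value from every occurrence of --cmake_extra_defines."""
    parser = argparse.ArgumentParser()
    # nargs="+" lets one flag carry several values; action="append" keeps
    # each occurrence as its own list instead of overwriting the previous one.
    parser.add_argument(
        "--cmake_extra_defines", nargs="+", action="append", default=[]
    )
    args = parser.parse_args(argv)
    # Flatten the list of lists into a single list of KEY=VALUE strings.
    return [define for group in args.cmake_extra_defines for define in group]
```

For example, `parse_defines(["--cmake_extra_defines", "A=1", "B=2", "--cmake_extra_defines", "C=3"])` yields a single flat list containing all three definitions.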
Sunghoon
b34d9f6867
[js/wasm] Add WebAssembly static library build into web CI pipeline (#10959)
* add webassembly static library build into ci

* add webassembly static library build into ci

* skip publishing on static lib

* fix type
2022-03-21 15:49:49 -07:00
Tiago Koji Castro Shibata
5ed2f4ad5f
Remove Windows Store specific code 2022-03-17 23:38:14 -07:00
Valery Chernov
625a1f7673
[TVM EP] code refactor (#10655)
* rename info to options for TVM EP

* transfer options processing from TVMExecutionProvider to TVMEPOptions

* transfer TVMRunner to separated files

* implement TVMCompiler class

* replace CompileFunc by TVMCompiler object. update TVMRunner. now it does not depend on TvmExecutionProvider

* correct logging of TVM EP options

* RunnerImpl, GERunnerImpl and VMRunnerImpl were implemented

* add prepareComputeInfo method

* remove update_output_shapes flag

* embed all TVM EP dependencies to tvm namespace. transfer model compilation from TVMRunner. connect TVMRunnerImpl to TVMRunner

* refactor compileModel method

* small cleaning

* separate TVM EP options data store and processing

* replace TvmTensorShape by InlinedVector with max_size 5

* correct indentation

* update TVM hash

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-03-16 13:55:04 +01:00
Scott McKay
f385c73058
Fix a couple of issues with the python package tools (#10858)
* Tweaks to the model utils
  * Add handling for a dim_value of -1 when replacing the entire input shape. This occurs in models exported from PaddlePaddle
  * make pytorch helpers accessible in package
  * make QDQ helpers accessible in package
2022-03-15 15:52:12 +10:00
Edward Chen
e53422c6d0
Update convert_onnx_models_to_ort.py to support runtime optimizations. (#10765)
Add runtime optimization support to ONNX -> ORT format conversion script.
Replace `--optimization_level`, `--use_nnapi`, and `--use_coreml` with a new `--optimization_style` option.
2022-03-14 16:50:41 -07:00