Commit graph

478 commits

Author SHA1 Message Date
Changming Sun
c2c4e6760b
Fix code sign validation errors in nuget and nodejs pipeline (#4527) 2020-07-20 14:18:47 -07:00
Wei-Sheng Chin
21d2728974
Revise pipeline schedule to consider communication ops (#4524)
* Revise pipeline schedule to consider communication ops

* Add test

* Fix warning

* inline some short functions

* Fix warnings

* Rename a class

* Add comment for test

* op renamed to task

* Fix NVTX wrapper's bug
2020-07-17 10:04:56 -07:00
Changming Sun
8ada440961
Move model tests to onnxruntime_test_all (#4521)
1. Move model tests to onnxruntime_test_all
2. Publish TestResults of Windows CI build.
2020-07-15 16:46:18 -07:00
stevenlix
0ebe2fab51
Refactor TensorRT EP code to better handle dynamic shape subgraphs (#4504)
* build engine in runtime for dynamic shape subgraphs

* Update TensorRT-ExecutionProvider.md

* Update TensorRT-ExecutionProvider.md

* fix build issue

* Add more instructions on how to use engine caching

* add precision to trt node name

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc
2020-07-15 02:35:42 -07:00
Yufeng Li
5dc7339be6
Add quantization tool to python package (#4458)
* Add quantization tool to python package
2020-07-08 21:42:53 -07:00
Tracy Sharpe
aa06d308a6
Build new AVX file with /ARCH:AVX (#4442)
Build new file with /ARCH:AVX on Windows to ensure correct vzeroupper behavior.
2020-07-07 12:00:12 -07:00
EronsJ
632b2896f3
Onnxruntime fuzzing (#4341)
* Add protobuf mutator library as a git submodule

* Added files and instructions to build the protobuf mutator library in CMake

* Added fuzzing flag to build system and added fuzzing dependency library. To run fuzzing test use the flags --fuzz_testing --build_shared_lib --use_full_protobuf --cmake_generator 'Visual Studio 16 2019'

* Added src files and build instructions for the main fuzzing engine

* Removed Random number generation test from inside the engine

* Added license header to files

* Removed all pep8 violations introduced by this change and other E501 violations
2020-07-06 16:34:34 -07:00
Christian Goll
3588484336
use system libnsync (#4377)
* use system libnsync
2020-07-06 07:53:22 -07:00
suffiank
f6bf66c8cf
Adjustments to MPI and NCCL library discovery on build (#4407)
* cmake edits for mpi_home and nccl_home

* cmake syntax error on else
2020-07-02 12:03:42 -07:00
Tiago Koji Castro Shibata
7fea332f93
Support builds without RTTI (#4333)
* Support builds without RTTI

* Disable RTTI in all builds
2020-07-01 13:05:35 -07:00
Zhang Lei
94c98aa0a7
qlinaradd for arm/sse2/avx2 using intrinsic, enable binary broadcasting parallel (#4216)
* Support quantization linear binary element wise math ops, implement QLinearAdd.
Support tests for quantization linear binary element wise math ops, implement test for QLinearAdd.
Add QlinearAdd with SSE2 intrisinc implemntation, Avx2 assembly implemntation, Neon intrisinc support.
QLinearAdd support VectorOnVector, VectorOnScalar, ScalarOnVector.
Generalized QlinearBinaryOp parallel related with broadcasting.

* Modify according to PR feedbacks. Mainly:
    * template helper for generalize the qladd logic on v2v, s2v, v2s
    * remove GetKernel related.
    * change mixed lagecy MM/SSE code in the AVX code
    * formater, typos, convensions, etc.

* Utilize MlasSubtractInt32x4 in MlasDequantizeLinearVector().

* Some format fix.

* More nature parallel parameter type.

* Fix build break for x86.

* Comment goes to 80 before wrap.

* Many change on assembly on Marco related.
Using vminps than vpminsd to handle NaN.
tested on windows.

* Using CLang Format to format the file.

* Fix arm32 build error.

* Remove some duplicate in different #if defined

* working add.u8.vector to vector

* Fix runtime bus error on real arm32 linux.

* fix typo in store last one lane.

* arm32 qlinearadd handle scalar.

* Move qladd to seperate c++ file

* Add neon64 qladd.

* refactor some, enhance two instructions on arm64 only instructions

* Fix typo for arm64

* use strict op in pure c++ (min/max on float value)

* sse2 new version.

* mrege arm/sse2/avx2

* pass arm/sse/avx2 linux test

* remove non-used assembly file.

* Remove unused data definition and tailing spaces.

* Fix broadcasting parallel issue.

* Enhance broadcasting scenarios. Allow testing result diff due to round
on half.

* Add Mlas or MLAS_ prefix for namespace safety.

* Handle alignment issue for arm32 for GCC/MSVC. remove some unused
signed/unsigned int ops.

* Specify /arch:AVX2 for qladd_avx2.cpp

* Fix type during copy/paste when unrolling. Better one GreatEqual
condition. Better formater by splitting two statements on single line.

* Arm neon alignment parameter is bits rather than bytes, change it.

* Move qladd_avx2.cpp to intrinsics/avx2/ folder

* Formatting using mlas style.

* Double check mlas style for these files.

* change indent 2 to 4 for qladd_avx2.cpp

* Fix windows x86 build error due to sse2 no _mm_cvtsi128_si64

* To re-trigger all as old failed pipeline updated.

Co-authored-by: Lei Zhang <phill.zhang@gmail.com>
2020-07-01 11:54:44 -07:00
Weixing Zhang
2601f8e1b4
Support to build CUDA EP for NV Ampere GPU (#4345)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-06-29 21:46:13 -07:00
gwang-msft
9e0f5fc7af
The initial PR for NNAPI EP (#4287)
* Move nnapi dnnlib to subfolder

* dnnlib compile settings

* add nnapi buildin build.py

* add onnxruntime_USE_NNAPI_BUILTIN

* compile using onnxruntime_USE_NNAPI_BUILTIN

* remove dnnlib from built in code

* Group onnxruntime_USE_NNAPI_BUILTIN sources

* add file stubs

* java 32bit compile error

* built in nnapi support 5-26

* init working version

* initializer support

* fix crash on free execution

* add dynamic input support

* bug fixes for dynamic input shape, add mul support, working on conv and batchnorm

* Add batchnormalization, add overflow check for int64 attributes

* add global average/max pool and reshape

* minor changes

* minor changes

* add skip relu and options to use different type of memory

* small bug fix for in operator relu

* bug fix for nnapi

* add transpose support, minor bug fix

* Add transpose support

* minor bug fixes, depthwise conv weight fix

* fixed the bug where the onnx model input has mismatch order than the nnapi model input

* add helper to add scalar operand

* add separated opbuilder to handle single operator

* add cast operator

* fixed reshape, moved some logs to verbose

* Add softmax and identity support, change shaper calling signature, and add support for int32 output

* changed the way to execute the NNAPI

* move NNMemory and InputOutputInfo into Model class

* add limited support for input dynamic shape

* add gemm support, fixed crash when allocating big array on stack

* add abs/exp/floor/log/sigmoid/neg/sin/sqrt/tanh support

* better dynamic input shape support;

* add more check for IsOpSupportedImpl, refactored some code

* some code style fix, switch to safeint

* Move opbuilders to a map with single instance, minor bug fixes

* add GetUniqueName for new temp tensors

* change from throw std to ort_throw

* build settings change and 3rd party notice update

* add readme for nnapi_lib, move to ort log, add comments to public functions, clean the code

* add android log sink and more logging changes, add new string for NnApiErrorDescription

* add nnapi execution options/fp16 relax

* fix a dnnlibrary build break

* addressed review comments

* address review comments, changed adding output for subgraph in NnapiExecutionProvider::GetCapability, minor issue fixes

* formatting in build.py

* more formatting fix in build.py, return fail status instead of throw in compute_func

* moved android_log_sink to platform folder, minor coding style changes

* addressed review comments
2020-06-26 00:02:39 -07:00
Changming Sun
deea945f80
Remove openmp and scipy from build pipelines (#4305)
1. Remove openmp because the default thread pool is already good enough.
2. Remove scipy from build pipelines because it stops support python 3.5.
2020-06-23 20:18:16 -07:00
Yufeng Li
867ba846f7
Implement MinMax with SIMD (#4285)
* Implement MinMax with SIMD
2020-06-23 20:07:53 -07:00
Pranav Sharma
2204d39a06
Add build option to disable traditional ML ops from the binary. (#4272)
* Add build option to disable traditional ML ops from the binary.

* Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources.

* Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.
2020-06-20 06:36:06 -07:00
Yang Chen
a490beedf1
update tvm submodule (#4284) 2020-06-19 14:51:18 -07:00
goloskokovic
478b923e19
Expose ACL/ARMNN providers to Python (#4260)
* expose ACL/ARMNN providers to python

* add -acl / -armnn to package name when use_acl / use_armnn is specified

* build python wheel for ARMNN EP

* link ACL/ARMNN EPs into onnxruntime_pybind11_state

* wrong argument order in build_python_wheel for wheel_name_suffix
2020-06-18 20:24:14 +05:30
Tracy Sharpe
5d773ee57b
MLAS: add sgemv path for aarch64 builds (#4254)
Implement a fast path for GEMMs where M=1 and TransB=CblasNoTrans.
2020-06-17 20:10:35 -07:00
Chih-Hsuan Yen
5da849b414
Fix detection of protobuf with onnxruntime_PREFER_SYSTEM_LIB on Linux (#4230)
The CMake module is FindProtobuf.cmake [1]. Thus the name should be
capitalized so that protobuf can be found on case-sensitive file
systems.

[1] https://github.com/Kitware/CMake/blob/v3.17.3/Modules/FindProtobuf.cmake
2020-06-17 17:34:47 -07:00
Wei-Sheng Chin
189fb60ef9
Fix a bug and add code to profile memory (#4241)
* Fix a bug and add code to profile memory

1. Compile Send/Recv again (currently broken because of
   HOROVOD refactor).
2. Add code to print out initializer allocation size and
   activation memory size.

* Address comments

* Split memory counts per locations

* Fix a metric
2020-06-16 10:17:27 -07:00
Weixing Zhang
b4b1c6440a
Enable ORT with CUDA 11 toolkit (#4168)
* ORT on CUDA 11

1. Seperate HOROVOD and MPI
2. Seperate NCCL from HOROVOD in CMakeLists.txt
2. Remove dependency on external cub
3. cudnnSetRNNDescriptor is changed in cuDNN 8.0

* polish the code about MPI/NCCL in CMakeLists.txt and build.py

* check CUDA version

* ${MPI_INCLUDE_DIRS} should be PUBLIC

* sm30, sm50 are deprecated in CUDA 11 Toolkit

* update change based on code review feedback.

* add sm_52

* improve MPI/NCCL build path

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-06-15 08:47:03 -07:00
Hariharan Seshadri
b377266eb3
Fix Mac build linker warnings (#4155) 2020-06-12 21:10:12 -07:00
Tiago Koji Castro Shibata
2e3607c7cd
Remove hardcoded desktop lib (#4193) 2020-06-12 16:51:54 -07:00
Yulong Wang
73bc6be5d1
build: split nodejs binding build and test to avoid timeout issue (#4188)
* split nodejs binding build and test

* enable nodejs tests
2020-06-10 19:16:32 -07:00
Dmitri Smirnov
af0750ba1b
Java GPu artifact naming (#4179)
Modify gradle build so artifactID has _gpu for GPU builds.
  Pass USE_CUDA flag on CUDA build
  Adjust publishing pipelines to extract POM from a correct path.

Co-Authored-By: @Craigacp
2020-06-10 11:15:48 -07:00
Tiago Koji Castro Shibata
8eb6a539bd
Hardcode WinML tests umbrella lib (#4161) 2020-06-08 15:24:08 -07:00
Tiago Koji Castro Shibata
6bbd18efd0
Hardcode WinML umbrella lib to windowsapp.lib (#4133) 2020-06-08 11:04:44 -07:00
Andrews548
62b44527e5
Add ArmNN Execution Provider (#3714)
* Add ArmNN Execution Provider

Add a new execution provider targeting Arm architecture based on ArmNN.
Validated on NXP i.MX8QM CPU with ResNet50, MobileNetv2 and VGG models.

reviewed-by: mike.caraman@nxp.com

* Minor fixes

- renamed onnxruntime_ARMNN_RELU_USECPU to onnxruntime_ARMNN_RELU_USE_CPU
- fixed acl typo

* remove extra includes. added exception for ArmNN in test

* fix indentation

* Separated the activation implementation from the cpu and fixed the blockage from the endif

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-06-03 22:57:51 +05:30
Dmitri Smirnov
afca0d15ee
Create Java publishing pipeline (#3944)
Create CPU and GPu Java publishing pipelines. Final jars are tested on all platforms. However, signing and publishing to maven are manual steps.
2020-06-01 18:18:57 -07:00
Pranav Sharma
6c1b2f33b7
Fix crash reported in #4070. (#4091)
* Fix crash reported in #4070.

* Add newline to warning message

* Add comment for using cout instead of the logger
2020-06-01 15:27:14 -07:00
edgchen1
a715d55bcc
Training Python package fixes (#4063)
- Add support for ENABLE_LANGUAGE_INTEROP_OPS in training build which is enabled for nightly builds
- Fix passing of environment variables to `sudo docker run` in build definitions
- Fix setup.py package naming logic
2020-06-01 09:30:56 -07:00
edgchen1
38d76cc904
Clean up training E2E test (#4078)
Update training E2E build to not go through CTest and call test scripts directly.
2020-05-29 09:20:47 -07:00
Yulong Wang
b3ec8035ee
[Node.js binding] add build flag for node.js binding (#3948) 2020-05-27 13:30:22 -07:00
Tracy Sharpe
0d8abc1a99
MLAS: qgemm refactoring (#4030)
Treat U8U8 as U8S8 for VNNI for performance and optimize SSE2 kernel.
2020-05-26 17:27:32 -07:00
Paul Fultz II
7759136610
Add amd migraphx execution provider to onnx runtime (#2929)
* Add amd migraphx execution provider to onnx runtime

* rename MiGraphX to MIGraphX

* remove unnecessary changes in migraphx_execution_provider.cc

* add migraphx EP to tests

* add input requests of the batchnorm operator

* add to support an onnx operator PRelu

* update migrapx dockerfile and removed one unused line

* sync submodules with mater branch

* fixed a small bug

* fix various bugs to run msft real models correctly

* some code cleanup

* fix python file format

* fixed a code style issue

* add default provider for migraphx execution provider

Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
2020-05-27 04:24:59 +08:00
Tiago Koji Castro Shibata
faf65e960f
Refactor delayloading (#4019)
* Refactor delayloading

* Remove explicit linking to windowsapp.lib
2020-05-24 23:26:30 -07:00
Wei-Sheng Chin
24eda3df33
Create Utils for Adding Range and Marker (#4013)
In this PR, we
  1. create some APIs for creating NVTX objects
  2. apply those APIs in pipeline-related operators and sequential executor.
As a result, we can explicitly see how a pipeline schedule is run by GPUs in 
Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's
limited support.
2020-05-24 22:55:24 -07:00
Scott McKay
475e7e43e6
Older flake8 versions report false positives and don't handle the same things in the config file. (#3983)
Require the current flake8 version or later so we get consistent results.
2020-05-20 07:29:22 +10:00
edelaye
64b5f7edf6
Initial release of Vitis-AI Execution Provider (#3771)
* Initial release of Vitis-AI Execution Provider

* Add documentation, fix for onnxruntime::Model changes and use stringstream instead of file dump for model passing

* - Add Vitis-AI docker file
- Add online quantization flow Vitis-AI execution provider
- Fix remarks

* - Add fatal error build message for Vitis-AI cmake build on Windows
- Fix pep8 issue in build.py
- Add Vitis-AI execution provider example in docs

Co-authored-by: Elliott Delaye <elliott@xilinx.com>
Co-authored-by: Jorn Tuyls <jornt@xilinx.com>
Co-authored-by: Jorn Tuyls <jtuyls@users.noreply.github.com>
2020-05-19 05:32:32 -07:00
gwang-msft
a75a83b41a
Minor android build fix (#3980) 2020-05-18 19:20:23 -07:00
Adam Pocock
9d2d1eb6f6
[java] Adds a CUDA test (#3956)
* [java] - adding a cuda enabled test.

* Adding --build_java to the windows gpu ci pipeline.

* Removing a stray line from the unit tests that always enabled CUDA for Java.
2020-05-18 12:05:51 -07:00
ytaous
bc441b7e5c
Add cpu/mem usage for perf metrics (#3947)
* add cpu/mem usage

* on comments

* on comments

* renaming

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-05-15 12:29:40 -07:00
Jeff Bloomfield
e6da5946d1
Update DML Nuget version and DML EP Doc (#3945)
Update DML Nuget version and DML EP Doc
2020-05-14 17:33:46 -07:00
Scott McKay
5e0928a777
Enable running PEP8 on python scripts using flake8 (#3928)
* Enable running PEP8 checks via flake8 as part of the build if flake8 is installed.
Update scripts in \tools and \onnxruntime\python. Excluding \onnxruntime\python\tools which needs a lot more work to be PEP8 compliant. Also excluding orttraining\tools for the same reason.
Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build.
Update coding standards doc.
2020-05-15 07:15:06 +10:00
gwang-msft
cba8bdc790
Make some compile change for Android NNAPI provider using DNNLibrary (#3935)
* Change compile settings for NNAPI with DNNLib

* update build.py

* update build readme
2020-05-14 10:53:37 -07:00
Tiago Koji Castro Shibata
385073e1cd
Fix DmlCopyTensor test (#3923)
* Fix heap corruption

* Cleanup
2020-05-13 09:14:55 -07:00
Ori Levari
7b858d60b0
Various changes for automated downlevel test pipeline (#3901)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2020-05-12 17:22:47 -07:00
Hariharan Seshadri
3065219cc1
Changes related to the release binaries requiring Visual C++ 2019 runtime (#3871) 2020-05-12 17:07:06 -07:00
Xiang Zhang
bccbdd03f1
User/xianz/enable batch tests (#3914)
* enable batch tests in winml_image_test

* copy batchGroundTruth folder

* skip GPU tests when GPU is unavailable
2020-05-12 15:46:46 -07:00