
2738 commits

Author SHA1 Message Date
Changming Sun
deea945f80
Remove openmp and scipy from build pipelines (#4305)
1. Remove openmp because the default thread pool is already good enough.
2. Remove scipy from build pipelines because it no longer supports Python 3.5.
2020-06-23 20:18:16 -07:00
Yufeng Li
867ba846f7
Implement MinMax with SIMD (#4285)
* Implement MinMax with SIMD
2020-06-23 20:07:53 -07:00
edgchen1
4e39fda06a
Fix version of torch and torchvision in install_deps.sh. (#4316) 2020-06-23 14:55:18 -07:00
Bowen Bao
15cb4b3023
Fix session load state & run extra_postpasses only once (#4255)
* Fix session load state & run extra_postpasses only once

* add testcase for onnx model as well
2020-06-23 11:45:26 -07:00
Prabhat
d3c5cb6349
Use providers_available array from constants.h to avoid code duplication (#4300) 2020-06-23 11:52:51 +05:30
edgchen1
737c22a911
Refactor Python packaging builds (#4283)
Reuse the same template file for all Python packaging builds.
2020-06-22 17:13:22 -07:00
Tim Harris
9e3b5c62fb
Use OpenMP-like synchronization patterns in Eigen thread pool (#4236)
Updates the thread pool implementation to make work distribution over the Eigen thread pool more closely resemble techniques used in OpenMP. In particular:

(1) When called from outside the thread pool, a thread entering a parallel loop works on the iterations itself, rather than requiring a switch to/from a thread in the pool.

(2) To support this, work items pushed to the thread pool run a loop to claim iterations from a shared counter via atomic fetch-and-add, as opposed to having work items themselves represent individual batches of iterations. This means that any thread working on the loop can execute any batch of iterations, including having the main thread run through all of the batches itself if the loop turns out to be short-running.

(3) As with OpenMP's active wait policy, the worker loop spins waiting for work before blocking. This avoids OS blocking / wake-up paths in workloads with a series of short-running parallel sections.
2020-06-22 10:04:53 +01:00
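A minimal C++ sketch of the claiming scheme described in (1) and (2). It assumes plain std::thread helpers in place of the Eigen thread pool, and ParallelForSketch and its parameters are hypothetical names. The spin-before-block behavior from (3) is not modeled, and a real pool would reuse its threads rather than spawn them per loop.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch, not the onnxruntime/Eigen code. Iterations [0, total)
// are claimed in batches from one shared atomic counter, and the calling
// thread takes part, so it may run every batch itself if the loop is short.
void ParallelForSketch(std::size_t total, std::size_t batch, unsigned num_helpers,
                       const std::function<void(std::size_t, std::size_t)>& fn) {
  assert(batch > 0);
  std::atomic<std::size_t> next{0};

  // Any participant can execute any batch: claim the next one via fetch-and-add.
  auto claim_loop = [&]() {
    for (;;) {
      const std::size_t begin = next.fetch_add(batch, std::memory_order_relaxed);
      if (begin >= total) return;  // every batch has already been claimed
      fn(begin, std::min(begin + batch, total));
    }
  };

  std::vector<std::thread> helpers;
  for (unsigned i = 0; i < num_helpers; ++i) helpers.emplace_back(claim_loop);
  claim_loop();  // the caller works on iterations instead of only waiting
  for (auto& t : helpers) t.join();
}
```

Because work items loop over a shared counter instead of each owning a fixed batch, a helper that starts late simply finds the counter exhausted and returns.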
Prabhat
57fabfba7a
Added GetAvailableProviders() to C API (#4247)
* Added GetAvailableProviders to C API

* Fix API version and Windows build error

* Changed function name

* Changed ORT_API_VERSION to 4

* Moved all_providers array to constants.h

* Move check for providers to constants.h

* Changed name of array to avoid warning

* Address review comment

* Added unit test
2020-06-22 10:10:25 +08:00
Scott McKay
175983c082
Move memory info into IAllocator (#2850)
- Update IAllocator setup to move the OrtMemoryInfo to the base class instead of requiring derived classes to have that as a member and override a virtual method to return it.
- Clean up CreateAllocator setup to take an argument as to whether to wrap the device allocator in an arena allocator. The choice to do that isn't a property of the underlying device allocator.
- Minor cleanups in the various EPs to adjust to the change to IAllocator and CreateAllocator, and to use the create_arena flag consistently when available.
2020-06-22 11:18:52 +10:00
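A simplified C++ sketch of the resulting shape of the interface. IAllocatorSketch, MemoryInfoSketch, CpuAllocatorSketch, ArenaSketch and CreateAllocatorSketch are hypothetical stand-ins, not the actual onnxruntime classes, and the arena wrapper here only forwards to the device allocator instead of pooling memory.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for OrtMemoryInfo.
struct MemoryInfoSketch {
  std::string name;
  int device_id;
};

// The base class owns the memory info, so derived allocators no longer need
// their own member plus a virtual accessor override.
class IAllocatorSketch {
 public:
  explicit IAllocatorSketch(MemoryInfoSketch info) : info_(std::move(info)) {}
  virtual ~IAllocatorSketch() = default;
  virtual void* Alloc(std::size_t size) = 0;
  virtual void Free(void* p) = 0;
  const MemoryInfoSketch& Info() const { return info_; }  // non-virtual
 private:
  MemoryInfoSketch info_;
};

class CpuAllocatorSketch : public IAllocatorSketch {
 public:
  CpuAllocatorSketch() : IAllocatorSketch({"Cpu", 0}) {}
  void* Alloc(std::size_t size) override { return std::malloc(size); }
  void Free(void* p) override { std::free(p); }
};

// Hypothetical arena wrapper; a real arena would pool and reuse chunks.
class ArenaSketch : public IAllocatorSketch {
 public:
  explicit ArenaSketch(std::unique_ptr<IAllocatorSketch> device)
      : IAllocatorSketch(device->Info()), device_(std::move(device)) {}
  void* Alloc(std::size_t size) override { return device_->Alloc(size); }
  void Free(void* p) override { device_->Free(p); }
 private:
  std::unique_ptr<IAllocatorSketch> device_;
};

// Whether to wrap in an arena is the caller's choice (create_arena), not a
// property of the device allocator itself.
std::unique_ptr<IAllocatorSketch> CreateAllocatorSketch(bool create_arena) {
  std::unique_ptr<IAllocatorSketch> device = std::make_unique<CpuAllocatorSketch>();
  if (!create_arena) return device;
  return std::make_unique<ArenaSketch>(std::move(device));
}
```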
Yang Chen
064afa0f93
define dim_idx before using it (#4290) 2020-06-20 21:05:13 -07:00
Pranav Sharma
2204d39a06
Add build option to disable traditional ML ops from the binary. (#4272)
* Add build option to disable traditional ML ops from the binary.

* Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources.

* Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.
2020-06-20 06:36:06 -07:00
alkoumpa
3c633384c2
Fix TensorRT memory leaks (#4227)
* fix tensorrt memory leaks

* wrap unique_pointer in a namespace to avoid conflicts

Co-authored-by: alex <act@act.com>
2020-06-20 03:37:38 -07:00
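One common way to prevent this class of leak is an RAII alias with a custom deleter, placed in its own namespace. The sketch below assumes, as with TensorRT releases of that period, that objects are released through destroy() rather than delete. The names trt_sketch, InferDeleter and FakeTrtObject are hypothetical and this is not necessarily the code in this commit.

```cpp
#include <memory>

// Hypothetical namespace, keeping the alias from clashing with other
// `unique_pointer` definitions elsewhere in the build.
namespace trt_sketch {
// Assumed TensorRT-style release: objects are freed via destroy(), not delete.
struct InferDeleter {
  template <typename T>
  void operator()(T* obj) const {
    if (obj) obj->destroy();
  }
};
template <typename T>
using unique_pointer = std::unique_ptr<T, InferDeleter>;
}  // namespace trt_sketch

// Stand-in for a TensorRT object, used only to show the deleter running.
struct FakeTrtObject {
  static int destroyed;
  void destroy() {
    ++destroyed;
    delete this;
  }
};
int FakeTrtObject::destroyed = 0;
```

Every exit path now releases the object when the unique_pointer goes out of scope, instead of relying on a manual destroy() call.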
Derek Murray
a541d28fb4
Lazily get allocator when allocating an MLValue (#4276)
According to profiling in #4267, getting the allocator can account for a large fraction of overhead when accessing a kernel output, due to STL container operations. The allocator isn't used when (i) we're not creating a fence, and (ii) we have a memory pattern and a pre-allocated buffer, so we can avoid this overhead.
2020-06-19 15:55:43 -07:00
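A hedged C++ sketch of the lazy lookup, using hypothetical names (AllocatorSketch, AllocateOutputSketch). The allocator getter is passed in and invoked only when no pre-allocated buffer is available. The fence case (i) is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical allocator stand-in that owns the buffers it hands out.
struct AllocatorSketch {
  std::vector<std::vector<char>> buffers;
  void* Alloc(std::size_t bytes) {
    buffers.emplace_back(bytes);
    return buffers.back().data();
  }
};

// Returns storage for a kernel output. The (potentially expensive) allocator
// lookup happens only on the slow path, when no planned buffer exists.
void* AllocateOutputSketch(void* preallocated, std::size_t bytes,
                           const std::function<AllocatorSketch*()>& get_allocator) {
  if (preallocated != nullptr) return preallocated;  // memory-pattern hit: no lookup
  assert(get_allocator);
  return get_allocator()->Alloc(bytes);
}
```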
Yang Chen
a490beedf1
update tvm submodule (#4284) 2020-06-19 14:51:18 -07:00
Tianlei Wu
e08181f74d
Update Bert Notebooks for ORT 1.3.0 (#4274)
* update keras notebook
* re-run pytorch bert notebook
2020-06-19 14:02:16 -07:00
Tianlei Wu
466511c1c3
Update gpt2 benchmark with position_ids and fp16 (#4275)
* support position_ids input
* support fp16 conversion for gpt2 past state
* output results to csv file
* Remove the useless check that output of matmul is in cuda
2020-06-19 14:01:37 -07:00
Changming Sun
0349479b19
Fix component governance and codesign validation errors (#4277)
Adjust the job steps so that these security tasks run before the build directory clean up.
2020-06-18 15:54:18 -07:00
Hariharan Seshadri
d5610e666b
Support CUDA kernel for Einsum op (#4095) 2020-06-18 15:03:23 -07:00
goloskokovic
478b923e19
Expose ACL/ARMNN providers to Python (#4260)
* expose ACL/ARMNN providers to python

* add -acl / -armnn to package name when use_acl / use_armnn is specified

* build python wheel for ARMNN EP

* link ACL/ARMNN EPs into onnxruntime_pybind11_state

* wrong argument order in build_python_wheel for wheel_name_suffix
2020-06-18 20:24:14 +05:30
Changming Sun
e505faa022
Fix two compiler warnings (#4263) 2020-06-17 20:47:01 -07:00
Tracy Sharpe
5d773ee57b
MLAS: add sgemv path for aarch64 builds (#4254)
Implement a fast path for GEMMs where M=1 and TransB=CblasNoTrans.
2020-06-17 20:10:35 -07:00
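A scalar reference sketch of what the M=1, TransB=CblasNoTrans case computes, assuming row-major storage. SgemvRowVector is a hypothetical name and this is not the MLAS aarch64 kernel; it only shows why the case reduces to a GEMV that reads B row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C (1 x N) = A (1 x K) * B (K x N), all row-major, B not transposed.
// Accumulating a[k] * (row k of B) reads B contiguously, which suits a
// dedicated vectorized GEMV path better than a general GEMM tiling.
std::vector<float> SgemvRowVector(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t K, std::size_t N) {
  assert(a.size() == K && b.size() == K * N);
  std::vector<float> c(N, 0.0f);
  for (std::size_t k = 0; k < K; ++k) {
    const float ak = a[k];
    const float* row = b.data() + k * N;  // row k of B
    for (std::size_t n = 0; n < N; ++n) c[n] += ak * row[n];
  }
  return c;
}
```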
Chih-Hsuan Yen
5da849b414
Fix detection of protobuf with onnxruntime_PREFER_SYSTEM_LIB on Linux (#4230)
The CMake module is FindProtobuf.cmake [1]. Thus the name should be
capitalized so that protobuf can be found on case-sensitive file
systems.

[1] https://github.com/Kitware/CMake/blob/v3.17.3/Modules/FindProtobuf.cmake
2020-06-17 17:34:47 -07:00
Changming Sun
43deec2174
Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266) 2020-06-17 16:25:24 -07:00
Vincent Wang
b41fcf1570
Bugfix for shape inference and GetShape. (#4243)
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-06-17 15:11:02 +08:00
Yulong Wang
12367a6b11
[C#] enable string-typed FixedBufferOnnxValue in input (#4178) 2020-06-16 11:06:11 -07:00
Wei-Sheng Chin
189fb60ef9
Fix a bug and add code to profile memory (#4241)
* Fix a bug and add code to profile memory

1. Compile Send/Recv again (currently broken because of
   the HOROVOD refactor).
2. Add code to print out initializer allocation size and
   activation memory size.

* Address comments

* Split memory counts per locations

* Fix a metric
2020-06-16 10:17:27 -07:00
edgchen1
63bf587623
Use azcopy to download test data (#4221)
Use azcopy from download_e2e_test_data.py, add helper function for downloading azcopy.
Update download_test_data.py to use helper function.
2020-06-16 10:14:34 -07:00
Tianlei Wu
61fa5476d5
Update PyTorch Bert notebooks (#4239)
Update PyTorch Bert SQuAD notebooks to use onnxruntime-tools and update the usage of intra_op_num_threads.
Rename Python files according to coding style.
Fix change_input_to_int32.
Update keras notebook to copy scripts from the rel-1.3.0 branch (will update them later).
2020-06-16 09:36:51 -07:00
Weixing Zhang
7ccce4379e
Improve fast_divmod (#4224)
* improve fast_divmod

BERT-L throughput is improved by about 1.8%.

* fix Win build.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-06-16 03:03:58 -07:00
Changming Sun
825392c25b
Fix ORT server CI build (#4165) 2020-06-15 21:26:19 -07:00
ytaous
5d28efd434
opset12 code cleanup (#4242)
* opset12 code cleanup

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-15 19:45:35 -07:00
ytaous
e0334f177c
Opset12 upgrade for existing models used by perf/e2e pipelines (#4238)
* opset12 support

* address comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-15 14:26:53 -07:00
Ashwini Khade
4486c66ed4
enable conv transpose 3D (#4218)
* enable convtranspose 3D

* test fix
2020-06-15 13:38:32 -07:00
Bowen Bao
b08771f00e
Add ONNX Training Post-Passes to Front-End - Cont (#4041)
* Add ONNX postpasses

* add flag + add bert test from onnx file

* address PR comments

* fix typo

* fix rebase

* address comments

* Fix test failures

* add new pass for expand for new pt version, add comments

* fix rebase

Co-authored-by: lahaidar <lahaidar@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-06-15 10:33:26 -07:00
Cecilia Liu
0b5bbb16b8
Benchmark With IO Binding (#4206)
* add io binding to benchmark.py
2020-06-15 10:06:33 -07:00
Weixing Zhang
b4b1c6440a
Enable ORT with CUDA 11 toolkit (#4168)
* ORT on CUDA 11

1. Separate HOROVOD and MPI
2. Separate NCCL from HOROVOD in CMakeLists.txt
3. Remove dependency on external cub
4. cudnnSetRNNDescriptor is changed in cuDNN 8.0

* polish the code about MPI/NCCL in CMakeLists.txt and build.py

* check CUDA version

* ${MPI_INCLUDE_DIRS} should be PUBLIC

* sm30 and sm50 are deprecated in the CUDA 11 toolkit

* update change based on code review feedback.

* add sm_52

* improve MPI/NCCL build path

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-06-15 08:47:03 -07:00
Emad El-Haraty
88a9cceb41
fix relative links in CONTRIBUTING.md (#4212)
* fix links to Engineering Design and API in CONTRIBUTING.md

* fix additional links in CONTRIBUTING.md

* correct the link to the public API in CONTRIBUTING.md

Co-authored-by: Emad El-Haraty <emad.elharaty@limebike.com>
2020-06-15 06:48:09 -07:00
Guoliang Hua
d0d31efd86
fix transformer doc format (#4003)
fix transformer doc format
2020-06-15 01:30:47 -07:00
Wei-Sheng Chin
ecc901717e
Use subset to release gradient tensors earlier (#4222) 2020-06-14 22:52:54 -07:00
Andrews548
886befaba1
Add BatchNorm and Concat to ACL EP (#4190)
* Fix ACL padding

* Add BatchNormalization operator to ACL Execution Provider

* Add Concat operator to ACL Execution Provider

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-06-14 21:48:22 -07:00
Hariharan Seshadri
877862184e
Fix subgraph based reshape fusion (#4185) 2020-06-14 21:10:08 -07:00
Tracy Sharpe
bf3c32166d
fix optional input/outputs (#4229) 2020-06-15 08:10:22 +10:00
Hariharan Seshadri
5708c4feaf
Handle corner case in Resize op (#4183)
* Handle corner case in Resize op

* Nit

* Fix build

* PR feedback
2020-06-13 18:05:25 -07:00
Tracy Sharpe
7a96cfc8f5
operator code cleanup (#4228)
Search/replace of the pattern "const auto foo = tensor.Shape()" to "const auto& foo = tensor.Shape()" to avoid unneeded copies at runtime and reduce code size (8KB drop for onnxruntime.dll). Remove some unnecessary header includes.
2020-06-13 14:47:44 -07:00
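A small C++ illustration of why the pattern matters, assuming Shape() returns a const reference as the search/replace implies. ShapeSketch and TensorSketch are hypothetical types that count copies; `const auto` deduces a value type and copies, while `const auto&` binds to the existing object.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical shape type that counts how often it is copied.
struct ShapeSketch {
  static int copies;
  std::vector<int64_t> dims;
  ShapeSketch() = default;
  ShapeSketch(const ShapeSketch& other) : dims(other.dims) { ++copies; }
};
int ShapeSketch::copies = 0;

struct TensorSketch {
  ShapeSketch shape;
  const ShapeSketch& Shape() const { return shape; }  // returns by reference
};

int64_t RankByValue(const TensorSketch& t) {
  const auto s = t.Shape();  // deduces ShapeSketch: copies the dims vector
  return static_cast<int64_t>(s.dims.size());
}

int64_t RankByReference(const TensorSketch& t) {
  const auto& s = t.Shape();  // deduces const ShapeSketch&: no copy
  return static_cast<int64_t>(s.dims.size());
}
```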
jornt-xilinx
c55f6d76be
[Vitis-AI EP] Fix to enable multi-output subgraphs inside Vitis-AI EP + edit docs (#4171) 2020-06-13 04:56:07 -07:00
Wei-Sheng Chin
de9da123cf
Enable static memory planning for pipeline. (#4204)
* Enable static memory planning for pipeline.
1. We fix a bug when resolving symbolic shape for scalars.
2. We pass the original inputs to all pipeline stages so that
   the symbolic shapes can be resolved.

* Further Improvements
1. Address comments.
2. Further reduce activation size by ~50% when pipeline is on.
   This is done by removing all but one gradient tensor from the last
   RecordEvent in the backward pass.

* Address a comment

* Fix Windows build
2020-06-12 21:43:50 -07:00
Hariharan Seshadri
b377266eb3
Fix Mac build linker warnings (#4155) 2020-06-12 21:10:12 -07:00
Hariharan Seshadri
91a41298cc
Fix ORT build when onnxruntime_PYBIND_EXPORT_OPSCHEMA is enabled (#3954) 2020-06-12 19:32:57 -07:00
Tracy Sharpe
155e22d1ab
MLAS: fuse float output into quantized GEMM (#4215)
Add more variants of MlasGemm that do a u8x8 GEMM with the output type as float. This fuses the common sequence of MatMulInteger + Cast + Mul(OutputScale) + optional Add(BiasVector).
2020-06-12 17:50:40 -07:00
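A scalar reference sketch of the fused computation, assuming row-major operands and per-tensor zero points. QGemmFloatOutput is a hypothetical name, not one of the MlasGemm variants. The int32 accumulator is converted, scaled and biased in the same pass instead of materializing an int32 matrix for separate Cast, Mul and Add ops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A is M x K, B is K x N (u8, row-major); bias is empty or has N entries.
// Computes float(sum_k (A - a_zp)(B - b_zp)) * output_scale (+ bias[n]).
std::vector<float> QGemmFloatOutput(const std::vector<uint8_t>& a, uint8_t a_zero_point,
                                    const std::vector<uint8_t>& b, uint8_t b_zero_point,
                                    std::size_t M, std::size_t K, std::size_t N,
                                    float output_scale, const std::vector<float>& bias) {
  assert(a.size() == M * K && b.size() == K * N);
  assert(bias.empty() || bias.size() == N);
  std::vector<float> c(M * N);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      int32_t acc = 0;  // MatMulInteger-style int32 accumulator
      for (std::size_t k = 0; k < K; ++k) {
        acc += (int32_t(a[m * K + k]) - a_zero_point) *
               (int32_t(b[k * N + n]) - b_zero_point);
      }
      float value = static_cast<float>(acc) * output_scale;  // Cast + Mul
      if (!bias.empty()) value += bias[n];                    // optional Add
      c[m * N + n] = value;
    }
  }
  return c;
}
```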
Tiago Koji Castro Shibata
2e3607c7cd
Remove hardcoded desktop lib (#4193) 2020-06-12 16:51:54 -07:00