Commit graph

2772 commits

Author SHA1 Message Date
Hariharan Seshadri
012aaa6491
Minor optimization in CUDA Reduction ops (#4353) 2020-06-28 01:14:28 -07:00
Scott McKay
274e6b4153
Cleanup SessionState. Move allocator lookup to SessionState. (#4194)
* Move allocators to SessionState so they're decoupled from ExecutionProviders
  - when looking up an allocator it's based on OrtMemoryInfo, not the EP, so SessionState is a more natural place for that information to be stored
  - add device based lookup
    - simplifies logic for copying feeds/fetches across devices
Cleanup SessionState and SessionStateInitializer
  - provide more things to SessionState at construction time so we don't construct an instance and immediately afterwards call a bunch of setters
  - simplify SessionStateInitializer
    - reduced down to FinalizeSessionState method
2020-06-28 14:55:42 +10:00
S. Manohar Karlapalem
4a1ecd9879
[OpenVINO] Upgrade OpenVINO docker base to Ubuntu 18.04 (#4346)
* update deps installer to ov 2020.3

* Upgrade docker base to Ubuntu 18.04
2020-06-27 01:57:47 -07:00
Du Li
d1777910a8
fix onnx server build failure. (#4347) 2020-06-26 15:15:58 -07:00
liqunfu
c3c4ce5ceb
refactor prototypes into headers (#4337)
* refactor prototypes into headers
2020-06-26 12:02:14 -07:00
Yufeng Li
fc5e65a22d
Add quantization support for GPT2 past state and use model to generate outputs in OpTester (#4340)
* Make quantization support GPT2 past state
* Make OpTester able to generate reference outputs with a model. This removes the need to compute outputs manually, which is impossible in some cases.
2020-06-26 09:29:29 -07:00
S. Manohar Karlapalem
ceedf126a2
[nGraph] Deprecation notice for nGraph EP (#4344) 2020-06-26 01:15:34 -07:00
ytaous
381f4c442a
LayerNormFusion - Cast support (#4320)
* cast support for layernormfusion

* cast support for layernormfusion

* bug fix

* fix build

* bug fix

* fix test

* minor refactor

* on comments

* on comments

* on comments

* on comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-26 00:04:12 -07:00
gwang-msft
9e0f5fc7af
The initial PR for NNAPI EP (#4287)
* Move nnapi dnnlib to subfolder

* dnnlib compile settings

* add nnapi buildin build.py

* add onnxruntime_USE_NNAPI_BUILTIN

* compile using onnxruntime_USE_NNAPI_BUILTIN

* remove dnnlib from built in code

* Group onnxruntime_USE_NNAPI_BUILTIN sources

* add file stubs

* java 32bit compile error

* built in nnapi support 5-26

* init working version

* initializer support

* fix crash on free execution

* add dynamic input support

* bug fixes for dynamic input shape, add mul support, working on conv and batchnorm

* Add batchnormalization, add overflow check for int64 attributes

* add global average/max pool and reshape

* minor changes

* minor changes

* add skip relu and options to use different type of memory

* small bug fix for in operator relu

* bug fix for nnapi

* add transpose support, minor bug fix

* Add transpose support

* minor bug fixes, depthwise conv weight fix

* fixed the bug where the onnx model input order does not match the nnapi model input order

* add helper to add scalar operand

* add separated opbuilder to handle single operator

* add cast operator

* fixed reshape, moved some logs to verbose

* Add softmax and identity support, change shaper calling signature, and add support for int32 output

* changed the way to execute the NNAPI

* move NNMemory and InputOutputInfo into Model class

* add limited support for input dynamic shape

* add gemm support, fixed crash when allocating big array on stack

* add abs/exp/floor/log/sigmoid/neg/sin/sqrt/tanh support

* better dynamic input shape support;

* add more check for IsOpSupportedImpl, refactored some code

* some code style fix, switch to safeint

* Move opbuilders to a map with single instance, minor bug fixes

* add GetUniqueName for new temp tensors

* change from throw std to ort_throw

* build settings change and 3rd party notice update

* add readme for nnapi_lib, move to ort log, add comments to public functions, clean the code

* add android log sink and more logging changes, add new string for NnApiErrorDescription

* add nnapi execution options/fp16 relax

* fix a dnnlibrary build break

* addressed review comments

* address review comments, changed adding output for subgraph in NnapiExecutionProvider::GetCapability, minor issue fixes

* formatting in build.py

* more formatting fix in build.py, return fail status instead of throw in compute_func

* moved android_log_sink to platform folder, minor coding style changes

* addressed review comments
2020-06-26 00:02:39 -07:00
Negin Raoof
37cbe8551d
Adding export registration and tests for custom ops (#4248) 2020-06-25 22:29:02 -07:00
Josh Bradley
990b43ddf2
Add modern C++ standards to the C++ API (#4217)
As a zero-cost wrapper around the C API, the current state of the C++ API is still pretty low-level and requires programmers to use C-style standards to interact with ONNX.
2020-06-25 22:28:00 -07:00
Tracy Sharpe
72fb5183d4
Fix Windows ARM64 break (#4343) 2020-06-25 21:06:18 -07:00
Chih-Hsuan Yen
a37e2e33b4
Add compatibility with Protobuf 3.12 (#4291)
In Protobuf 3.12, classes generated from protobuf files are declared as
`final`, so use those classes as members rather than base classes.

Ref: https://github.com/protocolbuffers/protobuf/releases/tag/v3.12.0
2020-06-25 20:34:08 -07:00
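A minimal sketch of the pattern this commit describes: when a generated class cannot be subclassed, hold it as a member and forward to it. The commit itself concerns C++ classes marked `final`; the Python below only emulates that restriction, and `GeneratedMessage` and `ModelWrapper` are hypothetical names, not ONNX Runtime or Protobuf types.

```python
class GeneratedMessage:
    """Stand-in for a generated class that forbids subclassing (like C++ `final`)."""

    def __init_subclass__(cls, **kwargs):
        # Emulates `final`: any attempt to derive from this class fails.
        raise TypeError("GeneratedMessage is final and cannot be subclassed")

    def __init__(self):
        self.name = ""


class ModelWrapper:
    """Holds the generated message as a member instead of inheriting from it."""

    def __init__(self):
        self._proto = GeneratedMessage()

    @property
    def name(self):
        # Forward reads to the wrapped message.
        return self._proto.name

    @name.setter
    def name(self, value):
        # Forward writes to the wrapped message.
        self._proto.name = value


wrapper = ModelWrapper()
wrapper.name = "model"
```

Composition keeps the wrapper working whether or not the generated class allows inheritance, at the cost of writing the forwarding accessors by hand.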
Changming Sun
5db67ec000
Fix python package issue and upgrade the linux image to 2010 (#4342)
1. Increase job timeout, while we are investigating why the tests take much longer
2. Upgrade the linux docker image to manylinux2010, by request from Tianlei. (We had an offline discussion with Pranav and Tracy)
3. Remove the installation of "devtoolset-7" in the CUDA image. It was added for CUDA 10.0 and is not needed for CUDA 10.1, which we have moved to.
2020-06-25 20:22:39 -07:00
Shucai Xiao
bfc888613f
Migraphx improvements (#4328)
* Add amd migraphx execution provider to onnx runtime

* rename MiGraphX to MIGraphX

* add migraphx EP to tests

* support multiple program output

* disable more tests

* backup changes related to program multiple outputs

* remove logging code

* remove unnecessary changes in migraphx_execution_provider.cc

* add migraphx EP to tests

* add input requests of the batchnorm operator

* add support for the ONNX operator PRelu

* update migraphx dockerfile and remove one unused line

* changes related to supporting dynamic input shape

* fix build error

* code backup

* code backup

* version that has 106 models run correctly

* code backup

* code backup

* remove unnecessary print info

* code backup

* code backup

* code backup

* code backup

* code backup

* code backup

* changes corresponding to migraphx change

* fix merge conflict

* minor code cleanup

* code cleanup

* remove unnecessary code

* remove unnecessary code

* add support for more constant folding analysis

* more constant folding checking for shape input

* add env var to control whether fp16 is enabled. Modify docker file to use ROCM3.3

* fix function name to avoid build error

* add build and execution instruction for migraphx execution provider

* added more build instructions

* fixed a small format error

* a minor change

* fix review comments

* another minor change

* additional refinement of the documents

* additional changes

* remove unnecessary changes in the dockerfile

* additional changes for the dockerfile

* code change backup

* fix errors related to a few unit tests

* fix a build error related to api change

* fix unit test errors by either disabling the test or fixing related issues

* remove unnecessary log info

* sync submodule tvm with master

* remove unnecessary changes

* remove an unnecessary code line

* refine documents for addition example
2020-06-25 19:22:57 -07:00
edgchen1
0b450dcd9f
Enable BiasGelu fusion for training (#4146)
Add gradient for BiasGelu and FastGelu with bias.
Enable BiasGeluFusion and GeluApproximation transformers in TrainingSession.
2020-06-25 17:48:12 -07:00
Faith Xu
b544f5c83c
Sample updates (#4303)
* Add section for product integrations

* Wording updates
2020-06-25 16:09:17 -07:00
Du Li
645a988c04
Support binding input only for IOBinding in python api. (#4079)
* Support binding input only in python api.

* Addressing PR comments.

* fixing build issues
2020-06-25 12:20:02 -07:00
Dmitri Smirnov
a08805daf9
Fix a minor typo in POM file name (#4250)
Co-authored-by: Changming Sun <chasun@microsoft.com>
2020-06-25 11:15:14 -07:00
Tim Harris
3fc68cb150
Remove non-trivially-destructible thread-local from thread pool state, blocking ARM64 builds (#4336)
- Move thread hint vectors from thread-local struct

- Add static_assert that the per-thread state in the thread pool is trivially-destructible

- Rename "thread_data" to "worker_data" (only allocated for workers in the pool, not threads calling into the pool)
2020-06-25 19:04:31 +01:00
George Wu
a3b466cdf1
fix python ep default ordering. (#4335)
* fix python ep default ordering. cpu provider should be last.

* add comment.

* add test case to ensure no regressions for get_all_providers().

* expand on get_all_providers() api documentation
2020-06-25 04:25:43 -07:00
Prabhat
151ef1c8a5
Add C++ wrapper for GetAvailableProviders() C API (#4313) 2020-06-25 13:11:55 +05:30
edgchen1
a6d10376df
Fix build error when USE_NCCL is defined. (#4334) 2020-06-24 23:32:31 -07:00
Josh Bradley
0d9db2b28d
add informative error message regarding symbolic dimensions (#4297)
* add informative error message regarding symbolic dimensions

* fix code format and move negative value check in for loop
2020-06-25 11:56:14 +10:00
Aaron Bockover
64264c3846
Allow --cmake_generator to work on macOS (#4278) 2020-06-24 16:30:33 -07:00
S. Manohar Karlapalem
15c07c75f8
[OpenVINO-EP] Upgrade version info to 2020.3 in docs (#4304)
* Upgrade version to 2020.3 in docs

* update online installer size for 2020.3

* update OV 2020.3 install dir path
2020-06-24 15:01:55 -07:00
Tim Harris
a241eb0bbe
Renaming --partition_optimizer to --deepspeed_zero_stage (#4312)
* Rename partition_optimizer -> deepspeed_zero

* Use ZeROConfig in orttraining_pybind_state.cc

* deepspeed_zero -> deepspeed_zero_stage for clarity

* Expose as deepspeed_zero_stage in pybind
2020-06-24 22:05:03 +01:00
Cecilia Liu
7e71ff2a1f
Match Reshape Subgraph Pattern For GPT2 (#4279)
Reshape fusion for one element subgraph patterns.
2020-06-24 10:07:30 -07:00
Tim Harris
5c6a27408a
Remove signed/unsigned compiler warnings, add additional pipeline test case (#4314)
* Avoid signed/unsigned warning on loops

* Report sizes when distributed world configuration is inconsistent

* Add DistributedRunContextTest for pipeline stage configuration
2020-06-24 11:36:18 +01:00
Pranav Sharma
44f06ec480
Fix memory usage when loading a model + some other minor fixes to avoid unnecessary heap allocations. (#4318) 2020-06-24 00:23:11 -07:00
Scott McKay
5dd3ebb3b1
Tune setting for when to use MlasComputeSoftmax due to changes in #3906. (#4170) 2020-06-24 16:58:43 +10:00
Vincent Wang
f26c149d7d
Set NonZero Output Shape for Gradient Building. (#4246)
* Set NonZero output shape for gradient building.

* Resolve comments.

Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>
2020-06-24 13:43:22 +08:00
suryasidd
20e205aa0a
[OpenVINO-EP] Changed the default scheduler for VAD-M (#4295)
* Changed the scheduler for VAD-M to bypass scheduler and modified logic

* Added extra configuration step to documentation for VAD-M

* Removed cout statement

* Fixed documentation

* Removed softmax restriction

* Added VPU config setting for graphs with dynamic shape

* Set VPU config only for MYRIAD

* Added log statement
2020-06-23 21:21:58 -07:00
Vincent Wang
3374733783
Refactor ReduceMean/Sum Gradient without Shape Dependency. (#4261)
* ReduceMean/Sum gradient without shape dependency.

* optimize expand and use it to replace add.

* Adjust test.

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2020-06-24 11:36:53 +08:00
Changming Sun
deea945f80
Remove openmp and scipy from build pipelines (#4305)
1. Remove openmp because the default thread pool is already good enough.
2. Remove scipy from build pipelines because it stopped supporting Python 3.5.
2020-06-23 20:18:16 -07:00
Yufeng Li
867ba846f7
Implement MinMax with SIMD (#4285)
* Implement MinMax with SIMD
2020-06-23 20:07:53 -07:00
edgchen1
4e39fda06a
Fix version of torch and torchvision in install_deps.sh. (#4316) 2020-06-23 14:55:18 -07:00
Bowen Bao
15cb4b3023
Fix session load state & run extra_postpasses only once (#4255)
* Fix session load state & run extra_postpasses only once

* add testcase for onnx model as well
2020-06-23 11:45:26 -07:00
Prabhat
d3c5cb6349
Use providers_available array from constants.h to avoid code duplication (#4300) 2020-06-23 11:52:51 +05:30
edgchen1
737c22a911
Refactor Python packaging builds (#4283)
Reuse the same template file for all Python packaging builds.
2020-06-22 17:13:22 -07:00
Tim Harris
9e3b5c62fb
Use OpenMP-like synchronization patterns in Eigen thread pool (#4236)
Updates the thread pool implementation to make work distribution over the Eigen thread pool more closely resemble techniques used in OpenMP. In particular:

(1) A thread entering a parallel loop works on the iterations itself, rather than requiring a thread switch to/from a thread in the pool, if called from outside the thread pool.

(2) To support this, work items pushed to the thread pool run a loop to claim iterations from a shared counter via atomic-fetch-and-add, as opposed to having work items themselves represent individual batches of iterations. This means that any thread working on the loop can execute any batch of iterations, including having the main thread run through all of the batches itself if the loop turns out to be short-running.

(3) As with OpenMP active scheduling, the worker loop spins waiting for work prior to blocking. This avoids OS blocking / wake-up paths in workloads with series of short-running parallel sections.
2020-06-22 10:04:53 +01:00
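The work-claiming scheme in points (1) and (2) of this commit can be sketched roughly as follows. This is an illustration only, not the ONNX Runtime implementation: `SharedCounter` and `parallel_for` are hypothetical names, a lock-protected counter stands in for a hardware atomic fetch-and-add, and the spin-before-block behaviour from point (3) is not modelled.

```python
import threading


class SharedCounter:
    """Lock-protected counter standing in for an atomic fetch-and-add."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, n):
        # Return the old value and advance the counter by n in one step.
        with self._lock:
            old = self._value
            self._value += n
            return old


def parallel_for(total, batch_size, fn, num_workers):
    """Run fn(i) for every i in range(total); the caller also takes part."""
    counter = SharedCounter()

    def claim_and_run():
        # Every participant runs the same loop: claim the next batch of
        # iterations from the shared counter, run it, and repeat until
        # the counter passes the end of the range.
        while True:
            start = counter.fetch_add(batch_size)
            if start >= total:
                return
            for i in range(start, min(start + batch_size, total)):
                fn(i)

    workers = [threading.Thread(target=claim_and_run) for _ in range(num_workers)]
    for w in workers:
        w.start()
    claim_and_run()  # the calling thread works on iterations itself
    for w in workers:
        w.join()


results = [0] * 100


def square(i):
    results[i] = i * i


parallel_for(len(results), batch_size=8, fn=square, num_workers=3)
```

Because batches are claimed rather than pre-assigned, a short loop can be finished entirely by the calling thread before any worker gets scheduled, which is the case the commit message calls out.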
Prabhat
57fabfba7a
Added GetAvailableProviders() to C API (#4247)
* Added GetAvailableProviders to C API

* Fix API version and Windows build error

* Changed function name

* Changed ORT_API_VERSION to 4

* Moved all_providers array to constants.h

* Move check for providers to constants.h

* Changed name of array to avoid warning

* Address review comment

* Added unit test
2020-06-22 10:10:25 +08:00
Scott McKay
175983c082
Move memory info into IAllocator (#2850)
- Update IAllocator setup to move the OrtMemoryInfo to the base class instead of requiring derived classes to have that as a member and override a virtual method to return it.
  - Cleanup CreateAllocator setup to take an argument as to whether to wrap the device allocator in an arena allocator. The choice to do that isn't a property of the underlying device allocator.
  - Minor cleanups in the various EPs to adjust to the change to IAllocator and CreateAllocator, and to use the create_arena flag consistently when available.
2020-06-22 11:18:52 +10:00
Yang Chen
064afa0f93
define dim_idx before using it (#4290) 2020-06-20 21:05:13 -07:00
Pranav Sharma
2204d39a06
Add build option to disable traditional ML ops from the binary. (#4272)
* Add build option to disable traditional ML ops from the binary.

* Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources.

* Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.
2020-06-20 06:36:06 -07:00
alkoumpa
3c633384c2
Fix TensorRT memory leaks (#4227)
* fix tensorrt memory leaks

* wrap unique_pointer in a namespace to avoid conflicts

Co-authored-by: alex <act@act.com>
2020-06-20 03:37:38 -07:00
Derek Murray
a541d28fb4
Lazily get allocator when allocating an MLValue (#4276)
According to profiling in #4267, getting the allocator can account for a large fraction of overhead when accessing a kernel output, due to STL container operations. The allocator isn't used when (i) we're not creating a fence, and (ii) we have a memory pattern and a pre-allocated buffer, so we can avoid this overhead.
2020-06-19 15:55:43 -07:00
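A rough sketch of the lazy lookup idea described in this commit: defer the allocator lookup until a code path actually needs it, so the common case with a pre-allocated buffer and no fence never pays for it. All names here (`OutputSlot`, `expensive_lookup`, `lookup_calls`) are hypothetical, and the real change is in C++ inside ONNX Runtime.

```python
lookup_calls = []


def expensive_lookup():
    """Stand-in for a costly allocator lookup; records each call."""
    lookup_calls.append(1)
    return bytearray  # bytearray(size) acts as the "allocator" here


class OutputSlot:
    """Hypothetical kernel output that may or may not need an allocator."""

    def __init__(self, allocator_lookup, preallocated=None, needs_fence=False):
        self._lookup = allocator_lookup
        self._allocator = None  # resolved on first use only
        self._preallocated = preallocated
        self._needs_fence = needs_fence

    def _get_allocator(self):
        # Look the allocator up once, and only when a caller needs it.
        if self._allocator is None:
            self._allocator = self._lookup()
        return self._allocator

    def allocate(self, size):
        if self._preallocated is not None and not self._needs_fence:
            # Fast path: planned buffer, no fence, so no allocator lookup.
            return self._preallocated
        return self._get_allocator()(size)


fast = OutputSlot(expensive_lookup, preallocated=bytearray(16))
fast_buffer = fast.allocate(16)

slow = OutputSlot(expensive_lookup)
slow_buffer = slow.allocate(8)
```

In this sketch the fast path never touches `expensive_lookup`, while the slow path resolves it once and reuses the result on later calls.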
Yang Chen
a490beedf1
update tvm submodule (#4284) 2020-06-19 14:51:18 -07:00
Tianlei Wu
e08181f74d
Update Bert Notebooks for ORT 1.3.0 (#4274)
* update keras notebook
* re-run pytorch bert notebook
2020-06-19 14:02:16 -07:00
Tianlei Wu
466511c1c3
Update gpt2 benchmark with position_ids and fp16 (#4275)
* support position_ids input
* support fp16 conversion for gpt2 past state
* output results to csv file
* Remove the useless check that output of matmul is in cuda
2020-06-19 14:01:37 -07:00