Commit graph

352 commits

Author SHA1 Message Date
Adam Pocock
e9dc8954ac Adding support for ACL and DML to the Java API. 2020-04-14 20:35:03 -07:00
Ori Levari
f564569a80
Adapter Model and Environment tests (#3469)
* Adapter Model and Environment tests
* winml test macro clean up and extension
2020-04-14 13:36:31 -07:00
Du Li
621b3ac03a
FFT contrib ops (#3381)
* add custom op skeleton

* Adding Rfft, Irfft kernels.

* Fix a few errors:
1. make kernel stateless to avoid race condition
2. reclaim cufft plan

* Adding MLFloat16 support

* Adding fp16 support for fft ops.

* Adding cufft plan cache.

* adding a util func

* adding copyright info.

* Accommodating PR comments.
2020-04-14 10:12:04 -07:00
Ye Wang
66a79d2c9f
fix (#3512) 2020-04-13 18:30:58 -07:00
Ye Wang
cbe30f3e19
update FeaturizersLibrary (#3511) 2020-04-13 15:47:51 -07:00
Ye Wang
438353abcd
Fix TruncatedSVDFeaturizer's test failure and re-enable its kernel test (#3458)
* checkin

* fix linux & macos build

* fix test

* revert the changes for a single-aimed PR

* fix
2020-04-13 13:59:38 -07:00
Tiago Koji Castro Shibata
d09d4a6b0d
Fix OS build (#3481) 2020-04-09 21:46:01 -07:00
Yufeng Li
a443b1b6b9
Revert "Use IMMA for int8 matmul to leverage Turing Tensor Core (#3413)" (#3472)
This reverts commit 4d71958ccf.
Revert the PR. It appears to trigger a bug in nvcc and fails the GPU pipeline.
2020-04-09 15:59:52 -07:00
Yufeng Li
4d71958ccf
Use IMMA for int8 matmul to leverage Turing Tensor Core (#3413)
Use IMMA for int8 matmul to leverage Turing Tensor Core
Format files under onnxruntime/core/providers/cuda
2020-04-07 15:22:04 -07:00
Ye Wang
4ebad8805b
change (#3431) 2020-04-06 11:30:21 -07:00
Changming Sun
0dcc6035b1
Disable strong inline (#3399)
To bypass an MSVC bug. Without this change, people can't use VS2017 to build onnxruntime in Release or RelWithDebInfo mode.
2020-04-06 11:19:09 -07:00
Changming Sun
33006f48c0
Update onnx submodule to 1.7.0 release candidate (#3405)
Update onnx submodule to 1.7.0 release candidate. This isn't a release tag, but it will be released soon, in 1-2 weeks.
2020-04-04 16:23:42 -07:00
Pranav Sharma
14f4c3e25f
Fix issue in construction of DummyArena. (#3416) 2020-04-03 08:28:05 -07:00
Tiago Koji Castro Shibata
1671072b6b
[WIP] Port image tests from WAI (#3365)
* Copy image tests from ADO

* wip

* Port tests to googletest

* Add FNS-Candy license

* Add missing collaterals

* Remove brand images

* Fix typos

* Use PrepareModelSessionBinding in MnistImageTest

* Fix typos
2020-04-01 15:38:44 -07:00
Changming Sun
accffded5d
Build options for enabling AVX/AVX2/AVX512 (#3373)
1. Add build options for enabling AVX/AVX2/AVX512
2. Update eigen to a newer version, because the current one doesn't work with VC and AVX512.
2020-04-01 10:07:22 -07:00
Dmitri Smirnov
a4fe60c4d3
OpSet 12 ops (#3341)
Advance ONNX commit to pickup the latest ArgMax, ArgMin,
  ReduceMax/ReduceMin, MaxPool
  Declare new versions for CPU/CUDA.
  Implement infrastructure support for int8/uint8.
  Adjust GatherOp test for a new error.
  Adjust Scan9.BadShape test.
  Add exclusions for index out of bounds checks.
  Rework result verification for SVDTransformer.
2020-03-31 15:31:06 -07:00
stevenlix
2332a93db0
Update onnx-tensorrt parser (#3369)
* sync onnx-tensorrt parser and update TensorRT doc

* remove --msvc_toolset 14.16 in tensorrt ci pipeline
2020-03-30 20:31:59 -07:00
Jan Scholz
ce9acf0c21
iOS crosscompilation under linux (#3298)
* added support for ios crosscompilation under linux

* reverted cmake generator change

* if --ios is added protoc can be compiled for host system

* accidentally reverted change to compile protoc for host system for ios if protoc exe is not set

* wdata is now used

* accidentally pasted CMAKE_OSX_ARCHITECTURES into CMakeLists.txt, also made bad merge on build.py previously

* removed print

* fixed typo, deleted commented statements for earlier debugging

* reverted accidental delete

* added asmmacro.h for aarch64 asm
now MlasSgemmKernel**** gets underscore added if needed
no need anymore to differentiate between iOS arm64 and normal arm64 build
onnxruntime.cmake: added check if iOSCross is set to properly set RPATH

* removed 2 spaces

* fix: logical error fixed, now protoc gets compiled if not supplied with --path_to_protoc_exe

* removed unnecessarily added spaces

* removed some more spaces
2020-03-30 19:39:17 -07:00
Changming Sun
06fc9506fd
Thread pool changes (#3153)
1. Copy tensorflow's thread pool class to ORT, so that we can get a better implementation of thread pool based parallelfor
2. Copy Eigen's thread pool class to ORT
3. Support thread affinity
4. Remove RNN kernel’s private thread pool
5. Modify pool kernels to use the thread pool when openmp is disabled.
2020-03-30 12:18:40 -07:00
George Wu
355f39ddee
fix cuda build for cmake >= 3.17.0 (#3362) 2020-03-30 00:38:57 -07:00
Tiago Koji Castro Shibata
c3cea486d0
Port ConcurrencyTests from TAEF (#3086)
* Add ConcurrencyTests

* Make ConcurrencyTests compatible with TAEF

* Use test PCH in concurrency tests

* Fix include header

* Ignore unused code warnings on WINML_SKIP_TEST

* Remove BOM

* Remove conflicting namespace in older SDK

* Refactor duplicate code

* Fix unused DELAYLOAD

* Fix unused DELAYLOAD

* Remove link to internal bug

* Address code style fixes

* Add new concurrency tests
2020-03-27 17:39:22 -07:00
Sheil Kumar
b72fe13941
Update WinML Projection to accept sequence of tensors (#3287)
* Enable sequence of tensor

* add tests

* small updates

* There should only be 2 elements returned

* CR feedback, and another 6->2 check update in the test.

* missing semicolon...

* Add explicit to constructor taking pointer parameter

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-03-23 15:55:20 -07:00
Tracy Sharpe
57468c651c
QLinearMatMul speed up (#3283)
The equivalent of PR#3196 but done for QLinearMatMul. Use MLAS to do a u8u8=s32 GEMM and then requantize this intermediate buffer.
2020-03-21 15:37:25 -07:00
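The two-step approach described in 57468c651c (and in fe0b2b2abd for QLinearConv) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the MLAS implementation: the function name `quantized_matmul`, its parameter names, and the per-tensor scale/zero-point handling are assumptions made for the example. Step 1 multiplies the uint8 inputs with a wide integer accumulator (the "u8u8=s32" part), and step 2 requantizes that intermediate buffer back to uint8.

```python
def quantized_matmul(a, b, a_zp, b_zp, a_scale, b_scale, y_scale, y_zp):
    """Hypothetical sketch: uint8 x uint8 -> int32 GEMM, then requantize to uint8.

    a and b are matrices given as lists of rows holding values in [0, 255].
    Scales and zero points are per-tensor here (an assumption for brevity).
    """
    inner = len(b)
    cols = len(b[0])

    # Step 1: integer GEMM. Inputs are uint8, but the products are summed
    # in a wider integer accumulator (int32 in a real kernel).
    acc = [
        [sum((row[k] - a_zp) * (b[k][j] - b_zp) for k in range(inner))
         for j in range(cols)]
        for row in a
    ]

    # Step 2: requantize the intermediate buffer to uint8.
    # Python's round() rounds halves to even.
    multiplier = a_scale * b_scale / y_scale
    return [
        [min(255, max(0, round(v * multiplier) + y_zp)) for v in row]
        for row in acc
    ]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(quantized_matmul(a, b, 0, 0, 0.5, 0.5, 0.25, 0))
```

Keeping the accumulation in integers and applying a single floating-point multiplier at the end is the design point of the sketch: the expensive inner loop stays integer-only, and the scale math runs once per output element.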
Pranav Sharma
84015d9491
Fix post merge test. This doesn't get triggered as part of gated PR checks. (#3277) 2020-03-20 13:23:09 -07:00
Ye Wang
c5149e89d9
Wangye/shortgraindropper (#3273) (#3274)
* Featurizer Library update

* update Featurizer Library

* add short_grain_dropper_transformer

* resolve comments

* resolve comments

* resolve comments
2020-03-20 11:48:31 -07:00
Tiago Koji Castro Shibata
3bdb0b620a
Fix WCOS/Win32 linking bugs (#3126)
* Fix WCOS/Win32 linking bugs

* Remove unused NODEFAULTLIB flags

* Avoid plain target_link_libraries signature

* Avoid plain target_link_libraries signature

* Fix library list escaping

* Use library list instead of string

* Remove duplicate link to windowsapp.lib

* Remove Win32 build workarounds

* Specify CMake policies before initializing language

* Expose Win32 header definitions during build

* Force set API family

* Enable Win32 APIs in featurizer

* Use MT dynamic CRT

* Expose Win32 specific functions

* Disable app container globally

* Disable default wide functions in featurizers

* Add featurizers to test include path

* Workaround https://gitlab.kitware.com/cmake/cmake/issues/19428

* Revert pipeline debugging hacks

* Skip /FI in CUDA sources

* Default to Win32 builds

* Enable WCOS when using WinML

* Use generator expression to apply CMAKE_MSVC_RUNTIME_LIBRARY to C++ only
2020-03-19 08:52:40 -07:00
Pranav Sharma
435f014d71
Add support for sessions to share a global threadpool. (#3177)
* Add support for sessions to share a global threadpool.

* Fix build issues

* Add tests, fix build issues.

* Added some documentation

* Fix centos issue when threadpools become nullptr due to 1 core.

* Fix mac and x86 build issues

* Address some PR comments

* Disabled test for android, added few more tests and addressed more PR comments.

* const_cast
2020-03-18 15:42:46 -07:00
edgchen1
e03b8a1e2f
Move path_lib from onnxruntime/core/framework to onnxruntime/core/platform. (#3253)
Moved path_lib.h/cc from onnxruntime/core/framework to onnxruntime/core/platform and from the onnxruntime_framework to the onnxruntime_common libraries.
2020-03-18 11:53:46 -07:00
Tracy Sharpe
88c20eaef1
MLAS: rename AVX512BW->AVX512Core (#3216)
Cleanup change: remap functions and files with Avx512BW to Avx512Core.
2020-03-13 22:45:51 -07:00
Tracy Sharpe
fe0b2b2abd
QLinearConv speed up (#3196)
For x86/x64 builds, change the QLinearConv op to use MLAS for the u8u8=s32 GEMM, then requantize the intermediate buffer to u8.
2020-03-13 16:54:55 -07:00
KeDengMS
ade4fa108f
Disable delayload for cuda dlls (#3147)
This change fixes #3129. When onnxruntime runs as a DLL on Windows, CUDA performs internal cleanup when the process exits, and any later call into CUDA crashes. Delay-loading makes the thread_local destructors run after that CUDA cleanup, which causes the crash.
2020-03-05 14:40:22 -08:00
smk2007
6cdd2b4934
Enable DML Nuget Package for x64 or x86 architectures (#3120)
* add dml gpu pipelines

* add x86 to the gpu dml dev build pipeline

* Enable DML x86 builds

* Fix uint64_t -> size_t warning

* fix warnings

* enable dml on x86 ci builds

* operatorHelper 773 error uint32_t vs uint64_t

* operatorHelper 773 error uint32_t vs uint64_t

* make x86 pipeline use the gpu pool

* more warnings

* fix x86 directml path

* make dml nuget package

* disable tf_pnasnet_large

* disable zfnet512

* make validation use wildcards

* disable x86 dml gpu tests

* add args.

* update gpu.yml

* change nupkg wildcard

* add debug statements

* package x86 dml nupkg

* don't drop managed nuget again from dml pipeline build

* Add DML EULA

* directml license should be renamed to not clobber the existing license

* casing on dml package....

* {} to ()

* fix license name

* disable dml from x86 ci

* typo and cr feedback

* remove featurizers

* ship the dml pdb as well
2020-03-02 20:18:46 -08:00
edgchen1
37f5fd8fb8
Add support for loading TensorProtos with external data from optimizer Initializer (#3045)
- Added support for loading TensorProtos with external data from the optimizer Initializer class.
- Added some file path utilities.
2020-02-28 13:19:16 -08:00
Changming Sun
c6ed077441
Add d2FH4- flag to cuda (#3105) 2020-02-27 20:22:07 -08:00
Dmitri Smirnov
5008fc5b00
Featurizers: Import fix for Linux build adjust linkage (#3089)
Advance FeaturizersLibrary
  SetAbsError on Output
2020-02-27 15:49:18 -08:00
Changming Sun
d72639ef77
Fix CUDA 10.1 DLL names (#3102) 2020-02-27 14:43:16 -08:00
daquexian
37a905f557
Make Java API available on Android (#3030) 2020-02-27 08:23:50 -08:00
Ori Levari
5e0f7412cd
Properly handle downlevel and WCOS scenarios (#3075) 2020-02-25 17:47:02 -08:00
stevenlix
f4a5d17294
Upgrade to CUDA10.2 for TensorRT (#3084)
* Switch to CUDA10.2

* Update win-gpu-tensorrt-ci-pipeline.yml

* Update win-gpu-tensorrt-ci-pipeline.yml

* remove dynamic_shape

* update onnx-tensorrt submodule

* check if input shape is specified for TensorRT subgraph input and enable some TensorRT unit tests

* fix format issue

* add shape inference instruction for TensorRT

* update according to the reviews

* Update win-gpu-tensorrt-ci-pipeline.yml
2020-02-25 05:36:01 -08:00
Adam Pocock
b23b7f0fea
[java] Adds the provider compile-time flags where the JNI code expects them. (#3082) 2020-02-24 15:47:26 -08:00
Dmitri Smirnov
b8628404f3
Replace hardcoded include path value with the advertised setting. (#3083) 2020-02-24 13:55:00 -08:00
kile0
f367fd921c
Use a custom allocator for temporary buffers in reduction_ops.cc (#2775)
* port the mimalloc allocator

* hook mimalloc opt into common.h and reduction ops

* repurpose USE_MIMALLOC to only denote subbing in of default allocator with mimalloc and some refactoring

* fix unintended cherry pick diffs

* polish allocator_mimalloc

* explicitly disable mimalloc where it already had been disabled

* update mimalloc to pull in stl allocator

* switch mimalloc stl allocator to use mimalloc library version

* turn mimalloc on by default (only the stl changes are enabled, the python interacting ones are off already and shall remain so)

* move FastAllocVector into cpu specific code

* separate out defines into arena and stl changes

* the rest of the define renames

* bfc arena allocator

* some typos and rename the bfc arena allocator to fit existing class naming conventions

* adjustments in response to comments

* different template instantiations are friends
2020-02-23 16:04:30 +10:00
Changming Sun
ae1f35fb9f
Ignore GCC no-deprecated-copy warnings (#3074) 2020-02-22 11:48:27 -08:00
Changming Sun
45ba325fa6
Remove USE_NSYNC macro (#3052) 2020-02-20 13:29:19 -08:00
Scott McKay
a1db87b382
Add SafeInt bounds checking to memory allocation size calculations. (#3022)
* Add SafeInt bounds checking to memory allocation size calculations.

* Fix TensorRT library includes
2020-02-20 11:41:03 -08:00
Changming Sun
cb24e2a214 Update nsync 2020-02-20 11:25:34 -08:00
Changming Sun
e3c27536d0
Python binding doesn't need to link to the python lib on Linux 2020-02-19 12:18:47 -08:00
James Yuzawa
411b3aa801
Java build system enhancements (#2866) 2020-02-18 15:41:49 -08:00
daquexian
4ca50d9352 Update DNNLibrary to v0.9.0 and update NNAPI GetSupportedNodes 2020-02-17 13:24:10 -08:00
ytaous
2b77cb19bd
merge training kernels to master (#2999)
* merge training kernels to master

* revert two files
2020-02-13 14:52:35 -08:00