Commit graph

1998 commits

Author SHA1 Message Date
Tiago Koji Castro Shibata
c3cea486d0
Port ConcurrencyTests from TAEF (#3086)
* Add ConcurrencyTests

* Make ConcurrencyTests compatible with TAEF

* Use test PCH in concurrency tests

* Fix include header

* Ignore unused code warnings on WINML_SKIP_TEST

* Remove BOM

* Remove conflicting namespace in older SDK

* Refactor duplicate code

* Fix unused DELAYLOAD

* Fix unused DELAYLOAD

* Remove link to internal bug

* Address code style fixes

* Add new concurrency tests
2020-03-27 17:39:22 -07:00
Yang Chen
5278f73202
Fixed two issues in symbolic_shape_infer script (#3332)
* Fixed two issues in symbolic_shape_infer script

This change addressed #3293

There were two issues in the script:

* We need to handle a special case for infer_Reshape, where input_shape
is empty and target shape_value is [-1]. In such case, we need to
get sympy data for the output dim (or create one if it doesn't exist).

* We need to update computed dims for newly-created shape for Range op

* also call _update_computed_dims for _infer_Expand

addressed CR feedback

* added ai.onnx into opset list

* instead of manipulating _infer_Reshape, call _update_computed_dims
from _infer_Expand to update newly-computed dims
2020-03-26 23:27:37 -07:00
Xiang Zhang
810a10b230
Enable Onnxruntime Telemetry by Default for 1.3 (#3338) 2020-03-26 20:57:39 -07:00
Faith Xu
2e875f4e67
Delete outdated page (#3320) 2020-03-26 18:24:02 -07:00
Pranav Sharma
497e83eda5
Minor update to the issue template. Add a line to attach model where applicable. (#3339) 2020-03-26 14:28:27 -07:00
Hector Li
0e81962e98
correct the cmake version to 3.13 for Arm build (#3333) 2020-03-26 10:20:18 -07:00
Changming Sun
5f6ec8ea6d
Fix a bug in Maxpool v8 2020-03-25 16:27:43 -07:00
Scott McKay
dee4fc8b8a
Apply the same check for no_transpose from the Reduce* ops to ArgMin and ArgMax (#3315) 2020-03-26 07:41:16 +10:00
Sheil Kumar
51e95ea946
Make ort errors appear in winml exceptions (#3316)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-03-25 12:20:40 -07:00
Scott McKay
4db01309cb
Use GEMM for SVMRegressor. (#3305) 2020-03-25 11:49:44 +10:00
Tianlei Wu
19edad132c
Move AzureML Bert notebook from onnx tutorial (#3302) 2020-03-24 12:31:02 -07:00
Weixing Zhang
fef7989866
Replacing CudaAsyncBuffer with TArray to improve perf (#3303)
* removing using CudaAsyncBuffer

* Keep CudaAsyncBuffer for these ops: non_max_suppression, cudnn_rnn_base, concat, split

* fix windows build error

* fix windows build error.

* fix build error

* fix windows build error

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-03-24 12:13:27 -07:00
Hariharan Seshadri
ef7b98f988
Support DisposableNamedOnnxValue inputs in C# Run() (#3175)
* Initial commit

* Update error message

* Update

* Updates to support holding onto onnxValue and pinnedmemoryBuffer

* Updates

* Minor updates

* Comment out a portion of the tests

* PR feedback

* Minor nit update

* Resolve comments

* PR feedback

* PR updates

* PR feedback
2020-03-23 18:36:12 -07:00
Faith Xu
fb5ab858d2
Update BUILD instructions (#3282)
Include guidance for building release packages per question from #3251
2020-03-23 18:35:22 -07:00
Sheil Kumar
b72fe13941
Update WinML Projection to accept sequence of tensors (#3287)
* Enable sequence of tensor

* add tests

* small updates

* There should only be 2 elements returned

* CR feedback, and another 6->2 check update in the test.

* missing semicolon...

* Add explicit to constructor taking pointer parameter

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-03-23 15:55:20 -07:00
Weixing Zhang
843ee346a8
Implement struct TArray and simplify code. (#3291)
* Implement operator[] for TArray and simplify the code.

* fix a build error.

* add a constructor with std::vector input

* fix build error

* update based on code review feedback

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2020-03-23 10:51:54 -07:00
Tracy Sharpe
57468c651c
QLinearMatMul speed up (#3283)
The equivalent of PR#3196 but done for QLinearMatMul. Use MLAS to do a u8u8=s32 GEMM and then requantize this intermediate buffer.
2020-03-21 15:37:25 -07:00
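The two-step approach in the QLinearMatMul message above (an integer GEMM followed by requantization) can be sketched in plain Python. This is only an illustration under stated assumptions, not the MLAS code: the function names, the per-tensor scales and zero points, and the use of Python's round-half-to-even are all choices made for the example.

```python
def gemm_u8u8_s32(a, b, a_zero_point, b_zero_point):
    """Multiply uint8 matrices a (M x K) and b (K x N) into an integer buffer.

    Hypothetical helper: stands in for the "u8u8=s32 GEMM" step.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for k in range(inner):
                # Subtract the zero points so the sum is over the real offsets.
                total += (a[i][k] - a_zero_point) * (b[k][j] - b_zero_point)
            acc[i][j] = total
    return acc


def requantize_to_u8(acc, multiplier, y_zero_point):
    """Scale the intermediate buffer, add the output zero point, clamp to 0..255."""
    return [
        [min(255, max(0, round(value * multiplier) + y_zero_point)) for value in row]
        for row in acc
    ]


def qlinear_matmul(a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp):
    """Assumed per-tensor quantization: one scale and zero point per matrix."""
    acc = gemm_u8u8_s32(a, b, a_zp, b_zp)
    return requantize_to_u8(acc, a_scale * b_scale / y_scale, y_zp)
```

Keeping the accumulation in integers and applying the combined scale once per output element is the point of the split; a real kernel would vectorize the first step.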
Changming Sun
9c3b6d2e4b
Fix warnings in nuphar 2020-03-20 21:49:46 -07:00
Tianlei Wu
403f99cd77
Use yapf to format python (#3276)
Update ReformatSourcePython.bat to use YAPF to format python code, and add onnxruntime\test directory to be formatted.

Add onnxruntime\.style.yapf for configuration. The style is based on google, except max column width 120.

Format python scripts using ReformatSourcePython.bat.
2020-03-20 14:34:10 -07:00
Pranav Sharma
84015d9491
Fix post merge test. This doesn't get triggered as part of gated PR checks. (#3277) 2020-03-20 13:23:09 -07:00
Dmitri Smirnov
b880c48c4c
Make reduction ops handle Scalar input (#3260)
Handle Scalar values for CPU and GPU
  Ifdef CUDA and TVM as they require more changes.
2020-03-20 12:04:47 -07:00
Ye Wang
c5149e89d9
Wangye/shortgraindropper (#3273) (#3274)
* Featurizer Library update

* update Featurizer Library

* add short_grain_dropper_transformer

* resolve comments

* resolve comments

* resolve comments
2020-03-20 11:48:31 -07:00
Tianlei Wu
1d9be2baed
Add Notebook for Bert Model exported by Keras2onnx (#3271)
* Add notebook for bert squad model exported by python 1.4

* update bert performance test tool:
(1) set OpenMP environment variable before importing onnxruntime.
(2) launch new process for each test.

* Add notebook
Reduce combinations in perf test

* update readme

* fix quote

* Allow test multiple batch_size

* Add latency percentile

* Add warm up run
Reset logger for notebook

* refine default settings to test for cpu/gpu

* Add script to dump machine info

* Add notebooks for PyTorch SQuAD model GPU and CPU inference

* Update machineinfo.py: add license header; format by yapf

* Do not reset log handler. Skip adding handler if existed.

* Add comments about GPU result diff.
Filter rows of batch set to keep only one setting.

* update according to review feedback

* Download script from master branch

* Add notebook for bert model exported by keras2onnx

* format columns in result table

* re-run and update notebook
2020-03-20 11:37:25 -07:00
Yufeng Li
a69d859912
fix quantize_bias (#3270) 2020-03-20 11:36:47 -07:00
Scott McKay
6dc25a60f8
Make the reduction ops more consistent in checking if no transpose is required and skipping the copy of the input data if that is the case. Significantly better performance when this is done (2x faster for model calling ReduceSumSquare with input of {2048,10}). (#3265) 2020-03-20 06:55:38 +10:00
Changming Sun
8f00147c14
Fix a few warnings 2020-03-19 09:22:28 -07:00
Tiago Koji Castro Shibata
3bdb0b620a
Fix WCOS/Win32 linking bugs (#3126)
* Fix WCOS/Win32 linking bugs

* Remove unused NODEFAULTLIB flags

* Avoid plain target_link_libraries signature

* Avoid plain target_link_libraries signature

* Fix library list escaping

* Use library list instead of string

* Remove duplicate link to windowsapp.lib

* Remove Win32 build workarounds

* Specify CMake policies before initializing language

* Expose Win32 header definitions during build

* Force set API family

* Enable Win32 APIs in featurizer

* Use MT dynamic CRT

* Expose Win32 specific functions

* Disable app container globally

* Disable default wide functions in featurizers

* Add featurizers to test include path

* Workaround https://gitlab.kitware.com/cmake/cmake/issues/19428

* Revert pipeline debugging hacks

* Skip /FI in CUDA sources

* Default to Win32 builds

* Enable WCOS when using WinML

* Use generator expression to apply CMAKE_MSVC_RUNTIME_LIBRARY to C++ only
2020-03-19 08:52:40 -07:00
Pranav Sharma
435f014d71
Add support for sessions to share a global threadpool. (#3177)
* Add support for sessions to share a global threadpool.

* Fix build issues

* Add tests, fix build issues.

* Added some documentation

* Fix centos issue when threadpools become nullptr due to 1 core.

* Fix mac and x86 build issues

* Address some PR comments

* Disabled test for android, added few more tests and addressed more PR comments.

* const_cast
2020-03-18 15:42:46 -07:00
edgchen1
e03b8a1e2f
Move path_lib from onnxruntime/core/framework to onnxruntime/core/platform. (#3253)
Moved path_lib.h/cc from onnxruntime/core/framework to onnxruntime/core/platform and from the onnxruntime_framework to the onnxruntime_common libraries.
2020-03-18 11:53:46 -07:00
Xiang Zhang
61621d4053
Add extra fields to ORT telemetry (#3234)
* Add extra fields to ORT telemetry

* fix linux build failure caused by using HRESULT

* little refactor
2020-03-18 09:37:35 -07:00
Xavier Dupré
bd348ec6ca
Add unit test to cover TreeEnsembleClassifier applied to binary classification and 2 classes (#3230)
* Add unit test to cover TreeEnsembleClassifier for binary classification
2020-03-18 11:32:58 +01:00
jaka.katrasnik
88c65f8add
Fixes GTest deprecation warnings 2020-03-17 16:38:55 -07:00
Tianlei Wu
0700d13ece
Add Bert Optimization Notebooks (#3204)
* Add notebooks for GPU and CPU inference of PyTorch BERT SQuAD model
* update bert_optimization.py: Do not add duplicated logger handler
* Add machineinfo.py to show machine configuration for notebook.
* Update bert performance test tool:
(1) Set OpenMP environment variable before importing onnxruntime.
(2) Use sub-process for each test
(3) Allow test multiple batch_size
(4) Add latency percentile
(5) Add warmup
2020-03-17 11:56:36 -07:00
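The performance-test items listed above (environment variable set before the import, warm-up runs, latency percentiles) might look roughly like the sketch below. It is not the code from the bert performance test tool: `measure_latency`, `percentile`, the thread count of "1", and the nearest-rank percentile definition are assumptions for illustration, and the onnxruntime import is left as a comment so the snippet runs with the standard library alone.

```python
import math
import os
import time

# Assumption: the OpenMP runtime reads OMP_NUM_THREADS when it is first loaded,
# so the variable has to be set before onnxruntime is imported.
os.environ["OMP_NUM_THREADS"] = "1"
# import onnxruntime  # in a real test, the import would go here, after the line above


def measure_latency(run, warmup_runs=1, test_runs=20):
    """Call run() warmup_runs times untimed, then return test_runs latencies in ms."""
    for _ in range(warmup_runs):
        run()
    latencies = []
    for _ in range(test_runs):
        start = time.perf_counter()
        run()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values, percent):
    """Nearest-rank percentile: smallest value with at least percent% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]
```

A caller would pass a zero-argument function that runs one inference, for example `percentile(measure_latency(run_once), 95)`, where `run_once` is whatever closure wraps the session call.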
Faith Xu
8bc4e3195d
Updates to roadmap (#3155)
* Updates to roadmap

* remove redundant directML

* Add JS to future investments
2020-03-16 18:19:07 -07:00
Ori Levari
e63f817eb6
avoid IDXGIFactory 6 where possible to enable WinML GPU Path downlevel to RS3 (#3180) 2020-03-16 15:25:32 -07:00
Xiang Zhang
682dde2b3b
add dml_ep_lock (#3200)
* add dml_ep_lock

* Move Winml process-wide lock back to individual sessions
2020-03-16 14:32:12 -07:00
Xavier Dupré
6319357a99
Reduce number of allocations in TreeEnsemble (#3217)
* reduce number of allocations in TreeEnsemble

* Fix probabilities for binary case.

* fix outbound access

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2020-03-16 12:22:15 +01:00
Changming Sun
0fceb33288
Fix onnxruntime server docker file build failure (#3219)
1. Fix onnxruntime server docker file build failure. Tested with the notebook in ONNX tutorial, it works well.
2. Delete the docker files for the other EPs, because currently they don't work and I don't have enough time to update them.
2020-03-15 14:46:46 -07:00
Tracy Sharpe
88c20eaef1
MLAS: rename AVX512BW->AVX512Core (#3216)
Cleanup change: remap functions and files with Avx512BW to Avx512Core.
2020-03-13 22:45:51 -07:00
Dmitri Smirnov
2a6e5ce978
Speedup and reduce binary size for TfIdfVectorizer (#3197)
Speed up TfIdf.
  Build a trie-like structure to quickly exclude dead ends.
  Use ParallelFor() for each of the rows processing.
  Make it non-template, batch it.
  Check for short tail within the inner loop.
2020-03-13 17:00:59 -07:00
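To illustrate the trie idea in the TfIdfVectorizer message above, here is a minimal sketch of counting n-gram occurrences with a trie so that a scan stops as soon as no n-gram can match. It is not the operator's implementation: the names `build_trie` and `count_ngrams` are hypothetical, and the sketch assumes unique n-grams and ignores skips, weights, batching, and ParallelFor.

```python
_END = object()  # sentinel key marking "an n-gram ends at this node"


def build_trie(ngrams):
    """Build a nested-dict trie; each complete n-gram stores its index under _END."""
    root = {}
    for index, ngram in enumerate(ngrams):
        node = root
        for token in ngram:
            node = node.setdefault(token, {})
        node[_END] = index
    return root


def count_ngrams(tokens, ngrams):
    """Count occurrences of each n-gram in tokens, abandoning dead ends early."""
    trie = build_trie(ngrams)
    counts = [0] * len(ngrams)
    for start in range(len(tokens)):
        node = trie
        for token in tokens[start:]:
            node = node.get(token)
            if node is None:
                break  # dead end: no n-gram continues with this token
            if _END in node:
                counts[node[_END]] += 1
    return counts
```

Each start position walks only as deep as some n-gram prefix allows, which is what makes the dead-end check cheap compared with testing every n-gram separately.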
Tracy Sharpe
fe0b2b2abd
QLinearConv speed up (#3196)
For x86/x64 builds, change the QLinearConv op to use MLAS for the u8u8=s32 GEMM, then requantize the intermediate buffer to u8.
2020-03-13 16:54:55 -07:00
Changming Sun
0a1257e467
Adjust the grouping logic in ThreadPool::TryBatchParallelFor (#3207)
1. No more plus 1.
2. Use MlasPartitionWork function to calculate the work index range.
2020-03-13 12:49:17 -07:00
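As a rough picture of the work-range calculation mentioned above, the sketch below splits a number of work items into near-equal contiguous ranges, one per thread, with no extra "plus one" batch. It shows the general scheme only and is an assumption rather than a transcription of MlasPartitionWork; the name `partition_work` and its signature are hypothetical.

```python
def partition_work(thread_id, thread_count, total_work):
    """Return (start, count) for the contiguous slice assigned to thread_id.

    The first (total_work % thread_count) threads each take one extra item,
    so slice sizes differ by at most one and together cover every item once.
    """
    if not 0 <= thread_id < thread_count:
        raise ValueError("thread_id must be in range(thread_count)")
    per_thread, extra = divmod(total_work, thread_count)
    if thread_id < extra:
        return thread_id * (per_thread + 1), per_thread + 1
    return thread_id * per_thread + extra, per_thread
```

For example, 10 items over 4 threads yields slices of sizes 3, 3, 2 and 2, starting at indices 0, 3, 6 and 8.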
Yulong Wang
5bc0d8be5c
Fix TopK Cuda implementation (#3176)
Fixes a bug in TopK cuda implementation when input size is between GridDim::maxThreadsPerBlock and GridDim::maxThreadsPerBlock * 2. In this case, the BitonicTopK will generate all-zero outputs.
2020-03-13 11:46:17 -07:00
Ori Levari
93569bf0f4
fix regex to populate dll version information correctly 2020-03-13 11:35:49 -07:00
Yufeng Li
c69194ec4c
fix the missing return in _get_quantize_input_nodes and format code with yapf (#3199)
* fix the missing return for function _get_quantize_input_nodes

* format quantization code with yapf
2020-03-13 09:28:41 -07:00
Xavier Dupré
d99554bea1
Improves implementation of tree ensemble regressor and classifier (4 to 5 times faster) (#2692)
* Improves implementation of tree ensemble regressor (4 to 5 times faster)
* Use ORT_THROW
2020-03-13 14:10:37 +01:00
Scott McKay
e9d5ed270f
Normalizer performance improvements (#3201)
* Simplify Normalizer as the spec only requires support for 2D input.

Tried using Eigen (lpNorm<1>() and norm()) on each row, but that was much slower.

* Remove unused variable
2020-03-13 22:15:44 +10:00
Scott McKay
890cb78b20
Use Eigen::logistic instead of manually computing values. (#3186)
* Use MlasComputeLogistic instead of manually computing values.
* Update test script to allow the tolerance to be specified when checking float output from logreg_iris.onnx.
2020-03-13 20:27:25 +10:00
Hariharan Seshadri
b8575dda7b
Avoid some heap allocations in the InferenceSession and Model classes (#3103)
* Avoid some heap allocations in the InferenceSession and Model classes
2020-03-12 18:38:10 -07:00
Changming Sun
a02638eb46
Adjust the threading logic in ThreadPool::ParallelFor (#3178)
1. Do not reuse the main thread.
2. Do not add one when MLAS calculates the number of tasks to schedule. (I was the one who put the plus one there.)

This is the second try of #1839

This change is known to have a negative performance impact on some models.
2020-03-12 11:33:33 -07:00