Commit graph

1970 commits

Author SHA1 Message Date
edgchen1
e03b8a1e2f
Move path_lib from onnxruntime/core/framework to onnxruntime/core/platform. (#3253)
Moved path_lib.h/cc from onnxruntime/core/framework to onnxruntime/core/platform and from the onnxruntime_framework to the onnxruntime_common libraries.
2020-03-18 11:53:46 -07:00
Xiang Zhang
61621d4053
Add extra fields to ORT telemetry (#3234)
* Add extra fields to ORT telemetry

* fix linux build failure caused by using HRESULT

* little refactor
2020-03-18 09:37:35 -07:00
Xavier Dupré
bd348ec6ca
Add unit test to cover TreeEnsembleClassifier applied to binary classification and 2 classes (#3230)
* Add unit test to cover TreeEnsembleClassifier for binary classification
2020-03-18 11:32:58 +01:00
jaka.katrasnik
88c65f8add Fixes GTest deprecation warnings 2020-03-17 16:38:55 -07:00
Tianlei Wu
0700d13ece
Add Bert Optimization Notebooks (#3204)
* Add notebooks for GPU and CPU inference of PyTorch BERT SQuAD model
* update bert_optimization.py: Do not add duplicated logger handler
* Add machineinfo.py to show machine configuration for notebook.
* Update bert performance test tool:
(1) Set OpenMP environment variable before importing onnxruntime.
(2) Use sub-process for each test
(3) Allow testing multiple batch_size values
(4) Add latency percentile
(5) Add warmup
2020-03-17 11:56:36 -07:00
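
The warmup and latency-percentile items in the commit above can be illustrated with a minimal sketch. It is not the code of the bert performance test tool: `percentile`, `measure_latency`, and the `run_once` callable are hypothetical names introduced here, and the nearest-rank percentile definition is an assumption.

```python
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct in 0..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    # ceil(n * pct / 100) without floating point, clamped to at least rank 1
    rank = max(1, -(-len(sorted_values) * pct // 100))
    return sorted_values[int(rank) - 1]


def measure_latency(run_once, warmup=3, repeats=20):
    """Run `run_once` a few times untimed, then time `repeats` calls."""
    for _ in range(warmup):
        run_once()  # warmup runs are excluded from the statistics
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {p: percentile(latencies, p) for p in (50, 90, 99)}
```

In a real benchmark, `run_once` would wrap a single inference call, for example a lambda around a session's run method.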
Faith Xu
8bc4e3195d
Updates to roadmap (#3155)
* Updates to roadmap

* remove redundant directML

* Add JS to future investments
2020-03-16 18:19:07 -07:00
Ori Levari
e63f817eb6
avoid IDXGIFactory 6 where possible to enable WinML GPU Path downlevel to RS3 (#3180) 2020-03-16 15:25:32 -07:00
Xiang Zhang
682dde2b3b
add dml_ep_lock (#3200)
* add dml_ep_lock

* Move Winml process-wide lock back to individual sessions
2020-03-16 14:32:12 -07:00
Xavier Dupré
6319357a99
Reduce number of allocations in TreeEnsemble (#3217)
* reduce number of allocations in TreeEnsemble

* Fix probabilities for binary case.

* fix outbound access

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2020-03-16 12:22:15 +01:00
Changming Sun
0fceb33288
Fix onnxruntime server docker file build failure (#3219)
1. Fix onnxruntime server docker file build failure. Tested with the notebook in the ONNX tutorial; it works well.
2. Delete the docker files for the other EPs, because currently they don't work and I don't have enough time to update them.
2020-03-15 14:46:46 -07:00
Tracy Sharpe
88c20eaef1
MLAS: rename AVX512BW->AVX512Core (#3216)
Cleanup change: remap functions and files with Avx512BW to Avx512Core.
2020-03-13 22:45:51 -07:00
Dmitri Smirnov
2a6e5ce978
Speedup and reduce binary size for TfIdfVectorizer (#3197)
Speed up TfIdf.
  Build Trie like structure to quickly exclude dead-ends. 
  Use ParallelFor() for each of the rows processing.
  Make it non-template, batch it.
  Check for short tail within the inner loop.
2020-03-13 17:00:59 -07:00
Tracy Sharpe
fe0b2b2abd
QLinearConv speed up (#3196)
For x86/x64 builds, change the QLinearConv op to use MLAS for the u8u8=s32 GEMM, then requantize the intermediate buffer to u8.
2020-03-13 16:54:55 -07:00
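
The arithmetic described in the commit above (a u8 x u8 GEMM accumulated in s32, followed by requantization to u8) can be sketched in plain Python. This is an illustration only, not the MLAS or QLinearConv code: the function names are hypothetical, the im2col step that turns a convolution into a GEMM is omitted, and the combined `multiplier` (input scale times weight scale divided by output scale) is assumed to be precomputed.

```python
def gemm_u8u8_s32(a, b, a_zero_point, b_zero_point):
    """Multiply two uint8 matrices (lists of rows), accumulating in integers.

    Zero points are subtracted before multiplying, so the accumulators
    can be negative even though the inputs are unsigned.
    """
    inner, cols = len(b), len(b[0])
    return [
        [
            sum((row[k] - a_zero_point) * (b[k][j] - b_zero_point)
                for k in range(inner))
            for j in range(cols)
        ]
        for row in a
    ]


def requantize_to_u8(acc, multiplier, out_zero_point):
    """Scale the accumulators, add the output zero point, saturate to 0..255."""
    return [
        [min(255, max(0, round(v * multiplier) + out_zero_point)) for v in row]
        for row in acc
    ]
```

Python's built-in `round` rounds halves to the nearest even integer; a production kernel may use a different rounding mode.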
Changming Sun
0a1257e467
Adjust the grouping logic in ThreadPool::TryBatchParallelFor (#3207)
1. No more plus 1.
2. Use MlasPartitionWork function to calculate the work index range.
2020-03-13 12:49:17 -07:00
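
One common way to compute a per-thread work index range, without an extra "plus 1" task, is an even split in which the first few threads absorb the remainder. The sketch below illustrates that scheme; `partition_work` is a hypothetical name, and it is an assumption that MlasPartitionWork behaves the same way.

```python
def partition_work(thread_id, thread_count, total_work):
    """Return (start, count) of the work items assigned to `thread_id`.

    Every thread gets total_work // thread_count items; the first
    total_work % thread_count threads each take one extra item.
    """
    per_thread, extra = divmod(total_work, thread_count)
    if thread_id < extra:
        start = (per_thread + 1) * thread_id
        count = per_thread + 1
    else:
        start = per_thread * thread_id + extra
        count = per_thread
    return start, count
```

With 10 items and 3 threads this yields the ranges (0, 4), (4, 3), and (7, 3), which cover every item exactly once.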
Yulong Wang
5bc0d8be5c
Fix TopK Cuda implementation (#3176)
Fixes a bug in the TopK CUDA implementation when the input size is between GridDim::maxThreadsPerBlock and GridDim::maxThreadsPerBlock * 2. In this case, BitonicTopK generates all-zero outputs.
2020-03-13 11:46:17 -07:00
Ori Levari
93569bf0f4
fix regex to populate dll version information correctly 2020-03-13 11:35:49 -07:00
Yufeng Li
c69194ec4c
fix the missing return in _get_quantize_input_nodes and format code with yapf (#3199)
* fix the missing return for function _get_quantize_input_nodes

* format quantization code with yapf
2020-03-13 09:28:41 -07:00
Xavier Dupré
d99554bea1
Improves implementation of tree ensemble regressor and classifier (4 to 5 times faster) (#2692)
* Improves implementation of tree ensemble regressor (4 to 5 times faster)
* Use ORT_THROW
2020-03-13 14:10:37 +01:00
Scott McKay
e9d5ed270f
Normalizer performance improvements (#3201)
* Simplify Normalizer as the spec only requires support for 2D input.

Tried using eigen (LpNorm<1>(), and norm()) on each row but that was much slower.

* Remove unused variable
2020-03-13 22:15:44 +10:00
Scott McKay
890cb78b20
Use Eigen::logistic instead of manually computing values. (#3186)
* Use MlasComputeLogistic instead of manually computing values.
* Update test script to allow the tolerance to be specified when checking float output from logreg_iris.onnx.
2020-03-13 20:27:25 +10:00
Hariharan Seshadri
b8575dda7b
Avoid some heap allocations in the InferenceSession and Model classes (#3103)
* Avoid some heap allocations in the InferenceSession and Model classes
2020-03-12 18:38:10 -07:00
Changming Sun
a02638eb46
Adjust the threading logic in ThreadPool::ParallelFor (#3178)
1. Do not reuse the main thread.
2. Do not add one when MLAS calculates the number of tasks to schedule. (I was the one who put the plus one there.)

This is the second attempt at #1839.

This change is known to have a negative performance impact on some models.
2020-03-12 11:33:33 -07:00
Scott McKay
f49912c42a
Performance improvement to Transpose when moving single axis. (#3173)
* Avoid use of vectors for tracking reader/writer offsets as it adds too much overhead if there are a lot of readers or writers.

Tracy found improvements in resnet34-ssd1200 and BERT Squad with this approach.
2020-03-12 14:49:02 +10:00
Paul McDaniel
6791ed0217
Documentation updates for 1.2 for WinML (#3149)
* API governance draft

* Update CONTRIBUTING.md

updated for ABI proposals

* Update CONTRIBUTING.md

* Update CONTRIBUTING.md

* Incomplete, a draft iteration of 2 more changes - API docs and high level design

* pushing to see how the picture size works on screen.

* added 2 charts on api choice and distribution choice

* details on contract checking

* lint cleanup and links

* PR feedback.

* fixed markdown and lists

* more markdown and lists

* fixed broken links

* PR feedback

* commas

* PR comments from nick

* PR feedback

* fixed build section

Co-authored-by: Nick Geisler <36938193+ngeisler11@users.noreply.github.com>
2020-03-11 14:19:30 -07:00
Hariharan Seshadri
a912415bac
Support custom ops targeting the CUDA EP (#3165)
* Initial commit

* Minor nit

* Comment

* Fix build

* Fix build
2020-03-11 00:49:01 -07:00
Hariharan Seshadri
3464801c3e
Explicitly specify NugetPackage parameter while validating nuget in some release pipelines (#3139) 2020-03-10 15:14:09 -07:00
Yufeng Li
3de1fc096d
Move zero point inputs of MatmulInteger to CPU memory (#3159) 2020-03-10 13:56:23 -07:00
Tianlei Wu
51a8c82908
Update bert optimization script for SQuAD model exported by keras2onnx (#3163)
Update script to make it work on fine-tuned bert model exported by keras2onnx
2020-03-10 12:57:49 -07:00
Yufeng Li
876d0c5430
Make quantization parameters constant weights instead of overridable (#3160) 2020-03-10 08:35:02 -07:00
Scott McKay
3d928de778
Use GEMM for LinearRegressor and LinearClassifier operators to improve performance (#3154) 2020-03-10 20:24:25 +10:00
Dmitri Smirnov
f87b6913cd
Add package download step before pushing to feeds (#3162)
Add package download step before publishing.
2020-03-09 14:32:18 -07:00
Changming Sun
6ed5d7c332
Update post_binary_sizes_to_dashboard.py (#3161)
Discussed with Faith: because the data size is very small and changes are gradual, there is no need to delete the old data. We want to keep all the history.
2020-03-09 13:21:58 -07:00
Tiago Koji Castro Shibata
a59243090a
Publish release symbols (#3152)
* Publish release symbols

* Publish symbols if IsReleaseBuild
2020-03-05 22:32:18 -08:00
Andrew Kane
781a6ebb06 Updated Ruby supported versions 2020-03-05 19:50:41 -08:00
pranavm-nvidia
cfd18b583a Help output typo fix
Fixes a typo in the help output for `symbolic_shape_infer`
2020-03-05 19:50:13 -08:00
Tianlei Wu
5be6665b86
Update Gelu Fusion to support new graph pattern from PyTorch 1.4 (#3148)
* update GeluFusion to support pattern from PyTorch 1.4
* Fix a bug where the check of an edge between mul2 and root was missing.
* update script to fuse gelu from PyTorch 1.4
* Add test for python optimizer
2020-03-05 18:31:52 -08:00
Dmitri Smirnov
e2894c5ffb
Fix package name overrides (#3150)
Add env var with the package name.
2020-03-05 17:10:55 -08:00
Yufeng Li
1d2b8115e2
Support u8u8 in quantization tool (#3140) 2020-03-05 14:42:46 -08:00
KeDengMS
ade4fa108f
Disable delayload for cuda dlls (#3147)
This change fixes #3129. When onnxruntime runs as a DLL on Windows, CUDA does some internal cleanup when the process exits. After that, any call to CUDA causes a crash. Delay-loading makes the thread_local destructor run after the CUDA cleanup, hence the crash.
2020-03-05 14:40:22 -08:00
Dmitri Smirnov
2c446a7f2f
Add push to ORT-NIGHTLY. (#3146) 2020-03-05 11:38:22 -08:00
Yufeng Li
fbb658e603
Implement QuantizeLinear and DequantizeLinear (#3098)
* Implement QuantizeLinear and DequantizeLinear
2020-03-04 13:30:20 -08:00
take-cheeze
83753bcbe3 Suppress maybe uninitialized warning in gcc-9 2020-03-04 11:52:40 -08:00
Dmitri Smirnov
ef8768a53f
Override native package name. Preserve managed package name the same. (#3133)
Override native package name. Preserve managed package name the same.
  Specify package name for validation purposes.
  Fix up validation package name parameter.
2020-03-04 10:12:55 -08:00
Prabhat
a2eeb126b9
Optimised kernel_dot() in SVM op (#3135) 2020-03-04 16:30:40 +00:00
Tianlei Wu
9d874c1225
Add bert performance and correctness test tools (#3108)
(1) Add performance test tool for bert model.
(2) Add accuracy test tool to compare inference results of original and optimized bert models.
(3) Add test data generator tool to create test data for onnxruntime_perf_test.exe
(4) Improve bert optimization script: Verify model producer for model_type; Add warning if model is not fully optimized.
(5) Add shape optimizer tool to assist developing optimization script.
(6) Update readme.
2020-03-03 23:18:08 -08:00
Yufeng Li
84ad4eda8b
Implement MatmulInteger on GPU (#3070)
* Implement MatmulInteger
2020-03-03 16:36:33 -08:00
Changming Sun
12605f05d1
Fix CUDA PATH (#3131)
Previously, we put the "bin" folder of every CUDA version in the system PATH, with 10.2 at the front. It was a mess.
So I've removed all of them from the system PATH env, but I need to add one of them back through the build scripts.

(The problem only affects the C# tests, not the C/C++ tests that are forked from build.py.)
2020-03-03 14:34:19 -08:00
smk2007
6cdd2b4934
Enable DML Nuget Package for x64 or x86 architectures (#3120)
* add dml gpu pipelines

* add x86 to the gpu dml dev build pipeline

* Enable DML x86 builds

* Fix uint64_t -> size_t warning

* fix warnings

* enable dml on x86 ci builds

* operatorHelper 773 error uint32_t vs uint64_t

* operatorHelper 773 error uint32_t vs uint64_t

* make x86 pipeline use the gpu pool

* more warnings

* fix x86 directml path

* make dml nuget package

* disable tf_pnasnet_large

* disable zfnet512

* make validation use wildcards

* disable x86 dml gpu tests

* add args.

* update gpu.yml

* change nupkg wildcard

* add debug statements

* package x86 dml nupkg

* dont drop managed nuget again from dml pipeline build

* Add DML EULA

* directml license should be renamed to not clobber the existing license

* casing on dml package....

* {} to ()

* fix license name

* disable dml from x86 ci

* typo and cr feedback

* remove featurizers

* ship the dml pdb as well
2020-03-02 20:18:46 -08:00
Dmitri Smirnov
e45326b5df
Create NuGet packaging pipeline for ORT Featurizers (#3125)
Create a new pipeline to publish ORT with Featurizers
  Update pipeline for two separate packages.
  Change package names.
2020-03-02 17:00:56 -08:00
Tracy Sharpe
b538cb7e46
NCHWc Upsample/Mul optimizations (#3116)
Extend the NCHWc layout optimizer to handle Resize(mode=nearest) and Mul.
2020-03-02 14:40:49 -08:00