Commit graph

2056 commits

Author SHA1 Message Date
Tianlei Wu
54bbbb78ae
Change mask_index input of Attention op to be optional (#3459)
Change Mask Index to optional
2020-04-12 22:55:37 -07:00
George Wu
7f6e407e09
fix python packaging manylinux1 build break. (#3482) 2020-04-11 06:58:22 +08:00
Ryan Lai
4223591043
Add automatic generation of tensors for Onnxruntime Perf Runner (#3448)
* Add flag to enable automatic generation of input for models with tensor inputs

* change wording of variable

* Naming convention changes to variables

* Handle free dimensions

* Comment with default allocator

* variable rename

* Remove input_count

* Cast to size_t to avoid warning

Co-authored-by: Ryan Lai <ryalai96@gamil.com>
2020-04-10 11:54:17 -07:00
stevenlix
56e85484ba
Handle optional inputs and remove more empty shape nodes in TensorRT EP (#3455)
* check optional inputs and remove more empty shape affected nodes

* fix some minor issues

* update code according to feedback
2020-04-10 11:13:38 -07:00
Tiago Koji Castro Shibata
d09d4a6b0d
Fix OS build (#3481) 2020-04-09 21:46:01 -07:00
Pranav Prakash
95ade8f47b
Add check to prevent storing nullptr in value_info_ when proto has unused value info (#3461)
* Add unit test for serialization of unused value_info

* Do not add non-existent (nullptr) value_info_ when loading a model.

Fixes #3430
2020-04-09 19:25:10 -07:00
Pranav Sharma
2ccedb7b4d
Improve error logging when a kernel cannot be found. (#3473)
* Improve error logging when a kernel cannot be found.

* Fix mac build
2020-04-09 19:24:46 -07:00
KeDengMS
739c9d4875
Always call cudaSetDevice at the beginning of session::Run (#3475)
This is required for multithreaded runs on multiple GPUs. Without it, a worker thread defaults to GPU 0 even when the CUDAExecutionProvider is assigned to another GPU. That can cause a CUDA crash when resources from GPU 0 are used on GPU N>0.
2020-04-09 18:54:58 -07:00
Yufeng Li
a443b1b6b9
Revert "Use IMMA for int8 matmul to leverage Turing Tensor Core (#3413)" (#3472)
This reverts commit 4d71958ccf.
Revert the PR. It appears to trigger a bug in nvcc and fails the GPU pipeline.
2020-04-09 15:59:52 -07:00
Scott McKay
40d80cde8f
Rework CDist (#3393)
* Make CDist faster via Eigen squaredNorm and GEMM.
  * Add call to abs() as the GEMM output may differ slightly due to floating point accuracy and result in a negative distance which returns NaN if sqrt() is applied to it.
* Update math::Gemm to use the type for alpha and beta instead of hardcoding to float. Matches the GemmEx definition.
* Provide Eigen based replication of the GEMM call on x86 if T=double.
* Make test model data deterministic.
* Do the GEMM first so we can avoid potentially subtracting two numbers that are very close to each other.
2020-04-09 14:05:25 +10:00
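The reasoning in the CDist commit above (squared norms plus a GEMM term, with abs() guarding the square root) can be illustrated with a minimal pure-Python sketch. This is not the onnxruntime implementation; the function name `cdist_euclidean` and the list-of-lists layout are assumptions made for illustration only.

```python
import math


def cdist_euclidean(a, b):
    """Pairwise Euclidean distances between the rows of a and the rows of b.

    Hypothetical helper: uses ||x||^2 + ||y||^2 - 2*x.y, where the dot
    products stand in for the GEMM call mentioned in the commit message.
    """
    a_sq = [sum(x * x for x in row) for row in a]  # squared norms of rows of a
    b_sq = [sum(y * y for y in row) for row in b]  # squared norms of rows of b
    result = []
    for i, row_a in enumerate(a):
        out_row = []
        for j, row_b in enumerate(b):
            dot = sum(x * y for x, y in zip(row_a, row_b))  # GEMM-style term
            squared = a_sq[i] + b_sq[j] - 2.0 * dot
            # Rounding can leave a tiny negative value here; abs() keeps
            # sqrt() from failing or producing NaN.
            out_row.append(math.sqrt(abs(squared)))
        result.append(out_row)
    return result
```

For identical rows the squared distance should be exactly zero, but floating-point rounding may push it slightly below zero, which is the case the abs() call covers.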
Yulong Wang
718068f020
update C# API to optimize inference latency (#3171)
* update C# API to optimize inference latency

* rename PinnedOnnxValue to fixedBufferOnnxValue and fix build break

* add more test cases

* add conditions on string tensors for pre-allocated outputs

* change to random inputs

* fix word spell

* resolve comments

* resolve comments

* remove FixedBufferOnnxValueTests.cs

* fix trivial typos in doc
2020-04-08 11:57:40 -07:00
Pranav Sharma
cdac74b3c3
Use Eigen threadpool for ReduceSum and ReduceMean. (#3441)
* Use Eigen threadpool for ReduceSum and ReduceMean.

* Fix mac build
2020-04-08 11:50:22 -07:00
Ye Wang
f8fa1dde55
Add a list of Featurizers kernels (#3435)
* wangye/pivot (#3432)

* check in

* work version

* add ForecastingPivot kernel

* fix mac os and linux build error

* update FeaturizerLibrary Version

* resolve comments

* remove changes

* Add Kernel for LagLeadOperator & RollingWindowFeaturizer (#3434)

* update

* update todo

* resolve comments

* relax eps for TruncatedSVD transformer

* mute TruncatedSVD_transformer due to nondeterministic test result

* resolve comments

* update

* test

* update

* fix
2020-04-07 17:00:45 -07:00
Yufeng Li
4d71958ccf
Use IMMA for int8 matmul to leverage Turing Tensor Core (#3413)
Use IMMA for int8 matmul to leverage Turing Tensor Core
Format files under onnxruntime/core/providers/cuda
2020-04-07 15:22:04 -07:00
Tracy Sharpe
de60a14c16
Fix output range for int8_t QuantizeLinear op (#3445) 2020-04-07 15:01:20 -07:00
Yulong Wang
aabf47b107
Fix Split CUDA implementation for zero sized input (#2942)
* Fix Split CUDA implementation for zero sized input

* resolve comments

* add case

* test case update: split into 2 tensors
2020-04-07 14:44:20 -07:00
Scott McKay
48e96ea65f
Reduce binary size of Slice implementation (#3238)
* Make the Slice implementation based on type sizes and reduce templatized code to a minimum.

* Remove using 'dynamic' as a template param to Slice as well.
2020-04-08 07:19:29 +10:00
Dmitri Smirnov
53b9d52fc6
Rework TensorToTensorProto. Do not put string data into raw_data. Eliminate redundant argument. (#3438)
Rework TensorToTensorProto. Eliminate redundant argument.
  Do not put string data into raw_data.
2020-04-07 11:42:10 -07:00
Andrews548
43d6c464fc
Fix ACL EP pooling build breakage (#3429)
The commit 06fc9506fd which refactored cpu Pool class broke ACL EP build.
Also worked on the commit a4fe60c4d3 as it also affects the new class.
Move the declaration of the new MaxPoolV8 cpu class in the header file. Implement MaxPool 8-11 in ACL EP.

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-04-07 07:03:52 -07:00
Tianlei Wu
4bdb5cc8e2
Add CPU implementation for FastGelu operator (#3398)
* Add CPU implementation for FastGelu operator
* Update optimization script to fuse Gelu or FastGelu according to whether Erf or Tanh is used in the graph.
* Merge BiasGelu and FastGelu into one class
* Enable FastGelu Fusion optimizer for CPU Execution Provider.
2020-04-07 00:19:30 -07:00
Changming Sun
9e65298d7a
Re-enable tests (#3437)
Re-enable some tests that were recently fixed.
2020-04-06 20:13:34 -07:00
Tianlei Wu
8ab09186b7
Bert Optimization Script Improvements (#3387)
Add opt_level option for graph optimization level in bert perf test.
Support BERT models that output each layer, where SkipLayerNormalization has more than 4 children.
Check weight and bias are 1D for layer norm fusion.
Add a dummy class Gpt2OnnxModel for further changes of GPT2 model.
2020-04-06 16:55:40 -07:00
Dmitri Smirnov
c8f5e6e632
Implement Min/Max/Clip(12) (#3410)
Implement Max/Min for opset 12.
  Add Clip(12) CPU impl.
  Implement Clip(12) for CPU and CUDA add tests
2020-04-06 14:24:59 -07:00
Yang Chen
7c69b1703b
Fixed a typo (no functional change) (#3433)
s/initailizer/initializer/
2020-04-06 13:46:17 -07:00
Ye Wang
4ebad8805b
change (#3431) 2020-04-06 11:30:21 -07:00
Changming Sun
0dcc6035b1
Disable strong inline (#3399)
To bypass an MSVC bug. Without this change, people can't use VS2017 to build onnxruntime in Release or RelWithDebInfo mode.
2020-04-06 11:19:09 -07:00
Yang Chen
d361121d98
Do not inline ExternOp's scalar tensor inputs (#3426)
An ExternOp's input needs buffers, so we cannot add compute_inline
schedule on it even if it's a scalar tensor. Instead, we need to
schedule it as compute_root.
2020-04-05 18:35:09 -07:00
Tiago Koji Castro Shibata
517693a507
Fix race condition creating ConverterResourceStore (#3419) 2020-04-04 20:10:07 -07:00
Changming Sun
33006f48c0
Update onnx submodule to 1.7.0 release candidate (#3405)
Update onnx submodule to 1.7.0 release candidate. This isn't a release tag, but it will be released soon, in 1-2 weeks.
2020-04-04 16:23:42 -07:00
Tracy Sharpe
d4d19a75ba
Use MlasConv for 1D convolutions (#3425)
Use the existing 2D convolution code in MlasConv to also handle 1D convolutions.
2020-04-04 09:43:10 -07:00
Jesse Benson
5835349614 Add #pragma once to providers.h, to avoid 'struct' redefinition error when including the header from multiple places. 2020-04-03 16:25:18 -07:00
Pranav Sharma
14f4c3e25f
Fix issue in construction of DummyArena. (#3416) 2020-04-03 08:28:05 -07:00
Scott McKay
85131e760c
Enable upsample2x optimization for opset 11 Resize (#3388)
* Enable use_nearest2x_optimization for opset 11 of Resize when possible
2020-04-03 17:36:11 +10:00
Pranav Sharma
3568f8d186
Allow a custom op with the same name to be registered for several providers. (#3400) 2020-04-02 15:38:51 -07:00
Changming Sun
a5fea26cb4 Disable model tests for Mac OS X builds 2020-04-02 15:14:32 -07:00
Changming Sun
aefa466334
Allow zero in split op (#3389)
Allow zero in split op (A change in onnx 1.7 without bumping up the op version)
2020-04-01 16:20:14 -07:00
Tiago Koji Castro Shibata
1671072b6b
[WIP] Port image tests from WAI (#3365)
* Copy image tests from ADO

* wip

* Port tests to googletest

* Add FNS-Candy license

* Add missing collaterals

* Remove brand images

* Fix typos

* Use PrepareModelSessionBinding in MnistImageTest

* Fix typos
2020-04-01 15:38:44 -07:00
Tiago Koji Castro Shibata
1c334ed0f1
Add Ninja generator to build.py (#3331) 2020-04-01 14:19:22 -07:00
Xavier Dupré
edec8043d4
Fix python examples in documentation (#3379) 2020-04-01 22:48:32 +02:00
Changming Sun
accffded5d
Build options for enabling AVX/AVX2/AVX512 (#3373)
1. Add build options for enabling AVX/AVX2/AVX512
2. Update eigen to a newer version, because the current one doesn't work with VC and AVX512.
2020-04-01 10:07:22 -07:00
Brian Martin
77c7d09ced
ERROR_NOT_SUPPORTED doesn't trigger Failed Hresult. Need E_NOTIMPL (#3396) 2020-04-01 10:06:00 -07:00
Brian Martin
052c1fda44
fix some warnings in concurrency tests (#3395) 2020-04-01 10:05:24 -07:00
Scott McKay
33d3239b67
Rework SVMClassifier to improve performance (#3363)
* Rework SVMClassifier
 - use GEMM for initial scoring
 - minimize data allocations and copies
 - parallelize the second half of the scoring for larger batches
2020-04-01 22:00:01 +10:00
Tiago Koji Castro Shibata
a61400de01
Fix ARM cross compilation (related to #3378, #3298) (#3385) 2020-03-31 17:10:48 -07:00
Changming Sun
55fd283d20
Fix a bug in FunctionImpl::FunctionImpl (#3376)
1. Fix a bug in FunctionImpl::FunctionImpl. It set wrong name for the new attribute.
2. Set error code to NOT_IMPLEMENTED if a function contains a not implemented op.
2020-03-31 15:54:47 -07:00
Dmitri Smirnov
a4fe60c4d3
OpSet 12 ops (#3341)
Advance ONNX commit to pickup the latest ArgMax, ArgMin,
  ReduceMax/ReduceMin, MaxPool
  Declare new versions for CPU/CUDA.
  Implement infrastructure support for int8/uint8.
  Adjust GatherOp test for a new error.
  Adjust Scan9.BadShape test.
  Add exclusions for index out of bounds checks.
  Rework result verification for SVDTransformer.
2020-03-31 15:31:06 -07:00
manashgoswami
044c466158
Updated tags for v1.2.0 release (#3386)
Updated the tags in the table to reflect the new images for Release v1.2
2020-03-31 14:54:56 -07:00
Tianlei Wu
ecbacd7d79
Add Benchmark of GPT2 CPU inference (#3351)
* Add benchmark script and notebook for GPT2
* Update Reshape fusion for GPT2 model
* Add opt_level option for bert_model_optimization to disable onnxruntime by setting --opt_level 0
* Fix keras optimization
2020-03-31 13:43:09 -07:00
Scott McKay
ace741680d
Constant-12 support (#3304)
1. Support the new fields for Constant in opset 12
2. Support SparseTensor in the Constant node by converting to dense tensor when lifting the Constant to an initializer. Will make a model with a sparse tensor in a Constant work but isn't an overly efficient approach.
2020-03-30 23:13:52 -07:00
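The sparse-to-dense conversion described in the Constant-12 commit above can be sketched as follows. This is an illustrative sketch rather than the onnxruntime code: the function name `sparse_to_dense` is hypothetical, and it assumes the indices are already linearized (one flat offset per non-zero value) and that the default value is zero.

```python
def sparse_to_dense(values, indices, shape):
    """Expand a sparse tensor into a flat, row-major dense buffer.

    Hypothetical helper. Assumes `indices` holds linearized offsets,
    one per entry of `values`, and that missing entries are zero.
    """
    size = 1
    for dim in shape:
        size *= dim
    dense = [0] * size  # every element is materialized, including zeros
    for value, offset in zip(values, indices):
        if not 0 <= offset < size:
            raise IndexError(f"index {offset} out of range for shape {shape}")
        dense[offset] = value
    return dense
```

The dense buffer stores every element, which is why the commit message notes that the approach works but is not especially efficient for large, mostly empty tensors.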
stevenlix
2332a93db0
Update onnx-tensorrt parser (#3369)
* sync onnx-tensorrt parser and update TensorRT doc

* remove --msvc_toolset 14.16 in tensorrt ci pipeline
2020-03-30 20:31:59 -07:00