1121 commits

Author SHA1 Message Date
pulkittomar
a50a63aa9e Serialize optimized onnx model (#1470)
* Model serialization

* Removed duplicate symbol

* Minor update

* Review comments

* add tests

* Merged PR 1106437: Model Serialization in onnxruntime

* Review comments

* Merged PR 1107226: Review comments

Review comments

* add tests

* Fixed merge conflict

* Correct python tests

* InferenceSession Refeed Test

* Replace use of widechar const literal-L

* Fixed failing tests

* Updated comment

* Removed unnecessary session options

* Spell check on comments

* Do not serialize when level 3 optimization specified

* Updated error logs

* Changed log severity to WARN
2019-08-12 18:43:40 -07:00
Scott McKay
8a559d75ae
Minor perf improvements. (#1580)
* Minor perf improvements.

- Cache the vector sizes in IExecutionFrame and NodeIndexInfo to avoid calls to size().
  - 2 instructions instead of 10
- Remove an unnecessary check in IExecutionFrame
  - add a check to the ctor so we guarantee it's unnecessary
- Reserve memory for the vectors in BroadcastIterator
  - saves reallocs if more than one value is added
    - but the mlperf models rarely add more than one value, so the benefit is limited.
  - slight tweak to the Broadcaster ctor code to make it more readable
2019-08-13 09:05:48 +10:00
Pranav Sharma
a6a4c4c079
Fix perf test executable. (#1598)
* Mention OrtCreateSessionFromArray in C API doc

* Fix perf test executable due to removal of certain C APIs

* fix linux build

* Avoid duplication

* Fix mem leak
2019-08-12 09:49:29 -07:00
AlbertSadovnikov
ce3c8f98dd Fix for CPU random ops seed narrowing conversion. (#1594) 2019-08-12 09:01:13 -07:00
Malik Shahzad Muzaffar
df9b1b8ec8 Include io_win32.h only if builds on windows (#1587)
* Include io_win32.h only if builds on windows

* looks like include order matters
2019-08-12 08:18:42 -07:00
Tomasz Dołbniak
69baf9e800 Update nGraph to v0.22.1 (#1582)
* Update nGraph to 0.21 and adjust the EP

* Share the graph initializers between custom ops

* Update nGraph to 0.22 and exclude Gather entirely

* Enable building on Windows with nGraph v0.21.1-rc.0

* Disable the unsigned input Shrink op tests for nGraph until the next update

* Line-shortening code refactor

* Fix for the master branch merge artifact

* MKLDNN patches adjustment for Windows

* Exclude MatMulInteger for non-const zero points

* Exclude ConvInteger for non-const zero points

* Enable full Cast op support

* Use the v0.22.1 tag

* Skip ConvTranspose_InvalidKernelShape test for ngraph provider

* Create sub-graph ModelProto from fused_node
2019-08-10 17:41:08 -07:00
Ashwini Khade
7be40b2946
put all gemmlowp common code in one place (#1590)
* put all gemmlowp common code in one place

* fix gpu build failures

* minor update
2019-08-10 17:01:07 -07:00
Ke Zhang
59c9d83f35
add int64 support for less op. (#1604) 2019-08-09 17:16:57 -07:00
Wei-Sheng Chin
0187d876cb Implement new LabelEncoder in opset 2 in ML domain (#1393)
* Implement new LabelEncoder in opset 2 in ML domain

* Fix compilation error

* Fix tests

* Include ONNX's fix

* Formatting and addressing a comment

* Address a minor comment
2019-08-09 14:03:58 -07:00
manashgoswami
6d783e8a07 Added license files in the base image (#1595)
* Update Dockerfile.openvino

* Update Dockerfile.cuda

* Update Dockerfile.cuda

* Update Dockerfile.openvino

* Update Dockerfile.cuda

* added ThirdParty notice file to base image.

* corrected license file name
2019-08-09 13:02:06 -07:00
ybrnathan
9b83545f66
Optimize Fence checking performance (#1593)
* Most nodes do not need a fence check; it is only needed for nodes that sync memory between CPU and GPU.
However, we currently pay the fence check cost for every node and for every input and output.

This change performs the fence check only when necessary.
2019-08-08 20:16:13 -07:00
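The idea behind the fence-check change (#1593) can be sketched as follows. This is only an illustration, not the onnxruntime implementation: `Node`, `needs_fence`, and `run` are hypothetical names, and the flag stands in for whatever the runtime uses to identify CPU<->GPU memory sync nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    # Hypothetical flag: True only for nodes that sync memory between CPU and GPU.
    needs_fence: bool = False


def run(nodes, fence_check):
    """Execute nodes in order, calling fence_check only where it is required."""
    checks = 0
    for node in nodes:
        if node.needs_fence:
            # Only memory-sync nodes pay the per-input/per-output cost.
            for value in node.inputs + node.outputs:
                fence_check(value)
                checks += 1
        # ... the node's kernel would be computed here ...
    return checks


graph = [
    Node("conv", ["x"], ["y"]),
    Node("memcpy_to_host", ["y"], ["y_cpu"], needs_fence=True),
    Node("relu", ["y_cpu"], ["z"]),
]
seen = []
total = run(graph, seen.append)
```

In this toy graph only the memory-copy node triggers checks, so the cost scales with the number of sync nodes rather than with the size of the graph.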
stevenlix
1c5b15c2b8
Remove memory copy between TensorRT and CUDA (#1561)
* remove memory copy between CUDA and TRT

* add info to RegisterExecutionProvider input

* use new IDeviceAllocator for trt allocator

* remove SetDefaultInputsMemoryType from TRT EP

* remove onnx-tensorrt 5.0

* add submodule onnx-tensorrt branch 5.1

* remove redundancy

* Update transformer_memcpy.cc

* Update tensorrt_execution_provider.cc

* switch to TensorRT 5.1.5.0

* update python binding

* disable failed test case on TensorRT

* Update activation_op_test.cc

* upgrade to TensorRT container 19.06

* update according to feedback

* add comments

* remove tensorrt allocator and use cuda(gpu) allocator

* update onnx-tensorrt submodule

* change ci build cuda directory name
2019-08-08 19:31:39 -07:00
Hector Li
38d78542c3
Fix race condition issue in RNN/LSTM/GRU (#1544)
Fix race condition issue in RNN/LSTM/GRU.

Description:
The filter_desc and rnn_desc can also be modified in compute, which may run on multiple threads. This causes a race condition.

Fix:
Create temporary cudnn descriptors.
Cache cudnn_dropout_desc_, which won't change.
2019-08-08 14:18:41 -07:00
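The pattern described in the race condition fix (#1544) can be sketched in a language-neutral way. This is an assumption-laden illustration rather than the CUDA code: `RnnKernel` and its plain-dict "descriptors" are hypothetical stand-ins for the cudnn descriptor objects.

```python
import threading


class RnnKernel:
    """Hypothetical stand-in for a kernel whose compute() runs on several threads."""

    def __init__(self):
        # Built once and never modified afterwards, so sharing it is safe.
        self.dropout_desc = {"ratio": 0.0}

    def compute(self, seq_length, hidden_size):
        # Per-call descriptors live in local variables, so concurrent calls
        # cannot overwrite each other's settings.
        rnn_desc = {"hidden_size": hidden_size, "dropout": self.dropout_desc}
        filter_desc = {"dims": (seq_length, hidden_size)}
        return rnn_desc["hidden_size"] * filter_desc["dims"][0]


kernel = RnnKernel()
results = {}


def worker(i):
    results[i] = kernel.compute(seq_length=i + 1, hidden_size=4)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Had `rnn_desc` and `filter_desc` been stored on `self` and rewritten inside `compute`, two threads could observe each other's values; keeping them local removes the shared mutable state without needing a lock.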
Scott McKay
6e430c0526
A few performance improvements coming out of ssd_mobilenet and ssd_resnet34 analysis (#1578)
* A few performance improvements:
 - Make the iteration in NonZero more efficient by using a raw pointer and simplifying the increment logic
   - add another unit test to check the new logic works with 3 dimensional tensor
   - gains about 2% for ssd_mobilenet
 - Avoid floating point operations on each iteration on Concat
  - about 0.5% for ssd_mobilenet and ssd_resnet34
 - Put common case first in ExecutionFrame::AllocateAsPerAllocationPlan to avoid unnecessary call to IsSparseTensor
  - about 0.05% for ssd_mobilenet
 - Minor tweak to put some ctors in the TensorShape header so they can be inlined more easily
2019-08-08 07:20:00 +10:00
Pranav Sharma
a443b013dd
Remove unneeded C APIs + some refactoring. (#1555)
* Mention OrtCreateSessionFromArray in C API doc

* c api changes after review (1)

* updates...

* fixes

* Reorder include
2019-08-07 11:05:29 -07:00
Ashwini Khade
a93ece2727
update quantizelinear to process int8 input (#1576) 2019-08-07 10:09:15 -07:00
Changming Sun
aeb0bcb4a3 parallel build 2019-08-07 08:38:26 -07:00
Hariharan Seshadri
9a34089f67
Add more type support for OneHot op (#1565) 2019-08-06 17:45:42 -07:00
Changming Sun
9e926fef1c
Add a doc for cmake (#1524) 2019-08-06 07:51:53 -07:00
Changming Sun
65ff02fdb0
Set job timeout for code coverage pipeline to 120min(#1563) 2019-08-06 07:49:31 -07:00
Ashwini Khade
16087f3133
update default values for weight quantization (#1564) 2019-08-05 21:39:37 -07:00
Changming Sun
7ee8aca1bf
Avoid downloading test data into C:\ (#1562) 2019-08-05 19:53:15 -07:00
S. Manohar Karlapalem
05bbb3065c [OpenVINO-EP] Update hardware branding of VAD-R as VAD-M (#1552)
Replaces all occurrences of VAD-R/VAD_R with VAD-M/VAD_M.
Aligns with the official hardware branding.
2019-08-05 15:28:46 -07:00
Hariharan Seshadri
ceb8f1c1a2
Modify the kernel declaration for Shrink op (#1554)
* Add capability for the input and output of Shrink op to share a common buffer

* Cosmetic change
2019-08-05 13:21:04 -07:00
pengwa
6c271c63ac
add test cases for commit c019bb9355a511f471e55e7302b26e1d370ed46a (#1556) 2019-08-04 17:18:45 +08:00
jywu-msft
8a6bfe00af
roll back model test update for ngraph provider. (#1551) 2019-08-02 15:53:32 -07:00
Yufeng Li
a098be12ba
Register kernel for Greater int64 (#1546)
Register int64 for Greater and refactor the register code
2019-08-02 14:01:43 -07:00
Ke Zhang
cb71c69d5e
checking execution provider logic updated. (#1547) 2019-08-02 13:29:39 -07:00
daquexian
93cb29f958 [WIP] NNAPI EP Update (#1540) 2019-08-01 22:25:56 -07:00
Scott McKay
9fb8867a24
Don't create implicit input for outer scope value if there is a subgraph input with the same name. (#1186)
* If there is an outer scope value that matches a subgraph input, don't create an implicit input from the outer scope value.

Minor unrelated change for issue noticed while debugging: Use unordered_set for implicit inputs so we don't add them multiple times.

* Add unit test based on onnx issue.
2019-08-02 07:23:41 +10:00
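The rule from the implicit input change (#1186) can be sketched roughly as below. The function `implicit_inputs` and its parameters are hypothetical and only model the name resolution, not the actual graph classes.

```python
def implicit_inputs(subgraph_inputs, referenced_names, outer_scope_values):
    """Return the outer-scope names a subgraph needs as implicit inputs."""
    explicit = set(subgraph_inputs)
    implicit = set()  # a set, so each name is recorded at most once
    for name in referenced_names:
        if name in explicit:
            # A subgraph input shadows any outer-scope value with the same name.
            continue
        if name in outer_scope_values:
            implicit.add(name)
    return implicit


needed = implicit_inputs(
    subgraph_inputs=["x"],
    referenced_names=["x", "w", "w", "local"],
    outer_scope_values={"x", "w"},
)
```

Here `x` exists in both scopes but is satisfied by the subgraph's own input, and `w` is referenced twice but recorded once.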
Ke Zhang
1cf5ebc4c5
copyfromhost/copytohost are not needed for mkldnn ep (#1532)
* memcpy is not necessary for mkldnn ep to copy from/to host.

* update
2019-08-01 13:22:15 -07:00
Hariharan Seshadri
624411bb69
Upload correct ESRP signed package (#1531) (#1534) 2019-08-01 10:56:18 -07:00
Changming Sun
3045a5f88b
Update test data (#1512)
* Update test data
2019-08-01 10:42:08 -07:00
Hariharan Seshadri
465b30e3ca
Bug fix for shape of optional output in Dropout op (#1507)
* Bug fix for shape of optional output in Dropout op

* Exclude new test from NGraph EP

* Account for the fact that mask could be of different type in different opset variants of the op

* Make accompanying Cuda changes

* Fix build break

* Exclude Opset 7 test for tensorRT EP

* PR comments
2019-07-31 22:37:11 -07:00
Hector Li
57e2482089
Fix a bug in Expand cuda op implementation. (#1528)
Description:
Crashes if the output shape contains a 0, because the code divides by output_shape[i].
Fix:
If the output shape contains a 0, output_shape.Size() is 0, so the output should be null.
2019-07-31 21:21:49 -07:00
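The failure mode in the Expand fix (#1528) can be reproduced with a small CPU-side sketch. This is not the CUDA kernel: `expand` is a hypothetical helper working on flat lists, and returning an empty list is an assumed stand-in for the "null" output mentioned in the commit.

```python
from math import prod


def expand(data, input_shape, output_shape):
    """Broadcast row-major `data` of `input_shape` to `output_shape` (flattened)."""
    total = prod(output_shape)
    if total == 0:
        # Guard: a zero dimension means an empty output, so skip all index math.
        return []

    rank = len(output_shape)
    padded = [1] * (rank - len(input_shape)) + list(input_shape)

    # Output strides, computed by dividing by each output dimension.
    # Without the guard above, this raises ZeroDivisionError for a zero dimension.
    out_strides = []
    remaining = total
    for dim in output_shape:
        remaining //= dim
        out_strides.append(remaining)

    # Input strides, with 0 for broadcast dimensions so they always read index 0.
    in_strides = [0] * rank
    stride = 1
    for i in reversed(range(rank)):
        in_strides[i] = stride if padded[i] != 1 else 0
        stride *= padded[i]

    out = []
    for flat in range(total):
        offset = 0
        for i in range(rank):
            idx = (flat // out_strides[i]) % output_shape[i]
            offset += idx * in_strides[i]
        out.append(data[offset])
    return out
```

For example, `expand([1, 2], (2, 1), (2, 3))` broadcasts each row across three columns, while `expand([5], (1,), (0, 3))` returns an empty result instead of dividing by zero.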
Ashwini Khade
b599360014
enable sse4.1 optimizations for gemmlowp (#1529) 2019-07-31 18:44:02 -07:00
Hariharan Seshadri
28a6f6b11b
Add back MacOS leg of the Python packaging job (#1523) (#1526)
* Add MacOS leg of Python packaging job

* Update copy files source directory for Mac OS leg

* Add a task to display the binaries directories contents after build wheel creation

* Revert some changes

* Add task to log

* Update

* Remove unnecessary logs
2019-07-31 15:57:26 -07:00
Hariharan Seshadri
4d768b3a0f
Fix inclusion of ARM binary in the release pkg (#1513) (#1521)
* Fix inclusion of ARM binary in the release pkg

* Add lib and pdb as well
2019-07-31 15:57:03 -07:00
shahasad
fb5d0fc538
Publish nuget package to azure blob store (#1525)
Publish daily build NuGet package to Azure blob store for sharing among internal partners
2019-07-31 14:17:54 -07:00
Tracy Sharpe
0b0e32909a
NCHWc: Enable Conv/Add fusion for stride=2 convolutions (#1518)
Update the NCHWc graph transformer to allow Conv/Add fusion for convolutions where stride=2.
2019-07-31 12:30:05 -07:00
Scott McKay
14d46ee890
Init prev_Ht for zero length sequence to avoid valgrind warning. (#1516)
Couple of performance cleanups
  - don't create debug label string unless dumping matrixes
  - use raw pointer in fill_n calls
2019-07-31 14:46:00 +10:00
Jorgen Thelin
fb7bdd177b Profiler-IsEnabled (#1503)
Avoid use of Hungarian naming convention for cross-platform API code.

I'm taking my cue here from the "ONNX Runtime coding conventions and standard" document, which says we use the "Google C++ style guide", which in turn says "Do not use Hungarian notation"
https://github.com/microsoft/onnxruntime/blob/master/docs/Coding_Conventions_and_Standards.md
https://google.github.io/styleguide/cppguide.html#Windows_Code

X-ref: internal PR 4824
2019-07-30 13:32:01 -07:00
shahasad
a86486ab7f
Post binary sizes to dashboard database (#1517)
Python script and necessary changes in the azure-pipelines yaml file to post the binary size data from NuGet package build. Currently only posted from CPU pipeline. GPU and other pipelines may be added as necessary.
2019-07-30 08:59:43 -07:00
Pranav Sharma
44ab301586
More C API changes. (#1519)
* Mention OrtCreateSessionFromArray in C API doc

* Cleanup a few inconsistencies in the C API.

* updates

* More updates
2019-07-29 18:35:28 -07:00
Dwayne Robinson
cf73f63cb9 Enable float16 MatMul+Add -> GEMM fusion for performance boost (#1506) 2019-07-29 15:18:02 -07:00
Ke Zhang
cf5a4b5856
remove the GetStream from cuda ep. (#1514)
* remove the GetStream from cuda ep.

* fix comments
2019-07-29 15:01:29 -07:00
Yufeng Li
d6a30485be
Rename Tensor.Size() to Tensor.SizeInBytes() (#1502)
Rename Tensor.Size() to Tensor.SizeInBytes()
2019-07-26 14:15:53 -07:00
Hariharan Seshadri
6f538dc861
Support missing optional attribute in Squeeze operator (#1505)
* Make Squeeze operator support no axes attribute cases

* Fix build break

* Resolve PR comments and exclude tensorrt for the new tests
2019-07-26 11:16:35 -07:00
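The Squeeze behaviour addressed in #1505 can be sketched as a shape computation. `squeeze_shape` is a hypothetical helper, and it assumes non-negative axes only; it models the ONNX rule that omitting `axes` removes every dimension of size 1.

```python
def squeeze_shape(shape, axes=None):
    """Return the output shape of Squeeze for the given input shape."""
    if axes is None:
        # No axes attribute: drop every dimension of size 1.
        return [dim for dim in shape if dim != 1]

    drop = set(axes)
    for axis in drop:
        if shape[axis] != 1:
            raise ValueError(f"cannot squeeze axis {axis} with size {shape[axis]}")
    return [dim for i, dim in enumerate(shape) if i not in drop]
```

With this sketch, `squeeze_shape([1, 3, 1, 2])` gives `[3, 2]`, whereas `squeeze_shape([1, 3, 1, 2], axes=[0])` removes only the leading dimension and gives `[3, 1, 2]`.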
Hector Li
717e764e8e
Move Class CudnnDropout to cudnn_common.h (#1492)
1. Move non_max_suppression_test.cc to object_detection folder

2. Move Class CudnnDropout to cudnn_common.h so that it can be shared with other ops. Move the CUDA memory allocation out of CudnnDropout to avoid a memory leak.
2019-07-26 10:41:13 -07:00
Emma Yu
8589be69b2 Organized build instructions (#1504) 2019-07-26 09:12:24 -07:00