Commit graph

3407 commits

Author SHA1 Message Date
Jeff Bloomfield
389cca7a45 Handle missing initializers in allocation planner to fix crashes with DML provider (#5244)
* Fix memory planning bug with DML EP

* Address PR comments

* Fix typo
2020-09-23 16:50:58 -07:00
Dwayne Robinson
b648fe5f74 ORT DirectML EP for Iron release, ONNX 1.5 (part 2) (#5263)
* Merged PR 5195856: Fix broken cases of zero size tensors in Cast/Reduce

 MaskRCNN failed when `Cast` tried to execute `Xor` with emptiness (zero in dimensions). This is perfectly legal and should be treated as a nop.

Ultimately DML itself should treat this case as a nop, just like how C's `memcpy` treats 0 count as a nop, but I'm just addressing it in ORT now, as enabling it in DML would impact more operators to be consistent (probably should incrementally add a flag to tensor validation so operators can be opted in gradually).

Corresponding WindowsAI PR: https://microsoft.visualstudio.com/WindowsAI/_git/WindowsAI/pullrequest/5195850

Related work items: #27469839, #28761382

* Merged PR 5201369: Remove copy of initializers added in DMLXP refactor

When used in ORT, a common method shouldn't copy and return initializer data

Related work items: #29514403

Co-authored-by: Justin Stoecker <justoeck@microsoft.com>
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
2020-09-23 16:50:58 -07:00
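As a rough illustration of the zero-size case described in #5263 above: a tensor with a zero in any dimension holds no elements, so an elementwise operator can return early instead of dispatching work. The sketch below is plain Python, not ORT or DML code, and `element_count` and `run_elementwise` are hypothetical names chosen for this example.

```python
from math import prod


def element_count(shape):
    """Return the number of elements for a shape; an empty shape is a scalar (1)."""
    return prod(shape)


def run_elementwise(op, data, shape):
    """Apply op to each element of a flat list, treating zero-size tensors as a no-op.

    Hypothetical helper: it only mirrors the idea in the commit message,
    in the same spirit as memcpy doing nothing for a count of 0.
    """
    if element_count(shape) == 0:
        return []  # nothing to compute
    return [op(x) for x in data]
```

For example, `run_elementwise(lambda x: not x, [], [0, 4])` returns an empty list without calling the operator at all.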
Yufeng Li
eb75b492cc Fix bug in the back to back quantization of matmul and conv (#5264)
* fix bug in the back to back quantization of matmul and conv

* fix bug in back to back gather
2020-09-23 16:50:58 -07:00
Tianlei Wu
47447da4fd bump version to 1.5.1 (#5258) 2020-09-23 16:50:58 -07:00
Ye Wang
87b15f32ef Fix reshape fusion crash (#5252)
* fix reshape fusion crash

* handling start_node statelessly

* fix
2020-09-23 16:50:58 -07:00
Guoyu Wang
fc259de3bc Fix possible ios build break after update to Xcode 12 (#5246)
* Fix possible ios build break after update to Xcode 12

* Address comments
2020-09-23 16:50:58 -07:00
Sherlock
9fd76c8693 Place Shape's output in CPU memory (#5245)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-23 16:50:58 -07:00
edgchen1
9158679c43 Update BUILD.md training dependency info. (#5240)
Update training dependency versions based on Dockerfile.training.
2020-09-23 16:50:58 -07:00
Changming Sun
b9b7c279fa Update BUILD.md for CUDA versions (#5239) 2020-09-23 16:50:58 -07:00
George Wu
0cbe240ea3 update TensorRT docs (#5238)
* doc updates TensorRT

* update

* update

* fix warning

* newline

* format
2020-09-23 16:50:58 -07:00
Scott McKay
c93f292d1f Revert to using release SafeInt repo now that it supports a build with exceptions disabled. (#5233) 2020-09-23 16:50:58 -07:00
edgchen1
6371ad61c5 Fix TransposeScaleMatMul and MatMulScaleFusion issues (#5230)
- Rename TransposeScaleMatMul back to TransposeMatMul for backwards compatibility
- Fix MatMulScaleFusion issues:
  - Add check for supported execution providers
  - Add check for supported MatMul input types
2020-09-23 16:50:58 -07:00
stevenlix
c27f461c1d Create profile for all dynamic shape input tensors (#5229) 2020-09-23 16:50:58 -07:00
Adam Pocock
4427b1e2a3 [java] Fixing the buffer semantics. (#5223)
* [java] Fixing the buffer semantics.
* Renaming bufferCapacity to bufferRemaining.
* Adding a cast to char* so the pointer arithmetic works on Windows.
2020-09-23 16:50:58 -07:00
George Wu
c909c67701 fix _WIN32 (#5218) 2020-09-23 16:50:58 -07:00
Scott McKay
95b2e31659 Update conversion script and process to simplify creating ORT format models and a minimal build (#5217)
* Update conversion script and process to simplify creating ORT format models and a minimal build.
2020-09-23 16:50:58 -07:00
liqunfu
21a7afb2c6 --shm-size=1024m to fix nccl shared memory issue (#5214)
* --shm-size=256m to fix nccl shared memory issue

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-23 16:50:58 -07:00
RRRachelllll555
b791402f84 Remove shape inference and fix save large model(>2g) issue (#5210)
* remove shape inference and fix save large model problem

* remove unnecessary import

* refine code and add external format for quantize_qat

* remove initializers in tensors_to_calibrate

* small refine

Co-authored-by: t-yguo <t-yguo@microsoft.com>
2020-09-23 16:50:58 -07:00
Pranav Prakash
0a31b9ed3c Fix order of returned values in quantize_weight_per_channel (#5205)
Must match returned order of `quantize_inputs`
2020-09-23 16:50:58 -07:00
Tracy Sharpe
f726af34e0 NCHWc optimizer fixes for quantized models (#5203)
This updates the NCHWc transformer to not interfere with quantized convolution models, based on observations from internal models. The tensor type for MaxPool must be float. The input to GlobalAveragePool/GlobalMaxPool must be in NCHWc format.
2020-09-23 16:50:58 -07:00
S. Manohar Karlapalem
84ffdbc467 Corrects doc typos and formatting (#5201) 2020-09-23 16:50:58 -07:00
Pranav Sharma
24d111c342 Add API to allow configuration of the global thread pools. (#5199) 2020-09-23 16:50:58 -07:00
Zhang Lei
498483b464 MaxPool versioning in quantization tools. (#5194)
MaxPool versioning in quantization tools.
2020-09-23 16:50:58 -07:00
Suffian Khan
39a7f96a44 Fix softmax_warp_backward math when is_log_softmax = True and register LogSoftmax CUDA kernel (#5160)
* register logsoftmax cuda kernel; fix logsoftmaxgrad cuda kernel; fix tests to invoke dispatch_softmax_*

* forgot to remove axis check

* add tests all axis

Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-09-23 16:50:58 -07:00
Shucai Xiao
8e650c5384 Amdmigraphx improvements (#5158)
* code backup

* remove unnecessary log info

* code backup

* code backup

* merge changes from master branch

* code backup

* code backup

* merge changes from master branch

* code backup

* code backup for constant folding enhancement

* code backup

* include more scenarios for constant folding

* code backup

* remove unnecessary code

* remove unnecessary log information

* fix an error in comments

* update algorithm to do graph partition

* code backup

* remove unnecessary log information

* remove an unused function

* remove unnecessary changes
2020-09-23 16:50:58 -07:00
Ye Wang
b693cb1370 Fix a bug in EmbedLayerNorm fusion (#5150)
* fix embedlayernorm bug

* review comments

* interim checkin

* review comments

* Fix core dump in MacOS

* remove unnecessary lines

* update document

* Update graph_utils.cc

* Update onnx_exporter.py

* resolve comments
2020-09-23 16:50:58 -07:00
Changming Sun
5b5bcba9e3 Update MCR CUDA docker image to 10.2 (#5181) 2020-09-17 08:39:47 -07:00
Dmitri Smirnov
ece9a7c1fc Refactor TensorAt, prepare for release (#5180)
* Refactor TensorAt
  locations* must be const and int64_t since our dims are int64_t
  Remove unnecessary copy of locations.
  Remove unnecessary casting and C-casting. Simplify implementation.
  Add a check for string type.
  Make CXX api return T& to fully expose C API in C++, const std::vector& by value as it
  covers more ground and eliminate redundant copy.
  Eliminate inner loop, compute strides first.
2020-09-17 08:39:47 -07:00
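The "compute strides first" idea mentioned in #5180 above can be sketched as follows. This is a minimal Python illustration assuming a dense row-major layout, not the actual TensorAt implementation; `row_major_strides` and `flat_offset` are hypothetical names.

```python
def row_major_strides(dims):
    """Return per-dimension strides (in elements) for a dense row-major tensor."""
    strides = [1] * len(dims)
    # Walk from the second-to-last dimension back to the first.
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return strides


def flat_offset(dims, location):
    """Map a multi-dimensional location to a flat element offset."""
    if len(location) != len(dims):
        raise ValueError("location rank must match tensor rank")
    for loc, dim in zip(location, dims):
        if not 0 <= loc < dim:
            raise IndexError("location is out of bounds")
    strides = row_major_strides(dims)
    # With strides precomputed, the offset is a single dot product (no nested loop).
    return sum(loc * stride for loc, stride in zip(location, strides))
```

For a tensor with dims `[2, 3, 4]` the strides are `[12, 4, 1]`, so location `[1, 2, 3]` maps to offset 12 + 8 + 3 = 23.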
Tracy Sharpe
b2994492af MLAS: add sgemm weight prepacking (#5183)
Add support to MLAS to prepack weights for the float GEMM. Support for prepacking has been added to MatMul and Attention for this release.
2020-09-17 08:39:47 -07:00
Tiago Koji Castro Shibata
ecf04d23c4 Fix nuget build (#5163)
* Fix nuget content

* Revert "Fix nuget content"

This reverts commit e2cdcec4e39964c50eac2fb306c7a4bb84352443.

* Nuget packaging

* skip tests

* msbuild path

* Force msbuild version

* Workaround https://github.com/NuGet/Home/issues/7621

* cleanup
2020-09-17 08:39:47 -07:00
Tiago Koji Castro Shibata
b523fa08bc Use onecore umbrella lib in onecore builds (#5182)
* delayload hack

* Skip tests

* Onecore uses onecore umbrella

* Uncomment tests

* cleanup

* Disable dev mode for WinML
2020-09-17 08:39:47 -07:00
Chun-Wei Chen
393ff2f434 Add GetStartTime() for profiler to get private profiling_start_time_ (#4994)
* add GetStartTime() for profiler

* add function in inference_session

* remove qualified name

* add the api in cxx_api.h

* rename starttime to StartTimeNs, expose profiling object

* rename GetProfilingStartTime

* move Ortapis to the right place

* move to the end

* add const for session

* const the right place

* use const auto instead of const auto* for session

* remove const for auto getstarttime

* remove const for auto getstarttime

add unit tests

* nit: update test name and add comments
2020-09-17 08:39:47 -07:00
edgchen1
5d3c962481 Install ssh in builder image, fix segfault in TrainingRunnerTest.Basic. (#5186) 2020-09-17 08:39:47 -07:00
Bowen Bao
53d8779dbc Improve error message for FE model export checking (#5156) 2020-09-17 08:39:47 -07:00
Changming Sun
a0a435abc6
Add sympy==1.1.1 to Linux docker image (#5177) 2020-09-15 16:08:49 -07:00
Tianlei Wu
0752fd7425
change version number from 1.4.0 to 1.5.0 (#5178) 2020-09-15 15:50:25 -07:00
Chi Lo
9f526f45ac
TensorRT Perf Tool (#4900)
* Initialize tensorrt perf script

* Add bert-squad dependencies

* Modified code to make ort inference with CUDA/Tensorrt

* Add get CUDA/TRT version

* uncomment bert-squad

* Add BERT-SQUAD inputs.json

* Add FastRCNN

* Make preprocess/validation into common functions

* Add MaskRCNN and SSD and consolidate the code

* Add dependencies for MaskRCNN

* following modifications are made:
    - create common fetch function to get inputs/outputs of model from ONNX model zoo.
    - create common validation function to compare inference outputs with reference outputs from ONNX model zoo.
    - move run/repeat time to argument list. (still working on other arguments, like fp16 or fp32, latency percentile).
    - generate table in csv file to show the latency comparison (TRT vs CUDA) side by side.

* Add approach to analyze profiling file and also update model-related
settings

* Add models

* Add most of models from ONNX model zoo

* Add model input name and print all the model names at the end of run

* Add system info

* Add TRT fp16 support

* Refine the code

* Handle TRT fall back and modify the way to get input data

* Refine code

* Modify code

* Add more precise approach to measure inference

* Add io-binding

* Add YoLoV4

* Refine the code

* Refine the code

* Add models

* Add yolov4 notebook for jetson device

* Update notebook

* Update notebook

* Add CVS models

* Add missing model

* Add support of float16

* Add new way to get trt version

* Add "validate" and "benchmark" mode

* Add randomly generated input

* Refine perf script

* Refine the code.

* Add README

* Refine the code

* Update README.md

* Refine code

* Update README.md

* Remove all the model-related Python and instead use model_list.json as
the models configuration.

Refine the benchmark.py

* Refine the code

Co-authored-by: Chi Lo <lochi@microsoft.com>
2020-09-15 10:06:01 -07:00
Changming Sun
ef496d36ea
Build: Add missing EXCLUDE_FROM_ALL to ONNX submodule (#5161)
Avoid building unnecessary things
2020-09-15 09:22:09 -07:00
Wenbing Li
de6e3fb61d
Reduce iOS shared library size by symbol file. (#5171) 2020-09-14 23:59:41 -07:00
Ryan Hill
8fa427b264
Ryanunderhill/backout 5014 (#5167)
* Revert 5014
2020-09-14 22:48:00 -07:00
Scott McKay
089789c135
Revert change to disable support for loading ORT format models in the packaging pipelines. (#5168) 2020-09-15 15:11:06 +10:00
Sheil Kumar
c0d7c8bc44
Add docs indicating that the onnxruntime engine from other distributions can be compatible with the WinRT NuGet (#5009)
* add docs for mix and matching

* typos

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-09-14 21:15:51 -07:00
RandySheriffH
1dde215d96
promote cuda version on packaging pipelines (#5154)
* promote cuda version on packaging pipelines

* fix cudnn version in py packaging template

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2020-09-14 21:09:09 -07:00
Yufeng Li
3068a835f1
Fix quantization of 1-D conv with bias (#5157) 2020-09-14 18:07:14 -07:00
Andrei Shadrikov
82b25e1731
Fix datasize call in calibrate (#5110)
* Moving datasize to the interface.

* Reverting changes and addressing the comment
2020-09-14 18:06:23 -07:00
S. Manohar Karlapalem
f7edf0aa57
[OpenVINO-EP] Enable EP config options for VPU hardware (#5119)
* Added config flags for VPU Fast Recompile

* clean-up ifdefs

* Add VPU Fast compile config option

Adds an option that enables Fast compilation of models to VPU
hardware specific format.

* Add config option to choose specific device id for inference

Inference of all subgraphs will be scheduled only on this device
even if other devices of the same type are available.

* Add Python API to list available device IDs

* code cleanup

* Add second C/C++ API with settings string parameter

Adds an additional C/C++ API that allows passing multiple
key-value pairs for settings as a single string. Multiple
settings are delimited by '\n' while the key and value
within a setting are delimited by '|'.

* Append 'Ex' to the extended C/C++ API

* Use set_providers Py API to set config options.

Uses Session.set_providers Python API to set EP runtime config
options as key/val pairs
Deprecated older module function definitions for config settings.
Updates documentation.

* avoid globals for py config options where possible

Co-authored-by: intel <you@example.com>
2020-09-14 15:46:14 -07:00
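The settings string format described in #5119 above (settings separated by '\n', key and value within a setting separated by '|') could be parsed roughly like this. It is a sketch in Python under those stated delimiters only; `parse_settings` is a hypothetical helper, not part of the OpenVINO EP API, and the keys in the usage note are made up.

```python
def parse_settings(settings):
    """Parse 'key|value' pairs separated by newlines into a dict.

    Hypothetical helper: it follows the delimiters named in the commit
    message, but error handling here is an assumption.
    """
    result = {}
    for line in settings.split("\n"):
        if not line:
            continue  # tolerate empty lines, e.g. a trailing newline
        key, sep, value = line.partition("|")
        if not sep or not key:
            raise ValueError(f"malformed setting: {line!r}")
        result[key] = value
    return result
```

With illustrative keys, `parse_settings("device_id|0\nfast_compile|true")` yields `{"device_id": "0", "fast_compile": "true"}`. Values are kept as strings; converting them is left to the caller.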
Zhang Lei
d45e49dd2b
Add LeakyRelu and Sigmoid QLinear Quantization support (#5116)
* Add LeakyRelu and Sigmoid QLinear Quantization support

* Change to reflect master changes.
2020-09-14 14:46:24 -07:00
Changming Sun
8946d212bf
Remove the dependency on CUDA SDK's version.txt (#5155) 2020-09-14 14:25:28 -07:00
Yufeng Li
20b2f45b24
Support per-channel quantization of weight tensor (#5057)
* Support per-channel quantization of weight tensor

* rename util functions

* fix bugs in calibrate

* add support of reduce_range

* refine opset check
2020-09-14 11:53:50 -07:00
Wenbing Li
2a456d16c0
Enable onnxruntime iOS shared library build. (#5148) 2020-09-14 10:32:39 -07:00