Commit graph

65 commits

Author SHA1 Message Date
Thiago Crepaldi
8a890ddfd7
Sync ORTModule branch with master and fix tests (#6526)
* Deprecate Python global configuration functions [Part 1] (#5923)

Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions.

* remove dnnl_dll_path from post build copy (#6142)

* Model Fusion For Bart (#6105)

Fusion fix for Bart models

* Unify IExecutionProvider and IExecutionProviderFactory interfaces (#6108)

* Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers
* Change Provider_IExecutionProviderFactory to be the core version.

* Enable running the mnist_training sample without cuda (#6085)

Signed-off-by: George Nash <george.nash@intel.com>

* nnapi add min max support (#6117)

* Fix CUDA test hang: (#6138)

- Make condition check in `CUDAAllocatorTest` to ensure CUDA device is present.

* Fix TensorRT kernel conflict issue for subgraphs of control flow operators (#6115)

* add static subgraph kernel index

* change kernel naming to avoid conflicts

* Add gradient registration for Abs. (#6139)

* Partition initial optimizer state for Zero-1 (#6093)

* Initial changes

* Working changes

* Working changes

* Cleanup

* fix windows CI

* Review comments

* review comments

* Fix edge case in BFCArena where allocation failures could lead to an infinite loop. (#6145)

#4656

* Revert "work around of the build break in mac (#6069)" (#6150)

This reverts commit 3cae28699b.

* Fix clean_docker_image_cache.py detection of image pushes. (#6151)

Fix clean_docker_image_cache.py detection of image pushes. They were being ignored because the expected HTTP status code was wrong. For pushes, it's 201 instead of 200.

* MLAS: add NEON version of int8 depthwise convolution (#6152)

* Using a map of ops to stages as input of partition function. (#5940)

* New partition algorithm running before AD

* Convert cut_group_info into device map. Work in progress -- works for  bert-tiny with pp=2

* Removing code for partition of bwd graphs

* Remove old code

* Adding some verification code

* Handle Shared Initializer

* Renaming rank with stage

* Added first unit test

* new test

* redundant check

* undo change in bert

* Moved cut-based partition to testing utils file

Co-authored-by: xzhu1900
Co-authored-by: wschin

* New conversion function and tests

* minor

* remove test that is not needed2

* improve GetDeviceAssignment and PR comments

* minor changes

* PR comments

* improving documentation and variable naming

* add documentation

* Variable naming and docs

* more doc improvements

* more doc improvements

* missing static cast

* Fix test file for windows

* Fix test file for windows

* Fix test file for windows

* stage id is not the same as rank id

* PR comments

* PR comments

* More comments

* More comments

* Minor fix to satisfy c++14 (#6162)

* Deprecating Horovod and refactored Adasum computations (#5468)

deprecated horovod submodule
refactored adasum logic to be ort-native
added tests for native kernel and e2e tests

* Update TensorRT-ExecutionProvider.md (#6161)

* Bugfix for topk cuda kernel (#6164)

* fix the issue that std::numeric_limits cannot handle half type

* adding a test

Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Revert "Fuse MatMulIntegerToFloat only when scales are scalar (#6008)" (#6169)

This reverts commit f2dcba7afe.

* Remove ignored build warnings for pybind on Mac (#6165)

* save_checkpoint, load_checkpoint and aggregate_checkpoints (#6136)

* save_checkpoint and load_checkpoint implementations

* checkpoint aggregation logic

* unit tests for save_checkpoint, load_checkpoint and aggregate_checkpoints

* Don't try to bind unused inputs in the Training frontend (#6166)

* Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172)

* aggregate model states only for the case when mixed precision was true (#6176)

* [NNAPI EP] Enable per-channel quantization for QlinearConv  (#6155)

* Enable qlinearconv per-channel quantization

* Fix the android CI test failure

* Add Android Version Check for Per-Channel Quant

* Address PR comments

* Fix some minor issues

* Add verification of per-channel zero points

* Make the error tolerance configurable

* Fix typo in BERT pretraining script (#6175)

A misplaced `}` meant that the `'enable_adasum'` option was interpreted incorrectly, causing the test to fail.

* Update get_docker_image.py to enable use without image cache container registry. (#6177)

Update get_docker_image.py to enable use without image cache container registry.

* Helper for compiling EP to generate deterministic unique ids for use in MetaDef names (#6156)

* Create a helper for generating unique ids that can be used by an EP that creates compiled nodes and needs ids to be deterministic for a model when used in multiple sessions.

Added to IExecutionProvider as this can potentially be used by all compiling EPs and is more robust than a simplistic counter (although EP implementer is free to choose either approach).

* Restructure the helper so it can be called across the EP bridge.
Add ability to call id generation helper from EP bridge
  - convert DNNL EP to use helper to validate
Address issue where a new Model may be loaded into the same address as a previous one.
  - hash the bytes in the Graph instance (1728 bytes currently) to use as the key to the full hash for the model
Add lock around id generation to ensure no issues if multiple sessions partition graphs at exactly the same time.
  - Extremely unlikely but would be hard to debug and the locking cost is not an issue as it's only incurred during graph partitioning and not execution.
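
The id-generation idea described above can be sketched in a few lines. This is a minimal illustration, not the onnxruntime implementation: the class name `MetaDefIdGenerator`, the use of SHA-256, and the 16-character truncation are all assumptions made for the example. It shows the three properties the message mentions: the key comes from hashing model content rather than a memory address, ids are deterministic per model across sessions, and a lock guards the counter.

```python
import hashlib
import threading


class MetaDefIdGenerator:
    """Hypothetical sketch of a deterministic per-model id generator."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = {}  # model hash -> next counter value

    def generate(self, model_bytes: bytes):
        # Hash the model content, so a new model loaded at the same address
        # as a previous one still gets its own key.
        model_hash = hashlib.sha256(model_bytes).hexdigest()[:16]
        # The lock only matters if several sessions partition graphs at once.
        with self._lock:
            current = self._next_id.get(model_hash, 0)
            self._next_id[model_hash] = current + 1
        return model_hash, current


# Two independent "sessions" over the same model produce the same ids.
session_a = MetaDefIdGenerator()
session_b = MetaDefIdGenerator()
first_a = session_a.generate(b"model-bytes")
first_b = session_b.generate(b"model-bytes")
second_a = session_a.generate(b"model-bytes")
other_model = session_a.generate(b"different-model")
```

An EP could then build a name such as `f"EP_{model_hash}_{counter}"` from the returned pair; that naming scheme is likewise only an example.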

* Backend APIs for checkpointing (#5803)

* Add backend API GetOptimizerState and GetModelState

* add GetPartitionInfoMap

* Android coverage dashboard (#6163)

* Write the report to a file.

* Post code coverage to the Dashboard database.

* Add usage details of unified MCR container image (#6182)

Going forward, a single unified docker image will be published in
MCR. The hardware accelerator target choice will have to be made
in the application using OpenVINO EP's runtime config options.

* improve perf for softmax (#6128)

* improve perf for both gathergrad and softmax

* revert the change in gathergrad and will be done in another PR.

* address comments from code review.

* Tune fast Gelu to use exp(x) instead of tanh(x) on Rocm platform (#6174)

* tune fast gelu to use exp(x) instead of tanh(x) on rocm

* update to use expression 2/(1+exp(-2x))-1 for stability
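
The expression 2/(1+exp(-2x))-1 mentioned above is an exact rewrite of tanh(x). The sketch below checks that identity numerically and plugs it into the widely used tanh-based GELU approximation. The function names are chosen for this example, and the approximation constants are the commonly published ones, not values taken from the ROCm kernel.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def tanh_via_exp(x: float) -> float:
    # tanh(x) rewritten with a single exp call: 2 / (1 + exp(-2x)) - 1.
    # Note: math.exp raises OverflowError for arguments above roughly 709,
    # so keep |x| moderate in this Python sketch.
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def fast_gelu(x: float, tanh=math.tanh) -> float:
    # Common tanh-based GELU approximation; `tanh` is injectable so both
    # formulations can be compared.
    inner = SQRT_2_OVER_PI * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + tanh(inner))


samples = [-3.0, -0.5, 0.0, 0.5, 3.0]
max_tanh_error = max(abs(tanh_via_exp(x) - math.tanh(x)) for x in samples)
max_gelu_error = max(
    abs(fast_gelu(x) - fast_gelu(x, tanh=tanh_via_exp)) for x in samples
)
```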

* Add Status.csv to EP Perf Tool (#6167)

* merge master, keep postprocess status commit

* download float16.py every time

* removing hardcoded values

* Lochi/quantization tool for trt (#6103)

* Initial implementation of generating calibration dynamic range table

* Initialize validation support for Quantization

* Initialize validation support for Quantization (cont.)

* Improve validation support for Quantization

* Improve validation support for Quantization

* Rewrite/Refine for calibration and validation

* Rewrite/Refine for calibration and validation (cont.)

* Refine code

* Refine code

* Add data reader for BERT

* Add flatbuffers to serialize calibration table

* Refine code and add BERT evaluation

* Refine the code

* minor modification

* Add preprocess/postprocess of vision team yolov3 and refine the code

* Update annotation

* Make bbox coordinates more accurate

* Fix bug

* Add support of batch processing

* Batch processing for model zoo yolov3

* Add batch inference for evaluation

* Refine the code

* Add README

* Add comments

* Refine the code for PR

* Remove batch support checking in data_reader and refine the code

* Refine the code for PR

* Refine the code for PR review

Co-authored-by: Olivia Jain <oljain@microsoft.com>

* Implement ScatterND for CUDA EP (#6184)

* Condition fix in Resize operator (#6193)

* Clean up checkpoint tests to use the new checkpoint functions (#6188)

* add deprecation warning for old checkpoint functions

* update all the distributed checkpoint tests to use new checkpoint functions

* Implement comparing outputs that are sequence of maps of strings to floats (#6180)

* Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats

* PR comments

* Dockerfile to build onnxruntime with ROCm 4.0

* Add ability to skip GPU tests based on GPU adapter name (#6198)

* Implement conversion from ortvalue to Itensor for string tensors and comparing sequence of maps of strings to floats

* PR comments

* Add ability to skip gpu tests according to adapter description

* spacing

* spacing

* spacing

* Openvino ep 2021.2 (#6196)

* Enabling fasterrcnn variant and vehicle detector

* changes for 2021_2 branch

* yolov3_pytorch commit

* fixed braces in basic_backend.cc

* ci information added

* faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2

* some changes to support unit tests

* disable some tests which are failing

* fix myriad tests for vehicle detector

* Did some cleanup
*cleaned up comments
*Disabled Add_Broadcast_0x1 and Add_Broadcast_1x0
tests on MYRIAD_FP16 backend due to a bug
*cleaned up capability_2021_2.cc file
*Removed extra conditions which were added
for some validation in backend_utils

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* yolov3 pytorch workaround to ensure that the output names are matched

* gemmoptest fixed on myriad

* Fixed MYRIADX CPP Test Failures

*Expand,GatherND,Range,Round op's
are only supported in model

*where op with float input data
types are not supported and fixed

*Scatter and ScatterElements op's with
negative axis are fixed

*Reshape op with 0 dim value are not
supported and fixed

*Disabled InstanceNorm_2 test on MYRIADX

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* make changes to yolov3 pytorch

* Fixed python unit tests
*Fixed failing python tests on vpu,
GPU and CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes POW op failures on GPU_FP16

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Clean up capability_2021_2.cc

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docx for MultiThreading option
*Added extra info on setting the num_of_threads
option using the API and its actual usage

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fixed slice and removed extra prints

* Disabled failing python tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor changes added in capability_2021_2

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* made changes to slice to avoid failures

* Disabling FP16 support for GPU_FP32
->Inferencing an FP16 model on GPU_FP32
leads to accuracy mismatches. so, we would
rather use GPU_FP16 to infer an FP16 model
on GPU Device

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Updated docx for Inferencing a FP16 Model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* fix for mask rcnn

* Script for installing openvino from source

* Updated with openvino 2021.2 online installation

* code comment fixes
fixed accuracy mismatch for div

* Update OpenvinoEP-ExecutionProvider.md

updated for 2021.2 branch

* Update README.md

updated dockerfile documentation

* Update BUILD.md

build.md update documentation

* permission change of install_openvino.sh

* made changes to align with microsoft onnxruntime changes

* Updated with ov 2021.2.200

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>

* Fix a memory leak in test_inference.cc (#6201)

* Fix a memory leak in test_inference.cc

* Use TArray in AMD element-wise kernels, rather than manually copying memory to device.

* Remove most ROCm-specific element-wise code and reuse CUDA element-wise code.

* Minor change to improve performance for operator Pad. (#5537)

* small improvement for pad

* Support double for operators Log, Reciprocal, Sum (CPU) (#6032)

* Support double for operators Log, Reciprocal, Sum
* remove test erf_double

* Support double for operators Where, LpNormalisation (#6034)

* Support double for operators Relu, Tanh, Sigmoid (#6221)

* Fix ImportError in build.py (#6231)

There is a possible ImportError where build.py can import the wrong 'util' package if there are others present in `sys.path` already
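
The shadowing problem described above can be reproduced with two throwaway packages that are both named `util`. The sketch shows one common way to make the intended package win: putting its directory at the front of `sys.path`. It is not necessarily the change made in the PR, and the helper names and temporary directories are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_util_package(root: Path, origin: str) -> None:
    # Create <root>/util/__init__.py that records where it came from.
    pkg = root / "util"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"ORIGIN = {origin!r}\n")


def import_util_preferring(directory: Path):
    # Forget any cached 'util', then search `directory` before everything else.
    sys.modules.pop("util", None)
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()
        return importlib.import_module("util")
    finally:
        sys.path.remove(str(directory))


with tempfile.TemporaryDirectory() as tmp:
    build_dir, other_dir = Path(tmp, "build"), Path(tmp, "other")
    make_util_package(build_dir, "build")
    make_util_package(other_dir, "other")

    # Simulate an unrelated 'util' package that is already importable.
    sys.path.insert(0, str(other_dir))
    try:
        RESOLVED_ORIGIN = import_util_preferring(build_dir).ORIGIN
    finally:
        sys.path.remove(str(other_dir))
        sys.modules.pop("util", None)
```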

* Removed executor todo that looks dead. (#6234)

* Remove MKLML/openblas/jemalloc build config (#6212)

* Remove python 3.5

* Update the readme file

* Upgrade build.py to assert for python 3.6+

Upgrade build.py to assert for python 3.6+
as Python 3.5 can no longer build today's master.

* Support MLFloat16 type in Pow opset-12 CUDA kernel (#6233)

* MLAS: handle MlasGemm(M/N/K==0) cases (#6238)

* Support double for operator TopK + fix one bug in TopK implementation for GPU for double (#6220)

* Support double for operator TopK
* add static classes for topk/double
* fix cast issue in topk

* Support double for operator Gemm + fix bug in gemm implementation for cuda, rocm when sizeof(type) != sizeof(float) (#6223)

* Support double for operator Gemm
* fix type size while copying data in gemm operator for GPU
* fix type in gemm implementation for rocm

* Support double for operator ReduceMean, ReduceLogSumExp (#6217)

* Support double for operators ReduceMean, ReduceLogSumExp

* Support double for operator ArgMin (#6222)

* Support double for operator ArgMin
* add test specifically for double
* add new test on pai-excluded-tests.txt

* Update BUILD.md

* Update manylinux docker image to the latest (#6242)

* Fix allocator issue for TensorRT IOBinding (#6240)

* Fix issue: https://github.com/microsoft/onnxruntime/issues/6094

Root cause: we didn't expose the OrtMemoryInfo for TRT, which causes problems when a user wants to use IOBinding with TensorRT.

Short-term fix: add the OrtMemoryInfo for TRT. Long term, the allocators for CUDA and TRT should be unified.

* Tune BiasGeluGradDx kernel in approximation mode to avoid tanh(...) on Rocm (#6239)

* bias gelu grad use exp(...) instead

* update cuda to rocm

* missing semicolon

* comment

* remove dockerfile

* missing factor of two

* Refactor EP Perf Tool  (#6202)

* merge master, keep postprocess status commit

* download float16.py every time

* using variables to reference eps

* adding ACL EP to ep perf tool

* accuracy with absolute tolerance configurable

* add acl to dict + remove commented line

* Documentation for distributed CI tests pipeline (#6140)

* Remove a debug log in provider_test_utils.cc (#6200)

* Add the Concat Slice Elimination transform, fix constant_folding transform (#5457)

* Add concat slice transform + test

* Cosmetic improvements in concat slice transform

* Remove unrelated file, fix comment, fix constant folding bug

* Add test onnx graph

* fix windows build

* Review comments

* review comment

Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Add MakeStringLite which uses current locale, update some MakeString call sites to use it instead. (#6252)

* Add MakeStringLite which uses current locale, update macros to use that to generate messages.

* Convert calls to MakeStringLite().

* Liqun/speech model loop to scan (#6070)

Provide a tool to convert Loop to Scan for Nuphar performance
Fix Nuphar CI pipeline failures.

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* model parallel refinement (#6244)

* Megatron Transformation as a separate step

* remove useless header

* clang formatting

* Re-Structure megatron transformer for subsequent changes

* fix  comments

* Allow querying a GraphProto's doc_string as part of ModelMetadata (#6248)

* Fix Linux/Mac error message on input type mismatch (#6256)

* add bfloat16 to gathergrad type constraints (#6267)

Co-authored-by: Cheng Tang <chenta@microsoft.com>

* Fix VS 2017 build break (#6276)

* Deprecate Python global configuration functions [Part 2] (#6171)

Update Python API to allow more flexibility for setting providers and provider options.

The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict).
Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. Now it can be used as a starting point for the providers argument and maintain the default priority order.
Convert some usages of the deprecated global configuration functions to use EP-specific options instead.

Update some EP-specific option parsing to fail on unknown options.

Other clean up.
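
To illustrate the mixed `providers` format described above, where each entry is either a provider name or a `(name, options dict)` tuple, here is a small pure-Python sketch. `split_providers` is a hypothetical helper written for this example and is not part of the onnxruntime API. The `device_id` value is only sample data.

```python
def split_providers(providers):
    """Split a mixed providers list into parallel name and option lists."""
    names, options = [], []
    for entry in providers:
        if isinstance(entry, str):
            # Bare name: no provider-specific options.
            names.append(entry)
            options.append({})
        else:
            # (name, options dict) tuple.
            name, provider_options = entry
            names.append(name)
            options.append(dict(provider_options))
    return names, options


# List order expresses priority: try CUDA first, then fall back to CPU.
providers = [
    ("CUDAExecutionProvider", {"device_id": 0}),
    "CPUExecutionProvider",
]
provider_names, provider_options = split_providers(providers)
```

Per the description above, a list shaped like `providers` is what the `providers` argument of the session constructors now accepts, and the result of `get_available_providers()` can serve as a starting point for it.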

* Add script to preprocess python documentation before publishing (#6129)

* add script to preprocess python documentation before publishing

* rename past to past_key_values for GPT-2 (#6269)

rename past to past_key_values for transformers 4.*

* Rename MakeString and ParseString functions. (#6272)

Rename MakeString to MakeStringWithClassicLocale, MakeStringLite to MakeString, *ParseString to *ParseStringWithClassicLocale.
Add missing pass-through versions of MakeStringWithClassicLocale for string types.

* Increase timeout for Linux GPU CUDA11 build. (#6280)

* Add helper to compare model with different precision (#6270)

* add parity_check_helper.py

* add real example

* remove lines

* Fix Min/Max CPU kernels for float16 type (#6205)

* fix data_ptr assertion error for past_sequence_length=0 in GPT-2 (#6284)

 fix io binding crash for past_sequence_length=0

* A list of changes in transformers tool (#6224)

* longformer fp16 e2e

* add fp16/fp32 parity check helper file

* excludes nodes with subgraph in profiling

* use onnxconverter_common to do fp32->fp16

* add version check for onnxconverter_common

* remove helper file

* add pkg installation on notebooks and script

* Workaround for static_cast<double>(half)

* Add workaround to remove ROCm-specific binary-elementwise files.

* Update nuget build (#6297)

1. Update the ProtoSrc path. The old one is not used anymore.
2. Regenerate OnnxMl.cs
3. Delete some unused code in tools/ci_build/build.py
4. Avoid set intra_op_param.thread_pool_size in ModelTests in OpenMP build.
5. Fix a typo in the C API pipeline.

* Enable ONNX backend test of SequenceProto input/output  (#6043)

* assert sequence tensor and remove skips

* update testdata json

* use ONNX 1.8 in cgmanifest.json

* use previous commit to workaround

* update ONNX commit ID in docker

* skip test_maxpool_2d_dilations test for now

* update function name

* add --sequence_lengths option (#6285)

* more dtype for Equal CUDA kernel (#6288)

Co-authored-by: Vincent Wang <weicwang@microsoft.com>

* Force reinstall onnx python package on Windows (#6309)

* update transformers required package versions (#6315)

* Remove abs in LpPool (#6303)

* Support 1D input for Conv + Mul/Add fusion optimizer with test (#6295)

* Support 1D input (N C H) for Conv + Mul/Add fusion optimizer with test cases and test models.

* Add longformer to  python package (#6314)

* add longformer to python package
* move test related script and data to a new folder

* Avoid false sharing on thread pool data structures (#6298)

Description: This change adds alignment and padding to avoid false sharing on fields in the thread pool. It also adds a new microbenchmark to profile thread-pool performance over short loops.

Motivation and Context
MobileNet on a 2*12-core system showed a performance gap between the ORT thread pool and OpenMP. One cause appeared to be false sharing on fields in the thread pool: ThreadPoolParallelSection::tasks_finished (which the main thread spins on waiting for workers to complete a loop), and the RunQueue::front_ and back_ fields (used respectively by the worker thread and the main thread).

The additional micro-benchmark BM_ThreadPoolSimpleParallelFor tests performance of loops of different sizes at different thread counts. The results below are on a machine with 2*14-core processors (E5-2690 v4) running with 1, 14, 15, and 28 threads. For each test, the microbenchmark has N threads run a loop with N iterations; hence a perfect result is for the time taken to be constant as additional threads are added (although we will also see power management effects helping at very low thread counts). The loop durations (100000, 10000, 1000) correspond roughly to 200us, 20us, and 2us on this machine.

Before change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17153 us 17154 us 32
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 22553 us 22553 us 30
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 21521 us 21521 us 29
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24111 us 24111 us 24
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1719 us 1719 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 3409 us 3409 us 200
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 3541 us 3541 us 201
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 4576 us 4576 us 151
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 174 us 174 us 4017
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 1586 us 1586 us 402
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 1586 us 1586 us 397
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 2864 us 2864 us 232

After change:
BM_ThreadPoolSimpleParallelFor/1/1/100000/real_time 17160 us 17160 us 33
BM_ThreadPoolSimpleParallelFor/14/14/100000/real_time 20989 us 20989 us 31
BM_ThreadPoolSimpleParallelFor/15/15/100000/real_time 22286 us 22286 us 31
BM_ThreadPoolSimpleParallelFor/28/28/100000/real_time 24631 us 24631 us 25
BM_ThreadPoolSimpleParallelFor/1/1/10000/real_time 1718 us 1718 us 407
BM_ThreadPoolSimpleParallelFor/14/14/10000/real_time 2868 us 2868 us 242
BM_ThreadPoolSimpleParallelFor/15/15/10000/real_time 2907 us 2907 us 240
BM_ThreadPoolSimpleParallelFor/28/28/10000/real_time 3872 us 3872 us 186
BM_ThreadPoolSimpleParallelFor/1/1/1000/real_time 175 us 175 us 3938
BM_ThreadPoolSimpleParallelFor/14/14/1000/real_time 933 us 933 us 659
BM_ThreadPoolSimpleParallelFor/15/15/1000/real_time 912 us 912 us 591
BM_ThreadPoolSimpleParallelFor/28/28/1000/real_time 1976 us 1976 us 317
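
As a reading aid for the two result blocks above, the following sketch computes the before/after ratio for each configuration. The timings are copied from the `real_time` column (in microseconds); the dictionary layout and variable names are just for this example.

```python
# Keys are (threads, loop_duration); values are real_time in microseconds.
before = {
    (1, 100000): 17153, (14, 100000): 22553, (15, 100000): 21521, (28, 100000): 24111,
    (1, 10000): 1719, (14, 10000): 3409, (15, 10000): 3541, (28, 10000): 4576,
    (1, 1000): 174, (14, 1000): 1586, (15, 1000): 1586, (28, 1000): 2864,
}
after = {
    (1, 100000): 17160, (14, 100000): 20989, (15, 100000): 22286, (28, 100000): 24631,
    (1, 10000): 1718, (14, 10000): 2868, (15, 10000): 2907, (28, 10000): 3872,
    (1, 1000): 175, (14, 1000): 933, (15, 1000): 912, (28, 1000): 1976,
}

# A ratio above 1.0 means the run got faster after the change.
speedup = {key: before[key] / after[key] for key in before}

for (threads, duration), ratio in sorted(speedup.items(), key=lambda kv: (-kv[0][1], kv[0][0])):
    print(f"threads={threads:>2} duration={duration:>6} speedup={ratio:.2f}x")
```

The shortest loops (duration 1000) benefit most, at roughly 1.4x to 1.7x for the multi-threaded runs, while the longest loops are essentially unchanged, with the 15- and 28-thread runs slightly slower.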

* fix opset imports for function body  (#6287)

* fix function opsets

* add tests and update onnx

* changes per review comments

* add comments

* plus updates

* build fix

* Remove false positive prefast warning from threadpool (#6324)

* Java: add Semmle to Java publishing pipelines (#6326)

Add Semmle to Java API pipeline
  Add security results publishing and add Java GPU.

* Quantization support for split operator with its NHWC support (#6107)

* Make split working for quantization.

* NHWC transformer support for split operator

* Refactor some according to Feedback. Will add test cases soon.

* Fix build error on windows.

* Add test case for split op on uint8_t support

* Add nhwc_transformer_test for split uint8_t support

* Some change according to PR feedbacks.

* Liqun/enable pipeline parallel test (#6331)

enable pipeline parallel test
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Use onnxruntime_USE_FULL_PROTOBUF=OFF for the cuda execution provider (#6340)

This removes a special case of the cuda EP.

* MLAS: add fallback implementation for quantized GEMM (#6335)

Add a non-vectorized version of the kernel used for the quantized version of MlasGemm.

* Delete float16.py (#6336)

No longer needed. Also doesn't pass policheck.

* Enable add + softmax fusion for Rocm platform (#6259)

* add bias softmax; tests appear to pass

* check fusion occurs for rocm as well

* check for rocm provider compatible as well

* build for cpu scenario as well

* try again; broader scope

* proper scope on kGpuExecutionProvider

* been editing wrong file

* remove commented #include lines

* try again due to mac os ci error

* try again

* test fusion both cuda and rocm to avoid mac ci error

* add external data support to tensor proto utils (#6257)

* update unpack tensor utilities to support loading external data

* more updates

* fix test

* fix nuphar build

* minor build fix

* add tests

* fix Android CI

* fix warning

* fix DML build failure and some warnings

* more updates

* more updates

* plus few updates

* plus some refactoring

* changes per review

* plus some change

* remove temp code

* plus updates to safeint usage

* build fix

* fix for safeint

* changed wording. (#6337)

* Remove OpSchema dummy definition. Only needed for Function now, and we can just exclude the method in Function (#6321)

* remove gemmlowp submodule (#6341)

* [NNAPI] Add pow support (#6310)

* Add support for running Android emulator from build.py on Windows. (#6317)

* fix the pipeline failure (#6346)

* Train BERT Using BFloat16 on A100 (#6090)

* training bert using bf16

* Adam support bf16

* bugfix

* add fusedmatmul support

* fix after merge from master.

* bugfix

* bugfix after merge from master

* fast reduction for bf16.

* resolve comments

* fix win build

* bugfix

* change header file.

Co-authored-by: Vincent Wang <weicwang@microsoft.com>

* Fix DerefNullPtr issues raised by SDLNativeRules. (#6348)

* update quantize to support basic optimization and e2e example for image classification (#6313)

update the resnet50-v1 to standard one from onnx zoo.
add an example for mobilenet
run basic optimization before quantization
fix a bug in Clip

* Enable graph save for orttrainer (#6333)

* Enable graph save for orttrainer

* Fix CI

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

* Update orttraining/orttraining/python/training/orttrainer_options.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

* Add PREfast to python packaging pipeline (#6343)

* Add PREfast to python packaging pipeline

* fix longformer benchmark io_binding output_buffers (#6345)

* fix longformer benchmark io_binding output_buffers

* format

* import benchmark_helper from parent directory.

* Use readelf for minimal build binary size checks. (#6338)

* Use readelf for minimal build binary size checks.
The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes.
The only downside is that the sum of the sections is larger than the on-disk size (presumably things get packed smaller on disk and some of the section alignment constraints can be ignored).

* Remove unused function

* Java: Set C language warnings to W4 and adjust JNI code (#6347)

Set /W3 for C language and fix up JNI warnings.

* Pipeline Parallel Experimental Python API (#5815)

* Add create session to WinML telemetry to track WinML Usage (#6356)

* Fix one more SDL warning (#6359)

* fix -Wdangling-gsl (#6357)

* Add python example of TensorRT INT8 inference on ResNet model (#6255)

* add trt int8 example on resnet model

* Update e2e_tensorrt_resnet_example.py

* remove keras dependency and update class names

* move ImageNetDataReader and ImageClassificationEvaluator to tensorrt resnet example

* simplify e2e_tensorrt_resnet_example.py

* Update preprocessing.py

* merge tensorrt_calibrate

* Update calibrate.py

* Update calibrate.py

* generalize calibrate

* Update calibrate.py

* fix issues

* fix formatting

* remove augment_all

* This added telemetry isn't needed (#6363)

* Wezuo/memory analysis (#5658)

* merged alloc_plan

* pass compilation

* Start running, incorrect allocation memory info

* add in comments

* fix a bug of recording pattern too early.

* debugging lifetime

* fix lifetime

* passed mnist

* in process of visualization

* Add code to generate chrome trace for allocations.

* in process of collecting fragmentation

* before rebuild

* passed mnist

* passed bert tiny

* fix the inplace reuse

* fix the exception of weight in pinned memory

* add guards to ensure the tensor is in AllocPlan

* add customized profiling

* debugging

* debugging

* fix the reuse of different location type

* add rank

* add the rank

* add fragmentation

* add time_step_trace

* Add summary for each execution step (total bytes, used/free bytes).

* add top k

* change type of top k parameter

* remove prints

* change heap to set{

* add the name pattern

* add the usage for pattern

* add partition

* change to static class

* add custom group

* remove const

* update memory_info

* in process of adding it as runtime config

* change the memory profiling to be an argument

* add some comments

* add checks to record memory_info in training session

* set the "local rank setting" to correct argument.

* addressing comments

* format adjustment

* formatting

* remove alloc_interval

* update memory_info.cc to skip session when there is no tensor for a particular memory type

* fix memory_info multiple iteration seg-fault

* consolidate mainz changes

* fixed some minor errors

* guard by ORT_MINIMAL_BUILD

* add ORT_MEMORY_PROFILE flag

* added compiler flag to turn on/off memory profiling related code

* clean up the code regarding comments

* add comments

* revoke the onnx version

* clean up the code to match master

* clean up the code to match master

* clean up the code to match master

Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>

* Support MLFloat16 in CumSum Cuda op for Opset 14 (#6355)

* Add CumSum-14 for Cuda

* fix convert_common version retrieval (#6382)

* Refine auto_pad based pad computation in ConvTranspose (#6305)

* Fix SDL warning (#6390)

* Add max_norm for gradient clipping. (#6289)

* add max_norm as user option for gradient clipping

* add adam and lamb test cases for clip norm

* add frontend tests
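
For readers unfamiliar with norm-based clipping, the sketch below shows the usual technique a `max_norm` option refers to: if the global L2 norm of all gradients exceeds `max_norm`, every gradient is scaled by `max_norm / norm`. That the option clips by global L2 norm is an assumption here, since the message above does not spell it out. The function name and the list-of-lists gradient layout are made up for the example.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    `grads` is a list of flat gradient lists (one per parameter tensor).
    Returns the (possibly scaled) gradients and the original global norm.
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total_norm <= max_norm:
        # Already within bounds: return an unmodified copy.
        return [list(tensor) for tensor in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in tensor] for tensor in grads], total_norm


# Global norm of [[3.0], [4.0]] is 5.0, so clipping to 1.0 scales by 0.2.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
unchanged, _ = clip_by_global_norm([[0.3], [0.4]], max_norm=1.0)
```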

* Add the custom op project information (#6334)

* Don't use default string marshalling in C# (#6219)

* Fix Windows x86 compiler warnings in the optimizers project  (#6377)

* [Perf] Optimize Tile CPU and CUDA kernels for a corner case (#6376)

* Unblock Android CI code coverage failure (#6393)

* fix build on cuda11 (#6394)

Co-authored-by: Vincent Wang <weicwang@microsoft.com>

* Load the model path correctly (#6369)

* Fix some compile warnings (#6316)

* OpenVino docker file changes to bypass privileged mode

Description: Builds and installs libusb without UDEV support, which is used for communicating with the VPU device.

Motivation and Context

This enables the resulting docker container to be run without '--privileged' and '--network host' options which may not be suitable in deployment environments.

* Megatron checkpointing (#6293)

* Add bart fairseq run script

* Add frontend change to enable megatron

* Initial changes for checkpointing

* Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H

* Add load_checkpoint changes

* Fix CI

* Cleanup

* Fix CI

* review comments

* review comments

* review comments:

* Fix generate_submodule_cgmanifest.py Windows issues. (#6404)

* Continue memory planning when unknown shape tensor is encountered. (#6413)

* Reintroduce experimental api changes and fix remote build break (#6385)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Add support for custom ops to minimal build. (#6228)

* Add support for custom ops to minimal build.
Cost is only ~8KB so including in base minimal build.

* enable pipeline to run quantization tests (#6416)

* enable pipeline to run quantization tests
setup test pipeline for quantization

* Minor cmake change (#6431)

* Liqun/liqun/enable pipeline parallel test2 (#6399)

* enable data and pipeline parallelism test

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>

* Farewell TrainableDropout (#5793)

* Deprecate TrainableDropout kernel.

* Update bert_toy_postprocessed.onnx to opset 12.

* Add more dropout tests.

* Fix BiasDropout kernel.

Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>

* fix null dereference warning (#6437)

* Expose graph ModelPath to TensorRT shared library (#6353)

* Update graph_viewer.cc

* Update tensorrt_execution_provider.cc

* Update graph_viewer.h

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update provider_api.h

* Update provider_bridge_ort.cc

* Update provider_interfaces.h

* Update provider_interfaces.h

* expose GraphViewer ModelPath API to TRT shared lib

* add modelpath to compile

* update

* add model_path to onnx tensorrt parser

* use GenerateMetaDefId to generate unique TRT kernel name

* use GenerateMetaDefId to generate unique TRT engine name

* fix issue

* Update tensorrt_execution_provider.cc

* remove GetVecHash

* Update tensorrt_execution_provider.h

* convert wchar_t to char for tensorrt parser

* update tensorrt parser to include latest changes

* fix issues

* Update tensorrt_execution_provider.cc

* merge trt parser latest change

* add PROVIDER_DISALLOW_ALL(Path)

* add tool for generating test data for longformer (#6415)

* only build experimental api in redist (#6465)

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* Add an option to save the training graph after optimization (#6410)

* expose optimized_model_filepath in SessionOptions as `debug.graph_save_paths.model_with_training_graph_after_optimization_path` in `ORTTrainerOptions`
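
The dotted option name above corresponds to a nested mapping. A minimal sketch, assuming the options are supplied as a plain nested dictionary; the helper `get_option` and the output file name are hypothetical and exist only for this illustration.

```python
# Nested options mirroring the dotted name quoted in the commit message.
# The output file name is a placeholder chosen for this example.
trainer_options = {
    "debug": {
        "graph_save_paths": {
            "model_with_training_graph_after_optimization_path": "optimized_training_graph.onnx",
        }
    }
}


def get_option(options, dotted_name, default=None):
    """Walk a nested dict using a dotted option name (hypothetical helper)."""
    node = options
    for key in dotted_name.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


path = get_option(
    trainer_options,
    "debug.graph_save_paths.model_with_training_graph_after_optimization_path",
)
```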

* Share allocator between CUDA EP & TRT EP. (#6332)

* Share allocator between CUDA EP & TRT EP.
limitation:
1. Does not cover the per-thread allocator created by CUDA EP, still need to figure out the way to remove it
2. Need to have more identifiers to make it able to share CPU allocator across all EPs

* fix max norm clipping test in python packaging pipeline test (#6468)

* fix python packaging pipeline

* make clip norm test compatible with both V100 and M60 GPUs

* Initial version of CoreML EP (#6392)

* Bug 31463811: Servicing: Redist (Nuget) conflicts with Microsoft.AI.MachineLearning starting 21H1+ (#6460)

* update load library code to have the fully qualified path

* make it work for syswow32

* Revert "make it work for syswow32"

This reverts commit b9f594341b7cf07241b18d0c376af905edcabae3.

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* dequantize 1st input of lstm back if it is quantized (#6444)

* [java] Adds support for OrtEnvironment thread pools (#6406)

* Updates for Gradle 7.

* Adding support for OrtThreadingOptions into the Java API.

* Fixing a typo in the JNI code.

* Adding a test for the environment's thread pool.

* Fix cuda test, add comment to failure.

* Updating build.gradle

* fix SDL native rule warning #6246 (#6461)

* fix SDL rule (#6464)

* use tickcount64 (#6447)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Update pypi package metadata (#6354)

* Update setup file data

* add missing comma

* remove python 3.5

* fix typo bracket

* Delete nuget extra configs (#6477)

* Op kernel type reduction infrastructure. (#6466)

Add infrastructure to support type reduction in Op kernel implementations.
Update Cast and IsInf CPU kernels to use it.

* Fixing a leak in OnnxSequences with String keys or values. (#6473)

* Increase the distributed tests pipeline timeout to 120 minutes (#6479)

* [CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481)

* Add macos coreml CI and coreml_flags

* Move save debugging model to use environment var

* Move pipeline off from macos CI template

* Fix an issue building using unix make, add parallel to build script

* Fixed build break for shared_lib and compile warning

* Fix a compile warning

* test

* Revert the accidental push from another branch

This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877.

* Add ability to track per operator types in reduced build config. (#6428)

* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
  - Add python bindings for ORT format models
    - Add script to update bindings and help info
  - Add parsing of ORT format models
  - Add ability to enable type reduction to config generation
  - Update build.py to only allow operator/type reduction via config
    - simpler to require config to be generated first
    - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
  - Add script to create reduced build config
  - Update CIs

* merge e2e with distributed pipeline (#6443)

merge e2e with distributed pipeline

* Fix test breaks in Windows ingestion pipeline (#6476)

* fix various build breaks with Windows build

* fix runtime errors loading libraries from system32

* add build_inbox check to winml_test_common

* use raw string

* cleanup

* fix dll load

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>

* Speed up the Mac CI runs (#6483)

* expose learningmodelpixelrange property (#5877)

* Fix of support api version bug for [de]quantize (#6492)

* SDL fixes: add proper casts/format specifiers (#6446)

* SDL annotation fixes (#6448)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* [OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)

* Removed OpenVINO 2020.2 support

* Updated documentation and build.py

* Removed unnecessary libraries from setup.py

* Support pad operator in quantization and quantized nhwc transformer. Fix Pad operator bug. (#6325)

Support pad operator in quantization tool.
Support pad operator in quantized nhwc transformer.
Fix pad() operator bug: when the pad value for the input's innermost (rightmost) axis is zero in Edge and Reflect modes, it copied the wrong value into the cells to be padded. Constant mode does not trigger this bug, because Edge/Reflect need to copy values from the already-copied array while Constant mode only fills in the specified value.
Add more test cases to cover the pad() operator bug fixed here.
Fix quantization tools uint8/int8 value overflow issue when quantizing weights in Python.
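
To make the difference between the three modes concrete, here is a small pure-Python sketch that pads a 2-D array along its outer axis only, so the innermost axis receives zero padding (the reading of the bug condition assumed here). The function `pad_rows` is illustrative and is not the ORT kernel.

```python
def pad_rows(rows, before, after, mode, fill=0):
    """Pad a 2-D list along axis 0 only; the innermost axis gets no padding."""
    n = len(rows)

    def source_index(i):
        # i is a position relative to the original data: 0..n-1 is the input itself.
        if 0 <= i < n:
            return i
        if mode == "edge":
            # Repeat the nearest border row.
            return 0 if i < 0 else n - 1
        if mode == "reflect":
            # Mirror around the border without repeating the border row.
            if n == 1:
                return 0
            period = 2 * (n - 1)
            j = i % period
            return j if j < n else period - j
        return None  # constant mode: no source row, use the fill value

    out = []
    for i in range(-before, n + after):
        src = source_index(i)
        out.append([fill] * len(rows[0]) if src is None else list(rows[src]))
    return out


x = [[1, 2, 3], [4, 5, 6]]
edge = pad_rows(x, 1, 1, "edge")
reflect = pad_rows(x, 1, 1, "reflect")
constant = pad_rows(x, 1, 1, "constant")
```

Edge and Reflect both read from existing rows, so an indexing mistake surfaces as wrong values, while Constant never reads source data for the padded cells.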

* Improve work distribution for Expand operator, and sharded LoopCounter configuration (#6454)

Description: This PR makes two changes identified while looking at a PGAN model.

First, it uses ThreadPool::TryParallelFor for the main parallel loops in the Expand operator. This lets the thread pool decide on the granularity at which to distribute work (unlike TrySimpleParallelFor). Profiling showed high costs when running "simple" loops with 4M iterations each of which copied only 4 bytes.

Second, it updates the sharded loop counter in the thread pool so that the number of shards is capped by the number of threads. This helps make the performance of any other high-contention "simple" loops more robust at low thread counts by letting each thread work on its own "home" shard for longer.

Motivation and Context

Profiling showed a PGAN model taking 2x+ longer with the non-OpenMP build. The root cause was that the OpenMP build uses simple static scheduling of loop iterations, while the non-OpenMP build uses dynamic scheduling. The combination of large numbers of tiny iterations is less significant with static scheduling --- although still desirable to avoid, given that each iteration incurs a std::function invocation.
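
The scheduling trade-off can be sketched as follows. This is an illustration of static block partitioning and of capping a shard count by the thread count, not the ORT thread pool code; `partition_static`, `capped_shard_count`, and the `max_shards` parameter are hypothetical names.

```python
def partition_static(total_iterations, num_threads):
    """Split [0, total_iterations) into at most num_threads contiguous blocks.

    Each worker then runs one callback per block instead of one per iteration,
    which is what makes static scheduling cheap for very small iterations.
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    base, extra = divmod(total_iterations, num_threads)
    blocks = []
    start = 0
    for worker in range(num_threads):
        size = base + (1 if worker < extra else 0)
        if size == 0:
            break
        blocks.append((start, start + size))
        start += size
    return blocks


def capped_shard_count(max_shards, num_threads):
    """Never use more loop-counter shards than there are threads."""
    return max(1, min(max_shards, num_threads))


# 4M tiny iterations on 4 threads: 4 block dispatches instead of 4M callbacks.
blocks = partition_static(4_000_000, 4)
```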

* Update document of transformer optimization (#6487)

* nuphar test to avoid test data download to improve passing rate (#6467)

nuphar test to avoid test data download to improve passing rate

* Fuse cuda conv with activation (#6351)

* optimize cuda conv by fused activation

* remove needless print out

* exclude test from cpu

* handle status error from cudnn 8.x

* add reference to base class

* add hipify

* [CoreML EP] Add support for some activations/Transpose, move some shared helpers from NNAPI to shared space (#6498)

* Init change

* Move some helper from nnapi ep to shared

* Add transpose support

* Fix trt ci build break

* Refine transformers profiler output (#6502)

* output nodes in the original order; grouped by node name
* add document for profiler

* Update to match new test setup. (#6496)

* Update to match new test setup.

* Add Gemm(7) manually for now.
Will fix properly on Monday. It's used by mnist.ort as that is created by optimizing mnist.onnx to level 1 causing 2 nodes to be replaced by a Gemm and the op to be missing from the required list as that is created using the original onnx model.

* Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504)

* Permit dense seq optimization on BERT-L pytorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD

* enable Equal tests

* enable fast_matrix_reduction test case

* Optimize GatherGrad for AMD GPU (#6381)

* optimize gathergrad

* address comments

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

* add explicit barriers for buffer overread and overwrite (#6484)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* fix sdl bugs for uninitialized variables and returns (#6450)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* handle hr error conditions (#6449)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Dnnl training (#6045)

* Add ReluGrad and ConvGrad ops for the dnnl provider

* the mnist sample is updated to add the --use_dnnl option that
will cause the sample to use the dnnl execution provider for
nodes that exist in dnnl provider.

* Added the ability to find forward ops. Dnnl backward gradient
ops require the forward primitive description and workspace
from the forward operation.

* Enable specifying the execution provider for Gradient Checker Tests

* Prevent memory leak when running dnnl_provider in training mode

Prevent creating a SubgraphPrimitivePool when the code is built with the
ENABLE_TRAINING build flag. Instead create a SubgraphPrimitive directly.

The SubgraphPrimitivePool was causing a pool of SubgraphPrimitives to be
stashed in a map for reuse. Due to the way the Training Loop uses threads
the pool of SubgraphPrimitives was not being reused; instead a new pool of
SubgraphPrimitives was being created each run. The old pool was not instantly
freed. This behavior could be a language error when using thread_local
memory.

Signed-off-by: George Nash <george.nash@intel.com>

* Added fixes to maxpoolgrad and memory leak.

Maxpoolgrad will now pass all unit tests.
With the conv and convgrad disabled for dnnl, mnist is able to train till 95%

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Fixed misc issues when testing training code with dnnl provider

* fix conv_grad dnnl tests with dilation to run dnnl execution provider

* update mnist training sample to accept convolution type models

  convolution models require the input shape to be {1, 28, 28}
  instead of the flat {784} image that is used for the gemm models

  this will enable models that require the different shape by adding
 `--model_type conv` to the command line when running the mnist sample.
 (while testing a workaround was used see #4762)

* Disable weight caching in dnnl conv operator when using training

  When training we can not use cached weights because the weight
  will be updated each run. This re-enables dnnl Conv and ConvGrad Ops.
  The weight caching was the source of the error from Conv when training.

* Fix issues found when building grad ops on Linux
  * The dnnl_convgrad code was overusing the scope operator,
    causing a compilation problem.
  * The dnnl_maxpoolgrad code had a logic error in that it was
    comparing with the source description when it should have
    been comparing with the destination description.

* Update BUILD.md so it shows DNNL for training
  * Updated the table of contents. Since the same providers
    are listed twice, once for Inference and again for Training,
    an HTML anchor was added to distinguish the second header
    from the first for the TOC.

* Fix build failure when not using --enable-training build option

* reorganize the gradient operators so they are grouped together

* Fix issues found when running onnx_backend_test_series.py

* Pooling code only supports 2 outputs when built with --enable-training

* Address code review feedback
  * class member variables end in underscore_
  * use dst instead of dist to match pattern use elsewhere in DNNL code.

* Remove workaround that was introduced to handle problems running
  convolution based training models. See issue #4762

Signed-off-by: George Nash <george.nash@intel.com>

* Isolate training code and code cleanup

* Do not build with dnnl_gpu_runtime if enable_training is set; training code
  does not support dnnl_gpu_runtime yet.
* Isolated Training code inside ifdefs so that it won't affect the
  project if built without training enabled
* Inadvertent changes in whitespace were removed to make code review simpler
* Undid some code reordering that was not needed
* comments added to closing #endif statements to simplify reading complex ifdefs
* Modified the GetPrimitiveDesc functions to return shared_ptr instead of raw
  pointer. This matches what was done in Pool code and is safer memory code.

Signed-off-by: George Nash <george.nash@intel.com>

* Address code review issues

- whitespace changes caused by running clang-format on the code
- Several spelling errors fixed
- Removed/changed some ifdefs to improve readability
- other misc. changes in response to code review.

Signed-off-by: George Nash <george.nash@intel.com>

* Code changes to address code review

- Simplify iteration code using `auto` keyword
- remove C style cast that was not needed
- remove instance variable that was not needed [relugrad.h]
- added the execution providers to `ComputeGradientErrorInternal()`
  and `ComputeTheoreticalJacobianTranspose()` instead of using
  a pointer to an instance variable [gradient_checker.h/.cc]

Signed-off-by: George Nash <george.nash@intel.com>

* Combined the default gradient ops test and dnnl gradient ops test for ConvGrad and MaxPoolGrad into one function with the help of a helper function.
This will reduce repeated code.
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Replaced the stack used by convgrad with a vector so that the vector (used as a stack) can be easily cleared every time the graph is created.
This will prevent memory leak from convolution kernels being pushed constantly onto the stack.
Signed-off-by: chethan.palangotu.keshava@intel.com

* Code clean up and formatting updates

 - Removed empty else statement
 - updated indentation of code that was causing double curly brackets to look unusual
 - Changed check for NumDimensions to Size in Relu and ReluGrad error checking code.
 - isolated training code

Signed-off-by: George Nash <george.nash@intel.com>

* Restore inadvertently removed ConvGrad tests

When combining the DNNL and CPU versions of the ConvGrad
tests, two tests were inadvertently excluded.  This adds
back the Conv3d and Conv3d with strides test cases.

Signed-off-by: George Nash <george.nash@intel.com>

* Add validation to ConvGrad

This validates the dimensions of the ConvGrad match the
passed in Convolution forward primitive description.

The current code for DNNL ConvGrad makes the assumption that the ConvGrad
nodes will be visited in the reverse order from the corresponding Conv nodes

The added validation will return an error if this assumption is not true.

Signed-off-by: George Nash <george.nash@intel.com>

* Do not create new execution providers in provider_test_utils

This removes the code that generated new execution providers in the
OpTester::Run function. This was added because the std::move was
leaving the `entry` value empty so subsequent calls would cause a
segfault.

Problem is this potentially changed the execution_provider because it
would create the default provider dropping any custom arguments.

When the now removed code was originally added the std::move was causing
crashes when the GradientChecker unit tests were run.  However, it is no
longer causing problems even with the code removed.

Signed-off-by: George Nash <george.nash@intel.com>

* Change the forward conv stack to a forward conv map

This changes how the forward conv kernel is mapped to the bwd ConvGrad
kernel. The problematic stack is no longer used.

The convolution stack made the assumption that the corresponding
ConvGrad operator would be visited in reverse order of the forward
Conv operators.  This was always problematic and was unlikely to
work for inception models.

Important changes:
- The weight_name is added to the ConvGrad dnnl_node making it
  possible to use the weight_name as a lookup key to find the
  Conv forward Kernel
- the `std::vector fwd_conv_stack_` has been replaced with a
  `std::map fwd_conv_kernel_map_`
- Although they are not needed, lock_guards were added when writing
  to and reading from the fwd_conv_kernel_map_ as well as the
  fwd_kernel_map_. These should always be accessed by a single
  thread when preparing the dnnl subgraphs, so the guard should not
  be needed, but it's added just in case.
- Updated the comments in the ConvGrad.h code to no longer mention the
  stack. The error check is not removed. It will be good to verify
  there are no errors as we continue to test against more models.
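
The lookup scheme described in the list above can be sketched like this. It is a Python illustration of a keyed, lock-guarded map replacing an order-dependent stack; the actual change is C++ using `std::map` and `lock_guard`, and the class name, method names, and weight names below are hypothetical.

```python
import threading


class ForwardConvRegistry:
    """Map from a Conv weight name to its forward kernel, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._kernels = {}

    def register(self, weight_name, kernel):
        # Called when a forward Conv node is prepared.
        with self._lock:
            self._kernels[weight_name] = kernel

    def lookup(self, weight_name):
        # Called when the matching ConvGrad node is prepared; the visiting
        # order no longer matters because the key identifies the pair.
        with self._lock:
            if weight_name not in self._kernels:
                raise KeyError(f"no forward Conv kernel for weight {weight_name!r}")
            return self._kernels[weight_name]


registry = ForwardConvRegistry()
registry.register("conv1.weight", "conv1_forward_kernel")
registry.register("conv2.weight", "conv2_forward_kernel")

# ConvGrad nodes may now be visited in any order, not only in reverse.
first = registry.lookup("conv1.weight")
second = registry.lookup("conv2.weight")
```

A stack only works when gradients are visited in exact reverse order of the forward nodes; a keyed map removes that assumption, which matters for branching graphs.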

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>

* Lochi/refactor yolov3 quantization (#6290)

* Refactor the code and move data reader, preprocessing, evaluation to
E2E_example_mode

* Refactor the code.

Move data reader, preprocessing, evaluation to model specific example
under E2E_example_mode

* refactor code

* Move yolov3 example to specific folder and add additional pre/post
processing

* Print a warning message for using newer c_api header on old binary (#6507)

* Fix issues with ArmNN build setup (#6495)

* ArmNN build fixes
* Update BUILD.md to document that the ACL paths must be specified to build ArmNN
* Fix CUDA build error. We don't setup the link libraries correctly/consistently so improve that.

* Fix Windows CI builds by updating test scripts to work with numpy 1.20. (#6518)

* Update onnxruntime_test_python.py to work with numpy 1.20.

Some aliases are deprecated in favor of the built-in python types. See https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

np.array with bytes for entries and dtype of np.void no longer automatically pads. Change a test to adjust for that.

* Fix another test script

* Fix ORTModule branch for orttraining-* pipelines

* Update pytorch nightly version dependency

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: Cecilia Liu <ziyue.liu7@gmail.com>
Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com>
Co-authored-by: George Nash <george.nash@intel.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: Juliana Franco <jufranc@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
Co-authored-by: Tixxx <tix@microsoft.com>
Co-authored-by: Jay Rodge <jayrodge@live.com>
Co-authored-by: Du Li <duli1@microsoft.com>
Co-authored-by: Du Li <duli@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>
Co-authored-by: baijumeswani <bmeswani@microsoft.com>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
Co-authored-by: jingyanwangms <47403504+jingyanwangms@users.noreply.github.com>
Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Weixing Zhang <weixingzhang@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Olivia Jain <oljain@microsoft.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Ryan Lai <rylai@microsoft.com>
Co-authored-by: Jesse Benson <jesseb@microsoft.com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>
Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin@vols.utk.edu>
Co-authored-by: Michael Giba <michaelgiba@gmail.com>
Co-authored-by: William Tambellini <wtambellini@sdl.com>
Co-authored-by: Hector Li <hecli@microsoft.com>
Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: liqunfu <liqfu@microsoft.com>
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: pengwa <pengwa@microsoft.com>
Co-authored-by: Tang, Cheng <souptc@gmail.com>
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com>
Co-authored-by: Vincent Wang <wangwchpku@outlook.com>
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: Luyao Ren <375833274@qq.com>
Co-authored-by: Zhang Lei <zhang.huanning@hotmail.com>
Co-authored-by: Tim Harris <tiharr@microsoft.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Alberto Magni <49027342+alberto-magni@users.noreply.github.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: wezuo <49965641+wezuo@users.noreply.github.com>
Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
Co-authored-by: Martin Man <supermt@gmail.com>
Co-authored-by: M. Zeeshan Siddiqui <mzs@microsoft.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Ori Levari <orlevari@microsoft.com>
Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sheil Kumar <smk2007@gmail.com>
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Ryota Tomioka <ryoto@microsoft.com>
Co-authored-by: Adam Pocock <adam.pocock@oracle.com>
Co-authored-by: Yulong Wang <f.s@qq.com>
Co-authored-by: Faith Xu <faxu@microsoft.com>
Co-authored-by: Xiang Zhang <xianz@microsoft.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>
2021-02-02 08:59:56 -08:00
sfatimar
8168c91978
Sahar/fix documentation shared lib (#5926)
* Update OpenVINO-ExecutionProvider.Md

update openvino-executionprovider.md for shared library

* Update Build.md

updated --build_shared_lib flag for building openvino shared provider lib

* Update Dockerfile.openvino 

building for shared library with the new changes for openvino shared lib

* Revert "Update Build.md"

This reverts commit c9cf5fee76be7fdc10cadf07259f1d4ed5b45b93.

* Revert "Update Dockerfile.openvino "

This reverts commit e1624e4f93a4cfb425b6f21d7fb71b299a146740.

* Update OpenVINO-ExecutionProvider.md

fix documentation to the shared library

Co-authored-by: sfatimar <sahar.fatima@intel/com>
2020-11-25 08:50:01 -08:00
stevenlix
1068f3eb87
Use flatbuffers for INT8 calibration table (de)serialization in TensorRT EP (#5873)
* add int8

* support both native TRT cal table and ORT cal table

* add more comments

* Update env variable name and check platform availability for int8/fp16

* add backward compatibility on old env var ORT_TENSORRT_ENGINE_CACHE_PATH and switch to flatbuffers for ort cal table deserialization
2020-11-19 21:41:12 -08:00
stevenlix
dfea92925c
Add calibration based INT8 quantization to TensorRT EP (#5842)
* add int8

* support both native TRT cal table and ORT cal table

* add more comments

* Update env variable name and check platform availability for int8/fp16
2020-11-19 17:10:49 -08:00
S. Manohar Karlapalem
ff58f621fa
Remove nGraph Execution Provider (#5858)
* Remove nGraph Execution Provider

Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice

**Deprecation Notice**

| | |
| --- | --- |
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |

Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.

Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.

* Remove nGraph Licence info from ThirdPartyNotices.txt

* Use simple Test.Run() for tests without EP exclusions

To be consistent with rest of test code.

* Remove nGraph EP functions from Java code
2020-11-19 16:47:55 -08:00
Justin Stoecker
bd236ecc26
Switch to unified DirectML 1.4.0 redistributable (#5794)
Transitions from the ORT-only DML NuGet (hosted on the onnxruntime_public feed) to the new unified DirectML NuGet (Microsoft.AI.DirectML) on nuget.org. In addition, the Microsoft.AI.MachineLearning (WinML) and Microsoft.ML.OnnxRuntime.DirectML packages now take a dependency on the Microsoft.AI.DirectML package. This means we can remove the extra copy of DML binaries in these packages since they will be installed by the DML package.
2020-11-17 13:42:23 -08:00
stevenlix
54de618c2e
Improve TensorRT engine caching (#5737)
* add profile caching to improve engine caching feature

* Add comments

* fix typo

* add decryption for engine caching

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* update onnx-tensorrt submodule

* set opt profile to max value of the range

* add hash to engine/profile name

* Add calibration based INT8 quantization

* add an option to enable both FP16 and INT8

* Update tensorrt_execution_provider.cc

* add env variable to specify calibration file name

* clean up code

* Add comments and update TRT document

* enable tensorrt basic test and add EngineCachingTest

* clean up

* update environment variable in the test

* clean up
2020-11-12 08:56:45 -08:00
Maajid khan
a84a058f9e
[OpenVINO-EP] Enabling Multi Device support (#5740)
* Enabling Multi Device support for UEP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor fix added
*Added a simple fix to determine OpenVINO
version for Arm build as well

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
2020-11-11 15:16:30 -08:00
Johannes Bannhofer
6f6dd0b869
added missing flag ORT_TENSORRT_DUMP_SUBGRAPHS (#5724)
[DOCUMENTATION]
added description of the function ORT_TENSORRT_DUMP_SUBGRAPHS to the documentation
2020-11-06 12:32:12 -08:00
Maajid khan
d98062da0c
[OpenVINO-EP] Hetero support (#5627)
* Implement Hetero in UEP
* Added security checks to take valid Hetero combinations
  as device type
* Integrating Hetero features
* Get the statistics Report in Debug Mode

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Passing right device type for vadm_backend

Added simple fix to pick the right device type
when using vadm_backend with Hetero as well.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed batching logic for 2020.4 and above

* Fixed flake8 PEP8 errors

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor Fixes Added
*Added security checks for device_type passed
in for Hetero build during run time
*code cleanup

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor changes Added
*Fixed batch_size bug in vadm_backend
*code cleanup
*Documentation updated for Hetero

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
2020-10-30 22:35:08 -07:00
Maajid khan
ddf83d1ace
Maajid/multi threading 2 (#5568)
* Enabled multi-threading for OpenVino EP

->Enabled support for concurrent_session_runs

*Run UEP using concurrent_session_runs > 1
*Enabled support for ORT_PARALLEL ExecutionMode

->Documentation Added for Enabling MultiThreading

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Minor Fixes added
*Configure the value of nireq during Runtime
*Documentation typos rectified and details
added for Multi_Threaded Inference

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Some checks added for this fix
*Added checks to invalidate wrong nireq value
and assigned it to default value of 8
*Added new config options for enable_vpu_fast_compile
which were changed w.r.t OpenVINO_2021.1 Release
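
A minimal sketch of the fallback behaviour described above, assuming that an invalid nireq value means anything that is not a positive integer (an assumption; the commit only states that wrong values fall back to 8). The function name `resolve_nireq` is hypothetical.

```python
DEFAULT_NIREQ = 8  # default stated in the commit message


def resolve_nireq(raw_value):
    """Return a usable number of infer requests, falling back to the default."""
    try:
        value = int(raw_value)
    except (TypeError, ValueError):
        # Not a number at all: use the default.
        return DEFAULT_NIREQ
    # Zero or negative counts are treated as invalid in this sketch.
    return value if value > 0 else DEFAULT_NIREQ


configured = resolve_nireq("4")
fallback = resolve_nireq("not-a-number")
```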

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
2020-10-27 14:48:12 -07:00
Olivia Jain
1e4b259d28
Updating EP docs with Onnxruntime API calls (#5503)
* updating examples with current api calls

* Fixing capitalization in api calls, adding RKNPU update

* Correcting nuphar and rknpu ep api calls

* Include creating session in readme
2020-10-19 12:21:21 -07:00
sfatimar
6d2a30eae3
[OPENVINO-EP] 2021.1 Release (#5431)
* Cmake changes for 2021.1

* added new ov version 2020.1 for faster rcnn

* Added missing defs

* equal op modified

* changes to incorporate faster rcnn

* backend util.cc

* hddl_plugin_config.hpp is deprecated; use hddl_config.hpp instead

* changing myriad precision bool to i32

* gather is not enabled for gpu

* conv2D and pooltest auto_pad attribute should not be null

* negative indices are not valid for scatter op in myriad

* non max suppression op only supported in faster rcnn mode

* maxpool indices output is not supported

* Cleaned redundant code in backends

* Added ifdefs for HDDL config

* cast output dimensions check
topk operator k input seems to be resolved only for myriad, as it is
throwing issues for mask rcnn; need to verify

* we are limiting the subgraph size to 3 here

* taking care of review comments

* Fixed minor bugs

* Modified Slice op checks
* Added NonZero, Upsample
* Removed TopK if it's in the middle of a subgraph

* incorporated upsample conditions too

* Dockerfile changes for 2021.1 release

* dockerfile aptkey update

* Minor fixes

* ceil condition added  again

* Fixed few gpu models

* Disabled LSTM and yolov3 in ModelTests

* python softmax cross entropy tests and negative log likelihood

* Update Build.md

Updated for openvino 2021.1

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider for 2021.1

* Update READMe.md

updated new openvino version

* Update Dockerfile.openvino 

added environment variable for DEBIAN Frontend

* Fixed myriad models

* Fixed gather condition
* Fixed mask rcnn model on myriad

* Modified Gather condition

* set default target of MCR dockerfile to MYRIAD_FP16

* Fixed tinyyolov3 on CPU

* Update OpenVINO-ExecutionProvider.md

update openvino execution provider documentation

* Update Dockerfile.openvino

Removed environment variable

* Update OpenVINO-ExecutionProvider.md

update image manipulation networks supported

* Update onnx_backend_test_series_filters.jsonc

removed test_upsample_nearest from cpu test cases

* New InternalCI changes for 2021.1

* Full protobuf removed for OpenVINO

* Protobuf added

* Updated with apt installation for openvino

* Revert the testing changes

* Reverted testing changes

* File permissions are changed to original

* Deleted openvino installation and cmake change

* Optimized Dockerfile

Removed unnecessary cmake installation, numpy

* Added missing ifdefs

* delete array fix

* backend_utils.cc output_shape

* Revert "set default target of MCR dockerfile to MYRIAD_FP16"

This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d.

Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>
2020-10-14 15:56:00 -07:00
manashgoswami
b5caa7cb12
Updated docs: Execution Provider overview (#5328)
* Update ReleaseManagement.md

* Create ONNX_Runtime_Execution_Providers.md

* Create ONNX_Runtime_EP3.png

* Create ONNX_Runtime_EP2.png

* Create ONNX_Runtime_EP1.png

* Delete ONNX_Runtime_Execution_Providers.md

* Create README.md

* Update README.md

* commit

* Updated in error.
Revert "Update ReleaseManagement.md"

This reverts commit 8530bd5fd46aebce3a6d6055d8952ae4f6458c4e.

* Create ONNX_Runtime_Execution_Providers.md

* Create ONNX_Runtime_EP3.png

* Create ONNX_Runtime_EP2.png

* Create ONNX_Runtime_EP1.png

* Delete ONNX_Runtime_Execution_Providers.md

* Create README.md

* Update README.md

* commit

* Updated in error.
Revert "Update ReleaseManagement.md"

This reverts commit 8530bd5fd46aebce3a6d6055d8952ae4f6458c4e.

* Update ReleaseManagement.md

* Update .gitignore

* Update README.md

* Update README.md
2020-10-06 15:01:25 -07:00
Dwayne Robinson
6ad39819c2
Update DirectML Nuget to 1.3.0 (#5274)
Update to 1.3.0
2020-09-23 22:53:02 -07:00
George Wu
3147bc00c3
update TensorRT docs (#5238)
* doc updates TensorRT

* update

* update

* fix warning

* newline

* format
2020-09-21 15:24:20 -07:00
KeDengMS
ce3b67e0cd
[Python] Move symbolic_shape_infer from nuphar to tools (#5162)
* [Python] Move symbolic shape inference from nuphar to tools

* Fix PEP8 ERROR
2020-09-18 09:31:06 -07:00
S. Manohar Karlapalem
584638e5d3
Corrects doc typos and formatting (#5201)
2020-09-17 01:25:19 -07:00
S. Manohar Karlapalem
f7edf0aa57
[OpenVINO-EP] Enable EP config options for VPU hardware (#5119)
* Added config flags for VPU Fast Recompile

* clean-up ifdefs

* Add VPU Fast compile config option

Adds an option that enables Fast compilation of models to VPU
hardware specific format.

* Add config option to choose specific device id for inference

Inference of all subgraphs will be scheduled only on this device
even if other devices of the same type are available.

* Add Python API to list available device IDs

* code cleanup

* Add second C/C++ API with settings string parameter

Adds an additional C/C++ API that allows passing multiple
key-value pairs for settings as a single string. Multiple
settings are delimited by '\n' while the key and value
within a setting are delimited by '|'.

* Append 'Ex' to the extended C/C++ API

* Use set_providers Py API to set config options.

Uses Session.set_providers Python API to set EP runtime config
options as key/val pairs
Deprecated older module function definitions for config settings.
Updates documentation.

* avoid globals for py config options where possible

Co-authored-by: intel <you@example.com>
2020-09-14 15:46:14 -07:00
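The settings-string format described in the commit above (settings separated by '\n', key and value within a setting separated by '|') can be illustrated with a small parser. This is a sketch of the format only, not the EP's C/C++ implementation; the function names are hypothetical, and the keys used in the usage note are illustrative.

```python
def parse_settings(settings: str) -> dict:
    """Parse 'key|value' entries separated by newlines into a dict."""
    options = {}
    for entry in settings.split("\n"):
        if not entry:
            continue  # skip empty entries, e.g. from a trailing newline
        key, sep, value = entry.partition("|")
        if not sep or not key:
            raise ValueError(f"malformed setting: {entry!r}")
        options[key] = value
    return options


def format_settings(options: dict) -> str:
    """Inverse of parse_settings: join 'key|value' entries with newlines."""
    return "\n".join(f"{key}|{value}" for key, value in options.items())
```

Calling `parse_settings("device_id|MYRIAD_0\nenable_vpu_fast_compile|true")` yields a dict with the two keys, and `format_settings` turns such a dict back into the single-string form.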
Ashwini Khade
cd56ab197c
csharp build documentation (#5121)
2020-09-11 11:46:10 -07:00
suryasidd
3a00b50cf8
[OpenVINO-EP] Updating OpenVINO EP to 2020.4 (#4836)
* Removed building ngraph from source

* Disabled some tests temporarily

* Enabled softmax for all dims

* Added onnx importer to link libraries

* int64 changes

* fixed

* temp

* slice update start and end need to be initializer

* Disabled GatherND, ScatterND, ReverseSequence operators

* Added supported ops instead of unsupported ops

* Set precision only for CPU

* Removed some unnecessary conditions

* Fixed segfault in slice

* Softmax restriction removed

* changes

* Setting precision for all plugins

* Changes added to include precision
and supported ops for gpu and vpu

* branch op support

* checking for disabled python test failure

* mapped input names and tensors directly rather than copying which was leading to mismatch

* last index is not supported
mkldnn does not support pow between integers

* included the code changes

* Rename inner-scoped variable to avoid MSVC warning

* applied changes to vadm as well and removed the utility function
getinputtensors() completely

* OpenVINO multi version support: CMake changes

* OpenVINO multi version support: C++ support

* removed commented code

* Remove redundant code lines

* Revert "Rename inner-scoped variable to avoid MSVC warning"

This reverts commit 2f650493162675bc6fb70730de9656ec400be332.
Merged separately in master.

* vadm changes disabled reduction op test

* putting test_gather_negative_indices in unsupported list for now

* Update MCR Dockerfile with 2020.4

Installs OpenVINO 2020.4 from deb packages via APT tool.

* Update build docs with 2020.4 info

* Update dockerfile with OV 2020.4 info

Instructions for building OpenVINO based docker image no longer require
downloading installer package as it is installed by the dockerfile
using OpenVINO 2020.4 APT package for Ubuntu 18.04

* Added constant folding bypass logic

* Added cout statements for ci

* Added NDEBUG flag for debug symbols

* Update Ops info in docs

* fixes multiple unit tests

* mathoptest.ceil disabled for gpu and myriad

* activation test temp disabled

* Fix models for CPU

* Fixed a syntax error

* local commit

* fixing unit tests for myriad

* Fixed Variadic Split, Topk issues

* fix_model commit

* Fix models in myriad

* Added ifdefs for OpenVINO 2020.4

* temp

* made some changes to not operator

* Added unused parameter

* relu enabled

* Fixed bug in Conv output

* Consolidated GPU failing tests into one category

* Made it compatible to InternalCI 2020.4

* Made changes for ngraph

* Disabled test for mask,fastercnn,tinyyolov3

* Removed proxy for ci

* run_dockerbuild.sh restored to same version

* run_dockerbuild.sh restored to same version

* run_dockerbuild.sh restored to same version

* Updated documentation for 2020.4

* Removed FP32 to FP16 transformation for GPU

* Disabled Coreml-FNS-Candy model test

* Added FP16 transformations

Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: intel <you@example.com>
Co-authored-by: gundaarx <aravindx.gunda@intel.com>
2020-08-19 23:18:08 -07:00
Hariharan Seshadri
c878ecbbe0
Sahar/csharp support openvino (refined) (#4835)
* Sahar/csharp support openvino (#4703)

* Temp changes and include openvino to ensure nuget package is created with linux till we configure azure ci pipeline

* string id change

* native nuget indentation changes

* documentation changes

* Update Openvino_execution_provider.md

Documentation includes openvino execution provider

* Update OpenVino-ExecutionProvider.md

update details to build csharp api for openvino execution provider .

* vadm backend revert

* Update Openvino-Execution-Provider.md

updated for review comments

* Update OpenVino-Execution-Provider.md

* Update OpenVINO-ExecutionProvider.md

* nuget package custom support for openvino
change in native nuget spec python script for including linux runtime

* change to make path to boolean flag

* removed the tab

* Update OpenVINO-ExecutionProvider.md

updated for review comments

* changes to include pep8 warnings
modification to documentation

Co-authored-by: saharfraza <sfatima.3001@gmail.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>

* Changes to include csharp support for openvino

* Fix flake error

* Fix

Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: saharfraza <sfatima.3001@gmail.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
2020-08-17 21:52:17 -07:00
George Wu
94a6f50af6
Revert "Sahar/csharp support openvino (#4703)"
This reverts commit 0a0ac70eec.
2020-08-17 10:05:21 -07:00
sfatimar
0a0ac70eec
Sahar/csharp support openvino (#4703)
* Temp changes and include openvino to ensure nuget package is created with linux till we configure azure ci pipeline

* string id change

* native nuget indentation changes

* documentation changes

* Update Openvino_execution_provider.md

Documentation includes openvino execution provider

* Update OpenVino-ExecutionProvider.md

update details to build csharp api for openvino execution provider .

* vadm backend revert

* Update Openvino-Execution-Provider.md

updated for review comments

* Update OpenVino-Execution-Provider.md

* Update OpenVINO-ExecutionProvider.md

* nuget package custom support for openvino
change in native nuget spec python script for including linux runtime

* change to make path to boolean flag

* removed the tab

* Update OpenVINO-ExecutionProvider.md

updated for review comments

* changes to include pep8 warnings
modification to documentation

Co-authored-by: saharfraza <sfatima.3001@gmail.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
2020-08-16 17:07:26 -07:00
stevenlix
77c69a0325
Upgrade TensorRT to v7.1.3.4 (#4704)
* upgrade to TensorRT 7.1.3.4

* Upgrade onnx-tensorrt parser for TensorRT 7.1.3.4

* fix format issue

* fix format issue

* fix format issue

* Update tensorrt_execution_provider.cc

* change cmake version to 3.14

* Remove --msvc_toolset 14.16

* change to onnxruntime::make_unique

* use onnxruntime::make_unique

* disable some tests for TensorRT

* disable some tests for TensorRT

* Update upsample_op_test.cc

* Update tile_op_test.cc

* disable some tests for TensorRT

* Update constant_of_shape_test.cc

* update parser

* Update Dockerfile.ubuntu_tensorrt
2020-08-07 17:43:56 -07:00
gwang-msft
c2ec3b734b
[Android NNAPI EP] Remove dependency on external JD/DNNLibrary (#4576)
* remove dependency of external jd-dnnlibrary

* remove extra variables not used any more

* update /cgmanifest.json
2020-07-22 14:08:12 -07:00
stevenlix
0ebe2fab51
Refactor TensorRT EP code to better handle dynamic shape subgraphs (#4504)
* build engine in runtime for dynamic shape subgraphs

* Update TensorRT-ExecutionProvider.md

* Update TensorRT-ExecutionProvider.md

* fix build issue

* Add more instructions on how to use engine caching

* add precision to trt node name

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc
2020-07-15 02:35:42 -07:00
S. Manohar Karlapalem
ceedf126a2
[nGraph] Deprecation notice for nGraph EP (#4344)
2020-06-26 01:15:34 -07:00
Shucai Xiao
bfc888613f
Migraphx improvements (#4328)
* Add amd migraphx execution provider to onnx runtime

* rename MiGraphX to MIGraphX

* add migraphx EP to tests

* support multiple program output

* disable more tests

* backup changes related to program multiple outputs

* remove logging code

* remove unnecessary changes in migraphx_execution_provider.cc

* add migraphx EP to tests

* add input requests of the batchnorm operator

* add to support an onnx operator PRelu

* update migraphx dockerfile and removed one unused line

* changes related to support dynamic input shape

* fix build error

* code backup

* code backup

* version that has 106 models run correctly

* code backup

* code backup

* remove unnecessary print info

* code backup

* code backup

* code backup

* code backup

* code backup

* code backup

* changes corresponding to migraphx change

* fix merge conflict

* minor code cleanup

* code cleanup

* remove unnecessary code

* remove unnecessary code

* add to support more constant folding analysis

* more constant folding checking for shape input

* add env var to control whether fp16 is enabled. Modify docker file to use ROCM3.3

* fix function name to avoid build error

* add build and execution instruction for migraphx execution provider

* added more build instructions

* fixed a small format error

* a minor change

* fix review comments

* another minor change

* additional refinement of the documents

* additional changes

* remove unnecessary changes in the dockerfile

* additional changes for the dockerfile

* code change backup

* fix errors related to a few unit tests

* fix a build error related to api change

* fix unit test errors by either disabling the test or fixing related issues

* remove unnecessary log info

* sync submodule tvm with master

* remove unnecessary changes

* remove an unnecessary code line

* refine documents for addition example
2020-06-25 19:22:57 -07:00
jornt-xilinx
c55f6d76be
[Vitis-AI EP] Fix to enable multi-output subgraphs inside Vitis-AI EP + edit docs (#4171)
2020-06-13 04:56:07 -07:00
Andrews548
62b44527e5
Add ArmNN Execution Provider (#3714)
* Add ArmNN Execution Provider

Add a new execution provider targeting Arm architecture based on ArmNN.
Validated on NXP i.MX8QM CPU with ResNet50, MobileNetv2 and VGG models.

reviewed-by: mike.caraman@nxp.com

* Minor fixes

- renamed onnxruntime_ARMNN_RELU_USECPU to onnxruntime_ARMNN_RELU_USE_CPU
- fixed acl typo

* remove extra includes. added exception for ArmNN in test

* fix indentation

* Separated the activation implementation from the cpu and fixed the blockage from the endif

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-06-03 22:57:51 +05:30
Dwayne Robinson
51d78bc5e6
Fix DML EP doc link to C API (#4105)
Path used "\" instead of "/".
2020-06-01 16:49:17 -07:00
edelaye
64b5f7edf6
Initial release of Vitis-AI Execution Provider (#3771)
* Initial release of Vitis-AI Execution Provider

* Add documentation, fix for onnxruntime::Model changes and use stringstream instead of file dump for model passing

* - Add Vitis-AI docker file
- Add online quantization flow Vitis-AI execution provider
- Fix remarks

* - Add fatal error build message for Vitis-AI cmake build on Windows
- Fix pep8 issue in build.py
- Add Vitis-AI execution provider example in docs

Co-authored-by: Elliott Delaye <elliott@xilinx.com>
Co-authored-by: Jorn Tuyls <jornt@xilinx.com>
Co-authored-by: Jorn Tuyls <jtuyls@users.noreply.github.com>
2020-05-19 05:32:32 -07:00
Faith Xu
b8a255e1b5
Doc Updates for Build (#3976)
* Initial update of readme

* Readme updates

* Review of consolidated README (#3930)

* Proposed updates for readme (#3953)

I found some of the information was duplicated within the doc, so attempted to streamline

* Fix links

* More updates

- fix build instructions
- nodejs doc reorganization
- roadmap update
- version fixes

* Update ORT Server build instructions

* More doc cleanup

* fix python dev notes name

* Update nodejs and some links

* sync eigen version back to master

* Minor fixes

* add nodejs to sample table of contents

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* address PR feedback

* address PR feedback

* nodejs build instruction

* Update Java instructions to include gradle

* Roadmap refresh

Reformat some data, fix link, minor rewording

* Clarify Visual C++ runtime req

Co-authored-by: Nat Kershaw (MSFT) <nakersha@microsoft.com>
Co-authored-by: Prasanth Pulavarthi <prasantp@microsoft.com>
Co-authored-by: manashgoswami <magoswam@microsoft.com>
2020-05-18 20:08:36 -07:00
Jeff Bloomfield
e6da5946d1
Update DML Nuget version and DML EP Doc (#3945)
Update DML Nuget version and DML EP Doc
2020-05-14 17:33:46 -07:00
airockchip
edaf8a542c
Initial PR for RKNPU execution provider (#3609)
* Initial RKNPU execution provider

    * Init

    * Support Ops:
        Conv, Relu, Clip, LeakyRelu,
        MaxPool, AveragePool, GlobalAveragePool,
        Concat, Softmax, BatchNormalization, Gemm,
        Add, Mul, Sub,
        Reshape, Squeeze, Unsqueeze,
        Flatten, Transpose,
        QLinearConv, DequantizeLinear

    * Add rknpu unittest

    * Update BUILD.md and Add RKNPU-ExecutionProvider.md

* misc code update

* fix CLIP accuracy issue.

* fix "Error: Duplicate definition of name".

* move rknpu_ddk out of onnxruntime submodule.

* remove temporary code.

* add rknpu namespace.

* update misc of node_attr_helper

* add const & comment for onnx_converter

* add const & comment for shaper

* unify variable name

Co-authored-by: dkm <dkm@rock-chips.com>
Co-authored-by: George Wu <jywu@microsoft.com>
2020-05-05 20:36:47 -07:00
Jeff Bloomfield
d5b2cd7493
Add performance best practices to DML EP doc (#2859)
* Add performance best practices to DML EP doc


Co-authored-by: Jeff <38966965+jeffbloo@users.noreply.github.com>
2020-05-02 09:53:33 -07:00
suryasidd
e529464a12
Limit the number of models run on OpenVINO (#3742)
* Removed NMS from supported list
2020-04-29 02:23:09 -07:00
S. Manohar Karlapalem
6d4f2f5bf9
OpenVINO EP v2.0 (#3585)
* Added FP16 transformations

* Revert "Added CMAKE_BUILD_TYPE to make building dynamic"

This reverts commit d3e17af1af655cfdc4d2fec33f52055caa525e85.

* Added FP16 transformations for FP16 builds

* Backend logic cleanup

Cleans the backend(intel_graph.*) code in the following ways:-

1. Minimize global usage: Since all the IR graphs need to be
re-generated on every Infer, it is bad practice to rely on globals
for their saving and usage as there would be multiple readers and
writers to the same global variable leading to incorrect usages or
contentions. This change replaces globals with locals where possible.
This change also fixes an existing bug due to
incorrect global usage.

2. Remove all unused functions.

3. Remove all unused headers and preprocessor directives.

* removed commented out code

* Disabled default optimization for Intel EP

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fix missed plugins.xml for python bindings

* Fixed the build after latest master changes

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled unsupported ops for accelerators

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added some more disabled ops

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added environment variable to enable debugging

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added more debug statements

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed unsupported ops list for GPU and VPU

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Fixed unsqueeze unit tests

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Added error message to the status

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Overwrite Model proto with shape info from data

Overwrites the shape info of Model proto with the shape from
actual input data. Needed for inferring models with Dynamic
shapes.

* Removed print statement and disabled where op

Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>

* Disabled Reshape with Empty initializer

* Added more debug statements for 1P

* Don't allow 1D inputs with symbol for dimension

* Disabled some 3rd phase ops

* Disabled split and added zero dimension check for OutputDefs

* Cleanup zero dimensionality check

* Added different data type check for inputs and initializers

* Added conditions for Mod, Cast and Pad

* Removed unused variable

* Disabled scan and added conditions for squeeze

* Added changes for fixing all C++ unit tests

* Implements Backend Manager class for caching

Backend Manager provides a layer of indirection between EP interface
and OV backend that provides caching services for models with
symbolic dims in input shapes.

* clean up commented blocks

* clang-formatting

* Read I/O type info from ModelProto

Read the tensor element type information from ModelProto object,
as FusedNode is no longer available.

* code cleanup

* clang-formatting

* Added print statement for jenkins

* Disabled some python tests

* Changed the path of convert fp32 to fp16 hpp

* Added conditions for BatchNorm in GetCapability

* Fixed failed tests

* Revert "Added conditions for BatchNorm in GetCapability"

This reverts commit c3c28c3b00d27892c42546b35dacdd807a48ee90.

* Added Intel to onnxruntime backends

* pick up vars set by OV package setupvars.sh

* Added conditions for Identity

* remove a few cout prints

* Added conditions for GPU_FP32 unit tests

* Revert "pick up vars set by OV package setupvars.sh"

This reverts commit 8199e029c03eae21a1a7ef6bfdc93d00e5d0198b.

* Commented out fatal message for protobuf

* Might need to be removed

* Add interface class for current backend

* moved common logic to base class

* simplified cpu backend

* Removed unused headers

* use vectors to save i/o tensors for windows compatibility

* move utils fxns to backend_utils namespace

* rename ov_backend to ibackend

* Factory pattern for backend creation

* rename CPU backend to Basic backend

* renamed to vad-M and added to factory list

* Added conditions for VPU

* Added print statements

* Changed the logic for checking for symbolic shapes

* Modified logic for zero dimension check

* Removed VPU single dimension condition

* Removed comments

* Modified logic in DimensionCheck method

* Remove legacy OpenVINO EP

Remove all the legacy code for OpenVINO EP. UEP code will take its
place going forward.

This change does NOT remove OVEP files in the following areas as
they will be reused by UEP:-
1. Documentation: All .md files
2. Docker related files
3. Python bindings
4. Java bindings
5. C# bindings
6. ORT Server
7. CI pipeline setup files

* Rename Intel EP to OpenVINO EP

* Added unique names to the subgraphs

* Removed subgraphs with only constant inputs

* Modified subgraph partitioning algorithm to remove const input subgraphs

* Apply suggestion to onnxruntime/core/providers/openvino/openvino_execution_provider.cc

* Tracking output names to fix the output order bug

* Changed output names to a unordered map

* Modified logic to check for symbolic input shapes

* Fixed a bug in Reshape check

* Added empty model path to Model constructor

* Made necessary changes to cmake to build from the binary package

* Changed INTEL_CVSDK_DIR to INTEL_OPENVINO_DIR

* Enable dyn device selection with C++ API

* Added Round operator to unsupported list

* Modified subgraph partition logic for MYRIAD

* Removed supported ops from the list

* Enable dyn dev selection in Py API's

* Add documentation for dynamic device selection

* Use MYRIAD || HDDL instead of VPU

* Removed temporary cast of Int64 to FP32

* Disabled unit Tests for CPU_FP32 and GPU_FP32

* Removed default "CPU" from unit tests to allow overriding

* Removed ops Concat, Squeeze, Unsqueeze from unsupported list

* Get the device id from info

* Removed overwriting device_id and precision

* Enabled ConvTranspose and EyeLike

* Reordered unsupported ops in alphabetical order

* Fixed syntax error

* Fixed syntax error

* Code clean-up: Handle exceptions, logs and formatting

Code formatted according to ORT coding guidelines.

* remove debug print from pybind code

* updated docs with ops and models

* formatting prints

* Added default values for c and j for openvino

* Overriding the values set for c and j to be 1
* BACKEND_OPENVINO should be empty if openvino is not in build

* Overriding c value with default for perftest

* fix VAD-M device string bug

* Add IE error details to exceptions

* Use IE specific device names in EP

* Add VAD-F (FPGA) device support

* Removed unnecessary libraries from whl package

* Code changes for Windows compatibility

* Add VAD-F option to python API

* [revert before merge] cmake changes for RC

* Enable Windows build in CMake

* Unset macro OPTIONAL for windows builds

inference_engine.hpp's include chain defines a macro 'OPTIONAL'
which conflicts with onnx project's headers when using MSVC. So
would need to explicitly unset it for MSVC.

* Use a single copy of plugin/IE::Core

Defined as a static member in Backend manager

* Remove restriction of single subgraphs for  myriad

* Passed subgraph name to Backend to enhance log statements

* Disabled zero dimension conditions

* Disabled concat to remove zero dims

* Enabled building ngraph as part of ORT

* Removed serializing and added versioning

* Fix CPU_FP32 unit tests

* Removed unnecessary condition

* add ngraph.so.0.0 to .whl

* Check for zero dimensions only for inputs and outputs

* Restrict loading only 10 subgraphs on myriad

* Build ngraph.dll within UEP. Doesn't link yet

* Rename Linux included libngraph.so to libovep_ngraph.so

Renames locally built libngraph.so containing ONNX importer to
libovep_ngraph.so in order to avoid linkage conflicts with
libngraph.so supplied by OpenVINO binary installer.
Applies only for Linux builds.

* use output_name cmake properties for lib name

* fix .so name format in lib_name.patch

* CMake code cleanup

* Rename WIN32 included ngraph.dll to ovep_ngraph.dll

To avoid conflict with ngraph.dll distributed by openvino.

* Added myriad config for networks without 4 dimensions

* Loading the 10 max clusters for inference on myriad

* Refactor code and add Batching support

Encapsulate subgraph settings into context structs.

Add batching support for completely supported models.

* Disabled some broken tests

* use input_indexes to avoid batch-checking initializers

* Avoid static initialization order error on WOS

* Added candy to broken tests

* InternalCI changes for 2020.2

* Updated DLDT instructions

* Unsaved changed in install_openvino.sh

* Changes after manual check

* Remove custom ngraph onnx_import build for WOS

ONNX Importer on WOS does not have protobuf issue.

* Remove FP32ToFP16 ngraph pass

This conversion is performed implicitly within IE.

* Surround debug logic by #ifndef NDEBUG

* remove invalid TODO comments

* removed references to ngraph-ep

* clang-formatting

* remove commented code

* comment edits

* updating copyright year to that of first OpenVINO-EP release

* remove redundant log msg

* Modified operator and topology support

* Update build instructions

* doc formatting

* Fixed clip unit tests

* Revert "Remove FP32ToFP16 ngraph pass"

This reverts commit ec962ca5f315a5658ad980e740196f19de2639c1.

* Applying FP16 transformation only for GPU FP16

* Fixed GPU FP32 python tests

* automatically use full protobuf

* disable onnxrt server for now

* Disabled upsample

* update dockerfile instructions

* Removed MO paths and added ngraph path

* Remove OVEP from ORT Server docs

Will put it back in after validation

* Updated path to Ngraph lib

* Disabled Resize and some other python tests

* Removed unnecessary header files

* Use commit SHA to fetch ngraph repo

* Avoid un-needed file changes due to version update

* Fixed clip tests

* Fixed Pow, max and min onnx tests

* build.md doc typo

* Update cmake patch command for ngraph src

* remove dead cmake code for onnxruntime_USE_OPENVINO_BINARY

* use spaces instead of tab

* remove commented code

* Add info about protobuf version

* edit debug env var and enable for WIN32

* specify only version tag of 2020.2 for dockerbuilds

* remove unnecessary file changes

* Pass empty string as default argument to C# tests

* Use ${OPENVINO_VERSION} to name openvino install directory in CI builds

* Enabled unnecessarily disabled tests

* Fixed ngraph protobuf patch

* Fixed error in protobuf patch

* Revert "Use ${OPENVINO_VERSION} to name openvino install directory in CI builds"

This reverts commit 89e72adb8bf3b9712f5c81c5e13fe68c6c0df002.

* Remove unsetting OPTIONAL macro

This is no longer used in recent ONNX update onnx/onnx@da13be2,
so this unset workaround is no longer necessary.

* Use a null string  default argument for C# API

* Set OpenVINO version yml files and pass to CI Docker builds

Git Tag info for DLDT as well as install directory are set
using this value.

This reverts commit 9fa9c20348ed72ae360a95c98e9b074d2f9fafc5.

* Documentation: recommendation and instructions for disabling ORT graph optimizations

* more doc updates

* Reduced the number of models according to CI time constraints

Co-authored-by: ynimmaga <yamini.nimmagadda@intel.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: Mikhail Treskin <mikhail.treskin@intel.com>
Co-authored-by: mbencer <mateusz.bencer@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
2020-04-24 04:06:02 -07:00
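The Backend Manager item in the commit above describes a caching layer for models whose input shapes contain symbolic dims. One way to picture such a layer is a cache keyed by the concrete input shapes seen at run time, so that a backend is built once per distinct shape combination. The sketch below is an assumption about the general idea, not the OpenVINO EP's implementation; `BackendCache` and `build_backend` are hypothetical names.

```python
class BackendCache:
    """Cache one backend object per concrete combination of input shapes."""

    def __init__(self, build_backend):
        # build_backend: callable taking {input_name: shape_tuple} and
        # returning a backend object (hypothetical stand-in for compilation).
        self._build_backend = build_backend
        self._cache = {}
        self.builds = 0  # number of cache misses, kept for inspection

    def get(self, input_shapes):
        # Normalise the shapes into a hashable, order-independent key.
        key = tuple(sorted((name, tuple(shape))
                           for name, shape in input_shapes.items()))
        if key not in self._cache:
            self._cache[key] = self._build_backend(dict(key))
            self.builds += 1
        return self._cache[key]
```

With this structure, repeated inference calls that use the same concrete shapes reuse the cached backend, and only a previously unseen shape combination triggers a new build.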
stevenlix
2332a93db0
Update onnx-tensorrt parser (#3369)
* sync onnx-tensorrt parser and update TensorRT doc

* remove --msvc_toolset 14.16 in tensorrt ci pipeline
2020-03-30 20:31:59 -07:00
Pranav Sharma
435f014d71
Add support for sessions to share a global threadpool. (#3177)
* Add support for sessions to share a global threadpool.

* Fix build issues

* Add tests, fix build issues.

* Added some documentation

* Fix centos issue when threadpools become nullptr due to 1 core.

* Fix mac and x86 build issues

* Address some PR comments

* Disabled test for android, added few more tests and addressed more PR comments.

* const_cast
2020-03-18 15:42:46 -07:00
stevenlix
f4a5d17294
Upgrade to CUDA10.2 for TensorRT (#3084)
* Switch to CUDA10.2

* Update win-gpu-tensorrt-ci-pipeline.yml

* Update win-gpu-tensorrt-ci-pipeline.yml

* remove dynamic_shape

* update onnx-tensorrt submodule

* check if input shape is specified for TensorRT subgraph input and enable some TensorRT unit tests

* fix format issue

* add shape inference instruction for TensorRT

* update according to the reviews

* Update win-gpu-tensorrt-ci-pipeline.yml
2020-02-25 05:36:01 -08:00
stevenlix
da653ccdac
Upgrade TensorRT to version 7.0.0.11 (#2973)
* update onnx-tensorrt submodule to trt7 branch

* add fp16 option for TRT7

* switch to master branch of onnx tensorrt

* update submodule

* update to TensorRT7.0.0.11

* update to onnx-tensorrt for TensorRT7.0

* switch to private branch due to issues in master branch

* remove trt_onnxify

* disable warnings c4804 for TensorRT parser

* disable warnings c4702 for TensorRT parser

* add back sanity check of shape tensor input in the parser

* disable some warnings for TensorRT7

* change fp16 threshold for TensorRT

* update onnx-tensorrt parser

* fix cycle issue in faster-rcnn and add cycle detection in GetCapability

* Update TensorRT container to v20.01

* Update TensorRT image name

* Update linux-multi-gpu-tensorrt-ci-pipeline.yml

* Update linux-gpu-tensorrt-ci-pipeline.yml

* disable rnn tests for TensorRT

* disable rnn tests for TensorRT

* disabled some unit tests for TensorRT

* update onnx-tensorrt submodule

* update build scripts for TensorRT

* formatting the code

* Update TensorRT-ExecutionProvider.md

* Update BUILD.md

* Update tensorrt_execution_provider.h

* Update tensorrt_execution_provider.cc

* Update win-gpu-tensorrt-ci-pipeline.yml

* use GetEnvironmentVar function to get env variables and switch to Win-GPU-2019 agent pool for win CI build

* change tensorrt path

* change tensorrt path

* fix win ci build issue

* update code based on the reviews

* fix build issue

* roll back to cuda10.0

* add RemoveCycleTest for TensorRT

* fix windows ci build issues

* fix ci build issues

* fix file permission

* fix out of range issue for max_workspace_size_env
2020-02-12 07:03:58 -08:00
Changyoung Koh
7666d130e5
Rename MKL-DNN to DNNL to fix broken link (#2730)
2020-01-06 08:50:42 -10:00
Faith Xu
bb7f43ee91
Documentation update: build instructions (#2636)
* Spacing fix for code block

* Update instructions

Include java, acl, and nn api instructions on build page

* Update build instructions to link to build.md

* typo

* Update build instructions to link to build.md

* Include other minor build.md page updates

* Update CUDA version

* Fix dockerfile links
2019-12-19 13:40:34 -08:00
daquexian
62de8fa841
Update docs for Android NNAPI EP (#2586)
2019-12-09 14:37:03 -08:00
stevenlix
293b15480b
Add dynamic shape support in TensorRT execution provider (#2450)
* remove onnx-tensorrt submodule

* add new onnx-tensorrt submodule (experiment) for trt6

* update engine build for trt6

* update compile and compute for tensorrt6.0

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* switch to onnx-tensorrt master for TensorRT6

* Update tensorrt_execution_provider.cc

* Handle dynamic batch size and add memcpy in TensorRT EP

* update test cases

* Update tensorrt_execution_provider.cc

* update onnx-tensorrt submodule

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.ubuntu_tensorrt

* Update run_dockerbuild.sh

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update concat_op_test.cc

* Update tensorrt_execution_provider.cc

* Upgrade TensorRT to version 6.0.1.5

* Update onnxruntime_providers.cmake

* Update CMakeLists.txt

* Update reduction_ops_test.cc

* Update install_ubuntu.sh

* Update Dockerfile.ubuntu_tensorrt

* Update Dockerfile.tensorrt

* Update BUILD.md

* Update run_dockerbuild.sh

* Update install_ubuntu.sh

* Update onnxruntime_providers.cmake

* Update install_ubuntu.sh

* Update install_ubuntu.sh

* Update gemm_test.cc

* Update gather_op_test.cc

* Update CMakeLists.txt

* Removed submodule

* update onnx-tensorrt submodule

* update header file

* Removed submodule

* add submodule onnx-tensorrt kevin's branch shape-test

* add debugging code

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* merge master

* Removed submodule

* update onnx-tensorrt submodule

* add more changes for dynamic shapes

* Update tensorrt_execution_provider.cc

* update for dynamic shape

* update dynamic shape processing

* fix logger issue

* remove submodule onnx-tensorrt

* add submodule onnx-tensorrt

* add env variable min_subgraph_size

* remove redundancy

* update document

* use onnxruntime::make_unique

* fix multi-run issue

* remove some tests to save CI build time

* Add dynamic shape test

* Update TensorRT-ExecutionProvider.md

* Add example of running Faster R-CNN model on TensorRT EP

* Add more details on env variables

* update environment variables

* Update tensorrt_basic_test.cc

* Update model tests

* Update tensor_op_test.cc

* remove --use_full_protobuf

* Update build.py
2019-12-03 23:18:33 -08:00
Sreekanth Yalachigere
31ea11a696
Renaming MKL-DNN as DNNL (#2515)
* DNNL: Moving Files to rename file names

* DNNL name change

* azure pipeline updated

* disable ceil/dilation and enable Opset10

* disable ceil/dilation tests in Python

* mlperf_ssd_resnet34_1200 disabled
2019-12-03 07:34:23 -08:00
KeDengMS
60208463a9
[NupharEP] Enable parallel schedule (#2505)
* [NupharEP] Enable parallel schedule
* Update TVM with the fix to TVM threadpool to use OpenMP if possible
* Add parallel schedule when trying to vectorize
With this change, BERT squad perf on a 4-core (8 HT) CPU goes from 187ms to 150ms

* Address CR, docs and cmake update

* Doc fix

* Fix mkl

* Fix TVM windows build when using mklml
2019-11-28 08:35:56 -08:00
Patrick Foley
151075790d
[OpenVINO-EP] Update to latest version: OpenVINO 2019 R3.1 (#2308)
* Updates OpenVINO EP to latest version: 2019 R3.1

* Reviews fixed

* Update Dockerfile.openvino

* Addressed PR comments and disabled model tests temporarily

* Update Dockerfile.ubuntu_openvino
2019-11-05 19:55:46 -08:00