* add description of build ORT+TVM EP on Windows
* fix cmake error related to symlink creation on Windows
* add llvm config path to build flags for correct build on Windows
* update TVM_EP.md for llvm_config build arg
* fix warnings skipping during build on Windows
* fix using string or wstring for model path to correct build on Windows (MSVC error)
* fix error in custom logger for correct build on Windows
* implement glob algorithm for Windows
* additional build fixes
* update TVM with export of VM symbols for dll
* description of nasm issue and workaround
* update TVM with export of Executable from VM symbols for dll
* description of installation of ipp-crypto dependencies on Windows
* cmake key for ipp-crypto build
* fix wstring for TVMso EP
* fix ipp-crypto build
* cmake key onnxruntime_TVM_USE_HASH switch off not specific methods, but full hash functionality
* fix absolute path to compiled lib
* update TVM_EP.md, fix lint warnings
* update TVM_EP.md
* small fixes after review
* switch on handshake functionality for Linux workflow
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* infrastructure for handshake mechanism was implemented. sha256 was selected as first hash algorithm
* check hash during compile in TVMso EP
* add IPP-CRYPTO to external dependencies for TVM EP
* made checkHash method constant
* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict
* implemented SHA-256 calculation using ipp-crypto library
* fix dependency for ipp-crypto
* add provider options for hash check
* update documentation for added provider options
* add hash check condition
* fix docs
* fix lint
* fix ORT_THROW
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* Register signal ops for op set 17
Note code is mostly being moved, not added. These ops were previously
only registered as Microsoft contrib ops and only built if
`BUILD_MS_EXPERIMENTAL_OPS=1`. They've been added to the ai.onnx
standard op set in version 17.
Main components of this change:
* Move the kernels from the conrib_ops directory to the
core directory.
* Add function bodies for ms experimental ops. This will allow
old models that use the contrib ops to continue to function.
All the function bodies consist of a single op (the
new standard op), so performance overhead should be minimal.
Minor clean-up also in this change:
* De-duplicate get_scalar_value_from_tensor: put it in a new utils.h.
* Fix some bugs that caused compilation errors with the experimental
ops. Tested with `build.sh --ms_experimental`
* Fix some spelling errors and lint violations.
* Replace a couple of switch statements with `MLTypeCallDispatcher`.
* Use `InlineVector` instead of `std::vector`.
Unblocks https://github.com/microsoft/onnxruntime/issues/11640
Prior to this every test shared the same tolerances. This meant
that if an ONNX test failed due to a small but acceptable difference in
output, the only alternative was to disable the test entirely.
In op set 17, the DFT operator is being added. Without this change, the
tests for that operator fail because the output is off by about 5e-5.
It's better to keep test coverage for this new op rather than disable
the test entirely.
Also prior to this change, the global tolerances were not shared between
C++, JavaScript, and Python tests. Now they are.
Also fix various minor issues raised by linters.
Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
(1) Support T5 in BeamSearch operator, and add both CPU and CUDA implementation.
(2) Change BeamSearch op: rename encoder_decoder_init attribute to encoder, and add decoder_start_token_id attribute
(3) Update convert_to_onnx for T5 to use int32 instead of int64 inputs as default.
(4) Add more tests in best_beam_search.py
(5) fix ORT_ENFORCE of hypothesis_buffer_offset_
(6) Improve ONNX conversion:
(a) Change encoder some dynamic axes to fixed dim value
(b) add --separate_encoder_and_decoder_init
(c) correct name t5-3B => t5-3b, t5-11B => t5-11b
(d) Add --use_int32_inputs in convert t5 to onnx
(e) Allow t5 beam search conversion in one step
* update description of TVM EP options in docs
* update sample notebook
* update TVM EP documentation
* add link to description of options
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* Initiate Ort SNPE EP
* fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master
* correct the source path for libonnxruntime.so while building for andorid package
* add AdditionalDependencies for amr64
* On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given.
* fix build failure if snpe is not enabled
* update doc for contrib op
* separate out snpe ep settings to onnxruntime_snpe_provider.cmake
* renaming according review comments
* update according review comments
* Implement BitmaskDropout and associated unit tests.
* Implement BitmaskDropoutGrad and associated unit tests.
* Implement Dropout -> BitmaskDropout rewrite rule and associated unit tests.
* Implement (Dropout,DropoutGrad) -> (BitmaskDropout,BitmaskDropoutGrad) rewrite rule.
This commit does not yet include unit tests for this rewrite rule.
This commit also introduces improved documentation for all changes which will be grouped
into this PR.
* bitmask dropout
* fix win build
* bugfix for rocm
* bugfix
* fix code format
* fix ut
* fix build break
* fix ut in win
* resolve comments
* fix ut in trt
* resolve comments
* fix rocm build error
* fix typo
Co-authored-by: Aidan Beggs <aidanbeggs@microsoft.com>
Description: Format all python files under onnxruntime with black and isort.
After checking in, we can use .git-blame-ignore-revs to ignore the formatting PR in git blame.
#11315, #11316
* Implement TreeEnsemble for opset(ai.onnx.ml)==3
* use of InlineVector
* refactoring
* improve attributes retrieval
* avoid creating a temporary buffer
* modifies onnx.ml.cpu.json
* use unordered_map
* update docs/OperatorKernels.md
* address PR comments (TH -> ThresholdType, ORT_RETURN...)
* add a python unit test to load a TreeEnsembleRegressor following ai.onnx.ml==3 specifications
* improve NonZero
* fix megatron_fp16 optimzier, fix the doc
* multi_tensor_applier
* resolve comment
* fix building warning
* fix build error when enabling training and use tensorrt
* change BeamSearch op to support encoder decoder model
* check model_type and decoder attribute
* fix
* update comments
* warn shape inference issue with onnx v1.11 or T5
* skip parity test when tempature != 1.0
* fix build
Work on minimizing memory management calls by
reducing number of allocations and copies.
Replace std::unordered_set to InlinedHashSet
and add usage of InlinedVector.
Employ std::move() to minimize copying and memory allocations.
Remove copying of the const shared data into each of the
PropagateCast transformer instances.
Move inlined_containers.h header to include/common
Adjust AsSpan imlementation for C++ < 17
* add support for bool type
* add TVM EP support for tests
* include TVM EP in python test pool
* fix pylint
* moved technical imports to a separate file
* clean up post build actions & move _ld_preload.py extension to CMake level
* add files for include TVM EP into CI
* implement custom logger for TVM
* replace TVM logging with ONNX RT logging
* update link for TVM EP tutorial
* clean up TVM EP cmake
* add pybind auto enabling for TVM EP
* fix blank spaces
* code review fixes
* replace print with comment
* add list of EP without TVM EP
* enable onnx tests
* disable contrib ops and ml ops
* reuse Dockerfile.ubuntu
* Move install_tvm_test_dependencies.sh out of Docker context dir, update build definition.
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Fix incorrect type constraint registration for RoiAlign. This led to the input type not actually being checked when matching a kernel as the invalid constraint name is treated as a missing optional input.
* fix missing dependency for the unit test exe. Whilst it doesn't link against the CUDA providers lib, without the dependency VS doesn't know it needs to rebuild the library if there are changes.
* Add check for invalid type constraints.
* Fix invalid registrations for other kernels.
* Add hash replacement logic to provide backwards compatibility in ORT format models when the registration is fixed.
* Add tests
Add abseil cgmanifest declaration. Update coding standards for InlinedContainers
Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use.
Rename T from InlinedShapeVectorT. Fix Eager build
Add LLVM Copyright with modified derived code notice.
In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.
* squashed commit for standalone tvm execution provider
* critical fix for correct python build with stvm ep
* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG
* updates and fixes
* update parsing of stvm provider options
* add support of external data for onnx model
* add conditional dump of subgraphs
* remove unused code
* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API
* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)
* add fp16
* add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options
* fix license text in header. fix log format
* small fixes
* fix issues from flake8
* remove model proto construction from GetCapability
* reserve memory for vector of DLTensors
* add simple tutorial for STVM EP
* STVM docs
* jroesch/tvm -> apache/tvm
* remove dead code, unneccessary logs and comments
* fix in readme
* improve tutorial notebook
* tvm update
* update STVM_EP.md
* fix default value
* update STVM_EP.md
* some TODOs for the future development
* shorten long lines
* add hyperlink to STVM_EP.md
* fix Linux CI error
* fix error in csharp test
Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* Changes to fuse embed layer for gpt2, kernal changes pending
* verified add output and regular add match
* Test added for additional output embedlayernorm, working on CUDA
* Test passing on CPU
* updated convert_to_onnx toll to check parity correctly
* removed some debugs
* couple of TODO left as in optimizer.py
* removed changes to optimizer.py
* fixing build
* fixing build
* updated order of initilization
* added a test case for float16
* updating the docs
* updating tests failing due to embed layer fusion
* update unit tests
* updating CUDA documentation in operatorkernels.md
* addressing comments
* OperatorKernels.md updated with CUDA
* adding TODO to qembed_layer
* minor edit
* updated docs
* addressing comments
* adding position ids to embed layer gpt2
* updating fused gpt2 model
* added extra test
* remove comments
* addressing comments
* contrib_defs.cc updated
* all tests passing
* fixing a typo
* minor edit
* trigger build
* qembedlayernorm checkinputs updated
* fixing build error
* fixing build error
* fixing build error
```
Component for aggressive decoding. Find the bifurcation index of predicted tokens, between source tokens,
starting from previous suffix match index, and predicted tokens.
Concat predicted tokens, starting from bifurcation index, to the back
of current tokens. This forms the output tokens.
Detect suffix match index in source tokens, between source tokens and output tokens.
Detection is based on finding the appearances of last n-gram in output tokens
in source tokens.
A match is considered found if source tokens contain a single matching n-gram.
Return the index of the start of the n-gram in source tokens.
No matching if found if src tokens contain multiple or zero matching n-grams. Return -1.
```
* bias dropout improvement
* add transform case for same shape case
* combine kernel
* merge with vectorized kernel
* use "has_same_shape_bias"
* minor: a "N % 4 != 0" case
* add op UT for has_same_shape_bias
* address comments; add param case for 1d bias;
add param case tests for 1d and same-shape bias
* rewrite logic condition
Co-authored-by: Peng Wang <pengwa@microsoft.com>
* Enable selecting custom ops in onnxruntime-extensions.
* Move cmake_helper.py.
* Remove over-indented spaces.
* Add doc.
* Remove onnxruntime-extensions from git submodules, and user should pass path of onnxruntime-extensions for build.
* Modify doc.
* Remove argument --enable_onnxruntime_extensions and use --onnxruntime_extensions_path.
* Fix build error.
* Fix build error.
* Use onnxruntime_extensions_path.
* support both submodule and external source folders
* refinement
* Update cgmanifest.json
* Support building onnxruntime-extensions from either git submodule or pre-pulled path.
* Update doc.
* more standard name
* update docs
* add the copyright header
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* GridSample OP implementation for CPU and CUDA
**Description**: This change contains implementation for torch grid_sample OP.
Cuda implementation contains contribution from Muscle Wu.
* Use interpolation for out-of-bound points in zero padding mode
Out-of-bound points in zeros padding mode changed from constant 0 to
interpolation of surrounding pixels. This aligns with Pytorch implementation.
A bug in CUDA batch offset calculation is fixed.
Custom op exporter type is added.
* Fix nearest bug in CPU
* Update per CI build finding and review comments
* Force float to avoid potential integer T issue
* Style update
* PR update
* Remove c++17 feature from cuda code
* changes
* tile grad unsqueeze fix for opset 13
* clean up
* remove bool support for opset 2 to 12 for Pad as it is not supported.
* Copy OperatorKernels.md from artifacts of Windows CI build.
* updates for picking pnnx commit
* add tests filter to c# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* plus more fixes
* updates per review
* update to release commit
* add filters for optional type tests
* plus updates
QGemm takes in quantized A, B, C, and quantization parameters of output Y, in which C and quantization parameters of Y are optional. Its output can be quantized or full precision, which depends on whether quantization parameters of Y exists or not. If quant params of Y are provided, the output will be requantized or is full precision.
Comparing with QLinearMatMul and MatMulInteger, QGemm supports transpose, apha and beta attribute.
The formula for quantized GEMM is:
Y = alpha * scale_a * scale_b * ((A_int8 - zp_a) * (B_int8 - zp_b) + C_int32), in which,
C_int32 is quantized with formula: C_int32 = (beta * C) / (alpha * scale_a * scale_b)
SparseTensor support
Implement Builder pattern
Fix support for 1-D and 2-D COO indices
Implement and test CSR support.
Handle shape inference for SparseTensors
Implement conversion for COO, CSR and tests.
Address the case where constant sparse initializer is the output.
Implement test infra for SparseTensors
Implement SparseDenseMatMul for Csr and COO and tested it.
Add hash for SparseToDenseMatMul
Finish shared provider refactor
Refactor GetOrCreate to Create
Working on py interface
Expose OrtDevice and use it in allocate_numpy
Adjust Sparse interfaces, add support for string SparseTensor. Add tests.
Add and test to_cuda()
Add accessors to format specific indices
Test values and indices views, read-only flag, after GC access
Add sparse related methods to OrtValue
Re-work SparseTensor wrapper, add OrtValue methods
Rework numpy_array_to_cuda/to_cpu
Add run_with_ort_values
Add models and test sparse_mat_mul with run_with_ort_values
Refactor sparse tensor to use a single buffer
Ifdef x86 Eigen CSR sparse matmul implementation
Exclude broken test, check for string type when copying cross device
Split pybind schema, regenerate docs, add exclusion
Conditionally exclude schema module
Update docs fix cuda build
Add test to a filter and renerate JS docs
Add conversion and test string support for sparse tensors
Exclude conversion utils from minimal build
Add CUDA Memcpy and adjust provider interfaces
* add Gridsampler contrib op
* fix gridsampler_paddingmode_border test
* disable the tests until the kernel added
* fix CI failure
* change GridSampler to GridSample
* changes working to convert akv nodes
* changes to replace nodes
* changes to accomodate qkv hidden sizes as attributes
* kernel to accept qkv_hidden_size attributes
* Working till compute for varied dimension, todo applyattention()
* changes to make all regression tests work
* inference running successfully without prepack
* success inference with pre-pack weights
* add test for diff sizes
* bias shape need not be a mul of 3
* get the output_hidden_size from input
* infer output shape from input
* merge with master
* cleaning up files that got merged wrong
* accurancy at accepted level
* added unit test case for different dimensions
* all unit tests passing
* packed weights working for attention
* prepacked weights working
* added test case for newly added extra qk input
* updated unit test to test only extra add qk
* fixing build error
* removing few debugs
* reverting test changes
* all python test passing
* cleaning up
* new unit test added, major clean up of code
* removed extra code
* minor
* minor fix to tests
* prepack weights code cleaned up
* compacted compute() in attention.cc
* reformat compute()
* making a parameter T
* adding 3 q,k,v buffers in all cases
* fixing build
* running tests only on cpu
* Updating docs
* trigger ci builds
* Addressing comments in PR
* addressing some more comments
* get add_qk_str from add_qk node directly
* updating docs, added extra check to verify attn inputs
* Optimized the extra add by parallelizing
* added attention_shape to symbolic_shape_infer.py
* minor refactoring to address comments
* Update submodule onnxruntime-extensions to latest.
* Add document for onnxruntime-extensions.
* Update cgmanifest.json for onnxruntime-extensions.
* Add example in JavaScript.
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
**Description**:
Enforce no repetition of n-grams. Scores are set to `-inf` for tokens that form a repeated n-gram if added to the back of the input_ids.
**Motivation and Context**
Needed by transformer models in sequence generation algorithms (greedy search and beam search). This module has heavy impact on performance, and can be highly parallelized.
* Update the operator documentation generation
- Make layout a little nicer
- Update to latest supported operators including training
- Fix some links that are broken when the docs content is copied to github-pages
- Fix incorrect usage of 'onnx.ai.ml' as the default domain
- ML ops are now separated from the real default domain of 'onnx.ai'
- Include CPU, CUDA and training kernels
- exclude DNNL as it's not an EP we own
* There are separate paths for CUDA and CUDNN as they are not guaranteed to be in the same location on a Windows machine. Use the CUDNN path when looking for the CUDNN library.
* Enable validation of both contrib ops and operator kernels in build
Filter generation so it's deterministic
Add ability for CI to publish the md files as build artifacts if they differ so a developer can download and add to their PR to resolve any diffs.
Remove workarounds for github-pages as that will now link to the github docs which display correctly
* checkin
* add 4dmask support in attention cuda op
* trim
* add comments
* fix build/test error
* review comments and add tests
* sync doc
* review comments
* minor change
* Include ORT format model conversion scripts and infrastructure in ORT python package.
- tweak existing script setup so it can be easily run directly and from the ORT python package
Add config file and readme for Android minimal build package
Update ORT Mobile doco
Disable warning if 'all' optimizations are enabled but NCHWc transformer is excluded (device specific optimizations don't apply in this scenario so the warning is moot).
* Address PR comments
Implement various improvements related to reordering a tensor for use by NCHWc operations:
Relax the requirement that the input channel count must be a multiple of the NCHWc block size (either 8 or 16 depending on ISA). The requirement now is that the channel count must be a multiple of 4. The implementation of MlasReorderInputNchw would need further work to support relaxing this further, but I don't have any models where I've observed this to be necessary yet.
Support fusing a Transpose(NHWC->NCHW) into a following ReorderInput. ReorderInput now has a channels_last attribute as was done in the past for ReorderOutput. This helps with models converted from TF where the converter is unable to remove all Transpose operations.
Add threading support to ReorderInput to accelerate performance (ReorderOutput will come later).
* Implement qlinear concat and unit test.
Add quantization tools for QLinearConcat and it quantization tests.
* Add kernel def hash for QLinearConcat.
* Change according to PR. Add qdq transformer support for QLinearConcat.
* Add QDQ Transformer unittest. Fix typo on domain.
* remove dup logic of no use.
* fix x86 build error.
* Update operator docs.
* Add support for custom ops library to the ORT model conversion script
Simplify model conversion now that we read ops from the ORT format model.
Enable custom ops in the python bindings if custom ops are turned on in a minimal build.
* Add test of model conversion involving custom ops.
* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
- Add python bindings for ORT format models
- Add script to update bindings and help info
- Add parsing of ORT format models
- Add ability to enable type reduction to config generation
- Update build.py to only allow operator/type reduction via config
- simpler to require config to be generated first
- can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
- Add script to create reduced build config
- Update CIs
Update Python API to allow more flexibility for setting providers and provider options.
The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict).
Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. Now it can be used as a starting point for the providers argument and maintain the default priority order.
Convert some usages of the deprecated global configuration functions to use EP-specific options instead.
Update some EP-specific option parsing to fail on unknown options.
Other clean up.
* Enabling fasterrcnn variant and vehicle detector
* changes for 2021_2 branch
* yolov3_pytorch commit
* fixed braces in basic_backend.cc
* ci information added
* faster rcnn variant and vehicle detector changes were made in 2021.1 and not in 2021.2
* some changes to support unit tests
* disable some tests which are failing
* fix myriad tests for vehicle detector
* Did some cleanup
*cleaned up comments
*Disabled Add_Broadcast_0x1 and Add_Broadcast_1x0
tests on MYRIAD_FP16 backend due to a bug
*cleaned up capability_2021_2.cc file
*Removed extra conditions which were added
for some validation in backend_utils
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* yolov3 pytorch workaround to ensure that the output names are matched
* gemmoptest fixed on myriad
* Fixed MYRIADX CPP Test Failures
*Expand,GatherND,Range,Round op's
are only supported in model
*where op with float input data
types are not supported and fixed
*Scatter and ScatterElements op's with
negative axis are fixed
*Reshape op with 0 dim value are not
supported and fixed
*Disabled InstanceNorm_2 test on MYRIADX
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* make changes to yolov3 pytorch
* Fixed python unit tests
*Fixed failing python tests on vpu,
GPU and CPU
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixes POW op failures on GPU_FP16
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Clean up capability_2021_2.cc
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated docx for MultiThreading option
*Added extra info on setting the num_of_threads
option using the API and it's actual usage
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* fixed slice and removed extra prints
* Disabled failing python tests
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Minor changes added in capabilty_2021_2
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* made changes to slice to avoid failures
* Disabling FP16 support for GPU_FP32
->Inferencing an FP16 model on GPU_FP32
leads to accuracy mismatches. so, we would
rather use GPU_FP16 to infer an FP16 model
on GPU Device
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Updated docx for Inferencing a FP16 Model
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* fix for mask rcnn
* Script for installing openvino from source
* Updated with openvino 2021.2 online installation
* code comment fixes
fixed accuracy mismatch for div
* Update OpenvinoEP-ExecutionProvider.md
updated for 2021.2 branch
* Update README.md
updated dockerfile documentation
* Update BUILD.md
build.md update documentation
* permissiong change of install_openvino.sh
* made changes to align with microsoft onnxruntime changes
* Updated with ov 2021.2.200
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: mohdansx <mohdx.ansari@intel.com>
* allow custom op taking varied types
* refactor test case
* add test model
* refactor test case
* enable copy elision
* update test case
* fix issue in ToString function
* Update version to 1.6.0
* Add v 1.5.3 info
* Updating WindowsAI and ONNX version
Co-authored-by: Du Li <duli@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Expand the documentation on using compiling EPs with a minimal build to call out a 'simple' option that is easier to use. Provide more background on what happens to help users choose the best option for them.
Tweak conversion script to be noisier about attempted usage of 'all' optimization level.
Co-authored-by: manashgoswami <magoswam@microsoft.com>
* Update OpenVINO-ExecutionProvider.Md
update openvino-executionprovider.md for shared library
* Update Build.md
updated --build_shared_lib flag for building openvino shared provider lib
* Update Dockerfile.openvino
building for shared library with the new changes for openvino shared lib
* Revert "Update Build.md"
This reverts commit c9cf5fee76be7fdc10cadf07259f1d4ed5b45b93.
* Revert "Update Dockerfile.openvino "
This reverts commit e1624e4f93a4cfb425b6f21d7fb71b299a146740.
* Update OpenVINO-ExecutionProvider.md
fix documentation to the shared library
Co-authored-by: sfatimar <sahar.fatima@intel/com>
* Add initial documentation on using NNAPI with a minimal build
* minor clarification
* Add note on avoiding local full build
* Address a couple of PR comments
* add int8
* support both native TRT cal table and ORT cal table
* add more comments
* Update env variable name and check platform availability for int8/fp16
* add backward compatibility on old env var ORT_TENSORRT_ENGINE_CACHE_PATH and switch to flatbuffers for ort cal table deserialization
* add int8
* support both native TRT cal table and ORT cal table
* add more comments
* Update env variable name and check platform availability for int8/fp16
* Remove nGraph Execution Provider
Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice
**Deprecation Notice**
| | |
| --- | --- |
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |
Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.
Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.
* Remove nGraph Licence info from ThirdPartyNotices.txt
* Use simple Test.Run() for tests without EP exclusions
To be consistent with rest of test code.
* Remove nGraph EP functions from Java code
Transitions from the ORT-only DML NuGet (hosted on the onnxruntime_public feed) to the new unified DirectML NuGet (Microsoft.AI.DirectML) on nuget.org. In addition, the Microsoft.AI.MachineLearning (WinML) and Microsoft.ML.OnnxRuntime.DirectML packages now take a dependency on the Microsoft.AI.DirectML package. This means we can remove the extra copy of DML binaries in these packages since they will be installed by the DML package.
* add case for cpu custom op on gpu
* format doc
* restrict GPU custom op on Linux GPU CI only
* separate cu file to a independent project
* fix typo
* include cuda_add lib
* move lib def
* add file header
Co-authored-by: RandySheriffH <rashuai@microsoft.com>
* add profile caching to improve engine caching feature
* Add comments
* fix typo
* add decryption for engine caching
* Update tensorrt_execution_provider.cc
* Update tensorrt_execution_provider.cc
* Update tensorrt_execution_provider.cc
* Update tensorrt_execution_provider.cc
* Update tensorrt_execution_provider.cc
* update onnx-tensorrt submodule
* set opt profile to max value of the range
* add hash to engine/profile name
* Add calibration based INT8 quantization
* add an option to enable both FP16 and INT8
* Update tensorrt_execution_provider.cc
* add env variable to specify calibration file name
* clean up code
* Add comments and update TRT document
* enable tensorrt basic test and add EngineCachingTest
* clean up
* update envrionment variable in the test
* clean up
* Enabling Multi Device support for UEP
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Minor fix added
*Added a simple fix to determine OpenVINO
version for Arm build as well
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
This PR updates the ThreadPool API to support multi-loop parallel sections. As with the OpenMP "parallel" construct, this allows per-loop work to be amortized over a series of loops. For ORT, it also promotes locality between successive loops in the sense that iteration X of one loop will tend to run on the same worker thread as iteration X of preceding loops.
The change was developed while optimizing the implementation of a model that performed better with OpenMP. Profiling indicated that OpenMP was providing lower loop entry/exit costs and that, via OpenMP's static scheduling, it was leading to a lower L2 miss rate in the series of parallel loops used in GRU.
The main changes are:
- Addition of ThreadPool::ParallelSection and underlying support in the modified Eigen thread pool.
- In EigenNonBlockingThreadPool.h, refactoring the RunInParallel method to support two variants: one that takes an existing parallel section object created by the caller, and another (used by default) that creates its own parallel section.
- Simplify ThreadPool::LoopCounter (used by worker threads to claim loop iterations), basing it an ID supplied by the underlying Eigen thread pool for affinity in a series of loops.
- Fix a possible perf issue where a loop with iterations scheduled in batches would have more threads than batches available.
- Use of parallel sections in the GRU operator.
- Additional test cases in threadpool_test.h.
- Additional comments at the top of threadpool.h and EigenNonBlockingThreadPool.h.
* Implement Hetero in UEP
* Added security checks to take valid Hetero combinations
as device type
* Integrating Hetero features
* Get the statistics Report in Debug Mode
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Passing right device type for vadm_baackend
Added simple fix to pick the right device type
when using vadm_backend with Hetero as well.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixed batching logic for 2020.4 and above
* Fixed flake8 PEP8 errors
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Minor Fixes Added
*Added security checks for device_type passed
in for Hetero build during run time
*code cleanup
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Minor changes Added
*Fixed batch_size bug in vadm_backend
*code cleanup
*Documentation updated for Hetero
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* add case for cpu custom op on gpu
* format doc
* restrict GPU custom op on Linux GPU CI only
* separate cu file to a independent project
* fix typo
Co-authored-by: RandySheriffH <rashuai@microsoft.com>
Description: This change makes three changes to the ThreadPool class to clean up issues identified during performance analysis and optimization. (1) It uses mm_pause intrinsics in spin loops, helping avoid consuming pipeline resources while waiting. (2) It re-organizes the spin-then-steal loop for work distribution to start out spinning as intended, rather than to start out trying to steal. (3) It updates the ThreadPool class's API to be consistent in the use of static methods for public functions. The PR includes minor doc updates and corresponding changes to test cases.
Motivation and Context
The change helps ensure consistency in behavior between the OpenMP and Eigen-based implementations. Unlike the instance methods, the static methods abstract over the different ways in which threading can be implemented; they will map onto the OpenMP or Eigen-based implementations when threading is used. When threading is not used they will run work sequentially.
* Enabled multi-threading for OpenVino EP
->Enabled support for concurrent_session_runs
*Run UEP using concurrent_session_runs > 1
*Enabled support for ORT_PARALLEL ExecutionMode
->Documentation Added for Enabling MultiThreading
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Minor Fixes added
*Configure the value of nireq during Runtime
*Documentation typos rectified and details
added for Multi_Threaded Inference
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Some checks added for this fix
*Added checks to invalidate wrong nireq value
and assigned it to default value of 8
*Added new config options for enable_vpu_fast_compile
which were changed w.r.t OpenVINO_2021.1 Release
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* updating examples with current api calls
* Fixing capitalization in api calls, adding RKNPU update
* Correcting nuphar and rknpu ep api calls
* Include creating session in readme
* Cmake changes for 2021.1
* added new ov version 2020.1 for faster rcnn
* Added missing defs
* equal op modified
* changes to incoroporate faster rcnn
* backend util.cc
* hddl_plugin_config.hpp is depreceated . instead use hddl_config.hpp
* changing myriad precision bool to i32
* gather is not enabled for gpu
* conv2D and pooltest auto_pad attribute should not be null
* negative indices are not valid for scatter op in myriad
* non max suppression op only supported in faster rcnn mode
* maxpool indices output is not supported
* Cleaned redundant code in backends
* Added ifdefs for HDDL config
* cast output dimensions check
topk operator k input it seems only resolved for myriad as it is
throwing issues for ask rcnn . need to verify
* we are limiting the subgraph size to 3 here
* taking care of review comments
* Fixed minor bugs
* Modified Slice op checks
* Added NonZero, Upsample
* Removed TopK if it's in the middle of a subgraph
* incorporated upsample conditions too
* Dockerfile changes for 2021.1 release
* dockerfile aptkey update
* Minor fixes
* ceil condition added again
* Fixed few gpu models
* Disabled LSTM and yolov3 in ModelTests
* python softmax cross entropy tests and negative log likelihood
* Update Build.md
Updated for openvino 2021.1
* Update OpenVINO-ExecutionProvider.md
update openvino execution provider for 2021.1
* Update READMe.md
updated new openvino version
* Update Dockerfile.openvino
added environment variable for DEBIAN Frontend
* Fixed myriad models
* Fixed gather condition
* Fixed mask rcnn model on myriad
* Modified Gather condition
* set default target of MCR dockerfile to MYRIAD_FP16
* Fixed tinyolov3 on CPU
* Update OpenVINO-ExecutionProvider.md
update openvino execution provider documentation
* Update Dockerfile.openvino
Removed environment variable
* Update OpenVINO-ExecutionProvider.md
update image manipulation networks supported
* Update onnx_backend_test_series_filters.jsonc
removed test_upsample_nearest from cpu test cases
* New InternalCI changes for 2021.1
* Full protobuf removed for OpenVINO
* Protobuf added
* Updated with apt installation for openvino
* Revert the testing changes
* Reverted testing changes
* File permessions are changed to original
* Deleted openvino installation and cmake change
* Optimized Dockerfile
Removed unnecessary cmake installation, numpy
* Added missing ifdefs
* delete array fix
* backend_utils.cc output_shape
* Revert "set default target of MCR dockerfile to MYRIAD_FP16"
This reverts commit 928d3e2b71e2f589cf51dacd3a133951cf9ca18d.
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
Co-authored-by: S. Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: Aravind Gunda <38353114+gundaarx@users.noreply.github.com>
* Fix Windows AI version
* Update text to extend telemetry coverage
Includes all official binaries
* Update text about EP pluggability
* Update CUDA/cuDNN versions
* Add link to reduce operator kernel page
* Update roadmap
* Add preview for migraphx
* Move Rockchip under IoT/Edge
* Update text to include ORT for Mobile doc link
* Allow sharing of initializers between sessions.
* Allow sharing of initializers between sessions (2).
* Add test for C#
* Add test for C#; address PR comments
* Address PR comments
Moved AddInitializer logic to internal session options
Added tests for owned buffer
Clarified documentation
Fix bug where memory info and not device was getting compared
* Fix test
* Fix training build
* Add ver 5 end marker and ver 6 starter, add scenario and usage examples.
* Added config flags for VPU Fast Recompile
* clean-up ifdefs
* Add VPU Fast compile config option
Adds an option that enables Fast compilation of models to VPU
hardware specific format.
* Add config option to choose specific device id for inference
Inference of all subgraphs will be scheduled only on this device
even if other devices of the same type are available.
* Add Python API to list available device IDs
* code cleanup
* Add second C/C++ API with settings string parameter
Adds an additional C/C++ API that allows passing multiple
key-value pairs for settings as a single string. Multiple
settings are delimited by '\n' while the key and value
within a setting are delimited by '|'.
* Append 'Ex' to the extended C/C++ API
* Use set_providers Py API to set config options.
Uses Session.set_providers Python API to set EP runtime config
options as key/val pairs
Deprecated older module function definitions for config settings.
Updates documentation.
* avoid globals for py config options where possible
Co-authored-by: intel <you@example.com>
* Add minimal build option to build.py
Group some of the build settings so binary size reduction options are all together
Make some cmake variable naming more consistent
Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used.
Add initial doco and ONNX to ORT model conversion script
Misc cleanups of minimal build breaks.
* cancel night build on pyop
* setup ci pipeline for build of reduced ops
* add back c# test
* remove debugging print
* add testing model
* add more arg in pipeline script
* disable pipeline trigger temporarily
* fix yaml format
* fix yaml format
* fix pipeline error
* rid c# test
* add ops for test cases
* add Conv from domain com.microsoft.nchwc
* remove --reduce_ops
* fix typo
* remove --build_java
* add test case for excluded op
* update doc with --skip_test
* formatting code, renaming files and simplify yaml
* remove debug build from yaml
* remove surplus ops from included_ops.txt
* add MinSizeRel build to yaml
* rename test cases and models
* exclude ir test from minimum build
* restrict ir test to be only applied to reduced ops build
* Add support for sharing allocators
* Incremental update
* Address some PR comments, add unit tests, add documentation.
* Address PR comments, add tests and some documentation.
* Fix build and test issues
* Remove RegisterAllocator API restoring the OrtAllocator interface changes. Changed docs to reflect this.
Also fixed the orttraining segfault. The segfault was because in the case of training session,
the CPU exec prov is not available at the time the transformers are applied. Changed it to create
a new one.
* cancel night build on pyop
* add rewriter to rewrite cpu provider
* skip BuildKernelCreateInfo<void>
* refactor variable name and comment
* include ops from csv file
* process multiple eps
* add default function to cuda provider
* rename function and add license header
* fix import
* add doc
* fix typo
* deal with empty kernel entry in cuda
* rename the rewriter file
* add comment into provider file
* add comment and rename function
* log warnings
* refactor extracting logic
* add entry for script to run solo
* add better example
* avoid onnx importing
* fix flake8 alerts
* minor fixes to better comments and doc
* add entries for all domains
* add void entry into contrib providers
* format cuda_contrib_kernels.cc
* format cpu_contrib_kernels.cc
* add all providers
* add default entry to all providers
* include op_kernel header
* cancelling change in providers beyond cpu/cuda
* rename file and switch file format to domain;opset;op1,op2...
* update doc
* restore non-regular ending grammar in cuda_contrib_kernels.cc
* add ort_root as input argument of script
* enable test in ci
* update doc
* update doc
* revert change on linux gnu ci
* switch to set to host ops
* simplify trimming logic
* add domain map to track current model
* allow ort_root to take relative path
* Removed building ngraph from source
* Disabled some tests temporarily
* Enabled softmax for all dims
* Added onnx importer to link libraries
* int64 changes
* fixed
* temp
* slice update start and end need to be initializer
* Disabled GatherND, ScatterND, ReverseSequence operators
* Added supported ops instead of unsupported ops
* Set precision only for CPU
* Removed some unecessary conditions
* Fixed segfault in slice
* Softmax restriction removed
* changes
* Setting precision for all plugins
* Changes added to include precision
and supported ops for gpu and vpu
* branch op support
* checking for disabled python test failure
* mapped input names and tensors directly rather than copying which was leading to mismatch
* last index is not supported
mkldnn does not support pow between integers
* included the code changes
* Rename inner-scoped variable to avoid MSVC warning
* applied changed to vadm as well and removed the utility function
getinputtensors() completely
* OpenVINO multi version support: CMake changes
* OpenVINO multi version support: C++ support
* removed commented code
* Remove redundant code lines
* Revert "Rename inner-scoped variable to avoid MSVC warning"
This reverts commit 2f650493162675bc6fb70730de9656ec400be332.
Merged separately in master.
* vadm changes disabled reduction op test
* putting test_gather_negative_indices in unsupported list for now
* Update MCR Dockerfile with 2020.4
Installs OpenVINO 2020.4 from deb packages via APT tool.
* Update build docs with 2020.4 info
* Update dockerfile with OV 2020.4 info
Instructions for building OpenVINO based docker image no longer require
downloading installer package as it is installed by the dockerfile
using OpenVINO 2020.4 APT package for Ubuntu 18.04
* Added constant folding bypass logic
* Added cout statements for ci
* Added NDEBUG flag for debug symbols
* Update Ops info in docs
* fixes multiple unit tests
* mathoptest.ceil disabled for gpu and myriad
* activation test temp disabled
* Fix models for CPU
* Fixed a syntax error
* local cmmit
* fixing unit tests for myriad
* Fixed Variadic Split, Topk issues
* fix_model commit
* Fix models in myriad
* Added ifdefs for OpenVINO 2020.4
* temp
* made some changes to not operator
* Added unused parameter
* relu enabled
* Fixed bug in Conv output
* Consolidated GPU failing tests into one category
* Made it compatible to InternalCI 2020.4
* Made changes for ngraph
* Disabled test for mask,fastercnn,tinyyolov3
* Removed proxy for ci
* run_dockerbuild.sh restored to same version
* run_dockerbuild.sh restored to same version
* run_dockerbuild.sh restored to same version
* Updated documentation for 2020.4
* Removed FP32 to FP16 transformation for GPU
* Disabled Coreml-FNS-Candy model test
* Added FP16 transformations
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Manohar Karlapalem <manohar.karlapalem@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel/com>
Co-authored-by: sfatimar <64512376+sfatimar@users.noreply.github.com>
Co-authored-by: intel <you@example.com>
Co-authored-by: gundaarx <aravindx.gunda@intel.com>