**Description**: Changes to the MIGraphX execution provider code to
allow for stream synchronization on the GPU side
**Motivation and Context**
Performance boost by removing redundant host-to-device synchronizations.
The current implementation of the execution provider continuously calls
hipDeviceSynchronize() between computations which adds overhead and an
idle wait between the GPU's computations. This is noticeable during
device
This change leverages new functionality that's been added to MIGraphX to
allow for GPU side synchronization which avoids the need for
host->device waits.
To maintain backwards compatibility with older MIGraphX versions, the
compile-time define MIGRAPHX_STREAM_SYNC has been added to the API so
that older versions can operate with newer builds of onnxruntime without
loss of functionality relative to the current feature set (as of 08/09/22).
Co-authored-by: Ted Themistokleous <tthemist@amd.com>
Update for ROCm CI before relanding tunable GEMM #12853. This PR also updates
composable kernel to use CMake's HIP language support so that we can
mix the C/C++ compiler with the HIP compiler instead of being locked to hip-clang.
### Description
Fix XNNPACK on WebAssembly SIMD.
The flag "-msimd128" needs to be applied to every source file when compiling
WASM SIMD. Currently only some of the source files are compiled with
this flag, so we get inconsistent results for
`sizeof(xnn_f32_minmax_params)` because the type definition includes an
`#ifdef` for `__wasm_simd128__`. The inconsistency causes garbage data
to be written to a stack variable, which eventually causes the crash.
The XNNPACK libraries are C libraries, so the build flags need to be applied not
only to `CMAKE_CXX_FLAGS` but also to `CMAKE_C_FLAGS`.
### Description
This updates the oneDNN library used by oneDNN ep from version 2.6 to
version 2.7
### Motivation and Context
This brings in the many improvements incorporated into the oneDNN
library to the oneDNN execution provider.
Signed-off-by: George Nash <george.nash@intel.com>
* consume ONNX 1.12.1 to prevent vulnerability issue while loading external tensors
* update ONNX 1.12.1
* test updated PR
* use official rel-1.12.1 commit
* upgrade emsdk to 3.1.19
* fix build break
* ignore '-Wunused-but-set-variable' in eigen
* add malloc and free in exported functions
* EXPORTED_FUNCTIONS
* Split GemmBase RocBlasGemm
* Add composable kernel GEMM baseline
* Make linter happy
* Address review comment
* Update bert cases with batchsize
* Adjust includes to fix IWYU lint
* Only builds and links used ck kernels to improve building time
* Remove warmup run on SelectImpl
* Add comment to utility function
* Mute cpplint
* Make RocBlasGemm<T>::SelectImpl semantically correct
* Add reduced basic test cases for ck gemm
* More robust gemm testing
* Fix warnings
* Fix grammar
* add description of build ORT+TVM EP on Windows
* fix cmake error related to symlink creation on Windows
* add llvm config path to build flags for correct build on Windows
* update TVM_EP.md for llvm_config build arg
* fix warnings skipping during build on Windows
* fix using string or wstring for model path to correct build on Windows (MSVC error)
* fix error in custom logger for correct build on Windows
* implement glob algorithm for Windows
* additional build fixes
* update TVM with export of VM symbols for dll
* description of nasm issue and workaround
* update TVM with export of Executable from VM symbols for dll
* description of installation of ipp-crypto dependencies on Windows
* cmake key for ipp-crypto build
* fix wstring for TVMso EP
* fix ipp-crypto build
* cmake key onnxruntime_TVM_USE_HASH switches off the full hash functionality, not just specific methods
* fix absolute path to compiled lib
* update TVM_EP.md, fix lint warnings
* update TVM_EP.md
* small fixes after review
* switch on handshake functionality for Linux workflow
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* infrastructure for handshake mechanism was implemented. sha256 was selected as first hash algorithm
* check hash during compile in TVMso EP
* add IPP-CRYPTO to external dependencies for TVM EP
* made checkHash method constant
* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict
* implemented SHA-256 calculation using ipp-crypto library
* fix dependency for ipp-crypto
* add provider options for hash check
* update documentation for added provider options
* add hash check condition
* fix docs
* fix lint
* fix ORT_THROW
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
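The hash check introduced in the commit above amounts to hashing a compiled artifact and comparing the digest with an expected value. A minimal Python sketch of that idea follows. The helper names `sha256_of_file` and `check_hash_sketch` are hypothetical, and the EP itself computes SHA-256 in C++ through ipp-crypto, which this sketch does not reproduce.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_hash_sketch(path, expected_hex):
    """Hypothetical helper: True only if the file digest equals `expected_hex`.

    The comparison is case-insensitive on the expected value and uses a
    constant-time compare.
    """
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

Reading in chunks keeps memory use flat for large compiled libraries.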
* update trt 8.4ga
* trt 8.4 linux ci pipeline
* fix cmake
* placeholder_builder
* trt 8.4 windows pipeline
* gpu package pipeline
* trt 8.4.1.5 , packaging pipeline updates
* python packaging
* ctest timeout
* python packaging test
* bump timeout
* python format
* format
* revert
* newline
* enable trt python tests
* typo
* python format
* disable on windows
Prior to this every test shared the same tolerances. This meant
that if an ONNX test failed due to a small but acceptable difference in
output, the only alternative was to disable the test entirely.
In op set 17, the DFT operator is being added. Without this change, the
tests for that operator fail because the output is off by about 5e-5.
It's better to keep test coverage for this new op rather than disable
the test entirely.
Also prior to this change, the global tolerances were not shared between
C++, JavaScript, and Python tests. Now they are.
Also fix various minor issues raised by linters.
Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
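The per-test tolerance idea described above can be sketched in Python as a set of shared defaults plus optional overrides keyed by test name. All names and numeric values below (`DEFAULT_TOLERANCES`, `TOLERANCE_OVERRIDES`, `test_dft`, the tolerance numbers) are illustrative assumptions, not the values or file layout used by the actual change.

```python
import math

# Hypothetical global defaults shared by every test.
DEFAULT_TOLERANCES = {"atol": 1e-5, "rtol": 1e-4}

# Hypothetical per-test overrides; the test name is made up for illustration.
TOLERANCE_OVERRIDES = {"test_dft": {"atol": 1e-4}}


def tolerances_for(test_name):
    """Merge the defaults with any override registered for `test_name`."""
    merged = dict(DEFAULT_TOLERANCES)
    merged.update(TOLERANCE_OVERRIDES.get(test_name, {}))
    return merged


def outputs_match(test_name, expected, actual):
    """Compare two flat sequences of floats using the test's tolerances."""
    tol = tolerances_for(test_name)
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=tol["rtol"], abs_tol=tol["atol"])
        for e, a in zip(expected, actual)
    )
```

With these assumed numbers, an output that is off by 5e-5 fails under the defaults but passes for the test that carries the looser override, so the test can stay enabled.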
* update TVM
* get alignment constant from TVM
* update TVM_VM_SetInputs to upstream with TVM API
* fix CI issue: update TVM EP dependencies
* add sudo
* revert changes needed to install missing package
* add package for TVM EP CI
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* Initiate Ort SNPE EP
* fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master
* correct the source path for libonnxruntime.so while building for android package
* add AdditionalDependencies for arm64
* On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given.
* fix build failure if snpe is not enabled
* update doc for contrib op
* separate out snpe ep settings to onnxruntime_snpe_provider.cmake
* renaming according to review comments
* update according to review comments
* Implement XNNPACK support via an EP.
* Layout transform uses the GraphPartitioner infrastructure.
* Node fusion is supported.
* Conv and MaxPool implementations were ported from Changming's PR.
* Added optional mutex in InferenceSession::Run as we only want to allow sequential calls if xnnpack is enabled
* Add disentangled attention TRT plugin as contrib op
* update plugin name & remove null character
* update onnx-tensorrt submodule with my beta version
* use suggested plugin name & simpler shape propagation
* update onnx-tensorrt gitsubmodule to temporary fork
* update onnx-tensorrt to temporary commit
* redirect submodule back to latest 8.2-GA release of onnx-tensorrt repo
Co-authored-by: HHH-ComputeLab <haohangh@nvidia.com>
* update TVM
* small fixes
* update TVM with new set_input and NDArray API
* use set_input instead of set_one_input
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* Disable training code in DNNL LayerNorm code
The capability code already does not claim the LayerNorm and
SkipLayerNorm that require more than one output. However,
building with training enabled was causing issues.
The training specific code has been removed even when building with
training enabled.
Signed-off-by: George Nash <george.nash@intel.com>
* Fix for DNNL FusedMatMul op.
The bug was in the transpose code.
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Use agreed upon memory format type when running Pooling Gradient in dnnl ep
The dnnl ep does not currently have a way to pass memory_format information
from the forward pooling primitive to the backward pooling primitive.
This change explicitly sets the memory_format to match that of Onnxruntime
for both the forward and backward pooling code. This will prevent using an unmatched
memory format that could result in an `unimplemented` error from dnnl ep.
Signed-off-by: George Nash <george.nash@intel.com>
* Update dnnl ep to use OneDNN v2.6
Do not run ReduceInfLogSum on the kDnnlExecutionProvider due to a
calculation bug when doing Log or infinity values. The fix for this
issue will be part of the next OneDNN release.
Signed-off-by: George Nash <george.nash@intel.com>
* Update PrintMemory function in dnnl ep
This modification can be used to enable/disable memory printing
for dnnl ep developers. This is considered a developer-only feature
and is disabled by default. It must be enabled and the code recompiled
to use it.
Even if it is enabled it will not actually print any memory because
the developer needs to take the extra step of specifying the memory
that will be printed to the screen.
Signed-off-by: George Nash <george.nash@intel.com>
* Update binary ops to run on intel GPU when using dnnl ep
Binary ops (i.e. Add, Div, Mul, and Sub) were updated to no longer
call GetMemoryAndReshape. In the past this would move the memory from
CPU to the GPU. This extra call is no longer needed since it is taken
care of by the GetMemoryInOrtFormat call. Removing GetMemoryAndReshape
prevents copying the memory to the GPU twice.
Signed-off-by: George Nash <george.nash@intel.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* rename info to options for TVM EP
* transfer options processing from TVMExecutionProvider to TVMEPOptions
* transfer TVMRunner to separated files
* implement TVMCompiler class
* replace CompileFunc by TVMCompiler object. update TVMRunner. now it does not depend on TvmExecutionProvider
* correct logging of TVM EP options
* RunnerImpl, GERunnerImpl and VMRunnerImpl were implemented
* add prepareComputeInfo method
* remove update_output_shapes flag
* embed all TVM EP dependencies to tvm namespace. transfer model compilation from TVMRunner. connect TVMRunnerImpl to TVMRunner
* refactor compileModel method
* small cleaning
* separate TVM EP options data store and processing
* replace TvmTensorShape by InlinedVector with max_size 5
* correct indentation
* update TVM hash
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* add executor option (vm or graph) and support virtual machine methods
* nullptr check for compile and run methods (see also PR#10211 from microsoft:onnxruntime)
* get output shapes for VM
* remove run_with_benchmark. remove run methods from python api, get it from native side
* get outputs method for VM was implemented
* support multiple input for VM
* update python logging and exception
* small fix
* update tvm with patch for VM API
* update nhwc transformations for TVM EP
* add data alignment check and support set_input_zero_copy for GE in TVM EP
* fix logger name
* return back to apache/tvm with VM fixes instead of local dev branch
* hide customized tvm logger while issue is not resolved. fix tvm warning related to target_host
* flake8 fix
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Work on minimizing memory management calls by
reducing number of allocations and copies.
Replace std::unordered_set with InlinedHashSet
and add usage of InlinedVector.
Employ std::move() to minimize copying and memory allocations.
Remove copying of the const shared data into each of the
PropagateCast transformer instances.
Move inlined_containers.h header to include/common
Adjust AsSpan implementation for C++ < 17
* add support for bool type
* add TVM EP support for tests
* include TVM EP in python test pool
* fix pylint
* moved technical imports to a separate file
* clean up post build actions & move _ld_preload.py extension to CMake level
* add files for include TVM EP into CI
* implement custom logger for TVM
* replace TVM logging with ONNX RT logging
* update link for TVM EP tutorial
* clean up TVM EP cmake
* add pybind auto enabling for TVM EP
* fix blank spaces
* code review fixes
* replace print with comment
* add list of EP without TVM EP
* enable onnx tests
* disable contrib ops and ml ops
* reuse Dockerfile.ubuntu
* Move install_tvm_test_dependencies.sh out of Docker context dir, update build definition.
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InlinedShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TensorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TensorShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Path with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
Although github works with both, this is more precise.
Having an extension also makes it easy to match with regex, when we want to inject code to reroute traffic to our own git mirror.
* squashed commit for standalone tvm execution provider
* critical fix for correct python build with stvm ep
* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG
* updates and fixes
* update parsing of stvm provider options
* add support of external data for onnx model
* add conditional dump of subgraphs
* remove unused code
* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API
* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)
* add fp16
* add functionality of conversion of model layout to NHWC if needed. Necessary parameter was added to STVM provider options
* fix license text in header. fix log format
* small fixes
* fix issues from flake8
* remove model proto construction from GetCapability
* reserve memory for vector of DLTensors
* add simple tutorial for STVM EP
* STVM docs
* jroesch/tvm -> apache/tvm
* remove dead code, unnecessary logs and comments
* fix in readme
* improve tutorial notebook
* tvm update
* update STVM_EP.md
* fix default value
* update STVM_EP.md
* some TODOs for the future development
* shorten long lines
* add hyperlink to STVM_EP.md
* fix Linux CI error
* fix error in csharp test
Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* update base image from 11.4.0 to 11.4.2
* update Linux TRT GPU pipeline to TRT 8.2
* update onnx-tensorrt to 8.2-GA
* disable failing TensorRT 8.2 tests.
* update pad test.
* fix
* update win trt ci pipeline to trt 8.2
* test run with cuda 11.4 and cudnn 8.2
* increase timeout
* revert
* revert
* update packaging pipelines to use trt 8.2
* fix typo
* update trt gpu perf pipeline to trt 8.2
* increase timeout
* delete deprecated ci-perf-pipeline.yml
* bump timeout
* adjust timeout packaging
* Add QAttention to DNNL EP
Add QAttention to DNNL EP (limited support and disable for gpu)
update ONEDNN version to 2.4.4
bug fix in getcapability
add memory debug print
Signed-off-by: Wang <zhaoyang.wang@intel.com>
* Address Code Review + MatMulInteger Fix
clean up code and add comments
fix matmulinteger and add fusion rule to enable initialized vector weight zero
points of 0s
update DNNL_TAG to v2.5
Signed-off-by: Wang <zhaoyang.wang@intel.com>
* Linux Compile Fix + rollback ONEDNN to 2.4.4
Signed-off-by: Zhaoyang Wang <zhaoyang.wang@intel.com>
* Fix QAttention Debug build
Signed-off-by: Wang <zhaoyang.wang@intel.com>
* Fix QAttention build if USE_DNNL not specified
Signed-off-by: George Nash <george.nash@intel.com>
Co-authored-by: Wang <zhaoyang.wang@intel.com>
Co-authored-by: MTC <63478620+jeyblu@users.noreply.github.com>
* Enable selecting custom ops in onnxruntime-extensions.
* Move cmake_helper.py.
* Remove over-indented spaces.
* Add doc.
* Remove onnxruntime-extensions from git submodules, and user should pass path of onnxruntime-extensions for build.
* Modify doc.
* Remove argument --enable_onnxruntime_extensions and use --onnxruntime_extensions_path.
* Fix build error.
* Fix build error.
* Use onnxruntime_extensions_path.
* support both submodule and external source folders
* refinement
* Update cgmanifest.json
* Support building onnxruntime-extensions from either git submodule or pre-pulled path.
* Update doc.
* more standard name
* update docs
* add the copyright header
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
* dnnl ep rework
rework DnnlTensor,DnnlNode,DnnlSubgraph to support arbitrary graph topology and tensor data types
rework GetCapability to claim nodes in graph greedily from node topological ordering and delay creation of DnnlSubgraph until Compile
rework compile to have DnnlSubgraphPrimitive as the object to handle primitive creation and execution
instead of thread local primitive pool which duplicates intermediate memory allocated by the EP across threads
DnnlSubgraphPrimitive provides helpers to handle many common functions for each dnnl primitive builder and become the centralized place to store input, output, intermediate memories, initializer memories and etc
it provides functions to obtain input memories with automatic reordering/reshaping and moving between engines
it provides interfaces to add primitive, set output memory for single node and etc
add CONCURRENT_EXEC compile flag for dnnl library as without it, convolution primitive cannot be created and executed on different threads
enable unit tests to run on dnnl ep as well if built with dnnl ep
add dnnl ep support for Matmulinteger
* Add Relu to the DNNL refactor
Signed-off-by: George Nash <george.nash@intel.com>
* Add Convolution op to the DNNL rework
Signed-off-by: George Nash <george.nash@intel.com>
* Add Pooling ops to the DNNL rework
This adds the following ops:
- AveragePool
- GlobalAveragePool
- GlobalMaxPool
- MaxPool
Note: Pooling with dilation is not yet supported.
Note: GlobalLpPool, LpPool, MaxRoiPool, and MaxUnpool are not supported yet.
Signed-off-by: George Nash <george.nash@intel.com>
* Add Sum op to the DNNL rework
Signed-off-by: George Nash <george.nash@intel.com>
* Add ConvGrad op to the DNNL rework
Signed-off-by: George Nash <george.nash@intel.com>
* Add MaxPoolGrad and AveragePoolGrad ops to DNNL rework
Signed-off-by: George Nash <george.nash@intel.com>
* Added lrn operator to the refactored code
Signed-off by chethan.palangoutu.keshava@intel.com
* Added ReduceMean DNNL op to the refactor code
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Added Softmax DNNL op for the refactored code
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Added BatchNorm DNNL op inference-only for refactored code
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Added Binary Ops to DNNL rework
Signed-off-by: Wang <zhaoyang.wang@intel.com>
* Added ReluGrad to DNNL Rework
Signed-off-by: Wang <zhaoyang.wang@intel.com>
* Update OneDNN tag to v2.3
Signed-off-by: Wang <zhaoyang.wang@intel.com>
* Added support for memory up to dim size 12
this is to fix the CI test cases that contain binary ops of input dim
size > 5
Signed-off-by: Wang <zhaoyang.wang@intel.com>
* Prevent claiming support for float16 and bfloat16 when only float is supported
The string.find call that was used caused the code to claim support
for float16 and bfloat16 when we only supported float. We now explicitly
check for the data type, or for the data type with a 7-letter prefix,
i.e. prefixed with "tensor("
Signed-off-by: George Nash <george.nash@intel.com>
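The substring pitfall described in the commit above is easy to reproduce. The Python sketch below contrasts a substring search with an exact comparison. The function names are hypothetical and the exact comparison is an illustration of the intent, not a transcription of the EP's C++ check.

```python
def claims_float_buggy(type_str):
    # Substring search: also matches "tensor(float16)" and "tensor(bfloat16)".
    return type_str.find("float") != -1


def claims_float_fixed(type_str):
    # Accept only the bare type name or the same name wrapped as "tensor(...)".
    return type_str in ("float", "tensor(float)")
```

The buggy version accepts any type whose name merely contains "float", which is how float16 and bfloat16 ended up being claimed.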
* Disable uint8 mul and div, improve type conversion
Disable mul_uint8 and div_uint8 test cases as they use modulo for
overflow handling while onednn uses saturation
improve type conversion using enum instead of string comparison as well
as adding more types
Signed-off-by: Wang <zhaoyang.wang@intel.com>
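The mismatch behind the disabled uint8 tests comes down to two overflow policies. A small Python sketch of uint8 multiplication under each policy is shown below; the function names are hypothetical and only the multiplication case is illustrated.

```python
def mul_uint8_wraparound(a, b):
    """Modulo (wraparound) overflow handling, as the test cases expect."""
    return (a * b) % 256


def mul_uint8_saturating(a, b):
    """Saturating overflow handling: results are clamped to the uint8 maximum."""
    return min(a * b, 255)
```

The two agree whenever the product fits in 8 bits and diverge as soon as it overflows, so a test written for one policy cannot pass against the other.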
Co-authored-by: Wang <zhaoyang.wang@intel.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* updates for picking pnnx commit
* add tests filter to c# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* plus more fixes
* updates per review
* update to release commit
* add filters for optional type tests
* plus updates
* update onnx-tensorrt parser to master
* disable unsupported tests
* add cuda sm 75 for T4
* update tensorrt pipeline
* update trt pipelines
* update trt pipelines
* Update linux-gpu-tensorrt-ci-pipeline.yml
* update trt cid pipeline
* Update linux-gpu-tensorrt-ci-pipeline.yml
* Update Tensorrt Windows build pool and TensorRT/CUDA/CuDNN version
* update to cuda11.4 in trt ci pipeline
* update base image to cuda11.4
* update packaging pipeline to cuda11.4
* clean up
* remove cuda11.1 and cuda11.3 docker file
* disable unsupported tensorrt tests at runtime
* Update linux-multi-gpu-tensorrt-ci-pipeline.yml
* Update submodule onnxruntime-extensions to latest.
* Add document for onnxruntime-extensions.
* Update cgmanifest.json for onnxruntime-extensions.
* Add example in JavaScript.
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
The PyTorch cpuinfo library allows us to query current CPU features, micro-architecture, cache size, etc. This information is needed for targeted performance optimizations.
Unfortunately it does not work under Windows/ARM. We need to develop our own later.
Switched the code to C++17. To build ONNX Runtime on old distros like CentOS 7, you need to install a newer GCC from additional repos. If you build onnxruntime with the newer GCC, typically the resulting binary can't be distributed to other places because it depends on the new GCC's runtime libraries, something that the stock OS doesn't have. But on RHEL/CentOS, it can be better. We use Red Hat devtoolset 8/9/10 with CentOS 7 to build our code. The new library features (like std::filesystem) that do not exist in the old C++ runtime will be statically linked into the applications, with some restrictions:
1. GCC has a dual ABI, but we can only use the old one. It means std::string is still copy-on-write and std::list::size() is still O(n). Also, if you build onnxruntime on CentOS 7 and link it with binaries that were built on CentOS 8 or Ubuntu with the new ABI and that export C++ symbols directly (instead of using a C API), then it won't work.
2. We still can't use std::optional. It is a limitation coming from macOS. We will solve it when we get macOS 11 build machines. It won't be too long.
3. Please avoid using C++17 in CUDA files (*.cu), and in the *.h files that they include (like core/framework/float16.h). This is because CUDA 10.2 doesn't support C++17. You are welcome to use the new features in any *.cc files.
Co-authored-by: Chen Fu <fuchen@microsoft.com>
Description:
This change adds the Google benchmark git repo as a submodule in the onnxruntime repo.
Motivation and Context
Currently we have benchmarking code that depends on google benchmark. The version we are using has cross compilation issues for ARM CPUs. Recent changes in Google benchmark fixed these issues.
Another problem is that we now rely on ONNX to pull in Google benchmark, an indirect dependency. Updating ONNX involves complex steps and rightly so. However, updating Google benchmark dependency should not be hindered by these processes.
* Simplified version of WebAssembly support to keep most of existing data structures and add cmake using Ninja and emcmake
* Clean up CMakeLists.txt and add an example to create and compute a kernel
* Load a model from bytes and remove graph building steps
* Add all cpu and contrib ops with mlas library
* WebAssembly build with Onnxruntime C/CXX API
* Use protobuf cmakefile directory instead of adding every necessary source file
* Fix invalid output at example
* add missing files
* Change an example to use Teams model and support ort mobile format
* add API for javascript
* fix input releasing in _ort_run()
* update API
* Let onnxruntime cmake build WebAssembly with option '--wasm'
* allow one-step building for wasm
* Make build script working on Linux and MacOS
* Fix broken build from Windows command
* Enable unit test on building WebAssembly
* Resolve comments
* update build flags
* wasm conv improvement from: 1) GemmV; 2) Depthwise direct convolution 3x3; 3) Direct convolution 3x3
* Cleaned mlas unittest.
* use glob
* update comments
* Update baseline due to loss scale fix (#6948)
* fix stream sync issue (#6954)
* Enable type reduction in EyeLike, Mod, random.cc CPU kernels. (#6960)
* Update EyeLike CPU kernel.
* Update Mod CPU kernel.
* Update Multinomial CPU kernel.
* Slight improvement to Pad CPU kernel binary size.
* Update RandomNormal[Like], RandomUniform[Like] CPU kernels.
* Fix warning from setting multiple MSVC warning level options. (#6917)
Fix warning from setting multiple MSVC warning level options. Replace an existing /Wn flag instead of always appending a new one.
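The replace-instead-of-append logic can be sketched in Python as a string operation on a flags variable. The real change lives in the CMake build scripts; the function name `set_msvc_warning_level` and the regular expression below are illustrative assumptions.

```python
import re

# Matches a standalone MSVC warning-level option /W0 .. /W4 (not /Wall or /WX).
_WARNING_LEVEL = re.compile(r"(?<!\S)/W[0-4](?!\S)")


def set_msvc_warning_level(flags, level):
    """Return `flags` with exactly one /W<level> option.

    An existing /Wn option is replaced in place; otherwise the new
    option is appended.
    """
    new_flag = f"/W{level}"
    if _WARNING_LEVEL.search(flags):
        return _WARNING_LEVEL.sub(new_flag, flags)
    return f"{flags} {new_flag}".strip()
```

For example, `set_msvc_warning_level("/DWIN32 /W3 /EHsc", 4)` yields `"/DWIN32 /W4 /EHsc"` rather than a string carrying both `/W3` and `/W4`.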
* MLAS: quantized GEMM update (#6916)
Various updates to the int8_t GEMMs:
1) Add ARM64 udot kernel to take advantage of dot product instructions available in newer cores. Some models run 4x faster than the stock implementation we used before.
2) Refactor the x64 kernels to share common code for AVX2(u8u8/u8s8/avxvnni) vs AVX512(u8u8/u8s8/avx512vnni) to reduce binary size.
3) Extend kernels to support per-column zero points for matrix B. This is not currently wired to an operator.
* Implement QLinearAveragePool with unit tests. (#6896)
Implement QLinearAveragePool with unit tests.
* Attention fusion detect num_heads and hidden_size automatically (#6920)
* fixed type to experimental session constructor (#6950)
* fixed type to experimental session constructor
Co-authored-by: David Medine <david.medine@brainproducts.com>
* Update onnxruntime_perf_test.exe to accept free dimension overrides (#6962)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
* Fix possible fd leak in NNAPI (#6966)
* Release buffers for prepacked tensors (#6820)
Unsolved problems:
1. One test failure was caused by a bug in Cudnn rnn kernels, when they can allocate a buffer and partially initialize it, the garbage data near tail of the buffer caused problem in some of the hardware. To attack this problem in a broader sense, should we add code in our allocators, and during a memory fuzzing test, fill an allocated buffer with garbage before returning to the caller?
2. Prepacking is used more widely than we know. For instance, Cudnn rnn kernels also cache their weights. They mix several weight tensors together into a single buffer, and never touch the original weight tensor anymore. This is the same idea with pre-pack, but they didn't override the virtual function, and they never tried to release those weight tensors, leading to memory waste. It also seems to me that there are some other kernels have similar behavior. Wonder how much memory we can save if we try to cleanup those too.
3. Turning off memory pattern planning does increase memory fragmentation, leading to out of memory error in some training test cases. Perhaps we can revisit the idea of pushing kernels-creation stage earlier, and then during initializer deserialization, we only avoid tracing those that will be prepacked.
* Enable type reduction for Range, ReverseSequence, ScatterND, Split, and Unique CPU kernels. (#6963)
* add CI
* fix test in ci
* fix flags for nsync in wasm build
* add copyright banner
* fix wasm source glob
* add missing exports
* resolve comments
* Perf gain by changing the packb width from 16 to 4 on GEMM for WASM.
Remove the no-longer-needed direct conv from previous perf tuning.
* fix buildbreak introduced from latest master merge
* fix buildbreak in mlasi.h
* resolve all comments except MLAS
* rewrite packb related 3 functions for WASM_SCALAR separately rather than using #ifdef in each.
and other changes according to PR feedback in mlas.
* More complete scalar path in sgemm from Tracy.
* Fix edge case handling in depthwise conv2d kernel 3x3. where:
*) support input W==1 and H==1
*) recalc in accurate pad_right and pad_bottom
*) support hidden pad_right == 2 or pad_bottom == 2 when W == 1 or H==1 and no pad left/top
* Add more test coverage for conv depthwise from Tracy.
Fix one typo according to PR.
* resolve comments
* replace typedef by using
* do not use throw in OrtRun()
* output error message
Co-authored-by: Sunghoon <35605090+hanbitmyths@users.noreply.github.com>
Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: David Medine <david.eric.medine@gmail.com>
Co-authored-by: David Medine <david.medine@brainproducts.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Ori Levari <orlevari@microsoft.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: Chen Fu <chenfucs@gmail.com>
Changes include:
* Revert Event Pool changes
* Add copyright and revert unrelated changes
* Add DLPack as submodule and remove to_dlpack and from_dlpack from public API
* Update golden numbers for DHP Parallel tests
* Update ORTTrainer unit test numbers
* Rollback to DLPack v0.3
* Disable flaky test
* Update third party notices and CG manifest file
* Minor refactoring of ORTValue API
* Added code for Relugrad with GPU support.
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Add GPU support for DNNL ConvGrad
Signed-off-by: George Nash <george.nash@intel.com>
* Add GPU support for DNNL MaxPoolGrad
Updates to MaxPool for training with GPU
Update oneDNN to version 1.8.1
Signed-off-by: George Nash <george.nash@intel.com>
* Fixed issues found during code review
- error in code comment
- using auto when the direct type would have been better
- removed ternary operators that were returning bool values
Signed-off-by: George Nash <george.nash@intel.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Add ReluGrad and ConvGrad ops for the dnnl provider
* the mnist sample is updated to add the --use_dnnl option that
will cause the sample to use the dnnl execution provider for
nodes that exist in dnnl provider.
* Added the ability to find forward ops. Dnnl backward gradient
ops require the forward primitive description and workspace
from the forward operation.
* Enable specifying the execution provider for Gradient Checker Tests
* Prevent memory leak when running dnnl_provider in training mode
Prevent creating a SubgraphPrimitivePool when the code is built with the
ENABLE_TRAINING build flag. Instead create a SubgraphPrimitive directly.
The SubgraphPrimitivePool was causing a pool of SubgraphPrimitives to be
stashed in a map for reuse. Due to the way the Training Loop uses threads,
the pool of SubgraphPrimitives was not being reused; instead a new pool of
SubgraphPrimitives was being created each run. The old pool was not instantly
freed. This behavior could be a language error when using thread_local
memory.
Signed-off-by: George Nash <george.nash@intel.com>
* Added fixes to maxpoolgrad and memory leak.
Maxpoolgrad will now pass all unit tests.
With the conv and convgrad disabled for dnnl, mnist is able to train till 95%
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Fixed misc issues when testing training code with dnnl provider
* fix conv_grad dnnl tests with dilation to run dnnl execution provider
* update mnist training sample to accept convolution type models
convolution models require the input shape to be {1, 28, 28}
instead of the flat {784} image that is used for the gemm models
this will enable models that require the different shape by adding
`--model_type conv` to the command line when running the mnist sample.
(while testing a workaround was used see #4762)
* Disable weight caching in dnnl conv operator when using training
When training we can not use cached weights because the weight
will be updated each run. This re-enables dnnl Conv and ConvGrad Ops.
The weight caching was the source of the error from Conv when training.
* Fix issues found when building grad ops on Linux
* The dnnl_convgrad code was overusing the scope operator,
causing a compilation problem.
* The dnnl_maxpoolgrad code had a logic error: it was
comparing with the source description when it should have
been comparing with the destination description.
* Update BUILD.md so it shows DNNL for training
* Updated the table of contents. Since the same providers
are listed twice, once for Inference and again for Training,
an HTML anchor was added to distinguish the second header
from the first for the TOC.
* Fix build failure when not using --enable-training build option
* reorganize the gradient operators so they are grouped together
* Fix issues found when running onnx_backend_test_series.py
* Pooling code only supports 2 outputs when built with --enable-training
* Address code review feedback
* class member variables end in underscore_
* use dst instead of dist to match pattern use elsewhere in DNNL code.
* Remove workaround that was introduced to handle problems running
convolution based training models. See issue #4762
Signed-off-by: George Nash <george.nash@intel.com>
* Isolate training code and code cleanup
* Do not build with dnnl_gpu_runtime if enable_training is set; the training code
does not support dnnl_gpu_runtime yet.
* Isolated training code inside ifdefs so that it won't affect
the project if built without training enabled
* Inadvertent changes in whitespace were removed to make code review simpler
* Undid some code reordering that was not needed
* Comments added to closing #endif statements to simplify reading complex ifdefs
* Modified the GetPrimitiveDesc functions to return shared_ptr instead of raw
pointer. This matches what was done in Pool code and is safer memory code.
Signed-off-by: George Nash <george.nash@intel.com>
* Address code review issues
- whitespace changes caused by running clang-format on the code
- Several spelling errors fixed
- Removed/changed some ifdefs to improve readability
- other misc. changes in response to code review.
Signed-off-by: George Nash <george.nash@intel.com>
* Code changes to address code review
- Simplify iteration code using `auto` keyword
- remove C style cast that was not needed
- remove instance variable that was not needed [relugrad.h]
- added the execution providers to `ComputeGradientErrorInternal()`
and `ComputeTheoreticalJacobianTranspose()` instead of using
a pointer to an instance variable [gradient_checker.h/.cc]
Signed-off-by: George Nash <george.nash@intel.com>
* Combined the default gradient ops test and dnnl gradient ops test for ConvGrad and MaxPoolGrad into one function with the help of a helper function.
This will reduce repeated code.
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Replaced the stack used by convgrad with a vector so that the vector (used as a stack) can be easily cleared every time the graph is created.
This will prevent a memory leak from convolution kernels being pushed constantly onto the stack.
Signed-off-by: chethan.palangotu.keshava@intel.com
* Code clean up and formatting updates
- Removed empty else statement
- updated indentation of code that was causing double curly brackets to look unusual
- Changed check for NumDimensions to Size in Relu and ReluGrad error checking code.
- isolated training code
Signed-off-by: George Nash <george.nash@intel.com>
* Restore inadvertently removed ConvGrad tests
When combining the DNNL and CPU versions of the ConvGrad
tests, two tests were inadvertently excluded. This adds
back the Conv3d and Conv3d with strides test cases.
Signed-off-by: George Nash <george.nash@intel.com>
* Add validation to ConvGrad
This validates the dimensions of the ConvGrad match the
passed in Convolution forward primitive description.
The current code for DNNL ConvGrad makes the assumption that the ConvGrad
nodes will be visited in the reverse order from the corresponding Conv nodes
The added validation will return an error if this assumption is not true.
Signed-off-by: George Nash <george.nash@intel.com>
* Do not create new execution providers in provider_test_utils
This removes the code that generated new execution providers in the
OpTester::Run function. This was added because the std::move was
leaving the `entry` value empty so subsequent calls would cause a
segfault.
Problem is this potentially changed the execution_provider because it
would create the default provider dropping any custom arguments.
When the now removed code was originally added the std::move was causing
crashes when the GradientChecker unit tests were run. However, it is no
longer causing problems even with the code removed.
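The moved-from state described above can be illustrated with a minimal sketch. It is not the OpTester code: `Provider`, `Session`, `RegisterAll` and `AllMovedOut` are hypothetical names chosen for the example. It only shows the standard guarantee that a `std::unique_ptr` is null after being moved from, which is why a second pass over the same container must not dereference its entries.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an execution provider.
struct Provider {
  std::string name;
};

// Hypothetical session that takes ownership of the providers it is given.
struct Session {
  std::vector<std::unique_ptr<Provider>> registered;
  void Register(std::unique_ptr<Provider> p) { registered.push_back(std::move(p)); }
};

// Moves every provider into the session. Afterwards each slot in
// `providers` holds a null pointer, so a later call skips it instead of
// dereferencing it.
inline void RegisterAll(Session& session,
                        std::vector<std::unique_ptr<Provider>>& providers) {
  for (auto& entry : providers) {
    if (entry == nullptr) continue;  // already moved out by an earlier call
    session.Register(std::move(entry));
  }
}

// Returns true when every slot has been emptied by a previous move.
inline bool AllMovedOut(const std::vector<std::unique_ptr<Provider>>& providers) {
  for (const auto& entry : providers) {
    if (entry != nullptr) return false;
  }
  return true;
}
```

Calling `RegisterAll` twice on the same vector registers each provider once; the second call finds only null entries.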
Signed-off-by: George Nash <george.nash@intel.com>
* Change the forward conv stack to a forward conv map
This changes how the forward conv kernel is mapped to the bwd ConvGrad
kernel; the problematic stack is no longer used.
The convolution stack made the assumption that the corresponding
ConvGrad operator would be visited in reverse order of the forward
Conv operators. This was always problematic and was unlikely to
work for inception models.
Important changes:
- The weight_name is added to the ConvGrad dnnl_node making it
possible to use the weight_name as a lookup key to find the
Conv forward Kernel
- the `std::vector fwd_conv_stack_` has been replaced with a
`std::map fwd_conv_kernel_map_`
- Although it is not needed, lock_guards were added when writing
to and reading from the fwd_conv_kernel_map_ as well as the
fwd_kernel_map_. These should always be accessed by a single
thread when preparing the dnnl subgraphs, so the guard should not
be needed, but it's added just in case.
- Updated the comments in the ConvGrad.h code to no longer mention the
stack. The error check is not removed. It will be good to verify
there are no errors as we continue to test against more models.
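A rough sketch of the lookup described above, assuming a map keyed by weight name and guarded by a mutex. `ConvKernel` and `FwdConvKernelMap` are hypothetical names for illustration, not the types used in the DNNL provider.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the forward convolution kernel.
struct ConvKernel {
  std::string weight_name;
};

// Pairs each forward Conv kernel with the ConvGrad that shares its weight
// tensor, independent of the order in which graph nodes are visited.
class FwdConvKernelMap {
 public:
  void Insert(const std::string& weight_name, std::shared_ptr<ConvKernel> kernel) {
    std::lock_guard<std::mutex> lock(mutex_);
    kernels_[weight_name] = std::move(kernel);
  }

  // Returns nullptr when no forward kernel was registered for the weight.
  std::shared_ptr<ConvKernel> Find(const std::string& weight_name) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = kernels_.find(weight_name);
    if (it == kernels_.end()) {
      return nullptr;
    }
    return it->second;
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, std::shared_ptr<ConvKernel>> kernels_;
};
```

Unlike a stack, the keyed lookup makes no assumption about visiting order, and a missing entry surfaces as a null result that the caller can report as an error.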
Signed-off-by: George Nash <george.nash@intel.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>
* assert sequence tensor and remove skips
* update testdata json
* use ONNX 1.8 in cgmanifest.json
* use previous commit to workaround
* update ONNX commit ID in docker
* skip test_maxpool_2d_dilations test for now
* update function name
* Remove nGraph Execution Provider
Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice
**Deprecation Notice**
| | |
| --- | --- |
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |
Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.
Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.
* Remove nGraph Licence info from ThirdPartyNotices.txt
* Use simple Test.Run() for tests without EP exclusions
To be consistent with rest of test code.
* Remove nGraph EP functions from Java code
Transitions from the ORT-only DML NuGet (hosted on the onnxruntime_public feed) to the new unified DirectML NuGet (Microsoft.AI.DirectML) on nuget.org. In addition, the Microsoft.AI.MachineLearning (WinML) and Microsoft.ML.OnnxRuntime.DirectML packages now take a dependency on the Microsoft.AI.DirectML package. This means we can remove the extra copy of DML binaries in these packages since they will be installed by the DML package.
* fix hash conflict
* Add verbose for engine deserialization and destroy old engine memory if new engine is generated
* update parser
* Update tensorrt_execution_provider.cc
* use a better hash algorithm
* Update tensorrt_execution_provider.cc
* Fix places where MinSizeRel wasn't having relevant flags added in the same way as Release and RelWithDebInfo
Enable LTO for minimal build. Clean up onnx_minimal.cmake to remove some things handled when LTO is enabled in CMakeLists.txt
* Only enable LTO for MSVC in a minimal build
* Add minimal build option to build.py
Group some of the build settings so binary size reduction options are all together
Make some cmake variable naming more consistent
Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used.
Add initial doco and ONNX to ORT model conversion script
Misc cleanups of minimal build breaks.
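To illustrate why a fixed algorithm is preferable to std::hash for values that must be stable across compilers and platforms, here is a sketch of the public-domain MurmurHash3 x86_32 algorithm. The function name `MurmurHash3_32` and the choice of the 32-bit variant are assumptions made for brevity; the variant actually used for kernel hashes is not stated here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

inline uint32_t RotL32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// MurmurHash3 x86_32: the result depends only on the bytes and the seed,
// not on the standard library implementation.
inline uint32_t MurmurHash3_32(const std::string& key, uint32_t seed = 0) {
  const uint32_t c1 = 0xcc9e2d51u;
  const uint32_t c2 = 0x1b873593u;
  const auto* data = reinterpret_cast<const unsigned char*>(key.data());
  const std::size_t len = key.size();
  const std::size_t nblocks = len / 4;
  uint32_t h = seed;

  // Body: consume four bytes at a time, assembled as little-endian words.
  for (std::size_t i = 0; i < nblocks; ++i) {
    const unsigned char* p = data + 4 * i;
    uint32_t k = static_cast<uint32_t>(p[0]) |
                 (static_cast<uint32_t>(p[1]) << 8) |
                 (static_cast<uint32_t>(p[2]) << 16) |
                 (static_cast<uint32_t>(p[3]) << 24);
    k *= c1;
    k = RotL32(k, 15);
    k *= c2;
    h ^= k;
    h = RotL32(h, 13);
    h = h * 5 + 0xe6546b64u;
  }

  // Tail: mix in the remaining 0 to 3 bytes.
  const unsigned char* tail = data + 4 * nblocks;
  const std::size_t rem = len & 3;
  uint32_t k = 0;
  if (rem >= 3) k ^= static_cast<uint32_t>(tail[2]) << 16;
  if (rem >= 2) k ^= static_cast<uint32_t>(tail[1]) << 8;
  if (rem >= 1) {
    k ^= static_cast<uint32_t>(tail[0]);
    k *= c1;
    k = RotL32(k, 15);
    k *= c2;
    h ^= k;
  }

  // Finalization: force all bits to avalanche.
  h ^= static_cast<uint32_t>(len);
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}
```

Because every step is fixed-width integer arithmetic, a hash computed when a model is converted on one machine can be compared with one computed at load time on another.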
* correct some errors in the flatbuffers schema, move flatbuffers submodule to cmake/external
* update the ort flatbuffers schema to use less namespace
* minor update
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
* update onnx to latest master
* implement per-channel for quantizelinear and dequantizelinear
* refine the unit test
* exclude sequence_insert tests
* refine onnx cmake
* add failure tests to broken_tests
* move qdq common code to a separate function
* refine code
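As a rough illustration of the per-channel QuantizeLinear/DequantizeLinear item above: the ONNX definition is y = saturate(round(x / scale) + zero_point), with one scale and zero point per slice along the chosen axis. The sketch below assumes a row-major [channels, inner] tensor, axis 0, and uint8 output; the function names are hypothetical and this is not the kernel code from this change.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantizes a row-major [channels, inner] float tensor to uint8 along axis 0.
// Assumes scale and zero_point have one entry per channel.
inline std::vector<uint8_t> QuantizePerChannel(const std::vector<float>& x,
                                               const std::vector<float>& scale,
                                               const std::vector<uint8_t>& zero_point) {
  std::vector<uint8_t> y(x.size());
  if (scale.empty()) return y;
  const std::size_t inner = x.size() / scale.size();
  for (std::size_t c = 0; c < scale.size(); ++c) {
    for (std::size_t i = 0; i < inner; ++i) {
      const std::size_t idx = c * inner + i;
      // nearbyint rounds half to even under the default rounding mode.
      const float q = std::nearbyint(x[idx] / scale[c]) +
                      static_cast<float>(zero_point[c]);
      // Saturate to the uint8 range before converting.
      y[idx] = static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, q)));
    }
  }
  return y;
}

// Inverse mapping: x = (y - zero_point[c]) * scale[c].
inline std::vector<float> DequantizePerChannel(const std::vector<uint8_t>& y,
                                               const std::vector<float>& scale,
                                               const std::vector<uint8_t>& zero_point) {
  std::vector<float> x(y.size());
  if (scale.empty()) return x;
  const std::size_t inner = y.size() / scale.size();
  for (std::size_t c = 0; c < scale.size(); ++c) {
    for (std::size_t i = 0; i < inner; ++i) {
      const std::size_t idx = c * inner + i;
      x[idx] = (static_cast<float>(y[idx]) - static_cast<float>(zero_point[c])) *
               scale[c];
    }
  }
  return x;
}
```

The per-tensor case is the special case with a single channel.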
1. Publish the image to ACR, instead of building it every time for every PR
2. Make USE_MKLML and USE_OPENMP able to co-exist. Currently both of them are enabled in our Linux CI build, but only one of them actually takes effect.
3. Split nuphar and DNNL into separate pipelines.
4. Fix two warnings in onnxruntime/core/optimizer/matmul_scale_fusion.cc and onnxruntime/test/tvm/tvm_basic_test.cc.
5. Update the manylinux2010_x86_64 image to the latest.
* bump onnx to support bfloat16
* sign test code
* fix ut failures
* add bfloat type in gradient schema
* add bfloat16 to gathernd
* add bfloat16 into grad op defs
* temp disable gpu fusing transformers
* bfloat16 support fix
* more fix to bfloat
* bug fix
* add bfloat16 to transpose matmul
* fix sce loss
* fix cast opset13 and other missing part of bfloat16
* Revert "temp disable gpu fusing transformers"
This reverts commit b627bc9019.
* add SCEloss back
* fix build break
* fix gpu failure due to missing kernel in opset13
* add tile opset 13 kernel
* Revert "fix gpu failure due to missing kernel in opset13"
This reverts commit 661d63d0599029757f240d29afd64b197b76b880.
* fix comments in pr
* fix cuda break due to opset13
* fix missing msdomain
* add nll loss tests into android build's broken list; disable bfloat16 cast tests due to the wrong type saved in onnx test data, will fix it in onnx first
Co-authored-by: Cheng Tang <chenta@microsoft.com>
* Add protobuf mutator library as a git submodule
* Added files and instructions to build the protobuf mutator library in CMake
* Added fuzzing flag to build system and added fuzzing dependency library. To run fuzzing test use the flags --fuzz_testing --build_shared_lib --use_full_protobuf --cmake_generator 'Visual Studio 16 2019'
* Added src files and build instructions for the main fuzzing engine
* Removed Random number generation test from inside the engine
* Added license header to files
* Removed all pep8 violations introduced by this change and other E501 violations
* Merged PR 4616739: Update QLinear Ops fix 1D support layout
Update QLinear Ops fix 1D support layout
Related work items: #26011523
* Merged PR 4617257: Gather operator DML EP fails with scalar indices and 1D inputs
Fix gather with scalar value.
The ONNX conformance test case is in another PR:
// 0D, axis 1, rank 0 indices tensor
{
"op_type": "Gather",
"axis": 0,
"data": [1,2,3],
"indices": 0,
"output": 1,
"T": "float32"
}
* Merged PR 4632178: Re-enable ORT onnx_test_runner test case (DirectML ConvTranspose validation needs to be loosened to comply with ONNX definition of output_padding)
Re-enable 1D convolution tests.
Related work items: #23499747
* Merged PR 4656672: Make DML EP use Direct queue
While a Compute queue has benefits, Direct is consistent with Winml.
Related work items: #26324112
* Update DML nuget version
* Merged PR 4662079: Update DmlDev branch again from github master
Include Sheil's changes to fix namespace and header file include paths. Without this, the ONNX conformance tests all fail with E_NOTIMPL.
* Increment DML nuget version
Co-authored-by: Nick Feeney <nickfe@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
* Added FP16 transformations
* Revert "Added CMAKE_BUILD_TYPE to make building dynamic"
This reverts commit d3e17af1af655cfdc4d2fec33f52055caa525e85.
* Added FP16 transformations for FP16 builds
* Backend logic cleanup
Cleans the backend(intel_graph.*) code in the following ways:-
1. Minimize global usage: Since all the IR graphs need to be
re-generated on every Infer, it is bad practice to rely on globals
for their saving and usage as there would be multiple readers and
writers to the same global variable leading to incorrect usages or
contentions. This change replaces globals with locals where possible.
This change also fixes an existing bug caused by
incorrect global usage.
2. Remove all unused functions.
3. Remove all unused headers and preprocessor directives.
* removed commented out code
* Disabled default optimization for Intel EP
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fix missed plugins.xml for python bindings
* Fixed the build after latest master changes
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled unsupported ops for accelerators
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added some more disabled ops
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added environment variable to enable debugging
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added more debug statements
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fixed unsupported ops list for GPU and VPU
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Fixed unsqueeze unit tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Added error message to the status
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Overwrite Model proto with shape info from data
Overwrites the shape info of Model proto with the shape from
actual input data. Needed for inferring models with Dynamic
shapes.
* Removed print statement and disabled where op
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled Reshape with Empty initializer
* Added more debug statements for 1P
* Don't allow 1D inputs with symbol for dimension
* Disabled some 3rd phase ops
* Disabled split and added zero dimension check for OutputDefs
* Cleanup zero dimensionality check
* Added different data type check for inputs and initializers
* Added conditions for Mod, Cast and Pad
* Removed unused variable
* Disabled scan and added conditions for squeeze
* Added changes for fixing all C++ unit tests
* Implements Backend Manager class for caching
Backend Manager provides a layer of indirection between EP interface
and OV backend that provides caching services for models with
symbolic dims in input shapes.
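One way such a caching layer could look, sketched under the assumption that backends are keyed by the concrete input shape seen at inference time. `Backend`, `BackendCache` and the factory callback are hypothetical names, not the actual Backend Manager interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a backend compiled for one concrete input shape.
struct Backend {
  std::vector<int64_t> input_shape;
};

// Builds a backend the first time a concrete shape is seen and reuses it
// for every later request with the same shape.
class BackendCache {
 public:
  using Factory = std::function<std::shared_ptr<Backend>(const std::vector<int64_t>&)>;

  explicit BackendCache(Factory factory) : factory_(std::move(factory)) {}

  std::shared_ptr<Backend> GetOrCreate(const std::vector<int64_t>& shape) {
    auto it = cache_.find(shape);
    if (it == cache_.end()) {
      it = cache_.emplace(shape, factory_(shape)).first;
    }
    return it->second;
  }

  std::size_t Size() const { return cache_.size(); }

 private:
  Factory factory_;
  std::map<std::vector<int64_t>, std::shared_ptr<Backend>> cache_;
};
```

With this shape-keyed design, a model whose batch dimension is symbolic pays the build cost once per distinct batch size rather than on every inference call.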
* clean up commented blocks
* clang-formatting
* Read I/O type info from ModelProto
Read the tensor element type information from ModelProto object,
as FusedNode is no longer available.
* code cleanup
* clang-formatting
* Added print statement for jenkins
* Disabled some python tests
* Changed the path of convert fp32 to fp16 hpp
* Added conditions for BatchNorm in GetCapability
* Fixed failed tests
* Revert "Added conditions for BatchNorm in GetCapability"
This reverts commit c3c28c3b00d27892c42546b35dacdd807a48ee90.
* Added Intel to onnxruntime backends
* pick up vars set by OV package setupvars.sh
* Added conditions for Identity
* remove a few cout prints
* Added conditions for GPU_FP32 unit tests
* Revert "pick up vars set by OV package setupvars.sh"
This reverts commit 8199e029c03eae21a1a7ef6bfdc93d00e5d0198b.
* Commented out fatal message for protobuf
* Might need to be removed
* Add interface class for current backend
* moved common logic to base class
* simplified cpu backend
* Removed unused headers
* use vectors to save i/o tensors for windows compatibility
* move utils fxns to backend_utils namespace
* rename ov_backend to ibackend
* Factory pattern for backend creation
* rename CPU backend to Basic backend
* renamed to vad-M and added to factory list
* Added conditions for VPU
* Added print statements
* Changed the logic for checking for symbolic shapes
* Modified logic for zero dimension check
* Removed VPU single dimension condition
* Removed comments
* Modified logic in DimensionCheck method
* Remove legacy OpenVINO EP
Remove all the legacy code for OpenVINO EP. UEP code will take its
place going forward.
This change does NOT remove OVEP files in the following areas as
they will be reused by UEP:-
1. Documentation: All .md files
2. Docker related files
3. Python bindings
4. Java bindings
5. C# bindings
6. ORT Server
7. CI pipeline setup files
* Rename Intel EP to OpenVINO EP
* Added unique names to the subgraphs
* Removed subgraphs with only constant inputs
* Modified subgraph partitioning algorithm to remove const input subgraphs
* Apply suggestion to onnxruntime/core/providers/openvino/openvino_execution_provider.cc
* Tracking output names to fix the output order bug
* Changed output names to an unordered map
* Modified logic to check for symbolic input shapes
* Fixed a bug in Reshape check
* Added empty model path to Model constructor
* Made necessary changes to cmake to build from the binary package
* Changed INTEL_CVSDK_DIR to INTEL_OPENVINO_DIR
* Enable dyn device selection with C++ API
* Added Round operator to unsupported list
* Modified subgraph partition logic for MYRIAD
* Removed supported ops from the list
* Enable dyn dev selection in Py API's
* Add documentation for dynamic device selection
* Use MYRIAD || HDDL instead of VPU
* Removed temporary cast of Int64 to FP32
* Disabled unit Tests for CPU_FP32 and GPU_FP32
* Removed default "CPU" from unit tests to allow overriding
* Removed ops Concat, Squeeze, Unsqueeze from unsupported list
* Get the device id from info
* Removed overwriting device_id and precision
* Enabled ConvTranspose and EyeLike
* Reordered unsupported ops in alphabetical order
* Fixed syntax error
* Fixed syntax error
* Code clean-up: Handle exceptions, logs and formatting
Code formatted according to ORT coding guidelines.
* remove debug print from pybind code
* updated docs with ops and models
* formatting prints
* Added default values for c and j for openvino
* Overriding the values set for c and j to be 1
* BACKEND_OPENVINO should be empty if openvino is not in build
* Overriding c value with default for perftest
* fix VAD-M device string bug
* Add IE error details to exceptions
* Use IE specific device names in EP
* Add VAD-F (FPGA) device support
* Removed unnecessary libraries from whl package
* Code changes for Windows compatibility
* Add VAD-F option to python API
* [revert before merge] cmake changes for RC
* Enable Windows build in CMake
* Unset macro OPTIONAL for windows builds
inference_engine.hpp's include chain defines a macro 'OPTIONAL'
which conflicts with the onnx project's headers when using MSVC, so
it needs to be explicitly unset for MSVC.
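The workaround pattern can be sketched as follows. The `#define` at the top only simulates a header in the include chain that defines the macro, and `ParameterOption` is a hypothetical declaration standing in for code that uses OPTIONAL as an ordinary identifier. The sketch also omits the MSVC-only guard so that it compiles everywhere.

```cpp
// Simulates a header earlier in the include chain that defines OPTIONAL
// as a macro (assumption for illustration).
#define OPTIONAL

// Workaround: drop the macro before any code that uses OPTIONAL as a name.
#ifdef OPTIONAL
#undef OPTIONAL
#endif

// Hypothetical declaration that would not compile while the macro is defined.
enum class ParameterOption { Single, OPTIONAL, Variadic };

inline bool IsOptional(ParameterOption option) {
  return option == ParameterOption::OPTIONAL;
}
```

Without the `#undef`, the preprocessor would replace the enumerator name with nothing and the enum declaration would fail to parse.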
* Use a single copy of plugin/IE::Core
Defined as a static member in Backend manager
* Remove restriction of single subgraphs for myriad
* Passed subgraph name to Backend to enhance log statements
* Disabled zero dimension conditions
* Disabled concat to remove zero dims
* Enabled building ngraph as part of ORT
* Removed serializing and added versioning
* Fix CPU_FP32 unit tests
* Removed unnecessary condition
* add ngraph.so.0.0 to .whl
* Check for zero dimensions only for inputs and outputs
* Restrict loading only 10 subgraphs on myriad
* Build ngraph.dll within UEP. Doesn't link yet
* Rename Linux included libngraph.so to libovep_ngraph.so
Renames locally built libngraph.so containing ONNX importer to
libovep_ngraph.so in order to avoid linkage conflicts with
libngraph.so supplied by OpenVINO binary installer.
Applies only for Linux builds.
* use output_name cmake properties for lib name
* fix .so name format in lib_name.patch
* CMake code cleanup
* Rename WIN32 included ngraph.dll to ovep_ngraph.dll
To avoid conflict with ngraph.dll distributed by openvino.
* Added myriad config for networks without 4 dimensions
* Loading the 10 max clusters for inference on myriad
* Refactor code and add Batching support
Encapsulate subgraph settings into context structs.
Add batching support for completely supported models.
* Disabled some broken tests
* use input_indexes to avoid batch-checking initializers
* Avoid static initialization order error on WOS
* Added candy to broken tests
* InternalCI changes for 2020.2
* Updated DLDT instructions
* Unsaved changes in install_openvino.sh
* Changes after manual check
* Remove custom ngraph onnx_import build for WOS
ONNX Importer on WOS does not have protobuf issue.
* Remove FP32ToFP16 ngraph pass
This conversion is performed implicitly within IE.
* Surround debug logic by #ifndef NDEBUG
* remove invalid TODO comments
* removed references to ngraph-ep
* clang-formatting
* remove commented code
* comment edits
* updating copyright year to that of first OpenVINO-EP release
* remove redundant log msg
* Modified operator and topology support
* Update build instructions
* doc formatting
* Fixed clip unit tests
* Revert "Remove FP32ToFP16 ngraph pass"
This reverts commit ec962ca5f315a5658ad980e740196f19de2639c1.
* Applying FP16 transformation only for GPU FP16
* Fixed GPU FP32 python tests
* automatically use full protobuf
* disable onnxrt server for now
* Disabled upsample
* update dockerfile instructions
* Removed MO paths and added ngraph path
* Remove OVEP from ORT Server docs
Will put it back in after validation
* Updated path to Ngraph lib
* Disabled Resize and some other python tests
* Removed unnecessary header files
* Use commit SHA to fetch ngraph repo
* Avoid un-needed file changes due to version update
* Fixed clip tests
* Fixed Pow, max and min onnx tests
* build.md doc typo
* Update cmake patch command for ngraph src
* remove dead cmake code for onnxruntime_USE_OPENVINO_BINARY
* use spaces instead of tab
* remove commented code
* Add info about protobuf version
* edit debug env var and enable for WIN32
* specify only version tag of 2020.2 for dockerbuilds
* remove unnecessary file changes
* Pass empty string as default argument to C# tests
* Use ${OPENVINO_VERSION} to name openvino install directory in CI builds
* Enabled unnecessarily disabled tests
* Fixed ngraph protobuf patch
* Fixed error in protobuf patch
* Revert "Use ${OPENVINO_VERSION} to name openvino install directory in CI builds"
This reverts commit 89e72adb8bf3b9712f5c81c5e13fe68c6c0df002.
* Remove unsetting OPTIONAL macro
This is no longer used in recent ONNX update onnx/onnx@da13be2,
so this unset workaround is no longer necessary.
* Use a null string default argument for C# API
* Set OpenVINO version yml files and pass to CI Docker builds
Git Tag info for DLDT as well as install directory are set
using this value.
This reverts commit 9fa9c20348ed72ae360a95c98e9b074d2f9fafc5.
* Documentation: recommendation and instructions for disabling ORT graph optimizations
* more doc updates
* Reduced the number of models according to CI time constraints
Co-authored-by: ynimmaga <yamini.nimmagadda@intel.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Co-authored-by: Mikhail Treskin <mikhail.treskin@intel.com>
Co-authored-by: mbencer <mateusz.bencer@intel.com>
Co-authored-by: Aravind <aravindx.gunda@intel.com>
Co-authored-by: suryasidd <48925384+suryasidd@users.noreply.github.com>
* update onnx-tensorrt submodule
* add more model dumping point
* update trt kernel name and docker readme file
* fix minor issues
* fix format issue
* update onnx-tensorrt submodule
Co-authored-by: stevenlix <stevenlix>
* checkin
* fix MSVC build error
* test changes
* split pivot output into multiple tensors
* add horizon tensor
* Support multiple types for non-pivot tensor
* limit horizon tensor type to int32_t as max_horizon type
* work around some conversion warnings for local machine
* support variadic shape for non-pivot input
* dropping all rows is an exception
* fix a bug
* fix the way that generates horizon tensor
* more tests added
* add TypeConstraint() in ONNX_OPERATOR_KERNEL_EX
* update Featurizers library
1. Fix static analysis warnings found by VC++
2. Add a new pipeline for static analysis
3. Merge all the Windows CI builds into one single yaml file (easier to queue them all).
4. Make DNNL build faster by disabling building the tests and examples.
5. Enable custom op unit test.
Advance ONNX commit to pickup the latest ArgMax, ArgMin,
ReduceMax/ReduceMin, MaxPool
Declare new versions for CPU/CUDA.
Implement infrastructure support for int8/uint8.
Adjust GatherOp test for a new error.
Adjust Scan9.BadShape test.
Add exclusions for index out of bounds checks.
Rework result verification for SVDTransformer.
* Fix WCOS/Win32 linking bugs
* Remove unused NODEFAULTLIB flags
* Avoid plain target_link_libraries signature
* Avoid plain target_link_libraries signature
* Fix library list escaping
* Use library list instead of string
* Remove duplicate link to windowsapp.lib
* Remove Win32 build workarounds
* Specify CMake policies before initializing language
* Expose Win32 header definitions during build
* Force set API family
* Enable Win32 APIs in featurizer
* Use MT dynamic CRT
* Expose Win32 specific functions
* Disable app container globally
* Disable default wide functions in featurizers
* Add featurizers to test include path
* Workaround https://gitlab.kitware.com/cmake/cmake/issues/19428
* Revert pipeline debugging hacks
* Skip /FI in CUDA sources
* Default to Win32 builds
* Enable WCOS when using WinML
* Use generator expression to apply CMAKE_MSVC_RUNTIME_LIBRARY to C++ only
We want to implement SoftmaxCrossentropy and NegativeLogLikelihoodLoss forward training ops for opset-12, but that requires the ONNX submodule to point to the latest commit to have the latest and greatest ONNX spec!
- Reverse integrate changes from *.in.proto files in github ONNX repo.
- Regenerate csharp/test/Microsoft.ML.OnnxRuntime.Tests/OnnxMl.cs
- Disable ONNX tests that don't have op implementation for the latest opset.
* Switch to CUDA10.2
* Update win-gpu-tensorrt-ci-pipeline.yml
* Update win-gpu-tensorrt-ci-pipeline.yml
* remove dynamic_shape
* update onnx-tensorrt submodule
* check if input shape is specified for TensorRT subgraph input and enable some TensorRT unit tests
* fix format issue
* add shape inference instruction for TensorRT
* update according to the reviews
* Update win-gpu-tensorrt-ci-pipeline.yml
* port the mimalloc allocator
* hook mimalloc opt into common.h and reduction ops
* repurpose USE_MIMALLOC to only denote subbing in of default allocator with mimalloc and some refactoring
* fix unintended cherry pick diffs
* polish allocator_mimalloc
* explicitly disable mimalloc where it already had been disabled
* update mimalloc to pull in stl allocator
* switch mimalloc stl allocator to use mimalloc library version
* turn mimalloc on by default (only the stl changes are enabled, the python interacting ones are off already and shall remain so)
* move FastAllocVector into cpu specific code
* separate out defines into arena and stl changes
* the rest of the define renames
* bfc arena allocator
* some typos and rename the bfc arena allocator to fit existing class naming conventions
* adjustments in response to comments
* different template instantiations are friends