This patch optimizes Resize when the input X is a 4D int8/uint8 tensor
and the mode is linear by:
* Transforming NCHW Resize to NHWC variant
* Using the NHWC Resize kernel without floating-point computation
It improves DeepLab V3 with uint8 quantization by 19% on x64. It also improves
Resize of DeepLab V3 with int8 quantization by 15% to 18% on x64.
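As a rough illustration of what linear Resize "without floating-point computation" can look like, here is a minimal 1-D sketch. It is not the kernel from this patch: the function name, the 10-bit fixed-point weights, and the asymmetric coordinate mapping are all assumptions made for the example.

```python
def resize_linear_1d_uint8(row, out_len, frac_bits=10):
    """Resize a 1-D row of uint8 values with integer-only linear interpolation.

    Hypothetical helper for illustration; weights are fixed-point integers
    with `frac_bits` fractional bits.
    """
    in_len = len(row)
    one = 1 << frac_bits
    out = []
    for i in range(out_len):
        # Asymmetric mapping (assumed): source position = i * in_len / out_len,
        # kept as a fixed-point integer.
        pos = (i * in_len * one) // out_len
        x0 = min(pos >> frac_bits, in_len - 1)
        x1 = min(x0 + 1, in_len - 1)
        w1 = pos & (one - 1)   # fractional part = weight of the right neighbour
        w0 = one - w1          # weight of the left neighbour
        # Integer weighted sum, rounded to nearest, then scaled back down.
        out.append((row[x0] * w0 + row[x1] * w1 + (one >> 1)) >> frac_bits)
    return out


print(resize_linear_1d_uint8([0, 100], 4))  # [0, 50, 100, 100]
```

In an NHWC layout the channel values of one pixel are contiguous, so the same pair of weights can be reused across all channels of that pixel, which is one plausible reason the NHWC variant is cheaper.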
* Add warning about future computation change for ConvTranspose with auto_pad
* improve msg
* update TODO to make lint happy
* update more contents for warning and add if
* VALID was not affected
* move it into kernel registration
* parse auto_pad myself
* try to use conv_transpose_attrs_.auto_pad directly
* implement infrastructure for the handshake mechanism; SHA-256 is selected as the first hash algorithm
* check hash during compile in TVMso EP
* add IPP-CRYPTO to external dependencies for TVM EP
* made checkHash method constant
* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict
* implemented SHA-256 calculation using ipp-crypto library
* fix dependency for ipp-crypto
* add provider options for hash check
* update documentation for added provider options
* add hash check condition
* fix docs
* fix lint
* fix ORT_THROW
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
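The hash check described above can be sketched as follows. This is only an illustration of the idea: the change itself computes SHA-256 with the ipp-crypto library inside the TVM EP, whereas this sketch uses Python's standard `hashlib`, and the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def check_hash(data: bytes, expected_hex: str) -> bool:
    """Return True if `data` hashes to `expected_hex` (hypothetical helper).

    hmac.compare_digest is used so the comparison time does not depend on
    where the two strings first differ.
    """
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


model_bytes = b"abc"  # stand-in for the compiled model contents
expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
print(check_hash(model_bytes, expected))  # True
```

A caller would typically refuse to load the model when `check_hash` returns False; whether the check runs at all is controlled by the provider options mentioned above.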
(1) add --run_shape_inference to make shape inference optional
(2) add --vocab_mask to make the input optional
(3) add --overwrite in gpt2 convert_to_onnx to allow overwriting an existing raw ONNX model exported from PyTorch
(4) save gpt2 model tensors to one external data file by default
(5) group convert_beam_search arguments to multiple groups
(6) make --decoder_onnx optional for gpt2 model
(7) replace print by logger
(8) update shape inference function to support external data.
(9) when saving external data, show warning if onnx version < 1.12
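A minimal sketch of how optional flags and argument groups like these can be laid out with `argparse` and `logging`. Only the flag names come from the list above; the group titles, flag types, defaults, and help strings are assumptions, and this is not the actual convert_beam_search script.

```python
import argparse
import logging

logger = logging.getLogger(__name__)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of grouped conversion options")

    # Group titles are hypothetical; they only affect how --help is organized.
    model_group = parser.add_argument_group("Model options")
    model_group.add_argument("--decoder_onnx", required=False, default="",
                             help="Optional path to an existing decoder ONNX model")
    model_group.add_argument("--overwrite", action="store_true",
                             help="Overwrite an existing exported ONNX model")

    search_group = parser.add_argument_group("Beam search options")
    search_group.add_argument("--vocab_mask", action="store_true",
                              help="Add vocab_mask as a model input")
    search_group.add_argument("--run_shape_inference", action="store_true",
                              help="Run shape inference on the result")
    return parser


def main(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Use the logger rather than print so callers can control verbosity.
    logger.info("Parsed arguments: %s", args)
    return args


args = main(["--overwrite"])
```

Every flag defaults to "off", so existing invocations that pass none of them keep working.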
* Pad fallback to CPU
* Added queryPad in operatorRegistration.cpp
* Acknowledged PR comments
* Used any_of
* used none_of instead of any_of
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
* create op from ep
* read input count from context
* create holder to host nodes
* fix typo
* cast type before comparison
* throw error on API fail
* silence warning from minimal build
* switch to unique_ptr with deleter to host nodes
* fix typo
* fix build err for minimal
* fix build err for minimal
* add UT for conv
* enable test on CUDA
* add comment
* fix typo
* use gsl::span and string view for Node constructor
* Added two APIs - CopyKernelInfo and ReleaseKernelInfo
* pass gsl::span by value
* switch to span<NodeArg* const> to allow for reference to const containers
* fix typo
* fix reduced build err
* fix reduced build err
* refactoring node construction logic
* rename exceptions
* add input and output count as arguments for op creation
* refactor static member
* use ORT_CATCH instead of catch
* cancel try catch
* add static value name map
* format input definition and set err code
* fix comments
* fix typo
Improve performance of BiasGelu on OneDNN execution provider
This modifies how BiasGelu is handled by the OneDNN execution provider
by executing the gelu_erf primitive as a postop of the binary_add primitive.
Also fixes extra data copies made when running on GPU.
Signed-off-by: George Nash <george.nash@intel.com>
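For reference, the fused computation (a binary add followed by an erf-based GELU) can be written as a small scalar sketch. This is plain Python for illustration, not OneDNN code; the function names are hypothetical and the bias is assumed to broadcast over the last dimension.

```python
import math


def gelu_erf(x: float) -> float:
    """Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def bias_gelu(values, bias):
    """Add `bias` (broadcast over a flat list, assumed layout) and apply GELU.

    Doing both steps per element mirrors the idea of running GELU as a
    post-op of the add, so the intermediate sum is never stored separately.
    """
    n = len(bias)
    return [gelu_erf(v + bias[i % n]) for i, v in enumerate(values)]


out = bias_gelu([0.0, 0.5, -1.0, 0.0], [0.0, 0.5])
```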
* Register signal ops for op set 17
Note code is mostly being moved, not added. These ops were previously
only registered as Microsoft contrib ops and only built if
`BUILD_MS_EXPERIMENTAL_OPS=1`. They've been added to the ai.onnx
standard op set in version 17.
Main components of this change:
* Move the kernels from the contrib_ops directory to the
core directory.
* Add function bodies for ms experimental ops. This will allow
old models that use the contrib ops to continue to function.
All the function bodies consist of a single op (the
new standard op), so performance overhead should be minimal.
Minor clean-up also in this change:
* De-duplicate get_scalar_value_from_tensor: put it in a new utils.h.
* Fix some bugs that caused compilation errors with the experimental
ops. Tested with `build.sh --ms_experimental`
* Fix some spelling errors and lint violations.
* Replace a couple of switch statements with `MLTypeCallDispatcher`.
* Use `InlineVector` instead of `std::vector`.
Unblocks https://github.com/microsoft/onnxruntime/issues/11640
* Using vectorized loads (float2) for fp16 to improve performance
* Fix a few warnings from cpplint
* Fix a few warnings from cpplint
* Use __float2half2_rn and fix some cpplint warnings
* Move some computations to LaunchFastGeluKernel
* Fix some Lint C++ warning
* Using vectorized loads (float4) for fp16 to improve performance
* Switch whether to optimize FastGelu with float4 vectorization
* Switch to float4 memory access based on input_length in FastGelu
* Comment how to set the threshold of float2 and float4 vectorized kernels
* Add FastGelu fp16 unit tests for bias_length = 2 and 8
* Make vectorized kernels generic with aligned_vector
* Unify the vectorized kernels with/without bias
* Refactor the code to suppress cpplint warnings
* Solve formatting issues
* Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
* Move fast_gelu_impl.h to rocm/bert
* Fix some Lint C++ warnings and code alignment
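The vectorization idea in the commits above can be sketched in plain Python. FastGelu here is the tanh approximation of GELU; the width-selection rule (largest candidate width that divides both lengths) and all function names are assumptions for illustration, not the thresholds or kernels used in the actual code.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def fast_gelu(x: float) -> float:
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(SQRT_2_OVER_PI * (x + 0.044715 * x ** 3)))


def choose_vector_width(input_length: int, bias_length: int, candidates=(8, 4, 2)) -> int:
    """Pick the widest chunk that evenly divides both lengths (hypothetical rule)."""
    for width in candidates:
        if input_length % width == 0 and bias_length % width == 0:
            return width
    return 1  # scalar fallback


def fast_gelu_with_bias(values, bias):
    """Apply bias + FastGelu in chunks, mimicking vectorized loads.

    `bias` must be non-empty and is broadcast over the flat `values` list.
    """
    n = len(bias)
    width = choose_vector_width(len(values), n)
    out = []
    for start in range(0, len(values), width):
        chunk = values[start:start + width]  # one "vector load"
        out.extend(fast_gelu(v + bias[(start + j) % n]) for j, v in enumerate(chunk))
    return out


print(choose_vector_width(16, 8), choose_vector_width(6, 2), choose_vector_width(5, 1))  # 8 2 1
```

Requiring the width to divide the bias length keeps every chunk inside a single bias row, which is likely why bias lengths such as 2 and 8 make useful test cases.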
* fix mpi build for gcc8 or higher
* fix memory profile for partial graph run
* Revert "fix mpi build for gcc8 or higher"
This reverts commit fb60beb05402cd380597a12fc25880c0c8652ed4.
* remove debug code
* fix build
* fix build
* fix cpplint and python black format
* Eager mode ArgMax support.
* Fix basic max and min functionality with a minor generator update. Note this does not address the full max and min API scope.
* Add addmm test.
* Setting default version values for ovep dlls as well
* Update backend_manager.cc
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: mohsin <mohsinx.mohammad@intel.com>
* Add .NET 6 support to the C# NuGet package.
Currently requires jumping through a lot of hoops because .NET 6 is only supported in the preview release of VS 2022.
Build existing targets using msbuild.
Add .net6 targets and build using dotnet.
Create nuget package with combined targets.
A few misc automated changes from VS to spacing and adding a couple of properties.
* Try manually installing trt8.4 in multi-gpu pipeline
* Remove stmts that clean up cmake, ctest. Update tensorrt repository name passed to get_docker_image.py
* Update trt and cudnn home
* Don't install trtexec cli tool.
* Increase job timeout
* Revert timeout change and use trt placeholder builder build option