* refactor test for model with undefined shapes
* add test for TVMso EP
* update build script for TVM EP tests
* fix pylint
* disable test for Windows
* fix black
* fix python format
* fix pylint
* fix python format
* replace Path.resolve with os.path.join
* fix python path issue
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
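For context on the `Path.resolve` to `os.path.join` change listed above: the log does not state the motivation, so the following only illustrates how the two standard-library calls differ. `os.path.join` concatenates components without touching the filesystem, while `Path.resolve` makes the path absolute and collapses `..` components and symlinks. The helper names and the `models`/`data` directory names are hypothetical.

```python
import os
from pathlib import Path


def joined_path(base: str, *parts: str) -> str:
    """Join components textually; no filesystem access, '..' is kept as-is."""
    return os.path.join(base, *parts)


def resolved_path(base: str, *parts: str) -> str:
    """Build an absolute path with '..' and symlinks collapsed."""
    return str(Path(base, *parts).resolve())


if __name__ == "__main__":
    # Hypothetical directory names, used only for illustration.
    print(joined_path("models", "..", "data"))
    print(resolved_path("models", "..", "data"))
```

The joined result stays relative to whatever directory the caller is in, whereas the resolved result depends on the current working directory and on any symlinks along the way.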
Fix a bug where the onnxruntime_USE_NCCL flag defaulted to ON, causing ORT to not build properly. New behavior: the flag is ON when training is enabled and NCCL is not disabled, and OFF otherwise.
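The new default amounts to a two-input boolean rule. A minimal sketch in Python, where `training_enabled` and `nccl_disabled` are hypothetical names standing in for the actual build options (the real logic lives in the build scripts, not in this function):

```python
def default_use_nccl(training_enabled: bool, nccl_disabled: bool) -> bool:
    """Return the default for the NCCL flag as described in the commit message.

    ON only when training is enabled and NCCL has not been disabled;
    OFF in every other case.
    """
    return training_enabled and not nccl_disabled


if __name__ == "__main__":
    # Enumerate all four combinations of the two (hypothetical) inputs.
    for training in (False, True):
        for disabled in (False, True):
            state = "ON" if default_use_nccl(training, disabled) else "OFF"
            print(f"training={training}, nccl_disabled={disabled} -> {state}")
```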
* [UPDATE] update ci to rocm5.2 + torch1.11
* [Revert] disable ort module test
* [DELETE] delete Rocm5.1.1 ci test result
* [UPDATE] update the comments
* add description of building ORT+TVM EP on Windows
* fix cmake error related to symlink creation on Windows
* add llvm config path to build flags for correct build on Windows
* update TVM_EP.md for llvm_config build arg
* fix warnings skipping during build on Windows
* fix using string or wstring for model path to correct build on Windows (MSVC error)
* fix error in custom logger for correct build on Windows
* implement glob algorithm for Windows
* additional build fixes
* update TVM with export of VM symbols for dll
* description of nasm issue and workaround
* update TVM with export of Executable from VM symbols for dll
* description of installation of ipp-crypto dependencies on Windows
* cmake key for ipp-crypto build
* fix wstring for TVMso EP
* fix ipp-crypto build
* cmake key onnxruntime_TVM_USE_HASH now switches off the full hash functionality, not just specific methods
* fix absolute path to compiled lib
* update TVM_EP.md, fix lint warnings
* update TVM_EP.md
* small fixes after review
* switch on handshake functionality for Linux workflow
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* Add tests for all unary aten ops supported in eager mode
* fixing the PR draft
* fixing the merge
* changing eval to be at compile time
* adding requirements for eager
* 1. adding function to {ops}_out
  2. cleaning the code and adding comments
* editing the code according to code review
Co-authored-by: root <root@AHA-LIRONKESE-1>
* First attempt for half2 vectorized memory access in SkipLayerNorm
* Add some functions for debugging
* Clean up the code
* Clean up the code
* Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp
* Add a unit test for a larger input size
* Fix some Lint C++ warnings
* Use ILP = 4 for the vectorized kernels
* Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm
* Use conditional operator for input_v
* Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel
* Clean some comments and rename the layernorm function
* Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel
* Resolve a Lint C++ warning
* Fix SkipLayerNormBatch1_Float16_vec output data
* Add hipified code of bert SkipLayerNorm for ROCmEP
* Resolve some Lint C++ warnings
* Resolve some Lint C++ warnings
* Resolve some Lint C++ warnings
* Resolve Python formatting issue
* Add net6 targets.
Remove maccatalyst as we don't have a native build targeting that.
* Set platform in macos targets
* Add targetFramework entries
* Move NativeLib.DllName definition and set using preprocessor values for simplicity. Couldn't get it to build with the preprocessor based setup when it was in a separate file.
Update the nuspec generation to set platform version for .net6 targets. TODO: Validate versions. I copied them from the managed nuget package the packaging pipeline generated prior to adding targets. Possibly we could/should lower some of the versions.
Hopefully the need to specify a version goes away when the release version of VS2022 supports .net6.
* Try android 31.1 as https://github.com/actions/virtual-environments/blob/main/images/win/Windows2022-Readme.md suggests that should be available on the CI machines
* Fix patch version mismatch
Add some extra debug info in case it helps
* Debug nuget location in CI
* Add workspace entry back in
* Add steps
* One more attempt with hardcoded nuget.exe path and original android31.0 version
* Better fix - found explicit nuget download and updated version there.
* flake8 fixes
* Fix black complaints.
* Exit Microsoft_ML_OnnxRuntime_CheckPrerequisites for net6 iOS.
* Removed outdated comment
* Using vectorized loads (float2) for fp16 to improve performance
* Fix a few warnings from cpplint
* Fix a few warnings from cpplint
* Use __float2half2_rn and fix some cpplint warnings
* Move some computations to LaunchFastGeluKernel
* Fix some Lint C++ warnings
* Using vectorized loads (float4) for fp16 to improve performance
* Switch whether to optimize FastGelu with float4 vectorization
* Switch to float4 memory access based on input_length in FastGelu
* Comment how to set the threshold of float2 and float4 vectorized kernels
* Add FastGelu fp16 unit tests for bias_length = 2 and 8
* Make vectorized kernels generic with aligned_vector
* Unify the vectorized kernels with/without bias
* Refactor the code to suppress cpplint warnings
* Solve formatting issues
* Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
* Move fast_gelu_impl.h to rocm/bert
* Fix some Lint C++ warnings and code alignment
* Add .net6 support to the C# nuget package.
Currently requires jumping through a lot of hoops due to .net 6 only being supported in the preview release of VS 2022.
Build existing targets using msbuild.
Add .net6 targets and build using dotnet.
Create nuget package with combined targets.
A few misc automated changes from VS to spacing and adding a couple of properties.
* Try manually installing trt8.4 in multi-gpu pipeline
* Remove stmts that clean up cmake, ctest. Update tensorrt repository name passed to get_docker_image.py
* Update trt and cudnn home
* Don't install trtexec cli tool.
* Increase job timeout
* Revert timeout change and use trt placeholder builder build option
* update trt 8.4ga
* trt 8.4 linux ci pipeline
* fix cmake
* placeholder_builder
* trt 8.4 windows pipeline
* gpu package pipeline
* trt 8.4.1.5 , packaging pipeline updates
* python packaging
* ctest timeout
* python packaging test
* bump timeout
* python format
* format
* revert
* newline
* enable trt python tests
* typo
* python format
* disable on windows
* Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places.
Convert append EP for SNPE to be generic, and also use for XNNPACK.
Add XNNPACK to C# API
* Don't need stub for MIGraphX as it's using provider bridge.
* Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries.
* Only use EPs that require the layout transform if the opset is supported by the layout transformer.
* Update wasm registration of xnnpack.