* [UPDATE] update ci to rocm5.2 + torch1.11
* [Revert] disable ort module test
* [DELETE] delete Rocm5.1.1 ci test result
* [UPDATE] update the comments
* add description of build ORT+TVM EP on Windows
* fix cmake error related to symlink creation on Windows
* add llvm config path to build flags for correct build on Windows
* update TVM_EP.md for llvm_config build arg
* fix warnings skipping during build on Windows
* fix using string or wstring for model path to correct build on Windows (MSVC error)
* fix error in custom logger for correct build on Windows
* implement glob algorithm for Windows
* additional build fixes
* update TVM with export of VM symbols for dll
* description of nasm issue and workaround
* update TVM with export of Executable from VM symbols for dll
* description of installation of ipp-crypto dependencies on Windows
* cmake key for ipp-crypto build
* fix wstring for TVMso EP
* fix ipp-crypto build
* cmake key onnxruntime_TVM_USE_HASH switch off not specific methods, but full hash functionality
* fix absolute path to compiled lib
* update TVM_EP.md, fix lint warnings
* update TVM_EP.md
* small fixes after review
* switch on handshake functionality for Linux workflow
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* Add tests for all uniary aten ops supported in eager mode
* fixing the PR draft
* fixing the merge
* changing eval to be at compile time
* adding requirements for eager
* 1.adding function to {ops}_out
2.cleaning the code
and adding comments
* editing the code according to code review
Co-authored-by: root <root@AHA-LIRONKESE-1>
* First attempt for half2 vectorized memory access in SkipLayerNorm
* Add some functions for debugging
* Clean up the code
* Clean up the code
* Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp
* Add a unit test for a larger input size
* Fix some Lint C++ warnings
* Use ILP = 4 for the vectorized kernels
* Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm
* Use conditional operator for input_v
* Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel
* Clean some comments and rename the layernorm function
* Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel
* Resolve a Lint C++ warning
* Fix SkipLayerNormBatch1_Float16_vec output data
* Add hipified code of bert SkipLayerNorm for ROCmEP
* Resolve some Lint C++ warnings
* Resolve some Lint C++ warnings
* Resolve some Lint C++ warnings
* Resolve Python formatting issue
* Add net6 targets.
Remove maccatalyst as we don't have a native build targetting that.
* Set platform in macos targets
* Add targetFramework entries
* Move NativeLib.DllName definition and set using preprocessor values for simplicity. Couldn't get it to build with the preprocessor based setup when it was in a separate file.
Update the nuspec generation to set platform version for .net6 targets. TODO: Validate versions. I copied them from the managed nuget package the packaging pipeline generated prior to adding targets. Possibly w could/should lower some of the versions.
Hopefully the need to specify a version goes away when the release version of VS2022 supports .net6.
* Try android 31.1 as https://github.com/actions/virtual-environments/blob/main/images/win/Windows2022-Readme.md suggests that should be available on the CI machines
* Fix patch version mismatch
Add some extra debug info in case it helps
* Debug nuget location in CI
* Add workspace entry back in
* Add steps
* One more attempt with hardcoded nuget.exe path and original android31.0 version
* Better fix - found explicit nuget download and updated version there.
* flake8 fixes
* Fix black complaints.
* Exit Microsoft_ML_OnnxRuntime_CheckPrerequisites for net6 iOS.
* Removed outdated comment
* Using vectorized loads (float2) for fp16 to improve performance
* Fix a few warnings from cpplint
* Fix a few warnings from cpplint
* Use __float2half2_rn and fix some cpplint warnings
* Move some computaions to LaunchFastGeluKernel
* Fix some Lint C++ warning
* Using vectorized loads (float4) for fp16 to improve performance
* Switch whether to optimize FastGelu with float4 vectorization
* Switch to float4 memory access based on input_length in FastGelu
* Comment how to set the threshold of float2 and float4 vectorized kernels
* Add FastGelu fp16 unit tests for bias_length = 2 and 8
* Make vectorized kernels generic with aligned_vector
* Unify the vectorized kernels with/without bias
* Refactor the code to suppress cpplint warnings
* Solve formatting issues
* Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
* Move fast_gelu_impl.h to rocm/bert
* Fix some Lint C++ warnings and code alignment
* Add .net6 support to the C# nuget package.
Currently requires jumping through a lot of hoops due to .net 6 only being supported in the preview release of VS 2022.
Build existing targets using msbuild.
Add .net6 targets and build using dotnet.
Create nuget package with combined targets.
A few misc automated changes from VS to spacing and adding a couple of properties.
* Try manually installing trt8.4 in multi-gpu pipeline
* Remove stmts that clean up cmake, ctest. Update tensorrt repository name passed to get_docker_image.py
* Update trt and cudnn home
* Don't install trtexec cli tool.
* Increase job timeout
* Revert timeout change and use trt placeholder builder build option
* update trt 8.4ga
* trt 8.4 linux ci pipeline
* fix cmake
* placeholder_builder
* trt 8.4 windows pipeline
* gpu package pipeline
* trt 8.4.1.5 , packaging pipeline updates
* python packaging
* ctest timeout
* python packaging test
* bump timeout
* python format
* format
* revert
* newline
* enable trt python tests
* typo
* python format
* disable on windows
* Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places.
Convert append EP for SNPE to be generic, and also use for XNNPACK.
Add XNNPACK to C# API
* Don't need stub for MIGraphX as it's using provider bridge.
* Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries.
* Only use EPs that require the layout transform if the opset is supported by the layout transformer.
* Update wasm registration of xnnpack.
* aten op for inference
* fix build error
* more some code to training only
* remove domain from operator name
* move aten_op_executor ext out from ortmodule
* add pipeline
* add exec mode
* fix script
* fix ut script
* fix test pipeline
* failure test
* rollback
* bugfix
* resolve comments
* enable aten for python build only
* fix win build
* use target_compile_definitions
* support io binding
* turn off aten by default
* fix ut
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: zhijxu <zhijxu@microsoft.com>
* update TVM
* get alignment constant from TVM
* update TVM_VM_SetInputs to upstream with TVM API
* fix CI issue: update TVM EP dependencies
* add sudo
* revert changes needed to install missing package
* add package for TVM EP CI
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* Initiate Ort SNPE EP
* fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master
* correct the source path for libonnxruntime.so while building for andorid package
* add AdditionalDependencies for amr64
* On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given.
* fix build failure if snpe is not enabled
* update doc for contrib op
* separate out snpe ep settings to onnxruntime_snpe_provider.cmake
* renaming according review comments
* update according review comments