* Split GemmBase RocBlasGemm
* Add composable kernel GEMM baseline
* Make linter happy
* Address review comment
* Update bert cases with batchsize
* Adjust includes to fix IWYU lint
* Only builds and links used ck kernels to improve building time
* Remove warmup run on SelectImpl
* Add comment to utility function
* Mute cpplint
* Make RocBlasGemm<T>::SelectImpl semantically correct
* Add reduced basic test cases for ck gemm
* More robust gemm testing
* Fix warnings
* Fix grammar
With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration.
More details here: https://github.com/android/ndk/issues/1740
Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21.
This change is a workaround to manually add `-O3` for "Release" Android builds.
* add scripts
* update docker scripts
* update build script
* create run script
* add test script
* add log 3 flags
* use the right build function
* build navi
* add clean script
* add pytorch like soln
* only build gfx 1030
* use HOST side var
* ignore logs
* update scripts
* GPU_WARP_SIZE_HOST
* update scripts
* remove scripts/amd
* match main
* add GPU_WARP_SIZE_HOST on cuda side
* match main
* correct gfx1030
* remove print
* move gfx add to rocm5.0
* remove inline
* make constexpr on cuda side
* add description of build ORT+TVM EP on Windows
* fix cmake error related to symlink creation on Windows
* add llvm config path to build flags for correct build on Windows
* update TVM_EP.md for llvm_config build arg
* fix warnings skipping during build on Windows
* fix using string or wstring for model path to correct build on Windows (MSVC error)
* fix error in custom logger for correct build on Windows
* implement glob algorithm for Windows
* additional build fixes
* update TVM with export of VM symbols for dll
* description of nasm issue and workaround
* update TVM with export of Executable from VM symbols for dll
* description of installation of ipp-crypto dependencies on Windows
* cmake key for ipp-crypto build
* fix wstring for TVMso EP
* fix ipp-crypto build
* cmake key onnxruntime_TVM_USE_HASH switch off not specific methods, but full hash functionality
* fix absolute path to compiled lib
* update TVM_EP.md, fix lint warnings
* update TVM_EP.md
* small fixes after review
* switch on handshake functionality for Linux workflow
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* infrastructure for handshake mechanism was implemented. sha256 was selected as first hash algorithm
* check hash during compile in TVMso EP
* add IPP-CRYPTO to external dependencies for TVM EP
* made checkHash method constant
* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict
* implemented SHA-256 calculation using ipp-crypto library
* fix dependency for ipp-crypto
* add provider options for hash check
* update documentation for added provider options
* add hash check condition
* fix docs
* fix lint
* fix ORT_THROW
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* Setting default version values for ovep dlls as well
* Update backend_manager.cc
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: mohsin <mohsinx.mohammad@intel.com>
* update trt 8.4ga
* trt 8.4 linux ci pipeline
* fix cmake
* placeholder_builder
* trt 8.4 windows pipeline
* gpu package pipeline
* trt 8.4.1.5 , packaging pipeline updates
* python packaging
* ctest timeout
* python packaging test
* bump timeout
* python format
* format
* revert
* newline
* enable trt python tests
* typo
* python format
* disable on windows
* Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places.
Convert append EP for SNPE to be generic, and also use for XNNPACK.
Add XNNPACK to C# API
* Don't need stub for MIGraphX as it's using provider bridge.
* Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries.
* Only use EPs that require the layout transform if the opset is supported by the layout transformer.
* Update wasm registration of xnnpack.
* C API version 0.001
* fix linker issues
* fixes for save checkpoint api
* plus fixes based on tests
* plus test_runner and other changes
* Plus cosmetic updates
* remove unnecessary headers
* plus some updates
* plus more changes
Co-authored-by: Ashwini Khade <askhade@microsoft.com@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Prior to this every test shared the same tolerances. This meant
that if an ONNX test failed due to a small but acceptable difference in
output, the only alternative was to disable the test entirely.
In op set 17, the DFT operator is being added. Without this change, the
tests for that operator fail because the output is off by about 5e-5.
It's better to keep test coverage for this new op rather than disable
the test entirely.
Also prior to this change, the global tolerances were not shared between
C++, JavaScript, and Python tests. Now they are.
Also fix various minor issues raised by linters.
Unblocks https://github.com/microsoft/onnxruntime/issues/11640.
* move code used to find the SNPE libs to a separate cmake file
* Roll back the change for libc++_shared, it's the one from SNPE SDK, otherwise it will cause uncaught exception of type std::bad_cast because of conflict
* lr_scheduler implementation
(cherry picked from commit d9c2552b3a3b2ff38ee0a14770257aa1169f6fa9)
* refactor Module/Optimizer constructor.
* add intermidiate API layer bridging public interfaces with internal ones.
* synthetic data loader
* make end to end run pass
* avoid many session input copy (CPU to GPU)
some clean up
* NVTX for runner
* minor fix after sync
* revert to let Module/Optimizer handle session creation.
* fix tests & test file folder consolidation
* refine based on comments & fix cpplint
* typos
* aten op for inference
* fix build error
* more some code to training only
* remove domain from operator name
* move aten_op_executor ext out from ortmodule
* add pipeline
* add exec mode
* fix script
* fix ut script
* fix test pipeline
* failure test
* rollback
* bugfix
* resolve comments
* enable aten for python build only
* fix win build
* use target_compile_definitions
* support io binding
* turn off aten by default
* fix ut
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: zhijxu <zhijxu@microsoft.com>