Commit graph

1141 commits

Author SHA1 Message Date
pallavides
6ebb7b91eb
Re-apply fix for mkl issue for eager mode (#12881)
* reapply fix for mkl issue for eager mode
* add comment, update link libs
2022-09-08 12:29:24 -07:00
RandySheriffH
d3b684cd9e
Drop nuphar (#11555)
* drop nuphar code and configs

* refactor test case

* format python

* remove nuphar from training test

* remove commented nuphar logics

* restore llvm setting

* drop nuphar ci

* fix compile err

* fix compile err

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2022-09-07 15:11:18 -07:00
Hariharan Seshadri
ad69aac491
Introduce ordered quantization ops for the CUDA EP [1/n] (#12582)
Initial core small set for the ordered quantization ops for cuda EP.
2022-09-07 11:58:15 -07:00
Guenther Schmuelling
f856be162e
fix xnnpack wasm build (#12845) 2022-09-06 19:20:07 -07:00
Jan Tilly
437409c343
Add DONT_VECTORIZE flag to cmake (#12169)
Add DONT_VECTORIZE flag.
2022-09-07 12:14:14 +10:00
Yulong Wang
726251609a
increase max memory to 4G for wasm (#12798) 2022-09-06 17:07:13 -07:00
Xavier Dupré
54360c88d2
Disable two warnings raised by tensorboard on Visual Studio (#12773) 2022-09-06 20:42:52 +02:00
Baiju Meswani
295bd26980
Remove orttraining-distributed CI pipeline (#12738) 2022-09-02 14:34:26 -07:00
Changming Sun
ca5af24765
Update Sdl.ruleset to remove C26812 from the rules (#12695) 2022-09-01 20:05:20 -07:00
Sheil Kumar
e3b501125d
DFT on DirectML (#12710)
* DFT on DirectML

* feedback

* fix misc build issues

* fixes

* fix constant cpu inputs and optional tensors for external operators

* disable dft tests on 'pure' dml
2022-09-01 08:31:14 -07:00
Yulong Wang
82a28cc2c3
upgrade emsdk to 3.1.19 (#12690)
* upgrade emsdk to 3.1.19

* fix build break

* ignore '-Wunused-but-set-variable' in eigen

* add malloc and free in exported functions

* EXPORTED_FUNCTIONS
2022-08-30 13:42:45 -07:00
Yi Zhang
27304d9082
gcc should not less than 7 (#12771) 2022-08-29 23:49:29 +08:00
mwootton
817dc94345
Add first pass of rocm kernel profiler (#10911)
* Add first pass of rocm kernel profiler

* Clean up rocm_profiler. Format args. Demangle kernel names.
Add Api EventRecords

* Remove debug output

* Temporarily disable profiling unit test 'api record check' for cupti

* Fix compile error for non-gpu builds

* Use common file for demangle and pid/tid.  Namespace ThreadUtil.  Fix gpu buffer clearing.

* Merge demangle into profiler_common

* Merge demangle into profiler_common part 2

* Style cleanup

* Resolve linking issues via ProviderHost interface

* Demangle cuda kernel names

* Clean up comments

* Fix formatting

* Fix anal retentive formatting
2022-08-26 19:38:03 -07:00
cloudhan
46c074a6c8
Update composable kernel and enable experimental inter wave scheduling (#12626)
Update ck to latest master and enable interwave scheduling
2022-08-25 22:19:41 -07:00
Changming Sun
7927d525a7
Remove CUDNN path from CI build scripts (#12671) 2022-08-24 18:21:50 -07:00
Yi Zhang
de3d772995
Check GCC version (#12680)
* check gcc version
2022-08-24 12:10:08 +08:00
Wei-Sheng Chin
dc486d146b
Make ORT callable from various Pytorch compilers (LazyTensor, TorchDynamo, etc) (#10460)
* Make ORT as Pytorch JIT backend

LORT likely doesn't work with aten fallback so we only test LORT in its own CI.

* Revert changes to enable external CUDA allocator. Will add it later.

Revert "Revert changes to enable external CUDA allocator. Will add it later."

This reverts commit d5487f2e193014c805505afae8fb577c53667658.

Fix external allocator

* Relax tolerance and remove commented code

* Print more information in CI

* Fix pointer

* Address comments.
1. Reuse ORT-eager mode's environment.
2. Remove unused ctor.

* Use Pytorch master branch as all PRs are merged

Fix

* Refine based on cpplint feedbacks

* Revert changes to allow custom CUDA allocator in public APIs

* Use torch.testing.assert_close

* Use unittest framework

* Switch docker repo

* Rename *.cpp to *.cc

* Address comments

* Add comment

* Use same pipeline file for eager and lort pipelines

* Address comments

* Add yaml comment

* Fix cmake files

* Address comments

* Rename flags, remove printing code, remove dead comment
2022-08-22 09:40:40 -07:00
Yulong Wang
bfdd191eec
[wasm] use same export name for SIMD/NOSIMD build (#12545) 2022-08-19 18:17:50 -07:00
yf711
9d10badc55
Add build option to link TensorRT prebuilt parser (#12602)
* Add build option to link prebuilt TensorRT parser

* Test without the build option to link prebuilt TRTParser

* Minor: update name of build option

* Minor: update name of build option
2022-08-16 14:09:58 -07:00
Dmitri Smirnov
616677104a
ONNX Protobuf natvis with some google::protobuf (#12580)
ONNX Protobuf natvis with some google::protobuf structures
  Add leading underscore to local Intrinsic
2022-08-15 09:59:07 -07:00
Xinya Zhang
eb827bd3e5
[ROCm] NGramRepeatBlock, LongformerAttention and DecoderAttention Ops (#11971)
* [ROCm] enable NGramRepeatBlock Op

* [ROCm] Enable testing ROCm in NGramRepeatBlockTest.NGramSize_3

Also link onnxruntime_test_all with amdhip64 when USE_ROCM=1

* [ROCm] add LongformerAttention Op

* [ROCm] Enable LongformerAttentionTest

* [ROCm] Add DecoderAttention Op

* Enable DecoderAttention Test for ROCm.

* [ROCM] Updates according to reviews
2022-08-11 19:32:08 -07:00
Changming Sun
ac7538b909
Remove CUDA 10.2 support (#12541) 2022-08-10 22:46:41 -07:00
Cheng
819c36701f
[xnnpack] basic QDQ operators support (#11912)
* basic ops for mobilenet,qconv,qsoftmax,qavgpool

update Xnnpack to latest

unit test

* NodeUnit: use outputedge to replace output-node

* qdq model e2e test

* use inlinedvector to replace vector

* conv bias check

* tensorshape helpers

* Refactor xnn_op minmax

* Qlinearsoftmax schema update

* Remove qlinearsoftmax registration

Co-authored-by: Jicheng Wen <jicwen@microsoft.com>
2022-08-11 10:12:51 +08:00
Dwayne Robinson
eb90b52a75
DML EP fix training build error (#12461)
Fix onnxruntime_training.cmake missing linkage issue
2022-08-05 16:01:25 -07:00
cloudhan
f39354d7cb
Add composable kernel GEMM baseline for kernel explorer (#12364)
* Split GemmBase RocBlasGemm

* Add composable kernel GEMM baseline

* Make linter happy

* Address review comment

* Update bert cases with batchsize

* Adjust includes to fix IWYU lint

* Only builds and links used ck kernels to improve building time

* Remove warmup run on SelectImpl

* Add comment to utility function

* Mute cpplint

* Make RocBlasGemm<T>::SelectImpl semantically correct

* Add reduced basic test cases for ck gemm

* More robust gemm testing

* Fix warnings

* Fix grammar
2022-08-04 17:32:20 -07:00
Dmitri Smirnov
a4ef0e7f7b
Remove dynamic allocation for ThreadPool ParallelSection (#12429)
Use InlinedVector in a TP
Store per thread parallel section in std::optional and avoid memory allocation
2022-08-04 09:46:16 -07:00
Dmitri Smirnov
eebaf5f270
Adjust and fixx abseil-cpp debugging visualization (#12415)
Move abseil-cpp.natvis file, add it to PDB, adjust visualization
2022-08-02 15:08:17 -07:00
Edward Chen
f77ab4fea6
Manually add optimization flag for Android Release builds. (#12390)
With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration.
More details here: https://github.com/android/ndk/issues/1740

Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21.

This change is a workaround to manually add `-O3` for "Release" Android builds.
2022-08-01 12:49:03 -07:00
George Wu
6bb807ef74
add cuda compute 8.7 to Cmakelists.txt to support Nvidia Orin devices (#12377)
* add cuda arch 8.7 to cmakelists.txt to support Nvidia Orin devices

* add cuda version >= 11 check for orin support
2022-08-01 09:45:58 -07:00
Valery Chernov
1a4868e5c4
[TVM EP] Hot fix of build on Windows of TVM EP with ipp-crypto (#12381)
fix of build on Windows with ipp-crypto. cmake warnings fix

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-07-31 14:36:54 +02:00
Valery Chernov
e2423bb55c
[TVM EP] Build on Windows with ipp-crypto support (#12336)
* update TVM EP docs for ipp-crypto build conditions

* add ipp-crypto by ExternalProject_Add

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
2022-07-28 15:40:19 +02:00
msftlincoln
9cf6912bba
Fix ORT Eager Mode to work with Pytorch 1.12 (#12323) 2022-07-27 16:24:46 -04:00
Ashwini Khade
ceb76429db
Merge pull request #12056 from microsoft/bmeswani/merge-training_dev/on_device_poc
Merge On-Device-Training Offline Tooling and C/C++ APIs
2022-07-21 15:09:48 -07:00
Baiju Meswani
cbf08c7a7b Make GetTrainingApi as a part of the OrtApis, add Training API documentation and address other pull request review comments 2022-07-21 18:11:48 +00:00
cloudhan
a0074ba9bc
Add baseline gemm for kernel explorer (#12050)
Use rocblasGemmHelper gemm wrapper from ORT and profile for bert param size only.
2022-07-20 13:49:26 +08:00
Michael Melesse
bb5bd08545
[ROCM] Navi21 fixes pr (#11368)
* add scripts

* update docker scripts

* update build script

* create run script

* add test script

* add log 3 flags

* use the right build function

* build navi

* add clean script

* add pytorch like soln

* only build gfx 1030

* use HOST side var

* ignore logs

* update scripts

* GPU_WARP_SIZE_HOST

* update scripts

* remove scripts/amd

* match main

* add GPU_WARP_SIZE_HOST on cuda side

* match main

* correct gfx1030

* remove print

* move gfx add to rocm5.0

* remove inline

* make constexpr on cuda side
2022-07-18 22:26:57 -07:00
Valery Chernov
3b0aaa9e0e
[TVM EP] support build on Windows (#11851)
* add description of build ORT+TVM EP on Windows

* fix cmake error related to symlink creation on Windows

* add llvm config path to build flags for correct build on Windows

* update TVM_EP.md for llvm_config build arg

* fix warnings skipping during build on Windows

* fix using string or wstring for model path to correct build on Windows (MSVC error)

* fix error in custom logger for correct build on Windows

* implement glob algorithm for Windows

* additional build fixes

* update TVM with export of VM symbols for dll

* description of nasm issue and workaround

* update TVM with export of Executable from VM symbols for dll

* description of installation of ipp-crypto dependencies on Windows

* cmake key for ipp-crypto build

* fix wstring for TVMso EP

* fix ipp-crypto build

* cmake key onnxruntime_TVM_USE_HASH switch off not specific methods, but full hash functionality

* fix absolute path to compiled lib

* update TVM_EP.md, fix lint warnings

* update TVM_EP.md

* small fixes after review

* switch on handshake functionality for Linux workflow

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-07-13 10:48:42 +02:00
cloudhan
785f74979b
Rework cmake for kernel_explorer (#12079)
Improve CMake for deep integration with ORT, so that we can easily hook ort function of microbenchmarking purpose.
2022-07-13 15:43:32 +08:00
Dwayne Robinson
32a8751dc4
DML EP Update to DML 1.9 (#12090)
* Update to DML 1.9

* Appease obnoxious Python formatting tool
2022-07-05 16:30:54 -07:00
Baiju Meswani
1aa27e127c Resolve build conflicts with master 2022-07-05 19:53:54 +00:00
Wenbing Li
479e71a7a8
enable the extensions custom build for java and android (#11823) 2022-07-05 10:34:14 -07:00
Baiju Meswani
a457ddc41d Merge branch 'master' of https://github.com/microsoft/onnxruntime into bmeswani/merge_pr 2022-06-30 21:53:07 +00:00
ashbhandare
0ce14c7068
Fix windows cpu build VS2022 (#12032)
Fix windows cpu build VS2021
2022-06-29 15:45:00 -07:00
Baiju Meswani
6e8edfff0c
Separate training apis from shared core apis (#12027) 2022-06-29 14:12:29 -07:00
Valery Chernov
8ba8146650
[TVM] handshake mechanism for support of TVMso EP (#11437)
* infrastructure for handshake mechanism was implemented. sha256 was selected as first hash algorithm

* check hash during compile in TVMso EP

* add IPP-CRYPTO to external dependencies for TVM EP

* made checkHash method constant

* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict

* implemented SHA-256 calculation using ipp-crypto library

* fix dependency for ipp-crypto

* add provider options for hash check

* update documentation for added provider options

* add hash check condition

* fix docs

* fix lint

* fix ORT_THROW

Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2022-06-29 14:57:18 +02:00
Edward Chen
f045994389
[NNAPI EP] Update NNAPI headers (#11954)
Update the NNAPI headers to a more recent version (copied from TF Lite v2.9.1).
2022-06-27 18:54:06 -07:00
Baiju Meswani
d25cf4df26 Merge branch 'master' into training_dev/on_device_poc 2022-06-24 20:18:19 +00:00
Preetha Veeramalai
f54476a42f
Dll version fix ovep4.1 (#11953)
* Setting default version values for ovep dlls as well

* Update backend_manager.cc

Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: mohsin <mohsinx.mohammad@intel.com>
2022-06-22 11:09:36 -07:00
Gary Miguel
4bf22e2a40
Update ONNX to 1.12 (#11924)
Follow-ups that need to happen after this and before the next ORT release:
* Support SequenceMap with https://github.com/microsoft/onnxruntime/pull/11731
* Support signal ops with https://github.com/microsoft/onnxruntime/pull/11778

Follow-ups that need to happen after this but don't necessarily need to happen before the release:
* Implement LayerNormalization kernel for opset version 17: https://github.com/microsoft/onnxruntime/issues/11916

Fixes #11640
2022-06-21 17:19:52 -07:00
Dwayne Robinson
64f95d400a
Update DML 1.9 Nuget package to fix WindowsAI nuget pipeline build issue (#11934) 2022-06-21 15:55:51 -07:00