onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-14 20:48:00 +00:00

Author	SHA1	Message	Date
Jon Campbell	768c79317c	Enable QNN HTP support for Node (#20576 ) ### Description Add support for using Onnx Runtime with Node ### Motivation and Context Onnx Runtime supports the QNN HTP, but does not support it for Node.js. This adds baseline support for the Onnx Runtime to be used with Node. Note it does not update the node packages that are distributed officially. This simply patches the onnxruntime.dll to allow 'qnn' to be used as an execution provider. Testing was done using the existing onnxruntime-node package. The `onnxruntime.dll` and `onnxruntime_binding.node` were swapped into `node_modules\onnxruntime-node\bin\napi-v3\win32\arm64` with the newly built version, then the various QNN dlls and .so files were placed next to the onnxruntime.dll. Testing was performed on a variety of models and applications, but the easiest test is to modify the [node quickstart example](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/quick-start_onnxruntime-node).	2024-05-09 13:11:07 -07:00
Andrew Fantino	7303a90f49	Fix build errors from date/date.h C++20 compatibility (#20139 ) ### Description For C++ standards >= 20, use `std::chrono::operator<<` in place of `date::operator<<` to fix ambiguous operator compile error. ### Motivation and Context The external dependency HowardHinnant/date has a conflict with std::chrono for >=C++20. Solves #20137	2024-04-02 22:10:25 -07:00
Changming Sun	a0521f899e	Enable CPUINFO for all Windows build (#19655 ) ### Description It was disabled in PR #9065. And the reason was: " api-ms-win-core-kernel32-legacy-*.dll wasn't available in Windows 8 and was added in Windows 10, so cpuinfo breaks our Windows 8 support. I'm disabling it again." We no longer support Windows 8. Therefore we can add CPUINFO back. ### Motivation and Context To make the code simpler. If in any case the library doesn't work as expected, we can submit a PR to their code base and fix it.	2024-03-01 16:23:20 -08:00
Phoebe Chen	4477f57ee3	Enable RISC-V 64-bit Cross-Compiling Support for ONNX Runtime on Linux (#19238 ) ### Description This pull request introduces the necessary changes to enable RISC-V 64-bit cross-compiling support for the ONNX Runtime on Linux. The RISC-V architecture has gained popularity as an open standard instruction set architecture, and this contribution aims to extend ONNX Runtime's compatibility to include RISC-V, thereby broadening the reach of ONNX models to a wider range of devices. ### Motivation and Context RISC-V is a free and open-source instruction set architecture (ISA) based on established RISC principles. It is provided under open licenses without fees. Due to its extensibility and freedom in both software and hardware, RISC-V is poised for widespread adoption in the future, especially in applications related to AI, parallel computing, and data centers. ### Example Build Command ``` ./build.sh --parallel --config Debug --rv64 --riscv_toolchain_root=/path/to/toolchain/root --skip_tests ``` ### Documentation Updates Relevant sections of the documentation will be updated to reflect the newly supported RISC-V 64-bit cross-compilation feature. https://github.com/microsoft/onnxruntime/pull/19239 --------- Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>	2024-01-24 16:27:05 -08:00
Changming Sun	bc84f52633	Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470 ) ### Description Update C/C++ dependencies abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint to newer versions per request of @mayeut. He created the following PRs to update the deps: https://github.com/microsoft/onnxruntime/pull/15432 https://github.com/microsoft/onnxruntime/pull/15434 https://github.com/microsoft/onnxruntime/pull/15435 https://github.com/microsoft/onnxruntime/pull/15436 https://github.com/microsoft/onnxruntime/pull/15437 However, our build system needs to fetch the dependencies from an internal mirror that only Microsoft employees have write access to. So I closed his PRs and created this one. This PR also updates abseil to a newer version. This is to prepare for upgrading re2.	2023-09-08 13:35:04 -07:00
Changming Sun	3cec88bd12	FIX: memory leak checker is incompatible with std::stacktrace (#17209 ) ### Description When I worked on PR #17173, I didn't notice that onnxruntime\core\platform\windows\debug_alloc.cc also needs to call dbghelp functions like SymInitialize. So, if we use vc runtime's stacktrace functionality, vc runtime will initialize/uninitialize the dbghelp library independently and vc runtime's stacktrace helper DLLs get unloaded before our memory leak checker starts to work. Then, when we call SymSetOptions, it crashes. More details: In VC runtime the C++23 stacktrace functions are implemented on top of dbgeng.dll. In C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\crt\src\stl\stacktrace.cpp, you can see it has: ``` dbgeng = LoadLibraryExW(L"dbgeng.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32); ``` The dbgeng.dll is a wrapper around dbghelp.dll. It calls SymInitialize and SymCleanup. dbgeng.dll gets unloaded before our memory leak check starts to run. In theory we should be able to call SymInitialize again if the previous user who called SymInitialize has also called SymCleanup. However, users can use SymRegisterCallback/SymRegisterCallback64/SymRegisterCallbackW64 to register callback functions to dbghelp.dll. These callback functions need to be alive when SymSetOptions (and some other dbghelp APIs) get called. ### Motivation and Context	2023-08-18 17:10:33 -07:00
Changming Sun	5249b7ab7c	Re-implement stacktrace (#17173 ) ### Description Re-implement stacktrace. The new implementation doesn't directly use the Windows API, and hence avoids problems with initializing/uninitializing the dbghelp library. ### Motivation and Context	2023-08-16 16:07:49 -07:00
Matthieu Darbois	5e971bc51a	Rework WIL dependency retrieval/usage (#17130 ) ### Description 1. `onnxruntime_fetchcontent_makeavailable` works around unconditional install commands so that it can be used instead of `FetchContent_Populate` 2. This dependency is Windows specific, mark it as such. ### Motivation and Context 1. This simplifies `cmake/external/wil.cmake` not to do anything specific whether WIL was fetched or found 2. Given it's specific to Windows, it might not be available on other OS in specific air-gapped environments such as [conan-center-index](https://github.com/conan-io/conan-center-index). This allows downstream builds not to require specific patches for something not required by the build in the first place.	2023-08-15 09:11:46 -07:00
Dmitri Smirnov	853c4ff0a5	[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506 ) ### Description Introduce `Float16/BFloat16` support for C# and C++ APIs. Users should be able to convert between `float` and `Float16/BFloat16`, compare values, and test for `NaN`, `Infinity`, and whether the number is denormalized. ### Motivation and Context User filed issues such as: https://github.com/microsoft/onnxruntime/issues/14303	2023-07-14 10:46:52 -07:00
cao lei	0c5f492493	remove AllocatorMgr class (#16509 ) ### Description Remove AllocatorManager class ### Motivation and Context After the refactor PR #15833 is in, AllocatorManager class is not referenced anymore.	2023-06-28 15:43:19 -07:00
Prateek Chokse	12dffef768	added support for cmake "find_package" (#8919 ) Description: Adds support for cmake find_package. Motivation and Context As mentioned in issue #7150 onnxruntime doesn't have support for CMake find_package, this PR adds that and also adds the CMake package version file. Now anyone can link onnxruntime like this: ```cmake find_package(onnxruntime) add_executable(test Source.cpp) target_link_libraries(test PRIVATE onnxruntime::onnxruntime) ``` this also simplifies #3124	2023-06-19 22:20:31 -07:00
Changming Sun	0204594f90	Cleanup WASM cmake code (#15996 ) ### Description Remove the "onnxruntime_BUILD_WEBASSEMBLY" cmake option. Use `if (CMAKE_SYSTEM_NAME STREQUAL "Emscripten")` instead. It makes some code look more natural. For example, ```cmake if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR onnxruntime_BUILD_WEBASSEMBLY) ``` becomes ```cmake if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR CMAKE_SYSTEM_NAME STREQUAL "Emscripten") ```	2023-05-20 18:07:39 -07:00
Edward Chen	9f5aa8e021	Add clog back to onnxruntime_EXTERNAL_LIBRARIES. (#15363 ) ### Description <!-- Describe your changes. --> Add clog back to onnxruntime_EXTERNAL_LIBRARIES. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix iOS packaging pipeline build failure.	2023-04-05 09:11:19 -07:00
Matthieu Darbois	85bb13345d	Rework some external targets to ease building with `-DFETCHCONTENT_FULLY_DISCONNECTED=ON` (#15323 ) ### Description Rework some external targets to ease building with `-DFETCHCONTENT_FULLY_DISCONNECTED=ON` This will allow package managers to more easily provide an onnxruntime package by reducing the amount of patching needed downstream at each version. ### Motivation and Context Availability of onnxruntime in some C++ package managers https://github.com/microsoft/onnxruntime/issues/7150 https://github.com/conan-io/conan-center-index/issues/16699 https://github.com/microsoft/vcpkg/issues/20548 My initial intent is to get this in conan but the PR would most likely be useful (though not tested) to vcpkg as well (and maybe others). I tried to get only a first batch of not too specific patches (i.e. not specific to conan). The first commit reworks `flatbuffers` and just extends what @snnn did in https://github.com/microsoft/onnxruntime/pull/13991 The second commit reworks `pytorch_cpuinfo` The third commit reworks `google_nsync`	2023-04-03 17:45:12 -07:00
RandySheriffH	75584c5fa8	Enabling thread pool to be numa-aware (#13778 ) The PR enables ort thread pool to be numa-aware, so that threads could be evenly created and distributed among numa nodes. In addition, to facilitate performance tuning, the PR opens a new API allowing customers to attach threads to certain logical processors. Please check the API [definition](https://github.com/microsoft/onnxruntime/pull/13778/files#diff-5845a5c76fb64abdc8f0cffe21b37f8da1712674eb3abc4cd87190891be1bd48) for details. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-12-12 10:33:55 -08:00
Changming Sun	04900f96c1	Improve dependency management (#13523 ) ## Description 1. Convert some git submodules to cmake external projects 2. Update nsync from [1.23.0](https://github.com/google/nsync/releases/tag/1.23.0) to [1.25.0](https://github.com/google/nsync/releases/tag/1.25.0) 3. Update re2 from 2021-06-01 to 2022-06-01 4. Update wil from an old commit to 1.0.220914.1 tag 5. Update gtest to a newer commit so that it can optionally leverage absl/re2 for parsing command line flags. The following git submodules are deleted: 1. FP16 2. safeint 3. XNNPACK 4. cxxopts 5. dlpack 7. flatbuffers 8. googlebenchmark 9. json 10. mimalloc 11. mp11 12. pthreadpool More will come. ## Motivation and Context There are 3 ways of integrating 3rd party C/C++ libraries into ONNX Runtime: 1. Install them to a system location, then use cmake's find_package module to locate them. 2. Use git submodules 6. Use cmake's external projects(externalproject_add). At first when this project was just started, we considered both option 2 and option 3. We preferred option 2 because: 1. It's easier to handle authentication. At first this project was not open source, and it had some other non-public dependencies. If we use git submodule, ADO will handle authentication smoothly. Otherwise we need to manually pass tokens around and be very careful on not exposing them in build logs. 2. At that time, cmake fetched dependencies after "cmake" finished generating vcprojects/makefiles. So it was very difficult to make cflags consistent. Since cmake 3.11, it has a new command: FetchContent, which fetches dependencies when it generates vcprojects/makefiles just before add_subdirectories, so the parent project's variables/settings can be easily passed to the child projects. And when the project went on, we had some new concerns: 1. As we started to have more and more EPs and build configs, the number of submodules grew quickly. For more developers, most ORT submodules are not relevant to them. 
They shouldn't need to download all of them. 2. It is impossible to let two different build configs use two different versions of the same dependency. For example, right now we have protobuf 3.18.3 in the submodules. Then every EP must use the same version. Whenever we have a need to upgrade protobuf, we need to coordinate across the whole team and many external developers. I can't manage it anymore. 3. Some projects want to manage the dependencies in a different way, either because of their preference or because of compliance requirements. For example, some Microsoft teams want to use vcpkg, but we don't want to force every user of onnxruntime to use vcpkg. 4. Someone wants to dynamically link to protobuf, but our build script only does static link. 5. Hard to handle security vulnerabilities. For example, whenever protobuf has a security patch, we have a lot of things to do. But if we allowed people to build ORT with a different version of protobuf without changing ORT's source code, customers who build ORT from source will be able to act on such things in a quicker way. They will not need to wait for ORT to have a patch release. 6. Every time we do a release, github will also publish a source file zip file and a source file tarball for us. But they are not usable, because they are missing the submodules. ### New features After this change, users will be able to: 1. Build the dependencies in the way they want, then install them to somewhere (for example, /usr or a temp folder). 2. Or download the dependencies by using cmake commands from these dependencies' official websites. 3. Similar to the above, but use your private mirrors to mitigate supply chain risks. 4. Use different versions of the dependencies, as long as our source code is compatible with them. For example, you can't use protobuf 3.20.x, as it needs code changes in ONNX Runtime. 5. Only download the things the current build needs. 6. Avoid building external dependencies again and again in every build. 
### Breaking change The onnxruntime_PREFER_SYSTEM_LIB build option is removed; you can think of it as being ON by default from now on. If you don't like the new behavior, you can set FETCHCONTENT_TRY_FIND_PACKAGE_MODE to NEVER. Besides, for those who relied on the onnxruntime_PREFER_SYSTEM_LIB build option, please be aware that this PR changes find_package calls from Module mode to Config mode. For example, in the past, if you had installed protobuf via apt-get from Ubuntu 20.04's official repo, find_package could find it and use it. But after this PR, it won't. This is because the protobuf version provided by Ubuntu 20.04 is too old to support Config mode. This can be resolved by getting a newer version of protobuf from somewhere.	2022-12-01 09:51:59 -08:00
Edward Chen	2ecd1d6622	Switch GSL to MS GSL 4.0.0 (#13416 )	2022-10-29 04:15:20 -07:00
Dmitri Smirnov	25c0a66934	Natvis adjustments to make debugging bearable (#13237 ) ### Description - Fix Abseil::InlinedVector inlined storage visualization - Fix typo in protobuf natvis. - Add basic gsl.natvis ### Motivation and Context Debugging is hard.	2022-10-10 10:06:55 -07:00
Dmitri Smirnov	a4ef0e7f7b	Remove dynamic allocation for ThreadPool ParallelSection (#12429 ) Use InlinedVector in a TP Store per thread parallel section in std::optional and avoid memory allocation	2022-08-04 09:46:16 -07:00
Dmitri Smirnov	eebaf5f270	Adjust and fix abseil-cpp debugging visualization (#12415 ) Move abseil-cpp.natvis file, add it to PDB, adjust visualization	2022-08-02 15:08:17 -07:00
Dmitri Smirnov	a7d0158c24	Introduce a way to disable Abseil library (#11353 ) Introduce a way to disable Abseil library. Use cmake extra args, no new build switch.	2022-04-27 08:57:52 -07:00
Lukas	1b664e6d4c	Link cpuinfo only if supported (#11147 ) * Remove unnecessary target_include_directories for cpuinfo Headers already exposed as public by CMake target: `5916273f79/CMakeLists.txt (L213)` * Link to cpuinfo library only if supported	2022-04-07 21:32:12 -07:00
Jack·Boos·Yu	ea004e953f	[cmake] Export multi targets in static build (#11063 ) * [cmake] Export multi targets in static build * Install more components in static build, format some code * Fix code pos	2022-04-03 22:37:18 -07:00
Tiago Koji Castro Shibata	5ed2f4ad5f	Remove Windows Store specific code	2022-03-17 23:38:14 -07:00
Changming Sun	283d0c47b4	Update our absl cmake files (#10762 )	2022-03-04 09:28:04 -08:00
Dmitri Smirnov	7e092a7e3f	Reduce number of memory allocations based on a customer profiling case (#10193 ) Add abseil and inlined containers typedefs Introduce TensorShapeVector for shape building. Use gsl::span<const T> to make interfaces accept different types of vector like args. Introduce InineShapeVectorT for shape capacity typed instantiations Refactor cuda slice along with provider shared interfaces Refactor Concat, Conv, Pad Build with Conv Einsum and ConvTranspose refactored. Remove TensorShape::GetDimsAsVector() Refactor SliceIterator and SliceIteratorBase Refactor broadcast Refactor Pads for twice as long Remove memory planner intermediate shapes vector Refactor orttraining Fix passing TensorShapeVector to tests Remove abseil copy and submodule, use FetchContent_Declare/Fetch Path with separate command Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.	2022-01-24 10:40:46 -08:00
Changming Sun	4e9e01cb3c	Fix SDL warnings in CPU EP (#9975 )	2021-12-19 20:54:29 -08:00
Changming Sun	20f8a06f1f	Remove OpenMP code (#10032 )	2021-12-15 00:58:42 -08:00
Dmitri Smirnov	a7abd541c7	Correct message type (#9973 )	2021-12-09 10:00:44 -08:00
Dmitri Smirnov	a7f649db7c	Enable proper override using MIMalloc (#9944 ) Redirect memory allocations to MiMalloc and advance its version to v2.0.3 Refactor for a universal ifdef	2021-12-07 17:56:58 -08:00
Jingqiao Fu	da15f5fc2f	change cmake condition to prevent WCOS from linking advapi32 (#9500 ) * change condition to prevent WCOS from linking advapi32.dll * Remove linkage to advapi32.lib	2021-10-26 12:16:49 -07:00
Tiago Koji Castro Shibata	12515552d1	Remove cpuinfo from WCOS builds (#9076 )	2021-09-16 12:05:47 -07:00
Changming Sun	60c98a86b7	CMake file changes for macOS universal2 support (#8953 )	2021-09-04 13:30:33 -07:00
Tiago Koji Castro Shibata	62c0d24340	Fix Windows Store build (#8753 ) * Remove APIs unavailable in Store in #8349, #8178, #8065 * Add UWP stubs of C runtime functions * Remove UWP incompatible tests from UWP build * Remove incompatible tests from Store * Use UWP stubs in store only * Skip partition check outside of Windows * Remove unused WRL include * Workaround Windows header not including what it uses * Fix precompiled header name clash * Workaround SDK bugs * DXCore workaround in Win7 * Fix warning * Fix more warnings * Bump WinML to target Windows 8 * Fix more warnings * Remove unnecessary workarounds * Remove Desktop only APIs from DML adapter	2021-08-23 11:19:03 -07:00
ytaous	0725f80d2d	Revert "Fix Windows Store build (#8481 )" (#8679 ) This reverts commit `53e7831b53`.	2021-08-11 00:37:36 -07:00
Tiago Koji Castro Shibata	53e7831b53	Fix Windows Store build (#8481 ) * Remove APIs unavailable in Store in #8349, #8178, #8065 * Add UWP stubs of C runtime functions * Remove UWP incompatible tests from UWP build * Remove incompatible tests from Store * Use UWP stubs in store only * Skip partition check outside of Windows * Remove unused WRL include * Workaround Windows header not including what it uses * Fix precompiled header name clash * Workaround SDK bugs * DXCore workaround in Win7 * Fix warning * Fix more warnings * Bump WinML to target Windows 8 * Fix more warnings * Remove unnecessary workarounds	2021-08-10 15:19:30 -07:00
Nick Kreeger	963d883de8	Create a common directory for quantization code and functionality. (#8320 )	2021-07-14 22:56:58 -05:00
Guoyu Wang	c5038063ed	Add iOS/macOS static framework (#8357 ) * Add ability to generate ios static framework * Fix typos * Add pod cache clean, update some comments of previous commit * Fix CI failure with newly added cpuinfo library * Update test model (CoreML requires node has a name) * Addressed CR comments	2021-07-14 16:39:17 -07:00
Chen Fu	df4cb6f301	Adding pytorch cpuinfo as dependency (#8178 ) Pytorch cpuinfo library allows us to query current cpu features, micro-architecture and cache size, etc. This information is needed for targeted performance optimizations. Unfortunately it does not work under Windows/ARM. We need to develop our own later.	2021-07-12 14:21:12 -07:00
Changming Sun	c716b56f26	Update C++ Standard from 14 to 17 (#8041 ) Switched the code to C++17. To build ONNX Runtime on old distros like CentOS 7, you need to install a newer GCC from additional repos. If you build onnxruntime with the newer GCC, typically the resulting binary can't be distributed to other places because it depends on the new GCC's runtime libraries, something that the stock OS doesn't have. But on RHEL/CentOS, it can be better. We use Red Hat devtoolset 8/9/10 with CentOS 7 to build our code. The new library features (like std::filesystem) that do not exist in the old C++ runtime will be statically linked into the applications, with some restrictions: 1. GCC has a dual ABI, but we can only use the old one. It means std::string is still copy-on-write and std::list::size() is still O(n). Also, if you build onnxruntime on CentOS 7 and link it with some binaries that were built on CentOS 8 or Ubuntu with the new ABI and export C++ symbols directly (instead of using a C API), then it won't work. 2. We still can't use std::optional. It is a limitation coming from macOS. We will solve it when we get macOS 11 build machines. It won't be too long. 3. Please avoid using C++17 in CUDA files (.cu), and in the .h files that they include (like core/framework/float16.h). This is because CUDA 10.2 doesn't support C++17. You are welcome to use the new features in any *.cc files.	2021-06-25 14:08:01 -07:00
Edward Chen	13622bae91	Add Apple log sink. (#7820 ) Add a log sink for Apple platforms. This version uses NSLog().	2021-05-27 10:03:02 -07:00
Ryan Hill	c99aa3a3f3	Ryanunderhill/cuda shared (#7626 ) * First iteration of making cuda a shared provider. Separated out shared OpKernel change, so doing this to merge with that change. * More cuda shared library refactoring * More cuda shared library refactoring * More build options tested, converted the training ops over. * Fix merge breaks * Fix submodules * Fix submodules * Fix submodules * Fix python * Fix compile errors * Duplicate symbol fix * Test fix for ROCM provider * Another ROCM test workaround * ROCM Build Test * ROCM build fix * ROCM * ROCM * ROCM * ROCM * ROCM * ROCM test * Reduce header dependencies * Remove redundant namespace * Test fix for linux * Fix linux build * Fix Eigen build error * Fix unused parameter warning * Test link error * Another linker test * Linker test * Linker test * Another test * Another build test * Fix linux link error * Build test * Fix control flow ops to use common base class with core code * Remove extra qualifiers * Fix template syntax for linux * Fix cuda memory leak * Fix pybind * Test disabling cast * Cleanup * Restore cuda in test * Remove more header dependencies * Test not adding cuda provider to session * Make GetProviderInfo_CUDA throw * No-op cuda provider creation * Fix some setup issues * Fix memory cleanup on unload * Diagnostics * Don't unload library * Add diagnostics * Fix deleting registry at right time. * Test disabling profiler * Fix merge break * Revert profiler change * Move unloading of shared providers into Environment * Free more global allocations before library unloads * Add more diagnostics * Move unloading back to the OrtEnv as there are multiple Environments created during a session. Remove some library dependencies for tests. 
* Fix more cmake files * ERROR -> WARNING * Fix python shutdown * Test not using dml in pipeline * Change python version and disable dml * Update python version * Test adding unload method for shared providers * Disable DLL test * Python test * Revert "Python test" This reverts commit `c7ec2cfe98`. * Revert "Disable DLL test" This reverts commit `e901cb93aa`. * Revert "Test adding unload method for shared providers" This reverts commit `c427b78799`. * Point to RyanWinGPU * Revert python version * Fix id_to_allocator_map * Another python exit test * Remove extra debug messages Try a more clean python shutdown through DllMain * Revert DllMain idea, it didn't work * Merge conflicts * Fix merge with master issues. * Comments * Undo edit to file * Cleanup + new training ops * Revert yml changes * Fix another merge error * ROCM fix * ROCM fix v2 * Put back Linux hack, it is necessary * Stupid fixes * Fix submodule out of sync * ROCM fix 3 * ROCM 4 * Test java fix * Fix typos * Java test on my VM * Fix build error * Spotless fix * Leave temp file around to load properly * Fix cleanup on exit * Fix break * Java comments * Remove LongformerAttentionBase workaround * Spotless fix * Switch yml back to regular build pool * Revert "Switch yml back to regular build pool" This reverts commit `be35fc2a5a`. * Code review feedback * Fix errors due to merge * Spotless fix * Fix minimal build * Java fix for non cuda case * Java fix for CPU build * Fix Nuphar? * Fix nuphar 2 * Fix formatting * Revert "Remove LongformerAttentionBase workaround" This reverts commit `648679b370`. * Training fix * Another java fix * Formatting * Formatting * For orttraining * Last orttraining build fix... * training fixes * Fix test provider error * Missing pass command * Removed in wrong spot * Python typo * Python typos * Python crash on exit, possibly due to unloading of libraries. 
* Remove test_execution_provider from training build Only enable python atexit on windows Remove assert on provider library exit * Still can't unload providers in python, alas. * Disable Nvtx temporarily * MPI Kernels for Training * MPI Kernels part 2 * Patch through INcclService * Oops, wrong CMakeLists * Missing namespace * Fix missing () * Move INcclService::GetInstance around to link nicer * Missing } * Missing MPI libraries for Cuda * Add extra GetType functions used by MPI * Missing Nccl library * Remove LOGS statements as a test * Add in a couple more missing GetType methods * Update comments * Missed a logging reference in mpi_context.h * Convert aten_op to shared (due to marge with master) * Test moving DistributedRunContext instance into shared provider layer (with purpose error to verify it's being built properly) * Test passed, now with fix * Missing static * Oops, scope DistributedRunContext to just NCCL * Merge related issues and code review feedback. * Merge error * Bump to rel-1.9.1 (#7684) * Formatting * Code review feedback for Java build on non Windows * Remove cupti library dependency from core library * Test Java pipeline fix * Linux build fix * Revert "Linux build fix" This reverts commit `a73a811516`. * Revert "Remove cupti library dependency from core library" This reverts commit `6a889ee8bf`. * Packaging pipeline fixes to copy cuda shared provider for tensorrt & standard packages * Add cuda to Tensorrt nuget package * onnxruntime_common still has a cuda header dependency Co-authored-by: ashbhandare <ash.bhandare@gmail.com>	2021-05-20 07:53:47 -07:00
Changming Sun	31e6d3f85c	Revert CUPTI profiling feature (#7763 ) For unknown reason it causes deadlocks when it is used with CUDA 11.1	2021-05-19 21:54:29 -07:00
Changming Sun	7b003967b1	Add static code analyzer to Windows CPU/GPU CI builds and fix the warnings (#7489 )	2021-04-29 11:54:57 -07:00
Edward Chen	d21304ceb0	Initial Objective-C API (#7366 ) Initial implementation of an Objective-C API.	2021-04-27 10:06:30 -07:00
Sunghoon	ded2b08380	WebAssembly multi-threads support. (#7326 ) * WebAssembly multi-threads support. * PROXY_TO_PTHREAD is not required for wasm library * Remove an unnecessary line commented out	2021-04-15 21:46:11 -07:00
Yulong Wang	405ca49012	build ONNXRuntime into WebAssembly (#6478 ) * Simplified version of WebAssembly support to keep most of existing data structures and add cmake using Ninja and emcmake * Clean up CMakeLists.txt and add an example to create and compute a kernel * Load a model from bytes and remove graph building steps * Add all cpu and contrib ops with mlas library * WebAssembly build with Onnxruntime C/CXX API * Use protobuf cmakefile directory instead of adding every necessary source file * Fix invalid output at example * add missing files * Change an example to use Teams model and support ort mobile format * add API for javascript * fix input releasing in _ort_run() * update API * Let onnxruntime cmake build WebAssembly with option '--wasm' * allow one-step building for wasm * Make build script working on Linux and MacOS * Fix broken build from Windows command * Enable unit test on building WebAssembly * Resolve comments * update build flags * wasm conv improvement from: 1) GemmV; 2) Depthwise direct convolution 3x3; 3) Direct convolution 3x3 * Cleaned mlas unittest. * use glob * update comments * Update baseline due to loss scale fix (#6948) * fix stream sync issue (#6954) * Enable type reduction in EyeLike, Mod, random.cc CPU kernels. (#6960) * Update EyeLike CPU kernel. * Update Mod CPU kernel. * Update Multinomial CPU kernel. * Slight improvement to Pad CPU kernel binary size. * Update RandomNormal[Like], RandomUniform[Like] CPU kernels. * Fix warning from setting multiple MSVC warning level options. (#6917) Fix warning from setting multiple MSVC warning level options. Replace an existing /Wn flag instead of always appending a new one. * MLAS: quantized GEMM update (#6916) Various updates to the int8_t GEMMs: 1) Add ARM64 udot kernel to take advantage of dot product instructions available in newer cores. Some models run 4x faster than the stock implementation we used before. 
2) Refactor the x64 kernels to share common code for AVX2(u8u8/u8s8/avxvnni) vs AVX512(u8u8/u8s8/avx512vnni) to reduce binary size. 3) Extend kernels to support per-column zero points for matrix B. This is not currently wired to an operator. * Implement QLinearAveragePool with unit tests. (#6896) Implement QLinearAveragePool with unit tests. * Attention fusion detect num_heads and hidden_size automatically (#6920) * fixed type to experimental session constructor (#6950) * fixed type to experimental session constructor Co-authored-by: David Medine <david.medine@brainproducts.com> * Update onnxruntime_perf_test.exe to accept free dimension overrides (#6962) Co-authored-by: Ori Levari <orlevari@microsoft.com> * Fix possible fd leak in NNAPI (#6966) * Release buffers for prepacked tensors (#6820) Unsolved problems: 1. One test failure was caused by a bug in Cudnn rnn kernels, when they can allocate a buffer and partially initialize it, the garbage data near tail of the buffer caused problem in some of the hardware. To attack this problem in a broader sense, should we add code in our allocators, and during a memory fuzzing test, fill an allocated buffer with garbage before returning to the caller? 2. Prepacking is used more widely than we know. For instance, Cudnn rnn kernels also cache their weights. They mix several weight tensors together into a single buffer, and never touch the original weight tensor anymore. This is the same idea with pre-pack, but they didn't override the virtual function, and they never tried to release those weight tensors, leading to memory waste. It also seems to me that there are some other kernels have similar behavior. Wonder how much memory we can save if we try to cleanup those too. 3. Turning off memory pattern planning does increase memory fragmentation, leading to out of memory error in some training test cases. 
Perhaps we can revisit the idea of pushing kernels-creation stage earlier, and then during initializer deserialization, we only avoid tracing those that will be prepacked. * Enable type reduction for Range, ReverseSequence, ScatterND, Split, and Unique CPU kernels. (#6963) * add CI * fix test in ci * fix flags for nsync in wasm build * add copyright banner * fix wasm source glob * add missing exports * resolve comments * Perf gain by make packb wide to 4 from 16 on GEMM for WASM. Remove no need direct conv in previous perf tuning. * fix buildbreak introduced from latest master merge * fix buildbreak in mlasi.h * resolve all comments except MLAS * rewrite packb related 3 functions for WASM_SCALAR separately rather than using #ifdef in each. and other changes according to PR feedback in mlas. * More complete scalar path in sgemm from Tracy. * Fix edge case handling in depthwise conv2d kernel 3x3. where: ) support input W==1 and H==1 ) recalc in accurate pad_right and pad_bottom ) support hidden pad_right == 2 or pad_bottom == 2 when W == 1 or H==1 and no pad left/top Add more test coverage for conv depthwise from Tracy. Fix one typo according to PR. 
* resolve comments * replace typedef by using * do not use throw in OrtRun() * output error message Co-authored-by: Sunghoon <35605090+hanbitmyths@users.noreply.github.com> Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com> Co-authored-by: David Medine <david.eric.medine@gmail.com> Co-authored-by: David Medine <david.medine@brainproducts.com> Co-authored-by: Ori Levari <ori.levari@microsoft.com> Co-authored-by: Ori Levari <orlevari@microsoft.com> Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com> Co-authored-by: Chen Fu <chenfucs@gmail.com>	2021-04-06 16:18:10 -07:00
Ben Niu	d1acdd4f4b	Support building ARM64EC onnxruntime.dll (#6999 )	2021-03-29 15:35:30 -07:00
RandySheriffH	aeca7c2940	Cuda Profiler (#7110 ) * implement cuda profiler * add counters * downgrade cupti kernel version * move mutex * add cupti to path * fix win gpu build err * add path for cuda10 * fix linux com err * extend include path * add init flag * fix test case * fix tensorrt pipeline * add UT Co-authored-by: Ubuntu <randysheriff@rashuai-linux-gpu-3.3cfnmjowvu4e5bidlsmcxsmzwg.xx.internal.cloudapp.net>	2021-03-29 12:04:36 -07:00
Edward Chen	d850fa63bf	Op kernel type reduction infrastructure. (#6466 ) Add infrastructure to support type reduction in Op kernel implementations. Update Cast and IsInf CPU kernels to use it.	2021-01-28 07:27:19 -08:00

84 commits