onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-28 22:56:32 +00:00

Author	SHA1	Message	Date
Yi Zhang	6e9541046e	extend react native ci timeout limit (#16469 ) ### Motivation and Context 2 consecutive runs in the npm pipeline failed due to timeouts	2023-06-27 08:44:03 +08:00
Yifan Li	e2c214d81f	[TensorRT EP] TRT 8.6 minor version update (#16475 ) ### Description * Minor version update: TRT 8.6.0.12->8.6.1.6 * CI pipeline ymls/dockerfiles are updated * cgmanifest.json/deps.txt/download-deps.yml are updated; Win trt binaries uploaded to [win img 307029](https://aiinfra.visualstudio.com/AI%20Infra%20Management/_build/results?buildId=307029&view=results) * Re-enable unit tests which failed in 8.6.0 and regained support in 8.6.1	2023-06-26 10:44:27 -07:00
PeixuanZuo	7e211f0e03	[ROCm] Move mount data step into docker container (#16471 ) Some CI jobs may be interrupted unexpectedly without executing the umount data step. The data left on the host device then causes `device or resource busy` and makes subsequent CI jobs fail. With the mount data step moved into the docker container, the host machine is not left occupied when CI jobs exit abnormally.	2023-06-26 10:25:06 +08:00
Rachel Guo	04dbdc96bf	[js/rn] Fix React Native CI pipeline E2E test (#16447 ) ### Description Based on this kindly provided quick fix: https://github.com/microsoft/onnxruntime/pull/16411 See the linked PR for more details about bumping the AGP version, etc. Also fixed the header file import path in the detox e2e test. ### Motivation and Context Good build: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1041757&view=logs&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3&t=9894c870-b8ce-548d-51ff-8f44d21a4117&l=18	2023-06-22 14:33:49 -07:00
Yi Zhang	8e8840f1de	Enable Web CI on Linux (#16419 ) ### Description 1. Enable Web CI on Linux ### Motivation and Context 1. Speed up Web CI: the duration can be reduced from 160 minutes to 130 minutes, a time saving of about 20%. The total computation time is 455 minutes now; moved to Linux, it could be reduced to 336 minutes. 2. It's the first step to enable compilation cache for emscripten 3. per Yulong's request, build_web stages are still using windows pool ![image](https://github.com/microsoft/onnxruntime/assets/16190118/c9496408-74bd-45ea-b4ae-a4dd2a574d17) https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1038382&view=results	2023-06-22 15:42:58 +08:00
yf711	0ad0d6ebbf	Unblock Linux MultiGPU TensorRT CI (#16446 ) ### Description Revert docker base image to nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04@sha256:b754c43fe9d62e88862d168c4ab9282618a376dbc54871467870366cacfa456e ### Motivation and Context The default image environment of nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 received a minor upgrade, which makes Linux MultiGPU TensorRT CI (NV12 instance with Maxwell GPU) fail on three CApiTestGlobalThreadPoolsWithProvider tests (these three tests have errors above the tolerance). That minor upgrade includes cudnn 8.7.0->8.9.0, which might be a factor that makes the Maxwell GPU generate higher error. CIs with T4 GPU are not affected.	2023-06-21 17:15:39 -07:00
Rachel Guo	961fa7274a	[NNAPI doc] add reducemean to supported op list (#16414 ) ### Description As title.	2023-06-21 00:29:20 -07:00
Rachel Guo	b4b126ffb0	Set onnxruntime-c local pod path environment variable for react native e2e tests on ci (#16431 ) ### Description Set onnxruntime-c local pod path environment variable for react native e2e tests on react-native-ci.yml ### Motivation and Context Previously the E2E test project was not properly consuming a locally built onnxruntime-c pod. https://github.com/microsoft/onnxruntime/pull/16411#issuecomment-1598512816	2023-06-21 00:27:36 -07:00
PeixuanZuo	470d6c1cce	[ROCm] Delete unused file to fix Component Governance Alert (#16407 ) Delete unused file to fix Component Governance Alert	2023-06-19 11:28:32 -07:00
PeixuanZuo	1418d8728c	[ROCm] Fix CI Pipeline (#16409 ) 1. add `set -ex` before commands. 2. update ccache.	2023-06-19 15:22:13 +08:00
Yi Zhang	8b9eab093b	keep symlinks in maven package (#16376 ) ### Description 1. Keep symlink in the package. 2. keep the artifact package format	2023-06-19 09:41:39 +08:00
dependabot[bot]	dd660c054e	Bump transformers from 4.24.0 to 4.30.0 in /tools/ci_build (#16331 )	2023-06-16 13:08:46 -07:00
Changming Sun	188d5f5398	Fix Linux Multi GPU build pipeline (#16368 ) ### Description The build pipeline runs on Azure NV12 machines that will be deprecated soon because the SKU is too old. So this PR will move the pipeline to a Windows machine with two A10 GPUs.	2023-06-15 16:24:46 -07:00
Yi Zhang	3e99e43a1d	extend Final AAR testing timeout limit (#16340 ) ### Motivation and Context improve nuget pipeline stability	2023-06-15 17:27:45 +08:00
PeixuanZuo	097346be9d	[ROCm] Add clean step for ROCm CI pipeline (#16336 ) 1. Add clean step for ROCm CI pipeline 2. Fix error "device or resource busy" bug by setting umount dataset step as `always()` step.	2023-06-15 13:44:12 +08:00
Baiju Meswani	5eec24837f	Fix for AMD GPU pipeline (#16357 )	2023-06-14 20:36:16 -07:00
Changming Sun	dbc7a195b1	Update win-ci-pipeline.yml: enable xnnpack tests (#16244 ) 1. Enable xnnpack test 2. Change TSA database name from onnxruntime_master to onnxruntime_main. This is a leftover of renaming the "master" branch to "main" 3. Add two static analysis jobs for WinML and DML 4. Rename the machine pool "aiinfra-dml-winbuild" to "onnxruntime-Win2019-GPU-dml-A10", so that the internal and public ADO instances use the same machine pool name. 5. Move Windows GPU CI build pipeline from "onnxruntime-Win2022-GPU-T4" to "onnxruntime-Win2022-GPU-A10" machine pool, because we do not have enough T4 GPUs.	2023-06-14 19:12:42 -07:00
Baiju Meswani	8a3de16d14	Temporary fix to make the training pipeline green (#16353 )	2023-06-14 13:11:35 -07:00
Edward Chen	4f23577cb5	[React Native] Publish E2E test logs on build failure too. (#16327 ) ### Description Publish E2E test logs on build failure too. ### Motivation and Context Get more information about intermittent test failures.	2023-06-12 17:56:46 -07:00
JiCheng	eed02a3f78	Xnnpack QDQ test (#16281 ) ### Description A few QDQ tests failed on XNNPACK EP. The likely reason is that the range of input_data doesn't fit the scale and zero_point.	2023-06-12 14:00:42 +08:00
Yulong Wang	f274bbb0c8	[js] add API that allows to get package version (#16207 ) ### Description Add an API for users to get version of current package. example usage: ```js import { env } from 'onnxruntime-node'; console.log(env.versions.node); // output "1.16.0" ``` ```js import { env } from 'onnxruntime-web'; console.log(env.versions.web); // output "1.16.0" console.log(env.versions.common); // output "1.16.0" console.log(env.versions.node); // output "undefined" ``` #16156	2023-06-09 16:18:53 -07:00
Yi Zhang	3b5a8352c1	CodeSign Mac packages in nuget pipeline (#16291 ) ### Description 1. Updated the Mac package workflow for easier debugging. 2. Changed the archive type from tgz to zip since zip is supported by ESRP. 3. .../dylib.dSYM/Contents/Resources/DWARF/libonnxruntime.1.16.0.dylib is a debug symbol file, so it couldn't be signed. ### Motivation and Context It's required by VS Code. Mac binaries in nuget should be signed	2023-06-10 06:35:47 +08:00
Edward Chen	b668a6da96	Treat Objective-C static analysis warnings as errors (#16293 ) - Update Objective-C static analysis check to fail on warnings. - Address warning. - Clean up build definition.	2023-06-09 08:51:49 -07:00
Vrajang Parikh	67f4a4fd16	Objective-C binding for ORT training (#16127 ) ### Description Implement Objective-C binding for `ORTCheckPoint`. Additionally, - Modify `onnxruntime_objectivec.cmake` to only include training header and sources when training flag is enabled - Enable objective-c binding for `orttraining-mac-ci-pipeline` ### Motivation and Context This PR is part of implementing Objective-C bindings for training API. It implements objective-c binding for ORTCheckPoint class. The objective-C API closely resembles the C++ API. Note: The test for saving checkpoint is skipped as it requires use of training session. It will be added when the objective-c binding for `ORTTrainingSession` is added.	2023-06-07 14:01:30 -07:00
Edward Chen	1261d0b8ba	Fix some build issues on MacOS with Xcode 14.3. (#15878 ) - Fix flatbuffers flatc warning, unused-but-set-variable. - Address `-Wshorten-64-to-32` warnings (fix in our code, allow in dependencies' code). - Update CI builds to use Xcode 14.3. - Update minimum iOS version to 12.0. - Update Mac hosted agents to MacOS 13 where possible.	2023-06-07 12:07:11 -07:00
PeixuanZuo	a95f8ae53c	[ROCm] Update ROCm/MIGraphX CI pipeline (#16215 ) MIGraphX CI - Change docker container user name to `onnxruntimedev` ROCm CI - Build docker image every job instead of using prebuild image. - Every job create a container with only one GPU with command `docker run -it --device=/dev/kfd --device=/dev/dri/renderDxxx` - Remove tests that are unstable or use outdated interfaces. - Enable training ortmodule test.	2023-06-05 10:28:10 +08:00
Changming Sun	6b5b79872b	Avoid taking dependency on dl.fedoraproject.org (#16202 ) ### Description 1. Avoid taking dependency on dl.fedoraproject.org The website is not very stable. Our build pipelines often fail to fetch packages from there. 2. Update manylinux to the latest version	2023-06-02 07:41:46 -07:00
Changming Sun	5bfa1183d1	Add a Memory Profiling build job in post merge pipeline (#16172 ) ### Description 1. Add a Memory Profiling build job 2. Remove no absl build job since the feature will be removed 3. Simplify post-merge-jobs.yml by unifying the pool names ### Motivation and Context To catch build errors in #16124	2023-06-01 13:00:44 -07:00
Yi Zhang	e0199cfbd9	extend mac packaging timeout limit (#16173 ) ### Motivation and Context MacOS_py_wheels often fail due to timeouts	2023-05-31 18:31:28 +08:00
Baiju Meswani	7edc4b105d	Copy missing training header files to the package archive (#16119 )	2023-05-30 16:45:40 -07:00
Sunghoon	bf05d4ec26	Fix nightly ort CI pipeline (#16162 ) This PR changes the [nightly ort CI pipeline](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=198) to pick up the latest nightly ACPT image, which was changed from torch 2.0.0.dev to torch 2.1.0.dev.	2023-05-30 14:00:34 -07:00
Xavier Dupré	e726151b5c	Introduce float 8 types (#14731 ) ### Description The PR implements FloatE4M3FN, FloatE5M2, FloatE4M3FNUZ, FloatE5M2FNUZ as described in PR https://github.com/onnx/onnx/pull/4805. It uses the CUDA API to cast float/half to float8 if CUDA>=11.8, and a custom implementation if CUDA<11.8. * It implements Cast, QuantizeLinear, DequantizeLinear for all types on CPU, and only for types FloatE4M3FN, FloatE5M2 on CUDA. * It extends the supported types for control flow operators and for Shape, Reshape, Identity, If, Loop, Scan * It implements Equal(19). * Cast, QuantizeLinear, DequantizeLinear operators now support a parameter `saturate`, only valid for float 8 types. It is true by default. In that case, any value out of range is converted into the maximum float 8 value. If false, it is infinite. * QuantizeLinear, DequantizeLinear now support multiple scales on CUDA (and ROCm by extension), scale = 1D tensor with one scale per channel ### Motivation and Context Supports latest onnx version. Fixes [AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395) --------- Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Randy Shuai <rashuai@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>	2023-05-30 13:25:58 -07:00
Yi Zhang	31fc25d2c2	[Fix] Check if CUDA is downloaded in AGENT_TEMPDIRECTORY (#16142 ) ### Description supplement of #15915 ### Motivation and Context fix nuget pipeline exception in the stage of Final_Jar_Testing_Windows_GPU ``` JUnit Jupiter:ProviderOptionsTest:testCUDAOptions() MethodSource [className = 'ai.onnxruntime.providers.ProviderOptionsTest', methodName = 'testCUDAOptions', methodParameterTypes = ''] => ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1131 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\cloudtest\AppData\Local\Temp\onnxruntime-java17193857285260738736\onnxruntime_providers_cuda.dll" ``` ### Verification https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=313476&view=results	2023-05-30 13:14:08 +08:00
Yi Zhang	73584f9360	More fixes on nuget pipeline (#16091 ) ### Description 1. Parameters couldn't be compared as strings; changed them to boolean. 2. Windows_CI_GPU_DML_DEV_arm64 on the pool onnxruntime-Win-CPU-2022 failed to pass the prefast step; changed the pool to aiinfra-dml-winbuild. 3. Skipped test_zfnet512, which failed in Nuget_Test_Win_Training_CPU. Todo: Only Final_Jar_Testing_Windows_GPU fails now. https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=313042&view=logs&s=d66543d5-16de-5a48-6ecb-a36e21ff8d4d&j=d9489789-5e39-5a05-13ab-9aaf7b4d386f	2023-05-27 08:59:12 +08:00
Changming Sun	60bb07307b	Fix the TRT GPU build job in python packaging pipeline (#16073 ) 1. Cherry-pick #16054 back to the main branch 2. Replace onnxruntime-gpu-winbuild-t4 with onnxruntime-Win2022-GPU-T4. The latter one has VS2022. --------- Co-authored-by: Patrice Vignola <vignola.patrice@gmail.com>	2023-05-25 00:09:08 -07:00
Yi Zhang	76fd9aa745	[Fix] Some pipelines have to be using VS2019 (#16034 ) ### Motivation and Context Fix nuget and python package pipeline. 1. ARM 32 build isn't officially supported by VS2022. https://developercommunity.visualstudio.com/t/Compilation-Error-with-VS2022-ARM/10285309 2. onnxruntime-gpu-winbuild-T4 and onnxruntime-gpu-winbuild-tensorrt8-T4 do not have VS 2022	2023-05-25 09:55:35 +08:00
yf711	84f1af7ff5	ort build flag fix (#16072 ) ### Description * Sync and clean build flag `--use_tensorrt_builtin_parser` from existing CI config as this becomes default flag * cuda version update	2023-05-24 12:32:10 -07:00
Guenther Schmuelling	20857c4ff2	workaround test failure in ci (#16070 ) don't run wasm proxy test on debug build to unblock ci. Needs some longer debugging.	2023-05-24 21:01:06 +08:00
Shukant Pal	f316bc57c4	[CoreML EP] Implement Unary & Reduce operators (#15532 ) ### Description This change is a follow-up to #15327. It adds Unary operators (Sqrt, Reciprocal) and Reduce operators (ReduceSum, ReduceMean). I've tried to follow existing patterns in the code :-) ### Motivation and Context This reduces fragmentation across EPs when using CoreML on macOS, thereby speeding up execution. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-05-24 18:16:59 +10:00
RandySheriffH	d35361bf9d	Fix python pipeline for AzureEP without using root (#16023 ) Fix python pipeline for AzureEP without using root, this is for 1.15. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-05-22 16:38:47 -07:00
Changming Sun	0204594f90	Cleanup WASM cmake code (#15996 ) ### Description Remove the "onnxruntime_BUILD_WEBASSEMBLY" cmake option. Use `if (CMAKE_SYSTEM_NAME STREQUAL "Emscripten")` instead. It makes some code look more natural. For example, ```cmake if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR onnxruntime_BUILD_WEBASSEMBLY) ``` becomes ```cmake if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR CMAKE_SYSTEM_NAME STREQUAL "Emscripten") ```	2023-05-20 18:07:39 -07:00
Hector Li	4324d2173b	[QNN EP] Enable Qnn context cache to save model initialization time (#15815 ) ### Description Enable Qnn Context cache feature to save model initialization time Provider options: qnn_context_cache_enable\|1 to enable the cache feature qnn_context_cache_path to set the cache path. It is set to model_file.onnx.bin by default. ### Motivation and Context Model initialization takes a long time because of the cost of converting the Onnx model to a Qnn model. Qnn has a feature to serialize the Qnn context to a file, so that next time the user can load it from the cached context and execute the graph, saving that cost. --------- Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>	2023-05-19 10:52:17 -07:00
RandySheriffH	4dfb89b3ad	Implement mutex-free spin lock for task queue (#14834 ) Implemented "lock-free" spinlock to save CPU usage on context switching. The change has been tested on queene service of Ads team, the lock-free version of ort (40 threads) saves CPU usage on gen8 (128 logical processors on 8 numa nodes) windows by nearly half, from 65% to 35%. For 32 cores, the curve is flat: Anubis, 32 vCPU, windows, hugging face models, 95 percentile E2E latency in ms: model \| mutex(ms) \| mutex-free --- \| --- \| --- alvert_base_v2 \| 34.21 \| 34.09 bert_large_uncased \| 116.27\| 117.84 bart_base \| 72.06 \| 71.99 distilgpt2 \| 25.43 \| 25.02 vit_base_patch16_224 \| 37.33 \| 37.76 Anubis, 32 vCPU win, Linux, 1st party models, 95 percentile E2E latency in ms: model \| mutex(ms) \| mutex-free --- \| --- \| --- deepthink_v2 \| 24.35 \| 22.95 bing_feeds \| 36.96 \| 36.48 deep_writes \| 14.46 \| 14.32 keypoints \| 9.34 \| 7.69 model11 \| 1.71 \| 1.66 model12 \| 1.82 \| 1.44 model2 \| 4.21 \| 3.95 model6 \| 1.08 \| 1.05 agiencoder \| 0.99 \| 0.93 geminet_transformer \| 5.32 \| 5.24 --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-05-19 10:12:10 -07:00
PeixuanZuo	d78bbf5ef2	[ROCm] remove ROCm5.2.3, ROCm5.3, ROCm5.4 from pipeline (#16004 ) remove ROCm5.2.3, ROCm5.3, ROCm5.4 from pipeline.	2023-05-19 10:29:01 +08:00
Edward Chen	6d46007028	Add explicit 'set +x' before printing a vso[] command to avoid output getting parsed again with a trailing quote. (#15986 ) Here's the motivating issue: https://github.com/microsoft/azure-pipelines-tasks/issues/10331 Noticed some problems in other repos so also updating usages in ORT. We may be fine now without it, but this change adds some safeguard against future additions of 'set -x' for debugging.	2023-05-17 19:30:28 -07:00
Changming Sun	d98763473a	Change CUDA pipelines to download CUDA SDK in every build job (#15915 ) ### Description Change CUDA pipelines to download CUDA SDK in every build job	2023-05-17 17:31:51 -07:00
cloudhan	856afa49dd	[C#] Add missing rocm csharp api (#15540 )	2023-05-18 08:15:19 +08:00
Yi Zhang	6d43d51eb0	[Fix] No test result report while not using ctest (#15976 ) ### Description 1. Set gtest output while ctest is set to empty. 2. onnx_src in _deps shouldn't be removed because onnx_test_pytorch_converted and onnx_test_pytorch_converted need to read data from onnx/backend/test/data/.. ### Motivation and Context The test result report is important for finding flaky tests. ### To do Tests are not consistent. If ctest_path is empty, onnx_test_pytorch_converted and onnx_test_pytorch_converted will not be executed; if it's not, onnxruntime_mlas_test will not be executed. `270c09a37f/tools/ci_build/build.py (L1743-L1753)`	2023-05-17 08:31:16 -07:00
Jian Chen	2881d849d4	Update Win-CPU-2021 to onnxruntime-Win-CPU-2022 (#15967 ) ### Description After this PR, the following pools still need to be updated. old\|new\|note ---\|---\|--- onnxruntime-Win2019-GPU-dml-A10\|tbd\| onnxruntime-Win2019-GPU-T4\|onnxruntime-Win2022-GPU-T4\| onnxruntime-Win2019-GPU-training-T4\|onnxruntime-Win2022-GPU-T4\|same as the above because we do not have many T4 GPUs onnxruntime-tensorrt8-winbuild-T4\|tbd\| aiinfra-dml-winbuild\|tbd\|	2023-05-17 08:29:27 -07:00
kailums	f62f722c70	integrate triton into ort (#15862 ) ### Description In some scenarios, kernels written in triton are more performant than CK or other handwritten kernels, so we implement a framework that lets onnxruntime use these triton-written kernels. This PR integrates triton into ort, so that ort can use kernels that are written and compiled by triton. The main change focuses on two parts: 1. a build part to compile triton-written kernels and combine them into libonnxruntime_providers_rocm.so 2. a loader and launcher in c++, for loading and launching triton-written kernels. #### Build To compile triton-written kernels, add a script `tools/ci_build/compile_triton.py`. This script dynamically loads all kernel files, compiles them, and generates `triton_kernel_infos.a` and `triton_kernel_infos.h`. `triton_kernel_infos.a` contains all compiled kernel instructions; this file is combined into libonnxruntime_providers_rocm.so using the --whole-archive flag. `triton_kernel_infos.h` defines a const array that contains all the metadata for each compiled kernel. This metadata is used for load and launch, so this header file is included by 'triton_kernel.cu', which defines the load and launch functions. Add a build flag in build.py and CMakeLists.txt; when building the rocm provider, it calls the triton_kernel build command and generates all necessary files. #### C++ Load and Launch On the c++ side, we implement load and launch functions in triton_kernel.cu and triton_kernel.h. These two files are located in `providers/cuda`, and when compiling for rocm, they are hipified, so this part supports both cuda and rocm. But currently we only call triton kernels in rocm. We also implement a softmax triton op as an example. Because many kernels are generated for the different input shapes of softmax, we use TunableOp to select the best one.	2023-05-17 09:35:28 +08:00

1863 commits