onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-08 17:17:15 +00:00

Author	SHA1	Message	Date
Yulong Wang	5e81fa8aec	[js] fix vulnerability CVE-2024-4068: upgrade `braces` to 3.0.3 (#21078 ) ### Description Upgrade `braces` to 3.0.3 [CVE-2024-4068](https://github.com/advisories/GHSA-grv7-fg5c-xmjg) ``` # npm audit report braces <3.0.3 Severity: high Uncontrolled resource consumption in braces - https://github.com/advisories/GHSA-grv7-fg5c-xmjg fix available via `npm audit fix` node_modules/braces 1 high severity vulnerability ```	2024-06-18 16:02:08 -07:00
Changming Sun	ffb8e8eb0e	Update build.py: add a comment (#20993 ) ### Description Update build.py: add a comment ### Motivation and Context See the comment.	2024-06-18 13:52:34 -07:00
Yulong Wang	631a2c16be	[js/web] skip default locateFile() when dynamic import is disabled (#21073 ) ### Description skip default `locateFile()` when dynamic import is disabled. This allows the file to work with bundlers to load WebAssembly file correctly if `env.wasm.wasmPaths` is not set.	2024-06-18 12:21:45 -07:00
Changming Sun	b75b2fcdcb	Add MSVC static analyzer back (#21056 ) ### Description Add MSVC static analyzer back. Previously it had a stability issue. It was deleted in #17522 . ### Motivation and Context	2024-06-18 12:10:11 -07:00
Yang Gu	1473d66a00	[js/webgpu] Prefer adapter.info to adapter.requestAdapterInfo (#21065 ) WebGPU is deprecating async adapter.requestAdapterInfo, and replacing it with sync adapter.info. Spec change: https://github.com/gpuweb/gpuweb/pull/4662	2024-06-18 12:02:38 -07:00
Ted Themistokleous	dadd0c451a	[MIGraphX EP] Fix MIGraphX mixed precision run input parameters (#20982 ) See #20643 ### Description Changes the order in which we perform quantization to better support mixed precision, and fixes a bug where input parameters for int8 quantization were not handled correctly. We now perform int8 quantization first on a full-precision input model, and then quantize the model to fp16 for the remaining ops that aren't quantized. Previously, a low-precision input was used, which could cause larger values than intended to be inserted into the model when int8 quantization was performed. The symptom of this was a failure during the quantization steps. Similarly, input parameters were left uninitialized, resulting in a similar failure during int8 quantization. GPU faults were intermittent but present, as using uninitialized memory created undefined behavior when we started testing more complex models during mixed precision. ### Motivation and Context In some cases we've seen random data and/or invalid values entering compiled onnx graphs. This is due to input parameters to the MIGraphX graph not being set correctly when mixed precision (int8 + fp16) is used, and to the ordering of quantization steps causing a lower-precision model to be used to perform int8 quantization. In most cases the failure is silent/intermittent. In some cases we've observed GPU faults due to out-of-bounds values being set. This change is required because when a large input parameter to the MIGraphX graph is initialized to a large random value and the next operator uses it for indexing, we get undefined behavior and a GPU fault.	2024-06-18 11:18:13 +08:00
Yi Zhang	809cb26ace	Use A100 for LLama2 model test (#21068 )	2024-06-18 11:04:02 +08:00
Changming Sun	9ef4f1b789	Update pybind11 (#21072 ) ### Description Upgrade pybind11 to the latest as suggested by @gnought in #21063 ### Motivation and Context Recently numpy released a new version, which caused compatibility issue between the latest numpy version and the latest ONNX Runtime version.	2024-06-17 19:50:57 -07:00
Scott McKay	159fe9d4f3	Update to mobile model usability checker (#19843 ) ### Description - Add check for CoreML MLProgram supported ops - Only check usability with ORT Mobile package if requested - this package will be deprecated so info is a) of minimal value and b) can be confusing. - Output more things at INFO level - a lot of meaningful info was only output at DEBUG level. The default INFO level is more useful - dump full partition info at DEBUG level - Check subgraphs fully - CoreML can handle a subgraph - TBD if we want to add support for adding a subgraph to the parent graph for Loop and If nodes - most likely will be required for simple If nodes to be performant - Check 5D CoreML limitation ### Motivation and Context Improve helper tools --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-06-18 07:50:33 +10:00
Nikolai Svakhin	7b3fff650a	Updated build script for CUDA case (#20987 ) ### Description In CUDA case, use the cuda_home variable to set CMAKE's CUDA compiler to a correct version of NVCC Otherwise, an NVCC from a current PATH would be picked up, which could be from a different version of CUDA. ### Motivation and Context I had a case when I had main CUDA installed, and it was a version 11.8. I wanted to build against 12.5, so I downloaded and unpacked it into a separate directory and passed it as a `--cuda-home` parameter, however the ONNX builder was still picking the NVCC compiler from 11.8. This would fix the issue https://github.com/microsoft/onnxruntime/issues/20928 cc @gedoensmax	2024-06-17 14:41:43 -07:00
Adrian Lizarraga	a6c18ae9df	[QNN EP] Add quantization axis checks for Conv/ConvTranspose/Q/DQ ops (#21016 ) ### Description Updates QNN EP to reject Conv/ConvTranspose/Q/DQ ops with unsupported quantization axis values. ### Motivation and Context Allows these unsupported operators to be handled by the CPU EP. Fixes errors like the following: > Node 'ConvTranspose' OpType:ConvTranspose with domain:com.ms.internal.nhwc was inserted using the NHWC format as requested by QNNExecutionProvider, but was not selected by that EP. This means the graph is now invalid as there will not be an EP able to run the node. This could be a bug in layout transformer, or in the GetCapability implementation of the EP. --------- Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>	2024-06-17 09:46:14 -07:00
Xavier Dupré	c501c6ffaf	Rename a misspelled filename in the documentation (#21066 ) ### Description Rename a file in the documentation	2024-06-17 18:18:41 +02:00
Wanming Lin	bbb6dbf6d2	[WebNN EP] Update data type constraints for Reduction ops (#20912 ) WebNN Spec adds missing 64-bit integers support for `reduceL1`, `reduceSum`, `reduceSumSquare` and `reduceProduct` ops at this [PR](https://github.com/webmachinelearning/webnn/pull/695), which has already been implemented in Chromium. Update corresponding data type constraints in WebNN EP. Besides, WebNN CPU backend currently doesn't support `uint64` and `uint32` for these ops.	2024-06-17 08:46:18 -07:00
Frank Dong	8aa2667ae6	add bf16 for Tile CUDA executor (#20854 ) ### Description add bf16 for Tile CUDA executor ### Motivation and Context required change to support phimm model for ORT training	2024-06-17 05:52:13 -07:00
Yueqing Zhang	0babc33725	[VitisAI] update graph_save (#20979 ) ### Description Fix the threshold limit ### Motivation and Context We use this to debug the graph after each graph transformation. But, the const data inside the model is useless. So, we decided to remove that information to save disk space. Co-authored-by: Chunye Wang <chunywan@xlnx.xilinx.com>	2024-06-16 22:30:35 -05:00
Ted Themistokleous	11e7a1b8f2	[MIGraphX EP] Add migraphx ep save load compiles (#20643 ) ### Description Adds the ability for MIGraphX EP to save off or load compiled models to save time between inferences. Via the command line, the user should be able to enable saving with ORT_MIGRAPHX_SAVE_COMPILED_MODEL and ORT_MIGRAPHX_SAVE_COMPILE_PATH, and enable loading with ORT_MIGRAPHX_LOAD_COMPILED_MODEL and ORT_MIGRAPHX_LOAD_COMPILE_PATH. Via the Onnxruntime API, the corresponding options are migx_save_compiled_model, migx_save_model_name, migx_load_compiled_model and migx_load_model_name. ### Motivation and Context The motivation for this is to leverage MIGraphX's existing API to save/load models after our compile step of graph optimization. For larger models or models which were compiled with additional tuning steps, this saves time after first compile and inference run, and thus speeds up the user experience in order to encourage development. --------- Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>	2024-06-17 11:24:31 +08:00
Scott McKay	d4470fe653	Update Android SDK tools path lookup to be more strongly anchored to the provided root. (#21046 ) ### Description The tools should really all come from the same Android NDK, so using `shutil.which` adds potential confusion when we do a lookup for the target program by name first due to adding `dirnames.insert(0, "")` as the first directory entry to lookup as it will match the filename anywhere in the current path. That's problematic as the emulator should come from <sdk_tools>/emulator/emulator (see [here](https://www.stkent.com/2017/08/10/update-your-path-for-the-new-android-emulator-location.html)), but the paths on the CI machines result in the old location of <sdk_tools>/tools/emulator being selected. This leads to the emulator failing to run on arm64 macOS CIs as the old emulator does not look for the arm64 binary. At the most you may have multiple cmdline-tools versions installed, but if we need to support explicitly specifying a version for that path that can be added. ### Motivation and Context Make emulator run on arm64 macOS machines.	2024-06-17 09:24:43 +10:00
Tianlei Wu	f25cf19375	Add helper functions to dump 4d tensors in CPU for debugging (#21043 ) Add some helper functions to dump 4D tensors to help debugging. Example to use it: (1) Change DUMP_TENSOR_LEVEL from 0 to 2 in contrib_ops/cpu/utils/debug_macros.h to enable dumping. Without enabling, the dumping code will not be built into ORT binary. (2) Add a few lines to dump tensors like ``` DUMP_CPU_TENSOR_INIT(); DUMP_CPU_TENSOR("tensor name", tensor_data, dim0, dim1, dim2, dim3); ``` Changes: - [x] Add functions to dump 4D int32/int64/float/half tensors in CPU - [x] Add functions to dump 4D int32/int64 tensors in CUDA - [x] Change namespace (remove .transformers from namespace, and move files to utils directory)	2024-06-14 17:32:27 -07:00
Adrian Lizarraga	8f0e896c95	Fix Reduced Op build with empty FP16 kernel function tables (#21038 ) ### Description - Fixes compilation error for "reduced operator" builds with no FP16 kernels and `MLAS_F16VEC_INTRINSICS_SUPPORTED` enabled. - Fixes linker error for "reduced operator" builds with QNN EP by excluding QNN EP unit tests. QNN EP unit tests require CPU EP operator implementations to evaluate accuracy. ### Motivation and Context Need to be able to build a reduced operator build with QNN EP. See https://github.com/microsoft/onnxruntime/blob/main/docs/Reduced_Operator_Kernel_build.md The following example operator config file causes a compilation error when either `MLAS_F16VEC_INTRINSICS_SUPPORTED` is defined or QNN EP is enabled. ``` # reduced_op_config.txt ai.onnx;12;Add ``` ```shell python tools\ci_build\build.py --include_ops_by_config reduced_op_config.txt --config Debug --build_wheel --build_shared_lib --skip_tests --build_dir build --parallel --use_qnn --qnn_home '<QNN_ROOT_DIR>' ```	2024-06-14 14:23:12 -07:00
cloudhan	f4b22f89bc	refactor flash attn test (#21028 ) Code improvement and allow change to use other ep	2024-06-14 10:23:28 -07:00
Xavier Dupré	c66e920154	Fix wrong quantization type in quantization tool (#20954 ) ### Description Fix issue #20881. The weight quantization was set to be the activation type.	2024-06-14 07:55:13 -07:00
zkep	7313accd44	Update Dockerfile.cuda (#21042 )	2024-06-13 23:50:03 -07:00
Changming Sun	80a60d9a65	Update ONNX installing script (#21044 ) Avoid using command line flags to pass in CMAKE_PREFIX_PATH. Use environment variables instead, because otherwise the value of CMAKE_PREFIX_PATH could get encoded twice. For example, if the prefix is `C:\a\root`, then in tools/ci_build/github/windows/helpers.ps1 we set it in Env:CMAKE_ARGS, which will be consumed by ONNX. Then when ONNX gets it and decodes it, ONNX will get `C:aroot` instead. Because that path doesn't exist, the CMAKE_PREFIX_PATH doesn't take effect when the script installs ONNX. This PR fixes the issue. The issue was discovered when I tried to upgrade cmake to a newer version. Currently our Windows CPU CI build pipeline uses cmake 3.27. In the main branch, even though the CMAKE_PREFIX_PATH setting does not work, cmake can still find protoc.exe from the directories. However, starting from 3.28 cmake changed this. With the newer cmake versions the find_library(), find_path(), and find_file() cmake commands no longer search in installation prefixes derived from the PATH environment variable.	2024-06-13 23:49:41 -07:00
pengwa	87b14ac7e4	Release backward inputs per static graph ref count (#20804 ) ### Release backward inputs per static graph ref count For the output buffer marked as external output: 1. Remove the additional ref count we used for avoiding reusing buffer. Instead, when we reuse an input/output buffer, we make sure the reused buffer is not generated by nodes that have external outputs. 2. Remove the ref count of pybind feed inputs, which exists all the time until the run_backward completed. Instead, pass mutable feeds, and clean the feeds vector once it has been copied into session states and is no longer needed, before running the graph sequentially. #### Before the change: One of the backward inputs is 3.9GB, it lives until the backward ends. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/e71e2072-eaaa-4be3-a39f-0ca74b507265) #### With the change: The 3.9GB is released when the last node depending on that tensor completed. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/7b27d01f-c675-4faf-9a3e-f886b31b2afe) Note: the peak did not change though; we have more work to do to reduce the peak. #### Others A few tests were found to have been updated to use incorrect expected values in a previous code refactoring `a81faee41e (diff-9e8fbae7d3dff24106cd17564949f320e943cb3048eae07813c7de144f140419L382)`. This PR tries to fix them back, and I think now all test cases are back to normal.	2024-06-14 14:33:01 +08:00
Baiju Meswani	fff68c3151	Avoid reusing buffer for node outputs with no consumers (#21019 )	2024-06-13 16:08:16 -07:00
Sunghoon	846cac6e2c	Fix warnings and errors on cpu benchmark (#20967 ) Several changes to remove warnings or errors on CPU benchmark and other benchmarks. - Phi-3 code doesn't require an auth token, only the trust_remote_code flag. Add "--trust" to enable only trust_remote_code. - Phi3 models do not work with sdpa and need to be run in eager mode. - Fix CPU io binding error with null device id and element_type mismatch. - Set use_buffer_share only when engine is ort.	2024-06-13 15:58:48 -07:00
Yueqing Zhang	f5b6f6dc26	[VitisAI] Fix duplicate onnxruntime_vitisai_ep.dll loaded (#20968 ) ### Description Check if onnxruntime_vitisai_ep.dll is already linked. If yes, don't use dlopen. ### Motivation and Context If an executable is linked directly with onnxruntime_vitisai_ep.dll, dlopen would cause two DLLs with the same content to be loaded. So we have to check for that case. Linux can detect such a situation automatically.	2024-06-13 15:17:07 -07:00
mingyueliuh	8edec47c6c	[VitisAI] Fix some typos in tensor_proto_new APIs (#21027 ) ### Description Fix some typos in Vitis AI EP. Co-authored-by: liumingyue <mingyue@xilinx.com>	2024-06-13 15:08:51 -07:00
Tianlei Wu	7c3a25225f	[CUDA] update test_flash_attn_cuda.py for Windows (#21006 ) Currently test_flash_attn_cuda.py can only run on Linux, because it uses triton for the rotary reference implementation and the triton python package is not available on Windows. This changes the script to allow the test to run on Windows, so that we can test memory efficient attention on Windows. Due to this limitation, rotary is excluded from testing on Windows.	2024-06-13 12:50:02 -07:00
Ye Wang	f35dd1407f	custom allreduce cuda kernel (#20703 ) ### Description Conditionally route to custom AllReduce kernel when buffer size and gpu numbers meet certain requirements. Otherwise, keep using NCCL's AllReduce. --------- Co-authored-by: Ye Wang <wangye@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net> Co-authored-by: Your Name <you@example.com>	2024-06-13 11:09:49 -07:00
Jian Chen	9daed5565a	Component Governance Fix round 6 (#21021 )	2024-06-13 09:10:51 -07:00
Changming Sun	73271dd329	Move jobs in onnxruntime-Win2022-GPU-T4 machine pool to onnxruntime-Win2022-GPU-A10 (#21023 ) ### Description Move jobs in onnxruntime-Win2022-GPU-T4 machine pool to onnxruntime-Win2022-GPU-A10 ### Motivation and Context To reduce the variants of VM images we need to maintain. Now we have 3: 1. Windows 2022 CPU 2. Windows 2022 GPU A10 3. Windows 2022 GPU T4 This change allows us to remove the last one.	2024-06-12 22:04:40 -07:00
Jian Chen	4e18b0b7ce	Upgrade braces from 3.0.2 to 3.0.3 to fix the vulnerability (#21022 )	2024-06-12 18:02:52 -07:00
Chen Fu	6fb09055d4	Adding a sm80 q4 gemm kernel for small tiles (#20545 ) ### Description Implementation of a q4 gemm cuda kernel for small tiles and small sequence_len or batch_size (<=16) ### Performance Test Results \| Problem Shape \|New Kernel \| \| \| Current Kernel\| \| \| ------------------: \| ----------- \| ------- \|--\| ------------- \| ------- \| \| (M x N x K) \| Latency (ms) \| GFLOPS \| \| Latency (ms) \| GFLOPS \| \| 1 x 3072 x 3072 \| 0.008124 \| 2310.93 \| \| 0.017231 \| 1095.39 \| \| 16 x 3072 x 3072 \| 0.011263 \| 26813.7 \| \| 0.017431 \| 17325.4 \| \| 32 x 3072 x 3072 \| 0.018559 \| 32544.3 \| \| 0.079493 \| 7597.89 \| \| 64 x 3072 x 3072 \| 0.030364 \| 39782.1 \| \| 0.079387 \| 15216 \| \| 1024 x 3072 x 3072 \| 0.387194 \| 49916.5 \| \| 0.080849 \| 239054 \| \| \| \| \| \| \| \| \| 1 x 3072 x 9216 \| 0.015734 \| 3598.77 \| \| 0.043404 \| 1304.55 \| \| 16 x 3072 x 9216 \| 0.023611 \| 38371.3 \| \| 0.043388 \| 20859.1 \| \| 32 x 3072 x 9216 \| 0.038652 \| 46878 \| \| 0.224353 \| 8076.31 \| \| 64 x 3072 x 9216 \| 0.072334 \| 50099.5 \| \| 0.224338 \| 16153.6 \| \| 1024 x 3072 x 9216 \| 1.02872 \| 56363.2 \| \| 0.231284 \| 250696 \| \| \| \| \| \| \| \| \| 1 x 8192 x 3072 \| 0.015787 \| 3188.18 \| \| 0.017714 \| 2841.28 \| \| 16 x 8192 x 3072 \| 0.025933 \| 31053.3 \| \| 0.017919 \| 44942.2 \| \| 32 x 8192 x 3072 \| 0.042633 \| 37778.9 \| \| 0.079407 \| 20282.9 \| \| 64 x 8192 x 3072 \| 0.070061 \| 45977.5 \| \| 0.079531 \| 40502.8 \| \| 1024 x 8192 x 3072 \| 1.01264 \| 50896.3 \| \| 0.237244 \| 217243 \| \| \| \| \| \| \| \| \| 1 x 3072 x 8192 \| 0.014444 \| 3484.56 \| \| 0.038961 \| 1291.85 \| \| 16 x 3072 x 8192 \| 0.020433 \| 39411.8 \| \| 0.039056 \| \| \| 32 x 3072 x 8192 \| 0.03459 \| 46563.5 \| \| 0.200189 \| 8045.47 \| \| 64 x 3072 x 8192 \| 0.063319 \| 50873.4 \| \| 0.20029 \| 16082.8 \| \| 1024 x 3072 x 8192 \| 0.928282 \| 55521.5 \| \| 0.205883 \| 250334 \| \| \| \| \| \| \| \| \| 1 x 5120 x 5120 \| 0.014573 \| 3597.79 \| \| 0.02604 \| 2013.42 \| \| 16 x 5120 x 5120 \| 0.025638 \| 32719.5 \| \| 0.026194 \| 32024.4 \| \| 32 x 5120 x 5120 \| 0.037421 \| 44834.2 \| \| 0.127676 \| 13140.4 \| \| 64 x 5120 x 5120 \| 0.065593 \| 51155.9 \| \| 0.127706 \| 26274.8 \| \| 1024 x 5120 x 5120 \| 1.00217 \| 53570.9 \| \| 0.256388 \| 209398 \| \| \| \| \| \| \| \| \| 1 x 17920 x 5120 \| 0.053868 \| 3406.49 \| \| 0.04715 \| 3891.84 \| \| 16 x 17920 x 5120 \| 0.071952 \| 40805.1 \| \| 0.049755 \| 59009.3 \| \| 32 x 17920 x 5120 \| 0.123657 \| 47486.3 \| \| 0.129812 \| 45234.8 \| \| 64 x 17920 x 5120 \| 0.222113 \| 52874.2 \| \| 0.129781 \| 90491.6 \| \| 1024 x 17920 x 5120 \| 3.50124 \| 53668.1 \| \| 0.770569 \| 243852 \| \| \| \| \| \| \| \| \| 1 x 1280 x 5120 \| 0.007029 \| 1864.66 \| \| 0.025954 \| 505.027 \| \| 16 x 1280 x 5120 \| 0.008122 \| 25821.6 \| \| 0.025953 \| 8080.59 \| \| 32 x 1280 x 5120 \| 0.012498 \| 33558.7 \| \| 0.127618 \| 3286.62 \| \| 64 x 1280 x 5120 \| 0.022049 \| 38044.6 \| \| 0.127762 \| 6565.81 \| \| 1024 x 1280 x 5120 \| 0.258547 \| 51912.4 \| \| 0.128425 \| 104511 \| \| \| \| \| \| \| \| \| 1 x 5120 x 17920 \| 0.049096 \| 3737.59 \| \| 0.109703 \| 1672.7 \| \| 16 x 5120 x 17920 \| 0.073145 \| 40139.7 \| \| 0.110608 \| 26544.3 \| \| 32 x 5120 x 17920 \| 0.11405 \| 51486.3 \| \| 0.430942 \| 13626 \| \| 64 x 5120 x 17920 \| 0.210022 \| 55918.1 \| \| 0.430948 \| 27251.7 \| \| 1024 x 5120 x 17920 \| 4.571 \| 41108 \| \| 0.860118 \| 218464 \|	2024-06-12 16:02:26 -07:00
Changming Sun	feec8efae4	Add "-allow-unsupported-compiler" flags to Windows CUDA flags (#21004 ) ### Description Add "-allow-unsupported-compiler" flags to Windows CUDA flags. This change only impacts our pipelines. By default it would not reach this code path. ### Motivation and Context nvcc refuses to work with the latest VS toolset unless this flag is set. Without this change, our CI build fails when the compiler is the latest VS 2022 17.10. Here is the log: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1405549&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=c7e55e04-f02b-57dc-d19a-29b7d3528c44&l=715 The error message is: `D:\a\_work\_temp\v11.8\include\crt/host_config.h(153): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk. [D:\a\_work\1\b\RelWithDebInfo\CMakeFiles\CMakeScratch\TryCompile-g5rudf\cmTC_7b8ff.vcxproj]`	2024-06-12 14:23:00 -07:00
Tianlei Wu	a2b0a69dcc	Update MultiHeadAttention benchmark to test CPU (#20972 ) ### Description The MultiHeadAttention benchmark script only supports the cuda provider right now. This updates the script to support testing the cpu operator and plotting gpu latency. ### Motivation and Context Benchmark for the coming cpu flash attention.	2024-06-12 13:04:25 -07:00
Changming Sun	99f0fe3fae	Fix a few issues in "Zip-Nuget-Java-Nodejs Packaging Pipeline" (#21014 ) ### Description Fix a few issues in the Windows TRT job in "Zip-Nuget-Java-Nodejs Packaging Pipeline": 1. It is a Windows job. It should not use bash (which is usually not available on Windows). 2. When it sets ADO vars, it missed a semicolon. Here is the doc on how to set ADO vars via scripts: https://learn.microsoft.com/en-us/azure/devops/pipelines/process/set-variables-scripts?view=azure-devops&tabs=bash You can see that it needs a semicolon. Without the semicolon, the vars will have an extra quotation mark in their values.	2024-06-12 09:44:24 -07:00
Baiju Meswani	94aa21c3dd	Define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR (#21005 ) https://github.com/microsoft/STL/pull/3824 introduces constexpr mutex. An older version of msvcp140.dll will lead to ```A dynamic link library (DLL) initialization routine failed```. This error can be encountered if using conda Python since conda packages msvc dlls and these are older right now. This PR disables the constexpr mutex so that ort package can work with older msvc dlls. Thanks @snnn for the discovery.	2024-06-11 22:23:28 -07:00
Jing Fang	9be30348b9	[CPU EP] Add blocked quantization to QuantizeLinear op kernel (#20977 ) ### Description Add blocked quantization to QuantizeLinear op kernel. If the quantize axis is not the last axis, block the tensor using 1x128 blocks. Blocks are dispatched to multiple threads for concurrent processing. Currently only scalar instructions are supported. If the quantize axis is the last axis, block the tensor using 1 x quant_block_size blocks. Blocks are dispatched to multiple threads for concurrent processing. If the output type is an int type, call the mlas kernel to use the SIMD instructions in each block. #### Benchmark data 20 core 2GHz CPU, RelWithDebInfo config, 196 x 4096 tensor, quantize float to int4x2 Quantize before last axis: * single thread, scalar instruction: 31380900 ns * 8 thread, scalar instruction: 5098620 ns Quantize last axis: * single thread, scalar instruction: 27927900 ns * 8 thread, SIMD instruction: 102261 ns More threads, SIMD instructions, and a larger block size help. ### Motivation and Context ONNX added blocked quantization to QuantizeLinear in opset 21	2024-06-11 20:25:28 -07:00
Yi Zhang	17d5dc503f	Upgrade ESRP signing task from v2 to v5 (#20995 )	2024-06-12 08:31:53 +08:00
cloudhan	67c8befd1d	test: refactor flash_attn tests to use parameterized (#20913 ) Use `parameterized` to decompose the huge test case. This will make adding ROCm support possible. --------- Co-authored-by: Guangyun Han <guangyunhan@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>	2024-06-11 15:57:20 -07:00
Tianlei Wu	b3fc9b5a0e	[CUDA] upgrade cutlass to 3.5.0 (#20940 ) ### Description Upgrade cutlass to 3.5 to fix build errors using CUDA 12.4 or 12.5 in Windows - [x] Upgrade cutlass to 3.5.0. - [x] Fix flash attention build error with latest cutlass header files and APIs. This fix is provided by @wangyems. - [x] Update efficient attention to use new cutlass fmha interface. - [x] Patch cutlass to fix `hrsqrt` not found error for sm < 53. - [x] Disable TF32 Staged Accumulation to fix blkq4_fp16_gemm_sm80_test build error for cuda 11.8 to 12.3. - [x] Disable TRT 10 deprecate warnings. The following are not included in this PR: * TRT provider replaces the deprecated APIs. * Fix blkq4_fp16_gemm_sm80_test build error for cuda 12.4 or 12.5. This test is not built by default unless you add `--cmake_extra_defines onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON` in build command. To integrate to rel-1.18.1: Either bring in other changes (like onnx 1.16.1), or generate manifest and upload a new ONNX Runtime Build Time Deps artifact based on rel-1.18.1. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/19891 https://github.com/microsoft/onnxruntime/issues/20924 https://github.com/microsoft/onnxruntime/issues/20953	2024-06-11 13:32:15 -07:00
Yulong Wang	dd805ff77d	[js/web] ESM: use the bundled target as default export (#20991 ) ### Description ESM: use the bundled target as default export In this change, the default import of the following entries: ``` import from 'onnxruntime-web'; import from 'onnxruntime-web/all'; import from 'onnxruntime-web/webgpu'; ``` will use the "bundled" version, which has no dynamic import. This change should only apply to ESM on web.	2024-06-11 11:14:55 -07:00
Jian Chen	05032e5e5f	Updating cudnn from 8 to 9 on existing cuda 12 docker image (#20925 ) ### Description Adding support of cudnn 9 ### Motivation and Context Keep existing cuda 12.2 with nvidia driver 535	2024-06-11 09:37:16 -07:00
Wanming Lin	043ef5c95f	[WebNN EP] Support latest WebNN softmax op (#20827 ) Latest WebNN softmax supports N-D input and axis parameter.	2024-06-11 08:27:14 -07:00
Changming Sun	ae4a2e6b3f	Publish Build Symbols for DML nightly nuget package (#20988 ) ### Description Publish Build Symbols for DML nightly nuget package.	2024-06-10 17:53:22 -07:00
Changming Sun	dc545d366d	Publish debug symbols for Windows python packages (#20973 ) ### Description 1. Publish debug symbols for Windows python packages. This PR will publish them to ADO. Later on I will also replicate them to Microsoft Symbol Server. 2. Build the packages in Release mode instead of RelWithDebInfo, to be consistent with the other platforms(Linux/macOS/...) ### Motivation and Context To help debug things. Sometimes we found an issue, but we couldn't debug it because we didn't have symbols, and once we rebuilt the package locally the issue was gone. This change would be helpful for such scenarios. Build log: https://aiinfra.visualstudio.com/Lotus/_build?definitionId=841	2024-06-10 12:33:49 -07:00
Changming Sun	92ae60b01f	Revert a cmake change in protobuf_cmake.patch (#20964 ) Avoid patching external projects unless absolutely necessary #20875	2024-06-10 11:20:33 -07:00
Hector Li	007d106b73	Disable inference on CPU if CPU fallback is disabled (#20976 ) ### Description Don't allow model inference on CPU (Ort CPU EP or QNN EP CPU backend) if CPU fallback is disabled.	2024-06-10 09:27:43 -07:00
Hector Li	3c6d409937	Enable Hardsigmoid for QNN EP using direct SDK support (#20956 ) ### Description Enable Hardsigmoid for QNN EP using direct SDK support instead of decomposing it into its constituent ops, so that it can support quantized models	2024-06-10 09:16:25 -07:00
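
Note on 9be30348b9 ([CPU EP] Add blocked quantization to QuantizeLinear op kernel): the "1 x quant_block_size blocks along the last axis" case described in that commit can be illustrated with a small pure-Python sketch. This is not the ORT kernel (which is C++ and dispatches blocks to threads and MLAS); the function name, argument names and the int4 default range are assumptions made for illustration. It follows the general QuantizeLinear formula y = saturate(round(x / scale) + zero_point), with one scale per block.

```python
import math


def quantize_blocked_last_axis(rows, scales, block_size,
                               qmin=-8, qmax=7, zero_point=0):
    """Illustrative blocked quantization along the last axis.

    rows:   2-D list of floats, shape [R, C]
    scales: 2-D list of floats, shape [R, ceil(C / block_size)];
            every run of `block_size` consecutive elements in a row
            shares one scale
    qmin/qmax default to the signed int4 range (an assumption here).
    """
    out = []
    for row, row_scales in zip(rows, scales):
        expected = math.ceil(len(row) / block_size)
        if len(row_scales) != expected:
            raise ValueError(
                f"expected {expected} scales per row, got {len(row_scales)}")
        q_row = []
        for j, x in enumerate(row):
            scale = row_scales[j // block_size]  # scale of the block holding x
            # Python's round() rounds halves to the nearest even integer.
            q = round(x / scale) + zero_point
            q_row.append(max(qmin, min(qmax, q)))  # saturate to [qmin, qmax]
        out.append(q_row)
    return out


if __name__ == "__main__":
    # One row, two blocks of two elements, each block with its own scale.
    print(quantize_blocked_last_axis([[0.5, 1.0, -3.0, 100.0]],
                                     [[0.5, 2.0]], block_size=2))
```

Because each block only reads its own scale, blocks are independent of each other, which is what makes it possible to hand them to separate threads as the commit describes.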


11225 commits