onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-11 17:48:34 +00:00

Author	SHA1	Message	Date
Changming Sun	a942bbf489	Update nodejs to 18.x (#17657 ) 1. Upgrade nodejs from 16.x to 18.x for Windows pipelines 2. Avoid using Azure DevOps "NodeTool" on Linux. The tool installs nodejs from internet or local disk cache. But we already moved all Linux tests to docker. So we do not need the installer anymore. 3. Remove some other unused code.	2023-09-25 14:12:11 -07:00
Yulong Wang	b2b1408608	[js/web] update browser launch cmd flags (#17658 ) ### Description update Chromium browser launch command line flags Canary already using dxc so no need to specify '--enable-dawn-features=use_dxc' for canary.	2023-09-25 12:24:46 -07:00
Yulong Wang	f50fa46fe0	[JSEP] allow DataTransfer to deal with zero sized input (#17661 ) ### Description Allow DataTransfer to deal with zero-sized input. This is a standalone fix for zero-sized tensor handling in JSEP DataTransfer. Other components in JSEP that do not support zero-sized tensors still need to be fixed.	2023-09-25 12:21:20 -07:00
Yulong Wang	fcfc2391b8	[JSEP] allow JsCustomAllocator to deal with zero sized input (#17660 ) ### Description Allow JsCustomAllocator to deal with zero-sized input. This is a standalone fix for zero-sized tensor handling in JsCustomAllocator. Other components in JSEP that do not support zero-sized tensors still need to be fixed.	2023-09-25 12:20:56 -07:00
Xavier Dupré	905faea3b2	Fix static quantization for QDQ and Percentile distribution (#17649 ) ### Description One quantization case was not covered by the current list of unit tests. This PR adds a unit test to cover that case with the fix. It fixes the issue #17619. ### Motivation and Context	2023-09-25 10:11:58 -07:00
Yulong Wang	df15a3a335	[js/web] configure 5GB memory space for webpack build (#17684 ) ### Description The webpack step of the ort-web build consumes memory close to the default max-old-space-size of Node.js (V8), so increase the memory limit to 5GB to avoid this issue.	2023-09-25 09:22:00 -07:00
PeixuanZuo	216214b7d3	[ROCm] Remove ROCm5.4.2, ROCm 5.5 and add ROCm5.7 to python package pipeline (#17668 ) - Remove ROCm5.4.2, ROCm 5.5 and add ROCm5.7 to python package pipeline - Remove redundant arg	2023-09-25 10:35:28 +08:00
Wanming Lin	ce287a4e77	[WebNN EP] Remove workaround for dynamic shape (#17644 ) As now we have the FreeDimensionOverrides option to support dynamic shape, we can remove the previous workaround.	2023-09-22 16:06:04 -07:00
Adrian Lizarraga	e70a23f8dc	[QNN EP] Integrate Resize op fixes from QNN 2.14.1 (#17641 ) ### Description QNN SDK version 2.14.1 fixed several issues with the QNN Resize operator. This PR integrates the fixes and simplifies the implementation. ### Motivation and Context Improve Resize operator and test coverage.	2023-09-22 10:52:47 -07:00
Lukas Berbuer	6d7bc2a097	Fix ARMv7 build (#13891 ) Fix ARMv7 build error on Linux. ### Description `cpuinfo_*` functions are only available if `CPUINFO_SUPPORTED` set and therefore `"cpuinfo.h"` included. Fixed with extended conditional code. ### Motivation and Context Compilation with ARMv7 on Linux system fails.	2023-09-22 09:54:38 -07:00
Yi Zhang	55b16d347c	Read model zoo test (#17666 )	2023-09-22 09:50:36 -07:00
Jiajia Qin	891fba3b9c	[js/webgpu] Optimize Gather op (#17625 ) ### Description This PR optimizes the Gather op, saving ~6ms in the segment anything model on ADL. The problem with the original algorithm is that it includes a for loop to process a block of data, and the block size may be very large, like `65536`. In a GPU shader, we should avoid large loops and use more threads to do the work in parallel. Before: ``` [profiling] kernel "41771992\|[Gather] 41771992" input[0]: [4,65536] \| float32, input[1]: [1] \| int64, output[0]: [1,65536] \| float32, execution time: 6886207 ns ``` After: ``` [profiling] kernel "41771992\|[Gather] 41771992" input[0]: [4,65536] \| float32, input[1]: [1] \| int64, output[0]: [1,65536] \| float32, execution time: 11719 ns ```	2023-09-21 21:00:36 -07:00
Jiajia Qin	cd3fb377ea	[js/webgpu] Allow binary ops with scalar to use the vectorize path (#17589 ) ### Description 1. For binary ops, the number of components is always 4, so the dispatchGroup should be `{x: Math.ceil(outputSize / 64 /* workgroup size */ / 4 /* component size */)}` instead of `{x: Math.ceil(outputSize / 64 /* workgroup size */ / (vectorize ? 4 : 1) /* vec size */)}`. 2. If either a or b has only one element, we can still use the vectorized path, since the same value will be broadcast.	2023-09-21 20:55:08 -07:00
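The workgroup count described in the entry above can be sketched as follows. This is a minimal illustration in Python, not the TypeScript used in the repository; the function name `dispatch_group_x` and the constant names are hypothetical, while the workgroup size of 64 and the component size of 4 are taken from the commit message.

```python
import math

WORKGROUP_SIZE = 64  # threads per workgroup, as stated in the commit message
COMPONENTS = 4       # elements handled per thread (vec4), as stated in the commit message


def dispatch_group_x(output_size: int) -> int:
    """Return the number of workgroups along x needed to cover output_size elements."""
    return math.ceil(output_size / WORKGROUP_SIZE / COMPONENTS)


if __name__ == "__main__":
    for size in (1, 256, 257, 65536):
        print(size, dispatch_group_x(size))
```

Because the divisor is fixed at 64 * 4 = 256 elements per workgroup, the count no longer depends on whether one of the operands is a scalar.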
Yiming Hu	1bc215e1d1	[VITISAI] add float16 and bfloat16 support (#17438 ) ### Description Add float16 and bfloat16 data type support for VitisAI ep ### Motivation and Context The VitisAI ep has added the bfloat datatype support. So we would like to register the datatype from onnxruntime side to enable them. --------- Signed-off-by: Yiming Hu <yiming.hu@amd.com>	2023-09-21 19:22:28 -07:00
pengwa	6b7bce5ec9	Model post process for zero stage3 training (#17187 ) ### Model post process for zero stage3 training This is the last change to make single GPU/Multiple GPUs run pass. Design details: https://microsoft.sharepoint.com/:p:/t/ONNX2/EfNfJ43necpIoPI6x5M2zvYBVbfjoPQmG4Boc_F7-tHm1w?e=ekQwA6&nav=eyJzSWQiOjMxNiwiY0lkIjoxMDE1Nzg3NDZ9 `PyTorch` runs with ZeROOffloadSubscriber: ``` model = prepare_model(...) from onnxruntime.training.utils.hooks import configure_ort_compatible_zero_stage3 configure_ort_compatible_zero_stage3() ``` `ORTModule` runs with ZeROOffloadSubscriber: ``` os.environ['ORTMODULE_ENABLE_ZERO_STAGE3'] = '1' from onnxruntime.training.ortmodule import ORTModule model = ORTModule(self.model) ``` It will be fairly easy to debug convergence issue if both ORT and PyTorch can run the same offload path. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-22 08:54:25 +08:00
Arthur Islamov	498b60d8a4	[js/web] fp16 Pool & Reduce (#17512 ) ### Description Two more ops to support fp16	2023-09-21 14:52:13 -07:00
Abhishek Jindal	d56fc7ebf5	Layer norm fusion deepspeed stage3 changes (#17614 ) ### Description Layer norm fusion changes required for DeepSpeed stage 3, including a test case. ### Motivation and Context This helps fuse layer norm for DeepSpeed stage 3. The added test case ensures that the fusion works properly for this scenario.	2023-09-21 14:16:41 -07:00
George Nash	f299016cbe	Fix crash on Windows server 2016 on Intel Gen4 Xeon processors (#17611 ) ### Description This adds an additional check before enabling MlasGemmU8S8DispatchAmx for GEMM operations. After checking the CPUID for AMX-TILE and AMX-INT8, an additional check is added that checks the value of the XCR0 register. The value in the XCR0 register is set by the OS and indicates support for various CPU features. In this case the bits indicating XTILECFG and XTILEDATA support are checked. ### Motivation and Context Fix for a crash reported directly by a customer when running an older Windows Server OS on newer Gen4 Xeon processors. Signed-off-by: Nash <george.nash@intel.com>	2023-09-21 09:25:41 -07:00
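The two-stage check in the entry above (CPU capability first, then OS-enabled state) can be sketched like this. It is an illustration under stated assumptions, not the MLAS code: the function names are hypothetical, the XCR0 value is passed in as a plain integer instead of being read with XGETBV, and the bit positions 17 (XTILECFG) and 18 (XTILEDATA) come from Intel's architecture documentation rather than from the commit message.

```python
XCR0_XTILECFG = 1 << 17   # OS manages the AMX tile configuration state
XCR0_XTILEDATA = 1 << 18  # OS manages the AMX tile data state


def os_enables_amx_state(xcr0: int) -> bool:
    """True only if both AMX state components are enabled in XCR0."""
    mask = XCR0_XTILECFG | XCR0_XTILEDATA
    return (xcr0 & mask) == mask


def can_use_amx_gemm(has_amx_tile: bool, has_amx_int8: bool, xcr0: int) -> bool:
    """Combine the CPUID feature flags with the OS-controlled XCR0 bits."""
    return has_amx_tile and has_amx_int8 and os_enables_amx_state(xcr0)


if __name__ == "__main__":
    # CPU reports AMX, but the OS has not enabled the tile state: AMX must stay off.
    print(can_use_amx_gemm(True, True, 0x7))
    # CPU reports AMX and the OS has enabled both tile state components.
    print(can_use_amx_gemm(True, True, 0x7 | XCR0_XTILECFG | XCR0_XTILEDATA))
```

Checking CPUID alone is not enough because an older OS may not save and restore tile state, which matches the crash scenario the commit describes.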
PeixuanZuo	5b9cd91a9c	[ROCm] fix CI (#17648 ) fix CI, follow #17621	2023-09-21 07:37:50 -07:00
Changming Sun	57dfd15d7b	Remove dnf update from docker build scripts (#17551 ) ### Description 1. Remove 'dnf update' from docker build scripts, because it upgrades TRT packages from CUDA 11.x to CUDA 12.x. To reproduce it, you can run the following commands in a CentOS CUDA 11.x docker image such as nvidia/cuda:11.8.0-cudnn8-devel-ubi8. ``` export v=8.6.1.6-1.cuda11.8 dnf install -y libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v} libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v} dnf update -y ``` The last command will generate the following outputs: ``` ======================================================================================================================== Package Architecture Version Repository Size ======================================================================================================================== Upgrading: libnvinfer-devel x86_64 8.6.1.6-1.cuda12.0 cuda 542 M libnvinfer-headers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 118 k libnvinfer-headers-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 14 k libnvinfer-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 13 M libnvinfer-plugin8 x86_64 8.6.1.6-1.cuda12.0 cuda 13 M libnvinfer-vc-plugin-devel x86_64 8.6.1.6-1.cuda12.0 cuda 107 k libnvinfer-vc-plugin8 x86_64 8.6.1.6-1.cuda12.0 cuda 251 k libnvinfer8 x86_64 8.6.1.6-1.cuda12.0 cuda 543 M libnvonnxparsers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 467 k libnvonnxparsers8 x86_64 8.6.1.6-1.cuda12.0 cuda 757 k libnvparsers-devel x86_64 8.6.1.6-1.cuda12.0 cuda 2.0 M libnvparsers8 x86_64 8.6.1.6-1.cuda12.0 cuda 854 k Installing dependencies: cuda-toolkit-12-0-config-common noarch 12.0.146-1 cuda 7.7 k cuda-toolkit-12-config-common noarch 12.2.140-1 cuda 7.9 k libcublas-12-0 x86_64 12.0.2.224-1 cuda 361 M libcublas-devel-12-0 x86_64 12.0.2.224-1 
cuda 397 M Transaction Summary ======================================================================================================================== ``` As you can see from the output, they are CUDA 12 packages. The problem can also be solved by locking the packages' versions with the "dnf versionlock" command right after installing the CUDA/TRT packages. However, going forward, for better reproducibility, I suggest manually pinning dnf package versions in the installation scripts, as we do for TRT now. ```bash v="8.6.1.6-1.cuda11.8" &&\ yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo &&\ yum -y install libnvinfer8-${v} libnvparsers8-${v} libnvonnxparsers8-${v} libnvinfer-plugin8-${v} libnvinfer-vc-plugin8-${v}\ libnvinfer-devel-${v} libnvparsers-devel-${v} libnvonnxparsers-devel-${v} libnvinfer-plugin-devel-${v} libnvinfer-vc-plugin-devel-${v} libnvinfer-headers-devel-${v} libnvinfer-headers-plugin-devel-${v} ``` When we need to upgrade a package because of a security alert or for some other reason, we manually change the version string instead of relying on "dnf update". Though this approach takes more effort, it makes our pipelines more stable. 2. Move python test to docker ### Motivation and Context Right now the nightly gpu package mixes CUDA 11.x and CUDA 12.x, and the resulting package is unusable (it crashes every time)	2023-09-21 07:33:29 -07:00
Pranav Sharma	038c76378f	Include onnxruntime_float16.h in the package. (#17637 ) ### Description Include onnxruntime_float16.h in the package. ### Motivation and Context This was missed in the recently released 1.16 pkgs (except Nuget).	2023-09-21 00:08:10 -07:00
Changming Sun	4f3f4366d5	Fix API 16's marker (#17640 )	2023-09-20 19:51:50 -07:00
PeixuanZuo	1f991f27f1	[ROCm] add manylinux build test for ROCm CI (#17621 ) The manylinux build is used for nightly package generation, and it is hard to catch issues in time when related files change. This PR adds a manylinux build to CI.	2023-09-21 10:45:16 +08:00
Changming Sun	dd561f2015	Upgrade sympy (#17639 ) AB#17015	2023-09-20 18:44:23 -07:00
Adrian Lizarraga	c55da45e20	[QNN EP] Add more op unit tests (fix Clip, TopK, Tile) (#17457 ) ### Description Adds more operator unit tests (all op types should now have at least 1 unit test): - [x] Reshape - [x] Flatten - [x] Squeeze - [x] Unsqueeze - [x] Gemm - [x] Clip - Enable QDQ Clip on HTP backend (when not optimized away by L1 ClipQuantFusion optimizer) - Add support for 16-bit QDQ Clip to ClipQuantFusion optimizer - [x] Split - [x] Topk - Enable QDQ TopK on HTP backend - [x] Tile - Enable QDQ Tile on HTP backend ### Motivation and Context Increase QNN operator support and test coverage.	2023-09-20 14:31:01 -07:00
Hariharan Seshadri	c65e892089	[CUDA] Fix performance bug in DecoderMaskedMultiheadAttention for BeamSearch (#17613 )	2023-09-20 10:35:15 -07:00
Vincent Wang	e6301eee6a	Bump Up Version to 1.17.0 (#17587 ) Bump up version to 1.17.0 as the 1.16.0 release branch had been branched out.	2023-09-20 11:02:58 +08:00
Numfor Tiapo	f297d4dfb9	Remove onnxruntime extensions from list of gitmodules (#17615 ) The extensions submodule was removed in [this PR](https://github.com/microsoft/onnxruntime/pull/17097) but not deleted from the list of git modules. This causes breaks in code ingesting ORT that references the git modules for an accurate list of submodules. This change removes the extensions from the list of git modules to resolve this issue.	2023-09-19 17:12:14 -07:00
Yulong Wang	d522cc7cc4	Update npm-packaging-pipeline.yml to always use artifacts from main branch (#17604 ) ### Description Update npm-packaging-pipeline.yml to always use artifacts from main branch	2023-09-19 14:42:08 -07:00
Hariharan Seshadri	460f17fbb8	[JS/WebGPU] Support If on WebGPU (#17478 )	2023-09-19 12:20:18 -07:00
Bowen Bao	152e61da37	Avoid `get_logger` overriding root logger level (#17569 ) ### Description Instead, set the level to DEBUG on the returned logger. ### Motivation and Context Otherwise, this function call overrides the root logger level setting, which affects the logging of other Python packages.	2023-09-19 10:42:27 -07:00
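The behavior described in the entry above can be illustrated with the standard `logging` module. This is a minimal sketch of the idea, not the actual onnxruntime helper: the function body and the logger name `"example.tool"` are assumptions chosen for the demonstration.

```python
import logging


def get_logger(name: str) -> logging.Logger:
    """Return a named logger set to DEBUG without touching the root logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # applies to this named logger only
    return logger


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.WARNING)      # level chosen by the application
    tool_logger = get_logger("example.tool")
    print(tool_logger.level == logging.DEBUG)
    print(root.level == logging.WARNING)  # root level is left unchanged
```

Setting the level on the named logger keeps the choice local, so other packages that rely on the root logger's level are unaffected.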
Tianlei Wu	730fab3050	Refactor Attention cuda kernel (#17578 ) * Break QkvToContext into small functions. Each fused and unfused kernel will have separated function. * Move DecoderAttention kernel to separated file * Move KV cache related kernel to attention_kv_cache.cu ### Motivation and Context To make the code easier to maintain.	2023-09-19 09:49:21 -07:00
Wei-Sheng Chin	068300d97e	Pin beartype version (#17599 ) PyTorch doesn't like the latest beartype: https://github.com/pytorch/pytorch/pull/109510	2023-09-18 19:31:04 -07:00
Justin Chu	d350ab31d7	Remove reference to internals in torch.onnx in test (#17550 ) - https://github.com/microsoft/onnxruntime/issues/11901	2023-09-18 18:40:09 -07:00
Jambay Kinley	f969e7f8d8	Provide kwargs to remove_shared_initializers (#17539 ) ### Description Fixes a bug in `get_shared_initializers` where `signature_cache1, signature_cache2` are passed as positional arguments to `remove_shared_initializers` but their positions don't match the function signature. So `signature_cache1` is passed to `min_elements` and causes a comparison error at line 907. Pass the arguments as kwargs so that the call doesn't rely on their positions. ### Motivation and Context Fixes the bug described above.	2023-09-18 16:41:11 -07:00
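The class of bug described in the entry above can be reproduced in a few lines. The function below is a hypothetical stand-in, not the real `remove_shared_initializers`: its parameter list is an assumption made up for illustration, with `min_elements` placed before the two cache parameters so that a positional call binds the wrong values.

```python
def remove_shared(model1, model2, min_elements=16, signature_cache1=None, signature_cache2=None):
    """Hypothetical stand-in: compares min_elements against an int, as the real code is said to do."""
    if min_elements < 1:  # raises TypeError if a dict was bound to min_elements
        raise ValueError("min_elements must be positive")
    return {"min_elements": min_elements, "cache1": signature_cache1, "cache2": signature_cache2}


if __name__ == "__main__":
    cache1, cache2 = {}, {}

    # Positional call: cache1 lands in min_elements, cache2 lands in signature_cache1.
    try:
        remove_shared("m1", "m2", cache1, cache2)
    except TypeError as exc:
        print("positional call failed:", exc)

    # Keyword call: each argument reaches the intended parameter.
    result = remove_shared("m1", "m2", signature_cache1=cache1, signature_cache2=cache2)
    print(result["min_elements"])
```

Keyword arguments make the call robust to parameters being added or reordered in the callee's signature.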
Yi Zhang	7116e66c4b	Improve Win QNNEP pipeline (#17586 ) ### Description 1. use standard win build template 2. enable compiler cache ### Motivation and Context Make win build task easy to maintain and accelerate the pipeline.	2023-09-19 07:36:17 +08:00
Arthur Islamov	0f406ca1d3	[js/web] FP16 binary and unary ops (#17515 ) ### Description Binary and unary ops with fp16 support	2023-09-18 15:43:32 -07:00
Adrian Lizarraga	dea425e7c1	[QNN/CPU EP] Add 16-bit Quantize/Dequantize contrib ops (#17015 ) ### Description - Adds 16-bit integer support to: - Quantization kernel implementations: Intel, Neon, and Power intrinsics - DequantizeLinear and QuantizeLinear contrib ops - QNN EP Quantize and Dequantize operators - Python quantization scripts - Disables QDQ fusions for most 16-bit QDQ node groups (need to add 16-bit support to QLinear* ops) - Retains support for dropping QDQ nodes from Split, Gather, Reshape, Transpose, Squeeze, and Unsqueeze node groups. Sample python code to generate QDQ model with 16-bit activations and 8-bit weights: ```python quantize_static( input_model_path, output_model_path, data_reader, quant_format=args.quant_format, per_channel=args.per_channel, activation_type=QuantType.QUInt16, weight_type=QuantType.QUInt8, extra_options={"DedicatedQDQPair": True, "ForceQuantizeNoInputCheck": True, "UseQDQContribOps": True}, ) ``` Note that enabling the `UseQDQContribOps` extra option is not strictly necessary. If the 16bit types are used without enabling `UseQDQContribOps`, the QDQ ops domains are overridden to 'com.microsoft', and a warning is printed to stdout. ### Automated Tests MLAS/CPU EP: - [x] 16-bit QuantizeLinear computation - [x] 16-bit DequantizeLinear computation Optimizer: - [x] Transpose QDQ fusion - [x] Gather QDQ fusion - [x] Reshape QDQ fusion - [x] Squeeze QDQ fusion - [x] Unsqueeze QDQ fusion - [x] Split drop QDQ - [x] DoubleQDQPairRemover - [x] Transpose optimization - [x] EnsureUniqueDQForNodeUnit - [x] Common subexpression elimination (DQ not removed) - [x] Constant folding QNN EP: - [x] Conv 16-bit activations, 8-bit weights - [x] MatMul 16-bit activations, 8-bit weights - [x] Unary 16-bit QDQ ops - [x] Binary 16-bit QDQ ops Quantization tool: - [x] Test creation of 16-bit QDQ model ### Motivation and Context Support mixed precision (8bit weights, 16bit activations) models. 
--------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-09-18 09:43:34 -07:00
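For readers unfamiliar with what 16-bit quantization computes, the arithmetic behind QuantizeLinear and DequantizeLinear for an unsigned 16-bit type can be sketched on scalars. This is an illustration of the standard formula (round to nearest even, add the zero point, saturate to the integer range), not the MLAS kernels added by this commit; the function names are hypothetical.

```python
UINT16_MIN, UINT16_MAX = 0, 65535


def quantize_linear_u16(x: float, scale: float, zero_point: int) -> int:
    """Quantize one float to uint16: saturate(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point  # Python's round() rounds halves to even
    return max(UINT16_MIN, min(UINT16_MAX, q))


def dequantize_linear_u16(q: int, scale: float, zero_point: int) -> float:
    """Map a uint16 value back to float: (q - zero_point) * scale."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    q = quantize_linear_u16(1.0, 0.5, 10)
    print(q, dequantize_linear_u16(q, 0.5, 10))
```

Compared with 8-bit activations, the wider integer range allows a smaller scale for the same value range, which is the motivation for mixed-precision models with 16-bit activations and 8-bit weights.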
PeixuanZuo	af14ae8050	[ROCm] Update whisper benchmark script (#17391 ) - update whisper benchmark for ROCm EP.	2023-09-18 13:34:39 +08:00
simonjub	c969237321	[TRT EP] Fix ProviderOptions functions (#17567 ) ### Description When trying to use the TRT EP option trt_extra_plugin_lib_paths I noticed that my custom op library was not being loaded by the EP. After some digging I found that code was missing to update this option when UpdateTensorRTProviderOptions() is used to set it. At the same time I noticed that char arrays were allocated in that function and wondered where they are de-allocated. When I found it was done in ReleaseTensorRTProviderOptions(), I noticed that a few de-allocations were missing. ### Motivation and Context This PR fixes the problems described above.	2023-09-17 12:19:32 -07:00
Yifan Li	705f8a3718	[TensorRT EP] Fallback to CUDA EP if it's explicitly assigned (#17535 ) ### Description * TensorRT EP can fall back to CUDA EP if it's explicitly assigned * MIGraphX can fall back to ROCM if it's explicitly assigned Test cases: \| When user specifies providers= \| self._fallback_providers= \| \| ------------------------------------------------------------ \| ------------------------------------------------- \| \| ["TensorrtExecutionProvider", "CUDAExecutionProvider"] \| ["CUDAExecutionProvider", "CPUExecutionProvider"] \| \| ["TensorrtExecutionProvider",("CUDAExecutionProvider", cuda_options)] \| ["CUDAExecutionProvider", "CPUExecutionProvider"] \| \| ["TensorrtExecutionProvider"] \| ["CPUExecutionProvider"] \| \| [("TensorrtExecutionProvider", trt_options)] \| ["CPUExecutionProvider"] \| \| [("TensorrtExecutionProvider", trt_options), ("CUDAExecutionProvider", cuda_options)] \| ["CUDAExecutionProvider", "CPUExecutionProvider"] \| \| ["TensorrtExecutionProvider", "CPUExecutionProvider"] \| ["CPUExecutionProvider"] \| ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Apply comments of https://github.com/microsoft/onnxruntime/issues/17394 and unify the logic to [MIGraphX, ROCM]	2023-09-15 15:16:11 -07:00
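The test-case table in the entry above can be expressed as a small function. This is a sketch that reproduces only the listed cases, not the logic in onnxruntime's Python session code: the function name `fallback_providers` is hypothetical, and the MIGraphX/ROCM branch is inferred from the description, since the table lists only TensorRT cases.

```python
def fallback_providers(providers):
    """Return the fallback list for a user-supplied providers list (names or (name, options) tuples)."""
    names = [p if isinstance(p, str) else p[0] for p in providers]
    if "TensorrtExecutionProvider" in names and "CUDAExecutionProvider" in names:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # Assumption: MIGraphX mirrors the TensorRT rule, per the commit description.
    if "MIGraphXExecutionProvider" in names and "ROCMExecutionProvider" in names:
        return ["ROCMExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


if __name__ == "__main__":
    trt_options, cuda_options = {}, {}
    print(fallback_providers(["TensorrtExecutionProvider", "CUDAExecutionProvider"]))
    print(fallback_providers([("TensorrtExecutionProvider", trt_options)]))
    print(fallback_providers(["TensorrtExecutionProvider", ("CUDAExecutionProvider", cuda_options)]))
```

The key point is that CUDA is used as a fallback only when the user explicitly lists it alongside TensorRT; otherwise the fallback is CPU only.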
Yulong Wang	efd416b71f	[js/web] update test to explicitly fail for webnn without proxy (#17554 ) ### Description Update test to explicitly fail for webnn without proxy. I am doing this change because if I test webnn with other backend together, it silently enables proxy. I want to make test runner behave with less implicit flag reset. If proxy is not enabled, webnn test should fail. @Honry please let me know if other places (eg. CI scripts) should change also.	2023-09-15 14:40:22 -07:00
Yulong Wang	155887593d	[js/web] update npm test to load test cases only for required backends (#17555 ) ### Description update npm test to load test cases for required backends. No need to load test case list for the backends that we don't test.	2023-09-15 13:55:25 -07:00
Dmitri Smirnov	fdb132643d	Remove redundant Resolve() after each inlined function (#17556 ) ### Description Remove `Resolve()` on the entire graph as each function is resolved. We retain `Resolve()` after each inlining iteration. ### Motivation and Context Poor performance for inlining the model and session initialization. Original model before Resolve() removal FunctionTest.Profiling (65953 ms) After Resolve() Removal FunctionTest.Profiling (2911 ms) RelWithDebInfo pre-inlined model. Presumably because it runs Level1 optimizers Non-inlined model consists of functions and Level1 optimizers have no effect. FunctionTest.Profiling (9851 ms)	2023-09-15 12:13:37 -07:00
Tianlei Wu	adb0be45d3	Refactoring of attention cuda kernel: move prepare qkv and concat_past_to_present (#17559 ) To avoid a huge cu file and make code more readable: - Move PrepareQKV to separate cu file (attention_prepare_qkv.cu) - Move ConcatPastToPresent to attention_concat.cu - Add default value for AttentionData - Add a data structure QkvData to track Q, K and V pointers and track QKV format.	2023-09-15 10:57:29 -07:00
Tianlei Wu	af80542e65	Update optimize_pipeline for SDXL (#17536 ) - [x] Optimize SDXL models exported by optimum. - [x] Enable it to run locally instead of using module. - [x] Detect external data file in original model, and save with same format by default. - [x] Add tests ### Example ``` pip install optimum transformers diffusers onnx onnxruntime-gpu>=1.16 optimum-cli export onnx --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl ./sd_xl_base_onnx python -m onnxruntime.transformers.models.stable_diffusion.optimize_pipeline -i ./sd_xl_base_onnx -o ./sd_xl_base_fp16 --float16 ``` ### Known issues (1) VAE decoder cannot be converted to float16. Otherwise, there will be black image in output. (2) To use the float16 models, need a minor change in optimum to convert the inputs for VAE decoder from float16 to float32 since we keep VAE decoder as float32. The change is to append a line like the following after [this line](`afd2b5a366/optimum/pipelines/diffusers/pipeline_stable_diffusion_xl.py (L483)`) ``` latents = latents.astype(np.float32) ```	2023-09-15 10:17:20 -07:00
Yi Zhang	377f959c69	Run Final_Jar_Testing_Linux_GPU in docker (#17533 ) ### Description 1. Create a package test image based on [RedHat UBI](https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image) 2. Install TensorRT 8.6.1.6 in RedHat. (Ref. https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#maclearn-net-repo-install-rpm) 3. Run Final_Jar_Testing_Linux_GPU in docker (base image: nvidia/cuda:11.8.0-cudnn8-devel-ubi8) ### Motivation and Context [AB#18470](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18470) ### Verification https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=354004&view=logs&j=8939b564-1402-57b5-92dc-510eba75e069&t=8939b564-1402-57b5-92dc-510eba75e069	2023-09-15 08:35:55 -07:00
zesongw	a5302fec93	[WebNN EP] Fix bug for PRelu on CPU backend. (#17543 ) ### Description WebNN CPU backend expects slope of PRelu to be a static value. For now, we will not support it. ### Motivation and Context Fallback this case to pass the CI.	2023-09-15 08:29:48 -07:00
Changming Sun	4d931edd78	Update tensorrt_dependencies in setup.py (#17562 ) ### Description The files should not have the minor version number. The names were added in #17365 by mistake. ### Motivation and Context We did not successfully exclude them out.	2023-09-15 08:20:47 -07:00
Yulong Wang	94f2ed6bbd	run_CIs_for_external_pr.py: update required pipelines (#17557 ) ### Description Add required pipeline "Windows x64 QNN CI Pipeline" to script "run_CIs_for_external_pr.py"	2023-09-14 21:15:10 -07:00

9656 commits