onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-30 03:37:44 +00:00

Author	SHA1	Message	Date
Changming Sun	4e18344028	Delete docs/Python_Dev_Notes.md (#20887 ) It is no longer relevant: the issue it describes has not been a problem since Python 3.5, and the minimum Python version we support is 3.8.	2024-05-31 14:01:11 -07:00
Yulong Wang	35697d2421	[js/webnn] update API of session options for WebNN (#20816 ) ### Description This PR is an API-only change to address the requirements being discussed in #20729. There are multiple ways that users may create an ORT session by specifying the session options differently. All the code snippets below use the variable `webnnOptions` like this: ```js const myWebnnSession = await ort.InferenceSession.create('./model.onnx', { executionProviders: [ webnnOptions ] }); ``` ### The old way (backward-compatibility) ```js // all-default, name only const webnnOptions_0 = 'webnn'; // all-default, properties omitted const webnnOptions_1 = { name: 'webnn' }; // partial const webnnOptions_2 = { name: 'webnn', deviceType: 'cpu' }; // full const webnnOptions_3 = { name: 'webnn', deviceType: 'gpu', numThreads: 1, powerPreference: 'high-performance' }; ``` ### The new way (specify with MLContext) ```js // options to create MLcontext const options = { deviceType: 'gpu', powerPreference: 'high-performance' }; const myMlContext = await navigator.ml.createContext(options); // options for session options const webnnOptions = { name: 'webnn', context: myMlContext, ...options }; ``` This should throw (because no deviceType is specified): ```js const myMlContext = await navigator.ml.createContext({ ... }); const webnnOptions = { name: 'webnn', context: myMlContext }; ``` ### Interop with WebGPU ```js // get WebGPU device const adaptor = await navigator.gpu.requestAdapter({ ... }); const device = await adaptor.requestDevice({ ... 
}); // set WebGPU adaptor and device ort.env.webgpu.adaptor = adaptor; ort.env.webgpu.device = device; const myMlContext = await navigator.ml.createContext(device); const webnnOptions = { name: 'webnn', context: myMlContext, gpuDevice: device }; ``` This should throw (because cannot specify both gpu device and MLContext option at the same time): ```js const webnnOptions = { name: 'webnn', context: myMlContext, gpuDevice: device, deviceType: 'gpu' }; ```	2024-05-31 03:25:14 -07:00
Changming Sun	67bc9438d7	Update training packaging pipeline's docker files (#20853 ) ### Description Similar to #20786 . The last PR was able to update all pipelines and all docker files. This is a follow-up to that PR. ### Motivation and Context 1. To extract the common part as a reusable build infra among different ONNX Runtime projects. 2. Avoid hitting docker hub's limit: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit	2024-05-30 23:48:42 -07:00
Edward Chen	00589f578d	Fix bench_sqnbitgemm.cpp benchmark argument name list. (#20858 ) Add the "HasBias" argument to the ArgNames() call so it matches with the ArgsProduct() call.	2024-05-30 18:59:54 -07:00
Adrian Lizarraga	b02d5e6d76	[CPU EP] Int4 support for QuantizeLinear, DequantizeLinear, and Transpose (#20362 ) ### Description - 4-bit QuantizeLinear(21). Blocked quantization still missing (i.e., do not support the new `block_size` attribute) - 4-bit DequantizeLinear(21). Blocked dequantization still missing (i.e., do not support the new `block_size` attribute) - 4-bit Transpose(21). - Update quantization tool with int4 types. - Disable QDQ fusions for 4-bit types. See: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.cc - MLAS 4-bit quantization kernels for intel, neon, powerpc. ##### Notes To calculate a tensor's storage size, we normally get the number of elements from the shape (i.e., `tensor_shape.Size()`) and multiply by the size of a single element. This does not directly work for sub-byte elements like int4 as each element in a `Tensor<Int4x2>` stores two packed int4 elements in a byte. The `Tensor:: CalculateTensorStorageSize` should be called to perform the correct calculation for any tensor element type. ### Motivation and Context ONNX 1.16 added the int4 and uint4 types. This initial PR adds the int4 type to ORT and adds int4 implementations for the Quant, Dequant, and Transpose ops on CPU EP. We still need to add int4 support for many ops and execution providers. See the ONNX 1.16 release notes: https://github.com/onnx/onnx/releases.	2024-05-30 18:56:24 -07:00
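The storage-size note in the commit above can be illustrated with a short sketch. It is not the ORT implementation of `Tensor::CalculateTensorStorageSize`; the function names below are hypothetical, and the only assumption taken from the commit is that two int4 elements are packed into each byte.

```python
def naive_storage_bytes(shape, element_size_bytes=1):
    """Element count times element size: correct only for whole-byte types."""
    num_elements = 1
    for dim in shape:
        num_elements *= dim
    return num_elements * element_size_bytes


def packed_int4_storage_bytes(shape):
    """Bytes needed when two 4-bit elements share one byte (hypothetical helper)."""
    num_elements = 1
    for dim in shape:
        num_elements *= dim
    # Round up so that an odd element count still gets a final, half-used byte.
    return (num_elements + 1) // 2


# A 3x5 tensor has 15 elements: 15 bytes if each took a byte, 8 bytes when packed.
naive = naive_storage_bytes((3, 5))
packed = packed_int4_storage_bytes((3, 5))
```

The rounding step is the part the naive formula misses: halving the element count without rounding up would under-allocate by one byte whenever the count is odd.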
Edward Chen	a508130456	Address React Native pipeline component detection timeout (#20871 ) mac-react-native-ci-pipeline.yml: - We don't need to run component detection for PR builds so just disable it there. npm-packaging-pipeline.yml: - Manually added component detection task was being added twice - removed one. - Increased timeout of stage where component detection is run since the existing timeout was close for some builds.	2024-05-30 16:37:03 -07:00
Ye Wang	2200a0b3dd	Fix moe tests to run on supported arch (#20872 ) ### Description https://github.com/microsoft/onnxruntime/issues/20788 Will do sm70 validation separately.	2024-05-30 13:26:38 -07:00
Changming Sun	65ef270e06	Update Aten pipeline's docker file to use UBI8 (#20856 ) ### Description Now it uses CentOS 7 which is EOL. This PR updates it to UBI8. ### Motivation and Context To deprecate CentOS 7 .	2024-05-30 07:38:15 -07:00
Yueqing Zhang	59b13b7bbd	[VitisAI] update version and api & bug fix (#20851 ) ### Description 1. Use macro defined to check version number 2. Add a new api 3. Fix bug at attr_proto ### Motivation and Context These are some problems we need to address for the final delivery to Microsoft.	2024-05-30 07:36:53 -07:00
Xu Xing	25ac65375c	[js/webgpu] Fix mha name (#20860 )	2024-05-30 00:01:06 -07:00
Jian Chen	228713f635	adding publishing stage to publish java CUDA 12 pkg to ado (#20834 )	2024-05-29 16:24:23 -07:00
Carson M	5bfca1dc57	[Build] Change `onnxruntime_NVCC_THREADS` from option to cache entry (#20768 ) ### Description Changes the `onnxruntime_NVCC_THREADS` CMake variable from an [`option`](https://cmake.org/cmake/help/latest/command/option.html) to a [cache entry](https://cmake.org/cmake/help/latest/command/set.html#set-cache-entry). ### Motivation and Context Fixes #19833. `option` in CMake (confusingly, IMHO) always defines a boolean option. The original definition of `onnxruntime_NVCC_THREADS` specified a default of `1`, which I presume is coerced to `ON`. Thus, if the option is not overridden with a value of another type, NVCC will receive a malformed option `--threads ON` (rather than the expected `--threads 1`), which causes the error reported in #19833. This error only occurred if compiling ONNX Runtime via CMake with `-Donnxruntime_USE_CUDA=ON`; the CI build script always overrode `onnxruntime_NVCC_THREADS` with a string value: `f1fef19b6e/tools/ci_build/build.py (L1152-L1154)`	2024-05-29 12:28:33 -07:00
Wanming Lin	798cea2350	[WebNN EP] Remove legacy MLOperandDescriptor.type (#20783 ) The latest Chrome supports MLOperandDescriptor.dataType, so remove the legacy MLOperandDescriptor.type.	2024-05-29 10:20:17 -07:00
Wanming Lin	9ea9f9e46a	[WebNN EP] Add data type constraint (#20779 ) The WebNN spec has added data type constraints for every op, and its CPU backend (currently TFLite) has additional constraints. Add the corresponding constraints to each op in the WebNN EP. Note: fp16 is temporarily disabled for the CPU backend, as support is planned to be ready in Chromium next month.	2024-05-29 10:19:51 -07:00
Vincent Wang	e77f238dc6	Update Torch Version to Fix ATen CPU Pipeline Failure (#20845 ) Update Torch Version to Fix ATen CPU Pipeline Failure.	2024-05-29 16:04:18 +08:00
Adrian Lizarraga	3044aa8743	[Quant tool] Extend support for QDQ type conversion at graph output (#20841 ) ### Description Allows mixed-precision overrides that add a QDQ quantization type conversion sequence at a graph output that is not consumed by other nodes. This is not a common use case, but the tool should handle it instead of raising an error. #### Example Original model ![image](https://github.com/microsoft/onnxruntime/assets/19691973/4c9c3bb0-4ca1-4213-9259-9d0506ed22f2) mixed-precision overrides: ```python mixed_prec_overrides = { "input_0": [{"quant_type": QuantType.QUInt16}], "op_0_out": [ { "quant_type": QuantType.QUInt16, "convert": {"quant_type": QuantType.QUInt8}, } ], } quantize_static( float_model_path, qdq_model_path, data_reader, quant_format=QuantFormat.QDQ, activation_type=QuantType.QUInt8, op_types_to_quantize=[node.op_type for node in float_model.graph.node], extra_options={ "TensorQuantOverrides": mixed_prec_overrides, }, ) ``` QDQ model: ![image](https://github.com/microsoft/onnxruntime/assets/19691973/804fc89b-4a00-43bc-a4ff-21edd6f27e98) ### Motivation and Context This scenario arises for certain quantization configurations and should be handled gracefully.	2024-05-28 21:27:54 -07:00
Yifan Li	d44be41e1c	[TensorRT EP] Support engine hardware compatibility (#20669 ) ### Description - Introduce option `trt_engine_hw_compatible` to support engine hardware compatibility for Ampere+ GPUs - This enables the `nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS` flag when generating engines - This option has been validated on sm80/86 GPUs, as engines can be reused across different Ampere+ architectures: - The client side needs to enable this option as well to leverage existing sm80+ engines - If this option is enabled by users with TRT<8.6 or sm<80, a warning is shown that the option is not supported Engine naming: \| When \| `trt_engine_hw_compat=false` \| `trt_engine_hw_compat=true` \| \| -------------- \| ------------------------------------------------------------ \| ------------------------------------------------------------ \| \| A100 (sm80) \| TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm80.engine \| TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm80+.engine \| \| RTX3080 (sm86) \| TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm86.engine \| TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm80+.engine \| ### Motivation and Context Reference: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#hardware-compat --------- Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>	2024-05-28 18:12:56 -07:00
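The engine-naming table in the commit above suggests a simple rule for the architecture suffix of a cached engine file. The sketch below only mirrors that table; `engine_arch_suffix` is a hypothetical helper, not a TensorRT EP function, and the fallback for GPUs below sm80 is an assumption based on the commit's remark that the option is unsupported there.

```python
def engine_arch_suffix(compute_capability: int, hw_compatible: bool) -> str:
    """Return the architecture suffix used in an engine file name (illustrative only)."""
    # Assumption: the shared "sm80+" suffix applies only to Ampere or newer GPUs.
    if hw_compatible and compute_capability >= 80:
        return "sm80+"
    # Otherwise the engine is tied to the exact architecture it was built on.
    return f"sm{compute_capability}"


# Rows of the table: A100 (sm80) and RTX3080 (sm86), with the option off and on.
a100_default = engine_arch_suffix(80, hw_compatible=False)
a100_compat = engine_arch_suffix(80, hw_compatible=True)
rtx3080_default = engine_arch_suffix(86, hw_compatible=False)
rtx3080_compat = engine_arch_suffix(86, hw_compatible=True)
```

With the option enabled, both GPUs resolve to the same suffix, which is what lets an engine built on one Ampere+ GPU be found and reused on another.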
Edward Chen	535e9d7114	Update package_release_tasks.py (#20835 ) 1. Move azcopy environment variables out of script and into an Azure DevOps variable group. Move towards consolidating the managed identity client ID definition in one place. 2. Disable azcopy overwrite. We don't want to accidentally change the files for a released package.	2024-05-28 17:50:25 -07:00
Ye Wang	362a623905	fix a build error with cuda 12.5 (#20770 ) ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/20765	2024-05-28 10:46:24 -07:00
Adrian Lizarraga	e78b18a2fb	Increase ComponentDetection timeout for React Native CI (#20800 ) ### Description Runs of the React Native CI are timing out during ComponentDetection after 8 minutes. This increases the timeout value. ### Motivation and Context Runs of the React Native CI are timing out during ComponentDetection.	2024-05-28 08:36:38 -07:00
Jian Chen	b1b8cb05dc	Adding java build and packaging stage to cuda-packaging-pipeline.yml (#20812 ) ### Description Adding java build/packaging stage to `cuda-packaging-pipeline.yml` ### Motivation and Context This way we can enable publishing the Java Cuda 12 along with Nuget CUDA 12	2024-05-27 07:59:19 -07:00
Chi Lo	454fcdde00	[TensorRT EP] Weightless API integration (#20412 ) This PR includes the weight-stripped engine feature (thanks @moraxu for #20214), which is the major feature for TRT 10 integration. Two TRT EP options are added: - `trt_weight_stripped_engine_enable`: Enable weight-stripped engine build and refit. - `trt_onnx_model_folder_path`: In the quick load case using embedded engine model / EPContext mode, the original onnx filename is in the node's attribute, and this option specifies the directory of that onnx file if needed. Normal weight-stripped engine workflow: ![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f314865-cbda-4979-a7ac-b31c7a553b56) Weight-stripped engine and quick load workflow: ![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f31db51-a7a8-495b-ba25-54c7f904cbad) See the doc [here](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#tensorrt-ep-caches) for more information about the EPContext model. 
--------- Co-authored-by: yf711 <yifanl@microsoft.com> Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com> Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com> Co-authored-by: pengwa <pengwa@microsoft.com> Co-authored-by: wejoncy <wejoncy@163.com> Co-authored-by: Yi Zhang <zhanyi@microsoft.com> Co-authored-by: Yi Zhang <your@email.com> Co-authored-by: Pranav Sharma <prs@microsoft.com> Co-authored-by: Adam Pocock <adam.pocock@oracle.com> Co-authored-by: cao lei <jslhcl@gmail.com> Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com> Co-authored-by: inisis <46103969+inisis@users.noreply.github.com> Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com> Co-authored-by: mo-ja <60505697+mo-ja@users.noreply.github.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: Sumit Agarwal <sumitagarwal330@gmail.com> Co-authored-by: Atanas Dimitrov <70822030+neNasko1@users.noreply.github.com> Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: Dhruv Matani <dhruvbird@gmail.com> Co-authored-by: Dhruv Matani <dhruv.matani@grammarly.com> Co-authored-by: wangshuai09 <391746016@qq.com> Co-authored-by: Xiaoyu <85524621+xiaoyu-work@users.noreply.github.com> Co-authored-by: Xu Xing <xing.xu@intel.com> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com> Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Jian Chen <cjian@microsoft.com> Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Andrew Fantino <15876180+afantino951@users.noreply.github.com> Co-authored-by: Thomas Boby <thomas@boby.uk> Co-authored-by: 
Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Michal Guzek <mguzek@nvidia.com> Co-authored-by: George Wu <jywu@microsoft.com>	2024-05-26 12:24:17 -07:00
Changming Sun	439ed92b96	Remove TVM EP's pipeline (#20813 ) ### Description Temporarily remove TVM EP's pipeline until someone helps us upgrade TVM to a newer version which is compatible with the latest ONNX. ### Motivation and Context The ONNX version that TVM EP uses has a known security vulnerability. We cannot continue using it in our hosted build environment. This change is temporary	2024-05-25 20:42:41 -07:00
Adrian Lizarraga	5bae32eb34	Extend DoubleQDQPairsRemover to handle sequences that end in duplicate DQ nodes (#20759 ) ### Description Extend the DoubleQDQPairsRemover optimizer to also handle sequences that end in duplicate DQ nodes. For example, the following sequence: ``` Q1 --> DQ1 --> Q2 --+--> DQ2 \| +--> DQ2' ``` Is now simplified to: ``` Q1 ---+--> DQ2 \| +--> DQ2' ``` ### Motivation and Context The EnsureUniqueDQNodeUnits pass may add duplicate DQ nodes to ensure valid QDQ node units. The DoubleQDQPairsRemover should still be able to remove unnecessary QDQ ops if the target sequence ends in duplicate DQ nodes. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-05-24 18:30:15 -07:00
Chi Lo	a7bc49a565	[TensorRT EP] Use latest commit of onnx-tensorrt parser (#20758 ) The 10 GA branch updated with several issues fixed. https://github.com/onnx/onnx-tensorrt/commits/10.0-GA/	2024-05-24 16:44:16 -07:00
Suryaprakash Shanmugam	1765da17e4	QDQ transformations in the OpenVINO EP for the NPU device (#20622 ) We introduce rulesets that eliminate QDQ nodes of unsupported types and for unsupported quantised operators for the NPU device. This leads to improved performance and accuracy on critical client AI models. Here's a summary of the changes: - Introduces the provider option `enable_qdq_optimizer` which when set to `True` enables stripping of QDQ nodes on the NPU device for models with `QuantizeLinear` and `DequantizeLinear` layers in them. `enable_qdq_optimizer` defaults to `False`. - Always strip out int16/uint16 QDQ layers as these types are not supported by the NPU compiler. - Only supported ops `Conv`, `MatMul`, and `Add` retain QDQ layers around them, specifically identified for optimal inference performance. OpenVINO EP achieves this by iterating through NodeUnits in the QDQ model, and reconstructing the graph only with the required layers. - Added provider APIs to manipulate node units from EP code by @adrianlizarraga - Added capability rule for the Pad operator when it takes DQ layers as input - Fixes from static code analysis tool --------- Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>	2024-05-24 16:25:05 -07:00
Adam Louly	ed8275883a	[Training] Add bf16 support to GatherElementsGrad. (#20796 ) ### Description Adding bf16 support to GatherElementsGrad. --------- Co-authored-by: Adam Louly <adamlouly@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>	2024-05-24 15:55:14 -07:00
Suryaprakash Shanmugam	76e1a06986	Fix ordering of value info in GraphProto creation (#20691 ) ### Description Graph member value_info_ (unordered_set) is ordered before its values are added to the graph proto. ### Motivation and Context - Without this ordering, the model proto used by the OpenVINO EP is not deterministic and varies across runs. - Since the model proto varies, it affects caching attempts by OpenVINO. Q: If creating a vector to have ordered elements is costly, should we make value_info_ a std::set that is sorted according to NodeArg names? Related PR about ordering initializers: https://github.com/microsoft/onnxruntime/pull/14631	2024-05-24 10:49:32 -07:00
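The idea in the commit above, sorting an unordered collection before serializing it so the output is the same on every run, can be sketched in a few lines. This is an illustration rather than the ORT code: `NodeArg` here is a hypothetical stand-in for the C++ type, and sorting by name is one possible key (the commit itself raises name-based ordering only as a question).

```python
from collections import namedtuple

# Hypothetical stand-in for a graph value; the real NodeArg is a C++ class.
NodeArg = namedtuple("NodeArg", ["name", "elem_type"])


def ordered_value_info(value_info):
    """Return the entries of an unordered collection sorted by name."""
    # Set iteration order is not guaranteed to be stable across runs, so sort
    # on a stable key before writing the entries out.
    return sorted(value_info, key=lambda arg: arg.name)


value_info = {
    NodeArg("relu_out", "float"),
    NodeArg("conv_out", "float"),
    NodeArg("add_out", "float"),
}
names = [arg.name for arg in ordered_value_info(value_info)]
```

Because the serialized order no longer depends on hash iteration order, two runs over the same graph produce identical output, which is what a cache keyed on the serialized model needs.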
Peishen Yan	cfe68e489e	[WebNN EP] Support Trilu op (#20730 ) Adds support for Trilu via WebNN Triangular op	2024-05-24 10:46:54 -07:00
Guenther Schmuelling	33a68d221f	add missing file for pr20791 (#20811 ) this file should have been in pr20791 to allow fp16 in the tile implementation	2024-05-24 09:59:13 -07:00
Jian Chen	10c425a4d5	Fix Onnx >= to == (#20798 )	2024-05-24 09:16:23 -07:00
Jian Chen	fe24006425	Fix Nuget Cuda pipeline package pipeline (#20741 ) ### Description This PR adds protoc.exe to the Nuget Cuda pipeline, which also allows it to build Java for various CUDA versions.	2024-05-24 09:15:57 -07:00
Satya Kumar Jandhyala	bab5037eab	Eliminate explicit Concat operations in Attention (#20556 ) ### Description Remove the explicit concatenation of pastKey with Key and pastValue with Value.	2024-05-24 09:07:57 -07:00
Changming Sun	535a030b1e	Remove manylinux build scripts from python packaging pipeline (#20786 ) ### Description Use a common set of prebuilt manylinux base images to build the packages, to avoid building the manylinux part again and again. The base images can be used in GenAI and other projects too. This PR also updates the GCC version for inference python CUDA11/CUDA12 builds from 8 to 11. Later on I will update all other CUDA pipelines to use GCC 11, to avoid the issue described in https://github.com/onnx/onnx/issues/6047 and https://github.com/microsoft/onnxruntime-genai/issues/257 . ### Motivation and Context To extract the common part as a reusable build infra among different ONNX Runtime projects.	2024-05-24 08:18:22 -07:00
Jian Chen	884acd4598	Fix Nuget-Cuda publish pipeline (#20794 ) ### Description Previously all feeds were set to nightly, and the official release feed ID was not set.	2024-05-23 18:27:46 -07:00
Guenther Schmuelling	0cf7caaff2	[js/webgpu] enable fp16 for tile (#20791 )	2024-05-23 16:59:39 -07:00
Edward Chen	d1af19db9d	Add some CPU feature detection for Apple platforms. (#20769 ) Add CPUIDInfo::ArmAppleInit() to detect CPU features on Apple platforms. This initial implementation is not comprehensive.	2024-05-23 15:59:46 -07:00
Changming Sun	b522df0ae4	Update RE2 to the latest (#20775 ) Update RE2 to the latest. To keep the components up to date.	2024-05-23 14:30:15 -07:00
Yulong Wang	0996d6e19e	[tools] update pipeline list for run_CIs_for_external_pr.py (#20776 ) ### Description add required pipeline "Linux Android Emulator QNN CI Pipeline"	2024-05-23 10:38:42 -07:00
Yi Zhang	fa8670fe5b	Add a test image for stable diffusion (#20780 )	2024-05-23 08:50:23 -07:00
Wanming Lin	2c39d0c502	[WebNN EP] Disable ConvTranspose for WebNN CPU (#20762 ) The WebNN CPU backend implementation has been migrated from XNNPack to TFLite. TFLite does not yet support WebNN's convTranspose2d, so disable it for now.	2024-05-22 20:59:37 -07:00
Adam Louly	529feb01f4	Add BF16 for Scale Op. (#20753 ) Adding Bfloat16 to scale op --------- Co-authored-by: Adam Louly <adamlouly@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>	2024-05-22 17:01:17 -07:00
Edward Chen	a39f8862fd	SQNBitGemm - move workspace size calculation functions to hardware-specific implementations (#20757 ) The workspace usage may be hardware-specific. Moving away from a common workspace size calculation allows more flexibility in the hardware-specific implementations.	2024-05-22 15:12:17 -07:00
Jian Chen	d4fe4b5b51	Replace ubuntu-latest with onnxruntime-Ubuntu2204-AMD-CPU (#20736 )	2024-05-22 13:36:02 -07:00
Jian Chen	0a10a3003a	component-governance fix round 4 (#20754 )	2024-05-22 11:05:24 -07:00
Yulong Wang	e412bc1919	[doc] update file size table for ORT Web (#20755 )	2024-05-22 11:04:57 -07:00
Xu Xing	f1fef19b6e	[js/webgpu] Support shared memory for transpose 2d (#19267 ) For 1024x1024: 18.7ms without shared memory, 13.2ms with shared memory.	2024-05-22 08:15:44 -07:00
Yulong Wang	068bb3d5ee	[js/webgpu] add missing space in build script (#20752 )	2024-05-21 16:24:34 -07:00
Chi Lo	df01e0d497	[TensorRT EP] Update ORT kernel output with TRT DDS int64 output for TRT 10 (#20738 ) TRT 10 now natively supports int64 tensors, so the code that binds the ORT kernel output to the DDS int64 output needs updating.	2024-05-21 09:03:48 -07:00
pengwa	8a98874e7e	Flash attention recompute (#20603 ) ### Flash attn recompute 1. Allow PythonOp(FlashAttn) to be recomputed correctly. `45879ff5c2` 2. Use JSON to pass the selected-to-recompute subgraphs. `3c374da678` #### Better Memory Efficiency The customer model can run both PyTorch SDPA and Flash Attn; this PR makes it possible for the Flash Attn path to work with ORTModule layerwise recompute. Peak memory drops from 45.xGB to 32.xGB if we only compare the layers (not including other pieces; a few more optimizations targeting those pieces will follow later). #### Better Perf Using Flash Attn brings an additional 16% end-to-end time reduction, with a highly aligned loss curve. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/bb63894a-f281-49bc-a8e6-ff818439be38) #### Use JSON File to pass Recompute Plans This overcomes the maximum length limit of strings defined in session options.	2024-05-21 13:38:19 +08:00

11135 commits