pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
drisspg	72da0a8a42	[Submodule] Add flash as third-party submodule [Prep for later PRs] (#145502 ) # Context Prototyped here: https://github.com/pytorch/pytorch/pull/144120, we are going to make flash-attention a 3rd party submodule. We will then use the c++ sources and include into our build of libtorch.so This requires various changes to work including external and internal changes. Since these require internal changes we need to co-dev and in the co-dev environment I haven't found a way to sync submodule changes + internal only changes. This is unused for now Pull Request resolved: https://github.com/pytorch/pytorch/pull/145502 Approved by: https://github.com/Skylion007	2025-01-24 09:21:41 +00:00
Nikhil Gupta	41b38f755c	Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392 )" (#145505 ) https://github.com/pytorch/pytorch/pull/134124 was reverted by https://github.com/pytorch/pytorch/pull/145392 due to KleidiAI clone issue. 1. This reverts commit `0940eb6d44` (https://github.com/pytorch/pytorch/pull/145392 )and Fixes KleidiAI mirror issue. 2. KleidiAI is now cloned from github mirror instead of arm gitlab Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2 Fixes https://github.com/pytorch/pytorch/issues/145273 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145505 Approved by: https://github.com/malfet	2025-01-23 18:50:59 +00:00
albanD	0940eb6d44	Reverting the PR adding Kleidiai-based int4 kernels (#145392 ) Mitigation for https://github.com/pytorch/pytorch/issues/145273 Reverting https://github.com/pytorch/pytorch/pull/134124 and https://github.com/pytorch/pytorch/pull/144074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145392 Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai	2025-01-22 20:11:49 +00:00
Driss Guessous	3afc5170d4	[Submodule] Upgrade to Cutlass 3.6 part deux (#144911 ) # Summary Take 2 of [D67866269](https://www.internalfb.com/diff/D67866269) Main change is that we identified and fixed the FA2 regression. More details can be found here https://github.com/pytorch/pytorch/issues/144729 and have landed that before this here: [D68194635](https://www.internalfb.com/diff/D68194635) Differential Revision: D68194470 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144911 Approved by: https://github.com/eqy, https://github.com/Skylion007	2025-01-17 00:53:42 +00:00
Yutao Xu	6470b0ea6f	Update torch-xpu-ops commit pin (#144739 ) Update the torch-xpu-ops commit to [22cc419e4e60f469341712a5a103fa309a7dfd48](`22cc419e4e`), which includes: - Fix build issue https://github.com/intel/torch-xpu-ops/issues/1279 - Aten operator coverage improvement Note: the new torch-xpu-ops commit doesn't support bundle 0.5.3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144739 Approved by: https://github.com/EikanWang, https://github.com/malfet	2025-01-16 15:12:37 +00:00
Driss Guessous	db787181b5	Back out "[Submodule] Upgrade to Cutlass 3.6" (#144738 ) Summary: Revert due to perf regressions see: https://github.com/pytorch/pytorch/issues/144729 Test Plan: sand castle Differential Revision: D68137326 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144738 Approved by: https://github.com/huydhn	2025-01-15 02:57:14 +00:00
Xu Han	c9afa00a85	update sleef for disable libm on Windows [submodule Sleef] (#142245 ) This PR implements the RFC: https://github.com/pytorch/pytorch/issues/141946 Changes: 1. Update `Sleef` to contain its PR: https://github.com/shibatch/sleef/pull/603 2. Set `SLEEF_BUILD_WITH_LIBM` to `OFF`, which turns off the CMake find_library(libm) call in `Sleef`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142245 Approved by: https://github.com/EikanWang, https://github.com/atalman Co-authored-by: Eikan Wang <eikan.wang@intel.com>	2025-01-11 00:11:55 +00:00
Xu Han	bd1f5d1c32	update xnnpack for disable libm on Windows [submodule XNNPACK] (#141943 ) This PR implements the RFC: https://github.com/pytorch/pytorch/issues/141946 Changes: 1. Update `XNNPACK` to contain its PRs: https://github.com/google/XNNPACK/pull/7456, https://github.com/google/XNNPACK/pull/7535 and other build-fixing PRs. 2. Set `XNNPACK_BUILD_WITH_LIBM` to `OFF`, which turns off the CMake find_library(libm) call in `XNNPACK`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141943 Approved by: https://github.com/atalman	2025-01-10 00:47:41 +00:00
drisspg	206a932f23	[Submodule] Upgrade to Cutlass 3.6 (#144180 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144180 Approved by: https://github.com/eqy, https://github.com/Skylion007	2025-01-09 21:56:53 +00:00
PyTorch MergeBot	f71688f30d	Revert "[Submodule] Upgrade to Cutlass 3.6 (#144180 )" This reverts commit `f2c1033178`. Reverted https://github.com/pytorch/pytorch/pull/144180 on behalf of https://github.com/huydhn due to Ops, this fails some slow tests. Please help fix and reland this ([comment](https://github.com/pytorch/pytorch/pull/144180#issuecomment-2581302233))	2025-01-09 21:45:39 +00:00
drisspg	f2c1033178	[Submodule] Upgrade to Cutlass 3.6 (#144180 ) Differential Revision: [D67866269](https://our.internmc.facebook.com/intern/diff/D67866269) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144180 Approved by: https://github.com/eqy, https://github.com/Skylion007	2025-01-09 17:29:58 +00:00
Xuehai Pan	dcc3cf7066	[BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415 ) The fixes are generated by: ```bash ruff check --fix --preview --unsafe-fixes --select=E226 . lintrunner -a --take "RUFF,PYFMT" --all-files ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415 Approved by: https://github.com/huydhn, https://github.com/Skylion007	2025-01-08 21:55:00 +00:00
Aaron Gokaslan	5c783bf410	[BE][Ez]: Update CUDNN Frontend submodule to 1.9.0 (#144200 ) * Update CUDNN Frontend to 1.9.0, which include some API improvements, new features, and bugfixes. This is a header only lib fix so should be pretty straight forward. * Nicest feature is that it now logs / print warnings when the CUDNN compiled version does not match the dynamically loaded one * Fixes corrupted / truncated log lines from being printed by CUDNN Frontend Pull Request resolved: https://github.com/pytorch/pytorch/pull/144200 Approved by: https://github.com/cyyever, https://github.com/albanD	2025-01-06 17:33:38 +00:00
Yutao Xu	1e881ceecf	Update torch-xpu-ops commit pin (#143984 ) Update the torch-xpu-ops commit to [28cfac20ec662abdb0ac98faf122450013e8f520](`28cfac20ec`), which includes: - Disable batch_norm vectorization path to fix accuracy issues. - Fix the LSTM/RNN implementation error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143984 Approved by: https://github.com/EikanWang, https://github.com/ruidazeng, https://github.com/desertfire, https://github.com/jansel	2025-01-05 09:01:36 +00:00
drisspg	005a4b9537	[Submodule] Bump Cutlass to 3.5.1 OSS PR (#144000 ) ## Summary Follow-up PR to https://github.com/pytorch/pytorch/pull/143515. That PR added a bunch of macro switches to ensure both 3.4 and 3.5.1 built successfully. This PR actually bumps the cutlass pin to 3.5.1. I am going to do a stack on top to add conditional gates for 3.6, hijacking the 3.4 switches. We will leapfrog our way to the top :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144000 Approved by: https://github.com/Skylion007, https://github.com/eqy, https://github.com/malfet	2025-01-04 18:04:03 +00:00
Aaron Gokaslan	baee623691	[BE][Ez]: Update fmtlib submodule to 1.11.1 (#143937 ) * Exactly the same as previous fmtlib except it fixes an edgecase that could affect ABI compatibility between fmtlib versions. * Seems safe to update Pull Request resolved: https://github.com/pytorch/pytorch/pull/143937 Approved by: https://github.com/albanD	2024-12-30 19:46:27 +00:00
Yutao Xu	2ed4d65af0	Update torch-xpu-ops commit pin (#143853 ) Update the torch-xpu-ops commit to [214f33](`214f33b9d9`), includes: - Fix building issue for transformer related operators - Improve XPU operator coverage Pull Request resolved: https://github.com/pytorch/pytorch/pull/143853 Approved by: https://github.com/EikanWang	2024-12-30 02:38:16 +00:00
cyy	e05bfb8ee3	[Submodule] Bump libfmt to 11.1.0 (#143843 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/143843 Approved by: https://github.com/Skylion007	2024-12-26 04:49:11 +00:00
Xuehai Pan	b77406a9ec	[BE][CI] bump `ruff` to 0.8.4 (#143753 ) Changes: 1. Bump `ruff` from 0.7.4 to 0.8.4 2. Change `%`-formatted strings to f-string 3. Change arguments with the `__`-prefix to positional-only arguments with the `/` separator in function signature. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753 Approved by: https://github.com/Skylion007	2024-12-24 12:24:10 +00:00
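The two code-style changes listed in the `ruff` bump above can be illustrated with a small sketch. The function names (`clamp_old`, `clamp`) and the sample values are hypothetical and are not taken from the PyTorch code base; they only show the before and after shape of each rewrite.

```python
# Hypothetical "before" style: a `__` prefix marks the parameters as
# positional-only by convention, but Python does not enforce it at runtime.
def clamp_old(__value, __low, __high):
    return max(__low, min(__high, __value))


# "After" style: the `/` separator makes every parameter before it
# positional-only, and the interpreter enforces this.
def clamp(value, low, high, /):
    return max(low, min(high, value))


# %-formatting rewritten as an f-string; both produce the same text.
name, count = "ruff", 3
old_message = "%s fixed %d files" % (name, count)
new_message = f"{name} fixed {count} files"

print(clamp(5, 0, 3))
print(new_message)
```

With the `/` form, passing an argument by keyword (for example `clamp(value=5, low=0, high=3)`) raises `TypeError`, which is the behavior the `__` prefix could only document.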
Nikhil Gupta	94737e8a2a	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. 
API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-20 19:32:03 +00:00
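Steps 1 and 2 of the description above (symmetric 4-bit quantization, group size a multiple of 32, two 4-bit weights per uint8) can be sketched in plain Python. This is a minimal illustration, not the KleidiAI or ATen implementation: the helper names, the scale rule (`max_abs / 7`), the offset-by-8 encoding, and the low-nibble-first order are all assumptions, and the real packed layout produced by `torch.ops.aten._dyn_quant_pack_4bit_weight` may differ.

```python
def quantize_groupwise_symmetric(weights, group_size=32):
    """Quantize a flat list of floats to signed 4-bit values, one scale per group.

    Hypothetical helper: symmetric quantization with values clamped to [-8, 7].
    """
    if group_size % 32 != 0:
        raise ValueError("group_size must be a multiple of 32")
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be divisible by group_size")
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Assumed scale rule: map the largest magnitude to 7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        quantized.extend(max(-8, min(7, round(w / scale))) for w in group)
    return quantized, scales


def pack_int4_pairs(quantized):
    """Pack two signed 4-bit values into one byte (assumed: low nibble first)."""
    if len(quantized) % 2 != 0:
        raise ValueError("need an even number of values")
    packed = bytearray()
    for low, high in zip(quantized[0::2], quantized[1::2]):
        # Shift [-8, 7] to [0, 15] so each value fits an unsigned nibble.
        packed.append(((high + 8) << 4) | (low + 8))
    return bytes(packed)


def unpack_int4_pairs(packed):
    """Inverse of pack_int4_pairs."""
    values = []
    for byte in packed:
        values.append((byte & 0x0F) - 8)
        values.append((byte >> 4) - 8)
    return values


weights = [i / 10 - 1.6 for i in range(32)]
quantized, scales = quantize_groupwise_symmetric(weights, group_size=32)
packed = pack_int4_pairs(quantized)
print(len(weights), "weights ->", len(packed), "bytes,", len(scales), "scale(s)")
```

Dequantizing is `value * scale` per group, so with this scale rule the round-trip error per weight is at most half a quantization step.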
Xu Han	2daa666591	update kineto to XPU Windows fixed PR. [submodule kineto] (#143445 ) Include XPU Windows Fixed PR: https://github.com/pytorch/kineto/pull/1012 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143445 Approved by: https://github.com/sraikund16	2024-12-20 05:57:30 +00:00
PyTorch MergeBot	8136daff5a	Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 )" This reverts commit `4b82251011`. Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks lots of internal build ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2555953189))	2024-12-19 23:33:17 +00:00
Nikhil Gupta	4b82251011	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. 
API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-19 18:51:26 +00:00
Aditya Tewari	a97c6a78a8	Upgrade submodule ideep for bf16f32 matmul changes (#143508 ) This change will enable this PR #140159 to pick proper kernels in bf16 mode for SDPA layer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143508 Approved by: https://github.com/yanbing-j, https://github.com/jgong5	2024-12-19 06:49:16 +00:00
PyTorch MergeBot	14fe1f7190	Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 )" This reverts commit `d3ff2d42c2`. Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/malfet due to This broke S390 builds, includes cpuinfo unconditionally ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2552560208))	2024-12-19 01:05:11 +00:00
Nikhil Gupta	d3ff2d42c2	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. 
API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-18 22:30:07 +00:00
Max Ren	20718cdebb	[Fast Packing] Add packing ukernels to gemm config (#142191 ) Add file to buck build Differential Revision: [D66692673](https://our.internmc.facebook.com/intern/diff/D66692673/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D66692673/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/142191 Approved by: https://github.com/kirklandsign, https://github.com/digantdesai	2024-12-10 01:06:17 +00:00
Yutao Xu	3cdd997f4c	Update torch-xpu-ops commit pin (#142113 ) Update the torch-xpu-ops commit to [7ecb0b](`7ecb0b1a56`), which includes: - Capture rrelu_with_noise noise mutation in compile (Resolve https://github.com/pytorch/pytorch/issues/142102) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142113 Approved by: https://github.com/EikanWang	2024-12-05 17:00:29 +00:00
Yutao Xu	b31d3b2f41	Update torch-xpu-ops commit pin (#141949 ) Update the torch-xpu-ops commit to [f31219](`f312190a92`), includes: - Add lazy init for empty_xpu - Fix nan propagation error for soft_shrink Pull Request resolved: https://github.com/pytorch/pytorch/pull/141949 Approved by: https://github.com/EikanWang	2024-12-05 05:22:38 +00:00
Max Ren	16676fd17b	Disable unused ARM SME to reduce android app binary size (#141942 ) Summary: ARM SME kernels aren't currently used right now, so disabling their build so Reviewed By: digantdesai Differential Revision: D66336599 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141942 Approved by: https://github.com/digantdesai	2024-12-04 07:24:50 +00:00
Sun, Jiayi	deffbbdb91	Update submodule ideep for pd cache changes (#141555 ) Fixes https://github.com/pytorch/pytorch/issues/141327. Fixes https://github.com/pytorch/pytorch/issues/141328. Fixes https://github.com/pytorch/pytorch/issues/141329. Fixes https://github.com/pytorch/pytorch/issues/141330. Fixes https://github.com/pytorch/pytorch/issues/141331. Summary: 1. Modify to_bytes function to include binary_src shape information into the keys of pd cache. 2. Modify inner_product_forward to support broadcast add fusion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141555 Approved by: https://github.com/jgong5	2024-12-04 04:55:33 +00:00
Xiaozhu Meng	d035db3d86	[AMD] [submodule] aten.bmm CK-backend prototype (#140758 ) Summary: Early prototype of adding CK backend for aten.bmm. Currently, it is very limited in that: 1. BF16 only 2. A single CK instance 3. NT layout only 4. Alpha=1, Beta=0 only Reviewed By: xw285cornell, zjing14 Differential Revision: D65954695 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140758 Approved by: https://github.com/bradleyhd	2024-12-03 06:54:51 +00:00
atalman	c17ba69ba5	[submodule] Revert "Adds support for accelerated sorting with x86-simd-sort (#127936 ) (#141901 ) Looks like the original PR caused: https://github.com/pytorch/pytorch/issues/140590 Please see comment: https://github.com/pytorch/pytorch/issues/140590#issuecomment-2508704480 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141901 Approved by: https://github.com/andrewor14, https://github.com/malfet	2024-12-03 00:16:35 +00:00
Yutao Xu	81ab2cc757	Update torch-xpu-ops commit pin (#141201 ) Update the torch-xpu-ops commit to [1e32bbc](`1e32bbc3d9`), includes: - Improve XPU aten operator coverage - Support basic `SparseXPU` operators Pull Request resolved: https://github.com/pytorch/pytorch/pull/141201 Approved by: https://github.com/EikanWang, https://github.com/jansel	2024-12-02 01:49:07 +00:00
Mengwei Liu	e28b09517f	[miniz] Make sure miniz extra_size_remaining doesn't go off bound (#141266 ) #140041 added some logic to fix a zip64 header error. This PR makes sure `extra_size_remaining` doesn't overflow. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141266 Approved by: https://github.com/angelayi	2024-11-21 22:02:28 +00:00
Sun, Jiayi	dcf7728fd6	Update submodule ideep for ideep conv changes (#141101 ) Summary: Update submodule ideep to include ideep conv changes: modify convolution_forward to support broadcast add fusion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141101 Approved by: https://github.com/Skylion007, https://github.com/jgong5	2024-11-21 12:26:24 +00:00
Nikita Shulga	f0f6144381	[EZ][BE] Update googletest submodule (#140988 ) From v1.11.0 (released in Jun 2021) to v1.15.2 (released in Jul 2024) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140988 Approved by: https://github.com/izaitsevfb, https://github.com/huydhn	2024-11-19 07:49:16 +00:00
Max Ren	cca34be584	Update XNNPACK Version (#139913 ) Updating XNNPACK Version to 4ea82e595b36106653175dcb04b2aa532660d0d8 submodule update Pull Request resolved: https://github.com/pytorch/pytorch/pull/139913 Approved by: https://github.com/digantdesai, https://github.com/huydhn	2024-11-18 18:16:31 +00:00
Yutao Xu	ae7f809bfc	Update torch-xpu-ops commit pin (#140782 ) Update the torch-xpu-ops commit to [bf4bab1](`bf4bab1fff`), includes: - Fix Werror=terminate relevant building issues Pull Request resolved: https://github.com/pytorch/pytorch/pull/140782 Approved by: https://github.com/EikanWang	2024-11-15 10:10:52 +00:00
Shivam Raikundalia	f57ef5ddf2	Update Kineto Submodule (#140629 ) Summary: Update Submodule from Oct 10, 2024 to Nov 13, 2024 Test Plan: CI Passes Differential Revision: D65915865 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140629 Approved by: https://github.com/ngimel, https://github.com/Skylion007, https://github.com/briancoutinho	2024-11-14 21:23:59 +00:00
Yutao Xu	f1e045eb75	Update torch-xpu-ops commit pin (#140277 ) Update the torch-xpu-ops commit to [01f4e29](`01f4e293fa`), includes: - Improve XPU operator coverage - Fix `Werror=comments` relevant building issues Pull Request resolved: https://github.com/pytorch/pytorch/pull/140277 Approved by: https://github.com/EikanWang, https://github.com/atalman	2024-11-13 23:38:51 +00:00
Yutao Xu	c3087ace58	Update torch-xpu-ops commit pin (#139986 ) Update the torch-xpu-ops commit to [5e29831 ](https://github.com/intel/torch-xpu-ops/commit/5e29831). Includes: - OneAPI-2025 build issue fix - Enhancement of the XPU operator coverage Pull Request resolved: https://github.com/pytorch/pytorch/pull/139986 Approved by: https://github.com/guangyey, https://github.com/jansel	2024-11-10 06:49:38 +00:00
Mengwei Liu	a02e88d19c	[miniz] Bump miniz version to 3.0.2 and add patch for zip64 (#140041 ) Summary: Bump miniz version from 2.1.0 to 3.0.2 and apply these patches: * #79636 patches internal BUCK and bazel build * #138959 adds `bool compute_crc32` argument * miniz PR: https://github.com/richgel999/miniz/pull/324 to support zip64 Anyone bumping miniz version again, please apply these patches as well. Test Plan: Rely on unit test Imported from OSS Differential Revision: D65586230 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140041 Approved by: https://github.com/mikaylagawarecki	2024-11-09 00:13:16 +00:00
Matthew Sterrett	7e65060410	Adds support for accelerated sorting with x86-simd-sort (#127936 ) Adds x86-simd-sort as a submodule to accelerate sorting for 32-bit and 64-bit datatypes when AVX2 or AVX512 are available. For contiguous data, this can be over a 10x speedup for large arrays. For discontiguous data, it can give over a 4x speedup with larger arrays. These benchmarks were gathered on a Skylake system (7900x), limited to 8 threads. <details> <summary><b>Contiguous Benchmarks</b></summary> ``` float32, normally distributed (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 7.150844336 6.886271477 7.132277489 1.038420335 1.002603214 128 9.208030939 8.478154898 7.846915245 1.086089019 1.173458697 1024 37.79037627 23.60707456 16.44122627 1.600807257 2.298513241 10000 714.7355628 203.9921844 105.5683001 3.503739934 6.770361577 100000 8383.074408 721.6333354 465.3709247 11.61680593 18.01374766 1000000 97124.31945 5632.054572 3920.148401 17.24491803 24.77567416 10000000 1161974.907 86070.48988 71533.82301 13.50027063 16.24371323 int32_t, uniformly distributed (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 7.203208685 6.92212224 7.014458179 1.040606975 1.026908779 128 8.972388983 8.195516348 7.592543125 1.094792396 1.18173698 1024 32.77489477 23.6874548 15.36617105 1.383639359 2.132925285 10000 607.8824128 193.3402024 99.25090471 3.144107667 6.124703997 100000 523.9384684 608.1836536 442.3166784 0.861480682 1.184532472 1000000 5211.348627 5271.598405 3518.861883 0.988570871 1.480975611 10000000 133853.6263 81463.05084 67852.97394 1.643120714 1.972700952 ``` </details> Note that the int32_t sort is accelerated by FBGEMM's radix sort for larger arrays, but this only handles contiguous data and in one sorting direction. 
<details> <summary><b>Discontiguous Benchmarks</b></summary> ``` float, normal distributed, discontiguous in sorted dimension (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 3.836543679 4.011214256 3.84376061 0.956454439 0.99812243 128 5.755310194 5.755723127 4.820394962 0.999928257 1.193949923 1024 49.46946019 24.78790785 15.47874362 1.995709379 3.195960952 10000 665.2505291 236.6165959 143.9490662 2.811512551 4.621429974 100000 4328.002203 1329.001212 818.3516414 3.256582586 5.288682743 1000000 47651.5018 16693.72045 11827.39551 2.854456677 4.028909133 10000000 556655.1288 236252.6258 184215.9828 2.356185998 3.021752621 int32_t, uniformly distributed, discontiguous in sorted dimension (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 3.817994356 3.878117442 3.770039797 0.984496837 1.012719908 128 5.578731397 5.577152082 4.716770534 1.000283176 1.182743862 1024 43.3412619 23.61275801 14.55446819 1.835501887 2.977866408 10000 634.3997478 224.4322851 133.9518324 2.826686667 4.736028889 100000 4084.358152 1292.363303 781.7867576 3.16037924 5.22438902 1000000 46262.20465 16608.35284 11367.51817 2.785478192 4.06968381 10000000 541231.9104 235185.1861 180249.9294 2.301301028 3.002674742 ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127936 Approved by: https://github.com/jgong5, https://github.com/peterbell10, https://github.com/sanchitintel	2024-11-02 02:14:01 +00:00
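The `Default/AVX2` and `Default/AVX512` columns in the benchmark tables above are plain ratios of the timing columns. The sketch below recomputes them for the size-100000 rows of the contiguous table; the dictionary layout and the `speedups` helper are illustrative assumptions, while the timings are copied from the table.

```python
# Timings in microseconds for size 100000, copied from the contiguous table.
# Column order: Default, AVX2, AVX512.
timings_us = {
    "float32": (8383.074408, 721.6333354, 465.3709247),
    "int32_t": (523.9384684, 608.1836536, 442.3166784),
}


def speedups(default_us, avx2_us, avx512_us):
    """Return (Default/AVX2, Default/AVX512); above 1.0 means faster than default."""
    return default_us / avx2_us, default_us / avx512_us


for dtype, row in timings_us.items():
    avx2_ratio, avx512_ratio = speedups(*row)
    print(f"{dtype}: Default/AVX2 = {avx2_ratio:.3f}, Default/AVX512 = {avx512_ratio:.3f}")
```

A ratio below 1.0, as in the `int32_t` AVX2 case at this size, means the default path was faster, which matches the note that FBGEMM's radix sort already accelerates contiguous `int32_t` sorting for larger arrays.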
Yu, Guangye	d08dbd0436	Update torch-xpu-ops commit pin (#139041 ) # Motivation This PR updates the torch-xpu-ops commit pin. It mainly includes the following two highlighted changes: 1. split the DLL library into 4 smaller libraries to avoid the 2G limitation on Windows; 2. some new operators added, for example, `cdist`, `pdist`, `maxunpool2d`, `maxunpool3d`, `upsample_trilinear3d`, Bessel operators, etc. # Additional Context We have to supply XPU device check logic in the `cdist` and `pdist` ops. This PR depends on https://github.com/pytorch/pytorch/pull/139050 to fix a Windows build issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139041 Approved by: https://github.com/EikanWang, https://github.com/ezyang	2024-10-31 05:06:06 +00:00
Piotr Bialecki	bd88d40e5f	[Submodule] update submodule onnx==1.17.0 (#139128 ) Follow-up PR of: https://github.com/pytorch/pytorch/pull/138719 CC @malfet @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/139128 Approved by: https://github.com/malfet	2024-10-31 02:50:00 +00:00
Joseph Macaranas	edf2a1be97	[ROCm][CK] Explicit cast values to half (#138751 ) Addresses ambiguous conversions and calls introduced by these two pull requests: [[ROCm] CK-based GEMM](https://github.com/pytorch/pytorch/pull/131004) [[AMD] Fix torch ck backend build with 6.2.1](https://github.com/pytorch/pytorch/pull/138434) Co-authored-by: cjatin <cjatin@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/138751 Approved by: https://github.com/jeffdaily Co-authored-by: pruthvistony <pruthvigithub@gmail.com> Co-authored-by: cjatin <cjatin@users.noreply.github.com>	2024-10-28 22:00:26 +00:00
Wouter Devriendt	bae3426af7	reimport pr137735 due to merging check issues (#138959 ) This is a cherry-pick from #137735 by @mikaylagawarecki , that cannot be merged due to a (wrongly) failing check for codev @diff-train-skip-merge Pull Request resolved: https://github.com/pytorch/pytorch/pull/138959 Approved by: https://github.com/mikaylagawarecki	2024-10-27 16:31:34 +00:00
Aaron Gokaslan	4af93fdb77	[BE]: Update cudnn_frontend submodule to 1.8.0 (#138709 ) Update cudnn frontend. Let's see what breaks @eqy Pull Request resolved: https://github.com/pytorch/pytorch/pull/138709 Approved by: https://github.com/eqy	2024-10-26 01:55:33 +00:00
Yu, Guangye	0efa590d43	[CI] Fix XPU CI failure (#138548 ) # Motivation Fix https://github.com/pytorch/pytorch/issues/138577. # Solution 1. All UTs in `test/inductor/test_compiled_optimizers.py` are fixed by https://github.com/pytorch/pytorch/pull/134170 2. UT in `test/inductor/test_pattern_matcher.py` is introduced by https://github.com/pytorch/pytorch/pull/138089, we will skip this UT due to the unsupported feature `max_autotune_gemm_backends:Triton`. 3. We have a new impl related to `histc`, so we remove the expected failure from `test/inductor/test_torchinductor_opinfo.py` 4. We support `avg_pool3d` for `fp16` data type, so we remove the expected failure from `test/inductor/test_torchinductor_opinfo.py` 5. CUDA-bias code is introduced by https://github.com/pytorch/pytorch/issues/138472, we just generalize it to `GPU_TYPE`. # Additional Context > Why update torch-xpu-ops commit pin here? We have to update commit pin to avoid the build failure raised by the code change [C10_UNUSED](https://github.com/pytorch/pytorch/pull/138364). > What does the feature of torch-xpu-ops update? 1. Add some foreach ops, like `unary ops` and `foreach_clamp_max` etc; 2. Add some maxpool ops forward and backward, like `averge_pool3d` and `max_pool3d` 3. Add some other ops, like `log_normal_`, `index_copy`, and `mode` etc; 4. fix build failure related to `C10_UNUSED`; Pull Request resolved: https://github.com/pytorch/pytorch/pull/138548 Approved by: https://github.com/malfet, https://github.com/EikanWang	2024-10-24 07:56:26 +00:00

1738 commits