pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
PyTorch MergeBot	709752e0bb	Revert "[AOTI] Switch OSS dashboard to use aoti_compile_and_package (#139154 )" This reverts commit `293fbb42d2`. Reverted https://github.com/pytorch/pytorch/pull/139154 on behalf of https://github.com/desertfire due to cpu_aot_inductor_amp_freezing fails ([comment](https://github.com/pytorch/pytorch/pull/139154#issuecomment-2452983651))	2024-11-02 13:04:00 +00:00
Edward Z. Yang	f6be44c74e	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-02 11:50:11 +00:00
Ke Wen	55038aa661	[PGNCCL] Make sure we do not use split for P2P comm creation (#139013 ) Resolves comment https://github.com/pytorch/pytorch/pull/138527#issuecomment-2438613172 There was a split-vs-P2P bug: when P2P comm creation invokes `getNCCLComm`, it may see a `split_from` option that was meant for the previous PG creation. The P2P comm creation may then use `ncclCommSplit` and hang, because not all ranks join this call. The bug went unnoticed until now because there is no CI test with the following recipe: eager init + new group + P2P in that new group. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139013 Approved by: https://github.com/shuqiangzhang	2024-11-02 07:47:55 +00:00
PyTorch MergeBot	2a3fe06ce0	Revert "[Partitioner] Enumerate partitions by iterating partition ids (#136598 )" This reverts commit `39ec5a20ea`. Reverted https://github.com/pytorch/pytorch/pull/136598 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails an executorch test https://github.com/pytorch/executorch/blob/main/exir/backend/test/test_graph_partition.py#L114-L175 ([comment](https://github.com/pytorch/pytorch/pull/136598#issuecomment-2452903705))	2024-11-02 07:19:22 +00:00
PyTorch MergeBot	f3238106fd	Revert "Allow inplacing buffer when other users are inconsequential (#138383 )" This reverts commit `030f70b40b`. Reverted https://github.com/pytorch/pytorch/pull/138383 on behalf of https://github.com/huydhn due to Sorry for reverting this again, but I think it has a test failing internally and also on ROCm ([comment](https://github.com/pytorch/pytorch/pull/138383#issuecomment-2452898229))	2024-11-02 06:53:48 +00:00
PyTorch MergeBot	0863d6a08e	Revert "[inductor] Remove SIMDKernel.last_usage (#139364 )" This reverts commit `286d3ce266`. Reverted https://github.com/pytorch/pytorch/pull/139364 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing lots of internal tests in D65345157 ([comment](https://github.com/pytorch/pytorch/pull/139364#issuecomment-2452897337))	2024-11-02 06:49:11 +00:00
PyTorch MergeBot	9331640e26	Revert "[inductor] Remove Node.last_usage mutation (#139365 )" This reverts commit `1e934b473c`. Reverted https://github.com/pytorch/pytorch/pull/139365 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing lots of internal tests in D65345157 ([comment](https://github.com/pytorch/pytorch/pull/139364#issuecomment-2452897337))	2024-11-02 06:49:10 +00:00
PyTorch MergeBot	dc4b459737	Revert "[inductor] Move remove_kernel_local_buffers to Kernel (#139370 )" This reverts commit `b57b4b7f9b`. Reverted https://github.com/pytorch/pytorch/pull/139370 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing lots of internal tests in D65345157 ([comment](https://github.com/pytorch/pytorch/pull/139364#issuecomment-2452897337))	2024-11-02 06:49:10 +00:00
PyTorch MergeBot	66a401c9e1	Revert "[inductor] Simplify remove_kernel_local_buffers (#139452 )" This reverts commit `73c0762a34`. Reverted https://github.com/pytorch/pytorch/pull/139452 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing lots of internal tests in D65345157 ([comment](https://github.com/pytorch/pytorch/pull/139364#issuecomment-2452897337))	2024-11-02 06:49:10 +00:00
PyTorch MergeBot	98e11b0021	Revert "[inductor] sympy.Integer([01]) -> sympy.S.(Zero\|One) (#139523 )" This reverts commit `c53beab377`. Reverted https://github.com/pytorch/pytorch/pull/139523 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing lots of internal tests in D65345157 ([comment](https://github.com/pytorch/pytorch/pull/139364#issuecomment-2452897337))	2024-11-02 06:49:10 +00:00
Bob Ren	fdd298dcb7	add hex method on SymFloat (#139451 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139451 Approved by: https://github.com/ezyang	2024-11-02 05:33:19 +00:00
PyTorch MergeBot	8d1eaa3da6	Revert "Profile guided optimization for automatic_dynamic (#139001 )" This reverts commit `a6630bcf87`. Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to internal code triggers import cycle ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452833882))	2024-11-02 03:38:15 +00:00
drisspg	540f3ef9b1	Fix flex_decode to build offsets off of strides (#139516 ) Fixes PR: https://github.com/pytorch/pytorch/issues/139462 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139516 Approved by: https://github.com/Chillee	2024-11-02 03:17:46 +00:00
Bin Bao	293fbb42d2	[AOTI] Switch OSS dashboard to use aoti_compile_and_package (#139154 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139154 Approved by: https://github.com/angelayi ghstack dependencies: #139153	2024-11-02 03:10:05 +00:00
Bin Bao	a46a79fe92	[AOTI] Ignore .o files in package_aoti (#139153 ) Summary: There is no point to package .o files since a .so file is included in that package. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139153 Approved by: https://github.com/angelayi	2024-11-02 03:10:05 +00:00
Jason Ansel	c53beab377	[inductor] sympy.Integer([01]) -> sympy.S.(Zero\|One) (#139523 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139523 Approved by: https://github.com/ezyang ghstack dependencies: #139364, #139365, #139370, #139452	2024-11-02 03:04:22 +00:00
Justin Chu	387b120549	[ONNX] Remove type promotion rule for pow (#139527 ) ONNX supports different input types in Pow, so type promotion is not needed. The resulting graph is the following: ```py ONNXProgram( model= < ir_version=9, opset_imports={'': 18, 'pkg.onnxscript.torch_lib.common': 1}, producer_name='pytorch', producer_version='2.6.0a0+git59a1af5', domain=None, model_version=None, > graph( name=main_graph, inputs=( %"x"<FLOAT16,[3]> ), outputs=( %"pow_1"<FLOAT16,[3]> ), ) { 0 \| # node_Constant_0 %"val_0"<?,?> ⬅️ ::Constant() {value=Tensor<FLOAT,[]>(array(2., dtype=float32), name=None)} 1 \| # node_Pow_1 %"pow_1"<FLOAT16,[3]> ⬅️ ::Pow(%"x", %"val_0") return %"pow_1"<FLOAT16,[3]> } ... , exported_program= ExportedProgram: class GraphModule(torch.nn.Module): def forward(self, x: "f16[3]"): # File: /workspace/pytorch/test/onnx/exporter/test_small_models_e2e.py:53 in forward, code: return x**2.0 pow_1: "f16[3]" = torch.ops.aten.pow.Tensor_Scalar(x, 2.0); x = None return (pow_1,) Graph signature: ExportGraphSignature(input_specs=[InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='x'), target=None, persistent=None)], output_specs=[OutputSpec(kind=<OutputKind.USER_OUTPUT: 1>, arg=TensorArgument(name='pow_1'), target=None)]) Range constraints: {} ) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/139527 Approved by: https://github.com/titaiwangms	2024-11-02 02:19:50 +00:00
Matthew Sterrett	7e65060410	Adds support for accelerated sorting with x86-simd-sort (#127936 ) Adds x86-simd-sort as a submodule to accelerate sorting for 32-bit and 64-bit datatypes when AVX2 or AVX512 are available. For contiguous data, this can be over a 10x speedup for large arrays. For discontiguous data, it can give over a 4x speedup with larger arrays. These benchmarks were gathered on a Skylake system (7900x), limited to 8 threads. <details> <summary><b>Contiguous Benchmarks</b></summary> ``` float32, normally distributed (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 7.150844336 6.886271477 7.132277489 1.038420335 1.002603214 128 9.208030939 8.478154898 7.846915245 1.086089019 1.173458697 1024 37.79037627 23.60707456 16.44122627 1.600807257 2.298513241 10000 714.7355628 203.9921844 105.5683001 3.503739934 6.770361577 100000 8383.074408 721.6333354 465.3709247 11.61680593 18.01374766 1000000 97124.31945 5632.054572 3920.148401 17.24491803 24.77567416 10000000 1161974.907 86070.48988 71533.82301 13.50027063 16.24371323 int32_t, uniformly distributed (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 7.203208685 6.92212224 7.014458179 1.040606975 1.026908779 128 8.972388983 8.195516348 7.592543125 1.094792396 1.18173698 1024 32.77489477 23.6874548 15.36617105 1.383639359 2.132925285 10000 607.8824128 193.3402024 99.25090471 3.144107667 6.124703997 100000 523.9384684 608.1836536 442.3166784 0.861480682 1.184532472 1000000 5211.348627 5271.598405 3518.861883 0.988570871 1.480975611 10000000 133853.6263 81463.05084 67852.97394 1.643120714 1.972700952 ``` </details> Note that the int32_t sort is accelerated by FBGEMM's radix sort for larger arrays, but this only handles contiguous data and in one sorting direction. 
<details> <summary><b>Discontiguous Benchmarks</b></summary> ``` float, normal distributed, discontiguous in sorted dimension (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 3.836543679 4.011214256 3.84376061 0.956454439 0.99812243 128 5.755310194 5.755723127 4.820394962 0.999928257 1.193949923 1024 49.46946019 24.78790785 15.47874362 1.995709379 3.195960952 10000 665.2505291 236.6165959 143.9490662 2.811512551 4.621429974 100000 4328.002203 1329.001212 818.3516414 3.256582586 5.288682743 1000000 47651.5018 16693.72045 11827.39551 2.854456677 4.028909133 10000000 556655.1288 236252.6258 184215.9828 2.356185998 3.021752621 int32_t, uniformly distributed, discontiguous in sorted dimension (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 3.817994356 3.878117442 3.770039797 0.984496837 1.012719908 128 5.578731397 5.577152082 4.716770534 1.000283176 1.182743862 1024 43.3412619 23.61275801 14.55446819 1.835501887 2.977866408 10000 634.3997478 224.4322851 133.9518324 2.826686667 4.736028889 100000 4084.358152 1292.363303 781.7867576 3.16037924 5.22438902 1000000 46262.20465 16608.35284 11367.51817 2.785478192 4.06968381 10000000 541231.9104 235185.1861 180249.9294 2.301301028 3.002674742 ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127936 Approved by: https://github.com/jgong5, https://github.com/peterbell10, https://github.com/sanchitintel	2024-11-02 02:14:01 +00:00
Chen, Zejun	edd3f5a94d	[profiler] fix a building warning by adding USE_KINETO namespace for setTraceID (#139461 ) Fix: https://github.com/pytorch/pytorch/issues/139460 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139461 Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/sraikund16	2024-11-02 01:02:29 +00:00
Angela Yi	092fe2f422	Handle nan case when checking mutations (#139483 ) Test Plan: PT2 readiness models Differential Revision: D65340986 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139483 Approved by: https://github.com/zou3519	2024-11-02 00:49:05 +00:00
William Wen	b71e813bce	[dynamo, 3.13] fix bytecode nop tests (#139323 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139323 Approved by: https://github.com/jansel	2024-11-02 00:39:36 +00:00
Bin Bao	8c17830dea	[AOTI] Unify how weights are stored as data section (#139471 ) Summary: https://github.com/pytorch/pytorch/pull/118076 introduced a cleaner way to link weights as a data section for macos. Unify the code by adopting that approach for Linux as well. Test Plan: CI Differential Revision: D65302273 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139471 Approved by: https://github.com/chenyang78	2024-11-02 00:23:24 +00:00
PyTorch UpdateBot	aa54b2467f	[executorch hash update] update the pinned executorch hash (#139133 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned executorch hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139133 Approved by: https://github.com/pytorchbot	2024-11-02 00:14:47 +00:00
eellison	ee2f8a50d3	Class rename (#139490 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139490 Approved by: https://github.com/exclamaforte, https://github.com/zou3519 ghstack dependencies: #139295	2024-11-02 00:10:17 +00:00
PyTorch MergeBot	c95adb9c5b	Revert "use more elements per thread for narrow dtypes (#139449 )" This reverts commit `f5b9e725d1`. Reverted https://github.com/pytorch/pytorch/pull/139449 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but a bunch of tests are failing after it lands, it looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/139449#issuecomment-2452723863))	2024-11-01 23:42:16 +00:00
PyTorch MergeBot	b617d4813c	Revert "fix dynamo tracking numpy 2 ops (#138686 )" This reverts commit `124eac255e`. Reverted https://github.com/pytorch/pytorch/pull/138686 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I am seeing inductor failure with hf_BigBird number of graph breaks after it lands ([comment](https://github.com/pytorch/pytorch/pull/138686#issuecomment-2452718164))	2024-11-01 23:34:06 +00:00
Nikita Shulga	77b72d686e	[BE][MPS] Make metal shaders compile cleanly (#139522 ) I.e. without warnings, by deleting dead code and fixing one signed-unsigned comparison warning. Also, pass `-Werror` to the Metal compiler if the WERROR option is set. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139522 Approved by: https://github.com/Skylion007	2024-11-01 23:22:47 +00:00
eellison	2382b3b6d8	[Easy] Add joint graph passes, fallback_random to bisector (#139295 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139295 Approved by: https://github.com/zou3519, https://github.com/exclamaforte	2024-11-01 23:21:53 +00:00
Gabriel Ferns	1e73842029	Refactor FxGraphDrawer to use HTML-like labels (#137726 ) Fixes https://github.com/pytorch/pytorch/issues/137499 Testing: Added a new unit test to make sure that the regression case succeeds. I'm debating about whether to make the borders visible. I'm partial to no borders, but it might make it harder for some people to read? ![68a2b0e3-orig_fx_graph_diagram](https://github.com/user-attachments/assets/fbc2fd98-9e76-488e-8ebe-c64fbf206932) Vs. ![2bfe1c4f-orig_fx_graph_diagram](https://github.com/user-attachments/assets/b6bc88ba-dda2-4cf7-84ac-a615e1e03a74) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137726 Approved by: https://github.com/eellison, https://github.com/malfet	2024-11-01 23:19:50 +00:00
David Berard	60542eeb33	[inductor] set sanitize_overflow=False for triton kernels (#139502 ) In upstream triton, https://github.com/triton-lang/triton/pull/4589 introduces overflow checks. However, overflow checks likely add some overhead, and have some correctness bugs at the moment (e.g. https://github.com/triton-lang/triton/pull/5033). Let's set `sanitize_overflow=False` but keep `debug=True` so that we can keep using device_assert but without the additional asserts added by `sanitize_overflow`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139502 Approved by: https://github.com/bertmaher	2024-11-01 23:10:21 +00:00
Huy Do	da395384a2	Delete Windows GPU jobs in periodic (#139336 ) As an outcome of https://fburl.com/gdoc/voce5o06, we could stop running Windows GPU tests on periodic pending the green light from MS. No one is monitoring these jobs atm. We already have Windows CUDA and CPU build jobs in trunk. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139336 Approved by: https://github.com/ZainRizvi, https://github.com/wdvr, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-11-01 22:26:22 +00:00
Shuqiang Zhang	4c64a7f33f	[pgnccl] add a restart test for PGs in blocking mode (#139496 ) Summary: Restarting (aborting and re-initializing a PG) is a basic need if we want to achieve in-process restart of PGs without tearing down the whole process. Add this test to verify that this is supported by current NCCL. Note that this restart test passes steadily only for blocking mode for now. In nonblocking mode, there is a problem in either NCCL init or abort that needs further investigation. Test Plan: new UT Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/139496 Approved by: https://github.com/c-p-i-o, https://github.com/kwen2501	2024-11-01 22:13:37 +00:00
Huy Do	0b13bdd877	Delete parallelnative jobs in periodic (#139328 ) As an outcome of https://fburl.com/gdoc/voce5o06, we can now clean up parallelnative build and test jobs in periodic. There is not much value in running them anymore Pull Request resolved: https://github.com/pytorch/pytorch/pull/139328 Approved by: https://github.com/wdvr, https://github.com/malfet	2024-11-01 22:05:13 +00:00
Huy Do	8eb75cbad6	Delete iOS jobs from periodic (#139345 ) As an outcome of https://fburl.com/gdoc/voce5o06 and confirm with @iseeyuan, we can now clean up iOS lite interpreter jobs on PyTorch CI. There is not much value in running them anymore. It's stated in https://github.com/pytorch/ios-demo-app/blob/master/README.md that ExecuTorch is the replacement now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139345 Approved by: https://github.com/wdvr, https://github.com/malfet	2024-11-01 22:04:27 +00:00
Huy Do	8ad76efb8d	Delete Vulkan jobs from periodic (#139354 ) As an outcome of https://fburl.com/gdoc/voce5o06, we can clean up this job now, as the backend has been marked as deprecated (https://pytorch.org/tutorials/prototype/vulkan_workflow.html) and is to be replaced by the ExecuTorch Vulkan delegate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139354 Approved by: https://github.com/wdvr, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-11-01 22:03:12 +00:00
Mikayla Gawarecki	a979318ef7	Add section to serialization note re weights_only (#139433 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139433 Approved by: https://github.com/malfet ghstack dependencies: #138936, #139221	2024-11-01 21:51:50 +00:00
Nikita Shulga	a1f854f270	[MPS] Compile kernels into Metallib (#138636 ) The PyTorch MPS backend for the most part relies on MPSGraph to provide specific operations, but more and more often one has had to implement custom kernels, which were simply embedded in the operator codebase and compiled directly using [`- id<MTLLibrary>newLibraryWithSource:options:error:`](https://developer.apple.com/documentation/metal/mtldevice/1433431-newlibrarywithsource) (the first Metal kernel was added to the MPS backend in https://github.com/pytorch/pytorch/pull/82307 ). Later on, as the number of operators grew, those were refactored into the `MetalShaderLibrary` convenience class (see https://github.com/pytorch/pytorch/pull/125550 ). But as the number of kernels keeps growing, it's time to take the next step and properly compile them into `.metallib`. This PR does exactly that by: - Moving shader sources into separate .metal files - Adding a check on whether full Xcode is installed or just DeveloperTools - If full Xcode is installed, compiling and linking shaders into .metallib for the Metal-3.0 (available on macOS 13) and Metal-3.1 (available on macOS 14, can use bfloat) standards and bundling both using the `-sectcreate` linker option and the `getsectiondata` API call. The `metallib_dummy.cpp` file is used to properly express dependencies between the metallib build and torch_cpu link stages. Logic for generating metallibraries is loosely based on https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/kernels/CMakeLists.txt. - If only the DeveloperTools CLI is installed, automatically wrapping .metal into `_metallib.h` that contains the shader source wrapped in `MetalShaderLibrary` The bulk of the changes introduced in this PR just move code around, i.e. for every file that contains a non-templated shader definition in the `aten/src/ATen/native/mps/operators` folder, a corresponding `.metal` file is created in the `aten/src/ATen/native/mps/kernels` folder and the embedded shader definition is replaced with the following ```cpp #ifndef PYTORCH_JIT_COMPILE_SHADERS static auto& lib = MetalShaderLibrary::getBundledLibrary(); #else #include <ATen/native/mps/OpName_metallib.h> #endif ``` Some historical stats: \| PyTorch Version \| Number of shaders in MPS \| Ops added \| \| ------------- \| ------------- \| ---- \| \| 1.12 \| 0 \| \| \| 1.13 \| 2 \| bitwise_ops and index.out \| \| 2.0 \| 4 \| cross repeat and view) \| \| 2.1 \| 9 \| unary_ops, histogram, renorm, binary_ops \| \| 2.2 \| 11 \| gamma and bucketization \| \| 2.3 \| 12 \| naive_matmul (to workaround crash) \| \| 2.4 \| 13 \| quantized_mm \| \| 2.5 \| 14 \| fused_adam \| Pros: - Better code structure/readability - Eventually allows one to use shared headers (and implement something like `TensorIterator`) - Faster runtime (as compilation is done ahead of time) and perhaps better optimized compiled kernels Cons: - Build process is a bit more complicated than it used to be - Need to maintain two codepaths (as our CI builders only have DeveloperTools installed) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138636 Approved by: https://github.com/manuelcandales	2024-11-01 21:47:20 +00:00
Edward Z. Yang	a6630bcf87	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-01 21:43:25 +00:00
Xuan Zhang	9c2ffce71a	add condition for freeable input buffer (#139480 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139480 Approved by: https://github.com/yf225 ghstack dependencies: #139396	2024-11-01 21:15:40 +00:00
Huy Do	18f3b3c991	Clean up Android jobs in CI (#139350 ) As an outcome of https://fburl.com/gdoc/voce5o06 and confirm with @iseeyuan, we can now clean up Android lite interpreter jobs on PyTorch CI. There is not much value in running them anymore. It's stated in https://github.com/pytorch/android-demo-app/blob/master/README.md that ExecuTorch is the replacement now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139350 Approved by: https://github.com/ZainRizvi	2024-11-01 21:10:19 +00:00
Sam Larsen	c412a42ae2	[pt2 logging] move remote cache get/put logging up one level (#139423 ) Summary: I need to refactor the way we record CompilationMetrics. It will be much easier to do in OSS and having the relevant timing code in the OSS area of the codebase will make this much easier. I doubt this meaningfully changes the values we see. Test Plan: Made sure samples show up: https://fburl.com/scuba/dynamo_compile/sandbox/c38zjq0x Differential Revision: D65290089 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139423 Approved by: https://github.com/oulgen	2024-11-01 21:06:59 +00:00
Animesh Jain	0e57f2b589	[invoke_subgraph] Change the joint_graph output signature to simplify min-cut partitioner (#139326 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139326 Approved by: https://github.com/zou3519 ghstack dependencies: #139216, #139130	2024-11-01 21:02:32 +00:00
Animesh Jain	6a268c3fbb	[invoke_subgraph] Generate fake_inputs correctly (#139130 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139130 Approved by: https://github.com/zou3519 ghstack dependencies: #139216	2024-11-01 21:02:32 +00:00
Animesh Jain	4c756cacfd	[invoke_subgraph] Re-enable fake tensor model in the fake tensor impl (#139216 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139216 Approved by: https://github.com/zou3519	2024-11-01 21:02:32 +00:00
Justin Chu	5d67efb809	[ONNX] New registration API (#135403 ) The ONNX custom ops registration API. ## Design 1. Create a "custom_translation_table: dict[Callable, Sequence[Callable] \| Callable" parameter for specifying extra functions 2. Use a callable as the key to support all possible call_function targets in the fx graph 3. Allow a callable or a Sequence of callables as values. - When there is a single callable, it is the translation function for the op - When there is a Sequence of callable, the exporter's dispatcher will dispatch to these callables in order based on input dtypes. - The translation functions can be a plain python function that calls onnxscript ops (traced), or an onnxscript function. - Complex input support: We create special type annotations for annotating real representations of complex inputs, which are needed to handle complex computation in the ONNX graph, as we don't have any ops in ONNX that handle complex inputs. The dispatcher will have knowledge of these newly created type annotations and dispatch correctly. The complex functions will be in the same overload pool as the real functions. ```py torch.onnx.export(dynamo=True, custom_translation_table = { torch.ops.aten.add: [overload1, overload2], torch.sym_not: sym_not_onnx, }) ``` Support for functions that handles complex inputs will be in separate PRs. fixes https://github.com/pytorch/pytorch/issues/138391 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135403 Approved by: https://github.com/titaiwangms	2024-11-01 20:58:54 +00:00
Natalia Gimelshein	f5b9e725d1	use more elements per thread for narrow dtypes (#139449 ) Fix perf issue for narrow type by accessing more elements per thread Pull Request resolved: https://github.com/pytorch/pytorch/pull/139449 Approved by: https://github.com/Chillee, https://github.com/eqy	2024-11-01 20:41:13 +00:00
Jason Ansel	73c0762a34	[inductor] Simplify remove_kernel_local_buffers (#139452 ) I plan to reuse `can_buffer_be_removed_through_fusion` in some heuristics. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139452 Approved by: https://github.com/shunting314 ghstack dependencies: #139364, #139365, #139370	2024-11-01 20:36:39 +00:00
Bert Maher	dcdcb8b364	Avoid overflow in float32-to-int32 test (#139489 ) Summary: Triton has added some integer overflow detection when kernels are compiled with `debug=True`, and this test results in integer overflow (2.0 is 0x40000000, times 2 is 0x80000000 which overflows a signed int32). Assertion `int32 overflow detected for operation mul` failed Fixes #139479 Test Plan: ``` python inductor/test_torchinductor.py -k test_float32_to_int32_cuda ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/139489 Approved by: https://github.com/eellison, https://github.com/jansel, https://github.com/chenyang78	2024-11-01 20:22:19 +00:00
Yifu Wang	0dbc284a72	[SymmetricMemory] expose signal_pads as tensors in Python (#138754 ) ## This Stack This stack does the following things to support `xformers`-style, comm-aware Triton kernels: - Exposes `signal_pad`s as tensors in Python - Adds a binding for `cuMemsetAsync` These in combination aims to provide users with more flexibility to express custom signaling/synchronization patterns. ## This PR ```python # Obtain the signal pad of the specified peer rank as a tensor. # If both shape and dtype are unspecified, the returned tensor will be a # 1d uint32 tensor, which is most natural for signaling purposes. symm_mem.get_signal_pad(peer_rank) # If only shape is specified, it is equivalent to: # symm_mem.get_signal_pad(peer_rank)[:shape.numel()].view(shape) symm_mem.get_signal_pad(peer_rank, shape) # If only dtype is specified, it is equivalent to: # symm_mem.get_signal_pad(peer_rank).view(dtype) symm_mem.get_signal_pad(peer_rank, dtype=dtype) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/138754 Approved by: https://github.com/weifengpy, https://github.com/lw	2024-11-01 20:17:15 +00:00
Haifeng Jin	124eac255e	fix dynamo tracking numpy 2 ops (#138686 ) Fixes #136559 As we upgrade to NumPy 2, torch falsely filtered out `numpy.random` as unsupported in dynamo tracking. This PR changes the filtering rules to include them while keeping behavior with numpy 1 unchanged. Before this PR, the following tests failed: ``` PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_functions.py -k FunctionTests.test_numpy_random PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/dynamo/test_unspec.py -k UnspecTests.test_to_tensor PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k FakeTensorTest.test_export_numpy PYTORCH_TEST_WITH_ASAN=1 PYTORCH_TEST_WITH_UBSAN=1 python test/test_fake_tensor.py -k PropagateRealTensorsFakeTensorTest.test_export_numpy_propagate_real_tensors ``` With this PR, the supported/unsupported ops in NumPy 1 are not changed. For NumPy 2, only the `numpy.random` ops that are already supported with NumPy 1 are added to the supported list. I used the following scripts to check the differences before and after the change for both NumPy 1 & 2. The output is empty for NumPy 1 since there is no change. The output is a list of `numpy.random` that considered supported for NumPy 2. 
```py from torch._dynamo import trace_rules import numpy as np def new_numpy_function_ids(): unsupported_funcs = {"seed", "ranf", "get_bit_generator", "RandomState", "set_bit_generator", "sample"} def is_supported(k, v, mod): if not callable(v): return False if not getattr(v, "__module__", None): return True if v.__module__ == mod.__name__: return True if v.__module__ == "numpy.random.mtrand" and mod.__name__== "numpy.random" and k not in unsupported_funcs: return True return False rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: for k, v in mod.__dict__.items(): if is_supported(k, v, mod): rv[id(v)] = f"{mod.__name__}.{k}" return rv def old_numpy_function_ids(): rv = {} for mod in trace_rules.NP_SUPPORTED_MODULES: rv.update( { id(v): f"{mod.__name__}.{k}" for k, v in mod.__dict__.items() if callable(v) and (getattr(v, "__module__", None) or mod.__name__) == mod.__name__ } ) return rv rv1 = set(old_numpy_function_ids().values()) rv2 = set(new_numpy_function_ids().values()) for v in (rv1 - rv2): print(v) print("****") for v in (rv2 - rv1): print(v) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/138686 Approved by: https://github.com/lezcano, https://github.com/williamwen42	2024-11-01 19:51:40 +00:00
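
The Default/AVX2 and Default/AVX512 columns in the x86-simd-sort benchmarks (commit 7e65060410) appear to be speedup ratios: the default sort time divided by the accelerated time. A minimal Python sketch under that assumption, using the two size-100000 contiguous rows copied from the commit message; the `speedup` helper and the `ROWS` name are introduced here for illustration only:

```python
# Timings in microseconds for size 100000, copied from the contiguous
# benchmarks in commit 7e65060410. Each entry: (default, AVX2, AVX512).
ROWS = {
    "float32": (8383.074408, 721.6333354, 465.3709247),
    "int32_t": (523.9384684, 608.1836536, 442.3166784),
}


def speedup(baseline_us: float, accelerated_us: float) -> float:
    """Return baseline time divided by accelerated time (hypothetical helper)."""
    return baseline_us / accelerated_us


for dtype, (default_us, avx2_us, avx512_us) in ROWS.items():
    print(
        f"{dtype}: Default/AVX2 = {speedup(default_us, avx2_us):.3f}, "
        f"Default/AVX512 = {speedup(default_us, avx512_us):.3f}"
    )
```

A ratio below 1, as in the int32_t AVX2 case here, means the default path was faster. That is consistent with the commit's note that int32_t sorting is already accelerated by FBGEMM's radix sort for larger contiguous arrays.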

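
The arithmetic behind the overflow described in commit dcdcb8b364 (2.0 is 0x40000000, and doubling it gives 0x80000000) can be checked with the Python standard library alone. This is a sketch, assuming the test reinterprets the float32 bits as an int32 before multiplying; the helper names are illustrative and do not come from PyTorch or Triton:

```python
import struct

INT32_MAX = 2**31 - 1


def float32_bits(x: float) -> int:
    """Reinterpret the IEEE-754 float32 encoding of x as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def wrap_int32(n: int) -> int:
    """Reduce n modulo 2**32 and interpret the result as a two's-complement int32."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


bits = float32_bits(2.0)
doubled = bits * 2  # Python integers are unbounded, so this is the exact product

print(hex(bits))            # bit pattern of 2.0 as float32
print(hex(doubled))         # exact product, before any wraparound
print(doubled > INT32_MAX)  # True means the product does not fit in a signed int32
print(wrap_int32(doubled))  # value a wrapping int32 multiply would produce
```

The exact product is one past the largest signed 32-bit value, which is why an overflow check on the multiply fires for this input.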

80420 commits