pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
cyyever	23eb0a3201	Improve typing in torch/types.py (#145237 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/145237 Approved by: https://github.com/XuehaiPan, https://github.com/albanD Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>	2025-01-28 05:29:12 +00:00
Aaron Gokaslan	8e46d0f595	[BE]: Update typing of OrderedSet ancestor (#145783 ) Now that we are on python 3.9 minimum version we can properly use Generics in the superclass Pull Request resolved: https://github.com/pytorch/pytorch/pull/145783 Approved by: https://github.com/eellison	2025-01-28 04:43:49 +00:00
cyy	67fcc7cf02	[3/N] Remove unnecessary once flag usage (#145672 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/145672 Approved by: https://github.com/albanD	2025-01-28 04:28:18 +00:00
Burak Turk	01a4d86b31	add pt2 callbacks for backward pass and prevent duplicate callbacks (#145732 ) Summary: This change adds callbacks for lazy backwards compilation while preventing duplicate callbacks to be fired. Differential Revision: D68577593 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145732 Approved by: https://github.com/mlazos	2025-01-28 03:50:02 +00:00
Pian Pawakapan	1a26cdd5cb	[cond] remove warning for unsupported tuple returns (#145766 ) I guess this is supported now Pull Request resolved: https://github.com/pytorch/pytorch/pull/145766 Approved by: https://github.com/ydwu4, https://github.com/zou3519	2025-01-28 03:13:36 +00:00
PyTorch MergeBot	9010649292	Revert "Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 )" This reverts commit `db3685a35c`. Reverted https://github.com/pytorch/pytorch/pull/143880 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but either this PR or the base PR breaks distributed tests ([comment](https://github.com/pytorch/pytorch/pull/143880#issuecomment-2617743403))	2025-01-28 03:07:17 +00:00
Chirag Pandya	78f02bf07c	[bug] handle case when remote peer closes connection (#145757 ) Summary: In the case where remote peer closes the connection, nread returns 0. In this case, we still want to free up the allocated buffer. Also, reorder the if so that the likely success cases (nread > 0) is at the top of the function with an early return. Test Plan: unit tests Differential Revision: [D68733192](https://our.internmc.facebook.com/intern/diff/D68733192) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145757 Approved by: https://github.com/XilunWu, https://github.com/fduwjj ghstack dependencies: #145756	2025-01-28 03:06:38 +00:00
Pian Pawakapan	4be831ba2d	[draft_export] fix dense-in-memory check for inferring fakes (#145653 ) Test Plan: fixes check for dense tensors with size-1 dimensions Differential Revision: D68644028 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145653 Approved by: https://github.com/zou3519	2025-01-28 02:52:14 +00:00
James Wu	7c1fc0a047	Log cache state for AOTAutograd in title of file (#145715 ) Differential Revision: [D68692755](https://our.internmc.facebook.com/intern/diff/D68692755/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145715 Approved by: https://github.com/bobrenjc93	2025-01-28 02:14:18 +00:00
Jason Ansel	78a94c9114	[inductor] Remove type ignores from scheduler.py (#145712 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145712 Approved by: https://github.com/yanboliang, https://github.com/Skylion007 ghstack dependencies: #145692	2025-01-28 01:44:32 +00:00
Jason Ansel	2df2f9d895	[inductor] Change type of get_backend_features to OrderedSet (#145692 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145692 Approved by: https://github.com/yanboliang	2025-01-28 01:44:32 +00:00
Yifu Wang	db33d23aa8	[SymmetricMemory] fix an issue where rendezvous is performed with wrong device context when torch.cuda.set_device() is not called (#144886 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144886 Approved by: https://github.com/awgu	2025-01-28 01:43:37 +00:00
William Wen	e3d3f2b22e	[dynamo] save/restore system random state more carefully (#145750 ) Reattempt of https://github.com/pytorch/pytorch/pull/145435 since the state of the linked internal diff appears to be messed up. Note: I have verified that the previously failing internal tests now pass internally. Differential Revision: [D68723334](https://our.internmc.facebook.com/intern/diff/D68723334) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145750 Approved by: https://github.com/StrongerXi	2025-01-28 01:34:13 +00:00
Gabriel Ferns	f16ce3c7e9	Refactor fuzzer and add support for Dynamo (#145565 ) ## Summary: Dynamo now works with config fuzzer. For BE week, we also found and fixed 5 different bugs (in inductor): - https://github.com/pytorch/pytorch/pull/145426 - https://github.com/pytorch/pytorch/pull/145523 - https://github.com/pytorch/pytorch/pull/145527 - https://github.com/pytorch/pytorch/pull/145532 - https://github.com/pytorch/pytorch/pull/145538 ## Test Plan: New Dynamo Unit tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/145565 Approved by: https://github.com/masnesral	2025-01-28 00:44:27 +00:00
Syed Tousif Ahmed	6eb74fbec6	Updates NCCL user buffer registration test for NCCL 2.24.3 (#145285 ) NCCL 2.24.3 changed the content of the debug output for NVLS registration. We use this debug output in our test suite to check if NVLS was successfully registered or not. Hence we need to specialize for the NCCL version in the test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145285 Approved by: https://github.com/kwen2501	2025-01-28 00:24:53 +00:00
Ryan Guo	5a4d959cdb	[dynamo] Properly model torch profiler context objects (#145537 ) Prior to this patch, Dynamo conveniently modelled torch profiler context objects (e.g., `torch.profiler.profile`) as `NullContextVariable` because `torch.compile` ignores the effect of these profiler contexts. However, the semantics of these profiler contexts diverge from `contextlib.nullcontext` in the `__enter__` function, where the former returns `self` and the latter returns `None`. This causes a subtle error, as observed in #125021. This patch adds back a `ProfilerContextVariable`, which addresses the aforementioned semantic discrepancy. Fixes #125021. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145537 Approved by: https://github.com/zou3519, https://github.com/williamwen42	2025-01-28 00:03:36 +00:00
Mikayla Gawarecki	db3685a35c	Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880 ) ## Background This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies on the previous PR in this stack, where storage order was changed to non-lexicographical. A `.format_version` entry was added to the zipfile and `calculate_storage_offsets` will only work on checkpoints with `.format_version`. When this is turned on, for `torch.load(mmap=True)`, offsets of each storage record (other than the 0th storage) will be calculated instead of relying on `miniz` APIs to determine this. The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases. `6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)` ## Testing strategy The agreed upon testing strategy was as follows: - Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False) - This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested. Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880 Approved by: https://github.com/albanD ghstack dependencies: #143879	2025-01-27 23:57:30 +00:00
Mikayla Gawarecki	7db0afabaa	Remove lexicographical sorting of storage keys in torch.save (#143879 ) Currently the order is lexicographical (i.e. 0, 10, 11, ...19, 2, ....) instead of 0, 1, 2, 3, 4, 5 (the order that storage metadata is actually pickled in). Since PyTorch will never be used with Python < 3.7, we can be assured that the keys will be read in the order of insertion (numerically sorted). This makes the order in which storages are written the same as the pickling/unpickling order, so we can calculate their offsets with fewer random reads. Differential Revision: [D67673025](https://our.internmc.facebook.com/intern/diff/D67673025) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143879 Approved by: https://github.com/albanD	2025-01-27 23:57:30 +00:00
Colin L. Rice	c1161957a4	inductor_config_logging: Don't drop keys (#144700 ) This bit me while I was trying to debug some trace issues. In general this config is already quite large when dumping, so adding more fields doesn't make it significantly worse. Also a number of the items we are type checking for (except the test configs), don't even show up. Primarily this will help us when debugging rocm, halide, and trace configs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144700 Approved by: https://github.com/ezyang	2025-01-27 23:47:25 +00:00
Jane (Yuan) Xu	7d01f6e6f2	Add ignorable commits on run_test.py to git blame ignore (#145787 ) Chanced upon it while searching through cpp_extension related code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145787 Approved by: https://github.com/malfet	2025-01-27 23:24:48 +00:00
Chirag Pandya	3ce68dc61e	[c10d] Flush file in file recorder (#145458 ) Summary: Flushing file to hopefully prevent file corruptions as reported in https://github.com/pytorch/pytorch/pull/145125 Test Plan: Couldn't get file corruption to occur in my tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145458 Approved by: https://github.com/kwen2501	2025-01-27 23:15:52 +00:00
Chirag Pandya	5534c270db	[chore] fix new linter (#145756 ) Summary: Fix new linter that's complaining when I made changes to this file: class 'LibUVStoreDaemon' defines a non-default destructor but does not define a copy constructor, a copy assignment operator, a move constructor or a move assignment operator Test Plan: make lint passes Differential Revision: [D68733191](https://our.internmc.facebook.com/intern/diff/D68733191) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145756 Approved by: https://github.com/XilunWu, https://github.com/Skylion007, https://github.com/fduwjj	2025-01-27 22:48:12 +00:00
PyTorch MergeBot	2de53b3b65	Revert "pickler for GraphModule (#141659 )" This reverts commit `c6ad08357b`. Reverted https://github.com/pytorch/pytorch/pull/141659 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally, please take a look at D68694181 for more details. ([comment](https://github.com/pytorch/pytorch/pull/141659#issuecomment-2617045120))	2025-01-27 22:39:30 +00:00
Huy Do	006397fac3	Remove FBGEMM sccache hack (#145664 ) Testing https://github.com/pytorch/pytorch/actions/runs/12959358756, sccache is working correctly now Pull Request resolved: https://github.com/pytorch/pytorch/pull/145664 Approved by: https://github.com/wdvr	2025-01-27 22:00:06 +00:00
David Berard	69e82d02d3	[inductor][3/N] triton support post-#5512, tt.divisibility format (#145575 ) 1. Fix the tt.divisibility format in hints.py. Previously, it was `{((0,), (1,)): [["tt.divisibility", 16]]}`. Now it is `{(0,): [["tt.divisibility", 16]], (1,): [["tt.divisibility", 16]]}`. This was an oversight in the first PR I added. I've verified that we now get `{ tt.divisibility = 16 }` in the generated TTGIR. 2. Update the test_codegen_triton.py test to work with multiple triton versions (and test this divisibility format in the new triton version) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145575 Approved by: https://github.com/SamGinzburg	2025-01-27 21:48:58 +00:00
Animesh Jain	993b229665	[dynamo][dicts] Fix dict.__new__ bug (#145723 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145723 Approved by: https://github.com/jansel, https://github.com/StrongerXi ghstack dependencies: #145519, #145547, #145558	2025-01-27 21:42:43 +00:00
Animesh Jain	7e1c7253e9	[dynamo][builtin-skipfile-cleanup] Support tuple.__new__ (#145558 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145558 Approved by: https://github.com/jansel, https://github.com/StrongerXi ghstack dependencies: #145519, #145547	2025-01-27 21:42:43 +00:00
Joel Schlosser	1ba1b7b597	Support remaining _like factory functions for NJT (#144889 ) Fixes #144761 This PR adds NJT impls for those _like functions that were previously missing: * `full_like()` * `rand_like()` * `randint_like()` It also fixes a bug in existing *_like functions when a new device is specified. Fix is to also transfer `offsets` / `lengths` to the new device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144889 Approved by: https://github.com/soulitzer	2025-01-27 21:33:51 +00:00
Nikita Shulga	3a23d75b37	[MPS] Fix `c10::metal::log_gamma` correctness on M4 (#145740 ) To workaround a bug where `abs` method call seems to be ignored before calling log, which could be reproduced by running the following code (submitted as FB16415011 ) ```swift import Metal func run_shader<T: BinaryFloatingPoint> (library: MTLLibrary, kernel_name: String, type: T.Type, nelem: Int = 16) { guard let mfunc = library.makeFunction(name: kernel_name) else { fatalError("Can't find function") } let device = library.device guard let queue = device.makeCommandQueue() else { fatalError("Can't make queue") } guard let cmdBuffer = queue.makeCommandBuffer() else { fatalError("Can't make command buffer") } guard let computeEncoder = cmdBuffer.makeComputeCommandEncoder() else { fatalError("Can't make compute encoder") } guard let ibuf = device.makeBuffer(length:nelem * MemoryLayout<T>.size, options: [.storageModeShared]) else { fatalError("Can't alloc") } let ibuf_data = ibuf.contents().assumingMemoryBound(to: T.self) for i in 0..<nelem { ibuf_data[i] = T(sin(Float(2 + i))) } guard let obuf = device.makeBuffer(length:nelem * MemoryLayout<T>.size, options: [.storageModeShared]) else { fatalError("Can't alloc") } let obuf_data = obuf.contents().assumingMemoryBound(to: T.self) computeEncoder.setComputePipelineState(try! device.makeComputePipelineState(function: mfunc)) computeEncoder.setBuffer(obuf, offset:0, index: 0) computeEncoder.setBuffer(ibuf, offset:0, index: 1) computeEncoder.dispatchThreads(MTLSizeMake(nelem, 1, 1), threadsPerThreadgroup:MTLSizeMake(nelem, 1, 1)) computeEncoder.endEncoding() cmdBuffer.commit() cmdBuffer.waitUntilCompleted() print("Results for \(String(describing: T.self)):", terminator: " ") for i in 0..<nelem { print(obuf_data[i], terminator: " ") } print() } let shader_source = """ #include <metal_stdlib> template<typename T> float foo(T x) { const auto abs_x = ::metal::abs(static_cast<float>(x)); auto rc = ::metal::log(abs_x); return rc - ::metal::log(::metal::abs(abs_x * ::metal::sinpi(abs_x))); } kernel void half_kernel( device half* out_ptr0, constant half* in_ptr0, uint xindex [[thread_position_in_grid]] ) { auto inp = in_ptr0[xindex]; auto out = foo(inp); out_ptr0[xindex] = static_cast<half>(out); } kernel void float_kernel( device float* out_ptr0, constant float* in_ptr0, uint xindex [[thread_position_in_grid]] ) { auto inp = in_ptr0[xindex]; auto out = foo(inp); out_ptr0[xindex] = static_cast<float>(out); } """ let options = MTLCompileOptions() options.mathMode = .safe options.mathFloatingPointFunctions = .precise guard let device = MTLCopyAllDevices().first else { fatalError("Not Metal device found") } let library = try! device.makeLibrary(source:shader_source, options:options) run_shader(library:library, kernel_name:"half_kernel", type: Float16.self) run_shader(library:library, kernel_name:"float_kernel", type: Float.self) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145740 Approved by: https://github.com/dcci	2025-01-27 21:24:22 +00:00
Aaron Orenstein	60f98262f1	PEP585: .github (#145707 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145707 Approved by: https://github.com/huydhn	2025-01-27 21:21:01 +00:00
Ryan Guo	bfaf76bfc6	[dynamo] clear out traced frames at the start of `test_log_traced_frames` (#145640 ) The test was being flaky in CI, and this patch fixes it. Fixes #137461. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145640 Approved by: https://github.com/williamwen42	2025-01-27 20:49:59 +00:00
Ting Lu	93dd6bc4d8	Add CUDA 12.8 installation and manylinux-cuda12.8 (#145567 ) Breaking https://github.com/pytorch/pytorch/pull/145557 into two parts. Need to have manylinux-cuda12.8 in order to build magma. Issue: https://github.com/pytorch/pytorch/issues/145570 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145567 Approved by: https://github.com/nWEIdia, https://github.com/atalman	2025-01-27 20:49:07 +00:00
Randolf Scholz	64cd81712d	`torch.distributions`: replace `numbers.Number` with `torch.types.Number`. (#145086 ) Fixes #144788 (partial) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145086 Approved by: https://github.com/malfet	2025-01-27 20:24:55 +00:00
Huy Do	2f8ad8f4b9	Run inductor perf benchmark on ROCm (#145763 ) This requires https://github.com/pytorch/pytorch/pull/144594. The test run on PT2 dashboard is at https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Mon%2C%2020%20Jan%202025%2019%3A46%3A14%20GMT&stopTime=Mon%2C%2027%20Jan%202025%2019%3A46%3A14%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=rocm&lBranch=144594&lCommit=9f5cb037965aa2990b2e4593610bca92526ebb3b&rBranch=144594&rCommit=9f5cb037965aa2990b2e4593610bca92526ebb3b Pull Request resolved: https://github.com/pytorch/pytorch/pull/145763 Approved by: https://github.com/jeffdaily	2025-01-27 20:19:03 +00:00
Ryan Guo	66631bc84b	[dynamo] Fix read/write conflicts in a cuda test (#145658 ) Prior to this patch, the `test_cuda_event_created_outside_of_graph` is flaky in CI, and that's because we have read and write to the same `foo` tensor buffer from 2 different streams. This patch eliminates that by adding a synchronization to wait till read finishes before starting the write. Fixes #133837, #133828. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145658 Approved by: https://github.com/yifuwang	2025-01-27 19:55:57 +00:00
PyTorch MergeBot	c986eba560	Revert "[CUDA][cuBLAS] Add fp16 accumulate option to cuBLAS/cuBLASLt (#144441 )" This reverts commit `abf28982a8`. Reverted https://github.com/pytorch/pytorch/pull/144441 on behalf of https://github.com/ZainRizvi due to Sorry but this is failing internally. @Chillee can you please help change get remerged? See D68720562 ([comment](https://github.com/pytorch/pytorch/pull/144441#issuecomment-2616726406))	2025-01-27 19:38:26 +00:00
leslie-fang-intel	9728e900dc	[Inductor][CPP] fix torch logit decomposition (#145576 ) Summary Fix issue https://github.com/pytorch/pytorch/issues/145379. The current decomposition uses `self = torch.clamp(self, lo, hi)`, which gives a wrong result when `lo` is larger than `hi` compared to the eager implementation: `cd68d54911/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp (L165)` Align their behavior in this PR. Test Plan ``` python -u -m pytest -s -v test/inductor/test_cpu_repro.py -k test_torch_logit ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145576 Approved by: https://github.com/jgong5, https://github.com/eellison	2025-01-27 19:37:51 +00:00
Edward Z. Yang	635b98fa08	Add nitpick warning that aoti_torch/c/shim.h is ABI stable (#145745 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/145745 Approved by: https://github.com/albanD	2025-01-27 19:25:37 +00:00
Yanbo Liang	bc377c503e	[Custom Ops] Fix f-strings in custom ops error message (#145673 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145673 Approved by: https://github.com/zou3519 ghstack dependencies: #145588	2025-01-27 19:22:43 +00:00
Yanbo Liang	ec91b7720f	[Custom Ops] Add a new API to allow users to register an autocast for the custom op (#145588 ) Fixes #137033 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145588 Approved by: https://github.com/zou3519	2025-01-27 19:22:43 +00:00
Simon Mahns	f951d216e0	[autocast][pytorch] Support autocast for MTIA (policy) (#145666 ) Summary: Add autocast support for MTIA (policy) Reviewed By: egienvalue Differential Revision: D68604796 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145666 Approved by: https://github.com/chaos5958	2025-01-27 18:26:04 +00:00
Sam Larsen	1835e1eb98	[BE] Remove test_ops from FIXME_inductor_dont_reset_dynamo (#145307 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145307 Approved by: https://github.com/zou3519, https://github.com/FindHao	2025-01-27 18:12:39 +00:00
Randolf Scholz	835e770bad	Use `typing.IO[bytes]` instead of `io.BytesIO` in annotations (#144994 ) Fixes #144976 Using approach ① `IO[bytes]`, but could also try with a protocol. ## Notes: - moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike` - Use `FileLike` annotation where it makes sense - made sure those functions also support `os.PathLike` - Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate. - Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`) - needed to make `torch.serialization._opener` generic to avoid LSP violations. - skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue` which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly... Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2025-01-27 18:08:07 +00:00
Eddie Yan	abf28982a8	[CUDA][cuBLAS] Add fp16 accumulate option to cuBLAS/cuBLASLt (#144441 ) Test for `cublasGemmEx` added, still need to figure out the best way to exercise the other APIs... Pull Request resolved: https://github.com/pytorch/pytorch/pull/144441 Approved by: https://github.com/Chillee	2025-01-27 18:05:23 +00:00
Nikita Shulga	30dea8429d	[MPS][BE] Use convenience methods to set args (#145736 ) It's better to call `mtl_setArgs` rather than set arguments one by one with the risk of making a typo. Also, all interactions with MTLCommandBuffer must be serialized, which is commonly done using dispatch queues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145736 Approved by: https://github.com/Skylion007	2025-01-27 17:42:01 +00:00
Mikayla Gawarecki	7db20ffd68	Remove `public_allowlist` from `TestPublicBindings.test_correct_module_names` and ensure private_allowlist-ed things are actually private (#145620 ) This passes locally, also sanity checked importing these modules on [colab](https://colab.research.google.com/drive/1edynWX1mlQNZIBxtb3g81_ZeTpAqWi19?usp=sharing) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145620 Approved by: https://github.com/albanD	2025-01-27 17:30:02 +00:00
Huy Do	5d01a2874f	Increase the number of perf benchmark shards (#145534 ) Per the discussion on https://github.com/pytorch/pytorch/issues/140332#issuecomment-2610805551, this adds 2 more shards for HF, 2 more for TorchBench, and 1 more for TIMM. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145534 Approved by: https://github.com/jeanschmidt	2025-01-27 16:20:42 +00:00
Nikita Shulga	639dd54ef7	[BE] Use copy_method to import all tests (#145718 ) Less chances for typo when doing the imports Pull Request resolved: https://github.com/pytorch/pytorch/pull/145718 Approved by: https://github.com/dcci	2025-01-27 16:01:12 +00:00
leslie-fang-intel	2e80093306	setitem node shouldn't be deadcode eliminated (#145714 ) Summary Fix issue https://github.com/pytorch/pytorch/issues/145697. The `operator.setitem` has been eliminated as dead code, causing a correctness issue. Mark it as impure in this PR to avoid this side effect. TestPlan ``` python -u -m pytest -s -v test/fx/test_dce_pass.py -k test_keep_setitem ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145714 Approved by: https://github.com/ezyang	2025-01-27 15:08:21 +00:00
Stefan-Alin Pahontu	0674ab7e33	solve apl dependency issue (#145215 ) According to the [APL documentation](https://developer.arm.com/documentation/101004/2404/General-information/Arm-Performance-Libraries-example-programs), libraries ending with _mp are OpenMP multi-threaded libraries. When a project is compiled with MSVC and the -openmp flag, the vcomp library (Visual C++ implementation of OpenMP) is used for runtime calls. However, the current APL implementation uses the libomp.dll (LLVM) variant. As a result, there are unexpected behaviors at runtime. --- For Example: ```python import torch # Create a sparse tensor # Input (Sparse Tensor): # [[0, 1], # [1, 0]] indices = torch.tensor([[0, 1], [1, 0]]) values = torch.tensor([1, 1], dtype=torch.float32) size = torch.Size([2, 2]) sparse_tensor = torch.sparse_coo_tensor(indices, values, size) # Convert sparse tensor to dense tensor dense_tensor = sparse_tensor.to_dense() # Expected Output (Dense Tensor): # [[0, 1], # [1, 0]] print("\nDense Tensor:") print(dense_tensor) ``` However, it prints unexpected outputs such as: ```python # [[0, 11], # [10, 0]] ``` The issue arises because the following code does not function as expected at runtime: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/ParallelOpenMP.h#L30 ```c++ // returns 1 , however since OpenMP is enabled it should return total number of threads int64_t num_threads = omp_get_num_threads(); ``` --- In the runtime, loading multiple OpenMP libraries (in this case `libomp` and `vcomp`) is causing unexpected behaviours. So, we've changed libraries from `_mp` to non `_mp` versions and we used `vcomp` for OpenMP calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145215 Approved by: https://github.com/ozanMSFT, https://github.com/malfet Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>	2025-01-27 13:02:16 +00:00
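
The storage-key ordering described in commit 7db0afabaa can be illustrated without PyTorch. This is a minimal sketch, not the `torch.save` implementation: the `storages` dict and its values are made up, and the snippet only shows why sorting string keys lexicographically differs from the insertion (numeric) order that Python 3.7+ dicts preserve.

```python
# Hypothetical storage keys "0".."11", inserted in numeric order.
# The values are placeholders, not real tensor storages.
storages = {str(i): f"storage_{i}" for i in range(12)}

# Dicts preserve insertion order on Python 3.7+, so iterating
# yields the keys in the order they were added.
insertion_order = list(storages)

# Sorting the string keys compares them character by character,
# which places "10" and "11" before "2".
lexicographic_order = sorted(storages)

print("insertion:    ", insertion_order)
print("lexicographic:", lexicographic_order)
```

When records are written in insertion order, the n-th record written corresponds to the n-th key read back, which is the property the commit message relies on.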
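
The `__enter__` difference that commit 5a4d959cdb mentions can be reproduced with the standard library alone. The sketch below assumes a hypothetical `ProfilerLike` class as a stand-in for a context manager whose `__enter__` returns `self`; it is not `torch.profiler.profile` and does not involve Dynamo.

```python
import contextlib


class ProfilerLike:
    """Hypothetical stand-in for a context manager that returns itself."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Returning False lets any exception propagate.
        return False


# contextlib.nullcontext() binds its enter_result, which defaults to None.
with contextlib.nullcontext() as null_cm:
    null_result = null_cm

# A context manager whose __enter__ returns self binds the object itself.
prof = ProfilerLike()
with prof as p:
    prof_result = p

print("nullcontext binds:", null_result)
print("ProfilerLike binds itself:", prof_result is prof)
```

Code that uses the bound name after `with ... as p:` therefore behaves differently under the two models, which is consistent with the discrepancy the commit message describes.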


83674 commits