Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59956
Addresses issue #50175. Two checks are currently missing:
1. An overload declaration must always have a single `pass` statement as its body.
2. An implementation must always be provided for the declarations, and it must not
have the `torch.jit._overload` decorator. So we need to check whether the function
body we are compiling is preceded by the decorator.
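A minimal sketch of the two checks above, using only the standard-library `ast` module. This is not the actual PyTorch implementation; the helper names `is_pass_only_body` and `has_overload_decorator` and the sample function `my_func` are hypothetical, chosen for illustration.

```python
import ast
import textwrap


def _first_function(source: str) -> ast.FunctionDef:
    """Parse `source` and return the first function definition found in it."""
    tree = ast.parse(textwrap.dedent(source))
    return next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))


def is_pass_only_body(source: str) -> bool:
    """Check 1: the body consists of exactly one `pass` statement."""
    func = _first_function(source)
    return len(func.body) == 1 and isinstance(func.body[0], ast.Pass)


def has_overload_decorator(source: str) -> bool:
    """Check 2 helper: is the function decorated with something named `_overload`?"""
    func = _first_function(source)
    for dec in func.decorator_list:
        if isinstance(dec, ast.Attribute) and dec.attr == "_overload":
            return True
        if isinstance(dec, ast.Name) and dec.id == "_overload":
            return True
    return False


# Hypothetical sources: a valid declaration, an invalid one, and an implementation.
DECL = """
@torch.jit._overload
def my_func(x: int) -> int:
    pass
"""

BAD_DECL = """
@torch.jit._overload
def my_func(x: int) -> int:
    return x
"""

IMPL = """
def my_func(x):
    return x
"""
```

Under these assumptions, `DECL` passes both checks as a declaration, `BAD_DECL` would be rejected by check 1, and `IMPL` is acceptable as an implementation because it carries no overload decorator.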
Test Plan:
python test/test_jit.py TestScript.test_function_overloads
Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D29106555
fbshipit-source-id: 2d9d7df2fb51ab6db0e1b726f9644e4cfbf733d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62836
Used for upcoming diff that adds support for batching to torchdeploy
Test Plan: Models are used by later diffs, but generation script is verified by CI now and locally.
Reviewed By: gunchu
Differential Revision: D30135938
fbshipit-source-id: 566a32a3ede56833e41712025e9d47191dfc5f39
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59916
This fixes two problems with sparse multiplication:
- 0d-dense * sparse was creating a non-sparse output and failing.
- dense * sparse and sparse * dense are not supported, but emitted an unhelpful error message.
<details>
<summary> unhelpful error message </summary>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: Could not run 'aten::_nnz' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nnz' is only available for these backends: [SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:961 [kernel]
SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1092 [kernel]
SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:202 [kernel]
SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:229 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:38 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:118 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:10254 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:446 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:285 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
</details>
Also added tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61723
Reviewed By: ezyang
Differential Revision: D29962639
Pulled By: cpuhrsch
fbshipit-source-id: 5455680ddfa91d5cc9925174d0fd3107c40f5b06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62430
The bias hook is a forward hook that is part of the pruning parametrization; it is attached after the activation reconstruction forward hook, so adding the bias occurs after zeros are reinserted to the pruned activation.
This diff/PR amends the bias hook to work for Conv2d layers, in addition to Linear layers. The reshaping of the ._bias parameter ensures that it is added to the right dimension of the output.
ghstack-source-id: 135097700
Test Plan:
Added tests for `Conv2dB()`, a model with Conv2d layers that have `bias=True`.
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1MfgL
Reviewed By: jerryzh168
Differential Revision: D29979571
fbshipit-source-id: c1a7e9fabc8b3c9d0050bd6b6c6a631ddfdf2a68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62685
Adds a `ref_node_target_type` field to hold the string type
of the base node. This is needed because in some cases
the previous node does not match ref_node (if we have observers,
or if we are logging inputs), and it is useful to know the type
of ref_node.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D30082947
fbshipit-source-id: 98ded7b25a5d8d5ea820e0ef62c3799b65c3fc77
Summary:
Part 1 of fixing https://github.com/pytorch/pytorch/issues/62359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62686
Test Plan:
1. Check out this PR and run `python setup.py install`.
2. The test we will be running requires CUDA. If you don't have CUDA, you can try this on another device or simply comment out the skipIf statement before the `test_jit_cuda_extension` test in `test_cpp_extensions_jit.py`
3. Run: `IN_CI=1 python test/run_test.py -i test_cpp_extensions_jit -- -k test_jit_cuda_extension` and notice that it should skip. If it doesn't skip, edit test/.pytorch-disabled-tests.json: modify the platforms list of the first issue (61655) to include whatever platform you are on (macos or linux), and just run `python test/test_cpp_extensions_jit.py -v -k test_jit_cuda_extension --import-disabled-tests` to make sure it skips.
4. Now `export PYTORCH_IGNORE_DISABLED_ISSUES=61655` or `export PYTORCH_IGNORE_DISABLED_ISSUES=34952,61655`.
5. `rm test/.pytorch-*` to clear the cached files.
6. Run the same command as in step 3 and note that it should NOT skip. It should run.
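The effect of `PYTORCH_IGNORE_DISABLED_ISSUES` in the steps above can be sketched as follows. This is an illustrative sketch only, not the code in PyTorch's test harness; the function names `ignored_issue_numbers` and `should_skip` are hypothetical, and the only detail taken from the steps is that the variable holds a comma-separated list of issue numbers.

```python
import os


def ignored_issue_numbers(env=None):
    """Parse a comma-separated PYTORCH_IGNORE_DISABLED_ISSUES value into a set of ints."""
    env = os.environ if env is None else env
    raw = env.get("PYTORCH_IGNORE_DISABLED_ISSUES", "")
    # Empty tokens (e.g. from an unset variable or a trailing comma) are dropped.
    return {int(tok) for tok in raw.split(",") if tok.strip()}


def should_skip(disabling_issue: int, env=None) -> bool:
    """A disabled test is skipped unless its disabling issue is explicitly ignored."""
    return disabling_issue not in ignored_issue_numbers(env)
```

With the variable unset, `should_skip(61655, {})` is true; with `{"PYTORCH_IGNORE_DISABLED_ISSUES": "34952,61655"}` it is false, which matches the expectation in step 6 that the test runs.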
Reviewed By: walterddr, samestep
Differential Revision: D30108773
Pulled By: janeyx99
fbshipit-source-id: dbf015a266f57577dc9283b0cdff720083b5c0cb
Summary:
The FAQ has a section on CUDA OOMs that consists largely of don'ts, which limits the available modeling solutions. Deep nets can blow up memory because outputs are cached during training.
This is a known problem with a known solution: trading compute for memory via checkpointing.
The FAQ should mention it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62709
Reviewed By: nairbv
Differential Revision: D30103326
Pulled By: ezyang
fbshipit-source-id: 3a8b465a7fbe19aae88f83cc50fe82ebafcb56c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62658
Boxed functors, like their unboxed brethren, support operators which
aren't just a function pointer, but a function pointer with some
associated global state that is allocated at registration time.
The use case I have in mind with this implementation is "dispatcher
API from Python", where the extra state kernel registrations need is
the PyObject callable we will invoke to do the actual invocation.
See next PR in this stack.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D30074925
Pulled By: ezyang
fbshipit-source-id: ee040edbbec1e607486d338d1ea78bb5c6b2ece9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62744
The `Tensor._reduce_ex_internal` function can only be called via the `Tensor.__reduce_ex__` function.
And that second function already properly handles the `__torch_function__` overwrites. So no need to handle them again in `Tensor._reduce_ex_internal`.
This PR also updates `Tensor.__reduce_ex__` to use the specialized unary API for `__torch_function__` that makes it nicer to read.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30113113
Pulled By: albanD
fbshipit-source-id: c94f5d2597ee3afe799d9de991f75615c3c172d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62634
Apply the same set of changes as in D27688352 (d728491fc1) to `module.cpp` as instructed by xcheng16.
Basically, this simplifies exception handling and allows propagation of the original message undisturbed to the caller so that we can figure out the lineage of the exception in crash tasks such as t96812652
ghstack-source-id: 134877012
Test Plan: Build/Sandcastle
Reviewed By: raziel
Differential Revision: D30038867
fbshipit-source-id: 8dfd415c510bcd0ab49814f4eb559ec6fc8f72e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62521
This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.
Test Plan:
./build/bin/test_api --gtest_filter=IMethodTest.*
Note that one needs to run `python torch/csrc/deploy/example/generate_examples.py` before the above command.
Reviewed By: ezyang
Differential Revision: D30055372
Pulled By: alanwaketan
fbshipit-source-id: 50eb3689cf84ed0f48be58cd109afcf61ecca508
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62318
Tracking issue: #55070
This PR introduces the method `TensorIteratorBase::build_ternary_op` for building a
`TensorIteratorBase` for 3-input 1-output kernel.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D29961997
Pulled By: bdhirsh
fbshipit-source-id: 2208d24823bad6e74c8d508f363716d8125b8619
Summary:
Report the size of the pointed-to memory, the total allocated memory, and the total reserved memory all in one report.
`ptr` and `alloc_size` will be used for associating with op trace.
`allocated_size`, `reserved_size` will be used for memory trace.
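As a rough illustration of how one report can serve both traces, here is a sketch of a record carrying the four fields named above. The class name `MemoryReport`, its two methods, and the sample values are hypothetical; only the field names come from this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryReport:  # hypothetical container, not the profiler's actual type
    ptr: int             # address of the memory block the event refers to
    alloc_size: int      # size of that block
    allocated_size: int  # total allocated memory at the time of the event
    reserved_size: int   # total reserved memory at the time of the event

    def op_trace_fields(self):
        """Fields used to associate the event with an op trace."""
        return (self.ptr, self.alloc_size)

    def memory_trace_fields(self):
        """Fields used to build the memory trace."""
        return (self.allocated_size, self.reserved_size)


# Made-up sample values, for illustration only.
report = MemoryReport(ptr=0x7F00, alloc_size=512, allocated_size=4096, reserved_size=8192)
```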
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61282
Reviewed By: ejguan
Differential Revision: D29796282
Pulled By: chaekit
fbshipit-source-id: 5314c867632d3af1fa9a3811b35eaa5e931a5d87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62669
Useful to avoid having to implement null checking on the application side.
Test Plan: Add unit tests
Reviewed By: suo, houseroad
Differential Revision: D30074406
fbshipit-source-id: 881aec735953b43cb24786c1a2d79e8e724928b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62739
The original flag didn't work correctly, so this makes it actually
output the right thing.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D30107694
Pulled By: seemethere
fbshipit-source-id: 5ff28d6820b9cf7145dbb617b86a941bf7686b5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62703
Re-enable test on Windows
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D30094460
Pulled By: ilia-cher
fbshipit-source-id: 80521f6bc1365d2c252f20b5d0485fc062c8d9c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62335
This change ensures that unittests only use out variants or native ops.
- Our unittests currently assume that a graph fed to the static runtime correctly replaces an interpreter op for its corresponding out variant / native op, but it's not checked by the unittest. This change ensures that.
- We relied on manual inspection of log messages to see if an out variant is used for a specific workload even for unittesting. This change frees us from doing that.
- `aten::add` is excluded from this check since it's only enabled for an internal workload. Also some unittests are excluded by using `expect_interpreter_op = true` since they are written to use interpreter ops by design.
Test Plan: Ran `buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest` successfully.
Reviewed By: mikeiovine, hlu1
Differential Revision: D29952381
fbshipit-source-id: e60e70b80ccf45e91c6654b4ad53f92ffd5ab702
Summary:
A bug was discovered in https://github.com/pytorch/pytorch/issues/62434: for some reason, comparing the schema name didn't match the allow_list item. So:
1. Remove the duplicate regex compile.
2. Use the schema string instead of just the name.
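The two changes above can be sketched as follows: compile each allow-list pattern once, then match against the full schema string rather than the bare operator name. This is a sketch under assumptions, not the actual script; `ALLOW_LIST_PATTERNS`, `is_allowed`, and the operator name `aten::_example_op` are hypothetical.

```python
import re

# Hypothetical allow-list entries; each pattern is compiled exactly once.
ALLOW_LIST_PATTERNS = [r"aten::_example_op", r"prim::.*"]
COMPILED_ALLOW_LIST = [re.compile(p) for p in ALLOW_LIST_PATTERNS]


def is_allowed(schema_str: str) -> bool:
    """Return True if any allow-list pattern matches the full schema string."""
    return any(p.search(schema_str) for p in COMPILED_ALLOW_LIST)
```

Matching on the schema string means an entry also covers overload names and signatures, for example `is_allowed("aten::_example_op.out(Tensor self) -> Tensor")`.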
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62687
Reviewed By: ezyang
Differential Revision: D30102437
Pulled By: walterddr
fbshipit-source-id: 541b2ed77948f24daebb08623cadabb034a241e0
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 10ec0d3388
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62688
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: dskhudia
Differential Revision: D30088109
fbshipit-source-id: da8a1e6232e489eac0384faadb71c2dfac5927f7
Summary:
Update magma to point to magma_ctrl_launch_bounds branch.
When the upstream magma branch is used, the cholesky tests in test_ops.py and test_linalg.py
fail due to "Intel MKL ERROR: Parameter 4 was incorrect on entry to DPOTRF."
Suspect commit: [35325212b15c5baadd7493d61b19b2db2635cb68](35325212b1) in magma master.
Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62502
Reviewed By: malfet
Differential Revision: D30089171
Pulled By: seemethere
fbshipit-source-id: b07234ce66d48e3af113640995f923ee586b3cd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662
Replaced the methods set_tensor(.) and get_tensor() in the Python-exposed API from the C++ logic with set_buffer(.) and buffer() for a cleaner interface.
Reviewed By: SciPioneer
Differential Revision: D30012869
fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62664
Skipping a test for ROCm because of issue #62602
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D30079534
Pulled By: NivekT
fbshipit-source-id: a9cf35e5d3a8d218edc9c5a704d1f9599d2f38a6
Summary:
This should avoid the following false positives:
```
[ RUN ] ProtoTest.Basic
/var/lib/jenkins/workspace/build/third_party/onnx/onnx/onnx_onnx_torch-ml.pb.h:7060:15: runtime error: member call on address 0x7fffffffdd80 which does not point to an object of type 'google::protobuf::MessageLite'
0x7fffffffdd80: note: object is of type 'onnx_torch::ModelProto'
00 00 00 00 b0 b9 05 ef ff 7f 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
^~~~~~~~~~~~~~~~~~~~~~~
vptr for 'onnx_torch::ModelProto'
UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/build/third_party/onnx/onnx/onnx_onnx_torch-ml.pb.h:7060:15 in
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62663
Reviewed By: tktrungna
Differential Revision: D30076315
Pulled By: malfet
fbshipit-source-id: 7bfc2c4b417307195e3c3379e4874eaceb4f3134
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61931
This PR consolidates the profiling code around a new C++ implementation
(profiler_kineto.h/cpp) and uses it unconditionally from
torch.autograd.profiler/torch.profiler:
1. Always use profiler_kineto.h/cpp as the C++ implementation
2. Simplify profiler.py to remove unneeded parts depending on legacy
impl
3. Move some of the legacy logic into profiler_legacy.py (to be fully
deleted later)
Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v
USE_KINETO=0 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v
Imported from OSS
Reviewed By: gdankel
Differential Revision: D29801599
fbshipit-source-id: 9794d29f2af38dddbcd90dbce4481fc8575fa29e