Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421
I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files rely on transitive includes to get access to `getTime`. Since I have to touch all the places that use `getTime` anyway, I may as well also move them to the new namespace.
Test Plan: Unit tests and CI.
Reviewed By: aaronenyeshi, albanD
Differential Revision: D32865907
fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
Summary:
`assertSignatureIsCorrect` is instantiated at least once per unique operator signature, yet its core logic is independent of the type. So, it makes sense to have a light-weight template that does nothing but call into the non-templated function with the correct `CppSignature` object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67986
Reviewed By: jbschlosser
Differential Revision: D33108600
Pulled By: swolchok
fbshipit-source-id: 7594524d3156ff2422e6edcdffcb263dc67ea346
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68483
Doesn't need to be in the header.
ghstack-source-id: 145668417
Test Plan: CI
Reviewed By: chaekit
Differential Revision: D32477113
fbshipit-source-id: 30e7796413e3220e4051544559f9110ab745022d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69087
This diff includes a variety of improvements to `set_inputs` to unify behavior with `torch::jit::Module`:
1. Eliminate code duplication between the rvalue/lvalue overloads
2. Add type checks
3. Make the input length check a `TORCH_CHECK` instead of a debug check - we have to fail when the wrong number of inputs is passed.
4. `schema` now always includes `self`, even if we release `module_`. This is consistent with `torch::jit::Module`.
ghstack-source-id: 145599837
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D32711705
fbshipit-source-id: fe97c10b4f03801ba59868b452e7d02b26b3106b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68412
These lists have the same size as CallbackHandles, so they should be the same container type.
ghstack-source-id: 145668416
Test Plan:
Run the same command as the previous diff.
Before: see previous diff, average about 0.46us
After: P467928077, average about 0.43us
Reviewed By: chaekit
Differential Revision: D32454856
fbshipit-source-id: 3a3ff4d381d99f51ef868d4dec4db7c411b5ea56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69860
Previously I made a mistake and checked in `aten::full.names` for the upgrader of `aten::full`, so this changes it back to just `aten::full`.
Test Plan: None
Reviewed By: gmagogsfm
Differential Revision: D33066985
fbshipit-source-id: a5598d60d1bff9b4455f807361388fac0689ba14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69412
TypePrinter does not need to take ownership of the Type.
This helps unblock the following diff, which stops refcounting Type singletons.
ghstack-source-id: 145671619
Test Plan: CI
Reviewed By: suo
Differential Revision: D32858525
fbshipit-source-id: df58676938fd20c7bae4a366d70b2067a852282d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69778
This PR extends fusion pattern support from a simple sequence of ops to a simple subgraph like conv - add:
```
x - conv ---\
y ---------add ---- output
```
where the inputs x, y and the output are observed/quantized.
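A minimal sketch of such a subgraph (the module and shape choices are illustrative, not taken from the PR):
```
import torch.nn as nn

# Both inputs and the output of this pattern are observed, so the
# conv - add subgraph can be matched and quantized as one fused unit.
class ConvAdd(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x, y):
        return self.conv(x) + y
```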
Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps.test_conv_add
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D33024528
fbshipit-source-id: 5c770c82c8f693fabdac5c69343942a9dfda84ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69658
This PR enables the fuse handler for sequences of three ops and merges all fuse handlers into one.
TODO: we can also move this to the backend_config_dict folder
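A hedged sketch of one such three-op sequence (conv - bn - relu is an assumed example; the diff describes the handler generically):
```
import torch.nn as nn

# A common three-op pattern that a single fuse handler can now match.
model = nn.Sequential(
    nn.Conv2d(3, 3, 3),
    nn.BatchNorm2d(3),
    nn.ReLU(),
)
```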
Test Plan:
regression fusion test
```
python test/test_quantization.py TestFuseFx
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32974907
fbshipit-source-id: ba205e74b566814145f776257c5f5bb3b24547c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69614
Previously sparse COO tensors were ignored during freezing because `tryInsertConstant` would fail in `freeze_module.cpp`, and because hashes weren't implemented for COO tensor IValues.
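A hedged sketch of the scenario this enables (module and attribute names are illustrative):
```
import torch

# A scripted module holding a sparse COO tensor attribute can now be
# frozen; previously tryInsertConstant rejected the sparse constant.
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.sparse_coo_tensor(
            torch.tensor([[0, 1], [1, 0]]),
            torch.tensor([1.0, 2.0]),
            (2, 2),
        )

    def forward(self, x):
        return self.w + x

frozen = torch.jit.freeze(torch.jit.script(M()).eval())
```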
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32954620
Pulled By: davidberard98
fbshipit-source-id: a91f97fdfc2152b417f43a6948100c94970c0831
Summary:
Refactor `torch.profiler.profile` by separating it into one low-level class and one high-level wrapper.
The PR includes the following changes:
1. Separate the class `torch.profiler.profile` into two classes: `kineto_profiler` and `torch.profiler.profile`.
2. The former exposes the low-level functionality available at the C++ level, e.g. `prepare_profiler`, `start_profiler`, `stop_profiler`.
3. The original logic in `torch.profiler.profile`, including `export_chrome_trace`, `export_stacks`, `key_averages`, `events`, and `add_metadata`, moves into `kineto_profiler`, since it is all exposed by `torch.autograd.profiler`.
4. The new `torch.profiler.profile` is fully backward-compatible with the original class since it inherits from `torch.profiler.kineto_profiler`. Its only responsibility in the new implementation is maintaining the finite state machine of `ProfilerAction`.
With this refactoring, the responsibility boundary is clear and the new logic is simple to understand.
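The user-facing API is unchanged; a minimal sketch of the high-level wrapper, whose scheduling is the `ProfilerAction` state machine described above:
```
import torch
from torch.profiler import ProfilerActivity, profile, schedule

# The wrapper only drives the ProfilerAction state machine; the actual
# prepare/start/stop calls live in the low-level profiler class.
with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=2),
) as prof:
    for _ in range(4):
        torch.randn(8, 8).mm(torch.randn(8, 8))
        prof.step()
print(prof.key_averages().table(sort_by="cpu_time_total"))
```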
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63302
Reviewed By: albanD
Differential Revision: D33006442
Pulled By: robieta
fbshipit-source-id: 30d7c9f5c101638703f1243fb2fcc6ced47fb690
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69381
Open-source the lowering workflow, related tools, and tests.
Test Plan: CI
Reviewed By: 842974287
Differential Revision: D32815136
fbshipit-source-id: 3ace30833a2bc52e9b02513c5e223cb339fb74a3
Summary:
- PyTorch and ONNX both support BFloat16; add it here to unblock some mixed-precision training models.
- Support the PyTorch TNLG model using BFloat16 tensors for the inputs/outputs of the layers that run on the NPU.
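A hedged sketch of the newly unblocked export path (the module and opset choice are illustrative; BFloat16 requires opset 13+ in ONNX):
```
import torch

# A module that routes data through BFloat16; exporting it should no
# longer fail on the unsupported dtype.
class M(torch.nn.Module):
    def forward(self, x):
        return (x.to(torch.bfloat16) + 1.0).to(torch.float)

torch.onnx.export(M(), torch.randn(2, 2), "m.onnx", opset_version=13)
```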
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66788
Reviewed By: jansel
Differential Revision: D32283510
Pulled By: malfet
fbshipit-source-id: 150d69b1465b2b917dd6554505eca58042c1262a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607
This PR adds `ShardedOptimizer` and an API to get module parameters along with `ShardedTensor` params; it allows users to use this optimizer wrapper to construct an optimizer that involves `ShardedTensor`s.
`state_dict` support will be a follow-up diff.
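A hedged sketch of the intended usage; the import path and helper name below are assumptions based on this summary, not verified against the diff:
```
import torch
import torch.nn as nn
from torch.distributed._shard.sharded_optim import (  # assumed path
    ShardedOptimizer,
    named_params_with_sharded_tensor,
)

# Collect parameters (including ShardedTensor params) and wrap a regular
# optimizer around them; in a real run the module would be sharded
# across ranks first.
module = nn.Linear(8, 8)
named_params = dict(named_params_with_sharded_tensor(module))
optim = ShardedOptimizer(named_params, torch.optim.SGD, lr=0.1)
```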
ghstack-source-id: 145532834
Test Plan: python test_sharded_optim.py
Reviewed By: pritamdamania87
Differential Revision: D32539994
fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993
This PR attempts to port `index_add` to structured kernels, but does more than that:
* Adds an `out=` variant to `index_add` (see the sketch after this list)
* Revises the `native_functions.yaml` registrations to not have multiple entries, instead passing a default value for `alpha`
* Changes the `derivatives.yaml` file for autograd support
* Revises error messages, please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615
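A short sketch of the new `out=` variant (values are illustrative):
```
import torch

# Accumulate alpha * src into x at the rows selected by index, writing
# the result into a preallocated out tensor.
x = torch.zeros(5, 3)
src = torch.ones(2, 3)
index = torch.tensor([0, 4])
out = torch.empty(5, 3)
torch.index_add(x, 0, index, src, alpha=2.0, out=out)
```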
Follow-up PRs in the near future will attempt to refactor the OpInfo test, and will take another look at the tests in `test/test_torch.py` for this function (hence the use of ghstack for this).
~This is WIP because there are tests failing for `Dimname` variant on mobile/android builds, and I'm working on fixing them.~
Issue tracker: https://github.com/pytorch/pytorch/issues/55070
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D32646426
fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68947
`_test_math_view` currently calls the operator with different values
than those specified in the `SampleInput`. This is undesirable as it
could break mathematical properties required by the operator. Instead,
this calls `math_op_view(math_op_physical(sample.input))` to get a
view that represents the same value as the original input.
`test_neg_view` already did this by returning `torch._neg_view(-x)`
from `math_op_view`, but this moves the handling into `_test_math_view`
so it applies to all view-op tests.
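The underlying identity, sketched here for the neg view:
```
import torch

# math_op_view(math_op_physical(x)): negate x, then take a neg view.
# The view records the negation lazily, so it represents the same
# values as the original x.
x = torch.randn(3)
v = torch._neg_view(-x)
assert v.is_neg()
assert torch.equal(v, x)
```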
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33064327
Pulled By: anjali411
fbshipit-source-id: 4d87e0c04fc39b95f8dc30dcabda0d554d16a1d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69272
In transformer encoder and MHA, masked_softmax's mask is a 2D tensor (B, D), while the input is a 4D tensor (B, H, D, D).
The mask could simply be broadcast to (B, H, D, D) like the input and followed by a regular masked_softmax; however, that would produce a non-contiguous mask and consume more memory.
In this diff, we keep the mask's shape unchanged and compute the corresponding mask element for the input in each CUDA thread.
This new layout is not supported on CPU yet.
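A small sketch of the broadcast semantics the kernel now performs implicitly (shapes are illustrative):
```
import torch

# A 2D key-padding mask (B, D) applies to every head and query position
# of a (B, H, D, D) input, so a thread handling element (b, h, i, j) can
# read mask[b, j] directly instead of materializing a 4D mask.
B, H, D = 2, 4, 8
inp = torch.randn(B, H, D, D)
mask = torch.zeros(B, D, dtype=torch.bool)
mask[:, D // 2:] = True  # mask out the second half of the keys
ref = inp.masked_fill(mask[:, None, None, :], float("-inf")).softmax(-1)
```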
Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn#binary.par -r test_masked_softmax
Reviewed By: ngimel
Differential Revision: D32605557
fbshipit-source-id: ef37f86981fdb2fb264d776f0e581841de5d68d2
Summary:
`torch.movedim` now directly handles a scalar (0-dim) input tensor as a no-op by returning a view of the input tensor (after all the usual checks on the other parameters).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69537
Test Plan:
This code now works fine, and `res1` is a view of `tensor`:
```
import torch
tensor = torch.rand(torch.Size([]))
res1 = torch.movedim(tensor, 0, 0)
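# res1 is a view of tensor; this call previously raised an error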
```
Fixes https://github.com/pytorch/pytorch/issues/69432
Reviewed By: jbschlosser
Differential Revision: D33020014
Pulled By: albanD
fbshipit-source-id: b3b2d380d70158bd3b3d6b40c073377104e09007
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69819
We should skip ReplaceWithCopy if the operator's inputs can be updated during inference. For a set of tensors that share data, ReplaceWithCopy should not apply to any of them if there are updates to any of them.
The check currently in place misses some cases (where updates exist and the number of uses is <= 1). This diff addresses the missing cases by querying the AliasDb.
Test Plan:
- Added test cases, including one that was problematic before this diff
- CI
Reviewed By: mikeiovine
Differential Revision: D33052562
fbshipit-source-id: 61f87e471805f41d071a28212f2f457e8c6785e7