Not originally mentioned in the tracking issue #58414, but a nice-to-have feature. In summary, the errata filter allows known problematic kernels, listed in a JSON file supplied at run time, to be skipped instead of irrecoverably crashing a CUDA context (e.g., via an illegal memory access). cuDNN frontend description: https://github.com/NVIDIA/cudnn-frontend#errata-filter
Sample errata filter JSON:
```json
{
  "version" : 1,
  "rules" : [
    {
      "rule_id" : "avoid_bad_bwd_data",
      "operation" : "ConvBwdData",
      "engine" : 12,
      "cudnn_version_start" : 8000,
      "cudnn_version_end" : 9000
    }
  ]
}
```
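The matching logic such a rule implies can be sketched in a few lines (a simplified Python illustration only; the real implementation lives in the cuDNN frontend C++ library, and the field names are taken from the JSON above):

```python
# Simplified sketch of errata-filter rule matching (illustrative only).
def rule_blocks_engine(rule, operation, engine_id, cudnn_version):
    """Return True if `rule` says this engine should be skipped."""
    if rule.get("operation") != operation:
        return False
    if rule.get("engine") != engine_id:
        return False
    # Version bounds are optional; an absent bound matches everything.
    start = rule.get("cudnn_version_start", 0)
    end = rule.get("cudnn_version_end", float("inf"))
    return start <= cudnn_version < end

rule = {
    "rule_id": "avoid_bad_bwd_data",
    "operation": "ConvBwdData",
    "engine": 12,
    "cudnn_version_start": 8000,
    "cudnn_version_end": 9000,
}
assert rule_blocks_engine(rule, "ConvBwdData", 12, 8302)
assert not rule_blocks_engine(rule, "ConvBwdData", 12, 9000)
```

An engine that matches a rule is simply dropped from the candidate list, so execution falls back to the next-best engine instead of crashing.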
CC @ngimel @zasdfgbnm @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73934
Approved by: https://github.com/ngimel
We recently updated `SyncBatchNorm` to support empty input batches.
The new code removes stats from ranks with empty inputs. However,
this change breaks CUDA graph capture as it forces a CPU sync. This
commit uses `is_current_stream_capturing()` to guard the new code
path, and only runs it when not capturing CUDA graphs. To
support empty inputs with CUDA graph capture, we might need to
update the CUDA kernels for `batch_norm_backward_elemt` and
`batch_norm_gather_stats_with_counts`. See #78656.
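The guard pattern can be shown with a minimal stand-alone sketch (plain Python; `is_capturing` here stands in for `torch.cuda.is_current_stream_capturing()`, and `sync_batch_norm_counts` is a hypothetical name for the filtering step, not the actual function):

```python
# Minimal sketch of the capture guard (illustrative only).
def sync_batch_norm_counts(counts, is_capturing):
    """Drop ranks whose input batch was empty, but only when it is safe.

    Filtering requires reading `counts` on the CPU, which forces a
    host/device sync and would break CUDA graph capture, so the new
    code path is skipped while capturing.
    """
    if is_capturing:
        # Graph capture in progress: no CPU sync allowed, keep all ranks.
        return counts
    return [c for c in counts if c > 0]

assert sync_batch_norm_counts([4, 0, 3], is_capturing=False) == [4, 3]
assert sync_batch_norm_counts([4, 0, 3], is_capturing=True) == [4, 0, 3]
```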
Fixes #78549
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78666
Approved by: https://github.com/albanD
Add support for decorating functions with variable length arguments in `quantized_args`. This is needed to decorate functions like `symbolic_fn` in `_interpolate_helper` which takes `*args`.
Previously it was not possible to decorate such functions. Now we can write
```python
@quantized_args(True)
def symbolic_fn(g, input, output_size, *args):
...
```
and the remaining parameters default to non-quantized.
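A simplified model of how such a decorator can extend its argument descriptors over a variable-length tail (an illustrative sketch only, not the actual `torch.onnx` implementation, which also dequantizes inputs and requantizes outputs):

```python
import functools

def quantized_args(*arg_q_descriptors):
    """Mark which positional args are quantized; extra *args default to False.

    Simplified sketch of the decorator's descriptor handling.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(g, *args, **kwargs):
            # Pad the descriptor tuple so a variable-length tail is covered:
            # every argument beyond the declared ones is non-quantized.
            descriptors = arg_q_descriptors + (False,) * (
                len(args) - len(arg_q_descriptors)
            )
            wrapper.last_descriptors = descriptors  # exposed for inspection
            return fn(g, *args, **kwargs)
        return wrapper
    return decorator

@quantized_args(True)
def symbolic_fn(g, input, output_size, *args):
    return (input, output_size, args)

symbolic_fn("g", "x", (2, 2), "extra1", "extra2")
assert symbolic_fn.last_descriptors == (True, False, False, False)
```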
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78775
Approved by: https://github.com/garymm
The test was marked as `skip` due to a memory leak. It turns out the memory leak is expected: it can be fixed by clearing the compilation unit (with `torch.jit._state._python_cu.drop_all_functions()` at the end of the test function) or by disabling the leak detector on this test.
Fixes #77618
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78566
Approved by: https://github.com/eellison
Fixes#78236
An erroneously shaped weights vector now results in the following output:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/datarwe/pytorch/torch/utils/data/sampler.py in <module>
    274 WeightedRandomSampler([1,2,3], 10)
--> 275 WeightedRandomSampler([[1,2,3], [4,5,6]], 10)

~/datarwe/pytorch/torch/utils/data/sampler.py in __init__(self, weights, num_samples, replacement, generator)
    192         weights = torch.as_tensor(weights, dtype=torch.double)
    193         if len(weights.shape) != 1:
--> 194             raise ValueError("weights should be a 1d sequence but given "
    195                              "weights have shape {}".format(tuple(weights.shape)))

ValueError: weights should be a 1d sequence but given weights have shape (2, 3)
```
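The check boils down to rejecting any input that is not a flat 1-d sequence. A torch-free sketch of the same validation (illustrative only; the real code converts with `torch.as_tensor` and inspects `weights.shape`):

```python
def validate_weights(weights):
    """Raise ValueError unless `weights` is a flat 1-d sequence of numbers.

    Sketch of the shape check added to WeightedRandomSampler.__init__.
    """
    if any(isinstance(w, (list, tuple)) for w in weights):
        # Nested sequences mean the input is at least 2-d; report its shape.
        shape = (len(weights), len(weights[0]))
        raise ValueError(
            "weights should be a 1d sequence but given "
            "weights have shape {}".format(shape)
        )
    return list(weights)

assert validate_weights([1, 2, 3]) == [1, 2, 3]
try:
    validate_weights([[1, 2, 3], [4, 5, 6]])
except ValueError as e:
    assert "1d sequence" in str(e)
```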
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78585
Approved by: https://github.com/NivekT, https://github.com/ejguan
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78297
A clone following expand/expand_as crashes due to the memory-overlap check in the native copy_ method. Refer to T118519310 for more details.
Crashing test case:
a = tensor(3, 1)  // strides = (1, 1)
b = tensor(3, 2)  // strides = (2, 1)
temp = a.expand_as(b)  // creates temp with shape (3, 2) and strides (1, 0)
temp.clone()  // crashes on copy_ due to memory overlap
Fix: Disable the out variant for the expanded tensor.
- Calls native clone instead of out variant for clone dealing with expanded tensors
- Added test case for both clone variants (out and native clones)
- Increased the tensor size for memory planner test case to trigger dynamic allocation
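The overlap comes from how expand broadcasts a size-1 dimension by giving it stride 0, so distinct logical elements alias the same storage. A small torch-free sketch of that stride computation (illustrative only; real tensors do this inside `Tensor.expand_as`):

```python
def expand_strides(shape, strides, target_shape):
    """Compute strides after broadcasting `shape` to `target_shape`.

    A dimension of size 1 is expanded by setting its stride to 0, so
    many logical elements alias the same memory location.
    """
    out = []
    for size, stride, target in zip(shape, strides, target_shape):
        out.append(0 if size == 1 and target != 1 else stride)
    return tuple(out)

# a has shape (3, 1) and strides (1, 1); expanding to (3, 2) zeroes dim 1:
assert expand_strides((3, 1), (1, 1), (3, 2)) == (1, 0)
# Stride 0 means elements (i, 0) and (i, 1) share memory. That is the
# overlap copy_ rejects, hence the fallback to native clone here.
```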
Test Plan:
buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Differential Revision: D36672180
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78322
Approved by: https://github.com/mikeiovine
Fixes various friction points in the documentation for onboarding new users and removes instructions that are no longer valid.
Changes include:
- Listing prerequisites earlier, so that devs can ensure they're met before encountering error messages
- Removing linter invocations that are no longer valid
- Modifying the mkl package installation instructions to apply only to x86-based CPUs
[skip ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78682
Approved by: https://github.com/seemethere, https://github.com/janeyx99, https://github.com/malfet
Summary:
Heavily referenced the Hardswish implementation.
This is a great intro task to get a taste of how a torch method is implemented in shader and tested.
Test Plan:
Compared the Metal shader version against the CPU version results in tests.
https://pxl.cl/251kT
Reviewed By: SS-JIA
Differential Revision: D36732187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78544
Approved by: https://github.com/SS-JIA