pytorch/third_party
David Berard c19d19f6ff [profiler] support cuLaunchKernel (for triton kernel launches) & update kineto submodule (#99571)
**Background**: Prior to this PR, traces for PT2 w/ inductor don't contain connections between CUDA kernels and the CPU launch site. This PR adds those connections.

**Details**: Triton kernels launched by inductor use cuLaunchKernel instead of cudaLaunchKernel. cuLaunchKernel is part of the driver API, while cudaLaunchKernel is part of the runtime API. To support cuLaunchKernel, we added support in kineto (pytorch/kineto#752) for listening to driver events as well, which is why the kineto submodule needs to be updated.

After the change in kineto, we just need to turn this on in the PyTorch repo by adding the CUDA_DRIVER activity type to the CPU and CUDA activity-type lists.

**Testing**: Added `test/inductor/test_profiler.py` to check for `cuLaunchKernel` in json trace files.

Also, I ran this test:

```python
import torch

x = torch.rand((2, 2), device='cuda')

def fn(x):
    return x.relu()

fn_c = torch.compile(fn)
fn_c(x)

with torch.profiler.profile(with_stack=True) as prof:
    fn_c(x)

prof.export_chrome_trace("relu_profile.json")
```

which generated this chrome trace:
<img width="930" alt="Screenshot 2023-04-18 at 2 58 25 PM" src="https://user-images.githubusercontent.com/5067123/232966895-b65f9daf-7645-44f8-9e2b-f8c11c86ef0a.png">

in which you can see flows between the `cuLaunchKernel` call on the CPU side and the Triton kernel on the GPU.
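The kind of check the new test performs can also be done by hand on an exported trace. Below is a minimal, hypothetical sketch (the helper name `trace_has_event` is made up; it only assumes the standard chrome-trace layout that `export_chrome_trace` produces, with events under a `"traceEvents"` key):

```python
import json

def trace_has_event(trace, event_name):
    """Return True if any event in a parsed chrome trace contains event_name.

    Traces written by prof.export_chrome_trace() keep their events under
    the "traceEvents" key; each event carries a "name" field.
    """
    return any(event_name in ev.get("name", "")
               for ev in trace.get("traceEvents", []))

# With a real trace file, e.g. the one exported above:
#   with open("relu_profile.json") as f:
#       assert trace_has_event(json.load(f), "cuLaunchKernel")

# Self-contained demo on a trace-shaped dict (event names are illustrative):
sample = {"traceEvents": [
    {"name": "cuLaunchKernel", "ph": "X"},
    {"name": "triton_poi_fused_relu_0", "ph": "X"},
]}
print(trace_has_event(sample, "cuLaunchKernel"))  # → True
```

Before this PR, a driver-API launch like the one above would simply be absent from the trace, so a check like this would fail for inductor-compiled code.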

**Kineto Updates**: To get the kineto-side changes required for CUPTI driver events, this PR updates the kineto pin. The updated kineto submodule also includes:
* JSON string sanitizing for event names (likely fix for #99572)
* cuda initialization fixes for multiprocessing
* cuLaunchKernel events (i.e. for this PR)
* DISABLE_CUPTI_LAZY_REINIT (from @aaronenyeshi)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99571
Approved by: https://github.com/ngimel, https://github.com/aaronenyeshi
2023-04-20 18:34:41 +00:00
benchmark@0d98dba29d
cpuinfo@8ec7bd91ad
cub@d106ddb991
cudnn_frontend@81a041a682 [Inductor] Fix OpenMP discovery on MacOS (#93895) 2023-02-03 09:13:13 +00:00
cutlass@b72cbf957d Revert "Update Cutlass to v2.11 (#94188)" 2023-02-13 19:03:36 +00:00
eigen@3147391d94
fbgemm@e07dda2d50 Update FBGEMM submodule (#99315) 2023-04-17 18:44:56 +00:00
flatbuffers@d0cede9c90
fmt@a33701196a [submodule] update libfmt to tag 9.1.0 (#93219) 2023-02-08 17:21:39 +00:00
foxi@c278588e34
FP16@4dfe081cf6
FXdiv@b408327ac2
gemmlowp
gloo@10909297fe [PTD][Oncall] Sync Reorder structure for compatibility with linux-6.0 and gloo submodule for PT (#92568) 2023-01-19 00:01:59 +00:00
googletest@e2239ee604
ideep@fe83782496 Fix ideep submodule (#98305) 2023-04-04 19:54:14 +00:00
ios-cmake@8abaed637d
ittapi@5b8a7d7422
kineto@21beef3787 [profiler] support cuLaunchKernel (for triton kernel launches) & update kineto submodule (#99571) 2023-04-20 18:34:41 +00:00
miniz-2.1.0 [caffe2] miniz fix -Wstrict-prototypes (#98027) 2023-04-03 16:56:47 +00:00
nccl Updates NCCL to 2.17.1 (#97843) 2023-04-17 22:53:54 +00:00
neon2sse@97a126f08c
nlohmann@87cda1d664
NNPACK@c07e3a0400
nvfuser Revert "[cuda rng] Making offset calculation independent of device properties (#98988)" 2023-04-19 17:23:40 +00:00
onnx@389b6bcb05 Update ONNX submodule from ONNX 1.13.1 with Protobuf 4.21 updates (#96138) 2023-03-28 16:55:10 +00:00
onnx-tensorrt@c153211418
pocketfft@ea778e3771
protobuf@d1eca4e4b4
psimd@072586a71b
pthreadpool@a134dd5d4c
pybind11@80dc998efc Revert submodule updates introduced by #89157 (#89449) 2022-11-22 05:48:43 +00:00
python-enum@4cfedc426c
python-peachpy@f45429b087
python-six@15e31431af
QNNPACK@7d2a4e9931
sleef@e0a003ee83
tbb@a51a90bc60
tensorflow_cuda_bazel_build/cuda
tensorpipe@52791a2fd2
valgrind-headers
VulkanMemoryAllocator@a6bfc23725
XNNPACK@51a987591a Revert "Update xnnpack to the latest commit (#95884)" 2023-03-15 05:32:36 +00:00
zstd@aec56a52fb
BUCK.oss
BUILD
build_bundled.py
cuda.BUILD
cudnn.BUILD
cutlass.BUILD
eigen.BUILD
fmt.BUILD
foxi.BUILD
generate-cpuinfo-wrappers.py
generate-xnnpack-wrappers.py update xnnpack to newer version and update API usage in pytorch (#94330) 2023-02-11 08:59:35 +00:00
glog.buck.bzl
gloo.BUILD [bazel] Fix gloo.BUILD (#92858) 2023-01-31 00:22:28 +00:00
ideep.BUILD
kineto.buck.bzl
kineto.BUILD
LICENSES_BUNDLED.txt Rebuild LICENSES_BUNDLED.txt (#95505) 2023-02-24 21:24:05 +00:00
METADATA.bzl
mkl-dnn.BUILD Fix oneDNN double checkout issue and Upgrade oneDNN to v2.7.3 (#92239) 2023-01-17 01:54:21 +00:00
mkl.BUILD Use @pytorch// in bazel build files (#89660) 2022-12-22 05:14:55 +00:00
mkl_headers.BUILD
onnx.BUILD Bump to stable ONNX 1.13.0 (#90332) 2023-02-08 11:49:06 +00:00
README.md
sleef.BUILD Use @pytorch// in bazel build files (#89660) 2022-12-22 05:14:55 +00:00
sleef.bzl
substitution.bzl
tbb.BUILD Use @pytorch// in bazel build files (#89660) 2022-12-22 05:14:55 +00:00
tbb.patch
tensorpipe.BUILD Use @pytorch// in bazel build files (#89660) 2022-12-22 05:14:55 +00:00
xnnpack.buck.bzl [XNNPACK] Enable S8 Operators (#97386) 2023-03-31 04:32:29 +00:00
xnnpack_src_defs.bzl [PyTorch][XNNPACK] Update wrappers for internal only x86 SSE2 kernels (#96896) 2023-03-17 03:07:39 +00:00
xnnpack_wrapper_defs.bzl [PyTorch][XNNPACK] Update wrappers for internal only x86 SSE2 kernels (#96896) 2023-03-17 03:07:39 +00:00

This folder contains vendored copies of third-party libraries that we use.