pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

drisspg 5dc9128229 FP8 rowwise scaling (#125204)	2024-06-05 15:46:40 +00:00

# Summary

This pull request introduces an fp8 row-scaling kernel as an optional implementation for `scaled_mm`. The kernel selection is based on the scaling tensors of the inputs. For inputs `x` and `y` of shape `[M, K]` and `[K, N]` respectively, the following conditions must be met:

- `x`'s scale should be a 1-dimensional tensor of length `M`.
- `y`'s scale should be a 1-dimensional tensor of length `N`.

It's important to note that this kernel is not called "rowwise, columnwise" scaling because, although the scales for `y` are semantically along its columns, this implementation only supports the TN format. This means the scaling is along the faster-moving dimension, or the "row".

The following two PRs were required to enable local builds:

- [PR #126185](https://github.com/pytorch/pytorch/pull/126185)
- [PR #125523](https://github.com/pytorch/pytorch/pull/125523)

### Todo

We still do not build our Python wheels with this architecture.

@ptrblck @malfet, should we replace `sm_90` with `sm_90a`?

The NVRTC TMA shadowing feels wrong, but I am not sure of the right way to spoof the symbol for this compilation unit: https://github.com/pytorch/pytorch/pull/125204/files#r1586986954

#### ifdef

I tried to use `#if !defined(USE_ROCM) && defined(CUDA_VERSION) && CUDA_VERSION >= 12000 && \ defined(__CUDA_ARCH__) && __CUDA_ARCH__ > 900` to gate the building of the kernel. I was having a hell of a time with this, so I am not really sure of the right way to do this.

Kernel Credit: @jwfromm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125204
Approved by: https://github.com/lw, https://github.com/malfet
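The scale-shape conditions listed in the commit message can be illustrated with a small reference computation. This is a plain-Python sketch, not the CUTLASS kernel and not the `scaled_mm` API: the function name `rowwise_scaled_mm_reference` is hypothetical, and the convention that each scale multiplies its row of `x` or column of `y` is an assumption made for illustration.

```python
def rowwise_scaled_mm_reference(x, y, x_scale, y_scale):
    """Compute out[i][j] = x_scale[i] * y_scale[j] * sum_k x[i][k] * y[k][j].

    x is an [M, K] nested list, y is a [K, N] nested list.
    x_scale must have length M and y_scale must have length N.
    Hypothetical helper; the multiply-by-scale convention is an assumption.
    """
    m, k = len(x), len(x[0])
    n = len(y[0])
    if len(y) != k:
        raise ValueError("inner dimensions of x and y must match")
    # The two shape conditions from the commit message.
    if len(x_scale) != m:
        raise ValueError("x_scale must be 1-D with length M")
    if len(y_scale) != n:
        raise ValueError("y_scale must be 1-D with length N")

    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(x[i][p] * y[p][j] for p in range(k))
            # One scale per row of x, one scale per column of y.
            out[i][j] = x_scale[i] * y_scale[j] * acc
    return out
```

Because each output element is scaled by exactly one entry from each scale vector, the scaling can be applied after the accumulation over `K`, which is what makes per-row scales cheap to fuse into a matmul epilogue.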
benchmark@0d98dba29d
cpp-httplib@3b6597bba9	[distributed] Add cpp-httplib to pytorch (#126470)	2024-05-17 19:45:08 +00:00
cpuinfo@d6860c477c
cudnn_frontend@b740542818	[BE][Ez]: Update cudnn_frontend submodule to v1.4.0 (#127175)	2024-05-29 14:23:38 +00:00
cutlass@bbe579a9e3
eigen@3147391d94
fbgemm@dbc3157bf2
flatbuffers@01834de25e
fmt@e69e5f977d
foxi@c278588e34
FP16@4dfe081cf6
FXdiv@b408327ac2
gemmlowp
gloo@5354032ea0
googletest@e2239ee604
ideep@55ca019168	[Reopen] Upgrade submodule oneDNN to v3.4.2 (#126137)	2024-05-16 12:00:16 +00:00
ittapi@5b8a7d7422
kineto@be1317644c	update kineto submodule hash (#126780)	2024-05-27 18:11:48 +00:00
mimalloc@b66e3214d8
miniz-2.1.0	Reland add `write_record_metadata` to PyTorchFileWriter (#126087)	2024-05-14 21:48:44 +00:00
nccl	Update NCCL submodule to v2.20.5 (#121635)	2024-03-11 17:23:59 +00:00
nlohmann@87cda1d664
NNPACK@c07e3a0400
onnx@990217f043	update submodule onnx==1.16.0 (#123125)	2024-04-02 20:41:22 +00:00
opentelemetry-cpp@a799f4aed9	[rfc] opentelemetry in pytorch (#122999)	2024-04-21 15:20:21 +00:00
pocketfft@9d3ab05a7f
protobuf@d1eca4e4b4
psimd@072586a71b
pthreadpool@4fe0e1e183
pybind11@3e9dfa2866	Upgrade submodule pybind to 2.12.0 (#122899)	2024-03-31 11:29:40 +00:00
python-peachpy@f45429b087
sleef@60e76d2bce	Enable x86 CPU vectorization on windows [submodule sleef] (#118980)	2024-03-31 03:07:32 +00:00
tensorflow_cuda_bazel_build/cuda
tensorpipe@52791a2fd2
valgrind-headers
VulkanMemoryAllocator@a6bfc23725
XNNPACK@fcbf55af6c
BUCK.oss	[fbcode] remove xcode_public_headers_symlinks (#125966)	2024-05-13 15:06:35 +00:00
BUILD
build_bundled.py	[rfc] opentelemetry in pytorch (#122999)	2024-04-21 15:20:21 +00:00
cpp-httplib.BUILD	Reapply "distributed debug handlers (#126601)" (#127805)	2024-06-04 19:44:30 +00:00
cuda.BUILD
cudnn.BUILD
cudnn_frontend.BUILD
cutlass.BUILD	FP8 rowwise scaling (#125204)	2024-06-05 15:46:40 +00:00
eigen.BUILD
fmt.BUILD
foxi.BUILD
generate-cpuinfo-wrappers.py
generate-xnnpack-wrappers.py
glog.buck.bzl
gloo.BUILD
ideep.BUILD
kineto.buck.bzl
kineto.BUILD
LICENSES_BUNDLED.txt	[rfc] opentelemetry in pytorch (#122999)	2024-04-21 15:20:21 +00:00
METADATA.bzl
mkl-dnn.BUILD	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)	2024-05-31 01:20:45 +00:00
mkl.BUILD	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)	2024-05-31 01:20:45 +00:00
mkl_headers.BUILD
onnx.BUILD
opentelemetry-cpp.BUILD	[rfc] opentelemetry in pytorch (#122999)	2024-04-21 15:20:21 +00:00
README.md
sleef.BUILD	Enable x86 CPU vectorization on windows [submodule sleef] (#118980)	2024-03-31 03:07:32 +00:00
sleef.bzl
substitution.bzl
tensorpipe.BUILD
xnnpack.buck.bzl
xnnpack_src_defs.bzl
xnnpack_wrapper_defs.bzl
xpu.txt	Update torch-xpu-ops pin (ATen XPU implementation) (#127879)	2024-06-05 02:13:46 +00:00

README.md

This folder contains vendored copies of third-party libraries that we use.