pytorch/c10
Peter Bell 15b61d6c1a TensorImpl: Lazily compute numel and contiguity when symbolic (#112785)
Currently, whenever the sizes or strides of a `TensorImpl` are modified, we
eagerly recompute the numel and memory format flags. This is fine for static
shapes, since it's all fast C++ code, but for symbolic shapes it runs slow Python code.

This change instead makes the `SymbolicShapeMeta` object compute the derived
quantities lazily, at the first request. This has the added benefit that we can
now pass assumptions in `empty_tensor_restride`, which removes the need to compute
some contiguity flags at all.
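
The lazy scheme described above can be illustrated with a minimal sketch. It is not the actual `SymbolicShapeMeta` code: the class `LazyShapeMeta`, its methods, and the `computations()` counter are hypothetical names invented for illustration, and plain `int64_t` values stand in for symbolic sizes. It assumes C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shape-metadata object. This is an
// illustration only, not the real c10::SymbolicShapeMeta API.
class LazyShapeMeta {
 public:
  LazyShapeMeta(std::vector<int64_t> sizes, std::vector<int64_t> strides)
      : sizes_(std::move(sizes)), strides_(std::move(strides)) {}

  // Modifying sizes/strides only invalidates the caches; nothing is
  // recomputed until somebody asks for a derived quantity.
  void set_sizes_and_strides(std::vector<int64_t> sizes,
                             std::vector<int64_t> strides) {
    sizes_ = std::move(sizes);
    strides_ = std::move(strides);
    numel_.reset();
    is_contiguous_.reset();
  }

  // The caller supplies an assumption, so contiguity never has to be derived.
  void assume_contiguous() { is_contiguous_ = true; }

  // Computed on first request, then served from the cache.
  int64_t numel() const {
    if (!numel_) {
      ++computations_;
      int64_t n = 1;
      for (int64_t s : sizes_) n *= s;
      numel_ = n;
    }
    return *numel_;
  }

  bool is_contiguous() const {
    if (!is_contiguous_) {
      ++computations_;
      is_contiguous_ = compute_contiguous();
    }
    return *is_contiguous_;
  }

  // Number of times a derived quantity was actually computed.
  int computations() const { return computations_; }

 private:
  // Simplified row-major contiguity check.
  bool compute_contiguous() const {
    for (int64_t s : sizes_) {
      if (s == 0) return true;  // empty tensors count as contiguous
    }
    int64_t expected = 1;
    for (std::size_t i = sizes_.size(); i-- > 0;) {
      if (sizes_[i] == 1) continue;  // size-1 dims do not constrain strides
      if (strides_[i] != expected) return false;
      expected *= sizes_[i];
    }
    return true;
  }

  std::vector<int64_t> sizes_;
  std::vector<int64_t> strides_;
  mutable std::optional<int64_t> numel_;
  mutable std::optional<bool> is_contiguous_;
  mutable int computations_ = 0;
};
```

In this sketch, constructing the object or calling `set_sizes_and_strides` does no derived work, and calling `assume_contiguous()` right after a restride lets a later `is_contiguous()` return without running the check at all.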

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112785
Approved by: https://github.com/ezyang
ghstack dependencies: #112689, #112890
2023-11-09 01:36:37 +00:00
Name | Last commit | Date
benchmark | |
core | TensorImpl: Lazily compute numel and contiguity when symbolic (#112785) | 2023-11-09 01:36:37 +00:00
cuda | c10::DriverAPI Try opening libcuda.so.1 (#112996) | 2023-11-05 23:20:22 +00:00
hip | [ROCm] remove HCC references (#111975) | 2023-10-26 02:39:10 +00:00
macros | Fix undefined __assert_fail on FreeBSD (#111761) | 2023-10-23 12:46:03 +00:00
mobile | |
test | Reland: Add lazy_clone_storage to create COW storages (#111579) | 2023-10-20 15:49:59 +00:00
util | Remove c10::variant (#112725) | 2023-11-03 18:31:58 +00:00
BUCK.oss | Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)" | 2023-10-13 17:57:53 +00:00
BUILD.bazel | |
build.bzl | Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)" | 2023-10-13 17:57:53 +00:00
CMakeLists.txt | Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)" | 2023-10-13 17:57:53 +00:00
ovrsource_defs.bzl | [Reland]Use cpuinfo to determine c10::ThreadPool thread number (#107339) | 2023-09-14 23:44:23 +00:00