mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-14 20:57:59 +00:00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51050

Subclasses want to be able to make storage() calls throw, so we find some free space in TensorImpl to add a flag that they can set to make that happen without making storage() virtual. It should still be inlineable.

ghstack-source-id: 121819684

Test Plan:
Compared `perf stat` on 1M iterations on AdIndexer benchmark before/after

Before:
```
      74,483.15 msec task-clock      #    0.999 CPUs utilized     ( +-  0.14% )
         16,637      context-switches #    0.223 K/sec             ( +- 11.97% )
              3      cpu-migrations   #    0.000 K/sec             ( +-  7.20% )
        107,085      page-faults      #    0.001 M/sec             ( +-  2.39% )
147,356,440,831      cycles           #    1.978 GHz               ( +-  0.14% )  (50.06%)
278,678,430,378      instructions     #    1.89  insn per cycle    ( +-  0.01% )  (50.05%)
 43,540,698,177      branches         #  584.571 M/sec             ( +-  0.01% )  (50.05%)
    141,028,843      branch-misses    #    0.32% of all branches   ( +-  1.00% )  (50.05%)
```

After:
```
      74,178.77 msec task-clock      #    0.999 CPUs utilized     ( +-  0.31% )
         17,125      context-switches #    0.231 K/sec             ( +-  3.41% )
              3      cpu-migrations   #    0.000 K/sec
        109,535      page-faults      #    0.001 M/sec             ( +-  1.04% )
146,803,364,372      cycles           #    1.979 GHz               ( +-  0.30% )  (50.03%)
277,726,600,254      instructions     #    1.89  insn per cycle    ( +-  0.02% )  (50.03%)
 43,299,659,815      branches         #  583.720 M/sec             ( +-  0.03% )  (50.03%)
    130,504,094      branch-misses    #    0.30% of all branches   ( +-  1.14% )  (50.03%)
```

Looks like approximately 0.3% instruction count win (and similarly for cycles, but that's within noise).

Reviewed By: ezyang

Differential Revision: D26013815

fbshipit-source-id: 07939957929070e18b9981d492d8279c9bb33c55
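The idea in the commit summary (a flag checked inside a non-virtual `storage()` instead of a virtual override) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual TensorImpl code: `Storage`, `ImplBase`, `PlainImpl`, `NoStorageImpl`, `storage_throws`, and the flag and helper names are all hypothetical stand-ins.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a storage handle; not the real c10::Storage.
struct Storage {
  int id = 0;
};

// Hypothetical base class sketching the "flag instead of virtual" idea.
class ImplBase {
 public:
  // Non-virtual, so the call can still be inlined; the added cost on the
  // common path is a single branch on a member flag.
  const Storage& storage() const {
    if (storage_access_should_throw_) {
      throw_storage_access_error();
    }
    return storage_;
  }

 protected:
  // Subclasses that have no meaningful storage opt in, typically from
  // their constructor.
  void set_storage_access_should_throw() {
    storage_access_should_throw_ = true;
  }

 private:
  // Separate [[noreturn]] helper so the throw is off the common path.
  [[noreturn]] static void throw_storage_access_error() {
    throw std::runtime_error("storage() is not available for this type");
  }

  Storage storage_;
  bool storage_access_should_throw_ = false;
};

// A subclass that leaves the flag alone: storage() works as usual.
class PlainImpl final : public ImplBase {};

// A subclass that sets the flag: every storage() call throws.
class NoStorageImpl final : public ImplBase {
 public:
  NoStorageImpl() { set_storage_access_should_throw(); }
};

// Small helper for demonstration: reports whether storage() throws.
bool storage_throws(const ImplBase& impl) {
  try {
    (void)impl.storage();
    return false;
  } catch (const std::runtime_error&) {
    return true;
  }
}
```

Calling `storage_throws` on a `PlainImpl` should yield `false` and on a `NoStorageImpl` should yield `true`. The trade-off sketched here is one extra byte-sized member and a branch in exchange for avoiding a vtable dispatch on every `storage()` call.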
33 lines
905 B
C++
#pragma once

#include <c10/core/TensorImpl.h>

namespace c10 {

struct C10_API UndefinedTensorImpl final : public TensorImpl {
 public:
  // Without this, we get:
  //  error: identifier "at::UndefinedTensorImpl::_singleton" is undefined in device code
  // (ostensibly because the constexpr tricks MSVC into trying to compile this
  // function for device as well).
#ifdef _WIN32
  static inline TensorImpl * singleton() {
#else
  static constexpr inline TensorImpl * singleton() {
#endif
    return &_singleton;
  }
  IntArrayRef strides() const override;
  int64_t size(int64_t d) const override;
  int64_t stride(int64_t d) const override;
#ifdef DEBUG
  bool has_storage() const override;
#endif
  void set_storage_offset(int64_t offset) override;
 private:
  UndefinedTensorImpl();
  static UndefinedTensorImpl _singleton;
  const char* tensorimpl_type_name() const override;
};

} // namespace c10