Commit graph

41 commits

Author SHA1 Message Date
Edward Yang
fe88806784 Back out "Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count" (#37893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37893

Original commit changeset: 50746043acf3

Test Plan: sandcastle and ossci

Reviewed By: malfet, seemethere, ngimel

Differential Revision: D21416509

fbshipit-source-id: 735ec4e61f9d36d4537f52dd2dc6267751aeb94b
2020-05-05 22:43:15 -07:00
Edward Yang
a2fc7f787a Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count
Test Plan: revert-hammer

Differential Revision:
D21171334

Original commit changeset: 37329a379de9

fbshipit-source-id: 50746043acf3c76754688de0fe6f1cc12437ea2f
2020-05-05 16:36:15 -07:00
Kurt Mohler
3706803b60 Change StorageImpl to track byte count rather than element count (#37776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776

* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API

Part of issue https://github.com/pytorch/pytorch/issues/33950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028

Differential Revision: D21171334

Pulled By: ezyang

fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
2020-05-05 14:20:51 -07:00
Edward Yang
a894fff265 Back out "Revert D21089648: Put TORCH_LIBRARY in torch/library.h; add custom class API"
Summary: Original commit changeset: 636e8a11afc6

Test Plan: export to OSS

Reviewed By: malfet

Differential Revision: D21170502

fbshipit-source-id: e8f35f103c4924aedbcaaf868475008d24bdeeab
2020-04-22 09:18:23 -07:00
James Reed
2ccdc39dce Revert D21089648: Put TORCH_LIBRARY in torch/library.h; add custom class API
Test Plan: revert-hammer

Differential Revision:
D21089648

Original commit changeset: 8d54329c1252

fbshipit-source-id: 636e8a11afc628a4cdae9d44824985c10c70555e
2020-04-21 12:21:45 -07:00
Edward Yang
01100cb477 Put TORCH_LIBRARY in torch/library.h; add custom class API (#36742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742

Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before.  Instead of

```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```

you write

```
TORCH_LIBRARY(Namespace, m) {
  m.class_<Class>("Class")
    .def("foo", foo);
}
```

All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.

custom class API previously lived in torch/ folder and in torch
namespace, so for consistency, the new TORCH_LIBRARY also got
moved to torch/library.h The definition of Library::class_ is in the
bottom of that header because I need all of the class_ constructors
available, but there is a circular dependency between the two headers.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D21089648

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507
2020-04-21 10:05:21 -07:00
Edward Yang
e29348f828 Switch to pybind11 style registration function API. (#36258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36258

Previous we had a && chaining style API.  There are some downsides to
this API:

- It's easy to forget the 'static' qualifier in front, leading to
  subtle ODR bugs.
- It is not compatible with torchbind class_ definitions, as these
  need multiple levels of chaining.  So in practice people end
  up having to define multiple static initializers, one per class.
- It's not like pybind11.
- There's no way to conveniently get the file and line number of
  the registration, as there is no macro point in the API.
- The old API doesn't really encourage people to put all of their
  definitions for a library in one place, and to give a custom
  namespace for it.  Similarly, the old API wasn't very DRY, because
  you had to keep repeating the namespace/dispatch key you
  were writing implementations for.

The new API is modeled exactly off of the PYBIND11_MODULE macro:
you write:

```
TORCH_LIBRARY(aten, m) {
  m.def("aten::add(Tensor self, Tensor other) -> Tensor");
  ...
}
```

in a non-chaining fashion, and under the hood the macro expands to
define a function, and define a static initializer that allocates
c10::Library (previously called c10::Module, but we renamed it
to avoid confusion with the existing NN module concept), passes
it to your function, and then retains it for the rest of the lifetime
of the program.  Specification of the namespace is mandatory,
and in later commit I plan to make it a hard error to TORCH_LIBRARY
the same library name twice.

If you are specifying an implementation for an existing operator
(e.g., you're the XLA backend, or even if you're just putting
registrations for implementations at the implementation site),
you should use TORCH_LIBRARY_IMPL, which instead takes a backend
argument (instead of namespace) and can be used to specify an
implementation for a backend.  Unlike TORCH_LIBRARY, you can do
as many of these as you want for a backend.

This needs updates to the mobile code analyzer.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929257

Pulled By: ezyang

fbshipit-source-id: ba04d78492e8c93ae7190165fb936f6872896ada
2020-04-16 10:44:21 -07:00
Edward Yang
dd64e738c5 Expunge TensorId from all DispatchKey names. (#36240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36240

It's annoying, historical, and unnecessary (enum class is already
namespaced).  I did this codemod with:

```
git grep -l 'CPUTensorId' | xargs sed -i 's/CPUTensorId/CPU/g'
git grep -l 'CUDATensorId' | xargs sed -i 's/CUDATensorId/CUDA/g'
git grep -l 'VariableTensorId' | xargs sed -i 's/VariableTensorId/Autograd/g'
git grep -l 'HIPTensorId' | xargs sed -i 's/HIPTensorId/HIP/g'
git grep -l 'MSNPUTensorId' | xargs sed -i 's/MSNPUTensorId/MSNPU/g'
git grep -l 'XLATensorId' | xargs sed -i 's/XLATensorId/XLA/g'
git grep -l 'PrivateUse1_TensorId' | xargs sed -i 's/PrivateUse1_TensorId/PrivateUse1/g'
git grep -l 'PrivateUse2_TensorId' | xargs sed -i 's/PrivateUse2_TensorId/PrivateUse2/g'
git grep -l 'PrivateUse3_TensorId' | xargs sed -i 's/PrivateUse3_TensorId/PrivateUse3/g'
git grep -l 'AutocastTensorId' | xargs sed -i 's/AutocastTensorId/Autocast/g'
git grep -l '_PreAutogradTensorId' | xargs sed -i 's/_PreAutogradTensorId/_PreAutograd/g'
git grep -l 'TESTING_ONLY_GenericWrapperTensorId' | xargs sed -i 's/TESTING_ONLY_GenericWrapperTensorId/TESTING_ONLY_GenericWrapper/g'
git grep -l 'TESTING_ONLY_GenericModeTensorId' | xargs sed -i 's/TESTING_ONLY_GenericModeTensorId/TESTING_ONLY_GenericMode/g'
```

Then I did a git grep for remaining TensorId occurrences, and manually
killed those (mostly in codegen, and some docs that needed updating).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929255

Pulled By: ezyang

fbshipit-source-id: dc371b6aa6e6ea7c0a5660137c14debde806a09d
2020-04-13 23:33:44 -07:00
Edward Yang
2de3e491a8 [RELAND] Add temporary impl_UNBOXED syntax sugar for unboxed-only defs. (#36223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36223

Previously #35714

There are a lot of unboxed only defs.  We're committed to removing
them at the end of the half but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove.  This is a new overload impl_UNBOXED that will pass
the function pointer straight to CppFunction::makeUnboxedOnly

I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).

To get some coverage of _UNBOXED API for code analysis, I switched
one of our unboxed tests to be an impl rather than a def.  This
shouldn't materially affect coverage.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929259

Pulled By: ezyang

fbshipit-source-id: 72d2061b6c8a6afbcd392b47f53ade18de2f9184
2020-04-09 14:58:33 -07:00
Edward Yang
ef07bb65e9 [RELAND] Add DispatchKey impl overload; remove use of torch::dispatch (#36222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36222

Reland of #35706, with fixes to code analyzer.

It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case.  I then delete most uses of torch::dispatch

dispatch_autograd call sites can't make use of this overload.  So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).

I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.

Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
  a leading underscore and made them private, just to make it more
  clear what the public versus private API were (the private API
  shouldn't be used by users because it doesn't come with && overloads)
  Note that this means I needed to adjust the regex in the
  code analyzer, because
- In a few places where I was touching lines already, I replaced
  full DispatchKey typed out enums with shorter kFoo names, similar
  to kAutograd but I didn't publish these globally.
- Code analyzer now prints a unified diff, and in the other order
  (because I tend to think of the diff as reporting how the /new/ result
  is different)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20929256

Pulled By: ezyang

fbshipit-source-id: c69b803d2b3a1a8aff70e14da33d3adec5239f13
2020-04-09 14:56:55 -07:00
Karl Ostmo
0f99b28431 Revert D20775783: Add DispatchKey impl overload; remove use of torch::dispatch
Test Plan: revert-hammer

Differential Revision:
D20775783

Original commit changeset: e45b289e5d1f

fbshipit-source-id: 08551428fa886e93cfda14eb51a0f920c335df34
2020-04-02 10:51:50 -07:00
Karl Ostmo
e67951af63 Revert D20775782: Add temporary impl_UNBOXED syntax sugar for unboxed-only defs.
Test Plan: revert-hammer

Differential Revision:
D20775782

Original commit changeset: c5e804c69f59

fbshipit-source-id: 2198e715eb3a24d198a949a44ec192bec523ffb4
2020-04-02 10:48:51 -07:00
Edward Yang
8e951c5793 Add temporary impl_UNBOXED syntax sugar for unboxed-only defs. (#35714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35714

There are a lot of unboxed only defs.  We're committed to removing
them at the end of the half but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove.  This is a new overload impl_UNBOXED that will pass
the function pointer straight to CppFunction::makeUnboxedOnly

I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20775782

Pulled By: ezyang

fbshipit-source-id: c5e804c69f5961c9d4862f6c5dbbe4c524cc32cc
2020-04-02 08:52:54 -07:00
Edward Yang
2db61193bb Add DispatchKey impl overload; remove use of torch::dispatch (#35706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35706

It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case.  I then delete most uses of torch::dispatch

dispatch_autograd call sites can't make use of this overload.  So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).

I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.

Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
  a leading underscore and made them private, just to make it more
  clear what the public versus private API were (the private API
  shouldn't be used by users because it doesn't come with && overloads)
- In a few places where I was touching lines already, I replaced
  full DispatchKey typed out enums with shorter kFoo names, similar
  to kAutograd but I didn't publish these globally.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20775783

Pulled By: ezyang

fbshipit-source-id: e45b289e5d1f86c180b24cf14c63cf4459ab5337
2020-04-02 08:51:22 -07:00
Edward Yang
4f4ed5c108 Disable c10::import(ns) (#35398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35398

This disables namespaced c10::import which is broken with custom
mobile op builds.  This is to help prevent people from accidentally
breaking the custom mobile build in a mysterious way; if they use
the longform version it will work.  Fixing the analyzer is tracked
in https://github.com/pytorch/pytorch/issues/35397

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20680519

Pulled By: ezyang

fbshipit-source-id: a18ac8df7e72bf399807870beedb828131273e48
2020-03-30 21:12:49 -07:00
Edward Yang
9e3605de98 [RELAND] New operator registration API (#35061) (#35629)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/35061 ; removed
the get qualified type name magic from debug strings to work around
MSVC 2017 bug.

Main points of the new API:

- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
  initializers run, you end up with the same end result.

op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface

How does this implementation proceed?  The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.

- DispatchKeyExtractor has an uninitialized state where it doesn't look
  for dispatch keys in any arguments of the stack.  It can have a
  schema (de)registered to itself post facto with
  registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
  the uninitialized state.  It can have a schema (de)registered to itself
  post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs and well as defs_and_impls.
  defs_and_impls keeps track of the outstanding impl registrations; you
  may have impl registrations but no defs.  If there are no defs (no
  schema), the operator is not returned by findSchema.  A new
  findOperatorByName fucntion unconditionally returns the OperatorHandle
  even if there's no schema.  OperatorHandle::hasSchema can be used
  to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
  interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
  'registerDef' to only return a RegistrationHandleRAII.  This is marginally
  less efficient (since we're doing two hash table lookups on a registration
  now), but this won't matter in the long term, and probably doesn't
  matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
  a bunch of places where we're improperly directly interfacing with Dispatcher;
  we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
  API.  This includes VariableType registrations (which previously
  weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
  rather than indirecting through the old API
- We deleted alias analysis kind merging entirely.  As a nod to BC, it's
  possible to define a full schema with alias analysis kind, and then
  later do another full schema def with missing alias analysis kind, but
  the opposite direction is not allowed.  We can remove this entirely
  following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
  be able to immediately schema match at the point of an impl() (because
  we don't have the schema yet).  To do this, we store the inferred
  function schema inside a KernelEntry, so we can check it when we get
  the real schema.
- Registered kernel functions now store a debug string which
  can be used to more easily identify them.  Tests use this to
  distinguish between multiple distinct registrations; regular
  invocations get only very basic information.

Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.

The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
  so that we can easily write a more complex testing harness
  using expect tests.
- For series of registrations we want to test, exhaustively
  test every possible permutation of registrations (and
  deregistrations), and show that the intermediate states
  agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
  debugging method that prints the internal state of the
  dispatcher.  This method may be generally useful for people
  who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
  checks that the internal invariants of the dispatcher are
  upheld (so we don't have to print internal implementation
  details of the dispatcher)

The testing framework found a few bugs in development.  For example,
here is a case where we registered schema too early, before checking
if it was valid:

```
Traceback (most recent call last):
  File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
    ], raises=True)
  File "test/test_dispatch.py", line 135, in commute
    results=results, raises=raises)
  File "test/test_dispatch.py", line 83, in run_permutation
    .format(ctor_order[:i], op_ix))
  File "test/test_dispatch.py", line 59, in check_invariants
    .format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
  name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
  catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
 : expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```

There are also C++ smoketests for the API.  These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)

Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:

- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
  is a dict.  The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
  CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
  stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
  and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
  We now do namespacing by post facto fixing up the OperatorName
  embedded in FunctionSchema.  This also means that you can
  now do torch::import("ns1").def("ns2::blah") and have the ns2
  override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
  annotation kinds.  This meant we had to template up some function
  signatures which previously took const char*.  There's now a nice
  comment explaining this strategy.
- torch::import now takes std::string which means we can use
  the namespacing from Python

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35629

Differential Revision: D20724551

Pulled By: ezyang

fbshipit-source-id: befa46a1affb4ec4ae1fb39e3564a63695a6ca41
2020-03-29 19:48:29 -07:00
Edward Yang
227beb9095 Revert D20680520: New operator registration API
Test Plan: revert-hammer

Differential Revision:
D20680520

Original commit changeset: 5d39a28e4ec7

fbshipit-source-id: 5b2497ffc24db9a05b01d526f161bc0164f9f707
2020-03-28 14:49:56 -07:00
Edward Yang
28ab8c6ff8 New operator registration API (#35061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35061

Main points of the new API:

- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
  initializers run, you end up with the same end result.

op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface

How does this implementation proceed?  The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.

- DispatchKeyExtractor has an uninitialized state where it doesn't look
  for dispatch keys in any arguments of the stack.  It can have a
  schema (de)registered to itself post facto with
  registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
  the uninitialized state.  It can have a schema (de)registered to itself
  post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs and well as defs_and_impls.
  defs_and_impls keeps track of the outstanding impl registrations; you
  may have impl registrations but no defs.  If there are no defs (no
  schema), the operator is not returned by findSchema.  A new
  findOperatorByName fucntion unconditionally returns the OperatorHandle
  even if there's no schema.  OperatorHandle::hasSchema can be used
  to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
  interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
  'registerDef' to only return a RegistrationHandleRAII.  This is marginally
  less efficient (since we're doing two hash table lookups on a registration
  now), but this won't matter in the long term, and probably doesn't
  matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
  a bunch of places where we're improperly directly interfacing with Dispatcher;
  we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
  API.  This includes VariableType registrations (which previously
  weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
  rather than indirecting through the old API
- We deleted alias analysis kind merging entirely.  As a nod to BC, it's
  possible to define a full schema with alias analysis kind, and then
  later do another full schema def with missing alias analysis kind, but
  the opposite direction is not allowed.  We can remove this entirely
  following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
  be able to immediately schema match at the point of an impl() (because
  we don't have the schema yet).  To do this, we store the inferred
  function schema inside a KernelEntry, so we can check it when we get
  the real schema.
- Registered kernel functions now store a debug string which
  can be used to more easily identify them.  There's some best
  effort stuff based on __FUNCSIG__ but this is only really
  capable of reporting types and not function symbols.  Tests
  use this to distinguish between multiple distinct registrations.

Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.

The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
  so that we can easily write a more complex testing harness
  using expect tests.
- For series of registrations we want to test, exhaustively
  test every possible permutation of registrations (and
  deregistrations), and show that the intermediate states
  agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
  debugging method that prints the internal state of the
  dispatcher.  This method may be generally useful for people
  who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
  checks that the internal invariants of the dispatcher are
  upheld (so we don't have to print internal implementation
  details of the dispatcher)

The testing framework found a few bugs in development.  For example,
here is a case where we registered schema too early, before checking
if it was valid:

```
Traceback (most recent call last):
  File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
    ], raises=True)
  File "test/test_dispatch.py", line 135, in commute
    results=results, raises=raises)
  File "test/test_dispatch.py", line 83, in run_permutation
    .format(ctor_order[:i], op_ix))
  File "test/test_dispatch.py", line 59, in check_invariants
    .format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
  name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
  catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
 : expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```

There are also C++ smoketests for the API.  These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)

Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:

- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
  is a dict.  The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
  CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
  stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
  and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
  We now do namespacing by post facto fixing up the OperatorName
  embedded in FunctionSchema.  This also means that you can
  now do torch::import("ns1").def("ns2::blah") and have the ns2
  override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
  annotation kinds.  This meant we had to template up some function
  signatures which previously took const char*.  There's now a nice
  comment explaining this strategy.
- torch::import now takes std::string which means we can use
  the namespacing from Python

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20680520

Pulled By: ezyang

fbshipit-source-id: 5d39a28e4ec7c73fe4b1fb2222e865ab65e188f5
2020-03-28 10:52:49 -07:00
Edward Yang
22963f42ec Delete unnecessary aliasAnalysis specification from operator registrations. (#33093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33093

In #30187 the aliasAnalysis field on operator registration was updated
so that alias analysis could be specified in only some registration call
sites, rather than requiring it be consistently specified in all call
sites.  With this change, we can eliminate the requirement that all
registrations specify aliasAnalysis; as long as we know *one* site
specifies the correct aliasAnalysis, we don't have to specify it
any of the other sites.

In this patch, the "one site" is TypeDefault.cpp (previously we only
generated these stub declarations for manually registered functions,
but now we generate the stubs for everything).  Then I delete aliasAnalysis
anywhere we register an op for an existing function (which is a lot
of places).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19837897

Pulled By: ezyang

fbshipit-source-id: 26a7fbc809ec1553da89ea5c0361f3e81526d4c2
2020-02-21 14:24:44 -08:00
Pavel Belevich
62b06b9fae Rename TensorTypeId to DispatchKey (#32154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32154

TensorTypeId -> DispatchKey
	c10/core/TensorTypeId.h -> c10/core/DispatchKey.h
	c10/core/TensorTypeId.cpp -> c10/core/DispatchKey.cpp
	TensorTypeId::* -> DispatchKey::*
	TensorTypeId type_id -> DispatchKey dispatch_key
		type_id -> dispatch_key
	TensorTypeId::NumTensorIds -> DispatchKey::NumDispatchKeys
	RealTensorTypeId -> RealDispatchKey
TensorTypeSet -> DispatchKeySet
	TensorTypeIds -> DispatchKeys
	c10/core/TensorTypeSet.h -> c10/core/DispatchKeySet.h
	c10/core/TensorTypeSet.cpp -> c10/core/DispatchKeySet.cpp
	type_set() -> key_set()
	type_set_ -> key_set_
	typeSet -> keySet
ExcludeTensorTypeIdGuard -> ExcludeDispatchKeyGuard
IncludeTensorTypeIdGuard -> IncludeDispatchKeyGuard
LocalTensorTypeSet -> LocalDispatchKeySet
	c10/core/impl/LocalTensorTypeSet.h -> c10/core/impl/LocalDispatchKeySet.h
	c10/core/impl/LocalTensorTypeSet.cpp -> c10/core/impl/LocalDispatchKeySet.cpp
	tls_local_tensor_type_set -> tls_local_dispatch_key_set
	tls_is_tensor_type_id_excluded -> tls_is_dispatch_key_excluded
	tls_set_tensor_type_id_excluded -> tls_set_dispatch_key_excluded
	tls_is_tensor_type_id_included -> tls_is_dispatch_key_included
	tls_set_tensor_type_id_included -> tls_set_dispatch_key_included
MultiDispatchTensorTypeSet -> MultiDispatchKeySet
	multi_dispatch_tensor_type_set -> multi_dispatch_key_set
tensorTypeIdToBackend -> dispatchKeyToBackend
backendToTensorTypeId -> backendToDispatchKey
initForTensorTypeSet -> initForDispatchKeySet
inferred_type_set -> inferred_key_set
computeTensorTypeId -> computeDispatchKey
PODLocalTensorTypeSet raw_local_tensor_type_set -> PODLocalDispatchKeySet raw_local_dispatch_key_set
get_default_tensor_type_id -> get_default_dispatch_key
inferred_type_id -> inferred_dispatch_key
actual_type_id -> actual_dispatch_key
typeSetToDispatchKey_ -> dispatchKeySetToDispatchKey_
get_type_id() -> get_dispatch_key()
legacyExtractTypeId -> legacyExtractDispatchKey
extractTypeId -> extractDispatchKey

Test Plan: Imported from OSS

Differential Revision: D19398900

Pulled By: pbelevich

fbshipit-source-id: 234ad19f93d33e00201b61e153b740a339035776
2020-01-15 11:16:08 -08:00
Sebastian Messmer
8321f2592e Register ATen ops with c10 (#26131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131

Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet, the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for about ~70% of operators
- This also changes the c10->jit operator export because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops because they would clash
- For this, we need a way to recognize if a certain operator is already moved from ATen to c10, this is done by generating a OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.

Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have different argument order in C++ as in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely with undefined tensor sometimes being undefined tensor and sometimes being None.
- fixed-size arrays like `int[3]` not supported in c10 yet

These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748

Test Plan: a diff stacked on top uses these registrations to call these ops from ATen

Differential Revision: D16603131

fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb
2019-09-13 13:52:40 -07:00
Mike Ruberry
a024e1e091 Creates Torch-friendly Event class and adds Stream tracking to autograd (#25130)
Summary:
Resubmission of https://github.com/pytorch/pytorch/issues/23424 because previous PR was borked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25130

Test Plan: Two tests were added to cuda_stream_test for this functionality.

Differential Revision: D17145538

Pulled By: mruberry

fbshipit-source-id: 2546c5907c038412e03aa0d3328a972b0164c455
2019-09-01 12:37:52 -07:00
Edward Yang
58a0dee749 Replace open registration TensorTypeId with closed enum. (#25252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25252

Our model going forward for extensions will be that you will have to
get an allocation of an ID in our system.  This is how things work
in practice today; we're just simplifying our underlying registration
since there is no need to have distributed registration.

There are some codemods in this diff:

```
codemod --extensions cpp,h,cc,cuh,py,in --exclude-paths=c10/core/TensorTypeId.h '([A-Za-z]+?)TensorId\(\)' 'TensorTypeId::\1TensorId'
codemod --extensions cpp,h,cc,cuh,py,in 'TensorTypeIds::undefined\(\)' 'TensorTypeId::UndefinedTensorId'
codemod --extensions cpp 'TensorType1\(\)' 'TensorTypeId::CPUTensorId'
codemod --extensions cpp 'TensorType2\(\)' 'TensorTypeId::CUDATensorId'
codemod --extensions cpp 'TensorType3\(\)' 'TensorTypeId::XLATensorId'
codemod --extensions cpp 'TensorType1' 'CPUTensorId'
codemod --extensions cpp 'TensorType2' 'CUDATensorId'
codemod --extensions cpp 'TensorType3' 'XLATensorId'
```

The main hand-written changes are in c10/core/TensorTypeId.h

Other manual fixes:

- aten/src/ATen/core/op_registration/op_registration.cpp - stop using
  std::string operator+
- aten/src/ATen/function_wrapper.py - handle a hardcoded TypeId() that
  wasn't caught by codemod
- torch/csrc/tensor/python_tensor.h - fix now incorrect forward declaration
  of TensorTypeId
- aten/src/ATen/core/op_registration/ - remove out-of-line registration

Differential Revision: D17072001

Test Plan: ossci and sandcastle

Pulled By: ezyang

fbshipit-source-id: c641515fd0604c045c54fbb1d6b1b950f45e89d1
2019-08-29 08:55:58 -07:00
Edward Yang
529bb859b2 Revert D17052534: [pytorch][PR] Creates Torch-friendly Event class and adds Stream tracking to autograd
Test Plan: revert-hammer

Differential Revision:
D17052534

Original commit changeset: d91b308ad0f7

fbshipit-source-id: dacc7e70a835a8fa6ae71246999b4eff3383f3f3
2019-08-28 08:24:43 -07:00
Ailing Zhang
858493d168 generic overrideable convolution for backends (#23562)
Summary:
One possible solution based on our discussion yesterday: ezyang gchanan zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23562

Differential Revision: D16998161

Pulled By: ailzhang

fbshipit-source-id: 07fe3a335f43b4205a421b3521aeb5fa4dc80279
2019-08-27 18:33:21 -07:00
Mike Ruberry
433fe47d95 Creates Torch-friendly Event class and adds Stream tracking to autograd (#25130)
Summary:
Resubmission of https://github.com/pytorch/pytorch/issues/23424 because previous PR was borked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25130

Differential Revision: D17052534

Pulled By: mruberry

fbshipit-source-id: d91b308ad0f730646bb7b3492a601cd9b05c72d8
2019-08-26 15:19:06 -07:00
Sebastian Messmer
02f794b102 Add overload names to native_functions.yaml (#23532)
Summary:
We need this to be able to register them with the c10 dispatcher.

The overload names are based on one-letter-per-argument-type.

Script used to change native_functions.yaml and derivatives.yaml: P75630718

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23532
ghstack-source-id: 87539687

Differential Revision: D16553437

fbshipit-source-id: a1d0f10c42d284eba07e2a40641f71baa4f82ecf
2019-08-01 02:08:37 -07:00
Roy Li
0a04513367 Remove old Type based backend extensions (#22009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22009
ghimport-source-id: e481b64707434a1abdc382fd80bd70f165540711

Test Plan: Imported from OSS

Differential Revision: D15914755

Pulled By: li-roy

fbshipit-source-id: 9230b8b234f71a5d865bf6bca93347c68c349ff7
2019-07-30 14:07:46 -07:00
Brian Vaughan
97a604ef57 Rereapply optional ScalarType interface changes that were reverted in D16079809 (#22456)
Summary:
re-apply changes reverted in:
https://github.com/pytorch/pytorch/pull/22412

Also change log_softmax to take positional arguments. Long-term we do want the kwarg-only interface, but seems to currently be incompatible with jit serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22456

Differential Revision: D16097159

Pulled By: nairbv

fbshipit-source-id: 8cb73e9ca18fc66b35b873cf4a574b167a578b3d
2019-07-03 20:03:25 -07:00
Wanchao Liang
dff2c07183 Manual revert of D16012838
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22412

Reviewed By: nairbv, houseroad

Differential Revision: D16079809

fbshipit-source-id: ee0d805ff7a2bc5f98bcc65f90b8199751c840f6
2019-07-01 19:58:21 -07:00
Brian Vaughan
7707dee761 Re apply optional ScalarType changes (#22237)
Summary:
This is (mostly) the re-application of:
https://github.com/pytorch/pytorch/pull/21088

which was reverted due to an issue conflicting with changes in:
https://github.com/pytorch/pytorch/pull/22104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22237

Differential Revision: D16012838

Pulled By: nairbv

fbshipit-source-id: 35f4a73c97ab68b4e2648aca96b2176f07b5a883
2019-06-26 13:36:25 -07:00
Michael Suo
e016a424ef Revert D15944971: [pytorch][PR] merge interfaces that have an optional scalartype parameter
Differential Revision:
D15944971

Original commit changeset: 53473c370813

fbshipit-source-id: a18158b448cb8993b12e1a3bf2c2a3e0d6df6b10
2019-06-24 09:41:33 -07:00
Brian Vaughan
142361a7e4 merge interfaces that have an optional scalartype parameter (#21088)
Summary:
This change is backwards incompatible in *C++ only* on mean(), sum(), and prod() interfaces that accepted either of:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
but now to specify both the dim and dtype will require the keepdim parameter:
```
Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const;
```

[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088

Reviewed By: ailzhang

Differential Revision: D15944971

Pulled By: nairbv

fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74
2019-06-24 07:17:58 -07:00
Will Feng
8cde4c4d22 Remove Variable::Impl and DifferentiableViewImpl (#17072)
Summary:
As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove `Variable.data()` API
3. Add `Variable.variable_data()` that matches `tensor.data` in Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.

After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as its `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.

**Note that this PR is BC-breaking in the following use cases:**

**Use Case 1:**
Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type.

**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))

grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad)  # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072

Differential Revision: D14075257

Pulled By: yf225

fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
2019-05-23 21:09:04 -07:00
Will Feng
c7b5a8a876 Change is_variable() to check existence of AutogradMeta, and remove is_variable_ (#19139)
Summary:
Currently, a TensorImpl's `is_variable_` is true if and only if the TensorImpl has AutogradMeta. This PR unifies these two concepts by removing `is_variable_` and change `is_variable()` to check existence of AutogradMeta instead.

Removing `is_variable_` is part of the work in Variable/Tensor merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19139

Differential Revision: D14893339

Pulled By: yf225

fbshipit-source-id: ceb5e22c3c01f79b5d21d5bdbf4a7d1bc397796a
2019-04-11 14:03:33 -07:00
Edward Yang
515238e0a5 Unify cudaGetDeviceCount implementations. (#18445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18445
ghimport-source-id: 30d018737bf6989bc68b7e3676f44e0ca6141fde

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18445 Unify cudaGetDeviceCount implementations.**

I went about doing this by searching for calls to cudaGetDeviceCount,
and then methodically replacing them with references to c10::cuda::device_count()
or at::cuda::device_count().

There is a point to doing this: the various implementations wildly differed
in their handling of what to do when cudaGetDeviceCount returns an error.
The final standardized behavior is that **all errors are swallowed** and
we return device count of zero.  This indirectly fixes running CUDA builds
on CPU, which was broken in #17847.

I added 'noexcept' to the 'deviceCount' virtual method on DeviceGuardImpl.
This is a BC-breaking change for anyone inheriting from DeviceGuardImpl
but all you need to do is put 'noexcept' on your method and it is backwards
compatible with older libtorch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14612189

fbshipit-source-id: 3c8d186e3dd623c0e27625212c7ce30f75d943cb
2019-03-26 09:50:14 -07:00
Davide Libenzi
272a48f6fe Enable autograd to recognize the XLA backend as one providing multiple devices (#17847)
Summary:
…e devices, while not being CUDA/HIP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17847

Differential Revision: D14545634

Pulled By: ezyang

fbshipit-source-id: 417181bf2ff4f8978544afe2fb6b042e787854ed
2019-03-20 13:58:36 -07:00
Roy Li
340cf2a2dd Generate derived extension backend Type classes for each scalar type (#17278)
Summary:
Previously we only generate one class for each extension backend. This caused issues with scalarType() calls and mapping from variable Types to non-variable types. With this change we generate one Type for each scalar type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17278

Reviewed By: ezyang

Differential Revision: D14161489

Pulled By: li-roy

fbshipit-source-id: 91e6a8f73d19a45946c43153ea1d7bc9d8fb2409
2019-02-22 18:38:33 -08:00
Gregory Chanan
6454e3262d Make getting the dtype of a tensor work for backend extensions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17131

Differential Revision: D14093163

Pulled By: gchanan

fbshipit-source-id: 06638706e26505e3c741b7ae290000ca258599db
2019-02-15 13:47:37 -08:00
Edward Yang
4404762d7d Rename IntList to IntArrayRef. (#16751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751

This was made more complicated by the fact that ivalue::IntList
is a thing.  So I had to fix all of the sites where we referring
to IValue post facto.

The following codemods were run, in this order:

```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```

Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752

Reviewed By: dzhulgakov

Differential Revision: D13954363

fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
2019-02-05 14:54:34 -08:00
Roy Li
4c803f4ebd Expose backend extensions to python
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16582

Reviewed By: gchanan

Differential Revision: D13887539

fbshipit-source-id: 8755babf2e3e849af974655f2f3a91740efe977e
2019-02-01 11:00:18 -08:00