Tests for fallback boxed dispatch (including TLS mode) (#26719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26719
This PR adds a pair of tests for fallback boxed dispatch, exercising two different ways you might use it: (1) to implement a "wrapper" tensor type (e.g., LazyTensor, NestedTensor), and (2) to implement a toggleable "mode" (e.g., Profiling, Tracing). Both are the most trivial possible implementations of their kind: they "wrap" a real tensor and simply forward along to the real implementation. This PR also adds the necessary feature support for toggleable modes, which was part of the original generic dispatch abstraction design but was not previously implemented. I had not originally intended to add this, but it turns out writing a new "mode" is a lot simpler than writing a "wrapper" type, so I ended up writing the mode version first.
General structure of the PR:
* Add two new testing tensor type ids, `TESTING_ONLY_GenericWrapperTensorId` and `TESTING_ONLY_GenericModeTensorId`, which our tests use. They may find use in other tests if necessary.
* Add support for toggling the availability of `TESTING_ONLY_GenericModeTensorId`. This introduces a new thread-local variable, accessible via `tls_local_tensor_type_set()`, which is considered during dispatch.
* The mode fallback is very simple: it increments a counter and then passes on the call to the underlying kernel by invoking the JIT.
* The wrapper fallback is more complex: it parses the arguments, unwrapping any wrapped tensor arguments, then invokes the JIT, and then rewraps the outputs.
The examples here are somewhat simplistic; there are a number of engineering improvements that could be applied. We could save these for later (landing this patch to get immediate testing), or incorporate them into this patch:
* `getOperator` is horrible. Bram Wasti and I discussed a plan for how to make this easier, by simply refactoring the JIT interface.
* `GenericWrapperTensorImpl` doesn't populate all of its fields accurately. Most notably, size is not set up correctly.
* `generic_wrapper_fallback` should handle tensor lists in arguments and returns properly.
One pitfall: fallback dispatch only works with non-c10 code. That's why I test using `batch_norm`.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D17549624
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 57dbdd8d6812a66082aa6db2934c8edcda340ea6
2019-10-09 19:18:55 +00:00
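The mode path described above (a fallback that bumps a counter and forwards to the underlying kernel, gated by a thread-local toggle) can be sketched as a self-contained toy. All names here (`mode_enabled`, `mode_call_count`, `call_add`) are illustrative, not the PR's actual test code, and the real redispatch goes through the dispatcher/JIT rather than a direct call:

```cpp
#include <cassert>

// Thread-local toggle standing in for TESTING_ONLY_GenericModeTensorId
// being present in the thread-local included set.
thread_local bool mode_enabled = false;
thread_local int mode_call_count = 0;

// The "real" kernel.
inline int add_kernel(int a, int b) { return a + b; }

// Trivial mode fallback: count the call, then forward to the underlying
// kernel (the PR's version forwards by invoking the JIT).
inline int mode_fallback(int a, int b) {
  ++mode_call_count;
  return add_kernel(a, b);
}

// Dispatch entry point: consult the TLS mode flag first, the way the
// dispatcher consults tls_local_tensor_type_set().
inline int call_add(int a, int b) {
  return mode_enabled ? mode_fallback(a, b) : add_kernel(a, b);
}
```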
#pragma once

#include <c10/core/DispatchKeySet.h>
#include <c10/macros/Export.h>

// TLS management for DispatchKeySet (the "local" DispatchKeySet(s))
//
// This manages two thread-local DispatchKeySets:
//
// - The included type set, which adds a tensor type for consideration
//   in dispatch. (For example, you might add Profiling to
//   the included type set to turn on profiling on all tensor operations.)
//
// - The excluded type set, which disqualifies a tensor type from dispatch.
//   (For example, after redispatching on variable, we disqualify
//   Autograd so we don't attempt to handle variable again.)
//   (Exclusion wins over inclusion.)
//
// NB: Originally, I implemented the excluded type set as storing the inverted
// set, but TLS is defined to be zero-initialized, so this doesn't actually work
// (if it's inverted, you want the set to be -1 initialized).

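A minimal sketch of how the two TLS sets could combine during dispatch, using plain bitmasks rather than the real `DispatchKeySet` type. The name `dispatch_candidates` and the exact combination rule are illustrative assumptions, not the dispatcher's actual code, but they capture the "exclusion wins over inclusion" rule stated above:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bitmask model of key sets; not the real DispatchKeySet.
using KeyMask = uint64_t;

// Candidate keys = (keys from tensor args | TLS included) & ~TLS excluded,
// so a key in the excluded set loses even if it is also included.
KeyMask dispatch_candidates(KeyMask tensor_keys,
                            KeyMask tls_included,
                            KeyMask tls_excluded) {
  return (tensor_keys | tls_included) & ~tls_excluded;
}
```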
namespace c10::impl {

// POD version of LocalDispatchKeySet. Declared here just so that
// we can put it in the guards.
// This struct encapsulates special handling for TLS initialization
// in set_included()/included() API so that they reflect the truth.
// If you want to create PODLocalDispatchKeySet with non-zero state,
// use set_included() instead of default constructor.
struct C10_API PODLocalDispatchKeySet {
  uint64_t included_;
  uint64_t excluded_;

  // See Note [TLS Initialization]
  DispatchKeySet included() const {
    return DispatchKeySet(DispatchKeySet::RAW, included_) ^
        c10::default_included_set;
  }
  DispatchKeySet excluded() const {
    return DispatchKeySet(DispatchKeySet::RAW, excluded_) ^
        c10::default_excluded_set;
  }
|
|
|
|
|
|
2020-01-15 19:12:17 +00:00
|
|
|
void set_included(DispatchKeySet x) {
|
2021-03-31 17:46:38 +00:00
|
|
|
included_ = (x ^ c10::default_included_set).raw_repr();
|
Tests for fallback boxed dispatch (including TLS mode) (#26719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26719
This PR adds a pair of tests for fallback boxed dispatch, exercising two different ways you might use it: (1) to implement a "wrapper" tensor type (e.g., LazyTensor, NestedTensor), and (2) to implement a toggleable "mode" (e.g., Profiling, Tracing). Both implement the most trivial possible implementations of their type: they "wrap" a real tensor simply forward along to the real implementation. This PR also adds the necessary feature support for toggleable mode, which is in the original generic dispatch abstraction design, but was not previously implemented. I had not originally intended to add this, but it turns out writing a new "mode" is a lot simpler than writing a "wrapper" type, so I ended up writing the mode version first.
General structure of the PR:
* Add two new testing tensor type ids, `TESTING_ONLY_GenericWrapperTensorId` and `TESTING_ONLY_GenericModeTensorId`, which our tests use. They might find other use in other tests if necessary.
* Add support for toggling the availability of `TESTING_ONLY_GenericModeTensorId`. Introduces a new thread local variable accessible by `tls_local_tensor_type_set()` which is considered as part of dispatch.
* The mode fallback is very simple: it increments a counter and then passes on the call to the underlying kernel by invoking the JIT.
* The wrapper fallback is more complex: it parses the arguments, unwrapping any wrapped tensor arguments, then invokes the JIT, and then rewraps the outputs.
The examples here are somewhat simplistic; there are a number of engineering improvements that could be applied. We could save these for later (landing this patch to get immediate testing), or incorporate them into this patch:
* `getOperator` is horrible. Bram Wasti and I discussed a plan for how to make this easier, by simply refactoring the JIT interface.
* `GenericWrapperTensorImpl` doesn't populate all of its fields accurately. Most notably, size is not setup correctly.
* `generic_wrapper_fallback` should handle tensor lists in arguments and returns properly.
One pitfall: fallback dispatch only works with non-c10 code. That's why I test using `batch_norm`.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D17549624
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 57dbdd8d6812a66082aa6db2934c8edcda340ea6
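The "mode" fallback described in the summary (toggle a thread-local flag, bump a counter, forward to the real kernel) can be sketched in standalone form. This is a minimal analog, not the real dispatcher code; all names here are hypothetical stand-ins:

```cpp
#include <cassert>

// Analog of TESTING_ONLY_GenericModeTensorId being present in the
// thread-local included set.
thread_local bool mode_enabled = false;
// Counter bumped by the mode fallback, as in the test described above.
thread_local int mode_call_count = 0;

// The "real" underlying kernel.
int add_kernel(int a, int b) {
  return a + b;
}

// Mode fallback: when the mode is toggled on, count the call, then pass
// the call through to the underlying kernel unchanged.
int dispatch_add(int a, int b) {
  if (mode_enabled) {
    ++mode_call_count;
  }
  return add_kernel(a, b);
}
```

Because the mode never rewrites arguments or results, toggling it on and off changes only the counter, which is what makes it so much simpler than the wrapper variant.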
  }

  void set_excluded(DispatchKeySet x) {
    excluded_ = (x ^ c10::default_excluded_set).raw_repr();
  }
};

static_assert(
    std::is_trivial_v<PODLocalDispatchKeySet>,
    "PODLocalDispatchKeySet must be a POD type.");
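The `static_assert` above guards a real property: a trivial type can be constant-initialized as a `thread_local`, so reads need no per-access initialization guard. A standalone sketch with hypothetical types shows how easily triviality is lost:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Mirrors the shape of the POD TLS state; trivially constructible.
struct TrivialPod {
  uint64_t included;
  uint64_t excluded;
};

// One default member initializer is enough to make the default
// constructor non-trivial, and with it the whole type.
struct NotTrivial {
  uint64_t included = 0;
  uint64_t excluded = 0;
};

constexpr bool trivial_ok = std::is_trivial_v<TrivialPod>;
constexpr bool not_trivial = !std::is_trivial_v<NotTrivial>;
static_assert(trivial_ok, "TrivialPod must be a POD type.");
static_assert(not_trivial, "a default member initializer breaks triviality");
```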

struct C10_API LocalDispatchKeySet {
  /* implicit */ LocalDispatchKeySet(PODLocalDispatchKeySet x)
      : included_(x.included()), excluded_(x.excluded()) {}
  DispatchKeySet included_;
  DispatchKeySet excluded_;
};

// thread_local variables cannot be C10_API on Windows.
// Inlining this seems to break AutoDispatchBelowAutograd on Android.
#if defined(_MSC_VER) || defined(C10_ANDROID) || defined(C10_IPHONE)
C10_API LocalDispatchKeySet tls_local_dispatch_key_set();
#else // defined(_MSC_VER) || defined(C10_ANDROID) || defined(C10_IPHONE)
extern C10_API thread_local PODLocalDispatchKeySet raw_local_dispatch_key_set;

inline C10_API LocalDispatchKeySet tls_local_dispatch_key_set() {
  // Don't let people fiddle with the thread_local directly just
  // because they include this header.
  return raw_local_dispatch_key_set;
}
#endif // defined(_MSC_VER) || defined(C10_ANDROID) || defined(C10_IPHONE)
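The pattern above, a raw POD `thread_local` hidden behind a small accessor, can be sketched in standalone form. The names below (`PodState`, `tls_get_state`, `force_tls_state`) are assumptions for illustration, not the real c10 symbols:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Trivial POD thread-local state, as in PODLocalDispatchKeySet.
struct PodState {
  uint64_t included;
  uint64_t excluded;
};
static_assert(std::is_trivial_v<PodState>, "PodState must stay trivial");

namespace detail {
// The raw thread_local is an implementation detail; includers should not
// touch it directly.
thread_local PodState raw_state{0, 0};
} // namespace detail

// Read-only snapshot; returning by value keeps the raw TLS unexposed.
inline PodState tls_get_state() {
  return detail::raw_state;
}

// Analog of _force_tls_local_dispatch_key_set: wholesale overwrite,
// intended for save/restore machinery rather than everyday use.
inline void force_tls_state(PodState s) {
  detail::raw_state = s;
}
```

Routing all access through the accessor also gives one place to add the platform `#ifdef`s, exactly as the header does for Windows and mobile.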

// Internal, use ThreadLocalStateGuard
C10_API void _force_tls_local_dispatch_key_set(LocalDispatchKeySet key_set);

// RAII API for manipulating the thread-local dispatch state.

class C10_API IncludeDispatchKeyGuard {
 public:
  IncludeDispatchKeyGuard(DispatchKeySet);
  IncludeDispatchKeyGuard(DispatchKey k)
      : IncludeDispatchKeyGuard(DispatchKeySet(k)) {}
  IncludeDispatchKeyGuard(const IncludeDispatchKeyGuard&) = delete;
  IncludeDispatchKeyGuard operator=(const IncludeDispatchKeyGuard&) = delete;
  IncludeDispatchKeyGuard(IncludeDispatchKeyGuard&&) = delete;
  IncludeDispatchKeyGuard operator=(IncludeDispatchKeyGuard&&) = delete;
  ~IncludeDispatchKeyGuard();

 private:
  // A little micro-optimization to save us from tls_get_addr call
  // on destruction
  PODLocalDispatchKeySet* tls_;
  DispatchKeySet include_;
};

class C10_API ExcludeDispatchKeyGuard {
 public:
  ExcludeDispatchKeyGuard(DispatchKeySet);
  ExcludeDispatchKeyGuard(DispatchKey k)
      : ExcludeDispatchKeyGuard(DispatchKeySet(k)) {}
  ExcludeDispatchKeyGuard(const ExcludeDispatchKeyGuard&) = delete;
  ExcludeDispatchKeyGuard operator=(const ExcludeDispatchKeyGuard&) = delete;
  ExcludeDispatchKeyGuard(ExcludeDispatchKeyGuard&&) = delete;
  ExcludeDispatchKeyGuard operator=(ExcludeDispatchKeyGuard&&) = delete;
  ~ExcludeDispatchKeyGuard();

 private:
  // A little micro-optimization to save us from tls_get_addr call
  // on destruction
  PODLocalDispatchKeySet* tls_;
  DispatchKeySet exclude_;
};

struct C10_API ForceDispatchKeyGuard {
 public:
  ForceDispatchKeyGuard()
      : saved_keyset_(c10::impl::tls_local_dispatch_key_set()) {}
  ForceDispatchKeyGuard(c10::impl::LocalDispatchKeySet key_set)
      : ForceDispatchKeyGuard() {
    c10::impl::_force_tls_local_dispatch_key_set(key_set);
  }
  ForceDispatchKeyGuard(
      c10::DispatchKeySet include,
      c10::DispatchKeySet exclude)
      : ForceDispatchKeyGuard() {
    auto updated_set = saved_keyset_;
    updated_set.included_ = include;
    updated_set.excluded_ = exclude;
    c10::impl::_force_tls_local_dispatch_key_set(updated_set);
  }

  ForceDispatchKeyGuard(ForceDispatchKeyGuard&&) noexcept = delete;
  ForceDispatchKeyGuard(const ForceDispatchKeyGuard&) = delete;
  ForceDispatchKeyGuard& operator=(const ForceDispatchKeyGuard&) = delete;
  ForceDispatchKeyGuard& operator=(ForceDispatchKeyGuard&&) = delete;

  ~ForceDispatchKeyGuard() {
    c10::impl::_force_tls_local_dispatch_key_set(saved_keyset_);
  }

 private:
  c10::impl::LocalDispatchKeySet saved_keyset_;
};

// Non-RAII API for manipulating the thread-local dispatch state.
// Please prefer the RAII API. The non-RAII API may be useful when
// the included/excluded state of a given DispatchKey must span
// many calls from the Python to the C++, so you cannot conveniently
// use an RAII guard.
//
// Example use case: a Python context manager that includes a certain
// DispatchKey, to ensure ops running under the context manager dispatch
// through that DispatchKey's registered overrides.
//
// The non-RAII API is less efficient than the RAII guards because both the
// getter and setter will do a tls_getaddr lookup (the RAII struct only needs
// one!)

C10_API bool tls_is_dispatch_key_excluded(DispatchKey x);
C10_API void tls_set_dispatch_key_excluded(DispatchKey x, bool desired_state);
C10_API bool tls_is_dispatch_key_included(DispatchKey x);
C10_API void tls_set_dispatch_key_included(DispatchKey x, bool desired_state);
C10_API bool tls_is_dispatch_keyset_excluded(DispatchKeySet ks);
C10_API bool tls_is_dispatch_keyset_included(DispatchKeySet ks);

} // namespace c10::impl