Summary:
- Test cases using DeepAndWideScriptModel() were crashing at random due to a precision issue
- Affected test cases: DeepWide, KWargsAPI_1, KWargsAPI_2, KWargsAPI_Optional, FusionPass
- The failure was not always observed because the model inputs are random (via torch::randn)
- Fix: increase the absolute tolerance for these test cases
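For context, an allclose-style comparison combines a relative and an absolute term, so for outputs near zero the absolute tolerance dominates. A minimal sketch of that check (not the actual test harness) shows why raising `atol` stops spurious failures on small random values:

```python
# Sketch of an allclose-style check: |a - b| <= atol + rtol * |b|.
def close(a, b, rtol=1e-5, atol=1e-8):
    return abs(a - b) <= atol + rtol * abs(b)

# Near zero the relative term vanishes, so only a larger atol accepts
# small fluctuations like these:
assert not close(1e-7, 2e-7, atol=1e-8)
assert close(1e-7, 2e-7, atol=1e-6)
```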
Differential Revision: D37639067
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80935
Approved by: https://github.com/mikeiovine
Summary: Added the third bool argument to the linalg_solve test case to remove the runtime error.
Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: mikeiovine
Differential Revision: D37324419
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79971
Approved by: https://github.com/tenpercent
Summary: This adds the missing JIT prim ops that appear in the non-ads models for the c2->pt migration: aten::cpu, aten::list, aten::numel, aten::__range_length
Test Plan: static runtime unit tests
Differential Revision: D36984960
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79111
Approved by: https://github.com/davidberard98
Summary: The test throws a JIT alias analysis "not supported" error. Disabling it for now.
Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: mikeiovine
Differential Revision: D37056032
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79574
Approved by: https://github.com/mikeiovine
Summary: This adds the PyTorch operators that are currently missing in non-ads models for the c2->pt migration: aten::index_put, aten::item, aten::tensor_split
Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest
Differential Revision: D36984961
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79065
Approved by: https://github.com/davidberard98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78297
A clone following an expand/expand_as fails due to the memory-overlap check in the copy_ native method. Refer to T118519310 for more details.
Crashing test case:
```
a = tensor(3, 1)       # strides = (1, 1)
b = tensor(3, 2)       # strides = (2, 1)
temp = a.expand_as(b)  # creates temp with shape (3, 2) and strides (1, 0)
temp.clone()           # crashes in copy_ due to the memory-overlap check
```
Fix: disable the out variant for expanded tensors.
- Call native clone instead of the out variant for clones dealing with expanded tensors
- Added test cases for both clone variants (out and native)
- Increased the tensor size in the memory planner test case to trigger dynamic allocation
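The stride-0 self-overlap that trips the check can be illustrated with NumPy's analogous `broadcast_to` (a sketch of the memory layout, not the Static Runtime code path):

```python
import numpy as np

# NumPy analogue of the crashing pattern: broadcasting a (3, 1) array to
# (3, 2) gives the expanded dimension stride 0, so both columns alias the
# same storage -- the self-overlap that the copy_ check rejects.
a = np.zeros((3, 1))
b = np.broadcast_to(a, (3, 2))
assert b.shape == (3, 2)
assert b.strides[-1] == 0        # expanded dim revisits the same elements
assert np.shares_memory(a, b)    # b is a view, not a copy
```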
Test Plan:
buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Differential Revision: D36672180
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78322
Approved by: https://github.com/mikeiovine
Summary: Previously the op was auto-generated, but it only covered the pointwise overload of aten::max. This adds support for the reduction overloads, overall and along a dim.
Test Plan: Added a unit test
Differential Revision: D36656378
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78271
Approved by: https://github.com/mikeiovine
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75661
`fast_sigmoid` is a variant of sigmoid in NNC that is implemented in terms of `fast_tanh` (which is a fast rational function approximation).
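For reference, the identity that lets sigmoid be expressed in terms of tanh is sketched below with exact math functions; NNC's `fast_tanh` substitutes a rational approximation for `tanh`:

```python
import math

# sigmoid(x) = (1 + tanh(x / 2)) / 2
def sigmoid_via_tanh(x):
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for v in (-4.0, -0.5, 0.0, 1.25, 3.0):
    assert abs(sigmoid_via_tanh(v) - sigmoid(v)) < 1e-12
```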
ghstack-source-id: 155604086
Reviewed By: navahgar, hlu1
Differential Revision: D35481390
fbshipit-source-id: 1d64b5c375539f3b2461a1f3d9b86cd696eae7a1
(cherry picked from commit 8106c2512b8d7b373cb6545a43c3e8fc04805c4b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76391
I've seen this pattern in many important internal models:
```
x = torch.permute(a, [0, 2, 1])
y = torch.softmax(x, 2)
z = torch.permute(y, [0, 2, 1])
```
This is equivalent to
```
z = torch.softmax(a, 1)
```
The `permute` ops can degrade performance, especially if copy variants are on. Add another pattern to our `EliminateExtraPermuteOpsPass` to handle this.
ghstack-source-id: 155466506
Test Plan: New unit tests
Reviewed By: navahgar, huiguoo
Differential Revision: D35938289
fbshipit-source-id: 398b5528077b0b3f1c6fc5544e483803e96d68e9
(cherry picked from commit d742abd094d1fef23ca6a34703d97a6da2d14bd1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76398
This diff adds the large files that include the newly generated ops from D34913736. Refer to the base diff for more details.
Test Plan: buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: mikeiovine, tenpercent
Differential Revision: D35945633
fbshipit-source-id: 53497bd5c490a57ea1521837762f740deb42bfd8
(cherry picked from commit e0fbdcb0bf09f5c192430f95f450c0a946c80074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75775
fbgemm kernels already implement the fused kernel, no reason not to use it
ghstack-source-id: 155450342
Test Plan: New unit tests
Reviewed By: navahgar
Differential Revision: D35633297
fbshipit-source-id: a744a33a65ce7dbb9ce8900dbe091b6d56dd4e48
(cherry picked from commit b1361b349862715aa17e6318c5e658cd6401a464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75774
`list[0:]` is a no-op. This should really be eliminated on the modeling side; it is implemented as a graph pass for now until we can get the fix into prod models.
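In plain Python the full slice preserves contents but still allocates a fresh copy; in the graphs targeted here that copy is unneeded, which is what makes the pass safe to apply (a small illustration, not the pass itself):

```python
lst = [1, 2, 3]
assert lst[0:] == lst        # same contents...
assert lst[0:] is not lst    # ...but a full slice still allocates a new list
```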
Test Plan: New unit tests
Reviewed By: navahgar
Differential Revision: D35632947
fbshipit-source-id: 0c564193c35039130e99172e0185e124ea24f62d
(cherry picked from commit e01d5273185e39a563c7acb15662d9c1549d4b58)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75776
The output was returned directly instead of a clone, so the output of the relevant op would not be managed.
ghstack-source-id: 154935103
Test Plan: CI
Reviewed By: navahgar
Differential Revision: D35633469
fbshipit-source-id: 7b08b7368e0349a12abf8802a4c625ffecdc5abb
(cherry picked from commit 24bed9ba4da39cff7f3b40f5e49dfded2552b373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75993
Strobelight shows copy_ in embedding_bag taking up a lot of time in adfinder_story_post_ad_session_exit_model 334827604_0
{F723683014}
More details in https://fb.quip.com/MKumAjz1YD4a#temp:C:FPD3e5a0871ae5d481286b511ef7
The last 3 outputs of embedding_bag are unused in the graph: P495814049.
* max_indices output isn't necessary for the main output, so remove it when it's not used in the graph.
* offset2bag is used as an intermediate to calculate the main output, so we don't remove this output even though it's unused in the graph.
* bag_size is used as an intermediate to calculate the main output for MODE_MEAN, so we don't remove this for now.
Test Plan:
`./caffe2/caffe2/fb/predictor/scripts/run_disagg_model_benchmarks.sh 334827604 0 /data/users/ansha/tmp/ads_tail sr_only`
Inputs uploaded to `/mnt/persistent-public/ansha/ads_tail/334827604`
Before:
I0414 10:53:12.261133 1070948 PyTorchPredictorBenchLib.cpp:305] PyTorch run finished. Milliseconds per iter: 0.121318. Iters per second: 8242.78
0.11156 ms. 99.0457%. aten::embedding_bag (52 nodes, out variant)
After:
I0418 13:05:10.837378 2354604 PyTorchPredictorBenchLib.cpp:305] PyTorch run finished. Milliseconds per iter: 0.0881273. Iters per second: 11347.2
0.0789221 ms. 98.7096%. static_runtime::embedding_bag (52 nodes, out variant)
* Ads prod canary:
https://www.internalfb.com/intern/ads/canary/443002539593035806/
* 4M test: `servicelab create cogwheel_pyper_inference_fullsync_ads_inline_cvr_post_imp -a D35726594`
https://www.internalfb.com/intern/servicelab/602875732/
* 4M test: `servicelab create cogwheel_pyper_inference_fullsync_ads_10x_ctr_mbl_feed_non_mimo -a D35726594`
https://www.internalfb.com/intern/servicelab/1002874745/
Reviewed By: mikeiovine
Differential Revision: D35726594
fbshipit-source-id: 3b71a0822657bf7a23ce37ca899baef9997b011a
(cherry picked from commit fd5e3098c047a1e7d4348e1c97341eecb892536e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76111
https://github.com/pytorch/pytorch/pull/68640 broke our build by porting `cat` structured kernels; not sure how CI didn't catch this.
ghstack-source-id: 154335722
Test Plan: CI
Reviewed By: navahgar, ajyu
Differential Revision: D35780296
fbshipit-source-id: 0a262eb06a8d619227e5db10b6a775bf0b2e17c1
(cherry picked from commit aea6fbf9365391011df5211164e3978075d7a5cb)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74966
It's clear that we don't want to manage tensors that escape their scope. Previously, we handled this by checking whether the tensor aliased the graph outputs. But there's actually another way to escape scope: by aliasing the wildcard set. The following graph demonstrates this:
```
def forward(self, cond: bool, a, b):
lst = []
if cond:
res = a + b # res should not be managed!!!
lst.append(res)
return lst
```
The `if cond:` sub-block returns nothing, but `res` escapes the scope through `lst`.
The fix is simple: we simply have to mark values that alias the wildcard set as an `external_alias_` in `ValueGroup`.
This diff also exposed another issue (via unit tests) in `checkOutputTensorMemoryLeaks`: it assumes that, if a node's `Value*` is managed, the underlying `IValue` must be a tensor. But this is not true after the addition of `to_maybe_copy_out`; TMCO does not produce a tensor in its first output slot if it does not copy.
ghstack-source-id: 153288188
Test Plan: New unit tests cover the problematic case
Reviewed By: navahgar
Differential Revision: D35257087
fbshipit-source-id: 853a761dffe51f2c70720759664dd8dfcd56d1d7
(cherry picked from commit 2c7f519354041975f33626eab6b7f16c2494bbf8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74927
The move ctor was broken because `BlockRunner` stores a reference to `values_`. When moving runtime instances, the pointer to the root block would be moved, but the reference inside it would not be updated.
Pass `BlockRunner` a raw pointer to the heap-allocated IValues instead to avoid this issue.
ghstack-source-id: 153168602
Test Plan: New unit test/CI
Reviewed By: navahgar
Differential Revision: D35228467
fbshipit-source-id: 04e198b39f898b82677a0e41e1cdf00c2b0c09f3
(cherry picked from commit 03e2c591ac3a907d68025eae9500ed7226dec17e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74255
This change fixes a bug where `aten::full_like` reused a previously allocated tensor that did not match the requested one when the arguments to `aten::full_like` changed dynamically.
Test Plan: - Enhanced `StaticRuntime.FullLike` to cover the modified code path.
Reviewed By: mikeiovine
Differential Revision: D34863639
fbshipit-source-id: ca6d4ee3c039e263cc3a4f643d949cea59381608
(cherry picked from commit ae7db0af5e7d95d866027abc968afcb162fd2ef8)
Summary:
The implementation of `PackedLinearWeightFp16::apply_dynamic_impl` [here](https://www.internalfb.com/code/fbsource/[b1ef7c31f022]/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp?lines=393) does not handle `relu`. It completely ignores the `ReluFused` boolean template parameter.
At this point, callers of that function handle `relu` explicitly. While the correct thing to do would be to handle the `ReluFused` parameter in that implementation, it is not clear whether that semantics is followed consistently in this code. So we are handling this in SR's out-variant implementation until the owner fixes the issue.
This issue resulted in incorrect results when Static Runtime was enabled for the MRS video model.
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --gtest_filter=StaticRuntime.QuantizedLinearReluDynamicFp16
```
Reviewed By: mikeiovine
Differential Revision: D35366309
fbshipit-source-id: e60126e3590d52681ceaee5583b81c4c0b5404d9
(cherry picked from commit cabeb96a792339e7dbfd16cb51a3ac9039812137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74987
Add specializations to the `prim::If` operator at runtime to save resources when some of the sub-blocks are empty.
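A hypothetical sketch of the idea (not SR's actual code): when the chosen sub-block has no operations, skip all per-branch setup and execution work.

```python
# Skip block setup entirely when the selected sub-block is empty.
def run_if(cond, then_ops, else_ops):
    ops = then_ops if cond else else_ops
    if not ops:          # empty sub-block: nothing to execute or allocate
        return
    for op in ops:
        op()

ran = []
run_if(True, [], [lambda: ran.append("else")])   # empty then-branch: no work
run_if(False, [], [lambda: ran.append("else")])  # non-empty else-branch runs
assert ran == ["else"]
```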
Test Plan:
`buck build //caffe2:torch-cpp-cpu`
`buck test //caffe2/benchmarks/static_runtime/...`
Add unit test:
`buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- StaticRuntime.EmptyIfBlock`
Reviewed By: mikeiovine
Differential Revision: D35262952
fbshipit-source-id: 324f88471f33f035f4d8a9b212716530d8e59df2
(cherry picked from commit 2db1b1a6833b1376fa376f54791effc8e12fb77f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74585
Native static runtime for `aten::reshape_as`
ghstack-source-id: 152340038
Test Plan: New unit test
Reviewed By: hlu1
Differential Revision: D35060895
fbshipit-source-id: c4e6f8a04c7df3821c7e654bfaf584e5a72ea701
(cherry picked from commit 6fa596cd866a024b6653239e0e30ddad42de242f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74562
Add a native implementation for `aten::IntImplicit`, which is similar to `aten::Int` except for a few extra checks it must do
ghstack-source-id: 152340039
Test Plan: New unit tests
Reviewed By: hlu1
Differential Revision: D35052997
fbshipit-source-id: cb2f0faf7c62382e3f13750d8e1280c49c6b9e42
(cherry picked from commit 359c7493f8deaeccebc27e1b6e6e9777850010c1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74481
This diff fixes an interesting performance issue related to `permute_copy`.
We see this pattern frequently:
```
y = torch.permute(x, (0, 2, 1))
z = torch.sum(y, dim=-1)
```
With copy variants off, we get a strided output from `permute`, and we hit this (faster) kernel in `sum`: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/SumKernel.cpp#L589
But with copy variants on, we get a contiguous output from `permute_copy`, which causes us to hit the slower reduction:
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/SumKernel.cpp#L597
But the permute is actually unnecessary, we can just statically turn the graph into this to ensure that the fast kernel is hit with copy variants on:
```
z = torch.sum(x, dim=1)
```
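The rewrite above can be verified numerically; a NumPy sketch with an arbitrary example shape:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((2, 3, 4))

# permute then reduce over the last dim ...
y = np.transpose(x, (0, 2, 1))
z1 = y.sum(axis=-1)

# ... is the same as reducing the original tensor over dim 1.
z2 = x.sum(axis=1)
assert np.allclose(z1, z2)
```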
ghstack-source-id: 152003888
Reviewed By: navahgar
Differential Revision: D34992319
fbshipit-source-id: 0baf493708ee2180c899814a954d220d88ba1d4f
(cherry picked from commit 797b6beb26325c56012e406e14fe211c0b5d744d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73606
The single-output overload of `layer_norm` internally allocates two tensors. As an optimization, we previously added `static_runtime::layer_norm`. This variant of layer norm had two extra outputs to make the memory planner aware of these extra tensors. But these outputs were unused; it's actually better for us to avoid the allocation and associated computations entirely.
ghstack-source-id: 151394116
Test Plan: Existing unit tests
Reviewed By: hlu1
Differential Revision: D34562131
fbshipit-source-id: c6a6560e60db43b0b100aedc54ea4265acb347de
(cherry picked from commit 3bed52b6f688b93b9b032c3d2b4be68d08d8eb76)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73990
This change fixes a bug where `aten::full` reused a previously allocated tensor that did not match the requested one when the arguments to `aten::full` changed dynamically.
The same fix is applied to multiple other out variant wrappers added to Static Runtime; those fixes follow.
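A hypothetical sketch (not SR's actual code) of the bug pattern: an out variant that only fills its cached output on first use returns stale values when the fill argument changes between iterations.

```python
class FullOut:
    def __init__(self):
        self.out = None

    def buggy(self, n, fill_value):
        if self.out is None:
            self.out = [fill_value] * n     # never refreshed afterwards
        return self.out

    def fixed(self, n, fill_value):
        if self.out is None or len(self.out) != n:
            self.out = [None] * n           # reuse only when sizes match
        for i in range(n):
            self.out[i] = fill_value        # always write the requested value
        return self.out

op = FullOut()
op.buggy(3, 1.0)
assert op.buggy(3, 2.0) == [1.0, 1.0, 1.0]  # stale: ignores the new argument

op2 = FullOut()
op2.fixed(3, 1.0)
assert op2.fixed(3, 2.0) == [2.0, 2.0, 2.0]
```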
Test Plan: - Added a unittest.
Reviewed By: mikeiovine
Differential Revision: D34768718
fbshipit-source-id: b6958d6601d36253dd5d4f93596fb14055cca9c9
(cherry picked from commit 42acb40d3a1e9359c0f1a3c25481854e5ad344b6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73945
This change adds an out variant wrapper for aten::ones_like.
Test Plan:
- Added a unittest.
- Checked that the op execution got switched to its added out variant (P485330978).
Reviewed By: hlu1
Differential Revision: D34727057
fbshipit-source-id: 5022a7f547d53b0c00459d3959ad3c6e6a8a62d5
(cherry picked from commit 1bec4680e8173654400b165d720a0902136dba0f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73946
This change adds an out variant wrapper for aten::zeros.
Test Plan:
- Added a unittest.
- Confirmed that the added out variant gets executed by the unittest (P485324923).
Reviewed By: mikeiovine
Differential Revision: D34725843
fbshipit-source-id: 3ac02ba1914c4a51969381e610d4243df65071ed
(cherry picked from commit 368836d51709b7f96c79114984a95606b29766b1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73704
Empty inputs are invalid for these ops, but while looking for optimizations, I noticed that they simply segfault in that case, which is not helpful for users. Added a check/error message.
ghstack-source-id: 150812721
Test Plan: New unit tests
Reviewed By: hlu1
Differential Revision: D34596954
fbshipit-source-id: 6b22a3a255273920210dcd41f54a9d238bbbcc14
(cherry picked from commit 9e950bfffef36c320638662bdb72f19eb805a228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73851
This change adds an out variant wrapper for aten::ones
Test Plan: Added a unittest
Reviewed By: mikeiovine
Differential Revision: D34557095
fbshipit-source-id: 0d2ac8d0ad6f73067e28c2cebd3b4a018a9b17ae
(cherry picked from commit cc1dda957b8c3acd71de3aa6054c11a9aab5cfa6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73681
Static runtime is rejecting legal calls made with the kwargs API when there are parameters with default values.
ghstack-source-id: 150433627
Test Plan: Added unit test to cover this case
Reviewed By: navahgar, d1jang
Differential Revision: D34588804
fbshipit-source-id: 74d7ef5bee74f9d16b02b0c8ceda4285ea776755
(cherry picked from commit 9c3db19cb45f6022e646deeb1e8056daa04f363f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73536
Currently the `ProcessedNode` class assumes two distinct roles that are not obvious:
1) a "template" containing the metadata of an executable node, owned by `StaticModule`;
2) a fully instantiated node owned by `StaticRuntime`.
Merging these two use cases into one class can be error-prone if illegal copying happens uncontrollably. Currently, we only copy objects of kind (1) into objects of kind (2) when a `StaticRuntime` instance is created.
To address this issue, this change introduces `StaticNodeInfo`, a separate class, to distinguish the two use cases more clearly: `StaticNodeInfo` is now for (1) and `ProcessedNode` is for (2).
Test Plan: Existing tests
Reviewed By: mikeiovine
Differential Revision: D33985600
fbshipit-source-id: 0c79cea2bf982dd956a35f48eaf6027e5b6e390c
(cherry picked from commit 0d8acc4a2b6eeb3e4af3ad2c99f4cd667680f8df)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73574
T113070663 identified a test breakage in `StaticRuntime.autogen__convert_indices_from_csr_to_coo` using `mode/opt`. This change fixes it by using a correct test value.
By generating the out variants for Static Runtime, this change also includes an out variant wrapper for `mse_loss`.
**Generating out variants for Static Runtime**
```
[djang@devbig024.ftw2 ~/fbsource/fbcode/caffe2] buck run //caffe2/torch/fb/jit:gen_static_runtime_ops
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
Changed value //fbcode.sanitizer='address-undefined-dev' (was 'thread')
... and 13 more. See logs for all changes
Parsing buck files: finished in 0.8 sec
Downloaded 14/25 artifacts, 159.45 Kbytes, 30.0% cache miss (for updated rules)
Building: finished in 1.6 sec (100%) 52/52 jobs, 35/52 updated
Total time: 2.5 sec
BUILD SUCCEEDED
total grouped native ops: 1501
structured grouped native ops: 540
generated grouped native ops: 137
```
Test Plan:
Ran the broken test in `mode/opt` and confirmed that it passes now.
```
[djang@devbig024.ftw2 ~/fbsource/fbcode/caffe2] buck test mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.autogen__convert_indices_from_csr_to_coo' --run-disabled
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
Changed value //project.buck_out='buck-out/opt' (was 'buck-out/dev')
... and 307 more. See logs for all changes
DEBUG: /data/users/djang/fbsource/tools/build_defs/fbcode_macros/build_defs/lib/cpp_common.bzl:287:14: Using disallowed linker flag 'arvr/third-party/toolchains/platform009/build/mesa/lib/libGL.so' in library rule 'fbsource//third-party/toolchains:opengl'
DEBUG: /data/users/djang/fbsource/tools/build_defs/fbcode_macros/build_defs/lib/cpp_common.bzl:287:14: Using disallowed linker flag 'arvr/third-party/freeglut/3.0.0/libs/x64-linux/libglut.a' in library rule 'fbsource//third-party/toolchains:GLUT'
I0301 08:28:08.884272 2239319 configeratorc.cpp:70] Attempting to get config buck/detectors/bypass_dirty_builds, timeout=10000
I0301 08:30:14.751745 2261718 configeratorc.cpp:70] Attempting to get config buck/detectors/bypass_dirty_builds, timeout=10000
Parsing buck files: finished in 10.1 sec
Creating action graph: finished in 6.1 sec
[RE] Metadata: Session ID=[https://fburl.com/b/reSessionID-fa0ba93b-33a1-4e6f-88f8-9f508d2c27c3]
[RE] Waiting on 0 remote actions. Completed 247 actions remotely, action cache hit rate: 0.00%.
Downloaded 13000/17457 artifacts, 463.99 Mbytes, 2.6% cache miss (for updated rules)
Building: finished in 04:16.6 min (100%) 28628/28628 jobs, 28628/28628 updated
Total time: 04:32.9 min
More details at https://www.internalfb.com/intern/buck/build/c774ff43-5311-49ce-a677-30e3f6afdad1
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 16d9b24c-4a63-4671-84b5-690fac0ee086
Trace available for this run at /tmp/tpx-20220301-083049.472831-16d9b24c-4a63-4671-84b5-690fac0ee086/trace.log
RemoteExecution session id: reSessionID-16d9b24c-4a63-4671-84b5-690fac0ee086-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4503599719295685
✓ ListingSuccess: caffe2/benchmarks/static_runtime:static_runtime_cpptest : 285 tests discovered (0.425)
✓ Pass: caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.autogen__convert_indices_from_csr_to_coo (0.105)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599719295685
```
Reviewed By: mikeiovine
Differential Revision: D34552645
fbshipit-source-id: 36f15b0f29edcb7deb71ba8a6f66ce2532bf7c82
(cherry picked from commit 2329afd8bfc89671cfbd864414e528241e7045fc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73256
This adds a test that executes multiple Static Runtime instances in parallel when each instance includes a fusion.
ghstack-source-id: 149787403
Test Plan:
```
buck run mode/dev-asan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --gtest_filter=CpuFusion.ParallelRuntimes
```
The above test results in an error: P482317015 (when parts of the fix in D34287960 (6d33852685) are backed out)
Reviewed By: mikeiovine
Differential Revision: D34404127
fbshipit-source-id: 95a267e27d74584df90841fe496f909171136981
(cherry picked from commit 57d3ad9a46a24559f6d4f4097bd1b8e0b1f6b077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72603
This change adds out variant dispatchers generated by the previous diff.
The number of the out variant dispatchers generated by this diff is 133, which increases the out variant coverage by 309% (current: 43, this diff: 133 + 43 = 176). This number is expected to increase a lot as we develop this script further to cover more ops.
Test Plan:
**Unittest**
Confirmed
```
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```
is passing.
Reviewed By: swolchok
Differential Revision: D33373928
fbshipit-source-id: 4d94d788282f3f313bb36f2f9452edecd9862246
(cherry picked from commit e4ce8b386d1fcc47b86cb9c9016a70e7a31b452c)