saymrwulf/pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Commit graph

Michael Suo

30fb2c4aba

[lint] autoformat test/cpp and torch/csrc

Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang

2022-06-11 21:11:16 +00:00

Taylor Robie

0b1f3bd158

[Profiler] Prefer TSC to wall clock when available (#73855)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73855

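Calling the clock is one of the most expensive parts of profiling. We can reduce the profiling overhead by reading the time-stamp counter with `rdtsc` instead. The tradeoff is that TSC ticks are not wall-clock time, so we have to measure the relationship between the two and convert each reading (shift and scale).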
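A minimal sketch of the shift-and-scale conversion, assuming x86-64 with GCC/Clang's `__rdtsc` intrinsic (falling back to `steady_clock` elsewhere) and a constant-rate TSC. `TscConverter`, `readTicks`, and `wallNs` are hypothetical names, not the profiler's actual code. The idea is to calibrate once by sampling both clocks across a short sleep, then convert each raw tick count by shifting it to the calibration point and scaling by nanoseconds per tick.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <x86intrin.h>  // __rdtsc
#endif

// Hypothetical cheap tick source: raw TSC on x86-64 GCC/Clang,
// otherwise steady_clock ticks so the sketch still compiles and runs.
inline uint64_t readTicks() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  return __rdtsc();
#else
  return static_cast<uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// The "expensive" wall-clock read, in nanoseconds since the epoch.
inline int64_t wallNs() {
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
             std::chrono::system_clock::now().time_since_epoch())
      .count();
}

// Calibrates once, then maps ticks -> wall-clock ns with a shift and a scale.
class TscConverter {
 public:
  explicit TscConverter(
      std::chrono::milliseconds window = std::chrono::milliseconds(10)) {
    const uint64_t t0 = readTicks();
    const int64_t w0 = wallNs();
    std::this_thread::sleep_for(window);
    const uint64_t t1 = readTicks();
    const int64_t w1 = wallNs();
    assert(t1 > t0 && "tick source did not advance during calibration");
    base_ticks_ = t0;
    base_ns_ = w0;
    ns_per_tick_ = t1 > t0 ? static_cast<double>(w1 - w0) /
                                 static_cast<double>(t1 - t0)
                           : 1.0;
  }

  int64_t toWallNs(uint64_t ticks) const {
    // Shift: ticks elapsed since calibration (may be negative).
    const auto delta = static_cast<int64_t>(ticks - base_ticks_);
    // Scale: convert ticks to nanoseconds and add the calibration wall time.
    return base_ns_ +
           static_cast<int64_t>(static_cast<double>(delta) * ns_per_tick_);
  }

  double nsPerTick() const { return ns_per_tick_; }

 private:
  uint64_t base_ticks_ = 0;
  int64_t base_ns_ = 0;
  double ns_per_tick_ = 1.0;
};
```

Calibrating once moves the wall-clock call out of the per-event path; the cost is that any drift between the TSC rate and the wall clock accumulates over long runs, which a production version might handle by recalibrating.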

Test Plan: I added a cpp unit test with *very* aggressive anti-flake measures. I also ran the overhead benchmark (9 replicates) with `--stressTestKineto` (0.94 -> 0.89 us) and `--stressTestKineto --kinetoProfileMemory` (1.27 -> 1.17 us).

Reviewed By: chaekit

Differential Revision: D34231071

fbshipit-source-id: e3b3dd7580d93bcc783e87c7f2fc726cb74f4df8
(cherry picked from commit e8be9f8160793c6ee35d5af02bca3e01703e377d)

2022-03-13 18:29:06 +00:00

Taylor Robie

5a58820f01

[Profiler] Specialized AppendOnlyQueue (#73409)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73409

We can do better than `vector` or `deque`, and it's sufficiently important to the hot path to justify a custom container. (This is part of the larger queue refactor, but this is a standalone drop-in replacement so we don't need to wait.)
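The commit doesn't show the container's layout. One plausible design, sketched here as an assumption rather than the actual `AppendOnlyQueue`, is a singly linked list of fixed-size chunks: an append is a pointer compare plus a store, and existing elements never move when the queue grows (unlike a reallocating `vector`). The sketch assumes `T` is default-constructible and move-assignable; `for_each` is a hypothetical read-back helper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <utility>

// Hypothetical append-only queue: a forward_list of fixed-size chunks.
template <typename T, std::size_t ChunkSize = 1024>
class AppendOnlyQueue {
  static_assert(ChunkSize > 0, "ChunkSize must be positive");
  using Chunk = std::array<T, ChunkSize>;

 public:
  AppendOnlyQueue() = default;
  // Raw pointers into chunk storage make copies unsafe, so forbid them.
  AppendOnlyQueue(const AppendOnlyQueue&) = delete;
  AppendOnlyQueue& operator=(const AppendOnlyQueue&) = delete;

  template <typename... Args>
  T& emplace_back(Args&&... args) {
    if (next_ == end_) {  // no chunk yet, or the current chunk is full
      tail_ = chunks_.empty() ? chunks_.emplace_after(chunks_.before_begin())
                              : chunks_.emplace_after(tail_);
      next_ = tail_->data();
      end_ = next_ + ChunkSize;
    }
    *next_ = T(std::forward<Args>(args)...);  // hot path: one store
    ++size_;
    return *next_++;
  }

  // Read back every element in insertion order.
  template <typename F>
  void for_each(F&& f) const {
    std::size_t remaining = size_;
    for (const Chunk& chunk : chunks_) {
      const std::size_t n = remaining < ChunkSize ? remaining : ChunkSize;
      for (std::size_t i = 0; i < n; ++i) f(chunk[i]);
      remaining -= n;
    }
    assert(remaining == 0);  // chunk count always matches size_
  }

  std::size_t size() const { return size_; }

 private:
  std::forward_list<Chunk> chunks_;
  typename std::forward_list<Chunk>::iterator tail_{};
  T* next_ = nullptr;
  T* end_ = nullptr;
  std::size_t size_ = 0;
};
```

Compared with `std::deque`, which also avoids moving elements, this layout keeps the append path to a single bounds check against `end_`; whether that matches the real container's tradeoffs is not stated in the commit.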

Test Plan: It's a pretty simple container type, so I just added a few cpp tests for emplace and read back. I also ran the overhead benchmark (replicates=9) with both `--stressTestKineto` (0.99 -> 0.94 us) and `--stressTestKineto --kinetoProfileMemory` (1.36 -> 1.27 us).

Reviewed By: swolchok

Differential Revision: D34231072

fbshipit-source-id: ed57299729d444d59cf843a0d38a3ee2240eeec1
(cherry picked from commit 43907948f3a8d2137244e7bb59f43999bd660917)

2022-03-11 19:47:40 +00:00

3 commits