mirror of
https://github.com/saymrwulf/pytorch.git
synced 2026-05-15 21:00:47 +00:00
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73855 Calling the clock is one of the most expensive parts of profiling. We can reduce the profiling overhead by using `rdtsc` instead. The tradeoff is that we have to measure and convert. (shift and scale) Test Plan: I added a cpp unit test with *very* aggressive anti-flake measures. I also ran the overhead benchmark (9 replicates) with `--stressTestKineto` (0.94 -> 0.89 us) and `--stressTestKineto --kinetoProfileMemory` (1.27 -> 1.17 us) Reviewed By: chaekit Differential Revision: D34231071 fbshipit-source-id: e3b3dd7580d93bcc783e87c7f2fc726cb74f4df8 (cherry picked from commit e8be9f8160793c6ee35d5af02bca3e01703e377d) |
||
|---|---|---|
| .. | ||
| containers.cpp | ||