onnxruntime/orttraining/tools/amdgpu/script/rocprof.py


import argparse
import csv
import os  # noqa: F401

import numpy as np  # noqa: F401
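
parser = argparse.ArgumentParser()
# Without --input, open(None) would fail later with a less helpful error, so require it.
parser.add_argument("--input", type=str, required=True, help="path to the CSV file to summarize")
args = parser.parse_args()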
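

def get_gpu_lines(path):
    """Read the CSV at `path` and return every row except the header row."""
    lines = []
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=",")
        for row in reader:
            # The header row is the one whose third field contains "TotalDurationNs".
            if row[2].find("TotalDurationNs") < 0:
                lines.append(row)
    return lines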


# (group name, predicate on the kernel name). Order matters: a kernel is assigned to the
# first group whose predicate matches, and "misc" catches whatever is left.
activities = [
    ("nccl", lambda x: x.find("nccl") >= 0),
    ("gemm", lambda x: x.find("Cijk_") >= 0),
    ("memcpy", lambda x: x.find("CUDA mem") >= 0),
    ("adam", lambda x: x.lower().find("adam") >= 0),
    ("lamb", lambda x: x.lower().find("lamb") >= 0 or x.lower().find("multi_tensor_apply") >= 0),
    ("dropout", lambda x: x.lower().find("dropout") >= 0 or x.find("curand") >= 0),
    ("layernorm", lambda x: x.find("LayerNorm") >= 0 or x.find("cuCompute") >= 0),
    ("reduce", lambda x: x.find("reduce") >= 0),
    ("softmax", lambda x: x.lower().find("softmax") >= 0),
    ("transpose", lambda x: x.lower().find("transpose") >= 0),
    ("element-wise", lambda x: x.lower().find("elementwise") >= 0 or x.find("DivGrad") >= 0),
    ("jit", lambda x: x.startswith("kernel_")),
    ("misc", lambda x: True),
]
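

def group_gpu_activity(lines):
    """Bucket rows by the first entry of `activities` whose predicate matches the kernel name."""
    groups = {name: [] for name, _ in activities}
    for line in lines:
        for name, check in activities:
            if check(line[0]):  # line[0] is the kernel name
                groups[name].append(line)
                break
    return groups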
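

# A minimal sketch, not part of the original script: the first-match-wins rule used by
# group_gpu_activity, isolated so it can be tried without a CSV file. The helper name
# `classify_kernel` and its `rules` parameter are hypothetical.
def classify_kernel(kernel_name, rules):
    """Return the name of the first (name, predicate) rule that accepts kernel_name, else None."""
    for rule_name, check in rules:
        if check(kernel_name):
            return rule_name
    return None


# Because matching stops at the first hit, a name that satisfies several predicates is
# counted once, under whichever rule appears earliest in the list.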
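

def get_seconds(time):
    # The duration column is TotalDurationNs, so values are treated as nanoseconds.
    # replace("us", "") only strips a literal suffix; it does not rescale the value.
    return float(time.replace("us", "")) / (1000.0 * 1000.0 * 1000.0)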


def gpu_percent_time(rows):
    # Column 4: percentage of total time, with an optional "%" suffix.
    return sum([float(a[4].replace("%", "")) for a in rows])


def gpu_absolute_time(rows):
    # Column 2: total duration, converted to seconds by get_seconds.
    return sum([get_seconds(a[2]) for a in rows])


def gpu_kernel_calls(rows):
    # Column 1: number of calls.
    return sum([int(a[1]) for a in rows])
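

# A minimal sketch, not part of the original script: the three aggregations above applied
# to an in-memory CSV, to make the column layout explicit. The header
# Name,Calls,TotalDurationNs,AverageNs,Percentage and the helper name `summarize_csv_text`
# are assumptions for illustration; this script itself only reads columns 0, 1, 2 and 4.
import csv
import io


def summarize_csv_text(text):
    """Return (calls, seconds, percent) summed over every non-header row of `text`."""
    rows = [row for row in csv.reader(io.StringIO(text)) if "TotalDurationNs" not in row[2]]
    calls = sum(int(row[1]) for row in rows)
    seconds = sum(float(row[2]) for row in rows) / 1e9  # nanoseconds to seconds
    percent = sum(float(row[4].replace("%", "")) for row in rows)
    return calls, seconds, percent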


lines = get_gpu_lines(args.input)
groups = group_gpu_activity(lines)

# Per-group summary. The loop variable is named `group` so it does not shadow the
# module-level `activities` list.
for name in groups:
    group = groups[name]
    print(
        f"{name}: N={len(group)}, calls={gpu_kernel_calls(group)}, absolute={gpu_absolute_time(group):.3f}s, percent={gpu_percent_time(group):.2f}%"
    )

# Summary over all rows, regardless of group.
total = [item for name in groups for item in groups[name]]
print(
    f"Total: N={len(total)}, calls={gpu_kernel_calls(total)}, absolute={gpu_absolute_time(total):.3f}s, percent={gpu_percent_time(total):.2f}%"
)