onnxruntime/onnxruntime
PeixuanZuo 59ea35d592
[ROCm] add CK GroupNorm to GroupNormTunable (#15510)
- Add CK GroupNorm to GroupNormTunable.
- Reduce the set of GroupNormNHWCOp configurations, because the CK
implementation performs better.
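A tunable op works roughly like this: several candidate kernels for the same operator are registered, each is timed on the actual workload, and the fastest is kept. That is why adding a faster CK candidate lets some native GroupNormNHWCOp configurations be dropped. Below is a minimal Python sketch of only the selection step, assuming candidates are plain zero-argument callables. `select_fastest` and the candidate names in the usage line are hypothetical. This is not ONNX Runtime's TunableOp API, which is implemented in C++.

```python
import time
from typing import Callable, Dict


def select_fastest(candidates: Dict[str, Callable[[], None]], repeats: int = 5) -> str:
    """Return the name of the candidate with the lowest mean wall-clock time.

    Hypothetical helper: illustrates tunable-op style selection only.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        fn()  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(repeats):
            fn()
        mean = (time.perf_counter() - start) / repeats
        if mean < best_time:
            best_name, best_time = name, mean
    return best_name
```

Usage, with stand-in candidates: `select_fastest({"native_nhwc": lambda: time.sleep(0.01), "ck": lambda: None})` picks `"ck"`. A real implementation would also cache the winner per input shape so the benchmark runs only once.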

Performance gain on Stable Diffusion v1.5:
Before:
```
'height': 512
'width': 512
'steps': 50
'batch_size': 1
'batch_count': 5
'num_prompts': 1
'average_latency': 2.4782688856124877
'median_latency': 2.4783748388290405
'provider': 'ROCMExecutionProvider'
'disable_safety_checker': True
```

After:
```
'height': 512
'width': 512
'steps': 50
'batch_size': 1
'batch_count': 5
'num_prompts': 1
'average_latency': 2.107170510292053
'median_latency': 2.1067750453948975
'first_run_memory_MB': -1
'second_run_memory_MB': -1
'provider': 'ROCMExecutionProvider'
'disable_safety_checker': True
```
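The size of the gain follows from the two runs above. A small Python sketch that derives the speedup and the relative latency reduction from the reported average and median latencies; the helper name `summarize_latency` is hypothetical, and the numbers are copied from the before/after output. Both metrics come out at roughly 1.18x, or about a 15% lower latency.

```python
from typing import Dict

# Latencies in seconds, copied from the before/after benchmark output.
BEFORE = {"average_latency": 2.4782688856124877, "median_latency": 2.4783748388290405}
AFTER = {"average_latency": 2.107170510292053, "median_latency": 2.1067750453948975}


def summarize_latency(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """For each metric, compute speedup (before / after) and percent reduction."""
    summary = {}
    for key, old in before.items():
        new = after[key]
        summary[key] = {
            "speedup": old / new,
            "reduction_pct": 100.0 * (old - new) / old,
        }
    return summary


for metric, stats in summarize_latency(BEFORE, AFTER).items():
    print(f"{metric}: {stats['speedup']:.3f}x, {stats['reduction_pct']:.1f}% lower")
```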
2023-04-19 13:54:59 +08:00
contrib_ops [ROCm] add CK GroupNorm to GroupNormTunable (#15510) 2023-04-19 13:54:59 +08:00
core Add TRT plugins support using custom ops (#13847) 2023-04-18 20:24:32 -07:00
python [ROCm] add CK GroupNorm to GroupNormTunable (#15510) 2023-04-19 13:54:59 +08:00
test Add TRT plugins support using custom ops (#13847) 2023-04-18 20:24:32 -07:00
tool/etw Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
wasm Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
__init__.py Adopt lintrunner as the linting tool - take 2 (#15085) 2023-03-24 15:29:03 -07:00
ReformatSource.ps1 Run clang-format in CI (#15524) 2023-04-18 09:26:58 -07:00
ReformatSourcePython.bat
VSCodeCoverage.runsettings