onnxruntime/onnxruntime
mindest bf2cc808a1
[ROCm] SkipLayerNorm: add more configs for block size; loosen constraints (#14900)
### Description
* Add more `threads_per_block` configs for SkipLayerNorm, including in
kernel explorer.
* Loosen the constraints on `hidden_size` so that `SkipLayerNormSmallOp` can
be selected for larger hidden sizes.
* Add a flag for the optional output in kernel_explorer.


### Motivation and Context
2023-03-09 22:27:01 +08:00
..
contrib_ops [ROCm] SkipLayerNorm: add more configs for block size; loosen constraints (#14900) 2023-03-09 22:27:01 +08:00
core [CUDA] Support decoding multihead self-attention implementation (#14848) 2023-03-08 09:17:54 -08:00
python [ROCm] SkipLayerNorm: add more configs for block size; loosen constraints (#14900) 2023-03-09 22:27:01 +08:00
test [CUDA] Support decoding multihead self-attention implementation (#14848) 2023-03-08 09:17:54 -08:00
tool/etw
wasm [js/web] support flag 'optimizedModelFilePath' in session options (#14355) 2023-02-24 15:50:15 -08:00
__init__.py Add GetVersionString API for C++, C# and Python (#14873) 2023-03-02 17:11:07 -08:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings