### Description
<!-- Describe your changes. -->
Check whether the min/max inputs are provided and use default values if not provided.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
This PR enables `softmax` outputs max supported components instead of
scalar for each thread.
Softmax with input[0]: [12,4096,4096] becomes 47.86 ms from 55.11 ms
### Description
support using uniform buffer.
This PR allows to use uniform buffer in shader program, so that some
runtime information (eg. input/output shape) is no longer need to be
hardcoded into shader code.
There are 2 commits in this PR:
-
[667f31c](667f31c83d):
framework changes to support uniform buffer, as well as updates in
program manager, gpu data manager and indices helper.
-
[09e1d2a](09e1d2ad1d):
an example change for operator `Transpose` to use input's rank-only
instead of dims as shader key. With this change, model mobilenetv2-12
shader compile times dropped from 71 to 52.
argmax and argmin are similar to reduce. Eventually we need to add
optimized flavors of the shader.
softmax is optimized but only works on the last axis for now which
should be the common use case.
todo: enable more ut for argmax/argmin