* align with cpu on no input data
* review comments and add tests
Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
* Support specifying execution providers.
* Change default provider setting to None.
* Add support for bert_perf_test script.
* Fall back to ROCM/CUDA EP for MIGraphX/Tensorrt EP.
* Assert fall back EPs are included.
* Add model class AutoModelForCausalLM and other minor updates.
Co-authored-by: Yao Zhang <zhanyao@microsoft.com>
* Fuse Clip->Q to Q
* Remove unused variable argmax_node
* Remove braces around scalar initializer
* Move GetClipConstantMinMax under ORT_MINIMAL_BUILD
* Consider epsilon so we can fuse more cases
* move table names to one location
* remove session metadata
* reload trt inputs
* fix posting names
* Update linux-gpu-tensorrt-daily-perf-pipeline.yml for Azure Pipelines
* remove comments
* Split up anubis job and perf run
* add trt environ variables
* No embedded links
* expand model tests name
* skip cpu/cuda for trt when running onnxruntime_test_all
* only run trt ep for c++ unit test
* Update CMAKE_CUDA_ARCHITECTURES for T4
* Use new t4 agent pool
* Update YAML for run T4 on Windows
* revert code
* Update CMAKE_CUDA_ARCHITECTURES
* fix wrong value
* Remove cpu/cuda directly in model tests
* add only CMAKE_CUDA_ARCHITECTURES=75
* remove expanding model test name to see difference
* revert code
* Add fallback execution provider for unit test
* Add fallback execution provider for unit test (cont)
* add conditional to add fackback cuda ep
* Reduction op takes much longer time for TRT 8.2, so we test smaller range of inputs
* use M60
* revert code
* revert code
* add comments
* Modify code and add comment
* modify comment
* update comment
* add comment
Adding S8S8 kernels for symmetric quantized indirect conv and depthwise conv.
Perf number with single thread:
Nokia G10 (baseline / new) in ms Pixel 4 (baseline/new) in ms
mobilenet_edgetpu 220 / 213 18.5 / 17.6
cartoongan 8537 / 8521 967 / 928
Co-authored-by: Chen Fu <fuchen@microsoft.com>
* If the tensor is of gray8 format, we should call the gray8 shader
* other check (which resolves to unknown in this case) is incorrectly being compared to constant and not DXGI_FORMAT
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Add abseil cgmanifest declaration. Update coding standards for InlinedContainers
Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use.
Rename T from InlinedShapeVectorT. Fix Eager build
Add LLVM Copyright with modified derived code notice.
* add clear test name for model tests
* handle remove character
* modify for test
* Modify for correct test name
* Remove test code
* add comments
* make it only on Linux
* change function name
* Convert from wchar_t to char
* Export numpy_T as onnx transpose
* further fixes, test
Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
* add qdqgroup as input for NodeUnit
* minor update
* hookup nnapi_ep
* minor update
* update compiler setting
* Add a simple UT
* Pipeline change to add build minimal extended with NNAPI for Android
* move GetAllNodeUnits to node_unit.h, add UT for NodeUnits, minor updates
* minor updates
* address CR comments
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
* improved usability of TVM EP
* moved technical import under a condition related to TVM EP only
* Revert "moved technical import under a condition related to TVM EP only"
* add conditional _ld_preload.py file extension for TVM EP
* improve readability of inserted code