Commit graph

9188 commits

Author SHA1 Message Date
Xavier Dupré
b508c7236f
Replace call to deprecated torch.norm (#16758)
### Description
torch.norm is deprecated as mentioned in issue #16751. This PR replaces
the call to torch.norm by the options suggested by torch documentation.
2023-07-20 19:52:19 -07:00
kunal-vaishnavi
b7176f9826
Fix bug with saving model optimized by inference session (#16716)
### Description
A [previous PR](https://github.com/microsoft/onnxruntime/pull/16531)
added a temporary directory to save the model optimizations after
loading a model into an `InferenceSession`. Many models that have an
external data file, however, require the data file to be in the same
directory as the ONNX model file. Because the model is saved in a
temporary directory and the data is saved in another directory, this
causes a `FileNotFoundError` error when trying to load the model in the
temporary directory.

This PR fixes this error by saving the external data file in the same
directory that the optimized model is located in.

### Motivation and Context
This PR fixes a bug with using a temporary directory while running the
optimizer for models that have an external data file.
2023-07-20 18:44:28 -07:00
Edward Chen
0f9883f804
Fix Mac M1 build (#16763)
- Add ifndef `__APPLE__` to skip lines which cause EXC_BAD_INSTRUCTION error.
- Fix floatToHalf/doubleToHalf conversion issue and add tests.
2023-07-20 18:24:57 -07:00
Baiju Meswani
538d2412ef
Objective-C Add Support to Create and Query String ORTValues (#16764)
This pull request contains a few changes:

1. Adds support for string ort values.
2. Fixes the training minimal build (that was broken with #16601) by
putting custom op registration behind #ifdefs
3. Fixes the iOS pod package generation (that was again broken with
#16601) by explicitly providing paths to be copied during pod creation.
2023-07-20 17:39:29 -07:00
Adrian Lizarraga
a8c263f92c
[QNN EP] Update QNN SDK to 2.12 (#16750)
### Description
- Updates the default QNN SDK to 2.12 for CI pipelines
- Adds a disabled InstanceNormalization test for regression on QNN SDK
2.12
- Cleans up logs for unsupported ops.

### Motivation and Context
Test with the latest QNN SDK.
2023-07-20 16:22:14 -07:00
Wanming Lin
eaea34f8e2
[WebNN EP] Support PRelu op (#16756) 2023-07-20 10:39:30 -07:00
Jeff Daily
bb136f86c8
[ROCm][MIGraphX] for googletest dep, set OVERRIDE_FIND_PACKAGE (#16715)
Otherwise, an unsupported version of gtest/gmock will be found at
/opt/conda/include for ROCm builds. Though this issue was initially
found for ROCm builds, the issue is generic. onnxruntime requires a
specific version of googletest and should not rely on locating
googletest using find_package.

The ROCm error was:

```
In file included from /opt/conda/include/gmock/gmock-spec-builders.h:75,
                 from /opt/conda/include/gmock/gmock-generated-function-mockers.h:47,
                 from /opt/conda/include/gmock/gmock-function-mocker.h:39,
                 from /opt/conda/include/gmock/gmock.h:61,
                 from /stage/onnxruntime/onnxruntime/test/util/test_utils.cc:17:
/opt/conda/include/gmock/gmock-matchers.h: In instantiation of ‘bool testing::internal::PointwiseMatcher<TupleMatcher, RhsContainer>::Impl<LhsContainer>::
MatchAndExplain(LhsContainer, testing::MatchResultListener*) const [with LhsContainer = const gsl::span<const float>&; TupleMatcher = testing::internal::
FloatingEq2Matcher<float>; RhsContainer = gsl::span<const float>]’:
/opt/conda/include/gmock/gmock-matchers.h:2303:10:   required from here
/opt/conda/include/gmock/gmock-matchers.h:2312:48: error: no type named ‘const_iterator’ in ‘testing::internal::PointwiseMatcher<testing::internal::
FloatingEq2Matcher<float>, gsl::span<const float> >::Impl<const gsl::span<const float>&>::LhsStlContainer’ {aka ‘class gsl::span<const float>’}
```
2023-07-21 00:57:38 +08:00
zesongw
0e40049eb2
[WebNN EP] Add support for Op Pad. (#16732)
### Description
<!-- Describe your changes. -->
Support Op Pad for WebNN EP. It aims to support three modes (constant,
reflect and edge). For now, only constant can be tested with Chrome
Canary.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Support more models like SD1.5-VAE-encode.
2023-07-20 07:57:48 -07:00
Xavier Dupré
2bc9fbb621
Fix url in the code documentation (graph optimizations) (#16770)
### Description
Fix a wrong url in the documentation as mentioned in issue #16678.



### Motivation and Context
Better documentation.
2023-07-20 07:02:22 -07:00
Yi Zhang
c314d7724f
Update dml gpu pool to onnxruntime-Win2022-GPU-dml-A10 (#16765)
### Description
onnxruntime-Win2022-GPU-dml-A10 is using VS2022.



### Motivation and Context
1. Upgrade VS2019 to VS2022 to fix prefast issue.
2023-07-20 16:52:13 +08:00
Scott McKay
8b866060f2
Comment out ORT-Nightly feed in test app NuGet.config (#16762)
### Description
<!-- Describe your changes. -->
Comment out ORT-Nightly feed in NuGet.config to see if that makes the
Secure Supply Chain Analysis CI step happy.

Add info to readme on manually adding feed and using it.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-07-20 14:30:29 +10:00
Edward Chen
fc1f463ff1
[ios] Enable training package in packaging pipeline (#16683)
Build iOS training package in packaging pipeline.
Refactor iOS packaging pipeline to build different package variants in parallel.
2023-07-19 19:55:00 -07:00
Yi-Hong Lyu
6e895fe70a
Parallelize Max (#16745)
It gives up to 7.5% improvement in LLaMA 7B case.
2023-07-19 15:23:48 -07:00
saurabh
24566058b3
ovep dockerfile and wheel docs changes (#16482)
### Description
This PR is includes changes in the documentation of _readmeOV.rst_ file
and also the changes in the dockerfile which enables to build ORT with
latest OpenVINO 2023.0.0



### Motivation and Context
Modified the dockerfile to incorporate the latest version of OpenVINO
(2023.0.0) for building Onnxruntime.
The changes in the PR aim to improve the overall user experience by
providing accurate and up-to-date documentation while leveraging latest
OpenVINO 2023.0.0
2023-07-19 09:01:09 -07:00
Wanming Lin
dcb0f2cdde
[WebNN EP] Only support opset >= 7 (#16730)
Set WebNN EP minimum supported opset to 7 as ONNX Runtime currently only
guarantees support for models stamped with opset 7 or above.
2023-07-18 17:59:19 -07:00
Yulong Wang
7dcb805ab8
[js/web] upgrade onnx-proto version (#16722)
### Description
This change upgrades a lot of dependencies. There are 2 motivations of
doing this change:
- fix the security issue reported by dependabot (protobufjs Prototype
Pollution vulnerability -
https://github.com/advisories/GHSA-h755-8qp9-cq85)
 - resolve the requirement of using ONNX IR_VERSION 9 (#16638)


This requires:
- upgrade protobufjs to v7.2.4
- upgrade library 'onnx-proto' to consume latest ONNX release (v1.14.0).

Problems:
- protobufjs v7.2.4 depends on long.js v5, which does not work well with
typescript (commonjs).
- onnx-proto depends on this fix with a new release of long.js
- long.js is in maintenance and it takes longer than expected to put in
new changes

Solutions:
- use a patch script in `preprepare` to copy type declarations to make
long.js work with typescript (commonjs)
- generate onnx protobuf JS/TS files and put them under
js/web/lib/onnxjs/ort-schema/protobuf folder - remove 'onnx-proto' from
dependency.
- apply fixes to generated onnx.d.ts
2023-07-18 16:36:39 -07:00
Adrian Lizarraga
f8e3aacb47
[QNN EP] Fix support of ArgMin and ArgMax on all QNN backends (#16731)
### Description
- Fixes support for ArgMin/ArgMax to QNN CPU and HTP backends.
  - Adds Q/DQ node unit selection logic.
  - Handles casting int64 output to uint32 when necessary.
- Adds unit tests for ArgMax/ArgMin.

### Motivation and Context
QNN EP did not actually support ArgMin/ArgMax. Unit tests revealed that
the existing translation was not sufficient to support these ops.
2023-07-18 14:50:24 -07:00
Tianlei Wu
95f9628776
Fix transformers optimizations for GPT-NeoX (#16743)
### Description
Fix some issues found in GPT-NeoX graph fusion:
(1) GPT-NeoX uses float16 weights. The step of using onnxruntime with
opt_level==1 uses CPU provider. Since most operators does not have fp16
in CPU EP, so extra Cast nodes are added to up cast to fp32.
(2) Add is shared by two LayerNormalization children, and
SkipLayerNormalization might cause invalid graph.
(3) Reshape fusion might miss since some part only check initializer but
not Constant.

This PR adds a check whether model uses FP16, and output a warning when
use_gpu is not True, and use GPU provider for graph optimization when use_gpu=True.
2023-07-18 10:29:08 -07:00
Dmitri Smirnov
e752cbe7f2
Work on eliminating Internal Compiler Error (#16741)
### Description
<!-- Describe your changes. -->
Replace the offending bitwise `operator |` with if() logic for ARM.
2023-07-18 10:17:52 -07:00
Wei-Sheng Chin
b71ebf91a5
[DORT] Reduce global configs to make enabling dynamic shape easier (#16720)
There are several global configs used by DORT.
```py
DEFAULT_ONNX_EXPORTER_OPTIONS = torch.onnx._internal.exporter.ResolvedExportOptions(
    torch.onnx._internal.exporter.ExportOptions()
)

# TODO(wechi): This line must generate result identical to the call of
# _create_onnx_supports_op_overload_table(...) inside
# create_onnx_friendly_decomposition_table(...) in
# torch/onnx/_internal/fx/decomposition_table.py.
_SUPPORT_DICT = torch.onnx._internal.fx.decomposition_table._create_onnx_supports_op_overload_table(
    DEFAULT_ONNX_EXPORTER_OPTIONS.onnx_registry
)  # type: ignore

_EXTRA_SUPPORT_DICT: Dict[str, Any] = {
    "getattr": None,
    "_operator.getitem": None,
}

DORT_DECOMPOSITION_TABLE = DEFAULT_ONNX_EXPORTER_OPTIONS.decomposition_table
```

We can see all but `_EXTRA_SUPPORT_DICT` are extracted from deduced from
ONNX exporter's options. As there are many ways to configure ONNX
exporter's options, we decided to move these variables to `OrtBackend`'s
`__init__` so that the construction of `OrtBackend` becomes more
flexible (especially for enabling dynamic shape or not).
2023-07-18 09:06:58 -07:00
PeixuanZuo
9b549c646c
[ROCm] fix kernel explorer GemmSoftmaxGemm test (#16735)
GemmSoftmaxGemmTunble occasionally broken with large numerical error.
The root cause of this error is CK's Strided Batched Gemm has larger
error under a specific initialization distribution
`(multinormal_distribution)`.

Generic(Gemm1 + Softmax + Gemm2) implementation is one instance of
GemmSoftmaxGemmTunble. Gemm1 and Gemm2 in Generic implementation are
TunableOps when tuning enabled. In some case GemmSoftmaxGemmTunble
select Generic implentation, while Gemm1 or Gemm2 select ck
implementation, the result of GemmSoftmaxGemmTunble affect by CK.

- Make tolerance more loosen.
- Add `GemmSoftmaxGemmPermuteGenericNestedTunable` to test Generic
implementation with tuning enabled.
2023-07-18 16:47:39 +08:00
zhangsibo1129
9ba5cdbaa4
[CANN EP] Fix Float16 support for CANN EP (#16733)
### Description
<!-- Describe your changes. -->

Replace the constructor function `MLFloat16()` with the public member
function `FromBits()` in the file
`onnxruntime/core/providers/cann/cann_common.cc`

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
PR [#16506](https://github.com/microsoft/onnxruntime/pull/16506) changed
the public constructor function `MLFloat16(uint16_t x)` to private, and
added a public function `MLFloat16::FromBits(uint16_t x)` in the file
`include/onnxruntime/core/framework/float16.h`, which broke the CANN CI.

This PR aligns the CANN behavior with the modified class `MLFloat16`.
2023-07-17 23:24:51 -07:00
cloudhan
0cab7e1a37
[ROCm] Generalize FastGeLU (#16623)
Allow the whole pipeline to be parameterized with unary elementwise
functor.
2023-07-18 11:23:12 +08:00
Scott McKay
ad90352a68
Add MAUI test app that can be used to test model loading and performance (#16658)
### Description
<!-- Describe your changes. -->
MAUI test app with tooling to add model and generated or provided input
test data.

The app will load the model and validate the output. It can also run a
specified number of iterations to provide basic performance information.

<img width="401" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/979079/daf3af13-fb22-4cbb-9159-486b483a7485">

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Primarily to make it easier to test an arbitrary model on iOS. A MAUI
app allows testing on all platforms.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-07-18 08:21:18 +10:00
cloudhan
a45b834722
Fix warning about uninitialized member (#16736)
#16506 Cause almost every translation units on linux complaint

```
[1175/1235] Building CXX object CMakeFiles/onnxruntime_test_all.dir/home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc.o
In file included from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:18,
                 from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/data_types.h:17,
                 from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/tensor.h:17,
                 from /home/guangyunhan/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.h:16,
                 from /home/guangyunhan/onnxruntime/onnxruntime/test/providers/compare_provider_test_utils.h:7,
                 from /home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc:4:
/home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h: In instantiation of ‘static constexpr uint16_t onnxruntime_float16::Float16Impl<Derived>::ToUint16Impl(float) [with Derived = onnxruntime::MLFloat16; uint16_t = short unsigned int]’:
/home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:42:66:   required from here
/home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:241:7: note: ‘union onnxruntime_float16::detail::float32_bits’ has no user-provided default constructor
  241 | union float32_bits {
      |       ^~~~~~~~~~~~
/home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:242:16: note: and the implicitly-defined constructor does not initialize ‘unsigned int onnxruntime_float16::detail::float32_bits::u’
  242 |   unsigned int u;
      |                ^
```

This PR shut the compiler up.
2023-07-17 11:33:54 -07:00
Edward Chen
df8843c4a7
Upgrade old Python version in packaging pipeline (#16667)
- Upgrade from Python 3.6 to 3.8 in packaging pipeline.
- Raise build.py minimum required Python version.
2023-07-17 08:24:47 -07:00
Dmitri Smirnov
b8c40b7813
Fix parameter naming that fails Doc generation. (#16717)
### Description
Rename `FromBits` param name to match the docs.

### Motivation and Context
Fix API Doc generation.
2023-07-16 22:02:05 -07:00
RandySheriffH
e1ca8ee6d4
RunAsync C/CXX API (#16613)
Implement RunAsync API - the session will run in a thread of intra-op
thread pool.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-07-16 16:51:40 -07:00
Ryan Hill
2cf31a20cf
Cuda: Decoder Masked Multihead Attention Q values get corrupted when using cross attention (#16721)
### Description
Some code was accidentally moved into the
`if(!params.is_cross_attention)' block, it must stay outside to work in
both cases.

### Motivation and Context
This causes invalid results. We detected this as a performance bug, as
it caused the EOS early exit to never happen, and the runs would always
take max_length to complete which was slow.
2023-07-15 00:41:06 -07:00
Wanming Lin
2b7a94e65b
[WebNN EP] Make some types clearer (#16705)
It's a follow-up to address comments in
https://github.com/microsoft/onnxruntime/pull/16671#discussion_r1261761828
and
https://github.com/microsoft/onnxruntime/pull/16671#discussion_r1261763873
2023-07-14 17:39:36 -07:00
Ryan Hill
2ae041f390
atomicAdd returns previous value, not current value. (#16690)
### Description
Mistake in beam scorer processing, atomicAdd result should be compared
with '1' vs '0' as it returns the original value, not the latest value.

This error just results in slow perf, nothing fails.

### Motivation and Context
Fixes #16642
2023-07-14 15:46:57 -07:00
Wei-Sheng Chin
44fd98ebfe
[DORT] Enable aten::full by implementing extra logics to select EP (#16699)
DORT only select devices from inputs arguments' (type: torch.Tensor).
However, it errors out when a graph doesn't have any inputs (e.g., a
single aten::full graph). This PR address this problem by changing the
EP selection to

- First, inspect graph inputs. If there are some valid devices, use them
plus a default one (`OrtBackend.ep: str`).
- Otherwise, inspect graph outputs carried by `torch.fx.GraphModule` and
use all valid devices plus the default `OrtBackend.ep`.
- When both (1) and (2) fail, it uses the default EP specified by
`OrtBackend.ep`.
2023-07-14 15:42:25 -07:00
Edward Chen
f236768d5c
[ios] Enable --use_extensions with custom built iOS pod (#16711)
- Fix link errors by including the needed onnxruntime-extensions libraries in the static framework.
- Add Objective-C API to register custom ops from embedded onnxruntime-extensions.

Caveat: Not all onnxruntime-extensions build options are working yet. E.g., building with the onnxruntime-extensions OpenCV dependency does not work.
2023-07-14 15:37:16 -07:00
G. Ramalingam
4faee2e44c
Fix issue in constant-propagation inside function subgraph (#16330)
### Description

The SequenceMap function-op has a graph-attribute. ORT's
constant-folding optimization may identify constant-expressions inside
the subgraph and promote them to constants, stored as initializers in
the main graph. When it does this, the optimization updates the subgraph
to remove the corresponding nodes.

When we expand a SequenceMap node by inlining its function-expansion, we
need to use this updated subgraph. However, the existing code uses the
original graph-attribute (GraphProto), instead of regenerating it from
the modified subgraph. This results in producing a graph with duplicate
definitions for the constant-folded variable, resulting in an error
during graph-resolve.

This PR fixes this issue (just a single line fix), and adds a test-case
to cover this scenario.

---------

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
2023-07-14 14:44:59 -07:00
Wanming Lin
ea43671eb6
[WebNN EP] Support several activation ops (#16693)
Support Elu, HardSigmoid, HardSwish, Softplus, Softsign, Tanh.
2023-07-14 14:36:15 -07:00
Adrian Lizarraga
a189e76fde
[QNN EP] Fix error handling for Softmax/ReduceOps (#16700)
### Description
- Fix check for Softmax with axis attributes not equal to -1. QNN EP
only supports axis values equal to -1 (or rank - 1).
- Explicit error when Reduce* ops have an input with rank > 4 on HTP
backend (unsupported).
- Correctly filter out partitions that only contain a single
QuantizeLinear or DequantizeLinear node.
- Add tests for the above and clean up unnecessary usage of test
description labels.



### Motivation and Context
Make it easier to debug why a model may not be supported.
2023-07-14 13:47:23 -07:00
Baiju Meswani
9889f0f507
Add support for training apis to support custom ops (#16601) 2023-07-14 11:15:51 -07:00
Adrian Lizarraga
19169afe30
[QNN EP] Add option to skip unit tests in the QNN NuGet packaging pipeline (#16164)
Add option to skip unit tests in the QNN NuGet packaging pipeline.
2023-07-14 10:52:05 -07:00
Dmitri Smirnov
853c4ff0a5
[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506)
### Description
Introduce `Float16/BFloat16` support for C# and C++ APIs.
User should be able to perform conversions from `float` to/from
`Float16/BFloat16`, compare values and tests for `NaN, Inifnity, and
whether the number is denormalized.`

### Motivation and Context
User filed issues such as:
https://github.com/microsoft/onnxruntime/issues/14303
2023-07-14 10:46:52 -07:00
Tianlei Wu
77b45c6503
Add Stable Diffusion Benchmark on A100-PCIE-80GB (#16702)
0(1) Fix a bug in https://github.com/microsoft/onnxruntime/pull/16560
that UNet shall be set fp16 flag.
(2) Remove wget in requirements since it is no longer needed.
(3) Add benchmark numbers in A100-PCIE-80GB. Note that CUDA EP have
issue to run in batch size 4 so the number is not added.
2023-07-14 10:37:00 -07:00
Yi Zhang
36b121d8c2
add more check to Web CI on cache restore (#16689)
### Description
<!-- Describe your changes. -->



### Motivation and Context
Make sure the data is correct.
2023-07-14 10:00:13 +08:00
mindest
810512c658
[ROCm] TunableOp: add hipBLASLt tuning logic (#16338)
### Description
- Add hipBLASLt tuning logic in place of default hipBLASLt
implementation;
- add kernel explorer for hipBLASLt.

related operators: Gemm, StridedBatchedGemm, and GemmFastGelu.

Temporarily mark algos that require extra workspace as unsupported.
Will add workspace support in later PR, which will change Gemm Params
def and affect multiple files.
2023-07-14 08:20:58 +08:00
Scott McKay
a3fc04ba74
Fix CodeCoverage pipeline (#16684)
### Description
<!-- Describe your changes. -->
Delete second reference to onnxruntime_api_tests_without_env in the code
coverage commands. One was removed in #16373 and the duplicate wasn't
noticed.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix pipeline.
2023-07-14 07:47:04 +10:00
Yulong Wang
d1d65978f6
[js/web] fix file size trim for wasm only .min.js (#16681)
### Description
fix file size trim for wasm only .min.js

minimal build `ort.wasm.min.js` and `ort.wasm-core.min.js` should
exclude JSEP related source code.
2023-07-13 14:20:51 -07:00
Danny Friar
5de2e2fb76
Call lazy_reset_grad in on-device training docs (#16696) 2023-07-13 13:29:54 -07:00
Dipanjan Sengupta
a461608409
Amx flag removal (#16527)
### Description
1. Replacing AMX intrinsics with machine code macros in QGEMM kernel.
2. Removing AMX build flags for GCC in cmake file.
3. Fixing the link time optimization (LTO) issue introduced with asm
.include of an assembly file.

I have moved the AMX instruction macro definitions from
QgemmU8S8KernelAmxCommon.S to the amx_common.h to fix the LTO issue.
Note that I am also pushing the macros defined in
QgemmU8S8KernelAmxCommon.S for future reference.

A special thanks to @laxmansole who helped in the development of the
instruction macro definitions for AMX intrinsics and fixing the LTO
issue.

### Motivation and Context
The additional AMX flag in cmake adds an extra layer of dependency on
GCC version to use the feature.These changes should allow the usage of
the AMX feature with just the CPU ID check.
2023-07-13 11:19:49 -07:00
Vincent Wang
c07a3b869c
Triton Codegen for ORTModule (#15831)
Fuse connected elementwise and reduce Ops to TritonOp and codegen triton
code to run the kernel.

This PR is co-edited by @wejoncy and @er3x3
2023-07-13 18:17:58 +08:00
Wanming Lin
7cac114e52
[WebNN EP] Support Abs and Neg ops (#16672) 2023-07-13 00:44:22 -07:00
Wanming Lin
d5b76cff60
[WebNN EP] Fixed build error (#16671)
The build break was caused by enabling `-Wshorten-64-to-32` in
https://github.com/microsoft/onnxruntime/pull/16524
2023-07-12 23:37:24 -07:00
mindest
b7fd5af48b
[ROCm] TunableOp: Update rocBLAS get_solutions API (since ROCm5.6) (#16657)
### Description
- Update existing rocBLAS get_solutions API using
`*_get_solutions_by_type` (supported from ROCm5.6); remove the original
nested TunableOp logic.
- Update kernel_explorer.
2023-07-13 11:20:26 +08:00