### Description
Change all macOS Python packages to use universal2, to reduce the number
of packages we have.
### Motivation and Context
According to [Wikipedia](https://en.wikipedia.org/wiki/MacOS_Big_Sur),
macOS 11 is the first macOS version that supports universal2, and it is
the minimum macOS version we support. So we no longer need to maintain
separate binaries for different CPU architectures.
### Description
ReduceMax/ReduceMin have been updated in ONNX (20). Implement the update in ORT.
### Motivation and Context
This is for the ORT 1.17.0 release.
---------
Signed-off-by: Liqun Fu <liqfu@microsoft.com>
### Description
- Add a mutex to protect QNN API calls for executing a graph and
extracting the corresponding profile data.
- Ensure QNN EP's execute function does not store unnecessary state
(i.e., input and output buffer pointers do not need to be stored as
class members).
### Motivation and Context
Allow calling `session.Run()` from multiple threads when using QNN EP.
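
The locking pattern described above can be sketched roughly as follows. This is a minimal Python illustration, not QNN EP code: `FakeBackend`, `GraphRunner`, and their methods are hypothetical stand-ins for the graph-execute and profile-extraction calls. A single lock serializes both calls, and the input and output buffers stay local to each call instead of being stored on the object.

```python
import threading


class FakeBackend:
    """Hypothetical stand-in for a backend whose API is not thread-safe."""

    def __init__(self):
        self._last_profile = None

    def execute(self, inputs):
        # Pretend to run a graph; profile data is recorded as a side effect.
        outputs = [x * 2 for x in inputs]
        self._last_profile = {"num_inputs": len(inputs)}
        return outputs

    def extract_profile(self):
        return self._last_profile


class GraphRunner:
    def __init__(self, backend):
        self._backend = backend
        self._lock = threading.Lock()  # guards execute + profile extraction

    def run(self, inputs):
        # Inputs and outputs are locals; nothing per-call is cached on self.
        with self._lock:
            outputs = self._backend.execute(inputs)
            profile = self._backend.extract_profile()
        return outputs, profile


runner = GraphRunner(FakeBackend())
results = {}


def worker(n):
    results[n] = runner.run(list(range(n)))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(1, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Holding the lock across both calls is what keeps each thread's profile data paired with its own execution in this sketch.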
### Description
onnxruntime may raise a "type inference failed" error when a custom
operator sets IsHomogeneous to false in its schema. This change makes
sure that TypeInferenceFunction and the schema type constraints are aligned
to prevent that from happening.
---------
Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
### Description
A replacement for #18683. Tries to resolve #18689.
Specifying the "-s PTHREAD_POOL_SIZE" flag in Emscripten forces the
thread pool to initialize before the WebAssembly instance is available.
## Description
This pull request speeds up inference session creation by eliminating
unnecessary `Node::ToProto` calls, along with the `~NodeProto`
destructor calls that follow them.
## Motivation and Context
This pull request targets low-hanging fruit in inference session
creation. Removing the unneeded `Node::ToProto` calls streamlines the
code and improves performance. The flame graphs show the reduction in
the percentage of time spent in `Node::ToProto`.
### Code Snippet
```cpp
TEST(InferenceSessionTests, Bench) {
  // Initialize logging manager
  auto logging_manager = std::make_unique<logging::LoggingManager>(
      std::unique_ptr<ISink>(new CLogSink()), logging::Severity::kVERBOSE, false,
      LoggingManager::InstanceType::Temporal);

  // Create environment
  std::unique_ptr<Environment> env;
  auto st = Environment::Create(std::move(logging_manager), env);
  ASSERT_TRUE(st.IsOK());

  // Configure session options
  SessionOptions so;
  so.execution_mode = ExecutionMode::ORT_SEQUENTIAL;
  so.graph_optimization_level = TransformerLevel::Level2;
  so.intra_op_param.thread_pool_size = 1;

  // Initialize and load the InferenceSession
  InferenceSessionTestGlobalThreadPools session1{so, *env};
  ASSERT_STATUS_OK(session1.Load("big.onnx"));
  ASSERT_STATUS_OK(session1.Initialize());
}
```
### `big.onnx` model creation
```python
import onnx
import numpy as np
from spox import argument, build, Tensor, Var
from spox.opset.ai.onnx import v17 as op
from spox.opset.ai.onnx.ml.v3 import label_encoder

a = argument(Tensor(np.int64, ('N',)))
c = a

# Chain 1000 Mul nodes, each with a large constant operand
for x in range(1000):
    c = op.mul(c, op.const(np.ones(10000, dtype=np.int64)))

# Chain 3000 pairs of LabelEncoder nodes (int64 -> string -> int64)
for x in range(3000):
    all_strings = list("random_string" + str(i) for i in range(100))
    all_ints = list(range(len(all_strings)))
    c = label_encoder(
        c,
        keys_int64s=all_ints,
        values_strings=all_strings
    )
    c = label_encoder(c, keys_strings=all_strings, values_int64s=all_ints)

model: onnx.ModelProto = build(inputs={'a': a}, outputs={'c': c})
onnx.save(model, "big.onnx")
```
Testing in `Release` with `perf` yields:
- Before: 3.3% spent in `Node::ToProto`
- After: 1.6% spent in `Node::ToProto`
---------
Co-authored-by: Atanas Dimitrov <atanasdimitrov@Atanass-MacBook-Pro.local>
Fixed a small spelling error.
### Description
Small spelling error fix.
### Motivation and Context
It is documentation for the product, and it misspells the word
"documentation". This reflects on your product and the quality of the
work.
### Description
The SessionOptions setting use_deterministic_compute can already be set via
the Python API. Users requested the ability to set it via the C API as well.
### Motivation and Context
#17416
### Description
This limits the size of constant data nodes which the DML EP creates in
the DML graph following de-duplication of 1D quantization tensors. In
the process, it lowers the limit used in the check for the maximum size
of the constant node.
This is merged from: https://github.com/microsoft/onnxruntime/pull/18494
### Description
Cleanup and rebase from [this
PR](https://github.com/microsoft/onnxruntime/pull/18629)
### Motivation and Context
---------
Co-authored-by: Christian Larson <chrilaMSFT@users.noreply.github.com>
Co-authored-by: Christian Larson <28911437+chrilaMSFT@users.noreply.github.com>
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
Co-authored-by: Anagha Rao <anagrao@microsoft.com>
### Description
This addresses a bug in a fast path that was added for submission of
re-used command lists of fused graph kernels in the DML EP, addressing a
D3D debug layer error.
### Motivation and Context
The fast path in DmlCommandRecorder::ExecuteCommandList enabled a
current non-reused command list, if empty, to be used for commands
following submission of the fused command list. The fix ensures the
associated command allocator is only re-used after the next fence value
is completed, which is higher due to submission of the other command
list.
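
The fence ordering described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `GpuQueue` and `CommandAllocator` are hypothetical Python stand-ins, not the DML EP or D3D12 types, and fence values are modeled as a simple counter that increases by one per submission.

```python
class GpuQueue:
    """Hypothetical queue that tracks submitted and completed fence values."""

    def __init__(self):
        self.last_submitted = 0
        self.last_completed = 0

    def submit(self):
        # Each submission signals the next fence value.
        self.last_submitted += 1
        return self.last_submitted

    def next_fence_value(self):
        return self.last_submitted + 1

    def complete_through(self, fence_value):
        # Simulate the GPU finishing all work up to fence_value.
        self.last_completed = max(self.last_completed, fence_value)


class CommandAllocator:
    """Hypothetical allocator that may be reset only after its fence completes."""

    def __init__(self):
        self.completion_fence = 0

    def can_reuse(self, queue):
        return queue.last_completed >= self.completion_fence


queue = GpuQueue()
allocator = CommandAllocator()

# The fused graph's re-used command list is submitted on its own.
fused_fence = queue.submit()

# The current, still-empty command list keeps being used for later commands,
# so its allocator is tied to the next fence value, which is now higher.
allocator.completion_fence = queue.next_fence_value()

queue.complete_through(fused_fence)
reusable_after_fused = allocator.can_reuse(queue)

later_fence = queue.submit()
queue.complete_through(later_fence)
reusable_after_next = allocator.can_reuse(queue)
```

In this model the allocator is not reusable when only the fused submission has completed; it becomes reusable once the following fence value completes.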
The command recorder design was intended to support batching of provided
command list execution; however, it submits command lists immediately as
an implementation detail to maximize CPU/GPU parallelism. If that
heuristic were removed, it would expose additional issues in this same
fast path. Because of this, and because of the complexity and
inefficiency of the old batching mechanism, I also removed the batching
mechanism.
### Description
[DirectML EP] Add DML EP registration for Col2Im operator
### Motivation and Context
Add Col2Im support for opset 18.
This operator is implemented as the DirectML Fold operator.
---------
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
Update resource creation flag to avoid D3D12 WARNING
### Description
Update the DML DX12 allocator to use D3D12_RESOURCE_STATE_COMMON to avoid
DX12 warning messages.
### Motivation and Context
When DirectML is created with the debug layer enabled, warnings are
emitted when ORT creates resources.
---------
Co-authored-by: Christian Larson <28911437+chrilaMSFT@users.noreply.github.com>
### Description
1. Expand input data type support for Resize to include uint8/int8.
2. Update the logic that computes the output shape of the Resize op.
`roiRange` is removed to align with how the tests compute the output
shape, which works around the size assertion in MLOperatorAuthorImpl.cpp:
`m_inputDimensions[i] * roiRange * scale` -> `m_inputDimensions[i] *
scale`
3. Disable 4 tests because of a result mismatch. The DML results with
float32 and uint8/int8 match each other, so it should be a problem in the
resize implementation, which is out of the scope of this PR:
   - `ResizeOpTest.NhwcResizeOpLinearDownSampleTest_tf_crop_and_resize_without_extrapolation_uint8`
   - `ResizeOpTest.NhwcResizeOpLinearDownSampleTest_tf_crop_and_resize_without_extrapolation_int8`
   - `ResizeOpTest.NhwcResizeOpLinearDownSampleTest_4DBilinear_pytorch_half_pixel_uint8`
   - `ResizeOpTest.NhwcResizeOpLinearDownSampleTest_4DBilinear_pytorch_half_pixel_int8`
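
The output-shape change in item 2 can be sketched as below. This is a hedged illustration rather than the DML EP code: the function name `resize_output_shape` is hypothetical, and rounding down with `floor` is an assumption taken from the ONNX Resize definition, since the description above only gives the product `m_inputDimensions[i] * scale`.

```python
import math


def resize_output_shape(input_dims, scales):
    """Compute each output dim as floor(input_dim * scale), with no ROI factor."""
    if len(input_dims) != len(scales):
        raise ValueError("input_dims and scales must have the same length")
    return [int(math.floor(dim * scale)) for dim, scale in zip(input_dims, scales)]


# Upsample height by 2x and downsample width by 2x on an NCHW tensor.
shape = resize_output_shape([1, 3, 4, 4], [1.0, 1.0, 2.0, 0.5])
```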
[Cherry pick Reviewed]
Re-add changes which were merged out...
---------
Co-authored-by: Sheil Kumar <smk2007@gmail.com>
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
---------
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
### Description
This PR also includes:
- 8b0a55e7cc DML constant pow operator
- 7520974970 Enable custom heaps based on query-
---------
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
[Cherry Pick Reviewed]
DML EP Implementation for
[QLinearAveragePool](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QLinearAveragePool)
```
Note: Google Test filter = *QLinear*Pool*
[==========] Running 72 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 36 tests from QLinearGlobalAveragePool
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x1x32x32
[ OK ] QLinearGlobalAveragePool.Nhwc_1x1x32x32 (410 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x32x32x1
[ OK ] QLinearGlobalAveragePool.Nchw_1x32x32x1 (641 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x256x8x8
[ OK ] QLinearGlobalAveragePool.Nhwc_1x256x8x8 (156 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x8x8x256
[ OK ] QLinearGlobalAveragePool.Nchw_1x8x8x256 (134 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x255x7x7
[ OK ] QLinearGlobalAveragePool.Nhwc_1x255x7x7 (160 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x7x7x255
[ OK ] QLinearGlobalAveragePool.Nchw_1x7x7x255 (145 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x255x8x8
[ OK ] QLinearGlobalAveragePool.Nhwc_1x255x8x8 (148 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x8x8x255
[ OK ] QLinearGlobalAveragePool.Nchw_1x8x8x255 (129 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x256x7x7
[ OK ] QLinearGlobalAveragePool.Nhwc_1x256x7x7 (134 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x7x7x256
[ OK ] QLinearGlobalAveragePool.Nchw_1x7x7x256 (131 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_3x256x8x8
[ OK ] QLinearGlobalAveragePool.Nhwc_3x256x8x8 (159 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_3x8x8x256
[ OK ] QLinearGlobalAveragePool.Nchw_3x8x8x256 (168 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_3x255x7x7
[ OK ] QLinearGlobalAveragePool.Nhwc_3x255x7x7 (139 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_3x7x7x255
[ OK ] QLinearGlobalAveragePool.Nchw_3x7x7x255 (170 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_3x255x8x8
[ OK ] QLinearGlobalAveragePool.Nhwc_3x255x8x8 (155 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_3x8x8x255
[ OK ] QLinearGlobalAveragePool.Nchw_3x8x8x255 (156 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_3x256x7x7
[ OK ] QLinearGlobalAveragePool.Nhwc_3x256x7x7 (133 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_3x7x7x256
[ OK ] QLinearGlobalAveragePool.Nchw_3x7x7x256 (149 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x1x32x32_S8
[ OK ] QLinearGlobalAveragePool.Nhwc_1x1x32x32_S8 (131 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x32x32x1_S8
[ OK ] QLinearGlobalAveragePool.Nchw_1x32x32x1_S8 (127 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x256x8x8_S8
[ OK ] QLinearGlobalAveragePool.Nhwc_1x256x8x8_S8 (153 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x8x8x256_S8
[ OK ] QLinearGlobalAveragePool.Nchw_1x8x8x256_S8 (129 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x255x7x7_S8
[ OK ] QLinearGlobalAveragePool.Nhwc_1x255x7x7_S8 (133 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x7x7x255_S8
[ OK ] QLinearGlobalAveragePool.Nchw_1x7x7x255_S8 (135 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x255x8x8_S8
[ OK ] QLinearGlobalAveragePool.Nhwc_1x255x8x8_S8 (129 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x8x8x255_S8
[ OK ] QLinearGlobalAveragePool.Nchw_1x8x8x255_S8 (152 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_1x256x7x7_S8
[ OK ] QLinearGlobalAveragePool.Nhwc_1x256x7x7_S8 (140 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_1x7x7x256_S8
[ OK ] QLinearGlobalAveragePool.Nchw_1x7x7x256_S8 (133 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_3x256x8x8_S8
[ OK ] QLinearGlobalAveragePool.Nhwc_3x256x8x8_S8 (135 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_3x8x8x256_S8
[ OK ] QLinearGlobalAveragePool.Nchw_3x8x8x256_S8 (147 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_3x255x7x7_S8
[ OK ] QLinearGlobalAveragePool.Nhwc_3x255x7x7_S8 (156 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_3x7x7x255_S8
[ OK ] QLinearGlobalAveragePool.Nchw_3x7x7x255_S8 (155 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_3x255x8x8_S8
[ OK ] QLinearGlobalAveragePool.Nhwc_3x255x8x8_S8 (138 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_3x8x8x255_S8
[ OK ] QLinearGlobalAveragePool.Nchw_3x8x8x255_S8 (155 ms)
[ RUN ] QLinearGlobalAveragePool.Nhwc_3x256x7x7_S8
[ OK ] QLinearGlobalAveragePool.Nhwc_3x256x7x7_S8 (144 ms)
[ RUN ] QLinearGlobalAveragePool.Nchw_3x7x7x256_S8
[ OK ] QLinearGlobalAveragePool.Nchw_3x7x7x256_S8 (139 ms)
[----------] 36 tests from QLinearGlobalAveragePool (5968 ms total)
[----------] 36 tests from QLinearPoolTest
[ RUN ] QLinearPoolTest.AveragePool1D_ExcludePadPixel
[ OK ] QLinearPoolTest.AveragePool1D_ExcludePadPixel (480 ms)
[ RUN ] QLinearPoolTest.AveragePool1D_IncludePadPixel
[ OK ] QLinearPoolTest.AveragePool1D_IncludePadPixel (481 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_ExcludePadPixel
[ OK ] QLinearPoolTest.AveragePool2D_ExcludePadPixel (512 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_IncludePadPixel
[ OK ] QLinearPoolTest.AveragePool2D_IncludePadPixel (455 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_MultiChannel
[ OK ] QLinearPoolTest.AveragePool2D_MultiChannel (463 ms)
[ RUN ] QLinearPoolTest.AveragePool3D_ExcludePadPixel
[ OK ] QLinearPoolTest.AveragePool3D_ExcludePadPixel (448 ms)
[ RUN ] QLinearPoolTest.AveragePool3D_IncludePadPixel
[ OK ] QLinearPoolTest.AveragePool3D_IncludePadPixel (458 ms)
[ RUN ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_nhwc
[ OK ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_nhwc (171 ms)
[ RUN ] QLinearPoolTest.AveragePool1D_IncludePadPixel_nhwc
[ OK ] QLinearPoolTest.AveragePool1D_IncludePadPixel_nhwc (169 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_nhwc
[ OK ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_nhwc (152 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_IncludePadPixel_nhwc
[ OK ] QLinearPoolTest.AveragePool2D_IncludePadPixel_nhwc (660 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_MultiChannel_nhwc
[ OK ] QLinearPoolTest.AveragePool2D_MultiChannel_nhwc (150 ms)
[ RUN ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_nhwc
[ OK ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_nhwc (145 ms)
[ RUN ] QLinearPoolTest.AveragePool3D_IncludePadPixel_nhwc
[ OK ] QLinearPoolTest.AveragePool3D_IncludePadPixel_nhwc (146 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_BigImage
[ OK ] QLinearPoolTest.AveragePool2D_BigImage (505 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_BigImage_nhwc
[ OK ] QLinearPoolTest.AveragePool2D_BigImage_nhwc (161 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_Global
[ OK ] QLinearPoolTest.AveragePool2D_Global (481 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_Global_nhwc
[ OK ] QLinearPoolTest.AveragePool2D_Global_nhwc (152 ms)
[ RUN ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_S8
[ OK ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_S8 (461 ms)
[ RUN ] QLinearPoolTest.AveragePool1D_IncludePadPixel_S8
[ OK ] QLinearPoolTest.AveragePool1D_IncludePadPixel_S8 (448 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_S8
[ OK ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_S8 (471 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_IncludePadPixel_S8
[ OK ] QLinearPoolTest.AveragePool2D_IncludePadPixel_S8 (473 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_MultiChannel_S8
[ OK ] QLinearPoolTest.AveragePool2D_MultiChannel_S8 (1507 ms)
[ RUN ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_S8
[ OK ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_S8 (477 ms)
[ RUN ] QLinearPoolTest.AveragePool3D_IncludePadPixel_S8
[ OK ] QLinearPoolTest.AveragePool3D_IncludePadPixel_S8 (493 ms)
[ RUN ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_nhwc_S8
[ OK ] QLinearPoolTest.AveragePool1D_ExcludePadPixel_nhwc_S8 (158 ms)
[ RUN ] QLinearPoolTest.AveragePool1D_IncludePadPixel_nhwc_S8
[ OK ] QLinearPoolTest.AveragePool1D_IncludePadPixel_nhwc_S8 (146 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_nhwc_S8
[ OK ] QLinearPoolTest.AveragePool2D_ExcludePadPixel_nhwc_S8 (146 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_IncludePadPixel_nhwc_S8
[ OK ] QLinearPoolTest.AveragePool2D_IncludePadPixel_nhwc_S8 (158 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_MultiChannel_nhwc_S8
[ OK ] QLinearPoolTest.AveragePool2D_MultiChannel_nhwc_S8 (157 ms)
[ RUN ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_nhwc_S8
[ OK ] QLinearPoolTest.AveragePool3D_ExcludePadPixel_nhwc_S8 (145 ms)
[ RUN ] QLinearPoolTest.AveragePool3D_IncludePadPixel_nhwc_S8
[ OK ] QLinearPoolTest.AveragePool3D_IncludePadPixel_nhwc_S8 (147 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_BigImage_S8
[ OK ] QLinearPoolTest.AveragePool2D_BigImage_S8 (537 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_BigImage_nhwc_S8
[ OK ] QLinearPoolTest.AveragePool2D_BigImage_nhwc_S8 (173 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_Global_S8
[ OK ] QLinearPoolTest.AveragePool2D_Global_S8 (457 ms)
[ RUN ] QLinearPoolTest.AveragePool2D_Global_nhwc_S8
[ OK ] QLinearPoolTest.AveragePool2D_Global_nhwc_S8 (150 ms)
[----------] 36 tests from QLinearPoolTest (12914 ms total)
[----------] Global test environment tear-down
[==========] 72 tests from 2 test suites ran. (18885 ms total)
[ PASSED ] 72 tests.
memleakdbg:
----- No memory leaks detected -----
```
Co-authored-by: Adrian Tsai <adtsai@microsoft.com>
### Description
[Cherry Pick Reviewed]
```
[ OK ] QLinearConcatS8.ExpectFail_WrongZeroPointType_1 (372 ms)
[ RUN ] QLinearConcatS8.InputOne_Dynamic
[ OK ] QLinearConcatS8.InputOne_Dynamic (255 ms)
[ RUN ] QLinearConcatS8.InputOne_Const
[ OK ] QLinearConcatS8.InputOne_Const (255 ms)
[----------] 11 tests from QLinearConcatS8 (3385 ms total)
[----------] Global test environment tear-down
[==========] 21 tests from 3 test suites ran. (9355 ms total)
[ PASSED ] 21 tests.
```
[#16971](https://github.com/microsoft/onnxruntime/pull/16971)
Co-authored-by: Xiang Zhang <xianz@microsoft.com>
### Description
The patch fixes a floating point accuracy issue in Resize by preferring
integer indices and integer arithmetic where possible.
### Motivation and Context
Model test `test_resize_upsample_sizes_nearest_floor_align_corners` was
observed to be failing on certain platforms. The root cause is the
inaccurate floating point evaluation of 21 / 7 (2.999... vs 3), which
results in the wrong input element being indexed (floor(2.999...) vs
floor(3)).
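
A minimal sketch of the idea, assuming the align-corners mapping `out_idx * (in_len - 1) / (out_len - 1)` followed by a floor. The function names are hypothetical and this is not the actual Resize kernel code. The floating-point path may land just below an integer on some platforms or precisions, while the integer path computes the floor exactly.

```python
import math


def src_index_float(out_idx, in_len, out_len):
    """Floating-point path: scale first, then floor (may lose exactness)."""
    if out_len == 1:
        return 0
    scale = (in_len - 1) / (out_len - 1)
    return math.floor(out_idx * scale)


def src_index_int(out_idx, in_len, out_len):
    """Integer path: exact floor of out_idx * (in_len - 1) / (out_len - 1)."""
    if out_len == 1:
        return 0
    return (out_idx * (in_len - 1)) // (out_len - 1)


# Upsampling 4 elements to 8: the last output index evaluates 21 // 7.
indices = [src_index_int(i, 4, 8) for i in range(8)]
```

Integer floor division never produces a value like 2.999..., so the last output element always maps to input index 3 in this sketch.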
### Description
If we fail to calculate the buffer size (due to overflow) we currently
return a nullptr. This is inconsistent as an actual memory allocation
failure throws. An overflow would typically be due to bad input so an
exception makes more sense given that.
Change to throw so code using MakeUniquePtr* and AllocArray* doesn't
need to check for nullptr.
Add some extra info to the log message to help debugging.
### Motivation and Context
Should help with #18905 by avoiding the invalid attempted usage of a
nullptr from the allocation. Extra info _might_ help with figuring out
where the overflow is coming from which is the real issue.
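
The behavior change can be sketched as follows. This is an illustrative Python model, not the ORT allocator code: `checked_alloc_size` and `alloc_array` are hypothetical names, and `SIZE_MAX` simulates a 64-bit `size_t` because Python integers do not overflow. The point is that a size overflow raises with diagnostic detail instead of returning a null value that callers would have to check.

```python
SIZE_MAX = 2**64 - 1  # assumed 64-bit size_t, for illustration only


def checked_alloc_size(count, element_size):
    """Return count * element_size, or raise if it would not fit in size_t."""
    if count < 0 or element_size < 0:
        raise OverflowError(
            f"invalid allocation request: count={count}, element_size={element_size}"
        )
    total = count * element_size
    if total > SIZE_MAX:
        # Include the inputs in the message to help locate the bad caller.
        raise OverflowError(
            f"allocation size overflow: count={count}, element_size={element_size}"
        )
    return total


def alloc_array(count, element_size):
    """Allocate a zeroed buffer; a size overflow raises rather than returning None."""
    return bytearray(checked_alloc_size(count, element_size))


buffer = alloc_array(4, 8)
```

Callers can then use the returned buffer directly, and an overflow caused by bad input surfaces as an exception at the allocation site.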