Commit graph

9235 commits

Author SHA1 Message Date
Jeff Bloomfield
4dfeef48eb Revert QlinearAdd1 2023-08-07 14:30:08 -07:00
Jeff Bloomfield
49791b5dec Revert "Temporarily flush and sync weight copies before initialization"
This reverts commit 789eb66fda.
2023-08-07 14:18:15 -07:00
Jeff Bloomfield
a1d20ec8aa Fix DML regression from allocator refactor and enable unrounded weight allocations through ORT API 2023-08-06 09:49:39 -07:00
raoanag
9406bcb3b4
Adding matmul_integer_to_float16 onnx models (#16978)
### Description
Missed adding float16 onnx models generated using
`matmul_integer_to_float.py`



### Motivation and Context
2023-08-03 10:16:22 -07:00
raoanag
34b6cd6d9a
MatMulIntToFloat Enable FP16 and update tensor ORT-DML indexing (#16871)
MatMulIntegerToFloat Enable FP16 and update tensor ORT-DML indexing

```
.\onnxruntime_test_all.exe --gtest_filter="*MatMulIntegerToFloat*"
Note: Google Test filter = *MatMulIntegerToFloat*
[==========] Running 13 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CPU_U8S8_Precision_Tests
[ RUN      ] CPU_U8S8_Precision_Tests.MatMulIntegerToFloat
[       OK ] CPU_U8S8_Precision_Tests.MatMulIntegerToFloat (35 ms)
[----------] 1 test from CPU_U8S8_Precision_Tests (35 ms total)

[----------] 1 test from GraphTransformationTests
[ RUN      ] GraphTransformationTests.MatMulIntegerToFloatTest
[       OK ] GraphTransformationTests.MatMulIntegerToFloatTest (5 ms)
[----------] 1 test from GraphTransformationTests (5 ms total)

[----------] 1 test from QDQTransformerTests
[ RUN      ] QDQTransformerTests.MatMulIntegerToFloat
[       OK ] QDQTransformerTests.MatMulIntegerToFloat (24 ms)
[----------] 1 test from QDQTransformerTests (24 ms total)

[----------] 10 tests from MatMulIntegerToFloat
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8X8_FP16
2023-07-27 16:45:13.1564552 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:13.3115187 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:13.4173384 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:13.5232503 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:13.6350011 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:13.7389236 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:13.8479674 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:13.9661708 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8X8_FP16 (951 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8X8_FP16
2023-07-27 16:45:14.1087648 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:14.2124719 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:14.3137829 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:14.4173078 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:14.5223567 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:14.6240616 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:14.7247578 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:14.8346170 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8X8_FP16 (830 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8S8_FP16
2023-07-27 16:45:14.9395745 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:15.0389962 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:15.1365449 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:15.2336207 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8S8_FP16 (393 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8S8_FP16
2023-07-27 16:45:15.3304154 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:15.4340044 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:15.5331474 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
2023-07-27 16:45:15.6337272 [W:onnxruntime:Default, base_tester.cc:763 onnxruntime::test::BaseTester::ExecuteModelForEps] registered execution providers CPUExecutionProvider were unable to run the model.
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8S8_FP16 (403 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8X8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8X8 (739 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8X8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8X8 (756 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8S8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8S8 (363 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8S8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8S8 (368 ms)
[ RUN      ] MatMulIntegerToFloat.MatMulInteger_With_ZeroPoint
[       OK ] MatMulIntegerToFloat.MatMulInteger_With_ZeroPoint (11 ms)
[ RUN      ] MatMulIntegerToFloat.MatMulInteger_With_ZeroPoint_FP16
[       OK ] MatMulIntegerToFloat.MatMulInteger_With_ZeroPoint_FP16 (11 ms)
[----------] 10 tests from MatMulIntegerToFloat (4833 ms total)

[----------] Global test environment tear-down
[==========] 13 tests from 4 test suites ran. (4914 ms total)
[  PASSED  ] 13 tests.
```
2023-07-28 14:45:49 -07:00
Xiang Zhang
cbdd0bb729
QAttention calls into MatMulIntToFloat instead of Dequantize+GEMM (#16851)
### Description
Update QAttention calling into MatMulIntToFloat instead of
Dequantize+GEMM to enable more metacommand path.
2023-07-26 15:31:09 -07:00
Xiang Zhang
c19e4c02e2 Implement QAttention And Enable tests (#16837) 2023-07-25 14:39:29 -07:00
Jeff Bloomfield
e9d330e489 Build fixes 2023-07-25 14:39:10 -07:00
raoanag
14298e9c02 MatrixMultiplyIntegerToFloat (#16804)
### Description
Implementation for[
com.microsoft.MatMulIntegerToFloat](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.MatMulIntegerToFloat)

```
C:\workspace\ORT\onnxruntime\build\test\RelWithDebInfo\RelWithDebInfo>.\onnxruntime_test_all.exe --gtest_filter="*MatMulIntegerToFloat*"
Note: Google Test filter = *MatMulIntegerToFloat*
[==========] Running 8 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from CPU_U8S8_Precision_Tests
[ RUN      ] CPU_U8S8_Precision_Tests.MatMulIntegerToFloat
[       OK ] CPU_U8S8_Precision_Tests.MatMulIntegerToFloat (31 ms)
[----------] 1 test from CPU_U8S8_Precision_Tests (31 ms total)

[----------] 1 test from GraphTransformationTests
[ RUN      ] GraphTransformationTests.MatMulIntegerToFloatTest
[       OK ] GraphTransformationTests.MatMulIntegerToFloatTest (0 ms)
[----------] 1 test from GraphTransformationTests (1 ms total)

[----------] 1 test from QDQTransformerTests
[ RUN      ] QDQTransformerTests.MatMulIntegerToFloat
[       OK ] QDQTransformerTests.MatMulIntegerToFloat (12 ms)
[----------] 1 test from QDQTransformerTests (12 ms total)

[----------] 5 tests from MatMulIntegerToFloat
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8X8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8X8 (801 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8X8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8X8 (804 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8S8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8S8 (378 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8S8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8S8 (382 ms)
[ RUN      ] MatMulIntegerToFloat.MatMulInteger_With_ZeroPoint
[       OK ] MatMulIntegerToFloat.MatMulInteger_With_ZeroPoint (31 ms)
[----------] 5 tests from MatMulIntegerToFloat (2398 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 4 test suites ran. (2455 ms total)
[  PASSED  ] 8 tests.
```
2023-07-25 13:59:08 -07:00
Jeff Bloomfield
4336d6422d Merge remote-tracking branch 'origin/main' into DmlPrototype 2023-07-25 13:58:44 -07:00
Adam Pocock
a1bb670536
[java] Fp16 fix for android/react native (#16832)
### Description
This PR splits out the FP16 conversions into a separate package we can
override in the android build with a version which works on old versions
of Android.

I'm not sure the android build system changes are correct as I haven't
got an android build environment configured on my workstation.
@YUNQIUGUO if the CI build fails we should follow up offline to get my
environment configured so I can iterate on it.

### Motivation and Context
Fixes the CI failure after #16703.
2023-07-25 12:31:32 -07:00
Edward Chen
e01365f80b
Update upload_pod_archive_and_update_podspec.sh to take path pattern (#16810)
Update upload_pod_archive_and_update_podspec.sh to take a pod archive path glob pattern. The actual pod archive path has a version suffix which changes.
2023-07-25 08:55:31 -07:00
Yi Zhang
38db5eca65
replace onnxruntime-Win-CPU-2019 with onnxruntime-Win-CPU-2022 (#16844)
### Description
<!-- Describe your changes. -->



### Motivation and Context
upgrade to VS2022
2023-07-25 23:05:34 +08:00
Luis Rios
feeb0b50f9
Improve TreeNodeElementId hash function (#16459)
### Description
This PR improves `TreeNodeElementId` hash function by employing [Elegant
Pairing function](http://szudzik.com/ElegantPairing.pdf). In few works,
Elegant Pairing function maps two non−negative integers to a
non−negative integer that is uniquely associated with that pair. This
drastically reduces the collision and therefore reduces the time
required to create a session in order to use a large tree ensemble
model.

### Motivation and Context
We use ONNX runtime to serve our models as part of Triton backend. We
noticed that it was taking around 2 minutes to load a model which is a
large tree ensemble model (around 5k trees with around 3 millions nodes
in total). After investigating the issue, it was clear that the
`TreeNodeElementId` hash function wasn't being able to map keys to
buckets of C++ `unordered_map` without a significant amount of
collisions (in same cases 700 items per bucket).

The following picture shows graphically the improvement obtained by the
proposed change. We used the `onnx_test_runner` command.

![flamegraph](https://github.com/microsoft/onnxruntime/assets/3594678/2588e87c-125b-4a4b-8f03-55e00ae25e08)

#### Before
```
$> time ./onnx_test_runner -v ~/folder_with_model
result:
	Models: 1
	Total test cases: 0
		Succeeded: 0
		Not implemented: 0
		Failed: 0
	Stats by Operator type:
		Not implemented(0):
		Failed:
Failed Test Cases:

real	0m55.695s
user	0m52.919s
sys	0m0.760s
```

#### After
```
$> time ./onnx_test_runner -v ~/folder_with_model
result:
	Models: 1
	Total test cases: 0
		Succeeded: 0
		Not implemented: 0
		Failed: 0
	Stats by Operator type:
		Not implemented(0):
		Failed:
Failed Test Cases:

real	0m17.152s
user	0m14.318s
sys	0m0.619s
```
2023-07-25 14:25:50 +02:00
BoarQing
daef133982
update onnxruntime_perftest's README.md as vitisai is supported on v1.15.1 (#16827)
### Description
<!-- Describe your changes. -->
Updating README.md to add vitisai for onnxruntime_perftest


### Motivation and Context
<!-- - Why is this change required? What problem does it solve? -->
The perftest tool does support vitisai whereas the README.md does not
list it. This creates some confusions internally about if vitisai is
supported. See https://github.com/microsoft/onnxruntime/pull/15673 for
context.
2023-07-25 13:39:26 +02:00
Yi Zhang
f88f0d8e36
Upgrade 4 stages in nuget pipeline to VS2022 (#16825)
### Description


### Motivation and Context
Continue upgrading to VS2022

### Verfication

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=331377&view=results

N.B.
In practice, SDLNativeRules@3 doesn't support VS2019.
2023-07-25 14:22:39 +08:00
Yulong Wang
8b30dc11d7
Update run_CIs_for_external_pr.py to skip passed checks (#16808)
### Description
Update run_CIs_for_external_pr.py to skip passed checks
2023-07-25 16:11:53 +10:00
Yi Zhang
2e214d6e27
Workaround to upgrade VS2022 for Windows ARM build (#16826)
### Description



### Motivation and Context
It should be reverted when VS2022 is upgraded to 17.7 or above.

### Vefication

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=331401&view=logs&j=7517abfd-115a-5c61-78a0-7ba3c9e3a88d
2023-07-25 08:35:52 +08:00
pengwa
f2c0470436
Fix slice upstream - Incompatible dimensions (#16818)
### Fix slice upstream - (MatMul) [ShapeInferenceError] Incompatible
dimensions

```
     2023-07-22 14:58:16.918478478 [I:onnxruntime:Default, constant_sharing.cc:256 ApplyImpl] Total shared scalar initializer count: 10
        2023-07-22 14:58:16.919494252 [W:onnxruntime:Default, graph.cc:108 MergeShapeInfo] Error merging shape info for output. 'onnx::Cast_424' source:{-1,31,-1,-1} target:{-1,32,-1,-1}. Falling back to lenient merge.
        2023-07-22 14:58:16.921014114 [W:onnxruntime:Default, graph.cc:108 MergeShapeInfo] Error merging shape info for output. 'onnx::MatMul_425' source:{-1,31,-1,-1} target:{-1,32,-1,-1}. Falling back to lenient merge.

Traceback (most recent call last):
  File "examples/onnxruntime/training/language-modeling/run_clm.py", line 594, in <module>
    main()
  File "examples/onnxruntime/training/language-modeling/run_clm.py", line 542, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/bert_ort/pengwa/optimum/optimum/onnxruntime/trainer.py", line 454, in train
    return inner_training_loop(
  File "/bert_ort/pengwa/optimum/optimum/onnxruntime/trainer.py", line 755, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/bert_ort/pengwa/optimum/optimum/onnxruntime/trainer.py", line 363, in compute_loss
    return model_with_loss(dict_inputs, return_outputs)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1724, in forward
    loss = self.module(*inputs, **kwargs)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_utils.py", line 384, in _forward
    return ortmodule._torch_module.forward(*inputs, **kwargs)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_utils.py", line 364, in _forward
    return torch_module_ort._execution_manager(torch_module_ort.is_training()).forward(*inputs, **kwargs)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_training_manager.py", line 345, in forward
    self._fallback_manager.handle_exception(
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_fallback.py", line 157, in handle_exception
    raise exception
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_training_manager.py", line 280, in forward
    self._build_graph(graph_transformer_config)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_logger.py", line 218, in wrapper
    result = func(graph_execution_manager, *args, **kwargs)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_training_manager.py", line 360, in _build_graph
    super()._build_graph(graph_transformer_config)
  File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_graph_execution_manager.py", line 186, in _build_graph
    self._graph_builder.build(config)
RuntimeError: /bert_ort/pengwa/onnxruntime/orttraining/orttraining/python/orttraining_pybind_state.cc:823 onnxruntime::python::addObjectMethodsForTraining(pybind11::module&, onnxruntime::python::ExecutionProviderRegistrationFn)::<lambda(onnxruntime::training::OrtModuleGraphBuilder*, const onnxruntime::training::TrainingGraphTransformerConfiguration&)> [ONNXRuntimeError] : 1 : FAIL : Node (MatMul_403) Op (MatMul) [ShapeInferenceError] Incompatible dimensions

 
```

Missed using `axis` attribute for `Slice` op, so change to use `axes`
inputs instead.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-07-25 08:21:46 +08:00
Wei-Sheng Chin
b0279b14d8
[DORT] Enable Dynamic Shape in DORT and Use Different InferenceSession's when Inputs Are Not Compatible (#16753)
Sometimes, ONNX exporter generates rank- or shape-dependent sub-graphs.
Thus, error could occur when running the ONNX model with different
inputs. This PR
([78e736d](78e736d857))
addresses this problem by
- if needed, exporting multiple ONNX models with different inputs for
the same GraphModule.
- implementing a naive mechanism to determine of existing ONNX models
(and the associated InferenceSession) can be reused.
 
On the other hand, in the second commit
[b5a9b5f](b5a9b5f849),
this PR also enables dynamic shapes in DORT by
- passing dynamic_shapes = True to exporter (see how
DEFAULT_DYNAMIC_BACKEND is created)
- calling torch._dynamo.optimize(dynamic_ort_aot, dynamic=True) (see how
dynamic_ort_aot is created).
2023-07-24 16:54:01 -07:00
dependabot[bot]
4b6d9fa851
Bump actions/deploy-pages from 1 to 2 (#16402) 2023-07-24 16:13:59 -07:00
Maximilian Müller
d8d8349a1b
fix: add missing nullptr of SessionOptions V2 (#16794)
/builds/devtechproviz/dl/ort-builder/onnxruntime/onnxruntime/python/onnxruntime_pybind_state.cc:388:14:
error: missing initializer for member
'OrtTensorRTProviderOptionsV2::trt_cuda_graph_enable'
[-Werror=missing-field-initializers]
  388 |             0};
      |

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-07-24 15:17:11 -07:00
Wanming Lin
5d17bcd776
[WebNN EP] Support Greater and Less ops (#16782) 2023-07-24 09:08:53 -07:00
PeixuanZuo
8ede2f139e
[ROCm] Optimize ROCm CI pipeline 2 (#16691)
- Set `KERNEL_EXPLORER_TEST_USE_CUPY=1` to replace numpy with cupy on
kernel explorer test.

KERNEL_EXPLORER_TEST_USE_CUPY=0 The CPU utilization is shown as below:

![image](https://github.com/microsoft/onnxruntime/assets/94887879/91724b78-0b4e-4cbd-ad88-83cad9976472)

KERNEL_EXPLORER_TEST_USE_CUPY=1 The CPU utilization is shown as below:

![image](https://github.com/microsoft/onnxruntime/assets/94887879/58239911-667c-4d5f-bb78-deca60d0266f)


- Use `Bash@3`.
- Update shell script.
2023-07-24 13:57:48 +08:00
Chi Lo
21ef14476b
Bug fix for nested control flow ops for TRT EP (#16343)
Current TRT EP can support model which has nested control flow ops
(multiple level subgraphs). But it fails at a case where the subgraph
has outer scope value that is defined several levels up in the top-level
graph, in this case, the outer scope value is the input of the top-level
graph. The outer scope values are not properly handled during TRT EP's
subgraph reconstruction stage and fails at `graph.resolve()`.

The way ORT gets capability from EPs is a bottom-up approach meaning
inner most subgraph gets handled first. TRT EP reconstructs each
subgraph level by level and following modifications are made to fix the
outer scope values issue:

- `SetGraphOuterScopeValuesAndInputs()` and `SetAllGraphInputs()` are
added to handle outer scope values and add those values as graph inputs
if needed in order to make `graph.resolve()` happy.
- Change to use `GetNodeArgIncludingParentGraphs` so that when creating
the fused TRT node for some subgraphs in`
Graph::CreateFusedSubGraphNode()`, it can get the NodeArgs for outer
scope values from top-level graph.


This PR fixes https://github.com/microsoft/onnxruntime/issues/16217
2023-07-23 16:16:17 -07:00
pengwa
40277b7f37
Fix orttraining-linux-gpu-ci-pipeline - LargeSizeTensorUInt64Index tests (#16820)
### Disable large index tests due to limited GPU mem

Recently following two tests fail due to GPU mem not enough, not sure
what else program running using GPU as well. So disable them for now to
unblock the required CI.

```
1: [  FAILED  ] 2 tests, listed below:
1: [  FAILED  ] CrossEntropyTest.SoftmaxCrossEntropyLossInternal_LargeSizeTensorUInt64Index
1: [  FAILED  ] CrossEntropyTest.SoftmaxCrossEntropyLossInternalGrad_LargeSizeTensorUInt64Index


2023-07-23T02:15:39.7559251Z 1: [ RUN      ] CrossEntropyTest.SoftmaxCrossEntropyLossInternal_LargeSizeTensorUInt64Index
2023-07-23T02:16:53.0904576Z 1: 2023-07-23 02:16:53.089586592 [E:onnxruntime:SoftmaxCrossEntropyLossInternal, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running SoftmaxCrossEntropyLossInternal node. Name:'node1' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* **onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 4294973440**
2023-07-23T02:16:53.0905775Z 1: 
2023-07-23T02:16:53.0906087Z 1: /onnxruntime_src/onnxruntime/test/providers/base_tester.cc:323: Failure
2023-07-23T02:16:53.0906698Z 1: Expected equality of these values:
2023-07-23T02:16:53.0907086Z 1:   expect_result
2023-07-23T02:16:53.0907564Z 1:     Which is: 4-byte object <00-00 00-00>
2023-07-23T02:16:53.0973055Z 1:   ExpectResult::kExpectFailure
2023-07-23T02:16:53.0973984Z 1:     Which is: 4-byte object <01-00 00-00>
2023-07-23T02:16:53.0975375Z 1: Run failed but expected success: Non-zero status code returned while running SoftmaxCrossEntropyLossInternal node. Name:'node1' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 4294973440
2023-07-23T02:16:53.0976198Z 1: 
2023-07-23T02:16:53.0976483Z 1: Google Test trace:
2023-07-23T02:16:53.0976818Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910
2023-07-23T02:16:53.0977229Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910
2023-07-23T02:16:53.0977639Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 2345
2023-07-23T02:16:53.0978035Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 5678
2023-07-23T02:16:53.0978441Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1234
2023-07-23T02:16:53.1303810Z 1: /onnxruntime_src/orttraining/orttraining/test/training_ops/cuda/cross_entropy_test.cc:443: Failure
2023-07-23T02:16:53.1304644Z 1: Expected equality of these values:
2023-07-23T02:16:53.1304974Z 1:   ret.first
2023-07-23T02:16:53.1305685Z 1:     Which is: 4-byte object <04-00 00-00>
2023-07-23T02:16:53.1306030Z 1:   COMPARE_RESULT::SUCCESS
2023-07-23T02:16:53.1306414Z 1:     Which is: 4-byte object <00-00 00-00>
2023-07-23T02:16:53.1306754Z 1: Unsupported compare with CompareOrtValueNumerals.
2023-07-23T02:16:53.1307487Z 1: Google Test trace:
2023-07-23T02:16:53.1307848Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910
2023-07-23T02:16:53.1308252Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910
2023-07-23T02:16:53.1308652Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 2345
2023-07-23T02:16:53.1309068Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 5678
2023-07-23T02:16:53.1309460Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1234
2023-07-23T02:16:53.1309889Z 1: /onnxruntime_src/orttraining/orttraining/test/training_ops/cuda/cross_entropy_test.cc:443: Failure
2023-07-23T02:16:53.1310239Z 1: Expected equality of these values:
2023-07-23T02:16:53.1310527Z 1:   ret.first
2023-07-23T02:16:53.1310893Z 1:     Which is: 4-byte object <04-00 00-00>
2023-07-23T02:16:53.1311208Z 1:   COMPARE_RESULT::SUCCESS
2023-07-23T02:16:53.1311600Z 1:     Which is: 4-byte object <00-00 00-00>
2023-07-23T02:16:53.1311921Z 1: Unsupported compare with CompareOrtValueNumerals.
2023-07-23T02:16:53.1312229Z 1: Google Test trace:
2023-07-23T02:16:53.1312556Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910
2023-07-23T02:16:53.1312951Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910
2023-07-23T02:16:53.1313362Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 2345
2023-07-23T02:16:53.1313749Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 5678
2023-07-23T02:16:53.1314156Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1234
2023-07-23T02:16:53.4476437Z 1: [  FAILED  ] CrossEntropyTest.SoftmaxCrossEntropyLossInternal_LargeSizeTensorUInt64Index (73692 ms)

```



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-07-23 15:02:09 +08:00
Yi Zhang
3252ff2cb7
Change DML GPU pool in Windows GPU workflow use Visual Studio 2022 (#16784)
### Description
1. use the pool with VS2022
2. upgrade System.Memory to 4.5.5


### Motivation and Context
Solve the build error while using VS2022:
`[Failure] Msbuild failed when processing the file
'D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\Microsoft.ML.OnnxRuntime.csproj'
with message: Method not found: 'System.ReadOnlySpan`1<Char>
Microsoft.IO.Path.GetFileName(System.ReadOnlySpan`1<Char>)'`

Ref:
https://stackoverflow.com/questions/73399777/azure-build-failing-due-to-method-not-found-system-readonlyspan1char-micros
2023-07-23 10:07:21 +08:00
dependabot[bot]
b92f02ad48
Bump word-wrap from 1.2.3 to 1.2.4 in /js/react_native (#16755)
Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3
to 1.2.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jonschlinkert/word-wrap/releases">word-wrap's
releases</a>.</em></p>
<blockquote>
<h2>1.2.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Remove default indent by <a
href="https://github.com/mohd-akram"><code>@​mohd-akram</code></a> in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/24">jonschlinkert/word-wrap#24</a></li>
<li>🔒fix: CVE 2023 26115 (2) by <a
href="https://github.com/OlafConijn"><code>@​OlafConijn</code></a> in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/41">jonschlinkert/word-wrap#41</a></li>
<li>🔒 fix: CVE-2023-26115 by <a
href="https://github.com/aashutoshrathi"><code>@​aashutoshrathi</code></a>
in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/33">jonschlinkert/word-wrap#33</a></li>
<li>chore: publish workflow by <a
href="https://github.com/OlafConijn"><code>@​OlafConijn</code></a> in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/42">jonschlinkert/word-wrap#42</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/mohd-akram"><code>@​mohd-akram</code></a> made
their first contribution in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/24">jonschlinkert/word-wrap#24</a></li>
<li><a
href="https://github.com/OlafConijn"><code>@​OlafConijn</code></a> made
their first contribution in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/41">jonschlinkert/word-wrap#41</a></li>
<li><a
href="https://github.com/aashutoshrathi"><code>@​aashutoshrathi</code></a>
made their first contribution in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/33">jonschlinkert/word-wrap#33</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4">https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f64b188c72"><code>f64b188</code></a>
run verb to generate README</li>
<li><a
href="03ea08256b"><code>03ea082</code></a>
Merge pull request <a
href="https://redirect.github.com/jonschlinkert/word-wrap/issues/42">#42</a>
from jonschlinkert/chore/publish-workflow</li>
<li><a
href="420dce9a24"><code>420dce9</code></a>
Merge pull request <a
href="https://redirect.github.com/jonschlinkert/word-wrap/issues/41">#41</a>
from jonschlinkert/fix/CVE-2023-26115-2</li>
<li><a
href="bfa694edf5"><code>bfa694e</code></a>
Update .github/workflows/publish.yml</li>
<li><a
href="ace0b3c78f"><code>ace0b3c</code></a>
chore: bump version to 1.2.4</li>
<li><a
href="6fd7275946"><code>6fd7275</code></a>
chore: add publish workflow</li>
<li><a
href="30d6daf60f"><code>30d6daf</code></a>
chore: fix test</li>
<li><a
href="655929cabe"><code>655929c</code></a>
chore: remove package-lock</li>
<li><a
href="49e08bbc32"><code>49e08bb</code></a>
chore: added an additional testcase</li>
<li><a
href="9f626935f3"><code>9f62693</code></a>
fix: cve 2023-26115</li>
<li>Additional commits viewable in <a
href="https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=word-wrap&package-manager=npm_and_yarn&previous-version=1.2.3&new-version=1.2.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.

[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-22 13:36:49 -07:00
dependabot[bot]
dafe11839e
Bump word-wrap from 1.2.3 to 1.2.4 in /js (#16754)
Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3
to 1.2.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/jonschlinkert/word-wrap/releases">word-wrap's
releases</a>.</em></p>
<blockquote>
<h2>1.2.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Remove default indent by <a
href="https://github.com/mohd-akram"><code>@​mohd-akram</code></a> in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/24">jonschlinkert/word-wrap#24</a></li>
<li>🔒fix: CVE 2023 26115 (2) by <a
href="https://github.com/OlafConijn"><code>@​OlafConijn</code></a> in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/41">jonschlinkert/word-wrap#41</a></li>
<li>🔒 fix: CVE-2023-26115 by <a
href="https://github.com/aashutoshrathi"><code>@​aashutoshrathi</code></a>
in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/33">jonschlinkert/word-wrap#33</a></li>
<li>chore: publish workflow by <a
href="https://github.com/OlafConijn"><code>@​OlafConijn</code></a> in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/42">jonschlinkert/word-wrap#42</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/mohd-akram"><code>@​mohd-akram</code></a> made
their first contribution in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/24">jonschlinkert/word-wrap#24</a></li>
<li><a
href="https://github.com/OlafConijn"><code>@​OlafConijn</code></a> made
their first contribution in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/41">jonschlinkert/word-wrap#41</a></li>
<li><a
href="https://github.com/aashutoshrathi"><code>@​aashutoshrathi</code></a>
made their first contribution in <a
href="https://redirect.github.com/jonschlinkert/word-wrap/pull/33">jonschlinkert/word-wrap#33</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4">https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f64b188c72"><code>f64b188</code></a>
run verb to generate README</li>
<li><a
href="03ea08256b"><code>03ea082</code></a>
Merge pull request <a
href="https://redirect.github.com/jonschlinkert/word-wrap/issues/42">#42</a>
from jonschlinkert/chore/publish-workflow</li>
<li><a
href="420dce9a24"><code>420dce9</code></a>
Merge pull request <a
href="https://redirect.github.com/jonschlinkert/word-wrap/issues/41">#41</a>
from jonschlinkert/fix/CVE-2023-26115-2</li>
<li><a
href="bfa694edf5"><code>bfa694e</code></a>
Update .github/workflows/publish.yml</li>
<li><a
href="ace0b3c78f"><code>ace0b3c</code></a>
chore: bump version to 1.2.4</li>
<li><a
href="6fd7275946"><code>6fd7275</code></a>
chore: add publish workflow</li>
<li><a
href="30d6daf60f"><code>30d6daf</code></a>
chore: fix test</li>
<li><a
href="655929cabe"><code>655929c</code></a>
chore: remove package-lock</li>
<li><a
href="49e08bbc32"><code>49e08bb</code></a>
chore: added an additional testcase</li>
<li><a
href="9f626935f3"><code>9f62693</code></a>
fix: cve 2023-26115</li>
<li>Additional commits viewable in <a
href="https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=word-wrap&package-manager=npm_and_yarn&previous-version=1.2.3&new-version=1.2.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once it's up-to-date and CI passes on it,
as requested by @fs-eire.

[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-22 13:36:38 -07:00
Ted Themistokleous
488544b79a
[MIGraphX EP] Fix CopyTensorAsync and add guards for stream sync CopyTensors (#16787)
Add compile guards to gate functionality based on MIGRAPHX_STREAM_SYNC
for adding the following

- remove excess hipStreamSyncronize to nullstream on CopyTensor calls
- Add proper call for stream synchronized CopyTensorAsync for
DeviceToHost case

Without this change subsequent CopyTensorAsync() calls will fail for
cards that don't use pinned memory thus causing hipMemcpy() calls to
occur before certain kernel operations occur.

![image](https://github.com/microsoft/onnxruntime/assets/107195283/4915c18a-fb2d-40c9-a50e-a7c6613c324b)

becomes

![image](https://github.com/microsoft/onnxruntime/assets/107195283/f661acf4-e2af-4c9a-b26a-30fca339cf1d)

---------

Co-authored-by: Ted Themistokleous <tthemist@amd.com>
2023-07-22 09:48:36 +08:00
Arthur Islamov
210d29b40e
Allow --build_wasm on a mac system (#16761)
### Description
Changes allow downloading prebuilt protoc compiler when building
WebAssebly version on mac systems.
Otherwise it tries to build a js/wasm version of protoc and throws an
error while executing it: "protoc.js permission denied"


### Motivation and Context
I need to switch between my main working computer and a PC to make
changes to WebAssebly build. Would like not to do that anymore.
2023-07-21 14:21:37 -07:00
Xiang Zhang
0790b0515f
implement DynamicQuantizeMatMul (#16757)
### Description
This PR implement
[com.microsoft.DynamicQuantizeMatMul](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftdynamicquantizematmul)


![image](https://github.com/microsoft/onnxruntime/assets/17421593/c8ab927a-5d69-40e5-a08b-79b89becf937)
2023-07-21 14:18:58 -07:00
Jiajia Qin
193415a162
[js/webgpu] reuse buffer for GpuDataManager (#16746)
### Description
<!-- Describe your changes. -->
Allocating new GPUBuffer in every session.run is not efficient. We
should make it only happen in the first run. In the following runs, we
should try to reuse those buffers.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
- This PR is for performance.
See mobilenetv2 becomes 9.58 ms from 12.9 ms.
2023-07-21 13:13:01 -07:00
Justin Chu
d79515041c
[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* __->__ #16789

Bump ruff to 0.0.278 and fix new lint errors. I added noqa to all
existing RUF012 errors which requires mutable class variables to be
annotated with `ClassVar`, as well as all PERF issues.

Signed-off-by: Justin Chu <justinchu@microsoft.com>
2023-07-21 12:53:41 -07:00
Justin Chu
d3295f4329
[Better Engineering] Fix N802 lint errors in tests (#16788)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* #16789
* __->__ #16788

This change fixes the N802 lint errors by renaming the test case to use
snake case.
2023-07-21 09:17:34 -07:00
Adam Pocock
a8e776b78b
[java] Adds support for fp16 and bf16 tensors (#16703)
### Description
The Java API currently only supports fp16 output tensors which it
automatically casts to floats on the way out. This PR adds support for
creating fp16 and bf16 tensors (from `java.nio.Buffer` objects or as the
output of models, creation from Java short arrays is not supported),
along with efficient methods for casting `FloatBuffer` into
`ShortBuffer` filled with fp16 or bf16 values and vice versa.

The fp16 conversions use a trick to pull in the efficient conversion
methods added to Java 20, falling back to ports of the MLAS methods
otherwise. The Java 20 methods can be special cased by the C2 JIT
compiler to emit the single instruction on x86 and ARM which converts
fp32<->fp16, or the vectorized versions thereof, so they should be quite
a bit faster than the MLAS ported one.

### Motivation and Context
fp16 and bf16 are increasingly popular formats and we've had several
requests for this functionality. Fixes #7003.

cc @yuslepukhin  @cassiebreviu

---------

Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
2023-07-21 21:14:41 +10:00
Dmitri Smirnov
1e18efade5
[C#] Add ML Sequences and Maps Create and Process APIs (#16648)
### Description
1) Added Sequence And Maps convenience APIs to create input Sequences
and Maps
and also visit the outputs.

2) Address OrtValue design issue when the values are created on top of
the
managed memory and the ortValues are used for sequence and maps
creation.
We should retain the original managed instances that keep the memory
pinned.
We opt to keep track of those and dispose of them within an instance of
OrtValue
that represents a Map or a Sequence.

3) Set `LangVersion` to default per [MS Versioning
Docs.](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/configure-language-version)

### Motivation and Context
1) When writing code examples, use of Map and Sequences API proved to be
cumbersome.
2) It is a BUG, that we should address, as the managed memory can move
by the GC and lead to
intermittent crashes.
3) Make use of the most feature of the C#.
2023-07-21 12:58:29 +08:00
Hector Li
4d569f6586
[QNN EP] Op support: LayerNorm, Asin, Sign (#16740)
### Description
Add op support for LayerNorm, Asin, Sign.
Enable QDQ node unit support for Sin Op

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
2023-07-20 20:57:48 -07:00
Xavier Dupré
b508c7236f
Replace call to deprecated torch.norm (#16758)
### Description
torch.norm is deprecated as mentioned in issue #16751. This PR replaces
the call to torch.norm by the options suggested by torch documentation.
2023-07-20 19:52:19 -07:00
kunal-vaishnavi
b7176f9826
Fix bug with saving model optimized by inference session (#16716)
### Description
A [previous PR](https://github.com/microsoft/onnxruntime/pull/16531)
added a temporary directory to save the model optimizations after
loading a model into an `InferenceSession`. Many models that have an
external data file, however, require the data file to be in the same
directory as the ONNX model file. Because the model is saved in a
temporary directory and the data is saved in another directory, this
causes a `FileNotFoundError` error when trying to load the model in the
temporary directory.

This PR fixes this error by saving the external data file in the same
directory that the optimized model is located in.

### Motivation and Context
This PR fixes a bug with using a temporary directory while running the
optimizer for models that have an external data file.
2023-07-20 18:44:28 -07:00
Edward Chen
0f9883f804
Fix Mac M1 build (#16763)
- Add ifndef `__APPLE__` to skip lines which cause EXC_BAD_INSTRUCTION error.
- Fix floatToHalf/doubleToHalf conversion issue and add tests.
2023-07-20 18:24:57 -07:00
Baiju Meswani
538d2412ef
Objective-C Add Support to Create and Query String ORTValues (#16764)
This pull request contains a few changes:

1. Adds support for string ort values.
2. Fixes the training minimal build (that was broken with #16601) by
putting custom op registration behind #ifdefs
3. Fixes the iOS pod package generation (that was again broken with
#16601) by explicitly providing paths to be copied during pod creation.
2023-07-20 17:39:29 -07:00
Adrian Lizarraga
a8c263f92c
[QNN EP] Update QNN SDK to 2.12 (#16750)
### Description
- Updates the default QNN SDK to 2.12 for CI pipelines
- Adds a disabled InstanceNormalization test for regression on QNN SDK
2.12
- Cleans up logs for unsupported ops.

### Motivation and Context
Test with the latest QNN SDK.
2023-07-20 16:22:14 -07:00
Wanming Lin
eaea34f8e2
[WebNN EP] Support PRelu op (#16756) 2023-07-20 10:39:30 -07:00
Jeff Daily
bb136f86c8
[ROCm][MIGraphX] for googletest dep, set OVERRIDE_FIND_PACKAGE (#16715)
Otherwise, an unsupported version of gtest/gmock will be found at
/opt/conda/include for ROCm builds. Though this issue was initially
found for ROCm builds, the issue is generic. onnxruntime requires a
specific version of googletest and should not rely on locating
googletest using find_package.

The ROCm error was:

```
In file included from /opt/conda/include/gmock/gmock-spec-builders.h:75,
                 from /opt/conda/include/gmock/gmock-generated-function-mockers.h:47,
                 from /opt/conda/include/gmock/gmock-function-mocker.h:39,
                 from /opt/conda/include/gmock/gmock.h:61,
                 from /stage/onnxruntime/onnxruntime/test/util/test_utils.cc:17:
/opt/conda/include/gmock/gmock-matchers.h: In instantiation of ‘bool testing::internal::PointwiseMatcher<TupleMatcher, RhsContainer>::Impl<LhsContainer>::
MatchAndExplain(LhsContainer, testing::MatchResultListener*) const [with LhsContainer = const gsl::span<const float>&; TupleMatcher = testing::internal::
FloatingEq2Matcher<float>; RhsContainer = gsl::span<const float>]’:
/opt/conda/include/gmock/gmock-matchers.h:2303:10:   required from here
/opt/conda/include/gmock/gmock-matchers.h:2312:48: error: no type named ‘const_iterator’ in ‘testing::internal::PointwiseMatcher<testing::internal::
FloatingEq2Matcher<float>, gsl::span<const float> >::Impl<const gsl::span<const float>&>::LhsStlContainer’ {aka ‘class gsl::span<const float>’}
```
2023-07-21 00:57:38 +08:00
zesongw
0e40049eb2
[WebNN EP] Add support for Op Pad. (#16732)
### Description
<!-- Describe your changes. -->
Support Op Pad for WebNN EP. It aims to support three modes (constant,
reflect and edge). For now, only constant can be tested with Chrome
Canary.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Support more models like SD1.5-VAE-encode.
2023-07-20 07:57:48 -07:00
Xavier Dupré
2bc9fbb621
Fix url in the code documentation (graph optimizations) (#16770)
### Description
Fix a wrong url in the documentation as mentioned in issue #16678.



### Motivation and Context
Better documentation.
2023-07-20 07:02:22 -07:00
Yi Zhang
c314d7724f
Update dml gpu pool to onnxruntime-Win2022-GPU-dml-A10 (#16765)
### Description
onnxruntime-Win2022-GPU-dml-A10 is using VS2022.



### Motivation and Context
1. Upgrade VS2019 to VS2022 to fix prefast issue.
2023-07-20 16:52:13 +08:00
Scott McKay
8b866060f2
Comment out ORT-Nightly feed in test app NuGet.config (#16762)
### Description
<!-- Describe your changes. -->
Comment out ORT-Nightly feed in NuGet.config to see if that makes the
Secure Supply Chain Analysis CI step happy.

Add info to readme on manually adding feed and using it.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-07-20 14:30:29 +10:00
Edward Chen
fc1f463ff1
[ios] Enable training package in packaging pipeline (#16683)
Build iOS training package in packaging pipeline.
Refactor iOS packaging pipeline to build different package variants in parallel.
2023-07-19 19:55:00 -07:00