### Description
1. Add valgrind to the existing ep_perf CI MemTest and parse ORT-TRT memory leak details
   1. General Valgrind logs and logs related to ORT-TRT will be parsed in [CI artifacts](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=334122&view=artifacts&pathAsName=false&type=publishedArtifacts)
   1. Logic:
      1. Run valgrind with `onnxruntime-perf-test -e tensorrt` and export the log to `valgrind.log`
      2. Identify whether any `definitely lost` memory leak happened
         1. For log paragraphs that show `definitely lost`, check whether they contain the keyword `TensorrtExecutionProvider`.
         2. If so, extract these details to `ort_trt_memleak_detail.log`, and return `build failure` to EP Perf CI
2. Fix the existing addressSanitizer and sync the squeezenet testcase with the latest update from [ort-inference-example](https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/squeezenet/main.cpp)
   1. Updates in short: upgrade main.cpp to use OrtTensorRTProviderOptionsV2
3. Reorder the 7-min-MemTest to run ahead of the 9-hr-model-tests, and enable MemTest by default
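
The parsing logic above can be sketched roughly as follows. This is a minimal illustration in Python, not the script added by this PR: the function names (`split_paragraphs`, `find_trt_leaks`, `check_log`) are hypothetical, and it assumes the usual Valgrind layout where every line starts with `==<pid>==` and a line holding only that prefix separates report paragraphs.

```python
import re

# Assumption: each Valgrind line starts with "==<pid>==", and a line that holds
# only this prefix separates one report paragraph from the next.
_PREFIX = re.compile(r"^==\d+==\s?")


def split_paragraphs(log_text):
    """Split a Valgrind log into paragraphs with the ==pid== prefix removed."""
    paragraphs, current = [], []
    for raw in log_text.splitlines():
        line = _PREFIX.sub("", raw).rstrip()
        if line:
            current.append(line)
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


def find_trt_leaks(log_text, keyword="TensorrtExecutionProvider"):
    """Return the 'definitely lost' paragraphs that mention the keyword."""
    return [
        p for p in split_paragraphs(log_text)
        if "definitely lost" in p and keyword in p
    ]


def check_log(log_path, detail_path):
    """Write matching paragraphs to detail_path; return 1 if any were found."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        leaks = find_trt_leaks(f.read())
    if leaks:
        with open(detail_path, "w", encoding="utf-8") as f:
            f.write("\n\n".join(leaks) + "\n")
    return 1 if leaks else 0
```

A CI step could then call `sys.exit(check_log("valgrind.log", "ort_trt_memleak_detail.log"))` so that a non-zero exit code marks the build as failed. The leak summary paragraph also contains the words `definitely lost`, but it is skipped because it does not mention the keyword.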
### Description
This change lets the Web CI run some checks as the first step, so that if
there are errors it won't launch the task to build WebAssembly, which
is heavy.
Checks include:
- "npm ci" in /js, /js/common and /js/web. This implicitly includes:
  - typescript compiler in /js
  - typescript compiler in /js/common
  - webpack build in /js/common
  - typescript compiler in /js/web
- ESLint on typescripts
- clang-format formatter (.js, .ts, .cc, .h, .mm)
- Prettier formatter (.json, .jsonc, .md)
---------
Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
1. Rename `OrtValue.FillStringTensorElement` to `StringTensorSetElementAt`.
To the API user, I think we're conceptually setting the string at an
offset in the tensor, which is roughly equivalent to `List<string> list
... list[index] = "value"`.
2. While working on new inference examples, I noticed that I am still
inclined to use `DenseTensor` for N-D indexing. Added `GetStrides()` and
`GetIndex()` from strides for long dims, so the user can obtain strides
and translate N-D indices into a flat index to operate directly on the
native `OrtValue` buffers. Expose these functions to the user.
3. Make sure we generate docs for C# public static functions.
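
To illustrate the idea behind item 2, here is a small sketch of how row-major strides map N-D indices to a flat index. It is written in Python for brevity and is not the C# implementation; `get_strides` and `get_index` are hypothetical names that only mirror the `GetStrides()` and `GetIndex()` helpers mentioned above, and a dense row-major layout is assumed.

```python
def get_strides(dims):
    """Row-major strides, in elements, for a dense tensor of shape `dims`."""
    strides = [1] * len(dims)
    # Walk from the second-to-last dimension back to the first.
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return strides


def get_index(strides, indices):
    """Translate N-D indices into a flat offset using precomputed strides."""
    if len(strides) != len(indices):
        raise ValueError("strides and indices must have the same rank")
    return sum(s * i for s, i in zip(strides, indices))


strides = get_strides([2, 3, 4])      # [12, 4, 1]
flat = get_index(strides, [1, 2, 3])  # 1*12 + 2*4 + 3*1 = 23
```

With the flat index in hand, a caller can read or write the element directly in the native buffer instead of going through an N-D indexer.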
### Description
There are currently multiple failures blocking the CI pipelines, so
this PR combines all of the fixes needed for CI to pass.
Any single fix on its own would still fail CI.
Includes:
#16960, #16958
Please help make sure this PR gets merged once CI passes.
@snnn @carzh @guschmue
Fixed:
[AB#18118](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18118)
---------
Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
Add lsb-release package to android custom build
### Motivation and Context
To fix a build issue:
/workspace/onnxruntime/tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/install_protobuf.sh:
line 27: lsb_release: command not found
### Description
- enable unit test for js/common in CI
- add debug config in js/.vscode/launch.json
- enable source map for js/common/test for debugging purposes; add
source map files to ignore list
- ignore js/common/test folder for npm packaging
### Description
1. As a follow-up to #16761, this PR allows building ORT on iOS/Android
without explicitly specifying a protoc path. #16761 is for
WASM; this one is for iOS/Android.
2. Update the macOS/Linux build scripts that build and install protobuf from
source to make them more flexible. Add support for
RedHatEnterprise (ubi), which will be needed for upgrading the base image
from centos:7 to ubi:8.
3. Update tools/ci_build/github/pai/rocm-ci-pipeline-env.Dockerfile:
the Dockerfile's base image has protobuf preinstalled in /usr/local, which we
should uninstall to avoid conflicts.
### Description
The `%AGENT_TEMPDIRECTORY%\v11.8` directory is created in the azcopy step,
so the set-env step should run after the azcopy step.
### Motivation and Context
Correct the previous logic.
Unify the step, since multiple jobs use it.
### Description
Split stages for CPU and CPU+NNAPI builds as CodeQL is enabled at the
stage level.
We run it for CPU+NNAPI as that covers all the Android code.
We don't want to run it for both as duplicate issues would be created
for a problem in code included in both builds.
### Description
Remove VS 2019 code.
### Description
These yaml files and docker files are not used by any pipeline. If I
am wrong, feel free to submit a PR to restore the wrongly deleted file
from git history (git keeps everything forever).
### Description
### Motivation and Context
It's also used to upgrade Visual Studio to VS2022.
onnxruntime-gpu-winbuild-T4 and onnxruntime-gpu-tensorrt8-winbuild-t4
use an image based on a dev branch and VS2019.
To avoid breaking the current CIs, we move the jobs running on
onnxruntime-gpu-winbuild-T4/onnxruntime-gpu-tensorrt8-winbuild-t4 to
onnxruntime-Win2022-GPU-T4.
### Description
Support SmoothQuant for ORT static quantization via Intel Neural
Compressor
> Note:
> Please use `neural-compressor==2.2` to try the SmoothQuant function.
### Motivation and Context
For large language models (LLMs) with a very large number of parameters,
systematic outliers make quantization of activations difficult. As a
training-free post-training quantization (PTQ) solution, SmoothQuant
migrates this difficulty offline from activations to weights with a
mathematically equivalent transformation. Integrating SmoothQuant into
ORT quantization can benefit the accuracy of INT8 LLMs.
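
The "mathematically equivalent transformation" can be sketched as follows. This is a minimal pure-Python illustration of the published SmoothQuant idea, `Y = (X diag(s)^-1)(diag(s) W)` with a per-channel scale `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)`; it is not the Intel Neural Compressor implementation. The helper names, the value `alpha=0.5`, and the assumption that no channel is all zeros are choices made only for this example.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth(x, w, alpha=0.5):
    """Return (x_smoothed, w_smoothed, scales).

    x has shape [tokens, channels]; w has shape [channels, out_features].
    Assumes every channel has at least one non-zero activation and weight.
    """
    channels = len(w)
    scales = []
    for j in range(channels):
        act_max = max(abs(row[j]) for row in x)
        w_max = max(abs(v) for v in w[j])
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    # Divide activations and multiply weights by the same per-channel scale.
    x_s = [[v / scales[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * scales[j] for v in w[j]] for j in range(channels)]
    return x_s, w_s, scales


x = [[1.0, 100.0], [-2.0, 50.0]]   # channel 1 carries the outlier
w = [[0.5, -1.0], [0.25, 2.0]]
x_s, w_s, scales = smooth(x, w)
```

Here `matmul(x_s, w_s)` matches `matmul(x, w)` up to floating-point rounding, while the outlier channel of `x_s` has a much smaller range than in `x`, which is what makes the activations easier to quantize.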
---------
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
### Description
Disable two PERF* rules in ruff to allow better readability. The rationale
is commented inline. This change also removes the noqa directives that
became unused because of the rule change.
### Motivation and Context
Readability
Update upload_pod_archive_and_update_podspec.sh to take a pod archive path glob pattern. The actual pod archive path has a version suffix which changes.
The TRT EP currently supports models with nested control flow ops
(multiple levels of subgraphs). However, it fails when a subgraph
uses an outer scope value that is defined several levels up in the top-level
graph; in this case, the outer scope value is an input of the top-level
graph. The outer scope values are not handled properly during the TRT EP's
subgraph reconstruction stage, which then fails at `graph.resolve()`.
ORT gets capability from EPs bottom-up, meaning the
innermost subgraph is handled first. The TRT EP reconstructs each
subgraph level by level, and the following modifications fix the
outer scope value issue:
- `SetGraphOuterScopeValuesAndInputs()` and `SetAllGraphInputs()` are
added to handle outer scope values and add those values as graph inputs
if needed so that `graph.resolve()` succeeds.
- Change to use `GetNodeArgIncludingParentGraphs` so that, when creating
the fused TRT node for some subgraphs in
`Graph::CreateFusedSubGraphNode()`, it can get the NodeArgs for outer
scope values from the top-level graph.
This PR fixes https://github.com/microsoft/onnxruntime/issues/16217
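
As a rough illustration of the bottom-up handling described above, the toy model below walks nested graph levels innermost-first and promotes any value a level consumes but does not define to an input of that level. It is a conceptual sketch in Python only: `ToyGraph` and `add_outer_scope_inputs` are hypothetical and do not correspond to ONNX Runtime types or to the functions added in this PR.

```python
class ToyGraph:
    """Hypothetical stand-in for one graph level; not an ONNX Runtime type."""

    def __init__(self, name, inputs, nodes, subgraphs=()):
        self.name = name
        self.inputs = list(inputs)        # explicit graph inputs
        self.nodes = nodes                # list of (input_names, output_names)
        self.subgraphs = list(subgraphs)  # nested control-flow bodies


def add_outer_scope_inputs(graph, is_top_level=True):
    """Handle subgraphs first; return names this level still needs from above."""
    needed = set()
    for sub in graph.subgraphs:
        # Innermost levels are processed before their parents.
        needed |= add_outer_scope_inputs(sub, is_top_level=False)
    defined = set(graph.inputs)
    for node_inputs, node_outputs in graph.nodes:
        needed.update(node_inputs)
        defined.update(node_outputs)
    outer = needed - defined
    if not is_top_level:
        # Promote outer scope values to inputs so this level is self-contained.
        graph.inputs.extend(sorted(outer))
    return outer


inner = ToyGraph("inner", [], [(["X", "B"], ["C"])])
middle = ToyGraph("middle", [], [(["A"], ["B"])], [inner])
top = ToyGraph("top", ["X"], [(["X"], ["A"])], [middle])
unresolved = add_outer_scope_inputs(top)
```

In this example `X` is an input of the top-level graph but is consumed two levels down, so it is carried through `middle` as an added input even though `middle` itself never uses it directly.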
### Description
1. use the pool with VS2022
2. upgrade System.Memory to 4.5.5
### Motivation and Context
Solve the build error while using VS2022:
`[Failure] Msbuild failed when processing the file
'D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\Microsoft.ML.OnnxRuntime.csproj'
with message: Method not found: 'System.ReadOnlySpan`1<Char>
Microsoft.IO.Path.GetFileName(System.ReadOnlySpan`1<Char>)'`
Ref:
https://stackoverflow.com/questions/73399777/azure-build-failing-due-to-method-not-found-system-readonlyspan1char-micros
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at
bottom):
* __->__ #16789
Bump ruff to 0.0.278 and fix new lint errors. I added noqa to all
existing RUF012 errors, which require mutable class variables to be
annotated with `ClassVar`, as well as to all PERF issues.
Signed-off-by: Justin Chu <justinchu@microsoft.com>
This pull request contains a few changes:
1. Adds support for string ort values.
2. Fixes the training minimal build (that was broken with #16601) by
putting custom op registration behind #ifdefs
3. Fixes the iOS pod package generation (that was again broken with
#16601) by explicitly providing paths to be copied during pod creation.
### Description
- Updates the default QNN SDK to 2.12 for CI pipelines
- Adds a disabled InstanceNormalization test for regression on QNN SDK
2.12
- Cleans up logs for unsupported ops.
### Motivation and Context
Test with the latest QNN SDK.
### Description
This PR includes changes to the documentation in the _readmeOV.rst_ file
and changes to the dockerfile that enable building ORT with the
latest OpenVINO 2023.0.0
### Motivation and Context
Modified the dockerfile to incorporate the latest version of OpenVINO
(2023.0.0) for building ONNX Runtime.
The changes in this PR aim to improve the overall user experience by
providing accurate and up-to-date documentation while leveraging the latest
OpenVINO 2023.0.0
### Description
MAUI test app with tooling to add model and generated or provided input
test data.
The app will load the model and validate the output. It can also run a
specified number of iterations to provide basic performance information.
<img width="401" alt="image"
src="https://github.com/microsoft/onnxruntime/assets/979079/daf3af13-fb22-4cbb-9159-486b483a7485">
### Motivation and Context
Primarily to make it easier to test an arbitrary model on iOS. A MAUI
app allows testing on all platforms.
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
Delete second reference to onnxruntime_api_tests_without_env in the code
coverage commands. One was removed in #16373 and the duplicate wasn't
noticed.
### Motivation and Context
Fix pipeline.
This PR mainly optimizes the ROCm CI tests to reduce time and CPU utilization.
- use a smaller batch size in the strided_batched_gemm/batched_gemm tests
- disable the CPU training test
- fix occasional test_e2e_padding_elimination failures on ROCm.
### Description
Use pipeline cache instead of reading data from the image.
### Motivation and Context
1. To reduce the browser dependency of custom image.
2. The onnx node test data is less than 30M and the cache download time
is very short.
### Description
Add ort_value.h to session_options.h so OrtValue is defined.
Update a unit test binary to add required include paths. Adding
ort_value.h pulls in more data type headers.
### Motivation and Context
#16193
- Move the ROCm build step to a CPU-only machine
- Add the performance data of the huggingface bert-large model on the
MI200
- At the beginning of the test step, check the agent's GPU usage and
kill the threads occupying the GPU, which may be left over from previous
tasks that exited abnormally.
- Use different docker images for the build and test steps. The
difference is the `uid` and `user` used when building the docker image and creating the
docker container.
### Description
Windows GPU Reduced Ops CI Pipeline is broken due to the introduction of
a second template type in registered kernels. The python code checking
the registration is broken due to that. This PR addresses this issue on
the python side by keeping only one type equal to the concatenation of
the two types.
- Fix some warnings from Xcode build (`-Wshorten-64-to-32`).
- Enable `-Wshorten-64-to-32` warning if available. Currently it's not fully enabled for `onnxruntime_test_all` and `onnxruntime_providers_xnnpack` yet.
- Some clean up in build.py including setting CMake generator more consistently.
- Update some documentation comments.
- Use onnxruntime_training.h as the umbrella header so training API docs are included in generated docs.
- Fix static analysis build.
### Description
Split out the more basic changes from #15552 for easier review.
Re-organize to clarify the structure
- Separate out generic base functionality from ORT specific components
- pass in handlers for internal ORT ops to Optimize
- Split out layout transformation from transpose optimization
- Separate out level 1 transpose optimizer
- Cleanup some naming to try and clarify things like an optimizer vs.
general optimization code
Most of the changes are from this movement of code.
Two implementation changes:
- the extended handlers are queried first in GetHandler
  - allows the extended handlers to override the default behaviour for an
ONNX operator
- simplify the Optimize function to remove OptimizerMode.
  - `can_modify_node` is used instead of `mode` and
`ignore_assigned_nodes` and a long description of the current usage is
added. I don't _think_ that changes the current behavior and hopefully
clarifies what happens and when, and makes the base transpose optimizer
implementation more generic.
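
The lookup order in the first change can be pictured with a short sketch. It is written in Python purely for illustration (the real code is C++), and the `get_handler` signature and the dictionary-based handler tables are assumptions made for this example, not the actual `GetHandler` interface.

```python
def get_handler(op_type, default_handlers, extended_handlers=None):
    """Look up a handler, letting extended handlers override the defaults."""
    # Extended handlers are consulted first, so an entry there wins over
    # the default handler registered for the same operator.
    if extended_handlers and op_type in extended_handlers:
        return extended_handlers[op_type]
    return default_handlers.get(op_type)


def default_resize(node):
    return "default Resize handling"


def ep_resize(node):
    return "EP-specific Resize handling"


defaults = {"Resize": default_resize}
extended = {"Resize": ep_resize}
handler = get_handler("Resize", defaults, extended)
```

Here `handler` is `ep_resize`; without the `extended` table the same call falls back to `default_resize`, and an unknown operator yields `None`.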
### Motivation and Context
Create a cleaner separation to support adding EP specific logic next, so that
cases where an EP requires additional layout sensitive behaviour
(e.g. its Resize implementation only handles one layout) are handled cleanly.