### Description
<!-- Describe your changes. -->
As title.
- Only support constant mode Pad for CoreML EP for now.
- Enable Pad tests for CoreML with inputs as initializer types.
CoreML Spec for reference:
https://apple.github.io/coremltools/mlmodel/Format/NeuralNetwork.html#paddinglayerparams
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fill operator gaps for ClipChamp models.
---------
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
### Fix reference count for autograd
When PythonOp kernel initialized, `AddPointerScalarArgs` creates
`const_args_` which put all non-tensor references (including
ProcessGroup, string, or other user types) in it.
In kernel's destructor, all ref cnt got decreased for `const_args_`.
```
void PythonOpBase::Clear() {
for (auto ptr : const_args_) {
auto obj = reinterpret_cast<PyObject*>(ptr);
Py_DECREF(obj);
}
}
```
It means, we did not increase cnt, but just decrease cnt. Running the
unit, segmentation fault will be thrown. The simple fix is to remove the
Py_DECREF for those pointer-type constant inputs triggered by kernel
destructor.
NONTENSOR_OBJECT_POINTER_STORE is the place we increase the reference
during export, then the reference will remain until the python program
terminates.
Additionally tunings:
1. Move some logs into verbose instead of warning in case of flooding
training logs.
2. Move pointer type ref holding from python side
(NONTENSOR_OBJECT_POINTER_STORE) to
orttraining/orttraining/core/framework/torch/custom_function_register.h.
Then we use a consistent approach to manage all PythonOp related python
object/methonds ref count increasing and decreasing.
### Description
This PR includes the following changes:
- upgrade js dependencies
- enable STRICT mode for web assembly build.
- corresponding fix for cmake-js upgrade
- corresponsing fix for linter upgrade
- upgrade default typescript compile option of:
- `moduleResolution`: from `node` to `node16`
- `target`: from `es2017` to `es2020`
- fix ESM module import in commonJS source file
## change explanation
### changes to onnxruntime_webassembly.cmake
`-s WASM=1` and `-s LLD_REPORT_UNDEFINED` in latest version is
by-default and deprecated.
### changes to onnxruntime_node.cmake
The npm package `cmake-js` updated its way to find file `node.lib`.
previously it downloads this file from Node.js public release channel,
and now it generates it from a definition file.
The node.js release channel does not contain a windows/arm64 version, so
previously cmake-js will fail to download `node.lib` for that platform.
this is why we made special handling to download the unofficial binary
to build. now this is no longer needed so we removed that from the cmake
file.
### changes to tsconfig.json
`node16` module resolution supports async import and `es2020` as target
supports top level await.
### Description
<!-- Describe your changes. -->
AMX isn't supportted until assembler 2.40 even though the GCC frontend
supports it.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
(1) Allow model to be path, and use infer_shapes_path to fix
https://github.com/microsoft/onnxruntime/issues/15063
(2) Add some logging for float data truncation
(3) Add RandomUniformLike to default op_block_list
(4) Some minor changes to use f string.
### Description
<!-- Describe your changes. -->
Transformer models can handle batch of inputs at once. However,
sequences in a batch usually have different length. Then we have to pad
the short one to have same length as the longest. This is not efficient
especially for large batch with high variance.
This PR introduces a PackedAttention operator which can take in packed
sequences (no padding) and also produces output in packing mode.
There will be another PR to use the PackedAttention to implement the
encoder in packing mode.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Convolution for fp16 datatype. Use NHWC for computation. For NCHW input,
it rearranges the input tensor to NHWC format before computing the
result.
Support two optional fusion:
1. Activation
2. Add (not yet implemented)
### Motivation and Context
Accelerating fp16 inference
### Description
Check the Mac x86_64 packages installation.
### Motivation and Context
To avoid installation error, add packages smoking test before release.
### Description
Patch https://github.com/microsoft/onnxruntime/pull/14767 in order to
make two provider options `force_timing_cache` and `detailed_build_log`
can be updated. Otherwise, they only use default value.
`timing_cache_enable` is good.
### Description
Remove the use of `eval` in test code so we don't (1) use eval and (2)
create "unused" local vars that ruff will remove. Predecessor to #15085
### Description
<!-- Describe your changes. -->
We are introducing the FasterTransfomer model-level integration using
ORT [custom op runtime
wrapper](https://github.com/microsoft/onnxruntime/pull/13427).
In order to make the FT wrapper/integration work, two things need to be
done:
- New API `KernelInfoGetConstantInput_tensor`. (Done in this PR)
During custom op kernel initialization, it needs to get the model
weights (saved as node's constant inputs) ready for FT's weights
instantiation. What's why we need to add this new API to make kernel
info capable of getting constant inputs.
- Custom op and custom op kernel to wrap FT model. (Will provide in
onnxruntime extensions or inference examples)
During custom op kernel initialization, it can fetch attributes from
kernel info to determine which kind of FT model instance create. During
custom op kernel compute/inference, it can get input/output from kernel
context and then assign input/output buffers for model instance to run.
### Description
Add logging APIs for custom ops.
This PR introduces a `OrtLogger` type, which can be retrieved from a
`OrtKernelInfo` or `OrtKernelContext`. The kernel info's logger is the session logger stored
in the execution provider. The kernel context's logger is a run logger.
### Motivation and Context
Allows custom ops to log information in a manner consistent with
built-in ops.
Example usage in custom op:
```C++
struct MyCustomKernel {
MyCustomKernel(const OrtApi& api, const OrtKernelInfo* info) {
Ort::ConstKernelInfo kinfo(info);
this->logger_ = kinfo.GetLogger();
// ...
ORT_CXX_LOGF_NOEXCEPT(this->logger_, OrtLoggingLevel::ORT_LOGGING_LEVEL_ERROR, "Error: %s", err_msg);
}
void Compute(OrtKernelContext* context) {
ORT_CXX_LOG(this->logger_, OrtLoggingLevel::ORT_LOGGING_LEVEL_VERBOSE, "Calling compute...");
// ...
}
// ...
private:
Ort::Logger logger_;
};
```
### Description
Three main changes:
* `qOrdered*` tests fail 100% on Ampere+ cards with cublas error.
Disable them on these cards.
* Bump LayerNormalization tests to opset 17, to be consistent with the
ONNX specification. Mark TRT EP as a provider that does not do error
checking on faulty LayerNorm definitions
* Remove null tensor for optional `bias` input for LayerNorm. Optional
inputs should be omitted entirely.
### Motivation and Context
Streamlines testing for CUDA and TRT with later NVIDIA architectures.
Signed-off-by: Kevin Chen <kevinch@nvidia.com>
### Fix training gpu ci related to pl upgrade
As new version of pln relased, old parameter of pln.Trainer, gpus looks
not supported. So we switch to new params to make it work.
```
['/home/onnxruntimedev/miniconda3/bin/python3', 'orttraining_test_ortmodule_torch_lightning_basic.py', '--train-steps=470', '--epochs=2', '--batch-size=256', '--data-dir', '/mnist']
/home/onnxruntimedev/miniconda3/lib/python3.8/site-packages/torch/onnx/utils.py:1794: FutureWarning: The first argument to symbolic functions is deprecated in 1.13 and will be removed in the future. Please annotate treat the first argument (g) as GraphContext and use context information from the object instead. warnings.warn( Traceback (most recent call last): File "orttraining_test_ortmodule_torch_lightning_basic.py", line 101, in <module> main() File "orttraining_test_ortmodule_torch_lightning_basic.py", line 96, in main trainer = pl.Trainer(**kwargs) File "/home/onnxruntimedev/miniconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/argparse.py", line 69, in insert_env_defaults return fn(self, **kwargs) TypeError: __init__() got an unexpected keyword argument 'gpus'
```
1. Fix Nhwcconv with asymmetric padding. The slice axies are (1,2) with
NHWC layout.
2. For ROCm EP, Move Addbias after SliceOutUnwantedOutputSection,
because before that, the actual output of Conv is s_.y_data.
[//]: # (dependabot-start)
⚠️ **Dependabot is rebasing this PR** ⚠️
Rebasing might not happen immediately, so don't worry if this takes some
time.
Note: if you make any changes to this PR yourself, they will take
precedence over the rebase.
---
[//]: # (dependabot-end)
Bumps [@sideway/formula](https://github.com/sideway/formula) from 3.0.0
to 3.0.1.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5b44c1bffc"><code>5b44c1b</code></a>
3.0.1</li>
<li><a
href="9fbc20a02d"><code>9fbc20a</code></a>
chore: better number regex</li>
<li><a
href="41ae98e042"><code>41ae98e</code></a>
Cleanup</li>
<li><a
href="c59f35ec40"><code>c59f35e</code></a>
Move to Sideway</li>
<li>See full diff in <a
href="https://github.com/sideway/formula/compare/v3.0.0...v3.0.1">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a
href="https://www.npmjs.com/~marsup">marsup</a>, a new releaser for
<code>@sideway/formula</code> since your current version.</p>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the
default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as
the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as
the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the
default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [@sideway/formula](https://github.com/sideway/formula) from 3.0.0
to 3.0.1.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5b44c1bffc"><code>5b44c1b</code></a>
3.0.1</li>
<li><a
href="9fbc20a02d"><code>9fbc20a</code></a>
chore: better number regex</li>
<li><a
href="41ae98e042"><code>41ae98e</code></a>
Cleanup</li>
<li><a
href="c59f35ec40"><code>c59f35e</code></a>
Move to Sideway</li>
<li>See full diff in <a
href="https://github.com/sideway/formula/compare/v3.0.0...v3.0.1">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a
href="https://www.npmjs.com/~marsup">marsup</a>, a new releaser for
<code>@sideway/formula</code> since your current version.</p>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the
default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as
the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as
the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the
default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Some of our vectorized kernels (including SkipLayerNorm) doesn't check
the alignment of data pointer. While ORT's allocator may guarantee the
alignment, but since training is using PyTorch's allocator, which cannot
guarantee that, we need to add the data pointer check before we call any
vectorized kernel.
This PR is to fix such data pointer alignment issue for SkipLayerNorm's
vectorized kernel. We found this issue when running huggingface's swinv2
model. The PR also refactored the code for SkipLayerNorm kernel to make
it simpler.
### Description
<!-- Describe your changes. -->
Add check on axis to make sure it is in a valid range
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->