Commit graph

9155 commits

Author SHA1 Message Date
G. Ramalingam
4faee2e44c
Fix issue in constant-propagation inside function subgraph (#16330)
### Description

The SequenceMap function-op has a graph-attribute. ORT's
constant-folding optimization may identify constant-expressions inside
the subgraph and promote them to constants, stored as initializers in
the main graph. When it does this, the optimization updates the subgraph
to remove the corresponding nodes.

When we expand a SequenceMap node by inlining its function-expansion, we
need to use this updated subgraph. However, the existing code uses the
original graph-attribute (GraphProto), instead of regenerating it from
the modified subgraph. This results in producing a graph with duplicate
definitions for the constant-folded variable, resulting in an error
during graph-resolve.

This PR fixes this issue (just a single line fix), and adds a test-case
to cover this scenario.

---------

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
2023-07-14 14:44:59 -07:00
Wanming Lin
ea43671eb6
[WebNN EP] Support several activation ops (#16693)
Support Elu, HardSigmoid, HardSwish, Softplus, Softsign, Tanh.
2023-07-14 14:36:15 -07:00
Adrian Lizarraga
a189e76fde
[QNN EP] Fix error handling for Softmax/ReduceOps (#16700)
### Description
- Fix check for Softmax with axis attributes not equal to -1. QNN EP
only supports axis values equal to -1 (or rank - 1).
- Explicit error when Reduce* ops have an input with rank > 4 on HTP
backend (unsupported).
- Correctly filter out partitions that only contain a single
QuantizeLinear or DequantizeLinear node.
- Add tests for the above and clean up unnecessary usage of test
description labels.
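
The Softmax constraint in the first bullet can be sketched in a few lines of Python. This is only an illustration of why `-1` and `rank - 1` name the same axis; the helper `is_last_axis` is a hypothetical name and is not part of the QNN EP code.

```python
def is_last_axis(axis: int, rank: int) -> bool:
    """Return True if `axis` refers to the last dimension of a rank-`rank` tensor.

    ONNX allows negative axes, so -1 and rank - 1 name the same dimension.
    """
    if rank <= 0 or not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    # Normalize a negative axis into the range [0, rank).
    normalized = axis + rank if axis < 0 else axis
    return normalized == rank - 1


print(is_last_axis(-1, 4))  # True
print(is_last_axis(3, 4))   # True
print(is_last_axis(1, 4))   # False
```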



### Motivation and Context
Make it easier to debug why a model may not be supported.
2023-07-14 13:47:23 -07:00
Baiju Meswani
9889f0f507
Add support for training apis to support custom ops (#16601) 2023-07-14 11:15:51 -07:00
Adrian Lizarraga
19169afe30
[QNN EP] Add option to skip unit tests in the QNN NuGet packaging pipeline (#16164)
Add option to skip unit tests in the QNN NuGet packaging pipeline.
2023-07-14 10:52:05 -07:00
Dmitri Smirnov
853c4ff0a5
[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506)
### Description
Introduce `Float16/BFloat16` support for C# and C++ APIs.
Users should be able to convert between `float` and
`Float16/BFloat16`, compare values, and test for `NaN`, `Infinity`, and
whether the number is denormalized.

### Motivation and Context
User filed issues such as:
https://github.com/microsoft/onnxruntime/issues/14303
2023-07-14 10:46:52 -07:00
Tianlei Wu
77b45c6503
Add Stable Diffusion Benchmark on A100-PCIE-80GB (#16702)
(1) Fix a bug in https://github.com/microsoft/onnxruntime/pull/16560:
the fp16 flag must be set for UNet.
(2) Remove wget from requirements since it is no longer needed.
(3) Add benchmark numbers for A100-PCIE-80GB. Note that the CUDA EP has an
issue running with batch size 4, so that number is not added.
2023-07-14 10:37:00 -07:00
Yi Zhang
36b121d8c2
add more check to Web CI on cache restore (#16689)
### Motivation and Context
Make sure the data is correct.
2023-07-14 10:00:13 +08:00
mindest
810512c658
[ROCm] TunableOp: add hipBLASLt tuning logic (#16338)
### Description
- Add hipBLASLt tuning logic in place of default hipBLASLt
implementation;
- add kernel explorer for hipBLASLt.

related operators: Gemm, StridedBatchedGemm, and GemmFastGelu.

Temporarily mark algos that require extra workspace as unsupported.
Will add workspace support in later PR, which will change Gemm Params
def and affect multiple files.
2023-07-14 08:20:58 +08:00
Scott McKay
a3fc04ba74
Fix CodeCoverage pipeline (#16684)
### Description
Delete second reference to onnxruntime_api_tests_without_env in the code
coverage commands. One was removed in #16373 and the duplicate wasn't
noticed.



### Motivation and Context
Fix pipeline.
2023-07-14 07:47:04 +10:00
Yulong Wang
d1d65978f6
[js/web] fix file size trim for wasm only .min.js (#16681)
### Description
fix file size trim for wasm only .min.js

minimal build `ort.wasm.min.js` and `ort.wasm-core.min.js` should
exclude JSEP related source code.
2023-07-13 14:20:51 -07:00
Danny Friar
5de2e2fb76
Call lazy_reset_grad in on-device training docs (#16696) 2023-07-13 13:29:54 -07:00
Dipanjan Sengupta
a461608409
Amx flag removal (#16527)
### Description
1. Replacing AMX intrinsics with machine code macros in QGEMM kernel.
2. Removing AMX build flags for GCC in cmake file.
3. Fixing the link time optimization (LTO) issue introduced with asm
.include of an assembly file.

I have moved the AMX instruction macro definitions from
QgemmU8S8KernelAmxCommon.S to the amx_common.h to fix the LTO issue.
Note that I am also pushing the macros defined in
QgemmU8S8KernelAmxCommon.S for future reference.

A special thanks to @laxmansole who helped in the development of the
instruction macro definitions for AMX intrinsics and fixing the LTO
issue.

### Motivation and Context
The additional AMX flag in cmake adds an extra layer of dependency on
the GCC version to use the feature. These changes should allow the usage of
the AMX feature with just the CPU ID check.
2023-07-13 11:19:49 -07:00
Vincent Wang
c07a3b869c
Triton Codegen for ORTModule (#15831)
Fuse connected elementwise and reduce Ops to TritonOp and codegen triton
code to run the kernel.

This PR is co-edited by @wejoncy and @er3x3
2023-07-13 18:17:58 +08:00
Wanming Lin
7cac114e52
[WebNN EP] Support Abs and Neg ops (#16672) 2023-07-13 00:44:22 -07:00
Wanming Lin
d5b76cff60
[WebNN EP] Fixed build error (#16671)
The build break was caused by enabling `-Wshorten-64-to-32` in
https://github.com/microsoft/onnxruntime/pull/16524
2023-07-12 23:37:24 -07:00
mindest
b7fd5af48b
[ROCm] TunableOp: Update rocBLAS get_solutions API (since ROCm5.6) (#16657)
### Description
- Update existing rocBLAS get_solutions API using
`*_get_solutions_by_type` (supported from ROCm5.6); remove the original
nested TunableOp logic.
- Update kernel_explorer.
2023-07-13 11:20:26 +08:00
PeixuanZuo
ebc311365b
[ROCm] Optimize ROCm CI to reduce time (#16620)
This PR mainly optimizes the ROCm CI tests to reduce time and CPU utilization.

- use smaller batch size on strided_batched_gemm/batched_gemm test
- disable cpu training test
- fix occasional test_e2e_padding_elimination failures on ROCm.
2023-07-13 10:58:03 +08:00
cloudhan
af89496fc7
Allow generic pipeline to accept some params for cross attention (#16519)
Allow `GemmSoftmaxGemmPermuteGenericPipeline<T>` to be used in some
cross-attention cases, so that rocblas is chosen instead of ck when rocblas
performs better on small problems. The improvement is ~20% e2e time reduction
on some test cases for whisper large.

**Note:** This is because ck has a performance issue when the sequence
length is only 1, which should be improved in the future.
2023-07-13 09:31:31 +08:00
cloudhan
3866614519
Avoid cmake repeatly printing DISABLE_FLOAT8_TYPES=ON (#16656) 2023-07-13 09:29:20 +08:00
Yi Zhang
f3b40abe29
Use pipeline cache to cache onnx node test data. (#16659)
### Description
Use pipeline cache instead of reading data from the image.


### Motivation and Context
1. To reduce the browser dependency of custom image.
2. The onnx node test data is less than 30M and the cache download time
is very short.
2023-07-13 09:26:27 +08:00
Rachel Guo
111382746e
[js/rn] Add test for validating "executionProvider" options (#16651)
### Description

As title.

Validation at JS call level in E2E app is not included. Can cover
together in a separate pr.

### Motivation and Context

Test coverage.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2023-07-12 14:55:47 -07:00
Ye Wang
dd7d721f3c
support rotary embeddings in decoder masked self-attention (#16556)
### Description

This PR adds support for rotary embeddings in decoder masked
self-attention

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-07-12 13:48:48 -07:00
Sheil Kumar
0c956bef0a
[WinML] Fix warnings in OnnxruntimeEngine and OnnxruntimeEngineBuilder (#16679)
Fix [prefast:Warning]: C6101 (in
'_winml::OnnxruntimeEngine::CreateTensorValueFromDefaultAllocator')
Fix [prefast:Warning]: C6101 (in
'_winml::OnnxruntimeEngineBuilder::CreateEngine')

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2023-07-12 13:09:50 -07:00
pengwa
2449ded20f
Use autograd_inlining for model export (#16665)
### Use autograd_inlining for model export

Starting with some versions of PyTorch, there is an issue related to custom
autograd.Function inlining, even when a custom export function is registered
for the autograd.Function (e.g. when custom autograd function is enabled).

As an option, the PyTorch exporter adds a new flag during export that
disables the inlining: https://github.com/pytorch/pytorch/pull/104067

Currently the PyTorch change is only in the nightly build, so this PR
dynamically checks the signature of torch.onnx.export and uses
`autograd_inlining` when it exists.
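
The signature check described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `accepts_keyword`, `build_export_kwargs`, `export_old` and `export_new` are hypothetical names, and the two stand-in functions only mimic an exporter with and without the `autograd_inlining` parameter so that the example runs without PyTorch installed.

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """Return True if `func` declares a parameter called `name`."""
    return name in inspect.signature(func).parameters


# Hypothetical stand-ins for an older and a newer exporter signature.
def export_old(model, args, f):
    pass


def export_new(model, args, f, autograd_inlining=True):
    pass


def build_export_kwargs(export_fn) -> dict:
    """Pass autograd_inlining=False only when the exporter supports it."""
    kwargs = {}
    if accepts_keyword(export_fn, "autograd_inlining"):
        kwargs["autograd_inlining"] = False
    return kwargs


print(build_export_kwargs(export_old))  # {}
print(build_export_kwargs(export_new))  # {'autograd_inlining': False}
```

Checking the signature rather than the version string keeps the code working on nightly builds whose version numbers do not indicate whether the flag is available.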


2023-07-12 20:57:24 +08:00
PeixuanZuo
596dbe277e
[ROCm] add upgrade to fix security issue (#16668) 2023-07-12 17:57:18 +08:00
Yulong Wang
ecca11340a
[js/common] allow creating (u)int64 tensors in 2 ways (#16541)
### Description
allow creating (u)int64 tensors from either a number array or a bigint
array.

before:

```js
// TypeScript thinks this is fine, but it does not actually work
// runtime error: Uncaught TypeError: Cannot convert 1 to a BigInt
const myTensor1 = new Tensor('int64', [1, 2, 3, 4], [2, 2]);

// runtime good, but TypeScript thinks myTensor2 is a string tensor
const myTensor2 = new Tensor('int64', [1n, 2n, 3n, 4n], [2, 2]);
```

after:
```js
// both work at runtime and TypeScript populates the correct types
const myTensor1 = new Tensor('int64', [1, 2, 3, 4], [2, 2]);
const myTensor2 = new Tensor('int64', [1n, 2n, 3n, 4n], [2, 2]);
```
2023-07-11 21:07:36 -07:00
Aditya Goel
8e393e0b8c
Unique operator with double (#16359)
### Description
The [ONNX
standard](https://github.com/onnx/onnx/blob/main/docs/Operators.md#type-constraints-181)
permits the `Unique` operator to have `double` input tensor element
type, however this was not supported in onnxruntime. This PR enables
this kernel.

### Motivation and Context
The lack of support for `float64` forces users currently to cast to
`float32` instead. This loss of precision can be severely problematic in
feature engineering pipelines downstream of the `Unique` operator. It
would be good to prevent this by updating ORT to reflect the standard
and support `double` input tensors.
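
The precision concern can be illustrated with a small standard-library sketch. It does not use ONNX Runtime; `to_float32` is a hypothetical helper that rounds a Python float (a float64) to the nearest float32, and the sample values are made up for the demonstration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Two of these values differ by 1e-10, which float64 can represent
# but float32 cannot.
values = [1.0, 1.0 + 1e-10, 2.0]

unique64 = sorted(set(values))
unique32 = sorted(set(to_float32(v) for v in values))

print(len(unique64))  # 3
print(len(unique32))  # 2
```

After the cast, two distinct inputs collapse into one, so a unique-value computation on the `float32` data reports fewer distinct values than the original data contains.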

---------

Signed-off-by: Aditya Goel <agoel4512@gmail.com>
2023-07-11 20:24:14 -07:00
Edward Chen
1b8d5c43c2
Fix builds (#16646)
- Fix some more `shorten-64-to-32` warnings
- Move minimum build.py Python version back to 3.6
2023-07-11 19:21:25 -07:00
Scott McKay
ce68a4c06a
Fix Linux build failure when onnxruntime_DISABLE_ABSEIL=ON (#16373)
### Description
Add ort_value.h to session_options.h so OrtValue is defined. 

Update a unit test binary to add required include paths. Adding
ort_value.h pulls in more data type headers.

### Motivation and Context
#16193
2023-07-12 11:23:18 +10:00
Tianlei Wu
2de5807703
Attention fusion for UNet onnx model export from PyTorch 2.* (#16629)
### Description
Tested with stable diffusion unet models exported by pytorch nightly.

Example to run:
```
cd onnxruntime/python/tools/transformers/
python optimizer.py --input unet.onnx  --output unet_fp16.onnx --model_type unet --float16 --opt_level 0
```
2023-07-11 14:35:48 -07:00
Yulong Wang
b4bf7d5044
[js/web/test] accelerate 'npm test' suite0/1 init time (#16558)
### Description
This change reduces the number of calls to globby functions so that it
accelerates the initialization for 'npm test' with suite0/1 tests from
~14sec to <2sec.
2023-07-11 14:34:40 -07:00
Ti-Tai Wang
72076e5320
Update converter registry usage in orttraining_test_dort_custom_ops.py (#16663)
Fix Orttraining Linux Lazy Tensor CI       

Orttraining Linux Lazy Tensor CI is broken.
The error message is
AttributeError: 'OnnxRegistry' object has no attribute 'register'
2023-07-11 12:03:12 -07:00
satyajandhyala
d41bbac7b9
[Web/JS] Added Expand operator support. (#16577)
### Description
Added Expand operator support.



2023-07-11 09:38:16 -07:00
Tommy Au
1b07bbceaa
Update build.bat Prevent spaces in path (#16635)
### Description
Add double quotes around the path so that paths containing spaces are handled correctly.


### Motivation and Context
If the path contains spaces, the batch file cannot run and an error occurs.
Adding double quotes fixes this.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-07-11 07:07:08 -07:00
Justin Chu
ad994565ae
Fix type annotation for InferenceSession (#16632)
The Sequence should have been annotated to take a Union type; otherwise
the annotation would be invalid.
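
A minimal sketch of the idea, assuming the sequence holds either provider names or `(name, options)` pairs. The exact annotation changed in the PR is not shown here, so `ProviderSpec` and `provider_names` are hypothetical names used only for illustration. `Sequence` takes a single element type, so alternatives have to be combined with `Union` first.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

# One entry is either a provider name or a (name, options) pair.
ProviderSpec = Union[str, Tuple[str, Dict[str, Any]]]


def provider_names(providers: Sequence[ProviderSpec]) -> List[str]:
    """Extract the provider name from each entry, ignoring any options."""
    names = []
    for entry in providers:
        names.append(entry if isinstance(entry, str) else entry[0])
    return names


print(provider_names([
    "CPUExecutionProvider",
    ("CUDAExecutionProvider", {"device_id": 0}),
]))
```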
2023-07-11 06:34:22 -07:00
dependabot[bot]
617b3a84ba
Bump semver from 5.7.1 to 5.7.2 in /js/react_native/e2e (#16653) 2023-07-11 10:40:19 +00:00
dependabot[bot]
3608592ef5
Bump semver from 5.7.1 to 5.7.2 in /js/react_native (#16652) 2023-07-11 10:39:57 +00:00
pengwa
1ebc5d3879
Log ORTModule initialization overhead (#16529)
### Log ORTModule initialization overhead

When profiling a model, for example:

```
 torchrun --nproc_per_node=1 examples/onnxruntime/training/language-modeling/run_mlm.py  --model_name_or_path microsoft/deberta-v3-large --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1  --num_train_epochs 10 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --do_train  --overwrite_output_dir --output_dir ./outputs/ --seed 1137 --fp16 --report_to none --optim adamw_ort_fused  --max_steps 200 --logging_steps 1 --use_module_with_loss

{'train_runtime': 303.8711, 'train_samples_per_second': 0.658, 'train_steps_per_second': 0.658, 'train_loss': 6.569518616199494, 'epoch': 0.09}
100%|200/200 [05:03<00:00,  1.52s/it]
***** train metrics *****
  epoch                    =       0.09
  train_loss               =     6.5695
  train_runtime            = 0:05:03.87
  train_samples            =       2223
  train_samples_per_second =      0.658
  train_steps_per_second   =      0.658


```



The end-to-end time is 303s (train_runtime=0:05:03.87), but the
ORTModule first-step initialization (including export, graph build, etc.)
takes about 255s. So when we compare the end-to-end time of a baseline
ORT with an improved version of ORT, the perf gain appears negligible, because
the x% gain over the (303-255)s of compute is diluted across the overall 303s.
This is misleading!

So this PR outputs the ORTModule initialization overhead in the output,
so that we can manually compute the real compute time and get the perf
gains.
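
The dilution effect can be worked through with the numbers above. The 303s and 255s figures come from the run shown earlier; the 10% compute improvement is a hypothetical value chosen only to illustrate the arithmetic, and the helper `gain_pct` is not part of ORTModule.

```python
def gain_pct(baseline_s: float, improved_s: float) -> float:
    """Relative time reduction of `improved_s` over `baseline_s`, in percent."""
    return (baseline_s - improved_s) / baseline_s * 100.0


init_s = 255.0           # ORTModule first-step initialization (from the log)
baseline_total = 303.0   # end-to-end train_runtime (from the log)
baseline_compute = baseline_total - init_s  # 48s of real compute

# Hypothetical: the improved build is 10% faster in compute only.
improved_compute = baseline_compute * 0.9
improved_total = init_s + improved_compute

print(round(gain_pct(baseline_compute, improved_compute), 2))  # 10.0
print(round(gain_pct(baseline_total, improved_total), 2))      # 1.58
```

A 10% compute gain shows up as only about 1.6% end to end, which is why the initialization overhead needs to be reported separately.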


If the log level is >= WARNING, only the total end-to-end time and the
export time are logged; otherwise, a more detailed breakdown is logged:


![image](https://github.com/microsoft/onnxruntime/assets/10530022/8e34283d-4868-4f22-b65b-9f00d10d8fb7)



![image](https://github.com/microsoft/onnxruntime/assets/10530022/c13bcfad-0d79-483d-a886-e238efcbe657)
2023-07-11 14:11:29 +08:00
mindest
347c963d5c
[ROCm] Add ROCm Triton TunableOp for GroupNorm (#16196)
### Description
- Refactor existing Triton TunableOp-related code (based on work in
#15862)
- Add GroupNorm Triton implementation
2023-07-11 13:55:30 +08:00
Yulong Wang
5b6c1394cb
[js/test] CI: use pre-downloaded testdata in image (#16562)
### Description
update web CI to use pre-downloaded testdata in image
2023-07-10 22:22:14 -07:00
zesongw
53057ec1f5
[WebNN EP] Support InstanceNormalization and GroupNormalization. (#16604)
### Description
Add support for Op InstanceNormalization and GroupNormalization via MeanVarianceNormalization.

### Motivation and Context
Enable more models like Olive'ified SD unet to run on WebNN EP.
2023-07-10 20:51:53 -07:00
pengwa
15cb2f5a8a
Warn the user when nondet kernels are invoked in det mode (#16571)
### Give user warnings if nondeterministic kernels got called when Deterministic flag is set

When investigating accuracy (for example, debugging a training convergence
issue), we usually set `use_deterministic_compute` to true.

```
 SessionOptions sess_options;
 sess_options.use_deterministic_compute = true;
```

In a recent investigation, we found that the GatherElementsGrad kernel
(which uses atomic add) generates non-deterministic results, making a
deberta model output a noticeably different loss curve on every run,
even with the seed fixed, the dropout ratio removed, and
use_deterministic_compute set to true. This turned out to be expected
when the CUDA threads perform the additions in a different order, and that
order cannot be guaranteed.
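
The underlying reason is that floating-point addition is not associative, so the accumulation order changes the result. The sketch below shows this on the CPU with plain Python floats; it is not the CUDA kernel, and `accumulate` is a hypothetical helper that stands in for threads adding their contributions in some order.

```python
def accumulate(values):
    """Add the values one by one in the given order, like sequential atomic adds."""
    total = 0.0
    for v in values:
        total += v
    return total


contributions = [0.1, 0.2, 0.3]

forward = accumulate(contributions)
backward = accumulate(reversed(contributions))

print(forward == backward)  # False
print(forward, backward)
```

The two sums differ only in the last bits, but over many training steps such differences can accumulate into visibly different loss curves.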

So this PR emits a warning when users set `use_deterministic_compute`
but some kernels have no deterministic implementation and have to run with
non-deterministic ones. This at least lets users know that the results
are not deterministic even though the flag is set to true.


![image](https://github.com/microsoft/onnxruntime/assets/10530022/99ff60f5-21a4-44cf-bf5b-323d698b7147)

The message is printed only once so that it does not flood the training logs.
2023-07-11 11:45:47 +08:00
Edward Chen
b4c4e2b594
[objc] Add session options register custom ops with function pointer API (#16603) 2023-07-10 18:54:32 -07:00
Adam Louly
211fe5988e
add steps to write modulewithloss wrapper (#16486)
### Description
This PR includes documentation updates, providing step-by-step
instructions on how to implement the ModuleWithLoss wrapper in a
different codebase.
The documentation outlines the necessary code changes and offers
customization options based on specific requirements.

---------

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-07-11 09:07:35 +08:00
dependabot[bot]
de1b66a25a
Bump tough-cookie from 4.0.0 to 4.1.3 in /js/react_native (#16633)
Bumps [tough-cookie](https://github.com/salesforce/tough-cookie) from
4.0.0 to 4.1.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/salesforce/tough-cookie/releases">tough-cookie's
releases</a>.</em></p>
<blockquote>
<h2>4.1.3</h2>
<p>Security fix for Prototype Pollution discovery in <a
href="https://redirect.github.com/salesforce/tough-cookie/issues/282">#282</a>.
This is a minor release, although output from the <code>inspect</code>
utility is affected by this change, we felt this change was important
enough to be pushed into the next patch.</p>
<h2>4.1.2 -- Patch and Bugfix Release</h2>
<h2>What's Changed</h2>
<ul>
<li>fix: allow set cookies with localhost by <a
href="https://github.com/colincasey"><code>@​colincasey</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/253">salesforce/tough-cookie#253</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/salesforce/tough-cookie/compare/v4.1.1...v4.1.2">https://github.com/salesforce/tough-cookie/compare/v4.1.1...v4.1.2</a></p>
<h2>4.1.1</h2>
<h2>Patch Release</h2>
<h2>What's Changed</h2>
<ul>
<li>fix: allow special use domains by default by <a
href="https://github.com/colincasey"><code>@​colincasey</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/249">salesforce/tough-cookie#249</a></li>
<li>4.1.1 Patch -- allow special use domains by default by <a
href="https://github.com/awaterma"><code>@​awaterma</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/250">salesforce/tough-cookie#250</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/salesforce/tough-cookie/compare/v4.1.0...v4.1.1">https://github.com/salesforce/tough-cookie/compare/v4.1.0...v4.1.1</a></p>
<h2>4.1.0</h2>
<p>v4.1.0</p>
<p>Minor release, focused mainly on resolving reported issues and some
minor feature work.</p>
<h2>What's Changed</h2>
<ul>
<li>Create CHANGELOG.md by <a
href="https://github.com/ShivanKaul"><code>@​ShivanKaul</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/189">salesforce/tough-cookie#189</a></li>
<li>Missing param validation issue145 by <a
href="https://github.com/medelibero-sfdc"><code>@​medelibero-sfdc</code></a>
in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/193">salesforce/tough-cookie#193</a></li>
<li>Create SECURITY.md by <a
href="https://github.com/ShivanKaul"><code>@​ShivanKaul</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/201">salesforce/tough-cookie#201</a></li>
<li>Create CODE_OF_CONDUCT.md by <a
href="https://github.com/ShivanKaul"><code>@​ShivanKaul</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/200">salesforce/tough-cookie#200</a></li>
<li>Fix for issue <a
href="https://redirect.github.com/salesforce/tough-cookie/issues/195">#195</a>
by <a
href="https://github.com/medelibero-sfdc"><code>@​medelibero-sfdc</code></a>
in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/202">salesforce/tough-cookie#202</a></li>
<li>Add explanation and more special-use domains by <a
href="https://github.com/ShivanKaul"><code>@​ShivanKaul</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/203">salesforce/tough-cookie#203</a></li>
<li>Sync of constructor options for serialization by <a
href="https://github.com/medelibero-sfdc"><code>@​medelibero-sfdc</code></a>
in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/204">salesforce/tough-cookie#204</a></li>
<li>Returned null in case of empty cookie value by <a
href="https://github.com/vsin12"><code>@​vsin12</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/196">salesforce/tough-cookie#196</a></li>
<li>132 str trim not a function by <a
href="https://github.com/awaterma"><code>@​awaterma</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/209">salesforce/tough-cookie#209</a></li>
<li>Fix for issue <a
href="https://redirect.github.com/salesforce/tough-cookie/issues/153">#153</a>
by <a
href="https://github.com/medelibero-sfdc"><code>@​medelibero-sfdc</code></a>
in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/210">salesforce/tough-cookie#210</a></li>
<li>Fix permuteDomain with trailing dot by <a
href="https://github.com/ruoho-sfdc"><code>@​ruoho-sfdc</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/216">salesforce/tough-cookie#216</a></li>
<li>Issue <a
href="https://redirect.github.com/salesforce/tough-cookie/issues/213">#213</a>
-- added gh-actions flow for building and testing tough-co… by <a
href="https://github.com/awaterma"><code>@​awaterma</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/218">salesforce/tough-cookie#218</a></li>
<li>Issue <a
href="https://redirect.github.com/salesforce/tough-cookie/issues/210">#210</a>
-- Updated workflow to use npm install. by <a
href="https://github.com/awaterma"><code>@​awaterma</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/220">salesforce/tough-cookie#220</a></li>
<li>@<a
href="https://redirect.github.com/salesforce/tough-cookie/issues/215">GH-215</a>
-- Tests that document localhost behavior when set as domain. by <a
href="https://github.com/awaterma"><code>@​awaterma</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/221">salesforce/tough-cookie#221</a></li>
<li>fix: MemoryCookieStore methods should exist on the prototype, not on
the class. by <a
href="https://github.com/wjhsf"><code>@​wjhsf</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/226">salesforce/tough-cookie#226</a></li>
<li>Unit test cases for <code>allowSpecialUseDomain</code> option by <a
href="https://github.com/colincasey"><code>@​colincasey</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/225">salesforce/tough-cookie#225</a></li>
<li>[Snyk] Upgrade universalify from 0.1.2 to 0.2.0 by <a
href="https://github.com/snyk-bot"><code>@​snyk-bot</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/228">salesforce/tough-cookie#228</a></li>
<li>React Native Support by <a
href="https://github.com/colincasey"><code>@​colincasey</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/227">salesforce/tough-cookie#227</a></li>
<li>Adding Updating CODEOWNERS with ECCN as per Export Control
Compliance by <a
href="https://github.com/svc-scm"><code>@​svc-scm</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/223">salesforce/tough-cookie#223</a></li>
<li>fix: domain match routine by <a
href="https://github.com/colincasey"><code>@​colincasey</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/236">salesforce/tough-cookie#236</a></li>
<li>Stop using the internal NodeJS punycode module by <a
href="https://github.com/gboer"><code>@​gboer</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/238">salesforce/tough-cookie#238</a></li>
<li>Initial documentation review by <a
href="https://github.com/mcarey86"><code>@​mcarey86</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/234">salesforce/tough-cookie#234</a></li>
<li>fix: distinguish between no samesite and samesite=none by <a
href="https://github.com/colincasey"><code>@​colincasey</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/240">salesforce/tough-cookie#240</a></li>
<li>Prepare tough-cookie 4.1 for publishing (updated GitHub actions,
move… by <a
href="https://github.com/awaterma"><code>@​awaterma</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/242">salesforce/tough-cookie#242</a></li>
<li>4.1.0 release to NPM by <a
href="https://github.com/awaterma"><code>@​awaterma</code></a> in <a
href="https://redirect.github.com/salesforce/tough-cookie/pull/245">salesforce/tough-cookie#245</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4ff4d29f6c"><code>4ff4d29</code></a>
4.1.3 release preparation, update the package and lib/version to 4.1.3.
(<a
href="https://redirect.github.com/salesforce/tough-cookie/issues/284">#284</a>)</li>
<li><a
href="12d474791b"><code>12d4747</code></a>
Prevent prototype pollution in cookie memstore (<a
href="https://redirect.github.com/salesforce/tough-cookie/issues/283">#283</a>)</li>
<li><a
href="f06b72d1d4"><code>f06b72d</code></a>
Fix documentation for store.findCookies, missing allowSpecialUseDomain
proper...</li>
<li><a
href="b1a8898ee3"><code>b1a8898</code></a>
fix: allow set cookies with localhost (<a
href="https://redirect.github.com/salesforce/tough-cookie/issues/253">#253</a>)</li>
<li><a
href="ec707966e6"><code>ec70796</code></a>
4.1.1 Patch -- allow special use domains by default (<a
href="https://redirect.github.com/salesforce/tough-cookie/issues/250">#250</a>)</li>
<li><a
href="d4ac5801dd"><code>d4ac580</code></a>
fix: allow special use domains by default (<a
href="https://redirect.github.com/salesforce/tough-cookie/issues/249">#249</a>)</li>
<li><a
href="79c2f7d373"><code>79c2f7d</code></a>
4.1.0 release to NPM (<a
href="https://redirect.github.com/salesforce/tough-cookie/issues/245">#245</a>)</li>
<li><a
href="4fafc179a7"><code>4fafc17</code></a>
Prepare tough-cookie 4.1 for publishing (updated GitHub actions, move
Dockerf...</li>
<li><a
href="aa4396da7a"><code>aa4396d</code></a>
fix: distinguish between no samesite and samesite=none (<a
href="https://redirect.github.com/salesforce/tough-cookie/issues/240">#240</a>)</li>
<li><a
href="b8d751188d"><code>b8d7511</code></a>
Modernize README (<a
href="https://redirect.github.com/salesforce/tough-cookie/issues/234">#234</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/salesforce/tough-cookie/compare/v4.0.0...v4.1.3">compare
view</a></li>
</ul>
</details>

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.

[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-10 11:23:24 -07:00
Tianlei Wu
b8f6235f11
Update stable diffusion benchmark for TensorRT EP (#16560)
### Description

Add Stable Diffusion Text2Image pipelines for the TensorRT EP and CUDA EP.
They can automatically export and optimize the ONNX model, and create an
ONNX Runtime session that uses the TensorRT or CUDA execution provider.

Add support for benchmarking TensorRT.

Add support for cuda graph. The feature is only supported in the nightly
package right now.

| Engine/Provider to test | Command line |
| ---- | --- |
| CUDA EP | `python benchmark.py -v 1.5` |
| CUDA EP with cuda graph | `python benchmark.py -v 1.5 --enable_cuda_graph` |
| TensorRT EP | `python benchmark.py -v 1.5 -r tensorrt` |
| TensorRT EP with cuda graph | `python benchmark.py -v 1.5 -r tensorrt --enable_cuda_graph` |
| TensorRT | `python benchmark.py -v 1.5 -e tensorrt` |

Add benchmark numbers of T4 GPU using CUDA 11.7, cuDNN 8.5, PyTorch
1.13.1+cu11.7, TensorRT 8.6.1, onnxruntime-gpu 1.15.1 (or
ort-nightly-gpu 1.16 for cuda graph).

TODO: add benchmark numbers of A100-80GB

2023-07-10 09:51:03 -07:00
PeixuanZuo
2fd5e1cc39
[ROCm] fix shell bug (#16641)
Under `set -ex`, the script exits when `grep` finds no match, because `grep` then returns a non-zero exit status.
2023-07-10 17:31:27 +08:00
PeixuanZuo
3b729e5d2f
[ROCm] use cupy for GPU-accelerated computing (#16611)
Kernel explorer has many tests that use numpy to verify the results
of GPU kernels, which makes CPU utilization very high. This PR uses
`cupy` in place of `numpy` to do the computation on the GPU and reduce CPU
utilization.

Set `KERNEL_EXPLORER_TEST_USE_CUPY=1` to enable cupy.
2023-07-10 17:17:39 +08:00
cloudhan
5fee3f4302
Remove the special min cmake for rocm (#16570)
#15807 fixed the building error for rocm with cmake 3.26. The
specialized relaxation of the cmake version is not needed anymore.
2023-07-10 13:19:48 +08:00