onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-04 04:07:22 +00:00

Author	SHA1	Message	Date
Nat Kershaw (MSFT)	bbcf4b45dc	Upgrade doxygen to 1.9.8 (#17525 )	2023-09-12 20:44:27 -07:00
Changming Sun	9b755dce9f	Delete all Prefast tasks (#17522 ) ### Description Delete all Prefast tasks because the new VS 17.7 version crashes every time when we run the task on our CI build servers. However, we cannot reproduce it locally. And this problem blocks us installing security patches to our CI build machines. Will use [CodeQL](https://codeql.github.com/) instead. ### Motivation and Context Address some security alerts.	2023-09-12 17:40:49 -07:00
Yulong Wang	f923eec28b	[js/web] release session after use in npm test (#17470 ) ### Description release session after use in npm test. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: #17465 #17469 #17470 (this one)	2023-09-12 16:59:13 -07:00
Tianlei Wu	49511b5483	Improve performance of prune_graph in onnx_model.py (#17502 ) During optimization of SDXL UNet, prune_graph takes up to 5 minutes because finding a node by searching all nodes is time-consuming. This optimization reduces the latency of prune_graph to 2 seconds. The new algorithm uses a hash table (key is the first node output, value is the node) to speed up the lookup.	2023-09-12 11:38:36 -07:00
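A minimal sketch of the hash-table lookup described in the prune_graph commit, not the code from onnx_model.py. The `Node` type and the function names are hypothetical stand-ins; only the idea of keying each node by its first output comes from the commit message.

```python
from typing import Dict, List, NamedTuple, Set


class Node(NamedTuple):
    # Hypothetical stand-in for an ONNX node: a name plus input/output tensor names.
    name: str
    inputs: List[str]
    outputs: List[str]


def build_output_index(nodes: List[Node]) -> Dict[str, Node]:
    """Map each node's first output name to the node, in one pass over all nodes."""
    return {node.outputs[0]: node for node in nodes if node.outputs}


def reachable_nodes(nodes: List[Node], graph_outputs: List[str]) -> Set[str]:
    """Return the names of nodes that contribute to the graph outputs."""
    index = build_output_index(nodes)
    keep: Set[str] = set()
    stack = list(graph_outputs)
    while stack:
        tensor = stack.pop()
        node = index.get(tensor)  # O(1) lookup instead of scanning every node
        if node is None or node.name in keep:
            continue
        keep.add(node.name)
        stack.extend(node.inputs)
    return keep
```

Nodes whose names are not returned by `reachable_nodes` are the ones a prune step could drop. Because only the first output is indexed, a consumer that reads a later output of a multi-output node would not be resolved by this sketch.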
Edward Chen	cf672c5887	Use name of temporary provisioning profile. (#17459 ) The old provisioning profile no longer works. Switched to a temporary one that we can use before a new one is available. The temporary one has a different name.	2023-09-12 10:56:35 -07:00
Chi Lo	aa5e36456a	[TRT EP] Fix multithreading bug of getting the corrupted trt engine instance (#17507 ) Revert to the old TRT EP behavior of securing the whole compute_function with a lock_guard. The current TRT EP only puts a lock_guard around a critical section inside compute_function, which is obviously wrong. The issue occurs when one thread is updating the engine in compute_function while another thread still accesses the stale/corrupted engine instance in code outside the critical section, for example `int total_bindings = trt_engine->getNbBindings()`. So making the whole compute_function the critical section should be okay.	2023-09-12 07:37:45 -07:00
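The locking change in the TRT EP commit can be illustrated with a small Python analogue. This is a sketch under stated assumptions: `EngineHolder`, its fields, and `compute` are hypothetical, and `threading.Lock` stands in for the C++ `lock_guard`; the actual TRT EP code is not shown on this page.

```python
import threading


class EngineHolder:
    """Hypothetical stand-in for a provider that may rebuild its engine during a call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._engine = {"version": 0, "bindings": 2}

    @property
    def version(self) -> int:
        with self._lock:
            return self._engine["version"]

    def compute(self, rebuild: bool) -> int:
        # The whole body is the critical section, so no thread can read the
        # engine while another thread is replacing it.
        with self._lock:
            if rebuild:
                self._engine = {
                    "version": self._engine["version"] + 1,
                    "bindings": 2,
                }
            engine = self._engine
            return engine["bindings"]
```

Holding the lock for the entire call serializes concurrent calls, which trades throughput for the guarantee that every read of the engine sees a fully built instance.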
Aditya Goel	db558ef9b4	TreeEnsemble speed up (#17449 ) ### Description This PR proposes a change that should speed up inference for the TreeEnsemble* kernels. Previously, when traversing a decision tree, the `TreeNodeElement` pointer would be incremented or decremented to the appropriate child node - I assume this was because the `truenode_inc_or_first_weight` and `falsenode_inc_or_n_weights` members were overloaded for two purposes. In this PR, we now assign the true branch pointer. We also initialise `nodes_` in a pre-order traversal which means that the false branch's position can be resolved statically and does not need to be stored. I observe the following speed ups. The benchmarks used are derived from those in https://github.com/siboehm/lleaves/tree/master/benchmarks and the baseline is the main branch.

NYC Dataset

| Number of threads | Baseline | Pointer assignment | Pre-ordered initialisation | Pointer assignment % improvement | Pre-ordered initialisation % improvement |
|---:|---:|---:|---:|---:|---:|
| 1 | 176.539 | 155.709 | 145.119 | 11.7989 | 17.7976 |
| 4 | 59.9015 | 51.9652 | 50.0884 | 13.2488 | 16.382 |
| 8 | 34.5561 | 31.3024 | 28.2535 | 9.41581 | 18.2387 |

Airline Dataset

| Number of threads | Baseline | Pointer assignment | Pre-ordered initialisation | Pointer assignment % improvement | Pre-ordered initialisation % improvement |
|---:|---:|---:|---:|---:|---:|
| 1 | 2127.34 | 1389.7 | 920.373 | 34.6745 | 56.736 |
| 4 | 723.307 | 481.634 | 310.618 | 33.4122 | 57.0558 |
| 8 | 420.722 | 278.397 | 185.265 | 33.8286 | 55.9651 |

mtpl2 Dataset

| Number of threads | Baseline | Pointer assignment | Pre-ordered initialisation | Pointer assignment % improvement | Pre-ordered initialisation % improvement |
|---:|---:|---:|---:|---:|---:|
| 1 | 1143.62 | 1020.04 | 998.171 | 10.8055 | 13.0988 |
| 4 | 386.153 | 339.905 | 328.061 | 11.9764 | 14.3729 |
| 8 | 225.995 | 200.665 | 199.057 | 11.2084 | 13.4408 |

These were run using an M2 Pro with 16GB of RAM. All times are in milliseconds and averages over 10 runs with a batch size of 100,000. ### Motivation and Context Performance improvements.	2023-09-12 10:26:25 +02:00
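The pre-order layout from the TreeEnsemble commit can be sketched as follows. This is an illustration only: `TreeNode`, `FlatNode`, and the `<=` branch rule are assumptions, not the kernel's actual types or comparison modes. What it takes from the commit message is that the false child sits directly after its parent in the array, so only the true branch needs to be stored.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TreeNode:
    # Hypothetical pointer-based tree; leaves carry a value, splits carry children.
    feature: int = 0
    threshold: float = 0.0
    true_child: Optional["TreeNode"] = None
    false_child: Optional["TreeNode"] = None
    value: Optional[float] = None


@dataclass
class FlatNode:
    feature: int
    threshold: float
    true_index: int  # only the true branch is stored explicitly
    value: Optional[float]


def flatten_preorder(root: TreeNode) -> List[FlatNode]:
    """Lay nodes out in pre-order, visiting the false child first."""
    flat: List[FlatNode] = []

    def visit(node: TreeNode) -> int:
        index = len(flat)
        flat.append(FlatNode(node.feature, node.threshold, -1, node.value))
        if node.value is None:
            visit(node.false_child)  # always lands at index + 1
            flat[index].true_index = visit(node.true_child)
        return index

    visit(root)
    return flat


def predict(flat: List[FlatNode], x: Sequence[float]) -> float:
    i = 0
    while flat[i].value is None:
        node = flat[i]
        # Assumed branch rule: take the true branch when x[feature] <= threshold.
        i = node.true_index if x[node.feature] <= node.threshold else i + 1
    return flat[i].value
```

With this layout the false branch costs no storage and no indirection, which is one plausible reading of why the pre-ordered variant is the faster column in the tables above.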
Arthur Islamov	65249f42e4	[js/web] FP16 Gemm, Softmax & Transpose (#17494 ) ### Description First three OPs to support fp16. Will add more once this gets merged since others depend on changes in js_data_types	2023-09-11 21:09:37 -07:00
Adrian Lizarraga	f20e475e67	[QNN EP] Update QNN SDK to version 2.14.1 (#17467 ) ### Description Updates the version of QNN SDK used by CI Pipelines. Enables some tests fixed by 2.14.1, but still need to look into Resize in a separate PR. ### Motivation and Context Test latest version of QNN SDK.	2023-09-11 21:07:50 -07:00
Patrice Vignola	8ad9ab1b9a	Fix CPU constant folding not reverting the node to its previous EP (#17399 ) A recent change was made in `5a83a67f32` to make `ep_type` a reference instead of having it be a copy, presumably to avoid assigning strings (so `auto& ep_type = node->GetExecutionProviderType()` instead of `auto ep_type = node->GetExecutionProviderType()`). The problem with this change is that calling `node->SetExecutionProviderType(kCpuExecutionProvider)` will change the value of the reference itself, which means that it's impossible to revert the node to its previous EP. This change fixes this bug and adds an optimization over the previous approach by only assigning a string when we know that we are dealing with a non-CPU node.	2023-09-11 17:38:37 -07:00
satyajandhyala	bf6d6961cc	[JS/Web] Added Einsum operator support. (#17401 ) ### Description Added Einsum operator support to JSEP. ### Motivation and Context	2023-09-11 15:57:15 -07:00
Yulong Wang	850baced33	[web] a few updates to web pipeline (#17485 ) ### Description Update the Web CI pipelines: - remove parameter 'WebTemplate': Since we start to support webgpu, the linux-web-ci.yml is no longer working and it is already out-of-date. remove this file and parameter so that we always use win-web-ci.yml - change flag `RunWebGpuTests` into 2 flags, for release and debug. Currently for CI we only run webgpu tests on release build. But we want to have the capability to run webgpu tests on debug build as well. After this PR is merged, next step is to enable both Debug and Release webgpu tests in PostMerge pipeline.	2023-09-11 11:43:42 -07:00
Kaz Nishimura	24f0893d3c	Enable optimize_by_onnxruntime call for float32 unet model (#17483 ) This makes it possible to call `optimize_by_onnxruntime` for float32 unet if `--use_external_data_format` is also used. ### Motivation and Context When using `optimize_pipeline.py` without `--float16`, `optimize_by_onnxruntime` was not called for unet.	2023-09-10 16:50:50 -07:00
Chi Lo	b827ab0efc	[TRT EP] Fix build error for building oss onnx-tensorrt parser (#17468 ) If building ORT TRT with `--use_tensorrt_oss_parser` (meaning ORT will include the [oss onnx-tensorrt parser](https://github.com/onnx/onnx-tensorrt/blob/main/CMakeLists.txt#L82) and build it from source), the cmake CUDA_INCLUDE_DIR variable is needed. If not, you will encounter the following [build error](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1133937&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=39e3f98f-7fe5-578c-20bd-5ae5a4590bda): CMake Error: The following variables are used in this project, but they are set to NOTFOUND. Please set them or make sure they are set and tested correctly in the CMake files: /build/Release/_deps/onnx_tensorrt-src/CUDA_INCLUDE_DIR Note: Not quite sure why CI did not hit this issue in the past when it still tested with the oss parser. Probably CUDA_INCLUDE_DIR was defined somewhere back then.	2023-09-08 20:34:57 -07:00
Yulong Wang	89da5a0108	[js/webgpu] exclude WebGPU reduce_log_sum_exp_* float64 test cases (#17472 ) ### Description as explained in the comments, tests "test_reduce_log_sum_exp_*" on opset17/opset18 are excluded because they use float64. They are passing now because they fallback to CPU. WebGPU does not support f64. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: https://github.com/microsoft/onnxruntime/pull/17465 https://github.com/microsoft/onnxruntime/pull/17469 https://github.com/microsoft/onnxruntime/pull/17470 https://github.com/microsoft/onnxruntime/pull/17472 (this one)	2023-09-08 17:03:04 -07:00
Yulong Wang	550293d9ad	OrtMemoryInfo: support new name "WebGPU_Buffer" (#17469 ) ### Description Add new name "WebGPU_Buffer" to OrtMemoryInfo. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: #17465 #17469 (this one)	2023-09-08 16:37:35 -07:00
Caroline Zhu	dcc93909b4	Add training WASM generation to Web CI pipeline (#17319 ) ### Description [Successful pipeline run](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1123141&view=results) Added flag to build the training artifacts & updated the pull-wasm-artifacts script to pull the training artifacts as well. Bundled into this PR are minor formatting fixes + naming fixes. ### Motivation and Context [This PR](https://github.com/microsoft/onnxruntime/pull/16521) extended the WASM API wrapper to build training WASM artifacts as well. The ORT training WASM artifacts are required to support ORT training web bindings.	2023-09-08 15:49:47 -07:00
Tianlei Wu	29a818caa0	Attention fusion for stable diffusion clip model (#17445 ) Add attention fusion for stable diffusion clip model to improve performance of SD or SDXL	2023-09-08 14:17:14 -07:00
Yulong Wang	4d753b74a5	[js/common] prepare work for supporting webgpu IO binding implementation (#17465 ) ### Description This PR contains a few changes in /js/common/ to support a coming PR for a full implementation of webgpu IO binding. - allows pass-through if value is already a Tensor instance in return value of `handler.run()` called by `InferenceSession.run()` (inference-session-impl.ts). Specifically, onnxruntime-node and onnxruntime-react-native uses native bindings to generate a Tensor-like object so we need to create a real Tensor instance here; for onnxruntime-web the return value is already a Tensor instance. - adds new types for GPU buffer supported types: `'float32'\|'int32'` -> `'float32'\|'float16'\|'int32'\|'int64'\|'uint32'\|'bool'` - exposes types `GpuBufferDataTypes` together with `CpuPinnedDataTypes` and `TextureDataTypes` as exported	2023-09-08 13:49:24 -07:00
Changming Sun	bc84f52633	Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470 ) ### Description Update C/C++ dependencies abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint to newer versions per request of @mayeut. He created the following PRs to update the deps: https://github.com/microsoft/onnxruntime/pull/15432 https://github.com/microsoft/onnxruntime/pull/15434 https://github.com/microsoft/onnxruntime/pull/15435 https://github.com/microsoft/onnxruntime/pull/15436 https://github.com/microsoft/onnxruntime/pull/15437 However, our build system needs to fetch the dependencies from an internal mirror that only Microsoft employees have write access to. So I closed his PRs and created this one. This PR also updates abseil to a newer version. This is to prepare for upgrading re2.	2023-09-08 13:35:04 -07:00
Changming Sun	f51a765e64	Avoid calling patchelf (#17365 ) ### Description Resolve #9754	2023-09-08 12:25:16 -07:00
Ashwini Khade	c5dbd5c919	Updates to training pipelines (#17292 )	2023-09-08 11:57:12 -07:00
Yi Zhang	ae74a517b6	Run Nuget_Test_Linux_GPU in container (#17452 ) ### Description ### Motivation and Context ### Verification https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=351542&view=results	2023-09-08 13:41:20 +08:00
Tianlei Wu	7bc6dcecf7	fix embed layer norm fusion with embedding sum output (#17460 ) The embedding sum could be a graph output (when exporting with output hidden state enabled). Previously, we only checked whether there are multiple child nodes to decide whether to output the embedding sum in the fused node. With this fix, if the sum is a graph output, we retain its name.	2023-09-07 22:01:26 -07:00
xhcao	9017ea131b	[js/webgpu] support GreaterOrEqual and LessOrEqual operators (#17310 ) ### Description ### Motivation and Context	2023-09-07 17:41:16 -07:00
dependabot[bot]	eaef485461	Bump electron from 23.1.2 to 23.3.13 in /js/web (#17436 ) Bumps [electron](https://github.com/electron/electron) from 23.1.2 to 23.3.13. Release notes, sourced from [electron's releases](https://github.com/electron/electron/releases) (truncated): - v23.3.13: End of Support for 23.x.y. Electron 23.x.y has reached end-of-support as per the project's [support policy](https://www.electronjs.org/docs/latest/tutorial/electron-timelines#version-support-policy). Developers and applications are encouraged to upgrade to a newer version of Electron. - v23.3.12: Fixed a crash while screen sharing on Wayland with PipeWire ([#39274](https://redirect.github.com/electron/electron/pull/39274)). Security: backported fixes for CVE-2023-3732, CVE-2023-3728 and CVE-2023-3730 ([#39268](https://redirect.github.com/electron/electron/pull/39268)). - v23.3.11: Fixed a crash when listing desktop capture sources on Wayland with PipeWire ([#39116](https://redirect.github.com/electron/electron/pull/39116); also in 24, 25, 26). - v23.3.10: Security: backported fixes for CVE-2023-3422, CVE-2023-3421, CVE-2023-3420 and 1454860 ([#38948](https://redirect.github.com/electron/electron/pull/38948)). - v23.3.9: Fixed `preload` script may not run in some child windows opened by `window.open` ([#38933](https://redirect.github.com/electron/electron/pull/38933); also in 24, 25, 26). Fixed minimize button to be visible when all buttons reenabled ([#38880](https://redirect.github.com/electron/electron/pull/38880); also in 24, 25). - v23.3.8: Security: backported fixes for CVE-2023-3215, CVE-2023-3216 and 1450536 ([#38788](https://redirect.github.com/electron/electron/pull/38788)). Commits: - `4b782e2` fix: avoid package.json check on built-in modules (#39426) - `b2047d7` ci: fix hang when validating AppVeyor artifacts (#39401) - `10b2bae` docs: clean up removed systemPreferences methods (#39349) - `454990a` chore: cherry-pick 4 changes from Release-0-M115 (#39268) - `10b49ff` chore: cherry-pick 2 changes from webrtc (#39274) - `dc0fc78` fix: do not resolve electron entrypoints on disk (#39249) - `1aafc2a` ci: fail appveyor build if artifacts are missing (#39219) - `595e25a` fix: use StartUpdating method for PipeWire capturer (#39116) - `7fe5925` build: disable unneeded depot_tools update on Windows CI (#39016) - `c4b0ff4` chore: cherry-pick 4 changes from Release-3-M114 (#38948) - Additional commits viewable in [compare view](https://github.com/electron/electron/compare/v23.1.2...v23.3.13) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-07 17:39:49 -07:00
Dmitri Smirnov	21c202bb5d	Eliminate hashmap copies during function inlining (#17439 ) ### Description Eliminate unnecessary HashMap copies. This saves 22% of CPU usage on a reference Dynamo exported model. ### Motivation and Context Our function inlining is currently slow. Before: ![image](https://github.com/microsoft/onnxruntime/assets/11303988/fd38a857-8c12-42ef-9de2-3485123a9fe7) After ![image](https://github.com/microsoft/onnxruntime/assets/11303988/ea65813d-26cb-41dc-ba55-6a609b169767)	2023-09-07 14:08:38 -07:00
Xavier Dupré	024f1dd72b	Fix float 8 rounding on CPU (#16940 ) ### Description Fix float 8 rounding issues discovered in issue #16938 (only CPU provider).	2023-09-07 20:48:25 +02:00
Yi Zhang	0a3eb60b01	Fix Bug: Step failed but not exited with error (#17442 ) ### Description Add "set -ex" in the script. ### Motivation and Context Build failed but it still passed. https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1132003&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=39e3f98f-7fe5-578c-20bd-5ae5a4590bda	2023-09-07 14:33:31 +08:00
Changming Sun	b38fb0da06	Revert the yaml file changes in "Nodejs_Packaging_CPU" build job (#17441 ) ### Description The yaml file changes made in #16050 do not really work. Currently the pipeline is failing with error: ``` Error: Not found SourceFolder: C:\a\_work\5\b\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib ``` So, I will revert the yaml changes first to bring the pipeline back. Some people are waiting for our nightly packages. Test run: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=351104&view=results ### Motivation and Context	2023-09-06 20:20:55 -07:00
Adrian Lizarraga	1e4bfa1da2	[QNN EP] Add more op unit tests (#17424 ) ### Description Adds more unit tests and enables HTP support for several ops: - Exp - Floor (enable qdq node unit) - Min (enable qdq node unit) - Max (enable qdq node unit) - Neg (enable qdq node unit) - Not - Pow - PRelu (enable qdq node unit) - Relu (Does not work!) - Sigmoid - Sqrt - Tanh - LogSoftmax (enable qdq node unit) - Concat - GlobalAveragePool Still missing (9): - Reshape - Flatten - Squeeze - Unsqueeze - Gemm - Clip - Split - Topk - Tile ### Motivation and Context Increase test coverage and op support	2023-09-06 18:36:09 -07:00
Yi Zhang	ede339f304	Move dotnet build and test into docker in Linux CPU CI (#17417 ) ### Description install dotnet 6.0 in the docker image. move C# build and test into docker. ### Motivation and Context ### Note The Unit tests and Symbolic shape infer's migration will be in another PR.	2023-09-07 09:28:16 +08:00
Changming Sun	7862a521b3	Update cmake's hash in android custom build (#17435 ) ### Description Update cmake's hash in android custom build. It was forgotten in last PR.	2023-09-06 15:26:46 -07:00
Tianlei Wu	e8b8d0d13b	Fix weight tensors in transformers optimizer not saved to external data (#17427 ) Some initializers are added without the raw=True flag, so those tensors cannot be saved to external data. If those tensors exceed 2GB in total, the optimized model cannot be saved due to the protobuf limit. This change saves attention weights and bias as raw data. Note: it is optional to use raw data for shape tensors since they are tiny. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/17212 https://github.com/microsoft/onnxruntime/issues/15349	2023-09-06 13:06:19 -07:00
BoarQing	2629cb8606	[VitisAI] graph_save only saves proto of the graph instead of entire model (#17368 ) ### Description graph_save only saves proto of the graph instead of entire model. ### Motivation and Context We would like to export a part of a model as a new model for unit test. Therefore, we have to change the API to support such need.	2023-09-06 12:40:48 -07:00
Jian Chen	8914fe687b	[js/webgpu] Include Support for neg.int32 (#17374 ) ### Description Include Support for neg.int32 ### Motivation and Context	2023-09-06 12:00:16 -07:00
Edward Chen	a3a1237270	Disable xcpretty filtering of xcodebuild output in iOS packaging pipeline. (#17429 )	2023-09-06 09:04:17 -07:00
Yulong Wang	fa868ca9cd	[js/node] release sessions after use in npm test (#17353 ) ### Description release sessions after use in NPM test.	2023-09-05 23:42:32 -07:00
Yulong Wang	d88406a31b	[js/common] use Map instead of object for backends (#17352 ) ### Description resolved https://github.com/microsoft/onnxruntime/security/code-scanning/1140	2023-09-05 23:14:46 -07:00
Yulong Wang	75710f0006	[js/webgpu] add matmul broadcast tests (#17335 ) ### Description Commit `fffefb1c22` (#16969) optimized matmul and also fixes broadcasting. So #17191 is no longer needed. However, the newly added operator test file from the PR by @dakenf is helpful so pick and add it to enhance the tests.	2023-09-05 20:41:46 -07:00
Yulong Wang	110a2d0b73	[build][wasm] add js_internal_api.js to link dependency (#17407 ) ### Description add js_internal_api.js to link dependency. Now changes to js_internal_api.js will correctly trigger re-link of ort-wasm.wasm	2023-09-05 20:40:40 -07:00
Yulong Wang	2cb75420ac	[js/common] clean up JSDoc (#17408 ) ### Description clean up JSDoc for onnxruntime-common: - replace "@internal" with "@ignore" as JSDoc does not use "@internal". Using "@ignore" keeps the content out of the generated doc.	2023-09-05 20:40:23 -07:00
Vincent Wang	deda5db231	[ORTModule] Add Manual Seed to Fix UT Failure (#17411 ) Add manual seed to fix ORTModule UT failure.	2023-09-06 11:24:55 +08:00
James Baker	16eba537a8	rust bindings: Do not unnecessarily re-run build.rs (#17018 ) ### Description Remove unnecessary cargo:rerun-if-changed declaration. ### Motivation and Context 'cargo:rerun-if-changed' declarations tell Cargo when to re-run the build script. The intention is that if the build script depends on other files, then Cargo knows to re-run if those files change. It stores the output and checks it before each build. The intention is that one emits the declarations for _inputs_ of the build. This rerun-if-changed declaration is a declaration on the _output_ of the build, and stores the absolute path of the output. This is not a useful declaration because the output path is unique to the build script - there is no way for anything else to change it. However, this does generate unnecessary rebuilds in some cases, for example if the dependent repository is moved in the filesystem. This causes me some issues when using https://crane.dev, as due to some implementation details, if a crate being moved triggers a rebuild, by default the build is broken. To summarise: - declaration is redundant - causes issues in niche cases.	2023-09-05 19:42:06 -07:00
Changming Sun	c6b0d185b4	Update cmake to 3.27 and upgrade Linux CUDA docker files from CentOS7 to UBI8 (#16856 ) ### Description 1. Update docker files and their build instructions. ARM64 and x86_64 can use the same docker file. 2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8 AB#18990	2023-09-05 18:12:10 -07:00
xhcao	026672e947	[js/webgpu] Support slice int32 (#16968 ) Co-authored-by: Xing Xu <xing.xu@intel.com>	2023-09-05 18:05:47 -07:00
Scott McKay	e1a9f2ed6d	Fix insufficient space error in Android CI (#17423 ) ### Description Remove onnxruntime_test_all from emulator once tests have finished as it's 1.2GB and takes up too much space given the 2GB maximum partition size for the emulator. Side issue is the java build isn't able to strip the binaries in the java apk which causes that to be 800MB (exceeding the 2GB max). That may require an Android/Gradle fix as I don't think we can hardcode an NDK version into our build files. https://issuetracker.google.com/issues/237187538?pli=1 ### Motivation and Context Fix Android CI build failures for	2023-09-06 10:12:05 +10:00
petermcaughan	fa28359beb	Reduce GPU memory for Whisper models converted to ONNX (#17378 ) ### Description This PR changes the Whisper export scripts to further optimize the process of removing duplicate initializers from two subgraphs. The current Greedy approach is quicker by a large factor, but results in some duplicate initializers not being caught and removed. This not only results in a slightly larger Whisper model, but also a model that uses more GPU memory. The approach in this PR uses data hashes and caches to keep a quick export but no longer rely on a greedy approach. --------- Co-authored-by: Peter McAughan <petermca@microsoft.com>	2023-09-05 16:24:20 -07:00
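A minimal sketch of hash-based duplicate detection in the spirit of the Whisper export commit, not the export script itself. The function name and the representation of initializers as a plain `name -> raw bytes` mapping are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def find_shared_initializers(
    first: Dict[str, bytes], second: Dict[str, bytes]
) -> Dict[str, str]:
    """Map names in `second` to an identical initializer in `first`, if one exists."""
    # Cache: content hash -> names in the first subgraph that have that content.
    by_hash: Dict[bytes, List[str]] = {}
    for name, data in first.items():
        by_hash.setdefault(hashlib.sha256(data).digest(), []).append(name)

    renames: Dict[str, str] = {}
    for name, data in second.items():
        for candidate in by_hash.get(hashlib.sha256(data).digest(), []):
            # Confirm with a byte comparison so a hash collision cannot merge
            # two different tensors.
            if first[candidate] == data:
                renames[name] = candidate
                break
    return renames
```

Each tensor is hashed once and compared only against candidates with the same hash, so the cost stays close to linear in the total data size while still catching every exact duplicate.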
Dmitri Smirnov	dbcc60bed5	Introduce output type/shape validation (#17301 ) ### Description Validate outputs type and shapes. Make sure sparse initializers are taken into account. ### Motivation and Context ORT currently does not validate output types or shapes. Further, neither inputs or outputs take into account sparse initializers that are converted from dense. It is currently possible to pre-allocate a wrong type/shape buffer for output. Cc: @Craigacp	2023-09-05 15:25:12 -07:00
Tianlei Wu	8818a99c93	Set proper nvcc threads to avoid OOM (#17419 ) ### Description The 8 cu files under [flash attention](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/flash_attention) and the 4 cu files under [cutlass fmha](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/cutlass_fmha) need a lot of memory to compile. Previously, the default number of nvcc threads was the same as parallel, i.e. the number of CPU cores. Standard_NC4as_T4_v3 has 4 CPUs and 28 GB memory, and we launched 16 nvcc threads in total (4 parallel jobs, and 4 nvcc threads per job). Each thread might take 4 GB on average (peak is around 6GB, but threads are not started at the same time). OOM happens since 16 threads might need close to 64 GB in the worst case. When the build machine has 64GB or more memory, OOM is rare. Here we set a proper nvcc --threads value based on available memory to avoid OOM. ### Motivation and Context Fix `Python Packaging Pipeline (Training Cuda 11.8)`	2023-09-05 10:59:27 -07:00
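The arithmetic in the nvcc commit can be worked through with a small helper. This is a hedged sketch: `choose_nvcc_threads`, its parameters, and the formula are hypothetical and are not the rule used by the build script; only the figures (4 GB per thread on average, 4 parallel jobs, 28 GB and 64 GB machines) come from the commit message.

```python
def choose_nvcc_threads(
    total_memory_gb: float,
    parallel_jobs: int,
    memory_per_thread_gb: float = 4.0,
    max_threads: int = 4,
) -> int:
    """Pick an nvcc --threads value so that all jobs together fit in memory."""
    # Memory needed if every parallel job runs one nvcc thread.
    memory_per_round = memory_per_thread_gb * parallel_jobs
    budget = int(total_memory_gb // memory_per_round)
    # Always allow at least one thread, and never more than the cap.
    return max(1, min(max_threads, budget))
```

Under these assumptions a 28 GB machine with 4 parallel jobs gets 1 thread per job (4 threads, about 16 GB), while a 64 GB machine keeps 4 threads per job (16 threads, about 64 GB), which matches the commit's observation that OOM is rare at 64 GB or more.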


9582 commits