onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-22 02:30:26 +00:00

Author	SHA1	Message	Date
Tianlei Wu	4beca149a3	Cherry-pick 1.17.3 round 3 (#20195) ### Description Bring the fix for DML to 1.17.3 to resolve an issue https://github.com/microsoft/onnxruntime/issues/20180 --------- Co-authored-by: cao lei <jslhcl@gmail.com> Co-authored-by: Lei Cao <leca@microsoft.com>	2024-04-04 19:01:41 -07:00
Rachel Guo	a61add2a29	Cherry pick 1.17.3 - Round 2 (#20178) --------- Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Jian Chen <cjian@microsoft.com> Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>	2024-04-03 14:14:41 -07:00
Yulong Wang	45ff957973	1.17.3 cherry-picks for ORT Web changes (#19926 ) ### Description This PR is a preview of cherry-picks for ort-web to `rel-1.17.3` based on `rel-1.17.2`. <details> <summary>Changes of ort-web to cherry-pick</summary> The following commits are from main branch. `o` stands for pick, and `x` stands for skip. ``` o `2e0a388c36` [js/webgpu] Add HardSigmoid support (#19215) o `d226e40856` [js/webgpu] set query type in onRunStart (#19202) o `61610ff986` [js/webgpu] Add FusedConv clip test case (#18900) o `a33b5bd1fa` [JS/WebGPU] Added Uniforms to SkipLayerNorm. (#18788) o `591f90c0b9` [js/webgpu] Fix issue of timestamp query (#19258) o `7252c6e747` [WebNN EP] Support WebNN async API with Asyncify (#19145) o `5b06505073` [js/webgpu] Fix Tanh explosion (#19201) o `656ca66186` [js/webgpu] Support uniforms for conv, conv transpose, conv grouped (#18753) o `a3f0e2422b` [js/webgpu] Support f16 uniform (#19098) o `9e69606360` fix f16 for attention, enable slice and flatten for more types (#19262) o `624b4e2063` [js/webgpu] Remove enableShapesUniforms (#19279) o `90883a366a` [js/webgpu] Add hardSigmoid activation for fusedConv (#19233) o `85cef0af8c` [js/webgpu] Support capture and replay for jsep (#18989) o `d73131cf0f` [js/webgpu] Use DataType as uniform cpu type (#19281) o `dd1f6ccc45` [js/webgpu] resolve codescan alert (#19343) o `3a2ab1963a` [js/webgpu] Refactor createTensorShapeVariables (#18883) o `efc17e79de` [js/webgpu] Fix the undefined push error (#19366) x `50806a7dd5` [js/web] support external data in npm test (#19377) o `ccbe264a39` [js/webgpu] Add LeakyRelu activation for fusedConv (#19369) o `5ff27ef02a` [js/webgpu] support customop FastGelu (#19392) x `03be65e064` [js/web] fix types exports in package.json (#19458) o `06269a3952` [js/webgpu] allow uint8 tensors for webgpu (#19545) o `dfeda9019c` [JS/WebGPU] Add MatMulNBits (#19446) o `1b48054e1b` [js/webgpu] Create Split indices helpers by rank, not by shape (#19554) o `3fe2c137ee` [js] 
small fix to workaround formatter (#19400) x `70567a4b3a` [js/web] use ApiTensor insteadof onnxjs Tensor in TensorResultValidator (#19358) o `6e04e36e3f` [js/common] upgrade tsc in common from 4.9.5 to 5.2.2 (#19317) o `58f4921686` [js] changes to allow Float16Array if any polyfill is available (#19305) o `57d6819212` [js/web] Fix fused-conv is not included in npm test (#19581) o `ebd220b073` Misspelling in README.md (#19433) o `38c3432393` Bump ip from 1.1.8 to 1.1.9 in /js/react_native (#19582) o `fe82fccf1a` [js/webgpu] Fix Conv2DTransposeMatMul f16 compilation failure (#19596) o `76a2a487a1` Bump ip from 1.1.8 to 1.1.9 in /js/react_native/e2e (#19583) o `29b1106033` [node] Switch to setImmediate to avoid starving the Node.js event loop (#19610) o `ae3d73c981` [JS/WebGPU] Fix Split and Where to handle corner cases. (#19613) o `aec2389ad0` [js/webgpu] allows a ProgramInfo's RunData to use zero sized output (#19614) o `bb43a0f133` [js/webgpu] minor fixes to make tinyllama work (#19564) o `0edb035808` [js/web] fix suite test list for zero sized tensor (#19638) o `3cb81cdde2` [js/common] move 'env.wasm.trace' to 'env.trace' (#19617) o `e30618d055` [js/webgpu] use Headless for webgpu test by default (#19702) o `f06164ef8b` [js/web] transfer input buffer back to caller thread (#19677) x `a788514027` [js/web] dump debug logs for karma for diagnose purpose (#19785) o `24b72d2613` [JS/WebGPU] Preserve zero size input tensor dims. 
(#19737) o `4538d31a8b` [js/webgpu] expose a few properties in WebGPU API (#19857) o `53de2d8cb0` [js/webgpu] Enable GroupedConvVectorize path (#19791) o `ed250b88c3` [JS/WebGPU] Optimize MatMulNBits (#19852) x `e771a763c3` [js/test] align web test runner flags with ort.env (#19790) o `79e50aeef3` [js/web] rewrite backend resolve to allow multiple EPs (#19735) o `acb0df2280` Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932) o `b29849a287` [js/common] fix typedoc warnings (#19933) o `afdab62f53` Bump follow-redirects from 1.15.4 to 1.15.6 in /js/web (#19949) o `28ad6c3955` Bump follow-redirects from 1.15.4 to 1.15.6 in /js/node (#19951) o `7e0d424934` accumulate in fp32 for Reduce* (#19868) o `4c6a6a37f7` [js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387) o `01c7aaf6aa` [js/webgpu] allow setting env.webgpu.adapter (#19940) o `c45cff60cf` [js/webgpu] fix maxpool / fp16 (#19981) ``` </details> <details> <summary>Cherry-pick commandlines</summary> ```sh git cherry-pick `2e0a388c36` git cherry-pick `d226e40856` git cherry-pick `61610ff986` git cherry-pick `a33b5bd1fa` git cherry-pick `591f90c0b9` git cherry-pick `7252c6e747` git cherry-pick `5b06505073` git cherry-pick `656ca66186` git cherry-pick `a3f0e2422b` git cherry-pick `9e69606360` git cherry-pick `624b4e2063` git cherry-pick `90883a366a` git cherry-pick `85cef0af8c` #<<<<< Note: conflicts git cherry-pick `d73131cf0f` git cherry-pick `dd1f6ccc45` git cherry-pick `3a2ab1963a` git cherry-pick `efc17e79de` git cherry-pick `ccbe264a39` git cherry-pick `5ff27ef02a` git cherry-pick `06269a3952` git cherry-pick `dfeda9019c` git cherry-pick `1b48054e1b` git cherry-pick `3fe2c137ee` git cherry-pick `6e04e36e3f` git cherry-pick `58f4921686` git cherry-pick `57d6819212` git cherry-pick `ebd220b073` git cherry-pick `38c3432393` git cherry-pick `fe82fccf1a` git cherry-pick `76a2a487a1` git cherry-pick `29b1106033` git cherry-pick `ae3d73c981` git cherry-pick 
`aec2389ad0` git cherry-pick `bb43a0f133` git cherry-pick `0edb035808` git cherry-pick `3cb81cdde2` git cherry-pick `e30618d055` git cherry-pick `f06164ef8b` git cherry-pick `24b72d2613` git cherry-pick `4538d31a8b` git cherry-pick `53de2d8cb0` git cherry-pick `ed250b88c3` git cherry-pick `79e50aeef3` git cherry-pick `acb0df2280` git cherry-pick `b29849a287` git cherry-pick `afdab62f53` git cherry-pick `28ad6c3955` git cherry-pick `7e0d424934` git cherry-pick `4c6a6a37f7` git cherry-pick `01c7aaf6aa` git cherry-pick `c45cff60cf` ``` </details> <details> <summary>Cherry-pick conflicts</summary> - `85cef0af8c` #18989 this change is for enabling graph capture feature for JSEP, and it is done after ROCM EP enabled graph capture feature. However, the ROCM EP graph capture feature is not cherry-picked in rel-1.17.2. </details> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Jiajia Qin <jiajia.qin@intel.com> Co-authored-by: Xu Xing <xing.xu@intel.com> Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com> Co-authored-by: Yang Gu <yang.gu@intel.com> Co-authored-by: Wanming Lin <wanming.lin@intel.com> Co-authored-by: Jiajie Hu <jiajie.hu@intel.com> Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Matttttt <18152455+martholomew@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Segev Finer <segev208@gmail.com> Co-authored-by: Belem Zhang <belem.zhang@intel.com>	2024-03-29 13:13:39 -07:00
Rachel Guo	046d06ff26	Cherry-pick for 1.17.3 (#20013) ### Description Web PRs are not included yet. --------- Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: Maximilian Müller <44298237+gedoensmax@users.noreply.github.com> Co-authored-by: Yi Zhang <zhanyi@microsoft.com> Co-authored-by: Your Name <your@email.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: enximi <70036307+enximi@users.noreply.github.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: Markus Tavenrath <mtavenrath@users.noreply.github.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Adam Pocock <adam.pocock@oracle.com> Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>	2024-03-29 13:10:13 -07:00
Yulong Wang	dbf1a8cd39	Rebase rel-1.17.3 (#20006) ### Description The release branch `rel-1.17.3` was created from `rel-1.17.2` last week. However, a later code change has since been merged into `rel-1.17.2`: #19897. The branch `rel-1.17.3` is protected, so no push or delete can be performed on it. This PR cherry-picks the commit `633c22f` on top of `6bc6adc` to make sure the base of `rel-1.17.3` matches `rel-1.17.2`. @snnn @pranavsharma This operation ensures the code base contains the same code, but the git history will not be exactly the same. If you want it to be exactly the same, I need your help to do a git rebase or to delete and recreate the branch. Co-authored-by: Rachel Guo <35738743+YUNQIUGUO@users.noreply.github.com> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: George Wu <jywu@microsoft.com>	2024-03-21 18:08:11 -07:00
Rachel Guo	6bc6adc658	Update version number to 1.17.2 (#19701) ### Description As title. Follow-up PR for source code release 1.17.2 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Changming Sun <chasun@microsoft.com>	2024-03-01 13:51:00 -08:00
Rachel Guo	3a4599792d	Cherry pick for 1.17.2 source code release (#19679) ### Description As title. Co-authored-by: Markus Tavenrath <mtavenrath@users.noreply.github.com>	2024-02-28 10:20:01 -08:00
Rachel Guo	8f5c79cb63	Update 1.17.1 patch release version (#19622) ### Description Need to update patch release version. --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2024-02-23 16:10:36 -08:00
Rachel Guo	75968b9eca	Cherry-pick for 1.17.1 patch release (#19477) ### Description As title. --------- Co-authored-by: petermcaughan <peter.mcaughan@gmail.com> Co-authored-by: Peter McAughan <petermca@microsoft.com> Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com> Co-authored-by: Patrice Vignola <vignola.patrice@gmail.com> Co-authored-by: ivberg <ivberg@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Baiju Meswani <bmeswani@microsoft.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com> Co-authored-by: Sheil Kumar <smk2007@gmail.com> Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Prathik Rao <prathik.rao@gmail.com> Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: Jian Chen <cjian@microsoft.com> Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com> Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com>	2024-02-21 12:51:37 -08:00
Rachel Guo	5f0b62cde5	[ORT 1.17.0 Release] Cherry-pick Final Round (#19327) ### Description Cherry-pick Final Round --------- Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com> Co-authored-by: Changming Sun <chasun@microsoft.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com>	2024-01-30 16:51:05 -08:00
Rachel Guo	3fd94a8cc7	[ORT 1.17.0 Release] Cherry pick 1st round (#19243) ### Description [ORT 1.17.0 Release] Cherry pick 1st round. PR authors, please take a look and let me know if there are any questions about the changes, or approve accordingly. --------- Co-authored-by: wejoncy <wejoncy@163.com> Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Hector Li <hecli@microsoft.com> Co-authored-by: luoyu-intel <yu.luo@intel.com> Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com> Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com> Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Heflin Stephen Raj <heflinstephen03@gmail.com> Co-authored-by: Yifan Li <109183385+yf711@users.noreply.github.com> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: Changming Sun <chasun@microsoft.com>	2024-01-26 20:11:48 -08:00
Adrian Lizarraga	daafe63ecc	cherry pick qnn sdk 2.18 updates into release branch (#19197 ) cherry picked from commit `28a16c223c` https://github.com/microsoft/onnxruntime/pull/19129	2024-01-19 17:04:47 -08:00
Rachel Guo	a63b71eadb	Cherry-pick "Fix buildJava from Zip-Nuget-Java-Nodejs Packaging Pipeline (#19187 )" (#19194 ) ### Description Cherry-pick "Fix buildJava from Zip-Nuget-Java-Nodejs Packaging Pipeline (#19187)"	2024-01-18 13:44:48 -08:00
Patrice Vignola	80f274ca6f	Fix SkipLayerNormalization shape inference (#18724 ) SkipLayerNorm has more than one input, so `propagateShapeAndTypeFromFirstInput` is not enough.	2024-01-16 09:42:59 -08:00
Changming Sun	e2e488d6f8	Revert "iOS packaging pipeline stability" (#19135) Reverts microsoft/onnxruntime#19097 because it broke the Android CI pipeline.	2024-01-16 09:18:35 -08:00
Jian Chen	c92f72ebeb	Merge Linux Nuget GPU pipeline with zip-nuget (#19120)	2024-01-16 08:59:03 -08:00
Jeff Bloomfield	8d4369b77e	Update DirectML nuget version to 1.13.1 (#19122) ### Description Update DML version to 1.13.1	2024-01-15 19:04:41 -08:00
Wanming Lin	1bab98988b	[WebNN EP] Fixed bug in int8 data type processing (#19134 )	2024-01-15 18:44:25 -08:00
Guenther Schmuelling	9dee543bed	fix gemm beta for fp16 (#19153 ) per onnx spec beta is always fp32 so we need to cast it	2024-01-15 18:40:38 -08:00
Jeff Bloomfield	9f87c5c41d	Fix build error due to merge with DML adapter enumeration macro defined (#19121) ### Description Fix build error when ENABLE_NPU_ADAPTER_ENUMERATION is defined	2024-01-15 17:10:58 -08:00
pengwa	1150b1f81e	ORTModule memory improvement (#18924) ## Dependency https://github.com/microsoft/onnxruntime/pull/19007 ## ORTModule memory efficient gradient management Previously I tried to solve the coarse-grained gradient accumulation/update problem in ORTModule with https://github.com/microsoft/onnxruntime/pull/8979, but that solution was not fully validated with DDP or when there are user hooks on the gradient accumulation of a torch parameter. This PR addresses the problem with an approach similar to PR 8979, i.e. trigger gradient accumulation once ORT has computed the grad, but instead of using an AccumulateGrad op, this time it uses an ONNX operator PythonOp, which internally calls param.backward(grad) and so helps handle all related hooks correctly. ## Design Check the details from https://microsoftapc-my.sharepoint.com/:p:/g/personal/pengwa_microsoft_com/EaaBq4EzsFhOmsDEXCG7Ba4Bb9bwd0O2sFV_JXJ4jBLYLA?e=7Sz2g8&nav=eyJzSWQiOjI3MSwiY0lkIjozMjE4NzI1NDIzfQ ## Convergence Validation: ![image](https://github.com/microsoft/onnxruntime/assets/10530022/ccf3a213-e815-4b23-b759-165033b2d9fe) Differences are mostly on the order of 0.000x, sometimes 0.00x, which may come from the different order in which gradients are applied before and after this change (on DeepSpeed ZeRO stage 2). ## TODO Consolidate the logic with Stage3's similar logic.	2024-01-16 08:57:37 +08:00
Adam Pocock	191525301f	[java] Updating TensorInfo so it contains the named dimensions (#18962 ) ### Description The Java `TensorInfo` object which is used to describe a tensor's shape, along with the input and output placeholders for a model couldn't show any symbolic/named dimensions in that tensor. Now this information is stored in Java strings on construction and included in the toString. ### Motivation and Context Setting symbolic dimensions required external information in Java, the names were not discoverable from within the API.	2024-01-15 14:42:50 -08:00
Ben Niu	a97199c62d	Fix Arm64EC build for test_q4qdq.cpp (#18523 ) ### Description Fix ifdef guards in test_q4qdq.cpp to exclude code blocks intended only for native x64 compilation instead of x64 + Arm64EC.	2024-01-15 14:29:19 -08:00
Yi Zhang	922a2f00e3	Extend timeout in Nuget-CUDA-Packaging-Pipeline (#19138) ### Motivation and Context The Linux_GPU_x64 job in the pipeline has been canceled due to timeout since 0112.	2024-01-15 14:37:22 +08:00
Scott McKay	b2ce3eedb9	Fix build error for CoreML Split op (#19099) ### Description The `split` input of the Split op is int64_t. Fixing that resolves a type mismatch build error on Windows when CoreML is enabled (for debugging the partitioning code). ### Motivation and Context Fix build error --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-01-15 15:09:49 +10:00
Adam Pocock	71657d1eb8	[java] Fix double close (#19133 ) ### Description The `OnnxValue` and `OrtProviderOptions` implementations now check to see if they've been closed before accessing the native pointer, and also before close is called. ### Motivation and Context Before they could be closed twice which SIGSEGV'd the JVM. Fixes #19125.	2024-01-14 14:53:26 -08:00
Jian Chen	c3ce9df80c	Disabling python3.12 on training python packaging pipelines (#19123)	2024-01-14 14:51:00 -08:00
Jian Chen	76797127d6	Always download cuda and trt libraries from Azure blob (#19118 ) ### Description This way, we will not need to update the windows images constantly and allow more flexibility to choose the cuda version in the future.	2024-01-14 11:37:26 -08:00
Changming Sun	bb4011b2b1	Set default flags nvcc and do not set default compile flags for ROCM EP (#19124 ) ### Description Set default flags nvcc and do not set the flags for ROCM EP. ### Motivation and Context 1. To meet a BinSkim requirement for CUDA EP. https://github.com/microsoft/binskim/blob/main/docs/BinSkimRules.md#rule-BA2024EnableSpectreMitigations 2. The ROCM EP's pipeline is broken since PR #19073 . Unit tests failed to load the EP with the following error message: Failed to load library libonnxruntime_providers_rocm.so with error: /build/Release/libonnxruntime_providers_rocm.so: undefined symbol: vtable for onnxruntime::InsertMaxPoolOutput . This PR is a hot fix to bring the pipeline back. So far I don't know why the error happened. The symbol "InsertMaxPoolOutput" is in onnxruntime_optimizers. I don't see any EP code references it directly.	2024-01-14 11:36:49 -08:00
Yulong Wang	f917dde717	[web] remove xnnpack from web backends (#19116 ) ### Description XNNPACK is already disabled in web assembly build. This change removes the xnnpack backend registration in JS.	2024-01-13 23:04:02 -08:00
Edward Chen	e1e45901e2	iOS packaging pipeline stability (#19097 ) - Remove protoc build step which sometimes times out. Download protoc instead. - Use macOS-12 image in the set variables stage. It seems more stable.	2024-01-13 19:27:44 -08:00
Changming Sun	5558912d7b	Disable ccache in Windows CPU CI pipeline (#19131) ### Description Disable ccache for all the jobs in the Windows CPU CI pipeline. Before disabling it, the build had a warning: "MSIL .netmodule or module compiled with /GL found; restarting link with /LTCG; add /LTCG to the link command line to improve linker performance". After disabling it, the warning is gone and the build doesn't use /GL or /LTCG. Cache itself should not cause this difference.	2024-01-13 18:40:43 -08:00
Adrian Lizarraga	65893ef382	Add --parallel to QNN EP NuGet pipeline build command (#19126 ) ### Description Add --parallel to QNN EP NuGet pipeline build command ### Motivation and Context Improve build times for pipeline.	2024-01-13 02:38:40 -08:00
Yang Gu	e803f8eb0f	[js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes (#18894) We submit kernels in batches (a fixed size of 16 is used, except for the last batch) for better performance. However, timestamp query support is at the pass level, so the previous implementation disabled batch execution in profiling mode. In fact, a batch can contain multiple passes, so batch execution does not have to be disabled; this is the first enhancement of this PR. Furthermore, WebGPU has an extension to support timestamp queries inside passes, which isn't supported on all platforms (e.g., Windows supports it, while macOS doesn't). This is expected to have a lower cost than the multiple-passes solution, so this PR also introduces this support when available. This PR also refactors some implementation related to kernelInfo and tries to unify the related kernel names.	2024-01-13 00:23:17 -08:00
Jian Chen	78e796bb27	Fixing issue where unzip package from 'onnxruntime-win-x64-gpu' was also uploaded. (#19096) ### Description Fixing issue where unzip package from 'onnxruntime-win-x64-gpu' was also uploaded. For example, https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=396440&view=artifacts&pathAsName=false&type=publishedArtifacts	2024-01-12 22:30:43 -08:00
Yulong Wang	07cfc56538	[js] enable external data loading for ort-web (#19087) ### Description Enable external data loading for ort-web. ### Why The ORT external data design depends heavily on the file system, especially synchronous file I/O APIs. Those are not available on web platforms, so extra code is needed to make external data work on web. ### How Since there is no file system on web, the implementation supports external data through pre-loaded data. Assume the model file a.onnx includes initializers that are linked to ./b.bin; we require users to pass a full data file list when creating the session. The user code will look like:
```js
const mySess = await ort.InferenceSession.create('./path/model/a.onnx', {
  // session options
  externalData: [
    {
      // relative or absolute path/URL of the file,
      // or a pre-loaded Uint8Array containing the data of the external data file
      data: './path/data/b.bin',
      // the relative path of the external data. Should match initializers' "location" value defined in the model file
      path: './b.bin'
    },
    // { } if multiple external data file
  ]
});
```
Currently, this feature only works with JSEP build enabled.	2024-01-12 19:24:24 -08:00
Jian Chen	e5eacc6d11	Fix cuda-packaging-pipeline.yml (#19115)	2024-01-12 19:09:25 -08:00
Hector Li	62a4e9103e	Add extreme_power_saver for htp_performance_mode (#19111 ) ### Description Add extreme_power_saver mode for htp_performance_mode	2024-01-12 19:07:02 -08:00
Yifan Li	443aeb851c	[TensorRT EP] Customizable engine cache prefix (#19083) ### Description Add new option `trt_engine_cache_prefix` to customize the TRT EP engine cache prefix. For example: - If the user specifies `trt_engine_cache_prefix\|FRCNN trt_engine_cache_enable\|true` when running the FRCNN model, the cache will be saved/loaded as `FRCNN_2068723788287043730__sm80.engine`. The engine profile follows the same pattern. - If this option is skipped, the engine will be saved/loaded as `TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_2068723788287043730__*_sm80.engine`, which is the default case. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/16708 --------- Co-authored-by: Chi Lo <Chi.Lo@microsoft.com> Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>	2024-01-12 18:10:05 -08:00
Edward Chen	150c4cb8fe	[MLAS AArch64] SQNBitGemm CompInt8 kernel (#18953 ) Implement ARM NEON SQNBitGemm kernel that first block quantizes A to int8 and then does int8 multiplication.	2024-01-12 17:58:08 -08:00
Guenther Schmuelling	a756017e9f	[js/webgpu] more fixes for access above 2GB (#19065 ) when jsep calls javascript with an index to HEAP8 or HEAP32 the index is negative when the heap is above 2GB, even if we pass it as uint32_t it remains negative. So in javascript use >>> 0 to make it unsigned.	2024-01-12 17:47:37 -08:00
Adrian Lizarraga	8deeba3ad0	[Quantization] Fix get_qnn_qdq_config to use new scale/zp np.array data types (#19114 ) ### Description - Updates `get_qnn_qdq_config()` to use new scale/zp np.array data types. - Adds missing unit test to help prevent future regression. ### Motivation and Context https://github.com/microsoft/onnxruntime/pull/18043 changed the usage of `extra_options["TensorQuantizationOverrides"]`. We need to update its use in quantization/execution_providers/qnn/quant_config.py	2024-01-12 17:02:32 -08:00
Guenther Schmuelling	96dbac6e4b	update to emsdk-3.1.51 (#18844 )	2024-01-12 16:04:33 -08:00
Scott McKay	8f2e57f5d0	Make session configuration options available to kernels via OpKernelInfo (#18897) ### Description Pass through the ConfigOptions from the session via OpKernelInfo so that kernel behavior can be configured. Initial usage would be to optionally enable a fast path for ARM64 bfloat16 GEMM - see #17031. Other usages could be things like selecting the exact implementations of the activation functions for RNN operators instead of the default approximations (e.g. use [sigmoid_exact instead of sigmoid](`2d6e2e243d/onnxruntime/core/providers/cpu/rnn/rnn_helpers.h (L379-L382)`)). OpKernelInfo is already passing through things from the session state, and adding a new member of ConfigOptions is the simpler update. It's also a more natural fit given it's providing state/info to the kernel.	2024-01-13 10:02:43 +10:00
Jiangzhuo	a503561d0c	[js] using OffscreenCanvas when DOM is not available (#19033) ### Description When the DOM API is not available, use OffscreenCanvas. ### Motivation and Context In some environments, such as a service worker or web worker, the DOM API is not available, so we can use the OffscreenCanvas API to replace `document.createElement('canvas')`. Most of the APIs of OffscreenCanvas and HTMLCanvasElement are the same, except that `toDataURL` is missing. This fixes issue #19032	2024-01-12 13:54:05 -08:00
Guenther Schmuelling	4a5f13b681	fix resize for fp16 (#19110) resize for fp16 has 2 issues: scales are always f32 and roi can be f32 or f16. scales: this is fixed. roi: this is fixed for the case where roi is not passed as an optional input with f16. Fixing that case requires a much larger change, and I did not want to risk it shortly before a release. For all practical purposes, passing roi as an input with f16 should be rare, and we can fix it in the near future.	2024-01-12 13:44:28 -08:00
Caroline Zhu	4dbaa73738	[js/web/training] added end-to-end tests (#18700 ) ## Summary * following inference's [set-up for end-to-end tests](https://github.com/microsoft/onnxruntime/tree/main/js/web/test/e2e), created an end-to-end test runner for training * this test runner copies testdata from the [trainingapi folder](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/testdata/training_api) * then runs two tests (training session with evalModel & optimizer model, and training session with the minimum options), and tests if the ORT-web training package encompasses inference * these tests check * createTrainingSession * runTrainStep * runOptimizerStep if applicable * the parameters methods (getParametersSize, loadParametersBuffer, and getContiguousParameters) ## TL;DR * [`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c) is responsible for setting up and running the end to end tests * [`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8) contains the test function definitions (`testInferenceFunction`, `testTrainingFunctionMin`, `testTrainingFunctionAll`) ## Flow * entrypoint: user runs the following command in the terminal: `npm run test:training:e2e` * [`js/web/package.json`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-79275844e75c3c410bb3a71c7f59b2b633e5a3e975c804ffc47220025084da28) was modified to include an npm script that will run `run.js` which will run the end to end tests * 
[`js/web/test/training/e2e/run.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-c1359c4d401f9ba69e937814219cefe5fd11b151a6ffd084c641af3c82e8216c) is responsible for * detecting and installing local tarball packages of ORT-web * copying training data to the `js/web/training/e2e/data` folder * starting two Karma processes. Karma is a test runner framework that simulates testing in the browser. * In this case, the tests happen in Chrome. We can configure the tests to run in Edge and other browsers in the future. * one of these Karma processes is self-hosted, meaning it pulls the ORT-web package from the local install * the other Karma process is not self-hosted, meaning it pulls the ORT-web package from another source. In this case, we start an HTTP server that serves the ORT-web binaries. * [`js/web/test/training/e2e/simple-http-server.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-f798ab485f3ec26c299fe5b2923574c9e4b090200ba20d490bbf6c183286993c) is responsible for starting the HTTP server and serving the ORT binary files. This code is almost identical to the corresponding code in the inference E2E tests. * [`js/web/test/training/e2e/karma.conf.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-436cfe8f670c768a04895bd4a1874a5e033f85e0e2d84941c62ff1f7c30a9f28) Karma configuration file that specifies what happens when a Karma process is started. The config specifies Mocha as the testing framework, which will go through all the loaded files and run any tests that exist * [`js/web/test/training/e2e/browser-test-wasm.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-13b6155e106dddc7b531ef671186e69b2aadb8a0f4b2f3001db0991567d78221) File that contains the tests that Mocha will pick up on and run. 
* The test functions (such as testInference and testTrainingFunctionAll) are defined in [`js/web/test/training/e2e/common.js`](https://github.com/microsoft/onnxruntime/compare/main...carzh:onnxruntime:carzh/training-e2e-runner?expand=1#diff-ee5452491b7b2563d175d13d81d10f2323b12b18589aa4c5798962a8b904a4a8). ## Notes * I followed the [tests for training core](`b023de0bfc/orttraining/orttraining/test/training_api/core/training_api_tests.cc`) where they randomly generated input for the training session * E2E tests are triggered by running `npm run test:training:e2e` -- suggestions for alternative script names are appreciated!!! ## Motivation and Context - adding training bindings for web	2024-01-12 13:33:33 -08:00
Preetha Veeramalai	c340bf08f6	Openvino EP code changes for 1.17 update (#19023 ) ### Description Introduce AppendExecutionProvider_OpenVINO_V2 API and support for OV 2023.3. ### Context - The API is added to facilitate customers in using published official Microsoft onnxruntime libraries with OVEP libraries. - Add support for OpenVINO 2023.3 official release. - Extend operator coverage - GH fixes --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2024-01-12 13:20:51 -08:00
Aditya Goel	dcd6d4cad6	Label encoder opset4 (#17977 ) ### Description <!-- Describe your changes. --> Implements LabelEncoder as per `ai.onnx.ml` opset 4 for the upcoming ONNX 1.15 release. ~~This currently depends on a new ONNX release candidate and so is marked as draft in the meantime.~~ ### Motivation and Context Closes https://github.com/microsoft/onnxruntime/issues/17602	2024-01-12 12:43:44 -08:00
Changming Sun	55b046e97e	Remove enable_mac_silicon settings (#19108 ) ### Description Remove enable_mac_silicon settings from two packaging pipelines. ### Motivation and Context Now we build universal2 packages instead.	2024-01-12 11:01:39 -08:00

10391 commits