onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-13 18:08:13 +00:00

Author	SHA1	Message	Date
Ye Wang	dec11afb83	Fix a prefast warning (#15343) ### Motivation and Context https://aiinfra.visualstudio.com/ONNX%20Runtime/_workitems/edit/14272/?triage=true	2023-04-03 18:25:25 -07:00
Hector Li	44027797b0	[QNN EP] Gather support int64 indices input (#15317) ### Description Gather now supports int64 indices input. ### Motivation and Context Support more scenarios.	2023-04-03 17:51:42 -07:00
Matthieu Darbois	85bb13345d	Rework some external targets to ease building with `-DFETCHCONTENT_FULLY_DISCONNECTED=ON` (#15323) ### Description Rework some external targets to ease building with `-DFETCHCONTENT_FULLY_DISCONNECTED=ON`. This will allow package managers to provide an onnxruntime package more easily by reducing the amount of patching needed downstream for each version. ### Motivation and Context Availability of onnxruntime in some C++ package managers: https://github.com/microsoft/onnxruntime/issues/7150 https://github.com/conan-io/conan-center-index/issues/16699 https://github.com/microsoft/vcpkg/issues/20548 My initial intent is to get this into conan, but the PR would most likely be useful (though not tested) for vcpkg as well (and maybe others). I tried to include only a first batch of patches that are not too specific (i.e. not specific to conan). The first commit reworks `flatbuffers` and extends what @snnn did in https://github.com/microsoft/onnxruntime/pull/13991. The second commit reworks `pytorch_cpuinfo`. The third commit reworks `google_nsync`.	2023-04-03 17:45:12 -07:00
RandySheriffH	e4aae94f20	Remove azure build to unblock PRs (#15336) Temporarily remove the Azure build check to unblock PRs. We need to investigate the sudden build failure and re-enable it. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-04-03 12:47:14 -07:00
Ye Wang	fbfe92f66a	DecoderMaskedMultiHeadAttention enhancement (#15292)	2023-04-02 21:53:03 -07:00
Sheil Kumar	7ccdf9ad8c	User/sheilk/sequence fix (#15239) Ensure that Loop operators run on the CPU. Fix memcpy for sequence tensors so that empty sequences (e.g. when SequenceEmpty runs on DirectML) can be copied back to the CPU.	2023-03-31 12:57:25 -07:00
Dmitri Smirnov	c06ab5e353	Optimize use of Eigen::DenseBase::select() for PRelu (#15287) Neither MSVC nor gcc optimizes select() well, even in trivial usage outside of ORT. gcc seems to do better with -ffast-math (not used by ORT), but /fp:fast does nothing for MSVC. This PR delivers a 33% speedup on the same model (360 us -> 270 us on Windows; 205 us -> 153 us on Linux; measured on different systems). TODO: Examine and fix Elu and other similar activation functions for the use of `Eigen::select`. Co-authored-by: @fpribeiro	2023-03-31 11:20:07 -07:00
shalvamist	fff75a301c	ORT_Web - JS graph parsing update (#15185) ### Description Simplified the JS graph parsing logic, fixing the bug reported in GitHub issue #15006.	2023-03-31 09:26:55 -07:00
Yufeng Li	c68044cc4b	fix prefast warning for GenerationCudaDeviceHelper::ProcessLogits (#15163)	2023-03-31 08:50:53 -07:00
Yufeng Li	c08d6b42e8	Add tool to support packing mode for BERT model (#15283) ### Description Add a tool to convert a fused BERT-like model to packing mode.	2023-03-31 08:46:47 -07:00
cloudhan	027e231a83	Report unsupported reason during tuning (#15246)	2023-03-31 16:54:11 +08:00
JiCheng	60cc082f0a	[NNAPI] Minor fix (#15052) ### Description Follow-up to https://github.com/microsoft/onnxruntime/pull/14881 --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-31 15:13:57 +08:00
PeixuanZuo	d80859f63d	[ROCm] fix python packaging pipeline and add python10 (#15282) The ROCm Python packaging pipeline failed because of updates to the manylinux version and manylinux.patch. 1. Fix a duplicate `epel-release` installation: the ROCm pipeline already installs it at the beginning of the Dockerfile to install the ROCm libs, so remove the duplicate installation from install-runtime-packages.sh. ``` /var/tmp/yum-root-sMRl36/epel-release-latest-7.noarch.rpm: does not update installed package. Error: Nothing to do ``` 2. Add Python 3.10 to fix the error below. ``` + /opt/python/cp310-cp310/bin/python -m venv /opt/_internal/tools build_scripts/finalize.sh: line 40: /opt/python/cp310-cp310/bin/python: No such file or directory ``` 3. Add Python 3.10 to the ROCm pipeline. Pipeline link: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=294776&view=results	2023-03-31 10:25:21 +08:00
Baiju Meswani	e870089ca8	Refining the offline tooling for training artifact generation (#15212)	2023-03-30 18:05:51 -07:00
Pranav Sharma	818b94b4ea	Add owners for public facing API files (#15288) ### Description Add owners for public-facing API files. ### Motivation and Context Tighter control over the APIs.	2023-03-30 17:16:15 -07:00
Chen Fu	605c2f4b89	Remove fp16 support from apple (#15270) ### Description Remove fp16 support from the Apple build. ### Motivation and Context FP16 support on ARM64 is only available from Armv8.2-A onward, so the clang compiler needs the compilation flag `-march=armv8.2-a+fp16`. Unfortunately, our current universal build does not support hardware-specific compilation flags on C++ source files, as they would cause trouble when compiling for more than one hardware target. Until we figure out how to remove this limitation, fp16 support has to be disabled for Apple systems.	2023-03-30 16:44:26 -07:00
Guenther Schmuelling	4645726d74	fix for webgl lrn (#15236) Fix an issue that produced wrong results for LRN on webgpu.	2023-03-30 16:16:57 -07:00
Edward Chen	9f942e1a3e	Graph transformer to ensure unique DQ nodes for QDQ node units (#15145) ### Description Add a required graph transformer that duplicates DQ nodes to ensure that QDQ node units have unique DQ nodes. This condition is necessary for QDQ node unit processing. ### Motivation and Context There is an existing Python utility that does this: `c7ced7a5e9/tools/python/util/qdq_helpers/qdq_model_utils.py (L77)` This PR implements it as a graph transformer so it is integrated into ORT and does not require a separate step to update the model. There are also tests to ensure that its effects are not undone by basic level graph optimizations.	2023-03-31 08:39:43 +10:00
Xavier Dupré	786f8b98f7	Add a page in the documentation for every operator in onnxruntime (#14340)	2023-03-30 14:39:16 -07:00
yf711	dc61d3b5b6	Fix symbolic shape inference script on precision loss issue (#15215) ### Description When calculating a symbolic shape such as `mul(get_int_val(values=[1024, 0.5]))`, the current script calls `get_int_val()` to get the values, so they become `[1024, 0]`. As a result, `mul(values)` -> `mul([1024, 0])` = 0, but the expected shape size is 512. Fix: for binary math operations such as `mul()` and `div()`, don't convert the input shapes to integers if any precision loss could happen; keep the inputs as floats, finish the operation, then cast the final result to an integer and output the shape. Test cases are added: 1. mul(1024, 0.5) => 512 (before this fix, the output would be 0, as float 0.5 would be converted to int 0) 2. div(768, 1.5) => 512 (before this fix, the output would be 768, as float 1.5 would be converted to int 1)	2023-03-30 12:15:27 -07:00
cao lei	c2dad6893b	use cudaStreamNonBlocking flag (#15258) ### Description This PR uses the cudaStreamNonBlocking flag when creating a CUDA stream, meaning the created stream can run concurrently with the default stream, with no implicit synchronization between them. ### Motivation and Context This change is needed to address a performance concern.	2023-03-30 11:43:50 -07:00
Changming Sun	75f6861cb8	Skip DNNL's opset18 tests (#15275) ### Description The DNNL EP doesn't support opset 18 yet, so skip such tests so that we can still test the other EPs. The affected models are ONNX node tests that live in github.com/onnx/onnx	2023-03-30 09:58:11 -07:00
Scott McKay	6d464748ba	Make internal nhwc schema registrations complete (#15278) ### Description Add all the ONNX layout-sensitive ops from opset 11 onward. Make the list in the transpose optimizer consistent. ### Motivation and Context When we run L1 optimizers after the layout transform in a full build, a schema is needed for any layout-sensitive op that gets converted to the internal domain. Previously we did not run L1, so we got away without schemas unless the EP used a static kernel for the NHWC version of the op.	2023-03-30 08:55:14 -07:00
Yi Zhang	c5f5e3ec5e	Improve 2 cache tasks in one pipeline yaml (#15267) ### Description 1. Make two cache tasks in one pipeline actually work. 2. Each build step has its own CCACHE_DIR environment variable instead of using job variables. 3. The external Protobuf compilation cache is only updated when deps.txt changes; it no longer generates a new cache on every commit. ### Motivation and Context The simplified workflow is as follows: ``` --------build with ccache------- | cache | {CCACHE_DIR}-----cache stat. ``` ``` -------Cache@2------ | download cache | {path}--------upload cache ``` 1. {XXX} means an environment variable or task input. 2. {CCACHE_DIR} must be consistent with {path}: ccache produces caches in {CCACHE_DIR}, and Cache@2 downloads the cache into {path}, then tars {path} and uploads it. 3. Protobuf only changes with deps.txt, so keying its cache on deps.txt reduces the storage size. 4. As a next step, we may split the compilation into two steps, one for external dependencies and another for ORT.	2023-03-30 23:22:11 +08:00
Yi Zhang	aab3c15585	Add Compilation Cache in CoreML pipeline (#15259) ### Description 1. Move the cache task definition into a template. 2. In debug mode, the compiler mtime differs between machines, so change CCACHE_COMPILERCHECK to content. ### Motivation and Context 1. Accelerate the CoreML pipeline. Test run: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=938040&view=logs&j=1ac7588f-a5bd-5ff7-4a8a-a34869d50220 With the cache, the run finishes in 12 minutes; without it, the run takes about 1 hour. 2. Make the cache function easy to use and maintain. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-30 23:18:52 +08:00
Yulong Wang	2928fda490	[web] disable browser test temporarily (#15280) ### Description This PR temporarily disables the browser test. The test fails randomly and we are investigating the issue; disabling it unblocks others.	2023-03-30 08:15:36 -07:00
Changming Sun	15f7dca9fb	Update protobuf to 3.21.x (#15245) ### Description Fixed [AB#10092](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/10092), [AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753), [AB#11759](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11759) ### Motivation and Context The version we currently use has a security issue in Java, though we don't use that version's protobuf Java package.	2023-03-29 14:08:18 -07:00
Changming Sun	5d1dbfb432	Update ONNX test data (#15256) Change the test data version from 1.13.0 to 1.13.1, which includes some bug fixes.	2023-03-29 13:13:11 -07:00
Changming Sun	4a0b86eba6	Update the post-merge pipeline (#14965) ### Description 1. Remove Linux jobs for the ORT-Extension combined build. 2. Add a macOS build job for the ORT-Extension combined build. 3. Adjust the yaml file so that it can support two different ADO instances. ### Motivation and Context To test our code better. It will also enable us to run such tests for every commit in the main branch, making it easier to figure out which change caused a build break. See [AB#13435](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13435)	2023-03-29 13:12:07 -07:00
Changming Sun	fb1f03fdff	Increase the timeout value of win-wasm-ci.yml (#15257)	2023-03-29 13:11:51 -07:00
FFFrog	ecb89ed752	[CANN] Multi-stream execution support for CANN EP. (#14058) ### Description Multi-stream execution support for CANN EP. ### Motivation and Context The CANN EP is currently unusable because a new multi-stream execution mechanism was introduced in [#13495](https://github.com/microsoft/onnxruntime/pull/13495) and the Fence-based synchronization mechanism was deleted, but the relevant CANN EP logic was not updated at the same time. This PR fixes that.	2023-03-29 11:57:22 -07:00
Adrian Lizarraga	febc69e1b2	[QNN EP] Support Cast in HTP backend (#15234) ### Description Adds support for the Cast operator to the QNN HTP backend. ### Motivation and Context Enable more models to run on QNN HTP backend.	2023-03-29 11:01:34 -07:00
PeixuanZuo	a6279d4cfb	[ROCm] update Stable Diffusion benchmark to support ROCm EP (#15094) Update Stable Diffusion benchmark to support ROCm EP	2023-03-29 15:19:52 +08:00
Jian Chen	85948d6bc6	Cjian/windows update python3.11 (#15243) ### Description Update Python to 3.11 on Windows. --------- Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>	2023-03-28 22:15:47 -07:00
Ryan Hill	659118f939	Prefast warning fixes (#15175) ### Description transpose.cc: Arithmetic overflow: Using operator '-' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '-' to avoid overflow (io.2). cuda_provider_factory.cc: The type 'struct onnxruntime::ProviderInfo_CUDA_Impl' with a virtual function needs either public virtual or protected non-virtual destructor (c.35).	2023-03-28 21:36:03 -07:00
Tianlei Wu	f752bb9973	Update stable diffusion benchmark results: A100 and PyTorch 2.0 (#15195) Update stable diffusion benchmark results with A100 results and PyTorch 2.0 numbers.	2023-03-28 19:47:22 -07:00
Justin Chu	710d095124	Refactor the constant `_ONE` in `orttraining_test_ortmodule_api.py` (#15128) Follow-up to https://github.com/microsoft/onnxruntime/pull/15097#discussion_r1142399537	2023-03-28 08:59:51 -07:00
Chen Fu	41ddcd30a1	Fp16 NHWC Max and Average Pooling (#15181) ### Description Max and average pooling operators for fp16 in NHWC layout. ### Motivation and Context Continue the steps toward fp16 inference support.	2023-03-28 08:22:55 -07:00
PeixuanZuo	021e46179a	[ROCm] refactor GroupNorm to set vectorize number as template parameter (#15198) Refactor GroupNorm to set the vectorize number as a template parameter.	2023-03-28 16:09:56 +08:00
Justin Chu	938e2136c6	Enable pylint and numpy rules (#15218) ### Description Enable pylint and numpy rules ### Motivation and Context Modernize numpy usage and enable more quality checks	2023-03-27 20:37:53 -07:00
PeixuanZuo	62b2947ac1	[ROCm] remove python3.7 from python packaging pipeline (#15230) Remove Python 3.7 from the Python packaging pipeline. https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=289720&view=results	2023-03-28 10:37:04 +08:00
Changming Sun	462c6043b5	Remove Win8 support (#15219) ### Description Remove Win8 support since it is EOL. See https://learn.microsoft.com/en-us/lifecycle/announcements/windows-8-1-end-support-january-2023 ### Motivation and Context Simplify code.	2023-03-27 18:51:49 -07:00
Scott McKay	eb8f6c7c52	Transpose optimizer enhancements (#15117) ### Description - Add debug infrastructure to dump out the model at various stages of transpose optimization. - Handle more scenarios where Transpose -> Reshape can be merged. - Run L1 optimizers after layout transform to constant fold initializers that had their layout changed. - Use a cost check for Concat post layout transform, as pushing a Transpose through it can potentially add Transpose nodes to multiple other inputs. - Update the internal testing EP to support tests where it should take all nodes, use NHWC layout, and use dummy static kernels instead of compiling, so the ops in the graph post-initialization can be counted. - Misc cleanup in InferenceSession to avoid unnecessarily passing class members as args to TransposeGraph. ### Motivation and Context Address a perf issue seen with a model where a Transpose gets blocked by a Reshape that could have been treated as a Transpose. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-28 08:28:17 +10:00
Jian Chen	792d411135	Update python 3.11 and remove 3.7 for Linux (#15214) ### Description Update to Python 3.11 and remove Python 3.7 for Linux. --------- Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>	2023-03-27 14:46:30 -07:00
Edward Chen	ea40dc3ad6	Update build.py to disallow running as root user by default. (#15164) Try to address intermittent permissions issues that show up in non-transient CI environments.	2023-03-27 14:46:04 -07:00
Nat Kershaw (MSFT)	3064fa7611	Fix C API docs error (#15216)	2023-03-27 14:34:18 -07:00
Jian Chen	527e006124	Update mlas (#15228)	2023-03-27 14:18:48 -07:00
Bengt Gustafsson	063ee8d504	Fixed some warnings that were treated as errors when compiling with DML in Win32/MSVC. (#15157) ### Description Use onnxruntime::narrow to silence some warnings that are turned into errors when compiling the DML provider in Win32. Also fix one warning-turned-error caused by comparing an int loop variable against a vector's size() as the upper bound. ### Motivation and Context Solves [https://github.com/microsoft/onnxruntime/issues/14595](https://github.com/microsoft/onnxruntime/issues/14595) Co-authored-by: bengt.gustafsson <bengt.gustafsson@contextvision.se>	2023-03-27 14:17:28 -07:00
Changming Sun	63cc1bb26a	Move Linux CPU pipelines to an AMD CPU pool which is cheaper (#15144) ### Description 1. Move Linux CPU pipelines to an AMD CPU pool which is cheaper 2. Enable CCache for orttraining pipeline ### Motivation and Context Azure AMD CPU machines are generally much cheaper than Intel CPU machines. However, they don't have local disks.	2023-03-27 14:10:08 -07:00
Patrice Vignola	67a6022c03	[DML EP] Add GroupNorm (#15189) Comparison between the different normalization operations: ![Comparison between the different normalization operations](https://user-images.githubusercontent.com/1041752/106491728-73d40680-64b7-11eb-8769-3f758996e959.png)	2023-03-27 12:52:53 -07:00
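
The precision-loss bug described in commit dc61d3b5b6 (#15215) can be reproduced with a small standalone Python sketch. The helpers `binary_op_before_fix` and `binary_op_after_fix` are hypothetical and are not the functions in the actual symbolic shape inference script; they only model the two behaviours from the commit message and reproduce its two test cases. Using `int()` (truncation toward zero) for the final cast is an assumption about how the real script converts the result.

```python
import operator
from functools import reduce


def binary_op_before_fix(op, values):
    # Hypothetical model of the old behaviour: every operand is converted
    # to int up front, so 0.5 -> 0 and 1.5 -> 1 before the op runs.
    operands = [int(v) for v in values]
    return int(reduce(op, operands))


def binary_op_after_fix(op, values):
    # Hypothetical model of the fix: keep integer arithmetic when it is
    # lossless, otherwise compute in float and cast only the final result.
    if all(float(v).is_integer() for v in values):
        operands = [int(v) for v in values]
    else:
        operands = [float(v) for v in values]
    # Assumption: int() truncation; the real script's cast may differ.
    return int(reduce(op, operands))


if __name__ == "__main__":
    cases = [
        ("mul", operator.mul, [1024, 0.5]),
        ("div", operator.truediv, [768, 1.5]),
    ]
    for name, op, values in cases:
        before = binary_op_before_fix(op, values)
        after = binary_op_after_fix(op, values)
        print(name, values, "before:", before, "after:", after)
```

Deferring the integer cast until after the arithmetic is what lets `mul(1024, 0.5)` and `div(768, 1.5)` both yield 512 instead of 0 and 768.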

8459 commits