onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-14 20:48:00 +00:00

Author	SHA1	Message	Date
Baiju Meswani	e870089ca8	Refining the offline tooling for training artifact generation (#15212 )	2023-03-30 18:05:51 -07:00
Pranav Sharma	818b94b4ea	Add owners for public facing API files (#15288 ) ### Description Add owners for public facing API files ### Motivation and Context Tighter control on the APIs	2023-03-30 17:16:15 -07:00
Chen Fu	605c2f4b89	Remove fp16 support from apple (#15270 ) ### Description Removing fp16 support from apple build ### Motivation and Context FP16 support on ARM64 is only available starting with armv8.2-a, so the clang compiler needs the compilation flag `-march=armv8.2-a+fp16`. Unfortunately, our current universal build does not support hardware-specific compilation flags on cpp source files, as they would cause trouble when compiling against more than one hardware target. Until we figure out how to remove this limitation, we have to disable fp16 support for Apple systems.	2023-03-30 16:44:26 -07:00
Guenther Schmuelling	4645726d74	fix for webgl lrn (#15236 ) fix issue that resulted in wrong results for lrn on webgpu	2023-03-30 16:16:57 -07:00
Edward Chen	9f942e1a3e	Graph transformer to ensure unique DQ nodes for QDQ node units (#15145 ) ### Description Add required graph transformer to duplicate DQ nodes to ensure that QDQ node units have unique DQ nodes. This condition is necessary for QDQ node unit processing. ### Motivation and Context There is an existing Python utility that does this: `c7ced7a5e9/tools/python/util/qdq_helpers/qdq_model_utils.py (L77)` This PR implements it as a graph transformer so it is integrated into ORT and does not require a separate step to update the model. There are also tests to ensure that its effects are not undone by basic level graph optimizations.	2023-03-31 08:39:43 +10:00
Xavier Dupré	786f8b98f7	Add a page in the documentation for every operator in onnxruntime (#14340 )	2023-03-30 14:39:16 -07:00
yf711	dc61d3b5b6	Fix symbolic shape inference script on precision loss issue (#15215 ) ### Description When calculating a symbolic shape like `mul(get_int_val(values=[1024, 0.5]))`, the current script calls `get_int_val()` on the values, which turns them into `[1024, 0]`. Thus `mul(values)` becomes `mul([1024, 0])` = 0, but the expected shape size is 512. Fix: for math binary operations like `mul()` and `div()`, don't convert input shapes into integers if precision loss could occur; keep the inputs as floats, finish the operation, then cast the final result to an integer and output the shape. Test cases are added: 1. mul(1024, 0.5)=>512 (before this fix, the output would be 0, as float 0.5 would be converted to int 0) 2. div(768, 1.5)=>512 (before this fix, the output would be 768, as float 1.5 would be converted to int 1)	2023-03-30 12:15:27 -07:00
cao lei	c2dad6893b	use cudaStreamNonBlocking flag (#15258 ) ### Description This PR uses the cudaStreamNonBlocking flag when creating a CUDA stream, meaning the created stream runs concurrently with the default stream, with no implicit synchronization between them. ### Motivation and Context This PR is required to address a perf concern.	2023-03-30 11:43:50 -07:00
Changming Sun	75f6861cb8	Skip DNNL's opset18 tests (#15275 ) ### Description DNNL EP doesn't support opset18 yet. So, let it skip such tests so that we could still test the other EPs. The models mentioned above are ONNX node tests that live in github.com/onnx/onnx	2023-03-30 09:58:11 -07:00
Scott McKay	6d464748ba	Make internal nhwc schema registrations complete (#15278 ) ### Description Add all the ONNX layout sensitive ops from opset 11 on. Make list in transpose optimizer consistent. ### Motivation and Context When we run L1 optimizers after layout transform in a full build it needs a schema for any layout sensitive ops that get converted to the internal domain. Previously we did not run L1, so we got away without having schemas unless the EP used a static kernel for the nhwc version of the op.	2023-03-30 08:55:14 -07:00
Yi Zhang	c5f5e3ec5e	Improve 2 cache tasks in one pipeline yaml (#15267 ) ### Description 1. Make 2 cache tasks in one pipeline actually work. 2. Each build step has its own CCACHE_DIR environment variable instead of job variables. 3. The external Protobuf compilation cache only updates when deps.txt changes. It doesn't generate a new cache on every commit. ### Motivation and Context The simplified workflow is as follows: ``` --------build with ccache------- \| cache \| {CCACHE_DIR}-----cache stat. ``` ``` -------Cache@2------ \| download cache \| {path}--------upload cache ``` 1. {XXX} means an environment variable or task input. 2. {CCACHE_DIR} must be consistent with {path}. Ccache produces caches in {CCACHE_DIR}, and Cache@2 downloads the cache into {path}, then tars {path} and uploads it. 3. The Protobuf cache changes only with deps.txt, which reduces the storage size. 4. As a next step, we may split the compilation into 2 steps, one for external dependencies and another for ORT.	2023-03-30 23:22:11 +08:00
Yi Zhang	aab3c15585	Add Compilation Cache in CoreML pipeline (#15259 ) ### Description 1. Move the cache task definition into a template. 2. In debug mode, the compiler mtime differs across machines, so change CCACHE_COMPILERCHECK to content. ### Motivation and Context 1. Accelerate the CoreML pipeline. Test run: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=938040&view=logs&j=1ac7588f-a5bd-5ff7-4a8a-a34869d50220 With the cache, the run finishes in 12 minutes. Without the cache, it takes about 1 hour. 2. Make the cache function easy to use and maintain. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-30 23:18:52 +08:00
Yulong Wang	2928fda490	[web] disable browser test temporarily (#15280 ) ### Description This PR disables browser test temporarily. The test randomly fails and we are investigating the issue. Disable the test to unblock others.	2023-03-30 08:15:36 -07:00
Changming Sun	15f7dca9fb	Update protobuf to 3.21.x (#15245 ) ### Description Fixed [AB#10092](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/10092), [AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753), [AB#11759](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11759) ### Motivation and Context The one we use has a security issue in Java, though we don't use that version's protobuf java package.	2023-03-29 14:08:18 -07:00
Changming Sun	5d1dbfb432	Update ONNX test data (#15256 ) Change the test data version from 1.13.0 to 1.13.1, which will include some bug fixes.	2023-03-29 13:13:11 -07:00
Changming Sun	4a0b86eba6	Update the post-merge pipeline (#14965 ) ### Description 1. Remove Linux jobs for ORT-Extension combined build 2. Add a macOS build job for ORT-Extension combined build 3. Adjust the yaml file so that it can support two different ADO instances. ### Motivation and Context To test our code better. And it will enable us to run such tests for every commit in the main branch. It would be easier for us to figure out which change caused a build break. See [AB#13435](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13435)	2023-03-29 13:12:07 -07:00
Changming Sun	fb1f03fdff	Increase the timeout value of win-wasm-ci.yml (#15257 )	2023-03-29 13:11:51 -07:00
FFFrog	ecb89ed752	[CANN] Multi-stream execution support for CANN EP. (#14058 ) ### Description Multi-stream execution support for CANN EP. ### Motivation and Context CANN EP is currently unavailable due to the introduction of a new mechanism for multi-stream execution [#13495](https://github.com/microsoft/onnxruntime/pull/13495), the deletion of the Fence-based synchronization mechanism, and the failure to update the relevant logic of CANN EP synchronously. This PR is to fix it.	2023-03-29 11:57:22 -07:00
Adrian Lizarraga	febc69e1b2	[QNN EP] Support Cast in HTP backend (#15234 ) ### Description Adds support for the Cast operator to the QNN HTP backend. ### Motivation and Context Enable more models to run on QNN HTP backend.	2023-03-29 11:01:34 -07:00
PeixuanZuo	a6279d4cfb	[ROCm] update Stable Diffusion benchmark to support ROCm EP (#15094 ) Update Stable Diffusion benchmark to support ROCm EP	2023-03-29 15:19:52 +08:00
Jian Chen	85948d6bc6	Cjian/windows update python3.11 (#15243 ) ### Description windows update python3.11 --------- Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>	2023-03-28 22:15:47 -07:00
Ryan Hill	659118f939	Prefast warning fixes (#15175 ) ### Description transpose.cc: Arithmetic overflow: Using operator '-' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '-' to avoid overflow (io.2). cuda_provider_factory.cc: The type 'struct onnxruntime::ProviderInfo_CUDA_Impl' with a virtual function needs either public virtual or protected non-virtual destructor (c.35).	2023-03-28 21:36:03 -07:00
Tianlei Wu	f752bb9973	Update stable diffusion benchmark results: A100 and PyTorch 2.0 (#15195 ) Update stable diffusion benchmark results with A100 results and PyTorch 2.0 number.	2023-03-28 19:47:22 -07:00
Justin Chu	710d095124	Refactor the constant `_ONE` in `orttraining_test_ortmodule_api.py` (#15128 ) Follow up of https://github.com/microsoft/onnxruntime/pull/15097#discussion_r1142399537	2023-03-28 08:59:51 -07:00
Chen Fu	41ddcd30a1	Fp16 NHWC Max and Average Pooling (#15181 ) ### Description Max and average pooling operators for fp16, NHWC ### Motivation and Context Continue on the steps for fp16 inference support	2023-03-28 08:22:55 -07:00
PeixuanZuo	021e46179a	[ROCm] refactor GroupNorm to set vectorize number as template parameter (#15198 ) refactor GroupNorm to set vectorize number as template parameter.	2023-03-28 16:09:56 +08:00
Justin Chu	938e2136c6	Enable pylint and numpy rules (#15218 ) ### Description Enable pylint and numpy rules ### Motivation and Context Modernize numpy usage and enable more quality checks	2023-03-27 20:37:53 -07:00
PeixuanZuo	62b2947ac1	[ROCm] remove python3.7 from python packaging pipeline (#15230 ) remove python3.7 from python packaging pipeline. https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=289720&view=results	2023-03-28 10:37:04 +08:00
Changming Sun	462c6043b5	Remove Win8 support (#15219 ) ### Description Remove Win8 support since it is EOL. See https://learn.microsoft.com/en-us/lifecycle/announcements/windows-8-1-end-support-january-2023 ### Motivation and Context Simplify code.	2023-03-27 18:51:49 -07:00
Scott McKay	eb8f6c7c52	Transpose optimizer enhancements (#15117 ) ### Description - Add debug infrastructure to dump out model at various stages of transpose optimization. - Handle more scenarios where Transpose -> Reshape can be merged. - Run L1 optimizers after layout transform to constant fold initializers that had their layout changed. - Use cost check for Concat post layout transform as pushing a Transpose through it can potentially add Transpose nodes to multiple other inputs. - Update internal testing EP to support test where you want it to take all nodes, use NHWC layout, and to use dummy static kernels instead of compiling so the ops in the graph post-initialization can be counted. - Misc cleanup in InferenceSession to not unnecessarily pass args to TransposeGraph for class members. ### Motivation and Context Address perf issue seen with model where a Transpose gets blocked by a Reshape that could have been treated as a Transpose. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-28 08:28:17 +10:00
Jian Chen	792d411135	Update python 3.11 and remove 3.7 for Linux (#15214 ) ### Description Update python 3.11 and remove 3.7 ### Motivation and Context Update python 3.11 and remove 3.7 --------- Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>	2023-03-27 14:46:30 -07:00
Edward Chen	ea40dc3ad6	Update build.py to disallow running as root user by default. (#15164 ) Try to address intermittent permissions issues that show up in non-transient CI environments.	2023-03-27 14:46:04 -07:00
Nat Kershaw (MSFT)	3064fa7611	Fix C API docs error (#15216 )	2023-03-27 14:34:18 -07:00
Jian Chen	527e006124	Update mlas (#15228 )	2023-03-27 14:18:48 -07:00
Bengt Gustafsson	063ee8d504	Fixed some warnings that were treated as errors when compiling with D… (#15157 ) …ML in Win32/MSVC. ### Description Use onnxruntime::narrow to silence some warnings that are turned into errors when compiling the DML provider in Win32. Also one case of warning turned to error for mixing int loop variable type with a vector size() as upper bound. ### Motivation and Context Solves [https://github.com/microsoft/onnxruntime/issues/14595](https://github.com/microsoft/onnxruntime/issues/14595) Co-authored-by: bengt.gustafsson <bengt.gustafsson@contextvision.se>	2023-03-27 14:17:28 -07:00
Changming Sun	63cc1bb26a	Move Linux CPU pipelines to an AMD CPU pool which is cheaper (#15144 ) ### Description 1. Move Linux CPU pipelines to an AMD CPU pool which is cheaper 2. Enable CCache for orttraining pipeline ### Motivation and Context Azure AMD CPU machines are generally much cheaper than Intel CPU machines. However, they don't have local disks.	2023-03-27 14:10:08 -07:00
Patrice Vignola	67a6022c03	[DML EP] Add GroupNorm (#15189 ) Comparison between the different normalization operations: ![](https://user-images.githubusercontent.com/1041752/106491728-73d40680-64b7-11eb-8769-3f758996e959.png)	2023-03-27 12:52:53 -07:00
Tianlei Wu	2e56620611	Add file and line info in CudaCall and RocmCall macros (#15148 ) This PR adds file and line information so that it is easy to troubleshoot CUDA errors. Update Rocm call as well for hipify.	2023-03-27 11:04:19 -07:00
Changming Sun	ffcfb1ec98	Remove protobuf submodule (#15190 ) ### Description Remove protobuf submodule as a follow-up of #13523. "Android CI Pipeline" and "Zip-Nuget-Java-Nodejs Packaging Pipeline" need to be tested. ### Motivation and Context It is related to [AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753) Fixed [AB#14027](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14027)	2023-03-27 10:35:49 -07:00
Justin Chu	e754edaecf	Run rustfmt in CI (#15217 ) I considered running clippy as well but ort takes too long to build	2023-03-27 08:12:59 -07:00
Patrice Vignola	b10aaf4e9c	Fix error message when running NhwcConv with a bad weight channel count (#15221 )	2023-03-27 00:15:19 -07:00
Yi Zhang	d182d34f1d	pause caching docker image in pipeline cache in Linux Aten Pipeline (#15227 ) ### Description Pause caching the docker images in pipeline cache in Linux Aten Pipeline. ### Motivation and Context We need to work out a better way to reduce the storage.	2023-03-27 11:06:53 +08:00
Adrian Lizarraga	d24b630fc3	[QNN EP] Support reduce ops with axes as initializer input (#15126 ) ### Description - Adds support for newer opset of Reduction operators (ReduceSum, ReduceMax, ReduceMin, ReduceMean, ReduceProd) with axes as an initializer input. - Adds tests for HTP and CPU backends. ### Motivation and Context Newer opset versions changed the `axes` attribute into an optional input. This PR adds support for these newer reduction operators as long as the axes input is defined as an initializer. The goal is to enable more models on QNN.	2023-03-26 16:39:22 -07:00
cloudhan	d3565779c3	Allow bert_perf_test.py to load/save tuning results (#15096 )	2023-03-26 18:03:08 +08:00
Chris Austen	93e6902790	resolve undefined symbol: rocblas_create_handle (#15204 ) Update migraphx section of onnxruntime_providers.cmake to add the rocblas library	2023-03-26 18:01:58 +08:00
Jian Chen	750747d8c9	Cjian/multi stage packaging pipeline (#14993 )	2023-03-24 23:39:15 -07:00
Hector Li	5a2e43bdd5	[QNN EP] Improve Slice to support opset 9 (#15186 ) ### Description Improve Slice to support Onnx opset9 which has starts, ends & axes in node attributes. ### Motivation and Context To unblock some models.	2023-03-24 16:07:06 -07:00
Justin Chu	d834ec895a	Adopt lintrunner as the linting tool - take 2 (#15085 ) ### Description `lintrunner` is a linter runner successfully used by pytorch, onnx and onnx-script. It provides a uniform experience running linters locally and in CI. It supports all major dev systems: Windows, Linux and macOS. The checks are enforced by the `Python format` workflow. This PR adopts `lintrunner` in onnxruntime and fixes ~2000 flake8 errors in Python code. `lintrunner` now runs all required python lints including `ruff` (replacing `flake8`), `black` and `isort`. Future lints like `clang-format` can be added. Most errors are auto-fixed by `ruff` and the fixes should be considered robust. Lints that are more complicated to fix are marked with `# noqa` for now and should be fixed in follow-up PRs. ### Notable changes 1. This PR removed some suboptimal patterns: - `not xxx in` -> `xxx not in` membership checks - bare excepts (`except:` -> `except Exception`) - unused imports The follow-up PR will remove: - `import *` - mutable values as default in function definitions (`def func(a=[])`) - more unused imports - unused local variables 2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than flake8 and is more robust. We are using it successfully in onnx and onnx-script. It also supports auto-fixing many flake8 errors. 3. Removed the legacy flake8 ci flow and updated docs. 4. The added workflow supports SARIF code scanning reports on github, example snapshot: ![image](https://user-images.githubusercontent.com/11205048/212598953-d60ce8a9-f242-4fa8-8674-8696b704604a.png) 5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant ### Motivation and Context Unified linting experience in CI and local. Replacing https://github.com/microsoft/onnxruntime/pull/14306 --------- Signed-off-by: Justin Chu <justinchu@microsoft.com>	2023-03-24 15:29:03 -07:00
Dmitri Smirnov	2de15c5d50	Re-work OrtApi struct to satisfy C++20 compilers (#15183 ) ### Description Remove `deletion` of copy functions from `OrtApi` as its initialization no longer compiles in C++20. Introduce a non-copyable member to implicitly delete copy ctor. ### Motivation and Context Inspired by https://github.com/microsoft/onnxruntime/pull/14901 Solution credits: @RyanUnderhill Cc: @georgthegreat	2023-03-24 13:52:17 -07:00
Justin Stoecker	dc87691000	Enable DML graph fusion independently of graph optimization level (#15172 ) ### Description Apply the DML graph fusion transformer optimization independently of ORT graph optimization level. ### Motivation and Context The DML graph fusion transformer is not a graph optimizer in the normal sense: it isn't optimizing the ONNX graph structure, but rather fusing nodes into what will later become a single IDMLCompiledOperator (using IDMLDevice1::CompileGraph). This transformer can't be done ahead of time (hence why it's disabled if saving an optimized model), but it's also gated by the ORT graph optimization level; this makes it impossible to preoptimize ONNX models ("offline mode") and then later disable graph optimizations for better startup performance ("online mode") while benefiting from DML graph fusion.	2023-03-24 13:50:17 -07:00
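
Note on dc61d3b5b6 (#15215): the precision-loss fix described in that commit message can be illustrated with a minimal Python sketch. This is not the code from the symbolic shape inference script; the function names `mul_ints_first`, `mul_then_cast`, `div_ints_first` and `div_then_cast` are hypothetical and only mirror the two test cases quoted in the message.

```python
import math


def mul_ints_first(values):
    # Old order (hypothetical reconstruction): truncate every operand to int
    # first, then multiply. A factor of 0.5 becomes 0 and wipes out the product.
    return math.prod(int(v) for v in values)


def mul_then_cast(values):
    # Fixed order: multiply as floats, cast only the final result to int.
    return int(math.prod(values))


def div_ints_first(a, b):
    # Old order: a divisor of 1.5 is truncated to 1 before dividing.
    return int(a) // int(b)


def div_then_cast(a, b):
    # Fixed order: divide as floats, cast only the final result to int.
    return int(a / b)


print(mul_ints_first([1024, 0.5]), mul_then_cast([1024, 0.5]))  # 0 512
print(div_ints_first(768, 1.5), div_then_cast(768, 1.5))        # 768 512
```

Casting only at the end keeps fractional scale factors intact while still producing an integer dimension.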

8446 commits
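
Note on d834ec895a (#15085): the Python patterns that commit message lists as removed or slated for removal can be sketched as follows. The function names here are hypothetical and chosen only for illustration; none of this is taken from the onnxruntime code base.

```python
def is_missing(item, items):
    # Preferred membership check: `item not in items`
    # rather than `not item in items`.
    return item not in items


def parse_int(text):
    # `except Exception` instead of a bare `except:`, so that
    # KeyboardInterrupt and SystemExit still propagate.
    try:
        return int(text)
    except Exception:
        return None


def append_shared(item, bucket=[]):
    # Flagged pattern: the default list is created once at definition time
    # and shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Usual replacement: use None as the default and build a new list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `append_shared` twice without a second argument returns a list that has accumulated both items, whereas `append_fresh` starts from an empty list on each call.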