onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-07 00:13:17 +00:00

Author	SHA1	Message	Date
wejoncy	8aeed1e87d	amend	2023-03-15 13:23:56 +08:00
wejoncy	5fe61c53a3	comments	2023-03-15 13:23:56 +08:00
JiCheng	d236085845	Update onnxruntime/core/providers/nnapi/nnapi_builtin/model.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	d490908836	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	a8ed956fa7	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	de5e58c077	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	d70b5b38e2	amend	2023-03-15 13:23:56 +08:00
wejoncy	fdc970a40d	rename constant var	2023-03-15 13:23:56 +08:00
wejoncy	18015f0f55	use span	2023-03-15 13:23:56 +08:00
JiCheng	4ca84ac303	Update onnxruntime/core/providers/nnapi/nnapi_builtin/model.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	28a4cfeeef	refactor	2023-03-15 13:23:56 +08:00
wejoncy	adfd38edb9	Fail early when getting device	2023-03-15 13:23:56 +08:00
wejoncy	82ae138143	address comments	2023-03-15 13:23:56 +08:00
JiCheng	36aac0036b	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	760a2b99d0	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	7c6fc31b65	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	adf990f30f	more docs	2023-03-15 13:23:56 +08:00
wejoncy	028c2372fa	remove disable_cpu_soft temporarily	2023-03-15 13:23:56 +08:00
wejoncy	30151323da	move init_device to ctor	2023-03-15 13:23:56 +08:00
JiCheng	d135bb7c0c	Update nnapi_execution_provider.cc	2023-03-15 13:23:56 +08:00
JiCheng	781e72f663	Update nnapi_api_helper.cc	2023-03-15 13:23:56 +08:00
JiCheng	8383a54f9d	Update include/onnxruntime/core/providers/nnapi/nnapi_provider_factory.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	3873a55bd3	[NNAPI] fix feature_level query	2023-03-15 13:23:56 +08:00
Jian Chen	6891ab5bac	fix_macos (#15018 ) ### Description This fixes the macOS packaging build on the universal2 arch.	2023-03-14 21:54:44 -07:00
Tianlei Wu	bdfdebfca7	Fix ReduceSum in attention fusion (#15047 ) Fix https://github.com/microsoft/onnxruntime/issues/14959. ReduceSum-13 move axes from attribute to node input.	2023-03-14 20:34:17 -07:00
PeixuanZuo	c70838cbbb	[ROCm] add Conv, NhwcConv benchmark to microbench (#15017 ) Add Conv, NhwcConv benchmark to microbench. Related PR: https://github.com/microsoft/onnxruntime/pull/14982, https://github.com/microsoft/onnxruntime/pull/14980	2023-03-15 11:07:17 +08:00
Yi Zhang	f096f6167b	Remove python37 and cuda37 packages in orttraing (#15041 ) ### Description supplement of #14874 and #14887 ### Motivation and Context N.B. I'm not sure if python matrix of rocm is expected (python3.7-3.9) @faxu @snnn (https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/azure-pipelines/orttraining-py-packaging-pipeline-rocm.yml)	2023-03-15 08:54:15 +08:00
Yi-Hong Lyu	a8680ff188	Add float16 Sigmoid support (#14910 )	2023-03-14 17:12:49 -07:00
Tianlei Wu	0bb7390d11	fix prefast warnings (#15033 ) Fix prefast warnings	2023-03-14 13:51:51 -07:00
Rachel Guo	db4e664f7c	Re-enable react native e2e android unit test for CI and upgrade targetSDK level for test project (#14989 ) ### Description Re-enable the react native e2e android unit test for react native CI, as the recent change of specifying `default` instead of `google-apis` in android emulator CI tests gives fairly stable results for now. Upgrade the targetSDKversion for the gradle test project in react-native/android to meet the minimum target api level requirement for Google Play apps. https://support.google.com/googleplay/android-developer/answer/11926878?hl=en ### Motivation and Context React Native CI issue.	2023-03-14 13:35:38 -07:00
Alex Kogan	8b09702b88	Enable parallel computation in Clip ops (#14925 ) ### Description This PR speeds up Clip operations by replacing their sequential implementation with a parallelized one. The parallelization is achieved by dividing the input data into chunks of size N and using a thread pool to process the chunks in parallel. The chunk size N is set to 16K based on performance evaluation on input tensors of 10^i elements for i in [1 .. 6]. ### Motivation and Context The Clip operation is frequently executed in image processing models. Its implementation can be easily parallelized and therefore sped up when executed on a multi-core machine. On long inputs (>= 100K elements) this PR achieves a speedup of over 2x. On shorter inputs, this PR does not introduce any substantial performance change.	2023-03-14 09:41:44 -07:00
PeixuanZuo	2ff7f3e93a	[ROCm] support optimized Stable Diffusion model (#14980 ) Add BiasSplitGelu/BiasAdd/GroupNorm/NhwcConv operator for ROCm EP. 1. BiasSplitGelu and BiasAdd operators can be automatically hipified from CUDA EP. 2. GroupNorm was hipified from CUDA EP and modified to build. 3. NhwcConv is similar to NhwcConv in CUDA EP, But the MIOpen API and cuDnn API are different. `miopenConvolutionForwardbias` and `miopenOpTensor` of MIOpen doesn't support NHWC layout now, use BinaryElementwise to replace miopenConvolutionForwardbias(NHWC layout).	2023-03-14 23:15:37 +08:00
PeixuanZuo	ff2850029b	[ROCm] refact SkipLayernorm long if-elseif statements (#14795 ) Refact SkipLayernorm long if-elseif statements.	2023-03-14 23:04:55 +08:00
Ye Wang	0fa00429d5	[T5 optimization] script fusions and fixes (#14967 ) ### Description 1. added script for t5 encoder self attention and t5 decoder self/cross attention fusions. 2. added simplified layernorm fusion for --external_data_format scenario. (otherwise relying on ORT optimizer) 3. added rel_pos_bias shape inference code, modified attention/mha shape inference script. 4. reworked graph_topologic_sort() because the current implementation is not functioning correctly. also added an option to topo-sort the graph in a deterministic way to let tests pass. note: 1. the t5-beamsearch export code is slightly modified. specifically, encoder_hidden_states(ehs) is no longer an input to the t5 decoder since the ehs is not actually used in the graph execution. 2. recent PRs do not add optimizations to t5 on cpu. 3. the fp32 model(encoder and decoder) for t5-small, t5-base and t5-large can get a parity of e-5 and the corresponding beam search models generate same results as pytorch. 4. fp16(mixed-precision) models, however, get a parity around 3e-2 and some have a maximum diff a bit over 3e-2. But the beam search models still generate same results as pytorch (based on limited input data) 5. mt-5 model has a parity issue at the moment, even before any optimization. will investigate later. --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-03-13 23:35:56 -07:00
Christian Veenhuis	59dfcfdce7	Fix typos in sources: operater, tranform, neccessary, trainig (#14907 ) ### Description While browsing the sources I found several typos here and there. I collected them to a single PR and fixed them. Namely these typos are: operater, tranform, neccessary, trainig. After fixing none of them was found anymore: $ git grep "operater" $ git grep "tranform" $ git grep "neccessary" $ git grep "trainig" $ ### Motivation and Context Since some of the typos are in example notebooks and markdown files, users can see them.	2023-03-13 22:45:04 -07:00
Ye Wang	538d64891a	[t5 optimization] kernel changes to t5 (#14928 ) ### Description 1. support optional bias in Attention op (used in T5 encoder) 2. support broadcasting rel_pos_bias in attention_softmax.h 3. add scale in MHA op's attributes 4. support past_key/past_value and present_key/present_value in MHA 5. UT and parity tests are added 6. fix an issue: https://github.com/microsoft/onnxruntime/issues/14920 note: the fusions will be in another PR since mt5 needs to be tested and an issue from github will be investigated. Future works: 1. support shared buffer for past/present 2. enable trt kernels when possible and investigate (trt/cutlass) kernels with rel_pos_bias 3. support KV/QKV packing with past/present --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-03-13 14:29:16 -07:00
Dmitri Smirnov	b34e570ad0	Enable LeakyRelu latest and refactor fast_gelu_fusion to enable the script (#15003 ) ### Description Enable LeakyRelu latest since the last version differs only in type support. Refactor `fast_gelu_fusion` to enable the script, because our script is unable to check if any of the optimizers are outdated and no longer in effect. ### Motivation and Context We do not want to lose performance. The next step is to file improvement issues if any are required.	2023-03-13 14:20:11 -07:00
Nat Kershaw (MSFT)	a5d814008c	Fix API docs deploy so that a PR is not required (#15011 ) Fixes this [issue](https://github.com/microsoft/onnxruntime/actions/runs/4387534694/jobs/7682945415#step:12:534) and removes the extra PR step in the workflow. Also logs the commit of the main branch that the docs were generated from to a file called version.txt at the root of the API docs tree. Tested for Java API docs and results staged here: https://natke.github.io/onnxruntime/docs/api/java/index.html If approved, I can migrate all of the other API docs generation workflows to use this scheme.	2023-03-13 09:36:08 -07:00
pengwa	44dda08b51	Renaming files (#15015 ) ### Renaming files for compute optimizer ### Motivation and Context A follow up for https://github.com/microsoft/onnxruntime/pull/14832	2023-03-13 17:07:59 +08:00
PeixuanZuo	c55f347689	[ROCm] change miopen_conv_use_max_workspace=true (#14982 ) Change miopen_conv_use_max_workspace=true to get best algorithm during `miopenFindConvolutionForwardAlgorithm` process.	2023-03-13 16:19:23 +08:00
pengwa	448e989df8	Op slicing upstream refactor (#14832 ) ### Slice op upstream refactor A refactor work for https://github.com/microsoft/onnxruntime/pull/13672. ### Motivation and Context There is a similar optimization opportunity for other operator upstreaming, to reduce compute flops. So refactor the existing code base for making it easier to support other ops. The changes in this PR are mainly about renaming and moving. - Move common logic (from compute_optimizer.h/cc) into upstream_transformer_base.h/cc and shared_utils.h/cc. - For upstream common logic, they are moved into upstream_transformer_base.h/cc - For shared utilities, they are moved to shared_utils.h/cc. - After the move, compute_optimizer.h/cc mainly for upstreaming gather implementation (inheriting upstream_transformer_base.h/cc). Ideally it should be renamed, but for easier review this time, I keep its name.	2023-03-13 08:19:32 +08:00
Yi-Hong Lyu	cce9e0eaad	Add float32 hardsigmoid tests (#14948 )	2023-03-12 10:56:29 -07:00
G. Ramalingam	930e009567	[WIP] Update call to GetFunction (#14949 ) ### Description OpSchema::GetFunction() changed in ONNX to support opset-version-dependent function-body. Update the call to GetFunction appropriately. ### Motivation and Context Motivated by https://github.com/microsoft/onnxruntime/issues/14810 --------- Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>	2023-03-11 07:04:17 -08:00
Yi Zhang	ca315b9148	Use ADO cache to cache docker image instead of ACR (#14496 ) ### Description Now, we only enable image cache in pipeline cache for Linux Aten Pipeline. It'll be enabled in other Linux pipelines gradually. ### Motivation and Context Fixed [AB#13143](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13143) ### Verification 1. No Image Cache in Pipeline https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=904531&view=results 2. Use Cached Image in Pipeline https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=904533&view=results	2023-03-11 10:32:02 +08:00
Vincent Wang	7950189920	[CUDA] Optimize Perf for AtomicAdd of Half Type (#14992 )	2023-03-11 08:52:01 +08:00
Changming Sun	a8ad0edbeb	BUG FIX: the if...else in telemetry-steps.yml does not really work (#14972 ) ### Description BUG FIX: the if...else in telemetry-steps.yml does not really work. It always says "Telemetry is disabled." even though the pipeline doesn't have the pipeline variable. ### Motivation and Context For example, recently I set up a new pipeline in https://dev.azure.com/onnxruntime/onnxruntime/_build without setting the ADO variable, but the powershell code still thinks that we have enabled telemetry. See: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=910107&view=results It didn't work because when the pipeline variable ("TELEMETRYGUID") doesn't exist, the occurrence of "$(TELEMETRYGUID)" is not replaced with anything. It remains as it is.	2023-03-10 15:39:07 -08:00
Adrian Lizarraga	d8ddd25272	Add InstanceNormalization operator to QNN EP (#14867 ) ### Description QNN EP: - Adds the [InstanceNormalization](https://onnx.ai/onnx/operators/onnx__InstanceNormalization.html) operator to QNN EP. - Fixes graph composition bug when Transpose node is the last node in a graph. - Adds check for input shape when GetCapability is called (before and after layout transformation) - Should add similar checks for other layout sensitive ops (conv, pool, ...) in a separate PR - Adds initial QNN op tests for QDQ conv and QDQ InstanceNormalization - Should add tests for other ops in a separate PR Optimizer: - Makes InstanceNormalization a layout sensitive operator. - Adds a custom QDQ group selector for InstanceNormalization. Quantization tool: - Adds QDQ support for InstanceNormalization operator. - Adds python unit test for InstanceNormalization quantization. ### Motivation and Context Needed to support stable diffusion models with QNN. --------- Co-authored-by: Hector Li <hecli@microsoft.com>	2023-03-10 14:42:41 -08:00
Ryan Hill	a5c436e148	Fix prefast warnings (#14975 ) ### Description In transpose.cc: Arithmetic overflow: Using operator '-' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '-' to avoid overflow (io.2). In cuda_provider_factory.h: The type 'struct onnxruntime::CUDA_Provider' with a virtual function needs either public virtual or protected non-virtual destructor (c.35).	2023-03-10 14:31:55 -08:00
Dmitri Smirnov	0d7855ea5a	Re-work global objects dependancies in pybind layer. (#14941 ) ### Description Re-work handling of static objects in pybind. Make sure we ref-count Environment from Sessions. The following has been done: - Make global objects function static. This ensures that the objects are constructed on demand. The first object constructed is destructed last. This is platform independent. - Make global objects ownership shared as suggested by pybind since they are not surfaced at Python level, and they cannot be referred to by dependent python objects. Verified that all python objects are GCed before globals are destroyed. This takes care of inference session dependency on environment and its default logger and this is also platform independent. - Utilize pybind atexit mechanism to clear execution providers and unload CUDA libraries (as suggested by https://github.com/microsoft/onnxruntime/pull/14903) . Since this is registered for module exit, it takes place before any other global are destroyed and clears shared objects state or even unloads the libraries. This should also work in a platform independent way. ### Motivation and Context - Global object destruction order is managed manually and that becomes source of trouble. We want to make it deterministic and platform independent. - Frequent hangs in Python layer due to the static object's destruction order. Some of the Python session objects are being garbage collected after main exits and they require ORT environment to be alive. (Use after free)	2023-03-10 13:55:31 -08:00
Adrian Lizarraga	e2febe87f6	[QNN EP] Update QNN SDK to 2.8 (#14978 ) ### Description - Add QNN 2.8 SDK - Make QNN SDK version a pipeline template parameter for QNN pipelines. ### Motivation and Context Updates to latest QNN SDK version, and allows testing different QNN SDK versions without modifying yaml files.	2023-03-10 13:21:19 -08:00


8330 commits