onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-10 17:37:14 +00:00

Author	SHA1	Message	Date
Hariharan Seshadri	ed7ab1660d	[CUDA] Add option to use DecoderMaskedMultiheadAttention in BeamSearch (#14990 )	2023-03-15 17:16:32 -07:00
Yufeng Li	da084b0fc1	check axis range for LayerNorm (#14845 ) ### Description Add a check on axis to make sure it is in a valid range.	2023-03-15 14:44:59 -07:00
Changming Sun	5213546e62	Change how to find npm (#15001 )	2023-03-15 11:10:10 -07:00
wejoncy	32533dd1c2	fix	2023-03-15 13:23:56 +08:00
JiCheng	cc15ceef4e	Update onnxruntime/core/providers/nnapi/nnapi_builtin/builders/model_builder.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	6bdb03281a	clean comments	2023-03-15 13:23:56 +08:00
wejoncy	762ea2402e	fix	2023-03-15 13:23:56 +08:00
JiCheng	8db28d9139	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	8d00961321	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	c10462b5f5	ORT_UNUSED_PARAMETER	2023-03-15 13:23:56 +08:00
wejoncy	92fabf57ea	comments	2023-03-15 13:23:56 +08:00
JiCheng	cd3173d531	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	dad772ef09	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	8aeed1e87d	amend	2023-03-15 13:23:56 +08:00
wejoncy	5fe61c53a3	comments	2023-03-15 13:23:56 +08:00
JiCheng	d236085845	Update onnxruntime/core/providers/nnapi/nnapi_builtin/model.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	d490908836	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	a8ed956fa7	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	de5e58c077	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	d70b5b38e2	amend	2023-03-15 13:23:56 +08:00
wejoncy	fdc970a40d	rename constant var	2023-03-15 13:23:56 +08:00
wejoncy	18015f0f55	use span	2023-03-15 13:23:56 +08:00
JiCheng	4ca84ac303	Update onnxruntime/core/providers/nnapi/nnapi_builtin/model.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	28a4cfeeef	refactor	2023-03-15 13:23:56 +08:00
wejoncy	adfd38edb9	Fail early when getting device	2023-03-15 13:23:56 +08:00
wejoncy	82ae138143	address comments	2023-03-15 13:23:56 +08:00
JiCheng	36aac0036b	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	760a2b99d0	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	7c6fc31b65	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	adf990f30f	more docs	2023-03-15 13:23:56 +08:00
wejoncy	028c2372fa	remove disable_cpu_soft temporarily	2023-03-15 13:23:56 +08:00
wejoncy	30151323da	move init_device to ctor	2023-03-15 13:23:56 +08:00
JiCheng	d135bb7c0c	Update nnapi_execution_provider.cc	2023-03-15 13:23:56 +08:00
JiCheng	781e72f663	Update nnapi_api_helper.cc	2023-03-15 13:23:56 +08:00
JiCheng	8383a54f9d	Update include/onnxruntime/core/providers/nnapi/nnapi_provider_factory.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	3873a55bd3	[NNAPI] fix feature_level query	2023-03-15 13:23:56 +08:00
Jian Chen	6891ab5bac	fix_macos (#15018 ) ### Description This fixes the macOS packaging build for the universal2 arch.	2023-03-14 21:54:44 -07:00
Tianlei Wu	bdfdebfca7	Fix ReduceSum in attention fusion (#15047 ) Fix https://github.com/microsoft/onnxruntime/issues/14959. ReduceSum-13 moves axes from a node attribute to a node input.	2023-03-14 20:34:17 -07:00
PeixuanZuo	c70838cbbb	[ROCm] add Conv, NhwcConv benchmark to microbench (#15017 ) Add Conv, NhwcConv benchmark to microbench. Related PR: https://github.com/microsoft/onnxruntime/pull/14982, https://github.com/microsoft/onnxruntime/pull/14980	2023-03-15 11:07:17 +08:00
Yi Zhang	f096f6167b	Remove python37 and cuda37 packages in orttraining (#15041 ) ### Description Supplement to #14874 and #14887. ### Motivation and Context N.B. I'm not sure whether the python matrix for rocm (python3.7-3.9) is expected @faxu @snnn (https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/azure-pipelines/orttraining-py-packaging-pipeline-rocm.yml)	2023-03-15 08:54:15 +08:00
Yi-Hong Lyu	a8680ff188	Add float16 Sigmoid support (#14910 )	2023-03-14 17:12:49 -07:00
Tianlei Wu	0bb7390d11	fix prefast warnings (#15033 ) Fix prefast warnings	2023-03-14 13:51:51 -07:00
Rachel Guo	db4e664f7c	Re-enable react native e2e android unit test for CI and upgrade targetSDK level for test project (#14989 ) ### Description Re-enable the React Native e2e Android unit test in the React Native CI, since the recent change to specify `default` instead of `google-apis` in the Android emulator CI tests gives fairly stable results for now. Upgrade the targetSDKversion of the gradle test project in react-native/android to meet the minimum target API level required for Google Play apps. https://support.google.com/googleplay/android-developer/answer/11926878?hl=en ### Motivation and Context React Native CI issue.	2023-03-14 13:35:38 -07:00
Alex Kogan	8b09702b88	Enable parallel computation in Clip ops (#14925 ) ### Description This PR speeds up Clip operations by replacing their sequential implementation with a parallelized one. The parallelization divides the input data into chunks of size N and uses a thread pool to process the chunks in parallel. The chunk size N is set to 16K based on a performance evaluation on input tensors of 10^i elements for i in [1 .. 6]. ### Motivation and Context The Clip operation is frequently executed in image processing models. Its implementation is easy to parallelize and therefore to speed up on a multi-core machine. On long inputs (>= 100K elements) this PR achieves a speedup of over 2x. On shorter inputs, it does not introduce any substantial performance change.	2023-03-14 09:41:44 -07:00
PeixuanZuo	2ff7f3e93a	[ROCm] support optimized Stable Diffusion model (#14980 ) Add BiasSplitGelu/BiasAdd/GroupNorm/NhwcConv operators for the ROCm EP. 1. The BiasSplitGelu and BiasAdd operators can be automatically hipified from the CUDA EP. 2. GroupNorm was hipified from the CUDA EP and modified to build. 3. NhwcConv is similar to NhwcConv in the CUDA EP, but the MIOpen and cuDNN APIs differ: `miopenConvolutionForwardbias` and `miopenOpTensor` in MIOpen don't support the NHWC layout yet, so BinaryElementwise is used in place of miopenConvolutionForwardbias for the NHWC layout.	2023-03-14 23:15:37 +08:00
PeixuanZuo	ff2850029b	[ROCm] refactor SkipLayernorm long if-elseif statements (#14795 ) Refactor the long SkipLayernorm if-elseif statements.	2023-03-14 23:04:55 +08:00
Ye Wang	0fa00429d5	[T5 optimization] script fusions and fixes (#14967 ) ### Description 1. Added a script for T5 encoder self-attention and T5 decoder self/cross-attention fusions. 2. Added simplified LayerNorm fusion for the --external_data_format scenario (otherwise it relies on the ORT optimizer). 3. Added rel_pos_bias shape inference code and modified the attention/MHA shape inference script. 4. Reworked graph_topologic_sort() because the current implementation does not function correctly; also added an option to topologically sort the graph deterministically so that tests pass. Notes: 1. The t5-beamsearch export code is slightly modified: encoder_hidden_states (ehs) is no longer an input to the T5 decoder, since ehs is not actually used in graph execution. 2. Recent PRs do not add T5 optimizations on CPU. 3. The fp32 models (encoder and decoder) for t5-small, t5-base and t5-large reach a parity of e-5, and the corresponding beam search models generate the same results as PyTorch. 4. The fp16 (mixed-precision) models, however, reach a parity of around 3e-2, and some have a maximum diff slightly over 3e-2. The beam search models still generate the same results as PyTorch (based on limited input data). 5. The mt5 model currently has a parity issue, even before any optimization; this will be investigated later. --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-03-13 23:35:56 -07:00
Christian Veenhuis	59dfcfdce7	Fix typos in sources: operater, tranform, neccessary, trainig (#14907 ) ### Description While browsing the sources I found several typos here and there. I collected them to a single PR and fixed them. Namely these typos are: operater, tranform, neccessary, trainig. After fixing none of them was found anymore: $ git grep "operater" $ git grep "tranform" $ git grep "neccessary" $ git grep "trainig" $ ### Motivation and Context Since some of the typos are in example notebooks and markdown files, users can see them.	2023-03-13 22:45:04 -07:00
Ye Wang	538d64891a	[t5 optimization] kernel changes to t5 (#14928 ) ### Description 1. Support optional bias in the Attention op (used in the T5 encoder). 2. Support broadcasting rel_pos_bias in attention_softmax.h. 3. Add scale to the MHA op's attributes. 4. Support past_key/past_value and present_key/present_value in MHA. 5. UT and parity tests are added. 6. Fix an issue: https://github.com/microsoft/onnxruntime/issues/14920 Note: the fusions will be in another PR, since mt5 needs to be tested and an issue from GitHub will be investigated. Future work: 1. Support a shared buffer for past/present. 2. Enable TRT kernels when possible and investigate TRT/CUTLASS kernels with rel_pos_bias. 3. Support KV/QKV packing with past/present. --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-03-13 14:29:16 -07:00
Dmitri Smirnov	b34e570ad0	Enable LeakyRelu latest and refactor fast_gelu_fusion to enable the script (#15003 ) ### Description Enable the latest LeakyRelu, since the last version differs only in type support. Refactor `fast_gelu_fusion` to enable the script, because our script is unable to check whether any of the optimizers are outdated and no longer in effect. ### Motivation and Context We do not want to lose performance. The next step is to file improvement issues if any are required.	2023-03-13 14:20:11 -07:00
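Commit 8b09702b88 (#14925) describes parallelizing Clip by splitting the input into fixed-size chunks (16K elements) and processing the chunks on a thread pool. The sketch below illustrates only that partitioning scheme. It is not the ONNX Runtime kernel, which is C++; the function names `clip_chunk` and `parallel_clip` and the `max_workers=4` default are hypothetical. Because of CPython's GIL, this pure-Python loop shows the chunking logic but will not reproduce the reported speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Chunk size reported in the commit message (16K elements).
CHUNK_SIZE = 16 * 1024


def clip_chunk(data, start, end, lo, hi):
    """Clip data[start:end] into [lo, hi] in place (hypothetical helper)."""
    for i in range(start, end):
        v = data[i]
        if v < lo:
            data[i] = lo
        elif v > hi:
            data[i] = hi


def parallel_clip(data, lo, hi, chunk_size=CHUNK_SIZE, max_workers=4):
    """Clip a mutable sequence in place by splitting it into fixed-size chunks
    and submitting each chunk to a thread pool (hypothetical sketch)."""
    n = len(data)
    if n <= chunk_size:
        # Short input: a single chunk, so skip the thread-pool overhead.
        clip_chunk(data, 0, n, lo, hi)
        return data
    # Disjoint [start, end) ranges, so no two workers touch the same index.
    bounds = [(s, min(s + chunk_size, n)) for s in range(0, n, chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(clip_chunk, data, s, e, lo, hi) for s, e in bounds]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return data


if __name__ == "__main__":
    values = [float(i % 100 - 50) for i in range(100_000)]
    parallel_clip(values, -10.0, 10.0)
    print(min(values), max(values))
```

Falling back to a single sequential pass for inputs no larger than one chunk mirrors the commit's observation that short inputs see no substantial change, so paying thread-pool overhead there brings no benefit.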


8343 commits