onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-19 19:00:47 +00:00

Author	SHA1	Message	Date
Edward Chen	2ecd1d6622	Switch GSL to MS GSL 4.0.0 (#13416 )	2022-10-29 04:15:20 -07:00
Edward Chen	7fbfbf789f	Increase timeout for binary-size-checks-pipeline. (#13498 )	2022-10-28 23:15:56 -07:00
zhangyaobit	33b8778a46	Minor improvement for the documentation of kernel explorer (#13490 ) ### Description Fix the input shape of FastGelu. Minor improvement for the documentation of kernel explorer.	2022-10-28 22:57:53 -07:00
Fei Hu	943e156f4c	Allow custom ops to set input memory type (#10879 )	2022-10-28 21:45:26 -07:00
Hector Li	1b494daffa	Add yml file for Snpe EP build (#13494 ) Add yml file for Snpe EP build	2022-10-28 19:47:50 -07:00
Changming Sun	689e524c58	Move DML packaging pipelines to aiinfra-dml-winbuild machine pool (#13487 ) 1. Move DML packaging pipelines to aiinfra-dml-winbuild machine pool 2. Delete tools/ci_build/github/azure-pipelines/templates/windowsai-nuget-build.yml because the pipeline has been migrated to Onebranch. I monitored it for months, it worked well.	2022-10-28 10:30:16 -07:00
Numfor Tiapo	49e5a11ccd	Fix SDL and Prefast Errors (#13465 ) Fixes Errors 1978844, 1978870, 1978850, 1978855, and 9245 Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>	2022-10-28 09:41:18 -07:00
zhangyaobit	0a524cfe1c	Fix the input shape of FastGelu (#13488 ) ### Description Fix the input shape of FastGelu	2022-10-28 09:36:31 -07:00
cloudhan	4dd053cc15	Update CK and fix performance due to lacking -amdgpu-early-inline-all=true (#13493 ) 1. Update CK to its latest develop branch 2. `-mllvm -amdgpu-early-inline-all=true` is critical to CK's performance, add it.	2022-10-28 09:36:00 -07:00
Vincent Wang	8b0669bf63	QuickGelu Fusion (#12417 ) Some models have QuickGelu(x)=x*sigmoid(1.702x), which has 3 Ops for forward and 5 Ops for backward. The PR is to fuse this to a single Op named QuickGelu and its gradient QuickGeluGrad. For CUDA, tested in V100 using input tensor with shape [64,128,2048] and float16 type: Before, FW takes 335us, BW takes 614us ![image](https://user-images.githubusercontent.com/11661208/182291335-15188709-ffe7-44d1-9d14-0b544cbe5e55.png) After, FW takes 115us, BW takes 139us, which is much faster. ![image](https://user-images.githubusercontent.com/11661208/182291502-f0b5161c-b95c-45fc-90f8-ad0c592d2433.png) For CPU kernel, using same shape and float type: Before, FW takes 10us, BW takes 49us Mul: 3480[µs] Sigmoid: 1996[µs] Mul: 4789[µs] Mul: 4642[µs] Mul: 4195[µs] SigmoidGrad: 18328[µs] Mul: 2988[µs] Sum: 18576[µs] After, FW takes 4us, BW takes 5us, which is also much faster. QuickGelu: 3939[µs] QuickGeluGrad: 5089[µs] Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2022-10-28 18:12:07 +08:00
JiCheng	20c3c35c33	[XNNPACK] support building xnnpack EP for IOS (#13461 ) ### Description support building xnnpack for IOS	2022-10-28 15:03:04 +08:00
Changming Sun	07271b6c8a	Update docs/OperatorKernels.md (#13485 )	2022-10-27 20:11:49 -07:00
Jian Chen	f9378c5cca	Cjian/c4244 round 2 (#13473 ) ### Description Round 2 of fixing C4244	2022-10-27 18:50:26 -04:00
Changming Sun	4a20c0d98b	Delete zlib.cmake (#13467 ) Delete the file because it is not included by any other file.	2022-10-27 15:36:04 -07:00
Yi Zhang	67074851a3	Skip failed models on training ci and openvino ci (#13477 )	2022-10-27 15:22:47 -07:00
Changming Sun	35659d9021	Increase the timeout value for linux-gpu-tensorrt-ci-pipeline.yml (#13481 ) Now it takes about 55-60 minutes. It is on the edge so it often fails.	2022-10-27 14:26:22 -07:00
Scott McKay	ab71c4bbc0	Document generation CI is broken (#13308 ) ### Description Fix document generation CI. It's not currently updating the docs as we're skipping the tests, which is the invocation of build.py that would have generated the documentation. Set up a specific task to generate documentation for greater clarity. ### Motivation and Context Operator kernel documentation is not getting updated and is now out of date.	2022-10-28 07:20:48 +10:00
Patrice Vignola	0b29f64dba	[DML EP] Enable all datatypes for Abs and Sign (#13470 ) ### Description Enables all datatypes supported for DML for `Abs` and `Sign`. ### Motivation and Context `Abs` and `Sign` haven't been updated since DML started to support all datatypes for them. These ops are used in some transformer models and were forcing unnecessary copies between the CPU and the GPU.	2022-10-27 11:36:11 -07:00
Dmitri Smirnov	0e2087acff	Add extension method to compensate for Contains() absence (#13466 ) ### Description The targeted framework does not contain `Contains(string, ordinal)`. Add an extension method to compensate, following the suggestion [here](https://learn.microsoft.com/en-us/dotnet/api/system.string.contains?view=net-7.0). ### Motivation and Context Packaging pipeline fails.	2022-10-27 10:00:47 -07:00
Baiju Meswani	a46c599a40	Training API to export the eval model to an inference model (#13345 )	2022-10-27 09:34:01 -07:00
Jian Chen	8827c4bdbc	First round of fixes. (#13452 ) ### Description First round of fixes for C4244 error.	2022-10-26 23:05:45 -04:00
Edward Chen	601b74b904	Add '$schema' entry to cgmanifest.json files. (#13444 )	2022-10-26 16:15:05 -07:00
Changming Sun	7d58332298	Update tsaoptions.json: update the email alias (#13448 )	2022-10-26 15:56:16 -07:00
Vincent Wang	805ec459a0	Fix a PoliCheck finding in _hierarchical_ortmodule.py(#13462 )	2022-10-26 15:45:18 -07:00
sumitsays	490e4ddea5	[DML EP] Don't fuse a capability outside the compile call (#13468 ) ### Description DML EP was a special EP w.r.t. capability fusion. It used to fuse a capability outside the IExecutionProvider::Compile() call. But after recent re-architecture #13131, it is no longer a special case. ### Motivation and Context To make DML EP consistent with the ORT design. Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2022-10-26 15:21:33 -07:00
Dmitri Smirnov	1c8a22ec68	Improve logging and default affinity mask generation (#13338 ) ### Description Fix logging for affinity failures on Linux. Make `GetCpuCores()` consistently return the number of physical cores. Use `CpuInfo` library to correctly set affinities for Linux where supported. Make Windows generate affinity masks as ordinals and convert them to masks at the setting site. Allow setting multiple logical processors affinity masks per thread. We continue to set all logical processors as thread affinity per physical core. ### Motivation and Context Error logging on Linux uses `pthread_self()` which does not return Thread ID. Fix default affinity mask generation on Windows. The following are the issues with Windows: - `GetThreadAffinityMasks()` returns bitmasks, but on other platforms it returns ordinals generated for the hardware concurrency - The maximum number of processors supported requires a 64-bit mask, but the `size_t` type used is not always 64-bit - The masks returned per physical core may have multiple bits set, because the mask applies to several logical cores hosted by the physical core. In the past, customers complained that their threads jump from one core to another which adversely affects performance. The decision was made to stay this way. - 64-bit masks do not allow for logical processors with IDs that are outside of 0-63 range.	2022-10-26 13:30:27 -07:00
Rui Ren	136e15bfaf	revert cmake external file (#13459 )	2022-10-26 11:38:15 -07:00
Adrian Lizarraga	8770201e96	[EP-Perf-Dashboard] Decouple docker image name from branch name (#13449 ) ### Description Updates naming scheme for docker images built by the EP Perf pipeline. Specifically, the docker image name is no longer based on the branch name. ### Motivation and Context The docker image name used by EP Perf pipeline is built from the branch name. This makes the pipeline fail for branches with uppercase letters because docker image names can only contain lower-case letters.	2022-10-26 10:27:22 -07:00
Juan Villamizar	48b2ec944c	Fix warnings preventing Onnx build (#13447 )	2022-10-26 07:53:55 -07:00
Abhishek Udupa	8fbdc6cc46	Add a script for quick profile analysis (#13423 ) ### Description Implements a Python script for quick analysis of a generated JSON profile from ORT. ### Motivation and Context This PR implements a script that lists kernels that take up the most time in a JSON profile, from both the CPU and GPU points-of-view. The script also supports various options for CSV output, grouping of kernels wrt shape of input tensors and wrt kernel dimensions. Co-authored-by: Abhishek Udupa <abhishek.udupa@microsoft.com>	2022-10-26 07:43:03 -07:00
PeixuanZuo	a0cc289be6	Update SkipLayerNorm fusion rules (#13350 ) ### Description The subgraph below meets the SkipLayerNorm fusion pattern, but the fusion rules also require every input dimension to have a concrete value, so the subgraph cannot be fused into SkipLayerNorm. subgraph we want to fuse ![image](https://user-images.githubusercontent.com/94887879/196386821-3e678a4c-83e4-4bca-8900-5ef4ea996868.png) fusion pattern 3 [Sub1] [Sub2] \ / \ / \ / Add1 \| LayerNormalization This change allows the inputs of the first Add operator to have dimensions that only have dim_param. Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>	2022-10-26 16:15:27 +08:00
Patrice Vignola	ac48bdec89	DML EP add einsum MatMul NHCW ops (#13440 ) ### Description This adds the "NHCW" format support for einsum MatMul. The logic is basically a merge of the existing Transpose and MatMul Einsum implementations. ### Motivation and Context Some transformer models that I'm tracking use Einsum quite often during a single inference, and about half of those were "NHCW" MatMul Einsums. Supporting them will reduce the number of copies to the CPU.	2022-10-25 23:09:07 -07:00
Patrice Vignola	d5e8d59243	DML EP register all data types for Where operator (#13443 ) ### Description Register all datatypes for DML's `Where` operator since DML now supports everything. ### Motivation and Context Some transformer models use the `Where` operator on int64 data, but since DML wasn't supporting it, it needed to fall back to the CPU.	2022-10-25 22:47:55 -07:00
PeixuanZuo	70b73afd36	[ADD] fuse Matmul + fastgelu -> gemmfastgelu (#11699 ) Description: fuse MatMul + FastGelu -> GemmFastGelu. Prepare for AMD optimized fused operator GemmFastGelu. Usage: python benchmark.py -g -m bert-base-cased --sequence_length 384 --batch_sizes 128 --provider=rocm -p fp16 --disable_embed_layer_norm --enable_gemm_fast_gelu	2022-10-26 09:33:58 +08:00
Adam Louly	cf8bf0c141	add on device training to the packaging pipelines (#13446 ) ### Description enabling on device training apis in the packaging pipelines. ### Motivation and Context adding on device training flag so we can enable the on-device training apis for Federated learning scenarios Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-10-25 15:03:34 -07:00
Tianlei Wu	7aafd86229	Update Attention operator to support separated Q/K/V inputs (#13410 ) ### Description Allow separated Q, K and V inputs to support cross attention: * Q: [batch_size, sequence_length, hidden_size] * K: [batch_size, kv_sequence_length, hidden_size] * V: [batch_size, kv_sequence_length, v_hidden_size] * Output: [batch_size, sequence_length, v_hidden_size] To use separated Q/K/V inputs, the input tensor is for query, and two optional inputs are added for key and value. Weights for input projection is not included for now, so the MatMul of input projection shall be done out of Attention operator, but Add bias is included for performance consideration.	2022-10-25 11:51:06 -07:00
Changming Sun	a396a91c9a	Move build machines with Nvidia M60 GPUs to Nvidia T4 (#13170 )	2022-10-25 11:21:13 -07:00
Dwayne Robinson	0201cd75e1	Document generation for operator kernels, enable internal overload of DML EP to initialize on software-only devices (#13428 ) ### Description The documentation pipeline does not require an actual GPU, and running on GPU-capable agents costs more. So to enable running on CPU-only devices and to potentially consolidate future pipelines, and since the tests are not actually executed on this device anyway (it just needs to initialize the EP for the sake of operator kernel enumeration), add an initialization flag to skip the software device check - this is only an internal overload not exposed in the public API. See https://github.com/microsoft/onnxruntime/pull/13308.	2022-10-25 11:14:43 -07:00
Tianlei Wu	d80212d42c	Add script for question answering (SQuAD) accuracy evaluation of BERT model (#12947 ) Add script to evaluate accuracy of BERT/DistilBERT/Roberta models on question-answering task. By default, pretrained model `bert-large-uncased-whole-word-masking-finetuned-squad` will be used if model name is not specified. If onnx path is not specified, optimum will be used to export an ONNX model for testing. Example usage: * Evaluate with CPU execution provider: `python eval_squad.py` * Evaluate with CUDA execution provider: `python eval_squad.py --use_gpu` * Evaluate an optimized onnx model for 'distilbert-base-cased-distilled-squad' with sequence lengths 128/192/256/384 on first 100 samples: `python eval_squad.py -m distilbert-base-cased-distilled-squad --use_gpu -s 128 192 256 384 --onnx_path ./optimized_fp16.onnx -t 100`	2022-10-25 09:21:01 -07:00
cloudhan	d82036dbbd	Add Pre- and Post-tuning API to allow pre- and postprocessing of params (#13411 ) Some ops use the same buffer for input and output, so they update it in place. If we blindly tune over the `params`, updates accumulate in that buffer during FindFastest, which is an undesired side effect. In this case, we use a proxy params struct for the tuning to avoid this side effect.	2022-10-25 17:44:28 +08:00
Vincent Wang	b6a3562ffb	[ORTModule] Add Env Variable to Control Disabling Custom AutoGrad Function Support (#13430 ) Add env variable to control disabling custom autograd function support. When using ORTModule, if the torch model has torch.nn.Function, and the user confirms that it can be exported to ONNX (for example, by inline PythonOp) and that the backward implementation matches the forward implementation, the user can export "ORTMODULE_DISABLE_CUSTOM_AUTOGRAD_SUPPORT=1" to disable the custom autograd support so that it won't use ORT's PythonOp to fall back to PyTorch. Exporting to ONNX can sometimes leverage graph optimizations in ORT so that perf is better.	2022-10-25 16:58:04 +08:00
Cheng	ea1bdb162f	[NNAPI] Refactor `Resize` as layout insensitive (#13412 )	2022-10-25 16:50:05 +08:00
cloudhan	93f7a97a6d	Exclude hipify option from policheck (#13431 )	2022-10-25 16:35:16 +08:00
PeixuanZuo	28f470c26c	[ROCm] Use SkipLayerNorm original implementation in kernel explorer (#13382 ) ### Description Wrap the SkipLayerNorm original implementation as a function. Use it as part of SkipLayerNormTunableOp. Use it in kernel explorer to compare the gap between the TunableOp and the original implementation. The profile output looks like this: `float16 8 512 768 <class '_kernel_explorer.SkipLayerNorm_half_Original'> 23.48 us 804.04 GB/s float16 8 512 768 <class '_kernel_explorer.SkipLayerNorm_half_Tunable'> 20.41 us 925.00 GB/s ...` Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>	2022-10-24 22:00:24 -07:00
cloudhan	2748f38362	Drop hip_add_library (#13406 ) Switching to use CMake's builtin hip language support.	2022-10-25 12:57:48 +08:00
Yi Zhang	e160688a9b	Skip some failed models winml and training workflows on Windows CPU (#13407 ) ### Description 1. update model name structure in model_tests.cpp with source name. To avoid `Condition test_param_names.count(param_name) == 0 failed. Duplicate parameterized test name 'BERT_Squad_opset10_CPU'` 2. skip some failed models https://github.com/onnx/models/issues/568	2022-10-25 10:05:04 +08:00
sumitsays	24818cfd73	[DML EP] Attention Kernel (#13371 ) ### Description DML EP kernel for com.microsoft.attention operator. It has been implemented via DML_Graph. References for this implementation: 1. [Hugging Face Attention for BERT](`310340d0d0/src/transformers/models/bert/modeling_bert.py (L245-L284)`) 2. Chapter 3 of the O'Reilly book Natural Language Processing with Transformers, Revised Edition. This PR also - includes a very tiny fix for QLinearSigmoid kernel, which is storing the temporary object into a named variable. - enables 4 L2 transformers: LayerNorm, Gelu, MatMulScale, Attention. ### Motivation and Context One of the main operators used in Transformer-based models. It contributes to the overall perf of DML EP for Transformer models. Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2022-10-24 14:32:37 -07:00
Yi Zhang	1885460776	skip some models failed in dynamic shape infer (#13400 ) ### Motivation and Context Some models from model zoo failed in the Linux CPU workflow. https://github.com/onnx/models/issues/562 Skip them temporarily. ### Verification Linux CPU CI passed with beta image https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=789772&view=results 2022-10-21T13:31:17.6740348Z Skip symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/Inception-1-int8/inception-v1-12-int8.onnx 2022-10-21T13:31:17.6740998Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/DenseNet-121-12-int8/densenet-12-int8.onnx 2022-10-21T13:31:17.6741618Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/MNIST-12/mnist-12.onnx 2022-10-21T13:31:17.6742207Z Skip symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-int8/ssd-12-int8.onnx 2022-10-21T13:31:17.6742898Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet50_fp32/resnet50-v1-12.onnx 2022-10-21T13:31:17.6743544Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/MobileNet v2-1.0-fp32/mobilenetv2-12.onnx 2022-10-21T13:31:17.6744259Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/ResNet101_DUC_HDC-12/ResNet101-DUC-12.onnx 2022-10-21T13:31:17.6744891Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/YOLOv3-12-int8/yolov3-12-int8.onnx 2022-10-21T13:31:17.6745501Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/AlexNet/bvlcalexnet-12.onnx 2022-10-21T13:31:17.6746114Z Running symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/ZFNet-512-int8/zfnet512-12-int8.onnx 2022-10-21T13:31:17.6746768Z Skip symbolic shape inference on : /mnt/vss/_work/1/b/Release/../models/zoo/opset12/SSD-MobilenetV1-12-int8/ssd_mobilenet_v1_12-int8.onnx	2022-10-25 01:48:46 +08:00
Yi Zhang	143725604e	Skip some models failed in Windows CPU C# tests (#13395 ) ### Description For models from model zoo, in C# tests of Windows CPU CI skip models whose name contains int8 or qdq. skip some models (VGG16, VGG19) in x86 workflow ### Motivation and Context These models always failed in Windows CPU C# tests (https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=789442&view=results) ### verified https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=789861&view=results C# tests passed	2022-10-22 13:54:24 +08:00
Jian Chen	397edf9918	Bumping up version number to 1.14.0 on main branch (#13401 ) ### Description Bumping up version number to 1.14.0	2022-10-21 19:16:44 -04:00

7634 commits