onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-10 17:37:14 +00:00

Author	SHA1	Message	Date
Edward Chen	d514a960ee	Remove "Python Checks" pipeline status from readme as that pipeline no longer exists. (#18697 )	2023-12-04 13:38:36 -08:00
Caroline Zhu	c02a386145	[js/web/training] Implemented runEvalStep & runOptimizerStep (#18259 ) ### Description * implemented runEvalStep and runOptimizerStep * added hasEvalModel and hasOptimizerModel boolean fields in TrainingSession representation * added evalInputNames and evalOutputNames fields to TrainingSessionHandler & TrainingSession * removed the inputNamesEncoded and outputNamesEncoded fields from TrainingSessionHandler -- since none of the training methods require the input names and output names as parameters, there's no need to store them. ### Motivation and Context * part of the work for implementing web bindings for training * previous PR: #18250 --------- Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-12-04 13:37:14 -08:00
Jiajia Qin	5353adcde3	[js/webgpu] Use the naive convTranspose when in/out channels are both 1 (#18658 ) ### Description With this change, convTranspose with input0 [1, 18, 32, 1], input1 [1, 1, 16, 16] drops from 6.64ms to 0.59ms.	2023-12-04 13:18:37 -08:00
trajep	a5b2291e0f	[Transformer Optimization]Return model directly for unknown model type (#18642 ) This pull request improves the handling of unsupported model types in the optimization process.	2023-12-04 12:26:50 -08:00
Deoksang Kim	2f8b86b939	Fix typo in the TensorShape (#17813 ) The function name in the log should be SizeToDimension	2023-12-01 16:48:55 -08:00
Jiajia Qin	92ee664f64	[js/webgpu] Fix shader errors in indicesGet/Set when rank > 4 (#18661 ) ### Description Currently, for non-uniform variables, we still use the `array<u32, N>` type instead of `array<vec4<u32>, N1>`. So we can't always treat all variables with rank > 4 as uniforms to index. This PR fixes the errors below: ``` error(s) generated while compiling the shader: :5:44 error: index 4 out of bounds [0..1] return uniforms.input_strides[4] * (outputIndices[4] % uniforms.input_shape[4])+uniforms.input_strides[3] * (outputIndices[3] % uniforms.input_shape[3])+uniforms.input_strides[2] * (outputIndices[2] % uniforms.input_shape[2])+uniforms.input_strides[1] * (outputIndices[1] % uniforms.input_shape[1])+uniforms.input_strides[0] * (outputIndices[0] % uniforms.input_shape[0]); ^ FAILED #OpTest# - expand.jsonc [webgpu]Expand - Expand 5D - float32 Expand 5 - float32 FAILED #OpTest# - expand.jsonc [webgpu]Expand - Expand 5D - float32 Expand 5 - shape < input.size() ```	2023-12-01 15:35:35 -08:00
Changming Sun	eaaf27015e	Remove EnvSetupScript parameter from win-ci.yml (#18662 ) ### Description Make the code more consistent. Currently, some TRT pipelines download TRT binaries on the fly, while other TRT pipelines use a preinstalled version. This PR makes them the same.	2023-12-01 15:30:16 -08:00
Rachel Guo	9c45fe4957	Fix macos xcframework test stage codesign info (#18649 ) ### Description Remove the development ID and force code signing to be not required in the test macOS target. ### Motivation and Context Fix a failure that happened in the iOS_Full_xcframwork stage in the Zip-Nuget-Java-NodeJS packaging pipeline. --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-12-01 14:47:46 -08:00
Edward Chen	a353805631	Fix Windows TVM CI workflow (#18667 ) Fix issue with installing LLVM dependency.	2023-12-01 13:49:45 -08:00
Edward Chen	b22f49ff35	Fix unit tests failures in build with contrib ops disabled (#18659 ) Fix unit tests failures in build with contrib ops disabled. - QDQTransformerTests.QDQPropagation_GH11605_Opset12_19 - TransposeOptimizerTests.QnnTransposeNonConstBroadcastInput	2023-12-01 09:41:25 -08:00
Bowen Bao	fcea2cb7f1	[Dort] Run type promotion pass to resolve dtype discrepancy (#18516 ) Fixes CI failures mentioned in #18507. However, we should not keep two separate dort implementations in both pytorch and onnxruntime. They are out of sync.	2023-12-01 09:36:18 -08:00
snadampal	05a9c95764	[DNNL] add Arm Compute Library (ACL) backend for dnnl execution provider (#15847 ) ### Description Add ACL as the DNNL runtime option for aarch64 platforms. Update makefile and the python wheel build script. ### Motivation and Context This is to enable the optimized ACL gemm kernels for dnnl execution provider on aarch64 platform.	2023-12-01 09:16:44 -08:00
Jian Chen	d69842226b	Update the template files to correct stage to fix the python cuda 12 packaging pipeline (#18651 )	2023-12-01 07:57:46 -08:00
guyang3532	182c525416	Support MatMulBnb4 in PaddingElimination (#18646 ) Also support Cast pattern between input and embedding node for sparsity inspecting	2023-12-01 19:27:50 +08:00
Hector Li	ccfea55942	[QNN EP] Enable QNN HTP VTCM size setting (#18653 ) ### Description [QNN EP] Enable QNN HTP VTCM size setting	2023-11-30 21:09:13 -08:00
Tianlei Wu	9c9e6adeb2	Add SDXL Turbo to demo (#18627 ) * Add SDXL Turbo to the demo. * Change default scheduler to EulerA for XL or Turbo since DDIM does not work well with small steps. Example to run the model in demo (See README for instructions): ``` python3 demo_txt2img_xl.py --version xl-turbo --height 512 --width 512 --denoising-steps 1 --scheduler UniPC "little cute gremlin sitting on a bed, cinematic" ```	2023-11-30 18:19:31 -08:00
Wanming Lin	c7732a78d7	[WebNN EP] Fixed bug in op checking (#18638 )	2023-11-30 17:47:56 -08:00
Xu Xing	73d9b03509	[js/webgpu] Add multidimensional(>4) uniform support (#18546 ) This change removes the check of enableShapesUniforms. When all uses of this are removed, enableShapesUniforms can be removed too.	2023-11-30 17:10:33 -08:00
Wanming Lin	73a2eb82eb	Fixed bug in Flatten's axis (#18645 ) Flatten's axis is in the range [-r, r] rather than [-r, r-1].	2023-11-30 16:19:22 -08:00
Jiajia Qin	6781b6cf3d	[js/webgpu] add bool type for Expand/Gather (#18615 ) ### Description In [detr-resnet-50](https://huggingface.co/Xenova/detr-resnet-50) model, it uses expand with bool type running on cpu ep. \| Kernel \| Shape \| Provider \| \| -------- \| ------- \| ------- \| \| Expand \| "input_type_shape" : [{"bool":[1,1,1,625]},{"int64":[4]}],"activation_size" : "657","output_type_shape" : [{"bool":[1,1,625,625]}] \| CPUExecutionProvider \| After this change, it will run on jsep. \| Kernel \| Shape \| Provider \| \| -------- \| ------- \| ------- \| \| Expand \| "input_type_shape" : [{"bool":[1,1,1,625]},{"int64":[4]}],"activation_size" : "657","output_type_shape" : [{"bool":[1,1,625,625]}] \| JsExecutionProvider \|	2023-11-30 15:47:08 -08:00
Yi Zhang	efee9abdb7	Reduce downloads in Nuget-Java pipeline to reduce connection exception (#18635 ) ### Description 1. Add a new stage to download java tools from https://oss.sonatype.org and publish them to pipeline artifact 2. Remove downloads in other jobs, they get the java tools from pipeline artifact 3. consolidate final_java_testing stages. ### Motivation and Context Reduce downloads to reduce the connection error like below. ``` --2023-11-28 07:16:31-- https://oss.sonatype.org/service/local/repositories/releases/content/org/junit/platform/junit-platform-console-standalone/1.6.2/junit-platform-console-standalone-1.6.2.jar Resolving oss.sonatype.org (oss.sonatype.org)... 3.227.40.198, 3.229.50.23 Connecting to oss.sonatype.org (oss.sonatype.org)\|3.227.40.198\|:443... connected. HTTP request sent, awaiting response... 502 Bad Gateway 2023-11-28 07:16:32 ERROR 502: Bad Gateway. ```	2023-12-01 07:44:44 +08:00
zesongw	4025bd8ebd	[WebNN EP] Fix bug of padding in Op ConvTranspose (#18577 ) Get the dimensions of H and W according to the layout.	2023-11-30 12:59:36 -08:00
Jiajia Qin	b1e749e3be	[js/webgpu] Add program name into webgpuProfiling info (#18640 ) ### Description Currently, we only print the kernelName, which makes it hard to tell which shader was actually used. For example, GroupedConv and Conv2DMatMul both belong to the Conv kernel. That is not intuitive for profiling.	2023-11-30 12:57:29 -08:00
Dmitri Smirnov	c5ea1547c6	Eliminate intermediate string conversion buffer. (#18608 ) ### Description Make use of unsafe string constructor that is able to convert native UTF-8 string straight into the string instance buffer. ### Motivation and Context Reduce garbage.	2023-11-30 10:50:24 -08:00
Yulong Wang	e7f64f4510	[js/web] fix ESLint by excluding generated .js from tsconfig.json (#18634 ) ### Description ESLint sometimes fails with an error. The root cause is that some large generated JavaScript files in the tsconfig's include path cause the TypeScript parser to fail in a line of `string.match()` with a regex on a huge string (~8MB), causing the following error: ``` RangeError: Maximum call stack size exceeded ``` The solution is to remove the large files from the tsconfig's include path. Previously I excluded the `web/dist/` folder and this PR excludes `web/test/ort.test[.min].js`.	2023-11-30 09:50:47 -08:00
Changming Sun	23a91c8ba8	Fix warning C4003 in ORT python binding code (#18612 ) ### Description Fix warning C4003 in ORT python binding code. ### Motivation and Context It's better to fix the warning instead of suppressing it.	2023-11-30 08:07:47 -08:00
Changming Sun	1b5675ff0f	Update post-merge-jobs.yml: increase timeout value for the Ios job (#18602 )	2023-11-30 08:07:13 -08:00
Vincent Wang	148495ebc5	[ORTModule] Use Default Topo-order for GraphViewer (#18410 ) ORT's default topo-order is a reversed DFS algorithm, while the priority-based topo-order is a forward BFS algorithm. The default order is likely better than the priority-based order on memory, because tensor memory is more likely to be released right after it is consumed. Currently ORTModule uses the priority-based order. For some models, it sorts lots of small Ops to the beginning, which introduces a big CPU overhead at the beginning (see the screenshots below). This PR uses the default order for training. The priority-based order is heavily used for some recompute optimizations, so if recompute is enabled, we still use the priority-based order. This PR also adds an optimization to the default order: all Shape/Size Ops are moved to right after their parent nodes. This makes sure the shape and size nodes are executed right after their parents, so the input tensor memory can be released as soon as possible. This is especially important for non-CPU devices or for training cases where some gradient graphs use only the shape/size of tensors from forward. Profiling result: Before <img width="910" alt="Screenshot 2023-11-13 12 09 02" src="https://github.com/microsoft/onnxruntime/assets/11661208/e54d5ead-274f-4725-923e-521bbcfce752"> After <img width="910" alt="Screenshot 2023-11-13 12 10 44" src="https://github.com/microsoft/onnxruntime/assets/11661208/f50d196d-11ac-43a2-9493-517e4552ffab">	2023-11-30 20:17:22 +08:00
Vincent Wang	e1d1033131	[ORTModule] Remove Unused Arguments from Generated Triton Code (#18636 ) This PR: - Remove unused arguments from generated triton code, - Remove unnecessary mask for symbolic shape case from generated triton code. - Add doc for usage of ORTMODULE_TRITON_CONFIG_FILE.	2023-11-30 18:32:36 +08:00
George Wu	5c67a00d8e	Revert "remove full protobuf requirement for tensorrt ep" (#18626 ) Reverts microsoft/onnxruntime#18413. There's a timing issue here. We eventually want to get this change merged in, but we need to update OSS onnx-tensorrt first.	2023-11-29 22:27:51 -08:00
Jambay Kinley	c20488ced7	skip_infer for SkipGroupNorm in SymbolicShapeInference (#18630 ) ### Description https://github.com/microsoft/onnxruntime/pull/18273 added `SkipGroupNorm` contrib op but it did not skip onnx shape inference for this op in `SymbolicShapeInference`. This leads to failed shape inference of the transformers optimized model with `enable_skip_group_norm=True`. Also results in an invalid float16 model for the SD CUDA example. This PR adds `SkipGroupNorm` to `skip_infer` so that it skips onnx shape inference for this op and instead uses the relevant dispatcher. ### Motivation and Context Fix shape inference failure for models with `SkipGroupNorm` nodes.	2023-11-29 18:27:04 -08:00
Yang Gu	227dcb3a88	[js/webgpu] Log the key and program info for artifact (#18365 ) With uniform support, ideally we may just keep one artifact for each program to save the compilation time. This PR just logs the related info, including key and program name, so that we may understand the situation better.	2023-11-29 18:01:12 -08:00
satyajandhyala	7335760424	[JS/Web] Add uniforms to Einsum (#18531 ) ### Description Add uniforms to Einsum ### Motivation and Context Improve performance.	2023-11-29 15:30:33 -08:00
Edward Chen	483c490ec4	Refine error checks in onnxruntime/core/providers/coreml/model/model.mm. (#18620 ) #18606 updated the original error checks to check that the returned object != nil to appease the static analyzer. However, per the API docs, checking `error != nil` is the way to determine whether an error occurred. This change adds back the `error != nil` check to be safe.	2023-11-29 14:38:44 -08:00
Dmitri Smirnov	d2dfbf4179	Add float16 type support to SplitToSequence and make code type independent (#18594 ) ### Description Add support for `float16` type to address the below issue. Re-work the code to make it type independent. This reduces binary size by ~11 K. ![image](https://github.com/microsoft/onnxruntime/assets/11303988/1a77c7bc-34a8-478c-a16a-abd94062c6c6) ### Motivation and Context This PR addresses https://github.com/microsoft/onnxruntime/issues/18481	2023-11-29 10:44:59 -08:00
Yi Zhang	68209307da	Replace all Azure-Pipelines-EO-Windows2022-aiinfrat with Onnxruntime-Win-CPU-2022 (#18614 ) ### Description Replace all Azure-Pipelines-EO-Windows2022-aiinfrat with Onnxruntime-Win-CPU-2022 ### Motivation and Context Reduce the maintenance cost	2023-11-29 10:32:42 -08:00
Wanming Lin	38b640c797	[WebNN EP] Re-implement Unsqueeze, Squeeze, Flatten with WebNN's reshape (#18585 ) WebNN will not provide `unsqueeze`, `squeeze`, `flatten2d` ops, as they can be easily implemented with reshape.	2023-11-29 08:00:23 -08:00
Edward Chen	14a343441d	Fix Objective-C static analysis build (#18606 ) - Patch abseil to fix a compile error about not finding `cxxabi.h`. - Fix some static analysis warnings.	2023-11-28 17:14:20 -08:00
ivberg	e833d22f14	Change QNN EP Profiling logs to output to CSV (#18201 ) ### Description Change QNN EP Profiling logs to output to CSV. Output is in a similar format to QNN SDK Tools (instead of to ORT logs) https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html#configuration-options (profiling_level) ### Motivation and Context It is hard to read and interpret QNN profiling logs in the ORT logs. --------- Co-authored-by: Hector Li <hecli@microsoft.com>	2023-11-28 16:58:51 -08:00
Tianlei Wu	f13380f3d8	Support LoRA and Control Net in Stable Diffusion demo (#18593 ) ### Description (1) Export onnx model with LoRA weights for both SD 1.5 and SDXL (2) Export onnx model with Control Net for both SD 1.5 and SDXL. For SD 1.5, multiple control nets may be used. For SDXL, at most one control net is supported right now. (3) Add demo of LCM LoRA (4) Add demo of control net.	2023-11-28 15:46:42 -08:00
Yulong Wang	50e6235af1	[js/web] allow ShaderHelper to use internal (non-I/O) variables (#18525 ) ### Description This PR includes a change inspired by #18452 to resolve a requirement: a shader may depend on an instance of `IndicesHelper` to generate a WGSL code snippet, but the IndicesHelper instance is not necessarily an input/output of the program. So the existing `declareVariables()` function does not work in this scenario. To support this requirement, I added a "use" function to `interface ShaderHelper`, which takes a helper-like object as a parameter. The hidden implementation `ShaderHelperImpl` class will iterate the helpers and call `impl()` for each. @axinging @qjia7	2023-11-28 15:15:59 -08:00
Jian Chen	a49f31b670	Remove drop-nuget artifact from all pipelines (#18592 ) ### Description Currently, the `drop-nuget` artifact only contains protoc.exe which is also part of the `drop-extra` artifact.	2023-11-28 13:23:01 -08:00
Mike Guo	e24733cfe9	fix the Olive CI pipeline failure on Windows (#18464 ) Fix the https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1046 failure for Windows	2023-11-28 11:42:39 -08:00
Rachel Guo	288b80d363	Add MacOS build to ORT C Pod (#18550 ) ### Description As title. 1. Add macos build as an optionally enabled arch for pod and changes to existing build_ios_framework/assemble_c_pod scripts. 2. Enable macos build arch in ios packaging pipeline (currently for variants other than Mobile) and check the output artifacts are correct. 3. Write MacOS Test Target scheme in the test app and integrate into ios packaging CI testing pipeline. Currently the changes only apply to the onnxruntime-c pod, as the original request was from ORT SPM, which consumes only the onnxruntime-c pod as the binary target. TODO: could look into adding macos platform to objc pod as well. ### Motivation and Context Enable macos platform support in cocoapods, and also potentially produce a binary target for enabling the macos platform in SPM as well. Replace https://github.com/microsoft/onnxruntime/pull/18334 --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-11-28 10:11:53 -08:00
Chen Fu	05046e5452	Adding unit test for sm80 prepack (#18514 ) ### Description Prepacking code for block q4 x fp16 GEMM cuda kernel, for SM80 hardware ### Motivation and Context Preparing for addition of Q4 x FP16 GEMM kernel on Nvidia Ampere GPUs. This kernel requires sophisticated quantized weight rearrangement to speed up loading data to tensor-core. To facilitate the addition, this change includes the following: 1. matrix_layout.h A new layout lib that facilitates iterating matrix elements and tiles and that balances memory safety and performance. 2. prepack_sm80.h Code for rearranging quantized weight, scales and offsets (aka. prepacking) 3. blkq4_fp16_sm80_prepack_test.cc Unit tests that explicitly test the memory safety and correctness of the prepacking code. Currently the prepacking code runs on CPU with single-threaded code. We run this on CPU in order to minimize GPU memory fragmentation. On the other hand, hopefully we will get around to parallelizing this part of the code. It should be straightforward with the unit tests in place.	2023-11-28 10:01:09 -08:00
Adrian Lizarraga	8d5ecc4dae	[Quantization] Fix scale/zero-point for 16-bit QDQ Softmax (#18589 ) ### Description Sets the appropriate scale and zero-point values for 16-bit QDQ Softmax. Previously, the scale/zp were set to fixed values that were specific to 8-bit quantization. ### Motivation and Context Generate more accurate 16-bit QDQ models that contain Softmax.	2023-11-28 09:46:47 -08:00
Sheil Kumar	0b7048e7d6	Update winml to use #cores - #soc cores by Default as the number of intraopthreads (#18384 ) Update winml to use #cores - #soc cores by Default as the number of intraopthreads --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2023-11-28 09:26:48 -08:00
Yi Zhang	a6d8726407	Update ADO windows image to custom image (#18598 ) ### Description Update Azure-Pipelines-EO-Windows2022-aiinfra to onnxruntime-win-CPU-2022 in Nuget_Package_CPU. To make debugging easier, use flex-downloadPipelineArtifact ### Motivation and Context Azure-Pipelines-EO-Windows2022-aiinfra is using 1ES window-latest image. The pipeline might fail because of an unexpected upgrade. Verified: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=384425&view=results ### P.S. I think we should replace all Azure-Pipelines-EO-Windows2022-aiinfra.	2023-11-28 09:04:25 -08:00
Jian Chen	3ea27c2925	Create a new Nuget Package pipeline for CUDA 12 (#18135 )	2023-11-28 09:03:46 -08:00
Xavier Dupré	94a6020a7f	Improve parallelization of TfIdfVectorizer, Reduce memory consumption (#18539 ) ### Description TfIdfVectorizer has two steps: first, search for n-grams in the input; second, weight the results. The second step was not parallelized. This PR addresses that issue. Previously, two vectors of the size of the output were allocated to compute the results. The first one, frequencies, was used as an intermediate vector between the two steps. This vector is now broken into multiple small vectors, one per thread. The memory consumption is then reduced for batches with a number of rows > the number of threads. ### Motivation and Context Performance and memory consumption. For one model, it is 15% faster (4 cores, model size is ~6Mb, batch size is 100). Here is another benchmark on a machine with 32 cores with different sizes of vocabularies and batch sizes. The tested TfIdfVectorizer only deals with unigrams and processes sequences of 10 tokens (integers). ![image](https://github.com/microsoft/onnxruntime/assets/22452781/0bb9abe9-ed81-44da-b5c4-ad2a12f129bd)	2023-11-28 12:56:00 +01:00

10099 commits