onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-27 03:11:28 +00:00

Author	SHA1	Message	Date
Yulong Wang	abdc31de40	[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) ### Description See `454996d496` for manual changes (excluding auto-generated formatting changes). ### Why The tooling for clang-format is out of date, which reduces development efficiency. - The NPM package `clang-format` is already in maintenance mode and has not been updated for 2 years. - The VSCode extension for clang-format has not been maintained for a while, and a recent Node.js security update broke it entirely on Windows. No one in the community seems interested in fixing those. Prettier was chosen because it is the most popular TS/JS formatter. ### How to merge It's easy to break the build: - Watch for any new commits on main that are not included in this PR. - Be aware that after this PR is merged, other PRs that already passed CI can still merge. So make sure there are no new commits before merging this one, and invalidate JS PRs that already passed CI to force them to merge with the latest main.	2024-08-14 16:51:22 -07:00
Guenther Schmuelling	bb43a0f133	[js/webgpu] minor fixes to make tinyllama work (#19564)	2024-02-23 15:45:30 -08:00
Xu Xing	3a2ab1963a	[js/webgpu] Refactor createTensorShapeVariables (#18883)	2024-02-01 17:59:00 -08:00
Xu Xing	d73131cf0f	[js/webgpu] Use DataType as uniform cpu type (#19281) This avoids converting the data type to a string with tensorDataTypeEnumToString.	2024-01-30 21:05:08 -08:00
Xu Xing	624b4e2063	[js/webgpu] Remove enableShapesUniforms (#19279)	2024-01-29 17:49:06 -08:00
Jiajie Hu	447a3a7c70	[js/webgpu] Fix Expand/Gather when input type is bool (#18999) ### Description Also update the op test suite. ### Motivation and Context Previously, the total size in the case `Expand - last dim is not divisible by 4` was a multiple of 4 even though the last dimension was not, so the bug was never caught.	2024-01-05 08:16:15 -08:00
Jiajia Qin	6781b6cf3d	[js/webgpu] add bool type for Expand/Gather (#18615) ### Description In the [detr-resnet-50](https://huggingface.co/Xenova/detr-resnet-50) model, Expand with bool type runs on the CPU EP. \| Kernel \| Shape \| Provider \| \| -------- \| ------- \| ------- \| \| Expand \| "input_type_shape" : [{"bool":[1,1,1,625]},{"int64":[4]}],"activation_size" : "657","output_type_shape" : [{"bool":[1,1,625,625]}] \| CPUExecutionProvider \| After this change, it runs on JSEP. \| Kernel \| Shape \| Provider \| \| -------- \| ------- \| ------- \| \| Expand \| "input_type_shape" : [{"bool":[1,1,1,625]},{"int64":[4]}],"activation_size" : "657","output_type_shape" : [{"bool":[1,1,625,625]}] \| JsExecutionProvider \|	2023-11-30 15:47:08 -08:00
Xu Xing	949ac4b7ce	[js/webgpu] Support uniforms for gather (#18312)	2023-11-13 11:24:34 -08:00
Yulong Wang	d532645bed	[js/webgpu] revise uniform support (#17871) ### Description Work for items (2) and (3) in #17860.	2023-10-11 16:41:46 -07:00
Yulong Wang	d9b9c5a537	[js/webgpu] support using uniform buffer (#17803) ### Description Support using a uniform buffer. This PR allows using a uniform buffer in a shader program, so that some runtime information (e.g. input/output shape) no longer needs to be hardcoded into the shader code. There are 2 commits in this PR: - `667f31c83d`: framework changes to support uniform buffers, as well as updates to the program manager, GPU data manager and indices helper. - `09e1d2ad1d`: an example change for operator `Transpose` to use only the input's rank instead of its dims as the shader key. With this change, model mobilenetv2-12 shader compile times dropped from 71 to 52.	2023-10-10 00:31:12 -07:00
Jiajia Qin	891fba3b9c	[js/webgpu] Optimize Gather op (#17625) ### Description This PR optimizes the Gather op, improving it by ~6 ms in the Segment Anything model on ADL. The problem with the original algorithm is that it uses a for loop to process a block of data, and the block size may be very large, like `65536`. In a GPU shader, we should avoid large loops and use more threads to do the work in parallel. Before: ``` [profiling] kernel "41771992\|[Gather] 41771992" input[0]: [4,65536] \| float32, input[1]: [1] \| int64, output[0]: [1,65536] \| float32, execution time: 6886207 ns ``` After: ``` [profiling] kernel "41771992\|[Gather] 41771992" input[0]: [4,65536] \| float32, input[1]: [1] \| int64, output[0]: [1,65536] \| float32, execution time: 11719 ns ```	2023-09-21 21:00:36 -07:00
Yulong Wang	9aafbe3feb	[js/web] revise TensorView (#17473) ### Description This change: - removes the unused `Tensor` types declared in /js/web/lib/wasm/jsep/tensor.ts - removes duplicated util functions in /js/web/lib/wasm/jsep/tensor.ts - renames /js/web/lib/wasm/jsep/tensor.ts to /js/web/lib/wasm/jsep/tensor-view.ts and updates the corresponding references. It was confusing to have multiple `Tensor` types defined in different places and multiple `tensor.ts` source files. This is one of the prerequisites for supporting IO binding for WebGPU buffers in onnxruntime-web. List of prerequisite PRs: https://github.com/microsoft/onnxruntime/pull/17465 https://github.com/microsoft/onnxruntime/pull/17469 https://github.com/microsoft/onnxruntime/pull/17470 https://github.com/microsoft/onnxruntime/pull/17472 https://github.com/microsoft/onnxruntime/pull/17473 (this one)	2023-09-14 21:14:44 -07:00
Arthur Islamov	ccf14e891e	[js/web] JSEP node assignment optimization (#17128) ### Description Since WebGPU supports only float32 and int32, having Gather, Reshape, Shape, Squeeze and Unsqueeze ops with other data types creates additional MemCpy ops and slows down overall execution, as all other ops with those tensor types run on the CPU. Before this patch, SD Unet had these numbers: Node(s) placed on [CPUExecutionProvider]. Number of nodes: 1141 Node(s) placed on [JsExecutionProvider]. Number of nodes: 4025 memcpy tokens: 2001 After the patch: Node(s) placed on [CPUExecutionProvider]. Number of nodes: 1735 Node(s) placed on [JsExecutionProvider]. Number of nodes: 2243 memcpy tokens: 813 It also gives more than a 5X performance benefit: from 12 sec for one Unet step to 2.2 sec on an RTX 3090 Ti, so we are almost at native performance. UPD: with the latest changes from the main branch and multi-threading, it went down to 1.6 sec. Will try re-exporting my model to ONNX with maximum optimizations, such as using MultiHeadAttention to decrease the node count. Maybe after implementing that it can run in less than 1 sec.	2023-08-15 18:58:05 -07:00
Arthur Islamov	ea55700e1c	[js/web] JSEP Gather OP (#16855) ### Description Added a Gather op that works with both i32 and i64 indices, assuming that the values fall within the i32 limit. The assumption is safe because it's not possible to allocate a buffer larger than 2 GB for inputs. It treats all data from the input tensor as u32, copying 1 or 2 elements for i64, u64 and double. --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>	2023-08-03 14:09:37 -07:00

14 commits
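
The Gather commit `ea55700e1c` describes treating all tensor data as u32 words, copying 1 or 2 words per element, and reading i64 indices on the assumption that they fit in i32. The following is a minimal CPU-side sketch of that idea in TypeScript, not the actual shader from the commit. The function name `gatherRowsAsU32` and all of its parameters are hypothetical, and the sketch assumes a 2-D input gathered along axis 0 on a little-endian platform.

```typescript
// Hypothetical illustration only; not the onnxruntime-web implementation.
// Gathers rows (axis 0) of a 2-D tensor whose raw bytes are viewed as u32 words.
function gatherRowsAsU32(
  dataWords: Uint32Array, // raw tensor bytes viewed as 32-bit words
  rowLength: number, // number of elements in each row
  wordsPerElement: 1 | 2, // 1 for 32-bit types, 2 for i64, u64 and double
  indexWords: Uint32Array, // raw index bytes viewed as 32-bit words
  wordsPerIndex: 1 | 2, // 1 for i32 indices, 2 for i64 indices
): Uint32Array {
  const wordsPerRow = rowLength * wordsPerElement;
  const rowCount = dataWords.length / wordsPerRow;
  const indexCount = indexWords.length / wordsPerIndex;
  const out = new Uint32Array(indexCount * wordsPerRow);
  for (let i = 0; i < indexCount; i++) {
    // Assumption: the index fits in i32, so only the low word of an i64 is read
    // (the low word comes first on a little-endian platform).
    let row = indexWords[i * wordsPerIndex] | 0;
    if (row < 0) row += rowCount; // ONNX Gather accepts negative indices
    const start = row * wordsPerRow;
    out.set(dataWords.subarray(start, start + wordsPerRow), i * wordsPerRow);
  }
  return out;
}

// Usage: gather rows 2 and -3 from a 3x2 float64 tensor using int64 indices.
const source = new Float64Array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5]);
// int64 values [2, -3] written as little-endian pairs of u32 words.
const int64Indices = new Uint32Array([2, 0, 0xfffffffd, 0xffffffff]);
const gathered = gatherRowsAsU32(new Uint32Array(source.buffer), 2, 2, int64Indices, 2);
const gatheredDoubles = Array.from(new Float64Array(gathered.buffer));
```

Because each 64-bit element is copied as two opaque words, the same routine handles i64, u64 and double without interpreting their values.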