onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-29 03:30:52 +00:00

Author	SHA1	Message	Date
Yulong Wang	56bced0581	[js/web] enable webgpu in browser unit test (#16310 ) ### Description enable webgpu in browser unit test. The CI pipeline uses Edge v113+ which enables WebGPU. === UPDATE on 08/07/2023: - add flags to Edge browser launch commandline so that Edge on CI agents can initialize WebGPU correctly. - ONLY enable webgpu on web release build. Other pipelines are using flag `-b=wasm,webgl,xnnpack` to specify the other 3 backends explicitly. - disable "Resize" related test failures. Once they are fixed the tests can be re-enabled. --------- Co-authored-by: Satya Jandhyala <satya.k.jandhyala@gmail.com>	2023-08-08 11:45:04 -07:00
Arthur Islamov	c3f04251c7	[js/web] JSEP LayerNormalization and InstanceNormalizations kernels (#16830 ) ### Description Added two kernels for Layer and Instance norm. Also added maximum limits for `maxBufferSize` when requesting the GPU device, as by default it is limited to 256mb and fails to allocate a 600mb buffer while running fp32 StableDiffusion weights. ### Motivation and Context These two are used in StableDiffusion and many other networks	2023-08-08 09:09:37 -07:00
satyajandhyala	7ad43d9564	[JS/Web] Fixed ArgMin and ArgMax and refactored (#17002 ) Fixed ArgMin and ArgMax and refactored using functionality from Reduce operator code. ### Description Removed code/functionality duplication and fixed some issues.	2023-08-04 12:59:36 -07:00
satyajandhyala	cc4b64f646	[JS/Web] Modify Reduce, Expand and Slice to pass op and node tests. (#16979 ) ### Description Make the CacheHint mechanism, which is designed to avoid running the same test multiple times by saving the result mapped against a key, work by adding input dims.	2023-08-03 15:48:47 -07:00
Yulong Wang	641c3a4a37	[js/web] update op test schema (#16921 ) ### Description update op test schema. This change fixes several problems with operator tests for web: - `opsets` -> `opset`: an operator uses exactly one opset instead of multiple - `condition` -> `platformCondition`: make it less confusing - `inputShapeDefinitions`: allows testing ORT behavior when it gets no/partial/full shape info. Added a JSON schema file and also an example file	2023-08-03 14:20:20 -07:00
Arthur Islamov	ea55700e1c	[js/web] JSEP Gather OP (#16855 ) ### Description Added Gather op that works with both i32 and i64 indices, assuming that values fall into i32 limit. The assumption is safe because it's not possible to allocate more than 2gb buffer for inputs. It treats all data from input tensor as u32, copying 1 or 2 elements for i64, u64 and double. --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>	2023-08-03 14:09:37 -07:00
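The Gather commit above describes treating every input element as one or two `u32` words. A minimal CPU-side sketch of that idea follows, restricted to a 1-D gather; the function name `gatherAsU32Words` and its parameters are hypothetical and are not taken from the actual WGSL kernel or the ONNX Runtime Web sources.

```typescript
// Hypothetical illustration: gather elements from a 1-D tensor by copying raw
// 32-bit words, so 4-byte types move 1 word and 8-byte types (i64, u64, double)
// move 2 words per element. Element values are never interpreted.
function gatherAsU32Words(
  data: ArrayBufferView,
  bytesPerElement: 4 | 8,
  indices: readonly number[],
): Uint32Array {
  const wordsPerElement = bytesPerElement / 4;
  // Reinterpret the source bytes as u32 words without copying.
  const src = new Uint32Array(data.buffer, data.byteOffset, data.byteLength / 4);
  const elementCount = src.length / wordsPerElement;
  const dst = new Uint32Array(indices.length * wordsPerElement);

  indices.forEach((index, i) => {
    if (!Number.isInteger(index) || index < 0 || index >= elementCount) {
      throw new RangeError(`index ${index} is out of range [0, ${elementCount})`);
    }
    for (let w = 0; w < wordsPerElement; w++) {
      dst[i * wordsPerElement + w] = src[index * wordsPerElement + w];
    }
  });
  return dst;
}
```

Reinterpreting the result with the original element type, for example `new Float64Array(result.buffer)`, recovers the gathered values. Because words are copied in their original order, the sketch does not depend on byte order.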
Guenther Schmuelling	0df2e14038	js/webgpu: argmax,argmin,softmax support (#16882 ) argmax and argmin are similar to reduce. Eventually we need to add optimized flavors of the shader. softmax is optimized but only works on the last axis for now which should be the common use case. todo: enable more ut for argmax/argmin	2023-08-02 18:16:19 -07:00
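For reference, the last-axis softmax mentioned in the commit above can be sketched on the CPU as follows. This is a plain TypeScript illustration of the computation, not the WebGPU shader from the commit; the function name `softmaxLastAxis` is hypothetical.

```typescript
// Hypothetical reference: softmax over the last axis of a dense row-major tensor.
// Each row has its maximum subtracted before exponentiation for numerical stability.
function softmaxLastAxis(data: Float32Array, dims: readonly number[]): Float32Array {
  if (dims.length === 0) {
    throw new Error('expected a tensor of rank >= 1');
  }
  const size = dims.reduce((a, b) => a * b, 1);
  if (size !== data.length) {
    throw new Error('dims do not match the data length');
  }
  const cols = dims[dims.length - 1];
  const rows = cols === 0 ? 0 : size / cols;
  const out = new Float32Array(size);

  for (let r = 0; r < rows; r++) {
    const base = r * cols;
    let max = -Infinity;
    for (let c = 0; c < cols; c++) {
      max = Math.max(max, data[base + c]);
    }
    let sum = 0;
    for (let c = 0; c < cols; c++) {
      const e = Math.exp(data[base + c] - max);
      out[base + c] = e;
      sum += e;
    }
    for (let c = 0; c < cols; c++) {
      out[base + c] /= sum;
    }
  }
  return out;
}
```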
Hariharan Seshadri	506ddb3d5d	[js/WebGPU] Support int32 Transpose in WebGPU (#16952 )	2023-08-02 16:27:24 -07:00
Yulong Wang	6046456bb6	build break: apply formatter fix (#16947 ) ### Description build break: apply formatter fix	2023-08-01 01:10:55 -07:00
satyajandhyala	77b2b618b2	[JS/WebGPU] Add Resize operator (#16680 ) ### Description Implemented Resize operator support in JSEP	2023-07-31 09:35:06 -07:00
satyajandhyala	dd24d52737	[JS/Web] Added Gelu contrib operator support to JSEP (#16909 ) ### Description Added Gelu operator to JSEP	2023-07-31 09:18:58 -07:00
satyajandhyala	e67547b978	[JS/WebGPU] Added Flatten operator support. (#16860 ) ### Description Added Flatten operator support to JSEP.	2023-07-27 12:50:45 -07:00
satyajandhyala	03ce0a5693	[Web/JS] Added Slice operator in JSEP. (#16811 ) ### Description Added Slice operator support to JSEP.	2023-07-25 14:19:20 -07:00
satyajandhyala	d41bbac7b9	[Web/JS] Added Expand operator support. (#16577 ) ### Description Added Expand operator support.	2023-07-11 09:38:16 -07:00
satyajandhyala	00e8f2a2a9	[Web/JS] Add ConvTranspose support (#16433 ) ### Description Add ConvTranspose support for WebGPU	2023-07-08 11:10:50 -07:00
satyajandhyala	e55a20ece8	[Web/JS] Added Split operator support. (#16567 ) ### Description Added WebGPU/JSEP Split operator support.	2023-07-07 12:16:10 -07:00
satyajandhyala	5933a183df	[Web/JS] Added missing L1Reduce and L2Reduce operator kernels. (#16580 ) ### Description Add missing L1Reduce and L2Reduce operator kernels.	2023-07-07 09:55:55 -07:00
Yulong Wang	d13f3153d7	[js/webgpu] enable op test for webgpu (#16542 ) ### Description This change enables the JSON-format operator tests for webgpu. Usage: ``` npm test -- op abs.jsonc -b=webgpu ```	2023-07-06 08:35:19 -07:00
satyajandhyala	889f80082f	[js/web] Added Reduce operators support (#16122 ) ### Description Added support for ReduceL1, ReduceL2, ReduceMean, ReduceMin, ReduceMax, ReduceSum, ReduceLogSum, ReduceLogSumExp, ReduceProd and ReduceSumSquare. --------- Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com> Co-authored-by: guschmue <guschmue@microsoft.com>	2023-06-12 07:46:27 -07:00
Alexander Visheratin	e6c6184fee	[JS/WebGPU] Unsqueeze operator implementation (#16138 )

### Description

This PR adds an implementation of the Unsqueeze operator to WebGPU JSEP. The implementation follows the [operator schema](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Unsqueeze).

To implement the `Unsqueeze` operator in the same fashion as the `Squeeze`, I added the `ComputeOutputShape()` method to the `UnsqueezeBase` class and made some slight modifications. Please let me know if it is a bad idea and if I should move this method to the JS implementation.

I also uncommented test case lines in the `suite-test-list.jsonc` file for both Squeeze and Unsqueeze operators following @hariharans29's [comment](https://github.com/microsoft/onnxruntime/pull/16024#issuecomment-1565113633).

### How was it tested

1. I created a model with only one operator:

```Python
import onnx.helper

node = onnx.helper.make_node(
    "Unsqueeze",
    inputs=["T", "axes"],
    outputs=["y"],
)
# Element types: 1 is FLOAT, 7 is INT64.
graph = onnx.helper.make_graph(
    [node],
    "test",
    [
        onnx.helper.make_tensor_value_info("T", 1, [3, 4, 5]),
        onnx.helper.make_tensor_value_info("axes", 7, [2]),
    ],
    [onnx.helper.make_tensor_value_info("y", 1, [3, 1, 4, 5, 1])],
)
onnx.save(onnx.helper.make_model(graph), "unsqueeze.onnx")
```

2. I compiled the runtime using @fs-eire's [instructions](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce).

3. I ran the test models in the browser using this minimal setup:

```HTML
<html>
<script src=".\dist\ort.webgpu.min.js"></script>
<script>
  async function run() {
    const session = await ort.InferenceSession.create('unsqueeze.onnx', {executionProviders: ['webgpu']});
    console.log(session);
    const input = new ort.Tensor('float32', new Float32Array(60), [3, 4, 5]);
    const dim = new ort.Tensor('int64', [1n, 4n], [2]);
    const output = await session.run({ "T": input, "axes": dim });
    console.log(output);
  }
  run();
</script>
</html>
```

### Motivation and Context

Improve operator coverage for WebGPU JSEP.	2023-06-01 12:23:02 -07:00
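The shape computation exercised by the test model in the commit above (input `[3, 4, 5]` with axes `[1, 4]` giving `[3, 1, 4, 5, 1]`) can be sketched as below. This is an illustration of the ONNX Unsqueeze shape rule only; it is not the `ComputeOutputShape()` method added in the commit, and the name `unsqueezeOutputShape` is hypothetical.

```typescript
// Hypothetical sketch of the ONNX Unsqueeze shape rule: each axis names a
// position in the OUTPUT shape where a dimension of size 1 is inserted.
function unsqueezeOutputShape(inputDims: readonly number[], axes: readonly number[]): number[] {
  const outputRank = inputDims.length + axes.length;
  const inserted = new Set<number>();

  for (const axis of axes) {
    // Negative axes count from the end of the output shape.
    const normalized = axis < 0 ? axis + outputRank : axis;
    if (!Number.isInteger(normalized) || normalized < 0 || normalized >= outputRank) {
      throw new RangeError(`axis ${axis} is out of range for output rank ${outputRank}`);
    }
    if (inserted.has(normalized)) {
      throw new Error(`axis ${axis} is specified more than once`);
    }
    inserted.add(normalized);
  }

  const outputDims: number[] = [];
  let nextInputDim = 0;
  for (let i = 0; i < outputRank; i++) {
    outputDims.push(inserted.has(i) ? 1 : inputDims[nextInputDim++]);
  }
  return outputDims;
}
```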
Wanming Lin	00b1e79e04	Support WebNN EP (#15698 ) Description: This PR intends to enable WebNN EP in ONNX Runtime Web. It translates ONNX nodes into [WebNN API](https://webmachinelearning.github.io/webnn/) calls; the EP is implemented in C++ and uses the Emscripten [Embind API](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html#). It temporarily uses NHWC as the preferred layout for WebNN graph partitions, because of a restriction in the WebNN XNNPack backend implementation and the ongoing [discussion](https://github.com/webmachinelearning/webnn/issues/324) in the WebNN spec about whether WebNN should support both 'NHWC' and 'NCHW' layouts. No WebNN native EP, only for Web. Motivation and Context: Allow ONNXRuntime Web developers to access WebNN API to benefit from hardware acceleration. WebNN API Implementation Status in Chromium: - Tracked in Chromium issue: [#1273291](https://bugs.chromium.org/p/chromium/issues/detail?id=1273291) - CPU device: based on XNNPack backend, and has been available on Chrome Canary M112 behind the "#enable-experimental-web-platform-features" flag for Windows and Linux platforms. Further implementation for more ops is ongoing. - GPU device: based on DML, implementation is ongoing. Open: - GitHub CI: WebNN currently is only available on Chrome Canary/Dev with XNNPack backend for Linux and Windows. This is an open question for reviewers: please help identify which GitHub CI should involve the WebNN EP and guide me to enable it. Thanks!	2023-05-08 21:25:10 -07:00
Yulong Wang	c0116af619	[js/webgpu] operator Exp (#15713 ) ### Description operator Exp	2023-04-27 15:04:09 -07:00
Yulong Wang	a02c885f86	[js/webgpu] add implementation of Relu, LeakyRelu and ThresholdedRelu (#15668 ) ### Description add implementation of Relu, LeakyRelu and ThresholdedRelu	2023-04-26 15:11:01 -07:00
Yulong Wang	14cc02c65c	[js/web] WebGPU backend via JSEP (#14579 )

### Description

This change introduced the following new components into ONNX Runtime Web:

- JavaScript Execution Provider (JSEP)
- Asynchronous inference execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
  - elementwise operators (22)
  - binary operators (5)
  - tensor: Shape, Reshape, Transpose, Gemm
  - nn: Conv, {Global}Maxpool, {Global}AveragePool

The code needs to be polished; still working on it.

## Q&A

### What is JSEP?

> JSEP, aka JavaScript Execution Provider, is a new ONNX Runtime execution provider that works specifically in the Web environment (browsers). JSEP allows JavaScript code to kick in from various places when ONNX Runtime runs inference on a model.

### Why JSEP?

> JSEP is a hybrid-mode EP that contains both C/C++ and TypeScript/JavaScript implementations. There are 2 strong reasons why we introduce JSEP:
> 1. the C/C++ part helps JSEP leverage ONNX Runtime's capabilities as much as possible, including graph transformers, optimizers and the capability to fall back to the CPU EP. TypeScript/JavaScript makes it much easier to develop and debug the kernel implementation in the browser.
> 2. the requirement of asynchronous execution from JavaScript APIs (e.g. `buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a synchronous context (see the "async problem" section below). This is solved by using Emscripten's Asyncify.

### What is WebGPU?

> WebGPU is the new GPU API that is available in browsers. It is one of only 2 APIs currently available to access the GPU from a browser (the other is WebGL).
> WebGPU is designed with more advanced and stronger features compared to WebGL and is potentially the solution that offers the best GPU performance currently available for model inference.

### What is the async problem and why do we have it?
> The "async problem" is that you cannot call an async function in a synchronous context. Think about the following C-style code:
> ```c
> // C-style declarations (API)
> typedef void (ON_COMPLETE)(PVOID state, DATA data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
>   // how to implement?
> }
> ```
> The answer is that it's impossible to implement this function. Usually we would try to find a sync version of the API, or launch a thread to call the async function and sync-wait on the main thread. Unfortunately, in the browser environment, neither is possible.
>
> WebGPU does not offer any synchronous API for data downloading (GPU to CPU). This is the only operation that MUST be async. As `OrtRun()` will eventually call into DataTransfer to copy data from GPU to CPU, and `OrtRun()` is a synchronous function, this cannot be done in the normal way.

### What is Emscripten? How does the Asyncify feature resolve the problem?

> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to compile ORT and generate the WebAssembly artifacts which run in browsers.
>
> Asyncify is a [compiler feature](https://emscripten.org/docs/porting/asyncify.html) that allows calling async functions from a synchronous context. In short, it generates code to unwind and rewind the call stack to emulate async execution. With this feature, we are able to call the async function inside the `OrtRun()` call.

## Design Overview

### Inter-op

JSEP does pretty much the same thing as any other EP.
It exposes an interface for inter-op with JavaScript, which is defined in onnxruntime/wasm/js_internal_api.js:

```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
  Module.jsepBackend = backend;
  Module.jsepAlloc = alloc;
  Module.jsepFree = free;
  Module.jsepCopy = copy;
  Module.jsepCopyAsync = copyAsync;
  Module.jsepCreateKernel = createKernel;
  Module.jsepReleaseKernel = releaseKernel;
  Module.jsepRun = run;
};
```

This simple JavaScript snippet defines all the language-barrier-level functions that JSEP requires to implement kernels and data transfers using JavaScript inside ONNX Runtime:

- `jsepBackend`: assigns the singleton object to the WebAssembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc() and Free()
- `jsepCopy`: synchronous copy (GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronous copy (GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object maintained in JS to match the lifecycle of a Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this

The abstraction above keeps the connections and dependencies between C/C++ and TypeScript/JavaScript to a minimum.

### Resource Management

The lifecycle of tensor data and kernels is managed by ORT (C/C++), but the implementation is left to JavaScript. JavaScript code is responsible for implementing the callbacks correctly.

For WebGPU, the GPU data is managed by JavaScript using a singleton map (tensor_data_id => GPUBuffer). The GPU pipeline is managed as a singleton. Shaders are managed using a singleton map (shader_key => gpu_program), where shader_key is generated from cache_key (OP specific, including attributes) and input shapes.

### About data transfer

`js::DataTransfer::CopyTensor` is implemented to call either the synchronous or the asynchronous copy callback, depending on whether the destination is GPU or not.
Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function so that it can be called in the synchronous context.

### Run kernel in JS

The Kernel class constructor calls `jsepCreateKernel()` once, with an optional per-kernel serialization to pass attributes into JavaScript.

`Compute()` is implemented such that metadata serialization is performed in a base class, and JavaScript code can access the data using the Emscripten-specific builtin macro `EM_ASM_`.

### Disabled features

- Memory pattern is force-disabled, because WebGPU data is not represented by a general memory model (in which a buffer can be represented by offset + size).
- Concurrent run support is disabled. WebGPU is stateful and also has async function calls. Supporting concurrent runs would significantly increase complexity, and we don't get any real benefit from it.

### Prefer channels last

JSEP prefers channels last and returns `DataLayout::NHWC` in method `GetPreferredLayout()`. This lets the graph transformers preprocess the graph into a channels-last form so that a more optimized WebGPU shader can be used.

### Testing code

It's impossible to test JSEP directly because JSEP itself does not contain any kernel implementation. However, it has the kernel registration, which needs to work together with the corresponding JavaScript code. There are unit tests that run ONNX models from the JavaScript API.

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-04-24 15:21:18 -07:00
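The shader cache described in the WebGPU backend commit above (a singleton map from shader_key to gpu_program, with the key derived from an operator-specific cache_key plus the input shapes) could look roughly like the sketch below. The names `buildShaderKey` and `ProgramCache`, the key format, and the `compileCount` counter are all hypothetical and chosen for illustration; the real implementation may differ.

```typescript
// Hypothetical sketch of a program cache keyed by cache_key + input shapes.
type Shapes = readonly (readonly number[])[];

// Combine the operator-specific cache key with every input shape, so the same
// operator with different input dims maps to a different cache entry.
function buildShaderKey(cacheKey: string, inputShapes: Shapes): string {
  const shapePart = inputShapes.map((dims) => dims.join(',')).join(';');
  return `${cacheKey}|${shapePart}`;
}

class ProgramCache<P> {
  private readonly programs = new Map<string, P>();
  private readonly compile: (cacheKey: string, inputShapes: Shapes) => P;
  // Counts cache misses; kept only to make the caching behavior observable.
  compileCount = 0;

  constructor(compile: (cacheKey: string, inputShapes: Shapes) => P) {
    this.compile = compile;
  }

  // Return the cached program, compiling it on the first request for this key.
  get(cacheKey: string, inputShapes: Shapes): P {
    const key = buildShaderKey(cacheKey, inputShapes);
    if (!this.programs.has(key)) {
      this.programs.set(key, this.compile(cacheKey, inputShapes));
      this.compileCount++;
    }
    return this.programs.get(key) as P;
  }
}
```

Requesting the same cache key twice with identical input shapes compiles once, while changing any input dimension produces a new entry. Including the input dims in the key is also the kind of change the CacheHint commit (#16979) describes.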
Yulong Wang	b1a17188a6	[js/web] add LRN unpacked kernel for webgl backend (#14459 ) ### Description add LRN unpacked kernel for webgl backend	2023-02-01 11:51:10 -08:00
101arrowz	148b1efe5e	[js/web] add ConvTranspose2D to WebGL backend (#11990 ) * Add ConvTranspose * Update docs + tests * fix lint * fix output shape calculations * Revert "fix output shape calculations" This reverts commit 8014fa9b33115f1d6a677fe2270a6da1b510ff67. * fix format * remove broken output_shape test	2022-07-27 13:57:12 -07:00
Yulong Wang	0c78b71352	prepare test folder from GitHub (#12220 ) * consume onnx test data from github * ensure tests * update script and allow opset specification * fix python format * fix python format * consume new filter format * fix linting error	2022-07-20 22:01:08 -07:00
Yulong Wang	1424b796ff	[js/web] disable test_tan temporarily (#11048 )	2022-03-29 21:47:52 -07:00
Sunghoon	c79307e7b4	[js/web] support opset-13 of softmax (#9493 ) * add p50 in test * support opset-13 of softmax * update a operators.md * resolve comments * fix lint and format Co-authored-by: Yulong Wang <yulongw@microsoft.com>	2021-10-26 23:58:50 -07:00
Sunghoon	74eaaad768	[js/web] Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast and clip (#9249 ) * Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast, clip * merge master and update a operators.md * resolve comment. revise pool and cast kernel implementation. * skip fusion when clip min and max is not in initializer	2021-10-14 16:29:37 -07:00
Yulong Wang	3e8cabbc3e	[js/web] WebGL backend refactor (#8586 )	2021-08-12 12:30:49 -07:00
Yulong Wang	e66846da4a	revise terms according to guideline	2021-07-23 13:26:15 -07:00

32 commits