onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-10 17:37:14 +00:00

Yulong Wang 14cc02c65c [js/web] WebGPU backend via JSEP (#14579)

### Description

This change introduces the following new components into ONNX Runtime Web:

- JavaScript Execution Provider (JSEP)
  - Asynchronous inference execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
  - initial implementation of kernels:
    - elementwise operators (22)
    - binary operators (5)
    - tensor: Shape, Reshape, Transpose, Gemm
    - nn: Conv, {Global}MaxPool, {Global}AveragePool

The code still needs polishing; work is ongoing.

## Q&A

What is JSEP?

> JSEP, the JavaScript Execution Provider, is a new ONNX Runtime execution provider that works specifically in web environments (browsers). JSEP allows JavaScript code to kick in at various points while ONNX Runtime runs inference on a model.

Why JSEP?

> JSEP is a hybrid EP that contains both C/C++ and TypeScript/JavaScript implementations. There are two strong reasons for introducing JSEP:
> 1. The C/C++ part lets JSEP leverage as much of ONNX Runtime's capabilities as possible, including graph transformers, optimizers, and the ability to fall back to the CPU EP. The TypeScript/JavaScript part makes kernel implementations much easier to develop and debug in the browser.
> 2. JavaScript APIs that require asynchronous execution (e.g. `buffer.mapAsync()`) make it impossible to run `OrtRun()` in a synchronous context (see the "async problem" section below). This is solved by using Emscripten's Asyncify.

What is WebGPU?

> WebGPU is the new GPU API available in browsers. It is one of only two APIs currently available for accessing the GPU from a browser (the other is WebGL).
> WebGPU is designed with more advanced and more powerful features than WebGL, and is potentially the solution that offers the best GPU performance currently available for model inference.

What is the async problem and why do we have it?

> The "async problem" is that you cannot call an async function from a synchronous context. Consider the following C-style code:
> ```c
> // C-style declarations (API)
> typedef void (ON_COMPLETE)(PVOID state, DATA data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
>   // how to implement?
> }
> ```
> The answer is that this function is impossible to implement. Usually we would look for a synchronous version of the API, or launch a thread to call the async function and wait synchronously on the main thread. Unfortunately, in a browser environment, neither is possible.
>
> WebGPU does not offer any synchronous API for downloading data (GPU to CPU). This is the only operation that MUST be async. Because `OrtRun()` eventually calls into DataTransfer to copy data from GPU to CPU, and `OrtRun()` is a synchronous function, this cannot be done in the normal way.

What is Emscripten? How does the Asyncify feature resolve the problem?

> Emscripten is the C/C++ compiler for WebAssembly. We use it to compile ORT and generate the WebAssembly artifacts that run in browsers.
>
> Asyncify is a [compiler feature](https://emscripten.org/docs/porting/asyncify.html) that allows calling async functions from a synchronous context. In short, it generates code to unwind and rewind the call stack to emulate async execution. With this feature, we are able to call the async function inside the `OrtRun()` call.

## Design Overview

### Inter-op

JSEP does pretty much the same thing as any other EP.
It exposes an interface for inter-op with JavaScript, which is defined in onnxruntime/wasm/js_internal_api.js:

```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
  Module.jsepBackend = backend;
  Module.jsepAlloc = alloc;
  Module.jsepFree = free;
  Module.jsepCopy = copy;
  Module.jsepCopyAsync = copyAsync;
  Module.jsepCreateKernel = createKernel;
  Module.jsepReleaseKernel = releaseKernel;
  Module.jsepRun = run;
};
```

This simple JavaScript snippet defines all the language-barrier-level functions that JSEP requires to implement kernels and data transfers using JavaScript inside ONNX Runtime:

- `jsepBackend`: assigns the singleton object to the WebAssembly module
- `jsepAlloc` and `jsepFree`: implementation of the data transfer's Alloc() and Free()
- `jsepCopy`: synchronous copy (GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronous copy (GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: manage a corresponding object maintained in JS to match the lifecycle of the Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this

The abstraction above keeps the connections and dependencies between C/C++ and TypeScript/JavaScript to a minimum.

### Resource Management

The lifecycle of tensor data and kernels is managed by ORT (C/C++), but the implementation is left to JavaScript. JavaScript code is responsible for implementing the callbacks correctly.

For WebGPU, the GPU data is managed by JavaScript using a singleton map (tensor_data_id => GPUBuffer). The GPU pipeline is managed as a singleton. Shaders are managed using a singleton map (shader_key => gpu_program), where shader_key is generated from cache_key (OP specific, including attributes) and the input shapes.

### About data transfer

`js::DataTransfer::CopyTensor` is implemented to call either the synchronous or the asynchronous copy callback, depending on whether the destination is the GPU or not.
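As a rough illustration of the interface and the copy dispatch described above, the sketch below wires toy callbacks into a stand-in `Module` object. Everything beyond the `jsepInit` shape is an assumption made for illustration: the callback argument lists, the `gpuData` map, the `copyTensor` helper, and the use of plain `Uint8Array`s in place of `GPUBuffer`s are hypothetical and are not taken from the ONNX Runtime sources (where the dispatch lives in C++).

```javascript
// Hypothetical stand-in for the Emscripten Module object.
const Module = {};

// Same shape as the jsepInit snippet quoted above.
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
  Module.jsepBackend = backend;
  Module.jsepAlloc = alloc;
  Module.jsepFree = free;
  Module.jsepCopy = copy;
  Module.jsepCopyAsync = copyAsync;
  Module.jsepCreateKernel = createKernel;
  Module.jsepReleaseKernel = releaseKernel;
  Module.jsepRun = run;
};

// Toy "GPU" storage: a map from data id to a byte buffer, loosely mirroring
// the (tensor_data_id => GPUBuffer) map. Argument lists are assumptions.
const gpuData = new Map();
let nextId = 1;

function alloc(size) {
  const id = nextId++;
  gpuData.set(id, new Uint8Array(size));
  return id;
}
function free(id) {
  gpuData.delete(id);
}
// Synchronous upload (CPU to GPU).
function copy(srcBytes, dstId) {
  gpuData.get(dstId).set(srcBytes);
}
// Asynchronous download (GPU to CPU): completes on a later tick, as a
// mapAsync()-style API would.
function copyAsync(srcId, dstBytes) {
  return new Promise((resolve) => {
    setTimeout(() => {
      dstBytes.set(gpuData.get(srcId));
      resolve();
    }, 0);
  });
}

Module.jsepInit({ name: "toy-backend" }, alloc, free, copy, copyAsync, () => {}, () => {}, () => 0);

// Hypothetical dispatcher: choose the callback by destination.
function copyTensor(src, dst, dstIsGpu) {
  return dstIsGpu ? Module.jsepCopy(src, dst) : Module.jsepCopyAsync(src, dst);
}

async function roundTrip(bytes) {
  const id = Module.jsepAlloc(bytes.length);
  copyTensor(bytes, id, true); // CPU -> GPU, synchronous
  const out = new Uint8Array(bytes.length);
  await copyTensor(id, out, false); // GPU -> CPU, has to be awaited
  Module.jsepFree(id);
  return out;
}
```

The point of the sketch is the asymmetry: the upload returns immediately, while the download yields a promise that the caller has to wait on.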
Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function so that it can be called from the synchronous context.

### Run kernel in JS

The Kernel class constructor calls `jsepCreateKernel()` once, with an optional per-kernel serialization to pass attributes into JavaScript. `Compute()` is implemented so that metadata serialization is performed in a base class, and JavaScript code can access the data using the Emscripten-specific builtin macro `EM_ASM_`.

### Disabled features

- Memory pattern is force disabled, because WebGPU data is not represented by a general memory model (one in which a buffer can be represented by offset + size).
- Concurrent run support is disabled. WebGPU is stateful and also has async function calls. Supporting concurrent runs would significantly increase complexity without any real benefit.

### Prefer channels last

JSEP prefers channels last and returns `DataLayout::NHWC` from the method `GetPreferredLayout()`. This lets the graph transformers preprocess the graph into channels-last form so that a more optimized WebGPU shader can be used.

### Testing code

It is impossible to test JSEP directly because JSEP itself does not contain any kernel implementation. However, it does contain the kernel registration, which needs to work together with the corresponding JavaScript code. There are unit tests that run ONNX models through the JavaScript API.

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-24 15:21:18 -07:00
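To make the channels-last (NHWC) preference mentioned in the commit message concrete, here is a minimal sketch of what the layout change means for a flat tensor buffer. The helper name `nchwToNhwc` is hypothetical, and this is not how ONNX Runtime's graph transformers work (they rewrite the graph rather than copy data in JavaScript); it only shows the index mapping between the two layouts.

```javascript
// Reorder a flat NCHW buffer into NHWC order.
// dims is [n, c, h, w]; data.length is assumed to equal n * c * h * w.
function nchwToNhwc(data, dims) {
  const [n, c, h, w] = dims;
  const out = new Float32Array(data.length);
  for (let b = 0; b < n; b++) {
    for (let ch = 0; ch < c; ch++) {
      for (let y = 0; y < h; y++) {
        for (let x = 0; x < w; x++) {
          const src = ((b * c + ch) * h + y) * w + x; // NCHW offset
          const dst = ((b * h + y) * w + x) * c + ch; // NHWC offset
          out[dst] = data[src];
        }
      }
    }
  }
  return out;
}

// One image, two channels, 2x2 pixels: channel 0 holds 0..3, channel 1 holds 4..7.
const nhwc = nchwToNhwc(Float32Array.from([0, 1, 2, 3, 4, 5, 6, 7]), [1, 2, 2, 2]);
console.log(Array.from(nhwc)); // [0, 4, 1, 5, 2, 6, 3, 7]
```

In NHWC the values of all channels for one pixel sit next to each other in memory, which is the arrangement the commit message says allows a more optimized WebGPU shader.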
nodejs/templates	Disable XNNPack EP's tests in Windows CI pipeline (#15406)	2023-04-13 12:19:32 -07:00
nuget/templates	Disable XNNPack EP's tests in Windows CI pipeline (#15406)	2023-04-13 12:19:32 -07:00
templates	[js/web] WebGPU backend via JSEP (#14579)	2023-04-24 15:21:18 -07:00
android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml	Remove protobuf submodule (#15190)	2023-03-27 10:35:49 -07:00
android-x86_64-crosscompile-ci-pipeline.yml	Refactor web-ci pipeline and delete eager mode CI pipeline (#15416)	2023-04-10 10:41:04 -07:00
binary-size-checks-pipeline.yml	Update binary size checks pipeline to use stages for separate checks. (#15408)	2023-04-07 09:55:40 -07:00
build-perf-test-binaries-pipeline.yml	Disable XNNPack EP's tests in Windows CI pipeline (#15406)	2023-04-13 12:19:32 -07:00
c-api-noopenmp-packaging-pipelines.yml	[TensorRT EP] support TensorRT 8.6-EA (#15299)	2023-04-12 11:34:59 -07:00
clean-build-docker-image-cache-pipeline.yml
linux-ci-pipeline.yml	Add compilation cache in 2 Linux CPU pipelines and refactor the Linux build step with cache (#15484)	2023-04-14 23:56:59 +08:00
linux-cpu-aten-pipeline.yml	Add compilation cache in 2 Linux CPU pipelines and refactor the Linux build step with cache (#15484)	2023-04-14 23:56:59 +08:00
linux-cpu-eager-pipeline.yml	Refactor web-ci pipeline and delete eager mode CI pipeline (#15416)	2023-04-10 10:41:04 -07:00
linux-cpu-minimal-build-ci-pipeline.yml	Make docker command fail if bash command fails. (#15564)	2023-04-20 13:38:58 -07:00
linux-dnnl-ci-pipeline.yml	Update python 3.11 and remove 3.7 for Linux (#15214)	2023-03-27 14:46:30 -07:00
linux-gpu-ci-pipeline.yml	clear cache stat. after building (#15439)	2023-04-10 13:56:55 +08:00
linux-gpu-tensorrt-ci-pipeline.yml	[TensorRT EP] avoid excessive library load/unload overhead when running unit tests. (#15639)	2023-04-24 14:43:13 -07:00
linux-gpu-tensorrt-daily-perf-pipeline.yml	[TensorRT EP] support TensorRT 8.6-EA (#15299)	2023-04-12 11:34:59 -07:00
linux-migraphx-ci-pipeline.yml	[ROCm] disable composable_kernel and kernel explorer for MIGraphX CI (#15479)	2023-04-12 22:26:40 +08:00
linux-multi-gpu-ci-pipeline.yml	Update python 3.11 and remove 3.7 for Linux (#15214)	2023-03-27 14:46:30 -07:00
linux-multi-gpu-tensorrt-ci-pipeline.yml
linux-openvino-ci-pipeline.yml
linux-openvino-nightly-pipeline.yml
linux-qnn-ci-pipeline.yml	[QNN EP] Update QNN SDK to 2.8 (#14978)	2023-03-10 13:21:19 -08:00
mac-ci-pipeline.yml	Cjian/multi stage packaging pipeline (#14993)	2023-03-24 23:39:15 -07:00
mac-coreml-ci-pipeline.yml	Refactor all Mac build steps (#15440)	2023-04-11 12:12:46 +08:00
mac-ios-ci-pipeline.yml	Refactor all Mac build steps (#15440)	2023-04-11 12:12:46 +08:00
mac-ios-packaging-pipeline.yml	Add iOS Swift Package Manager support (#15297)	2023-04-20 16:18:35 +10:00
mac-objc-static-analysis-ci-pipeline.yml	Add iOS Swift Package Manager support (#15297)	2023-04-20 16:18:35 +10:00
mac-react-native-ci-pipeline.yml	Add compilation cache in react native CI (#15329)	2023-04-06 10:39:14 +08:00
npm-packaging-pipeline.yml	Disable XNNPack EP's tests in Windows CI pipeline (#15406)	2023-04-13 12:19:32 -07:00
orttraining-linux-ci-pipeline.yml	clear cache stat. after building (#15439)	2023-04-10 13:56:55 +08:00
orttraining-linux-external-custom-ops.yml	Update python 3.11 and remove 3.7 for Linux (#15214)	2023-03-27 14:46:30 -07:00
orttraining-linux-gpu-amd-e2e-test-ci-pipeline.yml
orttraining-linux-gpu-ci-pipeline.yml	Add support for cuda 11.8 and python 3.11 for training (#15548)	2023-04-20 12:56:45 -07:00
orttraining-linux-gpu-distributed-e2e-test-pipeline.yml
orttraining-linux-gpu-docker-release-pipeline.yml
orttraining-linux-gpu-ortmodule-distributed-test-ci-pipeline.yml	Add support for cuda 11.8 and python 3.11 for training (#15548)	2023-04-20 12:56:45 -07:00
orttraining-linux-gpu-ortmodule-test-clear-cache-pipeline.yml	Move Linux CPU pipelines to an AMD CPU pool which is cheaper (#15144)	2023-03-27 14:10:08 -07:00
orttraining-linux-gpu-training-apis.yml	Add support for cuda 11.8 and python 3.11 for training (#15548)	2023-04-20 12:56:45 -07:00
orttraining-linux-nightly-ortmodule-test-pipeline.yml	Update acpt image in the training pipeline (#14855)	2023-03-07 14:10:32 -08:00
orttraining-mac-ci-pipeline.yml	Cjian/multi stage packaging pipeline (#14993)	2023-03-24 23:39:15 -07:00
orttraining-pai-ci-pipeline.yml	clear cache stat. after building (#15439)	2023-04-10 13:56:55 +08:00
orttraining-py-packaging-pipeline-cpu.yml	Add support for cuda 11.8 and python 3.11 for training (#15548)	2023-04-20 12:56:45 -07:00
orttraining-py-packaging-pipeline-cuda.yml	Add support for cuda 11.8 and python 3.11 for training (#15548)	2023-04-20 12:56:45 -07:00
orttraining-py-packaging-pipeline-rocm.yml	[ROCm] fix python packaging pipeline and add python10 (#15282)	2023-03-31 10:25:21 +08:00
post-merge-jobs.yml	Disable XNNPack EP's tests in Windows CI pipeline (#15406)	2023-04-13 12:19:32 -07:00
py-package-build-pipeline.yml	Update python 3.11 and remove 3.7 for Linux (#15214)	2023-03-27 14:46:30 -07:00
py-package-test-pipeline.yml	Disable XNNPack EP's tests in Windows CI pipeline (#15406)	2023-04-13 12:19:32 -07:00
py-packaging-pipeline.yml	Update python 3.11 and remove 3.7 for Linux (#15214)	2023-03-27 14:46:30 -07:00
sign_ov_ep_binaries.yml
snpe-ep-nuget-packaging-pipeline.yml	Cjian/windows update python3.11 (#15243)	2023-03-28 22:15:47 -07:00
web-ci-pipeline.yml	Refactor web-ci pipeline and delete eager mode CI pipeline (#15416)	2023-04-10 10:41:04 -07:00
web-packaging-pipeline.yml
win-ci-fuzz-testing.yml	Cjian/windows update python3.11 (#15243)	2023-03-28 22:15:47 -07:00
win-ci-pipeline.yml	Disable XNNPack EP's tests in Windows CI pipeline (#15406)	2023-04-13 12:19:32 -07:00
win-gpu-ci-pipeline.yml	Move DML CI Pipeline to A10 (#15468)	2023-04-12 10:19:40 -07:00
win-gpu-reduce-op-ci-pipeline.yml	Cjian/windows update python3.11 (#15243)	2023-03-28 22:15:47 -07:00
win-gpu-tensorrt-ci-pipeline.yml	[TensorRT EP] avoid excessive library load/unload overhead when running unit tests. (#15639)	2023-04-24 14:43:13 -07:00
win-qnn-arm64-ci-pipeline.yml	[QNN EP] Update Windows ARM64 pipeline to use Visual Studio 2022 (#15607)	2023-04-21 09:31:10 -07:00
win-qnn-ci-pipeline.yml	Cjian/windows update python3.11 (#15243)	2023-03-28 22:15:47 -07:00