Commit graph

6 commits

Author SHA1 Message Date
Yulong Wang
080c67e900
[WebGPU] allow build WebGPU EP for WebAssembly (#23364)
### Description

This PR allows the WebGPU EP to be built with Emscripten for WebAssembly,
including:

- CMake build file updates to support correct setup for Emscripten.
- code changes to fix build breaks for wasm.
- a change in the Web CI pipeline to add a build-only target for wasm with
`--use_webgpu`.
2025-01-16 10:52:17 -08:00
Yulong Wang
a74817ab10
add missing build dependency for onnxruntime_providers_webgpu (#23324)
### Description

Fixes the build when the flag `--target onnxruntime_providers_webgpu` is
specified.

Otherwise the following error will occur:
```
  range.cc
D:\code\onnxruntime\build\Windows\Debug\_deps\onnx-src\onnx\onnx_pb.h(65,10): error C1083: Cannot open include file: 'onnx/onnx-ml.pb.h': No such file or directory [D:\code\onnxruntime\build\Windows\Debug\onnxruntime_providers_webgpu.vcxproj]
  (compiling source file '../../../onnxruntime/core/providers/webgpu/math/binary_elementwise_ops.cc')
```
2025-01-10 18:07:12 -08:00
Yulong Wang
8680244ebc
Fix delay load for WebGPU EP and DML EP (#23111)
### Description

This change fixes the DLL delay load problem for the WebGPU EP and
DirectML EP. See detailed explanation below.

### Problem

When onnxruntime.dll uses delay loading for its dependencies, the
dependencies are loaded using `LoadLibraryEx()`, which searches the
directory of the process (.exe) instead of the directory of this library
(onnxruntime.dll). This is a problem for the Node.js and Python bindings,
because Windows will try to find the dependencies in the directory of
node.exe or python.exe, which is not the directory of onnxruntime.dll.

There was a previous attempt to fix this by loading DirectML.dll during the
initialization of the onnxruntime Node.js binding. This works for the DML EP
but is not a good solution because it does not really "delay" the load.

For WebGPU, the situation is worse because webgpu_dawn.dll depends
on dxil.dll and dxcompiler.dll, which are explicitly dynamically loaded
in the code using `LoadLibraryA()`. This has the same DLL search
problem.

### Solutions

For onnxruntime.dll loading its direct dependencies, this can be resolved
by setting the [`__pfnDliNotifyHook2`
hook](https://learn.microsoft.com/en-us/cpp/build/reference/understanding-the-helper-function?view=msvc-170#structure-and-constant-definitions)
to load each dependency from an absolute path constructed from the
onnxruntime.dll folder and the DLL name.

For webgpu_dawn.dll loading dxil.dll and dxcompiler.dll, the hook does not
work because they are explicitly loaded in the code. Instead, this can
be resolved by ~~using WIN32 API `SetDllDirectory()` to add the
onnxruntime.dll folder to the search path.~~ preloading the 2 DLLs from
the onnxruntime.dll folder.
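
The hook-based approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `SiblingDllPath`, `ThisModulePath` and `DelayLoadHook` are hypothetical, error handling and long-path support are omitted, and the Windows-only part assumes the MSVC delay-load helper (`delayimp.h`).

```cpp
#include <filesystem>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <delayimp.h>
#endif

namespace fs = std::filesystem;

// Hypothetical helper (not from the PR): absolute path of `dll_name`
// located in the same folder as the module at `module_path`.
fs::path SiblingDllPath(const fs::path& module_path, const std::string& dll_name) {
  return module_path.parent_path() / dll_name;
}

#ifdef _WIN32
// Path of the module that contains this code (onnxruntime.dll in the
// scenario described above). Sketch only: return values are not checked.
static fs::path ThisModulePath() {
  HMODULE self = nullptr;
  GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                         GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                     reinterpret_cast<LPCWSTR>(&ThisModulePath), &self);
  wchar_t buffer[MAX_PATH] = {};
  GetModuleFileNameW(self, buffer, MAX_PATH);
  return fs::path(buffer);
}

// Delay-load notification hook: before a delay-loaded DLL is loaded,
// load it from the folder of this module instead of the default search path.
static FARPROC WINAPI DelayLoadHook(unsigned dliNotify, PDelayLoadInfo pdli) {
  if (dliNotify == dliNotePreLoadLibrary) {
    const fs::path target = SiblingDllPath(ThisModulePath(), pdli->szDll);
    // A non-null HMODULE tells the delay-load helper to use this module
    // rather than calling LoadLibrary itself.
    return reinterpret_cast<FARPROC>(
        LoadLibraryExW(target.c_str(), nullptr, LOAD_WITH_ALTERED_SEARCH_PATH));
  }
  return nullptr;  // fall back to the default behavior
}

// The delay-load helper looks up this symbol by name.
extern "C" const PfnDliHook __pfnDliNotifyHook2 = DelayLoadHook;
#endif
```

Under the same assumptions, preloading dxil.dll and dxcompiler.dll would amount to calling `LoadLibraryExW` on `SiblingDllPath(ThisModulePath(), "dxil.dll")` and its dxcompiler.dll counterpart before webgpu_dawn.dll requests them, so that the later `LoadLibraryA()` calls find the modules already loaded.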
2024-12-19 10:23:48 -08:00
Yulong Wang
3a0b958586
add 2 CMake build options of Dawn (#23096)
### Description

This change adds the following CMake build options for Dawn:
- onnxruntime_BUILD_DAWN_MONOLITHIC_LIBRARY
  - OFF by default
  - when enabled, builds Dawn as a monolithic library (webgpu_dawn.dll)
- onnxruntime_ENABLE_DAWN_BACKEND_VULKAN
  - OFF by default
  - when enabled, builds Dawn with the Vulkan backend on Windows
- onnxruntime_ENABLE_DAWN_BACKEND_D3D12
  - ON by default
  - when enabled, builds Dawn with the DirectX 12 backend on Windows



### File Size Comparison (Windows)

| Build | Command line | File size (bytes) |
|---|---|---|
| Baseline | --config Release<br/> --build_shared_lib | `12,755,456 onnxruntime.dll` |
| WebGPU D3D12 (default) | --use_webgpu<br/> --config Release<br/> --build_shared_lib | `17,082,368 dxcompiler.dll`<br/>`1,508,472 dxil.dll`<br/>`18,708,480 onnxruntime.dll` |
| WebGPU D3D12+Vulkan | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_ENABLE_DAWN_BACKEND_D3D12=1<br/> onnxruntime_ENABLE_DAWN_BACKEND_VULKAN=1 | `17,081,344 dxcompiler.dll`<br/>`1,508,472 dxil.dll`<br/>`19,388,416 onnxruntime.dll` |
| WebGPU Vulkan | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_ENABLE_DAWN_BACKEND_D3D12=0<br/> onnxruntime_ENABLE_DAWN_BACKEND_VULKAN=1 | `17,615,872 onnxruntime.dll` |
| Monolithic | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_BUILD_DAWN_MONOLITHIC_LIBRARY=1 | `17,082,368 dxcompiler.dll`<br/>`1,508,472 dxil.dll`<br/>`13,277,696 onnxruntime.dll`<br/>`5,616,640 webgpu_dawn.dll` |
| External Dawn | --use_webgpu<br/> --config Release<br/> --build_shared_lib<br/> --cmake_extra_defines<br/> onnxruntime_USE_EXTERNAL_DAWN=1<br/> --skip_tests | `17,081,344 dxcompiler.dll`<br/>`1,508,472 dxil.dll`<br/>`13,277,184 onnxruntime.dll` |
2024-12-13 16:05:48 -08:00
Yulong Wang
7a8fa12850
Add implementation of WebGPU EP (#22591)
### Description

This PR adds the actual implementation of the WebGPU EP based on
https://github.com/microsoft/onnxruntime/pull/22318.

This change includes the following:

<details>
<summary><b>core framework of WebGPU EP</b></summary>

  - WebGPU EP factory classes for:
    - handling WebGPU options
    - creating WebGPU EP instance
    - creating WebGPU context
  - WebGPU Execution Provider classes
    - GPU Buffer allocator
    - data transfer
  - Buffer management classes
    - Buffer Manager
    - BufferCacheManager
      - DisabledCacheManager
      - SimpleCacheManager
      - LazyReleaseCacheManager
      - BucketCacheManager
  - Program classes
    - Program (base)
    - Program Cache Key
    - Program Manager
  - Shader helper classes
    - Shader Helper
    - ShaderIndicesHelper
    - ShaderVariableHelper
  - Utils
    - GPU Query based profiler
    - compute context
    - string utils
  - Miscellaneous
    - basic WebGPU support in the Python binding
 
</details>

<details>
<summary><b>Kernel implementation</b></summary>


  - ai.onnx (default opset):
    - Elementwise (math): Abs, Neg, Floor, Ceil, Reciprocal, Sqrt, Exp, Erf,
      Log, Sin, Cos, Tan, Asin, Acos, Atan, Sinh, Cosh, Asinh, Acosh, Atanh,
      Tanh, Not, Cast
    - Elementwise (activation): Sigmoid, HardSigmoid, Clip, Elu, Relu,
      LeakyRelu, ThresholdedRelu, Gelu
    - Binary (math): Add, Sub, Mul, Div, Pow, Equal, Greater,
      GreaterOrEqual, Less, LessOrEqual
    - (Tensors): Shape, Reshape, Squeeze, Unsqueeze
    - Where
    - Transpose
    - Concat
    - Expand
    - Gather
    - Tile
    - Range
    - LayerNormalization
  - com.microsoft
    - FastGelu
    - MatMulNBits
    - MultiHeadAttention
    - RotaryEmbedding
    - SkipLayerNormalization
    - LayerNormalization
    - SimplifiedLayerNormalization
    - SkipSimplifiedLayerNormalization

</details>

<details>
<summary><b>Build, test and CI pipeline integration</b></summary>

  - build works for Windows, macOS and iOS
  - supports onnxruntime_test_all and the Python node tests
  - added a new unit test for the `--use_external_dawn` build flag
  - updated the macOS pipeline to build with WebGPU support
  - added a new pipeline for WebGPU Windows

</details>

This change does not include:

- Node.js binding support for WebGPU (will be a separate PR)
2024-10-29 18:29:40 -07:00
Yulong Wang
c5d28cac4d
Initial WebGPU EP checkin (#22318)
### Description

This change introduces the WebGPU EP into ONNX Runtime.

To keep the PR as simple as possible, this PR excludes the following:
- C API changes for WebGPU EP
- actual implementation of WebGPU EP. Currently in this PR, WebGPU is a
stub implementation that does not register any kernel.
- Python IO Binding update
- Node.js IO Binding update

This PR now contains only 43 file changes (while the working branch
contains 130+) and hopefully this makes it easier to review.

There will be separate PRs for each of the items mentioned above.

Current working branch: #21904
2024-10-08 16:10:46 -07:00