onnxruntime/js/web/script/test-runner-cli-args.ts
Yulong Wang 14cc02c65c
[js/web] WebGPU backend via JSEP (#14579)
### Description
This change introduced the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
  - Asynchronized inferencing execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
  - initial implementation of kernels:
    - elementwise operators (22)
    - binary operators (5)
    - tensor: Shape, Reshape, Transpose, Gemm
    - nn: Conv, {Global}Maxpool, {Global}AveragePool


Code need to be polished. still working on it.

## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNXRuntime
execution provider that specifically works on Web environment
(browsers). JSEP allows JavaScript code to kick in from various places
when ONNX Runtime inferences a model.

Why JSEP?
> JSEP is a hybrid mode EP that contains both C/C++ and
TypeScript/JavaScript implementation. There are 2 strong reasons why we
introduces JSEP:
> 1. the C/C++ part helps JSEP to leverage ONNX Runtime's capabilities
as much as possible including graph transformer, optimizers and also the
capabilities to fallback to CPU EP. TypeScript/JavaScript helps JSEP to
develop and debug much easier in the browser for the kernel
implementation.
> 2. the requirement of asynchronized execution from JavaScript API (eg.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronized context (see "async problem" section below). This is done
by using Emscripten's Asyncify.

What is WebGPU?
> WebGPU is the new GPU API that available in browser. It's one of the
only 2 APIs that currently available to access the GPU from browser (the
other is WebGL).
> WebGPU is designed with more advanced and stronger features comparing
to WebGL and is potentially solution that offer the best GPU performance
for model inferencing that currently available.

What is the async problem and why we have the problem?
> The "async problem" is a problem that you cannot call an async
function in a synchronous context. Think about the following C++ code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
> 
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
>   // how to implement?
> }
> ```
> The answer is, it's impossible to implement this function. Usually we
try to find a sync version API, or launch a thread to call the async
function and sync-wait on the main thread. Unfortunately, in browser
environment, neither is possible.
>
> WebGPU does not offer any synchronized API for data downloading (GPU
to CPU). This is the only operation that MUST be async. As `OrtRun()`
will eventually call into DataTransfer for copy data from GPU to CPU,
and `OrtRun()` is a synchronized function, this cannot be done in normal
way.

What is Emscripten? How is the Asyncify feature resolved the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to
compile ORT and generates the WebAssembly artifacts which runs on
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronized context. In short, it
generates code to unwind and rewind call stack to emulate async
execution. With this feature, we are able to call the async function
inside `OrtRun()` call.

## Design Overview

**Inter-op**

JSEP is doing pretty much same thing to just another EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
    Module.jsepBackend = backend;
    Module.jsepAlloc = alloc;
    Module.jsepFree = free;
    Module.jsepCopy = copy;
    Module.jsepCopyAsync = copyAsync;
    Module.jsepCreateKernel = createKernel;
    Module.jsepReleaseKernel = releaseKernel;
    Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all language barrier level
functions that requires by JSEP to achieve implementing kernels and data
transfers using JavaScript inside ONNX Runtime:
- `jsepBackend`: assign the singleton object to webassembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc()
and Free()
- `jsepCopy`: synchronized copy ( GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronized copy ( GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object
that maintained in JS to match lifecycle of Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this

The abstraction above allows to tie as little as possible connections
and dependencies between C/C++ and TypeScript/JavaScript.

**Resource Management**

Lifecycle of tensor data and kernels are managed by ORT(C/C++) but the
implementation are left to JavaScript. JavaScript code are responsible
to implement the callbacks correctly.

For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensot_data_id => GPUBuffer). GPU pipeline is managed as singleton.
Shaders are managed using a singletonmap (shader_key => gpu_program),
while shader_key is generated by cache_key (OP specific, including
attributes) and input shapes.

**about data transfer**
`js::DataTransfer::CopyTensor` implemented to call either synchronized
or asynchronized copy callback, depending on the destination is GPU or
not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function
to be called in the synchronized context.

**run kernel in JS**

Kernel class constructor calls once `jsepCreateKernel()` with an
optional per-kernel specific serialization to pass attributes into
JavaScript.

`Compute()` are implemented in a way that a metadata serialization is
performed in a base class and JavaScript code can access the data using
the Emscripten specific builtin macro `EM_ASM_*`.

**disabled features**
memory pattern is force disabled, because the WebGPU data is not
presented by a general memory model (a buffer can be represented by
offset + size).
concurrent run support is disabled. WebGPU is stateful and it also has
async function call. To support concurrent run will significantly
increase the complexity and we don't get any real benefit from it.

**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` in method
`GetPreferredLayout()`. This will let the graph transformers to
preprocess the graph into a channels last form so that a more optimized
WebGPU shader can be used.

**Testing code**
It's impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it has the kernel
registration which need to work together with the corresponding
JavaScript code. There are unit tests that run onnx models from
JavaScript API.

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-24 15:21:18 -07:00

460 lines
17 KiB
TypeScript

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
import minimist from 'minimist';
import npmlog from 'npmlog';
import {Env, InferenceSession} from 'onnxruntime-common';
import {Logger} from '../lib/onnxjs/instrument';
import {Test} from '../test/test-types';
/* eslint-disable max-len */
const HELP_MESSAGE = `
test-runner-cli
Run ONNX Runtime Web tests, models, benchmarks in different environments.
Usage:
test-runner-cli <mode> ... [options]
Modes:
suite0 Run all unittests, all operator tests and node model tests that described in suite test list
suite1 Run all operator tests and node model tests that described in suite test list
model Run a single model test
unittest Run all unittests
op Run a single operator test
Options:
*** General Options ***
-h, --help Print this message.
-d, --debug Specify to run test runner in debug mode.
Debug mode outputs verbose log for test runner, sets up environment debug flag, and keeps karma not to exit after tests completed.
-b=<...>, --backend=<...> Specify one or more backend(s) to run the test upon.
Backends can be one or more of the following, splitted by comma:
webgl
webgpu
wasm
xnnpack
-e=<...>, --env=<...> Specify the environment to run the test. Should be one of the following:
chrome (default)
edge (Windows only)
firefox
electron
safari (MacOS only)
node
bs (for BrowserStack tests)
-p, --profile Enable profiler.
Profiler will generate extra logs which include the information of events time consumption
-P[=<...>], --perf[=<...>] Generate performance number. Cannot be used with flag --debug.
This flag can be used with a number as value, specifying the total count of test cases to run. The test cases may be used multiple times. Default value is 10.
-c, --file-cache Enable file cache.
*** Session Options ***
-u=<...>, --optimized-model-file-path=<...> Specify whether to dump the optimized model.
-o=<...>, --graph-optimization-level=<...> Specify graph optimization level.
Default is 'all'. Valid values are 'disabled', 'basic', 'extended', 'all'.
*** Logging Options ***
--log-verbose=<...> Set log level to verbose
--log-info=<...> Set log level to info
--log-warning=<...> Set log level to warning
--log-error=<...> Set log level to error
The 4 flags above specify the logging configuration. Each flag allows to specify one or more category(s), splitted by comma. If use the flags without value, the log level will be applied to all category.
*** Backend Options ***
--wasm-number-threads Set the WebAssembly number of threads
--wasm-init-timeout Set the timeout for WebAssembly backend initialization, in milliseconds
--wasm-enable-simd Set whether to enable SIMD
--wasm-enable-proxy Set whether to enable proxy worker
--webgl-context-id Set the WebGL context ID (webgl/webgl2)
--webgl-matmul-max-batch-size Set the WebGL matmulMaxBatchSize
--webgl-texture-cache-mode Set the WebGL texture cache mode (initializerOnly/full)
--webgl-texture-pack-mode Set the WebGL texture pack mode (true/false)
--webgpu-profiling-mode Set the WebGPU profiling mode (off/default)
*** Browser Options ***
--no-sandbox This flag will be passed to Chrome.
Sometimes Chrome need this flag to work together with Karma.
Examples:
Run all suite0 tests:
> test-runner-cli suite0
Run single model test (test_relu) on WebAssembly backend
> test-runner-cli model test_relu --backend=wasm
Debug unittest
> test-runner-cli unittest --debug
Debug operator matmul, highlight verbose log from BaseGlContext and WebGLBackend
> test-runner-cli op matmul --backend=webgl --debug --log-verbose=BaseGlContext,WebGLBackend
Profile an ONNX model on WebGL backend
> test-runner-cli model <model_folder> --profile --backend=webgl
Run perf testing of an ONNX model on WebGL backend
> test-runner-cli model <model_folder> -b=webgl -P
`;
/* eslint-enable max-len */
export declare namespace TestRunnerCliArgs {
type Mode = 'suite0'|'suite1'|'model'|'unittest'|'op';
type Backend = 'cpu'|'webgl'|'webgpu'|'wasm'|'onnxruntime'|'xnnpack';
type Environment = 'chrome'|'edge'|'firefox'|'electron'|'safari'|'node'|'bs';
type BundleMode = 'prod'|'dev'|'perf';
}
export interface TestRunnerCliArgs {
debug: boolean;
mode: TestRunnerCliArgs.Mode;
/**
* The parameter that used when in mode 'model' or 'op', specifying the search string for the model or op test
*/
param?: string;
backends: [TestRunnerCliArgs.Backend];
env: TestRunnerCliArgs.Environment;
/**
* Bundle Mode
*
* this field affects the behavior of Karma and Webpack.
*
* For Karma, if flag '--bundle-mode' is not set, the default behavior is 'dev'
* For Webpack, if flag '--bundle-mode' is not set, the default behavior is 'prod'
*
* For running tests, the default mode is 'dev'. If flag '--perf' is set, the mode will be set to 'perf'.
*
* Mode | Output File | Main | Source Map | Webpack Config
* ------ | --------------------- | -------------------- | ------------------ | --------------
* prod | /dist/ort.min.js | /lib/index.ts | source-map | production
* node | /dist/ort-web.node.js | /lib/index.ts | source-map | production
* dev | /test/ort.dev.js | /test/test-main.ts | inline-source-map | development
* perf | /test/ort.perf.js | /test/test-main.ts | (none) | production
*/
bundleMode: TestRunnerCliArgs.BundleMode;
logConfig: Test.Config['log'];
/**
* Whether to enable InferenceSession's profiler
*/
profile: boolean;
/**
* Whether to enable file cache
*/
fileCache: boolean;
/**
* Specify the times that test cases to run
*/
times?: number;
/**
* whether to dump the optimized model
*/
optimizedModelFilePath?: string;
/**
* Specify graph optimization level
*/
graphOptimizationLevel: 'disabled'|'basic'|'extended'|'all';
cpuOptions?: InferenceSession.CpuExecutionProviderOption;
cudaOptions?: InferenceSession.CudaExecutionProviderOption;
cudaFlags?: Record<string, unknown>;
wasmOptions?: InferenceSession.WebAssemblyExecutionProviderOption;
webglOptions?: InferenceSession.WebGLExecutionProviderOption;
globalEnvFlags?: Env;
noSandbox?: boolean;
}
function parseBooleanArg(arg: unknown, defaultValue: boolean): boolean;
function parseBooleanArg(arg: unknown): boolean|undefined;
function parseBooleanArg(arg: unknown, defaultValue?: boolean): boolean|undefined {
if (typeof arg === 'undefined') {
return defaultValue;
}
if (typeof arg === 'boolean') {
return arg;
}
if (typeof arg === 'number') {
return arg !== 0;
}
if (typeof arg === 'string') {
if (arg.toLowerCase() === 'true') {
return true;
}
if (arg.toLowerCase() === 'false') {
return false;
}
}
throw new TypeError(`invalid boolean arg: ${arg}`);
}
function parseLogLevel<T>(arg: T) {
let v: string[]|boolean;
if (typeof arg === 'string') {
v = arg.split(',');
} else if (Array.isArray(arg)) {
v = [];
for (const e of arg) {
v.push(...e.split(','));
}
} else {
v = arg ? true : false;
}
return v;
}
function parseLogConfig(args: minimist.ParsedArgs) {
const config: Array<{category: string; config: Logger.Config}> = [];
const verbose = parseLogLevel(args['log-verbose']);
const info = parseLogLevel(args['log-info']);
const warning = parseLogLevel(args['log-warning']);
const error = parseLogLevel(args['log-error']);
if (typeof error === 'boolean' && error) {
config.push({category: '*', config: {minimalSeverity: 'error'}});
} else if (typeof warning === 'boolean' && warning) {
config.push({category: '*', config: {minimalSeverity: 'warning'}});
} else if (typeof info === 'boolean' && info) {
config.push({category: '*', config: {minimalSeverity: 'info'}});
} else if (typeof verbose === 'boolean' && verbose) {
config.push({category: '*', config: {minimalSeverity: 'verbose'}});
}
if (Array.isArray(error)) {
config.push(...error.map(i => ({category: i, config: {minimalSeverity: 'error' as Logger.Severity}})));
}
if (Array.isArray(warning)) {
config.push(...warning.map(i => ({category: i, config: {minimalSeverity: 'warning' as Logger.Severity}})));
}
if (Array.isArray(info)) {
config.push(...info.map(i => ({category: i, config: {minimalSeverity: 'info' as Logger.Severity}})));
}
if (Array.isArray(verbose)) {
config.push(...verbose.map(i => ({category: i, config: {minimalSeverity: 'verbose' as Logger.Severity}})));
}
return config;
}
function parseCpuOptions(_args: minimist.ParsedArgs): InferenceSession.CpuExecutionProviderOption {
return {name: 'cpu'};
}
function parseCpuFlags(_args: minimist.ParsedArgs): Record<string, unknown> {
return {};
}
function parseWasmOptions(_args: minimist.ParsedArgs): InferenceSession.WebAssemblyExecutionProviderOption {
return {name: 'wasm'};
}
function parseWasmFlags(args: minimist.ParsedArgs): Env.WebAssemblyFlags {
const numThreads = args['wasm-number-threads'];
if (typeof numThreads !== 'undefined' && typeof numThreads !== 'number') {
throw new Error('Flag "wasm-number-threads" must be a number value');
}
const initTimeout = args['wasm-init-timeout'];
if (typeof initTimeout !== 'undefined' && typeof initTimeout !== 'number') {
throw new Error('Flag "wasm-init-timeout" must be a number value');
}
let simd = args['wasm-enable-simd'];
if (simd === 'true') {
simd = true;
} else if (simd === 'false') {
simd = false;
} else if (typeof simd !== 'undefined' && typeof simd !== 'boolean') {
throw new Error('Flag "wasm-enable-simd" must be a boolean value');
}
let proxy = args['wasm-enable-proxy'];
if (proxy === 'true') {
proxy = true;
} else if (proxy === 'false') {
proxy = false;
} else if (typeof proxy !== 'undefined' && typeof proxy !== 'boolean') {
throw new Error('Flag "wasm-enable-proxy" must be a boolean value');
}
return {numThreads, initTimeout, simd, proxy};
}
function parseWebglOptions(_args: minimist.ParsedArgs): InferenceSession.WebGLExecutionProviderOption {
return {name: 'webgl'};
}
function parseWebglFlags(args: minimist.ParsedArgs): Env.WebGLFlags {
const contextId = args['webgl-context-id'];
if (contextId !== undefined && contextId !== 'webgl' && contextId !== 'webgl2') {
throw new Error('Flag "webgl-context-id" is invalid');
}
const matmulMaxBatchSize = args['webgl-matmul-max-batch-size'];
if (matmulMaxBatchSize !== undefined && typeof matmulMaxBatchSize !== 'number') {
throw new Error('Flag "webgl-matmul-max-batch-size" must be a number value');
}
const textureCacheMode = args['webgl-texture-cache-mode'];
if (textureCacheMode !== undefined && textureCacheMode !== 'initializerOnly' && textureCacheMode !== 'full') {
throw new Error('Flag "webgl-texture-cache-mode" is invalid');
}
const pack = args['webgl-texture-pack-mode'];
if (pack !== undefined && typeof pack !== 'boolean') {
throw new Error('Flag "webgl-texture-pack-mode" is invalid');
}
const async = args['webgl-async'];
if (async !== undefined && typeof async !== 'boolean') {
throw new Error('Flag "webgl-async" is invalid');
}
return {contextId, matmulMaxBatchSize, textureCacheMode, pack};
}
function parseWebgpuFlags(args: minimist.ParsedArgs): Env.WebGpuFlags {
const profilingMode = args['webgpu-profiling-mode'];
if (profilingMode !== undefined && profilingMode !== 'off' && profilingMode !== 'default') {
throw new Error('Flag "webgpu-profiling-mode" is invalid');
}
return {profilingMode};
}
function parseGlobalEnvFlags(args: minimist.ParsedArgs): Env {
const wasm = parseWasmFlags(args);
const webgl = parseWebglFlags(args);
const webgpu = parseWebgpuFlags(args);
const cpuFlags = parseCpuFlags(args);
return {webgl, wasm, webgpu, ...cpuFlags};
}
export function parseTestRunnerCliArgs(cmdlineArgs: string[]): TestRunnerCliArgs {
const args = minimist(cmdlineArgs);
if (args.help || args.h) {
console.log(HELP_MESSAGE);
process.exit();
}
// Option: -d, --debug
const debug = parseBooleanArg(args.debug || args.d, false);
if (debug) {
npmlog.level = 'verbose';
}
npmlog.verbose('TestRunnerCli.Init', 'Parsing commandline arguments...');
const mode = args._.length === 0 ? 'suite0' : args._[0];
// Option: -e=<...>, --env=<...>
const envArg = args.env || args.e;
const env = (typeof envArg !== 'string') ? 'chrome' : envArg;
if (['chrome', 'edge', 'firefox', 'electron', 'safari', 'node', 'bs'].indexOf(env) === -1) {
throw new Error(`not supported env ${env}`);
}
// Option: -b=<...>, --backend=<...>
const browserBackends = ['webgl', 'webgpu', 'wasm', 'xnnpack'];
// TODO: remove this when Chrome support WebGPU.
// we need this for now because Chrome does not support webgpu yet,
// and ChromeCanary is not in CI.
const defaultBrowserBackends = ['webgl', /* 'webgpu', */ 'wasm', 'xnnpack'];
const nodejsBackends = ['cpu', 'wasm'];
const backendArgs = args.backend || args.b;
const backend = (typeof backendArgs !== 'string') ? (env === 'node' ? nodejsBackends : defaultBrowserBackends) :
backendArgs.split(',');
for (const b of backend) {
if ((env !== 'node' && browserBackends.indexOf(b) === -1) || (env === 'node' && nodejsBackends.indexOf(b) === -1)) {
throw new Error(`backend ${b} is not supported in env ${env}`);
}
}
const globalEnvFlags = parseGlobalEnvFlags(args);
// Options:
// --log-verbose=<...>
// --log-info=<...>
// --log-warning=<...>
// --log-error=<...>
const logConfig = parseLogConfig(args);
globalEnvFlags.logLevel = logConfig[0]?.config.minimalSeverity;
// Option: -p, --profile
const profile = (args.profile || args.p) ? true : false;
if (profile) {
logConfig.push({category: 'Profiler.session', config: {minimalSeverity: 'verbose'}});
logConfig.push({category: 'Profiler.node', config: {minimalSeverity: 'verbose'}});
logConfig.push({category: 'Profiler.op', config: {minimalSeverity: 'verbose'}});
logConfig.push({category: 'Profiler.backend', config: {minimalSeverity: 'verbose'}});
globalEnvFlags.logLevel = 'verbose';
}
// Option: -P[=<...>], --perf[=<...>]
const perfArg = (args.perf || args.P);
const perf = perfArg ? true : false;
const times = (typeof perfArg === 'number') ? perfArg : 10;
if (debug && perf) {
throw new Error('Flag "perf" cannot be used together with flag "debug".');
}
if (perf && (mode !== 'model')) {
throw new Error('Flag "perf" can only be used in mode "model".');
}
if (perf) {
logConfig.push({category: 'TestRunner.Perf', config: {minimalSeverity: 'verbose'}});
}
// Option: -u, --optimized-model-file-path
const optimizedModelFilePath = args['optimized-model-file-path'] || args.u || undefined;
if (typeof optimizedModelFilePath !== 'undefined' && typeof optimizedModelFilePath !== 'string') {
throw new Error('Flag "optimized-model-file-path" need to be either empty or a valid file path.');
}
// Option: -o, --graph-optimization-level
const graphOptimizationLevel = args['graph-optimization-level'] || args.o || 'all';
if (typeof graphOptimizationLevel !== 'string' ||
['disabled', 'basic', 'extended', 'all'].indexOf(graphOptimizationLevel) === -1) {
throw new Error(`graph optimization level is invalid: ${graphOptimizationLevel}`);
}
// Option: -c, --file-cache
const fileCache = parseBooleanArg(args['file-cache'] || args.c, false);
const cpuOptions = parseCpuOptions(args);
const wasmOptions = parseWasmOptions(args);
const webglOptions = parseWebglOptions(args);
// Option: --no-sandbox
const noSandbox = !!args['no-sandbox'];
npmlog.verbose('TestRunnerCli.Init', ` Mode: ${mode}`);
npmlog.verbose('TestRunnerCli.Init', ` Env: ${env}`);
npmlog.verbose('TestRunnerCli.Init', ` Debug: ${debug}`);
npmlog.verbose('TestRunnerCli.Init', ` Backend: ${backend}`);
npmlog.verbose('TestRunnerCli.Init', 'Parsing commandline arguments... DONE');
return {
debug,
mode: mode as TestRunnerCliArgs['mode'],
param: args._.length > 1 ? args._[1] : undefined,
backends: backend as TestRunnerCliArgs['backends'],
bundleMode: perf ? 'perf' : 'dev',
env: env as TestRunnerCliArgs['env'],
logConfig,
profile,
times: perf ? times : undefined,
optimizedModelFilePath,
graphOptimizationLevel: graphOptimizationLevel as TestRunnerCliArgs['graphOptimizationLevel'],
fileCache,
cpuOptions,
webglOptions,
wasmOptions,
globalEnvFlags,
noSandbox
};
}