Commit graph

29 commits

Author SHA1 Message Date
Yulong Wang
405ca49012
build ONNXRuntime into WebAssembly (#6478)
* Simplified version of WebAssembly support to keep most of existing data structures and add cmake using Ninja and emcmake

* Clean up CMakeLists.txt and add an example to create and compute a kernel

* Load a model from bytes and remove graph building steps

* Add all cpu and contrib ops with mlas library

* WebAssembly build with Onnxruntime C/CXX API

* Use protobuf cmakefile directory instead of adding every necessary source file

* Fix invalid output at example

* add missing files

* Change an example to use Teams model and support ort mobile format

* add API for javascript

* fix input releasing in _ort_run()

* update API

* Let onnxruntime cmake build WebAssembly with option '--wasm'

* allow one-step building for wasm

* Make build script work on Linux and macOS

* Fix broken build from Windows command

* Enable unit test on building WebAssembly

* Resolve comments

* update build flags

* wasm conv improvement from: 1) GemmV; 2) Depthwise direct convolution 3x3; 3) Direct convolution 3x3

* Cleaned mlas unittest.

* use glob

* update comments

* Update baseline due to loss scale fix (#6948)

* fix stream sync issue (#6954)

* Enable type reduction in EyeLike, Mod, random.cc CPU kernels. (#6960)

* Update EyeLike CPU kernel.

* Update Mod CPU kernel.

* Update Multinomial CPU kernel.

* Slight improvement to Pad CPU kernel binary size.

* Update RandomNormal[Like], RandomUniform[Like] CPU kernels.

* Fix warning from setting multiple MSVC warning level options. (#6917)

Fix warning from setting multiple MSVC warning level options. Replace an existing /Wn flag instead of always appending a new one.

* MLAS: quantized GEMM update (#6916)

Various updates to the int8_t GEMMs:

1) Add ARM64 udot kernel to take advantage of dot product instructions available in newer cores. Some models run 4x faster than the stock implementation we used before.
2) Refactor the x64 kernels to share common code for AVX2(u8u8/u8s8/avxvnni) vs AVX512(u8u8/u8s8/avx512vnni) to reduce binary size.
3) Extend kernels to support per-column zero points for matrix B. This is not currently wired to an operator.

* Implement QLinearAveragePool with unit tests. (#6896)

Implement QLinearAveragePool with unit tests.

* Attention fusion detect num_heads and hidden_size automatically (#6920)

* fixed type to experimental session constructor (#6950)

* fixed type to experimental session constructor

Co-authored-by: David Medine <david.medine@brainproducts.com>

* Update onnxruntime_perf_test.exe to accept free dimension overrides (#6962)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Fix possible fd leak in NNAPI (#6966)

* Release buffers for prepacked tensors (#6820)

Unsolved problems:

1. One test failure was caused by a bug in Cudnn rnn kernels: they can allocate a buffer and only partially initialize it, and the garbage data near the tail of the buffer caused problems on some hardware. To attack this problem in a broader sense, should we add code to our allocators so that, during a memory fuzzing test, an allocated buffer is filled with garbage before it is returned to the caller?


2. Prepacking is used more widely than we know. For instance, Cudnn rnn kernels also cache their weights. They mix several weight tensors together into a single buffer and never touch the original weight tensors again. This is the same idea as pre-packing, but they didn't override the virtual function and never tried to release those weight tensors, leading to memory waste. It also seems to me that some other kernels have similar behavior. I wonder how much memory we could save if we cleaned those up too.

3. Turning off memory pattern planning does increase memory fragmentation, leading to out-of-memory errors in some training test cases. Perhaps we can revisit the idea of moving the kernel-creation stage earlier, so that during initializer deserialization we only avoid tracing the initializers that will be prepacked.

* Enable type reduction for Range, ReverseSequence, ScatterND, Split, and Unique CPU kernels. (#6963)

* add CI

* fix test in ci

* fix flags for nsync in wasm build

* add copyright banner

* fix wasm source glob

* add missing exports

* resolve comments

* Perf gain by changing the packb width from 16 to 4 on GEMM for WASM.
Remove the direct conv from previous perf tuning that is no longer needed.

* fix buildbreak introduced from latest master merge

* fix buildbreak in mlasi.h

* resolve all comments except MLAS

* rewrite the 3 packb-related functions for WASM_SCALAR separately rather than using #ifdef in each,
and make other changes according to PR feedback in mlas.

* More complete scalar path in sgemm from Tracy.

* Fix edge case handling in depthwise conv2d kernel 3x3, where:
  *) support input W==1 and H==1
  *) recalc in accurate pad_right and pad_bottom
  *) support hidden pad_right == 2 or pad_bottom == 2 when W == 1 or H==1 and no pad left/top

* Add more test coverage for conv depthwise from Tracy.
Fix one typo according to PR.

* resolve comments

* replace typedef by using

* do not use throw in OrtRun()

* output error message

Co-authored-by: Sunghoon <35605090+hanbitmyths@users.noreply.github.com>
Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: David Medine <david.eric.medine@gmail.com>
Co-authored-by: David Medine <david.medine@brainproducts.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Ori Levari <orlevari@microsoft.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: Chen Fu <chenfucs@gmail.com>
2021-04-06 16:18:10 -07:00
Scott McKay
25f7c93504
Require explicit inclusion of custom op support in a minimal build (#6663)
* Remove support for custom ops from the base minimal build as they contribute too much binary growth to an Android build.
Add ability to explicitly enable custom op support in a minimal build.
Change one minimal build CI to test adding custom op support (unit tests are run in that build to validate)
2021-02-13 12:42:33 +10:00
Changming Sun
aa31ba5774
Merge CPU packaging pipelines (#6480)
1. Merge Nuget CPU pipeline, Java CPU pipeline, C-API pipeline into a single one.
2. Enable compile warnings for cuda files (*.cu) on Windows.
3. Enable static code analysis for the Windows builds in these jobs. For example, this is our first time scanning the JNI code.
4. Fix some warnings in the training code.
5. Enable code signing for Java, which was previously missed.
6. Update TPN.txt to remove Jemalloc.
2021-02-04 08:38:56 -08:00
Scott McKay
e1dc268e45
Add support for custom ops to minimal build. (#6228)
* Add support for custom ops to minimal build.
Cost is only ~8KB so including in base minimal build.
2021-01-25 10:41:00 +10:00
Edward Chen
9810b9e02b
Reduce amount of compiled CUDA device code (#6118)
Move CudaKernel from cuda_common.h to a new separate header, cuda_kernel.h. Update include sites to use cuda_kernel.h instead if they need CudaKernel. Inclusions of cuda_common.h are now more lightweight.

Make corresponding changes for ROCM execution provider code.

Other minor cleanup.
2020-12-14 15:27:40 -08:00
Ryan Hill
ba739a8000
Convert OpenVINO into a shared provider (#5778)
Same as Dnnl and TensorRT before it, now with more methods and more cleanup.
2020-11-20 17:39:57 -08:00
Scott McKay
7b76b57fc8
Support EPs that compile nodes in a minimal build. (#5776)
* Support EPs that compile nodes in a minimal build. This enables NNAPI to be used.
2020-11-17 13:52:22 +10:00
Ryan Hill
8fa427b264
Ryanunderhill/backout 5014 (#5167)
* Revert 5014
2020-09-14 22:48:00 -07:00
Ryan Hill
d792af776d
Remove Cuda dependency from TensorRT shared provider (#5014)
2020-09-04 11:35:02 -07:00
gwang-msft
7ca8388dc9
[ORT Mobile] file format schema and file I/O code (#4973)
* ort mobile file format schema and [de]serializing code
2020-09-01 11:51:31 +10:00
edgchen1
b41e5e88fb
Add more node debug dump functionality. (#4921)
Add ability to dump node inputs/outputs to files, filter nodes, configure behavior with environment variables.
2020-08-31 10:17:23 -07:00
Scott McKay
db7669b225
Reduce ONNX dependency in minimal build (#4890)
* Next round of changes.

Remove inclusion of ONNX schema header
Exclude custom registry related things
Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.
2020-08-23 07:02:13 +10:00
Scott McKay
e00ad83f2b
Initial changes to disable code in a minimal build (#4872)
* Initial set of changes to start disabling code in the minimal build. Breaking changes into multiple PRs so they're more easily reviewed. Focus on InferenceSession, Model and Graph here. SessionState will be next.
Needs to be integrated with de/serialization code before being testable so changes are all off by default.

Changes are limited to
  - #ifdef'ing out code
  - moving some things around so there are fewer #ifdef statements
  - moving definition of some one-line methods into the header so we don't need to #ifdef out in a .cc as well
  - exclude some things in the cmake setup

* Update session state and a few other places.

The core code builds if ORT_MINIMAL_BUILD is specified.
2020-08-22 07:14:53 +10:00
George Wu
f12e9de111
build fixes for https://github.com/microsoft/onnxruntime/pull/4721 (#4784)
* test

* test

* add missing CUDA header include

* debug

* fix

* fix python package for dnnl and tensorrt.

* fix

* fix windows build.

* revert

* target_link_directories for tensorrt shared lib.
2020-08-14 06:24:44 +08:00
Ryan Hill
ac725b53f6
Convert TensorRT provider into a shared library (#4721)
Lots of changes to shared library interfaces, new lighter weight design.
2020-08-10 21:17:16 -07:00
edgchen1
0ec90f7019
Put safeint_interface include directory into onnxruntime_common interface include directories to simplify usage by other targets. (#3546)
2020-04-16 10:34:32 -07:00
ytaous
2ce90cff4c
PR comments (#3374)
* PR comments

* PR comments

* PR comments

* PR comments

* PR comments

* PR comments

* PR comments

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-04-01 10:36:16 -07:00
Edward Chen
e542cfd0e0
Introduce training changes.
2020-03-11 14:39:03 -07:00
Scott McKay
a1db87b382
Add SafeInt bounds checking to memory allocation size calculations. (#3022)
* Add SafeInt bounds checking to memory allocation size calculations.

* Fix TensorRT library includes
2020-02-20 11:41:03 -08:00
Changming Sun
fc6773a65b
Add Tracelogging for profiling (#1639)
Enabled only if onnxruntime_ENABLE_INSTRUMENT is ON
2019-11-11 21:34:10 -08:00
Scott McKay
5c86889beb
Fix linux build issue with debug dump of shapes and data. (#2202)
Add option to dump just shapes or shapes and data.
2019-10-20 20:35:48 -07:00
Dmitri Smirnov
d1b1cdc5c4
Replace GSL with GSL-LITE submodule and fix up refs (#1920)
Remove gsl submodule and replace with a local copy of gsl-lite
  Refactor for onnxruntime::make_unique
  gsl::span size and index are now size_t
  Remove lambda auto argument type detection.
  Remove constexpr from fail_fast in gsl due to Linux not being happy.
  Comment out std::stream support due to MacOS std lib broken.
  Move make_unique into include/core/common so it is accessible for server builds.
  Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
  due to x86 build.
  Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
2019-10-01 12:43:29 -07:00
Scott McKay
c1a34a8ba6
Add ability to dump node input/output (#1202)
Address #1155

Add debug helper methods to be able to dump input name and shape information for node inputs, and the data from node outputs.

As the input data comes from graph inputs, initializers or node outputs we don't dump it.

Must be manually enabled by building with '--cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=ON'
2019-06-13 06:47:50 +10:00
Maik Riechert
ded7eeb033
make builds more robust (#906) (#932)
2019-04-29 12:58:20 -07:00
shschaefer
ff253631b5
Enable use of session based threadpool. (#854)
* Enable use of session based threadpool.

* Fix build dir issue
2019-04-18 10:20:46 -07:00
Ashwin Kumar
492d9fd6cc
Use Eigen ThreadPool in OnnxRuntime (#323)
* switch to nonblocking threadpool in inference session and session state

* switch to eigen threadpool - first draft

* refine

* refine

* add a switch to easily revert back to windows thread pool

* switch thread pool in test runner and turn on leak checker

* remove unnecessary files

* fix build error

* more build fixes

* catch exceptions in parallel executor

* fix mac build error

* fix mac build error

* more build fixes

* more mac build fixes

* fix cv issue

* change macro to include cuda compiler for disabled compiler warning

* try switching the macro to win32 only

* test #error

* move #disable warning to the top

* Update onnxruntime_framework.cmake

* move eigen include to public scope

* turn off eigenthreadpool by default and add todo comment
2019-01-15 15:19:30 -08:00
Changming Sun
5e113661a9
Build system upgrades (#281)
* update

* runas normal user
2019-01-07 13:15:24 -08:00
Pranav Sharma
7aef8a1cca
Sync with internal master.
2018-11-22 20:56:43 -08:00
Pranav Sharma
89618e8f1e
Initial bootstrap commit.
2018-11-19 16:48:22 -08:00