Commit graph

306 commits

Author SHA1 Message Date
liqunfu
4c862c73ed
for training to use new python package naming convention to explicitl… (#7204) 2021-04-13 16:19:42 -07:00
Yulong Wang
405ca49012
build ONNXRuntime into WebAssembly (#6478)
* Simplified version of WebAssembly support to keep most of existing data structures and add cmake using Ninja and emcmake

* Clean up CMakeLists.txt and add an example to create and compute a kernel

* Load a model from bytes and remove graph building steps

* Add all cpu and contrib ops with mlas library

* WebAssembly build with Onnxruntime C/CXX API

* Use protobuf cmakefile directory instead of adding every necessary source file

* Fix invalid output at example

* add missing files

* Change an example to use Teams model and support ort mobile format

* add API for javascript

* fix input releasing in _ort_run()

* update API

* Let onnxruntime cmake build WebAssembly with option '--wasm'

* allow one-step building for wasm

* Make build script work on Linux and MacOS

* Fix broken build from Windows command

* Enable unit test on building WebAssembly

* Resolve comments

* update build flags

* wasm conv improvement from: 1) GemmV; 2) Depthwise direct convolution 3x3; 3) Direct convolution 3x3

* Cleaned mlas unittest.

* use glob

* update comments

* Update baseline due to loss scale fix (#6948)

* fix stream sync issue (#6954)

* Enable type reduction in EyeLike, Mod, random.cc CPU kernels. (#6960)

* Update EyeLike CPU kernel.

* Update Mod CPU kernel.

* Update Multinomial CPU kernel.

* Slight improvement to Pad CPU kernel binary size.

* Update RandomNormal[Like], RandomUniform[Like] CPU kernels.

* Fix warning from setting multiple MSVC warning level options. (#6917)

Fix warning from setting multiple MSVC warning level options. Replace an existing /Wn flag instead of always appending a new one.

* MLAS: quantized GEMM update (#6916)

Various updates to the int8_t GEMMs:

1) Add ARM64 udot kernel to take advantage of dot product instructions available in newer cores. Some models run 4x faster than the stock implementation we used before.
2) Refactor the x64 kernels to share common code for AVX2(u8u8/u8s8/avxvnni) vs AVX512(u8u8/u8s8/avx512vnni) to reduce binary size.
3) Extend kernels to support per-column zero points for matrix B. This is not currently wired to an operator.

* Implement QLinearAveragePool with unit tests. (#6896)

Implement QLinearAveragePool with unit tests.

* Attention fusion detect num_heads and hidden_size automatically (#6920)

* fixed type to experimental session constructor (#6950)

* fixed type to experimental session constructor

Co-authored-by: David Medine <david.medine@brainproducts.com>

* Update onnxruntime_perf_test.exe to accept free dimension overrides (#6962)

Co-authored-by: Ori Levari <orlevari@microsoft.com>

* Fix possible fd leak in NNAPI (#6966)

* Release buffers for prepacked tensors (#6820)

Unsolved problems:

1. One test failure was caused by a bug in the cuDNN RNN kernels: they can allocate a buffer and only partially initialize it, and the garbage data near the tail of the buffer caused problems on some hardware. To attack this problem more broadly, should we add code to our allocators that, during a memory fuzzing test, fills an allocated buffer with garbage before returning it to the caller?


2. Prepacking is used more widely than we know. For instance, the cuDNN RNN kernels also cache their weights: they mix several weight tensors together into a single buffer and never touch the original weight tensors again. This is the same idea as pre-packing, but they didn't override the virtual function and never tried to release those weight tensors, which wastes memory. It also seems that some other kernels behave similarly. I wonder how much memory we could save by cleaning those up too.

3. Turning off memory pattern planning does increase memory fragmentation, leading to out-of-memory errors in some training test cases. Perhaps we can revisit the idea of moving the kernel-creation stage earlier, and then, during initializer deserialization, only avoid tracing those that will be prepacked.

* Enable type reduction for Range, ReverseSequence, ScatterND, Split, and Unique CPU kernels. (#6963)

* add CI

* fix test in ci

* fix flags for nsync in wasm build

* add copyright banner

* fix wasm source glob

* add missing exports

* resolve comments

* Perf gain from changing the packb width from 16 to 4 in GEMM for WASM.
Remove the direct conv from the previous perf tuning that is no longer needed.

* fix buildbreak introduced from latest master merge

* fix buildbreak in mlasi.h

* resolve all comments except MLAS

* rewrite the 3 packb-related functions for WASM_SCALAR separately rather than using #ifdef in each,
and make other changes according to PR feedback in mlas.

* More complete scalar path in sgemm from Tracy.

* Fix edge case handling in the 3x3 depthwise conv2d kernel, where:
  *) support input W==1 and H==1
  *) recalc in accurate pad_right and pad_bottom
  *) support hidden pad_right == 2 or pad_bottom == 2 when W == 1 or H==1 and no pad left/top

* Add more test coverage for conv depthwise from Tracy.
Fix one typo according to PR.

* resolve comments

* replace typedef by using

* do not use throw in OrtRun()

* output error message

Co-authored-by: Sunghoon <35605090+hanbitmyths@users.noreply.github.com>
Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>
Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Co-authored-by: David Medine <david.eric.medine@gmail.com>
Co-authored-by: David Medine <david.medine@brainproducts.com>
Co-authored-by: Ori Levari <ori.levari@microsoft.com>
Co-authored-by: Ori Levari <orlevari@microsoft.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
Co-authored-by: Chen Fu <chenfucs@gmail.com>
2021-04-06 16:18:10 -07:00
Changming Sun
2fcd69d644
Cleanup build.py (#7245) 2021-04-05 18:49:29 -07:00
Changming Sun
5bd192c439
Update ContribOperators.md (#7246) 2021-04-05 17:11:33 -07:00
Guoyu Wang
d500c5952b
Add Android AAR packaging script for ORT-Mobile (#7138)
* Add Android aar packaging script for ORT-Mobile

* Address CR comments
2021-03-30 18:42:18 -07:00
Ben Niu
d1acdd4f4b
Support building ARM64EC onnxruntime.dll (#6999) 2021-03-29 15:35:30 -07:00
Yufeng Li
c965878a69
fix a bug in global average pool and add unit test (#6913)
* fix bug in QGlobalAveragePool

* add unit test for quant GlobalAveragePool

* not run quantization tests if disable_contrib_ops enabled
2021-03-22 20:01:27 -07:00
Thiago Crepaldi
867804bea1
Add auto doc gen for ORTModule API during CI build (#7046)
In addition to ORTModule auto documentation during packaging, this PR also updates golden numbers to fix CI
2021-03-22 10:20:33 -07:00
Tianlei Wu
8a6f6bc38b
add --enable_cuda_line_info to build.py (#6773) 2021-02-22 22:00:21 -08:00
Edward Chen
ee35be0129
Support specifying globally allowed types from build script (#6677)
Add initial support for constraining operator kernel implementations (which support this type-granularity) to a set of allowed types from scripts.
2021-02-22 14:05:00 -08:00
Ivan Stojiljkovic
c91f314217
Add robust dependency check for Python package (#6436)
* Add robust dependency check for Python package

* Add version_info.py to .gitignore

* Fix Linux build

* Fix Windows CPU build

* Fix Windows 32-bit build

* Minor tweak

* Generate version_info.py earlier in onnxruntime_python.cmake

* Print a user-friendly message if cuDNN is not found in

* Relax version requirements for CUDA 11 - only the major version has to match

* Fix PATH environment variable to include CUDA 11 in 'Python packaging pipeline' (Windows/GPU)

* Fix the build with cuDNN 7
2021-02-21 15:11:28 -08:00
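The relaxed CUDA 11 rule from this commit ("only the major version has to match") can be sketched as below. This is a minimal illustration, not the package's actual check: the function name `cuda_version_matches` is hypothetical, and the stricter major.minor comparison for pre-11 releases is an assumption about the earlier behavior.

```python
def cuda_version_matches(built_with: str, installed: str) -> bool:
    """Compare the CUDA version a package was built with to the installed one.

    Hypothetical helper. For CUDA 11 only the major version is compared;
    for older releases this sketch assumes major.minor must match exactly.
    """
    built = built_with.split(".")
    found = installed.split(".")
    if built[0] == "11":
        return built[0] == found[0]
    return built[:2] == found[:2]


print(cuda_version_matches("11.0", "11.2"))
print(cuda_version_matches("10.2", "10.1"))
```

Under these assumptions the first call accepts a minor-version difference within CUDA 11, while the second rejects one within CUDA 10.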
liqunfu
2c5e603bad
Liqun/nuphar nuget (#6656)
create nuphar nuget with correct name
2021-02-17 16:13:07 -08:00
Scott McKay
33279250b5
Update a couple of usages of args.minimal_build to check for not specified vs empty list correctly. (#6688) 2021-02-16 14:46:51 +10:00
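The "not specified vs empty list" distinction in this commit is easy to get wrong with argparse. A minimal sketch, assuming `--minimal_build` is declared with `nargs="*"` and `default=None` (the exact declaration in build.py is not shown here, and `is_minimal_build` is a hypothetical helper):

```python
import argparse


def make_parser():
    parser = argparse.ArgumentParser()
    # Assumed declaration: zero or more values, None when the flag is absent.
    parser.add_argument("--minimal_build", nargs="*", default=None)
    return parser


def is_minimal_build(args):
    # A bare `--minimal_build` parses to [], which is falsy, so a plain
    # truthiness test would treat it like "flag absent". Compare with None.
    return args.minimal_build is not None


parser = make_parser()
absent = parser.parse_args([])
bare = parser.parse_args(["--minimal_build"])
with_value = parser.parse_args(["--minimal_build", "custom_ops"])
```

Here `absent.minimal_build` is `None`, `bare.minimal_build` is an empty list, and only the `is not None` test tells the two apart.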
Scott McKay
25f7c93504
Require explicit inclusion of custom op support in a minimal build (#6663)
* Remove support for custom ops from the base minimal build as they contribute too much binary growth to an Android build.
Add ability to explicitly enable custom op support in a minimal build.
Change one minimal build CI to test adding custom op support (unit tests are run in that build to validate)
2021-02-13 12:42:33 +10:00
Sheil Kumar
87cb6fd495
Add LearningModelBuilder to WinML Experimental Namespace along with various Audio operators (#6623)
* model building

* fix build

* winml adapter model building api

* model building

* make build

* make build again

* add model building with audio op

* inplace and inorder fft

* add ifft

* works!

* cleanup

* add comments

* switch to iterative rather than recursive and use parallelization

* batched parallelization

* fft->dft

* cleanup

* window functions

* add melweightmatrix op

* updates to make spectrogram test work

* push latest

* add onesided

* cleanup

* Clean up building apis and fix mel

* cleanup

* cleanup

* naive stft

* fix test output

* middle c complete

* 3 tones

* cleanup

* signal def new line

* Add save functionality

* Perf improvements, 10x improvement

* cleanup

* use bitreverse lookup table for performance

* implement constant initializers for tensors

* small changes

* add matmul tests

* merge issues

* support add attribute

* add tests for double data type windowfunctions and minor cleanup

* stft onesided/and not tests

* cleanup

* cleanup

* clean up

* cleanup

* remove threading attribute

* forward declare orttypeinfo

* warnings

* fwd declare

* fix warnings

* 1 more warning

* remove saving to e drive...

* cleanup and fix stft test

* add opset picker

* small additions

* add onnxruntime tests

* add signed/unsigned

* fix warning

* fix warning

* finish onnxruntime tests

* make windows namespace build succeed

* add experimental flag

* add experimental api into nuget package

* add experimental api build flag and add to windows ai nuget package

* turn experimental for tests

* add minimum opset version to new experimental domain

* api cleanup

* disable ms experimental ops test when --ms_experimental is not enabled

* add macro behind flag

* remove unused x

* pr feedback

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-02-12 14:17:10 -08:00
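One step in the commit above, "use bitreverse lookup table for performance", refers to a common trick in iterative FFT/DFT code: the input reordering needs each index with its bits reversed, and a precomputed per-byte table avoids doing that bit by bit. The sketch below illustrates the general technique in Python; it is not the WinML implementation (which is C++), and the names `BYTE_REVERSE` and `reverse_bits` are hypothetical.

```python
# Precompute the bit-reversal of every byte value once.
BYTE_REVERSE = [int(f"{b:08b}"[::-1], 2) for b in range(256)]


def reverse_bits(value, bit_count):
    """Reverse the lowest bit_count bits of value (1 <= bit_count <= 32)."""
    # Reverse the full 32-bit word one byte at a time via the table...
    reversed_word = (
        BYTE_REVERSE[value & 0xFF] << 24
        | BYTE_REVERSE[(value >> 8) & 0xFF] << 16
        | BYTE_REVERSE[(value >> 16) & 0xFF] << 8
        | BYTE_REVERSE[(value >> 24) & 0xFF]
    )
    # ...then drop the unused low bits so only bit_count bits remain.
    return reversed_word >> (32 - bit_count)


# Input ordering for an 8-point iterative transform.
permutation = [reverse_bits(i, 3) for i in range(8)]
print(permutation)  # [0, 4, 2, 6, 1, 5, 3, 7]
```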
Scott McKay
13d7db9a98
Don't update the excluded ops/types unless args.update is true. Updating the exclusion info triggers rebuilding of all kernels using type reduction. (#6604) 2021-02-09 07:15:31 +10:00
Edward Chen
2ef792ae6e
Don't resolve symlink in resolve_executable_path(). (#6540) 2021-02-04 12:32:03 -08:00
Cian Hayes
6fc5237d9e
Introduce --enable_training_ops build flag (#6523)
* minimal_build with training ops

* Removing redundant comment from an earlier attempt at a fix

* Fixing a bad merge conflict resolution

* Responding to PR feedback

* tweaking the makefiles based on feedback

* combining two enable_training blocks in CMakeLists.txt
2021-02-01 21:54:16 -08:00
suryasidd
1a5b75a554
[OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)
* Removed OpenVINO 2020.2 support

* Updated documentation and build.py

* Removed unnecessary libraries from setup.py
2021-01-28 23:00:41 -08:00
liqunfu
00afd00059
merge e2e with distributed pipeline (#6443)
merge e2e with distributed pipeline
2021-01-28 14:17:47 -08:00
Scott McKay
c84bb9df9f
Add ability to track per operator types in reduced build config. (#6428)
* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
  - Add python bindings for ORT format models
    - Add script to update bindings and help info
  - Add parsing of ORT format models
  - Add ability to enable type reduction to config generation
  - Update build.py to only allow operator/type reduction via config
    - simpler to require config to be generated first
    - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
  - Add script to create reduced build config
  - Update CIs
2021-01-29 07:59:51 +10:00
Guoyu Wang
c05adb1147
Initial version of CoreML EP (#6392) 2021-01-27 10:43:17 -08:00
liqunfu
6ed12402a4
Liqun/liqun/enable pipeline parallel test2 (#6399)
* enable data and pipeline parallelism test

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-25 15:15:26 -08:00
Yufeng Li
c20965f9b2
enable pipeline to run quantization tests (#6416)
* enable pipeline to run quantization tests
setup test pipeline for quantization
2021-01-25 09:33:08 -08:00
wezuo
5b6753ce27
Wezuo/memory analysis (#5658)
* merged alloc_plan

* pass compilation

* Start running, incorrect allocation memory info

* add in comments

* fix a bug of recording pattern too early.

* debugging lifetime

* fix lifetime

* passed mnist

* in process of visualization

* Add code to generate chrome trace for allocations.

* in process of collecting fragmentation

* before rebuild

* passed mnist

* passed bert tiny

* fix the inplace reuse

* fix the exception of weight in pinned memory

* add guards to ensure the tensor is in AllocPlan

* add customized profiling

* debugging

* debugging

* fix the reuse of different location type

* add rank

* add the rank

* add fragmentation

* add time_step_trace

* Add summary for each execution step (total bytes, used/free bytes).

* add top k

* change type of top k parameter

* remove prints

* change heap to set

* add the name pattern

* add the usage for pattern

* add partition

* change to static class

* add custom group

* remove const

* update memory_info

* in process of adding it as runtime config

* change the memory profiling to be an argument

* add some comments

* add checks to record memory_info in training session

* set the "local rank setting" to correct argument.

* addressing comments

* format adjustment

* formatting

* remove alloc_interval

* update memory_info.cc to skip session when there is no tensor for a particular memory type

* fix memory_info multiple iteration seg-fault

* consolidate mainz changes

* fixed some minor errors

* guard by ORT_MINIMAL_BUILD

* add ORT_MEMORY_PROFILE flag

* added compiler flag to turn on/off memory profiling related code

* clean up the code regarding comments

* add comments

* revoke the onnx version

* clean up the code to match master

* clean up the code to match master

* clean up the code to match master

Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>
2021-01-19 08:30:55 -08:00
Scott McKay
e54e2f969d
Use readelf for minimal build binary size checks. (#6338)
* Use readelf for minimal build binary size checks.
The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes.
The only downside is that the sum of the sections is larger than the on-disk size (presumably things get packed smaller on disk and some of the section alignment constraints can be ignored).

* Remove unused function
2021-01-15 07:46:02 +10:00
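The idea in this commit, summing ELF section sizes instead of relying on the on-disk file size, can be sketched as follows. This is an illustration under assumptions: it parses the one-line-per-section table printed by `readelf -S -W`, and the names `SECTION_ROW` and `section_sizes` are hypothetical rather than taken from the CI script.

```python
import re

# One row of `readelf -S -W` output: [Nr] Name Type Address Off Size ...
SECTION_ROW = re.compile(
    r"^\s*\[\s*\d+\]\s+(?P<name>\S+)\s+(?P<type>\S+)\s+"
    r"(?P<addr>[0-9a-fA-F]+)\s+(?P<off>[0-9a-fA-F]+)\s+(?P<size>[0-9a-fA-F]+)\s"
)


def section_sizes(readelf_output):
    """Map section name to size in bytes, skipping zero-sized rows."""
    sizes = {}
    for line in readelf_output.splitlines():
        match = SECTION_ROW.match(line)
        if not match:
            continue
        size = int(match.group("size"), 16)
        # The unnamed NULL section shifts the columns, but its size is 0,
        # so dropping zero-sized rows keeps it out of the result.
        if size:
            sizes[match.group("name")] = size
    return sizes


SAMPLE = """\
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000001000 001000 0001a0 00  AX  0   0 16
  [ 2] .rodata           PROGBITS        0000000000002000 002000 000040 00   A  0   0  8
"""

sizes = section_sizes(SAMPLE)
total_bytes = sum(sizes.values())
print(sizes, total_bytes)
```

In a real check the text would come from something like `subprocess.run(["readelf", "-S", "-W", path], capture_output=True, text=True, check=True).stdout`, and the total would be compared against a stored threshold.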
Edward Chen
042053c55e
Add support for running Android emulator from build.py on Windows. (#6317) 2021-01-13 19:21:49 -08:00
Alberto Magni
5623cc6d17
Use onnxruntime_USE_FULL_PROTOBUF=OFF for the cuda execution provider (#6340)
This removes a special case of the cuda EP.
2021-01-13 18:27:13 +00:00
Changming Sun
5084ce0969
Update nuget build (#6297)
1. Update the ProtoSrc path. The old one is not used anymore.
2. Regenerate OnnxMl.cs
3. Delete some unused code in tools/ci_build/build.py
4. Avoid setting intra_op_param.thread_pool_size in ModelTests in OpenMP build.
5. Fix a typo in the C API pipeline.
2021-01-11 10:49:05 -08:00
William Tambellini
39a988ce1c
Upgrade build.py to assert for python 3.6+
Upgrade build.py to assert for python 3.6+,
as python 3.5 can no longer build today's master.
2020-12-30 20:17:09 -08:00
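A version guard of the kind this commit describes might look like the sketch below. The helper name `is_supported_python` and the error message are hypothetical; only the 3.6 minimum comes from the commit message.

```python
import sys

MINIMUM_PYTHON = (3, 6)


def is_supported_python(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True when the interpreter is at least the minimum version."""
    # Compare only (major, minor); micro and release level are irrelevant.
    return tuple(version_info[:2]) >= minimum


if not is_supported_python():
    raise SystemExit(
        "Python {}.{}+ is required to run this script.".format(*MINIMUM_PYTHON)
    )
```

Comparing tuples rather than strings avoids the classic mistake where "3.10" sorts before "3.6".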
Changming Sun
1b23b28706
Remove MKLML/openblas/jemalloc build config (#6212) 2020-12-30 17:18:19 -08:00
Michael Goin
bbb6b416f0
Fix ImportError in build.py (#6231)
There is a possible ImportError where build.py can import the wrong 'util' package if there are others present in `sys.path` already
2020-12-30 14:22:55 -08:00
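The failure mode described here is a name collision: `import util` resolves to whichever `util` appears first on `sys.path`. One common remedy, shown as a sketch and not necessarily the fix this commit applied, is to put the script's own directory ahead of everything else before importing. The helper `prioritize_dir` is hypothetical.

```python
import os
import sys


def prioritize_dir(search_path, directory):
    """Return a copy of search_path with directory moved (or added) to the front."""
    directory = os.path.abspath(directory)
    # Drop any existing entry for the same directory so it is not listed twice.
    rest = [entry for entry in search_path if os.path.abspath(entry) != directory]
    return [directory] + rest


# Example: make the current directory win over any other `util` on the path.
example_path = prioritize_dir(sys.path, os.getcwd())
# In a script one would typically apply it in place before the import:
# sys.path[:] = prioritize_dir(sys.path, os.path.dirname(os.path.abspath(__file__)))
```

Note that reordering `sys.path` only affects modules that have not been imported yet; a `util` already cached in `sys.modules` would still be returned.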
Tixxx
32c67c2944
Deprecating Horovod and refactored Adasum computations (#5468)
deprecated horovod submodule
refactored adasum logic to be ort-native
added tests for native kernel and e2e tests
2020-12-17 16:21:33 -08:00
Edward Chen
64709b1335
Deprecate Python global configuration functions [Part 1] (#5923)
Enable options to be set via execution provider (EP)-specific options and log deprecation warning from current global configuration functions.
2020-12-15 11:32:43 -08:00
baijumeswani
dd2e5a1a05
state_dict and load_state_dict for ORTTrainer (#6095)
* add functions state_dict and load_state_dict to ORTTrainer

* unit tests for state_dict and load_state_dict for ORTTrainer
2020-12-14 11:55:52 -08:00
baijumeswani
523d187193
save data to and load data from an hdf5 file for checkpointing (#5975)
* save python dictionary to hdf5 representation and load an hdf5 file into a python dictionary

* unit tests for saving data to and loading data from hdf5 file
2020-12-08 11:40:57 -08:00
satyajandhyala
f68a256140
Android code coverage (#6061)
* Added Onnxruntime_GCOV_COVERAGE flag for Android.

* Set CMAKE_SYSTEM_NAME explicitly for Android.

* Added GCOV_PREFIX option to collect code coverage data.
Added a new python script to generate code coverage info.
Modified build pipeline to generate Android code coverage info.

* Added build command line option --android_coverage

* Added a comment describing the GCOV environment variables

* Fixed PEP8 issues.

* Added --android_coverage option to the build command.

* Increased Android emulator memory from 3K to 8K.

* Increased Android partition-size from 2GB to 4GB to overcome no-space-left-on-device error

* Removed source_dir from command line args.

* Use cwd absolute path to run tests.

* Added commands to output the contents of /data/local/tmp on the emulator.

* Added run_adb_shell function.

* Format changes.

* Removed keywd argument cwd.

* Removed Android in the --build_dir path.

* Removed commands added for debugging.

* Removed extra new-lines.

* Fix MacOs build pipeline failures by uninstalling openssl before running build script.

* Revert "Fix MacOs build pipeline failures by uninstalling openssl before running build script."

This reverts commit 90d0568fe533e9456c20d061a2d435c8fea48266.

* Change dir to the build directory where the tar file is copied.

* Changed the option from --android_coverage to --code_coverage

* Moved steps to generate Android code coverage to run_nnapi_code_coverage.sh

* Require --android option if --code_coverage is specified.

* No code coverage needed for onnx_test_runner.

* Expect that the emulator is running when the script is executed.

* Fixed the title in the buildpipeline step.

* Fixed the formatting issue.

* Added a command line argument, ORT_ROOT, to run_nnapi_code_coverage.sh script

Co-authored-by: Satya Jandhyala <satyajandhyala@Satyas-Mac-mini.local>
2020-12-08 10:55:02 -08:00
baijumeswani
2b35f7d4f6
Fix build.py bug which prevents running some unit tests (#5990)
Also ignore an exception that occurs for execution providers which generate compiled nodes
2020-12-03 08:57:55 -08:00
Guoyu Wang
6846c665ff
Use loose version in build.py (#5998) 2020-12-01 20:57:44 -08:00
Wenbing Li
2ec211ea7b
Support the cross compiling for Apple Silicon (#5974)
* support macos_arm64 cross compiling

* update the build docs

* update as commented.

* Update BUILD.md
2020-12-01 10:00:06 -08:00
Wenbing Li
1852ade75d
Enable the xcode build for Apple Silicon (arm64 MacOS) (#5924)
* fix the build script for macos/xcode

* add the version check

* correct the osx-arch configuration

* typo
2020-11-30 11:22:08 -08:00
Changming Sun
5fdd9f0fd2
Fix Python Linux GPU package name (#5943)
Fix Python Linux GPU package name. I accidentally added "noopenmp" to it.
2020-11-25 17:46:11 -08:00
Xueyun Zhu
58ea7b3572
temporarily disable test (#5868) 2020-11-23 15:18:37 -08:00
Ryan Hill
ba739a8000
Convert OpenVINO into a shared provider (#5778)
Same as Dnnl and TensorRT before it, now with more methods and more cleanup.
2020-11-20 17:39:57 -08:00
Edward Chen
bef06dac93
Automatically clean up build docker image cache. (#5843)
Follow up to #5811 to automate cleanup of the build docker image cache.
Added a script and build definition to clean up docker images that haven't been accessed recently.
2020-11-20 11:56:26 -08:00
S. Manohar Karlapalem
ff58f621fa
Remove nGraph Execution Provider (#5858)
* Remove nGraph Execution Provider

Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice

**Deprecation Notice**

| | |
| --- | --- |
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |

Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.

Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.

* Remove nGraph Licence info from ThirdPartyNotices.txt

* Use simple Test.Run() for tests without EP exclusions

To be consistent with rest of test code.

* Remove nGraph EP functions from Java code
2020-11-19 16:47:55 -08:00
Hariharan Seshadri
62508ef0e4
Revert "Remove MKLML build config (#5559)" (#5855) 2020-11-19 10:53:08 -08:00
Edward Chen
71e7c2b423
Cache build docker images in container registry. (#5811)
This PR adds infrastructure to automatically cache docker images used in CI builds in a container registry.

Currently, build images are pulled from a container registry for some builds and built every time for others. The container registry requires maintenance to keep the images up to date and building images every time wastes build agent resources.

With this change, a given build image can be looked up in a cache container registry and if present, pulled, and otherwise, built and pushed. The uniqueness of a build image is determined by a hash digest of the dockerfile, docker build context directory, and certain "docker build" options. This digest is part of the image tag in the cache container repository.

The cache container registry will need to be cleaned up periodically. This is not automated yet.
2020-11-17 17:02:24 -08:00
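The cache key described in this commit, a hash digest over the dockerfile, the build context directory, and certain "docker build" options, can be sketched as below. This is an illustration only: the function name `build_image_digest`, the choice of SHA-256, and the way inputs are fed to the hash are all assumptions, not the CI script's actual scheme.

```python
import hashlib
import tempfile
from pathlib import Path


def build_image_digest(dockerfile, context_dir, build_options):
    """Hash everything that can change the resulting image (hypothetical scheme)."""
    hasher = hashlib.sha256()
    hasher.update(dockerfile.read_bytes())
    # Walk the context in sorted order so the digest is deterministic.
    for path in sorted(p for p in context_dir.rglob("*") if p.is_file()):
        hasher.update(path.relative_to(context_dir).as_posix().encode())
        hasher.update(b"\0")
        hasher.update(path.read_bytes())
    for option in build_options:
        hasher.update(b"\0")
        hasher.update(option.encode())
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    context = Path(tmp)
    dockerfile = context / "Dockerfile"
    dockerfile.write_text("FROM ubuntu:20.04\n")
    digest_a = build_image_digest(dockerfile, context, ["--build-arg", "PYTHON_VERSION=3.8"])
    digest_a_again = build_image_digest(dockerfile, context, ["--build-arg", "PYTHON_VERSION=3.8"])
    digest_b = build_image_digest(dockerfile, context, ["--build-arg", "PYTHON_VERSION=3.9"])
```

The digest would then become part of the image tag, so an unchanged dockerfile, context, and option set maps to the same cached image, while any change produces a new tag and triggers a build and push.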
Scott McKay
7b76b57fc8
Support EPs that compile nodes in a minimal build. (#5776)
* Support EPs that compile nodes in a minimal build. This enables NNAPI being used.
2020-11-17 13:52:22 +10:00
Scott McKay
a3f3a63206
Move OpenVINO specific validation function to somewhere more sensible, and rename to provide context on its usage. (#5822) 2020-11-17 10:58:43 +10:00