Commit graph

116 commits

Valery Chernov
b327e89efa
Standalone TVM Executor Provider (#10019)
* squashed commit for standalone tvm execution provider

* critical fix for correct python build with stvm ep

* get tuning log file from EP options; it takes priority over AUTOTVM_TUNING_LOG

* updates and fixes

* update parsing of stvm provider options

* add support for external data in ONNX models

* add conditional dump of subgraphs

* remove unused code

* get input tensor shapes through provider options; get output shapes for fixed inputs via the TVM API

* support the AutoTVM tuning log file inside ORT; the selector between Ansor and AutoTVM is the tuning_type provider option (see the sketch after this list)

* add fp16

* add conversion of the model layout to NHWC when needed; the necessary parameter was added to the STVM provider options

* fix license text in header. fix log format

* small fixes

* fix issues from flake8

* remove model proto construction from GetCapability

* reserve memory for vector of DLTensors

* add simple tutorial for STVM EP

* STVM docs

* jroesch/tvm -> apache/tvm

* remove dead code, unnecessary logs and comments

* fix in readme

* improve tutorial notebook

* tvm update

* update STVM_EP.md

* fix default value

* update STVM_EP.md

* some TODOs for future development

* shorten long lines

* add hyperlink to STVM_EP.md

* fix Linux CI error

* fix error in csharp test
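A minimal sketch of driving the options described in the bullets above from Python, assuming the standard providers/provider_options API; the EP name and the option keys (tuning_type, tuning_file_path, input_names/input_shapes, to_nhwc) are taken from the STVM_EP.md added in this PR and may differ between releases:

```python
# Hedged sketch: EP name and option keys are assumptions from STVM_EP.md.
import onnxruntime as ort

provider_options = [{
    "target": "llvm",                    # TVM compilation target
    "tuning_type": "Ansor",              # selector between Ansor and AutoTVM
    "tuning_file_path": "./tuning.log",  # takes priority over AUTOTVM_TUNING_LOG
    "input_names": "data",               # with fixed input shapes, output
    "input_shapes": "[1 3 224 224]",     # shapes are inferred via the TVM API
    "to_nhwc": "True",                   # convert model layout to NHWC
}]

sess = ort.InferenceSession(
    "model.onnx",
    providers=["TvmExecutionProvider"],
    provider_options=provider_options,
)
```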

Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2021-12-15 16:59:20 -08:00
Chi Lo
7242627fec
Integrate TensorRT into GPU Python package (#9785)
* add use_tensorrt build option

* Add use_tensorrt to running tests

* add use_tensorrt for Windows

* make TRT EP skip backend test

* make TRT EP skip backend test

* Fix bug

* Add/Modify description

* modify for debug

* switch pool to test

* modify to debug

* modify to debug

* add verbosity

* refine the code

* refine the code

* refine the code

* fix flake8 warning

* refine the code

* add pre_load check for TRT and add the CUPTI lib to CUDA dependencies

* modify script to make trt build path the same as cuda

* show an error message when the user wants to run TensorRT but TensorRT is not installed in the environment (see the sketch after this list)

* fix bug

* fix bug

* add trt lib for manylinux

* include cuda_dependencies for trt

* rewrite the condition to throw exception

* make code more compact
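Once TensorRT ships in the GPU wheel, a session can request it with CUDA fallback through the standard public API; a sketch (model path illustrative):

```python
# Check what this onnxruntime-gpu build provides before requesting TRT,
# mirroring the error-message behavior added in this PR.
import onnxruntime as ort

available = ort.get_available_providers()
if "TensorrtExecutionProvider" not in available:
    raise RuntimeError(f"No TensorRT support in this build: {available}")

sess = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider",  # preferred
               "CUDAExecutionProvider",      # fallback
               "CPUExecutionProvider"],
)
```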
2021-11-18 13:26:51 -08:00
Suffian Khan
b409cbe62c
Fix incorrect library reference in Python manylinux package for CUDA (#9769) 2021-11-16 13:40:17 -08:00
Guoyu Wang
5ad6dbb314
Remove experimental from ORT format namespace (#9729)
* schema change

* cc changes

* remove temp debug code

* Adding fbs namespace to session_state_flatbuffers_utils.h

* Add fbs namespace to all ort format utils
2021-11-11 19:46:30 -08:00
Suffian Khan
e6f0fdd653
Strip AMD libraries bundled with Python package due to libonnxruntime_providers_rocm.so change (#9679)
* remove AMD library dependence from libonnxruntime_providers_rocm.so

* fix flake error

* remove rocm dependency from original library as well
2021-11-11 09:32:09 -08:00
Weixing Zhang
e11fde0179
libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package. (#9618)
* libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-11-01 19:12:09 -07:00
pengwa
b125446f9c
Optimize python overhead of APEX amp (#9447)
* optimize python overhead of _post_amp_backward

* overwrite apex amp's zero_grad for faster implementation

* move unscale_fp16_grads_into_fp32_grads into C++ impl

* improve the efficiency further, reducing 3.5ms to 1.7ms for unilm.

* unilm 1.7ms to 338us: 1) optimize the Python list <==> std::vector copy; 2) launch the kernels as soon as num_elem reaches the threshold. This helps reduce CUDA idle time.

* refine the logic a bit after validating

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2021-10-26 13:13:49 +08:00
Changming Sun
406f1629c1
Remove Featurizers code (#9300) 2021-10-20 10:20:35 -07:00
Abhishek Jindal
87e726d1a0
Abjindal/merge eager with external custom ops (#8986)
* switching to pytorch nightly build

* adding eager mode

* enable pybind and remove install step

* removing auditwheel repair process

* installing package

* adding auditwheel back

* disabling auditwheel repair for eager mode

* typo correction
2021-10-14 13:19:45 -07:00
baijumeswani
bcdb411c8d
Implement FusedAdam for ORT adapted from DeepSpeed (#9266) 2021-10-05 20:50:34 -07:00
Thiago Crepaldi
ceb51dda4a
Support external torch cpp extensions on ORTModule (#9223) 2021-09-30 10:37:35 -04:00
Wei-Sheng Chin
1b0816859f
Only wrap sub-modules which can be wrapped as ORTModule (#9021) 2021-09-27 17:18:22 -07:00
Ryan Hill
b7971575f8
Fix python manylinux to not load cuda if it fails to load dependencies (#8882)
* Fix python manylinux to not load cuda if it fails to load dependencies
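The fix gates the CUDA EP on its native dependencies actually loading; a hedged sketch of that pre-load pattern (library names illustrative, not the exact code in this PR):

```python
# Only advertise the CUDA EP if its shared-library dependencies load.
import ctypes

def cuda_deps_loadable(libs=("libcublas.so.11", "libcudnn.so.8")):
    for lib in libs:
        try:
            ctypes.CDLL(lib)
        except OSError:
            return False  # a dependency is missing; skip the CUDA EP
    return True
```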
2021-09-07 11:09:25 -07:00
liqun Fu
f126a12699
decouple pytorch from onnxruntime training build (#8815) 2021-09-01 16:31:53 -07:00
pengwa
3eb08d4dc7
custom autograd func memory (#8901)
* remove PythonOpGrad control dependency && avoid segmentation fault

* comment alignment

* fix bugs
2021-09-01 09:29:26 +08:00
satyajandhyala
31926176ac
Support external custom operator schemas on Ubuntu (#8807)
* Expose symbols in onnx and protobuf namespaces in python when building with --enable_external_custom_op_schemas

* Add external onnx and protobuf files to wheel

* Added an example to demonstrate the external custom ops use-case (see the sketch after this list)

* Added a Linux build pipeline to test external custom ops
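With the symbols exposed, an external custom-op library can be registered from Python through the existing SessionOptions API; a sketch (the .so path and model are hypothetical):

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Hypothetical library built against a tree configured with
# --enable_external_custom_op_schemas.
so.register_custom_ops_library("./libcustom_op_schemas.so")
sess = ort.InferenceSession("model_with_custom_op.onnx", so,
                            providers=["CPUExecutionProvider"])
```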
2021-08-28 11:05:21 -07:00
Thiago Crepaldi
6f2f4721ec
Update Python setuptools classifiers to remove Windows and Mac (#8776) 2021-08-20 08:53:25 -07:00
liqun Fu
1a2b41dbbc
packaging pipeline produces -cpu- named packages due to a logical error (#8665) 2021-08-09 16:49:59 -07:00
Edward Chen
baf8c39a8d
Add Python checks pipeline (#7032)
This change adds a new pipeline for checking Python code. Currently this pipeline only runs flake8.
flake8 is also run as part of the CMake project builds, but we can switch over completely to the new pipeline later.
The .flake8 config file was also updated to make it easier to run standalone (flake8 --config ./.flake8), and some Python formatting issues were addressed in files that were not previously scanned.
2021-08-09 10:37:05 -07:00
liqun Fu
419fd5cc6e
reformat build suffix so that the latest is always correct (#8267) 2021-08-06 16:44:51 -07:00
liqun Fu
eab6c51413
to create a training cpu package for torch-ort documentation (#7845) 2021-08-05 16:43:37 -07:00
baijumeswani
816ad86d14
Configuring ORTModule - Internal Options (#8537) 2021-07-30 13:05:32 -07:00
Suffian Khan
e71846b029
fix ld_preload for rocm (#8290) 2021-07-02 17:15:28 -07:00
Thiago Crepaldi
97f1eea2ea
Propagate ROCM version to onnxruntime wheel package (#8247) 2021-06-30 13:52:22 -07:00
Thiago Crepaldi
83be3759bc
Add post-install command to build PyTorch CPP extensions from within onnxruntime package (#8027)
ORTModule requires two PyTorch CPP extensions that are currently JIT compiled. The runtime compilation can cause issues in environments lacking the build requirements, or in environments with multiple instances of ORTModule running in parallel.

This PR creates a custom command to compile such extensions; it must be manually executed before ORTModule is used for the first time. When users try to use ORTModule before the extensions are compiled, an error with instructions is raised.

PyTorch CPP Extensions for ORTModule can be compiled by running:
python -m onnxruntime.training.ortmodule.torch_cpp_extensions.install

A full build environment is needed for this.
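A sketch of the post-install flow described above, assuming onnxruntime-training and torch are installed in an environment with a full build toolchain:

```python
import subprocess, sys

# One-time compilation step from the PR description.
subprocess.check_call([
    sys.executable, "-m",
    "onnxruntime.training.ortmodule.torch_cpp_extensions.install",
])

import torch
from onnxruntime.training.ortmodule import ORTModule

model = ORTModule(torch.nn.Linear(10, 1))  # safe now that extensions exist
```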
2021-06-28 18:11:58 -07:00
liqunfu
9366114028
make pipelines to support torch1.8.1 and torch1.9.0 (#8084) 2021-06-25 14:55:49 -07:00
Changming Sun
1fa6986656
Change how the numpy version is handled. (#8130)
Numpy has binary compatibility, which means "binaries compiled against a given version of NumPy will still run correctly with newer NumPy versions, but not with older versions." So, if an ONNX Runtime package was built with numpy version A, then at run time it requires numpy version >= A. In this change, we read the numpy version from the installed packages at build time, to avoid having to manually keep the build-time and run-time versions consistent.
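A hedged sketch of that approach as it might look inside setup.py (not the exact code in this change):

```python
import numpy

# Binaries built against numpy version A run on >= A but not < A,
# so the numpy found at build time becomes the runtime floor.
install_requires = ["numpy>=" + numpy.__version__]
```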
2021-06-23 14:08:37 -07:00
Changming Sun
96989b83ee
Create python packages for DML (#8061) 2021-06-16 16:59:12 -07:00
Changming Sun
b854f2399d
Update manylinux build scripts and GPU CUDA version from 11.0 to 11.1 (#7632)
1. Update manylinux build scripts. This will add [PEP600](https://www.python.org/dev/peps/pep-0600/) (manylinux2 tags) support. numpy has adopted this new feature, and we should do the same. The old build script files were copied from https://github.com/pypa/manylinux, but they have been deleted and replaced in the upstream repo. The manylinux repo doesn't have a manylinux2014 branch anymore, so I'm removing the obsolete code and syncing the files with the latest master.
2. Update GPU CUDA version from 11.0 to 11.1 (after a discussion with PMs).
3. Delete tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda10_2. (Merged the content into tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11)
4. Modernize the cmake code for locating python devel files. This was suggested in https://github.com/onnx/onnx/pull/1631.
5. Remove `onnxruntime_MSVC_STATIC_RUNTIME` and `onnxruntime_GCC_STATIC_CPP_RUNTIME` build options. Now cmake has builtin support for it. Starting from cmake 3.15, we can use the `CMAKE_MSVC_RUNTIME_LIBRARY` cmake variable to choose which MSVC runtime library we want to use.
6. Update the Ubuntu docker images used in our CI build from Ubuntu 18.04 to Ubuntu 20.04.
7. Update the GCC version in CUDA 11.1 pipelines from 8.x to 9.3.1
8. Split the Linux GPU CI pipeline into two jobs: build the code on a CPU machine, then run the tests on separate GPU machines. In the past we didn't test our python packages; we only tested the pre-packed files, so we didn't catch the rpath issue in the CI build.
9. Add a CentOS machine pool and test our Linux GPU build on real CentOS machines.
10. Rework the ARM64 Linux GPU python packaging pipeline. Previously it used cross-compiling, so we had to statically link the C runtime. But now we have the pluggable EP API, and it doesn't support static linking. So I changed it to use qemu emulation instead. Now the build is 10x slower than before, but it is more extensible.
2021-06-02 23:36:49 -07:00
liqunfu
bed6e87cbd
add environment variable to control default training package's local version (#7849) 2021-05-26 22:44:20 -07:00
George Wu
1c6b6f696e
fixes for cuda centos/manylinux (#7830)
* fixes for cuda centos/manylinux

* remove providers_shared.so dep processing.
2021-05-25 19:38:59 -07:00
Scott McKay
c4f515d380
- Fix training cmake file so it builds if --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF is specified. (#7789)
- Fix check on cudart_versions when building on Windows to handle None being returned
2021-05-23 09:53:15 +10:00
Ryan Hill
c99aa3a3f3
Ryanunderhill/cuda shared (#7626)
* First iteration of making cuda a shared provider.
Separated out shared OpKernel change, so doing this to merge with that change.

* More cuda shared library refactoring

* More cuda shared library refactoring

* More build options tested, converted the training ops over.

* Fix merge breaks

* Fix submodules

* Fix submodules

* Fix submodules

* Fix python

* Fix compile errors

* Duplicate symbol fix

* Test fix for ROCM provider

* Another ROCM test workaround

* ROCM Build Test

* ROCM build fix

* ROCM

* ROCM

* ROCM

* ROCM

* ROCM

* ROCM test

* Reduce header dependencies

* Remove redundant namespace

* Test fix for linux

* Fix linux build

* Fix Eigen build error

* Fix unused parameter warning

* Test link error

* Another linker test

* Linker test

* Linker test

* Another test

* Another build test

* Fix linux link error

* Build test

* Fix control flow ops to use common base class with core code

* Remove extra qualifiers

* Fix template syntax for linux

* Fix cuda memory leak

* Fix pybind

* Test disabling cast

* Cleanup

* Restore cuda in test

* Remove more header dependencies

* Test not adding cuda provider to session

* Make GetProviderInfo_CUDA throw

* No-op cuda provider creation

* Fix some setup issues

* Fix memory cleanup on unload

* Diagnostics

* Don't unload library

* Add diagnostics

* Fix deleting registry at right time.

* Test disabling profiler

* Fix merge break

* Revert profiler change

* Move unloading of shared providers into Environment

* Free more global allocations before library unloads

* Add more diagnostics

* Move unloading back to the OrtEnv as there are multiple Environments created during a session.

Remove some library dependencies for tests.

* Fix more cmake files

* ERROR -> WARNING

* Fix python shutdown

* Test not using dml in pipeline

* Change python version and disable dml

* Update python version

* Test adding unload method for shared providers

* Disable DLL test

* Python test

* Revert "Python test"

This reverts commit c7ec2cfe98.

* Revert "Disable DLL test"

This reverts commit e901cb93aa.

* Revert "Test adding unload method for shared providers"

This reverts commit c427b78799.

* Point to RyanWinGPU

* Revert python version

* Fix id_to_allocator_map

* Another python exit test

* Remove extra debug messages
Try a cleaner python shutdown through DllMain

* Revert DllMain idea, it didn't work

* Merge conflicts

* Fix merge with master issues.

* Comments

* Undo edit to file

* Cleanup + new training ops

* Revert yml changes

* Fix another merge error

* ROCM fix

* ROCM fix v2

* Put back Linux hack, it is necessary

* Stupid fixes

* Fix submodule out of sync

* ROCM fix 3

* ROCM 4

* Test java fix

* Fix typos

* Java test on my VM

* Fix build error

* Spotless fix

* Leave temp file around to load properly

* Fix cleanup on exit

* Fix break

* Java comments

* Remove LongformerAttentionBase workaround

* Spotless fix

* Switch yml back to regular build pool

* Revert "Switch yml back to regular build pool"

This reverts commit be35fc2a5a.

* Code review feedback

* Fix errors due to merge

* Spotless fix

* Fix minimal build

* Java fix for non cuda case

* Java fix for CPU build

* Fix Nuphar?

* Fix nuphar 2

* Fix formatting

* Revert "Remove LongformerAttentionBase workaround"

This reverts commit 648679b370.

* Training fix

* Another java fix

* Formatting

* Formatting

* For orttraining

* Last orttraining build fix...

* training fixes

* Fix test provider error

* Missing pass command

* Removed in wrong spot

* Python typo

* Python typos

* Python crash on exit, possibly due to unloading of libraries.

* Remove test_execution_provider from training build
Only enable python atexit on windows
Remove assert on provider library exit

* Still can't unload providers in python, alas.

* Disable Nvtx temporarily

* MPI Kernels for Training

* MPI Kernels part 2

* Patch through INcclService

* Oops, wrong CMakeLists

* Missing namespace

* Fix missing ()

* Move INcclService::GetInstance around to link nicer

* Missing }

* Missing MPI libraries for Cuda

* Add extra GetType functions used by MPI

* Missing Nccl library

* Remove LOGS statements as a test

* Add in a couple more missing GetType methods

* Update comments

* Missed a logging reference in mpi_context.h

* Convert aten_op to shared (due to merge with master)

* Test moving DistributedRunContext instance into shared provider layer
(with purpose error to verify it's being built properly)

* Test passed, now with fix

* Missing static

* Oops, scope DistributedRunContext to just NCCL

* Merge related issues and code review feedback.

* Merge error

* Bump to rel-1.9.1 (#7684)

* Formatting

* Code review feedback for Java build on non Windows

* Remove cupti library dependency from core library

* Test Java pipeline fix

* Linux build fix

* Revert "Linux build fix"

This reverts commit a73a811516.

* Revert "Remove cupti library dependency from core library"

This reverts commit 6a889ee8bf.

* Packaging pipeline fixes to copy cuda shared provider for tensorrt & standard packages

* Add cuda to Tensorrt nuget package

* onnxruntime_common still has a cuda header dependency

Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
2021-05-20 07:53:47 -07:00
liqunfu
359fe1d197
Liqun/ort training version (#7620) 2021-05-14 09:54:19 -07:00
Adrian Tsai
70e67ddd2b
Update DirectML version to 1.5.1 and enable ARM/ARM64 builds with DML (#7511)
* Update DirectML to version 1.5.1
* Enable --use_dml with ARM and ARM64
* Add ARM/ARM64 binaries to nuget packages
2021-04-30 00:49:30 -07:00
Scott McKay
d6df5764d7
Android package infrastructure (#7430)
* Include ORT format model conversion scripts and infrastructure in ORT python package.
  - tweak existing script setup so it can be easily run directly and from the ORT python package
Add config file and readme for Android minimal build package
Update ORT Mobile doco
Disable warning if 'all' optimizations are enabled but NCHWc transformer is excluded (device specific optimizations don't apply in this scenario so the warning is moot).

* Address PR comments
2021-04-30 14:23:54 +10:00
liqunfu
4cbd2cce9b
. (#7466)
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-27 09:20:21 -07:00
Suffian Khan
7a3c1787af
Add CI pipeline to publish Python training package targeting Rocm (#7417)
* first attempt rocm training wheel

* modifications needed to python packaging pipeline for Rocm 4.1

* changes to not conflict with cuda

missed stage1 changes

remove package push

add option r to getopt

try again without python install

try again without python install

try again without python install

split pipelines and add back push to remote storage

try on cuda gpu pool

try again

try again

try running without az subscription set

try again on original pipeline

change pool

passing AMD Rocm whl on AMD-GPU pool

split rocm pipeline from cuda pipeline

remove comments

* try adding Rocm tests as well

* try with tests in place

* fix trailing ws

* add training data

* try again as root for tests

* use python3

* typo

* try to map video, render group into container

* try again

* try again

* try to avoid yum error code

* make UID 1001

* try without yum downgrade

* define rocm_version=None

* remove CUDA related comments for Rocm Dockerfile

* Don't pin nightly torch/torchvision/torchtext versions as they expire (for now nightly is required for Rocm 4.1)

* missed requirements-rocm.txt from last commit

* fix whitespace
2021-04-23 17:22:31 -07:00
liqunfu
75d8319286
Liqun/ort package name2 (#7337) 2021-04-13 20:36:24 -07:00
liqunfu
4c862c73ed
for training to use new python package naming convention to explicitl… (#7204) 2021-04-13 16:19:42 -07:00
jeyblu
61ba9ac1bb
matmul in dnnl (#7311)
* update dnnl to v2.2

* dnnl matmul
2021-04-12 08:03:03 -07:00
KeDengMS
6987106bf5
Add missing Python dependencies for ORT training (#7104)
* Add missing Python dependencies for training

cerberus - option parsing
h5py - checkpoint
onnx - model proto
packaging/sympy - symbolic shape inference

* Separate requirements.txt for inference and training Python packages.
2021-03-23 18:43:19 -07:00
Chi Lo
8c3b59a026
Quantization calibration refactor (#6893)
* Code refactor

* Modify code to tackle OOM when calibrating on a large dataset (see the sketch after this list)

* Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax

* Add COCO val 2017 annotation

* Fix mismatch issue when setting keepdims on ReduceMin/ReduceMax

* Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers"

* Check and install flatbuffers module

* Add script to download COCO dataset images and refactor example

* Fix bug of "No module named:onnxruntime.quantization.CalTableFlatBuffers"

* Add CalTableFlatBuffers as module

* Remove annotation; users can download it themselves.

* Uncomment code

* Add back instances_val2017.json

* Make sure flatbuffers installed when ORT is installed

* Refactor code to call coco api

* Enable FP16 for example
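As context for the OOM bullet above, a sketch of the calibration flow this refactor serves; the model and input names are hypothetical. Streaming batches one at a time from get_next() is what keeps memory bounded on large datasets:

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, quantize_static

class RandomReader(CalibrationDataReader):
    """Feeds calibration batches one at a time instead of all at once."""
    def __init__(self, n=8):
        self.batches = iter(
            {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(n))

    def get_next(self):
        return next(self.batches, None)  # None signals end of calibration

quantize_static("model_fp32.onnx", "model_int8.onnx", RandomReader())
```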
2021-03-19 01:09:11 -07:00
Faith Xu
72eb5de0e2
Add Python 3.9 to pypi metadata 2021-02-12 20:00:17 -08:00
suryasidd
1a5b75a554
[OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)
* Removed OpenVINO 2020.2 support

* Updated documentation and build.py

* Removed unnecessary libraries from setup.py
2021-01-28 23:00:41 -08:00
Faith Xu
7a0ab9c450
Update pypi package metadata (#6354)
* Update setup file data

* add missing comma

* remove python 3.5

* fix typo bracket
2021-01-27 19:27:37 -08:00
Tianlei Wu
ec81e29c84
Add longformer to python package (#6314)
* add longformer to python package
* move test related script and data to a new folder
2021-01-12 10:38:39 -08:00
S. Manohar Karlapalem
40926867c3
Add OpenVINO EP shared lib to Py Wheel (#5920)
* Add OpenVINO EP shared lib to Py Wheel

Include libonnxruntime_providers_openvino.so/.dll in the wheel

* Follow the libs.extend pattern used by other EPs
2020-11-24 21:27:13 -08:00
S. Manohar Karlapalem
ff58f621fa
Remove nGraph Execution Provider (#5858)
* Remove nGraph Execution Provider

Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice

**Deprecation Notice**

| Milestone | Date |
| --- | --- |
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |

Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.

Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.

* Remove nGraph Licence info from ThirdPartyNotices.txt

* Use simple Test.Run() for tests without EP exclusions

To be consistent with rest of test code.

* Remove nGraph EP functions from Java code
2020-11-19 16:47:55 -08:00
Tianlei Wu
c5d4ae0401
Add transformers tools to python package (#5090)
* Add transformers to onnxruntime python package
2020-09-10 15:42:15 -07:00