Commit graph

5158 commits

Author SHA1 Message Date
baijumeswani
6652d17dcd
Support lists as inputs to ORTModule (#8311) 2021-07-07 13:04:19 -07:00
Thiago Crepaldi
9a855fe9e7
Make Torch CPP extension build optional for packaging pipelines (#8305) 2021-07-07 07:24:58 -07:00
Tang, Cheng
d7c3703371
handle unsqueeze change in opset13 (#8308)
Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-06 22:30:24 -07:00
pengwa
2347a0aca8
Autograd Function Fallback bug fix - moe support (#8105)
* Support forward input orders like "Non_tensor/Tensor/Non_tensor". Correspondingly, support "None/Tensor_Grad/None" for backward outputs.

* Report RuntimeError when PythonOp detected but _enable_custom_autograd_function is enabled.

* Fix "PoliCheck ] - Defect : Term "hang", Component : orttraining\orttraining\python\training\ortmodule\__init__.py (1 issue)"

* rename call_convention->input_convention, input_tensor_requires_grads->input_requires_grads

* fix minor comment

* revert polycheck fix in case of conflict

* Update orttraining/orttraining/core/graph/training_op_defs.cc

Co-authored-by: Tim Harris <tiharr@microsoft.com>

* Apply suggestions from code review

Refine the schema description

Co-authored-by: Tim Harris <tiharr@microsoft.com>

* Resolve review comments

Co-authored-by: Tim Harris <tiharr@microsoft.com>
2021-07-07 08:58:01 +08:00
Nick Kreeger
40e5279f8f
Drop unused functions from math.h (#8304)
* Drop unused functions from math.h

* fix dnnl_conv.h
2021-07-06 19:18:18 -05:00
Nick Kreeger
62d1458ea8
Move kernel implementations outside of lookup table utility functions. (#8306) 2021-07-06 18:31:05 -05:00
baijumeswani
090bae21ab
Pinning pillow version to 8.2.0 to circumvent regression introduced by 8.3.0 (#8303) 2021-07-06 13:02:39 -07:00
Suffian Khan
008c5f7640
Use single builder image across Python versions for ROCm wheels (#8302)
* first attempt to share docker image across python and torch versions

* set dependency between jobs

* fix yaml grammar

* remove python version from first stage

* clean deepspeed directory

* split into two images according to torch version

* fix yaml syntax

* invalidate cache

* remove DS to prevent torch 1.9.0 upgrade
2021-07-06 11:56:00 -07:00
RandySheriffH
56e4dd1d3e
Fix optimizer crash (#8274) 2021-07-02 17:19:15 -07:00
Suffian Khan
e71846b029
fix ld_preload for rocm (#8290) 2021-07-02 17:15:28 -07:00
Suffian Khan
036eee5b66
register softmaxinternal with rocm (#8289) 2021-07-02 16:29:18 -07:00
Pranav Sharma
969eb545d1
Update issue template to ask users to check known issues to avoid repetition. (#8288) 2021-07-02 15:36:14 -07:00
Tiago Koji Castro Shibata
0fa9ac3648
Remove path from telemetry strings (#8281) 2021-07-02 10:49:59 -07:00
Nick Kreeger
552806f3be
Fix lambda function formatting in layer_norm.cc (#8276) 2021-07-02 12:30:16 -05:00
baijumeswani
2bda2a62fd
Pin version of Pillow to 8.2.0 to circumvent incompatibility with numpy (#8278) 2021-07-02 09:05:49 -07:00
Vincent Wang
88ec95ea96
Support OrtMemTypeCPUInput for ATenOp/ATenOpGrad (#8116) 2021-07-02 23:04:43 +08:00
Edward Chen
b42e7d2c78
Add iOS packaging pipeline (#8264)
Create a pipeline to produce the iOS package artifacts.
2021-07-02 06:21:59 -07:00
Tang, Cheng
a9a2394fa5
disable computation reduction optimization for non-gpu build (#8251)
* disable computation reduction optimization for non-gpu build

* fix comments in pr

* add cpu execution provider

* apply the core provider list to computation reduction optimizer

* try macro

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-01 16:43:51 -07:00
Vincent Wang
9cfe642b34
enable BN training in cpu inference build (#8269) 2021-07-01 13:15:59 -07:00
Tang, Cheng
996a98b3ac
fix the shared provider test for training build; expose more symbols to non cuda build (#8249)
* expose more symbols for non cuda build

* fix the test execution provider for training build

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-01 11:03:02 -07:00
Zuwei Zhao
b46310b349
Integrate onnxruntime-extensions into onnxruntime. (#8143)
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-07-01 09:34:03 -07:00
baijumeswani
f616cd07b4
Provide torch module interface for ORTModule (#8148)
* Interface for the module manager and implementation of the torch module manager
2021-07-01 09:15:16 -07:00
Vincent Wang
ce9d134952
gather elements optimization (#8154) 2021-07-01 14:30:00 +08:00
Vincent Wang
ef8f50c4ab
ScatterNDGrad (#8261) 2021-07-01 13:49:49 +08:00
Thiago Crepaldi
97f1eea2ea
Propagate ROCM version to onnxruntime wheel package (#8247) 2021-06-30 13:52:22 -07:00
Edward Chen
665ecdf9ce
[CoreML EP] Use partitioning utils in CoreMLExecutionProvider::GetCapability(). (#8179)
Use partitioning utils in CoreMLExecutionProvider::GetCapability().
2021-06-30 09:57:36 -07:00
Scott McKay
4993680e56
Graph::GetNodeProvidesGraphOutput -> NodeProducesGraphOutput (#8243)
'GetNode' is a little confusing as it returns a bool.

Update a couple more places where GetNodeOutputsInGraphOutputs was being used unnecessarily.
2021-06-30 20:43:33 +10:00
Scott McKay
b3479367cf
Add helper to check if node provides a graph output. (#8186)
* Add helper to check if node provides a graph output. The current approach unnecessarily creates a vector when most of the optimizers only care about a true/false response.

* Undo accidental change

* Fix a couple of issues due to copying from larger set of changes.
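
The trade-off described in this commit can be sketched as follows. This is a minimal illustration, not the ONNX Runtime code: `ToyNode`, `OutputsInGraphOutputs` and `ProducesGraphOutput` are hypothetical names, and a node is reduced to a list of output names. The first function builds a vector of matching indices; the second answers only yes/no and stops at the first match without allocating.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node: only the names of its outputs.
struct ToyNode {
  std::vector<std::string> outputs;
};

// Vector-building style: collects the index of every node output that is
// also a graph output. Allocates even if the caller only needs a bool.
std::vector<int> OutputsInGraphOutputs(const ToyNode& node,
                                       const std::vector<std::string>& graph_outputs) {
  std::vector<int> indices;
  for (int i = 0; i < static_cast<int>(node.outputs.size()); ++i) {
    if (std::find(graph_outputs.begin(), graph_outputs.end(), node.outputs[i]) !=
        graph_outputs.end()) {
      indices.push_back(i);
    }
  }
  return indices;
}

// Boolean style: returns as soon as one output matches, no allocation.
bool ProducesGraphOutput(const ToyNode& node,
                         const std::vector<std::string>& graph_outputs) {
  return std::any_of(node.outputs.begin(), node.outputs.end(),
                     [&](const std::string& name) {
                       return std::find(graph_outputs.begin(), graph_outputs.end(), name) !=
                              graph_outputs.end();
                     });
}
```

A caller that only branches on the result can use `ProducesGraphOutput(node, graph_outputs)` instead of testing whether the returned vector is empty.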
2021-06-30 12:15:42 +10:00
Scott McKay
17d4545ccb
Improve readability of Graph::PerformTopologicalSortAndCheckIsAcyclic. (#8187) 2021-06-30 12:15:17 +10:00
Guoyu Wang
9b19241b27
Disable update database for Android code coverage (#8182) 2021-06-29 18:50:16 -07:00
Ankur Verma
fa8768723a
Allow custom loaders for testing (#8150) 2021-06-29 16:54:36 -07:00
Nick Kreeger
507d97b200
Add initializer for embed layer norm unit tests. (#8196) 2021-06-29 17:57:06 -05:00
Pranav Sharma
9ec0fd6a1c
Revert the cuda algo finding change as this causes a significant memory bloat. (#8181)
* Revert the cuda algo finding change as this causes a significant memory bloat.

* Address PR comment
2021-06-28 22:49:36 -07:00
Thiago Crepaldi
83be3759bc
Add post-install command to build PyTorch CPP extensions from within onnxruntime package (#8027)
ORTModule requires two PyTorch CPP extensions that are currently JIT compiled. The runtime compilation can cause issues in environments that lack some build requirements or that run multiple instances of ORTModule in parallel.

This PR creates a custom command to compile these extensions; it must be executed manually before ORTModule is used for the first time. When users try to use ORTModule before the extensions are compiled, an error with instructions is raised.

PyTorch CPP Extensions for ORTModule can be compiled by running:
python -m onnxruntime.training.ortmodule.torch_cpp_extensions.install

A full build environment is needed for this.
2021-06-28 18:11:58 -07:00
Changming Sun
25db5706bb
Change "Export PyTorch CustomOp" build pipeline to use Ubuntu 20.04 (#8158)
Change "Export PyTorch CustomOp" build pipeline to use Ubuntu 20.04
2021-06-28 16:13:55 -07:00
RajalakshmiSR
32ceaf4532
POWER10: Optimized SGEMM in MLAS (#8121)
* POWER10: Optimized SGEMM in MLAS

This patch introduces a new optimized version of SGEMM in MLAS
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. It makes use of the new POWER10 compute instructions
for matrix multiplication.

* Adjust tabs in cmake

Changing tabs to spaces as per review comment.

* Adjust tabs in new sgemm file

Changing tabs to spaces in SgemmKernelPOWER10.cpp.

* Reusing functions using common header

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2021-06-28 14:41:08 -07:00
Changming Sun
9b75be3d3e
Fix a warning in pool.cc (#8168)
The warning is:
"Potential comparison of a constant with another constant. at D:\a_work\1\s\onnxruntime\core\providers\cuda\nn\pool.cc@167,21".

It was found by VS static code analyzer in our CUDA EP.
2021-06-28 07:58:02 -07:00
Nick Kreeger
821492f6f5
Drop std::count_if() in *EmbedLayerNorm Ops. (#8161)
* Drop std::count_if() in *EmbedLayerNorm Ops.

Profiling has shown that summing up the vector using the std function
can be 2x slower than just a simple plain vector sum loop.

* try and revert submodule commits

* ensure mask is 1.
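
For illustration only, the two styles compared in this commit might look like the sketch below. It assumes a mask stored as a vector of 32-bit integers and counts the non-zero entries; the function names are hypothetical and the actual EmbedLayerNorm code may differ. Both functions return the same value, and the commit's profiling claim concerns only their relative speed.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Counts non-zero mask entries with the standard algorithm.
std::int64_t CountWithStd(const std::vector<std::int32_t>& mask) {
  return static_cast<std::int64_t>(
      std::count_if(mask.begin(), mask.end(),
                    [](std::int32_t v) { return v != 0; }));
}

// Counts non-zero mask entries with a plain loop.
std::int64_t CountWithLoop(const std::vector<std::int32_t>& mask) {
  std::int64_t total = 0;
  for (std::int32_t v : mask) {
    if (v != 0) {
      ++total;
    }
  }
  return total;
}
```

Any speed difference depends on the compiler, flags and data, so it is worth measuring in the target build rather than assuming the 2x figure carries over.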
2021-06-28 08:36:02 -05:00
Pranav Sharma
523db6ef44
Check for null runoptions in Run (#8163) 2021-06-25 21:34:31 -07:00
Nick Kreeger
588511d6da
Rename embedlayernorm_op_test.cc to embed_layer_norm_op_test.cc (#8160)
* Rename embedlayernorm_op_test.cc to embed_layer_norm_op_test.cc

* cleanup
2021-06-25 21:53:50 -05:00
Nick Kreeger
800b62a139
Create a quantized EmbedLayerNorm for ORT. (#8124)
Create a quantized EmbedLayerNorm Op for ORT
2021-06-25 17:51:43 -05:00
liqunfu
9366114028
make pipelines to support torch1.8.1 and torch1.9.0 (#8084) 2021-06-25 14:55:49 -07:00
Changming Sun
c716b56f26
Update C++ Standard from 14 to 17 (#8041)
Switched the code to C++17. To build ONNX Runtime on old distros like CentOS 7, you need to install a newer GCC from additional repos. If you build onnxruntime with the newer GCC, the resulting binary typically can't be distributed elsewhere because it depends on the new GCC's runtime libraries, which the stock OS doesn't have. On RHEL/CentOS the situation is better: we build our code with Red Hat devtoolset 8/9/10 on CentOS 7. New library features (like std::filesystem) that do not exist in the old C++ runtime are statically linked into the applications, with some restrictions:

1. GCC has a dual ABI, but we can only use the old one. This means std::string is still copy-on-write and std::list::size() is still O(n). Also, if you build onnxruntime on CentOS 7 and link it with binaries that were built on CentOS 8 or Ubuntu with the new ABI and that export C++ symbols directly (instead of using a C API), then it won't work.

2. We still can't use std::optional. This limitation comes from macOS. We will solve it once we get macOS 11 build machines, which won't be too long.

3. Please avoid using C++17 in CUDA files (*.cu) and in the *.h files they include (like core/framework/float16.h), because CUDA 10.2 doesn't support C++17. You are welcome to use the new features in any *.cc files.
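
As a small aid for restriction 1, the sketch below reports which libstdc++ ABI a translation unit was compiled against. It relies on the `_GLIBCXX_USE_CXX11_ABI` macro that libstdc++ headers define; the function name `LibstdcxxAbi` is hypothetical and not part of ONNX Runtime. With other standard libraries the macro is absent and the function returns "unknown".

```cpp
#include <string>  // including any libstdc++ header defines _GLIBCXX_USE_CXX11_ABI

// Returns "new" for the C++11 ABI, "old" for the pre-C++11 ABI
// (copy-on-write std::string), or "unknown" when not built with libstdc++.
const char* LibstdcxxAbi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI ? "new" : "old";
#else
  return "unknown";
#endif
}
```

Comparing the value reported by two separately built components is a quick way to spot the ABI mismatch described above before it surfaces as a link or runtime error.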
2021-06-25 14:08:01 -07:00
Guoyu Wang
9618b6ba62
Fix mac shared_provider warning (#8153) 2021-06-25 13:25:28 -07:00
Changming Sun
a41d0db43c
Enable C# GPU tests in Windows GPU CI pipeline (#8142) 2021-06-25 08:11:45 -07:00
Chi Lo
91075255a7
Enable TRT provider option configuration for C# (updated version) (#7808)
* prepare for C# to configure provider options

* add c# code

* revert modification

* Add update provider info configuration in trt ep side

* fix bugs

* fix bug for compiler error C2259

* Add c# test

* fix bug

* fix bug

* Properly deal with string

* Add c# api for accepting trt provider options

* fix bug

* Modify C# test

* add shared lib test

* Add get provider options functionality

* clean up

* clean up

* fix bug

* fix bugs for CI

* Fix bugs for CI and documentation

* Move TRT EP provider options related functions out of C API

* revert

* fix bug

* refactor

* add check for provider options string

* code refactor

* fix CI bug

* Fix CI bugs

* clean up

* fix bug

* Fix bug for Post Analysis

* fix accidental bug

* Add API_IMPL_BEGIN/API_IMPL_END

* clean up

* code refactor

* code refactor

* fix CI fail

* fix bug

* use string append

* Change the code to better handle strncpy and string append
2021-06-25 03:21:22 -07:00
Ryan Hill
49938cce77
Fix Python Cuda loading issues (#7939) 2021-06-25 02:26:50 -07:00
Changming Sun
378a98597e
Use std::make_reverse_iterator directly 2021-06-24 15:29:39 -07:00
ashbhandare
00e44861c5
Fetching frontier tensors to frontend for ORTModule (#8086)
* Fetching frontier tensors to frontend

* Move before session initialize call
2021-06-24 15:04:35 -07:00
SilvanK4t1qbit
eb36258df4
Enable signed int8 data type for activations in static quantization (#7029)
* Add support for signed int8 static activation quantization. Make symmetrization in quantization switchable.
2021-06-24 14:42:22 -07:00