Commit graph

5176 commits

Author SHA1 Message Date
Chen Fu
df4cb6f301
Adding pytorch cpuinfo as dependency (#8178)
The PyTorch cpuinfo library lets us query the current CPU's features, microarchitecture, cache sizes, etc. This information is needed for targeted performance optimizations.

Unfortunately, it does not work on Windows/ARM; we will need to develop our own solution for that platform later.
2021-07-12 14:21:12 -07:00
Sheil Kumar
eec8e1394a
Memory map files on windows to speed up model load (#8349)
* Memory map files on windows to speed up model load

* fix custom ops

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-07-12 11:52:08 -07:00
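The idea behind #8349 (memory-mapping model files instead of reading them into a buffer) can be sketched in Python with the standard `mmap` module. This is only an illustration of the technique under that assumption; the commit itself changes onnxruntime's C++ Windows code path, and `map_model_file` is a hypothetical helper, not an onnxruntime API.

```python
import mmap
import os


def map_model_file(path):
    """Map a model file read-only instead of copying it into a heap buffer.

    The OS pages bytes in when they are first accessed, so the whole file
    does not have to be read from disk before parsing can start.
    """
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            # A length of 0 means "map the whole file", which fails for empty files.
            raise ValueError(f"{path} is empty; nothing to map")
        # The mapping remains usable after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

A caller would slice the mapping like a `bytes` object (for example `mapped[:16]` to read a header) and call `mapped.close()` when the model has been parsed.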
Yufeng Li
f6956e0259
Refactor qgemm file (#8322)
This PR only extracts each kernel into a standalone file; there is no functional change. Specifically, it:

* leaves the MlasGemm function and thread handling in qgemm.cc
* puts the dispatcher functions and the template functions (interfaces) required to implement a kernel into qgemm.h
* puts each kernel implementation in a separate file, which implements/specializes the template functions MlasGemmU8X8FixupZeroPointB, MlasGemmU8X8CopyPackA, MlasGemmU8X8CopyPackB, and MlasGemmU8X8Kernel
* determines the files to be compiled in the CMake file
2021-07-12 10:13:20 -07:00
KeDengMS
b7c9696ac3
Symbolic_shape_infer fixes (#8280)
1. Add support for sequence ops: ConcatFromSequence, SequenceAt, SequenceInsert. The other sequence ops supported by ONNX worked once these were added, so there is no need to add all of them to symbolic_shape_infer.
2. For If nodes, the outputs of the two branches might have different shapes. In that case, use None for the differing dimension of a sequence output, and create a new symbolic dimension for a tensor output.
3. Fix a bug in Tile where the repeats input might have an unknown value.
4. The topological sort of graph nodes needs to consider the implicit inputs of subgraphs in If/Loop/Scan ops.
5. Generate a unique prefix for new dimensions inside subgraphs.
2021-07-09 19:14:26 -07:00
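Item 4 of #8280 (topological sort must account for implicit subgraph inputs) can be illustrated with Kahn's algorithm. This is a minimal sketch, assuming a simplified node representation (plain dicts with hypothetical `name`, `inputs`, `outputs`, and `implicit_inputs` keys) rather than ONNX protobuf objects; it is not the code in symbolic_shape_infer. The point is that a tensor read only inside an If/Loop/Scan body still creates a dependency on its producer.

```python
from collections import deque


def topological_sort(nodes, graph_inputs):
    """Order nodes so every producer precedes its consumers.

    Each node is a dict with "name", "inputs", "outputs" and an optional
    "implicit_inputs" list: tensors that a subgraph (If/Loop/Scan body)
    reads from the enclosing graph without listing them as explicit inputs.
    graph_inputs holds tensors that are already available (inputs, initializers).
    """
    producer = {}
    for idx, node in enumerate(nodes):
        for out in node["outputs"]:
            producer[out] = idx

    indegree = [0] * len(nodes)
    consumers = [[] for _ in nodes]
    for idx, node in enumerate(nodes):
        required = set(node["inputs"]) | set(node.get("implicit_inputs", []))
        for tensor in required:
            if tensor == "" or tensor in graph_inputs:
                continue  # omitted optional input, or already available
            src = producer.get(tensor)
            if src is None:
                raise ValueError(f"{node['name']}: no producer for {tensor!r}")
            indegree[idx] += 1
            consumers[src].append(idx)

    ready = deque(i for i, d in enumerate(indegree) if d == 0)
    order = []
    while ready:
        idx = ready.popleft()
        order.append(nodes[idx]["name"])
        for consumer in consumers[idx]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order
```

If the `implicit_inputs` entry of an If node is ignored, the If node can be scheduled before the node that produces a tensor its branch reads; including it forces the producer to come first.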
Guoyu Wang
10142f9510
Add metadata_props to ORT model (#8340)
* Add metadata_props to ORT model

* Minor update

* Update python binding, and increase the minimal pipeline size threshold

* Fixed a small bug in serializing ir_version

* Remove temp ort.py.fbs and add it to .gitignore
2021-07-09 11:28:27 -07:00
Changming Sun
60641a19e4
Add "/external:templates-" to VC++ flags (#8338) 2021-07-09 11:23:53 -07:00
Tang, Cheng
e467d78a11
fix a typo (#8334)
Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-09 09:24:43 -07:00
Tang, Cheng
598454bb5f
Fix the mixed precision handling for the square case (#8333)
* handle unsqueeze change in opset13

* fix the node arguments index check for square case (x * x)

* Revert "fix the node arguments index check for square case (x * x)"

This reverts commit c66344f0a82c35d8c24d31f2264cf7e9b235ce22.

* handle the square case (x * x) for node argument search

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-09 09:24:19 -07:00
Rachel Guo
187743726b
[CoreML EP] Add Int32<->Int64 handling around coreml ep (#8183)
* initial int32-int64 type handling

* initial

* clean and fix UT error

* modify code comments

* address partial pr comments

* minor update

* address pr comments

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2021-07-09 09:08:05 -07:00
Hariharan Seshadri
5369821ad6
Support SpaceDepth ops in the CUDA and ROCM EPs (#7960) 2021-07-09 01:00:22 -07:00
Scott McKay
1b2e1a7e0c
Refactor QDQ optimizers to enable future usage in minimal build (#8191)
* Add new transformer that can split node selection from node modification to allow just the modifications to be applied at runtime in a minimal build. This is the first step of a few to enable a QDQ model to be optimized for the NNAPI EP and/or the CPU EP at runtime in a mobile scenario.
Add generic and QDQ specific helpers for selection and modification.
Replace existing QDQ optimizers with optimizer based on new approach.
2021-07-09 16:11:43 +10:00
Hariharan Seshadri
46e5c8d4b9
Cosmetic change in test infrastructure (#8292) 2021-07-08 21:52:02 -07:00
pengwa
5454af4b95
decouple the shared python dependency (#8294)
* remove warning message for non-training build

* move to/from dlpack for onnxruntime_python back into python project
2021-07-09 11:47:11 +08:00
Dmitry Yutkin
067759b387
Fix bad URL to huggingface onnx-export example notebook 2021-07-08 15:01:46 -07:00
satyajandhyala
84bc20fe9d
Enable cast propagation with level one by default. (#8286) 2021-07-08 14:38:09 -07:00
RandySheriffH
f40df30219
Replace functions with secured version for OSX compliance (#7586)
* replace strlen with strnlen

* replace vsnprintf with vsnprintf_l

* add macro

* switch to std::numeric_limits

* apply uint16 max

* fix build err

* fix mac build

* define MAX_STR_LEN

* define MAX_STR_LEN

* fix typo

* trim empty lines

* apply constexpr

* fix typo

* add namespace

* fix build err

* rename global constant

Co-authored-by: Randy <Randy@randysmac.attlocal.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Randy <Randy@randysmac.local>
2021-07-08 11:02:36 -07:00
pengwa
6dbfb8db0e
autograd function fallback perf (#8312)
* fix known issues

* Update orttraining/orttraining/test/python/orttraining_test_ortmodule_autograd.py

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2021-07-09 00:29:40 +08:00
Edward Chen
c254c3c355
Fix issue with ONNX to ORT format model conversion script when given single model file as input. (#8323) 2021-07-07 14:08:47 -07:00
baijumeswani
6652d17dcd
Support lists as inputs to ORTModule (#8311) 2021-07-07 13:04:19 -07:00
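Accepting lists as module inputs (#8311) generally means flattening nested lists/tuples into a flat sequence of leaves and remembering the nesting so it can be rebuilt. The sketch below shows that generic technique with hypothetical helpers `flatten_inputs` and `unflatten_inputs`; it is an assumption about the approach, not ORTModule's actual implementation, and it treats any non-list, non-tuple value as a leaf.

```python
def flatten_inputs(value):
    """Flatten nested lists/tuples into a flat list of leaves plus a schema.

    Leaves (e.g. individual tensors) keep their order; the schema records
    enough structure to rebuild the original nesting later.
    """
    if isinstance(value, (list, tuple)):
        container = tuple if isinstance(value, tuple) else list
        flat, children = [], []
        for item in value:
            item_flat, item_schema = flatten_inputs(item)
            flat.extend(item_flat)
            children.append(item_schema)
        return flat, (container, children)
    return [value], None  # a single leaf


def unflatten_inputs(flat, schema):
    """Rebuild the nested structure produced by flatten_inputs."""
    leaves = iter(flat)

    def build(node_schema):
        if node_schema is None:
            return next(leaves)
        container, children = node_schema
        return container([build(child) for child in children])

    try:
        result = build(schema)
    except StopIteration:
        raise ValueError("fewer leaves than the schema expects") from None
    sentinel = object()
    if next(leaves, sentinel) is not sentinel:
        raise ValueError("more leaves than the schema expects")
    return result
```

The flat leaf list is what a graph with positional inputs can consume, while the schema lets outputs or gradients be mapped back to the caller's original structure.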
Thiago Crepaldi
9a855fe9e7
Make Torch CPP extension build optional for packaging pipelines (#8305) 2021-07-07 07:24:58 -07:00
Tang, Cheng
d7c3703371
handle unsqueeze change in opset13 (#8308)
Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-06 22:30:24 -07:00
pengwa
2347a0aca8
Autograd Function Fallback bug fix - moe support (#8105)
* Support forward input orderings like "Non_tensor/Tensor/Non_tensor". Correspondingly, support "None/Tensor_Grad/None" for backward outputs.

* Report RuntimeError when PythonOp detected but _enable_custom_autograd_function is enabled.

* Fix "PoliCheck ] - Defect : Term "hang", Component : orttraining\orttraining\python\training\ortmodule\__init__.py (1 issue)"

* rename call_convention->input_convention, input_tensor_requires_grads->input_requires_grads

* fix minor comment

* revert PoliCheck fix to avoid conflicts

* Update orttraining/orttraining/core/graph/training_op_defs.cc

Co-authored-by: Tim Harris <tiharr@microsoft.com>

* Apply suggestions from code review

Refine the schema description

Co-authored-by: Tim Harris <tiharr@microsoft.com>

* Resolve review comments

Co-authored-by: Tim Harris <tiharr@microsoft.com>
2021-07-07 08:58:01 +08:00
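The "None/Tensor_Grad/None" backward outputs from #8105 follow from a rule in `torch.autograd.Function`: `backward` returns one value per forward input, with None for inputs that have no gradient. The sketch below places tensor gradients back at tensor-input positions. The one-character-per-input encoding ("d" for a tensor input, "c" for a non-tensor input) and the helper name `align_backward_outputs` are assumptions made for this illustration, not a confirmed description of the PythonOp `input_convention` attribute.

```python
def align_backward_outputs(input_convention, tensor_grads):
    """Place tensor gradients back at the positions of tensor inputs.

    input_convention has one character per forward input: "d" for a tensor
    input and "c" for a non-tensor input (an assumed encoding for this sketch).
    Non-tensor inputs receive None as their gradient.
    """
    grads = list(tensor_grads)
    expected = input_convention.count("d")
    if len(grads) != expected:
        raise ValueError(f"expected {expected} tensor gradients, got {len(grads)}")
    it = iter(grads)
    outputs = []
    for kind in input_convention:
        if kind == "d":
            outputs.append(next(it))
        elif kind == "c":
            outputs.append(None)
        else:
            raise ValueError(f"unknown input kind {kind!r}")
    return tuple(outputs)
```

For a forward signature like (non-tensor, tensor, non-tensor), the convention "cdc" with a single gradient `g` yields `(None, g, None)`.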
Nick Kreeger
40e5279f8f
Drop unused functions from math.h (#8304)
* Drop unused functions from math.h

* fix dnnl_conv.h
2021-07-06 19:18:18 -05:00
Nick Kreeger
62d1458ea8
Move kernel implementations outside of lookup table utility functions. (#8306) 2021-07-06 18:31:05 -05:00
baijumeswani
090bae21ab
Pinning pillow version to 8.2.0 to circumvent regression introduced by 8.3.0 (#8303) 2021-07-06 13:02:39 -07:00
Suffian Khan
008c5f7640
Use single builder image across Python versions for ROCm wheels (#8302)
* first attempt to share Docker image across Python and torch versions

* set dependency between jobs

* fix YAML grammar

* remove python version from first stage

* clean DeepSpeed directory

* split into two images according to torch version

* fix yaml syntax

* invalidate cache

* remove DS to prevent torch 1.9.0 upgrade
2021-07-06 11:56:00 -07:00
RandySheriffH
56e4dd1d3e
Fix optimizer crash (#8274) 2021-07-02 17:19:15 -07:00
Suffian Khan
e71846b029
fix ld_preload for rocm (#8290) 2021-07-02 17:15:28 -07:00
Suffian Khan
036eee5b66
register softmaxinternal with rocm (#8289) 2021-07-02 16:29:18 -07:00
Pranav Sharma
969eb545d1
Update issue template to ask users to check known issues to avoid repetition. (#8288) 2021-07-02 15:36:14 -07:00
Tiago Koji Castro Shibata
0fa9ac3648
Remove path from telemetry strings (#8281) 2021-07-02 10:49:59 -07:00
Nick Kreeger
552806f3be
Fix lambda function formatting in layer_norm.cc (#8276) 2021-07-02 12:30:16 -05:00
baijumeswani
2bda2a62fd
Pin version of Pillow to 8.2.0 to circumvent incompatibility with numpy (#8278) 2021-07-02 09:05:49 -07:00
Vincent Wang
88ec95ea96
Support OrtMemTypeCPUInput for ATenOp/ATenOpGrad (#8116) 2021-07-02 23:04:43 +08:00
Edward Chen
b42e7d2c78
Add iOS packaging pipeline (#8264)
Create a pipeline to produce the iOS package artifacts.
2021-07-02 06:21:59 -07:00
Tang, Cheng
a9a2394fa5
disable computation reduction optimization for non-gpu build (#8251)
* disable computation reduction optimization for non-gpu build

* fix comments in pr

* add cpu execution provider

* apply the core provider list to computation reduction optimizer

* try macro

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-01 16:43:51 -07:00
Vincent Wang
9cfe642b34
enable BN training in cpu inference build (#8269) 2021-07-01 13:15:59 -07:00
Tang, Cheng
996a98b3ac
fix the shared provider test for training build; expose more symbols to non cuda build (#8249)
* expose more symbols for non cuda build

* fix the test execution provider for training build

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-07-01 11:03:02 -07:00
Zuwei Zhao
b46310b349
Integrate onnxruntime-extensions into onnxruntime. (#8143)
Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-07-01 09:34:03 -07:00
baijumeswani
f616cd07b4
Provide torch module interface for ORTModule (#8148)
* Interface for the module manager and implementation of the torch module manager
2021-07-01 09:15:16 -07:00
Vincent Wang
ce9d134952
gather elements optimization (#8154) 2021-07-01 14:30:00 +08:00
Vincent Wang
ef8f50c4ab
ScatterNDGrad (#8261) 2021-07-01 13:49:49 +08:00
Thiago Crepaldi
97f1eea2ea
Propagate ROCM version to onnxruntime wheel package (#8247) 2021-06-30 13:52:22 -07:00
Edward Chen
665ecdf9ce
[CoreML EP] Use partitioning utils in CoreMLExecutionProvider::GetCapability(). (#8179)
Use partitioning utils in CoreMLExecutionProvider::GetCapability().
2021-06-30 09:57:36 -07:00
Scott McKay
4993680e56
Graph::GetNodeProvidesGraphOutput -> NodeProducesGraphOutput (#8243)
'GetNode' is a little confusing as it returns a bool.

Update a couple more places where GetNodeOutputsInGraphOutputs was being used unnecessarily.
2021-06-30 20:43:33 +10:00
Scott McKay
b3479367cf
Add helper to check if node provides a graph output. (#8186)
* Add helper to check if node provides a graph output. The current approach unnecessarily creates a vector when most of the optimizers only care about a true/false response.

* Undo accidental change

* Fix a couple of issues due to copying from larger set of changes.
2021-06-30 12:15:42 +10:00
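The motivation in #8186 (optimizers usually need only a true/false answer, so building a vector of matching outputs is wasted work) can be shown with two small Python analogues. Both functions are hypothetical stand-ins for the C++ helpers named in these commits, written only to contrast the two approaches: one collects every match, the other short-circuits on the first.

```python
def node_outputs_in_graph_outputs(node_outputs, graph_outputs):
    """Collect every node output that is also a graph output (allocates a list)."""
    return [name for name in node_outputs if name in graph_outputs]


def node_produces_graph_output(node_outputs, graph_outputs):
    """Return True as soon as one node output is a graph output (no list built)."""
    return any(name in graph_outputs for name in node_outputs)
```

An optimizer deciding whether a node may be fused or removed would call the boolean form, for example skipping the node when `node_produces_graph_output(outputs, graph_output_names)` is True, and reserve the list form for the rare cases that need the actual names.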
Scott McKay
17d4545ccb
Improve readability of Graph::PerformTopologicalSortAndCheckIsAcyclic. (#8187) 2021-06-30 12:15:17 +10:00
Guoyu Wang
9b19241b27
Disable update database for Android code coverage (#8182) 2021-06-29 18:50:16 -07:00
Ankur Verma
fa8768723a
Allow custom loaders for testing (#8150) 2021-06-29 16:54:36 -07:00
Nick Kreeger
507d97b200
Add initializer for embed layer norm unit tests. (#8196) 2021-06-29 17:57:06 -05:00