Commit graph

30 commits

KeDengMS
4b900dc585 Simplify cache implementation and avoid static variables that may carry over between models 2019-12-20 21:04:17 -08:00
Changming Sun
da03ed4473 Tiny fix to codegen 2019-12-20 21:04:17 -08:00
KeDengMS
9017e93701 [NupharEP] fix for Windows build and VS 2019 (#2694) 2019-12-18 16:16:46 -08:00
Yang Chen
2ca9733cee
Dump subgraph ID and fused graph ID (#2607)
* Dump subgraph ID and fused graph ID

Dump subgraph ID and fused graph ID for better debugging

* Remove local static fused_count

Added a global_fused_count_ field to the NupharExecutionProvider class
2019-12-10 19:56:39 -08:00
KeDengMS
60208463a9
[NupharEP] Enable parallel schedule (#2505)
* [NupharEP] Enable parallel schedule
* Update TVM with the fix to TVM threadpool to use OpenMP if possible
* Add parallel schedule when trying to vectorize
With this change, BERT squad perf on a 4-core (8 HT) CPU goes from 187ms to 150ms

* Address CR, docs and cmake update

* Doc fix

* Fix mkl

* Fix TVM windows build when using mklml
2019-11-28 08:35:56 -08:00
Yang Chen
d486481455
Correctly handle implicit inputs for fused nodes (#2390)
* Correctly handle implicit inputs for fused nodes

Previously, Nuphar's partitioning function didn't include a node's implicit
inputs in the inputs list of MetaDef, which triggered a crash in the ONNX
graph checker.

This commit fixed the issue. It also fixed a related issue where we
didn't add implicit inputs into graph_inputs_excluding_initializers_
in Graph::SetGraphInputsOutputs.

The issue was that graph_inputs_including_initializers_, populated by
SetInputs (e.g. called by FunctionImpl::FunctionImpl), may contain implicit
inputs that did not appear among any node's explicit inputs in the graph.
Because they were not referenced by any node's explicit inputs, these
implicit inputs couldn't be discovered by walking all nodes' inputs.
Consequently, they would *not* be added into graph_inputs_excluding_initializers_.

We fixed the issue by first copying the populated graph_inputs_including_initializers_
into graph_inputs_excluding_initializers_, so that the list initially contained
both initializers and non-initializers. We then erase the initializers from the
list. In this way, we ensure that all implicit inputs remain in
graph_inputs_excluding_initializers_.
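The copy-then-erase approach described above can be sketched roughly as follows. This is a minimal sketch with hypothetical names, operating on plain strings rather than the NodeArg lists the real Graph code uses:

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Start from every graph input (initializers included), then drop the
// initializers. Implicit inputs that no node references explicitly
// still survive in the "excluding initializers" list, because they were
// copied in up front rather than discovered by walking node inputs.
std::vector<std::string> InputsExcludingInitializers(
    const std::vector<std::string>& inputs_including_initializers,
    const std::unordered_set<std::string>& initializer_names) {
  std::vector<std::string> result = inputs_including_initializers;  // copy first
  result.erase(std::remove_if(result.begin(), result.end(),
                              [&](const std::string& name) {
                                return initializer_names.count(name) > 0;
                              }),
               result.end());  // then erase the initializers
  return result;
}
```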

* refined comments and fixed duplicates

Address CR by revisiting comments in terms of implicit inputs

Also fixed an issue by skipping duplicates while copying inputs
from graph_inputs_including_initializers_.

* address CR

explain why we need to collect nodes' implicit inputs

* don't rely on pointer values for iterating std::set

Previously, openvino relied on iterating a set of NodeArg pointers
to construct inputs and outputs for a fused graph. It could cause
non-determinism. The reason was that although iterating std::set by
itself is stable, pointer values of NodeArgs may vary. Consequently,
we could end up visiting the set's elements in different orders for
different runs for the same test, which resulted in constructing
inputs (and outputs) with different orders to the fused graph.
For example, for the same test, we may have inputs [A, B] in some
runs but inputs [B, A] in others.

Let's use std::string as the key type to avoid such nondeterminism.

This commit also added implicit inputs into meta->inputs while returning
the capability from the openvino provider.

* Fixed another latent issue in openvino's GetCapability function

The issue was that we couldn't simply erase fused_inputs and fused_outputs
while iterating the nodes. For example, an output NodeArg may have multiple
uses, and it's wrong if we erase it from fused_outputs when we encounter only
one of its uses as input.
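The nondeterminism described above can be illustrated with a minimal sketch. NodeArg here is a hypothetical stand-in with only a name: a `std::set` of pointers iterates in pointer-value order, which depends on where the allocator placed each object, while keying the set by `std::string` gives a stable order across runs.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for NodeArg; only the name matters here.
struct NodeArg { std::string name; };

// Collect input names through a std::set keyed by string, so iteration
// order is determined by string comparison, not by pointer values.
std::vector<std::string> OrderedInputNames(const std::vector<NodeArg*>& args) {
  std::set<std::string> names;
  for (const NodeArg* arg : args) names.insert(arg->name);
  return std::vector<std::string>(names.begin(), names.end());
}
```

A `std::set<NodeArg*>` would instead compare the pointers themselves, so the same inputs could come out as [A, B] in one run and [B, A] in another.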
2019-11-21 10:27:09 -08:00
baowenlei
9b7b5e2c27
Adjust codegen vectorization width from target (#2439)
* Adjust codegen vectorization width from target
2019-11-20 13:28:15 -08:00
baowenlei
5ab7041fa7 fix cross compile bug (#2415) 2019-11-16 01:32:57 -08:00
KeDengMS
b15e43a541
[NupharEP] Multiple optimizations (#2380)
Fuse transpose into MatMul
Implement Pow and constant scalar simplification
Vectorize ReduceMean
Improve symbolic shape inference
Minor updates for better debugging in fused function name
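The constant-scalar Pow simplification above can be sketched as follows. This is a hypothetical helper for intuition only; the real pass rewrites TVM expressions, not doubles. When the exponent is a known small non-negative integer constant, pow(x, c) can be lowered to repeated multiplication, which vectorizes more readily than a libm call:

```cpp
// Lower pow(x, c) for a small non-negative integer constant c into
// repeated multiplication, e.g. pow(x, 2) becomes x * x.
double SimplifiedPow(double x, int c) {
  double result = 1.0;
  for (int i = 0; i < c; ++i) result *= x;
  return result;
}
```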
2019-11-14 10:40:33 -08:00
Dmitri Smirnov
25b3c51661
Introduce PrimitiveType into a Type System along with an integer constant (#2307)
Improve perf by avoiding GetType<T>() calls. Introduce MLTypeCallDispatcher to switch on input type. Add a fast Tensor::IsType<T>() method.
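A rough sketch of the IsType<T>() idea; the PrimType values and Tensor struct here are hypothetical, not the actual onnxruntime definitions. Associating an integer constant with each primitive type turns the type check into a single integer comparison instead of a type-object lookup:

```cpp
#include <cstdint>

// Hypothetical per-type integer constants.
enum class PrimType : int32_t { kFloat = 1, kInt64 = 7 };

// Map a C++ type to its constant at compile time.
template <typename T> struct TypeId;
template <> struct TypeId<float>   { static constexpr PrimType value = PrimType::kFloat; };
template <> struct TypeId<int64_t> { static constexpr PrimType value = PrimType::kInt64; };

struct Tensor {
  PrimType type;
  // The check compiles down to comparing two integers.
  template <typename T>
  bool IsType() const { return type == TypeId<T>::value; }
};
```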
2019-11-08 17:47:06 -08:00
baowenlei
0f1e24f4a9 [NupharEP] tensorize int8 GEMM for avx (#2142)
* finish avx tensorization and save state

* split tests for better debug

* add missing avx option

* update configure for AVX

* update tensorize avx support

* Merged PR 5327: Fix llvm cross compilation

Fix llvm cross compilation

Related work items: #4080
2019-11-06 14:35:13 -08:00
KeDengMS
e18c9582a8
[NupharEP] performance improvements (#2283)
* [Nuphar EP] performance improvements
1. Add new ops: Shape, Expand
2. Add support for steps in Slice
3. Simplify Gather
4. Always inline alias nodes
5. Transpose nodes whose inner loop is symbolic fall back to the CPU provider when vectorization is not possible
6. Add opt_inproj option to model_editor to extract MatMuls inside Scan for input projection to outside
2019-10-30 10:15:04 -07:00
KeDengMS
b101f1bcee
Nuphar: Fix a bug in weight layout where read may go out of bound (#2129) 2019-10-15 00:11:41 -07:00
Yang Chen
7d2f0c79bd Bumped up to op_ver 11 for a bunch of Nuphar Ops (#2025)
This change enabled op_ver 11 for a dozen Nuphar Ops
2019-10-07 10:34:05 -07:00
baowenlei
4bb6385dca
Weba/merge ngemm (#2021)
* save status: add tiling layout; add avx512 skylake cpuid info

* unit tests and matmul integer model passed on skylake, need to verify model

* save commit before update master

* fix check

* address comments
2019-10-05 12:09:22 -07:00
Yang Chen
e8285a7996
Added GatherElements to Nuphar (#2016)
* Added GatherElements to Nuphar

This change added GatherElements (op_ver 11) to the Nuphar provider.

* address CR feedback

* create a utility function for accessing an index safely

* address more CR

* SafeIndex -> ClampIndex
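The ClampIndex helper mentioned above might look roughly like this. This is a sketch under the assumption that negative indices count from the end and out-of-range indices are clamped into the valid range, as is conventional for Gather-like ops; the actual signature in the commit may differ:

```cpp
#include <algorithm>
#include <cstdint>

// Normalize a possibly negative index, then clamp it into [0, dim).
int64_t ClampIndex(int64_t idx, int64_t dim) {
  if (idx < 0) idx += dim;  // negative indices count from the end
  return std::min(std::max(idx, int64_t{0}), dim - 1);
}
```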
2019-10-04 23:53:02 -07:00
Yang Chen
15138908e7
Yanchen/nuphar/scatter elems (#1992)
* Added Scatter and ScatterElements to Nuphar

Implemented Scatter (op_ver 9 - 10) and ScatterElements (op_ver 11)
for Nuphar.

Because TVM's compute is output-oriented, our current implementation
uses extern calls for simplicity.
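For intuition on why extern calls were the simpler route: scatter is input-oriented, routing each update element to an output location through an index tensor, which maps awkwardly onto TVM's output-oriented compute language. A hypothetical 1-D illustration of the ScatterElements semantics:

```cpp
#include <cstddef>
#include <vector>

// 1-D ScatterElements semantics: start from a copy of the input data,
// then write each update through its corresponding index.
std::vector<float> ScatterElements1D(std::vector<float> data,
                                     const std::vector<std::size_t>& indices,
                                     const std::vector<float>& updates) {
  for (std::size_t i = 0; i < indices.size(); ++i)
    data[indices[i]] = updates[i];
  return data;
}
```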

* fixed build issue after rebase

* remove dead code

* Address CR

* removed dead code

* use GetAttrOrDefault

* Address more CR feedback

* add GetStrides to codegen/common/utils.h

* added a unit test for Bool input data
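The GetStrides utility presumably computes row-major strides from a shape. A sketch under that assumption, not the actual codegen/common/utils.h signature:

```cpp
#include <cstdint>
#include <vector>

// Row-major strides: the last dimension is contiguous (stride 1), and
// each earlier stride is the product of all later dimension sizes.
std::vector<int64_t> GetStrides(const std::vector<int64_t>& shape) {
  std::vector<int64_t> strides(shape.size(), 1);
  for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * shape[i + 1];
  return strides;
}
```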
2019-10-03 14:58:10 -07:00
Dmitri Smirnov
d1b1cdc5c4
Replace GSL with GSL-LITE submodule and fix up refs (#1920)
Remove gsl submodule and replace with a local copy of gsl-lite
  Refactor for onnxruntime::make_unique
  gsl::span size and index are now size_t
  Remove lambda auto argument type detection.
  Remove constexpr from fail_fast in gsl due to Linux not being happy.
  Comment out std::stream support due to MacOS std lib broken.
  Move make_unique into include/core/common so it is accessible for server builds.
  Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
  due to x86 build.
  Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
2019-10-01 12:43:29 -07:00
Yang Chen
650fb8754b
use MLAS for nuphar's pool ops (#1937)
* call MLAS's pooling function as an external call for Nuphar

  Note that at the moment the Nuphar provider doesn't handle the cases below:

  - symbolic height/width dimensions
  - the Indices output of MaxPool
  - non-default dilations

* unify the pool interface for mti and mti_x86
2019-09-26 16:29:30 -07:00
Changming Sun
d669fc78c3
Revert "use MLAS for nuphar's pool ops (#1914)" (#1933)
This reverts commit 8c809dcc99.
2019-09-26 07:52:24 -07:00
Yang Chen
8c809dcc99
use MLAS for nuphar's pool ops (#1914)
* call MLAS's pooling function as an external call for Nuphar

  Note that at the moment the Nuphar provider doesn't handle the cases below:

  - symbolic height/width dimensions
  - the Indices output of MaxPool
  - non-default dilations

* unify the pool interface for mti and mti_x86
2019-09-25 16:19:18 -07:00
Dmitri Smirnov
75f241d02c
Enhance compatibility with proto3 and replace or abstract has_*() methods. (#1778)
Enhance proto3 compatibility.
Replace has_*() methods with corresponding enum handling so we can deal with
  proto3-generated streams from proto2 code.
  Add utility wrappers for remaining has_*() methods so we can
  easily deal with them if/when we switch to proto3.
2019-09-09 14:07:30 -07:00
KeDengMS
c9240f4e93
Implementation of Nuphar execution provider (#881)
* Implement Nuphar execution provider

Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.

* Fix submodules

* Fix TVM submodule

* Update Nuphar to latest and resolve conflicts

* Remove stale files caused by merge -X theirs

* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test

* Fix bad merge

* Merge from Nuphar

* Fix warning treated as error, revert some unnecessary changes

* Revert some more test changes

* Some more test revert or comments to make review easier
New tests could be added later

* One more revert of unnecessary changes

* More change revert. Test could be added back later.
2019-09-01 23:01:47 -07:00
KeDengMS
0d204f3f06
Implementation of TVM codegen library (#888)
Description:

This change adds the common part of the TVM-based codegen library. It includes the following parts:
* Microsoft TVM Inventory (MTI): a set of TVM ops for neural networks, similar to TOPI
* Compiler pass for traversing the ONNX graph and generating TVM ops
* Compiler pass for traversing the generated graph and specifying the TVM schedule
* Compiler pass for handling weight layout
* Utils for debugging

Motivation and Context:

TVM is an open deep learning compiler stack for CPUs, GPUs, and specialized accelerators. To leverage it in ONNX Runtime, we built an execution provider named Nuphar. Currently, Nuphar gets good performance on CPUs with AVX2 on quantized LSTM models.

This codegen library was part of Nuphar execution provider. It is split out for sharing with other execution providers, as we'd like to reuse TVM in more devices.
2019-07-03 10:32:59 -07:00
Changming Sun
e42099480e
Clean up code (#1033)
Some trivial changes for making gcc compiler happy. Won't impact runtime behavior.
2019-05-15 10:00:39 -07:00
Tang, Cheng
85ec13f58d
fix tvm break (#282) 2019-01-07 10:55:24 -08:00
Ryan Hill
11b369a864
Abbreviate ONNXRuntime as Ort in all of our public APIs (#175)
Applies to all public headers and macros, plus many internal ones. There are still some internal things with OnnxRuntime in the name, but this fixes all public functions & macros.
2018-12-14 14:54:23 -08:00
Ke Zhang
a78acb2d2c
rename graph.h to graph_viewer.h (#84) 2018-12-04 08:41:03 -08:00
Pranav Sharma
7aef8a1cca Sync with internal master. 2018-11-22 20:56:43 -08:00
Pranav Sharma
89618e8f1e Initial bootstrap commit. 2018-11-19 16:48:22 -08:00