Commit graph

15 commits

Author SHA1 Message Date
Ryan Hill
ac725b53f6
Convert TensorRT provider into a shared library (#4721)
Lots of changes to shared library interfaces, new lighter weight design.
2020-08-10 21:17:16 -07:00
Scott McKay
a1db87b382
Add SafeInt bounds checking to memory allocation size calculations. (#3022)
* Add SafeInt bounds checking to memory allocation size calculations.

* Fix TensorRT library includes
2020-02-20 11:41:03 -08:00
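
The bounds-checking idea behind commit a1db87b382 can be sketched without the SafeInt library. The helper below is a hypothetical illustration, not the SafeInt API and not the code from the commit: it computes an allocation size as count * element_size and reports overflow instead of letting the product wrap around.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper (NOT the SafeInt API): multiplies count by element_size
// and returns false if the product would not fit in std::size_t.
bool CheckedMulSize(std::size_t count, std::size_t element_size, std::size_t* out) {
  assert(out != nullptr);
  if (element_size != 0 &&
      count > std::numeric_limits<std::size_t>::max() / element_size) {
    return false;  // overflow: the caller should fail the allocation
  }
  *out = count * element_size;
  return true;
}
```

A caller would check the return value before passing the computed size to an allocator, so that an oversized request fails loudly rather than allocating a too-small buffer.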
KeDengMS
9017e93701 [NupharEP] fix for Windows build and VS 2019 (#2694) 2019-12-18 16:16:46 -08:00
Yang Chen
2ca9733cee
Dump subgraph ID and fused graph ID (#2607)
* Dump subgraph ID and fused graph ID

Dump subgraph ID and fused graph ID for better debugging

* Remove local static fused_count

added a field global_fused_count_ to NupharExecutionProvider class
2019-12-10 19:56:39 -08:00
Yang Chen
d486481455
Correctly handle implicit inputs for fused nodes (#2390)
* Correctly handle implicit inputs for fused nodes

Previously, nuphar's partitioning function didn't include
node's implicit inputs into the inputs list of MetaDef, and hence
a crash was triggered in the onnx graph checker.

This commit fixed the issue. Furthermore, it also fixed a related
issue where we didn't add implicit inputs into
graph_inputs_excluding_initializers_ in Graph::SetGraphInputsOutputs.

The issue was that graph_inputs_including_initializers_ populated by
SetInputs (e.g. called by FunctionImpl::FunctionImpl) may contain
implicit inputs which were not of any node's initializers in the graph.
Because they were not part of any initializers, these implicit inputs
couldn't be visited by going through all nodes' inputs.
Consequently, they would *not* be added into graph_inputs_excluding_initializers_.

We fixed the issue by first copying the populated graph_inputs_including_initializers_
into graph_inputs_excluding_initializers_, which then had both initializers and
non-initializers as its initial content. Later, we erase initializers from the
list. In this way, we ensure that all implicit inputs remain in
graph_inputs_excluding_initializers_.

* refined comments and fixed duplicates

Address CR by revisiting comments in terms of implicit inputs

Also fixed an issue by skipping duplicates while copying inputs
from graph_inputs_including_initializers_.

* address CR

explain why we need to collect nodes' implicit inputs

* don't rely on pointer values for iterating std::set

Previously, openvino relied on iterating a set of NodeArg pointers
to construct inputs and outputs for a fused graph. It could cause
non-determinism. The reason was that although iterating std::set by
itself is stable, pointer values of NodeArgs may vary. Consequently,
we could end up visiting the set's elements in different orders for
different runs for the same test, which resulted in constructing
inputs (and outputs) with different orders to the fused graph.
For example, for the same test, we may have inputs [A, B] in some
runs but inputs [B, A] in others.

Let's use std::string as the key type to avoid such nondeterminism.

This commit also added implicit inputs into meta->inputs while returning
the capability from the openvino provider.

* Fixed another latent issue in openvino's GetCapability function

The issue was that we couldn't simply erase fused_inputs and fused_outputs
while iterating the nodes. For example, an output NodeArg may have multiple
uses, and it's wrong if we erase it from fused_outputs when we encounter only
one of its uses as input.
2019-11-21 10:27:09 -08:00
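
The nondeterminism described in commit d486481455 comes from ordering a set by pointer value; keying by name removes the dependence on memory addresses. The sketch below is a minimal illustration under assumptions: `Arg` is a hypothetical stand-in for NodeArg and `OrderedNames` is an invented helper, neither taken from the commit.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph value; the real NodeArg class is richer.
struct Arg {
  std::string name;
};

// Returns the argument names in key order. Because the map is keyed by
// std::string, the order depends only on the names, not on where each Arg
// happens to live in memory.
std::vector<std::string> OrderedNames(const std::vector<const Arg*>& args) {
  std::map<std::string, const Arg*> by_name;
  for (const Arg* arg : args) {
    assert(arg != nullptr);
    by_name.emplace(arg->name, arg);  // a repeated name is inserted only once
  }
  std::vector<std::string> names;
  for (const auto& entry : by_name) {
    names.push_back(entry.first);
  }
  return names;
}
```

With this keying, passing the same arguments in any order, or at any addresses, yields the same sequence of names, which is the property the fused graph's inputs and outputs need.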
baowenlei
9b7b5e2c27
Adjust codegen vectorization width from target (#2439)
* Adjust codegen vectorization width from target
2019-11-20 13:28:15 -08:00
baowenlei
5ab7041fa7 fix cross compile bug (#2415) 2019-11-16 01:32:57 -08:00
baowenlei
0f1e24f4a9 [NupharEP] tensorize int8 GEMM for avx (#2142)
* finish avx tensorization and save state

* split tests for better debug

* add missing avx option

* update configure for AVX

* update tensorize avx support

* Merged PR 5327: Fix llvm cross compilation

Fix llvm cross compilation

Related work items: #4080
2019-11-06 14:35:13 -08:00
KeDengMS
e18c9582a8
[NupharEP] performance improvements (#2283)
* [Nuphar EP] performance improvements
1. Add new ops: Shape, Expand
2. Add support for steps in Slice
3. Simplify Gather
4. Always inline alias nodes
5. Transpose nodes with a symbolic inner loop fall back to the CPU provider when vectorization is not possible
6. Add opt_inproj option to model_editor to extract MatMuls inside Scan for input projection to outside
2019-10-30 10:15:04 -07:00
Yang Chen
e8285a7996
Added GatherElements to Nuphar (#2016)
* Added GatherElements to Nuphar

This change added GatherElements (op_ver 11) to the Nuphar provider.

* address CR feedback

* create a utility function for accessing index safely

* address more CR

* SafeIndex -> ClampIndex
2019-10-04 23:53:02 -07:00
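
Commit e8285a7996 mentions a utility for accessing an index safely, renamed from SafeIndex to ClampIndex, but does not show it. The sketch below is one plausible shape for such a helper; the signature and the handling of negative indices are assumptions, not the Nuphar implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical signature: the commit names ClampIndex but not its parameters.
// A negative index is wrapped once by adding extent (an assumption here),
// then the result is clamped into [0, extent - 1] so that it can never read
// out of bounds.
std::int64_t ClampIndex(std::int64_t index, std::int64_t extent) {
  assert(extent > 0);
  if (index < 0) index += extent;
  return std::min(std::max<std::int64_t>(index, 0), extent - 1);
}
```

Clamping trades an error report for memory safety: an out-of-range index silently reads the nearest valid element, which suits generated code where a bounds check cannot easily raise an error.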
Yang Chen
15138908e7
Yanchen/nuphar/scatter elems (#1992)
* Added Scatter and ScatterElements to Nuphar

Implemented Scatter (op_ver 9 - 10) and ScatterElements (op_ver 11)
in Nuphar.

Because TVM's compute is output-oriented, our current implementation
uses extern calls for simplicity.

* fixed build issue after rebase

* remove dead code

* Address CR

* removed dead code

* use GetAttrOrDefault

* Address more CR feedback

* add GetStrides to codegen/common/utils.h

* added a unit test for Bool input data
2019-10-03 14:58:10 -07:00
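
The remark in commit 15138908e7 that TVM's compute is output-oriented is easier to see against a plain reference loop. A scatter iterates over the updates and writes to whichever output position each index names, rather than computing each output element from its own coordinates. The 1-D function below is a simplified sketch with a hypothetical name, not the extern call used in Nuphar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified 1-D scatter: the output starts as a copy of data, then each
// update is written at the position given by the matching index. The loop
// runs over the updates, not over the output elements.
std::vector<float> Scatter1D(const std::vector<float>& data,
                             const std::vector<std::int64_t>& indices,
                             const std::vector<float>& updates) {
  assert(indices.size() == updates.size());
  std::vector<float> output = data;
  const std::int64_t extent = static_cast<std::int64_t>(data.size());
  for (std::size_t i = 0; i < indices.size(); ++i) {
    // Negative indices count from the end (assumed here for illustration).
    const std::int64_t idx = indices[i] < 0 ? indices[i] + extent : indices[i];
    assert(idx >= 0 && idx < extent);
    output[static_cast<std::size_t>(idx)] = updates[i];
  }
  return output;
}
```

Expressing this as a per-output formula would require searching the indices for every output element, which is why an input-driven loop is the simpler fit.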
Dmitri Smirnov
d1b1cdc5c4
Replace GSL with GSL-LITE submodule and fix up refs (#1920)
Remove gsl submodule and replace with a local copy of gsl-lite
  Refactor for onnxruntime::make_unique
  gsl::span size and index are now size_t
  Remove lambda auto argument type detection.
  Remove constexpr from fail_fast in gsl due to Linux not being happy.
  Comment out std::stream support due to MacOS std lib broken.
  Move make_unique into include/core/common so it is accessible for server builds.
  Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
  due to x86 build.
  Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
2019-10-01 12:43:29 -07:00
Dmitri Smirnov
75f241d02c
Enhance compatibility with proto3 and replace or abstract has_*() methods. (#1778)
Enhance proto3 compatibility.
  Replace has_*() methods with corresponding enum handling so we can deal with
  proto3 generated stream from proto2 code.
  Add utility wrappers for remaining has_*() methods so we can
  easily deal with them if/when we switch to proto3.
2019-09-09 14:07:30 -07:00
KeDengMS
c9240f4e93
Implementation of Nuphar execution provider (#881)
* Implement Nuphar execution provider

Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.

* Fix submodules

* Fix TVM submodule

* Update Nuphar to latest and resolve conflicts

* Remove stale files caused by merge -X theirs

* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test

* Fix bad merge

* Merge from Nuphar

* Fix warning treated as error, revert some unnecessary changes

* Revert some more test changes

* Some more test revert or comments to make review easier
New tests could be added later

* One more revert of unnecessary changes

* More change revert. Test could be added back later.
2019-09-01 23:01:47 -07:00
KeDengMS
0d204f3f06
Implementation of TVM codegen library (#888)
Description:

This change adds the common part of the TVM-based codegen library. It includes the following parts:
* Microsoft TVM Inventory (MTI): a set of TVM ops for neural networks, similar to TOPI
* Compiler pass for traversing the ONNX graph and generating TVM ops
* Compiler pass for traversing the generated graph and specifying the TVM schedule
* Compiler pass for handling weight layout
* Utils for debugging

Motivation and Context:

TVM is an open deep learning compiler stack for cpu, gpu and specialized accelerators. To leverage it in ONNX, we built an execution provider named Nuphar. Currently, Nuphar gets good performance on CPUs with AVX2 on quantized LSTM models.

This codegen library was part of Nuphar execution provider. It is split out for sharing with other execution providers, as we'd like to reuse TVM in more devices.
2019-07-03 10:32:59 -07:00