onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-24 22:17:32 +00:00

Author	SHA1	Message	Date
Ryan Hill	ac725b53f6	Convert TensorRT provider into a shared library (#4721) Lots of changes to shared library interfaces, new lighter weight design.	2020-08-10 21:17:16 -07:00
Scott McKay	a1db87b382	Add SafeInt bounds checking to memory allocation size calculations. (#3022) * Add SafeInt bounds checking to memory allocation size calculations. * Fix TensorRT library includes	2020-02-20 11:41:03 -08:00
KeDengMS	9017e93701	[NupharEP] fix for Windows build and VS 2019 (#2694)	2019-12-18 16:16:46 -08:00
Yang Chen	2ca9733cee	Dump subgraph ID and fused graph ID (#2607) * Dump subgraph ID and fused graph ID Dump subgraph ID and fused graph ID for better debugging * Remove local static fused_count added a field global_fused_count_ to NupharExecutionProvider class	2019-12-10 19:56:39 -08:00
Yang Chen	d486481455	Correctly handle implicit inputs for fused nodes (#2390) * Correctly handle implicit inputs for fused nodes Previously, nuphar's partitioning function didn't include node's implicit inputs into the inputs list of MetaDef, and hence a crash was triggered in the onnx graph checker. This commit fixed the issue. Furthermore, it also fixed a related issue where we didn't add implicit inputs into graph_inputs_excluding_initializers_ in Graph::SetGraphInputsOutputs. The issue was that graph_inputs_including_initializers_ populated by SetInputs (e.g. called by FunctionImpl::FunctionImpl) may contain implicit inputs which were not of any node's initializers in the graph. Because they were not part of any initializers, these implicit inputs couldn't be visited by going through all nodes' inputs. Consequently, they would not be added into graph_inputs_excluding_initializers_. We fixed the issue by first copying the populated graph_inputs_including_initializers_ into graph_inputs_excluding_initializers_, which then had both initializers and non-initializers as its initial content. Later, we erase initializers from the list. In this way, we can ensure all implicit inputs remain in graph_inputs_excluding_initializers_. * refined comments and fixed duplicates Address CR by revisiting comments in terms of implicit inputs Also fixed an issue by skipping duplicates while copying inputs from graph_inputs_including_initializers_. * address CR explain why we need to collect nodes' implicit inputs * don't rely on pointer values for iterating std::set Previously, openvino relied on iterating a set of NodeArg pointers to construct inputs and outputs for a fused graph. It could cause non-determinism. The reason was that although iterating std::set by itself is stable, pointer values of NodeArgs may vary. Consequently, we could end up visiting the set's elements in different orders for different runs for the same test, which resulted in constructing inputs (and outputs) with different orders to the fused graph. For example, for the same test, we may have inputs [A, B] in some runs but inputs [B, A] in others. Let's use std::string as the key type to avoid such nondeterminism. This commit also added implicit inputs into meta->inputs while returning the capability from the openvino provider. * Fixed another latent issue in openvino's GetCapability function The issue was that we couldn't simply erase fused_inputs and fused_outputs while iterating the nodes. For example, an output NodeArg may have multiple uses, and it's wrong if we erase it from fused_outputs when we encounter only one of its uses as input.	2019-11-21 10:27:09 -08:00
baowenlei	9b7b5e2c27	Adjust codegen vectorization width from target (#2439) * Adjust codegen vectorization width from target	2019-11-20 13:28:15 -08:00
baowenlei	5ab7041fa7	fix cross compile bug (#2415)	2019-11-16 01:32:57 -08:00
baowenlei	0f1e24f4a9	[NupharEP] tensorize int8 GEMM for avx (#2142) * finish avx tensorization and save state * split tests for better debug * add missing avx option * update configure for AVX * update tensorize avx support * Merged PR 5327: Fix llvm cross compilation Fix llvm cross compilation Related work items: #4080	2019-11-06 14:35:13 -08:00
KeDengMS	e18c9582a8	[NupharEP] performance improvements (#2283) * [Nuphar EP] performance improvements 1. Add new ops: Shape, Expand 2. Add support for steps in Slice 3. Simplify Gather 4. Always inline alias nodes 5. Transpose nodes with inner loop being symbolic falls back to CPU provider when vectorization is not possible 6. Add opt_inproj option to model_editor to extract MatMuls inside Scan for input projection to outside	2019-10-30 10:15:04 -07:00
Yang Chen	e8285a7996	Added GatherElements to Nuphar (#2016) * Added GatherElements to Nuphar This change added GatherElements (op_ver 11) to the Nuphar provider. * address CR feedback * create a utility function for accessing index safely * address more CR * SafeIndex -> ClampIndex	2019-10-04 23:53:02 -07:00
Yang Chen	15138908e7	Yanchen/nuphar/scatter elems (#1992) * Added Scatter and ScatterElements to Nuphar Implemented Scatter (op_ver 9 - 10) and ScatterElements (op_ver 11) in Nuphar. Because TVM's compute is output-oriented, our current implementation uses extern calls for simplicity. * fixed build issue after rebase * remove dead code * Address CR * removed dead code * use GetAttrOrDefault * Address more CR feedback * add GetStrides to codegen/common/utils.h * added a unit test for Bool input data	2019-10-03 14:58:10 -07:00
Dmitri Smirnov	d1b1cdc5c4	Replace GSL with GSL-LITE submodule and fix up refs (#1920) Remove gsl submodule and replace with a local copy of gsl-lite Refactor for onnxruntime::make_unique gsl::span size and index are now size_t Remove lambda auto argument type detection. Remove constexpr from fail_fast in gsl due to Linux not being happy. Comment out std::stream support due to MacOS std lib broken. Move make_unique into include/core/common so it is accessible for server builds. Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc due to x86 build. Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized	2019-10-01 12:43:29 -07:00
Dmitri Smirnov	75f241d02c	Enhance compatibility with proto3 and replace or abstract has_() methods. (#1778) Enhance proto3 compatibility. Replace has_() methods with corresponding enum handling so we can deal with proto3 generated stream from proto2 code. Add utility wrappers for remaining has_*() methods so we can easily deal with them if/when we switch to proto3.	2019-09-09 14:07:30 -07:00
KeDengMS	c9240f4e93	Implementation of Nuphar execution provider (#881) * Implement Nuphar execution provider Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan. This PR is mainly for a preview of the shared codegen library for other TVM-based providers. * Fix submodules * Fix TVM submodule * Update Nuphar to latest and resolve conflicts * Remove stale files caused by merge -X theirs * Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test * Fix bad merge * Merge from Nuphar * Fix warning treated as error, revert some unnecessary changes * Revert some more test changes * Some more test revert or comments to make review easier New tests could be added later * One more revert of unnecessary changes * More change revert. Test could be added back later.	2019-09-01 23:01:47 -07:00
KeDengMS	0d204f3f06	Implementation of TVM codegen library (#888) Description: This change adds the common part of the TVM-based codegen library. It includes the following parts: * Microsoft TVM Inventory (MTI): a set of TVM ops for neural networks, similar to TOPI * Compiler pass for traversing the ONNX graph and generating TVM ops * Compiler pass for traversing the generated graph and specifying the TVM schedule * Compiler pass for handling weight layout * Utils for debugging Motivation and Context: TVM is an open deep learning compiler stack for CPUs, GPUs and specialized accelerators. To leverage it in ONNX, we built an execution provider named Nuphar. Currently, Nuphar gets good performance on CPUs with AVX2 on quantized LSTM models. This codegen library was part of the Nuphar execution provider. It is split out for sharing with other execution providers, as we'd like to reuse TVM in more devices.	2019-07-03 10:32:59 -07:00

15 commits
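
Note on d486481455: the "don't rely on pointer values for iterating std::set" item in that commit describes an ordering problem that is easier to see in code. The sketch below is only an illustration of the idea, not the code from the commit. `Arg`, `NamesByPointerOrder` and `NamesByNameOrder` are hypothetical names introduced here, and `Arg` is a stand-in for a graph value, not onnxruntime's NodeArg class.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph value; NOT onnxruntime's NodeArg.
struct Arg {
  std::string name;
};

// Pointer-keyed container: iteration follows address order. The order is
// stable within one process, but addresses depend on where each Arg was
// allocated, so the resulting name order may differ from run to run.
std::vector<std::string> NamesByPointerOrder(const std::vector<const Arg*>& args) {
  std::set<const Arg*> seen(args.begin(), args.end());
  std::vector<std::string> names;
  for (const Arg* a : seen) names.push_back(a->name);
  return names;
}

// Name-keyed container: iteration follows lexicographic name order,
// which does not depend on addresses. Duplicate names are kept once.
std::vector<std::string> NamesByNameOrder(const std::vector<const Arg*>& args) {
  std::map<std::string, const Arg*> seen;
  for (const Arg* a : args) seen.emplace(a->name, a);
  std::vector<std::string> names;
  for (const auto& entry : seen) names.push_back(entry.first);
  return names;
}
```

With two values named A and B, `NamesByNameOrder` yields A then B regardless of how the values were allocated or passed in, while `NamesByPointerOrder` yields whichever has the lower address first. Keying by name trades a string comparison per lookup for a reproducible order.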