Commit graph

30 commits

KeDengMS
4b900dc585 Simplify cache implementation and avoid static variables that may carry over between models 2019-12-20 21:04:17 -08:00
Changming Sun
da03ed4473 Tiny fix to codegen 2019-12-20 21:04:17 -08:00
KeDengMS
9017e93701 [NupharEP] fix for Windows build and VS 2019 (#2694) 2019-12-18 16:16:46 -08:00
Yang Chen
2ca9733cee
Dump subgraph ID and fused graph ID (#2607)
* Dump subgraph ID and fused graph ID

Dump subgraph ID and fused graph ID for better debugging

* Remove local static fused_count

Added a global_fused_count_ field to the NupharExecutionProvider class
2019-12-10 19:56:39 -08:00
KeDengMS
60208463a9
[NupharEP] Enable parallel schedule (#2505)
* [NupharEP] Enable parallel schedule
* Update TVM with the fix to TVM threadpool to use OpenMP if possible
* Add parallel schedule when trying to vectorize
With this change, BERT squad perf on a 4-core (8 HT) CPU goes from 187ms to 150ms

* Address CR, docs and cmake update

* Doc fix

* Fix mkl

* Fix TVM windows build when using mklml
2019-11-28 08:35:56 -08:00
Yang Chen
d486481455
Correctly handle implicit inputs for fused nodes (#2390)
* Correctly handle implicit inputs for fused nodes

Previously, Nuphar's partitioning function didn't include a node's implicit
inputs in the inputs list of MetaDef, which triggered a crash in the ONNX
graph checker.

This commit fixed the issue. It also fixed a related issue where we
didn't add implicit inputs into graph_inputs_excluding_initializers_
in Graph::SetGraphInputsOutputs.

The issue was that graph_inputs_including_initializers_, populated by
SetInputs (e.g. called by FunctionImpl::FunctionImpl), may contain implicit
inputs that did not appear among any node's explicit inputs in the graph.
Because they were not referenced by any node's explicit inputs, these
implicit inputs couldn't be discovered by walking all nodes' inputs.
Consequently, they would *not* be added into graph_inputs_excluding_initializers_.

We fixed the issue by first copying the populated graph_inputs_including_initializers_
into graph_inputs_excluding_initializers_, so that the list initially contained
both initializers and non-initializers. We then erase the initializers from the
list. In this way, we ensure that all implicit inputs remain in
graph_inputs_excluding_initializers_.
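The copy-then-erase approach described above can be sketched roughly as follows. This is a minimal sketch with hypothetical names, operating on plain strings rather than the NodeArg lists the real Graph code uses:

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Start from every graph input (initializers included), then drop the
// initializers. Implicit inputs that no node references explicitly
// still survive in the "excluding initializers" list, because they were
// copied in up front rather than discovered by walking node inputs.
std::vector<std::string> InputsExcludingInitializers(
    const std::vector<std::string>& inputs_including_initializers,
    const std::unordered_set<std::string>& initializer_names) {
  std::vector<std::string> result = inputs_including_initializers;  // copy first
  result.erase(std::remove_if(result.begin(), result.end(),
                              [&](const std::string& name) {
                                return initializer_names.count(name) > 0;
                              }),
               result.end());  // then erase the initializers
  return result;
}
```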

* refined comments and fixed duplicates

Address CR by revisiting comments in terms of implicit inputs

Also fixed an issue by skipping duplicates while copying inputs
from graph_inputs_including_initializers_.

* address CR

explain why we need to collect nodes' implicit inputs

* don't rely on pointer values for iterating std::set

Previously, openvino relied on iterating a set of NodeArg pointers
to construct inputs and outputs for a fused graph. It could cause
non-determinism. The reason was that although iterating std::set by
itself is stable, pointer values of NodeArgs may vary. Consequently,
we could end up visiting the set's elements in different orders for
different runs for the same test, which resulted in constructing
inputs (and outputs) with different orders to the fused graph.
For example, for the same test, we may have inputs [A, B] in some
runs but inputs [B, A] in others.

Let's use std::string as the key type to avoid such nondeterminism.

This commit also added implicit inputs into meta->inputs while returning
the capability from the openvino provider.

* Fixed another latent issue in openvino's GetCapability function

The issue was that we couldn't simply erase fused_inputs and fused_outputs
while iterating the nodes. For example, an output NodeArg may have multiple
uses, and it's wrong if we erase it from fused_outputs when we encounter only
one of its uses as input.
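The nondeterminism described above can be illustrated with a minimal sketch. NodeArg here is a hypothetical stand-in with only a name: a `std::set` of pointers iterates in pointer-value order, which depends on where the allocator placed each object, while keying the set by `std::string` gives a stable order across runs.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for NodeArg; only the name matters here.
struct NodeArg { std::string name; };

// Collect input names through a std::set keyed by string, so iteration
// order is determined by string comparison, not by pointer values.
std::vector<std::string> OrderedInputNames(const std::vector<NodeArg*>& args) {
  std::set<std::string> names;
  for (const NodeArg* arg : args) names.insert(arg->name);
  return std::vector<std::string>(names.begin(), names.end());
}
```

A `std::set<NodeArg*>` would instead compare the pointers themselves, so the same inputs could come out as [A, B] in one run and [B, A] in another.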
2019-11-21 10:27:09 -08:00
baowenlei
9b7b5e2c27
Adjust codegen vectorization width from target (#2439)
* Adjust codegen vectorization width from target
2019-11-20 13:28:15 -08:00
baowenlei
5ab7041fa7 fix cross compile bug (#2415) 2019-11-16 01:32:57 -08:00
KeDengMS
b15e43a541
[NupharEP] Multiple optimizations (#2380)
Fuse transpose into MatMul
Implement Pow and constant scalar simplification
Vectorize ReduceMean
Improve symbolic shape inference
Minor updates for better debugging in fused function name
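The constant-scalar Pow simplification above can be sketched as follows. This is a hypothetical helper for intuition only; the real pass rewrites TVM expressions, not doubles. When the exponent is a known small non-negative integer constant, pow(x, c) can be lowered to repeated multiplication, which vectorizes more readily than a libm call:

```cpp
// Lower pow(x, c) for a small non-negative integer constant c into
// repeated multiplication, e.g. pow(x, 2) becomes x * x.
double SimplifiedPow(double x, int c) {
  double result = 1.0;
  for (int i = 0; i < c; ++i) result *= x;
  return result;
}
```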
2019-11-14 10:40:33 -08:00
Dmitri Smirnov
25b3c51661
Introduce PrimitiveType into a Type System along with an integer constant (#2307)
Improve perf by avoiding GetType<T>() calls. Introduce MLTypeCallDispatcher to switch on input type. Add a fast Tensor::IsType<T>() method.
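A rough sketch of the IsType<T>() idea; the PrimType values and Tensor struct here are hypothetical, not the actual onnxruntime definitions. Associating an integer constant with each primitive type turns the type check into a single integer comparison instead of a type-object lookup:

```cpp
#include <cstdint>

// Hypothetical per-type integer constants.
enum class PrimType : int32_t { kFloat = 1, kInt64 = 7 };

// Map a C++ type to its constant at compile time.
template <typename T> struct TypeId;
template <> struct TypeId<float>   { static constexpr PrimType value = PrimType::kFloat; };
template <> struct TypeId<int64_t> { static constexpr PrimType value = PrimType::kInt64; };

struct Tensor {
  PrimType type;
  // The check compiles down to comparing two integers.
  template <typename T>
  bool IsType() const { return type == TypeId<T>::value; }
};
```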
2019-11-08 17:47:06 -08:00
baowenlei
0f1e24f4a9 [NupharEP] tensorize int8 GEMM for avx (#2142)
* finish avx tensorization and save state

* split tests for better debug

* add missing avx option

* update configure for AVX

* update tensorize avx support

* Merged PR 5327: Fix llvm cross compilation

Fix llvm cross compilation

Related work items: #4080
2019-11-06 14:35:13 -08:00
KeDengMS
e18c9582a8
[NupharEP] performance improvements (#2283)
* [Nuphar EP] performance improvements
1. Add new ops: Shape, Expand
2. Add support for steps in Slice
3. Simplify Gather
4. Always inline alias nodes
5. Transpose nodes whose inner loop is symbolic fall back to the CPU provider when vectorization is not possible
6. Add opt_inproj option to model_editor to extract MatMuls inside Scan for input projection to outside
2019-10-30 10:15:04 -07:00
KeDengMS
b101f1bcee
Nuphar: Fix a bug in weight layout where read may go out of bound (#2129) 2019-10-15 00:11:41 -07:00
Yang Chen
7d2f0c79bd Bumped up to op_ver 11 for a bunch of Nuphar Ops (#2025)
This change enabled op_ver 11 for a dozen Nuphar Ops
2019-10-07 10:34:05 -07:00
baowenlei
4bb6385dca
Weba/merge ngemm (#2021)
* save status: add tiling layout; add avx512 skylake cpuid info

* unit tests and matmul integer model passed on skylake, need to verify model

* save commit before update master

* fix check

* address comments
2019-10-05 12:09:22 -07:00
Yang Chen
e8285a7996
Added GatherElements to Nuphar (#2016)
* Added GatherElements to Nuphar

This change added GatherElements (op_ver 11) to the Nuphar provider.

* address CR feedback

* create a utility function for accessing an index safely

* address more CR

* SafeIndex -> ClampIndex
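The ClampIndex helper mentioned above might look roughly like this. This is a sketch under the assumption that negative indices count from the end and out-of-range indices are clamped into the valid range, as is conventional for Gather-like ops; the actual signature in the commit may differ:

```cpp
#include <algorithm>
#include <cstdint>

// Normalize a possibly negative index, then clamp it into [0, dim).
int64_t ClampIndex(int64_t idx, int64_t dim) {
  if (idx < 0) idx += dim;  // negative indices count from the end
  return std::min(std::max(idx, int64_t{0}), dim - 1);
}
```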
2019-10-04 23:53:02 -07:00
Yang Chen
15138908e7
Yanchen/nuphar/scatter elems (#1992)
* Added Scatter and ScatterElements to Nuphar

Implemented Scatter (op_ver 9 - 10) and ScatterElements (op_ver 11)
for Nuphar.

Because TVM's compute is output-oriented, our current implementation
uses extern calls for simplicity.
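For intuition on why extern calls were the simpler route: scatter is input-oriented, routing each update element to an output location through an index tensor, which maps awkwardly onto TVM's output-oriented compute language. A hypothetical 1-D illustration of the ScatterElements semantics:

```cpp
#include <cstddef>
#include <vector>

// 1-D ScatterElements semantics: start from a copy of the input data,
// then write each update through its corresponding index.
std::vector<float> ScatterElements1D(std::vector<float> data,
                                     const std::vector<std::size_t>& indices,
                                     const std::vector<float>& updates) {
  for (std::size_t i = 0; i < indices.size(); ++i)
    data[indices[i]] = updates[i];
  return data;
}
```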

* fixed build issue after rebase

* remove dead code

* Address CR

* removed dead code

* use GetAttrOrDefault

* Address more CR feedback

* add GetStrides to codegen/common/utils.h

* added a unit test for Bool input data
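The GetStrides utility presumably computes row-major strides from a shape. A sketch under that assumption, not the actual codegen/common/utils.h signature:

```cpp
#include <cstdint>
#include <vector>

// Row-major strides: the last dimension is contiguous (stride 1), and
// each earlier stride is the product of all later dimension sizes.
std::vector<int64_t> GetStrides(const std::vector<int64_t>& shape) {
  std::vector<int64_t> strides(shape.size(), 1);
  for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * shape[i + 1];
  return strides;
}
```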
2019-10-03 14:58:10 -07:00
Dmitri Smirnov
d1b1cdc5c4
Replace GSL with GSL-LITE submodule and fix up refs (#1920)
Remove gsl submodule and replace with a local copy of gsl-lite
  Refactor for onnxruntime::make_unique
  gsl::span size and index are now size_t
  Remove lambda auto argument type detection.
  Remove constexpr from fail_fast in gsl due to Linux not being happy.
  Comment out std::stream support due to MacOS std lib broken.
  Move make_unique into include/core/common so it is accessible for server builds.
  Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
  due to x86 build.
  Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
2019-10-01 12:43:29 -07:00
Yang Chen
650fb8754b
use MLAS for nuphar's pool ops (#1937)
* call MLAS's pooling function as an external call for Nuphar

  Note that at the moment the Nuphar provider doesn't handle the cases below:

  - symbolic height/width dimensions
  - the Indices output of MaxPool
  - non-default dilations

* unify the pool interface for mti and mti_x86
2019-09-26 16:29:30 -07:00
Changming Sun
d669fc78c3
Revert "use MLAS for nuphar's pool ops (#1914)" (#1933)
This reverts commit 8c809dcc99.
2019-09-26 07:52:24 -07:00
Yang Chen
8c809dcc99
use MLAS for nuphar's pool ops (#1914)
* call MLAS's pooling function as an external call for Nuphar

  Note that at the moment the Nuphar provider doesn't handle the cases below:

  - symbolic height/width dimensions
  - the Indices output of MaxPool
  - non-default dilations

* unify the pool interface for mti and mti_x86
2019-09-25 16:19:18 -07:00
Dmitri Smirnov
75f241d02c
Enhance compatibility with proto3 and replace or abstract has_*() methods. (#1778)
Enhance proto3 compatibility.
Replace has_*() methods with corresponding enum handling so we can deal with
  proto3-generated streams from proto2 code.
  Add utility wrappers for remaining has_*() methods so we can
  easily deal with them if/when we switch to proto3.
2019-09-09 14:07:30 -07:00
KeDengMS
c9240f4e93
Implementation of Nuphar execution provider (#881)
* Implement Nuphar execution provider

Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.

* Fix submodules

* Fix TVM submodule

* Update Nuphar to latest and resolve conflicts

* Remove stale files caused by merge -X theirs

* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test

* Fix bad merge

* Merge from Nuphar

* Fix warning treated as error, revert some unnecessary changes

* Revert some more test changes

* Some more test revert or comments to make review easier
New tests could be added later

* One more revert of unnecessary changes

* More change revert. Test could be added back later.
2019-09-01 23:01:47 -07:00
KeDengMS
0d204f3f06
Implementation of TVM codegen library (#888)
Description:

This change adds the common part of the TVM-based codegen library. It includes the following parts:
* Microsoft TVM Inventory (MTI): a set of TVM ops for neural networks, similar to TOPI
* Compiler pass for traversing the ONNX graph and generating TVM ops
* Compiler pass for traversing the generated graph and specifying the TVM schedule
* Compiler pass for handling weight layout
* Utils for debugging

Motivation and Context:

TVM is an open deep learning compiler stack for CPUs, GPUs, and specialized accelerators. To leverage it in ONNX Runtime, we built an execution provider named Nuphar. Currently, Nuphar gets good performance on CPUs with AVX2 on quantized LSTM models.

This codegen library was part of Nuphar execution provider. It is split out for sharing with other execution providers, as we'd like to reuse TVM in more devices.
2019-07-03 10:32:59 -07:00
Changming Sun
e42099480e
Clean up code (#1033)
Some trivial changes for making gcc compiler happy. Won't impact runtime behavior.
2019-05-15 10:00:39 -07:00
Tang, Cheng
85ec13f58d
fix tvm break (#282) 2019-01-07 10:55:24 -08:00
Ryan Hill
11b369a864
Abbreviate ONNXRuntime as Ort in all of our public APIs (#175)
Applies to all public headers and macros, plus many internal ones. There are still some internal things with OnnxRuntime in the name, but this fixes all public functions & macros.
2018-12-14 14:54:23 -08:00
Ke Zhang
a78acb2d2c
rename graph.h to graph_viewer.h (#84) 2018-12-04 08:41:03 -08:00
Pranav Sharma
7aef8a1cca Sync with internal master. 2018-11-22 20:56:43 -08:00
Pranav Sharma
89618e8f1e Initial bootstrap commit. 2018-11-19 16:48:22 -08:00