onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-04 04:07:22 +00:00

Author	SHA1	Message	Date
George Wu	91f85dfdad	update Dockerfile.manylinux2014_cuda11_4_tensorrt8_2 to TensorRT 8.2.2.1 (#10167 )	2022-01-03 20:38:37 -08:00
Chi Lo	c29397ad4f	Modify the code to get correct range for symmetric quantization (#10170 )	2022-01-03 19:13:37 -08:00
Nat Kershaw (MSFT)	0c517112c4	Automate Python API docs generation (#10116 )	2022-01-03 18:22:22 -08:00
Yufeng Li	230f323600	add qdq support for LeakyRelu (#10077 ) * add qdq support for LeakyRelu	2022-01-03 14:48:49 -08:00
Tongliang Liao	1d3b34cc92	Add `.git` suffix to github URL. Although github works with both, this is more precise. Having an extension also makes it easy to match with regex, when we want to inject code to reroute traffic to our own git mirror.	2022-01-03 14:38:35 -08:00
Yufeng Li	7208fcbe1c	use wasmscalar as default kernel (#9988 ) * use wasmscalar as default kernel	2022-01-03 10:55:08 -08:00
Dmitri Smirnov	28ce2a5a78	Re-work hierarchy, fix virtual method overload/hiding (#10160 ) Re-work hierarchy, fix virtual method overload/hiding Use std::optional with a clear comment on the member thread-safety.	2022-01-03 10:24:49 -08:00
Abhishek Jindal	d5742f3a43	moving from torch nightly build to stable build (#10150 ) * moving from torch nightly build to stable build * using torch cpu version * using torch cpu version from link	2021-12-29 19:35:10 -08:00
Edward Chen	3bc91c2151	Move reduced ops files into build directory (#10030 ) In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.	2021-12-28 19:04:20 -08:00
Scott McKay	a367f0664d	From Python 3.8 and on you need to explicitly add the current directory for libraries to be loaded from it. Update onnxruntime_test_python.py with that handling. (#10129 )	2021-12-28 16:10:26 +10:00
George Wu	3d6786c92e	update tensorrt multi gpu pipeline to tensorrt 8.2 (#10141 )	2021-12-27 15:43:27 -08:00
Vincent Wang	ceb17f82ff	Use FusedMatMul When Transpose is Between First Dim and Contiguous Batch Dims (#9734 ) * fusedmatmul support transpose batches * fix win build * fix contrib op md * more comments	2021-12-27 10:49:46 +08:00
Vincent Wang	f780f06240	ConcatGrad for OpSet13 (#10109 )	2021-12-24 10:02:52 +08:00
stevenlix	05d20343ee	Remove duplicated constant initializer copies for TensorRT nodes (#10105 ) * add new field constant_initializers in metadef and remove constant initializers from trt node inputs * remove redundancy * use GetConstantInitializer() to get constant initializers * add ORT_ENFORCE check Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>	2021-12-22 12:19:56 -08:00
Sheil Kumar	ce1a9ca618	Fix Microsoft.AI.MachineLearning NuGet App failure with multiple binaries copied to same destination (#10076 ) * Include onnxruntime binary when not using package reference or uap app. * Remove the lib\uap10.0 build from the nuget package - causing conflicts * Add UWP test * remove build files * remove local change * reset mimalloc and onnx-tensorrt * change username to Microsoft Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2021-12-21 12:34:03 -08:00
Ye Wang	7a1bdc2052	Don't check cache shape when using dynamic axis (#10090 ) Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>	2021-12-20 21:19:29 -08:00
Changming Sun	4e9e01cb3c	Fix SDL warnings in CPU EP (#9975 )	2021-12-19 20:54:29 -08:00
satyajandhyala	bd4fb4c5da	Coding style fix. (#10080 )	2021-12-18 12:05:48 -08:00
ashari4	cdbd678192	Check kMSDomain already exists before registering it (#10078 ) * Check domain before registration	2021-12-17 17:55:15 -08:00
Yufeng Li	12ee2e942f	add int8_t for Resize (#10067 ) As we support quantization for format s8s8, we need Resize to support int8_t.	2021-12-17 15:36:09 -08:00
Moshe David	4fd85cd97a	Fix broken link to TRT doc in exception details (#9496 ) Co-authored-by: Moshe <modav@microsoft.com>	2021-12-17 09:00:33 -08:00
Faith Xu	d42feae042	Add citation file (#10061 ) * Add citation file * Fix typos	2021-12-16 19:56:21 -08:00
Guoyu Wang	f3c72de718	[QDQ] Add shared NodeUnit class (#10052 ) * initial change * move more function to node_unit * Remove commented code * Minor update * Update onnxruntime/core/providers/nnapi/nnapi_builtin/builders/op_builder.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> * address CR comments Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2021-12-16 17:37:51 -08:00
Tianlei Wu	ef36488df0	Add BeamSearch operator for GPT-2 decoding (#9680 ) * Add BeamSearch operator and CPU implementation * Add ONNX conversion script	2021-12-16 16:08:05 -08:00
Yufeng Li	fab39b4704	Update optimization level message in perf_test tool (#9972 )	2021-12-16 13:49:18 -08:00
Bowen Bao	102f9b05e1	Support new symbolic function api from PyTorch with PythonOp (#9880 ) * Support new symbolic function api from PyTorch with PythonOp * Specify exact exception * add comments * move comments and arg	2021-12-16 11:08:06 -05:00
George Nash	93636cbd20	Reduce ops for DNNL ep (#10056 ) * Add Reduce Ops to DNNL ep Combine the Reduction ops into one class Add ReduceL1, ReduceL2, ReduceSum, ReduceMax, ReduceMin, and ReduceProd, ReduceSumSquare, ReduceLogSum, and ReduceLogSumExp Reduce code now also handles the keepdims attribute Also updated code to use HandleNegativeAxis function from the providers/common.h code instead of manually calculating. In code documentation exists to help explain complex reduction op code Add elementwise ops to Reduction op capability code removed keepdims check from the Reduction op capability code. Updated the error_tolerance for LogGrad(DNNL EP only) after finding a few instances that the tests were a little out of tolerance. Signed-off-by: George Nash <george.nash@intel.com> * Documentation cleanup in dnnl_qattention Cleaned up the Comments documenting the QAttention operator For some reason a bunch of new lines were introduced to the comment making it harder to read. Signed-off-by: George Nash <george.nash@intel.com>	2021-12-16 07:31:16 -08:00
Changming Sun	44c701192b	Revert a bad change in bfc_arena.cc (#10057 )	2021-12-15 23:38:45 -08:00
Tang, Cheng	6357c12977	use inplace reshape (#9991 ) Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2021-12-15 21:17:29 -08:00
George Nash	7922a8c22f	Optimization Convolution op when using dnnl ep (#10051 ) If Group attr = 1 allow the OneDNN library to optimize the memory layout for the device the Convolution operator is being run on. Without this optimization the default NCHW memory layout is used; on CPUs the NCHW memory layout can result in a significant performance decrease. Signed-off-by: George Nash <george.nash@intel.com>	2021-12-15 20:28:34 -08:00
Edward Chen	3466ee45a3	Add hash value typedef. (#9710 ) Add a typedef for the various hash value variables. Use of a typedef conveys some additional meaning.	2021-12-15 19:07:17 -08:00
Chih-Hsuan Yen	4e73cc83d6	Fix building DNNL EP with clang (#10014 ) Before this change, building DNNL EP from onnxruntime 1.10.0 with clang fails with: In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.cc:4: In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.h:5: In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_subgraph.h:10: In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/shared_library/provider_api.h:19: In file included from /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/common.h:36: /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:33:6: error: call to function 'operator<<' that is neither visible in the template definition nor found by argument-dependent lookup ss << t; ^ /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<std::vector<long>>' requested here MakeStringImpl(ss, args...); ^ /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char , std::vector<long>>' requested here /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<long, const char , std::vector<long>>' requested here /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char , long, const char , std::vector<long>>' requested here 
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<unsigned long, const char , long, const char , std::vector<long>>' requested here /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:46:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char , unsigned long, const char , long, const char , std::vector<long>>' requested here MakeStringImpl(ss, args...); ^ /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:93:18: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char , unsigned long, const char , long, const char , std::vector<long>>' requested here return detail::MakeStringImpl(detail::if_char_array_make_ptr_t<Args const&>(args)...); ^ /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.cc:46:7: note: in instantiation of function template specialization 'onnxruntime::MakeString<char [20], unsigned long, char [23], long, char [9], std::vector<long>>' requested here ORT_ENFORCE(data_dims[i] == 1, "Dimension of input ", i, " must be 1 instead of ", data_dims[i], ^ /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/common.h:184:64: note: expanded from macro 'ORT_ENFORCE' ::onnxruntime::MakeString(__VA_ARGS__)); \ ^ /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/framework/tensor_shape.h:147:15: note: 'operator<<' should be declared prior to the call site std::ostream& operator<<(std::ostream& out, const TensorShape& shape); ^ 1 error generated. 
make[2]: *** [CMakeFiles/onnxruntime_providers_dnnl.dir/build.make:384: CMakeFiles/onnxruntime_providers_dnnl.dir/build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.cc.o] Error 1 Two-phase lookups fail as: 1. visible in the template definition - fails as `std::ostream& operator<<(std::ostream& out, const TensorShape& shape)` (from include/onnxruntime/core/framework/tensor_shape.h) is defined after `template <typename... Args> std::string MakeString(const Args&... args)` (from include/onnxruntime/core/common/make_string.h) as per `clang++ -E` 2. argument-dependent lookup - fails as the argument data_dims has type `std::vector<long>` (via typedef in dnnl.hpp), while `std::ostream& operator<<(std::ostream& out, const TensorShape& shape)` is in namespace onnxruntime instead of std There are several possible fixes: * Make operator<< appear before MakeString by adjusting the order of header files - I consider it fragile * Also define operator<< in namespace std - may result in namespace pollution * Use an argument of a class in onnxruntime namespace - this commit	2021-12-15 17:08:57 -08:00
Valery Chernov	b327e89efa	Standalone TVM Executor Provider (#10019 ) * squashed commit for standalone tvm execution provider * critical fix for correct python build with stvm ep * get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG * updates and fixes * update parsing of stvm provider options * add support of external data for onnx model * add conditional dump of subgraphs * remove unused code * get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API * support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type) * add fp16 * add functionality of conversion of model layout to NHWC if needed. Necessary parameter was added to STVM provider options * fix license text in header. fix log format * small fixes * fix issues from flake8 * remove model proto construction from GetCapability * reserve memory for vector of DLTensors * add simple tutorial for STVM EP * STVM docs * jroesch/tvm -> apache/tvm * remove dead code, unnecessary logs and comments * fix in readme * improve tutorial notebook * tvm update * update STVM_EP.md * fix default value * update STVM_EP.md * some TODOs for the future development * shorten long lines * add hyperlink to STVM_EP.md * fix Linux CI error * fix error in csharp test Co-authored-by: Jared Roesch <jroesch@octoml.ai> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2021-12-15 16:59:20 -08:00
George Wu	16274beb6f	update TensorRT EP to use TensorRT 8.2 (#9981 ) * update base image from 11.4.0 to 11.4.2 * update Linux TRT GPU pipeline to TRT 8.2 * update onnx-tensorrt to 8.2-GA * disable failing TensorRT 8.2 tests. * update pad test. * fix * update win trt ci pipeline to trt 8.2 * test run with cuda 11.4 and cudnn 8.2 * increase timeout * revert * revert * update packaging pipelines to use trt 8.2 * fix typo * update trt gpu perf pipeline to trt 8.2 * increase timeout * delete deprecated ci-perf-pipeline.yml * bump timeout * adjust timeout packaging	2021-12-15 15:59:31 -08:00
Yufeng Li	ee975de77b	reorganize quantization files (#10023 ) * reorganize quantization files	2021-12-15 15:45:04 -08:00
Edward Chen	6cdab06255	Enable argument files in build.py. (#10040 )	2021-12-15 08:22:15 -08:00
Changming Sun	20f8a06f1f	Remove OpenMP code (#10032 )	2021-12-15 00:58:42 -08:00
jingyanwangms	8043a9facc	Bump master version to 1.11 (#9957 ) * Bump master version to 1.11 * Update Windows AI version * update version in onnxruntime_c_api.cc	2021-12-14 23:32:06 -08:00
Changming Sun	91096781c3	A small fix to allocators (#10042 )	2021-12-14 21:21:07 -08:00
Changming Sun	9d9ebd3b85	Fix some static analysis warnings in the core framework (#10033 )	2021-12-14 14:41:42 -08:00
Changming Sun	e0a0f385bb	Fix some warnings in mlas (#10034 )	2021-12-14 14:41:11 -08:00
ashari4	af71da0ac6	Yield op supports bf16 (#10035 )	2021-12-14 13:12:37 -08:00
Ye Wang	703becd796	Fix a bug in fusion_embedlayer.py (#10022 )	2021-12-14 12:50:35 -08:00
Ginés Hidalgo	5be0fa13c0	[DML] Fixed huge bug in ORT_NO_EXCEPTIONS for DML back end, the check is reversed	2021-12-14 10:17:06 -08:00
ashari4	9e04b7e59b	Remove memcpy in in-place ATen ops (#9913 ) * Make ops in-place * Add comment	2021-12-14 08:28:12 -08:00
Vincent Wang	a7c2d1cb09	bf16 for dlpack (#10016 )	2021-12-14 13:34:14 +08:00
Chen Fu	cd0af7ad44	Symmetric quantized convolution kernel ARM64 (#9772 ) Adding a symmetric quantized convolution kernel for ARM64 Note: Indirect conv performs worse for shallow convs (input channels are small). This is much more so for low end pre-dot CPUs, where only 128 or deeper conv is faster with indirect conv. With DOT-CPUs, 32 deep conv is already faster Co-authored-by: Chen Fu <fuchen@microsoft.com>	2021-12-13 21:14:45 -08:00
Suffian Khan	7e55a942cd	Add torch 1.10 requirements for rocm (#10028 )	2021-12-13 20:39:58 -08:00
Sunghoon	6de2a878cb	[js/react_native] Fix a broken manual build (#10012 ) * Fix a broken manual build * Keep the same file structures	2021-12-13 19:02:10 -08:00
Changming Sun	7b63d1102b	Fix some warnings in orttraining code (#10009 )	2021-12-13 15:28:21 -08:00

6160 commits