onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-18 18:52:16 +00:00

Author	SHA1	Message	Date
Tracy Sharpe	5d773ee57b	MLAS: add sgemv path for aarch64 builds (#4254 ) Implement a fast path for GEMMs where M=1 and TransB=CblasNoTrans.	2020-06-17 20:10:35 -07:00
Chih-Hsuan Yen	5da849b414	Fix detection of protobuf with onnxruntime_PREFER_SYSTEM_LIB on Linux (#4230 ) The CMake module is FindProtobuf.cmake [1]. Thus the name should be capitalized so that protobuf can be found on case-sensitive file systems. [1] https://github.com/Kitware/CMake/blob/v3.17.3/Modules/FindProtobuf.cmake	2020-06-17 17:34:47 -07:00
Changming Sun	43deec2174	Temporarily remove dnnl from Linux CI build to unblock the whole team (#4266 )	2020-06-17 16:25:24 -07:00
Vincent Wang	b41fcf1570	Bugfix for shape inference and GetShape. (#4243 ) Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-06-17 15:11:02 +08:00
Yulong Wang	12367a6b11	[C#] enable string-typed FixedBufferOnnxValue in input (#4178 )	2020-06-16 11:06:11 -07:00
Wei-Sheng Chin	189fb60ef9	Fix a bug and add code to profile memory (#4241 ) * Fix a bug and add code to profile memory 1. Compile Send/Recv again (currently broken because of HOROVOD refactor). 2. Add code to print out initializer allocation size and activation memory size. * Address comments * Split memory counts per locations * Fix a metric	2020-06-16 10:17:27 -07:00
edgchen1	63bf587623	Use azcopy to download test data (#4221 ) Use azcopy from download_e2e_test_data.py, add helper function for downloading azcopy. Update download_test_data.py to use helper function.	2020-06-16 10:14:34 -07:00
Tianlei Wu	61fa5476d5	Update PyTorch Bert notebooks (#4239 ) Update PyTorch Bert SQuAD notebooks to use onnxruntime-tools and update usage of intra_op_num_threads. Rename Python files according to coding style. Fix change_input_to_int32. Update Keras notebook to copy script from rel-1.3.0 branch (will update them later).	2020-06-16 09:36:51 -07:00
Weixing Zhang	7ccce4379e	Improve fast_divmod (#4224 ) * improve fast_divmod BERT-L throughput is improved about ~1.8% * fix Win build. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-06-16 03:03:58 -07:00
Changming Sun	825392c25b	Fix ORT server CI build (#4165 )	2020-06-15 21:26:19 -07:00
ytaous	5d28efd434	opset12 code cleanup (#4242 ) * opset12 code cleanup * opset12 code cleanup Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-15 19:45:35 -07:00
ytaous	e0334f177c	Opset12 upgrade for existing models used by perf/e2e pipelines (#4238 ) * opset12 support * opset12 support * on comments Co-authored-by: Ethan Tao <ettao@microsoft.com>	2020-06-15 14:26:53 -07:00
Ashwini Khade	4486c66ed4	enable conv transpose 3D (#4218 ) * enable convtranspose 3D * test fix	2020-06-15 13:38:32 -07:00
Bowen Bao	b08771f00e	Add ONNX Training Post-Passes to Front-End - Cont (#4041 ) * Add ONNX postpasses * add flag + add bert test from onnx file * address PR comments * fix typo * fix rebase * address comments * Fix test failures * add new pass for expand for new pt version, add comments * fix rebase Co-authored-by: lahaidar <lahaidar@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-06-15 10:33:26 -07:00
Cecilia Liu	0b5bbb16b8	Benchmark With IO Binding (#4206 ) * add io binding to benchmark.py	2020-06-15 10:06:33 -07:00
Weixing Zhang	b4b1c6440a	Enable ORT with CUDA 11 toolkit (#4168 ) * ORT on CUDA 11 1. Separate HOROVOD and MPI 2. Separate NCCL from HOROVOD in CMakeLists.txt 3. Remove dependency on external cub 4. cudnnSetRNNDescriptor is changed in cuDNN 8.0 * polish the code about MPI/NCCL in CMakeLists.txt and build.py * check CUDA version * ${MPI_INCLUDE_DIRS} should be PUBLIC * sm30, sm50 are deprecated in CUDA 11 Toolkit * update change based on code review feedback. * add sm_52 * improve MPI/NCCL build path Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2020-06-15 08:47:03 -07:00
Emad El-Haraty	88a9cceb41	fix relative links in CONTRIBUTING.md (#4212 ) * fix a links to Engineering Design and API in CONTRIBUTING.md * fix additional links in CONTRIBUTING.md * correct the link to the public API in CONTRIBUTING.md Co-authored-by: Emad El-Haraty <emad.elharaty@limebike.com>	2020-06-15 06:48:09 -07:00
Guoliang Hua	d0d31efd86	fix transformer doc format (#4003 ) fix transformer doc format	2020-06-15 01:30:47 -07:00
Wei-Sheng Chin	ecc901717e	Use subset to release gradient tensors earlier (#4222 )	2020-06-14 22:52:54 -07:00
Andrews548	886befaba1	Add BatchNorm and Concat to ACL EP (#4190 ) * Fix acl padding * Add BatchNormalization operator to ACL Execution Provider * Add Concat operator to ACL Execution Provider Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>	2020-06-14 21:48:22 -07:00
Hariharan Seshadri	877862184e	Fix subgraph based reshape fusion (#4185 )	2020-06-14 21:10:08 -07:00
Tracy Sharpe	bf3c32166d	fix optional input/outputs (#4229 )	2020-06-15 08:10:22 +10:00
Hariharan Seshadri	5708c4feaf	Handle corner case in Resize op (#4183 ) * Handle corner case in Resize op * Nit * Fix build * PR feedback	2020-06-13 18:05:25 -07:00
Tracy Sharpe	7a96cfc8f5	operator code cleanup (#4228 ) Search/replace of the pattern "const auto foo = tensor.Shape()" to "const auto& foo = tensor.Shape()" to avoid unneeded copies at runtime and reduce code size (8KB drop for onnxruntime.dll). Remove some unnecessary header includes.	2020-06-13 14:47:44 -07:00
jornt-xilinx	c55f6d76be	[Vitis-AI EP] Fix to enable multi-output subgraphs inside Vitis-AI EP + edit docs (#4171 )	2020-06-13 04:56:07 -07:00
Wei-Sheng Chin	de9da123cf	Enable static memory planning for pipeline. (#4204 ) * Enable static memory planning for pipeline. 1. We fix a bug when resolving symbolic shape for scalars. 2. We pass the original inputs to all pipeline stages so that the symbolic shapes can be resolved. * Further Improvements 1. Address comments. 2. Further reduce activation size by ~50% when pipeline is on. This is done by removing all but one gradient tensor from the last RecordEvent in the backward pass. * Address a comment * Fix Windows build	2020-06-12 21:43:50 -07:00
Hariharan Seshadri	b377266eb3	Fix Mac build linker warnings (#4155 )	2020-06-12 21:10:12 -07:00
Hariharan Seshadri	91a41298cc	Fix ORT build when onnxruntime_PYBIND_EXPORT_OPSCHEMA is enabled (#3954 )	2020-06-12 19:32:57 -07:00
Tracy Sharpe	155e22d1ab	MLAS: fuse float output into quantized GEMM (#4215 ) Add more variants of MlasGemm that do a u8x8 GEMM with the output type as float. This fuses the common sequence of MatMulInteger + Cast + Mul(OutputScale) + optional Add(BiasVector).	2020-06-12 17:50:40 -07:00
Tiago Koji Castro Shibata	2e3607c7cd	Remove hardcoded desktop lib (#4193 )	2020-06-12 16:51:54 -07:00
Edward Chen	f74861841e	Fix dangling pointer to local string variable in onnxruntime_pybind_state.cc.	2020-06-12 14:28:39 -07:00
Edward Chen	6b4f652017	Clean up status checks in gradient_graph_builder_test.cc.	2020-06-12 14:28:39 -07:00
Edward Chen	7096e6f5ef	Reduce severity of GraphAugmenter logging statement.	2020-06-12 14:28:39 -07:00
Changming Sun	6f4320fb85	Fix the python package name issue (#4207 ) Fix the package name issue. In my last change (#4197) about enabling code sign, I forgot to pass the additional flags to setup.py.	2020-06-12 08:32:59 -07:00
Yufeng Li	87d68d8531	matmul integer fusion (#4195 ) * Introduce DynamicQuantizeMatMul It fuses DynamicQuantizeLinear, MatMul and following cast, multiplier. It gets float in and float out for quantized matmul. We have a MLAS kernel in implementation for this op.	2020-06-11 21:42:09 -07:00
Tianlei Wu	2605faef88	Add past state support in Attention Op for GPT-2 (#4107 ) Update Attention op to allow past state input and output. Add fusion script and tests	2020-06-11 14:19:55 -07:00
pengwa	e6ccb1ac28	GatherNDGrad for CPU (#4123 ) * GatherNDGrad on CPU * Remove __CUDA_ARCH__ check in .cc files	2020-06-12 02:43:49 +08:00
Xueyun Zhu	65a682354b	enable pipeline to run with mixed precision (#4113 ) * enable pipeline to run with mixed precision * address feedback * address feedback * test log * pipe information if test fails * ci failure	2020-06-10 22:16:24 -07:00
Changming Sun	8f8d899bf2	Enable code sign in c api pipeline and python pipeline	2020-06-10 19:31:22 -07:00
Yulong Wang	73bc6be5d1	build: split nodejs binding build and test to avoid timeout issue (#4188 ) * split nodejs binding build and test * enable nodejs tests	2020-06-10 19:16:32 -07:00
Matthew Hill	117b2e7743	Fix GPU memory leak on TensorRT (#4172 )	2020-06-10 16:56:51 -07:00
Dmitri Smirnov	af0750ba1b	Java GPU artifact naming (#4179 ) Modify gradle build so artifactID has _gpu for GPU builds. Pass USE_CUDA flag on CUDA build. Adjust publishing pipelines to extract POM from a correct path. Co-Authored-By: @Craigacp	2020-06-10 11:15:48 -07:00
George Wu	e8ed14bcb3	disable MEMLEAK CHECKER for openvino	2020-06-10 11:12:17 -07:00
stevenlix	c296884fc3	bump up ORT version to 1.3.1 (#4181 )	2020-06-10 08:44:03 -07:00
Changming Sun	c0bdbc0b39	Enable telemetry for the C API and python pipeline (#4174 )	2020-06-10 00:07:46 -07:00
Tracy Sharpe	35d9f396c4	MLAS: refactor quantized GEMM loops (#4182 )	2020-06-09 23:28:55 -07:00
George Wu	9d65ce53bc	move back to toolset 14.16 to possibly work around nvcc bug (#4180 )	2020-06-09 19:36:30 -07:00
Changming Sun	a7366d82af	Disable nuphar large model test (#4173 ) Disable nuphar large model test, because it takes too long (40+ minutes), while the default cpu provider takes about 5 minutes. After this change, we still keep a lot of other nuphar model tests; I think that should be enough.	2020-06-09 17:45:17 -07:00
Ashwini Khade	9eba9fba7c	Fix for BiasGelu fusion optimizer (#4160 ) * Fix for BiasGelu fusion optimizer * changes per review comments	2020-06-09 14:33:34 -07:00
Yulong Wang	2b3ce1b090	add script to support update nodejs binding version (#4164 )	2020-06-09 13:12:55 -07:00

2718 commits