onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-13 18:08:13 +00:00

Author	SHA1	Message	Date
Tiago Koji Castro Shibata	fabe02ddc2	Don't change global FPU state during round-half-to-even (#5376) * Don't change global FPU state * Handle infinity properly	2020-10-13 20:10:33 -07:00
Ye Wang	67315d8ae0	Optimize openai-gpt/albert model and add fusion test (#5466) * optimize openai-gpt * add huggingface model fusion test * move albert's attention fusion here * add test for albert fusion	2020-10-13 19:24:14 -07:00
Scott McKay	5544391e79	Fix linking of MLAS unit test lib on platforms where libatomic is required. (#5469)	2020-10-14 07:25:43 +10:00
Bowen Bao	8e9afe1944	Add long type support for SplitToSequence operator (#5367)	2020-10-13 12:57:11 -07:00
Hariharan Seshadri	e01d152464	Add OpSet kernel registrations as part of opset 13 support (#5465)	2020-10-13 10:02:00 -07:00
S. Manohar Karlapalem	6e6147fb75	Use correct protoc tool file name for C# builds (#5429) In Linux builds, the protoc tool is simply named 'protoc' (without the .exe extension).	2020-10-13 09:43:03 -07:00
Xiang Zhang	b12824fa7a	add telemetry event for nodejs binding (#5463)	2020-10-12 22:53:01 -07:00
Guoyu Wang	ce5465d5f3	[NNAPI EP] Add Resize and Clip support (#5427) * Add resize and clip support in NNAPI EP * Try to get around tensor rt test failure * Addressed PR comments	2020-10-12 22:29:19 -07:00
KeDengMS	c444b9d76a	Add CUDA option to run copy in default stream (#5445) * Add CUDA option to run copy in default stream. This change fixes #4829. Thanks @maherzog for providing the repro! The bug is caused by memory reuse in the BFC arena, where the CUDA copy and compute streams have a race condition. The BFC arena is an arena allocator on top of cudaMalloc/cudaFree that reduces the cost of syncing CPU and GPU on alloc/free. This means that when the CPU allocates or frees memory, the GPU might not have finished previous work on that memory, so the CPU and GPU can run asynchronously. This is OK if there's only one stream, where the execution order on CPU and GPU is consistent. For example, if we have two kernels A and B and the CPU runs allocA->computeA->freeA->allocB->computeB->freeB, A and B can share the same memory, since computeA and computeB will not race as long as they run in the same GPU compute stream. However, if the CPU runs allocA->copyA->freeA->allocB->computeB->freeB and copy and compute happen in different GPU streams, the GPU could execute copyA after computeB. This change makes copies run in the default compute stream, and adds an option to fall back to the previous behavior if there's a perf hit. This is a short-term fix until the BFC arena supports multiple streams. Users may use the following options to revert to the previous behavior: C API: struct OrtCUDAProviderOptions cudaProviderOpt; cudaProviderOpt.do_copy_in_default_stream = false; C++ API: CUDAExecutionProviderInfo cudaEPInfo; cudaEPInfo.do_copy_in_default_stream = false; C# API: pending... Python: import onnxruntime; onnxruntime.capi._pybind_state.set_do_copy_in_default_stream(False) * Confirmed the test fails in CI when doing copy in a separate stream; revert the test to get CI to pass for now * Fix Windows test * Address CR	2020-10-12 22:12:05 -07:00
Wenbing Li	80d36eab86	enable the onnxruntime shared library test on iOS (#5443) * enable the onnxruntime shared library test on iOS * fixing as commented. * add return status check.	2020-10-12 21:40:57 -07:00
RandySheriffH	913116e64e	bump ops version to opset13 (#5456)	2020-10-12 20:47:09 -07:00
Sergii Dymchenko	05b1c02d32	Fix commands in README.md. (#5459)	2020-10-12 17:53:09 -07:00
Sherlock	60dbd8a1e5	Update maximum batch size for UT; Include recompute modes (#5444) * Update MaxBatchSize and include recompute mode * Minor fix for frontend test Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-12 14:50:43 -07:00
Derek Murray	dbc626dcbe	Add ExpGrad registration and test. (#5438) Description: Add missing gradient registration for the `Exp` op. Motivation and Context * Adding support for training a model that uses the `Exp` op. Co-authored-by: Derek Murray <demurra@microsoft.com>	2020-10-12 13:56:08 -07:00
Ashwini Khade	2a018cc235	revert contrib op version bump and deprecation of TransposeMatMul (#5424) * revert contrib op version bump and deprecation of TransposeMatMul * update documentation	2020-10-12 13:02:15 -07:00
jingyanwangms	20c47ce91c	Simplified layer norm changes (#5028) * t5 layer norm changes * add t5 layer norm kernel * use template for t5 layer norm * template definition changes * no build error * add CPU cuda kernel * first unit test * other forward unit tests * add T5LayerNormGrad * Add c++ transform and test for T5 LN * fix and some debug prints * fix cuda error * rename from t5 to simplified * PR comments * revert change on invertible LM code path * remove duplicate forward computation * add GradientCheckerTest.SimplifiedLayerNormGrad * change back macro * Fix SimplifiedLayerNorm Gradient * merge with Sherlock's changes * changed cuda kernel * reapply cpu kernel changes Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: aishwarya bhandare <aibhanda@microsoft.com> Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-12 11:22:12 -07:00
edgchen1	ed60e0fe39	Fix BUILD.md environment variable name typo. (#5402)	2020-10-12 11:17:09 -07:00
Pranav Sharma	5e48c0fd6c	Register opset13 ops: Dropout, Flatten, LRN, MeanVarianceNormalization, ArgMax, ArgMin, Reshape, Shape, Concat. (#5451)	2020-10-12 10:09:38 -07:00
stevenlix	186f0668b0	update onnx-tensorrt submodule (#5442)	2020-10-09 21:49:40 -07:00
Hariharan Seshadri	b9f90e297e	Support sharing of initializers between sessions via the Python API (#5407)	2020-10-09 20:26:28 -07:00
Ryan Hill	6132e1f6ae	Shared providers - fix logging plus cleanup (#5406) * Fix logging, cleanup, and implement the remainder of the not implemented functions from the shared provider interface.	2020-10-09 17:31:03 -07:00
Wei-Sheng Chin	6cba42e942	Avoid inserting other CUDA calls in-between NCCL Send's and Recv's (#5430) * Avoid inserting other CUDA calls in-between NCCL Send's and Recv's * Add a comment * Place CUDA EP on the right device * Fix a warning * Address a comment	2020-10-09 15:34:46 -07:00
liqunfu	dbe7e6623b	only use/import pytest if needed (by enable_training) (#5437) Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-09 12:42:19 -07:00
Dmitri Smirnov	9642f1448e	Add OpSet 13 Registrations (#5426) Register Sigmoid for OpSet 13. Register OpSet 13 for Sum, Min, Max, Mean. Add Erf OpSet 13 registration. Register Clip for OpSet 13. Add Gemm/MatMul OpSet 13 registrations. Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>	2020-10-09 12:39:22 -07:00
Sergii Dymchenko	3a9a1a4ef1	Fix registration for GatherGrad (#5382) * Fix registration for GatherGrad to fix GatherGradOpTest.GatherGrad_axis0_indices2d_half. * Fix GatherGrad registration for CUDA also.	2020-10-09 11:57:50 -07:00
liqunfu	1cceefc7d4	use run_orttraining_test_orttrainer_frontend_separately to work aroun… (#5408) * use run_orttraining_test_orttrainer_frontend_separately to work around a sporadic segfault. Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-09 09:16:10 -07:00
Scott McKay	a92ccbe1bc	Various armv7 related fixes (#5394) * - Link with libatomic if needed - Install pip differently so it doesn't clash with the system pip which may involve a wrapper script - Remove ability to specify offset when Tensor allocates the data. The data prior to offset isn't accessible by anything. - Fix use of offset in TensorOpTest to work on armv7 where it must be aligned to the type it points to. - Fix ActivationOpNoInfTest.Softsign to allow for armv7 behavior - Fix ReductionOpTest.ReduceMean_keepdims to allow for armv7 floating point inaccuracy Address PR comments	2020-10-09 22:34:32 +10:00
Yufeng Li	b99eaa99cd	Prepacking MatMulInteger (#5403) * prepack matmulinteger Prepacking constant matrix B for MatMulInteger to get better performance.	2020-10-09 02:37:19 -07:00
Xavier Dupré	621fdb44e5	Fixes #4688, remove CPUAllocator in TreeEnsemble (#5375)	2020-10-09 11:26:07 +02:00
Keizo Fujiwara	d4507e9331	Use relative path for HEADER_SEARCH_PATHS (#5412) Currently HEADER_SEARCH_PATHS refers to a personal directory.	2020-10-08 23:06:11 -07:00
Ye Wang	90f976d060	Some improvements to transformers tool (#5383) * modify TensorFlow benchmark GPU setting * add export from tf choice in script * fix typo * match more embedlayernorm patterns * format	2020-10-08 19:35:17 -07:00
Tracy Sharpe	fab7f799a7	MLAS: fix ARM64 + VS2017 build break (#5423)	2020-10-08 18:03:45 -07:00
Sergii Dymchenko	8a632a903f	Remove unused imports from Python tests. (#5405)	2020-10-08 17:24:10 -07:00
Tianlei Wu	15696b8fce	bump version to 1.5.2 (#5420)	2020-10-08 16:30:13 -07:00
Suffian Khan	498f94668d	Keep all_finite tensor on CPU when using PyTorch Frontend (#5371)	2020-10-08 15:47:18 -07:00
Pranav Sharma	c2c78399ee	Include config keys header file in the release packages for Linux and Mac. (#5388)	2020-10-08 15:00:29 -07:00
Changming Sun	09aef240d6	Skip running onnx tests in python mac os pipeline (#5416)	2020-10-08 11:49:28 -07:00
Tiago Koji Castro Shibata	83ead3e2eb	Fix com ptr refcount (#5404)	2020-10-08 10:18:38 -07:00
Yufeng Li	b04cf2d229	Update ORT to 1.5.1 in Bert Quantization Notebook (#5396) * Update ORT to 1.5.1 in Bert Quantization Notebook	2020-10-08 09:55:01 -07:00
manashgoswami	132ab2230d	Updated with image for creating the onnxruntime pkg (#5400) * Create Mobile.png * Update ONNX_Runtime_for_Mobile_Platforms.md * Update ONNX_Runtime_for_Mobile_Platforms.md	2020-10-08 08:54:27 -07:00
Scott McKay	9684e1b5a8	Add doco for pre-requisites to be able to cross compile for Android on Windows with Java bindings enabled. (#5395)	2020-10-08 12:31:46 +10:00
Tianlei Wu	8133223871	clear cudaDelayLoadedLibs since delayload is disabled (#5386)	2020-10-07 11:33:12 -07:00
Tianlei Wu	8ee2b08325	Allow benchmark different threads (#5390)	2020-10-07 11:13:01 -07:00
Tianlei Wu	094384781e	Add --use_external_data_format in convert_to_onnx.py (#5393)	2020-10-07 09:42:02 -07:00
Guoyu Wang	5947445457	Add flatbuffers verifier for ORT format buffer (#5378) * Add flatbuffers verifier before accessing data in ort format models * Address review comments	2020-10-07 09:23:17 -07:00
Guoyu Wang	deb708d3b1	Move flatbuffers to 1.12 release (#5392)	2020-10-07 09:23:03 -07:00
Hariharan Seshadri	6f54113a1b	Support OrtValue binding in Python to enable interesting IOBinding scenarios in Python (#5248)	2020-10-06 21:14:41 -07:00
Tracy Sharpe	0122e890d9	MLAS: implement u8x8 GEMM for ARM64 (#5380) Add an implementation for u8u8/u8s8 GEMM for use on ARM64 (Windows/Linux).	2020-10-06 19:22:23 -07:00
Guoyu Wang	b4934b0016	Mitigate pybind11 build break using Xcode 12 on macOS (#5381) * turn dev_mode off if we are using macos to build python with xcode 12 * Address CR comments * Add ways to check compiler version	2020-10-06 19:03:33 -07:00
Kaarthik Sivashanmugam	10f1902d90	Update code snippet in README.md	2020-10-06 17:41:56 -07:00
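
Commit fabe02ddc2 (#5376) implements round-half-to-even without changing the global FPU state and handles infinity explicitly. The Python sketch below illustrates one general way to get that behavior using only floor(), with no rounding-mode switch. It is not the onnxruntime C++ code, and the function name round_half_to_even is hypothetical. It also shows why infinity needs a special case: Python's math.floor raises OverflowError on inf and ValueError on NaN.

```python
import math

def round_half_to_even(x: float) -> float:
    """Hypothetical helper: round to nearest integer, ties to even, via floor() only."""
    # inf and NaN pass through unchanged; math.floor would raise on them
    if math.isinf(x) or math.isnan(x):
        return x
    lower = math.floor(x)   # Python int, the integer just below (or equal to) x
    frac = x - lower        # distance above the floor, in [0, 1]
    if frac > 0.5:
        result = lower + 1
    elif frac < 0.5:
        result = lower
    else:
        # exact tie: pick whichever neighbor is even
        result = lower if lower % 2 == 0 else lower + 1
    # keep the sign of x on zero results, e.g. -0.4 -> -0.0
    return math.copysign(0.0, x) if result == 0 else float(result)

print(round_half_to_even(2.5), round_half_to_even(3.5), round_half_to_even(-2.5))
```

Because the function never touches the floating-point environment, other code running on the same thread keeps whatever rounding mode it expects.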
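
Commit c444b9d76a (#5445) describes the BFC arena race in prose. The toy Python model below, which is not onnxruntime code, enumerates the GPU execution orders that message describes: one arena block is reused, first by copyA and then by computeB. With a single stream, GPU order equals CPU issue order; with separate copy and compute streams, any merge that preserves each stream's own order is possible. All names (interleavings, corrupted, any_race) are hypothetical, and modelling computeB as a write followed by a read of the same block is a simplifying assumption.

```python
from itertools import combinations

def interleavings(s1, s2):
    """Yield every merge of s1 and s2 that preserves each stream's own order."""
    n = len(s1) + len(s2)
    for pos in combinations(range(n), len(s1)):
        it1, it2 = iter(s1), iter(s2)
        yield [next(it1) if i in pos else next(it2) for i in range(n)]

def corrupted(order):
    """True if some read sees data written by a different kernel."""
    block = None  # one arena block, reused by A and then by B
    for op, tag in order:
        if op == "write":
            block = tag
        elif op == "read" and block != tag:
            return True
    return False

copy_a = [("write", "A")]                    # copyA fills the block
compute_b = [("write", "B"), ("read", "B")]  # computeB writes, then reads, the same block

def any_race(same_stream: bool) -> bool:
    if same_stream:
        # one stream: GPU order equals CPU issue order
        orders = [copy_a + compute_b]
    else:
        # two streams: any order-preserving merge may run on the GPU
        orders = interleavings(copy_a, compute_b)
    return any(corrupted(o) for o in orders)

print(any_race(same_stream=True))   # False
print(any_race(same_stream=False))  # True
```

The corrupting order is the one where copyA lands between computeB's write and its read, which cannot happen once copies run in the same stream as compute.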


3550 commits