onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-04 04:07:22 +00:00

Author	SHA1	Message	Date
Tim Harris	9c1900866a	Revert ""Sticky" allocation of worker threads (#7372)" This reverts commit `3d92723d1c`.	2021-04-30 14:39:58 -07:00
Thiago Crepaldi	9ba9da0c95	Fix unused registered buffers issue on ORTModule (#7525)	2021-04-30 13:50:23 -07:00
Tang, Cheng	54db6648af	kernel invoker api for eager mode (#7473) * initial draft for kernel invoke api * initial implementation of kernel invoker * [eager] fix build on Mac * [eager] increment input name in kernel invoker * temp fix for type in eager mode * use global default log manager * rollback the previous commit since it break linux build * Revert "rollback the previous commit since it break linux build" This reverts commit `58c2c3423a`. * Eager Mode: fix linking on macOS * optimizer_execution_frame: ignore unused lambda capture (model_path) * fix link issue * ORTInvoker: set correct input argument tensor element proto types Do not set a type proto on output arguments to allow ORT to deduce them * ORTInvoker: create only one logging manager * Minor fix to set execution provider type correctly. (#7000) Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com> * training fix * support config output ml values in frame, so we can use it to implement inplace update * Fix range loop error while building. (#7087) Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com> * Conditionally link with nsync_cpp if not windows. (#7151) Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com> * Fixed initialization order in ORT kernel invoker (#7342) * Updated constructor of ort_kernel_invoker to take a logger. * Changed linking order. * Updated test. * add inplace ut * add build option * Update include/onnxruntime/core/eager/ort_kernel_invoker.h Co-authored-by: Derek Murray <Derek.Murray@microsoft.com> * resolve comments in pr * fix build break;merge from master * fix build break Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: Aaron Bockover <abock@microsoft.com> Co-authored-by: Chandru Ramakrishnan <41447659+chandru-r@users.noreply.github.com> Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com> Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>	2021-04-30 13:33:58 -07:00
Ori Levari	dfca1a09d5	Add Thread Spinning Session Option in WinML (#7498) Co-authored-by: Ori Levari <orlevari@microsoft.com>	2021-04-30 11:44:58 -07:00
Weixing Zhang	e6f66f660c	missed change for external allocator in ROCm EP. (#7505)	2021-04-30 09:53:05 -07:00
Adrian Tsai	70e67ddd2b	Update DirectML version to 1.5.1 and enable ARM/ARM64 builds with DML (#7511) * Update DirectML to version 1.5.1 * Enable --use_dml with ARM and ARM64 * Add ARM/ARM64 binaries to nuget packages	2021-04-30 00:49:30 -07:00
Yulong Wang	00aaa6dabb	update CI for onnxruntime-web (#7497)	2021-04-29 22:22:52 -07:00
Changming Sun	0d107bbb73	Fix CUDA 10.2 pipeline (#7508)	2021-04-29 22:22:35 -07:00
Scott McKay	d6df5764d7	Android package infrastructure (#7430) * Include ORT format model conversion scripts and infrastructure in ORT python package. - tweak existing script setup so it can be easily run directly and from the ORT python package Add config file and readme for Android minimal build package Update ORT Mobile doco Disable warning if 'all' optimizations are enabled but NCHWc transformer is excluded (device specific optimizations don't apply in this scenario so the warning is moot). * Address PR comments	2021-04-30 14:23:54 +10:00
Tim Harris	3d92723d1c	"Sticky" allocation of worker threads (#7372) * Sticky thread allocation * Test sticky thread assignment * Test sticky thread assignment * Test sticky thread assignment * Expose control over additional worker assignment stats * Sticky thread allocation * Test sticky thread assignment * Test sticky thread assignment * Test sticky thread assignment * Expose control over additional worker assignment stats * Merge * Merge * Merge * Fix Windows build * Fix windows build 2 * Build Python 3.8 Windows CPU only * Add env var to override binding * Build Python 3.8 Windows CPU only * Fix windows build * Remove thread affinity override * Remove goodworker * Remove Python build settings * Remove unneeded changes * Remove unneeded changes * Remove unneeded changes * Remove unneeded changes * Remove unneeded changes * Remove unneeded changes * Tidy * Tidy * Avoid race on preferred_worker vector * Improve assertions * Improve assertions * Enum for PushBackWithTag result * Remove unused field * Update comments * Extra debugging * Extra debugging * Extra debugging * Support varying thread pool sizes * Improve comments * Remove requirement for thread local to be trivially destructible * Use unsigned consistently for thread counts, removing casting * Remove debug code * Fix webassembly build * Merge * Merge * Merge * Remove unused code * Fix build * Extra test case for varying loop sizes * Clean variable names * Clean variable names * Clean variable names * Remove unneeded include, fix build * Fix profiling * Update from review comments	2021-04-29 20:42:14 -07:00
Edward Chen	ec04b6203b	Remove conditional compilation of std::is_trivially_copyable since we are no longer supporting GCC 4. (#7504)	2021-04-29 19:13:09 -07:00
Changming Sun	1012535dab	Change onnxruntime::make_unique to std::make_unique (#7502) 1. Change onnxruntime::make_unique to std::make_unique 2. Add "-std=c++14" to ROCM EP's build flags.	2021-04-29 17:04:53 -07:00
Yufeng Li	d337fa90e7	Propagate QDQ only when scale and zp are scalar (#7492) fix crash when DeQuantizeLinear's output is graph output propagate only when scale and zp are scalar. fix bug for is_modified = is_modified || TryCancelOutDQQPair(graph, dq_node, q_node); in which TryCancelOutDQQPair wouldn't be invoked if is_modified is true	2021-04-29 14:40:41 -07:00
Scott McKay	e255506bcd	Add another input validation to ReverseSequence (#7445) * Add another input validation to ReverseSequence * Limit the bad length test to the CPU EP	2021-04-30 07:24:32 +10:00
Xiaoyu Liu	994c2ed420	GPT2 one step beam search update with configuration support (#7425) * check in early stop search as separate type * rename to beam search configurations * update do sample configuration flag help * rename to configurable search step * add option groups * add more unit tests Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>	2021-04-29 13:19:56 -07:00
Ilya Lavrenov	6358e96b63	Added OpenVINO 2021.4 support (#7470) * Added OpenVINO 2021.4 support * Added OPENVINO_2021_4 handling	2021-04-29 12:25:04 -07:00
Changming Sun	7b003967b1	Add static code analyzer to Windows CPU/GPU CI builds and fix the warnings (#7489)	2021-04-29 11:54:57 -07:00
Tracy Sharpe	2b0bbfd1a8	MLAS: add SSE 4.1 u8s8 kernel (#7490)	2021-04-29 11:12:32 -07:00
Tang, Cheng	e73c3e0651	rollback the GetRuntimePath impl for linux (#7488) * rollback the GetRuntimePath impl for linux; limit the dynamic ep load ut for win * remove the override	2021-04-29 09:11:23 -07:00
Chi Lo	0dbe51b002	Enable TRT EP for C# (#7482) * enabled TRT EP for C# * Fix potential leak	2021-04-29 04:56:40 -07:00
RajalakshmiSR	3c7c728989	cmake: Add regex pattern for POWER architecture (#7494) This patch helps to set architecture as power, when processor check output matches ppc64le*. Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>	2021-04-28 22:23:14 -07:00
Adrian Tsai	f13b378995	Re-disable tests (#7495)	2021-04-28 21:50:22 -07:00
sabreshao	e6a3308db7	Optimize cuComputeGradInput performance. (#7479) Move the checking of gamma to host and specialize both case through template.	2021-04-28 17:08:31 -07:00
Chandru Ramakrishnan	6773b4f5dd	Fix implicit-exception-spec-mismatch warning. (#7481) (#7483) * Fix implicit-exception-spec-mismatch warning. (#7481) * Suppress implicit-exception-spec-mismatch warning. * Updated to noexcept. * Unconditionally use noexcept.	2021-04-28 19:17:39 -04:00
Thiago Crepaldi	3ee63beafa	Fix user input order before ORTModule feed it to backend (#7456)	2021-04-28 14:33:40 -07:00
Changming Sun	d68cedfa85	Fix some C/C++ warnings in the jni part (#7385)	2021-04-28 14:25:58 -07:00
Lifu Huang	ab373d6f03	Lifhuan/force trt sequential (#7440) * Support sequential TensorRT engine build. * Add documentation. * Add tests and fix typos. * Fix missing field in pybind_state.	2021-04-28 13:59:37 -07:00
Bowen Bao	c584d48283	Add sequence identity for opset 14 & fix sequence insert (#7335) Description: - Fix SequenceInsert with last position, which is equal to the current sequence length. - Implement Identity to support sequence input for opset 14. Motivation and Context - Required to export Huggingface/transformers T5 with beam search.	2021-04-28 13:26:57 -07:00
thilow	22d7cde725	Fix a 'Squeeze' related issue in symbolic_shape_infer.py (#7380) * Update symbolic_shape_infer.py don't rely on static code infer in _infer_Squeeze_ * checking if dropped axes might be != 1 * Checking opset. Logging assumption that symbolic dimensions are unequal to 1. * more checks	2021-04-28 13:13:04 -07:00
Maajid khan	674915208a	Fixes RelWithDebInfo build issue on windows for OV-EP (#7471) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>	2021-04-28 10:44:05 -07:00
G. Ramalingam	044c78f089	Add function body to LayerNorm (#7378) * LayerNorm function body v1 * LayerNorm function body * layernorm function test * Minor fixes * Fix signed unsigned comparison * Move contrib ops test * Handle optional output parameters * Add test case for optional outputs * Handle float16 random generation * Address PR feedback	2021-04-28 09:31:53 -07:00
Pranav Sharma	da5c9263e9	Add log to allow serving platforms to quantify ORT usage. (#7476)	2021-04-28 08:20:02 -07:00
KeDengMS	8e21329206	Update nuphar notebook model download url (#7475)	2021-04-27 21:18:06 -07:00
liqunfu	196e6702ad	to support multiple cuda versions in published onnxruntime-training package (#7468) to support multiple CUDA versions in published onnxruntime-training package	2021-04-27 17:15:33 -07:00
Zhang Lei	e64e30ee0d	Improve ConvTranspose by transposing const filter during prepacking. (#7388) * Improve ConvTranspose by transposing const filter during prepacking. * Fix CI build break for openvino which can not load such onnx model now.	2021-04-27 16:49:03 -07:00
Edward Chen	d21304ceb0	Initial Objective-C API (#7366) Initial implementation of an Objective-C API.	2021-04-27 10:06:30 -07:00
Changming Sun	78e583d08c	Add CMAKE_CUDA_ARCHITECTURES=52 to TensorRT CI pipelines (#7455)	2021-04-27 09:55:23 -07:00
Yulong Wang	c2418a1f42	[wasm] fix memory info creation (#7461)	2021-04-27 09:29:21 -07:00
liqunfu	4cbd2cce9b	. (#7466) Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-27 09:20:21 -07:00
Yulong Wang	4ebc9c3b5e	[JS] onnxruntime-web (#7394) * add web * add script and test * fix lint * add test/data/ops * add test/data/node/ to gitignore * modify scripts * add onnxjs * fix tests * fix test-runner * fix sourcemap * fix onnxjs profiling * update test list * update README * resolve comments * set wasm as default backend * rename package * update copyright header * do not use class "Buffer" in browser context * revise readme	2021-04-27 00:04:25 -07:00
Tracy Sharpe	d13e5b2fd9	NCHWc: ReorderInput improvements (#7442) Implement various improvements related to reordering a tensor for use by NCHWc operations: Relax the requirement that the input channel count must be a multiple of the NCHWc block size (either 8 or 16 depending on ISA). The requirement now is that the channel count must be a multiple of 4. The implementation of MlasReorderInputNchw would need further work to support relaxing this further, but I don't have any models where I've observed this to be necessary yet. Support fusing a Transpose(NHWC->NCHW) into a following ReorderInput. ReorderInput now has a channels_last attribute as was done in the past for ReorderOutput. This helps with models converted from TF where the converter is unable to remove all Transpose operations. Add threading support to ReorderInput to accelerate performance (ReorderOutput will come later).	2021-04-26 19:16:39 -07:00
M. Zeeshan Siddiqui	82108b18e3	Partial graph execution perf improvements. (#7438) * Partial graph execution perf improvements. * PR feedback. * Decrement reference count of tensors in ORTModule. * PR feedback. * PR feedback. * PR feedback.	2021-04-26 17:13:55 -07:00
Thiago Crepaldi	0702a14ee7	Add pytorch version check before loading Python ONNX Runtime training module (#7377)	2021-04-26 14:53:50 -07:00
Edward Chen	4804ede501	Update build docker image cache cleanup build definition (#7452) Decrease default cache history length to 4 days. Other minor updates to build definition.	2021-04-26 14:39:46 -07:00
RandySheriffH	40568d8821	Wait for dispatch done in RunParallelSection to fix random TP UT crash (#7443) * wait for dispatch done in RunParallelSection * pass worker_fn by value * cancel move * only move work_fn when it is lastly referred Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2021-04-26 14:12:10 -07:00
Zhang Lei	ada0fbbd2d	Implement qlinear concat and unit test. (#7341) * Implement qlinear concat and unit test. Add quantization tools for QLinearConcat and its quantization tests. * Add kernel def hash for QLinearConcat. * Change according to PR. Add qdq transformer support for QLinearConcat. * Add QDQ Transformer unittest. Fix typo on domain. * remove dup logic of no use. * fix x86 build error. * Update operator docs.	2021-04-26 13:38:40 -07:00
Changming Sun	b5592856a7	Remove thread pool's cancel method and suppress some warnings (#7411)	2021-04-26 09:33:48 -07:00
Vincent Wang	368e4a324f	SqueezeGrad Bugfix (#7412) * squeezegrad bugfix * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2021-04-26 09:12:03 +08:00
Weixing Zhang	ca9b3f18e9	Explicitly pass cuda stream to thrust function rather than use cuda default stream implicitly (#7414) * Pass cuda stream to thrust function to not use default stream. In the commit `299ace0`, ORT has been changed to not use cuda default stream. * update amd_hipify.py * remove unnecessary stream sync Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-25 01:18:56 -07:00
jeyblu	b9cbbc41ff	dnnl matmul tensor dimension check (#7383)	2021-04-23 23:17:22 -07:00
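Commit d337fa90e7 describes a short-circuit bug. In a statement of the form `is_modified = is_modified || TryCancelOutDQQPair(graph, dq_node, q_node);`, the call on the right is skipped once `is_modified` is already true. The sketch below reproduces that pitfall using a hypothetical `Rewriter::TryCancel` stand-in (not ONNX Runtime code) and shows one common fix: evaluate the call first, then combine it with the flag. The commit message does not say how the fix was written in ORT, so this form is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for a graph rewrite such as TryCancelOutDQQPair.
// It has a side effect (counting how often it runs) and reports whether it
// "modified" anything. Illustrative only; not ONNX Runtime code.
struct Rewriter {
  int calls = 0;
  bool TryCancel() {
    ++calls;      // side effect that should happen on every iteration
    return true;  // pretend every call performs a rewrite
  }
};

// Buggy pattern: once is_modified becomes true, || short-circuits and
// TryCancel() is never invoked again.
int CountCallsBuggy(int iterations) {
  Rewriter r;
  bool is_modified = false;
  for (int i = 0; i < iterations; ++i) {
    is_modified = is_modified || r.TryCancel();
  }
  return r.calls;
}

// One common fix (assumed, not taken from the commit): put the call on the
// left so it is always evaluated before being combined with the flag.
int CountCallsFixed(int iterations) {
  Rewriter r;
  bool is_modified = false;
  for (int i = 0; i < iterations; ++i) {
    is_modified = r.TryCancel() || is_modified;
  }
  return r.calls;
}

// Self-check: the buggy loop runs TryCancel() once; the fixed loop runs it every time.
void CheckShortCircuitDemo() {
  assert(CountCallsBuggy(3) == 1);
  assert(CountCallsFixed(3) == 3);
}
```

Writing the update as a separate statement (`bool changed = r.TryCancel(); is_modified = is_modified || changed;`) has the same effect and makes the intent more explicit.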


4731 commits