onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-18 18:52:16 +00:00

Author	SHA1	Message	Date
Guoyu Wang	fa4658e8a9	Move to XCode new build system if building on Mac using XCode (#9617 ) * Use xcode new build system * Address cr comments	2021-10-29 18:44:55 -07:00
Guoyu Wang	57491b6f93	Add App Center test for iOS package (#9605 ) * Add app center test for iOS package * fix flake8 * fix yml templates path * Address CR comments	2021-10-29 15:23:01 -07:00
Hariharan Seshadri	b5f7bb7d10	Update ONNX (#9462 )	2021-10-29 10:33:40 -07:00
sumitsays	7744cc1013	[DmlEp] Make DmlEp compatible with Clang for EPIC (#9585 ) * Make DmlEp Clang compatible for EPIC * Fix build issues occurred when engine/lotus points to ORT Github latest * Fix more build errors * Fixed one build issue and removed temporary changes for Clang * Addressed comments on the PR. * Style fixes * Fix unreachable code Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2021-10-29 03:19:35 -07:00
Scott McKay	eb2612b588	Remove netcoreapp2.1 target as it is EOL and out of support. Attempting to use it with VS now causes unit test run failures. (#9603 )	2021-10-29 11:11:22 +10:00
Changming Sun	173e538b80	Update mac-ios-packaging-pipeline.yml	2021-10-28 14:25:29 -07:00
Changming Sun	cc73bcc243	Suppress component governance component warnings for ios	2021-10-28 14:25:29 -07:00
Ginés Hidalgo	1731f0080a	Update attention_cpu_base.h to suppress static analysis warning	2021-10-28 13:35:57 -07:00
Xavier Dupré	9c15c68ed4	Enable fallback when forward fails due to non contiguous tensor (#9369 )	2021-10-28 13:04:54 -07:00
Tianlei Wu	a01a3f2552	Add more statistics in transformer profiler (#9578 ) * add statistics of cuda kernel * grouping by provider + operator * add --input to import profiling result	2021-10-28 11:35:03 -07:00
Viswanath Boga	85874bb315	embed layer fusion gpt2 (#9336 ) * Changes to fuse embed layer for gpt2, kernal changes pending * verified add output and regular add match * Test added for additional output embedlayernorm, working on CUDA * Test passing on CPU * updated convert_to_onnx toll to check parity correctly * removed some debugs * couple of TODO left as in optimizer.py * removed changes to optimizer.py * fixing build * fixing build * updated order of initilization * added a test case for float16 * updating the docs * updating tests failing due to embed layer fusion * update unit tests * updating CUDA documentation in operatorkernels.md * addressing comments * OperatorKernels.md updated with CUDA * adding TODO to qembed_layer * minor edit * updated docs * addressing comments * adding position ids to embed layer gpt2 * updating fused gpt2 model * added extra test * remove comments * addressing comments * contrib_defs.cc updated * all tests passing * fixing a typo * minor edit * trigger build * qembedlayernorm checkinputs updated * fixing build error * fixing build error * fixing build error	2021-10-28 11:06:26 -07:00
Tianlei Wu	a555740708	Attention fusion: update uint8 tensor parsing for ONNX upgrade (#9564 ) * use UnpackTensor to parse uint8 tensor * address review feedback	2021-10-28 10:38:10 -07:00
Sunghoon	17cf39a964	Clean up unnecessary codes in softmax and hardmax kernel (#9580 ) * add p50 in test * remove unnecessary codes from softmax * remove unnecessary codes from hardmax Co-authored-by: Yulong Wang <yulongw@microsoft.com>	2021-10-28 10:01:46 -07:00
TomWildenhain-Microsoft	e8268c9a18	Add Transpose Optimizer and modify nhwc optimizer to use it. (#9284 ) * Add Transpose Optimizer and modify nhwc optimizer to use it. * Fix casts * Fix casts2 * Fix move * Add tests * Add headers * Fixes and tests * Remove explicit template instantiation * Fix build warning * Name unit tests * Code review fixes * Add some comments * Fix some casts * Make optimization slightly less agressive * Some unit test fixes * Update Attention pattern to work with transpose optimizer * Update attention fuser * Fix attention fusion python script * Improve transpose optimizer documentation * Create OptimizerCtx struct * Disable Slice handler for testing * Implement Slice int32 * Only push transposes leading up to other transposes * Improve optimization heuristic * Add exemption for MaxPool * Document transpose optimizer api.h * Revert fusion tests to master * Remove temp files * Replace typedef with using * Trim trailing whitespace * Move class declarations from api_impl.h to api_impl.cc * Remove copy constructors and move allocator * Alphabetize headers * Add override keyword * Comments for nhwc_transformer * Rename OrtGraph to ApiGraph, etc. * Wrap line * Remove extra qualifier on ApiGraph * Refector attention fusion * Remove c-style casts from api_impl.cc * Improve documentation * Avoid printing vector in ORT_ENSURES * Revert attention fusion refactor * Remove duplicate cost heuristics and improve documentation * Fix size_t casts * Fixes from Scott's review * Unrevert attention refactor and more updates from Scott's review * Revert api_impl.cc ValueInfo change * only optimize first transpose input * Unrevert api_impl.cc changes * Make vector call reserve * transpose_optimizer.cc update from Scott's comments * Rename api::Graph to api::GraphRef etc. * Consider domains 'onnx.ai' and '' equal * Replace AddInput with SetInput * Improve tests * quantization and heuristic tests * Comments for tests * Replace const string_view with string_view and update tests * Fixes requested by Edward * Fix std::string to string_view conversion * Add <string> to includes * Fix bug for broadcasting ops with unknown rank. Slight safety improvements * Changes requested by Edward * Fix formatting * Improve description of cost metric	2021-10-27 22:10:39 -07:00
Changming Sun	87b1fddd97	Add Linux/MacOS ARM64 support to nuget packaging pipeline (#9570 )	2021-10-27 19:00:43 -07:00
Ginés Hidalgo	2d44bd525b	DML functions always returning a value (#9485 ) * Always return a value * @fdwr advice added	2021-10-27 15:21:32 -07:00
Scott McKay	a2b3e6bb23	Remove pointless assert. (#9571 )	2021-10-28 07:33:40 +10:00
Dmitri Smirnov	4e76360261	Prevent PySparseTensor from being garbage collected if we have an outstanding OrtValue (#9540 ) Prevent PySparseTensor from being garbage collected if we have an outstanding OrtValue Improve comments.	2021-10-27 11:28:37 -07:00
Changming Sun	aa76520e60	Update macOS build agents to macOS 11 (#9562 )	2021-10-27 10:00:04 -07:00
Thiago Crepaldi	5d5c03bcdc	Fix opset version change by not using copy of global constant (#9393 )	2021-10-27 12:42:06 -04:00
Scott McKay	b5a652c578	Add Xamarin support (#9436 ) Add Xamarin support to the ORT nuget packages. - Update C# code to support Xamarin builds for iOS and Android - refactor some things to split out common code - include iOS and Android ORT native shared library in native nuget package	2021-10-27 20:07:07 +10:00
Ginés Hidalgo	12f216aab5	Bug in DmlOperatorResize.cpp with m_inputDimensions (#9456 )	2021-10-27 02:50:54 -07:00
Ginés Hidalgo	9639eded4b	Missing `#pragma once` in dml_provider_factory.h (#9457 )	2021-10-27 02:49:52 -07:00
Ginés Hidalgo	1efdbff1a3	Fixed compiler error in Clang (for Win64) for ExecutionProvider (#9482 )	2021-10-27 02:47:22 -07:00
Yi-Hong Lyu	0301f401ee	Cleanup unnecessary opset_version arguments (#9558 )	2021-10-27 02:25:54 -07:00
Sunghoon	c79307e7b4	[js/web] support opset-13 of softmax (#9493 ) * add p50 in test * support opset-13 of softmax * update a operators.md * resolve comments * fix lint and format Co-authored-by: Yulong Wang <yulongw@microsoft.com>	2021-10-26 23:58:50 -07:00
Ginés Hidalgo	d079e0d48f	Fixed Clang (on Windows) compiler error with `#pragma`'s (#9484 )	2021-10-26 21:31:45 -07:00
RajalakshmiSR	c54ad0dd0b	POWER: Add Dgemm kernel for POWER processor (#9459 ) * POWER: Add Dgemm kernel for POWER processor This patch adds new dgemm kernel specific to POWER processor. * POWER: Restrict new functions to VSX in header * Remove warning check in header * POWER: Dgemm Adjust indentation Fixing indentation based on review comments. Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>	2021-10-26 20:27:24 -07:00
Yulong Wang	90555bf96d	[node.js binding] enable CI for macOS arm64 (#9532 ) * nodejs aggr * add dependency * no unzip * fix aggregation * add arm64 for mac * mac arm64 build * fix commandline * add check for multi-CMAKE_OSX_ARCHITECTURES * fix	2021-10-26 16:42:19 -07:00
Zhang Lei	c1b0f924b7	quantization tool better support operator when subgraph is enabled (#9463 ) * Fix is_valid_quantize_weight recursive issue when enable subgraph. * some clear	2021-10-26 15:36:19 -07:00
Zhang Lei	33ef1d7700	disable inner parallel for global avg pool as normally they are small (#9487 ) * Using cost model's thread count rather than max number of threads when parallel tasks. * according to perf test result, decrease parallel on channels. * Seems no use on parallel channels for qavg_pool according several models, remove it. * Revert "Using cost model's thread count rather than max number of threads when" This reverts commit 5fa47cd5b5ddbaa4e5ef97ccbc53200324379544.	2021-10-26 15:35:49 -07:00
Changming Sun	df7a5342a5	Upgrade com.diffplug.spotless to 5.17.0 (#9546 )	2021-10-26 14:29:46 -07:00
Changming Sun	f39821adbc	Fix a bug in CMakeLists.txt when handling NO RTTI (#9547 )	2021-10-26 14:29:29 -07:00
Jingqiao Fu	da15f5fc2f	change cmake condition to prevent WCOS from linking advapi32 (#9500 ) * change condition to prevent WCOS from linking advapi32.dll * Remove linkage to advapi32.lib	2021-10-26 12:16:49 -07:00
Stella Stamenova	542f1a9737	Cleanup some whitespace and capitalization for set (#9504 )	2021-10-26 12:02:07 -07:00
Ginés Hidalgo	a036cc6d4b	Fixing bugs in ORT_NO_EXCEPTIONS (#9479 ) ORT_NO_EXCEPTIONS is not working after the latest changes in: onnxruntime/core/graph/function.cc onnxruntime/core/graph/graph.cc	2021-10-26 10:50:32 -07:00
Ginés Hidalgo	1aabba7120	Avoided warning C4458: declaration of 'X' hides class member. (#9541 )	2021-10-26 10:49:24 -07:00
satyajandhyala	f29057c7c0	Added TanhGrad. (#9507 ) * Added TanhGrad.	2021-10-26 09:10:03 -07:00
pengwa	b125446f9c	Optimize python overhead of APEX amp (#9447 ) * optimize python overhead of _post_amp_backward * overwrite apex amp's zero_grad for faster implementation * move unscale_fp16_grads_into_fp32_grads into C++ impl * improve the efficiency furthur, reducing 3.5ms to 1.7ms for unilm. * unilm 1.7ms to 338us: 1). optimize python list <==> std::vector copy, 2). launch the kernels as long as num_elem reach thresh hold. This help reduce the CUDA idel time. * refine the logic a bit after validating Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2021-10-26 13:13:49 +08:00
Yi-Hong Lyu	27ad20df23	Add QDQ support of Resize to able to fuse it into a quantized Resize (#9476 )	2021-10-25 21:48:15 -07:00
ashbhandare	0270ff7951	Minor import fix (#9538 )	2021-10-25 21:29:31 -07:00
Changming Sun	f92b8e2ac8	Clean up optional-lite references (#9534 )	2021-10-25 21:05:45 -07:00
Yulong Wang	bf4c3fa3d6	[node.js binding] aggregate binaries for multiple platforms in single NPM package (#9501 )	2021-10-25 20:16:10 -07:00
Vincent Wang	fb4f7dbbb7	Call ATenOp for ReduceSum on ORTModule (#9471 ) * call ATenOp for ReduceSum * Enable ReduceSum ATenOp for training only * always load extension	2021-10-26 09:48:57 +08:00
marcusfreisleben	651955d3c9	CUDA: Enable parallel compilation (#8974 ) * Pass on parallel option to nvcc * Fixed build.py * Added missing string conversion * Adressed review points	2021-10-25 16:42:58 -07:00
Scott McKay	39d1b9e1c1	Fix bug in Slice helper when dim value is zero (#9492 ) * Don't clamp if dim_value is zero as that will set `step` to an invalid value.	2021-10-25 17:39:01 +10:00
Ginés Hidalgo	dbe1b57a71	Update thread_utils.cc	2021-10-22 16:59:09 -07:00
Ginés Hidalgo	a79d375d24	Added fixes for Clang on Win64	2021-10-22 16:59:09 -07:00
Ginés Hidalgo	9335cf102a	Deleted duplicated "core/graph/function.h" "core/graph/function.h" appears twice: - `include/onnxruntime/core/graph/function.h` - `onnxruntime/core/graph/function.h` --> This one is redundant and not used anywhere	2021-10-22 16:58:29 -07:00
Stella Stamenova	d608504438	Don't use legacy mode for protobuf (#9498 )	2021-10-22 16:50:29 -07:00


5767 commits