onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-18 21:21:17 +00:00

Author	SHA1	Message	Date
pengwa	a2dc3e9eac	Improve the compilation speed when compiling for multiple architectures. (#12490 ) * improve the compilation speed when compiling for multiple architectures. * formatting * fix * use 0 by default * fix comments	2022-08-09 11:52:26 +08:00
Scott McKay	56bd96a3f5	Incrementally free initializers while saving to OrtValue instances (#12485 ) * Free initializer TensorProto instances as they're converted to OrtValue to reduce peak memory usage. Co-authored-by: Pranav Sharma <prs@microsoft.com>	2022-08-09 10:59:10 +10:00
Hector Li	730240d2a5	remove the link in the comments (#12510 )	2022-08-08 15:20:40 -07:00
Adam Pocock	8a86b346a5	[Java] JNI refactor for ONNX Tensor (#12281 ) Working on JNI refactor for OnnxTensor. Simplifying the error handling logic in createTensor. Collapsing casting branches and migrating to ONNX element type enum. Disable cpplint for JNI C files.	2022-08-08 12:48:30 -07:00
Jian Chen	8c5c283471	new quantized operators split (#12495 ) * adding conditional variable again * Adding split test cases in python * Adding python cases for split * Enable s8s8 split * Optimize input * Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)" This reverts commit `d5e34acb` * Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"" This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad. * format file * Update c-api-linux-cpu.yml * Update c-api-linux-cpu.yml * Update c-api-linux-cpu.yml * Reformat file * Reformat file * format file * Optimize input * Remove unused import * Remove useless init * Format split.py with black	2022-08-08 15:12:09 -04:00
cloudhan	9c05577021	Fix various warnings in kernel explorer (#12501 ) Fix various warnings	2022-08-08 11:15:41 -07:00
Yufeng Li	bdd6b00c9a	set zero point to 0 if all values are 0.0 (#12470 ) * set zero point to 0 if all values are 0.0 * fix bug: lower versions of numpy.finfo don't have smallest_subnormal * check scale to make sure it is not subnormal	2022-08-07 21:34:58 -07:00
cloudhan	ddea1e48df	Avoid false-positive dependent name lookup error by not depending on auto keyword (#12483 ) * Work around a false-positive error produced by clang. ROCm's hip clang complains that "use 'template' keyword to treat 'Foo' as a dependent template name" where Foo is not a dependent template name. Avoiding the auto keyword fixes the error here.	2022-08-08 10:32:01 +08:00
Dwayne Robinson	eb90b52a75	DML EP fix training build error (#12461 ) Fix onnxruntime_training.cmake missing linkage issue	2022-08-05 16:01:25 -07:00
Vincent Wang	e85e31ee80	Update ORTModule Default Opset Version to 15 (#12419 ) * update ortmodule opset to 15 * update torch version * fix ut * fix ut * rollback * rollback for orttrainer	2022-08-05 16:55:04 +08:00
Baiju Meswani	a7d6290774	CUDA kernel for ClipGradNorm for TensorSeq gradients (#12412 )	2022-08-04 22:28:28 -07:00
PeixuanZuo	3e1b0ac4b3	[DELETE] delete python package rocm4.3.1 (#12480 ) [delete] delete rocm4.3.1	2022-08-05 13:27:42 +08:00
ytaous	b879dca51c	Fix Python Packaging CI (Rocm) (#12477 ) Fix Python Packaging CI Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-04 20:40:09 -07:00
Scott McKay	8d830adf24	Rework parts of Graph::Resolve to reduce memory usage (#12176 ) * Rework some aspects of Graph::Resolve to reduce memory usage.	2022-08-05 13:20:25 +10:00
cloudhan	f39354d7cb	Add composable kernel GEMM baseline for kernel explorer (#12364 ) * Split GemmBase RocBlasGemm * Add composable kernel GEMM baseline * Make linter happy * Address review comment * Update bert cases with batchsize * Adjust includes to fix IWYU lint * Only builds and links used ck kernels to improve building time * Remove warmup run on SelectImpl * Add comment to utility function * Mute cpplint * Make RocBlasGemm<T>::SelectImpl semantically correct * Add reduced basic test cases for ck gemm * More robust gemm testing * Fix warnings * Fix grammar	2022-08-04 17:32:20 -07:00
Vincent Wang	37995a7245	[CUDA] BiasSoftmax Supporting New Pattern (#12361 )	2022-08-05 06:59:24 +08:00
LironKesem	d452462b5e	Lironkesem/unsqueeze_and_squeeze (#12421 )	2022-08-04 15:12:34 -04:00
Dmitri Smirnov	a4ef0e7f7b	Remove dynamic allocation for ThreadPool ParallelSection (#12429 ) Use InlinedVector in a TP Store per thread parallel section in std::optional and avoid memory allocation	2022-08-04 09:46:16 -07:00
Yufeng Li	ac10f33d2d	Enable quant op to share quantization parameter between input and output (#12408 ) * share quant param between tensors	2022-08-03 21:25:35 -07:00
Ryan Hill	52d4699788	Minor doc fixes (#12388 )	2022-08-03 19:47:36 -07:00
Edward Chen	3efd9a73bb	Refactor InferenceSession Load member functions. (#12430 ) Fix comparison of path characters when checking for ".ort" suffix. Some clean up of InferenceSession Load functions. - Reduce duplication between std::string/std::wstring versions. - Renaming for clarity.	2022-08-03 16:28:26 -07:00
Ashwini Khade	97268e023c	dev notes for layout transformer (#12396 ) * first draft * plus fixes * plus more links * Plus updates per review * plus more clarifications * plus updates * plus more nit fixes * plus some additions	2022-08-03 15:15:59 -07:00
Scott McKay	a3de1bbf7d	Update script to find optimizers that potentially need supported opset updates (#12330 ) * Update to handle multiline declarations for the kernels which are typical these days. * Update to new path for the cpu contrib_op kernel registrations. * Update tools/python/find_optimizer_opset_version_updates_required.py Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>	2022-08-04 07:37:27 +10:00
Xinya Zhang	77cab7a3a5	[ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops (#11968 ) * [ROCm] disable expected failure tests PoolTest.MaxPool_10_DilationPadding_?d * [ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops * (To squash after review) Replace rocm/nn/pool.cc with amd_hipify.py changes * [ROCM] Replace miCompat with Helper functions * (to squash) fix the compiling error of SetPoolingNdDescriptorHelper	2022-08-03 14:36:36 -07:00
Erick Muñoz	d1497bdf62	[oneDNN EP] Optimized DynamicQuantizeLinear operator (#12403 ) * Removed unnecessary reorders * Removed unnecessary element wise clip	2022-08-03 12:36:42 -07:00
Baiju Meswani	7f58bd7236	Perform graph transformations during offline tooling (#12422 )	2022-08-03 11:27:12 -07:00
Dmitri Smirnov	dc984a03d5	Container and memory allocation guidelines (#12387 ) Container and memory allocation guidelines Re-org and add code samples Clarify the wording on returning gsl::span	2022-08-03 10:31:59 -07:00
Tianlei Wu	97a340bf48	Fix integer overflow in LongformerAttention (#12435 ) fix integer overflow	2022-08-03 10:29:07 -07:00
Changming Sun	44ec2cf088	Update publish-python-apidocs.yml (#12433 )	2022-08-03 10:17:00 -07:00
Ye Wang	b622e5fa9b	Support vocab_mask/prefix_vocab_mask/no_repeat_number in greedysearch op (#12327 ) * support more inputs for greedy search * fix docs * refactor test * lint * review comments	2022-08-03 10:10:08 -07:00
Xinya Zhang	01f3a197d7	[ROCm] InstanceNormalization, BatchNormalization and LRN Ops (#11972 ) * [ROCm] Add InstanceNormalization Op * Enable InstanceNormBatch1_fp16 and InstanceNormBatch2_fp16 for ROCm * [ROCm] Add BatchNormalization for fp32 and fp16 * Enable BatchNormTest for ROCm * [ROCm] Add LRN Op * [ROCM] replace miCompat functions with Helper functions	2022-08-02 23:14:26 -07:00
Vincent Wang	99d2a63e1a	Set Fixed Seed For SoftmaxCrossEntropyLoss Related UTs (#12432 ) add seed	2022-08-03 13:29:30 +08:00
George Nash	26dc09417b	[oneDNN ep] matmulinteger postop fusion (#12354 ) * MatMulInteger + post op fusion This fuses MatMulInteger with up to 32 binary/elementwise operators if running on the oneDNN execution provider. Signed-off-by: George Nash <george.nash@intel.com> * Remove the un-needed transformer The MatMulIntegerToFloat transformer is not needed since the transform done is handled by the MatMulIntegerBinaryEltwise transformer code. Signed-off-by: George Nash <george.nash@intel.com> * Refactor of the post op transformer code This separates the code that finds the post op nodes for MatMul and MatMulInteger to reduce code repetition. Signed-off-by: George Nash <george.nash@intel.com> * Minor cleanup based on cpplint resolved unused-variable build failure Signed-off-by: George Nash <george.nash@intel.com>	2022-08-02 20:42:34 -07:00
Changming Sun	5d610bc8eb	Disable CG task in PR pipelines (#12426 )	2022-08-02 19:01:41 -07:00
Yulong Wang	feed5da435	[js] loosen test timeout (#12427 ) Loosen the following test timeouts: 1. "Test Web Multi-Browsers" stage in "ONNX Runtime Web CI Pipeline": 30min -> 60min 2. Node.js binding default per-case timeout: 30 sec -> 90 sec	2022-08-02 19:01:19 -07:00
smrkatte	54d5e86981	Add cast before copy for dissimilar scalar type (#12391 ) * Add proper cast/copy callflow for ORT and non-ORT devices	2022-08-02 18:32:58 -07:00
Yulong Wang	c9e0d0f8b6	[js/node] upgrade terser version (#12351 )	2022-08-02 15:50:44 -07:00
Changming Sun	1a64b94f60	Fix a small issue in nuget packaging pipeline (#12405 ) In #12358 I typed a wrong path in the yaml file.	2022-08-02 15:44:43 -07:00
Dmitri Smirnov	eebaf5f270	Adjust and fix abseil-cpp debugging visualization (#12415 ) Move abseil-cpp.natvis file, add it to PDB, adjust visualization	2022-08-02 15:08:17 -07:00
shalvamist	ca6b4221fe	[js] Bug fix - permission issue with ensureSymlinkSync (#12369 ) Using ensureSymlinkSync might have issues with permissions when using 'dir' - changed to 'junction' to avoid this. If the folder generation fails, it will cause the test to fail as well.	2022-08-02 12:21:31 -07:00
Chi Lo	b39257a5e6	Enable support of multi-level nested control flow ops model for TRT EP (#12147 ) * Make multiple-level nested control flow op model work * find correct input index * find correct input index (cont.) * enable nested layer unit tests for TRT EP * add comment * add Scan op to current workaround support of control flow op	2022-08-01 23:57:30 -07:00
Chi Lo	de3a91d85d	Revert TRT EP cache refactoring (#12376 ) * revert cache refactor * fix conflicts when reverting	2022-08-01 23:57:05 -07:00
Yi Zhang	5d1173fe68	Run IOS pipeline concurrently (#12400 ) split ios pipelines	2022-08-02 11:07:17 +08:00
Yi Zhang	63d64636f6	Add the comment linking to wiki (#12398 ) add the comment	2022-08-02 10:09:16 +08:00
LironKesem	315e006532	adding a comment on nll_loss_forward.output that can not be implemented (#12406 ) adding a comment on nll_loss_forward.output that can not be implemented	2022-08-01 19:12:35 -04:00
msftlincoln	62922f4c3c	Eager Mode generator: add comments, rename functions (#12385 ) * eager generator: add comments, rename functions * lint	2022-08-01 15:52:47 -04:00
Edward Chen	f77ab4fea6	Manually add optimization flag for Android Release builds. (#12390 ) With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration. More details here: https://github.com/android/ndk/issues/1740 Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21. This change is a workaround to manually add `-O3` for "Release" Android builds.	2022-08-01 12:49:03 -07:00
George Wu	6bb807ef74	add cuda compute 8.7 to Cmakelists.txt to support Nvidia Orin devices (#12377 ) * add cuda arch 8.7 to cmakelists.txt to support Nvidia Orin devices * add cuda version >= 11 check for orin support	2022-08-01 09:45:58 -07:00
Cheng	3f66297499	code clean (#12392 ) * code clean * misspelling fix	2022-08-01 14:12:35 +08:00
Valery Chernov	1a4868e5c4	[TVM EP] Hot fix of build on Windows of TVM EP with ipp-crypto (#12381 ) fix of build on Windows with ipp-crypto. cmake warnings fix Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-07-31 14:36:54 +02:00

7169 commits