onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-10 17:37:14 +00:00

Author	SHA1	Message	Date
Scott McKay	b59ccbc75b	Add big endian support to murmurhash3 (#12549 )	2022-08-11 18:39:39 +10:00
Vincent Wang	018fba9b74	Fix Compile Warning (#12552 ) * fix warning * more fix	2022-08-11 16:00:35 +08:00
Changming Sun	ac7538b909	Remove CUDA 10.2 support (#12541 )	2022-08-10 22:46:41 -07:00
Cheng	819c36701f	[xnnpack] basic QDQ operators support (#11912 ) * basic ops for mobilenet,qconv,qsoftmax,qavgpool update Xnnpack to latest unit test * NodeUnit: use outputedge to replace output-node * qdq model e2e test * use inlinedvector to replace vector * conv bias check * tensorshape helpers * Refactor xnn_op minmax * Qlinearsoftmax schema update * Remove qlinearsoftmax registration Co-authored-by: Jicheng Wen <jicwen@microsoft.com>	2022-08-11 10:12:51 +08:00
Baiju Meswani	3e78f3cf1f	Add win-ci pipeline for on-device training (#12513 )	2022-08-10 14:45:39 -07:00
Chen Fu	b2382dc43a	fix qdq relu removal bug (#12542 ) Fix minor bug in qdq quantization tool Motivation and Context Relu node is removed in qdq quantization tool if it can be merged to its input node. When performing the removal, we forgot to check whether the input is actually the graph input	2022-08-10 14:06:51 -07:00
Dmitri Smirnov	c10704a501	Use alignas instead of naive padding to avoid false cache sharing (#12514 ) PerThread and ChildThreadStat alignas	2022-08-10 11:23:20 -07:00
Kevin Chen	25032f1756	Add default to TRT datatype switch statement (#12533 ) Signed-off-by: Kevin Chen <kevinch@nvidia.com> Signed-off-by: Kevin Chen <kevinch@nvidia.com>	2022-08-10 09:12:08 -07:00
Changming Sun	c0d396d176	Restrict "Component Detection" task to Lotus project only (#12536 ) It is related to PR #12426	2022-08-10 03:25:29 -07:00
Changming Sun	e810480403	Replace the occurrences of "master" to "main" in yaml files (#12534 )	2022-08-09 22:03:21 -07:00
Cheng	64e991a9fc	[Qlinearsoftmax] contrib cpu (#12177 ) * [Qlinearsoftmax] contrib cpu * int8 implementation * contrib operator md * qdq transformer test * new attribute: opset * doc * quantized tool * remove template to reduce Binary size * doc of contribe operators * enforce x_shape is valid * fix reduce_size if input-shape is dynamic * add UT * register one op for reducing binarysize * kernel hash update * docs/ContribOperators.md	2022-08-10 10:52:02 +08:00
Vincent Wang	0c6037b5ab	Bugfix for BiasSoftmax Fusion (#12517 )	2022-08-10 07:20:13 +08:00
msftlincoln	0d9a02e647	Eager Mode - Support Concatenation via aten::cat.out (#12527 ) * support concatenation via aten::cat.out * wrap dims * rename vars in tests, test wrapped dims	2022-08-09 17:16:18 -04:00
Chen Fu	47b787c28f	Python module for dumping activation tensors when running an ONNX model (#12474 ) Python module for dumping activation tensors when running an ONNX model This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.	2022-08-09 13:15:45 -07:00
Adam Louly	2681648f5b	Load checkpoint in cpp (#12352 ) * Load checkpoint in cpp * removed unused imports * throw error on invalid name and change function name * inplace model assignment, change name and other comments resolved * name change on import * Addded unit test, resolved comments * remove unused imports * resolved comments * refactoring too reduce memoory allocation * resolved extra comments * changed files hierarchy an force added onnx moodel * solved order of function argument * used gtest macros on test cases Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-09 12:30:50 -07:00
Faith Xu	ee3b757492	Add codeowners for requirement files (#12512 ) * Add Codeowners for dependency files * Fix team @s	2022-08-09 09:46:47 -07:00
Vincent Wang	2bed0d4abb	[CUDA] SoftmaxCrossEntropy Kernels Refactor (#12482 ) * sce refactor * refactor * remove usnecessory memset	2022-08-09 16:48:44 +08:00
Vincent Wang	cfa09d16d9	[CUDA] Mod Op Kernel (#12499 ) * mod for cuda and rocm * fix bfloat16 ut * change bf16 ut number * fix opset version * fix op kernel doc	2022-08-09 13:05:40 +08:00
pengwa	a2dc3e9eac	Improve the compilation speed when compiling for multiple architectures. (#12490 ) * improve the compilation speed when compiling for multiple architectures. * formatting * fix * use 0 by default * fix comments	2022-08-09 11:52:26 +08:00
Scott McKay	56bd96a3f5	Incrementally free initializers while saving to OrtValue instances (#12485 ) * Free initializer TensorProto instances as they're converted to OrtValue to reduce peak memory usage. Co-authored-by: Pranav Sharma <prs@microsoft.com>	2022-08-09 10:59:10 +10:00
Hector Li	730240d2a5	remove the link the comments (#12510 )	2022-08-08 15:20:40 -07:00
Adam Pocock	8a86b346a5	[Java] JNI refactor for ONNX Tensor (#12281 ) Working on JNI refactor for OnnxTensor. Simplifying the error handling logic in createTensor. Collapsing casting branches and migrating to ONNX element type enum. Disable cpplint for JNI C files.	2022-08-08 12:48:30 -07:00
Jian Chen	8c5c283471	new quantized operators split (#12495 ) * adding conditional variable again * Adding split test cases in python * Adding python cases for split * Enable s8s8 split * Optimize input * Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)" This reverts commit `d5e34acb` * Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"" This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad. * format file * Update c-api-linux-cpu.yml * Update c-api-linux-cpu.yml * Update c-api-linux-cpu.yml * Reformat file * Reformat file * format file * Optimize input * Remove unused import * Remove useless init * Format split.py with black	2022-08-08 15:12:09 -04:00
cloudhan	9c05577021	Fix various warning in kernel explorer (#12501 ) Fix various warning	2022-08-08 11:15:41 -07:00
Yufeng Li	bdd6b00c9a	set zero point to 0 if all value are 0.0 (#12470 ) * set zero point to 0 if all value are 0.0 * fix bug: lower version of numpy.finfo doesn't have smallest_subnormal * check scale to make sure it is not subnormal	2022-08-07 21:34:58 -07:00
cloudhan	ddea1e48df	Avoid false-positive dependent name lookup error by not depending on auto keyword (#12483 ) * Workaround false positive error produced by clang ROCm's hip clang complaints that "use 'template' keyword to treat 'Foo' as a dependent template name" where Foo is not a dependent template name. Instead, avoid the using of auto keyword fixes the error here.	2022-08-08 10:32:01 +08:00
Dwayne Robinson	eb90b52a75	DML EP fix training build error (#12461 ) Fix onnxruntime_training.cmake missing linkage issue	2022-08-05 16:01:25 -07:00
Vincent Wang	e85e31ee80	Update ORTModule Default Opset Version to 15 (#12419 ) * update ortmodule opset to 15 * update torch version * fix ut * fix ut * rollback * rollback for orttrainer	2022-08-05 16:55:04 +08:00
Baiju Meswani	a7d6290774	CUDA kernel for ClipGradNorm for TensorSeq gradients (#12412 )	2022-08-04 22:28:28 -07:00
PeixuanZuo	3e1b0ac4b3	[DELETE] delete python package rocm4.3.1 (#12480 ) [delete] delete rocm4.3.1	2022-08-05 13:27:42 +08:00
ytaous	b879dca51c	Fix Python Packaging CI (Rocm) (#12477 ) Fix Python Packaging CI Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-04 20:40:09 -07:00
Scott McKay	8d830adf24	Rework parts of Graph::Resolve to reduce memory usage (#12176 ) * Rework some aspects of Graph::Resolve to reduce memory usage.	2022-08-05 13:20:25 +10:00
cloudhan	f39354d7cb	Add composable kernel GEMM baseline for kernel explorer (#12364 ) * Split GemmBase RocBlasGemm * Add composable kernel GEMM baseline * Make linter happy * Address review comment * Update bert cases with batchsize * Adjust includes to fix IWYU lint * Only builds and links used ck kernels to improve building time * Remove warmup run on SelectImpl * Add comment to utility function * Mute cpplint * Make RocBlasGemm<T>::SelectImpl semantically correct * Add reduced basic test cases for ck gemm * More robust gemm testing * Fix warnings * Fix grammar	2022-08-04 17:32:20 -07:00
Vincent Wang	37995a7245	[CUDA] BiasSoftmax Supporting New Pattern (#12361 )	2022-08-05 06:59:24 +08:00
LironKesem	d452462b5e	Lironkesem/unsqueeze_and_squeeze (#12421 )	2022-08-04 15:12:34 -04:00
Dmitri Smirnov	a4ef0e7f7b	Remove dynamic allocation for ThreadPool ParallelSection (#12429 ) Use InlinedVector in a TP Store per thread parallel section in std::optional and avoid memory allocation	2022-08-04 09:46:16 -07:00
Yufeng Li	ac10f33d2d	Enable quant op to share quantization parameter between input and ouput (#12408 ) * share quant param between tensors	2022-08-03 21:25:35 -07:00
Ryan Hill	52d4699788	Minor doc fixes (#12388 )	2022-08-03 19:47:36 -07:00
Edward Chen	3efd9a73bb	Refactor InferenceSession Load member functions. (#12430 ) Fix comparison of path characters when checking for ".ort" suffix. Some clean up of InferenceSession Load functions. - Reduce duplication between std::string/std::wstring versions. - Renaming for clarity.	2022-08-03 16:28:26 -07:00
Ashwini Khade	97268e023c	dev notes for layout transformer (#12396 ) * first draft * plus fixes * plus more links * Plus updates per review * plus more clarifications * plus updates * plus more nit fixes * plus some additions	2022-08-03 15:15:59 -07:00
Scott McKay	a3de1bbf7d	Update script to find optimizers that potentially need supported opset updates (#12330 ) * Update to handle multiline declarations for the kernels which are typical these days. * Update to new path for the cpu contrib_op kernel registrations. * Update tools/python/find_optimizer_opset_version_updates_required.py Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>	2022-08-04 07:37:27 +10:00
Xinya Zhang	77cab7a3a5	[ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops (#11968 ) * [ROCm] disable expected failure tests PoolTest.MaxPool_10_DilationPadding_?d * [ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops * (To squash after review) Replace rocm/nn/pool.cc with amd_hipify.py changes * [ROCM] Replace miCompat with Helper functions * (to squash) fix the compiling error of SetPoolingNdDescriptorHelper	2022-08-03 14:36:36 -07:00
Erick Muñoz	d1497bdf62	[oneDNN EP] Optimized DynamicQuantizeLinear operator (#12403 ) * Removed unnecesary reorders * Removed unnecesary element wise clip	2022-08-03 12:36:42 -07:00
Baiju Meswani	7f58bd7236	Perform graph transformations during offline tooling (#12422 )	2022-08-03 11:27:12 -07:00
Dmitri Smirnov	dc984a03d5	Container and memory allocation guidelines (#12387 ) Container and memory allocation guidelines Re-org and add code samples Clarify the wording on returning gsl::span	2022-08-03 10:31:59 -07:00
Tianlei Wu	97a340bf48	Fix integer overflow in LongformerAttention (#12435 ) fix integer overflow	2022-08-03 10:29:07 -07:00
Changming Sun	44ec2cf088	Update publish-python-apidocs.yml (#12433 )	2022-08-03 10:17:00 -07:00
Ye Wang	b622e5fa9b	Support vocab_mask/prefix_vocab_mask/no_repeat_number in greedysearch op (#12327 ) * support more inputs for greedy search * fix docs * refactor test * lint * review comments	2022-08-03 10:10:08 -07:00
Xinya Zhang	01f3a197d7	[ROCm] InstanceNormalization, BatchNormalization and LRN Ops (#11972 ) * [ROCm] Add InstanceNormalization Op * Enable InstanceNormBatch1_fp16 and InstanceNormBatch2_fp16 for ROCm * [ROCm] Add BatchNormalization for fp32 and fp16 * Enable BatchNormTest for ROCm * [ROCm] Add LRN Op * [ROCM] replace miCompat functions with Helper functions	2022-08-02 23:14:26 -07:00
Vincent Wang	99d2a63e1a	Set Fix Seed For SoftmaxCrossEntoryLoss Related UTs (#12432 ) add seed	2022-08-03 13:29:30 +08:00

7187 commits