onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-11 17:48:34 +00:00

Author	SHA1	Message	Date
Yulong Wang	bfdd191eec	[wasm] use same export name for SIMD/NOSIMD build (#12545)	2022-08-19 18:17:50 -07:00
Dwayne Robinson	aa85092b51	DML EP squeeze all axes when empty (#12649) DML EP squeeze empty axes	2022-08-19 08:56:03 -07:00
Changming Sun	b270334e1e	Update numpy version from 1.21.0 to 1.21.6 to avoid building it from source (#12644)	2022-08-18 22:11:48 -07:00
Chen Fu	56dd0176a1	QDQ debugger - Adding Error Calculator (#12632) QDQ debugger - Adding Error Calculator	2022-08-18 09:30:43 -07:00
Cheng	81b128b5e9	Qlinearsoftmax take FLOAT lookup-table (#12574) * [lookuptable] float-type * typed y-scale * round to nearest even	2022-08-18 09:54:39 +08:00
Erick Muñoz	82b724fa5e	[oneDNN] Improve DequantizeLinear operator performance. (#12611) * Detect when ZeroPoint = 0 and avoid sub op. * Added tests to verify constant initializer behaviour.	2022-08-17 12:31:10 -07:00
Thiago Crepaldi	d1ba801570	Add BuildError for --gen_doc and --enable_training (#12630)	2022-08-17 14:18:37 -04:00
Dmitri Smirnov	9481893b58	Replace to lock_guard as lighter class for locking (#12616) Replace to lock_guard as lighter class	2022-08-17 11:08:31 -07:00
Chen Fu	f2db6bb293	weight matching (#12607) QDQ loss debug - Weights Matching Part 2 of QDQ loss debugging tool: given a float model and its qdq model, return the matching of all weight tensors and their corresponding dequantized weights from the qdq model.	2022-08-17 11:01:10 -07:00
Haoming Chen	8a038b9b0c	Fix a build error (#12600) LLVM compiler complains the std::hash<const char> and suggests std::hash<const void>. But the intention is to hash the name string instead of the pointer. So use std::hash<std::string> to be explicit.	2022-08-17 10:49:54 -07:00
Tianlei Wu	ce01ed02da	Improve LongformerAttention performance: AddBiasTranspose and New weight format (#12448) * add AddBiasTranspose kernel, new format of weights * Use compact global_q in GEMM * sequence_index from BxS to S; new stream for copy * merge input and output pointers in scratch2 * update default benchmark tests * add new format 0 for weight and bias * avoid integer overflow * check gpu memory * output summary in benchmark * add logging * update unit tests with non empty bias value * add rocblasGemmHelper and rocblasGemmStridedBatchedHelper for Rocm	2022-08-17 09:36:48 -07:00
pengwa	7df2e8c5cc	Refactor with std::variant (on device training) (#12383) * use std::variant for synthetic data storage. * use std::variant to replace TypedCheckpointProperty * Remove shared ptr for checkpoint property * fix tests * refine std::variant usage a bit * remove CheckpointProperty data abstraction * use InlinedVector and InlinedHashMap if possible * fix comments * fix build and test * fix some comments * use gsl::span * fix tests * refine based on comments * fix win build * fix build	2022-08-17 08:31:23 +08:00
Edward Chen	caabfcd920	Replace references to onnxruntime 'master' with 'main' in Dockerfiles. (#12550) * Replace references to onnxruntime 'master' with 'main' in Dockerfiles. * update dockerfiles/README.md	2022-08-16 14:13:05 -07:00
yf711	9d10badc55	Add build option to link TensorRT prebuilt parser (#12602) * Add build option to link prebuilt TensorRT parser * Test without the build option to link prebuilt TRTParser * Minor: update name of build option * Minor: update name of build option	2022-08-16 14:09:58 -07:00
Adam Pocock	733db31420	[Java] JNI refactor for OrtSession (#12496) Refactor JNI error reporting	2022-08-16 13:43:06 -07:00
Chen Fu	eb6aa861cf	QDQ debugger - activations compare (#12544) Debugger for QDQ loss - activation matching This is the first part of the QDQ debugger tool: activation matching, where we identify and match corresponding activations from the float model and the qdq model. The idea is that during quantization, we have an original float model and a qdq model. The debugger can run the two models side by side using the same input data. By comparing intermediate activations, we can help the model author figure out where the values differ, and take steps to reduce precision loss.	2022-08-15 17:03:28 -07:00
Yufeng Li	30ee5a4f79	release calibrator before deleting temporary files (#12601)	2022-08-15 16:03:46 -07:00
Maxiwell S. Garcia	19a9690885	ppc64le: fix MlasQLinearMulKernel's VSX code to work with inputs of 32 bits (#12441)	2022-08-15 16:03:07 -07:00
Dmitri Smirnov	616677104a	ONNX Protobuf natvis with some google::protobuf (#12580) ONNX Protobuf natvis with some google::protobuf structures Add leading underscore to local Intrinsic	2022-08-15 09:59:07 -07:00
Baiju Meswani	f5e3517c39	Add Learning Rate Scheduler C API (#11957)	2022-08-15 09:10:25 -07:00
Kevin Chen	73da3f3705	Add TRT uint8 support (#12570) * uint8 support Signed-off-by: Kevin Chen <kevinch@nvidia.com> * Handle outputs as well Signed-off-by: Kevin Chen <kevinch@nvidia.com> Signed-off-by: Kevin Chen <kevinch@nvidia.com>	2022-08-15 08:22:50 -07:00
Yulong Wang	95f2a3e7e0	[js/web] update branch name for pull:wasm (#12548) * [js/web] update branch name for pull:wasm * revise message	2022-08-12 15:46:36 -07:00
Nat Kershaw (MSFT)	cc9b3e1c37	Automate generation of javadocs and create PR with changes (#12515)	2022-08-12 12:03:38 -07:00
Scott McKay	0b0c51e028	Support direct usage of ORT format model flatbuffer for initializers (#12465) * Add ability to use ORT format model flatbuffer directly for initializers by leveraging the TensorProto external data infrastructure. Requires user to provide ORT format model bytes when creating the session, and set both `session.use_ort_model_bytes_directly` and `session.use_ort_model_bytes_for_initializers` to 1 in SessionOptions config entries (AddSessionConfigEntry in C API).	2022-08-12 18:31:43 +10:00
Xinya Zhang	bc353c7afe	Add FusedConv Op to ROCm (#11792) * [ROCm] Add FusedConv Op. * Enable ROCm for FusedConvTest * [ROCm] Implement FusedConv Op. with Fusion API The old code path was left as the fallback since some combinations are not supported (e.g., FusedConvTest.Conv2D_Bias_Z_Relu as of ROCM 5.1, where two bias layers are needed). * [ROCM] Suppress duplicated warnings in unsupported Fusion API usage. Known limitation for current MIOpen (verified with ROCM 5.2): Only one bias layer may be present in the Fusion Plan. Adding the second bias operation to the Fusion plan will end up with miopenStatusUnsupportedOp. In this case the fallback code path will be taken to complete required FusedConv operation. However, previously this failure was not detected and cached, and applications that create multiple FusedConv Ops with both z and bias will keep printing error messages, which is annoying to end users while this message is mainly for developers. This commit will let it print the first error message as a reminder, and skip the Fusion API code path in following calls if both z and bias are present. (Note: the skipping applies to all newly created FusedConv Ops). * [ROCM] Add cache mechanism for FusedConv Op. Now the operator with the same configuration will share the same Fusion Plan object, and the creation result will also be cached. Two benefits: 1. No duplicated Fusion plan creation, which is a presumably very costly process. 2. Failures due to MIOpen limitations (like z and b cannot be present at the same time) will only be triggered once. Known limits: Due to the limitation of MIOpen Interface, the tensor order of the convolution operator can only be guessed.	2022-08-11 23:04:01 -07:00
Xinya Zhang	eb827bd3e5	[ROCm] NGramRepeatBlock, LongformerAttention and DecoderAttention Ops (#11971) * [ROCm] enable NGramRepeatBlock Op * [ROCm] Enable testing ROCm in NGramRepeatBlockTest.NGramSize_3 Also link onnxruntime_test_all with amdhip64 when USE_ROCM=1 * [ROCm] add LongformerAttention Op * [ROCm] Enable LongformerAttentionTest * [ROCm] Add DecoderAttention Op * Enable DecoderAttention Test for ROCm. * [ROCM] Updates according to reviews	2022-08-11 19:32:08 -07:00
Yufeng Li	95df5dac51	do not quantize Relu/Clip if their inputs are not quantized (#12565)	2022-08-11 16:16:10 -07:00
Sheil Kumar	67f6b7ce29	DirectML GEMM broken in opset 11 and 13 when optional tensor C not provided (#12568) Set kernel input indices to be fixed to 0,1,2. C input is now optional, so last tensor must be specified.	2022-08-11 16:01:27 -07:00
Jian Chen	580f2294bc	Adding w_zero_point to conv_integer_test.cc (#12423) * Adding w_zero_point to conv_integer_test.cc * Reformatting code	2022-08-11 17:40:26 -04:00
Wil Brady	3d009cdde3	Updating binary ops in eager mode to support broadcasting. (#12560) * Updating binary ops in eager mode to support broadcasting.	2022-08-11 17:00:12 -04:00
pengwa	24eab921be	Enable PythonOp for --enable_training_torch_interop build (#12539) * enable PythonOp by default when --enable_training_torch_interop is enabled during build * clean up * fix * fix comment * fix * fix tests * fix fallback test * pylint format * refine based on comments	2022-08-12 00:49:30 +08:00
Scott McKay	b59ccbc75b	Add big endian support to murmurhash3 (#12549)	2022-08-11 18:39:39 +10:00
Vincent Wang	018fba9b74	Fix Compile Warning (#12552) * fix warning * more fix	2022-08-11 16:00:35 +08:00
Changming Sun	ac7538b909	Remove CUDA 10.2 support (#12541)	2022-08-10 22:46:41 -07:00
Cheng	819c36701f	[xnnpack] basic QDQ operators support (#11912) * basic ops for mobilenet,qconv,qsoftmax,qavgpool update Xnnpack to latest unit test * NodeUnit: use outputedge to replace output-node * qdq model e2e test * use inlinedvector to replace vector * conv bias check * tensorshape helpers * Refactor xnn_op minmax * Qlinearsoftmax schema update * Remove qlinearsoftmax registration Co-authored-by: Jicheng Wen <jicwen@microsoft.com>	2022-08-11 10:12:51 +08:00
Baiju Meswani	3e78f3cf1f	Add win-ci pipeline for on-device training (#12513)	2022-08-10 14:45:39 -07:00
Chen Fu	b2382dc43a	fix qdq relu removal bug (#12542) Fix minor bug in qdq quantization tool Motivation and Context Relu node is removed in qdq quantization tool if it can be merged to its input node. When performing the removal, we forgot to check whether the input is actually the graph input	2022-08-10 14:06:51 -07:00
Dmitri Smirnov	c10704a501	Use alignas instead of naive padding to avoid false cache sharing (#12514) PerThread and ChildThreadStat alignas	2022-08-10 11:23:20 -07:00
Kevin Chen	25032f1756	Add default to TRT datatype switch statement (#12533) Signed-off-by: Kevin Chen <kevinch@nvidia.com> Signed-off-by: Kevin Chen <kevinch@nvidia.com>	2022-08-10 09:12:08 -07:00
Changming Sun	c0d396d176	Restrict "Component Detection" task to Lotus project only (#12536) It is related to PR #12426	2022-08-10 03:25:29 -07:00
Changming Sun	e810480403	Replace the occurrences of "master" to "main" in yaml files (#12534)	2022-08-09 22:03:21 -07:00
Cheng	64e991a9fc	[Qlinearsoftmax] contrib cpu (#12177) * [Qlinearsoftmax] contrib cpu * int8 implementation * contrib operator md * qdq transformer test * new attribute: opset * doc * quantized tool * remove template to reduce Binary size * doc of contrib operators * enforce x_shape is valid * fix reduce_size if input-shape is dynamic * add UT * register one op for reducing binarysize * kernel hash update * docs/ContribOperators.md	2022-08-10 10:52:02 +08:00
Vincent Wang	0c6037b5ab	Bugfix for BiasSoftmax Fusion (#12517)	2022-08-10 07:20:13 +08:00
msftlincoln	0d9a02e647	Eager Mode - Support Concatenation via aten::cat.out (#12527) * support concatenation via aten::cat.out * wrap dims * rename vars in tests, test wrapped dims	2022-08-09 17:16:18 -04:00
Chen Fu	47b787c28f	Python module for dumping activation tensors when running an ONNX model (#12474) Python module for dumping activation tensors when running an ONNX model This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.	2022-08-09 13:15:45 -07:00
Adam Louly	2681648f5b	Load checkpoint in cpp (#12352) * Load checkpoint in cpp * removed unused imports * throw error on invalid name and change function name * inplace model assignment, change name and other comments resolved * name change on import * Added unit test, resolved comments * remove unused imports * resolved comments * refactoring to reduce memory allocation * resolved extra comments * changed files hierarchy and force added onnx model * solved order of function argument * used gtest macros on test cases Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-09 12:30:50 -07:00
Faith Xu	ee3b757492	Add codeowners for requirement files (#12512) * Add Codeowners for dependency files * Fix team @s	2022-08-09 09:46:47 -07:00
Vincent Wang	2bed0d4abb	[CUDA] SoftmaxCrossEntropy Kernels Refactor (#12482) * sce refactor * refactor * remove unnecessary memset	2022-08-09 16:48:44 +08:00
Vincent Wang	cfa09d16d9	[CUDA] Mod Op Kernel (#12499) * mod for cuda and rocm * fix bfloat16 ut * change bf16 ut number * fix opset version * fix op kernel doc	2022-08-09 13:05:40 +08:00
pengwa	a2dc3e9eac	Improve the compilation speed when compiling for multiple architectures. (#12490) * improve the compilation speed when compiling for multiple architectures. * formatting * fix * use 0 by default * fix comments	2022-08-09 11:52:26 +08:00

7218 commits