onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Author	SHA1	Message	Date
Adam Pocock	8a86b346a5	[Java] JNI refactor for ONNX Tensor (#12281 ) Working on JNI refactor for OnnxTensor. Simplifying the error handling logic in createTensor. Collapsing casting branches and migrating to ONNX element type enum. Disable cpplint for JNI C files.	2022-08-08 12:48:30 -07:00
Jian Chen	8c5c283471	new quantized operators split (#12495 ) * adding conditional variable again * Adding split test cases in python * Adding python cases for split * Enable s8s8 split * Optimize input * Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)" This reverts commit `d5e34acb` * Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"" This reverts commit 3c1a330dd3afeb55aa7eabb8ebea39b6deb37bad. * format file * Update c-api-linux-cpu.yml * Update c-api-linux-cpu.yml * Update c-api-linux-cpu.yml * Reformat file * Reformat file * format file * Optimize input * Remove unused import * Remove useless init * Format split.py with black	2022-08-08 15:12:09 -04:00
cloudhan	9c05577021	Fix various warning in kernel explorer (#12501 ) Fix various warning	2022-08-08 11:15:41 -07:00
Yufeng Li	bdd6b00c9a	set zero point to 0 if all value are 0.0 (#12470 ) * set zero point to 0 if all value are 0.0 * fix bug: lower version of numpy.finfo doesn't have smallest_subnormal * check scale to make sure it is not subnormal	2022-08-07 21:34:58 -07:00
cloudhan	ddea1e48df	Avoid false-positive dependent name lookup error by not depending on auto keyword (#12483 ) * Workaround false positive error produced by clang ROCm's hip clang complaints that "use 'template' keyword to treat 'Foo' as a dependent template name" where Foo is not a dependent template name. Instead, avoid the using of auto keyword fixes the error here.	2022-08-08 10:32:01 +08:00
Dwayne Robinson	eb90b52a75	DML EP fix training build error (#12461 ) Fix onnxruntime_training.cmake missing linkage issue	2022-08-05 16:01:25 -07:00
Vincent Wang	e85e31ee80	Update ORTModule Default Opset Version to 15 (#12419 ) * update ortmodule opset to 15 * update torch version * fix ut * fix ut * rollback * rollback for orttrainer	2022-08-05 16:55:04 +08:00
Baiju Meswani	a7d6290774	CUDA kernel for ClipGradNorm for TensorSeq gradients (#12412 )	2022-08-04 22:28:28 -07:00
PeixuanZuo	3e1b0ac4b3	[DELETE] delete python package rocm4.3.1 (#12480 ) [delete] delete rocm4.3.1	2022-08-05 13:27:42 +08:00
ytaous	b879dca51c	Fix Python Packaging CI (Rocm) (#12477 ) Fix Python Packaging CI Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-08-04 20:40:09 -07:00
Scott McKay	8d830adf24	Rework parts of Graph::Resolve to reduce memory usage (#12176 ) * Rework some aspects of Graph::Resolve to reduce memory usage.	2022-08-05 13:20:25 +10:00
cloudhan	f39354d7cb	Add composable kernel GEMM baseline for kernel explorer (#12364 ) * Split GemmBase RocBlasGemm * Add composable kernel GEMM baseline * Make linter happy * Address review comment * Update bert cases with batchsize * Adjust includes to fix IWYU lint * Only builds and links used ck kernels to improve building time * Remove warmup run on SelectImpl * Add comment to utility function * Mute cpplint * Make RocBlasGemm<T>::SelectImpl semantically correct * Add reduced basic test cases for ck gemm * More robust gemm testing * Fix warnings * Fix grammar	2022-08-04 17:32:20 -07:00
Vincent Wang	37995a7245	[CUDA] BiasSoftmax Supporting New Pattern (#12361 )	2022-08-05 06:59:24 +08:00
LironKesem	d452462b5e	Lironkesem/unsqueeze_and_squeeze (#12421 )	2022-08-04 15:12:34 -04:00
Dmitri Smirnov	a4ef0e7f7b	Remove dynamic allocation for ThreadPool ParallelSection (#12429 ) Use InlinedVector in a TP Store per thread parallel section in std::optional and avoid memory allocation	2022-08-04 09:46:16 -07:00
Yufeng Li	ac10f33d2d	Enable quant op to share quantization parameter between input and output (#12408 ) * share quant param between tensors	2022-08-03 21:25:35 -07:00
Ryan Hill	52d4699788	Minor doc fixes (#12388 )	2022-08-03 19:47:36 -07:00
Edward Chen	3efd9a73bb	Refactor InferenceSession Load member functions. (#12430 ) Fix comparison of path characters when checking for ".ort" suffix. Some clean up of InferenceSession Load functions. - Reduce duplication between std::string/std::wstring versions. - Renaming for clarity.	2022-08-03 16:28:26 -07:00
Ashwini Khade	97268e023c	dev notes for layout transformer (#12396 ) * first draft * plus fixes * plus more links * Plus updates per review * plus more clarifications * plus updates * plus more nit fixes * plus some additions	2022-08-03 15:15:59 -07:00
Scott McKay	a3de1bbf7d	Update script to find optimizers that potentially need supported opset updates (#12330 ) * Update to handle multiline declarations for the kernels which are typical these days. * Update to new path for the cpu contrib_op kernel registrations. * Update tools/python/find_optimizer_opset_version_updates_required.py Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>	2022-08-04 07:37:27 +10:00
Xinya Zhang	77cab7a3a5	[ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops (#11968 ) * [ROCm] disable expected failure tests PoolTest.MaxPool_10_DilationPadding_?d * [ROCm] Add AveragePool, GlobalAveragePool, MaxPool, GlobalMaxPool Ops * (To squash after review) Replace rocm/nn/pool.cc with amd_hipify.py changes * [ROCM] Replace miCompat with Helper functions * (to squash) fix the compiling error of SetPoolingNdDescriptorHelper	2022-08-03 14:36:36 -07:00
Erick Muñoz	d1497bdf62	[oneDNN EP] Optimized DynamicQuantizeLinear operator (#12403 ) * Removed unnecessary reorders * Removed unnecessary element wise clip	2022-08-03 12:36:42 -07:00
Baiju Meswani	7f58bd7236	Perform graph transformations during offline tooling (#12422 )	2022-08-03 11:27:12 -07:00
Dmitri Smirnov	dc984a03d5	Container and memory allocation guidelines (#12387 ) Container and memory allocation guidelines Re-org and add code samples Clarify the wording on returning gsl::span	2022-08-03 10:31:59 -07:00
Tianlei Wu	97a340bf48	Fix integer overflow in LongformerAttention (#12435 ) fix integer overflow	2022-08-03 10:29:07 -07:00
Changming Sun	44ec2cf088	Update publish-python-apidocs.yml (#12433 )	2022-08-03 10:17:00 -07:00
Ye Wang	b622e5fa9b	Support vocab_mask/prefix_vocab_mask/no_repeat_number in greedysearch op (#12327 ) * support more inputs for greedy search * fix docs * refactor test * lint * review comments	2022-08-03 10:10:08 -07:00
Xinya Zhang	01f3a197d7	[ROCm] InstanceNormalization, BatchNormalization and LRN Ops (#11972 ) * [ROCm] Add InstanceNormalization Op * Enable InstanceNormBatch1_fp16 and InstanceNormBatch2_fp16 for ROCm * [ROCm] Add BatchNormalization for fp32 and fp16 * Enable BatchNormTest for ROCm * [ROCm] Add LRN Op * [ROCM] replace miCompat functions with Helper functions	2022-08-02 23:14:26 -07:00
Vincent Wang	99d2a63e1a	Set Fix Seed For SoftmaxCrossEntoryLoss Related UTs (#12432 ) add seed	2022-08-03 13:29:30 +08:00
George Nash	26dc09417b	[oneDNN ep] matmulinteger postop fusion (#12354 ) * MatMulInteger + post op fusion This fuses MatMulInteger with upto 32 binary/elementwise operators if running on the oneDNN execution provider. Signed-off-by: George Nash <george.nash@intel.com> * Remove the un-needed transformer The MatMulIntegerToFloat transformer is not needed since the transform done is handled by the MatMulIntegerBinaryEltwise transformer code. Signed-off-by: George Nash <george.nash@intel.com> * Refactor of the post op trasformer code This separates the code that finds the post op nodes for MatMul and MatMulInteger to reduce code repetition. Signed-off-by: George Nash <george.nash@intel.com> * Minor cleanup based on cpplint resolved unused-variable build failure Signed-off-by: George Nash <george.nash@intel.com>	2022-08-02 20:42:34 -07:00
Changming Sun	5d610bc8eb	Disable CG task in PR pipelines (#12426 )	2022-08-02 19:01:41 -07:00
Yulong Wang	feed5da435	[js] loosen test timeout (#12427 ) Loosen the following test timeouts: 1. "Test Web Multi-Browsers" stage in "ONNX Runtime Web CI Pipeline": 30min -> 60min 2. Node.js binding default per-case timeout: 30 sec -> 90 sec	2022-08-02 19:01:19 -07:00
smrkatte	54d5e86981	Add cast before copy for dissimilar scalar type (#12391 ) * Add proper cast/copy callflow for ORT and non-ORT devices	2022-08-02 18:32:58 -07:00
Yulong Wang	c9e0d0f8b6	[js/node] upgrade terser version (#12351 )	2022-08-02 15:50:44 -07:00
Changming Sun	1a64b94f60	Fix a small issue in nuget packaging pipeline (#12405 ) In #12358 I typed a wrong path in the yaml file.	2022-08-02 15:44:43 -07:00
Dmitri Smirnov	eebaf5f270	Adjust and fix abseil-cpp debugging visualization (#12415 ) Move abseil-cpp.natvis file, add it to PDB, adjust visualization	2022-08-02 15:08:17 -07:00
shalvamist	ca6b4221fe	[js] Bug fix - permission issue with ensureSymlinkSync (#12369 ) using ensureSymlinkSync might have issues with permissions when using 'dir' - changed to 'junction' to avoid this. If the folder generation fails it will cause the test to fails as well.	2022-08-02 12:21:31 -07:00
Chi Lo	b39257a5e6	Enable support of multi-level nested control flow ops model for TRT EP (#12147 ) * Make multiple-level nested control flow op model work * find correct input index * find correct input index (cont.) * enable nested layer unit tests for TRT EP * add comment * add Scan op to current workaround support of control flow op	2022-08-01 23:57:30 -07:00
Chi Lo	de3a91d85d	Revert TRT EP cache refactoring (#12376 ) * revert cache refactor * fix conflicts when reverting	2022-08-01 23:57:05 -07:00
Yi Zhang	5d1173fe68	Run IOS pipeline concurrently (#12400 ) split ios pipelines	2022-08-02 11:07:17 +08:00
Yi Zhang	63d64636f6	Add the comment linking to wiki (#12398 ) add the comment	2022-08-02 10:09:16 +08:00
LironKesem	315e006532	adding a comment on nll_loss_forward.output that can not be implemented (#12406 ) adding a comment on nll_loss_forward.output that can not be implemented	2022-08-01 19:12:35 -04:00
msftlincoln	62922f4c3c	Eager Mode generator: add comments, rename functions (#12385 ) * eager generator: add comments, rename functions * lint	2022-08-01 15:52:47 -04:00
Edward Chen	f77ab4fea6	Manually add optimization flag for Android Release builds. (#12390 ) With recent versions of NDK (since 23), the `-O` optimization level compile flag is not being passed when building in the "Release" configuration. More details here: https://github.com/android/ndk/issues/1740 Our "Release" Android builds have been built without the optimization flag since we upgraded from NDK 21. This change is a workaround to manually add `-O3` for "Release" Android builds.	2022-08-01 12:49:03 -07:00
George Wu	6bb807ef74	add cuda compute 8.7 to Cmakelists.txt to support Nvidia Orin devices (#12377 ) * add cuda arch 8.7 to cmakelists.txt to support Nvidia Orin devices * add cuda version >= 11 check for orin support	2022-08-01 09:45:58 -07:00
Cheng	3f66297499	code clean (#12392 ) * code clean * mispelling fix	2022-08-01 14:12:35 +08:00
Valery Chernov	1a4868e5c4	[TVM EP] Hot fix of build on Windows of TVM EP with ipp-crypto (#12381 ) fix of build on Windows with ipp-crypto. cmake warnings fix Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-07-31 14:36:54 +02:00
Yi Zhang	8b4ad77ea2	pipeline can use last run's artifacts (#12379 ) * first step * depends on stage * temp change * specific * runId * parameters * fix typo * fix typo * add nnapi * add nnapi * fix typo * minor fix * condition on stage * format * format	2022-07-30 21:34:57 +08:00
pengwa	6d1eb9509e	Refine gradient accumulation (on device training) (#12363 ) * a (cherry picked from commit 43909cdd6e3daf30a82d584292286806d1172a0b) * optimize inplace accumulator a bit * fix inputs * revert logging * minor fix * tune perf and resolve comments * typo * fix * fix tests * move threshold to constexpr.	2022-07-30 10:24:01 +08:00
Changming Sun	7b4ce0c1e1	Delete the build scripts that were copied from manylinux project (#12358 ) 1. Delete the build scripts that were copied from manylinux project. Use "git checkout" instead. 2. Update manylinux version to get python 3.11. Related issue: Python 3.11 support #12343 3. Change the cuda version of linux gpu build job of nuget packaging pipeline from cuda 11.4 to cuda 11.6 to match the TRT job within the same pipeline.. (A lot other places need be updated as well, but I'd prefer to put them in another PR) 4. Make dockerfile names static. For example, replace tools/ci_build/github/linux/docker/$(DockerFile) to tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cpu . The former one relies on a runtime variable $(DockerFile), Template Parameters are expanded early in processing a pipeline run when most variables are not available. It like C++ macros vs variables.	2022-07-29 18:24:19 -07:00


7166 commits