onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-02 03:55:34 +00:00

Author	SHA1	Message	Date
Edward Chen	e3ff4a6bfa	Fix NNAPI EP error when handling external node adjacent to partition. (#11233 ) Move a check for a graph output (for the partition) prior to iterating the downstream nodes to avoid trying to get a NodeUnit for a node that is outside of the partition.	2022-04-20 08:53:29 -07:00
Zhang Lei	70d97bdf53	Support only one input in QLinearConcat (#11265 )	2022-04-19 20:55:51 -07:00
Yufeng Li	2e6c2177af	remove deprecated quantize api (#11263 )	2022-04-19 19:41:55 -07:00
Maxiwell	acb555c4c7	ppc64le: Optimizing the MlasMaximumPool() to use VSX instructions (#11216 ) It runs on Power8, Power9, and Power10	2022-04-19 15:13:55 -07:00
Tianlei Wu	bab9b80f1f	auto mixed precision for t5 (#11252 )	2022-04-19 12:42:11 -07:00
Yulong Wang	5ee8e2e491	[js] use NPM and yarn to upgrade package version (#11059 )	2022-04-19 12:28:13 -07:00
Vincent Wang	06026fe8e6	SizeInBytes Fix for Strided Tensor (#11224 ) * SizeInBytes Fix for Strided Tensor * resolve comments	2022-04-19 15:13:00 +08:00
Edward Chen	3dac66698b	Add option to specify onnxruntime repo URL in tools/android_custom_build/build_custom_android_package.py. (#11250 )	2022-04-18 19:29:41 -07:00
Lukas Berbuer	efb0928e2b	Fix find_package for benchmark	2022-04-18 15:25:43 -07:00
Dmitri Smirnov	98faaa7e2f	Scoped GIL release in run_with_iobinding (#11248 )	2022-04-18 13:07:45 -07:00
Yufeng Li	dec99657a1	Improve onnx shape inference in quant tool (#11106 ) onnx.shape_inference.infer_shapes only works for model size < 2GB, while onnx.shape_inference.infer_shapes_path works for all models. This PR replaces infer_shapes with infer_shapes_path.	2022-04-18 08:07:31 -07:00
pengwa	9765ef8b4e	fix build warnings (#11213 ) * fix build warning	2022-04-18 21:09:09 +08:00
Vincent Wang	0bad5b1b5a	[CUDA] Rollback TileMemcpy and TileBatchedMemcpy when Block Size is Small (#11187 )	2022-04-16 07:46:43 +08:00
George Nash	d9eeb48393	One dnn v2.6 update (#11220 ) * Disable training code in DNNL LayerNorm code The capability code already does not claim the LayerNorm and SkipLayerNorm that require more than one output. However, building with training enabled was causing issues. The training specific code has been removed even when building with training enabled. Signed-off-by: George Nash <george.nash@intel.com> * Fix for DNNL FusedMatMul op. The bug was in the transpose code. Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Use agreed upon memory format type when running Pooling Gradient in dnnl ep The dnnl ep does not currently have a way to pass memory_format information from the forward pooling primitive to the backward pooling primitive. This change explicitly sets the memory_format to match that of Onnxruntime, for both the forward and backward pooling code. This will prevent using un-matched memory format that could result in an `unimplemented` error from dnnl ep. Signed-off-by: George Nash <george.nash@intel.com> * Update dnnl ep to use OneDNN v2.6 Do not run ReduceInfLogSum on the kDnnlExecutionProvider due to a calculation bug when doing Log or infinity values. The fix for this issue will be part of the next OneDNN release. Signed-off-by: George Nash <george.nash@intel.com> * Update PrintMemory function in dnnl ep This modification can be used to enable/disable memory printing for dnnl ep developers. This is considered a developer only feature and is disabled by default. It must be enabled and code recompiled to use. Even if it is enabled it will not actually print any memory because the developer needs to take the extra step of specifying the memory that will be printed to the screen. Signed-off-by: George Nash <george.nash@intel.com> * Update binary ops to run on intel GPU when using dnnl ep Binary ops (i.e. Add, Div, Mul, and Sub) were updated to no longer call GetMemoryAndReshape; in the past this would move the memory from CPU to the GPU. This extra call is no longer needed since it is taken care of by the GetMemoryInOrtFormat call. Removing the GetMemoryAndReshape prevented copying the memory to GPU twice. Signed-off-by: George Nash <george.nash@intel.com> Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>	2022-04-15 12:51:11 -07:00
sumitsays	227bc7264e	Fixed compilation error for ARM architecture (#11223 ) Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2022-04-15 09:24:21 -07:00
ytaous	bc296c706e	MatMulScaleFusion - handling scale input (#11121 ) * scale input * more condition check * alternative * per comments * fix comments Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-04-14 21:54:04 -07:00
Yi Zhang	94032357e2	use int storage (#11185 )	2022-04-15 09:56:36 +08:00
Ahmad Zakaria	63ff391b16	add AppendExecutionProvider_CUDA_V2 to the C++ api (#11153 )	2022-04-14 17:33:27 -07:00
chausner	c2b4054c74	Fix typos	2022-04-14 13:53:50 -07:00
stevenlix	5216a43c9d	Consolidate TensorRT subgraphs to reduce inference overhead (#11211 ) * add trt node list consolidation * add more log * fix typo * separate cycle detection and removal * update * change function name Co-authored-by: Ubuntu <azureuser@orttrtlinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>	2022-04-14 11:05:27 -07:00
Faruk D	a00d24066a	Fix CITATION.cff and add automatic validation of your citation metadata (#10478 ) * Add cffconvert.yml to validate CITATION.cff * Fix CITATION.cff by removing duplicate title and correcting the license Co-authored-by: Abel Soares Siqueira <abel.s.siqueira@gmail.com>	2022-04-13 10:03:52 -07:00
Vincent Wang	9707181257	fix build error (#11199 )	2022-04-13 13:09:19 +08:00
Scott McKay	3b3b23bcf9	Add new python helper dirs to wheel. (#11196 )	2022-04-13 13:34:07 +10:00
Chen Fu	0d0edc071f	Detecting ARM64 CPU core micro-architectures in Windows (#11145 ) Some micro-architectures of power efficient cores in ARMv8 system have narrow 64b load/store resources, which require specialized computing kernels in MLAS. We leverage pytorch CPUinfo package for detecting these cores. Unfortunately CPUinfo package does not work on Windows. This commit implements ARM64 micro-architecture detection.	2022-04-12 16:47:11 -07:00
ashbhandare	ddb17294b2	Fix gradient builder for Cast (#11008 ) * fix grad builder for cast * review comments Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-04-12 16:08:21 -07:00
Gary Miguel	e84c338989	minor improvements to CONTRIBUTING doc (#11080 )	2022-04-12 15:22:34 -07:00
Faith Xu	5337972f92	Update to use teams instead of individual GH handles (#11163 ) * Update to use teams instead of individual GH handles * Fix typo * Update CODEOWNERS * Update CODEOWNERS * Update team name	2022-04-12 12:06:12 -07:00
Edward Chen	38e67e66a2	Add script and Dockerfile to build custom Android package (#11144 ) * Handle relative paths in --include_ops_by_config. * Add dockerfile. * update comments * refine * update perms * refine * wording * Change readme to md file, add link to docs site.	2022-04-12 10:16:10 -07:00
RajalakshmiSR	e397d8e63e	POWER: Optimize MlasTranspose functions (#11172 ) This patch makes use of POWER vector intrinsics to improve performance of MlasTranspose functions. Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>	2022-04-12 09:51:20 -07:00
Xavier Dupré	833f5d5604	Remove dependency on EP TVM in unit test project (#11170 )	2022-04-12 09:03:57 +02:00
Ryan Hill	625cc0ab99	Add Initialize() to shared providers to allow for reload (#11066 )	2022-04-11 22:58:50 -07:00
Changming Sun	8237568b65	Fix the rocm packaging pipeline package upload problem (#11174 ) In #11114 , I changed the script to use azcopy instead of azure blob storage's python APIs. However, it doesn't work for the AMD rocm pipeline, because: 1. The machines do not have azcopy installed 2. The machines are not in Azure, so they don't have Azure managed identity. So they still need to use SAS. Therefore in this PR I get the old python file back, but only use it in the AMD pipeline.	2022-04-11 13:59:44 -07:00
dependabot[bot]	04fe1bd2ed	Bump electron from 12.2.3 to 13.6.6 in /js/web (#10978 ) Bumps [electron](https://github.com/electron/electron) from 12.2.3 to 13.6.6. - [Release notes](https://github.com/electron/electron/releases) - [Changelog](https://github.com/electron/electron/blob/main/docs/breaking-changes.md) - [Commits](https://github.com/electron/electron/compare/v12.2.3...v13.6.6) --- updated-dependencies: - dependency-name: electron dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-04-11 12:51:56 -07:00
Olivia Jain	ae243c2bb5	Pull Nightly Wheel File and Cleanup Perf (#11164 ) * delete unused files * only use one dockerfile, otherwise install * Update pipeline file * get other changes * minimal packages * update pull nightly variable * try logical boolean * test boolean * have build ort as boolean * case sensitive * use the current head not the previous commit * add helpful note	2022-04-11 11:41:11 -07:00
Yi-Hong Lyu	749c0ddd1e	Upsample support NHWC (#10824 ) This patch implements bilinear interpolation for Upsample/Resize 4-D input with the outermost and innermost scale (usually channel of NHWC) as 1. It is parallelized with output_height * output_width instead of one dimension only. Besides, I also revert the HandleResize back to the original implementation for TransposeOptimizerTests.TestResize* tests. Finally, I add microbenchmark BM_NhwcUpsampleBilinear.	2022-04-11 11:39:17 -07:00
Edward Chen	269be2fe63	Remove unnecessary option from convert_onnx_models_to_ort.py, fix old instructions. (#11088 ) Remove unnecessary --nnapi_partitioning_stop_ops option from convert_onnx_models_to_ort.py, fix old instructions.	2022-04-11 11:19:21 -07:00
Tianlei Wu	00b595e389	move longformer and t5 to models subdirectory (#11161 ) * move longformer scripts to models subdirectory * Copy transformers\models\t5 to python package as well	2022-04-09 22:35:14 -07:00
Erick Muñoz	f24523e0eb	Enable LayerNorm and SkipLayerNorm in OneDNN EP (#11128 )	2022-04-08 23:10:13 -07:00
liqun Fu	d96230065e	fix code error in function.cc (#11148 )	2022-04-08 10:04:21 -07:00
Dmitri Smirnov	12c687f594	Rework initializer.cc to eliminate code duplication (#11131 ) Rework initializer.cc to eliminate code duplication and add type enforcement. Address review comments. Add literal operators for MLFloat16 and BFloat16 and tests.	2022-04-08 09:42:31 -07:00
Vincent Wang	bcc62e0cbf	move some process out of training step (#11150 )	2022-04-08 17:30:11 +08:00
Lukas	4c37f15c1b	Find boost, nsync, json, cpuinfo system libs with CMake option onnxruntime_PREFER_SYSTEM_LIB (#11146 )	2022-04-08 00:11:02 -07:00
Weixing Zhang	0aaf3a676a	Update reduce norm1/norm2 and layernorm kernels with ROCm 4.3.1 (#9399 ) * update layernorm to reflect the fix in ROCm 4.3.1 * fix UT Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-04-07 22:54:12 -07:00
Lukas	1b664e6d4c	Link cpuinfo only if supported (#11147 ) * Remove unnecessary target_include_directories for cpuinfo Headers already exposed as public by CMake target: `5916273f79/CMakeLists.txt (L213)` * Link to cpuinfo library only if supported	2022-04-07 21:32:12 -07:00
IkerAriz	541eff8d89	Directly use memory mapped data for external data initializers (#11127 )	2022-04-07 18:00:43 -07:00
Baiju Meswani	5637f17189	Remove python frontend codeowners (#11143 )	2022-04-07 15:57:30 -07:00
Justin Stoecker	7609694464	Enable building with a GDK (#11126 )	2022-04-07 15:06:31 -07:00
Changming Sun	4983d6e5d6	Call pluggable EP's shutdown function in Environment::~Environment() (#11120 ) I disabled some tests temporarily. I will move them to a separate executable file in another PR. In the future, I want to combine onnxruntime::Environment and OrtEnv classes. Now we have 3 env classes, which is too confusing: 1. onnxruntime::Env 2. onnxruntime::Environment 3. OrtEnv Our python binding uses onnxruntime::Environment, while all other language bindings use OrtEnv. So python doesn't unload EPs but the others do. It's better to make them consistent. Please note that even though I added the call, the unload function is currently still a no-op on Linux. So, currently on Windows we must unload the EPs while on Linux we must not do it.	2022-04-07 14:11:29 -07:00
Dmitri Smirnov	2700261f7c	Provide an API to supply external initializers data from user buffers (#11109 ) Implement AddExternalInitializers	2022-04-07 12:21:53 -07:00
ytaous	eec5187801	Remove Rocm 4.2 from CI Build (#11130 ) * remove rocm42 CI * update torch to v1.11.0 Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-04-07 11:42:09 -07:00


6664 commits