onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
Sunghoon	4028e51e7e	Update the compatibility of ONNX Runtime Web (#9444 )	2021-10-20 18:03:12 -07:00
George Nash	1249c7c29e	Resolve issue when running Yolov4 on DNNL EP (#9355 ) The dnnl_binary ops need the memory format to match the format expected by Onnxruntime. If the memory formats of the inputs do not match each other, there will be an error in the calculated results. Additionally, since the code manually pads the tensor dimensions for broadcasting, the inputs are expected to be in Onnxruntime's format. Since detecting and reordering the memory to Ort format matches what was previously done for the Reshape op, the code was moved from dnnl_reshape to dnnl_subgraph_primitive under the name GetMemoryInOrtFormat. One small additional change was made to the capability code log so that it also prints the percentage of nodes run by the dnnl execution provider. Signed-off-by: George Nash <george.nash@intel.com>	2021-10-20 13:10:31 -07:00
Stella Stamenova	9fc53df33a	Only add aliasing to targets if the corresponding package was found (#9404 )	2021-10-20 11:32:08 -07:00
Nick Kreeger	f1123c2fb3	Fix whitespace and style in concat.cc (#9452 )	2021-10-20 12:43:46 -05:00
Jeff Daily	89a22fb641	Add TopK to ROCm EP (#9391 ) * Add TopK to ROCm EP * flake8 fix	2021-10-20 10:39:44 -07:00
Jeff Daily	f8acc6d0e8	Add NonMaxSuppression and RoiAlign to ROCm EP (#9394 )	2021-10-20 10:38:45 -07:00
Jeff Daily	c33391329a	Add QuantizeLinear and DequantizeLinear to ROCm EP (#9401 )	2021-10-20 10:37:58 -07:00
Changming Sun	406f1629c1	Remove Featurizers code (#9300 )	2021-10-20 10:20:35 -07:00
Bowen Bao	e983f37121	Bifurcation detector for aggressive decoding (#9432 ) ``` Component for aggressive decoding. Find the bifurcation index of predicted tokens, between source tokens, starting from previous suffix match index, and predicted tokens. Concat predicted tokens, starting from bifurcation index, to the back of current tokens. This forms the output tokens. Detect suffix match index in source tokens, between source tokens and output tokens. Detection is based on finding the appearances of the last n-gram of the output tokens in the source tokens. A match is considered found if the source tokens contain a single matching n-gram. Return the index of the start of the n-gram in the source tokens. No match is found if the source tokens contain multiple or zero matching n-grams. Return -1. ```	2021-10-19 19:53:56 -07:00
baijumeswani	20eaed43e5	Ignore all string inputs to ORTModule AB#1310803 (#9344 )	2021-10-19 16:34:47 -07:00
Hariharan Seshadri	4698b73725	Fix output shape description of Attention op's schema (#9406 )	2021-10-19 15:56:35 -07:00
George Wu	3873885316	add missing atomic include (#9440 )	2021-10-19 14:42:50 -07:00
Jeff Daily	52c53e396d	hipify tensor/gather_nd_impl.cu (#9392 )	2021-10-19 14:15:49 -07:00
Jeff Daily	a2ba923ac7	hipify fast_divmod.h (#9400 )	2021-10-19 12:34:46 -07:00
Jeff Daily	a8e2e8d76a	hipify tensor/transpose.cc and tensor/transpose.h (#9397 )	2021-10-19 12:27:36 -07:00
baijumeswani	757bc66720	Set cuda version to be None instead of an empty string (#9435 )	2021-10-19 11:10:52 -04:00
Sherlock	e22920d954	Update ORTTraining frontend codeowner (#9427 )	2021-10-18 23:56:21 -07:00
Yufeng Li	da3dd398c5	Kernels for QLinearConv with symmetrically quantized filter (#9323 ) Add kernels for QLinearConv with a symmetrically quantized filter, e.g., the filter type is int8 and the zero point of the filter is 0. This PR includes kernels for avx2, avxvnni, avx512 and avx512 vnni. Kernels for ARM64 will be added in a following PR. The kernels use the input buffer directly for pointwise conv, and an indirect buffer for depthwise and non-group conv. The advantages of the new kernels are: there is no need to compute the sum for each pixel of the output image, and the sum/offset of the filter can be combined with the bias. With the indirect buffer, im2col returns an array of buffer pointers instead of memcpy'ing the original data. This saves memcpy time and reduces the size of the intermediate buffer needed to hold the im2col transform. In the future, im2col will be computed ahead of time for inputs with a fixed size.	2021-10-18 19:40:18 -07:00
baijumeswani	5da4e07daa	Make FusedAdam mathematically equivalent to Transformers AdamW (#9343 )	2021-10-18 16:03:18 -07:00
Yulong Wang	5b65f1cb44	fixes SDL Native Rules warning in Node.js binding CI (#9402 )	2021-10-18 13:05:46 -07:00
Jingqiao Fu	f60e603022	Add support for DmlExecutionProvider for transformer profiler tool (#9380 ) * fixed a profiler.py bug * Add dml support for profiler * Remove commented line * improve syntax	2021-10-18 12:31:29 -07:00
Ye Wang	0824207c0f	Add Dev Guide to transformer optimizer (#9329 ) * a * Update Dev_Guide.md * Update Dev_Guide.md * Update Dev_Guide.md * Update Dev_Guide.md * Update Dev_Guide.md * Update Dev_Guide.md * Update Dev_Guide.md * Update Dev_Guide.md * Add files via upload * Update Dev_Guide.md * Create Dev_Guide.md * Update Dev_Guide.md * Update Dev_Guide.md	2021-10-18 12:27:26 -07:00
Changming Sun	6ecb990fae	Update win-ci-pipeline.yml	2021-10-18 10:43:19 -07:00
Tracy Sharpe	b130a7b715	fix MSVC micro benchmark build warnings (#9373 )	2021-10-15 11:35:02 -07:00
Guoyu Wang	59dfab59dc	Fix integer overflow for large step for Slice OP (#9376 )	2021-10-15 09:42:53 -07:00
Yulong Wang	901c7de918	[js/web] remove webgl from default fallback list (#9374 )	2021-10-14 21:46:22 -07:00
pengwa	f05c285a58	Exception when duplicated autograd.Function name detected (#9351 ) * Exception when duplicated autograd.Function name detected * reorder a bit for a little bit better perf * fix a bug in previous PR :( * correct the error message a bit	2021-10-15 12:23:13 +08:00
Sunghoon	74eaaad768	[js/web] Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast and clip (#9249 ) * Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast, clip * merge master and update a operators.md * resolve comment. revise pool and cast kernel implementation. * skip fusion when clip min and max is not in initializer	2021-10-14 16:29:37 -07:00
Jeff Daily	c8789d3047	[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877 ) * re-hipify all rocm EP sources * fix all other files affected by re-hipify * add cuda_provider_factory.h to amd_hipify.py * do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration * Fix ReduceConsts template specialization introduced in #9101. Fixes the error when building for ROCm 4.3.1: error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0) * fix flake8 error in amd_hipify.py * speed up hipify with concurrent.futures * flake8 fix in amd_hipify.py	2021-10-14 15:15:51 -07:00
Abhishek Jindal	87e726d1a0	Abjindal/merge eager with external custom ops (#8986 ) * switching to pytorch nightly build * adding eager mode * enable pybind and remove install step * removing auditwheel repair process * installing package * adding auditwheel back * disabling auditwheel repair for eager mode * typo correction	2021-10-14 13:19:45 -07:00
Abhishek Jindal	23700a15a0	Abjindal/eager windows build (#9326 ) * removing warnings which are causing errors from torch and changing flags for Windows * adding MKL library resolution and comments * cleaning up the code * fixing onnxruntime_python file for windows build * fix the include order to avoid the python_d.lib issue on win debug build * changes for warnings, typos and other comments * merge conflict * adding fix for mkl library error * Revert "adding fix for mkl library error" This reverts commit `73b87c73c2`. * fix for dll path for windows * typo for dll path Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-10-14 12:54:49 -07:00
Jeff Daily	3e879aab6b	work around ucx in rocm ci Dockerfile (#9360 )	2021-10-14 09:49:31 -07:00
Xavier Dupré	11f0081c1e	Remove tensorflow, tf2onnx from the list of dependencies for the documentation (#9221 ) * Remove tensorflow, tf2onnx from the list of dependencies for the documentation * improve documentation * update API	2021-10-14 18:07:35 +02:00
Xavier Dupré	22e3f8bf54	Refactor TrainingManager.forward (#9354 ) * Refactor TrainingManager.forward	2021-10-14 12:54:31 +02:00
sumitsays	851554536c	[DML EP] ConstantsOfShape - Empty Output and EinSum - Optional Parameter (#9361 ) * Added null check before filling tensor with a value. Passing optional parameter for EinSum in case of MatMul type * Addressed comment on the PR Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2021-10-13 23:37:10 -07:00
pengwa	5ee47e3ffa	legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184 ) * megatron-lm FP16_Optimizer Wrap, allow model parallelism aggregation optional * add deepspeed zero1 and zero2 - checkoverflow & clip norm * re-structure code and add the copyright * update the document * refine the code after validation	2021-10-14 09:01:23 +08:00
Viswanath Boga	4771256be3	fix to avoid quantizing attention with varied q,k,v sizes (#9357 ) * fix to avoid quantizing attention with varied q,k,v sizes * updated the changes to address the comments	2021-10-13 16:25:34 -07:00
Chandru Ramakrishnan	ba0cca96f0	Hooked up eager logging to ORT default logger. (#9340 ) * Hooked up eager logging to ORT default logger.	2021-10-13 18:10:32 -04:00
groenenboomj	905fe36599	Add Conv and ConvTrans to ROCm EP (#9338 ) Added support for Conv and ConvTrans operators in the ROCm execution provider. Doubles not currently supported.	2021-10-13 14:18:08 -07:00
Arthur Meyre	bccd09c688	Serialize model only once to reduce backend preparation overhead (#8270 ) * The serialization can be very heavy for large models * Only use the serialized model check on compatible onnx versions * onnx version >= 1.10.0 supports serialized model check Signed-off-by: IceTDrinker <49040125+IceTDrinker@users.noreply.github.com>	2021-10-13 13:58:22 -07:00
George Nash	e8ba5145ce	Add Transpose, Reshape, Pow and LeakyRelu ops to DNNL execution provider (#9180 ) * Transpose for DNNL EP Transpose reorders the memory to the right format but has the wrong dimensions and memory::format. So a new memory descriptor is created that points to the reordered memory. However, that memory is in a different location than the output expects. An extra parameter was added to SetMemory to specify when memory must be copied if it is output from the subgraph. Signed-off-by: George Nash <george.nash@intel.com> * Implementation of Reshape op for dnnl ep Signed-off-by: George Nash <george.nash@intel.com> * Add Pow op to dnnl execution provider This Pow is limited; the exponent must be a scalar or a one-dimensional tensor with only a single element. The exponent must also be a constant initializer since it is only read when the primitive is created. OneDNN does not have any way to change the exponent after the primitive is created. The GraphViewer is now passed into the NodeCapability code since the GraphViewer is needed to find out if an input is a constant initializer. The unit tests for "Pow" did not make the exponent a constant initializer. To help verify the dnnl execution provider's Pow function, a version of the Pow unit tests was created for the DNNL execution provider that made the exponent a constant initializer. Signed-off-by: George Nash <george.nash@intel.com> * Add LeakyRelu to DNNL execution provider LeakyRelu was added to the dnnl elementwise ops. In the elementwise op the GetAlpha method was modified to take the default value for Alpha as a parameter instead of reading it from a member variable. This felt like it would be less likely to cause programmer error. Signed-off-by: George Nash <george.nash@intel.com> * Switch dnnl_code_capability DataTypes from strings to enums Signed-off-by: George Nash <george.nash@intel.com> * Update DnnlSubgraphPrimitive.GetMemory function input This updates the GetMemory member function to take DnnlTensor instead of a string. This was done for two reasons. Every time the function was called, it was called using DnnlTensor.Name(), so this reduces code repetition. We never called it using a saved string. This also makes the function inputs more closely match the GetMemoryAndReshape function, leaving fewer differences between member functions. Signed-off-by: George Nash <george.nash@intel.com>	2021-10-13 10:20:07 -07:00
Yulong Wang	1527af3e30	[js/web] deduplicate test cases between opsets (#9327 ) * [js/web] deduplicate test cases between opsets * fix eslint error	2021-10-12 22:37:19 -07:00
TomWildenhain-Microsoft	fb31701f7e	Fix bug in determining default slice axes (#9328 )	2021-10-12 16:17:11 -07:00
Moshe David	510b747821	w (#9319 ) Co-authored-by: modav <modav@microsoft.com>	2021-10-12 16:02:40 -07:00
Tang, Cheng	f0bc35c4ba	fix a hardcode type (#9337 )	2021-10-12 13:44:46 -07:00
Hariharan Seshadri	d5c5c4fa50	Handle implicit subgraph inputs required on different devices in Memcpy transformer (#9299 )	2021-10-12 11:21:17 -07:00
Tang, Cheng	48737091c0	resolve the provider options before create training session in orttrainer (#9199 ) * resolve the provider options before create training session in orttrainer * Update orttraining/orttraining/python/orttraining_pybind_common.h Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * support clear the training ep instance pool * fix status error Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-10-12 09:30:45 -07:00
ashbhandare	52c021d1f3	Fix export of aten op for Max and Avg Pool 2D (#9330 )	2021-10-12 09:03:14 -07:00
mindest	f9cf62912a	Add same_shape case for BiasDropout (#9188 ) * bias dropout improvement * add transform case for same shape case * combine kernel * merge with vectorized kernel * use "has_same_shape_bias" * minor: a "N % 4 != 0" case * add op UT for has_same_shape_bias * address comments; add param case for 1d bias; add param case tests for 1d and same-shape bias * rewrite logic condition Co-authored-by: Peng Wang <pengwa@microsoft.com>	2021-10-12 19:57:38 +08:00
Sunghoon	2f1204a5d5	[js/web] Enable wasm profiling and preserve function names in profiling (#9314 ) * add p50 in test * allow WebAssembly profiling and preserve function names Co-authored-by: Yulong Wang <yulongw@microsoft.com>	2021-10-11 22:04:50 -07:00
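
The message of commit e983f37121 (#9432) describes the bifurcation detector in prose. The Python sketch below illustrates two of the steps it names, finding the bifurcation index and detecting the suffix match index, under a literal reading of that message. It is not the operator's actual implementation: the function names, argument order, and the handling of edge cases (for example, an n-gram longer than the output) are assumptions made for illustration, and the concatenation step is left out.

```python
from typing import List, Sequence


def find_bifurcation_index(src_tokens: Sequence[int],
                           pred_tokens: Sequence[int],
                           prev_suffix_match_idx: int = 0) -> int:
    """Hypothetical helper: first index in pred_tokens that differs from
    src_tokens read from prev_suffix_match_idx onward."""
    # Only compare positions that exist in both sequences.
    limit = min(len(pred_tokens), len(src_tokens) - prev_suffix_match_idx)
    i = 0
    while i < limit and pred_tokens[i] == src_tokens[prev_suffix_match_idx + i]:
        i += 1
    return i


def find_suffix_match_index(src_tokens: Sequence[int],
                            output_tokens: Sequence[int],
                            n: int) -> int:
    """Hypothetical helper: start index in src_tokens of the last n-gram of
    output_tokens, if it occurs exactly once; otherwise -1."""
    if n <= 0 or len(output_tokens) < n:
        return -1  # assumption: no usable n-gram means no match
    ngram: List[int] = list(output_tokens[-n:])
    matches = [
        start
        for start in range(len(src_tokens) - n + 1)
        if list(src_tokens[start:start + n]) == ngram
    ]
    # Zero or multiple occurrences both count as "no match".
    return matches[0] if len(matches) == 1 else -1
```

With `src_tokens = [1, 2, 3, 4, 5, 2, 3]`, the 2-gram `[4, 5]` occurs once and yields its start index, while `[2, 3]` occurs twice and yields -1, which mirrors the "single matching n-gram" rule in the commit message.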


5705 commits
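
The message of commit da3dd398c5 (#9323) states that with a symmetrically quantized filter the per-pixel input sum is not needed and the filter sum/offset can be combined with the bias. The sketch below shows the arithmetic behind that claim for a single output value, using plain Python integers. It is an illustration only and does not reflect the actual kernels; all function and variable names are hypothetical. Expanding sum((x - x_zp) * (w - w_zp)) gives a term w_zp * sum(x) that depends on each output pixel's inputs, and a term x_zp * sum(w) that depends only on the filter. When w_zp is 0 the first term vanishes, and the second can be precomputed once and folded into the bias.

```python
from typing import Sequence


def qconv_dot_reference(x: Sequence[int], w: Sequence[int],
                        x_zp: int, w_zp: int, bias: int) -> int:
    """Integer accumulation for one output value, written out in full."""
    return sum((xi - x_zp) * (wi - w_zp) for xi, wi in zip(x, w)) + bias


def fold_filter_offset_into_bias(w: Sequence[int], x_zp: int, bias: int) -> int:
    """Precompute bias - x_zp * sum(w); depends only on the filter."""
    return bias - x_zp * sum(w)


def qconv_dot_symmetric(x: Sequence[int], w: Sequence[int],
                        folded_bias: int) -> int:
    """Accumulation when the filter zero point is 0: a raw dot product
    plus the folded bias, with no per-pixel sum of the inputs."""
    return sum(xi * wi for xi, wi in zip(x, w)) + folded_bias


# Example values (made up): uint8-range inputs, int8-range filter.
x = [130, 12, 255, 0]
w = [-3, 7, -128, 127]
x_zp, bias = 128, 10

folded = fold_filter_offset_into_bias(w, x_zp, bias)
same = qconv_dot_symmetric(x, w, folded) == qconv_dot_reference(x, w, x_zp, 0, bias)
print(same)
```

The folded bias is computed once per filter, so the inner loop reduces to a plain multiply-accumulate over the raw quantized values.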