onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-21 21:52:11 +00:00

Author	SHA1	Message	Date
Yulong Wang	25fdcfbd14	[js/web] allow multiple inference session creating concurrently (#10784 ) * test case * bugfix * fix * support multi session init	2022-03-07 11:35:06 -08:00
RandySheriffH	a4b5fa334a	Add type and shape information to profiled numbers (#10773 ) * add func to collect type shape * reformat * refactor perf view * remove obsolete	2022-03-07 10:17:58 -08:00
Changming Sun	d8bf9a479b	Remove python 3.6 from training pipelines (#10780 ) Because the numpy we use doesn't support python 3.6. And inference pipelines already removed python 3.6.	2022-03-07 09:57:24 -08:00
Hariharan Seshadri	9d30262422	Fix AMD training pipeline (#10788 )	2022-03-07 08:53:08 -08:00
Chen Fu	50a6f095cd	Symmetric QGEMM kernel for ARMv8 A55 chip (#10754 ) ARM a55 micro-architecture (with dot product instructions), similar to a53, is widely used as little cores in big.Little configurations. A55 has a narrower memory load/store hardware, where a 128b load instruction would block the pipeline for 2 whole cycles, during which no other instructions can be executed. On the other hand, a 64b load instruction can be duo issued with many other instructions. This change adds a Symmetric QGEMM kernel for a55 micro-architecture, where we replace ldr q4,[x1],#16 with ldr d4,[x1],#8 ldr x11,[x1],#8 ins v4.d[1],x11 so that we can try to hide the memory load cycles behind computing cycles in the kernel. Co-authored-by: Chen Fu <fuchen@microsoft.com>	2022-03-07 08:41:13 -08:00
PeixuanZuo	55af7a96a7	update the amd ci pipeline (#10723 ) * [TEST] test to get amd pipeline information * [FIX] lower the threshold * [UPDATE] add retry task * [UPDATE] add retry task * [ERROR] error to occur retry * [FIX] error * [UPDATE] update retryCountOnTaskFailure to 1 time * [UPDATE] add showmeminfo	2022-03-07 18:39:42 +08:00
Fei Hu	60acfd3dd8	Support CUDA Graph in the CUDA EP (#9978 )	2022-03-06 20:47:31 -08:00
Tianlei Wu	0e335aba37	Update BeamSearch operator spec to support t5 (#10777 ) * change BeamSearch op to support encoder decoder model * check model_type and decoder attribute * fix * update comments * warn shape inference issue with onnx v1.11 or T5 * skip parity test when tempature != 1.0 * fix build	2022-03-04 21:52:45 -08:00
George Nash	6be5185088	Update dnnl Add, Mul, Sub, Div ops to handle scalar values (#10756 ) * Update dnnl Add, Mul, Sub, Div ops to handle scalar values Signed-off-by: George Nash <george.nash@intel.com> * Add additional scalar support for dnnl execution provider This will add scalar support for: Eltwise operators: Abs, Elu, Exp, LeakyRelu, Log, Relu, Round, Sigmoid, Softplus, Sqrt, and Tanh Gelu operators: BiasGelu, FastGelu, and Gelu Softmax operator Signed-off-by: George Nash <george.nash@intel.com>	2022-03-04 19:28:25 -08:00
Ye Wang	259ade2557	Add ability to modify num_hidden_layers from benchmark script (#10760 ) * add ability to modify num_hidden_layers from benchmark script * comment * Revert "comment" This reverts commit 28794b0e4f86506dcc937738894fcef97fc84e48. * Revert "add ability to modify num_hidden_layers from benchmark script" This reverts commit 96f36ed7f751721bcf4e3ab8748a715f19a4e044. * review coments Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>	2022-03-04 18:28:51 -08:00
Ella Charlaix	fde847473b	Add min max moving average calibration method (#10753 ) * Add min max moving average calibration method * Modify the calibration extra options dictionnary creation	2022-03-04 14:55:31 -08:00
Maxiwell	43ff27c7c8	ppc64le: optimizing the MlasQuantizeLinear() with VSX (#10644 ) This code is valid only when -mcpu is set to utilize POWER9 technology or above. A compatible code for POWER8 was created as well, but it was not tuned for performance.	2022-03-04 14:54:56 -08:00
Tianlei Wu	379b3cdef6	T5 to ONNX conversion script (#10766 ) * T5 onnx conversion script	2022-03-04 14:42:04 -08:00
Olivia Jain	12eb660415	Compare TRT vs ORT-TRT Accurately (#10565 ) * get inputs independently for trtexec * track one process only * remove engine and profile files * change time to commit time * add runtime option for io binding * move to commit date * fixes * add option for graph optimization * cleanup docker script * include remaining changes * choose graph optimization option * add space in option	2022-03-04 10:14:18 -08:00
dependabot[bot]	e3c85d4262	Bump numpy Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0. - [Release notes](https://github.com/numpy/numpy/releases) - [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt) - [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0) --- updated-dependencies: - dependency-name: numpy dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2022-03-04 09:51:32 -08:00
dependabot[bot]	b780a3784e	Bump numpy in /tools/ci_build/github/linux/docker/scripts/training Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0. - [Release notes](https://github.com/numpy/numpy/releases) - [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt) - [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0) --- updated-dependencies: - dependency-name: numpy dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2022-03-04 09:38:38 -08:00
dependabot[bot]	0b0e8ccf92	Bump numpy Bumps [numpy](https://github.com/numpy/numpy) from 1.19.5 to 1.21.0. - [Release notes](https://github.com/numpy/numpy/releases) - [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt) - [Commits](https://github.com/numpy/numpy/compare/v1.19.5...v1.21.0) --- updated-dependencies: - dependency-name: numpy dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2022-03-04 09:34:58 -08:00
Changming Sun	283d0c47b4	Update our absl cmake files (#10762 )	2022-03-04 09:28:04 -08:00
zhangyaobit	4c88fa5971	Add micro-benchmark for FastGelu (#10744 ) * Add micro-benchmark for FastGelu * Delete the bert-base case, as it is very similar to the bert-large one. * Add argument parsing and more user-friendly provider type assertion.	2022-03-04 08:51:15 -08:00
Valery Chernov	46d0b20ac2	upstream TVM. small code cleaning (#10515 ) Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-03-04 12:15:29 +01:00
Edward Chen	395a7242d6	[iOS packaging] Minor updates. (#10755 ) * Change storage container, simplify build definition parameters. * Remove explicit version from Objective-C docs. * Increase timeout. * Use real storage account. * Get static website URL with az cli.	2022-03-04 16:02:53 +10:00
Scott McKay	e337f5faf3	Enable QDQ cleanup and NHWC optimizers in an extended minimal build. (#10729 ) * Enable QDQ cleanup and NHWC optimizers in an extended minimal build.	2022-03-04 15:45:42 +10:00
Guoyu Wang	7aa706854f	Pipeline changes to build full ORT package for Android (#10654 ) * Add android package build settings for full build Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com> Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2022-03-04 15:35:54 +10:00
Scott McKay	6072c6b65e	Simplify QLinearConv registration so type reduction works with it. (#10747 ) * Simplify QLinearConv registration so type reduction works with it. * Update QLinearMatMul registration to be a standard typed registration	2022-03-04 14:06:04 +10:00
Abhishek Kulkarni	c2c85dd6b1	Add an option to export ONNX graphs in ORTModule tests (#10579 ) Co-authored-by: Abhishek Kulkarni <abkulkarni@microsoft.com>	2022-03-03 16:56:19 -08:00
Yulong Wang	745fa5885f	optimize web assembly build flags for multi-thread (#10759 )	2022-03-03 16:44:14 -08:00
Edward Chen	c8ec7782bd	Fix unused variable warning, move variable definitions closer to usages. (#10757 )	2022-03-04 09:18:33 +10:00
Olivia Jain	ed87e1b721	Change axis to 0D in cumsum tests. (#10715 ) * changing axis to 0 * if def for openvino * removing extra header * include changes * pass in 0D scalar * Add comment explaining change.	2022-03-03 10:44:46 -08:00
Changming Sun	b3e96d6195	A new pipeline to replace the existing WindowsAI packaging pipeline (#10646 )	2022-03-03 08:56:49 -08:00
Hubert Lu	fe8d867efa	Optimize BinaryElementWise and BiasGeluGrad kernels for AMD (#10594 ) * Optimize elementwise and biasgelugrad kernels for AMD * Clean up for BiasGeluGradDxKernel	2022-03-03 08:07:15 -08:00
cloudhan	4c20f6863d	Fix build with gcc 7.5 (#10567 )	2022-03-03 18:29:02 +08:00
Fei Hu	75160d6779	Add the missing status return in beam search (#10738 )	2022-03-03 01:24:44 -08:00
Rachel Guo	a9dc50ba8b	Add option to force QDQIsInt8Allowed to return true when exporting to ORT format (#10719 ) * wip * save * minor update * fix * fix * Revert "fix" This reverts commit `a76f364b2d`. * revert * revert * revert submodule removal * address pr comments * minor fix * address cr comments * fix format Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2022-03-02 23:26:14 -08:00
Ye Wang	44d08d80a0	Add restriction to first usage in allocation planner (#10724 ) * Add restriction to first usage in allocation planner * change phrases * add UT Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>	2022-03-02 22:03:50 -08:00
Tianlei Wu	47ab0c2006	Auto mixed precision conversion of GPT-2 onnx model (#10711 ) * add auto mixed precision * Add float_to_float16_max_diff, update fp16 constants * remove cascaded Cast nodes	2022-03-02 21:08:51 -08:00
Olivia Jain	7ebff2b273	add missing link to openvino (#10737 )	2022-03-02 15:10:59 -08:00
Baiju Meswani	f9b6eef05f	orttraining packaging pipeline for rocm 5.0.1 (#10725 )	2022-03-02 12:32:14 -08:00
Yufeng Li	7ab0c607b4	add qdq support of (un)squeeze and GlobalAveragePool (#10721 )	2022-03-02 10:58:35 -08:00
Numfor Tiapo	9ad95bf068	Skip SetName test on inbox build (#10699 )	2022-03-02 10:28:58 -08:00
RajalakshmiSR	5d8c5409ab	POWER10: QGEMM optimization (#10642 ) * POWER10: QGEMM optimization This patch makes use of POWER10 MMA feature for QGEMM function. This optimization includes signed and unsigned cases.Tested and there are no new failures with gcc11 and clang-14. * Changes as per review comments Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>	2022-03-02 08:36:26 -08:00
Funtowicz Morgan	e5c6dc1fc8	Add ability to save calibration augmented models through external data format when model size exceeds 2Gb. (#10695 )	2022-03-02 08:35:30 -08:00
Valery Chernov	62cc981599	[TVM EP] support of TVM Virtual Machine (#10341 ) * add executor option (vm or graph) and support virtual machine methods * nullptr check for compile and run methods (see also PR#10211 from microsoft:onnxruntime) * get output shapes for VM * remove run_with_benchmark. remove run methods from python api, get it from native side * get outputs method for VM was implemented * support multiple input for VM * update python logging and exception * small fix * update tvm with patch for VM API * update nhwc transformations for TVM EP * add data alignment check and support set_input_zero_copy for GE in TVM EP * fix logger name * return back to apache/tvm with VM fixes instead of local dev branch * hide customized tvm logger while issue is not resolved. fix tvm warning related to target_host * flake8 fix Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-03-02 11:02:33 +01:00
Sunghoon	a7f6442c45	[js] release pipeline for web and react native (#10656 ) * skip browserstack test at release pipeline * Update web-ci-pipeline.yml for Azure Pipelines * pool name as a parameter to run at lotus * Update web-ci-pipeline.yml for Azure Pipelines * create a packaging pipeline for web * Update web-packaging-pipeline.yml for Azure Pipelines * make web-ci-pipeline as a template * change a paramter name checking a pipeline * make a pool name changable for react native pipeline * disable code sign validation for react native * fix react native package.json publish * fix indentation * remove unnecessary comment * test onnxruntime-common package publish * ts and js files use lf as eol for windows * use Linux style of ending line break * change newLine at only tsconfig.json * restore a commented code * fix git restore directory for npm packaging * fix a typo * force eol to lf on windows for js directory in CI	2022-03-01 21:38:33 -08:00
Edward Chen	9e7d7a9e97	Convert ConvActivationFusion transformer to a selector action transformer. (#10687 )	2022-03-02 13:47:55 +10:00
Tianlei Wu	fa9090f259	check gpt-2 graph in converting beam search (#10712 )	2022-03-01 19:04:34 -08:00
Edward Chen	d07a2377b1	Fix race condition in CUDA, ROCm, and TensorRT EP GetKernelRegistry() implementations. (#10200 ) Make GetKernelRegistry() kernel registry initialization thread-safe.	2022-03-01 17:53:58 -08:00
Tianlei Wu	2fb2dae42f	Print tensor snippet in dumping node Inputs/Outputs to StdOut (#10707 ) * dump tensor snippet	2022-03-01 16:59:12 -08:00
zhangyaobit	a7738b52c5	Add microbench to benchmark single operators. (#10678 ) * Add microbench to benchmark single operators. * Move to tool directory; seperate data genration from io binding. * Refector. * Clean up. * Use precision instead for extensibility. * Refactor the create_io_binding function to take in torch tensors instead of numpy arrays; this reflects more accurately what the function does, because it is torch tensors that got bound.	2022-03-01 16:00:16 -08:00
Guoyu Wang	19464614e7	[NNAPI QDQ] Add QDQ Concat (#10666 ) * add qdq concat Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2022-03-02 09:08:36 +10:00
Bowen Bao	6448ca64e6	Fix reshape allowzero with unknowndim (#10665 )	2022-03-01 10:47:48 -08:00

6455 commits