onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-19 19:00:47 +00:00

Author	SHA1	Message	Date
Dmitri Smirnov	e752cbe7f2	Work on eliminating Internal Compiler Error (#16741) ### Description Replace the offending bitwise `operator \|` with if() logic for ARM.	2023-07-18 10:17:52 -07:00
Wei-Sheng Chin	b71ebf91a5	[DORT] Reduce global configs to make enabling dynamic shape easier (#16720) There are several global configs used by DORT. ```py DEFAULT_ONNX_EXPORTER_OPTIONS = torch.onnx._internal.exporter.ResolvedExportOptions( torch.onnx._internal.exporter.ExportOptions() ) # TODO(wechi): This line must generate result identical to the call of # _create_onnx_supports_op_overload_table(...) inside # create_onnx_friendly_decomposition_table(...) in # torch/onnx/_internal/fx/decomposition_table.py. _SUPPORT_DICT = torch.onnx._internal.fx.decomposition_table._create_onnx_supports_op_overload_table( DEFAULT_ONNX_EXPORTER_OPTIONS.onnx_registry ) # type: ignore _EXTRA_SUPPORT_DICT: Dict[str, Any] = { "getattr": None, "_operator.getitem": None, } DORT_DECOMPOSITION_TABLE = DEFAULT_ONNX_EXPORTER_OPTIONS.decomposition_table ``` We can see that all but `_EXTRA_SUPPORT_DICT` are deduced from the ONNX exporter's options. As there are many ways to configure the ONNX exporter's options, we decided to move these variables to `OrtBackend`'s `__init__` so that the construction of `OrtBackend` becomes more flexible (especially for enabling dynamic shape or not).	2023-07-18 09:06:58 -07:00
PeixuanZuo	9b549c646c	[ROCm] fix kernel explorer GemmSoftmaxGemm test (#16735) GemmSoftmaxGemmTunable occasionally fails with a large numerical error. The root cause is that CK's Strided Batched Gemm has a larger error under a specific initialization distribution `(multinormal_distribution)`. The Generic (Gemm1 + Softmax + Gemm2) implementation is one instance of GemmSoftmaxGemmTunable. Gemm1 and Gemm2 in the Generic implementation are TunableOps when tuning is enabled. In some cases GemmSoftmaxGemmTunable selects the Generic implementation while Gemm1 or Gemm2 selects the CK implementation, so the result of GemmSoftmaxGemmTunable is affected by CK. - Loosen the tolerance. - Add `GemmSoftmaxGemmPermuteGenericNestedTunable` to test the Generic implementation with tuning enabled.	2023-07-18 16:47:39 +08:00
zhangsibo1129	9ba5cdbaa4	[CANN EP] Fix Float16 support for CANN EP (#16733) ### Description Replace the constructor function `MLFloat16()` with the public member function `FromBits()` in the file `onnxruntime/core/providers/cann/cann_common.cc` ### Motivation and Context PR [#16506](https://github.com/microsoft/onnxruntime/pull/16506) changed the public constructor function `MLFloat16(uint16_t x)` to private, and added a public function `MLFloat16::FromBits(uint16_t x)` in the file `include/onnxruntime/core/framework/float16.h`, which broke the CANN CI. This PR aligns the CANN behavior with the modified class `MLFloat16`.	2023-07-17 23:24:51 -07:00
cloudhan	0cab7e1a37	[ROCm] Generalize FastGeLU (#16623) Allow the whole pipeline to be parameterized with a unary elementwise functor.	2023-07-18 11:23:12 +08:00
Scott McKay	ad90352a68	Add MAUI test app that can be used to test model loading and performance (#16658) ### Description MAUI test app with tooling to add model and generated or provided input test data. The app will load the model and validate the output. It can also run a specified number of iterations to provide basic performance information. <img width="401" alt="image" src="https://github.com/microsoft/onnxruntime/assets/979079/daf3af13-fb22-4cbb-9159-486b483a7485"> ### Motivation and Context Primarily to make it easier to test an arbitrary model on iOS. A MAUI app allows testing on all platforms. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-07-18 08:21:18 +10:00
cloudhan	a45b834722	Fix warning about uninitialized member (#16736) #16506 causes almost every translation unit on Linux to complain: ``` [1175/1235] Building CXX object CMakeFiles/onnxruntime_test_all.dir/home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc.o In file included from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:18, from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/data_types.h:17, from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/tensor.h:17, from /home/guangyunhan/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.h:16, from /home/guangyunhan/onnxruntime/onnxruntime/test/providers/compare_provider_test_utils.h:7, from /home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc:4: /home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h: In instantiation of ‘static constexpr uint16_t onnxruntime_float16::Float16Impl<Derived>::ToUint16Impl(float) [with Derived = onnxruntime::MLFloat16; uint16_t = short unsigned int]’: /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:42:66: required from here /home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:241:7: note: ‘union onnxruntime_float16::detail::float32_bits’ has no user-provided default constructor 241 \| union float32_bits { \| ^~~~~~~~~~~~ /home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:242:16: note: and the implicitly-defined constructor does not initialize ‘unsigned int onnxruntime_float16::detail::float32_bits::u’ 242 \| unsigned int u; \| ^ ``` This PR shuts the compiler up.	2023-07-17 11:33:54 -07:00
Edward Chen	df8843c4a7	Upgrade old Python version in packaging pipeline (#16667) - Upgrade from Python 3.6 to 3.8 in packaging pipeline. - Raise build.py minimum required Python version.	2023-07-17 08:24:47 -07:00
Dmitri Smirnov	b8c40b7813	Fix parameter naming that fails Doc generation. (#16717) ### Description Rename `FromBits` param name to match the docs. ### Motivation and Context Fix API Doc generation.	2023-07-16 22:02:05 -07:00
RandySheriffH	e1ca8ee6d4	RunAsync C/CXX API (#16613) Implement RunAsync API - the session will run in a thread of the intra-op thread pool. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-07-16 16:51:40 -07:00
Ryan Hill	2cf31a20cf	Cuda: Decoder Masked Multihead Attention Q values get corrupted when using cross attention (#16721) ### Description Some code was accidentally moved into the `if(!params.is_cross_attention)` block; it must stay outside to work in both cases. ### Motivation and Context This causes invalid results. We detected this as a performance bug, as it caused the EOS early exit to never happen, and the runs would always take max_length to complete, which was slow.	2023-07-15 00:41:06 -07:00
Wanming Lin	2b7a94e65b	[WebNN EP] Make some types clearer (#16705) It's a follow-up to address comments in https://github.com/microsoft/onnxruntime/pull/16671#discussion_r1261761828 and https://github.com/microsoft/onnxruntime/pull/16671#discussion_r1261763873	2023-07-14 17:39:36 -07:00
Ryan Hill	2ae041f390	atomicAdd returns previous value, not current value. (#16690) ### Description Mistake in beam scorer processing, atomicAdd result should be compared with '1' vs '0' as it returns the original value, not the latest value. This error just results in slow perf, nothing fails. ### Motivation and Context Fixes #16642	2023-07-14 15:46:57 -07:00
Wei-Sheng Chin	44fd98ebfe	[DORT] Enable aten::full by implementing extra logics to select EP (#16699) DORT only selects devices from input arguments (type: torch.Tensor). However, it errors out when a graph doesn't have any inputs (e.g., a single aten::full graph). This PR addresses this problem by changing the EP selection to - First, inspect graph inputs. If there are some valid devices, use them plus a default one (`OrtBackend.ep: str`). - Otherwise, inspect graph outputs carried by `torch.fx.GraphModule` and use all valid devices plus the default `OrtBackend.ep`. - When both (1) and (2) fail, it uses the default EP specified by `OrtBackend.ep`.	2023-07-14 15:42:25 -07:00
Edward Chen	f236768d5c	[ios] Enable `--use_extensions` with custom built iOS pod (#16711) - Fix link errors by including the needed onnxruntime-extensions libraries in the static framework. - Add Objective-C API to register custom ops from embedded onnxruntime-extensions. Caveat: Not all onnxruntime-extensions build options are working yet. E.g., building with the onnxruntime-extensions OpenCV dependency does not work.	2023-07-14 15:37:16 -07:00
G. Ramalingam	4faee2e44c	Fix issue in constant-propagation inside function subgraph (#16330) ### Description The SequenceMap function-op has a graph-attribute. ORT's constant-folding optimization may identify constant-expressions inside the subgraph and promote them to constants, stored as initializers in the main graph. When it does this, the optimization updates the subgraph to remove the corresponding nodes. When we expand a SequenceMap node by inlining its function-expansion, we need to use this updated subgraph. However, the existing code uses the original graph-attribute (GraphProto), instead of regenerating it from the modified subgraph. This results in producing a graph with duplicate definitions for the constant-folded variable, resulting in an error during graph-resolve. This PR fixes this issue (just a single line fix), and adds a test-case to cover this scenario. --------- Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2023-07-14 14:44:59 -07:00
Wanming Lin	ea43671eb6	[WebNN EP] Support several activation ops (#16693) Support Elu, HardSigmoid, HardSwish, Softplus, Softsign, Tanh.	2023-07-14 14:36:15 -07:00
Adrian Lizarraga	a189e76fde	[QNN EP] Fix error handling for Softmax/ReduceOps (#16700) ### Description - Fix check for Softmax with axis attributes not equal to -1. QNN EP only supports axis values equal to -1 (or rank - 1). - Explicit error when Reduce* ops have an input with rank > 4 on HTP backend (unsupported). - Correctly filter out partitions that only contain a single QuantizeLinear or DequantizeLinear node. - Add tests for the above and clean up unnecessary usage of test description labels. ### Motivation and Context Make it easier to debug why a model may not be supported.	2023-07-14 13:47:23 -07:00
Baiju Meswani	9889f0f507	Add support for training apis to support custom ops (#16601)	2023-07-14 11:15:51 -07:00
Adrian Lizarraga	19169afe30	[QNN EP] Add option to skip unit tests in the QNN NuGet packaging pipeline (#16164) Add option to skip unit tests in the QNN NuGet packaging pipeline.	2023-07-14 10:52:05 -07:00
Dmitri Smirnov	853c4ff0a5	[C#, CPP] Introduce Float16/BFloat16 support and tests for C#, C++ (#16506) ### Description Introduce `Float16/BFloat16` support for C# and C++ APIs. Users should be able to convert between `float` and `Float16/BFloat16`, compare values, and test for `NaN`, `Infinity`, and whether the number is denormalized. ### Motivation and Context Users filed issues such as: https://github.com/microsoft/onnxruntime/issues/14303	2023-07-14 10:46:52 -07:00
Tianlei Wu	77b45c6503	Add Stable Diffusion Benchmark on A100-PCIE-80GB (#16702) (1) Fix a bug in https://github.com/microsoft/onnxruntime/pull/16560: the fp16 flag should be set for UNet. (2) Remove wget from requirements since it is no longer needed. (3) Add benchmark numbers for A100-PCIE-80GB. Note that CUDA EP has an issue running with batch size 4, so that number is not added.	2023-07-14 10:37:00 -07:00
Yi Zhang	36b121d8c2	add more check to Web CI on cache restore (#16689) ### Motivation and Context Make sure the data is correct.	2023-07-14 10:00:13 +08:00
mindest	810512c658	[ROCm] TunableOp: add hipBLASLt tuning logic (#16338) ### Description - Add hipBLASLt tuning logic in place of default hipBLASLt implementation; - add kernel explorer for hipBLASLt. related operators: Gemm, StridedBatchedGemm, and GemmFastGelu. Temporarily mark algos that require extra workspace as unsupported. Will add workspace support in later PR, which will change Gemm Params def and affect multiple files.	2023-07-14 08:20:58 +08:00
Scott McKay	a3fc04ba74	Fix CodeCoverage pipeline (#16684) ### Description Delete second reference to onnxruntime_api_tests_without_env in the code coverage commands. One was removed in #16373 and the duplicate wasn't noticed. ### Motivation and Context Fix pipeline.	2023-07-14 07:47:04 +10:00
Yulong Wang	d1d65978f6	[js/web] fix file size trim for wasm only .min.js (#16681) ### Description Fix file size trim for wasm-only .min.js. The minimal builds `ort.wasm.min.js` and `ort.wasm-core.min.js` should exclude JSEP-related source code.	2023-07-13 14:20:51 -07:00
Danny Friar	5de2e2fb76	Call `lazy_reset_grad` in on-device training docs (#16696)	2023-07-13 13:29:54 -07:00
Dipanjan Sengupta	a461608409	Amx flag removal (#16527) ### Description 1. Replacing AMX intrinsics with machine code macros in QGEMM kernel. 2. Removing AMX build flags for GCC in cmake file. 3. Fixing the link time optimization (LTO) issue introduced with asm .include of an assembly file. I have moved the AMX instruction macro definitions from QgemmU8S8KernelAmxCommon.S to the amx_common.h to fix the LTO issue. Note that I am also pushing the macros defined in QgemmU8S8KernelAmxCommon.S for future reference. A special thanks to @laxmansole who helped in the development of the instruction macro definitions for AMX intrinsics and fixing the LTO issue. ### Motivation and Context The additional AMX flag in cmake adds an extra layer of dependency on the GCC version to use the feature. These changes should allow the usage of the AMX feature with just the CPU ID check.	2023-07-13 11:19:49 -07:00
Vincent Wang	c07a3b869c	Triton Codegen for ORTModule (#15831) Fuse connected elementwise and reduce Ops to TritonOp and codegen triton code to run the kernel. This PR is co-edited by @wejoncy and @er3x3	2023-07-13 18:17:58 +08:00
Wanming Lin	7cac114e52	[WebNN EP] Support Abs and Neg ops (#16672)	2023-07-13 00:44:22 -07:00
Wanming Lin	d5b76cff60	[WebNN EP] Fixed build error (#16671) The build break was caused by enabling `-Wshorten-64-to-32` in https://github.com/microsoft/onnxruntime/pull/16524	2023-07-12 23:37:24 -07:00
mindest	b7fd5af48b	[ROCm] TunableOp: Update rocBLAS get_solutions API (since ROCm5.6) (#16657) ### Description - Update existing rocBLAS get_solutions API using `*_get_solutions_by_type` (supported from ROCm5.6); remove the original nested TunableOp logic. - Update kernel_explorer.	2023-07-13 11:20:26 +08:00
PeixuanZuo	ebc311365b	[ROCm] Optimize ROCm CI to reduce time (#16620) This PR mainly optimizes the ROCm CI tests to reduce time and CPU utilization. - use smaller batch size on strided_batched_gemm/batched_gemm test - disable cpu training test - fix occasional test_e2e_padding_elimination failures on ROCm.	2023-07-13 10:58:03 +08:00
cloudhan	af89496fc7	Allow generic pipeline to accept some params for cross attention (#16519) Allow `GemmSoftmaxGemmPermuteGenericPipeline<T>` to be used in some cross attention cases, which opt for rocblas instead of ck when rocblas performs better on the small problem. The improvement is ~20% e2e time reduction on some test cases for whisper large. Note: This is because ck has a performance issue when the sequence length is merely 1, which should be improved in the future.	2023-07-13 09:31:31 +08:00
cloudhan	3866614519	Avoid cmake repeatly printing DISABLE_FLOAT8_TYPES=ON (#16656)	2023-07-13 09:29:20 +08:00
Yi Zhang	f3b40abe29	Use pipeline cache to cache onnx node test data. (#16659) ### Description Use pipeline cache instead of reading data from the image. ### Motivation and Context 1. To reduce the browser dependency of custom image. 2. The onnx node test data is less than 30M and the cache download time is very short.	2023-07-13 09:26:27 +08:00
Rachel Guo	111382746e	[js/rn] Add test for validating "executionProvider" options (#16651) ### Description As title. Validation at JS call level in E2E app is not included. Can cover together in a separate pr. ### Motivation and Context Test coverage. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-07-12 14:55:47 -07:00
Ye Wang	dd7d721f3c	support rotary embeddings in decoder masked self-attention (#16556) ### Description This PR adds support for rotary embeddings in decoder masked self-attention --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-07-12 13:48:48 -07:00
Sheil Kumar	0c956bef0a	[WinML] Fix warnings in OnnxruntimeEngine and OnnxruntimeEngineBuilder (#16679) Fix [prefast:Warning]: C6101 (in '_winml::OnnxruntimeEngine::CreateTensorValueFromDefaultAllocator' Fix [prefast:Warning]: C6101 (in '_winml::OnnxruntimeEngineBuilder::CreateEngine' Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2023-07-12 13:09:50 -07:00
pengwa	2449ded20f	Use autograd_inlining for model export (#16665) ### Use autograd_inlining for model export Starting from some versions of PyTorch, there is an issue related to custom autograd.Function inlining, even though we register a custom export function for the autograd.Function (e.g. when custom autograd function is enabled). As an option, the PyTorch exporter adds a new flag during export that lets us disable the inlining. https://github.com/pytorch/pytorch/pull/104067 Currently the PyTorch change is in the nightly build; this PR dynamically checks torch.onnx.export's signature and decides to use `autograd_inlining` when it exists.	2023-07-12 20:57:24 +08:00
PeixuanZuo	596dbe277e	[ROCm] add upgrade to fix security issue (#16668)	2023-07-12 17:57:18 +08:00
Yulong Wang	ecca11340a	[js/common] allow creating (u)int64 tensors in 2 ways (#16541) ### Description Allow creating (u)int64 tensors from either a number array or a bigint array. before: ```js // TypeScript thinks this is good, but it actually does not work // runtime error: Uncaught TypeError: Cannot convert 1 to a BigInt const myTensor1 = new Tensor('int64', [1, 2, 3, 4], [2, 2]); // runtime good, but TypeScript thinks myTensor2 is a string tensor const myTensor2 = new Tensor('int64', [1n, 2n, 3n, 4n], [2, 2]); ``` after: ```js // both work at runtime and TypeScript populates the correct types const myTensor1 = new Tensor('int64', [1, 2, 3, 4], [2, 2]); const myTensor2 = new Tensor('int64', [1n, 2n, 3n, 4n], [2, 2]); ```	2023-07-11 21:07:36 -07:00
Aditya Goel	8e393e0b8c	Unique operator with double (#16359) ### Description The [ONNX standard](https://github.com/onnx/onnx/blob/main/docs/Operators.md#type-constraints-181) permits the `Unique` operator to have `double` input tensor element type, however this was not supported in onnxruntime. This PR enables this kernel. ### Motivation and Context The lack of support for `float64` forces users currently to cast to `float32` instead. This loss of precision can be severely problematic in feature engineering pipelines downstream of the `Unique` operator. It would be good to prevent this by updating ORT to reflect the standard and support `double` input tensors. --------- Signed-off-by: Aditya Goel <agoel4512@gmail.com>	2023-07-11 20:24:14 -07:00
Edward Chen	1b8d5c43c2	Fix builds (#16646) - Fix some more `shorten-64-to-32` warnings - Move minimum build.py Python version back to 3.6	2023-07-11 19:21:25 -07:00
Scott McKay	ce68a4c06a	Fix Linux build failure when onnxruntime_DISABLE_ABSEIL=ON (#16373) ### Description Add ort_value.h to session_options.h so OrtValue is defined. Update a unit test binary to add required include paths. Adding ort_value.h pulls in more data type headers. ### Motivation and Context #16193	2023-07-12 11:23:18 +10:00
Tianlei Wu	2de5807703	Attention fusion for UNet onnx model export from PyTorch 2.* (#16629) ### Description Tested with stable diffusion unet models exported by pytorch nightly. Example to run: ``` cd onnxruntime/python/tools/transformers/ python optimizer.py --input unet.onnx --output unet_fp16.onnx --model_type unet --float16 --opt_level 0 ```	2023-07-11 14:35:48 -07:00
Yulong Wang	b4bf7d5044	[js/web/test] accelerate 'npm test' suite0/1 init time (#16558) ### Description This change reduces the number of calls to globby functions so that it accelerates the initialization for 'npm test' with suite0/1 tests from ~14sec to <2sec.	2023-07-11 14:34:40 -07:00
Ti-Tai Wang	72076e5320	Update converter registry usage in `orttraining_test_dort_custom_ops.py` (#16663) Fix Orttraining Linux Lazy Tensor CI. Orttraining Linux Lazy Tensor CI is broken. The error message is `AttributeError: 'OnnxRegistry' object has no attribute 'register'`.	2023-07-11 12:03:12 -07:00
satyajandhyala	d41bbac7b9	[Web/JS] Added Expand operator support. (#16577) ### Description Added Expand operator support.	2023-07-11 09:38:16 -07:00
Tommy Au	1b07bbceaa	Update build.bat Prevent spaces in path (#16635) ### Description Simply add double quotes so that spaces in the path do not break the script. ### Motivation and Context If there are spaces in the path, the batch file cannot run and an error occurs. Adding double quotes fixes this problem. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-07-11 07:07:08 -07:00
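
Commit 2449ded20f describes checking the signature of `torch.onnx.export` at run time and passing `autograd_inlining` only when the parameter exists. A minimal sketch of that kind of check is shown below. It assumes nothing about the actual ORTModule code: `optional_export_kwargs`, `old_export`, and `new_export` are hypothetical names, and the two stand-in functions replace the real exporter so the example runs without PyTorch.

```python
import inspect
from typing import Any, Callable, Dict


def optional_export_kwargs(export_fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build extra keyword arguments that `export_fn` is known to accept.

    Hypothetical helper: it inspects the callable's signature and adds
    `autograd_inlining=False` only if that parameter is present.
    """
    params = inspect.signature(export_fn).parameters
    kwargs: Dict[str, Any] = {}
    if "autograd_inlining" in params:
        kwargs["autograd_inlining"] = False
    return kwargs


# Hypothetical stand-ins for an older and a newer exporter signature.
def old_export(model, args, f):
    return "exported without the flag"


def new_export(model, args, f, autograd_inlining=True):
    return f"exported with autograd_inlining={autograd_inlining}"


if __name__ == "__main__":
    for export_fn in (old_export, new_export):
        extra = optional_export_kwargs(export_fn)
        print(export_fn.__name__, extra, export_fn(None, (), "model.onnx", **extra))
```

Probing the signature rather than the library version keeps the call working on builds where the flag has been backported or is only available in nightlies.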

9170 commits
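
The three-step execution provider (EP) selection described in commit 44fd98ebfe can be sketched as a simple fallback chain. This is an illustration only, not the DORT implementation: `select_eps` and its parameters are hypothetical, devices are represented as plain EP name strings, and `None` stands for a value with no valid device.

```python
from typing import Iterable, List, Optional, Tuple


def select_eps(
    input_devices: Iterable[Optional[str]],
    output_devices: Iterable[Optional[str]],
    default_ep: str,
) -> Tuple[str, ...]:
    """Pick EPs from graph inputs, then graph outputs, then the default."""

    def valid(devices: Iterable[Optional[str]]) -> List[str]:
        # Drop entries that carry no device information.
        return [d for d in devices if d is not None]

    # Steps 1 and 2: the first non-empty source wins, plus the default EP.
    for candidates in (valid(input_devices), valid(output_devices)):
        if candidates:
            # dict.fromkeys removes duplicates while preserving order.
            return tuple(dict.fromkeys([*candidates, default_ep]))

    # Step 3: neither inputs nor outputs yielded a device.
    return (default_ep,)


if __name__ == "__main__":
    cpu, cuda = "CPUExecutionProvider", "CUDAExecutionProvider"
    print(select_eps([cuda], [], cpu))      # devices found on inputs
    print(select_eps([], [None, cuda], cpu))  # no inputs, fall back to outputs
    print(select_eps([], [], cpu))          # nothing found, default only
```

The output-based step is what covers a graph with no inputs, such as the single `aten::full` case mentioned in the commit message.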