onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-15 18:23:41 +00:00

Author	SHA1	Message	Date
Yifan Li	d6ce43db5e	[EP Perf] MemTest: Add Valgrind and fix addressSanitizer (#16930 ) ### Description 1. Add valgrind to existing ep_perf CI MemTest and parse ORT-TRT memLeak details 1. General Valgrind logs and logs related to ORT-TRT will be parsed in [CI artifacts](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=334122&view=artifacts&pathAsName=false&type=publishedArtifacts) 1. Logic: 1. Run valgrind with `onnxruntime-perf-test -e tensorrt` and export log to `valgrind.log` 2. Identify if any `definitely lost` memleak happened 1. For log paragraphs which show `definitely lost`, parse if they have keyword `TensorrtExecutionProvider`. 2. If so, extract these details to `ort_trt_memleak_detail.log`, and return `build failure` to EP Perf CI 3. Fix existing addressSanitizer and sync the squeezenet testcase with latest update from [ort-inference-example](https://github.com/microsoft/onnxruntime-inference-examples/blob/main/c_cxx/squeezenet/main.cpp) 1. Updates in short: Upgrade main.cpp to be using OrtTensorRTProviderOptionsV2 4. Reorder the 7-min-MemTest to be ahead of 9-hr-model-tests, and enable MemTest by default	2023-08-04 16:58:57 -07:00
Yulong Wang	5af8774a0b	[build] do init and precheck first (#16961 ) ### Description This change allows Web CI to run some checks as the first step, so that if there are errors it won't launch the heavy task of building WebAssembly. Checks include: - "npm ci" in /js, /js/common and /js/web. This implicitly includes: - typescript compiler in /js - typescript compiler in /js/common - webpack build in /js/common - typescript compiler in /js/web - ESLint on typescripts - clang-format formatter (.js, .ts, .cc, .h, .mm) - Prettier formatter (.json, .jsonc, .md) --------- Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-08-04 16:44:45 -07:00
Chi Lo	fc8003349e	Add API for updating TRT EP provider option user compute stream (#16965 ) Add a generic `UpdateTensorRTProviderOptionsWithValue()` C API to update TensorRT provider options where its data type is pointer that can't be represented by string.	2023-08-04 15:14:43 -07:00
Jiajia Qin	9ea0a3129b	[js/webgpu] Make sure only storage buffers are reused (#16893 ) ### Description This PR makes sure that only storage buffers are reused. Previously, the query buffer might also be taken from the freeBuffers list if it contained a buffer of matching size. But the two kinds of buffer have different usages, which results in errors.	2023-08-04 13:40:52 -07:00
satyajandhyala	7ad43d9564	[JS/Web] Fixed ArgMin and ArgMax and refactored (#17002 ) Fixed ArgMin and ArgMax and refactored using functionality from Reduce operator code. ### Description Removed code/functionality duplication and fixed some issues.	2023-08-04 12:59:36 -07:00
Adrian Lizarraga	191f98a00e	[QNN EP] Improve QDQ model accuracy tests (#16916 ) ### Description - Improves how unit tests measure the accuracy of QDQ models on QNN EP. - Adds tests for ops: Add, Mul, Abs<sup>1</sup>, And<sup>1</sup>, Or<sup>1</sup>, Ceil<sup>1</sup>, Cos<sup>1</sup> <sup>1</sup>: Not previously supported due to missing node unit handling. ### Motivation and Context The new approach for testing QDQ operator accuracy requires running 3 inferences: 1. float model on CPU EP (baseline) 2. qdq model on CPU EP 3. qdq model on QNN EP The unit tests check that running the QDQ model on QNN EP (3) is at least as accurate (+- small tolerance) as running the QDQ model on CPU EP (2). We measure accuracy by comparing to the baseline (1). This is essentially what we care about: is QNN EP as accurate as CPU EP? If not, it is worth investigating as a potential bug.	2023-08-04 12:15:27 -07:00
Baiju Meswani	e5bb7aba50	Add Gradient for Reciprocal (#16945 )	2023-08-04 09:38:09 -07:00
Yi Zhang	555414f1aa	Set PR trigger rules (#16987 ) ### Description Add a script to insert the trigger rules into workflow yamls. As a first step, skip the Windows GPU and Linux GPU workflows when there's only a doc change. ### Motivation and Context Make it easy to skip workflows for doc-only changes. [AB#18201](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18201)	2023-08-04 08:21:07 -07:00
pengwa	a6887f171f	Refactor schema extraction and output unflattening (#16894 ) ### Motivation and Context When we handle PyTorch models' inputs in different places (ORTModule or others), it's common for us to flatten structured data into a 1-D tensor list (required by libs such as torch.onnx.export, torch.autograd.Function.forward or ORT inference session), then do subsequent work, then unflatten back to the original hierarchy as returned values. The DeepSpeed stage 3 hooks support work also needs such a lib to do similar things, so I am proposing to extract this pair of APIs into training/utils/, where they can be used more generally. A comprehensive set of test data is also used for testing unflatten/flatten in unit tests. Let me know if you have any other suggestions. ### Refactor schema extraction and output unflattening Move `_extract_schema` and `unflatten_user_output` in `orttraining/orttraining/python/training/ortmodule/_io.py` to `extract_data_and_schema` and `unflatten_data_using_schema` in `orttraining/orttraining/python/training/utils/torch_io_helper.py` as shared libs, which can be used later by other features (DeepSpeed stage 3 hook rewrite). There is still some duplicated logic that handles flattening for different tasks by recursively looping over the data structure; it will be changed step by step to avoid heavy review effort.	2023-08-04 13:58:21 +08:00
Edward Chen	f98d3f8a23	[CoreML EP] Enable inputs with dynamic shape (#16915 ) Enable node inputs with dynamic shape to be handled by the CoreML EP.	2023-08-03 18:15:00 -07:00
Jeff Daily	1629a6fa75	[ROCm] add gfx1100 and gfx1101 to CMAKE_HIP_ARCHITECTURES (#16972 ) ### Description Support additional AMD GPU architectures. ### Motivation and Context AMD announced expanding support for additional GPUs. https://community.amd.com/t5/rocm/new-rocm-5-6-release-brings-enhancements-and-optimizations-for/ba-p/614745 This PR is how we will deliver that expanded support to onnxruntime.	2023-08-04 08:38:42 +08:00
satyajandhyala	cc4b64f646	[JS/Web] Modify Reduce, Expand and Slice to pass op and node tests. (#16979 ) ### Description Make the CacheHint mechanism, which is designed to avoid running the same test multiple times by saving the result mapped against a key, work by adding input dims.	2023-08-03 15:48:47 -07:00
Tianlei Wu	a25d0d296b	Add --mask_type option to generate different format of attention mask in bert_perf_test.py (#16976 ) ### Description Add an option to generate different formats of attention_mask for testing transformers models: 1 - 1D mask index, actual sequence length excluding padding 2 - 2D attention mask. Value 0 means padding, 1 otherwise. 3 - 1D, key lengths and cumulated sequence lengths of query and key	2023-08-03 15:24:20 -07:00
Tianlei Wu	bda012a4b2	Scripts to convert model with MultiHeadAttention to packing mode (#16925 ) ### Description Update scripts for converting model with MultiHeadAttention to packing mode. - [x] Update symbolic shape inference for PackedMultiHeadAttention and GatedRelativePositionBias - [x] Update convert_to_packing_mode to handle model with MultiHeadAttention	2023-08-03 15:23:55 -07:00
Edward Chen	06096fcb31	Hardcode xcodebuild destination iOS simulator OS to 16.4. (#16982 )	2023-08-03 14:49:54 -07:00
Yulong Wang	641c3a4a37	[js/web] update op test schema (#16921 ) ### Description Update op test schema. This change fixes several problems for operator tests for web: - `opsets` -> `opset`: an operator uses exactly one opset instead of multiple - `condition` -> `platformCondition`: make it less confusing - `inputShapeDefinitions`: allows testing ORT behaviors when it gets no/partial/full shape info. Added a JSON schema file and also an example file	2023-08-03 14:20:20 -07:00
Arthur Islamov	ea55700e1c	[js/web] JSEP Gather OP (#16855 ) ### Description Added Gather op that works with both i32 and i64 indices, assuming that values fall into i32 limit. The assumption is safe because it's not possible to allocate more than 2gb buffer for inputs. It treats all data from input tensor as u32, copying 1 or 2 elements for i64, u64 and double. --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>	2023-08-03 14:09:37 -07:00
Arthur Islamov	acb9e56164	[js/web] JSEP Expand fix for inputs with rank < 2 (#16829 ) ### Description If an Expand input has rank < 2, `inputIndicesHelper` and `outputIndicesHelper` create indices as u32 instead of array<u32>, and `calculateInputIndex` throws an error. ### Motivation and Context I've encountered this error while making StableDiffusion work with JSEP	2023-08-03 11:38:04 -07:00
Rachel Guo	757c42cea7	[rn] Update expo/config-plugins to 7.2.4 due to security warning with current version (#16977 ) ### Description As title. And manually validated it in the https://github.com/fs-eire/ort-rn-hello-world test app with the dev/updated version of onnxruntime-react-native package: https://www.npmjs.com/package/onnxruntime-react-native/v/1.16.0-dev.20230712-a396a15fa6 ### Motivation and Context Resolve security warning issues. cc @skottmckay thanks author for the changes. Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-08-03 10:13:43 -07:00
Arthur Islamov	c11cffb565	[js/web] Fix typo in JSEP ConvTranspose (#16884 ) ### Description A typo fix in JSEP ConvTranspose. It used $12 as output shape pointer but it should be $13. As $12 holds shape size	2023-08-03 09:46:18 -07:00
Wei-Sheng Chin	e6c9ed0606	More element types in AllGather and AllToAll (#16941 ) Two things done in this PR. - [2nd commit] More tensor element types are supported because in distributed computation, we need to re-shard tensors in many different types. - [1st commit] We now specify opset version in test models. Without this change, those models will have opset=20 with latest ONNX, which results in test errors. - [3rd commit] Tests are modified to test `AllGather` and `AllToAll` for boolean tensors. Several graph patterns are tried for tests. We found that `int64_tensor -> Cast -> bool_tensor -> AllToAll -> bool_tensor -> Cast -> int64_tensor` always generates random results. My guess is that `AllToAll` needs to synchronize all GPUs before calling `ncclSend` and `ncclRecv` since `AllGather` doesn't hit this problem. For reproducing the error, search for `TODO` in this PR. Note that this PR doesn't fix it.	2023-08-03 09:31:55 -07:00
BoarQing	b8bbc898c6	fix errors for node with empty name for vitis ai (#16949 ) ### Description Fixed the issue of finding nodes with empty name for vitis ai. ### Motivation and Context It is required because we encountered this error when testing newly created models.	2023-08-02 19:08:49 -07:00
Dmitri Smirnov	246cb3a197	Simplify shrink, replace Eigen in Sign implementation (#16975 ) ### Description Simplify Shrink. Replace Eigen code with the one that does not require fp16 conversion in Sign.	2023-08-02 18:24:38 -07:00
Guenther Schmuelling	0df2e14038	js/webgpu: argmax,argmin,softmax support (#16882 ) argmax and argmin are similar to reduce. Eventually we need to add optimized flavors of the shader. softmax is optimized but only works on the last axis for now which should be the common use case. todo: enable more ut for argmax/argmin	2023-08-02 18:16:19 -07:00
Hariharan Seshadri	506ddb3d5d	[js/WebGPU] Support int32 Transpose in WebGPU (#16952 )	2023-08-02 16:27:24 -07:00
BoarQing	6361b22103	vitis ai support generic data type (#16902 ) ### Description Support more data types for vitis ai. ### Motivation and Context It is required because the models we are testing now have uint8 data type. To solve this once and for all, we changed the code to support generic data type.	2023-08-02 15:56:39 -07:00
satyajandhyala	d399648869	[JS/Web] Added Resize kMSInternalNHWCDomain domain registration. (#16946 ) ### Description Added Resize NHWC domain kernel registration.	2023-08-02 14:16:21 -07:00
Michael Klimenko	07e6648e12	Enable Intel oneAPI DPC++/C++ compiler build (#16587 ) Last week I fixed error #16484 found when trying to build onnxruntime with the icpx compiler. Another thing I found out is that icpx uses -ffast-math flag by default. You can check it by running the compiler with -v flag like following: ```bash # Setup the environment . /opt/intel/oneapi/setvars.sh # Compile any file to see all the implicit flags icpx -v main.cpp ``` This leads to a bunch of warnings during the build like: ```bash In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/cpu/tensor/upsample_op_test.cc:5: In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/provider_test_utils.h:6: In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/test/providers/checkers.h:10: In file included from /mnt/f/wsl_home/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68: In file included from /mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/Core:172: /mnt/f/wsl_home/onnxruntime/build/Linux/RelWithDebInfo/_deps/eigen-src/Eigen/src/Core/MathFunctions.h:1019:12: warning: comparison with NaN always evaluates to false in fast floating point modes [-Wtautological-constant-compare] return isnan EIGEN_NOT_A_MACRO (x); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` And some tests are failing as well, usually with infinities involved. To list a few: ```bash # ... 1: [ FAILED ] IsInfTest.test_isinf_float 1: [ FAILED ] IsInfTest.test_isinf_double 1: [ FAILED ] IsInfTest.test_isinf_positive_float 1: [ FAILED ] IsInfTest.test_isinf_positive_double 1: [ FAILED ] IsInfTest.test_isinf_negative_float 1: [ FAILED ] IsInfTest.test_isinf_negative_double 1: [ FAILED ] IsNaNOpTest.IsNaNFloat 1: [ FAILED ] IsNaNOpTest.IsNaNDouble # ... 
``` This PR adds a quick global check for the IntelLLVM compiler, as in the way its name is reported by CMake and then, depending on the compiler driver, sets either MSVC-like or GCC-like switch to disable fast-maths. Probably a bit cleaner solution would be to use ```target_compile_options(${TARGET} PRIVATE MEOW)``` instead of a global-wide ```set(CMAKE_CXX_FLAGS MEOW)```, but then we'd be required to add it to all the individual targets and execution providers and this will lead to a lot of code duplication.	2023-08-02 12:50:35 -07:00
Tianlei Wu	76aff63f37	Update bert_perf_test to test inputs with different padding ratio (#16963 ) Add --average_sequence_length and --random_sequence_length so that we can test the performance of model on different padding ratio.	2023-08-02 10:28:39 -07:00
RandySheriffH	c392fdeb1b	RunAsync Python API (#16760 ) Implement python binding for RunAsync API. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-08-02 10:15:34 -07:00
Dmitri Smirnov	bd4d011142	[C#] Rename unreleased API, add utilities (#16806 ) ### Description 1. rename OrtValue.FillStringTensorElement to StringTensorSetElementAt . To the API user I think we're conceptually setting the string at an offset in the tensor, which is roughly equivalent to `List<string> list ... list[index] = "value"`. 2. While working on new inference examples, I noticed that I am still inclined to use `DenseTensor` for N-D indexing. Added `GetStrides()` and `GetIndex()` from strides for long dims, so the user can obtain strides and translate N-D indices into a flat index to operate directly on the native `OrtValue` buffers. Expose these functions to the user. 3. Make sure we generate docs for C# public static functions.	2023-08-02 10:06:42 -07:00
Chi Lo	f4faceab28	Ignore deprecated declarations warning for TRT EP build (#16948 ) In addition to `onnxruntime_test_all`, `onnxruntime_shared_lib_test` and `onnxruntime_customopregistration_test` should also add the "-Wno-deprecated-declarations" flag to ignore the compiler warning	2023-08-02 09:51:58 -07:00
satyajandhyala	f8d933df31	[JS/Web] Register JSEP contrib ops only once per process. (#16950 ) ### Description Register contrib ops only once. ### Motivation and Context Fix the earlier commit adding Gelu contrib op to the JSEP.	2023-08-02 00:27:11 -07:00
pengwa	b9d80131a7	Save optimized pre_grad graph once ready (#16816 ) ### Save optimized pre_grad graph once it's ready `graph_builder.build()` did two things for training: 1. optimize the forward graph, e.g. pre_grad graph optimization. 2. build the gradient graph. Originally, the pre_grad graph was saved only after `graph_builder.build()` completed. So if pre_grad graph optimization completed but the gradient graph build failed, we still could not get the pre_grad graph to investigate. With this PR, once the pre_grad graph is ready, we save it (if save_model is enabled) in the C++ backend.	2023-08-02 14:05:26 +08:00
Wanming Lin	ba49d64f67	[WebNN EP] Support LpPool, GlobalLpPool, and Log ops (#16954 ) Also reset the minimal supported opset to 1, because a minimal supported opset of 7 ignores all ops whose latest since-version is less than 7, e.g. GlobalLpPool, which only has two opset versions: 1 and 2.	2023-08-01 22:35:10 -07:00
Yulong Wang	4a2a248dd7	remove unused comments in mac CI yml file (#16964 ) ### Description remove unused comments in mac CI yml file	2023-08-01 20:52:12 -07:00
zesongw	5912837791	[WebNN EP] Fix bug when Pad has negative padding value. (#16878 ) A padding value in ONNX Pad can be negative, which indicates removing pixels. WebNN EP cannot support such an operation, so it needs to use slice to handle this case.	2023-08-01 19:41:02 -07:00
Yi Zhang	36c5b0dcdd	Fix onnxruntime_tvm (#16933 ) ### Description it works but it's ugly. ### Motivation and Context Fix tvm ci	2023-08-02 07:51:00 +08:00
Tianlei Wu	50bf310dea	[CUDA] RelativePositionBias supports input with padding removed (#16923 ) update RelativePositionBias to support input with padding removed. - [x] add bias transpose kernel - [x] add test - [x] update operator document	2023-08-01 16:39:09 -07:00
Yulong Wang	afac67bcc3	[build] fix the CI pipeline (#16962 ) ### Description There are currently multiple failures that are blocking the CI pipelines, so this PR contains all of the fixes needed to make sure it passes the CI. Otherwise a single fix will still fail the CI. Includes: #16960 #16958 Please help to make sure this PR gets merged once CI has passed. @snnn @carzh @guschmue Fixed: [AB#18118](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18118) --------- Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-08-01 16:22:45 -07:00
Tianlei Wu	1fbd1ed179	[CUDA] PackedMultiHeadAttention support Bias and separated Q, K and V inputs (#16913 ) ### Description Follow-up change for PackedMultiHeadAttention added in https://github.com/microsoft/onnxruntime/pull/16779: - [x] Add Bias input - [x] Add CUDA kernels to support separated query, key and values inputs. - [x] Update operator documents - [x] Add unit tests	2023-08-01 15:30:41 -07:00
Changming Sun	e412d93b00	Add lsb-release package to android custom build (#16944 ) ### Description Add lsb-release package to android custom build ### Motivation and Context To fix a build issue: /workspace/onnxruntime/tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/install_protobuf.sh: line 27: lsb_release: command not found	2023-08-01 11:27:29 -07:00
Changming Sun	1333f73a68	Add ONNX 1.14 test data (#16943 ) This PR is similar to #15256	2023-08-01 11:19:27 -07:00
Yulong Wang	969c95f73f	[js/common] a few fixes/revises to onnxruntime-common (#16853 ) ### Description - enable unit test for js/common in CI - add debug config in js/.vscode/launch.json - enable source map for js/common/test for debugging purposes; add source map files to ignore list - ignore js/common/test folder for npm packaging	2023-08-01 11:17:39 -07:00
Yi Zhang	c4e4b98fb2	replace one pool with onnxruntime-Win2022-GPU-T4 (#16953 ) ### Description replace one pool ### Motivation and Context onnxruntime-gpu-tensorrt8-winbuild-t4 would be deprecated	2023-08-01 21:02:56 +08:00
Yulong Wang	6046456bb6	build break: apply formatter fix (#16947 ) ### Description build break: apply formatter fix	2023-08-01 01:10:55 -07:00
Patrice Vignola	49512e558a	[DML EP] Add I/O binding and `If` operator (#16859 ) Being able to leverage I/O binding for DML and registering `If` for the DML EP allows us to avoid copying the past/present key/values back and forth between the CPU and the GPU after every token. This gives us a 25% performance increase for Dolly V2 with 128 tokens on an RTX 4090.	2023-07-31 19:45:59 -07:00
Artyom Stepanishchev	ba23e5b234	[JS/Common] Fix malformed result of Tensor.fromImage(ImageBitmap) (#16919 ) ### Description Set `canvas` dimensions to the `ImageBitmap` dimensions, thus fixing a malformed Tensor creation. ### Motivation and Context According to the [HTMLCanvasElement.drawImage() spec](https://html.spec.whatwg.org/multipage/canvas.html#drawing-images): > When the destination rectangle is outside the destination image (the output bitmap), the pixels that land outside the output bitmap are discarded, as if the destination was an infinite canvas whose rendering was clipped to the dimensions of the output bitmap. meaning that `ImageBitmap` pixels exceeding the canvas dimensions will be discarded. Since no canvas dimensions are set for `Tensor.fromImage(ImageBitmap)` if-case, the default 300x150px canvas dimensions are used leading to the creation of malformed Tensors where all the exceeding pixels are discarded and equal to `0, 0, 0, 0` during the subsequent `pixels2DContext.getImageData()` call.	2023-07-31 18:18:06 -07:00
Jiajia Qin	fa8487ea3a	[js/webgpu] Check profilingMode in each run (#16897 ) ### Description This PR moves checking profilingMode to each run instead of the initialization stage. In this way, users can start/stop profiling at any time. Otherwise, profiling only takes effect at the very beginning and can't be stopped.	2023-07-31 17:37:24 -07:00
kunal-vaishnavi	3c72f43f78	Extend saving models optimized by inference session (#16912 ) ### Description This PR adds support for saving model optimizations after loading a model that contains external data into an `InferenceSession`. ### Motivation and Context This PR is a follow-up to a [previous PR](https://github.com/microsoft/onnxruntime/pull/16716) for saving a model optimized by an `InferenceSession`.	2023-07-31 16:39:35 -07:00
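
Commit bd4d011142 above mentions `GetStrides()` and `GetIndex()` helpers that translate N-D indices into a flat index. The idea can be sketched roughly as follows, assuming a dense row-major (C-contiguous) layout. The Python function names below are hypothetical stand-ins for illustration, not the C# API itself.

```python
def get_strides(dims):
    """Row-major strides, in elements, for a dense tensor of shape `dims`.

    Assumption: the last dimension is contiguous (stride 1).
    """
    strides = [1] * len(dims)
    # Walk from the second-to-last dimension back to the first.
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return strides


def get_index(strides, indices):
    """Translate an N-D index into a flat offset using the given strides."""
    return sum(s * i for s, i in zip(strides, indices))


strides = get_strides([2, 3, 4])
flat = get_index(strides, [1, 2, 3])
```

For a tensor of shape `[2, 3, 4]`, the strides are `[12, 4, 1]`, so the element at `[1, 2, 3]` sits at flat offset `12 + 8 + 3 = 23`, the last element of the buffer.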

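The Valgrind check described in commit d6ce43db5e (find `definitely lost` paragraphs, then keep those mentioning `TensorrtExecutionProvider`) might look something like the sketch below. It is not the script from that PR. It assumes each Valgrind line carries a `==PID==` prefix and that paragraphs are separated by prefix-only lines; the function names are hypothetical.

```python
import re

# Assumption: Valgrind prefixes every log line with "==<pid>== ".
PREFIX = re.compile(r"^==\d+==\s?")


def split_paragraphs(log_text):
    """Split a Valgrind log into paragraphs separated by empty (prefix-only) lines."""
    paragraphs, current = [], []
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw).rstrip()
        if line:
            current.append(line)
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


def find_trt_leaks(log_text, keyword="TensorrtExecutionProvider"):
    """Return the 'definitely lost' paragraphs that mention the keyword."""
    return [
        p for p in split_paragraphs(log_text)
        if "definitely lost" in p and keyword in p
    ]


sample_log = "\n".join([
    "==42== 16 bytes in 1 blocks are definitely lost in loss record 1 of 2",
    "==42==    by 0x1: onnxruntime::TensorrtExecutionProvider::Compile()",
    "==42== ",
    "==42== 8 bytes in 1 blocks are definitely lost in loss record 2 of 2",
    "==42==    by 0x2: main (main.cpp:10)",
])
leaks = find_trt_leaks(sample_log)
```

A CI step could then write `leaks` to a detail log and fail the build when the list is non-empty, matching the flow the commit message describes.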
9297 commits