onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-04 23:59:56 +00:00

Author	SHA1	Message	Date
Vrajang Parikh	67f4a4fd16	Objective-C binding for ORT training (#16127 ) ### Description Implement Objective-C binding for `ORTCheckPoint`. Additionally, - Modify `onnxruntime_objectivec.cmake` to only include training header and sources when training flag is enabled - Enable objective-c binding for `orttraining-mac-ci-pipeline` ### Motivation and Context This PR is part of implementing Objective-C bindings for training API. It implements objective-c binding for ORTCheckPoint class. The objective-C API closely resembles the C++ API. Note: The test for saving checkpoint is skipped as it requires use of training session. It will be added when the objective-c binding for `ORTTrainingSession` is added.	2023-06-07 14:01:30 -07:00
Adam Pocock	bca49d62a0	Fixing CoreML in Java (#16231 ) ### Description The name of the flag we set when compiling the JNI binding to enable the CoreML EP changed at some point in the past. This PR fixes it by updating the flag in the JNI. I also added a quick smoke test for the CoreML provider to make sure it doesn't crash and can be enabled. ### Motivation and Context All the EPs should work as expected in Java. Fixes #16230.	2023-06-07 12:24:57 -07:00
Edward Chen	1261d0b8ba	Fix some build issues on MacOS with Xcode 14.3. (#15878 ) - Fix flatbuffers flatc warning, unused-but-set-variable. - Address `-Wshorten-64-to-32` warnings (fix in our code, allow in dependencies' code). - Update CI builds to use Xcode 14.3. - Update minimum iOS version to 12.0. - Update Mac hosted agents to MacOS 13 where possible.	2023-06-07 12:07:11 -07:00
Adrian Lizarraga	b8858f034e	[QNN EP] Increase conv test tolerance for Windows x64 (#16241 ) ### Description Increases allowable accuracy tolerance for specific Conv op test on QNN CPU backend (Windows x64). ### Motivation and Context Allow QNN NuGet pipeline to run. PR https://github.com/microsoft/onnxruntime/pull/15975 introduced a failing test on Windows x64.	2023-06-07 10:52:56 -07:00
Wanming Lin	a8c2f24ae0	[WebNN EP] Merge support for segment anything into main branch (#16208 ) We implemented a number of new ops and data types to support running the segment anything model on the Chromium WebNN DML backend (POC) in a forked branch: https://github.com/honry/onnxruntime/tree/stable-diffusion This PR migrates the changes from the forked branch to the main branch, including: - 22 new ops - New tensor data types: bool, int32, uint32, uint64, int64, float16 (As JavaScript hasn't shipped Float16Array, we use Uint16Array as a workaround) - Handle empty input tensors and duplicated outputs - Fixed some nits	2023-06-07 09:56:37 -07:00
cloudhan	05bea0d3c3	Add new cases for non biased mha tests (#16097 ) 1. Add new test data GetSelfAttentionData_WithPastAndPresent_HeadSize8_NoMask_NoRelPosBias, also added non-biased data 2. Add new test data GetCrossAttentionData_DiffSequenceLengths_HeadSize8, also added non-biased data 3. Disabled the new tests for CUDA EP because qkv is not correctly transposed.	2023-06-07 15:04:27 +08:00
cloudhan	3373160863	[CPU EP] Refactor CPU mha (#16247 ) Followup of #16075	2023-06-07 14:41:14 +08:00
cloudhan	f013965831	Add non qkv biased version mha unittests (#16075 ) 1. Add nonbiased mha unittests data 2. Update CPU and CUDA EP to accept inputs with `qkv_bias`	2023-06-06 09:18:41 +08:00
Adam Pocock	3c2a11f2f1	[java] Allow the creation of boolean tensors from ByteBuffer (#15556 ) ### Description The tensor creation code now allows the creation of boolean tensors from non-direct `ByteBuffer` instances. It previously only allowed them from arrays and direct `ByteBuffer` instances and this fixes that inconsistency. The boolean tensor test has been updated to cover all three cases. ### Motivation and Context Fixes #15509.	2023-06-05 09:58:50 -07:00
PeixuanZuo	a95f8ae53c	[ROCm] Update ROCm/MIGraphX CI pipeline (#16215 ) MIGraphX CI - Change docker container user name to `onnxruntimedev` ROCm CI - Build docker image every job instead of using prebuild image. - Every job create a container with only one GPU with command `docker run -it --device=/dev/kfd --device=/dev/dri/renderDxxx` - Remove tests that are unstable or use outdated interfaces. - Enable training ortmodule test.	2023-06-05 10:28:10 +08:00
ashari4	18c97381cd	Detect fake tensor mode if it has already been created. (#16220 ) ### Description Detect fake tensor mode if it has already been created. Follows this example in pytorch: `86c7652503/torch/_inductor/compile_fx.py (L280)` ### Motivation and Context As of torch nightly 6/2/23, when trying to run a torch dynamo graph on the ORT backend, we observe ``` E torch._dynamo.exc.BackendCompilerFailed: backend='compiler_fn' raised: E AssertionError: Mixing fake modes NYI E E E You can suppress this exception and fall back to eager by setting: E import torch._dynamo E torch._dynamo.config.suppress_errors = True ``` The issue is that `ort_backend.py` creates a new fake tensor mode even though one has already been created by torch.	2023-06-02 23:17:49 -07:00
Somdev Sangwan	2e66bc8669	prevent object destruction compile error (#16134 ) ### Description The proposed fix is to store the result of AsBlockSparse() in a variable to ensure the object isn't destroyed until the end of the current scope. ### Motivation and Context "own_buffer_tensor" is a temporary object that is destroyed at the end of the expression and causes a compile error.	2023-06-02 11:19:53 -07:00
Changming Sun	6b5b79872b	Avoid taking dependency on dl.fedoraproject.org (#16202 ) ### Description 1. Avoid taking dependency on dl.fedoraproject.org The website is not very stable. Our build pipelines often fail to fetch packages from there. 2. Update manylinux to the latest version	2023-06-02 07:41:46 -07:00
Changming Sun	7686193c40	Fix DNNL build (#16201 )	2023-06-02 09:46:03 +08:00
Yulong Wang	319a0dc6aa	[js/doc] allow deduplicate opset version (#16182 ) ### Description allow deduplicate opset version in generated document webgpu-operators.md	2023-06-01 17:28:08 -07:00
Dale Phurrough	6e1c3003ff	DML EP and MLAS buffer allocator - increase alignment to 64 bytes for AVX-512 processing (#15141 ) Fixes #13119 top concerns by * using `onnxruntime::AllocatorDefaultAlloc` instead of `malloc` * set `MLAS_DEFAULT_PREFERRED_BUFFER_ALIGNMENT=64` which cascades that value to several members and functions not directly related to MLAS. ### Motivation and Context * Fixes #13119 top concerns. Otherwise, alignment is to 16 bytes circa 1990s 👴 * Does not yet enable flexible alignment. Instead fixed at 64 (64 x 8 bits=512 bits) for modern NN hardware like AVX-512	2023-06-01 16:32:55 -07:00
Adrian Lizarraga	5a4c3b7937	[QNN EP] Support Equal, Less, LessOrEqual, Greater, GreaterOrEqual operators on HTP backend (#16171 ) ### Description - Updates QDQ transformer to handle QDQ logical operators (Equal, Less, LessOrEqual, Greater, GreaterOrEqual). - Expects 2 DQ inputs and no Qs in the output, which is boolean. ### Motivation and Context This is needed to enable QDQ models with logical comparison operators to run on QNN EP.	2023-06-01 15:07:15 -07:00
Hector Li	f72dc198c6	[QNN EP]Add UT for cached Qnn context binary (#16184 ) ### Description 1. Add UT for cached Qnn context binary 2. Minor change: set model path to "" if model_path is not available since the model could be loaded from buffer instead of Onnx file ### Motivation and Context support more scenario --------- Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>	2023-06-01 14:28:46 -07:00
Changming Sun	5bfa1183d1	Add a Memory Profiling build job in post merge pipeline (#16172 ) ### Description 1. Add a Memory Profiling build job 2. Remove no absl build job since the feature will be removed 3. Simplify post-merge-jobs.yml by unifying the pool names ### Motivation and Context To catch build errors in #16124	2023-06-01 13:00:44 -07:00
Alexander Visheratin	e6c6184fee	[JS/WebGPU] Unsqueeze operator implementation (#16138 ) ### Description This PR adds an implementation of the `Unsqueeze` operator to WebGPU JSEP. The implementation follows the [operator schema](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Unsqueeze). To implement the `Unsqueeze` operator in the same fashion as the `Squeeze`, I added the `ComputeOutputShape()` method to the `UnsqueezeBase` class and made some slight modifications. Please let me know if it is a bad idea and if I should move this method to the JS implementation. I also uncommented test case lines in the `suite-test-list.jsonc` file for both Squeeze and Unsqueeze operators following @hariharans29's [comment](https://github.com/microsoft/onnxruntime/pull/16024#issuecomment-1565113633). ### How was it tested 1. I created a model with only one operator: ```Python import onnx.helper node = onnx.helper.make_node( "Unsqueeze", inputs=["T", "axes"], outputs=["y"], ) graph = onnx.helper.make_graph([node], "test", [onnx.helper.make_tensor_value_info("T", 1, [3, 4, 5]), onnx.helper.make_tensor_value_info("axes", 7, [2])], [onnx.helper.make_tensor_value_info("y", 1, [3, 1, 4, 5, 1])]) onnx.save(onnx.helper.make_model(graph), "unsqueeze.onnx") ``` 2. I compiled the runtime using @fs-eire's [instructions](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce). 3. I ran the test models in the browser using this minimal setup: ```HTML <html> <script src=".\dist\ort.webgpu.min.js"></script> <script> async function run() { const session = await ort.InferenceSession.create('unsqueeze.onnx', {executionProviders: ['webgpu']}); console.log(session); const input = new ort.Tensor('float32', new Float32Array(60), [3, 4, 5]); const dim = new ort.Tensor('int64', [1n, 4n], [2]); const output = await session.run({ "T": input, "axes": dim }); console.log(output); } run(); </script> </html> ``` ### Motivation and Context Improve operator coverage for WebGPU JSEP.	2023-06-01 12:23:02 -07:00
Changming Sun	5b08176314	Exclude shufflenet from DNNL's model tests (#16126 )	2023-06-01 10:56:24 -07:00
FFFrog	d185bf444d	[CANN] Add IOBinding Support For CANN EP (#15802 ) ### Description Add IOBinding Support For CANN EP ### Motivation and Context Now, Users can use IOBinding feature to speed up the inference on CANN.	2023-06-01 03:13:38 -07:00
FFFrog	8c85d990c2	add third-party pipeline status to README.md (#16155 ) Refer to this [issue](https://github.com/microsoft/onnxruntime/issues/16154), please.	2023-05-31 22:14:39 -07:00
PeixuanZuo	1b518c6836	[ROCm] add early stop to tunable profile progress (#15716 ) For TunableOp, some instances may perform very badly and take a long time during the profiling process. Add the `tunable_op_max_tuning_duration_ms` parameter to limit the maximum tuning time.	2023-06-01 10:18:25 +08:00
pengwa	65b316a138	Consolidate ORTModule logging (#16078 ) ### Consolidate ORTModule logging This PR makes several improvements to ORTModule logging: - All ORTModule logging uses the logger that is initialized in `ortmodule.py`. - All export logs are managed the same way, e.g. `_logger.suppress_os_stream_output(log_level=self._debug_options.logging.log_level)` controls whether export-related logs are suppressed. If any warnings or errors are suppressed, `self._warning_log_detected_during_export` is set to True, and when the ORTModule feature matrix is logged, users are also told that logs were suppressed. - Downgrade some warnings. Some warnings have existed for years and appear for many models by default with no action we can actually take, so they are downgraded to make user logging cleaner. - PyTorch export requires custom export function signatures to be updated; otherwise `_symbolic_context_handler` complains with warnings. The custom export function adaptation is therefore updated for PyTorch >= 1.13. - Add an ORTModule feature matrix summary. This is supposed to be the only place users see our logs by default (unless they use INFO or VERBOSE). Feature ON/OFF states are shown clearly in case users want to try features that are OFF. This log only shows up on rank 0 (if there are multiple ranks), so that users see useful and clean output from ORTModule by default. The output looks like this: ![image](https://github.com/microsoft/onnxruntime/assets/10530022/9c6653ac-50fa-4b2d-ba7f-4d5ce44b25b2) ![image](https://github.com/microsoft/onnxruntime/assets/10530022/10dff5a9-2d46-4646-a4b4-2c515566376e) - `reinitialize_ortmodule` in util.py is only used by ortmodule.py, so it is moved into ortmodule.py. utils.py then takes no dependency on `orttraining/orttraining/python/training/ortmodule/_custom_op_symbolic_registry.py`, so `_custom_op_symbolic_registry.py` can call functions defined in utils.py without a circular import.	2023-06-01 10:09:12 +08:00
Changming Sun	d19e5c0abb	Fix a misaligned error in CUDA GEMM (#16130 ) ### Description Fix an issue that FusedMatMulOpTest.FloatTypeTransposeBatch fails to run on GPUs with TF32 support. Authored-by: Tianlei Wu <tlwu@microsoft.com>	2023-05-31 18:10:17 -07:00
Yulong Wang	f67f7c0f0b	[js/web] disable node fallback in webpack (#16166 ) ### Description disable webpack's polyfill for node's `global`, `__filename` and `__dirname` in web build. This will confuse emscripten generated environment detection. see https://webpack.js.org/configuration/node/	2023-05-31 16:47:00 -07:00
cao lei	13d6ac74de	fix memory profile build (#16177 ) ### Description This PR fixes the build break when onnxruntime_ENABLE_MEMORY_PROFILE is on. ### Motivation and Context It fixes this issue: https://github.com/microsoft/onnxruntime/issues/16124 Co-authored-by: Lei Cao <leca@microsoft.com>	2023-05-31 16:08:14 -07:00
dependabot[bot]	a55637a103	Bump socket.io-parser from 4.2.2 to 4.2.3 in /onnxruntime/test/wasm (#16067 )	2023-05-31 21:55:00 +00:00
Aung T Naing	3cca32beec	[QNN EP] expand convolution test coverage. (#15975 ) ### Description Convolution with padding and convolution with large inputs and outputs. ### Motivation and Context This is mainly to check the CPU vs QNN EP output mismatch for models. ./onnxruntime_test_all --gtest_filter=.TestQDQConvU8U8S32 Failed tests with mismatch. [ FAILED ] 2 tests, listed below: [ FAILED ] QnnHTPBackendTests.TestQDQConvU8U8S32_large_input1_padding_bias_initializer [ FAILED ] QnnHTPBackendTests.TestQDQConvU8U8S32_large_input2_bias_initializer ./onnxruntime_test_all --gtest_filter=.TestCPUConvf32_ [ FAILED ] QnnCPUBackendTests.TestCPUConvf32_large_input1_pad_bias_initializer	2023-05-31 10:12:35 -07:00
Yi Zhang	e0199cfbd9	extend mac packaging timeout limit (#16173 ) ### Motivation and Context MacOS_py_wheels often fails due to timeout.	2023-05-31 18:31:28 +08:00
Yulong Wang	ba5f5e3198	[js] allow manually release inference session (#16169 ) ### Description This change adds a new instance function (method) to type `InferenceSession` to allow users to manually release an inference session instance. #16131 depends on this change to work correctly.	2023-05-31 00:31:38 -07:00
PeixuanZuo	3dc5179a36	[ROCm] Change ortmodule test (#15884 ) Change ortmodule test because rocm ep behaves differently than cuda. The warning from torch `The first argument to symbolic functions is deprecated in 1.13 and will be removed in the future. Please annotate treat the first argument (g) as GraphContext and use context information from the object instead.` appears twice on ROCm EP. On ROCm EP, the log is shown as below: ``` The first argument to symbolic functions is deprecated in 1.13 and will be removed in the future. Please annotate treat the first argument (g) as GraphContext and use context information from the object instead. The first argument to symbolic functions is deprecated in 1.13 and will be removed in the future. Please annotate treat the first argument (g) as GraphContext and use context information from the object instead. User Module's attribute name _torch_module collides with ORTModule's attribute name. User Module's attribute may not be returned when trying to retrieve the attribute through ORTModule. User Module's attribute name load_state_dict collides with ORTModule's attribute name. User Module's method may not be called upon invocation through ORTModule. ```	2023-05-31 15:14:10 +08:00
dependabot[bot]	03216e2313	Bump socket.io-parser from 4.2.2 to 4.2.3 in /js/web (#16068 )	2023-05-31 02:15:23 +00:00
Baiju Meswani	7edc4b105d	Copy missing training header files to the package archive (#16119 )	2023-05-30 16:45:40 -07:00
RandySheriffH	2802614846	Condition the usage of variadic callback by version (#16112 ) For older versions of custom ops, optional and variadic callbacks are null pointers, hence adding conditions to scope the usage. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-05-30 16:43:22 -07:00
Yulong Wang	ebe715a817	[js/webgpu] fix RangeError in buffer download (#16165 ) ### Description This is a follow-up fix for #15990, which should resolve the RangeError issue.	2023-05-30 15:04:50 -07:00
Sunghoon	bf05d4ec26	Fix nightly ort CI pipeline (#16162 ) This PR changes the [nightly ort CI pipeline](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=198) to pick up the latest nightly ACPT image, which was changed from torch 2.0.0.dev to torch 2.1.0.dev.	2023-05-30 14:00:34 -07:00
Xavier Dupré	e726151b5c	Introduce float 8 types (#14731 ) ### Description The PR implements FloatE4M3FN, FloatE5M2, FloatE4M3FNUZ, FloatE5M2FNUZ as described in PR https://github.com/onnx/onnx/pull/4805. It uses the CUDA API to cast float/half to float8 if CUDA>=11.8, and a custom implementation if CUDA<11.8. * It implements Cast, QuantizeLinear, DequantizeLinear for all types on CPU, and only for types FloatE4M3FN, FloatE5M2 on CUDA. * It extends the supported types for control flow operators, Shape, Reshape, Identity, If, Loop, Scan * It implements Equal(19). * Cast, QuantizeLinear, DequantizeLinear operators now support a parameter `saturate`, only valid for float 8 types. It is true by default. In that case, any value out of range is converted into the maximum float 8 value. If false, the value becomes infinite. * QuantizeLinear, DequantizeLinear now support multiple scales on CUDA (and ROCm by extension), scale = 1D tensor with one scale per channel ### Motivation and Context Supports latest onnx version. Fixes [AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395) --------- Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Randy Shuai <rashuai@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>	2023-05-30 13:25:58 -07:00
神楽坂帕琪	abd94b65b7	eigen.cmake use url info from deps.txt (#16129 ) ### Description `eigen.cmake` use url info provided by deps.txt instead of using raw url.	2023-05-30 11:07:20 -07:00
mindest	90e8c8daaf	profile_explorer: add op-kernel correlation info (#15946 ) ### Description * Add aggregated op-kernel correlation information in profiler explorer when running inference session. * Add filtering feature so that we can focus on model runs of interest (excluding warmup steps, etc.)	2023-05-30 23:25:43 +08:00
Yi Zhang	31fc25d2c2	[Fix] Check if CUDA is downloaded in AGENT_TEMPDIRECTORY (#16142 ) ### Description supplement of #15915 ### Motivation and Context fix nuget pipeline exception in the stage of Final_Jar_Testing_Windows_GPU ``` JUnit Jupiter:ProviderOptionsTest:testCUDAOptions() MethodSource [className = 'ai.onnxruntime.providers.ProviderOptionsTest', methodName = 'testCUDAOptions', methodParameterTypes = ''] => ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1131 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\cloudtest\AppData\Local\Temp\onnxruntime-java17193857285260738736\onnxruntime_providers_cuda.dll" ``` ### Verification https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=313476&view=results	2023-05-30 13:14:08 +08:00
Jian Chen	6abdc3a87b	Fix static analysis bug (#16114 )	2023-05-28 10:58:07 -07:00
Yi Zhang	73584f9360	More fixes on nuget pipeline (#16091 ) ### Description 1. Parameters couldn't be compared as strings; changed them to booleans. 2. Windows_CI_GPU_DML_DEV_arm64 on the pool onnxruntime-Win-CPU-2022 failed to pass the prefast step; changed the pool to aiinfra-dml-winbuild. 3. Skipped test_zfnet512, which fails in Nuget_Test_Win_Training_CPU. Todo: Only Final_Jar_Testing_Windows_GPU fails now. https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=313042&view=logs&s=d66543d5-16de-5a48-6ecb-a36e21ff8d4d&j=d9489789-5e39-5a05-13ab-9aaf7b4d386f	2023-05-27 08:59:12 +08:00
Alexander Visheratin	415c26e46e	[JS/WebGPU] Squeeze operator implementation (#16024 ) ### Description This PR adds an implementation of the `Squeeze` operator to WebGPU JSEP. The implementation follows the [operator schema](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Squeeze) and allows one or two inputs. ### How was it tested 1. I created two models. Without `axes`: ```Python import onnx.helper node = onnx.helper.make_node( "Squeeze", inputs=["T"], outputs=["y"], ) graph = onnx.helper.make_graph([node], "test", [onnx.helper.make_tensor_value_info("T", 1, [3, 1, 4, 5])], [onnx.helper.make_tensor_value_info("y", 1, [3, 4, 5])]) onnx.save(onnx.helper.make_model(graph), "squeeze.onnx") ``` And with `axes`: ```Python import onnx.helper node = onnx.helper.make_node( "Squeeze", inputs=["T", "axes"], outputs=["y"], ) graph = onnx.helper.make_graph([node], "test", [onnx.helper.make_tensor_value_info("T", 1, [3, 1, 4, 5]), onnx.helper.make_tensor_value_info("axes", 7, [1])], [onnx.helper.make_tensor_value_info("y", 1, [3, 4, 5])]) onnx.save(onnx.helper.make_model(graph), "squeeze-dim.onnx") ``` 2. I compiled the runtime using @fs-eire's [instructions](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce). 3. I ran the test models in the browser using this minimal setup: ```HTML <html> <script src=".\dist\ort.webgpu.min.js"></script> <script> async function run() { const session = await ort.InferenceSession.create('squeeze-dim.onnx', {executionProviders: ['webgpu']}); console.log(session); const input = new ort.Tensor('float32', new Float32Array(60), [3, 1, 4, 5]); const dim = new ort.Tensor('int64', [-3n], [1]); const output = await session.run({ "T": input, "axes": dim }); console.log(output); } run(); </script> </html> ``` ### Motivation and Context Improve operator coverage for WebGPU JSEP.	2023-05-26 15:53:05 -07:00
Scott McKay	5e41d1600a	Add new QNN CIs to azp run tool (#16109 ) ### Description Add 2 new QNN CIs to tools/python/run_CIs_for_external_pr.py ### Motivation and Context Update tool so it runs all current CIs	2023-05-27 08:46:16 +10:00
Dmitri Smirnov	9939092e71	[CPP API]Fix constness in C++API (#16103 ) ### Description `CreateMap` and `CreateSequence` should be able to take in const data.	2023-05-26 14:09:00 -07:00
Jeff Bloomfield	54fdb640fe	Address performance regression with duplicate initializers across DML partitions (#16087 ) This addresses a DML performance regression introduced by the constant sharing pass. The constant sharing pass identifies small initializer tensors which contain identical values and merges them. This could have the effect of causing DML to treat those tensors as non-constant and skip certain optimization. To prevent this, there is now an element count threshold below which the DML EP will enable this optimization, even though it results in duplicate work uploading and pre-processing the common tensor at multiple operators.	2023-05-26 13:37:34 -07:00
Changming Sun	a5410515ad	Fix: Some fields in OrtCUDAProviderOptionsV2 struct are not initialized (#16113 ) ### Description The file include/onnxruntime/core/providers/cuda/cuda_provider_options.h is a C++ file. It is not for C. Before this commit, this header file is already not compatible with C compilers. Because it has: ``` onnxruntime::ArenaExtendStrategy arena_extend_strategy; ``` And this file is intended to be internal only. It is an internal header file. It should not be included in onnxruntime_c_api.h and should not be used with the public C APIs. User can only get the instance of OrtCUDAProviderOptionsV2 via CreateCUDAProviderOptions. In such a way we can add new members to this struct without breaking binary compatibility. Since it is an internal header, we can safely use C++ grammar there.	2023-05-26 11:34:22 -07:00
cao lei	4ab7d410ae	ExecutionProvider API refactor - Detach allocator from EP by creating local cpu allocator instead (#16084 ) ### Description ExecutionProvider API refactor - Detach allocator from EP by creating local cpu allocator instead ### Motivation and Context This PR is a refactor that creates a local CPU allocator instead of getting the allocator from ExecutionProvider. The final goal is to fully detach allocators from ExecutionProvider and put them at session level, indexed by OrtDevice.	2023-05-26 04:54:42 -07:00

8929 commits