onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
Baiju Meswani	bb33285ec2	C# training api updates for on device training (#15720 )	2023-05-01 10:01:38 -07:00
shalvamist	c10a6a9d17	Tensor <--> image - Adding per channel compute for Norm mean & Bias (#14705 ) ### Description Enabled the use of per channel Bias and Mean normalization when converting an image <--> tensor. Added a few bug fixes and updates to the relevant E2E tests. --------- Co-authored-by: shalvamist <shalva.mist@microsoft.com>	2023-05-01 09:37:50 -07:00
Scott McKay	0770cf3699	Remove C# SessionOptions.RegisterCustomOpsUsingFunction. (#15754 ) Symbol visibility from DllImport is inconsistent across platforms resulting in the symbol not necessarily being visible to ORT native code that tries to look it up by name. Best solution is to use DllImport to load the library and to call the registration function directly. That requires the native SessionOptions handle and OrtApiBase struct. We could either make those public, or provide a helper where the user passes in a delegate from their DllImport. Can add when needed.	2023-05-01 09:14:21 -07:00
RandySheriffH	cdf4fc49fc	Implement lite custom op API (#15590 ) Implement a set of new APIs for lightweight custom ops registration, to save efforts on schema-composing. A few highlights: 1. Support build-time type inference; 2. Support function-as-op for "stateless" ops; 3. Support structure-as-op for "stateful" ops; 4. Support varied input/output forms such as span, scalar, and tensors, either optional or non-optional. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-05-01 08:45:26 -07:00
Chen Fu	0e9472d391	NHWC graph optimizer (#15724 ) ### Description Augment nhwc graph optimizer to accommodate fp16 operators. ### Motivation and Context With new fp16 conv operator added. This operator prefers NHWC data layout. We need to augment existing graph optimizers to better utilize the new operator.	2023-05-01 08:44:07 -07:00
Chunye Wang@AMD	d35850c142	[VitisAI]Update VitisAI EP to be compatible with VitisAI 3.5 (#15673 ) ### Description Originally VitisAI EP only works with old version of VitisAI release. ### Motivation and Context Update VitisAI EP so that it works together with the current VitisiAI 3.5 and further version of VitisAI. We try our best to make it forward compatible. --------- Co-authored-by: Wang Chunye <chunywan@xilinx.com> Co-authored-by: mingyue <mingyue@amd.com> Co-authored-by: mingyueliuh <131847423+mingyueliuh@users.noreply.github.com> Co-authored-by: liumingyue <mingyue@xilinx.com> Co-authored-by: moore-ch <129165652+moore-ch@users.noreply.github.com> Co-authored-by: shoucair <shoucai.ren@amd.com> Co-authored-by: zz002 <zhenze.wang@amd.com> Co-authored-by: BoarQing <yuz75@Pitt.edu> Co-authored-by: Yueqing Zhang <yueqingz@amd.com> Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>	2023-05-01 08:28:26 -07:00
Jeff Bloomfield	3df3a85114	Default kOrtSessionOptionsDisableQuantQDQ to 1 when the DML EP is registered (#15725 ) This addresses a performance regression in some INT8 models with the DirectML EP by defaulting OrtSessionOptionsDisableQuantQDQ to 1 when the EP is registered. This regression occured due to the introduction of the QDQ propagation transformer, which is based on this session option. That transformer maximizes the number of nodes which are executed as quantized by logically propagating quantize operators upstream and dequantize operators downstream. However, it does this simply by inserting QDQ pairs, with an expectation that something will recognize sequences of DQ->Op->Q. This logic and related L2 transformers are not currently enabled for the DirectML EP. This change also removes a noisy warning when the session option for memory pattern is overriden as the DirectML EP is registered.	2023-05-01 08:26:03 -07:00
Tianlei Wu	10dff4f665	only add type info from symbolic shape inference for fp16 conversion (#15617 ) ### Description Walkaround of https://github.com/microsoft/onnxruntime/issues/15521. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-04-30 23:22:11 -07:00
Chi Lo	6e652d0554	Support explicit TRT profiles from provider options (#15546 ) Previous behavior of TRT EP to set TRT optimization profiles for dynamic shape input is based on input tensor values. Users can't explicitly specify the profiles. This PR makes users capable of specifying min/max/opt profiles through newly added three provider options: `trt_profile_min_shapes`, `trt_profile_max_shapes` and `trt_profile_opt_shapes` with the format of "input1:dim1xdim2...,input2:dim3xdim4...". (Note: It's similar to --minShapes, --maxShapes and --optShapes of trtexec command-line [flags](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec-flags)) For example, if you are using onnxruntime_perf_test, you can try this: `./onnxruntime_perf_test -e tensorrt -r 1 -i "trt_profile_min_shapes\|imgs:1x3x384x288 trt_profile_max_shapes\|imgs:32x3x384x288 trt_profile_opt_shapes\|imgs:16x3x384x288" your_model_path` If the engine cache is enabled, you still need to provide these three explicit provider options in order to use this feature. ORT TRT will compare the min/max/opt profile shape with the ones saved in .profile file to decide whether to rebuild the engine. Constraints to use these provider options: (1) Need to specify min/max/opt profile shapes for all the dynamic shape input This feature is also requested by other users: https://github.com/microsoft/onnxruntime/issues/13851	2023-04-30 22:30:26 -07:00
Scott McKay	31e7d3d7d4	Disable TestRegisterCustomOpsWithFunction on Linux (#15747 ) ### Description <!-- Describe your changes. --> Disable new test that is failing on linux. Not required for this release. Will fix in the next week. Marshal.Prelink can be used on Windows to make the symbol available but Linux appears to work differently. Also need to update the pre-checkin tests so this is tested early as it's only failing in the E2E tests run in the packaging pipeline. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix packaging pipeline error.	2023-04-30 14:39:02 +10:00
Changming Sun	176161348e	Revert "make nuget workflow easy to debug. (#15693 )" (#15744 ) This reverts commit `53ff50d19a` because it make the nuget pipeline fail.	2023-04-29 19:05:01 -07:00
kunal-vaishnavi	7ae01cec15	Update wheel path to Whisper custom export script (#15739 ) ### Description This PR updates the documentation for using the Whisper custom export scripts via the wheel. ### Motivation and Context The path should say `onnxruntime.transformers.models.whisper.convert_to_onnx` instead of `onnxruntime.transformers.models.convert_to_onnx`.	2023-04-29 17:32:34 -07:00
sfatimar	4fbc08e3c2	VPUX config fix and dynamic_shape bug fixed. (#15737 ) Dynamic shapes was not working with serialized model so we are switching to compile model method ### Motivation and Context Dynamic shapes was not working with serialized model - If it fixes an open issue, please link to the issue here. --> Signed-off-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>	2023-04-29 15:48:34 -07:00
Adrian Lizarraga	d32c540b2d	[QNN EP] Support LRN operator (#15741 ) ### Description Adds support for the LRN operator to QNN EP. ### Motivation and Context Enables basic models like googlenet and alexnet to run entirely on QNN EP.	2023-04-29 13:23:42 -07:00
Changming Sun	65020d433e	Prefast fixes for CUDA EP (#15726 ) ### Description 1. Adjust cmake flags. Do not modify CMAKE_CXX_FLAGS globally. Only apply the flags to ORT code. 2. Fix some SDL warnings.	2023-04-29 12:43:12 -07:00
Jian Chen	ec2f038c6d	Update Nuget pipeline's Linux CUDA job to cuda 11.8 (#15516 ) ### Description Fixed AB#14497	2023-04-29 07:38:18 -07:00
Rachel Guo	c8bd34f975	[js/rn] Package dependency change to manage ort-extensions for react_native app (#15641 ) ### Description <!-- Describe your changes. --> js/react_native package dependency change to manage ort-extensions for react-native app. Enable optional inclusion of ort-ext aar/ ort-ext pods for react-native extensions apps when specifiy `ortExtensionsEnabled` in user's package.json file ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-04-29 00:07:12 -07:00
Yuhong Guo	41dcf0d32e	Expose build information in dynamic lib (#15643 ) ### Description <!-- Describe your changes. --> 1. Add Build Info API to onnx. 2. Fix compile error while building onnxruntime_benchmark in MacOs. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> 1. When Onnxruntime lib is serving online, we need a way to detect how this lib is built. This PR helps the developer to get the build information using `strings` such as git branch, git commit id, build type and cmake cxx flags, which is showed as follows. ![image](https://user-images.githubusercontent.com/19584326/233794371-b2f95a2c-27fb-4709-a6dd-bf4bb12b0b5b.png) ![image](https://user-images.githubusercontent.com/19584326/233794360-f96f5d2e-332c-405c-83f1-370ccc2b86f8.png) If the build env has no git, there will be no git related infor: ![image](https://user-images.githubusercontent.com/19584326/234558596-298c1b01-9a90-41bf-9372-7259a8f8e5be.png) 3. Fix the following compile error while building benchmark in MacOs. ![image](https://user-images.githubusercontent.com/19584326/233793571-c261ac1f-47b2-434d-a293-7e9edc6c8a66.png) --------- Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>	2023-04-28 21:57:31 -07:00
Adrian Lizarraga	191deb4235	[QNN EP] Nuget package (#15711 ) Adds pipeline for QNN NuGet package (x64 and arm64).	2023-04-28 19:33:14 -07:00
Rachel Guo	6a6091a519	[rn] Add support for loading model from buffer on iOS (#13802 ) ### Description <!-- Describe your changes. --> -Add support for loading model from buffer on iOS -Update OnnxruntimeModuleTest to use updated loadModelFromBuffer Based on #12676 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Issue: #12500 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-04-28 17:34:26 -07:00
kunal-vaishnavi	fe1ddd7b61	Fix bug when adding Whisper to wheel (#15708 ) ### Description This PR adds `onnxruntime.transformers.models.whisper` to the wheel. ### Usage There is a README.md document that shows sample commands. The following command will show how to use the custom Whisper export script in more detail. ``` $ python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx --help ``` ### Motivation and Context This fixes an issue with adding the Whisper custom export scripts to the wheel. The Whisper folder now appears in the wheel. ![Screenshot 2023-04-26 143705](https://user-images.githubusercontent.com/115581922/234708587-6d1b7d34-71a9-4f9f-a491-657ceb25afcb.jpg)	2023-04-28 16:03:55 -07:00
liqun Fu	2802c547a1	update OnnxMl.cs (#15702 )	2023-04-28 11:20:29 -07:00
pengwa	29d13cea42	Cumulative update on optimizers and tests (on-device training) (#15499 )	2023-04-28 09:55:39 -07:00
Adam Pocock	8a1a40ac63	[Java] CheckpointState AddProperty & GetProperty support (#15730 )	2023-04-28 09:52:52 -07:00
Chen Fu	be08b47e7b	Refine cast optimizer for safety (#15658 ) ### Description Cast optimizer may convert a fp16 node to fp32. This used to be safe as all fp16 kernels has fp32 implementation. As this assumption is no longer true, we need to check the validity of the operation ### Motivation and Context Main work here is to introduce an API to check whether a kernel is registered. Currently we don't have a way to do that without an operator node. This needs to be augmented. We need to query whether a kernel is registered by its property only, so that we can judge whether it is safe to construct a node long before we actually do so.	2023-04-28 09:32:54 -07:00
Edward Chen	c415bc725f	Add 'name' key to xcodebuild 'destination' option. (#15690 )	2023-04-28 08:52:18 -07:00
Jian Chen	c401cf4b51	Fix issue there 9573-quantizing-distilbert-models-after-optimizing-wi… (#15659 ) …th-ort-leads-to-invalid-node-input-names ### Description Fix issue where Quantizing DistilBERT models after optimizing with ORT leads to invalid node input names ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-04-28 08:45:20 -07:00
Scott McKay	7e6331d5c7	Add ability to register custom ops from ORT extensions nuget package (#15696 ) ### Description <!-- Describe your changes. --> Add infrastructure so it's easy for a user to add the ORT extensions nuget package and register the custom ops for C# apps. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Need to be able to use extensions on mobile platforms with Xamarin/MAUI	2023-04-28 18:53:02 +10:00
Yulong Wang	94c9a31f83	[js/webgpu] fix download failure due to buffer change (#15723 ) ### Description fix download failure due to buffer change. WebAssembly buffer may change (growth triggered by memory allocation) during an async function call.	2023-04-28 00:16:31 -07:00
Linnea May	2c3697be00	User/linneamay/reduce 18 (#15701 ) ### Description <!-- Describe your changes. --> Add registration for DML reduce functions in opset 18. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Linnea May <linneamay@microsoft.com>	2023-04-27 20:32:11 -07:00
Changming Sun	5b826b1bc3	Update cmake version in Linux build (#15707 ) ### Description All our Windows build pipelines already uses cmake 3.26 except one pipeline: QNN ARM64. This PR does the same for Linux build pipelines. ### Motivation and Context This change is related to #15704 .	2023-04-27 20:02:33 -07:00
Edward Chen	9db24f8fec	Update kernel registration validation to allow kernel registrations to appear in arbitrary order. (#15705 ) The validation script will now sort them by increasing opset order before processing them.	2023-04-27 18:49:31 -07:00
kunal-vaishnavi	39d6d7050d	Change EmbedLayerNormalization mask index output to optional (#15526 ) ### Description This PR changes an EmbedLayerNormalization node's mask index output to be an optional output if a mask input is not provided. ### Motivation and Context The documentation for EmbedLayerNormalization states ``` The last input mask is optional. If mask is provided, mask index (that is position of first 0 in mask, or number of words) will be calculated. ``` However, if the mask input is not provided, the mask index output is still calculated and required.	2023-04-27 16:32:42 -07:00
Yulong Wang	d471432e10	[js/webgpu] fix attribute cache key for 2 operators (#15710 ) ### Description fix attribute cache key for LeakyRelu and ThresholdedRelu	2023-04-27 15:04:33 -07:00
Yulong Wang	c0116af619	[js/webgpu] operator Exp (#15713 ) ### Description operator Exp	2023-04-27 15:04:09 -07:00
Tang, Cheng	627f5c9767	support allgather on different axis (#15610 ) ### Description Extend the AllGather op to support perform allgather on different axis. provide the implementation in nccl kernels. ### Motivation and Context We hit some scenario in distributed inference that we need to support gather on non-first axis. --------- Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>	2023-04-27 14:47:28 -07:00
Sheil Kumar	5bde1e8e37	Add Bluestein Z-Chirp Algorithm to DirectML DFT implementation (#15686 ) Add Bluestein Z-Chirp Algorithm to DirectML DFT implementation This will enable STFT and DFT on signals which have non-powers of 2.	2023-04-27 14:03:40 -07:00
Adrian Lizarraga	be5c582e65	[QNN EP] Update to QNN SDK 2.9.0 (#15709 ) ### Description - Update to QNN SDK 2.9.0 for QNN pipelines - Temporarily disable warnings as errors for QNN Windows x64 pipeline - Note that this pipeline did not previously run to completion. It also currently does not run for pull requests. ### Motivation and Context Need to update and test the latest available version of the QNN SDK.	2023-04-27 13:44:09 -07:00
RandySheriffH	9773e76c44	Single-schema-multi-kernel (#15184 ) The PR is to allow custom op of different input types to have same op name in a graph. The idea to go over all ops of same name and merge their input/output types into a type-inference function. With the enhancement, custom op node inside a graph can have same op-type given that the input/output types are different. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-04-27 13:39:59 -07:00
Changming Sun	d3d232b047	Rename onnxruntime-Linux-CPU-2019 machine pool (#15691 ) Rename onnxruntime-Linux-CPU-2019 machine pool to "onnxruntime-Ubuntu2004-AMD-CPU". The old one has an internal error and stuck there. I cannot make any change to it. It has been like this for more than 1 week. So I created a new pool with the same setting except the name is different. Also, move some android pipelines to "onnxruntime-Linux-CPU-For-Android-CI" which uses a standard image from https://github.com/actions/runner-images	2023-04-27 12:46:18 -07:00
Chi Lo	a957a872d3	Patch fix for the newly added TRT EP provider options (#15687 ) We missed some code change with recently added TRT EP provider options	2023-04-27 10:36:01 -07:00
Changming Sun	d3e8d7a70d	Better support for cmake 3.26 and Windows ARM64 (#15704 ) ### Description In #8953 I introduced a change in our onnxruntime_mlas.cmake that it enables "ASM_MASM" cmake language for all Windows build. ```cmake enable_language(ASM_MASM) ``` Before the change, it is only enabled when onnxruntime_target_platform equals to x64. However, cmake 3.26 added a new language: ASM_MARMASM. According to cmake's manual, ASM_MASM is for Microsoft Assembler ASM_MARMASM is for Microsoft ARM Assembler. This one is new in cmake 3.26. We should choose the right one according to ${onnxruntime_target_platform}.	2023-04-27 10:25:45 -07:00
yf711	2e1f92a986	Fix EP Perf pipeline (#15507 ) ### Description * Update TensorRT 8.6 lib dependencies in dockerfile of TRT EP Perf pipeline * Avoid using `--allow_running_as_root` and build ORT with non-root user ### Motivation and Context To fix the build issue on EP perf pipeline Fixed [AB#14615]	2023-04-27 10:09:14 -07:00
Yi Zhang	8cda1ffa28	Fix error in post-merge pipeline (#15717 ) ### Description Get the right drive letter on Windows ### Motivation and Context Build Directory might be in drive C	2023-04-27 10:05:15 -07:00
cloudhan	a952419674	[ROCm] Fix FusedConv to stop caching fusion args (#15671 ) The follow code shows ROCm EP FusedConv produce incorrect results: ```py import numpy as np import onnx import onnxruntime as ort X = onnx.helper.make_tensor_value_info("input", onnx.TensorProto.FLOAT, [1, 64, 55, 55]) a = onnx.helper.make_tensor_value_info("tmp", onnx.TensorProto.FLOAT, [1, 64, 55, 55]) Y = onnx.helper.make_tensor_value_info("output", onnx.TensorProto.FLOAT, [1, 64, 55, 55]) weight_data = np.random.random([64, 64, 1, 1]).astype(np.float32) weight1 = onnx.helper.make_tensor("weight1", onnx.TensorProto.FLOAT, [64, 64, 1, 1], weight_data) bias_data = np.random.random(64).astype(np.float32) bias1 = onnx.helper.make_tensor("bias1", onnx.TensorProto.FLOAT, [64], bias_data) weight_data = np.random.random([64, 64, 1, 1]).astype(np.float32) # <------ comment out weight2 = onnx.helper.make_tensor("weight2", onnx.TensorProto.FLOAT, [64, 64, 1, 1], weight_data) bias_data = np.random.random(64).astype(np.float32) # <------ comment out bias2 = onnx.helper.make_tensor("bias2", onnx.TensorProto.FLOAT, [64], bias_data) node1 = onnx.helper.make_node("FusedConv", inputs=[X.name, weight1.name, bias1.name], outputs=[a.name], domain="com.microsoft", kernel_shape = [1,1], activation="Relu") node2 = onnx.helper.make_node("FusedConv", inputs=[a.name, weight2.name, bias2.name], outputs=[Y.name], domain="com.microsoft", kernel_shape = [1,1], activation="Relu") graph = onnx.helper.make_graph([node1, node2], "Graph", [X], [Y], initializer=[weight1, bias1, weight2, bias2]) model = onnx.helper.make_model(graph, producer_name="tmp", opset_imports=[ onnx.helper.make_opsetid('com.microsoft', 1), onnx.helper.make_opsetid('ai.onnx.ml', 1), onnx.helper.make_opsetid('', 14), ]) sess0 = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"]) sess1 = ort.InferenceSession(model.SerializeToString(), providers=["ROCMExecutionProvider"]) ref = sess0.run(["output"], {"input" : 0.05 * np.ones([1, 64, 55, 55], dtype=np.float32)})[0] our = sess1.run(["output"], {"input" : 0.05 * np.ones([1, 64, 55, 55], dtype=np.float32)})[0] print(ref - our) ``` The root cause is that fusion args is cached together with fusion plan. It seems that internal to MIOpen, the `miopenOperatorArgs_t` handle is copied directly to execution engine, instread of the content of a `miopenOperatorArgs_t`. If two ORT `OpKernel`s have the same conv kernel spatial dimension and strides, etc, we then get the same hash for the fusion plan, thus we also get the same fusion args handle. Then the second node of `FusedConv` may modify the fusion args on the fly when it is still pending execution for first node of `FusedConv` internal to MIOpen. This PR moves the fusion args out of fusion plan cache to avoid the problem.	2023-04-27 23:20:25 +08:00
pengwa	2efb75bfe9	Fold shape related operation (#14936 ) ### Fold shape related operation at best efforts. This is a follow up for PR https://github.com/microsoft/onnxruntime/pull/12561. Create a specialized shape_optimzer to constant fold shape related operation. ShapeOptimizer at the best efforts to constant fold the dim values that exists from shape inferencing. This is helpful to simplify the graph, which on the other hand, help other graph transformers to do more. Transformer that traverses the graph top-down and performs shape optimizations. Try the best effort to constant fold the shape related to Shape node outputs: 1. Shape generates 1D tensor [12, 128, 512] (all dimensions have concrete dim value), which can be constant folded to an initializer including 1D tensor values [12, 128, 512]. (Some logic of ConstantFolding also does the same thing.) 2. Shape generate 1D tensor [batch_size, 128, 512] -> Slice(start=1,end=3), we can constant fold the Shape->Slice to an initializer including 1D tensor values [128, 512]. 3. Shape generate 1D tensor [batch_size, 128, 512] -> Gather(axes=[0], index=[2]), we can constant fold the Shape->Gather to an initializer including 1D tensor values [512]. 4. Shape 15 takes input of shape [batch_size, 128, 512], slicing from 1 to 2(exclusive), we can constant fold the Shape15(start=1,end=2) to an initializer including 1D tensor values [128]. This would help clean up the graph, combined with ConstantFolding, the graph would be much more simplified. ### Motivation and Context One direct motivation to have this is, we have a model subgraph like this: ![image](https://user-images.githubusercontent.com/10530022/223390243-47b13922-4340-4999-9637-f52a33f69a2d.png) The subgraph in the green rectangle is trying to get the value `30522`, with the changes in this PR, the subgraph will be constant folded. Plus ConstantFolding optimizer will further to optimize out the subsquent `Squeeze`/`Unsqueeze`/`ConcatTraining`, then we will have a clean very clean Reshape node, with its shape input be an constant `[-1, 20522]`. Having this simplified graph, our other compute optimizer can help further optimize the graph by re-ordering gather/reshape nodes.	2023-04-27 18:59:28 +08:00
Yi Zhang	53ff50d19a	make nuget workflow easy to debug. (#15693 ) ### Description Add parameters to make some stages could use other run's intermediate output. ### Motivation and Context nuget workflow has 38 stages of 4 layers. We had to run the whole workflow from begining to test one stage. It could make life easier to run only one stage for testing. like ![image](https://user-images.githubusercontent.com/16190118/234453721-e6e9a4bd-5e0b-4101-a18e-d5cf60615c9f.png) ### N.B. In this PR, Nuget_Test_Linux_CPU, Nuget_Test_LinuxGPU and Jar_Packaging_GPU are enabled as the first step. So I can start to move tests from Linux host to container	2023-04-27 14:54:14 +08:00
Ted Themistokleous	926ae7d786	Add updated skipped test for multiheadattention Packed KV & QKV (#15587 ) Adds skip for MIGraphX EP builds for Packed KV and QKV tests in Multi Head attention. As it is not supported and causes CI failures when building and testing EPs --------- Co-authored-by: Ted Themistokleous <tthemist@amd.com>	2023-04-27 10:31:53 +08:00
Changming Sun	e63bb5acef	Fix a memory leak in QGemm (#15703 ) ### Description The BufferUniquePtrs in the old code doesn't have knowledge of the allocator where the allocated memory was from, so it cannot free the memory.	2023-04-26 18:48:00 -07:00
Rachel Guo	740d553c42	[rn] Reland support loading model from buffer for Android (#14514 ) ### Description <!-- Describe your changes. --> Reland previous reverted changes for loading model from buffer - Android ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> #13903 --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-04-26 16:53:17 -07:00

1 2 3 4 5 ...

8703 commits