onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-07 00:13:17 +00:00

Author	SHA1	Message	Date
Zhang Lei	bd39c8f35e	Fix causal flash attention related kernel run (#14299 )	2023-01-13 21:40:22 -08:00
Yufeng Li	8824f812e0	optimize topk for greedysearch (#14271 ) Optimize the top-1 computation in greedy search. For a vocabulary size of 50k on A100: batch size 1, from 220 us to 10.4 us; batch size 4, from 230 us to 11.5 us. For example, generating 50 tokens saves about 50 * 0.2 ms = 10 ms.	2023-01-13 15:03:49 -08:00
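The commit above (8824f812e0) does not show the kernel. The following NumPy sketch only illustrates why a top-1 request can skip the general top-k machinery: a single argmax pass returns the same token as top-k with k = 1. The functions `top1_tokens` and `topk_tokens` are hypothetical. The last lines check the quoted saving for batch size 1.

```python
import numpy as np


def top1_tokens(logits: np.ndarray) -> np.ndarray:
    """Greedy top-1 per batch row: one argmax pass over the vocabulary."""
    return np.argmax(logits, axis=-1)


def topk_tokens(logits: np.ndarray, k: int) -> np.ndarray:
    """General top-k: partition out the k largest, then sort them descending."""
    idx = np.argpartition(-logits, k - 1, axis=-1)[:, :k]
    vals = np.take_along_axis(logits, idx, axis=-1)
    order = np.argsort(-vals, axis=-1)
    return np.take_along_axis(idx, order, axis=-1)


rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 50_000)).astype(np.float32)  # batch 4, ~50k vocab
assert np.array_equal(top1_tokens(logits), topk_tokens(logits, 1)[:, 0])

# Back-of-envelope saving for batch size 1 (220 us -> 10.4 us per step).
saved_ms_per_step = (220 - 10.4) / 1000
print(f"{50 * saved_ms_per_step:.1f} ms saved over 50 tokens")
```

The exact figure is about 10.5 ms, consistent with the rounded 10 ms in the commit message.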
JiCheng	4f309f05ca	[CPU] Resize of Opset 18 (#13890 ) ### Description Implement Resize for opset 18. This PR depends on https://github.com/microsoft/onnxruntime/pull/13765.	2023-01-14 08:57:23 +10:00
Adam Louly	f0555eb437	Improved test cases by using parameters (#14246 ) ### Description Complete missing parts of some test cases for the Python bindings. ### Motivation and Context Some test cases, such as test_training_module_checkpoint and test_optimizer step, were incomplete. Previously we had no access to the parameters, so we could not check whether they change after the optimizer step or whether the parameters saved in a checkpoint remain the same. The vector of parameters is now accessible through the exposed get_contiguous_parameters() method, so these tests can be completed.	2023-01-13 12:54:23 -08:00
Yilun Huang	6ac7c894bf	[bug fixed] use different node names for different dedicated QDQ pairs (#14258 ) ### Description Bug fix: quantized models cannot be loaded into ort.InferenceSession when DedicatedQDQPair is True in the extra_options of QDQQuantizer. Solution: add a postfix to the node names of dedicated QDQ pairs, as is already done for their tensor names. ### Motivation and Context Loading a quantized model fails when `DedicatedQDQPair` is set to `True` in `extra_options`, with the following error: ``` Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from mobilenetv2-opset10-quantized-dedicated.onnx failed:This is an invalid model. Error: two nodes with same node name (489_QuantizeLinear). ``` Visualizing the quantized model in Netron shows that both dedicated QDQ pairs for tensor 489 have the same node name, "489_QuantizeLinear". I found that QDQQuantizer adds no unique postfix to the node names of dedicated QDQ pairs. ![image](https://user-images.githubusercontent.com/12782861/212010296-f8cc05ce-c20e-4189-a692-aaf4bbac3a29.png) I therefore add a postfix to the node names of QDQ pairs, as is done for tensor names. After this change, the quantized model loads successfully and the dedicated QDQ pairs have distinct node names.👌🏻 ![image](https://user-images.githubusercontent.com/12782861/212010594-78eba39d-eab6-4d77-9ecd-b55f5303bcf4.png)	2023-01-13 11:24:54 -08:00
Scott McKay	114f18357a	Add Java and Objective-C bindings for RegisterCustomOpsUsingFunction. (#14256 ) ### Description Add bindings for Android and iOS. ### Motivation and Context Enable mobile apps to link against the ort-extensions library and register its custom ops with ORT.	2023-01-13 09:04:26 -08:00
Xavier Dupré	a909cc0e1b	Improves parallelization by trees for TreeEnsemble (#13835 ) ### Description When the number of trees is >= 100 and the batch size is >= 2000, parallelizing by trees becomes slower than parallelizing by rows. However, parallelizing by trees over smaller chunks of data is still faster than parallelizing by rows. The following script was used to measure performance with different thresholds: [plot_gexternal_lightgbm_reg_per.zip](https://github.com/microsoft/onnxruntime/files/10149092/plot_gexternal_lightgbm_reg_per.zip). The graph below was produced by the script that follows it. * //N means parallelization by rows * //T means parallelization by trees * //T-128 means parallelization by trees on every batch of 128 rows * //T-1024 means parallelization by trees on every batch of 1024 rows. The graphs show that parallelizing by trees beats parallelizing by rows on small batches only. Splitting the input tensor into chunks of 128 rows and parallelizing by trees within each chunk is better still. This change implements that optimization. It applies the same chunking even when there is only one thread, and it ensures only one thread is used when the user asks for one. 
![image](https://user-images.githubusercontent.com/22452781/205505093-6d04c684-80a3-40b4-b2a5-ca1bcee5f7d2.png)
```python
import pandas
import matplotlib.pyplot as plt

# One CSV of timings per parallelization strategy.
filenames = [
    ("//N", r"plot_gexternal_lightgbm_reg_per_N.csv"),
    ("//T", "plot_gexternal_lightgbm_reg_per_T.csv"),
    ("//T-128", "plot_gexternal_lightgbm_reg_per_128.csv"),
    ("//T-1024", "plot_gexternal_lightgbm_reg_per_1024.csv"),
]

dfs = []
for name, filename in filenames:
    df = pandas.read_csv(filename)
    # Prefix every batch column with the strategy name.
    for c in df.columns:
        if "batch" in c:
            df[f"-{name}-{c}"] = df[c]
    dfs.append(df)

# Merge the prefixed columns of all strategies into one frame indexed by N.
df = dfs[0][["N"]].copy()
for _df in dfs:
    for c in _df.columns:
        if c[0] == "-":
            df[c] = _df[c].copy()

# One panel per number of trees.
fig, ax = plt.subplots(1, 3, figsize=(14, 6))
Ts = [50, 500, 2000]
ga = df.set_index("N")
for i, nt in enumerate(Ts):
    cs = [c for c in ga.columns if c.endswith(f"-{nt}")]
    ga[cs].plot(ax=ax[i], title=f"Trees={nt}", logy=True, logx=True)
```
Below is the performance gain of the single-threaded implementation from looping over the data in the inner loop. ![image](https://user-images.githubusercontent.com/22452781/207379886-10540b53-d66f-4103-937a-15074154c166.png) ### Motivation and Context Performance. Signed-off-by: xadupre <xadupre@microsoft.com>	2023-01-13 10:03:10 +01:00
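Commit a909cc0e1b changes the loop order, not the math. The Python sketch below shows the //T-128 ordering: the rows are split into chunks, and within each chunk every tree scores the whole chunk as a separate task. It checks that the result matches the //N-style row loop. The trees here are hypothetical depth-1 stumps, not ORT's TreeEnsemble structures. Under CPython's GIL these threads will not actually speed up pure-Python trees, so the point is only the chunking and summation structure.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def predict_by_rows(trees, X):
    # "//N"-style ordering: each row is scored by every tree in turn.
    return np.array([sum(tree(row) for tree in trees) for row in X])


def predict_by_trees_chunked(trees, X, chunk=128, max_workers=4):
    # "//T-128"-style ordering: split the rows into chunks; within a chunk,
    # each tree is a separate task and the per-tree outputs are summed.
    out = np.zeros(len(X))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(X), chunk):
            block = X[start:start + chunk]
            partials = list(pool.map(lambda tree, b=block: np.array([tree(r) for r in b]), trees))
            out[start:start + chunk] = np.sum(partials, axis=0)
    return out


# Hypothetical stand-ins for decision trees: depth-1 stumps returning +/-1.
trees = [lambda row, j=j: 1.0 if row[j % 8] > 0.1 * j else -1.0 for j in range(20)]
X = np.random.default_rng(0).standard_normal((1000, 8))
assert np.array_equal(predict_by_rows(trees, X), predict_by_trees_chunked(trees, X))
```

Because each chunk is small, the per-tree work stays cache-friendly while still exposing one task per tree. This matches the trade-off the graphs above describe.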
PeixuanZuo	d3a09cf77f	[ROCm] use pytest-xdist for fast pytest (#14261 ) ### Description Use pytest-xdist to distribute tests across multiple CPUs to speed up test execution, and use pytest-rerunfailures to rerun failed tests in case pytest-xdist crashes. `pytest -n 16` reduces pytest time from 80 minutes to 20 minutes. ### Motivation and Context The kernel explorer pytest run in ROCm CI currently takes nearly 1 hour 20 minutes, and it will take longer as more TunableOps are added.	2023-01-13 16:57:50 +08:00
Scott McKay	9bd9206928	Attempt to fix flaky Windows GPU CI Pipeline 'cuda' stage. (#14281 ) ### Description Change the tolerance for tests involving MNIST and CUDA to try to fix flaky CI tests. Errors from CI:
ModelTests/ModelTest.Run/cuda__models_zoo_opset8_MNIST_model expected 4.0755 (40826a83), got 4.06948 (40823938), diff: 0.00601721, tol=0.0050755 idx=4. 2 of 10 differ
ModelTests/ModelTest.Run/cuda__models_zoo_opset7_MNIST_model expected 7.89851 (40fcc09e), got 7.88879 (40fc70f8), diff: 0.00972271, tol=0.00889851 idx=4. 4 of 10 differ
ModelTests/ModelTest.Run/cuda__models_zoo_opset12_MNIST12_mnist12 expected -5.50068 (c0b00595), got -5.49023 (c0afaff0), diff: 0.0104547, tol=0.00650068 idx=1. 1 of 10 differ
Use an rtol of 1e-2 if CUDA is enabled, and use the same for OpenVINO for simplicity.
```
>>> import numpy as np
>>> expected = np.array([4.0755, 7.89851, -5.50068], dtype=np.float32)
>>> actual = np.array([4.06948, 7.88879, -5.49023], dtype=np.float32)
>>> np.isclose(expected, actual, rtol=1e-2, atol=1e-3)
array([ True,  True,  True])
```
Whitespace changes are from clang-format. ### Motivation and Context CI fails semi-frequently, causing unnecessary re-runs.	2023-01-13 18:09:49 +10:00
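The tol= values in the CI errors of commit 9bd9206928 fit a per-element tolerance of atol + rtol * |expected| with atol = 1e-3 and rtol = 1e-3. For example, 0.001 + 0.001 * 4.0755 = 0.0050755. This form is inferred from the logged numbers, not read from the test source. The sketch below recomputes the old and new tolerances for the three failing values.

```python
import numpy as np

expected = np.array([4.0755, 7.89851, -5.50068])
actual = np.array([4.06948, 7.88879, -5.49023])


def tolerance(expected, rtol, atol=1e-3):
    # Assumed form, inferred from the CI log: atol + rtol * |expected|.
    return atol + rtol * np.abs(expected)


diff = np.abs(expected - actual)
old_ok = diff <= tolerance(expected, rtol=1e-3)  # reproduces the logged tol= values
new_ok = diff <= tolerance(expected, rtol=1e-2)  # the relaxed rtol used with CUDA
print(tolerance(expected, rtol=1e-3), old_ok, new_ok)
```

With rtol = 1e-3 all three values fail. With rtol = 1e-2 the tolerances grow roughly tenfold (about 0.042, 0.080 and 0.056), comfortably covering the observed differences of about 0.006 to 0.010.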
Yufeng Li	16e39807e0	presence_mask should be sampling only (#14275 )	2023-01-12 22:09:17 -08:00
Ashwini Khade	cc7799835e	Enable a single build with optimized inference and on device training (#14241 ) ### Description Currently, prepacking code is not compiled when training is enabled. Our partners want a single ORT build that can do both optimized inference and on-device training. This PR enables prepacking code in a training build and controls whether it is used through the existing session option kOrtSessionOptionsConfigDisablePrepacking. For inference scenarios, prepacking is on by default, and this PR does not change that. For training scenarios, prepacking is disabled by default, and an error is thrown if the user explicitly enables it. ### Motivation and Context Enable both optimized inference and on-device training in a single build. For on-device training, use the flag --enable_training_apis.	2023-01-12 21:36:43 -08:00
Vincent Wang	fb3c1221e4	Fix Prefast Warning (#14250 ) Fix two Prefast warnings related to constexpr.	2023-01-13 10:16:35 +08:00
Scott McKay	ea12b674c0	Disable the failing opset 18 model tests that are breaking the packaging pipeline (#14259 ) ### Description Skip tests for opset 18 models whose kernels we haven't implemented yet. Slice was checked in today, so those failures should go away. Resize: #13890 (all Resize failures are fixed by this PR, as confirmed in the output [here](https://dev.azure.com/aiinfra/530acbc4-21bc-487d-8cd8-348ff451d2ff/_apis/build/builds/264725/logs/729)) Col2Im: #12311 ScatterND and ScatterElements: #14224 Pad (should also fix CenterCropPad failures): #14219 Bitwise ops: #14197 Optional: unknown whether we intend to support this in 1.14. Not sure about SoftPlus, which fails with `Could not find an implementation for Exp(1)`. ORT supports Exp from opset 6 onward, and it seems incorrect for a test model created for opset 18 to use such an old version of Exp. I would have expected it to use the latest, Exp(13). @liqunfu is this something that requires a fix to the ONNX model? ### Motivation and Context Fix pipeline	2023-01-13 09:55:52 +10:00
Ye Wang	c9a53c9255	Some changes to Sampling Op (#14218 ) ### Description 1. Add an optional input to pass in a seed. 2. Add two unit tests: one for top_p=0.5 and another for top_p=0.01 (which produces the greedy search result, in convert_generation.py). 3. Fix a bug in the CPU kernel. Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-01-12 14:15:26 -08:00
Hariharan Seshadri	3898b22a1a	Fix some prefast warnings (#14247 )	2023-01-12 11:15:23 -08:00
Numfor Tiapo	dee36f8ade	DML EP Register ScatterND-16 (#14240 ) This PR registers ScatterND-16 with the DML EP. A CPU fallback is used when the reduction attribute is set, because DML does not support it yet. Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>	2023-01-12 10:39:25 -08:00
Yufeng Li	8f7eb75c3e	fix greedysearch token out of range bug (#14242 ) Bug: the last sentence generates a token ID beyond the vocabulary size. Cause: the total element count should be computed with the padded vocabulary size.	2023-01-12 09:06:05 -08:00
shalvamist	5c16e0befb	[web] utility functions for tensor<->image conversion in ORT web (#13603 ) ### Description Add data processing capabilities to ORT Web. This PR focuses on converting raw data to and from tensors. ### Motivation and Context Enable app-building use cases to use ORT in a more natural way. Currently, users must process the data and call Tensor constructors themselves. These utility functions provide a direct path to generating ORT tensors. Co-authored-by: shalvamist <shalva.mist@microsoft.com>	2023-01-12 09:05:18 -08:00
Patrice Vignola	99a4036c80	[DML EP] Add FusedMatMul (#14196 ) ### Description Add FusedMatMul ### Motivation and Context - Add the FusedMatMul fusion for DML - Fix the FusedMatMul logic and tests when transposed batches are involved	2023-01-12 02:17:04 -08:00
cloudhan	712f781702	Make CK an optional dependency and only build with CK if ROCm >= 5.3 (#14232 ) Recently, CK dropped ROCm 5.2 support, which is causing packaging pipeline failures. This PR works around it.	2023-01-12 17:09:40 +08:00
Scott McKay	b9ecd428c1	Add ability to register custom ops by specifying a function name (#14177 ) ### Description Use dlsym/GetProcAddress to look up a custom ops registration function by name and call it. This works better on mobile platforms, where the app links against the custom ops library and there isn't necessarily a filesystem from which a library path can be loaded. The alternative is to wire up passing in the address of the function, but that has multiple complications that differ by platform. ### Motivation and Context Enable using the ort and ort-ext packages on mobile platforms. Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-01-12 15:11:34 +10:00
guyang3532	f5b4b0f77d	Add support for 'axes' attr of unsqueeze in opset 13 and add ut (#14071 ) Since opset 13, the 'axes' attribute of Unsqueeze has become an input. Add support for it and add unit tests.	2023-01-12 10:45:15 +08:00
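Commit f5b4b0f77d changes where `axes` comes from (an input instead of an attribute), but the semantics applied to it are the same. The NumPy sketch below spells those semantics out: negative axes count from the end of the output rank, order does not matter, and duplicates are rejected. `unsqueeze` is a hypothetical helper, not ORT's kernel.

```python
import numpy as np


def unsqueeze(data, axes):
    """Insert size-1 dims at `axes`; negative axes are relative to the OUTPUT rank."""
    out_rank = data.ndim + len(axes)
    norm = sorted(a + out_rank if a < 0 else a for a in axes)
    if len(set(norm)) != len(norm) or any(not 0 <= a < out_rank for a in norm):
        raise ValueError(f"invalid axes {axes} for output rank {out_rank}")
    shape = list(data.shape)
    for a in norm:  # ascending order keeps earlier insert positions valid
        shape.insert(a, 1)
    return data.reshape(shape)


x = np.zeros((3, 4, 5))
print(unsqueeze(x, [0, 4]).shape)  # (1, 3, 4, 5, 1)
print(unsqueeze(x, [-1]).shape)    # (3, 4, 5, 1)
```

For cross-checking, NumPy's `np.expand_dims` also accepts a tuple of axes and produces the same shapes for these inputs.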
sfatimar	7654cd50e8	Openvino ep 2022.3 v4.3 (#14210 ) ### Description Changes to incorporate OpenVINO EP 2022.3 ### Motivation and Context This change is required to incorporate OpenVINO EP 2022.3. Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: Aravind <aravindx.gunda@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: flexci <mohsinmx>	2023-01-11 16:31:26 -08:00
Nat Kershaw (MSFT)	26a6b40554	Delete add-issues-to-project (#14147 ) It is not currently possible to get the appropriate org level permissions for this action. It should soon be possible to do this automatically within [GitHub](https://github.com/microsoft/github-operations/issues/59#issuecomment-1275424404).	2023-01-11 14:33:37 -08:00
Scott McKay	dd2df460b3	Split(18) (#14015 ) ### Description Opset 18 Split changes. Adds the ability to specify num_outputs, which also allows uneven splitting. https://github.com/onnx/onnx/releases/tag/v1.13.0 ### Motivation and Context Support ONNX opset 18.	2023-01-12 08:14:10 +10:00
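Commit dd2df460b3 adds `num_outputs` to Split. The ONNX Split-18 description says that with `num_outputs` the tensor is split into equal-sized parts, and the last chunk is smaller when the split is uneven. The NumPy sketch below computes chunk sizes under that reading, using ceil(dim / num_outputs) for every chunk except the last. `split_sizes` and `split` are hypothetical helpers. Rejecting shapes that cannot fill the leading chunks is a choice made only in this sketch.

```python
import math

import numpy as np


def split_sizes(dim, num_outputs):
    """Chunk lengths when Split-18 is given num_outputs instead of explicit sizes."""
    chunk = math.ceil(dim / num_outputs)
    last = dim - chunk * (num_outputs - 1)  # the remainder goes to the last output
    if last < 0:
        # This sketch rejects shapes that cannot fill the leading chunks.
        raise ValueError(f"cannot split {dim} elements into {num_outputs} outputs")
    return [chunk] * (num_outputs - 1) + [last]


def split(x, num_outputs, axis=0):
    sizes = split_sizes(x.shape[axis], num_outputs)
    # np.split takes cut points: the running totals, excluding the final one.
    return np.split(x, np.cumsum(sizes)[:-1], axis=axis)


print(split_sizes(6, 3))  # [2, 2, 2]
print(split_sizes(7, 3))  # [3, 3, 1]
print([part.tolist() for part in split(np.arange(7), 3)])  # [[0, 1, 2], [3, 4, 5], [6]]
```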
Tianlei Wu	012b34dc4e	Add --use_multi_head_attention in transformers fusion (#14198 ) Add an option --use_multi_head_attention to fuse the model with the MultiHeadAttention operator instead of the Attention operator, for testing purposes. Note that MultiHeadAttention can be used for both self-attention and cross-attention, while the Attention operator is used for self-attention only. The Attention operator takes packed Q/K/V weights for the input projection, whereas MultiHeadAttention excludes that input-projection MatMul.	2023-01-11 13:20:05 -08:00
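Commit 012b34dc4e contrasts packed and unpacked input projections. The NumPy sketch below shows why they are interchangeable for self-attention. One MatMul against packed [Wq | Wk | Wv] weights, followed by a split, gives the same Q, K and V as three separate projections done outside the op. The shapes and random weights are illustrative only. For cross-attention, K and V would be projected from a different input, which a single packed self-attention MatMul cannot express.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq, hidden = 2, 4, 8
x = rng.standard_normal((batch, seq, hidden))
w_q, w_k, w_v = (rng.standard_normal((hidden, hidden)) for _ in range(3))

# Attention-style: one MatMul against packed [Wq | Wk | Wv], then split the result.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=1)  # (hidden, 3 * hidden)
q1, k1, v1 = np.split(x @ w_qkv, 3, axis=-1)

# MultiHeadAttention-style: the projections happen outside the op,
# so Q, K and V arrive as separate inputs.
q2, k2, v2 = x @ w_q, x @ w_k, x @ w_v
assert all(np.allclose(a, b) for a, b in [(q1, q2), (k1, k2), (v1, v2)])
```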
RandySheriffH	83ad562826	Rename CloudEP to AzureEP (#14175 ) Rename CloudEP to AzureEP. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-01-11 12:25:04 -08:00
Tianlei Wu	3b79b8eb1d	fix reshape fusion error in numpy 1.24 (#14231 ) Fix https://github.com/microsoft/onnxruntime/issues/14017. Before: shape_value = np.asarray([0, 0, np.array([4]), np.array([8])], dtype=np.int64) raises an error in numpy 1.24. After: shape_value = np.asarray([0, 0, 4, 8], dtype=np.int64) works in numpy 1.24. Update the test environment to use numpy 1.24.	2023-01-11 10:37:41 -08:00
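NumPy 1.24 turned the long-deprecated creation of arrays from ragged nested sequences into an error. That is why mixing plain ints with 1-element arrays, as in the "Before" line of commit 3b79b8eb1d, stopped working. The commit message shows the fix as passing plain integers. As a more general sketch, a hypothetical helper `to_shape_value` can normalize every element to a Python int before building the array. `.item()` also rejects any element that is not exactly one value.

```python
import numpy as np


def to_shape_value(dims):
    """Build an int64 shape vector from a mix of ints and 1-element arrays."""
    # np.asarray(d).item() works for Python ints, NumPy scalars and size-1 arrays,
    # and raises ValueError for anything holding more than one value.
    return np.asarray([int(np.asarray(d).item()) for d in dims], dtype=np.int64)


mixed = [0, 0, np.array([4]), np.array([8])]
print(to_shape_value(mixed))  # [0 0 4 8]
```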
Patrice Vignola	52fc1ff21e	[DML EP] Fix FusedMatMul crash when batch > 1 (#14225 ) ### Description Fix FusedMatMul crash when batch > 1 ### Motivation and Context FusedMatMul calls `SetStrides` on its input tensors but doesn't update the tensorSizeInBytes value. Calling `SetStrides` is very error-prone because it puts the tensor in an invalid state, and the caller needs to manually adjust it after the call. To avoid this situation in the future, we now update the size of the tensor in the `SetStrides` call itself.	2023-01-11 10:17:34 -08:00
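Commit 52fc1ff21e makes `SetStrides` also update the tensor's byte size. The Python sketch below models that idea. The size formula is modeled on the DMLCalcBufferTensorSize helper shown in the DirectML documentation: the index of the last reachable element plus one, times the element size, rounded up to a multiple of 4 bytes. The DML EP's actual C++ code may differ. `TensorDesc` and `set_strides` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List


def buffer_size_in_bytes(sizes, strides, element_size):
    """Bytes spanned by a strided tensor, rounded up to a multiple of 4."""
    last_index = sum((size - 1) * stride for size, stride in zip(sizes, strides))
    return ((last_index + 1) * element_size + 3) & ~3


def packed_strides(sizes):
    """Row-major strides for a densely packed tensor."""
    strides, step = [], 1
    for size in reversed(sizes):
        strides.append(step)
        step *= size
    return strides[::-1]


@dataclass
class TensorDesc:
    sizes: List[int]
    element_size: int
    strides: List[int] = field(default_factory=list)
    size_in_bytes: int = 0

    def __post_init__(self):
        self.set_strides(self.strides or packed_strides(self.sizes))

    def set_strides(self, strides):
        # Strides and byte size are updated together, so callers cannot
        # leave the descriptor in a half-updated (invalid) state.
        self.strides = list(strides)
        self.size_in_bytes = buffer_size_in_bytes(self.sizes, self.strides, self.element_size)


desc = TensorDesc(sizes=[2, 3], element_size=4)  # packed float32: 24 bytes
desc.set_strides([0, 1])                         # broadcast rows: only 3 elements touched
print(desc.size_in_bytes)                        # 12
```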
pengwa	a4180d79c5	Multi-tensor SGDOptimizer (on device training) (#14083 ) Implement SGDOptimizerV2 taking sequence of weights and gradients as inputs. For CPU EP and CUDA EP only. Added tests.	2023-01-11 10:15:53 -08:00
RandySheriffH	ecd5ce0b33	Use json format to save and load partition config (#14169 ) Use JSON format to save and load the partition config. Previously it was CSV, which caused issues between Windows and POSIX because of their different line endings. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-01-11 10:03:14 -08:00
Yufeng Li	7a9a6bcebd	Improve TopP sampling (#14192 ) ### Description Improve TopP sampling's filter kernel with cub::scan. It reduces TopP sampling latency from 3.67 to 0.92 for batch size 8 and vocabulary size 51k.	2023-01-11 08:40:17 -08:00
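Commit 7a9a6bcebd speeds up the TopP filter with cub::scan, that is, a prefix sum over the sorted probabilities. The NumPy sketch below shows the filter logic that such a scan supports on the CPU. It uses a common convention in which a token survives if the probability mass ranked above it is still below top_p, so the most likely token is always kept. `top_p_filter` is hypothetical, and the ORT kernel's exact tie and boundary handling may differ. With a very small top_p only the argmax survives, which is consistent with the top_p=0.01 test producing the greedy search result in c9a53c9255.

```python
import numpy as np


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose total mass reaches top_p."""
    order = np.argsort(-probs)            # token ids, most likely first
    sorted_probs = probs[order]
    cum = np.cumsum(sorted_probs)         # inclusive prefix sum (the "scan")
    # A token survives if the mass ranked above it is still below top_p.
    keep_sorted = (cum - sorted_probs) < top_p
    keep = np.zeros(probs.shape, dtype=bool)
    keep[order] = keep_sorted             # scatter back to vocabulary order
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()      # renormalize before sampling


probs = np.array([0.05, 0.5, 0.15, 0.3])
print(top_p_filter(probs, 0.7))   # keeps token ids 1 and 3
print(top_p_filter(probs, 0.01))  # only the argmax survives, i.e. greedy
```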
Ashwini Khade	d92c663f28	Create dedicated build for training api (#14136 ) ### Description Enable creating a dedicated build for on-device training. With this PR, a lean binary for on-device training can be built using the flag --enable_training_apis. This binary includes only the essentials, such as training ops and optimizers, and NOT features such as ATen fallback, strided tensors or gradient builders. It also removes all deprecated components, such as training::TrainingSession and OrtTrainer. ### Motivation and Context This enables our partners to create a lean binary for on-device training.	2023-01-10 20:58:04 -08:00
dependabot[bot]	3a39736a2c	Bump json5 from 2.2.0 to 2.2.3 in /js/web (#14110 )	2023-01-11 02:27:42 +00:00
Ye Wang	342a5bf2b7	Improve rpb cuda kernel (#14195 ) ### Description Average latency (ms) of the float16 relative position bias CUDA kernel on V100:

| Kernel \ Seq_Len | 16 | 32 | 64 | 128 | 256 | 384 | 512 | 768 | 1024 | 2048 | 4096 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Before | 0.0494 | 0.0654 | 0.1519 | 0.4322 | 1.1865 | 2.4091 | 4.3676 | 14.912 | 36.517 | 142.09 | 561.80 |
| After | 0.0483 | 0.0651 | 0.1294 | 0.3858 | 1.1128 | 2.2988 | 3.8391 | 14.290 | 34.542 | 136.13 | 529.54 |

### Motivation and Context Addresses this review comment: https://github.com/microsoft/onnxruntime/pull/14149/#discussion_r1063152021 Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-01-10 17:06:42 -08:00
RandySheriffH	d152452d4b	Tune test case for hybrid cpu (#14204 ) Tune test case for hybrid cpu architecture. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-01-10 12:54:02 -08:00
Chen Fu	90142899bd	Supporting Intel AMX instructions in quantized GEMM (#14042 ) ### Description Use Intel AMX int8 instructions to accelerate quantized GEMM. ### Motivation and Context AMX instructions accelerate quantized GEMM significantly. Prepacked B perf numbers (latency in ns):

| GEMM Config | AVX512Vnni | AMX |
| -- | --: | --: |
| M:384/N:1024/K:1024/Batch:1/Threads:4 | 1057511 | 285393 |
| M:384/N:1024/K:3072/Batch:1/Threads:4 | 2643929 | 700397 |
| M:384/N:1024/K:4096/Batch:1/Threads:4 | 3784750 | 890701 |
| M:384/N:4096/K:1024/Batch:1/Threads:4 | 2378139 | 887251 |
| M:384/N:1024/K:1024/Batch:1/Threads:16 | 307137 | 138481 |
| M:384/N:1024/K:3072/Batch:1/Threads:16 | 855730 | 295027 |
| M:384/N:1024/K:4096/Batch:1/Threads:16 | 1126878 | 317395 |
| M:384/N:4096/K:1024/Batch:1/Threads:16 | 781963 | 237014 |
| M:1536/N:1024/K:1024/Batch:1/Threads:16 | 538864 | 181459 |
| M:1536/N:1024/K:3072/Batch:1/Threads:16 | 1681002 | 561600 |
| M:1536/N:1024/K:4096/Batch:1/Threads:16 | 2158127 | 717470 |
| M:1536/N:4096/K:1024/Batch:1/Threads:16 | 2428622 | 896140 |
| M:3072/N:1024/K:1024/Batch:1/Threads:16 | 1058029 | 357031 |
| M:3072/N:1024/K:3072/Batch:1/Threads:16 | 3138504 | 1095857 |
| M:3072/N:1024/K:4096/Batch:1/Threads:16 | 4155640 | 1386183 |
| M:3072/N:4096/K:1024/Batch:1/Threads:16 | 4679030 | 1778624 |

Co-authored-by: Yi-Hong Lyu <yilyu@microsoft.com> Co-authored-by: Chen Fu <fuchen@microsoft.com>	2023-01-10 12:16:27 -08:00
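The latencies in commit 90142899bd are easier to compare as speedups, computed as AVX512-VNNI latency divided by AMX latency. The short script below does that arithmetic for every configuration listed in the commit message. Across the table the speedup ranges from roughly 2.2x to 4.2x.

```python
# Latency in ns from the commit message: (M, N, K, threads) -> (AVX512-VNNI, AMX)
results = {
    (384, 1024, 1024, 4): (1057511, 285393),
    (384, 1024, 3072, 4): (2643929, 700397),
    (384, 1024, 4096, 4): (3784750, 890701),
    (384, 4096, 1024, 4): (2378139, 887251),
    (384, 1024, 1024, 16): (307137, 138481),
    (384, 1024, 3072, 16): (855730, 295027),
    (384, 1024, 4096, 16): (1126878, 317395),
    (384, 4096, 1024, 16): (781963, 237014),
    (1536, 1024, 1024, 16): (538864, 181459),
    (1536, 1024, 3072, 16): (1681002, 561600),
    (1536, 1024, 4096, 16): (2158127, 717470),
    (1536, 4096, 1024, 16): (2428622, 896140),
    (3072, 1024, 1024, 16): (1058029, 357031),
    (3072, 1024, 3072, 16): (3138504, 1095857),
    (3072, 1024, 4096, 16): (4155640, 1386183),
    (3072, 4096, 1024, 16): (4679030, 1778624),
}

# Speedup = old latency / new latency (batch is 1 in every row).
speedups = {cfg: vnni / amx for cfg, (vnni, amx) in results.items()}
for (m, n, k, threads), s in speedups.items():
    print(f"M={m:<5} N={n:<5} K={k:<5} threads={threads:<3} {s:.2f}x")
print(f"range: {min(speedups.values()):.2f}x to {max(speedups.values()):.2f}x")
```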
Ye Wang	a01bf8dbb1	rename CrossAttention to MultiHeadAttention (#14201 ) ### Description Rename CrossAttention to MultiHeadAttention, since this op can also be used for self-attention. Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-01-10 10:18:39 -08:00
Guenther Schmuelling	6b8c72cfa6	pin ort-ext to 81e7799c69044c745239202085eb0a98f102937b (#14044 ) Pin onnxruntime-extensions to 81e7799c69044c745239202085eb0a98f102937b in preparation for enabling the extensions in the wasm build.	2023-01-10 10:10:17 -08:00
Numfor Tiapo	f4ea781b81	DML EP Register Identity-16 (#14053 ) This PR Registers Identity-16 to the DML EP. ONNX Backend tests and optional type tests were skipped pending future additions. Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>	2023-01-10 09:16:09 -08:00
Tianlei Wu	05e26f302a	Hot fix for prefast failure to unblock python package pipeline (#14206 ) ### Description Hot fix for Python packaging pipeline failures: disable an attention op test that causes cl to crash in the prefast build. Verified that the Python package is good with this hot fix: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=263786&view=results ### Motivation and Context The prefast build failed with a cl crash: ``` cl : command line error D8040: error creating or communicating with child process ``` The cause is high stack usage in an attention op unit test introduced in https://github.com/microsoft/onnxruntime/pull/13953.	2023-01-10 07:57:32 -08:00
Adrian Lizarraga	3d8b596cb9	Use a local copy of murmurhash3 in TensorRT shared library (#14207 ) ### Description Uses a local copy of murmurhash3 in TensorRT. ### Motivation and Context The current murmurhash3 implementation is located in core/framework, which is not linked to the provider shared library. This causes a segfault when tensorrt shared library is used standalone.	2023-01-10 07:24:06 -08:00
Ryan Hill	da57c0a701	Add protected destructor to Provider structure (#14152 ) ### Description Add protected destructor so that any inherited classes can't accidentally be deleted through a pointer to the base. Fixes this prefast warning: The type 'struct onnxruntime::CUDA_Provider' with a virtual function needs either public virtual or protected non-virtual destructor (c.35). Internal bug 8999	2023-01-09 23:04:04 -08:00
Ryan Hill	f8117b6f87	Add catch-all exception handler to API_IMPL_END (#14194 ) ### Description Fairly self explanatory. Someone pointed out we could miss some exceptions, and we never want to throw exceptions through the C API. ### Motivation and Context This doesn't fix any known issue, it's just a good idea to have.	2023-01-09 21:58:46 -08:00
PeixuanZuo	33367fa2dc	[MIGraphX] update the MIGraphX version used in ORT to rocm-5.4.0 (#14184 ) ### Description Update the MIGraphX version used in ORT to rocm-5.4.0. ### Motivation and Context The previous branch, migraphx_for_ort, is no longer updated and has diverged too far from the latest MIGraphX release branch. More discussion here: https://github.com/microsoft/onnxruntime/issues/14126#issuecomment-1373201049 Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>	2023-01-10 13:40:25 +08:00
Yi Zhang	6463f4383b	make WITHCACHE as an option in MacOS workflow (#14188 ) ### Description 1. Set the WithCache default value to false in the macOS CI workflow as well. 2. Add today's date to the cache key to stop the cache size from growing continuously. With the cache, the pipeline duration drops from over 70 minutes to just over 10 minutes.	2023-01-10 10:54:19 +08:00
Tianlei Wu	7e751ac6e6	update convert_generation for Attention op change (#14191 ) We removed the key and value inputs in https://github.com/microsoft/onnxruntime/pull/14146, so convert_generation needs to be updated as well.	2023-01-09 18:04:44 -08:00
Patrice Vignola	c151afec71	[DML EP] Fix unconnected node removal logic (#14193 ) ### Description Fix unconnected node removal logic ### Motivation and Context The edges need to be removed before the nodes themselves, otherwise the indices will reference the wrong nodes.	2023-01-09 15:40:09 -08:00
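Commit c151afec71 notes that edges must be removed before nodes because edges refer to nodes by index. The Python sketch below shows that ordering on a toy graph. Edges touching removed nodes are filtered while indices still refer to the original node list. The nodes are then compacted, and the surviving edges are remapped. `remove_nodes` and the (src, dst) index-pair representation are hypothetical, not the DML EP's graph structures.

```python
def remove_nodes(nodes, edges, dead):
    """Remove node indices in `dead` from a graph whose edges are (src, dst) index pairs."""
    # 1. Drop edges first, while every index still refers to the original node list.
    edges = [(s, d) for s, d in edges if s not in dead and d not in dead]
    # 2. Compact the node list, recording where each surviving node moved.
    remap, kept = {}, []
    for old_index, node in enumerate(nodes):
        if old_index not in dead:
            remap[old_index] = len(kept)
            kept.append(node)
    # 3. Rewrite the surviving edges against the compacted indices.
    return kept, [(remap[s], remap[d]) for s, d in edges]


nodes = ["A", "dead1", "dead2", "B"]
edges = [(0, 3), (1, 2)]  # A -> B is live; dead1 -> dead2 feeds nothing
kept, new_edges = remove_nodes(nodes, edges, dead={1, 2})
print(kept, new_edges)    # ['A', 'B'] [(0, 1)]
```

If the nodes were removed first without remapping, the edge (0, 3) would point past the end of the two-node list. The stale (1, 2) edge would also not be recognized as belonging to the removed nodes.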
Sumit Agarwal	906f578be8	[DML EP] Update DML_FEATURE_LEVEL 5.0 (#14172 ) ### Description The DML EP was using a very old feature level (2.0), which could cause models that use the latest operators to fail when running against an old DirectML.dll.	2023-01-09 13:00:56 -08:00
liqun Fu	1be36913cc	to work with onnx 1.13 rc, implement ver 18 reduce and optional ops, … (#13765 )	2023-01-09 10:26:16 -08:00


7990 commits