onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-06 04:28:32 +00:00

Author	SHA1	Message	Date
pengwa	d8dfda2e08	Minor fix for differently scoped cpu_ep usage (#15550 ) ### Minor fix for differently scoped cpu_ep `cpu_ep` is declared under `#ifndef DISABLE_CONTRIB_OPS`, but one of its usages is not under the same condition. ``` #ifndef DISABLE_CONTRIB_OPS const InlinedHashSet<std::string_view> cpu_ep = {onnxruntime::kCpuExecutionProvider}; #endif ``` ### Motivation and Context Postmortem: https://github.com/microsoft/onnxruntime/pull/15461 passed all CIs except the Linux/Windows TVM CIs. I did not check the detailed error messages then because those CIs had been failing for some reason for at least a few days. While checking the details, I found that the error message changed after PR 15461. Before the constant sharing change, the TVM CI error message was: ``` https://github.com/microsoft/onnxruntime/actions/runs/4700368634/jobs/8334955814 ERROR: testBooleanInputs (__main__.TestInferenceSession) ---------------------------------------------------------------------- Traceback (most recent call last): File "onnxruntime_test_python.py", line 617, in testBooleanInputs sess = onnxrt.InferenceSession(get_name("logicaland.onnx"), providers=available_providers) File "D:\a\onnxruntime\onnxruntime\build\Release\Release\onnxruntime\capi\onnxruntime_inference_collection.py", line 383, in __init__ self._create_inference_session(providers, provider_options, disabled_optimizers) File "D:\a\onnxruntime\onnxruntime\build\Release\Release\onnxruntime\capi\onnxruntime_inference_collection.py", line 435, in _create_inference_session sess.initialize_session(providers, provider_options, disabled_optimizers) onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: D:\a\onnxruntime\onnxruntime\onnxruntime\core\providers\tvm\tvm_api.cc:49 onnxruntime::tvm::TVMCompile compile != nullptr was false. Unable to retrieve 'tvm_onnx_import_and_compile'. 
``` After the change, it was: ``` D:\a\onnxruntime\onnxruntime\onnxruntime\core\optimizer\graph_transformer_utils.cc(213,67): error C2065: 'cpu_ep': undeclared identifier [D:\a\onnxruntime\onnxruntime\build\Release\onnxruntime_optimizer.vcxproj] D:\a\onnxruntime\onnxruntime\onnxruntime\core\optimizer\graph_transformer_utils.cc(213,19): error C2672: ``` This PR fixes the build issue. The error messages of the Windows/Linux TVM CIs are back to the original ones.	2023-04-18 16:51:11 +08:00
PeixuanZuo	8bec6cd029	Refactor FusedConv test (#15512 ) Refactor FusedConv test.	2023-04-18 15:22:31 +08:00
Justin Chu	9d26f8f4fe	Use os.fspath on Path (#15530 ) ### Description Use os.fspath instead of str() on a path object. ### Motivation and Context I learned today that os.fspath is the right way to go: https://github.com/charliermarsh/ruff/issues/3675#issuecomment-1494975508	2023-04-17 16:59:40 -07:00
Zhang Lei	a30b57da6e	Fix/Enhance convert_generation tool for SkipLayerNorm, op_block_list... (#15368 ) Now that SkipLayerNorm uses fp32 for internal calculation and a numerically stable algorithm, enable it for fp16 here. Make op_block_list a command-line argument to help future tools. Other minor changes.	2023-04-17 14:44:37 -07:00
Justin Chu	a36caba073	Bump ruff in CI (#15533 ) ### Description Bump ruff version in CI and fixed new lint errors. - This change enables the flake8-implicit-str-concat rules which helps detect unintended string concatenations: https://beta.ruff.rs/docs/rules/#flake8-implicit-str-concat-isc - Update gitignore to include common python files that we want to exclude. ### Motivation and Context Code quality	2023-04-17 10:11:44 -07:00
cao lei	c2221d919f	create a stream in DeviceStreamCollection for memory pattern (#15426 ) ### Description Create a stream in DeviceStreamCollection for the memory pattern case to fix thread-safety issue 15154. ### Motivation and Context This fixes bug 15154: https://github.com/microsoft/onnxruntime/issues/15154	2023-04-17 10:06:55 -07:00
Ashwini Khade	8fa65aba0e	enable training tests for csharp bindings (#15513 ) ### Description Simple fix to enable training tests in csharp through build.py script.	2023-04-17 09:57:23 -07:00
cloudhan	7ed3bfde51	Fix FusedConv for ROCm (#15460 ) 1. Fix undesired runtime optimization for non-Relu activations. 2. Fix a false-positive runtime error log caused by fusion failure.	2023-04-17 11:41:00 +08:00
Wei-Sheng Chin	ac6ceffb2c	Force using fixed random seeds for flaky tests (#15515 ) Some gradient-related tests fail frequently due to their math properties. This PR fixes their random seed so that it's possible to debug in the future. Fixed [AB#14605](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14605), [AB#14604](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14604)	2023-04-14 18:44:51 -07:00
Adrian Lizarraga	5ebe700a9b	[QNN EP] Fix pool and conv op tests (#15504 ) ### Description - Fixes QNN unit tests for pool and conv ops. - Temporarily disables QNN Resize tests until we fix type/shape inferencing for NHWC Resize. ### Motivation and Context The Linux QNN CI Pipeline has not run unit tests for a week (see https://github.com/microsoft/onnxruntime/pull/15497). Some tests broke in the meantime. Fixed [AB#14625](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14625) --------- Co-authored-by: Changming Sun <chasun@microsoft.com>	2023-04-14 13:18:38 -07:00
Maximilian Müller	fbe88fccbd	Exposing new TRT build options (#15089 ) ### Description This adds a few TRT options, some of which are only available on TRT 8.6: - heuristics - sparsity - optimization level (8.6 only) - auxiliary stream (8.6 only) - tactic source selection I am not sure yet which tests I should add for these options. As these are mostly simple TRT flags, I am not sure to what level I should test them. For heuristics, something similar to `44dda08b51/onnxruntime/test/providers/tensorrt/tensorrt_basic_test.cc (L510-L538)` should be possible, but for all the others we would essentially only be testing whether setting the option causes a crash. Also, if I forgot an option that would be good to have, feel free to speak up!	2023-04-14 09:47:36 -07:00
Yi Zhang	4e1f75810c	Add compilation cache in 2 Linux CPU pipelines and refactor the Linux build step with cache (#15484 ) ### Description 1. Add compilation cache in Linux CPU ARM and Linux Minimal Build. 2. Merge the 4 Linux CPU build steps with cache into one. 3. Install ccache from source in the Linux ARM64 image. ### Motivation and Context 1. Enable compilation cache for more build steps. 2. Make it easier to add cache. It could save an additional 40 minutes of compilation time on Linux ARM64. https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=959619&view=logs&j=1e0830bb-fd74-5d0a-5029-1c63b4266d7b&t=75260ed7-7566-5947-2095-566660191920	2023-04-14 23:56:59 +08:00
pengwa	bf32dbbd9b	Share more constant initializers (#15461 ) ### Share more constant initializers. The `ConstantSharing` transformer originally only handled single-value initializers (scalar or 1D). This PR shares more cases so that the common subexpression elimination transformer can remove more duplicated nodes. Originally, we used a single vector<std::variant<float,half,int32,int64>> to store different scalar values. In this PR, we create an unordered map whose key is data_type + rank + element count and whose value is a vector of `InitializerValue`. For a given initializer that fulfills the conditions, we look up the corresponding vector of `InitializerValue` by its <data_type + rank + element count>, then search that vector to check whether the constant tensor already exists. A value id is then returned, which is combined with <data_type + rank + element count> to form the pattern key that decides which tensor to reuse (legacy code). ### Motivation and Context One example we see here is: ```mermaid stateDiagram [] --> LayerNorm(b,s,64) LayerNorm(b,s,64) --> Reshape1 Shape1_Const[bs,64] --> Reshape1 LayerNorm(b,s,64) --> Reshape2 Shape2_Const[bs,64] --> Reshape2 Reshape1 --> AttentionSubGraph Reshape2 --> Add AttentionSubGraph--> Add Add --> [] ``` Ideally, CommonSubexpressionElimination could remove one of `Reshape1` and `Reshape2`, but because `Shape1_Const` and `Shape2_Const` are different NodeArg* instances, it did not remove the duplication. This is one example; removing such duplication brings more opportunities to apply graph transformations.	2023-04-14 07:41:07 -07:00
Changming Sun	f297bbb89b	Fix an indent error in build.py (#15497 ) ### Description Fix an indent error in build.py ### Motivation and Context The problem was introduced in #15395 when I was deleting unused code.	2023-04-14 06:32:46 -07:00
mindest	0fdd356abf	[ROCm] Add hipBLASLt GEMM support to Tunable op. (#15351 ) ### Description Add hipBLASLt to GEMM Tunable op, which supports GEMM and StridedBatchedGEMM. To enable hipBLASLt implementation, add an extra flag to the building command: `--cmake_extra_defines onnxruntime_USE_HIPBLASLT=ON`.	2023-04-14 17:56:01 +08:00
Sunghoon	fda0aa14c8	SkipLayerNorm fusion with different input and output type (#15500 ) SkipLayerNorm fusion now fuses LayerNorm and one or more Add kernels. While the LayerNormalization kernel allows different input and output types by definition, SkipLayerNormalization must have the same input and output type. The following graph is valid: the output of the Add node is float16 and the two inputs from initializers are float. ![image](https://user-images.githubusercontent.com/35605090/231874079-3f3b03cc-f751-4ad9-a002-31116a35117f.png) However, when Add and LayerNormalization are fused, fusion fails because the two inputs of the Add node are float16 and SkipLayerNormalization requires all of its inputs to have the same type. To avoid this failure, this PR inserts Cast nodes before the inputs of SkipLayerNormalization when the input and output types differ and the output type is float. The above graph is fused as follows: ![image](https://user-images.githubusercontent.com/35605090/231874097-6405713a-7c95-4b5b-a293-1305976edc94.png) For performance, it would be better for SkipLayerNormalization to support different input and output types, but this PR is meant to unblock the Turing NLR v5 base model in Babel. When we have more cases, we can add that support.	2023-04-13 23:07:47 -07:00
Wei-Sheng Chin	d76cf374c4	Capture both ValueError and RuntimeError (#15503 )	2023-04-13 19:29:34 -07:00
Akshay Sonawane	56ad68120e	Add support to use sequence as input ids in decoder inputs to Beam Search CUDA Op (#15232 ) ### Description Currently, the Beam Search Op is only supported on the CPU EP; this PR adds support for the CUDA EP. ### Motivation and Context - For Turing models, inference was throwing a segmentation fault because a copy in CUDA memory failed; also, beam search support was not present in CUDA.	2023-04-13 13:35:33 -07:00
Changming Sun	5bed8d0285	Disable XNNPack EP's tests in Windows CI pipeline (#15406 ) ### Description 1. Disable XNNPack EP's tests in the Windows CI pipeline. The EP code has a known problem (memory alignment), but the problem does not affect the scenarios we ship the code for. Currently we only use the XNNPack EP in mobile apps and on the web, and we already have pipelines covering these usages. We need to prioritize fixing the bugs found in those pipelines, and there are no resources to put on this Windows one. We can re-enable the tests once we reach an agreement on how to fix the memory alignment bug. 2. Delete anybuild.yml, which was for an already deleted pipeline. 3. Move Windows CPU pipelines to AMD CPU machine pools, which are cheaper. 4. Disable some qdq/int8 model tests that fail if the CPU doesn't have Intel AVX512 8-bit instructions.	2023-04-13 12:19:32 -07:00
zhijiang	05ec22330f	softmax perf improvement pr2 - import softmax bw (#15199 ) When the softmax dimension is 2048, the original ORT code falls back to cuDNN; with some optimization of ORT's softmax_warp_backward, we can be faster than the cuDNN implementation. The ideas for optimizing softmax_warp_backward are: 1. instead of saving intermediate results in registers, recompute them to save resources; 2. store the input data in fp16 instead of fp32 to save further resources. The perf numbers: ![image](https://user-images.githubusercontent.com/43435212/227476335-ae0b61c4-cd15-40b7-b743-a956fadaedda.png) Note that when the softmax dimension is less than 2048, nothing changes, so perf numbers are only given for the 2048 case. More perf numbers for smaller batch sizes: ![image](https://user-images.githubusercontent.com/43435212/231676120-c8944b09-a664-43f3-a1e8-dfe729c6e816.png)	2023-04-13 14:57:01 +08:00
mindest	67ac36101c	disable BatchNormalizationGrad test (#15485 ) ### Description Temporarily disable BatchNormalizationGrad test due to random failure. Example: ``` 2023-04-12T06:33:24.1593811Z 1: [ RUN ] GradientCheckerTest.BatchNormalizationGrad 2023-04-12T06:33:27.5603881Z 1: D:\a\_work\1\s\orttraining\orttraining\test\gradient\gradient_ops_test.cc(1468): error: Value of: IsErrorWithinTolerance(max_error, error_tolerance) 2023-04-12T06:33:27.5604509Z 1: Actual: false 2023-04-12T06:33:27.5604719Z 1: Expected: true 2023-04-12T06:33:27.5604997Z 1: max_error: 1.776702880859375; tolerance: 0.019999999552965164; ORT test random seed: 2552121240; 2023-04-12T06:33:27.5605266Z 1: Google Test trace: 2023-04-12T06:33:27.5605531Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 8910 2023-04-12T06:33:27.5605843Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 5678 2023-04-12T06:33:27.5606478Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 1234 2023-04-12T06:33:27.8285560Z 1: D:\a\_work\1\s\orttraining\orttraining\test\gradient\gradient_ops_test.cc(1493): error: Value of: IsErrorWithinTolerance(max_error, error_tolerance) 2023-04-12T06:33:27.8286181Z 1: Actual: false 2023-04-12T06:33:27.8286404Z 1: Expected: true 2023-04-12T06:33:27.8286669Z 1: max_error: 1.776702880859375; tolerance: 0.019999999552965164; ORT test random seed: 2552121240; 2023-04-12T06:33:27.8286942Z 1: Google Test trace: 2023-04-12T06:33:27.8287208Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 8910 2023-04-12T06:33:27.8287532Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 5678 2023-04-12T06:33:27.8287849Z 1: D:\a\_work\1\s\onnxruntime\test\common\tensor_op_test_utils.cc(14): ORT test random seed: 1234 2023-04-12T06:33:51.6368960Z 1: [ FAILED ] 
GradientCheckerTest.BatchNormalizationGrad (27475 ms) ```	2023-04-13 14:53:47 +08:00
Changming Sun	a22dc65a81	Add a missing header to cuda_common.h (#15489 ) ### Description The following three lines are needed before including some cutlass header files, because cutlass uses the "and"/"or" keywords. Generally this header should not be needed, but nvcc is not strictly compliant with the C++ standard. ```c++ #ifdef __cplusplus #include <ciso646> #endif ``` We didn't hit this problem before because the above code existed in absl, and we always include absl headers first. However, absl recently deleted it! https://github.com/abseil/abseil-cpp/pull/1246 The cutlass dependency was introduced in #14343 , after we had abseil.	2023-04-12 22:16:59 -07:00
pengwa	516c8e95fa	Optimize SCE loss compute (#15401 ) ### Optimize SCE loss compute Compute optimization based on label data sparsity: - Insert ShrunkenGather before SCELoss node, to filter out invalid labels for compute. - Support ShrunkenGather upstream. - Added test for the above. - Added flag to enable label sparsity optimization with env var, by default disabled now. Will enable after comprehensive benchmarking later. - Extract common logic into test_optimizer_utils.h/cc from core/optimizer/compute_optimzier_test.cc, then the common functions can be shared by both core/optimizer/compute_optimzier_test.cc and orttraining/core/optimizer/compute_optimzier_test.cc - Extract common logic into shared_utils.h/cc: `GetONNXOpSetVersion` and `Create1DInitializerFromVector`	2023-04-13 13:02:12 +08:00
Justin Chu	07b64d5275	Remove codecov from requirements-dev.txt (#15487 ) ### Motivation and Context It is no longer supported, and we don't really use it.	2023-04-12 18:48:02 -07:00
Patrice Vignola	fd7f0c3cfc	[DML EP] Use ORT node names in DML execution plans (#15411 )	2023-04-12 16:44:53 -07:00
G. Ramalingam	e361e3f138	Fix bug in handling of variadics in function schema creation (#15409 ) ### Description The code handling variadic parameters when creating a schema for a function has a minor bug. The checking logic was nested inside a conditional, instead of being outside. Fix the logic, and add a test case. This bug manifests itself when the first parameter in the variadic list is not an input/output of the enclosing function. ### Motivation and Context Fixes https://github.com/microsoft/onnxruntime/issues/15404 --------- Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>	2023-04-12 14:32:24 -07:00
Yulong Wang	e1e8852213	[build/npm] dump ORT_COMMON_FROM from validation (#15475 ) ### Description dump ORT_COMMON_FROM from validation This writes environment variable ORT_COMMON_FROM for later steps in the release pipeline to use.	2023-04-12 13:48:19 -07:00
Yulong Wang	3875c824d5	[build] fix default value of flag cmake_generator (#15471 ) ### Description fix default value of flag cmake_generator	2023-04-12 13:47:58 -07:00
Yulong Wang	041a0e2747	[build] fix nuget linux in build_protoc_for_host() (#15472 ) ### Description fix nuget linux in build_protoc_for_host()	2023-04-12 13:46:32 -07:00
yf711	8cd5f3ad9c	[TensorRT EP] support TensorRT 8.6-EA (#15299 ) ### Description * Integrate TRT 8.6EA on relevant Linux/Windows/pkg pipelines * Update onnx-tensorrt to 8.6 * Add new dockerfiles for TRT 8.6 and clean old ones * Update [CGManifest](https://github.com/microsoft/onnxruntime/tree/main/cgmanifests) files and ort build deps version * yml/script update * Enable built-in TRT parser option on TRT related pipelines by default * Exclude test TopKOperator.Top3ExplicitAxisInfinity out of TRT EP tests (8.6-EA has issue with topk operator)	2023-04-12 11:34:59 -07:00
Numfor Tiapo	e3086b2ed8	Move DML CI Pipeline to A10 (#15468 ) This change moves the DML CI pipeline to the A10 machines and fixes or disables tests that were failing from this change. - Max error rate threshold was increased for Image Tests - Some failing batch tests were disabled --------- Co-authored-by: Changming Sun <chasun@microsoft.com>	2023-04-12 10:19:40 -07:00
PeixuanZuo	0016554090	[ROCm] disable composable_kernel and kernel explorer for MIGraphX CI (#15479 ) Disable composable_kernel and kernel explorer for MIGraphx CI to save build time. Composable_kernel and kernel explorer are tested on ROCm CI.	2023-04-12 22:26:40 +08:00
PeixuanZuo	d49a8de9b1	[ROCm] add FP16 support for FusedConv Op (#15443 ) Add FP16 support for FusedConv Op and update UT	2023-04-12 12:19:14 +08:00
PeixuanZuo	ce1eb6d629	[ROCm] Add Tunable GroupNorm (#15298 ) refactor GroupNorm and Add Tunable GroupNorm	2023-04-12 10:55:42 +08:00
Changming Sun	db4fc12318	Add support for building the code on Windows ARM64 natively (#15371 ) ### Description Recently, Visual Studio and Python started to provide native Windows ARM64 packages. This PR provides better support for building on Windows ARM64. You can build the same way as on x64, for example: ``` python tools\ci_build\build.py --config Debug --update --skip_submodule_sync --build_dir b --cmake_generator "Visual Studio 17 2022" ``` You do not need to append the "--arm64" build arg, and you do not need to cross-compile protoc for a different arch, because you are not cross-compiling. Caveat: it does not work with the latest cmake release (3.26.x); it only works with cmake 3.25.x and below. Filed a bug with them: https://gitlab.kitware.com/cmake/cmake/-/issues/24797 ### Motivation and Context Provide better support for building on Windows ARM64.	2023-04-11 17:14:54 -07:00
Rachel Guo	9c42d5e31f	[CoreML EP]Add broadcasting support for binary ops (#15187 ) ### Description Add broadcasting support for binary ops in the CoreML EP. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/15110 --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-04-11 13:50:45 -07:00
Yulong Wang	0fbf715824	[build] add script to validate generated NPM packages (#15453 ) ### Description add script to validate generated NPM packages and publish it to artifacts, so that release pipeline can use it. once this PR is merged, I will update the NPM package release pipeline.	2023-04-11 11:04:55 -07:00
Dmitri Smirnov	ce3b4eabd3	Implement Optional Metadata support and C# test support (#15314 ) ### Description Implement Optional Type metadata support in the library. Implement optional support in C# API along with metadata. Implement Sequence, Map, Optional test data support and test execution. Prune tests and provide more details for failing tests in C# code. Note, this PR does not enable running onnx test models in C++. ### Motivation and Context Opset18 optional type support.	2023-04-11 09:41:59 -07:00
Edward Chen	0497ac0432	Support additional op domains in op reduction script. (#15424 ) Add support for kMSInternalNHWCDomain and kPytorchAtenDomain op domains to op reduction script. Make it an error if the op reduction script encounters unknown op domains.	2023-04-11 08:57:51 -07:00
Patrice Vignola	3be5bfe363	[DML EP] Add MatMul + SoftMax fusion (#15240 )	2023-04-11 08:31:04 -07:00
Patrice Vignola	7c927bb95c	[DML EP] Add BiasSplitGelu (#15197 )	2023-04-11 08:30:37 -07:00
Yi Zhang	311f84d00c	Fix a NuGet packaging pipeline error (#15458 ) ### Description Fix a typo in #14965 ### Motivation and Context Fix the error `"onnxruntime_providers_shared.dll not found for win-x64"`	2023-04-11 18:00:10 +08:00
zhijiang	29c74d3c43	softmax perf improvement pr1 - add more softmax related tests (#15176 ) 1. Add fp16 tests. 2. Add tests for shapes that are not a power of two.	2023-04-11 17:02:40 +08:00
Ye Wang	ef42fd09fb	google/mt5 optimization and fix (#15454 ) ### Description 1. enabled self-attention fusion in mt-5 decoder graph 2. fix a parity issue https://github.com/microsoft/onnxruntime/issues/15042 --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-04-11 00:09:11 -07:00
Patrice Vignola	c5b6ee1a99	[DML EP] Add NhwcConv (#15194 )	2023-04-10 23:16:09 -07:00
cloudhan	9acbfc6a29	ROCm MHA (#15279 ) Add MultiHeadAttention for ROCm EP. Before: ``` 'engine': 'onnxruntime' 'version': '1.15.0' 'height': 512 'width': 512 'steps': 50 'batch_size': 1 'batch_count': 5 'num_prompts': 1 'average_latency': 3.878769588470459 'median_latency': 3.8792178630828857 'first_run_memory_MB': -1 'second_run_memory_MB': -1 'model_name': 'runwayml/stable-diffusion-v1-5' 'directory': './sd-v1-5-onnx-fp16-nomha' 'provider': 'ROCMExecutionProvider' 'disable_safety_checker': True ``` After: ``` 'engine': 'onnxruntime' 'version': '1.15.0' 'height': 512 'width': 512 'steps': 50 'batch_size': 1 'batch_count': 5 'num_prompts': 1 'average_latency': 2.364924430847168 'median_latency': 2.3650705814361572 'first_run_memory_MB': -1 'second_run_memory_MB': -1 'model_name': 'runwayml/stable-diffusion-v1-5' 'directory': './sd-v1-5-onnx-fp16' 'provider': 'ROCMExecutionProvider' 'disable_safety_checker': True ```	2023-04-11 13:20:44 +08:00
Yi Zhang	feafbc4263	Refactor all Mac build steps (#15440 ) ### Description ### Motivation and Context Make the compilation cache steps easy to use and maintain. Reduce cache storage.	2023-04-11 12:12:46 +08:00
Changming Sun	d175e87a1f	Delete eager mode code and increase minimal required python version to 3.8 (#15450 ) ### Description 1. Delete eager mode code. 2. Increase the minimal required python version to 3.8.	2023-04-10 16:00:04 -07:00
Patrice Vignola	4a676b011a	[DML EP] Add BiasAdd (#15211 )	2023-04-10 14:46:33 -07:00
Sheil Kumar	ce9ad8c8bc	For HLSL shader ops in the DirectML EP (STFT,DFT) FP16 ops should fal… (#15448 ) CP: [For HLSL shader ops in the DirectML EP (STFT,DFT) FP16 ops should fallback to CPU when there is no hardware support #15414 ](https://github.com/microsoft/onnxruntime/pull/15414) For HLSL shader ops in the DirectML EP (STFT, DFT), FP16 ops should fall back to CPU when there is no hardware support.	2023-04-10 13:21:40 -07:00
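Commit #15530 replaces `str()` with `os.fspath()` when converting path objects. The Python sketch below shows the difference in behavior. The helper name `to_path_string` is hypothetical and does not come from the onnxruntime code base.

```python
import os
from pathlib import Path


def to_path_string(p: "str | os.PathLike[str]") -> str:
    """Hypothetical helper: convert a str or path-like object to str.

    os.fspath() returns the filesystem representation of a path-like object
    and passes str through unchanged. Unlike str(), it raises TypeError for
    objects that are not paths, so a wrong argument type fails loudly.
    """
    return os.fspath(p)


model_path = Path("models") / "logicaland.onnx"
print(to_path_string(model_path))  # same text as str(model_path)
print(str(42))  # str() converts anything, e.g. an int
try:
    to_path_string(42)  # type: ignore[arg-type]
except TypeError as exc:
    print("rejected:", exc)
```

The benefit is mostly defensive. `str()` succeeds on any object, so passing the wrong value can produce a bogus path silently. `os.fspath()` raises `TypeError` for anything that is not a path.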
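Commit #15533 enables the flake8-implicit-str-concat rules, which target adjacent string literals that Python joins at compile time. Below is a small illustration of the bug they catch. The provider names are only sample strings.

```python
# Adjacent string literals are concatenated at compile time, so a missing
# comma silently turns two list items into one. Ruff's flake8-implicit-str-concat
# rules (for example ISC001, implicit concatenation on a single line) flag this.
providers_buggy = ["CUDAExecutionProvider" "CPUExecutionProvider"]
providers_fixed = ["CUDAExecutionProvider", "CPUExecutionProvider"]

print(len(providers_buggy))  # 1
print(providers_buggy[0])  # CUDAExecutionProviderCPUExecutionProvider
print(len(providers_fixed))  # 2
```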
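Commit #15461 describes a keying scheme: group constants by data type, rank and element count, then de-duplicate by value within each group. A rough Python sketch of that idea follows. It is not the C++ `ConstantSharing` implementation, and several parts are hypothetical:

* the `ConstantRegistry` class;
* the tuple-based tensor representation;
* the string format of the pattern key;
* the batch-times-sequence value of 128.

Comparing shapes inside a bucket is also an assumption. It is added so that tensors such as [2, 3] and [3, 2] with the same flat values are not merged.

```python
from typing import Dict, List, Tuple

# Hypothetical bucket key: (data_type, rank, element_count).
BucketKey = Tuple[str, int, int]
# Hypothetical tensor representation: (shape, flattened values).
TensorEntry = Tuple[Tuple[int, ...], Tuple[float, ...]]


class ConstantRegistry:
    """Sketch: bucket constants by (data_type, rank, element_count), then
    de-duplicate by content inside each bucket."""

    def __init__(self) -> None:
        self._buckets: Dict[BucketKey, List[TensorEntry]] = {}

    def pattern_key(self, data_type: str, shape: Tuple[int, ...],
                    values: Tuple[float, ...]) -> str:
        bucket_key: BucketKey = (data_type, len(shape), len(values))
        bucket = self._buckets.setdefault(bucket_key, [])
        entry: TensorEntry = (shape, values)
        # Linear search for an identical tensor already seen in this bucket.
        if entry in bucket:
            value_id = bucket.index(entry)
        else:
            bucket.append(entry)
            value_id = len(bucket) - 1
        # Combine the bucket key with the value id; equal pattern keys mean
        # the tensors are interchangeable, so one of them can be reused.
        return f"{data_type}_{len(shape)}_{len(values)}_{value_id}"


registry = ConstantRegistry()
# Two distinct [bs, 64] shape constants with identical content (bs = 128 here).
k1 = registry.pattern_key("int64", (2,), (128, 64))
k2 = registry.pattern_key("int64", (2,), (128, 64))
k3 = registry.pattern_key("int64", (2,), (64, 128))
print(k1, k2, k3)  # int64_1_2_0 int64_1_2_0 int64_1_2_1
```

Consider `Shape1_Const` and `Shape2_Const` in the diagram. Because they receive the same pattern key, a later common-subexpression pass could treat `Reshape1` and `Reshape2` as duplicates. The per-bucket linear search stays cheap because the buckets are expected to be small.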
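For reference, the gradient computed by `softmax_warp_backward` in #15199 follows the standard softmax backward formula. With y the forward softmax output, dx_i = y_i * (dy_i - sum_j dy_j * y_j).

The plain-Python sketch below handles a single row, assuming list inputs. It only loosely mirrors the recompute idea: the per-element products are not kept between the two passes. It does not model the CUDA warp-level kernel or its fp16 storage.

```python
import math
from typing import List


def softmax(x: List[float]) -> List[float]:
    """Numerically stable softmax of one row."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(dy: List[float], y: List[float]) -> List[float]:
    """Gradient w.r.t. the softmax input, given the output y and upstream dy."""
    # Pass 1: reduce sum_j dy_j * y_j without keeping the per-element products.
    dot = sum(g * o for g, o in zip(dy, y))
    # Pass 2: compute each element again from dy and y, not from stored values.
    return [o * (g - dot) for g, o in zip(dy, y)]


y = softmax([0.0, 0.0])  # [0.5, 0.5]
print(softmax_backward([1.0, 0.0], y))  # [0.25, -0.25]
```

A quick sanity check: because the softmax outputs sum to 1, the entries of the returned gradient always sum to (approximately) zero.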


8571 commits