onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-17 18:40:28 +00:00

Author	SHA1	Message	Date
Ye Wang	d00197aaa7	initialize cache_indir explicitly in beamsearch with encoder decoder model (#15667 )	2023-04-25 11:05:21 -07:00
Chi Lo	e1755541cc	Fix TRT timing cache test (#15588 ) TRT EP test for timing cache has wrong logic where it enables timing cache for both sessions to compare the trt engine build time, that's why CI got some intermittent failures. This PR disabled the timing cache test for comparing the engine build time between enabling/disabling timing cache until we find a model that can benefit from timing cache.	2023-04-25 10:20:26 -07:00
Wei-Sheng Chin	d0c3f92ec6	[DORT] Fix fake tensor problem caused by PyTorch change (#15664 ) This should make `Orttraining Linux Lazy Tensor CI Pipeline` green again.	2023-04-25 19:56:42 +08:00
Yulong Wang	3440d3a08e	remove 'lib/' from .gitignore (#15613 ) This will ignore source folder /js/web/lib/	2023-04-24 18:43:32 -07:00
Ashwini Khade	124ea0a801	remove compute optimizer from lte (learning on the edge) builds (#15637 ) ### Description Removing compute optimizer from on device training builds. ### Motivation and Context 1. mitigate android build failures 2. reduce binary size Since only CPU EP is enabled for LTE builds, we can optimize the models offline.	2023-04-24 15:57:15 -07:00
Yulong Wang	14cc02c65c	[js/web] WebGPU backend via JSEP (#14579 )
### Description
This change introduces the following new components into ONNX Runtime Web:
- JavaScript Execution Provider (JSEP)
- Asynchronous inference execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
  - elementwise operators (22)
  - binary operators (5)
  - tensor: Shape, Reshape, Transpose, Gemm
  - nn: Conv, {Global}Maxpool, {Global}AveragePool

The code still needs polishing; work is ongoing.

## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNX Runtime execution provider that works specifically in the Web environment (browsers). JSEP allows JavaScript code to kick in at various places when ONNX Runtime runs inference on a model.

Why JSEP?
> JSEP is a hybrid-mode EP that contains both C/C++ and TypeScript/JavaScript implementations. There are 2 strong reasons why we introduce JSEP:
> 1. The C/C++ part helps JSEP leverage ONNX Runtime's capabilities as much as possible, including the graph transformers and optimizers, and the ability to fall back to the CPU EP. The TypeScript/JavaScript part makes the kernel implementation much easier to develop and debug in the browser.
> 2. The requirement for asynchronous execution from JavaScript APIs (e.g. `buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a synchronous context (see the "async problem" section below). This is solved by using Emscripten's Asyncify.

What is WebGPU?
> WebGPU is the new GPU API available in browsers. It is one of only 2 APIs currently available to access the GPU from a browser (the other is WebGL).
> WebGPU is designed with more advanced and stronger features than WebGL, and is potentially the solution that offers the best GPU performance currently available for model inference.

What is the async problem and why do we have it?
> The "async problem" is that you cannot call an async function in a synchronous context. Think about the following C code:
> ```c
> // C-style declarations (API)
> typedef void (ON_COMPLETE)(PVOID state, DATA data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
>   // how to implement?
> }
> ```
> The answer is that it is impossible to implement this function. Usually we try to find a sync version of the API, or launch a thread to call the async function and sync-wait on the main thread. Unfortunately, in a browser environment, neither is possible.
>
> WebGPU does not offer any synchronous API for data downloading (GPU to CPU). This is the only operation that MUST be async. As `OrtRun()` will eventually call into DataTransfer to copy data from GPU to CPU, and `OrtRun()` is a synchronous function, this cannot be done in the normal way.

What is Emscripten? How does the Asyncify feature resolve the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It is what we use to compile ORT and generate the WebAssembly artifacts that run in browsers.
>
> Asyncify is a [compiler feature](https://emscripten.org/docs/porting/asyncify.html) that allows calling async functions from a synchronous context. In short, it generates code to unwind and rewind the call stack to emulate async execution. With this feature, we are able to call the async function inside the `OrtRun()` call.

## Design Overview
### Inter-op
JSEP does pretty much the same thing as any other EP. It exposes an interface for inter-op with JavaScript, which is defined in onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
  Module.jsepBackend = backend;
  Module.jsepAlloc = alloc;
  Module.jsepFree = free;
  Module.jsepCopy = copy;
  Module.jsepCopyAsync = copyAsync;
  Module.jsepCreateKernel = createKernel;
  Module.jsepReleaseKernel = releaseKernel;
  Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all the language-barrier-level functions that JSEP requires to implement kernels and data transfers in JavaScript inside ONNX Runtime:
- `jsepBackend`: assigns the singleton object to the WebAssembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc() and Free()
- `jsepCopy`: synchronous copy (GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronous copy (GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object maintained in JS to match the lifecycle of a Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this

The abstraction above keeps the connections and dependencies between C/C++ and TypeScript/JavaScript to a minimum.

### Resource Management
The lifecycle of tensor data and kernels is managed by ORT (C/C++), but the implementation is left to JavaScript. JavaScript code is responsible for implementing the callbacks correctly. For WebGPU, the GPU data is managed by JavaScript using a singleton map (tensor_data_id => GPUBuffer). The GPU pipeline is managed as a singleton. Shaders are managed using a singleton map (shader_key => gpu_program), where shader_key is generated from cache_key (OP specific, including attributes) and the input shapes.

### About data transfer
`js::DataTransfer::CopyTensor` is implemented to call either the synchronous or the asynchronous copy callback, depending on whether the destination is GPU or not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function to be called in the synchronous context.

### Run kernel in JS
The kernel class constructor calls `jsepCreateKernel()` once, with an optional per-kernel serialization to pass attributes into JavaScript. `Compute()` is implemented so that metadata serialization is performed in a base class, and JavaScript code can access the data using the Emscripten-specific builtin macro `EM_ASM_`.

### Disabled features
- Memory pattern is forcibly disabled, because WebGPU data is not represented by a general memory model (one in which a buffer can be described by offset + size).
- Concurrent run support is disabled. WebGPU is stateful and also has async function calls. Supporting concurrent runs would significantly increase the complexity without any real benefit.

### Prefer channels last
JSEP prefers channels last and returns `DataLayout::NHWC` in method `GetPreferredLayout()`. This lets the graph transformers preprocess the graph into a channels-last form so that a more optimized WebGPU shader can be used.

### Testing code
It is impossible to test JSEP directly because JSEP itself does not contain any kernel implementation. However, it has the kernel registration, which needs to work together with the corresponding JavaScript code. There are unit tests that run ONNX models from the JavaScript API.

---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-04-24 15:21:18 -07:00
George Wu	8dd32fed47	[TensorRT EP] avoid excessive library load/unload overhead when running unit tests. (#15639 ) TensorRT will load/unload libraries as builder objects are created and torn down. This will happen for every single unit test, which leads to excessive test execution time due to that overhead. This overhead has steadily increased over the past few TensorRT versions as the library objects get bigger leading to 8 hours to run all the unit tests. Nvidia suggests to keep a placeholder builder object around to avoid this.	2023-04-24 14:43:13 -07:00
George Wu	c2acf69d13	support new include,lib dir structure in upcoming QNN 2.11 (#15605 ) upcoming QNN 2.11 will have a different include/lib directory structure. update cmake files to support the new structure.	2023-04-24 13:10:17 -07:00
Ashwini Khade	ccb2243ee7	Update build option for training in java to enable_training_api (#15638 ) ### Description Updating the build option for enabling training in java builds from ENABLE_TRAINING -> ENABLE_TRAINING_APIS. In the native codebase ENABLE_TRAINING is used for enabling full training and ENABLE_TRAINING_APIS is used for creating the lte builds with training apis. Making the change to sync the naming convention across all the language bindings. It was a bit confusing to see ENABLE_TRAINING when debugging the android build failures for training. Making this change just to improve readability of logs during debugging.	2023-04-24 11:53:08 -07:00
Tianlei Wu	686fd3c22a	Fix cuda 12.1 windows Build (#15614 ) ### Description Fix CUDA 12.1 Windows build error of cuda namespace ambiguous. Use a new namespace for attention softmax. Tested with VS 2019 and VS 2022 with the following settings: - OS: Microsoft Windows 11 Enterprise (Version 10.0.22621 Build 22621) - CUDA: cuda_12.1.0_531.14_windows - TensorRT: TensorRT-8.6.0.12.Windows10.x86_64.cuda-12.0 - CUDNN: 8.8.1.3 for cuda 12 - Visual Studio Enterprise 2019, version 16.11.26 (MSVC v142) or Visual Studio Enterprise 2022 (64-bit), version 17.5.4 - Python: 3.10 - CMake: 3.25.2 VS 2019: ``` build.bat --cmake_generator "Visual Studio 16 2019" --config Release --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;80;86" --skip_submodule_sync --parallel --build_shared_lib --update --build --build_dir .\build\trt --use_cuda --cuda_version "12.1" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" --cudnn_home "C:\CuDNN\8.8.1.3_cuda12" --use_tensorrt --tensorrt_home "C:\TensorRT-8.6.0.12.Windows10.x86_64.cuda-12.0\TensorRT-8.6.0.12" ``` VS 2022: ``` build.bat --cmake_generator "Visual Studio 17 2022" --config Release --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;80;86" --skip_submodule_sync --parallel --build_shared_lib --update --build --build_dir .\build\trt_2022 --use_cuda --cuda_version "12.1" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" --cudnn_home "C:\CuDNN\8.8.1.3_cuda12" --use_tensorrt --tensorrt_home "C:\TensorRT-8.6.0.12.Windows10.x86_64.cuda-12.0\TensorRT-8.6.0.12" ``` ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/15242	2023-04-24 10:02:35 -07:00
cao lei	dc53ddef7a	Create a new C API KernelContext_GetAllocator() for Custom Op scenario (#15591 ) ### Description Create a new C API KernelContext_GetAllocator() for Custom Op scenario ### Motivation and Context Create a new C API KernelContext_GetAllocator() for Custom Op scenario	2023-04-23 21:54:35 -07:00
Hector Li	a8e2833050	[QNN EP]Unblock Qnn EP for Csharp support (#15640 ) ### Description Unblock Qnn EP for Csharp support ### Motivation and Context Enable Csharp support for Qnn EP	2023-04-23 21:28:34 -07:00
Changming Sun	c82bebde6a	Fix the TestCUDAProviderOptions test error (#15649 ) The test limits the GPU's running memory to 20MB. That may have been enough in the past, but it no longer seems to be enough now that we have upgraded CUDA to a newer version and added more kernels/graph transformers to our code. Therefore we need to increase it. Our test log shows that the model sometimes needs 128MB of memory, so I set the limit to 256MB.	2023-04-24 11:21:59 +08:00
PeixuanZuo	9df1a5e605	[ROCm] enable LayerNorm opset Ver17 for ROCm EP (#15601 ) enable LayerNorm opset Ver17 for ROCm EP.	2023-04-24 10:30:06 +08:00
Erick Muñoz	45c82eefb4	[OneDNN] Fix poolgrad bug (#15557 ) * Fixed default dilation value for poolgrad ops ### Description Changed default dilation value to 0 in poolgrad ops ### Motivation and Context Fixes error on unit tests when --enable_training --use_dnnl flags are active and	2023-04-23 08:20:26 -07:00
cloudhan	d1354dcc83	[ROCm] Add stable diffusion benchmark results for MI100 (#15646 )	2023-04-23 18:29:35 +08:00
cloudhan	8297148bde	[ROCm] Update benchmark for stable diffusion (#15602 ) 1. update scripts for ROCm memory measurement. 2. update README to contain ROCm result. 3. address some minor issue in the README	2023-04-23 11:49:40 +08:00
cloudhan	9e44248bf9	Workaround ROCm global pool (#15481 ) Implement global avg/max pool with reduction	2023-04-23 11:48:43 +08:00
Baiju Meswani	fd6ecc3909	Add env to the TrainingSession constructor (#15635 )	2023-04-21 21:05:46 -07:00
Hector Li	fab3e33105	[Qnn EP]Enable Gelu op support (#15631 ) ### Description Enable Gelu contrib op support ### Motivation and Context unblock models with contrib op Gelu	2023-04-21 16:54:34 -07:00
Patrice Vignola	0080bb0331	Add NCHW transpose for GroupNorm (#15634 ) It gives about a 2x perf improvement on Stable Diffusion on some hardware.	2023-04-21 15:18:11 -07:00
Patrice Vignola	b49d428299	[DML EP] Add missing newline to image test logging (#15596 )	2023-04-21 13:39:07 -07:00
Tianlei Wu	5a675d9113	Disable random failing DML image batch test (#15624 ) ### Description Disable a test that fails randomly in the Windows GPU CI Pipeline, as in the following: ``` 11: [ OK ] BatchTest/BatchTest.BatchSupport/163 (0 ms) 11: [ RUN ] BatchTest/BatchTest.BatchSupport/164 11: D:\a\_work\1\s\winml\test\image\imagetests.cpp(186): error: Expected: m_model_binding.Bind(output_data_binding_name, output_video_frames) doesn't throw an exception. 11: Actual: it throws. 11: D:\a\_work\1\s\winml\test\image\imagetests.cpp(211): error: Expected: m_result = m_session.Evaluate(m_model_binding, L"") doesn't throw an exception. 11: Actual: it throws. 11: total errors is 0/2073600, errors rate is 0total errors is 0/2073600, errors rate is 0total errors is 0/2073600, errors rate is 0[ FAILED ] BatchTest/BatchTest.BatchSupport/164, where GetParam() = ((L"fns-candy_Bgr8_Batch3.onnx", 0, { L"1080.jpg", L"fish_720_Gray.png", L"fish_720.png" }, 3, false), 0, 1, 1, 1, 4-byte object <02-00 00-00>) (3203 ms) ``` Since https://github.com/microsoft/onnxruntime/pull/15468 was merged to main, about 10~15% of build jobs have failed in this test.	2023-04-21 13:29:56 -07:00
Ye Wang	633dec0b17	refactor some code (#15566 ) ### Description 1. moved onnxruntime/contrib_ops/cuda/decoder to onnxruntime/contrib_ops/cuda/bert 2. create utils.cuh under /bert for shared implementations in decoder_masked_multihead_attention_impl_utils.h and rotary_embedding_util.h 3. refactored relative_attn_bias_impl.cu by reusing the template specializations in utils.cuh --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-04-21 12:57:08 -07:00
Baiju Meswani	b5a1941835	C, C++, Python, C# API update for on device training (#15518 )	2023-04-21 11:36:01 -07:00
Zhang Lei	a6d6e45be2	Tune block size for layer_norm considering #rows and GPU resource (#15410 ) fine tune cuda layernorm block size considering number of rows to process together with column number, and hardware resources (number of SMs, etc) Co-authored-by: Lei Zhang <phill.zhang@gmail.com>	2023-04-21 09:49:21 -07:00
Rachel Guo	2cb3fb18b5	Integrate React Native E2E test with detox framework (#15133 ) ### Description Integrate react native e2e test framework with detox. https://wix.github.io/Detox/ Good build in CI: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=946695&view=results ### Motivation and Context Write cross-platform end-to-end tests in JavaScript. Resolve flaky e2e tests in react native ci pipelines. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-04-21 09:46:26 -07:00
Adrian Lizarraga	f3d04cd1be	[QNN EP] Update Windows ARM64 pipeline to use Visual Studio 2022 (#15607 ) ### Description - Updates the QNN Windows ARM64 pipeline to use a new image with Visual Studio 2022 (updated from VS 2019) - Creates a new gtest fixture class that skips tests for the QNN CPU backend if we detect that the QNN CPU backend is not available/functional. The current windows arm64 vm does not support any QNN backend. ### Motivation and Context Visual Studio 2022 adds support for native arm64 compilation. This pipeline will help catch any build regressions on Windows ARM64 w/ VS 2022.	2023-04-21 09:31:10 -07:00
Yi Zhang	84746a8efe	Revert "Retry the step of Start Android simulator (#15584 )" (#15620 ) This reverts commit `64b63921a2`. ### Motivation and Context From https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=970086&view=logs&s=28fb2bf2-39c5-5feb-1887-4904233f6193&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3 It's useless to rerun the step.	2023-04-21 08:33:18 -07:00
kunal-vaishnavi	3de33e00c7	Fix issues for Whisper export with beam search (#15619 ) ### Description This PR fixes an issue with calling the ORT transformer optimizer script on the custom export of Whisper with beam search. It also includes the [fix](https://github.com/microsoft/onnxruntime/pull/15616) for the GPU out-of-memory issue. ### Motivation and Context With this PR fix, the optimizer runs as described in the [Whisper model optimization PR](https://github.com/microsoft/onnxruntime/pull/15473).	2023-04-21 00:08:58 -07:00
Ted Themistokleous	9011613b65	Add Trilu and GatherND to the list of supported OPs for MIGraphX EP (#15463 ) Add support entry for Trilu op to be recognized in the MIGraphX EP Co-authored-by: Ted Themistokleous <tthemist@amd.com>	2023-04-21 14:46:28 +08:00
Yi Zhang	a2f80a006b	update target framework to dotnet6.0 (#15615 ) ### Description Upgrade dotnet E2E test target framework to dotnet6.0 ### Motivation and Context Fix dotnet3.1 deprecation issue which broke nuget building pipeline. The error message in NuGet_Test_Linux_CPU was ``` To install missing framework, download: https://aka.ms/dotnet-core-applaunch?framework=Microsoft.NETCore.App&framework_version=3.1.0&arch=x64&rid=ubuntu.20.04-x64 . Please check the diagnostic logs for more information. ``` Test Run: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=300655&view=results.	2023-04-21 12:11:43 +08:00
Chi Lo	6cf080ccbf	Temporarily disable two tests for TRT EP (#15578 ) We are investigating an issue introduced by TRT 8.6 which causes [TRT EP CI](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=967950&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=66420422-c7d6-5f71-625c-4b7851c9b9ba) to fail. Disable two tests for now until the issue is root-caused and fixed.	2023-04-20 16:32:56 -07:00
Justin Chu	dfa06bf81b	Add link to doc for lintrunner in CI (#15604 ) Add a link to point to the doc where users can find instructions to set up lintrunner should there be any lint issues in CI.	2023-04-20 15:54:14 -07:00
Dmitri Smirnov	a5dec8eedf	[C# ] Improve string marshalling and reduce GC pressure (#15545 ) ### Description Reduce the number of auxiliary objects created to reduce GC pressure. Eliminate GCHandle type of memory pinning in most places. Improve string marshalling by allocating unmanaged memory that does not require pinning. Change native methods from `IntPtr` to `byte[]` (marshalling pinning is more efficient). Allocate input/output UTF-8 names in the unmanaged heap for the lifetime of InferenceSession, so we do not keep converting and pinning them on every Run. Introduce a new native API that allows allocating and converting/copying strings directly into a native tensor. The PR delivers around 50% latency improvement and fewer GC pauses. Inspired by: https://github.com/microsoft/onnxruntime/pull/15520 ### Motivation and Context Clients experience GC pressure and performance degradation when dealing with string tensors. Co-Authored-By: @tannergooding	2023-04-20 15:12:51 -07:00
Yufeng Li	373f912e51	add quantization support for whisper (#15589 ) ### Description Add dynamic quantization support for whisper model. There are 3 options to try out: - quantize_embedding_layer: whether to quantize the embedding layer of the decoder model - quantize_per_channel: whether to quantize per channel for Gemm or MatMul - quantize_reduce_range: use 7 bits to quantize MatMul or Gemm. Use when hitting accuracy issues on x64 CPUs without VNNI.	2023-04-20 14:22:11 -07:00
Edward Chen	4b74cb1741	Make docker command fail if bash command fails. (#15564 ) Add `set -e` so that failing bash commands will cause the containing docker command to fail.	2023-04-20 13:38:58 -07:00
Baiju Meswani	46210556f0	BatchnormInternal avoid setting num_channels if input shape is not known (#15544 )	2023-04-20 12:57:16 -07:00
Baiju Meswani	11b0a18de6	Add support for cuda 11.8 and python 3.11 for training (#15548 )	2023-04-20 12:56:45 -07:00
Justin Chu	1f7c2f724f	Fix lintrunner configurations (#15586 ) ### Description - Fix lintrunner configurations to always use `python` instead of `python3`. - Set up dependabot - Moved dependencies to requirements-lintrunner to allow dependabot to update it similar to https://github.com/onnx/onnx/pull/5124	2023-04-20 08:54:26 -07:00
Adrian Lizarraga	9df96c7d5b	[QNN EP] Fix shape inference of NHWC Resize (#15477 ) ### Description Adds schema for NHWC Resize that uses the default ONNX type/shape inferencing. ### Motivation and Context The QNN EP requires the Resize operator to be NHWC. Currently, the Resize operator fails type and shape inference because the current schema changes the input to NCHW, but the `scales` and `sizes` inputs remain in NHWC. This PR adds a schema for NHWC Resize that allows it to use the default ONNX type/shape inference while still remaining in the internal NHWC domain.	2023-04-20 07:25:25 -07:00
Scott McKay	446c478fbd	Add iOS Swift Package Manager support (#15297 ) ### Description Add Swift Package Manager (SPM) support for ORT based on #14621 - uses the existing objective-c bindings - some re-organization of the directory structure was required but the contents of the files are unchanged, apart from adjustments due to file movements Add tool for updating ORT native pod used in the SPM package Update CIs to use ORT native pod from build, and build/test using SPM ### Motivation and Context iOS developers are using SPM as much as cocoapods, so adding SPM means both are catered for.	2023-04-20 16:18:35 +10:00
Yi Zhang	64b63921a2	Retry the step of Start Android simulator (#15584 ) ### Description Retry once if there is a failure in `Start Android Simulator`. ### Motivation and Context `Start Android Simulator` isn't stable enough and the pipeline would hang. We could find many instances in https://dev.azure.com/onnxruntime/onnxruntime/_pipeline/analytics/stageawareoutcome?definitionId=188&contextType=build	2023-04-20 12:06:35 +08:00
Yi Zhang	5b6f79e79b	Improve windows build cache steps (#15537 ) ### Description 1. Split deps' compilation cache and ort's 2. reduce the caches generation in merge branch. ### Motivation and Context Reduce pipeline cache stage.	2023-04-20 09:42:22 +08:00
Chen Fu	29d00fb776	Set proper default values for pool attributes (#15559 ) ### Description Setting proper default values for attributes of pool operators ### Motivation and Context Fixed AB#14719 Global pooling and pooling operators usually share the same underlying implementation. When we detect that the operator is global, the code that sets up the attributes is skipped. This may cause nondeterministic behavior.	2023-04-19 17:24:35 -07:00
George Nash	f2889b41c1	[AMX] Update assembler check (#15501 ) A recent commit added an assembler check for whether the ASM dialect is ATT. This unfortunately broke the AMX build for systems that don't have the ASM-ATT dialect. This change assumes that if CMAKE_ASM-ATT_COMPILER_ID is not found and CMAKE_ASM_COMPILER_ID is "GNU", then, given all the other checks that have already passed, AMX is supported by the compiler and assembler. ### Motivation and Context On my build system the recent change to add the ASM-ATT version check disabled AMX code from the build. --------- Signed-off-by: George Nash <george.nash@intel.com>	2023-04-19 14:16:26 -07:00
Chen Fu	142220ad87	Fix cmake 3.25 debug info config (#15565 ) ### Description Pull request https://github.com/microsoft/onnxruntime/pull/15538 breaks the Windows build on cmake 3.25 or earlier. This should fix it.	2023-04-19 09:14:19 -07:00
Yi Zhang	573e4cf95f	[Fix] Python Packaging Pipeline exception. (#15568 ) ### Description supplement of #15299 ### Motivation and Context It broke Python Packaging Pipeline since April 12.	2023-04-19 21:57:14 +08:00
PeixuanZuo	59ea35d592	[ROCm] add CK GroupNorm to GroupNormTunable (#15510 ) - Add CK GroupNorm to GroupNormTunable. - Reduce configuration of GroupNormNHWCOp because CK implementation is better. The performance gain on stable diffusion v1.5. Before: ``` 'height': 512 'width': 512 'steps': 50 'batch_size': 1 'batch_count': 5 'num_prompts': 1 'average_latency': 2.4782688856124877 'median_latency': 2.4783748388290405 'provider': 'ROCMExecutionProvider' 'disable_safety_checker': True ``` After: ``` 'height': 512, 'width': 512, 'steps': 50, 'batch_size': 1, 'batch_count': 5, 'num_prompts': 1, 'average_latency': 2.107170510292053, 'median_latency': 2.1067750453948975, 'first_run_memory_MB': -1, 'second_run_memory_MB': -1, 'provider': 'ROCMExecutionProvider', 'disable_safety_checker': True ```	2023-04-19 13:54:59 +08:00
Dmitri Smirnov	a66af390fa	[C#] Allow passing various options when creating singleton Environment object. (#14723 ) ### Description Re-work OrtEnv class so we can pass various options when creating the environment such as: - logId - initial logging level - thread options - user supplied logging function Create the default instance when SessionOptions are instantiated as users often forget to do so. ### Motivation and Context We lack this capability. Inspired by https://github.com/microsoft/onnxruntime/pull/13822 https://github.com/microsoft/onnxruntime/pull/13951 https://github.com/microsoft/onnxruntime/pull/11593 Cc: @thoron --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-04-18 21:49:55 -07:00

8630 commits