onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-09 17:28:58 +00:00

Author	SHA1	Message	Date
Erick Muñoz	45c82eefb4	[OneDNN] Fix poolgrad bug (#15557 ) * Fixed default dilation value for poolgrad ops ### Description Changed default dilation value to 0 in poolgrad ops ### Motivation and Context Fixes error on unit tests when --enable_training --use_dnnl flags are active and	2023-04-23 08:20:26 -07:00
cloudhan	d1354dcc83	[ROCm] Add stable diffusion benchmark results for MI100 (#15646 )	2023-04-23 18:29:35 +08:00
cloudhan	8297148bde	[ROCm] Update benchmark for stable diffusion (#15602 ) 1. update scripts for ROCm memory measurement. 2. update README to contain ROCm result. 3. address some minor issue in the README	2023-04-23 11:49:40 +08:00
cloudhan	9e44248bf9	Workaround ROCm global pool (#15481 ) Implement global avg/max pool with reduction	2023-04-23 11:48:43 +08:00
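As an illustration of the idea named in #15481 above (global avg/max pool expressed as a reduction), here is a minimal pure-Python sketch. It is not the ROCm kernel: the helper name `global_pool` and the nested-list NCHW layout are assumptions made only for this example.

```python
from statistics import fmean


def global_pool(x, reduce_fn):
    """Reduce the whole H x W plane of every (n, c) slice to a single value.

    x is a nested list in NCHW layout; the result has shape N x C x 1 x 1.
    (Hypothetical helper for illustration only.)
    """
    return [
        [[[reduce_fn([v for row in plane for v in row])]] for plane in sample]
        for sample in x
    ]


# One sample, two channels, 2x2 spatial extent.
x = [[[[1.0, 2.0], [3.0, 4.0]],
      [[-1.0, 0.0], [5.0, 2.0]]]]

avg = global_pool(x, fmean)  # global average pool == mean reduction over H and W
mx = global_pool(x, max)     # global max pool == max reduction over H and W
```

Because the pooling window covers the full spatial extent, no kernel size, stride, or padding is involved; the operation is just a reduction over the two spatial axes.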
Baiju Meswani	fd6ecc3909	Add env to the TrainingSession constructor (#15635 )	2023-04-21 21:05:46 -07:00
Hector Li	fab3e33105	[Qnn EP]Enable Gelu op support (#15631 ) ### Description Enable Gelu contrib op support ### Motivation and Context unblock models with contrib op Gelu	2023-04-21 16:54:34 -07:00
Patrice Vignola	0080bb0331	Add NCHW transpose for GroupNorm (#15634 ) It gives about a 2x perf improvement on Stable Diffusion on some hardware.	2023-04-21 15:18:11 -07:00
Patrice Vignola	b49d428299	[DML EP] Add missing newline to image test logging (#15596 )	2023-04-21 13:39:07 -07:00
Tianlei Wu	5a675d9113	Disable random failing DML image batch test (#15624 ) ### Description Disable a test with random failure in Windows GPU CI Pipeline like the following: ``` 11: [ OK ] BatchTest/BatchTest.BatchSupport/163 (0 ms) 11: [ RUN ] BatchTest/BatchTest.BatchSupport/164 11: D:\a\_work\1\s\winml\test\image\imagetests.cpp(186): error: Expected: m_model_binding.Bind(output_data_binding_name, output_video_frames) doesn't throw an exception. 11: Actual: it throws. 11: D:\a\_work\1\s\winml\test\image\imagetests.cpp(211): error: Expected: m_result = m_session.Evaluate(m_model_binding, L"") doesn't throw an exception. 11: Actual: it throws. 11: total errors is 0/2073600, errors rate is 0total errors is 0/2073600, errors rate is 0total errors is 0/2073600, errors rate is 0[ FAILED ] BatchTest/BatchTest.BatchSupport/164, where GetParam() = ((L"fns-candy_Bgr8_Batch3.onnx", 0, { L"1080.jpg", L"fish_720_Gray.png", L"fish_720.png" }, 3, false), 0, 1, 1, 1, 4-byte object <02-00 00-00>) (3203 ms) ``` Since https://github.com/microsoft/onnxruntime/pull/15468 merged to main, about 10~15% build job failed in the test.	2023-04-21 13:29:56 -07:00
Ye Wang	633dec0b17	refactor some code (#15566 ) ### Description 1. moved onnxruntime/contrib_ops/cuda/decoder to onnxruntime/contrib_ops/cuda/bert 2. create utils.cuh under /bert for shared implementations in decoder_masked_multihead_attention_impl_utils.h and rotary_embedding_util.h 3. refactored relative_attn_bias_impl.cu by reusing the template specializations in utils.cuh --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-04-21 12:57:08 -07:00
Baiju Meswani	b5a1941835	C, C++, Python, C# API update for on device training (#15518 )	2023-04-21 11:36:01 -07:00
Zhang Lei	a6d6e45be2	Tune block size for layer_norm considering #rows and GPU resource (#15410 ) Fine-tune the CUDA LayerNorm block size, considering the number of rows to process together with the number of columns and the hardware resources (number of SMs, etc.) Co-authored-by: Lei Zhang <phill.zhang@gmail.com>	2023-04-21 09:49:21 -07:00
Rachel Guo	2cb3fb18b5	Integrate React Native E2E test with detox framework (#15133 ) ### Description Integrate react native e2e test framework with detox. https://wix.github.io/Detox/ Good build in CI: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=946695&view=results ### Motivation and Context Write cross-platform end-to-end tests in JavaScript. Resolve flaky e2e tests in react native ci pipelines. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-04-21 09:46:26 -07:00
Adrian Lizarraga	f3d04cd1be	[QNN EP] Update Windows ARM64 pipeline to use Visual Studio 2022 (#15607 ) ### Description - Updates the QNN Windows ARM64 pipeline to use a new image with Visual Studio 2022 (updated from VS 2019) - Creates a new gtest fixture class that skips tests for the QNN CPU backend if we detect that the QNN CPU backend is not available/functional. The current windows arm64 vm does not support any QNN backend. ### Motivation and Context Visual Studio 2022 adds support for native arm64 compilation. This pipeline will help catch any build regressions on Windows ARM64 w/ VS 2022.	2023-04-21 09:31:10 -07:00
Yi Zhang	84746a8efe	Revert "Retry the step of Start Android simulator (#15584 )" (#15620 ) This reverts commit `64b63921a2`. ### Motivation and Context From https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=970086&view=logs&s=28fb2bf2-39c5-5feb-1887-4904233f6193&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3 It's useless to rerun the step.	2023-04-21 08:33:18 -07:00
kunal-vaishnavi	3de33e00c7	Fix issues for Whisper export with beam search (#15619 ) ### Description This PR fixes an issue with calling the ORT transformer optimizer script on the custom export of Whisper with beam search. It also includes the [fix](https://github.com/microsoft/onnxruntime/pull/15616) for the GPU out-of-memory issue. ### Motivation and Context With this PR fix, the optimizer runs as described in the [Whisper model optimization PR](https://github.com/microsoft/onnxruntime/pull/15473).	2023-04-21 00:08:58 -07:00
Ted Themistokleous	9011613b65	Add Trilu and GatherND to the list of supported OPs for MIGraphX EP (#15463 ) Add support entry for Trilu op to be recognized in the MIGraphX EP Co-authored-by: Ted Themistokleous <tthemist@amd.com>	2023-04-21 14:46:28 +08:00
Yi Zhang	a2f80a006b	update target framework to dotnet6.0 (#15615 ) ### Description Upgrade dotnet E2E test target framework to dotnet6.0 ### Motivation and Context Fix dotnet3.1 deprecation issue which broke nuget building pipeline. The error message in NuGet_Test_Linux_CPU was ``` To install missing framework, download: https://aka.ms/dotnet-core-applaunch?framework=Microsoft.NETCore.App&framework_version=3.1.0&arch=x64&rid=ubuntu.20.04-x64 . Please check the diagnostic logs for more information. ``` Test Run: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=300655&view=results.	2023-04-21 12:11:43 +08:00
Chi Lo	6cf080ccbf	Temporarily disable two tests for TRT EP (#15578 ) We are investigating an issue introduced by TRT 8.6 that causes [TRT EP CI](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=967950&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=66420422-c7d6-5f71-625c-4b7851c9b9ba) to fail. Disable two tests for now until the issue is root-caused and fixed.	2023-04-20 16:32:56 -07:00
Justin Chu	dfa06bf81b	Add link to doc for lintrunner in CI (#15604 ) Add a link to point to the doc where users can find instructions to set up lintrunner should there be any lint issues in CI.	2023-04-20 15:54:14 -07:00
Dmitri Smirnov	a5dec8eedf	[C# ] Improve string marshalling and reduce GC pressure (#15545 ) ### Description Reduce the number of auxiliary objects created to reduce GC pressure. Eliminate GCHandle-type memory pinning in most places. Improve string marshalling by allocating unmanaged memory that does not require pinning. Change native methods from `IntPtr` to `byte[]` (marshalling pinning is more efficient). Allocate input/output UTF-8 names on the unmanaged heap for the lifetime of InferenceSession, so we do not keep converting and pinning them on every Run. Introduce a new native API that allows allocating and converting/copying strings directly into a native tensor. The PR delivers around 50% latency improvement and fewer GC pauses. Inspired by: https://github.com/microsoft/onnxruntime/pull/15520 ### Motivation and Context Clients experience GC pressure and performance degradation when dealing with string tensors. Co-Authored-By: @tannergooding	2023-04-20 15:12:51 -07:00
Yufeng Li	373f912e51	add quantization support for whisper (#15589 ) ### Description Add dynamic quantization support for the Whisper model. There are 3 options to try out: - quantize_embedding_layer: whether to quantize the embedding layer of the decoder model - quantize_per_channel: quantize Gemm or MatMul per channel - quantize_reduce_range: use 7 bits to quantize MatMul or Gemm; use this when hitting accuracy issues on x64 CPUs without VNNI.	2023-04-20 14:22:11 -07:00
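The per-channel and reduced-range options in #15589 above can be pictured with a small pure-Python sketch. This is only an illustration of the general technique (symmetric linear quantization), not ONNX Runtime's quantizer; the helpers `quantize_symmetric` and `max_error`, and the example weights, are hypothetical.

```python
def quantize_symmetric(values, bits=8):
    """Symmetric linear quantization of a list of floats to signed integers.

    bits=8 gives the range [-127, 127]; bits=7 mimics a reduced range, [-63, 63].
    Returns (quantized values, scale). Illustration only.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale


def max_error(values, quantized, scale):
    """Largest absolute difference after dequantizing."""
    return max(abs(v - q * scale) for v, q in zip(values, quantized))


# Two output channels with very different magnitudes.
weights = [[0.01, -0.02, 0.015],
           [1.0, -2.0, 1.5]]

# Per-tensor: one scale shared by every channel.
flat = [v for row in weights for v in row]
q_flat, s_flat = quantize_symmetric(flat)
per_tensor_err = max_error(weights[0], q_flat[:3], s_flat)

# Per-channel: each row gets its own scale, so the small row keeps its precision.
q_row, s_row = quantize_symmetric(weights[0])
per_channel_err = max_error(weights[0], q_row, s_row)

# Reduced range: the same row quantized with 7 bits instead of 8.
q_row7, s_row7 = quantize_symmetric(weights[0], bits=7)
```

In this sketch the small-magnitude channel loses most of its resolution under a shared scale, which is the situation per-channel quantization is meant to help with; the 7-bit variant trades resolution for a narrower integer range.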
Edward Chen	4b74cb1741	Make docker command fail if bash command fails. (#15564 ) Add `set -e` so that failing bash commands will cause the containing docker command to fail.	2023-04-20 13:38:58 -07:00
Baiju Meswani	46210556f0	BatchnormInternal avoid setting num_channels if input shape is not known (#15544 )	2023-04-20 12:57:16 -07:00
Baiju Meswani	11b0a18de6	Add support for cuda 11.8 and python 3.11 for training (#15548 )	2023-04-20 12:56:45 -07:00
Justin Chu	1f7c2f724f	Fix lintrunner configurations (#15586 ) ### Description - Fix lintrunner configurations to always use `python` instead of `python3`. - Set up dependabot - Moved dependencies to requirements-lintrunner to allow dependabot to update it similar to https://github.com/onnx/onnx/pull/5124	2023-04-20 08:54:26 -07:00
Adrian Lizarraga	9df96c7d5b	[QNN EP] Fix shape inference of NHWC Resize (#15477 ) ### Description Adds schema for NHWC Resize that uses the default ONNX type/shape inferencing. ### Motivation and Context The QNN EP requires the Resize operator to be NHWC. Currently, the Resize operator fails type and shape inference because the current schema changes the input to NCHW, but the `scales` and `sizes` inputs remain in NHWC. This PR adds a schema for NHWC Resize that allows it to use the default ONNX type/shape inference while still remaining in the internal NHWC domain.	2023-04-20 07:25:25 -07:00
Scott McKay	446c478fbd	Add iOS Swift Package Manager support (#15297 ) ### Description Add Swift Package Manager (SPM) support for ORT based on #14621 - uses the existing objective-c bindings - some re-organization of the directory structure was required but the contents of the files are unchanged, apart from adjustments due to file movements Add tool for updating ORT native pod used in the SPM package Update CIs to use ORT native pod from build, and build/test using SPM ### Motivation and Context iOS developers are using SPM as much as cocoapods, so adding SPM means both are catered for.	2023-04-20 16:18:35 +10:00
Yi Zhang	64b63921a2	Retry the step of Start Android simulator (#15584 ) ### Description Retry once when there is a failure in `Start Android Simulator`. ### Motivation and Context `Start Android Simulator` isn't stable enough and the pipeline would hang. Many instances can be found in https://dev.azure.com/onnxruntime/onnxruntime/_pipeline/analytics/stageawareoutcome?definitionId=188&contextType=build	2023-04-20 12:06:35 +08:00
Yi Zhang	5b6f79e79b	Improve windows build cache steps (#15537 ) ### Description 1. Split deps' compilation cache and ort's 2. reduce the caches generation in merge branch. ### Motivation and Context Reduce pipeline cache stage.	2023-04-20 09:42:22 +08:00
Chen Fu	29d00fb776	Set proper default values for pool attributes (#15559 ) ### Description Set proper default values for the attributes of pool operators. ### Motivation and Context Fixed AB#14719 Global pooling and pooling operators usually share the same underlying implementation. When we detect that the operator is global, the code for setting up the attributes is skipped. This may cause non-deterministic behavior.	2023-04-19 17:24:35 -07:00
George Nash	f2889b41c1	[AMX] Update assembler check (#15501 ) A recent commit added an assembler check for whether the ASM dialect was ATT. This unfortunately broke the AMX build for systems that don't have the ASM-ATT dialect. This change assumes that if CMAKE_ASM-ATT_COMPILER_ID is not found and CMAKE_ASM_COMPILER_ID is "GNU", then, based on all the other checks that have already passed, AMX is supported by the compiler and assembler. ### Motivation and Context On my build system the recent change to add the ASM-ATT version check disabled AMX code from the build. --------- Signed-off-by: George Nash <george.nash@intel.com>	2023-04-19 14:16:26 -07:00
Chen Fu	142220ad87	Fix cmake 3.25 debug info config (#15565 ) ### Description https://github.com/microsoft/onnxruntime/pull/15538 breaks the Windows build on cmake 3.25 or earlier. This should fix it.	2023-04-19 09:14:19 -07:00
Yi Zhang	573e4cf95f	[Fix] Python Packaging Pipeline exception. (#15568 ) ### Description supplement of #15299 ### Motivation and Context It broke Python Packaging Pipeline since April 12.	2023-04-19 21:57:14 +08:00
PeixuanZuo	59ea35d592	[ROCm] add CK GroupNorm to GroupNormTunable (#15510 ) - Add CK GroupNorm to GroupNormTunable. - Reduce configuration of GroupNormNHWCOp because CK implementation is better. The performance gain on stable diffusion v1.5. Before: ``` 'height': 512 'width': 512 'steps': 50 'batch_size': 1 'batch_count': 5 'num_prompts': 1 'average_latency': 2.4782688856124877 'median_latency': 2.4783748388290405 'provider': 'ROCMExecutionProvider' 'disable_safety_checker': True ``` After: ``` 'height': 512, 'width': 512, 'steps': 50, 'batch_size': 1, 'batch_count': 5, 'num_prompts': 1, 'average_latency': 2.107170510292053, 'median_latency': 2.1067750453948975, 'first_run_memory_MB': -1, 'second_run_memory_MB': -1, 'provider': 'ROCMExecutionProvider', 'disable_safety_checker': True ```	2023-04-19 13:54:59 +08:00
Dmitri Smirnov	a66af390fa	[C#] Allow passing various options when creating singleton Environment object. (#14723 ) ### Description Re-work OrtEnv class so we can pass various options when creating the environment such as: - logId - initial logging level - thread options - user supplied logging function Create the default instance when SessionOptions are instantiated as users often forget to do so. ### Motivation and Context We lack this capability. Inspired by https://github.com/microsoft/onnxruntime/pull/13822 https://github.com/microsoft/onnxruntime/pull/13951 https://github.com/microsoft/onnxruntime/pull/11593 Cc: @thoron --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-04-18 21:49:55 -07:00
Chi Lo	6115c8fd1f	Add TRT plugins support using custom ops (#13847 ) This PR makes ORT support TRT plugins using custom ops. ORT TRT can automatically register all TRT plugins from the TRT plugin registry as custom ops. No code change is needed in ORT when new TRT plugins are introduced. The previous way for ORT to support TRT plugins was to use contrib ops, but there are some concerns about it: - Contrib ops are shipped as part of the ORT binary by default. TRT-related plugins should not be in the default ORT. - Contrib ops are designed for internal ops and developed for the CPU and CUDA EPs. Therefore, using custom ops is a good approach to support TRT plugins. The following are the major modifications: 1. Add a new `GetCustomOpDomainList` provider API which allows a provider to create its own custom op domain list that ORT can register. The provider is responsible for freeing all the custom op domain instances it created. 2. Move the OrtCustomOpDomain struct definition to framework_provider_common.h since this struct is now used by both the framework and EPs. 3. Several TRT plugins are registered as ONNX schema ops through contrib ops with the ONNX domain. To avoid breaking old models that use those TRT plugins and to maintain backward compatibility, we need to keep the old/legacy TRT plugins in the ONNX domain. Moving forward, all newly added TRT plugins should be registered with the `trt.plugins` domain. 4. TRT plugins don't have an API to get the number of inputs/outputs of the registered plugins, so ORT TRT uses variadic inputs/outputs to bypass the ONNX node validation. 5. Add a new TRT provider option, `trt_extra_plugin_lib_paths`, with which the user can specify any extra plugin libs, for example, `fastertransformer/build/lib/libvit_plugin.so` or `fastertransformer/build/lib/libvit_plugin.so;fastertransformer/build/lib/libvit_plugin_v2.so`	2023-04-18 20:24:32 -07:00
Yulong Wang	cb83d2b1a9	[js/web] allow script to use partial success build (#15547 ) ### Description allow script `npm run pull:wasm` to use partial success build.	2023-04-18 17:41:47 -07:00
kunal-vaishnavi	901c2bc384	Whisper Model Optimization (#15473 ) ### Description This PR contains fusion-level and kernel-level optimizations for [OpenAI's Whisper](https://github.com/openai/whisper). Some of the added optimizations include: - Pruning of duplicate/unnecessary inputs and outputs - Fusion support for Whisper models with or without these inputs/outputs (e.g. with these inputs/outputs if exporting with an older official Optimum version, without these inputs/outputs if exporting with Optimum from source) - Attention fusions - For Whisper's encoder and decoder - Modified symbolic shape inference for present output when no past input exists (for decoder) - Multi-head attention fusions - For Whisper's decoder and decoder with past - Packed MatMul for the 3 MatMuls excluded in multi-head attention fusion - Attention kernel changes - CPU: - Different Q and KV sequence lengths - Parallel memset for large sequence lengths - Convert broadcast add after MatMul of Q and K (add_qk) to element-wise add - Separate present key-value output into present key and present value (for multi-head attention spec) - CUDA: - Use memory efficient attention compute kernel with present state (for decoder) - Multi-head attention kernel changes - CPU: - Introduction of multi-head attention CPU kernel (previously did not exist) - Use AddBiasReshape instead of AddBiasTranspose when sequence length = 1 (for decoder with past) - Different Q, K, V input shapes - Pass past key and past value directly as key and value - CUDA: - Use memory efficient attention compute kernel with past and/or present state (for decoder with past) ### Usage To use the optimizations, run the ORT transformer optimizer script as follows: ``` $ cd onnxruntime/onnxruntime/python/tools/transformers/ $ python3 optimizer.py --input <filename>.onnx --output <filename>.onnx --model_type bart --num_heads <number of attention heads, depends on the size of the whisper model used> --hidden_size <attention hidden size, 
depends on the size of the whisper model used> --use_external_data_format --use_multi_head_attention ``` Once optimized, here's an example of how to run Whisper with [Hugging Face's Optimum](https://github.com/huggingface/optimum): ``` from transformers.onnx.utils import get_preprocessor from optimum.onnxruntime import ORTModelForSpeechSeq2Seq from optimum.pipelines import pipeline as ort_pipeline import whisper # Installed from OpenAI's repo - setup instructions at https://github.com/openai/whisper/ directory = './whisper_opt' # Where the optimized ONNX models are located model_name = 'openai/whisper-tiny' device = 'cpu' # Get pipeline processor = get_preprocessor(model_name) model = ORTModelForSpeechSeq2Seq.from_pretrained( directory, use_io_binding=(device == 'cuda'), provider='CPUExecutionProvider', ).to(device) pipe = ort_pipeline( "automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor, device=(-1 if device == 'cpu' else 0), ) # Load audio file and run pipeline audio = whisper.load_audio('tests/jfk.flac') audio = whisper.pad_or_trim(audio) outputs = pipe([audio]) print(outputs) ``` Note: In order to use these changes with Optimum, it is recommended to use Optimum from source to have the following changes: - https://github.com/huggingface/optimum/pull/872 - https://github.com/huggingface/optimum/pull/920 ### Motivation and Context This PR helps the following issues: - https://github.com/microsoft/onnxruntime/issues/15100 - https://github.com/microsoft/onnxruntime/issues/15235 - https://github.com/huggingface/optimum/issues/869 (work in progress) This PR can be used with the other currently merged Whisper PRs: - https://github.com/microsoft/onnxruntime/pull/15247 - https://github.com/microsoft/onnxruntime/pull/15339 - https://github.com/microsoft/onnxruntime/pull/15362 - https://github.com/microsoft/onnxruntime/pull/15365 - https://github.com/microsoft/onnxruntime/pull/15427 This PR uses changes 
from the following merged PRs: - https://github.com/microsoft/onnxruntime/pull/14198 - https://github.com/microsoft/onnxruntime/pull/14146 - https://github.com/microsoft/onnxruntime/pull/14201 - https://github.com/microsoft/onnxruntime/pull/14928 (this introduced the new multi-head attention spec)	2023-04-18 17:13:54 -07:00
Ye Wang	53d304d4d2	optimize gated gru cuda kernel (#15525 ) ### Description Improvement with Tulrv6 on A100 ![image](https://user-images.githubusercontent.com/52801275/232602055-518726da-3a9a-4e2e-8def-2cd855c8225d.png) --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-04-18 14:23:43 -07:00
Justin Chu	831734a46e	Fix lint errors missed due to new commits (#15558 ) Follow up of #15524	2023-04-18 12:55:02 -07:00
Yi Zhang	698e9f71cd	Improve cache hit rate in windows build (#15538 ) ### Description 1. Update /Zi to /Z7 in abseil project while using cache 2. Skip target_precompile_headers while using cache ### Motivation and Context About 1/4 of the calls in Windows GPU compilation with cache are uncacheable. ``` Uncacheable calls: 441 / 1641 (26.87%) Could not use precompiled header: 361 / 441 (81.86%) Preprocessing failed: 1 / 441 ( 0.23%) Unsupported compiler option: 79 / 441 (17.91%) ``` https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=961916&view=logs&j=5076e696-f193-5f12-2d8a-703dda41a79b&t=9b927034-e3ef-5e25-c6df-387bc37acd63&l=21 The root cause of `Unsupported compiler option` is that /Zi in Abseil isn't updated to /Z7. The root cause of `Could not use precompiled header` is that `target_precompile_headers` creates cmake_pch.pch every time, so its hash value changes too. ### Result It could reduce compilation time by another 20%. For example, CUDA training compilation on Windows took 16m43 before the change and takes 13m32 after it. https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=964002&view=logs&s=959c6b43-5937-53e5-5f36-e53cb0249117 ### N.B. The winml project uses its own target_precompiled_header https://github.com/microsoft/onnxruntime/blob/main/cmake/precompiled_header.cmake. Just let it be.	2023-04-18 09:31:35 -07:00
Justin Chu	cf19c3697d	Run clang-format in CI (#15524 ) ### Description Run clang-format in CI. Formatted all c/c++, objective-c/c++ files. Excluded ``` 'onnxruntime/core/mlas/', 'onnxruntime/contrib_ops/cuda/bert/tensorrt_fused_multihead_attention/', ``` because they contain assembly or are data heavy ### Motivation and Context Coding style consistency	2023-04-18 09:26:58 -07:00
Sheil Kumar	2700d01642	Add Bluestein Z-Chirp CPU EP implementation for the DFT operator (#15522 ) While the current DFT operator has an FFT implementation for signal lengths of size 2^N, for other lengths it only has a naive implementation for completeness' sake, so the non-power-of-2 case is very slow. The appropriate algorithm to use here is the Bluestein Z-Chirp algorithm, which evaluates a single DFT with 3 FFT calculations (2 forward and 1 inverse) and a chirp signal. Luckily, the chirp signal and one of these FFT operations can be precomputed (B). The resulting computation performs multiple DFTs on longer signals, but in the end is faster because the individual sub-DFT computations can leverage the faster FFT implementation under the hood. --------- Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>	2023-04-18 09:06:05 -07:00
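The Bluestein approach described in #15522 above can be sketched in a few lines of pure Python. This is a textbook-style illustration, not the ONNX Runtime CPU kernel: the function names `fft_pow2` and `bluestein_dft` are made up for this example, and the radix-2 FFT here is a simple recursive one chosen for brevity.

```python
import cmath


def fft_pow2(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], inverse)
    odd = fft_pow2(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """DFT of arbitrary length n via two forward FFTs and one inverse FFT."""
    n = len(x)
    # Chirp w[k] = exp(-i*pi*k^2/n); k^2 is reduced mod 2n to keep the angle small.
    chirp = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]

    # Pad to a power of two m >= 2n - 1 so the circular convolution does not wrap.
    m = 1
    while m < 2 * n - 1:
        m *= 2

    a = [x[k] * chirp[k] for k in range(n)] + [0j] * (m - n)
    b = [0j] * m
    b[0] = chirp[0].conjugate()
    for k in range(1, n):
        b[k] = b[m - k] = chirp[k].conjugate()

    fa = fft_pow2(a)  # forward FFT of the chirp-modulated input
    fb = fft_pow2(b)  # forward FFT of the chirp filter: depends only on n
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], inverse=True)

    # Undo the unnormalised inverse (divide by m) and apply the output chirp.
    return [chirp[k] * conv[k] / m for k in range(n)]


spectrum = bluestein_dft([1.0, 2.0, 3.0, 4.0, 5.0])  # length 5 is not a power of two
```

The sketch relies on the identity jk = (j² + k² − (k − j)²) / 2, which turns the DFT sum into a convolution with a chirp. Since `chirp` and `fb` depend only on the signal length, they could be computed once and reused for every signal of that length.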
liqun Fu	919d8f2660	update with onnx main (#14929 )	2023-04-18 08:42:51 -07:00
pengwa	d8dfda2e08	Minor fix for differently scoped cpu_ep usage (#15550 ) ### Minor fix for differently scoped cpu_ep cpu_ep is under `#ifndef DISABLE_CONTRIB_OPS`, but one of its usages is not under the same condition. ``` #ifndef DISABLE_CONTRIB_OPS const InlinedHashSet<std::string_view> cpu_ep = {onnxruntime::kCpuExecutionProvider}; #endif ``` ### Motivation and Context Postmortem: https://github.com/microsoft/onnxruntime/pull/15461 passed all CIs except Linux/Windows TVM CIs. I did not check the detailed error messages at the time because those CIs had already been failing for at least a few days. On checking the details, after PR 15461 the error message changed from the following (TVM CI error message before the constant sharing change): ``` https://github.com/microsoft/onnxruntime/actions/runs/4700368634/jobs/8334955814 ERROR: testBooleanInputs (__main__.TestInferenceSession) ---------------------------------------------------------------------- Traceback (most recent call last): File "onnxruntime_test_python.py", line 617, in testBooleanInputs sess = onnxrt.InferenceSession(get_name("logicaland.onnx"), providers=available_providers) File "D:\a\onnxruntime\onnxruntime\build\Release\Release\onnxruntime\capi\onnxruntime_inference_collection.py", line 383, in __init__ self._create_inference_session(providers, provider_options, disabled_optimizers) File "D:\a\onnxruntime\onnxruntime\build\Release\Release\onnxruntime\capi\onnxruntime_inference_collection.py", line 435, in _create_inference_session sess.initialize_session(providers, provider_options, disabled_optimizers) onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: D:\a\onnxruntime\onnxruntime\onnxruntime\core\providers\tvm\tvm_api.cc:49 onnxruntime::tvm::TVMCompile compile != nullptr was false. Unable to retrieve 'tvm_onnx_import_and_compile'. 
``` to ``` D:\a\onnxruntime\onnxruntime\onnxruntime\core\optimizer\graph_transformer_utils.cc(213,67): error C2065: 'cpu_ep': undeclared identifier [D:\a\onnxruntime\onnxruntime\build\Release\onnxruntime_optimizer.vcxproj] D:\a\onnxruntime\onnxruntime\onnxruntime\core\optimizer\graph_transformer_utils.cc(213,19): error C2672: ``` This PR fixes the build issue. The error messages of the Windows/Linux TVM CIs are back to the original ones.	2023-04-18 16:51:11 +08:00
PeixuanZuo	8bec6cd029	Refactor FusedConv test (#15512 ) Refactor FusedConv test.	2023-04-18 15:22:31 +08:00
Justin Chu	9d26f8f4fe	Use os.fspath on Path (#15530 ) ### Description Use os.fspath instead of str() on a path object. ### Motivation and Context I learned today that os.fspath is the right way to go: https://github.com/charliermarsh/ruff/issues/3675#issuecomment-1494975508	2023-04-17 16:59:40 -07:00
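A short sketch of why `os.fspath` can be preferable to `str()` for path objects, as #15530 above suggests. The file name and the helper `to_path_string` are placeholders invented for this example.

```python
import os
from pathlib import Path

model_path = Path("models") / "model.onnx"  # placeholder path for illustration

# For a real path object, both conversions give the same string.
as_fspath = os.fspath(model_path)
as_str = str(model_path)


def to_path_string(value):
    """Return the file-system string for value, or None if it is not path-like."""
    try:
        return os.fspath(value)
    except TypeError:
        return None


# str() converts anything, so a stray None silently becomes the string "None";
# os.fspath() raises TypeError for objects that are not str, bytes, or os.PathLike.
silently_wrong = str(None)
rejected = to_path_string(None)
```

The difference only shows up on bad input: `os.fspath` fails early when something that is not a path reaches the call site, while `str()` produces a plausible-looking but wrong file name.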
Zhang Lei	a30b57da6e	Fix/Enhance convert_generation tool for SkipLayerNorm, op_block_list... (#15368 ) Now that SkipLayerNorm uses fp32 for internal calculation and a numerically stable algorithm, enable it for fp16 here. Make op_block_list a command-line argument to help future tools. Other minor changes.	2023-04-17 14:44:37 -07:00
Justin Chu	a36caba073	Bump ruff in CI (#15533 ) ### Description Bump ruff version in CI and fix new lint errors. - This change enables the flake8-implicit-str-concat rules, which help detect unintended string concatenations: https://beta.ruff.rs/docs/rules/#flake8-implicit-str-concat-isc - Update gitignore to include common python files that we want to exclude. ### Motivation and Context Code quality	2023-04-17 10:11:44 -07:00
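For context on the kind of bug the implicit string concatenation rules mentioned in #15533 above are meant to surface, here is a tiny sketch. The list contents are arbitrary example strings, not taken from the commit.

```python
# A missing comma makes Python join the two adjacent literals into one string.
providers = ["CUDAExecutionProvider" "CPUExecutionProvider"]

# What was almost certainly intended: two separate entries.
intended = ["CUDAExecutionProvider", "CPUExecutionProvider"]

count_with_bug = len(providers)
count_intended = len(intended)
```

The buggy list is valid Python and raises no error, which is why a lint rule is useful for catching it.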


8616 commits