onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-14 20:48:00 +00:00

Author	SHA1	Message	Date
Yulong Wang	c5d28cac4d	Initial WebGPU EP checkin (#22318 ) ### Description This change introduces the WebGPU EP into ONNX Runtime. To make the PR as simple as possible, this PR excluded the following: - C API changes for WebGPU EP - actual implementation of WebGPU EP. Currently in this PR, WebGPU is a stub implementation that does not register any kernel. - Python IO Binding update - Node.js IO Binding update This PR now contains only 43 file changes (while the working branch contains 130+) and hopefully this makes it easier to review. There is going to be separated PRs for each mentioned above. Current working branch: #21904	2024-10-08 16:10:46 -07:00
Dmitri Smirnov	d9de054eb5	Multi-Lora support (#22046 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-09-30 15:59:07 -07:00
George Wu	944d87381d	[QNN EP] set up py packaging pipeline for Linux x64 (#22132 ) set up a pipeline to produce nightly Linux x64 whls for onnxruntime-qnn this can be used for offline context binary generation.	2024-09-18 23:24:32 -07:00
Tianlei Wu	a9740d6f96	Add onnx export script for segment anything v2 (#22119 ) ### Description Add ONNX export script for segment anything v2 (SAM2). ### Limitations * Does not support video. Only support image right now. * The decoder does not support batch inference. ### Credits The demo that is based on [SAM2 notebook](https://github.com/facebookresearch/segment-anything-2/blob/main/notebooks/image_predictor_example.ipynb), and modified to run with ORT. The export of decoder is inspired by https://github.com/vietanhdev/samexporter. ### Demo Example output of demo: ![sam2_demo](https://github.com/user-attachments/assets/9a9fa360-8c20-482e-9935-a7aba9cf15de) ### Motivation and Context For support optimization of SAM2 image segmentation.	2024-09-18 14:31:59 -07:00
Ranjit Ranjan	02e3a430af	[AIX] Python binding enablement and gcc support (#21934 ) ### Description Enabling python binding and gcc support for AIX. ### Motivation and Context Code changes in this PR contains: 1. python binding enablement 2. gcc building support Below are list of files and the description. 1. cmake/CMakeLists.txt [gcc building support] -no-unused-function compiler flag addition for IBMClang 2. cmake/external/eigen.cmake [gcc building support] AIX check for applying the AIX patch 3. cmake/onnxruntime_python.cmake [python binding ] putting NOT AIX check for -Xlinker 4. cmake/onnxruntime_unittests.cmake [gcc building support] Fix for gtest behavior. Check the comment . [python binding ] using -Wl,-brtl for linking onnxruntime_providers_shared in test_execution_provider 5. cmake/patches/eigen/eigen-aix.patch [gcc building support] In AIX gcc, we are hitting __builtin_cpu_supports("mma") which is not supported yet. So patching code for this method . Patched code will check for P10 Processor at run-time and based on that routine will be set. 6. onnxruntime/python/onnxruntime_validation.py [python binding ] Adding AIX check in check_distro_info() 7. onnxruntime/test/providers/cpu/generator/random_test.cc [gcc building support] updating previous check for AIX , along with clang. So in case of gcc, else block will hit. 8. onnxruntime/test/python/onnxruntime_test_python.py [python binding ] powerpc check on platform.processor() 9. setup.py [python binding ] Adding AIX check for list of libs.	2024-08-30 12:17:26 -07:00
Edward Chen	a5ce65d87a	Clean up some mobile package related files and their usages. (#21606 ) The mobile packages have been removed.	2024-08-05 16:38:20 -07:00
Julius Tischbein	1391354265	Adding CUDNN Frontend and use for CUDA NN Convolution (#19470 ) ### Description Added CUDNN Frontend and used it for NHWC convolutions, and optionally fuse activation. #### Backward compatible - For model existed with FusedConv, model can still run. - If ORT is built with cuDNN 8, cuDNN frontend will not be built into binary. Old kernels (using cudnn backend APIs) are used. #### Major Changes - For cuDNN 9, we will enable cudnn frontend to fuse convolution and bias when a provider option `fuse_conv_bias=1`. - Remove the fusion of FusedConv from graph transformer for CUDA provider, so there will not be FusedConv be added to graph for CUDA EP in the future. - Update cmake files regarding to cudnn settings. The search order of CUDNN installation in build are like the following: * environment variable `CUDNN_PATH` * `onnxruntime_CUDNN_HOME` cmake extra defines. If a build starts from build.py/build.sh, user can pass it through `--cudnn_home` parameter, or by environment variable `CUDNN_HOME` if `--cudnn_home` not used. * cudnn python package installation directory like python3.xx/site-packages/nvidia/cudnn * CUDA installation path #### Potential Issues - If ORT is built with cuDNN 8, FusedConv fusion is no longer done automatically, so some model might have performance regression. If user still wants FusedConv operator for performance reason, they can still have multiple ways to walkaround: like use older version of onnxruntime; or use older version of ORT to save optimized onnx, then run with latest version of ORT. We believe that majority users have moved to cudnn 9 when 1.20 release (since the default in ORT and PyTorch is cudnn 9 for 3 months when 1.20 release), so the impact is small. - cuDNN graph uses TF32 by default, and user cannot disable TF32 through the use_tf32 cuda provider option. If user encounters accuracy issue (like in testing), user has to set environment variable `NVIDIA_TF32_OVERRIDE=0` to disable TF32. Need update the document of use_tf32 later. #### Follow ups This is one of PRs that target to enable NHWC convolution in CUDA EP by default if device supports it. There are other changes will follow up to make it possible. (1) Enable `prefer_nhwc` by default for device with sm >= 70. (2) Change `fuse_conv_bias=1` by default after more testing. (3) Add other NHWC operators (like Resize or UpSample). ### Motivation and Context The new CUDNN Frontend library provides the functionality to fuse operations and provides new heuristics for kernel selection. Here it fuses the convolution with the pointwise bias operation. On the [NVIDIA ResNet50](https://pytorch.org/hub/nvidia_deeplearningexamples_resnet50/) we get a performance boost from 49.1144 ms to 42.4643 ms per inference on a 2560x1440 input (`onnxruntime_perf_test -e cuda -I -q -r 100-d 1 -i 'prefer_nhwc\|1' resnet50.onnx`). --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Maximilian Mueller <maximilianm@nvidia.com>	2024-08-02 15:16:42 -07:00
Tianlei Wu	2b7e2a5bd0	[CUDA] Fix cuda provider fallback inconsistency (#21425 ) * Fix fallback setting (cuda still falls back to cuda). * Fix cuda provider fallback inconsistent with/without CUDA_PATH environment variable. * Add cuda and cudnn major version requirement in error message. Example result in Windows: ``` >>> import onnxruntime >>> ort_session = onnxruntime.InferenceSession("model.onnx", providers=['CUDAExecutionProvider', 'CPUExecutionProvider']) 2024-07-19 17:43:44.2260019 [E:onnxruntime:Default, provider_bridge_ort.cc:1972 onnxruntime::TryGetProviderInfo_CUDA] D:\onnxruntime\onnxruntime\core\session\provider_bridge_ort.cc:1636 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\.conda\envs\py310\lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll" 2024-07-19 17:43:44.2312351 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:970 onnxruntime::python::CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12., and the latest MSVC runtime. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported. >>> ort_session <onnxruntime.capi.onnxruntime_inference_collection.InferenceSession object at 0x0000016BB2DF7D60> >>> ort_session.get_providers() ['CPUExecutionProvider'] ``` Example result in Linux: ``` >>> import onnxruntime >>> ort_session = onnxruntime.InferenceSession("resnet50-v2-7.onnx", providers=['CUDAExecutionProvider', 'CPUExecutionProvider']) 2024-07-20 20:33:26.486974543 [E:onnxruntime:Default, provider_bridge_ort.cc:1972 TryGetProviderInfo_CUDA] /work/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1636 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.12: cannot open shared object file: No such file or directory 2024-07-20 20:33:26.487034646 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:961 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9. and CUDA 12.*. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported. >>> ort_session.get_providers() ['CPUExecutionProvider'] ``` ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/21424	2024-07-23 11:58:04 -07:00
Baiju Meswani	116398c1a4	onnxruntime shared lib inside python package (#21223 )	2024-07-02 15:37:50 -07:00
Changming Sun	d1ab94c2b0	Add compatibility for NumPy 2.0 (#21085 ) ### Description As suggested by SciPy's doc, we will `Build against NumPy 2.0.0, then it will work for all NumPy versions with the same major version number (NumPy does maintain backwards ABI compatibility), and as far back as NumPy 1.19 series at the time of writing` I think it works because in [numpyconfig.h#L64](https://github.com/numpy/numpy/blob/main/numpy/_core/include/numpy/numpyconfig.h#L64) there is a macro NPY_FEATURE_VERSION. By default it is set to NPY_1_19_API_VERSION. And the NPY_FEATURE_VERSION macro controls ABI. This PR only upgrade the build time dependency; When a user installs ONNX Runtime, they still can use numpy 1.x. ### Motivation and Context Recently numpy published a new version, 2.0.0, which is incompatible with the latest ONNX Runtime release.	2024-06-27 13:50:53 -07:00
Changming Sun	be423747b1	Delete pyop (#21094 ) ### Description Remove the "--enable_language_interop_ops" build flag, because the code is incompatible with the latest numpy, and the build flag is not used anywhere except a macOS CI pipeline. It does not seem to have a ship plan. ### Motivation and Context The build error was: ``` onnxruntime/core/language_interop_ops/pyop/pyop.cc:122:85: error: no member named 'elsize' in '_PyArray_Descr' static_cast<int64_t>(PyArray_DescrFromType(type)->elsize), ~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ```	2024-06-19 16:21:33 -07:00
Scott McKay	159fe9d4f3	Update to mobile model usability checker (#19843 ) ### Description <!-- Describe your changes. --> - Add check for CoreML MLProgram supported ops - Only check usability with ORT Mobile package if requested - this package will be deprecated so info is a) of minimal value and b) can be confusing. - Output more things at INFO level - a lot of meaningful info was only output at DEBUG level. The default INFO level is more useful - dump full partition info at DEBUG level - Check subgraphs fully - CoreML can handle a subgraph - TBD if we want to add support for adding a subgraph to the parent graph for Loop and If nodes - most likely will be required for simple If nodes to be performant - Check 5D CoreML limitation ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve helper tools --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-06-18 07:50:33 +10:00
George Wu	49b2bebe85	[qnn ep] include qnn sdk in onnxruntime-qnn python whl (#20485 ) script changes to include qnn sdk libs with onnxruntime-qnn python package.	2024-04-29 09:44:54 -07:00
Rachel Guo	14fcf0a52d	Support visionos build (#20365 ) ### Description <!-- Describe your changes. --> This PR supports a build of onnxruntime.xcframework for xros/xrsimulator for visionos via the build command of `python3 tools/ci_build/github/apple/build_apple_framework.py --config Release/Debug tools/ci_build/github/apple/default_vision_os_framework_build_settings.json`. For officially include visionos in ios cocoapods package and testing in CI, would require separate work for upgrading the Xcode version & upgrade macOS CI agent to macos-13-arm64 or higher. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> visionos support: https://github.com/microsoft/onnxruntime/discussions/19313 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2024-04-23 18:15:07 -07:00
Adam Louly	ee74fb6908	Introducing ORTPipelineModule - DeepSpeed Parallel Pipeline Support. (#20287 ) ### Description Introducing a new class ORTPipelineModule to handle wrapping layers in DeepSpeed pipeline parallel. ### Motivation and Context To support pipeline parallelism on ORTModule. This PR will include an initial support of deepspeed Pipeline parallelism. - [x] Support Pipeline parallel where layers are nn Modules in Sequential. - [ ] Support LayerSpec and TiedLayerSpec - [ ] Enable partitioning to accept List - [ ] Full-GPU Graph Consolidation - [ ] Subgraph Merging for Inference	2024-04-18 11:30:15 -07:00
George Wu	08d208b969	[QNN EP] refactor QNN deps/copy logic. start copying deps to target python loc… (#20317 ) copy QNN deps when building python bindings as well. tweak the wildcard to only copy QNN related files. latest sdk from Qualcomm (>= 2.21) also include SNPE dll's which we don't want to include.	2024-04-15 22:33:12 -07:00
Maximilian Müller	c20ced4132	Use CMake's find package for CUDA libs (#19673 ) ### Description Answers issue #19640 More details are in the issue, basically I am changing all the include directory and link directory usage to CMake's `CUDA::*` targets	2024-02-27 11:26:48 -08:00
Ye Wang	aaf32fb1b1	phi2 conversion/optimization script (#19338 ) ### Description <!-- Describe your changes. --> This PR adds onnx conversion script for dynamo exported phi2, optimization script, and inference example script A readme file is added as documentation. https://github.com/microsoft/onnxruntime/tree/wangye/phi2_doc/onnxruntime/python/tools/transformers/models/phi2#readme ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-02-05 10:15:16 -08:00
Yueqing Zhang	1d6f13fb92	[VitisAI] Refactor the VAIEP to use MSFT's standalone API (#19058 ) ### Description <!-- Describe your changes. --> Refactor the VAIEP to use MSFT's standalone API ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Vitis ONNX RT VAI should switch to using the standalone API for ONNX EPs in order to decouple the EP from onnxruntime.dll and the providers.dll. This will help to simplify customer deployment of applications and use cases that need to share their onnxruntime.dll with other applications. --------- Co-authored-by: Zhenze Wang <zhenzew@xilinx.com> Co-authored-by: zz002 <zhenze.wang@amd.com>	2024-01-31 21:08:26 -08:00
Wei-Sheng Chin	658e30eb33	Remove DORT since it's in PyTorch main now (#18996 ) Main code are removed and tests are modified to use DORT directly from PyTorch.	2024-01-04 12:59:47 -08:00
Adrian Lizarraga	81796a3081	[QNN EP Quantization] Add fusion preprocessing to QNN quantization (#18719 ) ### Description - Adds graph fusions to preprocessing step that can be called before creating a QDQ model for QNN EP. - Fuse Erf sequence to Gelu (adapted from [optimizer.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/fusion_gelu.py)). Required by QNN EP. - Fuse ReduceMean sequence to LayerNormaliation (adapted from [optimizer.py](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/fusion_layernorm.py)). Not required by QNN EP. - Fuse ReduceL2 sequence to LpNormalization (new, specific to QNN EP). Required by QNN EP. Example use: ```python3 from quantization.execution_providers.qnn import get_qnn_qdq_config, qnn_preprocess_model # Added by this PR: model_updated = qnn_preprocess_model("model.fp32.onnx", "model.fp32.preprocessed.onnx", fuse_layernorm=True) model_to_quantize = "model.fp32.preprocessed.onnx" if model_updated else "model.fp32.onnx" # Quantize model ... qnn_config = get_qnn_qdq_config(model_to_quantize, data_reader, activation_type=QuantType.QUInt16) quantize(model_to_quantize, "model.qdq.onnx", qnn_config) ``` ### Motivation and Context Allow more models to be quantized for use with QNN EP --------- Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>	2023-12-12 08:43:04 -08:00
Adrian Lizarraga	e066fca777	[Quantization] Tensor quant overrides and QNN EP quantization configuration (#18465 ) ### Description #### 1. Adds `TensorQuantOverrides` extra option Allows specifying a dictionary of tensor-level quantization overrides: ``` TensorQuantOverrides = dictionary : Default is {}. Set tensor quantization overrides. The key is a tensor name and the value is a list of dictionaries. For per-tensor quantization, the list contains a single dictionary. For per-channel quantization, the list contains a dictionary for each channel in the tensor. Each dictionary contains optional overrides with the following keys and values. 'quant_type' = QuantType : The tensor's quantization data type. 'scale' = Float : The scale value to use. Must also specify `zero_point` if set. 'zero_point' = Int : The zero-point value to use. Must also specify `scale` is set. 'symmetric' = Bool : If the tensor should use symmetric quantization. Invalid if also set `scale` or `zero_point`. 'reduce_range' = Bool : If the quantization range should be reduced. Invalid if also set `scale` or `zero_point`. 'rmax' = Float : Override the maximum real tensor value in calibration data. Invalid if also set `scale` or `zero_point`. 'rmin' = Float : Override the minimum real tensor value in calibration data. Invalid if also set `scale` or `zero_point`. ``` - All of the options are optional. - Some combinations are invalid. - Ex: `rmax` and `rmin` are unnecessary if the `zero_point` and `scale` are also specified. Example for per-tensor quantization overrides: ```Python3 extra_options = { "TensorQuantOverrides": { "SIG_OUT": [{"scale": 1.0, "zero_point": 127}], "WGT": [{"quant_type": quantization.QuantType.QInt8, "symmetric": True, "reduce_range": True}], "BIAS": [{"quant_type": quantization.QuantType.QInt8, "symmetric": True, "reduce_range": True}], }, } ``` Example for per-channel quantization overrides (Conv weight and bias): ```Python3 extra_options = { "TensorQuantOverrides": { "WGT": [ { "quant_type": quantization.QuantType.QUInt8, "rmin": 0.0, "rmax": 2.5, "reduce_range": True, }, { "quant_type": quantization.QuantType.QUInt8, "rmin": 0.2, "rmax": 2.55, "reduce_range": False, }, ], "BIAS": [ {"zero_point": 0, "scale": 0.000621}, {"zero_point": 0, "scale": 0.23}, ], }, } ``` #### 2. Adds utilities to get the default QDQ configs for QNN EP Added a `quantization.execution_providers.qnn.get_qnn_qdq_config` method that inspects the model and returns suitable quantization configurations. Example usage: ```python3 from quantization import quantize, QuantType from quantization.execution_providers.qnn import get_qnn_qdq_config qnn_config = get_qnn_qdq_config(input_model_path, data_reader, activation_type=QuantType.QUInt16, weight_type=QuantType.QUInt8) quantize(input_model_path, output_model_path, qnn_config) ``` ### Motivation and Context Make it possible to create more QDQ models that run on QNN EP. --------- Signed-off-by: adrianlizarraga <adlizarraga@microsoft.com>	2023-12-04 17:54:58 -08:00
Akshay Sonawane	97cc40d75a	Add fusion patterns for conformer-transducer model (#18461 ) ### Description Add conformer-transducer model type to optimizer. This PR adds pattern matches for attention shown below: Unfused attention: ![ct_unfused](https://github.com/microsoft/onnxruntime/assets/111780983/46c71ed8-67e0-4607-85b1-bcadba5a2956) Fused attention: ![ct_fused](https://github.com/microsoft/onnxruntime/assets/111780983/fbb91c96-0d4b-4f0b-8674-1ae3b9b9a92e)	2023-11-18 23:39:04 -08:00
Ashwini Khade	02333293de	Removed all the deprecated python training code and related tests and utils (#18333 ) ### Description Motivation for this PR is code cleanup. 1. Remove all deprecated python code related to orttrainer, old checkpoint, related tests and utils 2. Cleanup orttraining_pybind_state.cc to remove all deprecated bindings.	2023-11-17 18:19:21 -08:00
Vincent Wang	b7408f7389	[ORTModule] ATen Efficient Attention and Triton Flash Attention (#17959 ) This PR is to support efficient attention and flash attention in ORTModule, including: - Use ATen to call efficient attention, which requires PyTorch 2.2.0 dev or newer. ORTMODULE_USE_EFFICIENT_ATTENTION=1 to enable. - Integrate Triton Flash attention, which requires triton==2.0.0.dev20221202. Need A100 or H100. ORTMODULE_USE_FLASH_ATTENTION=1 to enable. - A python transformer tool to match sub-graph by config and write transformer quickly. Current transformers supports attention mask for both efficient attn and flash attn, and dropout for efficient attn only. To support more training scenarios (such as causal mask in GPT2), more transformers need to be added. The feature is guarded by system environment variables, it won't effect any current behavior if not enabled. Since it requires specific PyTorch/Triton versions, related tests is not added for now.	2023-10-27 10:29:27 +08:00
kunal-vaishnavi	edac3ef150	Add LLaMA scripts (#17020 ) ### Description This PR adds the following scripts for LLaMA: - LLaMA conversion (support for TorchScript and Dynamo exporters) - LLaMA parity - LLaMA benchmark - LLaMA quantization - LLaMA integration with [Hugging Face Optimum](https://github.com/huggingface/optimum) ### Motivation and Context This PR adds scripts for using LLaMA. There is a [follow-up PR](https://github.com/microsoft/onnxruntime/pull/17043) for adding scripts for Whisper.	2023-08-22 18:05:11 -07:00
pengwa	a6887f171f	Refactor schema extraction and output unflattening (#16894 ) ### Motivation and Context When we handle PyTorch models' inputs in different places (ORTModule or others), it's common for us to flatten a structured data into a 1-D tensor list (required by lib for example torch.onnx.export, torch.autograd.Function.forward or ORT inference session), then do subsequent work, then unflatten back to original hierarchy as returned values. DeepStage3 hooks support work also need such a lib to do similar things, so I was proposing to extract this pair of APIs in training/utils/, which can be more used more generally. Also a comprehensive set of test data are used for testing unflatten/flatten in unit tests. Let me know if you have any other suggestions. ### Refactor schema extraction and output unflattening Move `_extract_schema` and `unflatten_user_output` in `orttraining/orttraining/python/training/ortmodule/_io.py` . to `extract_data_and_schema` and `unflatten_data_using_schema` in `orttraining/orttraining/python/training/utils/torch_io_helper.py` as shared libs, which can be used later by other features (deepspeed stage 3 hook rewrite). While there are still a few duplicated logic handling flatten with different task by recursively loop the data struct, will change them step by step in case of heavy review efforts.	2023-08-04 13:58:21 +08:00
Vincent Wang	c07a3b869c	Triton Codegen for ORTModule (#15831 ) Fuse connected elementwise and reduce Ops to TritonOp and codegen triton code to run the kernel. This PR is co-edited by @wejoncy and @er3x3	2023-07-13 18:17:58 +08:00
Edward Chen	6be7b03e53	Enable `-Wshorten-64-to-32` warning if available. (#16524 ) - Fix some warnings from Xcode build (`-Wshorten-64-to-32`). - Enable `-Wshorten-64-to-32` warning if available. Currently it's not fully enabled for `onnxruntime_test_all` and `onnxruntime_providers_xnnpack` yet. - Some clean up in build.py including setting CMake generator more consistently.	2023-07-07 08:11:44 -07:00
saurabh	a6ce7b339f	Enable model subgraph execution in OVEP and setting the OpenVINO dll's to the path from the OpenVINO pypi packge in OVEP and fix OVEP windows io buffer sample (#16147 ) ### Description This PR enables execution of subgraphs in OVEP and currently, when OVEP developers install the onnxruntime-openvino package on windows from pypi, they would have to additionally download OpenVINO windows binaries and run the setupvars.bat script which sets the environment PATH to locate the OV dll's. Also this PR fixes issues of OVEP windows io buffer sample. ### Motivation and Context Fix: We want to make the user experience easy for OVEP Python developers on windows platform. This fix, introduces a function add_openvino_libs_to_path at the location tools/python/util/add_openvino_win_libs.py. The above function, can be called by OVEP python users in the application code and that takes care of setting the OpenVINO dll's to the path from the OpenVINO pypi packge (openvino) which was installed. This change also makes sure that add_openvino_libs_to_path() function is added to onnxruntime python package only when it is build for OpenVINO Execution Provider for ONNXRuntime and not for default ORT python package builds. New user experience for Python OVEP developers on windows platform: step 1: pip install onnxruntime-openvino step 2: pip install openvino step 3: <Add these 2 lines in the application code> import onnxruntime.tools.add_openvino_win_libs as utils utils.add_openvino_libs_to_path() --------- Signed-off-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2023-06-16 19:47:09 -07:00
Sumit Agarwal	70d2dc8209	[DML EP] Fix issue with --dml_path build option (#15972 ) ### Description DML_PACKAGE_DIR cmake variable is not getting set properly when dml_path build options is used. ### Motivation and Context - Why is this change required? What problem does it solve? It is required for DML Perf dashboard. <!--- If it fixes an open issue, please link to the issue here. -->	2023-05-24 19:20:40 -05:00
Changming Sun	0204594f90	Cleanup WASM cmake code (#15996 ) ### Description Remove the "onnxruntime_BUILD_WEBASSEMBLY" cmake option. Use `if (CMAKE_SYSTEM_NAME STREQUAL "Emscripten")` instead. It makes some code look more nature. For example, ```cmake if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR onnxruntime_BUILD_WEBASSEMBLY) ``` becomes ```cmake if (CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Android" OR CMAKE_SYSTEM_NAME STREQUAL "Emscripten") ```	2023-05-20 18:07:39 -07:00
Baiju Meswani	ba7b83ff3c	Remove onnxruntime_PYBIND_EXPORT_OPSCHEMA definition from onnxruntime (#15776 )	2023-05-03 13:08:35 -07:00
Changming Sun	5352f6d9b0	Make "--cuda_version" build arg optional (#15758 ) ### Description This change will allow us building CUDA EP without installing CUDA SDK on Windows. ### Motivation and Context Nvidia's CUDA installer comes with a VS extension. In the past, we require installing the extension. It is a little bit inconvenient since: 1. Visual Studio must be installed before CUDA SDK. CUDA's installer will not install the extension if your machine doesn't have Visual Studio. 2. We need to install CUDA SDK on our build machines, instead of just downloading it and using it. After this change, we will not need to install CUDA SDK on our build machines. So it will be easier to add a support for a different CUDA version. Also, fix two PreFast warnings.	2023-05-01 18:00:47 -07:00
Sumit Agarwal	4c4f688a93	[DML EP] Fix dml_external_project (#15656 ) ### Description While building ORT for DML EP with `dml_EXTERNAL_PROJECT` flag, 2 variables (`DML_SHARED_LIB`, `DML_PACKAGE_DIR`) value is not set properly. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-05-01 12:02:56 -07:00
kunal-vaishnavi	cfb8c0e2ca	Add Whisper custom export to wheel (#15685 ) ### Description This PR adds the Whisper custom export scripts to the wheel. ### Motivation and Context This enables access to the custom export scripts in the wheel.	2023-04-26 10:45:52 -07:00
kunal-vaishnavi	901c2bc384	Whisper Model Optimization (#15473 ) ### Description This PR contains fusion-level and kernel-level optimizations for [OpenAI's Whisper](https://github.com/openai/whisper). Some of the added optimizations include: - Pruning of duplicate/unnecessary inputs and outputs - Fusion support for Whisper models with or without these inputs/outputs (e.g. with these inputs/outputs if exporting with an older official Optimum version, without these inputs/outputs if exporting with Optimum from source) - Attention fusions - For Whisper's encoder and decoder - Modified symbolic shape inference for present output when no past input exists (for decoder) - Multi-head attention fusions - For Whisper's decoder and decoder with past - Packed MatMul for the 3 MatMuls excluded in multi-head attention fusion - Attention kernel changes - CPU: - Different Q and KV sequence lengths - Parallel memset for large sequence lengths - Convert broadcast add after MatMul of Q and K (add_qk) to element-wise add - Separate present key-value output into present key and present value (for multi-head attention spec) - CUDA: - Use memory efficient attention compute kernel with present state (for decoder) - Multi-head attention kernel changes - CPU: - Introduction of multi-head attention CPU kernel (previously did not exist) - Use AddBiasReshape instead of AddBiasTranspose when sequence length = 1 (for decoder with past) - Different Q, K, V input shapes - Pass past key and past value directly as key and value - CUDA: - Use memory efficient attention compute kernel with past and/or present state (for decoder with past) ### Usage To use the optimizations, run the ORT transformer optimizer script as follows: ``` $ cd onnxruntime/onnxruntime/python/tools/transformers/ $ python3 optimizer.py --input <filename>.onnx --output <filename>.onnx --model_type bart --num_heads <number of attention heads, depends on the size of the whisper model used> --hidden_size <attention hidden size, depends on the size of the whisper model used> --use_external_data_format --use_multi_head_attention ``` Once optimized, here's an example of how to run Whisper with [Hugging Face's Optimum](https://github.com/huggingface/optimum): ``` from transformers.onnx.utils import get_preprocessor from optimum.onnxruntime import ORTModelForSpeechSeq2Seq from optimum.pipelines import pipeline as ort_pipeline import whisper # Installed from OpenAI's repo - setup instructions at https://github.com/openai/whisper/ directory = './whisper_opt' # Where the optimized ONNX models are located model_name = 'openai/whisper-tiny' device = 'cpu' # Get pipeline processor = get_preprocessor(model_name) model = ORTModelForSpeechSeq2Seq.from_pretrained( directory, use_io_binding=(device == 'cuda'), provider='CPUExecutionProvider', ).to(device) pipe = ort_pipeline( "automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor, device=(-1 if device == 'cpu' else 0), ) # Load audio file and run pipeline audio = whisper.load_audio('tests/jfk.flac') audio = whisper.pad_or_trim(audio) outputs = pipe([audio]) print(outputs) ``` Note: In order to use these changes with Optimum, it is recommended to use Optimum from source to have the following changes: - https://github.com/huggingface/optimum/pull/872 - https://github.com/huggingface/optimum/pull/920 ### Motivation and Context This PR helps the following issues: - https://github.com/microsoft/onnxruntime/issues/15100 - https://github.com/microsoft/onnxruntime/issues/15235 - https://github.com/huggingface/optimum/issues/869 (work in progress) This PR can be used with the other currently merged Whisper PRs: - https://github.com/microsoft/onnxruntime/pull/15247 - https://github.com/microsoft/onnxruntime/pull/15339 - https://github.com/microsoft/onnxruntime/pull/15362 - https://github.com/microsoft/onnxruntime/pull/15365 - https://github.com/microsoft/onnxruntime/pull/15427 This PR uses changes from the following merged PRs: - https://github.com/microsoft/onnxruntime/pull/14198 - https://github.com/microsoft/onnxruntime/pull/14146 - https://github.com/microsoft/onnxruntime/pull/14201 - https://github.com/microsoft/onnxruntime/pull/14928 (this introduced the new multi-head attention spec)	2023-04-18 17:13:54 -07:00
Changming Sun	d175e87a1f	Delete eager mode code and increase minimal required python version to 3.8 (#15450 ) ### Description 1. Delete eager mode code. 2. Increase the minimal required python version to 3.8.	2023-04-10 16:00:04 -07:00
pengwa	1d32285536	Statistics tool for ORTModule convergence parity (#15020 ) ### Statistics tool for ORTModule convergence parity As ORTModule get more and more validated, it is pretty fast to intergrade PyTorch based model with ORT. The same time, we need make sure once there is convergence issue, we don't spend months of time to investigate. As part of this efforts, this PR is introducing a tool to dump activation statistics without much involvement from users. The dumping results contains only some statistic numbers plus sampled data, which is not big, compared with dumping all the tensors, it is much faster and space efficient. For us to use it, two single lines are needed before wrapping ORTModule. For baseline run, need also apply the same trick. ``` + from onnxruntime.training.utils.hooks import SubscriberManager, StatisticsSubscriber + SubscriberManager.subscribe(model, [StatisticsSubscriber("pt_out", override_output_dir=True)]) ``` Once you run the steps, following command can be used to merge result into per-step-summary respectively for ORT and baseline runs. ```bash python -m onnxruntime.training.utils.hooks.merge_activation_summary --pt_dir pt_out --ort_dir ort_out --output_dir /tmp/output ``` Docs is added here as part of this PR [convergence investigation notes](https://github.com/microsoft/onnxruntime/blob/pengwa/conv_tool/docs/ORTModule_Convergence_Notes.md) Based on the generated merged files, we can compare them with tools. ![image](https://user-images.githubusercontent.com/10530022/224653929-4e4480bd-bb02-4bbe-bd44-2672bdf91a87.png) ### Design and Implementation This PR introduced a common mechanism registering custom logic for nn.Module's post forward hooks. And statistics for activation (StatisticsSubscriber) is one of the implementations. If there is other needs, we can define another XXSubscriber to do the customized things.	2023-03-23 20:34:24 +08:00
George Wu	289f7dbcdd	enable pybind for qnn ep (#14897 ) enable python bindings for QNN EP. tested on Windows Dev Kit 2023 (ARM64) with python 3.11 (ARM64) from https://www.python.org/ftp/python/3.11.1/python-3.11.1-arm64.exe	2023-03-03 07:26:53 -08:00
Tianlei Wu	742658d171	Stable Diffusion CUDA optimizations Part 2 (#14597 ) ### Description This is a follow-up of https://github.com/microsoft/onnxruntime/pull/14428 for Stable Diffusion CUDA optimizations: (1) use NchwConv to replace Conv in onnx graph and add Tranpose nodes accordingly (2) reduce sequential Transpose nodes to at most one. (3) symbolic shape infer of NchwConv (4) fix add bias transpose which causes CUDA error (launching more than 1024 threads per block) in inferencing fp32 model. (5) add models (bert, bart, stable_diffusion subdirectories) to package; (6) remove option --disable_channels_last Note that (1) We can add a few graph transformations to reduce Transpose nodes further. It is not done in this PR due to time limit. (2) Stable diffusion 2.1 model outputs black images. It seems that forcing Attention to float32 could avoid the issue. However it is much slow to use float32 Attention. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-02-07 07:49:15 -08:00
Scott McKay	7f374f4012	Fix build error on Windows if Python debug libraries are installed (#14308 ) ### Description <!-- Describe your changes. --> If a user installs the debug libraries from Python on Windows the ORT python project file attempts to use the debug python lib, which conflicts with a pragma in pyconfig.h that wants the release lib (due to pybind11 undefining _DEBUG). Explicitly use the release lib instead of Python::Module so the build doesn't break. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fix obtuse build break.	2023-01-17 09:48:26 +10:00
RandySheriffH	83ad562826	Rename CloudEP to AzureEP (#14175 ) Rename CloudEP to AzureEP. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-01-11 12:25:04 -08:00
Ashwini Khade	68b5b2d7d3	Refactor training build options (#13964 ) ### Description 1. Renames all references of on device training to training apis. This is to keep the naming general. Nothing really prevents us from using the same apis on servers\non-edge devices. 2. Update ENABLE_TRAINING option: With this PR when this option is enabled, training apis and torch interop is also enabled. 3. Refactoring for onnxruntime_ENABLE_TRAINING_TORCH_INTEROP option: - Removed user facing option - Setting onnxruntime_ENABLE_TRAINING_TORCH_INTEROP to ON when onnxruntime_ENABLE_TRAINING is ON as we always build with torch interop. Once this PR is merged when --enable_training is selected we will do a "FULL Build" for training (with all the training entry points and features). Training entry points include: 1. ORTModule 2. Training APIs Features include: 1. ATen Fallback 2. All Training OPs includes communication and collectives 3. Strided Tensor Support 4. Python Op (torch interop) 5. ONNXBlock (Front end tools for training artifacts prep when using trianing apis) ### Motivation and Context Intention is to simply the options for building training enabled builds. This is part of the larger work item to create dedicated build for learning on the edge scenarios with just training apis enabled.	2023-01-03 13:28:16 -08:00
RandySheriffH	587e891cae	CloudEP (#13855 ) Implement CloudEP for hybrid inferencing. The PR introduces zero new API, customers could configure session and run options to do inferencing with Azure [triton endpoint.](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-with-triton?tabs=azure-cli%2Cendpoint) Sample configuration in python be like: ``` sess_opt.add_session_config_entry('cloud.endpoint_type', 'triton'); sess_opt.add_session_config_entry('cloud.uri', 'https://cloud.com'); sess_opt.add_session_config_entry('cloud.model_name', 'detection2'); sess_opt.add_session_config_entry('cloud.model_version', '7'); // optional, default 1 sess_opt.add_session_config_entry('cloud.verbose', '1'); // optional, default '0', meaning no verbose ... run_opt.add_run_config_entry('use_cloud', '1') # 0 for local inferencing, 1 for cloud endpoint. run_opt.add_run_config_entry('cloud.auth_key', '...') ... sess.run(None, {'input':input_}, run_opt) ``` Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-01-03 10:03:15 -08:00
Changming Sun	04900f96c1	Improve dependency management (#13523 ) ## Description 1. Convert some git submodules to cmake external projects 2. Update nsync from [1.23.0](https://github.com/google/nsync/releases/tag/1.23.0) to [1.25.0](https://github.com/google/nsync/releases/tag/1.25.0) 3. Update re2 from 2021-06-01 to 2022-06-01 4. Update wil from an old commit to 1.0.220914.1 tag 5. Update gtest to a newer commit so that it can optionally leverage absl/re2 for parsing command line flags. The following git submodules are deleted: 1. FP16 2. safeint 3. XNNPACK 4. cxxopts 5. dlpack 7. flatbuffers 8. googlebenchmark 9. json 10. mimalloc 11. mp11 12. pthreadpool More will come. ## Motivation and Context There are 3 ways of integrating 3rd party C/C++ libraries into ONNX Runtime: 1. Install them to a system location, then use cmake's find_package module to locate them. 2. Use git submodules 6. Use cmake's external projects(externalproject_add). At first when this project was just started, we considered both option 2 and option 3. We preferred option 2 because: 1. It's easier to handle authentication. At first this project was not open source, and it had some other non-public dependencies. If we use git submodule, ADO will handle authentication smoothly. Otherwise we need to manually pass tokens around and be very careful on not exposing them in build logs. 2. At that time, cmake fetched dependencies after "cmake" finished generating vcprojects/makefiles. So it was very difficult to make cflags consistent. Since cmake 3.11, it has a new command: FetchContent, which fetches dependencies when it generates vcprojects/makefiles just before add_subdirectories, so the parent project's variables/settings can be easily passed to the child projects. And when the project went on, we had some new concerns: 1. As we started to have more and more EPs and build configs, the number of submodules grew quickly. For more developers, most ORT submodules are not relevant to them. They shouldn't need to download all of them. 2. It is impossible to let two different build configs use two different versions of the same dependency. For example, right now we have protobuf 3.18.3 in the submodules. Then every EP must use the same version. Whenever we have a need to upgrade protobuf, we need to coordinate across the whole team and many external developers. I can't manage it anymore. 3. Some projects want to manage the dependencies in a different way, either because of their preference or because of compliance requirements. For example, some Microsoft teams want to use vcpkg, but we don't want to force every user of onnxruntime using vcpkg. 7. Someone wants to dynamically link to protobuf, but our build script only does static link. 8. Hard to handle security vulnerabilities. For example, whenever protobuf has a security patch, we have a lot of things to do. But if we allowed people to build ORT with a different version of protobuf without changing ORT"s source code, the customer who build ORT from source will be able to act on such things in a quicker way. They will not need to wait ORT having a patch release. 9. Every time we do a release, github will also publish a source file zip file and a source file tarball for us. But they are not usable, because they miss submodules. ### New features After this change, users will be able to: 1. Build the dependencies in the way they want, then install them to somewhere(for example, /usr or a temp folder). 2. Or download the dependencies by using cmake commands from these dependencies official website 3. Similar to the above, but use your private mirrors to migrate supply chain risks. 4. Use different versions of the dependencies, as long as our source code is compatible with them. For example, you may use you can't use protobuf 3.20.x as they need code changes in ONNX Runtime. 6. Only download the things the current build needs. 10. Avoid building external dependencies again and again in every build. ### Breaking change The onnxruntime_PREFER_SYSTEM_LIB build option is removed you could think from now it is default ON. If you don't like the new behavior, you can set FETCHCONTENT_TRY_FIND_PACKAGE_MODE to NEVER. Besides, for who relied on the onnxruntime_PREFER_SYSTEM_LIB build option, please be aware that this PR will change find_package calls from Module mode to Config mode. For example, in the past if you have installed protobuf from apt-get from ubuntu 20.04's official repo, find_package can find it and use it. But after this PR, it won't. This is because that protobuf version provided by Ubuntu 20.04 is too old to support the "config mode". It can be resolved by getting a newer version of protobuf from somewhere.	2022-12-01 09:51:59 -08:00
cloudhan	2de883c592	Update CK and fix performance issue on dev machine (#13531 ) 1. Update CK to its latest develop branch 2. `-mllvm -amdgpu-early-inline-all=true` is critical to CK's performance, ensure it is properly configured. - The flags are propagated from target `hip-lang::device`'s `INTERFACE_COMPILE_OPTIONS`, we must not manually add the flags. - Instead, we must ensure this target is properly configured by checking _CMAKE_HIP_DEVICE_RUNTIME_TARGET is set. TL,DR `hip-lang::device` sometime will be not be properly configured if our `CMAKE_PREFIX_PATH` is not configured carefully. In the CI docker, the configuration is in good state, but on dev machine it is not, which then silently result poor performance for kernels. We fixed it in this PR and add a guard to avoid unsuccessful future editing and to prevent convoluted debugging process. `_CMAKE_HIP_DEVICE_RUNTIME_TARGET ` is shared in `/opt/rocm/lib/cmake/hip-lang/hip-lang-config.cmake` and it is internal to [CMake](https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6121/diffs), the variable name will not be changed in the foreseeable future.	2022-11-03 19:32:30 +08:00
Wei-Sheng Chin	b5904c40dd	Enable ORT in TorchDynamo (#13259 ) This PR enables ORT to execute graphs captured by TorchDynamo. Major compilation code is in `OrtBackend.compile` in ort_backend.py. `register_backend.py` is for plugging `OrtBackend` into TorchDynamo as a compiler.	2022-11-01 11:19:29 -07:00
Wei-Sheng Chin	dc324b1d90	[LazyTensor] Make LORT Build Again with Latest PyTorch (#13303 ) `python setup.py develop` doesn't install PyTorch as a normal package in site-packages anymore, and the user must stay at PyTorch's root directory to call `import torch`. This will break LORT tests because LORT tests contains `import torch` and are called outside PyTorch root directory. To make PyTorch a normal package again, this PR build PyTorch with `python setup.py install`.	2022-10-13 13:56:17 -07:00
RandySheriffH	a83a9ed6b0	Remove miscellaneous nuphar configs (#13070 ) Remove a handful of nuphar related configurations after deprecation. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-09-26 13:41:28 -07:00

1 2 3 4 5

205 commits