onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-10 17:37:14 +00:00

Author	SHA1	Message	Date
Ryan Hill	d5b606d50d	Remove now duplicated symbol (#16458 ) ### Description Change #16161 broke rocm by duplicating this symbol. This removes the duplicate to unblock the tests.	2023-06-23 09:21:03 -07:00
Chen Fu	5c125b4366	Cfu revertamx (#16455 ) ### Description This is to revert two PRs that aim at reducing AMX toolchain requirements. Unfortunately we still have some pipeline issues. https://github.com/microsoft/onnxruntime/pull/16390 https://github.com/microsoft/onnxruntime/pull/16086 ### Motivation and Context Looks like gcc link time optimization does not work very well with inline assembly in the above PRs.	2023-06-23 09:20:23 -07:00
Rachel Guo	04dbdc96bf	[js/rn] Fix React Native CI pipeline E2E test (#16447 ) ### Description Based on this kindly provided quick fix: https://github.com/microsoft/onnxruntime/pull/16411 See the linked PR for more details about bumping the AGP version, etc. Also fixed the header import path in the Detox E2E test. ### Motivation and Context Good build: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1041757&view=logs&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3&t=9894c870-b8ce-548d-51ff-8f44d21a4117&l=18	2023-06-22 14:33:49 -07:00
Baiju Meswani	10ba1e270c	Minimal Build for On-Device Training (#16326 ) 🛠️ __Changes in this pull request:__ This pull request introduces two significant changes to the project: - Changing on device training checkpoint format: The current implementation stores the on device training checkpoint as a sequence of tensors in multiple files inside a checkpoint folder, which can be inefficient in terms of storage and performance. In this PR, I have modified the checkpoint format to utilize the flatbuffer table to save the checkpoint to a single file, providing a more compact and efficient representation. The changes around this are twofold: - Add the checkpoint flatbuffer schema that will generate the necessary checkpoint source files. - Update the checkpoint saving and loading functionality to use the new format. - Adding support for onnxruntime minimal build: To support scenarios where binary size is a constraint, I made changes to ensure that the training build can work well with the minimal build. 🔍 __Open Issues:__ - In order to extract the optimizer type, the existing implementation re-loaded the onnx optimizer model and parsed it. This is no longer possible, since the model format can either be onnx or ort. One idea is to do the same for ort format optimizer model. This needs some investigation. - Changes to the offline tooling to generate ort format training artifacts. - End-to-end training example showcasing the use of the minimal training build. - Add support for export model for inferencing in a minimal build.	2023-06-22 12:27:23 -07:00
dependabot[bot]	97f4484df9	Bump actions/setup-python from 3 to 4 (#16404 )	2023-06-22 18:12:11 +00:00
Yi Zhang	8e8840f1de	Enable Web CI on Linux (#16419 ) ### Description 1. Enable Web CI on Linux. ### Motivation and Context 1. Speed up Web CI: the duration can be reduced from 160 minutes to 130 minutes, a time saving of roughly 20%. The total computation time is 455 minutes now; moved to Linux, it could be reduced to 336 minutes. 2. It is the first step towards enabling a compilation cache for emscripten. 3. Per Yulong's request, the build_web stages still use the Windows pool. ![image](https://github.com/microsoft/onnxruntime/assets/16190118/c9496408-74bd-45ea-b4ae-a4dd2a574d17) https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1038382&view=results	2023-06-22 15:42:58 +08:00
Pranav Sharma	a270d8407e	Allow saving of large models after optimization (github issue 12882) (#16440 ) ### Description Allow saving of large models after optimization. ### Motivation and Context Addresses https://github.com/microsoft/onnxruntime/issues/12882	2023-06-21 22:46:26 -07:00
Yufeng Li	89f8f20a61	fix protobuf copyfrom 2G limit (#16422 ) ### Description protobuf CopyFrom doesn't work for models > 2GB in version 4.23. This PR removes the copy in Calibrator. Currently Calibrator copies the ModelProto to avoid changing it, because quantize_static passes a ModelProto to Calibrator to calibrate quantization parameters and then uses the same ModelProto for quantization; if Calibrator changed the ModelProto, quantization wouldn't work. This PR changes quantize_static to pass a model path to Calibrator instead of a ModelProto, and makes Calibrator take only a model path as input, which is how it is used in most cases. This PR also removes the optimization from quantization; users need to call pre-process to optimize the model.	2023-06-21 20:45:11 -07:00
kunal-vaishnavi	4b69226fca	Fix input typo in Whisper export with beam search (#16439 ) ### Description This PR fixes a typo with assigning the `repetition_penalty` input in the Whisper export with beam search model. It is a follow-up to the [export stabilization PR](https://github.com/microsoft/onnxruntime/pull/16297). ### Motivation and Context The `repetition_penalty` input should be set to `repetition_penalty` instead of `input_features`.	2023-06-21 18:59:11 -07:00
Baiju Meswani	42489a8a24	Add ability to create ort format models from training offline utility (#16360 )	2023-06-21 18:51:43 -07:00
yf711	0ad0d6ebbf	Unblock Linux MultiGPU TensorRT CI (#16446 ) ### Description Revert docker base image to nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04@sha256:b754c43fe9d62e88862d168c4ab9282618a376dbc54871467870366cacfa456e ### Motivation and Context The default image environment of nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 received a minor upgrade, which makes the Linux MultiGPU TensorRT CI (NV12 instance with Maxwell GPU) fail on three CApiTestGlobalThreadPoolsWithProvider tests (these three tests show errors above the tolerance). That minor upgrade includes cudnn 8.7.0->8.9.0, which might be a factor that makes the Maxwell GPU generate higher error. CIs with T4 GPU are not affected.	2023-06-21 17:15:39 -07:00
Yulong Wang	de476c8075	[js/web] update webgl context creating (#16436 ) ### Description Modify the creation of the webgl context. Previous behavior: STEP.1 - create canvas (document.createElement); if failed, goto step.2, else step.3 STEP.2 - create offscreenCanvas; if failed, abort STEP.3 - use the canvas created in step.1 or 2 to create the webgl context; if successful, return context, else abort New behavior: STEP.1 - create offscreenCanvas; if failed, goto step.3 STEP.2 - use it to create the webgl context; if successful, return context STEP.3 - create canvas (document.createElement); if failed, abort STEP.4 - use it to create the webgl context; if successful, return context, else abort Motivation: we found that in some environments normalCanvas.getContext() returns null but offscreenCanvas.getContext() returns the context object, and when offscreenCanvas is available it is a good idea to always prefer it.	2023-06-21 17:10:26 -07:00
pallavides	3c2d52a995	DORT support for custom ops (#16392 ) ### Description DORT support for custom ops ### Motivation and Context Custom ops registered via custom_ops shared_library cannot be run using DORT atm. This PR enables it using: 1. registering custom_ops supported in DORT 2. plumbing down session_options from OrtBackend when creating the InferenceSession, that were used to register the custom_ops shared library using `session_options.register_custom_ops_library(shared_library)`	2023-06-21 15:58:05 -07:00
Yulong Wang	da532f3f5a	[js/webgpu] fix GPU to GPU memcpy (#16393 ) ### Description Fixes a GPU to GPU memory copy bug which causes #16267	2023-06-21 15:50:08 -07:00
Tianlei Wu	52e2bdf541	Add license header to CUDA related files (#16437 ) Add license header for files under core/providers/cuda or contrib_ops/cuda/	2023-06-21 13:31:43 -07:00
RandySheriffH	6e29e185f3	Clean AzureEP logics (#16367 ) Moving out AzureEP invokers out of core runtime. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-06-21 09:38:52 -07:00
Chi Lo	4e3cff60fd	CUDA graph support for TRT EP (#16081 ) CUDA EP already supports [CUDA graph](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs), also we observed some models can benefit from using CUDA graph with `trtexec`. Therefore, this PR enables the CUDA graph support for TRT EP. The implementation is based on https://github.com/microsoft/onnxruntime/pull/9978 with the same [constraints](https://github.com/microsoft/onnxruntime/pull/9978) as below: - Models with control-flow ops (i.e. If, Loop and Scan ops) are not supported. - Usage of CUDA Graphs is limited to models where-in all the model ops (graph nodes) can be partitioned to the TRT EP. - The input/output types of models need to be tensors. - Shapes of inputs/outputs cannot change across inference calls. - IObinding is required.	2023-06-21 09:36:45 -07:00
Rachel Guo	961fa7274a	[NNAPI doc] add reducemean to supported op list (#16414 ) ### Description As title.	2023-06-21 00:29:20 -07:00
Rachel Guo	b4b126ffb0	Set onnxruntime-c local pod path environment variable for react native e2e tests on ci (#16431 ) ### Description Set the onnxruntime-c local pod path environment variable for the React Native E2E tests in react-native-ci.yml. ### Motivation and Context Previously the E2E test project was not properly consuming a locally built onnxruntime-c pod. https://github.com/microsoft/onnxruntime/pull/16411#issuecomment-1598512816	2023-06-21 00:27:36 -07:00
yf711	9a80955c45	Add compute capacity to trtep engine cache file (#16356 ) ### Description Add "_smXX" to the trtep engine cache file name, where "sm" stands for "Streaming Multiprocessor". > The GPU compute capability version is prefixed with "SM" because NVIDIA typically improves and updates the SM in each new GPU architecture. ### Motivation and Context Github issue: https://github.com/microsoft/onnxruntime/issues/15982 Reduce the chance of misusing an incompatible engine cache when the user switches between GPU devices with different compute capability. * The prevention can't be 100%, as model size and GPU memory size could be other factors that make a cache incompatible.	2023-06-20 23:34:24 -07:00
zesongw	64b22cd00f	[WebNN EP] Support Where Op (#16380 ) ### Description Add Where Op for WebNN EP as ternary conditional operator. --------- Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2023-06-20 16:45:40 -07:00
Justin Chu	e2381c42f2	Use M_PI to replace 3.14 constants (#16421 ) ### Description Use M_PI to replace 3.14 constants ### Motivation and Context Fixes #16413	2023-06-20 15:09:10 -07:00
Yuhong Guo	48e6186b1a	Move tests from core/providers/cuda/test/* to test/providers/cuda/ and refactor CUDA UT (#16161 ) ### Description 1. Add a new test lib `onnxruntime_providers_cuda_ut` which is similar to `onnxruntime_providers_cuda` but `onnxruntime_providers_cuda_ut` is only built if `onnxruntime_BUILD_UNIT_TESTS` is set. We can call all CUDA UTs through this ut lib without affecting the production lib `onnxruntime_providers_cuda`. 2. Move all test cases from `core/providers/cuda/test/` to `test/providers/cuda/`. These test cases are built into lib `onnxruntime_providers_cuda_ut` and run by `./onnxruntime_test_all --gtest_filter="CUDA_EP_Unittest"`. Since the lib is only for test, we can use gtest macros in the test cases. The previous implementation does not support using the gtest lib in the CUDA UT cases. 3. The cmake code in `cmake/onnxruntime_providers.cmake` is refactored a bit. A new function `onnxruntime_add_object_library` builds an object target. The 2 libs `onnxruntime_providers_cuda_ut` & `onnxruntime_providers_cuda` share most of the code, so the object files can be used in both libs, which helps reduce build time. Another function `config_cuda_provider_shared_module` is used to configure all 3 similar targets (onnxruntime_providers_cuda_obj/onnxruntime_providers_cuda/onnxruntime_providers_cuda_ut). 4. Refactored the test to call `testing::InitGoogleTest` & `RUN_ALL_TESTS` in `libonnxruntime_providers_cuda_ut.so`'s `TestAll`. After this change, we can see all the cases running in `CUDA_EP_Unittest.All`: ![image](https://github.com/microsoft/onnxruntime/assets/19584326/8ff80df6-060b-4ef0-90b7-657e68d3db87) ### Motivation and Context After https://github.com/microsoft/onnxruntime/pull/13016, there are still test files in test/providers/cuda/ that are not moved to core/providers/cuda/test/ and the test cases are disabled. This PR helps to clean up the unfinished TODOs. Even though onnxruntime_shared_lib_test covers some tests for the CUDA provider, it works like a coarse-grained end-to-end test for the CUDA provider. If the CUDA unit tests can run cases for a single component, this would be helpful for CUDA developers. --------- Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>	2023-06-20 14:54:55 -07:00
Aung T Naing	e83993bbaf	Added MatMul tests for QNN EP (#15956 ) ### Description Added test coverage for QNN EP MatMul op ### Motivation and Context Created test coverage for HTP based MatMul with broadcasting. --------- Co-authored-by: Hector Li <hecli@microsoft.com>	2023-06-20 13:58:56 -07:00
Sheil Kumar	7b51a1b17d	Enable Microsoft.AI.MachineLearning NuGet with WinUI projects (#16415 ) The Microsoft.AI.MachineLearning NuGet fails to build on WinUI projects due to a conflict between the ReferenceCopy of binaries that occurs with managed applications and the manual binplacement that occurs with native applications. With WinUI, both cases are triggered, and a duplicate binplace is detected as an error. Fix: Don't rely on the ReferenceCopy for WinUI applications, and manually binplace the Microsoft.AI.MachineLearning dll.	2023-06-20 13:10:19 -07:00
Wei-Sheng Chin	c8de3eaac6	Update DORT to follow PyTorch changes (#16394 ) Fix #16355. The root cause change in PyTorch is [#103302](https://github.com/pytorch/pytorch/pull/103302), which seems to block calling make_fx inside a dynamo backend. Changes: 1. Move decomposition to `register_backend.py`, so we don't have to call `make_fx` inside DORT, which triggers a bunch of new exceptions. 2. Remove shape inference based on FakeTensorProp since the FX graph received from dynamo contains all shapes now. 3. Fix a macro bug so that DORT can build without CUDA. Before (3), ``` #if defined(USE_CUDA) || defined(USE_ROCM) virtual PhiloxGenerator& PhiloxGenerator__Default() = 0; #ifdef ENABLE_TRAINING_TORCH_INTEROP ... #endif #endif ``` After (3), ``` #if defined(USE_CUDA) || defined(USE_ROCM) virtual PhiloxGenerator& PhiloxGenerator__Default() = 0; #endif #ifdef ENABLE_TRAINING_TORCH_INTEROP ... #endif ``` The latter looks better since `ENABLE_TRAINING_TORCH_INTEROP` is for Python bridge code, not for the random-number-generating kernels of `PhiloxGenerator`.	2023-06-20 12:06:50 -07:00
Rachel Guo	30bb0959dc	[NNAPI EP] Add ReduceMean Op support (#16294 ) ### Description As title. Special cases for ReduceMean: [UPDATE] The following cases are now supported by converting them to provide an input with all axes for NNAPI. Behavior when axes is not provided or is provided as an empty vector: For ReduceMean opset version 18: - Supports the case where `axes` is provided as empty with `noop_with_empty_axes` set to true. - Supports the case where `axes` is not provided with `noop_with_empty_axes` set to true. Both are treated as an identity op. - Does not support the case where `axes` is not provided/provided as empty but `noop_with_empty_axes` is set to false. For ReduceMean opset version 13-: - Does not support the case where the `axes` attribute is not provided (as onnx treats it as default behavior to reduce all dimensions, and the case is not implemented by NNAPI). https://developer.android.com/ndk/reference/group/neural-networks#group___neural_networks_1ggaabbe492c60331b13038e39d4207940e0a047fe95a35b27f45c05432b6ca18eb6c > 1: A 1-D Tensor of [ANEURALNETWORKS_TENSOR_INT32](https://developer.android.com/ndk/reference/group/neural-networks#group___neural_networks_1ggaf06d1affd33f3bc698d0c04eceb23298ac34965d8e76ac5acfddf5acd9e40f896). The dimensions to reduce. Must be in the range [-rank(input_tensor), rank(input_tensor)). NOTE: When the operation was introduced, the documentation incorrectly stated that if dimensions were empty, the operation would reduce across all dimensions. This behavior was never implemented. ### Motivation and Context Fixes issue #16194 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-06-20 11:09:00 -07:00
Yufeng Li	d190db7fcd	Update default external_data_location for pre-process of quantization (#16399 ) external_data_location should be a string/file_name to indicate the file name of external data instead of a directory	2023-06-20 09:37:17 -07:00
Yulong Wang	b8917ad84f	[js/web] fix nodejs detection (#16400 ) ### Description We used to use `typeof fetch === 'undefined'` as condition to detect the environment is Node.js or not. Before Node.js v18, this works. However, in Node.js v18, it introduced `fetch` function, so this check does not work any more. This PR changes the condition to check whether `process`, `process.versions` and `process.versions.node` exists. Checking whether `process` exists is not enough. This is because in some configuration, webpack may polyfill nodejs's process.	2023-06-20 00:20:58 -07:00
Prateek Chokse	12dffef768	added support for cmake "find_package" (#8919 ) Description: Adds support for cmake find_package. Motivation and Context As mentioned in issue #7150 onnxruntime doesn't have support for CMake find_package, this PR adds that and also adds the CMake package version file. Now anyone can link onnxruntime like this: ```cmake find_package(onnxruntime) add_executable(test Source.cpp) target_link_libraries(test PRIVATE onnxruntime::onnxruntime) ``` this also simplifies #3124	2023-06-19 22:20:31 -07:00
cao lei	dd72192cf4	ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833 ) ### Description This PR refactors the ExecutionProvider API for memory management by moving allocators from the EP level to the SessionState level, indexed by OrtDevice. ### Motivation and Context With this change, the EP level is relieved of the burden of maintaining allocators, which is friendlier for EP developers. --------- Co-authored-by: Lei Cao <leca@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-06-19 17:44:45 -07:00
jingyanwangms	5dcaf70501	Adding this set_to_none flag to zero_grad to have signature parity with pytorch Adam (#16375 ) ### Description torch.optim Adam zero_grad() signature is zero_grad(set_to_none=True) https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam.zero_grad We set this flag in initialization, similar to deepspeed: https://deepspeed.readthedocs.io/en/latest/optimizers.html#deepspeed.ops.adam.FusedAdam Adding this flag to have signature parity with pytorch Adam ### Motivation and Context Easier model integration Co-authored-by: Jingyan Wang <jingywa@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-06-19 17:27:41 -07:00
PeixuanZuo	470d6c1cce	[ROCm] Delete unused file to fix Component Governance Alert (#16407 ) Delete unused file to fix Component Governance Alert	2023-06-19 11:28:32 -07:00
guyang3532	341484e67c	Embedding sparsity optimization (#16141 ) ### Description Optimize the compute graph by eliminating padding in embedding. ### Motivation and Context The computation for padding in nodes after embedding is unnecessary and wastes computation resources. This PR adds a PaddingElimination optimizer that checks for and eliminates the padding after embedding automatically by modifying the graph. ### Implementation: 1. Find and check the embedding node in the graph. 2. Iterate over the subgraph after the embedding node and record all the input nodes and output nodes of this subgraph. 3. Insert 'Reshape + ShrunkenGather' to flatten each input node shape from [batch_size, seqlen, ...] to [valid_token_without_padding, ...], and insert 'GatherGrad + Reshape' to unflatten each output node shape from [valid_token_without_padding, ...] to [batch_size, seqlen, ...] --------- Co-authored-by: mindest <linminuser@gmail.com>	2023-06-19 20:34:53 +08:00
PeixuanZuo	1418d8728c	[ROCm] Fix CI Pipeline (#16409 ) 1. add `set -ex` before commands. 2. update ccache.	2023-06-19 15:22:13 +08:00
Yi Zhang	8b9eab093b	keep symlinks in maven package (#16376 ) ### Description 1. Keep symlinks in the package. 2. Keep the artifact package format.	2023-06-19 09:41:39 +08:00
Dipanjan Sengupta	35fa6af428	Fix for the build break in AMX feature on Mac OS. (#16390 ) ### Description Fixing the build break issue in Apple pipeline due to AMX flag removal.	2023-06-16 21:00:41 -07:00
Scott McKay	8fdfd20191	Separate out operator vs model testing. (#16228 ) ### Description Split up OpTester to separate out operator vs model testing. This led to a lot of other cleanups/refactoring. - create BaseTester class and derived OpTester/ModelTester classes to limit APIs to what is applicable for each test type - e.g. adding an attribute isn't relevant to a model test - cleanup structure - don't expose member variables either directly or via public methods returning them - split out checkers so they can be easily re-used - refactor so there's one public Check method for comparing two OrtValue instances containing any data type - refactor the GradientOpTester usage - it required a lot of OpTester internals to be exposed and no other tests needed this - it also returned Status through various parts, which prevented the usage of the google test macros that provide better output. Changed to return void and use the macros. - fix some other minor issues - update some cmake files so all the source files are included - remove some low value helpers (FetchTensor and GetShapeVector) - remove some outdated code to allow unreleased opset versions from when onnx opset 15 wasn't released - move files from test/util/include/test to test/util/include - there doesn't seem to be any reason for the additional subdirectory given they're not files used to test the code in test/util - files were moved with no changes ### Motivation and Context Cleanup test infrastructure. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-06-17 12:58:57 +10:00
saurabh	a6ce7b339f	Enable model subgraph execution in OVEP and setting the OpenVINO dll's to the path from the OpenVINO pypi package in OVEP and fix OVEP windows io buffer sample (#16147 ) ### Description This PR enables execution of subgraphs in OVEP. Currently, when OVEP developers install the onnxruntime-openvino package on Windows from pypi, they have to additionally download the OpenVINO Windows binaries and run the setupvars.bat script, which sets the environment PATH to locate the OV dll's. This PR also fixes issues in the OVEP Windows io buffer sample. ### Motivation and Context Fix: We want to make the user experience easy for OVEP Python developers on the Windows platform. This fix introduces a function add_openvino_libs_to_path at the location tools/python/util/add_openvino_win_libs.py. OVEP Python users can call this function in their application code, and it takes care of adding the OpenVINO dll's from the installed OpenVINO pypi package (openvino) to the path. This change also makes sure that the add_openvino_libs_to_path() function is added to the onnxruntime python package only when it is built for the OpenVINO Execution Provider for ONNXRuntime and not for default ORT python package builds. New user experience for Python OVEP developers on the Windows platform: step 1: pip install onnxruntime-openvino step 2: pip install openvino step 3: <Add these 2 lines in the application code> import onnxruntime.tools.add_openvino_win_libs as utils utils.add_openvino_libs_to_path() --------- Signed-off-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2023-06-16 19:47:09 -07:00
kunal-vaishnavi	3f7f90aed0	Stabilize Whisper export with beam search (#16297 ) ### Description This PR stabilizes the Whisper export with beam search by adding the following: - Remove unused ONNX models and extra folders generated during the export process - Specify the Whisper with beam search model's IR version for E2E integration - Parity check for Whisper with beam search model between PyTorch and ORT - Remove previously exported Whisper with beam search model before saving newly exported model ### Motivation and Context - Removing the unused ONNX models and extra folders frees up disk space after exporting and makes it easier to copy and move the output folder to other environments. - Specifying the IR version fixes an issue with generating the ONNX E2E model - Adding a parity check helps detect runtime issues during the export process - Removing the previously exported Whisper with beam search model prevents the data file size from doubling when the newly exported model is saved with the same filename	2023-06-16 18:56:52 -07:00
dependabot[bot]	dd660c054e	Bump transformers from 4.24.0 to 4.30.0 in /tools/ci_build (#16331 )	2023-06-16 13:08:46 -07:00
zesongw	d813d991b1	[WebNN EP] Support Squeeze Op (#16361 ) ### Description Adds support for the Squeeze Op to WebNN EP. It shares similar parameters with Unsqueeze, so they are merged. ### Motivation and Context Enable more models to run on WebNN EP. --------- Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>	2023-06-16 11:18:58 -07:00
Chi Lo	fbf08c4b4d	Fix minor TRT EP provider option issue (#16107 ) Several TRT EP provider options are not included when calling OrtApis::GetTensorRTProviderOptionsAsString(). This issue doesn't affect TRT EP itself, but users calling the above API to get all the provider options will find some options missing from the string.	2023-06-16 10:07:40 -07:00
Silvio Traversaro	4915191e63	Fix build of Python wheel on Windows with single-config generator (#16337 ) ### Description Before this PR, the CMake code assumed that a multiple-config CMake generator was used on Windows and a single-config CMake generator on non-Windows platforms. After this PR this information is obtained from the [`GENERATOR_IS_MULTI_CONFIG`](https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html) global CMake property. ### Motivation and Context I discovered this problem when building with the Ninja generator on Windows, but I guess this should also fix problems on non-Windows platforms when using a multiple-config generator (such as Xcode on macOS or "Ninja Multi-Config" on all platforms). See https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html for more info.	2023-06-16 09:17:49 -07:00
Jhen-Jie Hong	685816bb0a	[js/rn] Add executionProviders support (#16233 ) ### Description This PR adds support for the `executionProviders` option to the react-native package. Supported providers: - Android: cpu / xnnpack / nnapi - iOS: cpu / xnnpack / coreml ### Motivation and Context In my case I want to enable the Core ML / NNAPI EP for a react-native project.	2023-06-16 19:38:41 +10:00
Jhen-Jie Hong	ea1a5cf920	[js/rn] Implement blob exchange by JSI instead of use base64 (#16094 ) ### Description - Create `OnnxruntimeJSIHelper` native module to provide two JSI functions - `jsiOnnxruntimeStoreArrayBuffer`: Store buffer in Blob Manager & return blob object (iOS: RCTBlobManager, Android: BlobModule) - `jsiOnnxruntimeResolveArrayBuffer`: Use blob object to get buffer - Part of the implementation is based on [react-native-blob-jsi-helper](https://github.com/mrousavy/react-native-blob-jsi-helper) - Replace base64 encode/decode - `loadModelFromBlob`: Renamed from `loadModelFromBase64EncodedBuffer` - `run`: Use blob object to replace input.data & results[].data For [this context](https://github.com/microsoft/onnxruntime/issues/16031#issuecomment-1556527812), it saves a lot of time and avoids blocking the JS thread while decoding the return value: 3700ms -> 5~20ms for that case. (The resolve function only takes 0.x ms.) ### Motivation and Context It's related to #16031, but is not a full migration to JSI. It just uses JSI through BlobManager to replace the slow part (base64 encode / decode). Rewriting it entirely in JSI could be complicated, e.g. type conversion and threading. This PR might be considered a minor change. /cc @skottmckay	2023-06-16 19:37:02 +10:00
cloudhan	9110e5b9bd	[ROCm] Add attention kv cache for decoding (#16076 )	2023-06-16 14:17:56 +08:00
Tianlei Wu	96471491d7	Fix test failure in debug CUDA build (#16370 ) Fix assertion failure in onnxruntime_test_all in debug build with CUDA, which is caused by a test case added in https://github.com/microsoft/onnxruntime/pull/16075. Remove an assumption that bias exists in MultiHeadAttention.	2023-06-15 23:16:16 -07:00
Tianlei Wu	1866a9d818	Use the lowest float for causal mask (#16369 ) Always set the causal mask to the lowest float. Note that since huggingface transformers v4.21, gpt2 uses the lowest half for FP16, and the lowest float for FP32: `66fd3a8d62/src/transformers/models/gpt2/modeling_gpt2.py (L199)` Assuming that most fp16 ONNX models are converted from fp32 models, we decided to use the lowest float32 for both half and float models for consistency. The mask_filter_value only applies to a raw attention mask (2D, 3D or 4D). For a 1D mask, a masked item is 0.0 after softmax, so the mask filter value is the lowest float for a 1D mask. * For the BERT model, when users use a 1D mask (required by FMHA), mask_filter_value is not applicable. * For BERT or GPT-2, when a fused kernel is used, mask_filter_value has no impact. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/12843 https://github.com/microsoft/onnxruntime/issues/14363	2023-06-15 21:32:29 -07:00
PeixuanZuo	bcdb81c563	[Whisper] add a fusion option to split input bias from MHA/DMHA (#16049 ) The Whisper model contains five different types of attention; the q, k, v bias was fused into the Attention/MHA/DMHA op. encoderdecoderinit subgraph - Attention: encoder attention - Attention: decoder self attention + present k, v - MultiHeadAttention: decoder cross attention + present k and v. q and v have bias. decoder subgraph - DecoderMultiHeadAttention: decoder cross attention + past k, v. q has bias - DecoderMultiHeadAttention: decoder self attention + past/present k, v. q, k, v have bias. For ROCm EP, MHA/DMHA doesn't support additional bias. This PR adds a fusion option `disable_multi_head_attention_bias` to split the q, k, v bias from MHA/DMHA.	2023-06-16 10:29:48 +08:00
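
Commit 1866a9d818 in the log above states that a masked item is 0.0 after softmax when the mask value is the lowest float. The following is a minimal sketch of that arithmetic in plain Python, not the onnxruntime kernel: `masked_softmax` and `LOWEST_FLOAT32` are hypothetical names, and the sketch simply substitutes the mask value for the masked score, which may differ from how the actual attention kernels apply `mask_filter_value`.

```python
import math
import struct

# Lowest finite float32 (i.e. -FLT_MAX), decoded from its bit pattern.
LOWEST_FLOAT32 = -struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]


def masked_softmax(scores, keep):
    """Softmax over `scores`; positions where `keep` is False are masked.

    Hypothetical helper for illustration only: a masked position has its
    score replaced by LOWEST_FLOAT32 before the softmax is taken.
    """
    masked = [s if k else LOWEST_FLOAT32 for s, k in zip(scores, keep)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


# The third position is masked out, so its probability underflows to zero.
probs = masked_softmax([1.0, 2.0, 3.0], [True, True, False])
```

Because `exp` of such a large negative number underflows to zero, the masked position contributes nothing to the normalising sum, and the remaining probabilities are the same as a softmax over the unmasked scores alone.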


9025 commits