onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-04 04:07:22 +00:00

Author	SHA1	Message	Date
zesongw	64b22cd00f	[WebNN EP] Support Where Op (#16380 ) ### Description Add Where Op for WebNN EP as ternary conditional operator. --------- Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2023-06-20 16:45:40 -07:00
Justin Chu	e2381c42f2	Use M_PI to replace 3.14 constants (#16421 ) ### Description Use M_PI to replace 3.14 constants ### Motivation and Context Fixes #16413	2023-06-20 15:09:10 -07:00
Yuhong Guo	48e6186b1a	Move tests from core/providers/cuda/test/* to test/providers/cuda/ and refactor CUDA UT (#16161 ) ### Description <!-- Describe your changes. --> 1. Add a new test lib `onnxruntime_providers_cuda_ut` which is similar to `onnxruntime_providers_cuda` but `onnxruntime_providers_cuda_ut` is only built if `onnxruntime_BUILD_UNIT_TESTS` is set. We can call all CUDA UTs through this ut lib without affecting production lib `onnxruntime_providers_cuda`. 2. Move all test cases from `core/providers/cuda/test/` to `test/providers/cuda/`. These test cases are built into lib `onnxruntime_providers_cuda_ut` and run by `./onnxruntime_test_all --gtest_filter="CUDA_EP_Unittest"`. Since the lib is only for test, we can use gtest macros in the test cases. Previous implementation do not support using gtest lib in the CUDA UT cases. 3. The cmake code in `cmake/onnxruntime_providers.cmake` is refactored a bit. A new function `onnxruntime_add_object_library` is to build a object target. The 2 libs `onnxruntime_providers_cuda_ut` & `onnxruntime_providers_cuda` share most of the code, so the object files can be used in both libs, which helps reduce build time. Another function `config_cuda_provider_shared_module` is used to configure all 3 similar targets(onnxruntime_providers_cuda_obj/onnxruntime_providers_cuda/onnxruntime_providers_cuda_ut). 4. Refactored the test to call `testing::InitGoogleTest` & `RUN_ALL_TESTS` in `libonnxruntime_providers_cuda_ut.so`'s `TestAll`. After this change, we can see all the cases running in `CUDA_EP_Unittest.All`: ![image](https://github.com/microsoft/onnxruntime/assets/19584326/8ff80df6-060b-4ef0-90b7-657e68d3db87) ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. 
--> After https://github.com/microsoft/onnxruntime/pull/13016, there are still test files in test/providers/cuda/ that are not moved to core/providers/cuda/test/ and the test cases are disabled. This PR helps to clean up the unfinished TODOs. Even though onnxruntime_shared_lib_test covers some tests for the CUDA provider, it works like a coarse-grained end-to-end test. If the CUDA unit tests can run cases for a single component, this would be helpful for CUDA developers. --------- Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>	2023-06-20 14:54:55 -07:00
Aung T Naing	e83993bbaf	Added MatMul tests for QNN EP (#15956 ) ### Description Added test coverage for QNN EP MatMul op ### Motivation and Context Created test coverage for HTP based MatMul with broadcasting. --------- Co-authored-by: Hector Li <hecli@microsoft.com>	2023-06-20 13:58:56 -07:00
Sheil Kumar	7b51a1b17d	Enable Microsoft.AI.MachineLearning NuGet with WinUI projects (#16415 ) Microsoft.AI.MachineLearning NuGet fails to build on WinUI projects due to a conflict between the ReferenceCopy of binaries that occurs with managed applications and the manual binplacement that occurs with native applications. With WinUI, both cases are triggered, and a duplicate binplace is detected as an error. Fix: Don't rely on the ReferenceCopy for WinUI applications, and manually binplace the Microsoft.AI.MachineLearning dll.	2023-06-20 13:10:19 -07:00
Wei-Sheng Chin	c8de3eaac6	Update DORT to follow PyTorch changes (#16394 ) Fix #16355. The root cause change in PyTorch is [#103302](https://github.com/pytorch/pytorch/pull/103302), which seems to block calling make_fx inside a dynamo backend. Changes: 1. Move decomposition to `register_backend.py`, so we don't have to call `make_fx` inside DORT, which triggers a bunch of new exceptions. 2. Remove shape inference based on FakeTensorProp since the FX graph received from dynamo contains all shapes now. 3. Fix a macro bug so that DORT can build without CUDA. Before (3), ``` #if defined(USE_CUDA) || defined(USE_ROCM) virtual PhiloxGenerator& PhiloxGenerator__Default() = 0; #ifdef ENABLE_TRAINING_TORCH_INTEROP ... #endif #endif ``` After (3), ``` #if defined(USE_CUDA) || defined(USE_ROCM) virtual PhiloxGenerator& PhiloxGenerator__Default() = 0; #endif #ifdef ENABLE_TRAINING_TORCH_INTEROP ... #endif ``` The latter one looks better since `ENABLE_TRAINING_TORCH_INTEROP` is for Python bridge code, not for the random-number-generating kernels of `PhiloxGenerator`.	2023-06-20 12:06:50 -07:00
Rachel Guo	30bb0959dc	[NNAPI EP] Add ReduceMean Op support (#16294 ) ### Description <!-- Describe your changes. --> As title. Special cases for ReduceMean: [UPDATE] The following cases are supported now by converting to providing an input with all axes for NNAPI. Behaviors when axes is not provided or axes provided as an empty vector: For ReduceMean Opset version 18: - Support case `axes` is provided as empty with `noop_with_empty_axes` set to true. - Support case `axes` is not provided with `noop_with_empty_axes` set to true. All treat as identity op. - Does not support the case when `axes` is not provided/provided as empty but `noop_with_empty_axes` is set to false. For ReduceMean OpSet Version 13-: - Does not support when `axes` attribute is not provided. (as onnx treats it as default behavior to reduce all dimensions, and the case is not implemented by NNAPI.) https://developer.android.com/ndk/reference/group/neural-networks#group___neural_networks_1ggaabbe492c60331b13038e39d4207940e0a047fe95a35b27f45c05432b6ca18eb6c > 1: A 1-D Tensor of [ANEURALNETWORKS_TENSOR_INT32](https://developer.android.com/ndk/reference/group/neural-networks#group___neural_networks_1ggaf06d1affd33f3bc698d0c04eceb23298ac34965d8e76ac5acfddf5acd9e40f896). The dimensions to reduce. Must be in the range [-rank(input_tensor), rank(input_tensor)).NOTE: When the operation was introduced, the documentation incorrectly stated that if dimensions were empty, the operation would reduce across all dimensions. This behavior was never implemented. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Fixes issue #16194 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-06-20 11:09:00 -07:00
Yufeng Li	d190db7fcd	Update default external_data_location for pre-process of quantization (#16399 ) external_data_location should be a string/file_name to indicate the file name of external data instead of a directory	2023-06-20 09:37:17 -07:00
Yulong Wang	b8917ad84f	[js/web] fix nodejs detection (#16400 ) ### Description We used to use `typeof fetch === 'undefined'` as the condition to detect whether the environment is Node.js. Before Node.js v18, this worked. However, Node.js v18 introduced the `fetch` function, so this check no longer works. This PR changes the condition to check whether `process`, `process.versions` and `process.versions.node` exist. Checking whether `process` exists is not enough, because in some configurations webpack may polyfill Node.js's process.	2023-06-20 00:20:58 -07:00
Prateek Chokse	12dffef768	added support for cmake "find_package" (#8919 ) Description: Adds support for cmake find_package. Motivation and Context As mentioned in issue #7150 onnxruntime doesn't have support for CMake find_package, this PR adds that and also adds the CMake package version file. Now anyone can link onnxruntime like this: ```cmake find_package(onnxruntime) add_executable(test Source.cpp) target_link_libraries(test PRIVATE onnxruntime::onnxruntime) ``` this also simplifies #3124	2023-06-19 22:20:31 -07:00
cao lei	dd72192cf4	ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833 ) ### Description This PR is to refactor ExecutionProvider API for memory management, which is to move allocators from EP level to SessionState level and indexed by OrtDevice ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> This PR is to refactor ExecutionProvider API for memory management, which is to move allocators from EP level to SessionState level and indexed by OrtDevice. By this change, EP level will shift the burden of maintaining allocators, which will be user friendly for EP developers --------- Co-authored-by: Lei Cao <leca@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-06-19 17:44:45 -07:00
jingyanwangms	5dcaf70501	Adding this set_to_none flag to zero_grad to have signature parity with pytorch Adam (#16375 ) ### Description torch.optim Adam zero_grad() signature is zero_grad(set_to_none=True) https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam.zero_grad We set this flag in initialization, similar to deepspeed: https://deepspeed.readthedocs.io/en/latest/optimizers.html#deepspeed.ops.adam.FusedAdam Adding this flag to have signature parity with pytorch Adam ### Motivation and Context Easier model integration Co-authored-by: Jingyan Wang <jingywa@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-06-19 17:27:41 -07:00
PeixuanZuo	470d6c1cce	[ROCm] Delete unused file to fix Component Governance Alert (#16407 ) Delete unused file to fix Component Governance Alert	2023-06-19 11:28:32 -07:00
guyang3532	341484e67c	Embedding sparsity optimization (#16141 ) ### Description Optimize compute graph by eliminating padding in embedding. ### Motivation and Context The computation for padding in nodes after embedding is unnecessary and wastes computation resources. This PR adds a PaddingElimination optimizer that checks for and eliminates the padding after embedding automatically by modifying the graph. ### Implementation: 1. Find and check the embedding node in the graph. 2. Iterate over the subgraph after the embedding node and record all the input nodes and output nodes of this subgraph. 3. Insert 'Reshape + ShrunkenGather' to flatten each input node shape from [batch_size, seqlen, ...] to [valid_token_without_padding, ...], and insert 'GatherGrad + Reshape' to unflatten each output node shape from [valid_token_without_padding, ...] to [batch_size, seqlen, ...] --------- Co-authored-by: mindest <linminuser@gmail.com>	2023-06-19 20:34:53 +08:00
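The flatten/unflatten idea in step 3 of the entry above can be illustrated outside the graph optimizer. This is a minimal pure-Python sketch, not the PaddingElimination implementation: the helper names `flatten_valid_tokens` and `unflatten`, the padding id of 0, and the nested-list tensors are all assumptions made for illustration.

```python
def flatten_valid_tokens(token_ids, embeddings, padding_id=0):
    """Flatten [batch, seqlen, hidden] to [valid_tokens, hidden].

    Also returns the flat positions of the kept tokens so that the
    original layout can be restored later. padding_id=0 is an assumption.
    """
    seqlen = len(token_ids[0])
    flat_index = [
        b * seqlen + s
        for b, row in enumerate(token_ids)
        for s, token in enumerate(row)
        if token != padding_id
    ]
    flat = [embeddings[i // seqlen][i % seqlen] for i in flat_index]
    return flat, flat_index


def unflatten(flat, flat_index, batch_size, seqlen, fill):
    """Scatter [valid_tokens, hidden] back to [batch, seqlen, hidden]."""
    out = [[list(fill) for _ in range(seqlen)] for _ in range(batch_size)]
    for value, i in zip(flat, flat_index):
        out[i // seqlen][i % seqlen] = value
    return out


# Two sequences of length 3; zeros are padding tokens.
token_ids = [[5, 7, 0], [9, 0, 0]]
embeddings = [[[1.0], [2.0], [0.0]], [[3.0], [0.0], [0.0]]]

flat, index = flatten_valid_tokens(token_ids, embeddings)
# Downstream work now touches 3 rows instead of 6.
doubled = [[2 * x for x in row] for row in flat]
restored = unflatten(doubled, index, batch_size=2, seqlen=3, fill=[0.0])
```

In this toy case the work between the two helpers runs on three rows instead of six, which is the saving the entry describes.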
PeixuanZuo	1418d8728c	[ROCm] Fix CI Pipeline (#16409 ) 1. add `set -ex` before commands. 2. update ccache.	2023-06-19 15:22:13 +08:00
Yi Zhang	8b9eab093b	keep symlinks in maven package (#16376 ) ### Description 1. Keep symlink in the package. 2. keep the artifact package format ### Motivation and Context	2023-06-19 09:41:39 +08:00
Dipanjan Sengupta	35fa6af428	Fix for the build break in AMX feature on Mac OS. (#16390 ) ### Description Fixing the build break issue in Apple pipeline due to AMX flag removal.	2023-06-16 21:00:41 -07:00
Scott McKay	8fdfd20191	Separate out operator vs model testing. (#16228 ) ### Description <!-- Describe your changes. --> Split up OpTester to separate out operator vs model testing. This led to a lot of other cleanups/refactoring. - create BaseTester class and derived OpTester/ModelTester classes to limit APIs to what is applicable for each test type - e.g. adding an attribute isn't relevant to a model test - cleanup structure - don't expose member variables either directly or via public methods returning them - split out checkers so they can be easily re-used - refactor so there's one public Check method for comparing two OrtValue instances containing any data type - refactor the GradientOpTester usage - it required a lot of OpTester internals to be exposed and no other tests needed this - it also returned Status through various parts which prevented the usage of the google test macros which provide better output. change to return void and use the macros. - fix some other minor issues - update some cmake files so all the source files are included - remove some low value helpers (FetchTensor and GetShapeVector) - remove some outdated code to allow unreleased opset versions from when onnx opset 15 wasn't released - move files from test/util/include/test to test/util/include - doesn't seem to be any reason for the additional subdirectory given they're not files use to test the code in test/util - files were moved with no changes ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Cleanup test infrastructure. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-06-17 12:58:57 +10:00
saurabh	a6ce7b339f	Enable model subgraph execution in OVEP and setting the OpenVINO dll's to the path from the OpenVINO pypi package in OVEP and fix OVEP windows io buffer sample (#16147 ) ### Description This PR enables execution of subgraphs in OVEP. Currently, when OVEP developers install the onnxruntime-openvino package on Windows from PyPI, they have to additionally download the OpenVINO Windows binaries and run the setupvars.bat script, which sets the environment PATH to locate the OV dll's. This PR also fixes issues with the OVEP Windows io buffer sample. ### Motivation and Context Fix: We want to make the user experience easy for OVEP Python developers on the Windows platform. This fix introduces a function add_openvino_libs_to_path at the location tools/python/util/add_openvino_win_libs.py. OVEP Python users can call this function in their application code, and it takes care of setting the OpenVINO dll's to the path from the installed OpenVINO PyPI package (openvino). This change also makes sure that the add_openvino_libs_to_path() function is added to the onnxruntime Python package only when it is built for the OpenVINO Execution Provider for ONNXRuntime and not for default ORT Python package builds. New user experience for Python OVEP developers on the Windows platform: step 1: pip install onnxruntime-openvino step 2: pip install openvino step 3: <Add these 2 lines in the application code> import onnxruntime.tools.add_openvino_win_libs as utils utils.add_openvino_libs_to_path() --------- Signed-off-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2023-06-16 19:47:09 -07:00
kunal-vaishnavi	3f7f90aed0	Stabilize Whisper export with beam search (#16297 ) ### Description This PR stabilizes the Whisper export with beam search by adding the following: - Remove unused ONNX models and extra folders generated during the export process - Specify the Whisper with beam search model's IR version for E2E integration - Parity check for Whisper with beam search model between PyTorch and ORT - Remove previously exported Whisper with beam search model before saving newly exported model ### Motivation and Context - Removing the unused ONNX models and extra folders frees up disk space after exporting and makes it easier to copy and move the output folder to other environments. - Specifying the IR version fixes an issue with generating the ONNX E2E model - Adding a parity check helps detect runtime issues during the export process - Removing the previously exported Whisper with beam search model prevents the data file size from doubling when the newly exported model is saved with the same filename	2023-06-16 18:56:52 -07:00
dependabot[bot]	dd660c054e	Bump transformers from 4.24.0 to 4.30.0 in /tools/ci_build (#16331 )	2023-06-16 13:08:46 -07:00
zesongw	d813d991b1	[WebNN EP] Support Squeeze Op (#16361 ) ### Description Adds support for the Squeeze Op to WebNN EP. It shares similar parameters with Unsqueeze, so they are merged. ### Motivation and Context Enable more models to run on WebNN EP. --------- Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>	2023-06-16 11:18:58 -07:00
Chi Lo	fbf08c4b4d	Fix minor TRT EP provider option issue (#16107 ) Several TRT EP provider options are not included when calling OrtApis::GetTensorRTProviderOptionsAsString(). This issue doesn't affect TRT EP, but when user calling above api to get all the provider options will find some provider options not included in the string.	2023-06-16 10:07:40 -07:00
Silvio Traversaro	4915191e63	Fix build of Python wheel on Windows with single-config generator (#16337 ) ### Description Before this PR, the CMake code assumed that when on Windows a multiple-config CMake generator was used, while on non-Windows there was the assumption of a single-config CMake generator. After this PR this information is obtained from the [`GENERATOR_IS_MULTI_CONFIG`](https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html) global CMake property. ### Motivation and Context I discovered this problem when building with Ninja generator on Windows, but I guess this should fix problems also on non-Windows platforms when using a multiple-config generator (such as Xcode on macOS or "Ninja Multi-Config" on all platforms). See https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html for more info.	2023-06-16 09:17:49 -07:00
Jhen-Jie Hong	685816bb0a	[js/rn] Add executionProviders support (#16233 ) ### Description <!-- Describe your changes. --> This PR adds support for `executionProviders` option for react-native package, support: - Android: cpu / xnnpack / nnapi - iOS: cpu / xnnpack / coreml ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> In my case I want to enable Core ML / NNAPI EP for react-native project.	2023-06-16 19:38:41 +10:00
Jhen-Jie Hong	ea1a5cf920	[js/rn] Implement blob exchange by JSI instead of use base64 (#16094 ) ### Description - Create `OnnxruntimeJSIHelper` native module to provide two JSI functions - `jsiOnnxruntimeStoreArrayBuffer`: Store buffer in Blob Manager & return blob object (iOS: RCTBlobManager, Android: BlobModule) - `jsiOnnxruntimeResolveArrayBuffer`: Use blob object to get buffer - Part of the implementation references [react-native-blob-jsi-helper](https://github.com/mrousavy/react-native-blob-jsi-helper) - Replace base64 encode/decode - `loadModelFromBlob`: Rename from `loadModelFromBase64EncodedBuffer` - `run`: Use blob object to replace input.data & results[].data For [this context](https://github.com/microsoft/onnxruntime/issues/16031#issuecomment-1556527812), it saves a lot of time and avoids blocking the JS thread when decoding the return type; it is 3700ms -> 5~20ms for the case. (resolve function only takes 0.x ms) ### Motivation and Context It’s related to #16031, but not a full implementation of the migration to JSI. It just uses JSI through BlobManager to replace the slow part (base64 encode / decode). Rewriting it entirely in JSI could be complicated, for example type conversion and threading. This PR might be considered a minor change. /cc @skottmckay	2023-06-16 19:37:02 +10:00
cloudhan	9110e5b9bd	[ROCm] Add attention kv cache for decoding (#16076 )	2023-06-16 14:17:56 +08:00
Tianlei Wu	96471491d7	Fix test failure in debug CUDA build (#16370 ) Fix assertion failure in onnxruntime_test_all in debug build with CUDA, which is caused by a test case added in https://github.com/microsoft/onnxruntime/pull/16075. Remove an assumption that bias exists in MultiHeadAttention.	2023-06-15 23:16:16 -07:00
Tianlei Wu	1866a9d818	Use the lowest float for causal mask (#16369 ) Always set causal mask to the lowest float. Note that since huggingface transformers v4.21, gpt2 uses lowest half for FP16, and lowest float for FP32: `66fd3a8d62/src/transformers/models/gpt2/modeling_gpt2.py (L199)` Assume that most fp16 ONNX models are converted from fp32 models. We decided to use lowest float32 for both half and float model for consistency. The mask_filter_value only applies to raw attention mask (2D, 3D or 4D). For 1D mask, masked item is 0.0 after softmax so mask filter value is the lowest float for 1D mask. * For BERT model, when users use 1D mask (required by FMHA) and mask_filter_value is not applicable. * For BERT or GPT-2, when fused kernel is used, mask_filter_value has no impact ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> https://github.com/microsoft/onnxruntime/issues/12843 https://github.com/microsoft/onnxruntime/issues/14363	2023-06-15 21:32:29 -07:00
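The effect of using the lowest float as the mask filter value can be checked with a small softmax. This is a minimal sketch, assuming the mask is applied additively to the attention scores before softmax; the helper name `masked_softmax` is hypothetical and the code is not ONNX Runtime's kernel.

```python
import math
import struct

# Lowest finite float32 value (about -3.4028235e38), built from its bit pattern.
LOWEST_FLOAT32 = -struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]


def masked_softmax(scores, keep):
    """Softmax over scores; positions with keep=False get the lowest float added."""
    masked = [s if k else s + LOWEST_FLOAT32 for s, k in zip(scores, keep)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(m - top) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


# The last position is masked out and ends up with zero probability.
probs = masked_softmax([1.0, 2.0, 3.0], [True, True, False])
```

The exponential of such a large negative number underflows to exactly zero, so the masked position contributes nothing and the remaining probabilities still sum to one.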
PeixuanZuo	bcdb81c563	[Whisper] add a fusion option to split input bias from MHA/DMHA (#16049 ) The Whisper model contains five different types of attention; the q, k, v bias was fused into the Attention/MHA/DMHA op. encoderdecoderinit subgraph - Attention: encoder attention - Attention: decoder self attention + present k, v - MultiHeadAttention: decoder cross attention + present k and v. q and v have bias. decoder subgraph - DecoderMultiHeadAttention: decoder cross attention + past k, v. q has bias - DecoderMultiHeadAttention: decoder self attention + past/present k, v. q, k, v have bias. For ROCm EP, MHA/DMHA doesn't support additional bias. This PR adds a fusion option `disable_multi_head_attention_bias` to split the q, k, v bias from MHA/DMHA.	2023-06-16 10:29:48 +08:00
Jeff Bloomfield	6949cfaf94	Fix MS domain QuantizeLinear and DequantizeLinear type registrations … (#16298 ) This fixes the type lists used to register DML kernels for Microsoft domain QuantizeLinear and DequantizeLinear. These previously did not include FP16 and incorrectly used the same type list for both operators. The new type lists are the same as opset 19 ONNX which aren't implemented yet in the DML EP.	2023-06-15 18:21:56 -07:00
Changming Sun	188d5f5398	Fix Linux Multi GPU build pipeline (#16368 ) ### Description The build pipeline runs on Azure NV12 machines that will be deprecated soon because the SKU is too old. So this PR will move the pipeline to a Windows machine with two A10 GPUs.	2023-06-15 16:24:46 -07:00
Changming Sun	5754cd7d1d	Add fp16 support to CPU EP gemm op (#15506 )	2023-06-15 14:38:17 -07:00
Skand Hurkat	67093b204d	Clean up aarch64 quantized GEMM dispatch (#16120 ) ### Description - Add a new field to `MLAS_PLATFORM` for S8S8 GEMM dispatch. - Set this field to either dot product instructions or NEON MLA in platform.cpp. - Clean up dispatch selector in qgemm.h. ### Motivation and Context This will allow future extensibility as other functions that use other ARM64 extensions for quantized matrix multiplication. --------- Co-authored-by: Skand Hurkat <skhurkat@microsoft.com>	2023-06-15 14:24:40 -07:00
Guenther Schmuelling	5c0d5768e7	make package.json more robust (#16366 ) "default" should be the last element for exports. This fixes "Module not found: Error: Default condition should be last one" when importing the onnxruntime-web package in some conditions.	2023-06-15 14:17:37 -07:00
Hariharan Seshadri	63f5573354	Relax node placement check for CUDA Graph usage (#16358 )	2023-06-15 14:03:08 -07:00
Dipanjan Sengupta	681a0d084d	Removing AMX build flag (#16086 ) ### Description 1. Replacing AMX intrinsics with machine code macro instructions in QGEMM kernel. 2. Removing AMX build flags for GCC in cmake file. ### Motivation and Context The additional AMX flag in cmake adds an extra layer of dependency on the GCC version to use the feature. These changes should allow the usage of the AMX feature with just the CPU ID check.	2023-06-15 11:22:59 -07:00
Rachel Guo	65434dce57	Bump decode-uri-component from 0.2.0 to 0.2.2 in /js/react_native/e2e (#16329 ) ### Description <!-- Describe your changes. --> As title. Similar as this pr: https://github.com/microsoft/onnxruntime/pull/13846 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> To resolve component governance alert. https://aiinfra.visualstudio.com/Lotus/_componentGovernance/97926/alert/8087084?typeId=16589570 Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2023-06-15 10:30:48 -07:00
Yulong Wang	4f7900b553	[js/web] enable ONNX Runtime Web error messages in JS (#16335 ) ### Description enabling passing error messages from C++ to JavaScript so that when ORT Web API fails it generates more verbose errors.	2023-06-15 09:45:41 -07:00
Yi Zhang	3e99e43a1d	extend Final AAR testing timeout limit (#16340 ) ### Description ### Motivation and Context improve nuget pipeline stability	2023-06-15 17:27:45 +08:00
pengwa	735a32fee1	Introduce memory observer for ORTModule (#16213 ) ### Introduce memory observer for ORTModule To analyze memory usage for ORTModule training, we need collect per-iteration memory footprint in different stages (pre-forward, post-forward, pre-backward, and post-backward). Currently we only collect the data using torch.cuda APIs. The next step is, we could collect the detailed stashed activation list and its percentage within ORT backend, which is beyond this PR. Sample as below: ``` 0/8] step 0 memory (MiB) \| phase: pre_forward \| allocated: 1866 \| max allocated: 1866 \| cached: 1874 \| max cached: 1874 \| inactive: 8 \| max inactive: 8 [0/8] step 0 memory (MiB) \| phase: post_forward \| allocated: 23277 \| max allocated: 26215 \| cached: 26406 \| max cached: 26406 \| inactive: 193 \| max inactive: 405 [0/8] step 0 memory (MiB) \| phase: pre_backward \| allocated: 23277 \| max allocated: 26215 \| cached: 26406 \| max cached: 26406 \| inactive: 193 \| max inactive: 405 [0/8] step 0 memory (MiB) \| phase: post_backward \| allocated: 2932 \| max allocated: 26215 \| cached: 26406 \| max cached: 26406 \| inactive: 6158 \| max inactive: 6158 0%\|█ \| 1/200 [00:26<1:26:18, 26.02s/it] [0/8] step 1 memory (MiB) \| phase: pre_forward \| allocated: 2356 \| max allocated: 26215 \| cached: 26406 \| max cached: 26406 \| inactive: 2454 \| max inactive: 6165 [0/8] step 1 memory (MiB) \| phase: post_forward \| allocated: 23767 \| max allocated: 26705 \| cached: 29342 \| max cached: 29342 \| inactive: 2639 \| max inactive: 6165 [0/8] step 1 memory (MiB) \| phase: pre_backward \| allocated: 23767 \| max allocated: 26705 \| cached: 29342 \| max cached: 29342 \| inactive: 2639 \| max inactive: 6165 [0/8] step 1 memory (MiB) \| phase: post_backward \| allocated: 3422 \| max allocated: 26705 \| cached: 29342 \| max cached: 29342 \| inactive: 5284 \| max inactive: 6165 1%\|██ \| 2/200 [00:26<36:47, 11.15s/it] [0/8] step 2 memory (MiB) \| phase: pre_forward \| allocated: 
2356 \| max allocated: 26705 \| cached: 29342 \| max cached: 29342 \| inactive: 2454 \| max inactive: 6165 [0/8] step 2 memory (MiB) \| phase: post_forward \| allocated: 23767 \| max allocated: 26705 \| cached: 29342 \| max cached: 29342 \| inactive: 2639 \| max inactive: 6165 [0/8] step 2 memory (MiB) \| phase: pre_backward \| allocated: 23767 \| max allocated: 26705 \| cached: 29342 \| max cached: 29342 \| inactive: 2639 \| max inactive: 6165 [0/8] step 2 memory (MiB) \| phase: post_backward \| allocated: 3422 \| max allocated: 26705 \| cached: 29342 \| max cached: 29342 \| inactive: 5284 \| max inactive: 6165 ```	2023-06-15 15:45:36 +08:00
pengwa	574e17ade4	Fix Reshape check (#16349 ) ### Fix Reshape check 3D->2D reshape by merging the first dims. There is a bug for the case. ```mermaid stateDiagram [768,12,64] --> Reshape (—1,768) --> Reshape Reshape --> [768,768] ``` The Reshape pass the upstream Reshape check, but it should not. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-06-15 13:50:53 +08:00
PeixuanZuo	097346be9d	[ROCm] Add clean step for ROCm CI pipeline (#16336 ) 1. Add clean step for ROCm CI pipeline 2. Fix error "device or resource busy" bug by setting umount dataset step as `always()` step.	2023-06-15 13:44:12 +08:00
Baiju Meswani	5eec24837f	Fix for AMD GPU pipeline (#16357 )	2023-06-14 20:36:16 -07:00
Wanming Lin	73dad4452b	[WebNN EP] Support Shape op (#16282 ) Since WebNN API doesn't support shape op, in the WebNN EP, we calculate the ONNX Shape node output and pass the values to a WebNN's constant + slice as workaround.	2023-06-14 20:31:01 -07:00
Changming Sun	dbc7a195b1	Update win-ci-pipeline.yml: enable xnnpack tests (#16244 ) 1. Enable xnnpack test 2. Change TSA database name from onnxruntime_master to onnxruntime_main. This is a leftover of renaming the "master" branch to "main" 3. Add two static analysis jobs for WinML and DML 4. Rename the machine pool "aiinfra-dml-winbuild" to "onnxruntime-Win2019-GPU-dml-A10", so that the internal and public ADO instances use the same machine pool name. 5. Move Windows GPU CI build pipeline from "onnxruntime-Win2022-GPU-T4" to "onnxruntime-Win2022-GPU-A10" machine pool, because we do not have enough T4 GPUs.	2023-06-14 19:12:42 -07:00
Tianlei Wu	9be133231f	Fix cuda graph capture (#15005 ) Fix two issues related to cuda graph capture: https://github.com/microsoft/onnxruntime/issues/14942 and https://github.com/microsoft/onnxruntime/issues/15002 Issue 1: Previously, graph capture started at the second run. However, memory pattern optimization allocates memory from the second run, and cudaMalloc is not allowed during graph capture. In this PR, graph capture starts after 2 runs to avoid the issue. Issue 2: https://github.com/microsoft/onnxruntime/pull/13495 introduced multiple stream support. But stream cleanup calls cudaStreamSynchronize, which is not allowed during cuda graph capture. In this PR, we move stream cleanup after cuda graph capture. Update the squeeze net test model with dynamic axis so that we can test with larger batch size. Add a test that could reproduce the bug (when changing min runs from 2 back to 1).	2023-06-14 18:10:20 -07:00
Baiju Meswani	8a3de16d14	Temporary fix to make the training pipeline green (#16353 )	2023-06-14 13:11:35 -07:00
Baiju Meswani	ed2482667b	Fix training pipeline (#16342 )	2023-06-13 15:06:38 -07:00
zesongw	c5176ed122	[WebNN EP] Add several new unary Ops (Ceil, Exp, Identity, Reciprocal, Tan) (#16302 ) ### Description - Add new Ops: Ceil, Exp, Identity, Reciprocal, Tan. - Set MinSupportedOpSet for unary Ops. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Support more Ops for other models. The legacy optimization attribute "consumed_inputs" is not supported in WebNN EP.	2023-06-13 08:14:55 -07:00


9005 commits