onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Author	SHA1	Message	Date
satyajandhyala	8092a89688	Changed command line argparse to process '--symmetric [True|False]'. (#19577 ) ### Description Accept the command line option --symmetric and its optional value correctly. If the optional value matches 'True' case-insensitively, symmetric is set to True; otherwise it is set to False. Asymmetric quantization generates a zero_point input. ``` usage: matmul_4bits_quantizer.py [-h] --input_model INPUT_MODEL --output_model OUTPUT_MODEL [--block_size BLOCK_SIZE] [--symmetric [{True,False}]] [--accuracy_level ACCURACY_LEVEL] [-v] [--nodes_to_exclude NODES_TO_EXCLUDE [NODES_TO_EXCLUDE ...]] ```	2024-02-20 21:18:54 -08:00
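A minimal sketch of an `argparse` setup that produces the `[--symmetric [{True,False}]]` usage shown above and the case-insensitive parsing described. The helper names `str_to_bool` and `build_parser`, and the choice of `default=True` when the flag is omitted, are assumptions for illustration, not necessarily what the script does.

```python
import argparse


def str_to_bool(value: str) -> bool:
    # Case-insensitive: "True", "true", "TRUE" -> True; any other string -> False.
    return value.lower() == "true"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="matmul_4bits_quantizer.py")
    parser.add_argument(
        "--symmetric",
        nargs="?",            # the value after the flag is optional
        const=True,           # a bare "--symmetric" means True
        default=True,         # assumption: symmetric when the flag is omitted
        type=str_to_bool,     # applied only to string values given on the command line
        choices=[True, False],
        help="quantize symmetrically (asymmetric quantization adds a zero_point input)",
    )
    return parser
```

For example, `build_parser().parse_args(["--symmetric", "false"]).symmetric` is `False`, while a bare `--symmetric` yields `True`.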
Baiju Meswani	124bde985a	Bring QAT POC back to a functional state (#19290 )	2024-02-20 19:20:42 -08:00
PeixuanZuo	6226c5f62f	[ROCm] Add SkipGroupNorm for ROCm EP (#19303 ) Add SkipGroupNorm for ROCm EP. --------- Co-authored-by: Peixuan Zuo <peixuanzuo@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2024-02-21 11:08:48 +08:00
zhijiang	8fadc6c913	Zhijxu/cleanup cached tensors when oom (#19306 ) In PyTorch, when an OOM happens during backpropagation, the user can decrease the batch size and rerun without restarting the process. In ORT, however, the intermediate tensors are kept even after the OOM, so decreasing the batch size still fails. This is a torch run: after the OOM failure, torch releases its tensors before the next step. ![image](https://github.com/microsoft/onnxruntime/assets/43435212/92b8a2e3-454b-448a-a223-17cb91d463c2) This is from ORT: ORT does not release its tensors after the OOM failure. ![image](https://github.com/microsoft/onnxruntime/assets/43435212/bb6a3882-8e14-4f37-8079-e7f70fc2546b) This is ORT with the PR: memory is released; the remaining 4GB is not owned by ORT and will be released by torch at the end. ![image](https://github.com/microsoft/onnxruntime/assets/43435212/7f39d711-4e36-47d5-aecf-3805433a6d01)	2024-02-21 10:41:42 +08:00
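The user-side workflow this fix enables ("decrease the batch size and rerun without restarting the process") can be sketched in plain Python. `run_with_backoff` and `step` are hypothetical names, and the built-in `MemoryError` stands in for a framework OOM exception such as PyTorch's `torch.cuda.OutOfMemoryError`; this is not ORT code.

```python
from typing import Callable, Tuple


def run_with_backoff(step: Callable[[int], float], batch_size: int,
                     min_batch_size: int = 1) -> Tuple[int, float]:
    """Run step(batch_size), halving the batch size after every out-of-memory error.

    Returns the batch size that succeeded together with the step's result.
    """
    while batch_size >= min_batch_size:
        try:
            return batch_size, step(batch_size)
        except MemoryError:
            # Retrying in the same process only helps if the runtime released
            # the tensors that the failed step had cached.
            batch_size //= 2
    raise MemoryError("out of memory even at the minimum batch size")
```

With a step that fails above 16 samples, `run_with_backoff(step, 64)` tries 64, then 32, and returns `(16, result)`.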
Markus Tavenrath	0c4421cb78	Fix compile warnings (as errors) for functions which miss returning required return value (#19079 ) Added dummy return values to functions that declare a return value but do not return one. ### Motivation and Context Fix compiler errors with 'warnings as errors' enabled.	2024-02-21 12:39:43 +10:00
Scott McKay	45e20bf781	Use build.py to build in py-win-gpu.yml so parallelization parameters are set (#19578 ) ### Description build.py sets a few parallelization parameters when building; invoking msbuild directly skips them. `7a5860e490/tools/ci_build/build.py (L1665-L1669)` Changed to use build.py. If there's a concern with that we _could_ set the parameters in the yaml, but that will be uglier due to duplicating logic in multiple places.	2024-02-21 10:38:37 +08:00
Yulong Wang	6e04e36e3f	[js/common] upgrade tsc in common from 4.9.5 to 5.2.2 (#19317 ) ### Description upgrade tsc in common from 4.9.5 to 5.2.2	2024-02-20 17:33:37 -08:00
Yulong Wang	70567a4b3a	[js/web] use ApiTensor instead of onnxjs Tensor in TensorResultValidator (#19358 ) ### Description Use ApiTensor instead of onnxjs Tensor in TensorResultValidator. This makes the test runner less dependent on onnxjs classes.	2024-02-20 17:33:21 -08:00
Yulong Wang	3fe2c137ee	[js] small fix to workaround formatter (#19400 ) ### Description Rename shader variables to snake_case, both for consistent naming and to avoid the formatter behaving inconsistently on Windows and Linux.	2024-02-20 17:23:01 -08:00
Yulong Wang	97ff17c2cb	update script of run CI for external PRs to add "Big Models" (#19576 ) ### Description Update the script that runs CI for external PRs to include "Big Models".	2024-02-20 17:02:11 -08:00
Jake Mathern	7a5860e490	Fix cmake function duplicate lib (#19547 ) ### Description Fixes cmake function definition in winml.cmake to copy link flags. ### Motivation and Context XFGCheck errors in WindowsAI because this function does not transfer linker flags	2024-02-20 13:41:40 -08:00
Scott McKay	ec9c8cbdc9	Use xcode parallel build flags to speed up iOS CI that is timing out (#19570 ) ### Description Provide specific xcodebuild flags instead of depending on cmake to do the right thing. This built in just over an hour with a ccache miss. Previous CIs with a ccache miss were timing out after 150 minutes.	2024-02-21 07:40:35 +10:00
Sheil Kumar	3c49aacd56	Disable __cpuid check on arm64 builds as intrinsic is not available (#19574 ) Disable the __cpuid check on arm64 builds, as the intrinsic is not available there. Motivation: the check breaks the arm64 build. Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2024-02-20 13:13:40 -08:00
Jiajie Hu	1b48054e1b	[js/webgpu] Create Split indices helpers by rank, not by shape (#19554 ) ### Description This is required to make shape uniforms really work. ### Motivation and Context The bug surfaced in a model with multiple Split nodes: later nodes tried to reuse a previously cached pipeline in which the earlier node's shapes were hardcoded as constants.	2024-02-20 09:24:34 -08:00
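A hypothetical Python analogue of a Split index helper shows why the sizes must not be baked in. Here the split sizes arrive as a runtime argument, the way they would through a uniform, so one helper (and one cached pipeline keyed only by rank) stays correct for another Split node with different sizes. `split_output_index` is an illustrative name; this is not the actual WGSL code.

```python
from typing import Sequence, Tuple


def split_output_index(index: int, sizes_in_split_axis: Sequence[int]) -> Tuple[int, int]:
    """Map an index along the split axis to (output number, index within that output).

    The sizes are supplied at run time rather than hardcoded as constants.
    """
    offset = 0
    for output, size in enumerate(sizes_in_split_axis):
        if index < offset + size:
            return output, index - offset
        offset += size
    raise IndexError(f"index {index} is outside an axis of length {offset}")
```

With sizes `[2, 3, 1]`, index 4 maps to `(1, 2)`; the same helper with sizes `[4, 4]` maps it to `(1, 0)` without any regeneration.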
Xavier Dupré	7efb0dbe12	add option DefaultTensorType to specify the default tensor type to quantize (#19455 ) ### Description The current quantization tool relies on shape inference to provide the type of every intermediate tensor, so that it knows which type it must dequantize into (float32, float16). However, this information is not available if shape inference fails, which happens every time the model includes an operator from a custom domain such as com.microsoft. This PR introduces an extra option `DefaultTensorType` as a fallback when the quantizer cannot find the type it needs. ### Motivation and Context This fixes issue #19409.	2024-02-20 08:22:44 -08:00
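The fallback amounts to a type lookup with a default. In this hypothetical sketch, `tensor_elem_type` is not the quantizer's real function; the integer codes match `onnx.TensorProto.FLOAT` (1) and `onnx.TensorProto.FLOAT16` (10).

```python
from typing import Mapping, Optional

FLOAT = 1     # onnx.TensorProto.FLOAT
FLOAT16 = 10  # onnx.TensorProto.FLOAT16


def tensor_elem_type(name: str, inferred: Mapping[str, int],
                     default_tensor_type: Optional[int] = None) -> int:
    """Return the element type to dequantize `name` into."""
    if name in inferred:
        return inferred[name]
    if default_tensor_type is not None:
        # Shape inference could not type this tensor (e.g. it is produced by a
        # com.microsoft operator), so fall back to the user-supplied default.
        return default_tensor_type
    raise ValueError(f"unable to find the data type of {name!r}; set DefaultTensorType")
```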
Markus Tavenrath	e832562d70	Fix invalid usage of designated initializers. (#19497 ) ### Description I've replaced all occurrences of C++ designated initializers in the CUDA NHWC tests with member initialization. ### Motivation and Context C++ designated initializers were introduced in C++20. GCC nevertheless accepts designated initializers in C++17, which is the standard used to compile onnxruntime. MSVC, however, is standard-conforming and only accepts this feature starting with C++20, which leads to compile failures on Windows without this change.	2024-02-20 18:06:03 +10:00
PeixuanZuo	f3e3b531fe	Update build directory clean up stage for python package pipeline (#19553 ) Fix to make the clean up stage take effect. If `SourceFolder` is empty, the task deletes files from the root folder of the repository as though [$(Build.SourcesDirectory)](https://learn.microsoft.com/en-us/azure/devops/pipelines/build/variables) was specified.	2024-02-20 10:31:39 +08:00
pengwa	b55260d076	Minor fix for cmake (#19552 ) ### Minor fix for cmake When building on Linux, we get a warning saying " CMake Warning at CMakeLists.txt:1603 (message): MPI and NCCL disabled on Win build. " This message is not correct on Linux, so this fix avoids misleading users. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/848c2d77-a538-4e31-8e0d-4b539233e515)	2024-02-19 10:21:19 +08:00
satyajandhyala	dfeda9019c	[JS/WebGPU] Add MatMulNBits (#19446 ) ### Description Add MatMulNBits to support MatMul using 4-bit quantized weights.	2024-02-17 09:19:17 -08:00
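To make "4-bit quantized weights" concrete, here is a pure-Python sketch of dequantizing one block: two 4-bit values per byte, then `w = (q - zero_point) * scale`. Two details are assumptions for illustration: the low nibble holds the first element, and the zero point defaults to 8 (the midpoint of 0..15) when none is given, as in symmetric quantization. This is not the WebGPU kernel.

```python
from typing import List, Optional


def unpack_uint4(packed: bytes) -> List[int]:
    """Split each byte into two 4-bit values, low nibble first (assumed layout)."""
    values = []
    for byte in packed:
        values.append(byte & 0x0F)
        values.append(byte >> 4)
    return values


def dequantize_block(packed: bytes, scale: float, zero_point: Optional[int] = None) -> List[float]:
    """Dequantize one block of 4-bit weights: w = (q - zero_point) * scale."""
    zp = 8 if zero_point is None else zero_point  # assumed symmetric default
    return [(q - zp) * scale for q in unpack_uint4(packed)]
```

For example, `dequantize_block(bytes([0x9F, 0x08]), 0.5)` unpacks `[15, 9, 8, 0]` and returns `[3.5, 0.5, 0.0, -4.0]`.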
Yulong Wang	06269a3952	[js/webgpu] allow uint8 tensors for webgpu (#19545 ) ### Description allow uint8 tensors for webgpu	2024-02-16 18:28:27 -08:00
Adrian Lizarraga	4874a41008	[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546 ) ### Description Updates the default QNN SDK version to 2.19.2.240210. ### Motivation and Context Build and test the latest version of QNN SDK in our pipelines.	2024-02-16 16:59:43 -08:00
kunal-vaishnavi	44d8ad93b2	Whisper Timestamps and Temperature (#19509 ) ### Description This PR updates exporting and running the Whisper model with beam search by adding the following. - Adds temperature as a graph input to the exported model - Fixes the token ids by adding them as attributes to `WhisperBeamSearch` - Fixes the timestamps test cases so they pass now - Fixes a bug with invoking `torch.onnx.export` - Cleans up the Whisper scripts and groups the arguments in `convert_to_onnx.py` - Adds a `requirements.txt` file to specify package dependencies - Adds `whisper-large-v3` to list of pretrained models - Fixes a bug with missing cross-attention KV cache inputs in the decoder subgraph ### Motivation and Context - This is a follow-up to [this PR](https://github.com/microsoft/onnxruntime/pull/19188). - The incorrect token ids in the timestamps processor were first noticed during [this PR review](https://github.com/microsoft/onnxruntime/pull/17500#discussion_r1333520007). When they were originally added in [this PR](https://github.com/microsoft/onnxruntime/pull/15853), the offsets were previously constant across the Whisper model sizes. When comparing the new `whisper-large-v3` variant, the English-only variants (e.g. `whisper-tiny.en`), and the original variants (e.g. `whisper-tiny`), both the values and the offsets differ. Therefore, it is easier to set the token ids as attributes to `WhisperBeamSearch` when exporting to ensure the right values are used in the timestamps processor. - The Hugging Face API for returning timestamps and the expected outputs from the PyTorch model have both changed. - The fix for `torch.onnx.export` is a follow-up to [this PR review](https://github.com/microsoft/onnxruntime/pull/17179#issuecomment-1683001470). - The argument grouping is a follow-up to [this PR review](https://github.com/microsoft/onnxruntime/pull/17500#discussion_r1333521721). 
- Specific package versions are needed to run the Whisper scripts and the `requirements.txt` file ensures that these versions are installed. - The `whisper-large-v3` variant is released and should be in the list of official pretrained models. - After the changes from [this PR](https://github.com/microsoft/onnxruntime/pull/17316), the exported model is not loading in an ORT inference session because the cross-attention KV cache inputs are missing in the decoder subgraph.	2024-02-16 15:21:43 -08:00
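A temperature input conventionally rescales logits before the softmax: values below 1 sharpen the token distribution and values above 1 flatten it. A pure-Python sketch of that standard scaling, assuming this is what the new graph input controls; it is not taken from the `WhisperBeamSearch` implementation.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Token probabilities computed from logits / temperature (temperature > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```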
Tianlei Wu	1dce5e1732	Disable TF32 in Linux_Test stage of Linux GPU CI Pipeline (#19541 ) ### Description Some test thresholds that previously worked on T4 GPUs no longer work. The reason is that the current pipeline uses A10 GPUs, and TF32 is enabled by default. Disable TF32 in the Linux GPU CI Pipeline during testing to avoid such random test failures. ### Motivation and Context Linux Test has random failures in these tests: ProviderOptionsTest > testCUDAOptions() FAILED org.opentest4j.AssertionFailedError: array contents differ at index [446], expected: <0.0419757> but was: <0.041948937> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119) at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360) at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99) at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43) org.opentest4j.AssertionFailedError: array contents differ at index [6], expected: <0.0225981> but was: <0.022587791> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290) at 
app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119) at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360) at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676) at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)	2024-02-16 14:41:11 -08:00
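TF32 keeps float32's 8-bit exponent but only 10 explicit mantissa bits, so each TF32 operand carries a relative error of up to about 2^-10, far coarser than float32's 2^-23. A small sketch that emulates this precision loss by clearing float32's 13 low mantissa bits. It truncates, whereas hardware may round, so treat it as an illustration of the magnitude, not a bit-exact model.

```python
import struct


def to_tf32(x: float) -> float:
    """Approximate TF32 by keeping only the top 10 of float32's 23 mantissa bits (truncation)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits &= ~((1 << 13) - 1) & 0xFFFFFFFF  # clear the 13 low mantissa bits
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y
```

Errors of this size, accumulated over a matrix multiply, are enough to break thresholds tuned on GPUs without TF32.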
Adrian Lizarraga	b84712151c	QNN EP: Fuse DQ -> Q sequences into a QNN Convert op (#19511 ) ### Description Fuses DQ -> Q sequences into a QNN Convert operator if: - Converting from one qtype to another. Ex: Dequantize(uint8 to float) -> Quantize(float to uint16) - The DQ and Q operators are not part of another node unit (i.e., standalone) - The Q operator is the only consumer for the DQ operator. ### Motivation and Context Allows faster execution of QDQ models with mixed activation types by leveraging the QNN Convert operator, which converts between quantization types. For certain models, this results in inference latency speed-ups of up to 2x (depends on the number of DQ -> Q sequences). #### Example for Add node unit with 16-bit I/O: Original:
```
u8 ----> DQ ---> Q ---u16--> Add ---u16-->
                              ^
                              |
u16 --------------------------+
```
After fusing DQ -> Q:
```
u8 ----> Convert ---u16--> Add ---u16-->
                            ^
                            |
u16 ------------------------+
```	2024-02-16 14:36:05 -08:00
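Numerically, the fused Convert must reproduce a dequantize followed by a quantize. Here is a per-tensor sketch of that composed arithmetic, assuming standard ONNX DequantizeLinear/QuantizeLinear semantics (round half to even, saturate to the output range). `requantize` is a hypothetical helper, not QNN's implementation, and the uint16 range defaults are an assumption matching the example above.

```python
def requantize(q_in: int, scale_in: float, zp_in: int, scale_out: float, zp_out: int,
               qmin: int = 0, qmax: int = 65535) -> int:
    """Quantize(Dequantize(q_in)) as a single step, e.g. uint8 -> uint16."""
    real = (q_in - zp_in) * scale_in          # DequantizeLinear
    q_out = round(real / scale_out) + zp_out  # QuantizeLinear; Python's round() is half-to-even
    return max(qmin, min(qmax, q_out))        # saturate to the output type's range
```

For example, `requantize(200, 0.5, 128, 0.25, 32768)` gives `32912`: the value 36.0 re-expressed on the finer uint16 grid. Fusing means one integer-to-integer mapping per element instead of materializing an intermediate float tensor.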
Sheil Kumar	ef0b71308c	Optimize KahnsTopologicalSort and PriorityNodeCompare (#19475 ) Description 1) During SessionInitialization, KahnsTopologicalSort is a major cause of perf degradation. The main cause of the slowdown is that the topological sort must keep track of nodes to visit in order and reorder them based on priority (as determined by a comparator). The existing implementation uses a priority_queue that is backed by a std::vector container. However, vectors are not good for insertion and reordering. The appropriate data type for this operation is a linked list. However, linked lists like std::list are not usable as a container for std::priority_queue, because std::priority_queue requires random access, which linked lists do not have. For this simple implementation, we can instead use a std::list under the hood and perform insertions manually using std::upper_bound. This drastically reduces the time taken by the method; the current implementation causes numerous copies and a lot of element movement in the list of nodes to visit. 2) In the comparator, I hide forward and backward attribute checking behind the #ifdef ENABLE_TRAINING macro, as I believe it should only be valid in the training scenario. 3) In the noopelimination transformer, I avoid creating an Initializer (which unpacks tensorproto data) for every node and only create initializers when Add/Sub/Mul/Div op nodes are detected. Motivation and Context: Session creation time of many models is quite slow. --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2024-02-16 05:34:55 -08:00
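A Python sketch of the ordering logic described in (1): Kahn's algorithm with a ready list that stays sorted, where each newly ready node is inserted at its position via binary search (`bisect.insort`), analogous to `std::upper_bound` on a `std::list`. Assumptions: lower priority values are emitted first and ties keep input order; this is not ORT's `PriorityNodeCompare`. Python lists still shift elements on insert, so the sketch shows the ordering, not the performance gain.

```python
import bisect
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def kahn_priority_sort(nodes: Iterable[Tuple[Hashable, int]],
                       edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Kahn's topological sort that always emits the ready node with the lowest priority value."""
    priority: Dict[Hashable, int] = dict(nodes)
    # Tie-breaker so equal priorities keep input order and nodes are never compared.
    order_of = {node: i for i, node in enumerate(priority)}
    in_degree = {node: 0 for node in priority}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Ready set kept sorted by (priority, input order).
    ready: List[Tuple[int, int, Hashable]] = []
    for node in priority:
        if in_degree[node] == 0:
            bisect.insort(ready, (priority[node], order_of[node], node))

    result: List[Hashable] = []
    while ready:
        _, _, node = ready.pop(0)  # O(n) on a Python list; O(1) pop_front on a linked list
        result.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                bisect.insort(ready, (priority[nxt], order_of[nxt], nxt))

    if len(result) != len(priority):
        raise ValueError("graph contains a cycle")
    return result
```

For example, `kahn_priority_sort([("a", 2), ("b", 1), ("c", 0)], [("a", "c")])` returns `["b", "a", "c"]`: `c` has the best priority but only becomes ready once `a` is emitted.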
Tianlei Wu	4bfa69def8	Speed Up DecoderMaskedSelfAttentionTest (#19531 ) ### Description The unit tests take 19 minutes to run (in a debug build) because of too many combinations. I reduced the combinations while keeping good test coverage. After the change, the test finishes in 51 seconds. Before: [----------] 2 tests from DecoderMaskedSelfAttentionTest [ RUN ] DecoderMaskedSelfAttentionTest.Test_fp32 [ OK ] DecoderMaskedSelfAttentionTest.Test_fp32 (394086 ms) [ RUN ] DecoderMaskedSelfAttentionTest.Test_fp16 [ OK ] DecoderMaskedSelfAttentionTest.Test_fp16 (747035 ms) [----------] 2 tests from DecoderMaskedSelfAttentionTest (1141122 ms total) After: [----------] 2 tests from DecoderMaskedSelfAttentionTest [ RUN ] DecoderMaskedSelfAttentionTest.Test_fp32 [ OK ] DecoderMaskedSelfAttentionTest.Test_fp32 (21057 ms) [ RUN ] DecoderMaskedSelfAttentionTest.Test_fp16 [ OK ] DecoderMaskedSelfAttentionTest.Test_fp16 (30653 ms) [----------] 2 tests from DecoderMaskedSelfAttentionTest (51710 ms total) ### Motivation and Context Reduce test time and improve build pipeline efficiency.	2024-02-15 20:22:36 -08:00
sophies927	d0061d6fb1	Update stale.yml to use old version as a bug fix (#19532 ) ### Description Changed the actions/stale version back to v8 from v9. ### Motivation and Context There is a well-documented issue w/ the new actions/stale version (v9.0.0) that causes the following error: "Error delete _state: [403] Resource not accessible by integration". See https://github.com/actions/stale/issues/1133 for more context. This issue is preventing the stale bot from labeling stale issues since the version was updated b/c the action can no longer access the cache and cannot apply labels to all issues due to GH API rate limiting. There are two potential fixes if we continue to use the new version: (1) run the action on all PRs/issues to avoid using the cache or (2) give write access to the endpoints listed in https://docs.github.com/en/rest/authentication/permissions-required-for-fine-grained-personal-access-tokens?apiVersion=2022-11-28#repository-permissions-for-actions. Neither of these options is preferable, so I am going to wait until the bug is fixed. Note: The old version (v8.0.0) uses Node 16, which will be deprecated in Spring 2024, instead of Node 20, so we should keep an eye on [this issue](https://github.com/actions/stale/issues/1133) to see when they make the fix and we can switch back to the new version.	2024-02-15 17:03:11 -08:00
rui-ren	d63c664ca0	fix rocm ci pipeline (#19525 ) ### Description ROCm CI pipeline issue: ``` Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.91 MiB, post-processed: Unknown size, total: 17.41 MiB) to /home/onnxruntimedev/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20... main() File "/stage/huggingface-transformers/examples/pytorch/language-modeling/run_mlm.py", line 242, in main datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/load.py", line 856, in load_dataset builder_instance.download_and_prepare( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 583, in download_and_prepare self._download_and_prepare( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 639, in _download_and_prepare split_generators = self._split_generators(dl_manager, **split_generators_kwargs) File "/home/onnxruntimedev/.cache/huggingface/modules/datasets_modules/datasets/wikitext/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20/wikitext.py", line 138, in _split_generators data_file = dl_manager.download_and_extract(self.config.data_url) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 289, in download_and_extract return self.extract(self.download(url_or_urls)) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 197, in download downloaded_path_or_paths = map_nested( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 195, in map_nested return function(data_struct) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 
220, in _download return cached_path(url_or_filename, download_config=download_config) File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 281, in cached_path output_path = get_from_cache( File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 634, in get_from_cache raise ConnectionError("Couldn't reach {}".format(url)) ConnectionError: Couldn't reach https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip ``` ### Motivation and Context Update the `datasets` package used by the pipeline to the latest version, `2.17.0`.	2024-02-15 00:02:08 -08:00
Changming Sun	660f39aca5	Perf improvement for Intel MTL CPUs (#19524 ) ### Description See the comments in the changed files for more detailed information. The files onnxruntime/core/platform/windows/hardware_core_enumerator.cc and onnxruntime/core/platform/windows/hardware_core_enumerator.h were copied from the WinML source folder in this repo, with minor coding style changes. I had an offline discussion with Sheil. We agree that, given the lack of a future-proof solution, we may check in this temporary fix first and rework it later. I will meet with @ivberg to discuss the issue in depth and seek a long-term solution. Thanks for offering help, @ivberg! ### Motivation and Context With this change, we see about a 2x perf improvement on some Intel CPUs.	2024-02-14 18:35:56 -08:00
jingyanwangms	775c774f4b	Add BF16 to Sqrt (#19363 ) ### Description Sqrt does not have BF16 support yet. This PR adds it.	2024-02-14 18:07:51 -08:00
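A common way for a BF16 elementwise op to work is to widen each bfloat16 value, compute in higher precision, and round back to bfloat16 with round-to-nearest-even. This pure-Python sketch illustrates that pattern under that assumption; it is not ORT's kernel. Python floats are doubles, so the result is rounded to float32 before the bfloat16 rounding, and NaN inputs are not handled.

```python
import math
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round a value to bfloat16 bits (round to nearest even); NaN handling omitted."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # float32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen bfloat16 bits to a float (exact: bfloat16 is the top half of float32)."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x


def bf16_sqrt(b: int) -> int:
    """Sqrt on a bfloat16 value: widen, compute in higher precision, round back."""
    return float_to_bf16_bits(math.sqrt(bf16_bits_to_float(b)))
```

For example, `bf16_sqrt(0x4080)` (4.0) returns `0x4000` (2.0), and `bf16_sqrt(0x4000)` returns `0x3FB5`, which is 1.4140625, the bfloat16 value nearest to the square root of 2.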
rui-ren	a67e692546	add GatherSliceToSplitFusion and Unittest (#19218 ) ### Multi Query Attention Optimization in multi-query attention
```
batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads + 2, self.head_dim)
return fused_qkv[..., :-2, :], fused_qkv[..., [-2], :], fused_qkv[..., [-1], :]
```
can be optimized to
```
batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads + 2, self.head_dim)
(query, key, value) = fused_qkv.split([self.num_heads, 1, 1], dim=2)
return query, key, value
```
This optimization can be validated with Nsight profiling and perf benchmarking. ![image](https://github.com/microsoft/onnxruntime/assets/15321482/cefcd061-4a01-4aaf-a008-8e265f7f63e9) This PR therefore fuses the `Gather/Gather/Slice` ops into a single `Split` kernel. ### Optimization Target Since 2 `Gather` kernels and 1 `Slice` kernel are time-consuming in backward prop, it is more efficient to use 1 `Split` kernel. ### Example - Before Fusion ![image](https://github.com/microsoft/onnxruntime/assets/15321482/17410319-57ea-4176-afd4-1efdcd3fdbae) - After Fusion ![image](https://github.com/microsoft/onnxruntime/assets/15321482/f1ee1582-96d4-45f4-8778-49d1f3fd370a) ### Perf Gain After the optimization, there is a ~7% perf gain. > The `Transpose` kernel could be fused too and was planned for a follow-up PR. However, testing Transpose op fusion on the Falcon model showed no perf gain, so no new PR will be created. --------- Co-authored-by: ruiren <ruiren@microsoft.com>	2024-02-14 15:07:56 -08:00
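The equivalence the fusion relies on can be checked along the head axis alone. In this pure-Python sketch, `split` is a hypothetical helper standing in for the ONNX Split op, and a list stands in for dimension 2 of `fused_qkv`. Note that `fused_qkv[..., [-2], :]` keeps the axis, so it corresponds to the slice `-2:-1`, not to a scalar index.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split(items: Sequence[T], sizes: Sequence[int]) -> List[List[T]]:
    """Split a sequence into consecutive chunks of the given sizes (one Split op)."""
    if sum(sizes) != len(items):
        raise ValueError("sizes must add up to the sequence length")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


num_heads = 4
heads = list(range(num_heads + 2))  # stand-in for the head axis (dim 2) of fused_qkv

# Before fusion: one Slice and two Gathers that keep the axis.
query, key, value = heads[:-2], heads[-2:-1], heads[-1:]
# After fusion: a single Split with sizes [num_heads, 1, 1].
assert [query, key, value] == split(heads, [num_heads, 1, 1])
```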
Scott McKay	4e5119760d	Add initial support for CoreML ML Program to the CoreML EP. (#19347 ) ### Description Adds infrastructure to create an ML Package containing the Model using ML Program. Updated coremltools files to v7.1 to bring in new protobuf definitions along with the tools to write the weight.bin file and create an ML Package correctly. Enables building a CoreML Model on all platforms, which means all the operator builder code can be debugged anywhere. Execution of the generated CoreML model is obviously limited to Apple platforms. The Conv operator builder has been updated to be able to generate an ML Program Operation. ### Motivation and Context NeuralNetwork is no longer being developed and ML Program is the replacement going forward.	2024-02-15 08:46:03 +10:00
Baiju Meswani	944d8f8513	Update the default std flag used during torch extensions compilation (#19516 )	2024-02-14 12:49:34 -08:00
Prathik Rao	3b03b2e046	Upgrade default ORTModule opset from 15 to 17 (#19315 ) ### Description This PR upgrades ORTModule's default opset from 15 to 17. Opset 17 is the final opset supported by the TorchScript exporter (https://github.com/pytorch/pytorch/pull/107829) ### Motivation and Context Engineering excellence contribution for ORT Training DRI. --------- Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2024-02-14 11:19:33 -08:00
Sheil Kumar	1508c2ee39	Restrict L2 Cache Core check to Intel devices (#19483 ) ### Description Limit SoC core detection via 2 level cache core logic to Intel and Hybrid processors. ### Motivation and Context This logic was added to support a new class of CPU cores present in Intel’s next-generation Intel Core Ultra mobile processors. It is essential to avoid placing threads on low-performing SoC cores that don’t have L3 cache. SoC cores are meant to specialize in system bring-up and help improve responsiveness and power usage; in other words, they are not meant to run compute-heavy AI workloads. To avoid broad exposure of this logic, it is currently restricted to Intel platforms that have hybrid enabled. --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2024-02-14 10:31:03 -08:00
Tianlei Wu	fbff99a432	Change Java Test Threshold (#19508 ) ### Description Increase the threshold to 1e-5 to avoid CUDA test failures when the difference is slightly larger than 1e-6. This may be because TF32 is used in those CUDA tests. ### Motivation and Context https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1291322&view=logs&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91&t=28e21ca6-87a4-5e1e-0441-72b5e8326f2d ProviderOptionsTest > testCUDAOptions() FAILED org.opentest4j.AssertionFailedError: array contents differ at index [103], expected: <0.0102678> but was: <0.010266338> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119) at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360) at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99) at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43) https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1293200&view=logs&jobId=f2f63060-d9d6-52d0-adee-b97db5a9ab91&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91&t=28e21ca6-87a4-5e1e-0441-72b5e8326f2d InferenceTest > testCUDA() FAILED org.opentest4j.AssertionFailedError: array contents differ at index [103], expected: <0.0102678> but was: <0.010266337> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at 
app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123) at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119) at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360) at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676) at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)	2024-02-14 10:08:46 -08:00
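The failure at index 103 shows why 1e-6 was too tight: the expected and actual values differ by about 1.46e-6. A quick check of that arithmetic; the per-element comparison mirrors what JUnit's `assertArrayEquals(float[] expected, float[] actual, float delta)` checks.

```python
def max_abs_diff(expected, actual):
    """Largest element-wise absolute difference between two equal-length sequences."""
    return max(abs(e - a) for e, a in zip(expected, actual))


expected = [0.0102678]
actual = [0.010266338]  # CUDA result reported at index 103
diff = max_abs_diff(expected, actual)
print(f"{diff:.3e}")               # 1.462e-06
print(diff <= 1e-6, diff <= 1e-5)  # False True
```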
Ye Wang	f53d2c2465	Phi2 script fixes (#19500 ) ### Description This PR is intended to support Phi2 passes in Olive. Merge it before https://github.com/microsoft/Olive/pull/938	2024-02-14 10:08:11 -08:00
Prathik Rao	544407038d	SimplifiedLayerNormalization Fusion BFloat16 support for Llama-v2 on A100 (#18898 ) ### Description Adds bfloat16 as a supported dtype for SimplifiedLayerNormFusion, which will provide a speedup for Llama-v2 on A100 using the bfloat16 numerical format. _layernorm_optimized_training.onnx exported in bfloat16 vs. float16:_ ![image](https://github.com/microsoft/onnxruntime/assets/31260940/8c0a5f0f-5fcb-4637-bcd9-f34272ec0284) ### Repro Instructions
```python
from torch import nn
from onnxruntime.training.ortmodule import ORTModule, DebugOptions, LogLevel
import torch

dtype = torch.bfloat16
# dtype = torch.float16

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(784, 10, dtype=dtype)
        self.layernorm = nn.LayerNorm([784], dtype=dtype)

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        x = self.layernorm(x)
        x = self.fc(x)
        return x

model = Net()
model = ORTModule(model, DebugOptions(save_onnx=True, onnx_prefix='layernorm', log_level=LogLevel.INFO))
model.to("cuda")
images = torch.randn((8, 28, 28), dtype=dtype).to("cuda")
output = model(images)
```
### Motivation and Context ONNX Runtime integration with Llama-v2 family of LLMs. --------- Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2024-02-14 10:05:16 -08:00
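SimplifiedLayerNormalization is ORT's RMS-style normalization, the variant Llama uses in place of standard LayerNorm: there is no mean subtraction and no bias. A float sketch of the math in pure Python; a bfloat16 kernel would typically load bf16 inputs and accumulate in float32, which is an assumption about the kernel, not something this commit states.

```python
import math
from typing import List, Sequence


def simplified_layer_norm(x: Sequence[float], scale: Sequence[float],
                          epsilon: float = 1e-5) -> List[float]:
    """RMS normalization: y_i = x_i / sqrt(mean(x^2) + epsilon) * scale_i."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + epsilon)
    return [v * inv_rms * s for v, s in zip(x, scale)]
```

With `x = [3.0, 4.0]`, unit scale and `epsilon=0.0`, the mean square is 12.5, so the output is `x / sqrt(12.5)` and has a root mean square of 1.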
dependabot[bot]	18f76bd25d	Bump gradle/wrapper-validation-action from 1 to 2 (#19412 ) Bumps [gradle/wrapper-validation-action](https://github.com/gradle/wrapper-validation-action) from 1 to 2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/gradle/wrapper-validation-action/releases">gradle/wrapper-validation-action's releases</a>.</em></p> <blockquote> <h2>v2.0.0</h2> <h2>What's Changed</h2> <p>The version of the Node.js runtime was updated to 20, and the majority of dependencies were updated to the latest versions. From now on, the <code>wrapper-validation-action</code> will require a Node.js 20 runtime environment.</p> <p>There are no functional changes in this release. This release is tagged with the <code>v2</code> version label.</p> <ul> <li>[NEW] Update Node.js runtime to version 20 (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/170">#170</a>)</li> </ul> <h2>v2.0.0-rc.1</h2> <p>This is a release candidate for <code>v2.0.0</code>. It is also available under the <code>v2</code> version label.</p> <h2>What's Changed</h2> <p>The version of the Node.js runtime was updated to 20, and the majority of dependencies were updated to the latest versions. From now on, the <code>wrapper-validation-action</code> will require a Node.js 20 runtime environment.</p> <p>There are no functional changes in this release.</p> <ul> <li>[NEW] Update Node.js runtime to version 20 (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/170">#170</a>)</li> </ul> <h2>v1.1.0</h2> <p>The action now adds the path of the failed wrapper Jar as a <code>failed-wrapper</code> Step output parameter. 
This makes the value available for reporting in later Steps/Jobs.</p> <h2>v1.0.6</h2> <h1>Gradle Wrapper Validation</h1> <ul> <li>Security vulnerability: <a href="`959bfac6da`">Bump json5 from 1.0.1 to 1.0.2</a></li> <li>Security vulnerability: <a href="`ffa46e5c87`">Bump qs from 6.10.1 to 6.11.0</a></li> </ul> <h2>v1.0.5</h2> <h1>Gradle Wrapper Validation</h1> <ul> <li>Update dependencies for Node 16 (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/53">#53</a>)</li> <li>Update dependencies with security vulnerabilities (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/67">#67</a>)</li> <li>Update various other dependencies (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/45">#45</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/47">#47</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/48">#48</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/54">#54</a>)</li> </ul> <h2>v1.0.4</h2> <h1>Gradle Wrapper Validation</h1> <ul> <li>Retry connections to the server on failure (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/39">#39</a>)</li> <li>Update dependencies (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/38">#38</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/37">#37</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/36">#36</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/34">#34</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/31">#31</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/30">#30</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/29">#29</a>)</li> </ul> <h2>v1.0.3</h2> <h1>Gradle Wrapper Validation</h1> <p>Update 
<code>minimist</code> version to <code>1.2.5</code></p> <h2>v1.0.2</h2> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`27152f6fa0`"><code>27152f6</code></a> Update to Node 20 (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/170">#170</a>)</li> <li><a href="`d8758a98d1`"><code>d8758a9</code></a> Build output</li> <li><a href="`e916071cca`"><code>e916071</code></a> Update NPM dependencies</li> <li><a href="`d9359e465a`"><code>d9359e4</code></a> Add asdf config file</li> <li><a href="`77d43de170`"><code>77d43de</code></a> Update upload-artifact version</li> <li><a href="`2f8436d9bb`"><code>2f8436d</code></a> Use setup-node@v4 instead of pinning to a revision</li> <li><a href="`bfa0fe410a`"><code>bfa0fe4</code></a> Consistently use npm cache for workflows</li> <li><a href="`8be8473276`"><code>8be8473</code></a> Update workflows and action to NodeJS 20</li> <li><a href="`c8fad9e3f8`"><code>c8fad9e</code></a> Bump <code>@babel/traverse</code> from 7.14.7 to 7.23.2</li> <li><a href="`342dbebe72`"><code>342dbeb</code></a> Update README to use <code>actions/checkout@v4</code></li> <li>See full diff in <a href="https://github.com/gradle/wrapper-validation-action/compare/v1...v2">compare view</a></li> </ul> </details>
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-02-13 15:59:24 -08:00
dependabot[bot]	f048fb5b14	Bump nuget/setup-nuget from 1 to 2 (#19411 ) Bumps [nuget/setup-nuget](https://github.com/nuget/setup-nuget) from 1 to 2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/nuget/setup-nuget/releases">nuget/setup-nuget's releases</a>.</em></p> <blockquote> <h2>v2.0.0</h2> <h2>What's Changed</h2> <ul> <li>build(deps): bump semver from 7.3.8 to 7.5.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/49">NuGet/setup-nuget#49</a></li> <li>build(deps-dev): bump word-wrap from 1.2.3 to 1.2.5 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/51">NuGet/setup-nuget#51</a></li> <li>build(deps-dev): bump <code>@babel/traverse</code> from 7.23.0 to 7.23.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/57">NuGet/setup-nuget#57</a></li> <li>Update to use Node.js 20 by <a href="https://github.com/frederikprijck"><code>@frederikprijck</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/59">NuGet/setup-nuget#59</a></li> <li>build(deps-dev): bump prettier from 2.8.7 to 3.0.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/60">NuGet/setup-nuget#60</a></li> <li>build(deps-dev): bump <code>@types/node</code> from 18.18.0 to 20.8.9 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/62">NuGet/setup-nuget#62</a></li> <li>build(deps-dev): bump <code>@vercel/ncc</code> from 0.36.1 to 0.38.1 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/61">NuGet/setup-nuget#61</a></li> <li>build(deps-dev): bump 
eslint-plugin-jest from 27.4.0 to 27.6.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/64">NuGet/setup-nuget#64</a></li> <li>build(deps-dev): bump nock from 13.3.3 to 13.3.6 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/63">NuGet/setup-nuget#63</a></li> <li>build(deps-dev): bump eslint from 8.50.0 to 8.52.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/65">NuGet/setup-nuget#65</a></li> <li>build(deps-dev): bump <code>@typescript-eslint/parser</code> from 5.62.0 to 6.9.1 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/70">NuGet/setup-nuget#70</a></li> <li>build(deps-dev): bump eslint-plugin-github from 4.10.0 to 4.10.1 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/68">NuGet/setup-nuget#68</a></li> <li>build(deps-dev): bump <code>@types/jest</code> from 29.5.5 to 29.5.7 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/69">NuGet/setup-nuget#69</a></li> <li>build(deps-dev): bump eslint from 8.52.0 to 8.53.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/73">NuGet/setup-nuget#73</a></li> <li>build(deps-dev): bump <code>@types/node</code> from 20.8.9 to 20.8.10 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/71">NuGet/setup-nuget#71</a></li> <li>build(deps-dev): bump nock from 13.3.6 to 13.3.8 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a 
href="https://redirect.github.com/NuGet/setup-nuget/pull/72">NuGet/setup-nuget#72</a></li> <li>build(deps-dev): bump prettier from 3.0.3 to 3.1.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/74">NuGet/setup-nuget#74</a></li> <li>build(deps-dev): bump <code>@types/jest</code> from 29.5.7 to 29.5.8 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/76">NuGet/setup-nuget#76</a></li> <li>build(deps-dev): bump <code>@typescript-eslint/parser</code> from 6.9.1 to 6.10.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/77">NuGet/setup-nuget#77</a></li> <li>build(deps-dev): bump <code>@types/node</code> from 20.8.10 to 20.9.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/75">NuGet/setup-nuget#75</a></li> <li>build(deps-dev): bump eslint from 8.53.0 to 8.54.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/80">NuGet/setup-nuget#80</a></li> <li>build(deps-dev): bump <code>@types/node</code> from 20.9.0 to 20.9.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/79">NuGet/setup-nuget#79</a></li> <li>build(deps-dev): bump <code>@typescript-eslint/parser</code> from 6.10.0 to 6.12.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/81">NuGet/setup-nuget#81</a></li> <li>build(deps-dev): bump <code>@types/jest</code> from 29.5.8 to 29.5.10 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a 
href="https://redirect.github.com/NuGet/setup-nuget/pull/83">NuGet/setup-nuget#83</a></li> <li>build(deps-dev): bump typescript from 5.2.2 to 5.3.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/82">NuGet/setup-nuget#82</a></li> <li>build(deps-dev): bump nock from 13.3.8 to 13.4.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/88">NuGet/setup-nuget#88</a></li> <li>build(deps-dev): bump <code>@types/node</code> from 20.9.2 to 20.10.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/86">NuGet/setup-nuget#86</a></li> <li>build(deps-dev): bump eslint from 8.54.0 to 8.55.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/85">NuGet/setup-nuget#85</a></li> <li>build(deps-dev): bump <code>@typescript-eslint/parser</code> from 6.12.0 to 6.13.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/89">NuGet/setup-nuget#89</a></li> <li>build(deps-dev): bump <code>@types/jest</code> from 29.5.10 to 29.5.11 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/93">NuGet/setup-nuget#93</a></li> <li>build(deps-dev): bump prettier from 3.1.0 to 3.1.1 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/91">NuGet/setup-nuget#91</a></li> <li>build(deps-dev): bump typescript from 5.3.2 to 5.3.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/92">NuGet/setup-nuget#92</a></li> <li>build(deps-dev): bump 
<code>@types/node</code> from 20.10.3 to 20.10.4 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/90">NuGet/setup-nuget#90</a></li> <li>build(deps-dev): bump eslint from 8.55.0 to 8.56.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/94">NuGet/setup-nuget#94</a></li> <li>build(deps-dev): bump <code>@typescript-eslint/parser</code> from 6.13.2 to 6.19.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/107">NuGet/setup-nuget#107</a></li> <li>build(deps-dev): bump eslint-plugin-jest from 27.6.0 to 27.6.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/106">NuGet/setup-nuget#106</a></li> <li>build(deps-dev): bump <code>@types/node</code> from 20.10.4 to 20.11.5 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/110">NuGet/setup-nuget#110</a></li> <li>build(deps-dev): bump prettier from 3.1.1 to 3.2.4 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/109">NuGet/setup-nuget#109</a></li> <li>build(deps-dev): bump <code>@types/node</code> from 20.11.5 to 20.11.10 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/116">NuGet/setup-nuget#116</a></li> <li>build(deps-dev): bump nock from 13.4.0 to 13.5.1 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/115">NuGet/setup-nuget#115</a></li> <li>build(deps-dev): bump ts-jest from 29.1.1 to 29.1.2 by <a 
href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/113">NuGet/setup-nuget#113</a></li> <li>build(deps-dev): bump <code>@typescript-eslint/parser</code> from 6.19.0 to 6.20.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a> in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/117">NuGet/setup-nuget#117</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/frederikprijck"><code>@frederikprijck</code></a> made their first contribution in <a href="https://redirect.github.com/NuGet/setup-nuget/pull/59">NuGet/setup-nuget#59</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/NuGet/setup-nuget/compare/v1.2.0...v1.3.0">https://github.com/NuGet/setup-nuget/compare/v1.2.0...v1.3.0</a></p> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`a21f25cd39`"><code>a21f25c</code></a> Update dist for release (<a href="https://redirect.github.com/nuget/setup-nuget/issues/118">#118</a>)</li> <li><a href="`5166d73a43`"><code>5166d73</code></a> build(deps-dev): bump <code>@typescript-eslint/parser</code> from 6.19.0 to 6.20.0 (<a href="https://redirect.github.com/nuget/setup-nuget/issues/117">#117</a>)</li> <li><a href="`b915545882`"><code>b915545</code></a> build(deps-dev): bump ts-jest from 29.1.1 to 29.1.2 (<a href="https://redirect.github.com/nuget/setup-nuget/issues/113">#113</a>)</li> <li><a href="`00081d4dbe`"><code>00081d4</code></a> build(deps-dev): bump nock from 13.4.0 to 13.5.1 (<a href="https://redirect.github.com/nuget/setup-nuget/issues/115">#115</a>)</li> <li><a href="`e44f8a5711`"><code>e44f8a5</code></a> build(deps-dev): bump <code>@types/node</code> from 20.11.5 to 20.11.10 (<a href="https://redirect.github.com/nuget/setup-nuget/issues/116">#116</a>)</li> <li><a href="`f685ada866`"><code>f685ada</code></a> build(deps-dev): bump 
prettier from 3.1.1 to 3.2.4 (<a href="https://redirect.github.com/nuget/setup-nuget/issues/109">#109</a>)</li> <li><a href="`aee2c690f4`"><code>aee2c69</code></a> build(deps-dev): bump <code>@types/node</code> from 20.10.4 to 20.11.5 (<a href="https://redirect.github.com/nuget/setup-nuget/issues/110">#110</a>)</li> <li><a href="`2bd1cef324`"><code>2bd1cef</code></a> build(deps-dev): bump eslint-plugin-jest from 27.6.0 to 27.6.3 (<a href="https://redirect.github.com/nuget/setup-nuget/issues/106">#106</a>)</li> <li><a href="`c5ed90cfc8`"><code>c5ed90c</code></a> build(deps-dev): bump <code>@typescript-eslint/parser</code> from 6.13.2 to 6.19.0 (<a href="https://redirect.github.com/nuget/setup-nuget/issues/107">#107</a>)</li> <li><a href="`34040aa462`"><code>34040aa</code></a> build(deps-dev): bump eslint from 8.55.0 to 8.56.0 (<a href="https://redirect.github.com/nuget/setup-nuget/issues/94">#94</a>)</li> <li>Additional commits viewable in <a href="https://github.com/nuget/setup-nuget/compare/v1...v2">compare view</a></li> </ul> </details>
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-02-13 15:59:15 -08:00
fxmarty	1e10cdb2b9	Fix subgraph quantization regression in onnxruntime 1.17 (#19421 ) Fixes https://github.com/microsoft/onnxruntime/issues/19418. ONNX Runtime 1.17 broke quantization of ONNX models with subgraphs in which initializers are placed on the top-level graph and several subgraphs use the same initializer.	2024-02-13 15:49:19 -08:00
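The failing case relies on ONNX's outer-scope rule: a subgraph (for example an `If` branch) may read a name defined in an enclosing graph, so one top-level initializer can have consumers in several subgraphs. The toy sketch below models only that name resolution with plain dataclasses (no `onnx` dependency); `Graph`, `resolve`, and `shared_outer_initializers` are hypothetical and are not the quantizer code touched by this fix.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Graph:
    """Toy stand-in for an ONNX graph: initializer names, subgraphs, and names it reads."""
    name: str
    initializers: Dict[str, list] = field(default_factory=dict)
    subgraphs: List["Graph"] = field(default_factory=list)
    inputs_used: List[str] = field(default_factory=list)
    # Excluded from repr/eq so the parent <-> child cycle does not recurse forever.
    parent: Optional["Graph"] = field(default=None, repr=False, compare=False)


def link_parents(graph: Graph) -> None:
    """Record each subgraph's enclosing graph, recursively."""
    for sub in graph.subgraphs:
        sub.parent = graph
        link_parents(sub)


def resolve(graph: Graph, name: str) -> Optional[Graph]:
    """Walk outward through enclosing scopes until the graph defining `name` is found."""
    scope = graph
    while scope is not None:
        if name in scope.initializers:
            return scope
        scope = scope.parent
    return None


def shared_outer_initializers(root: Graph) -> Dict[str, List[str]]:
    """Map each initializer to the subgraphs that read it from an outer scope."""
    users: Dict[str, List[str]] = {}

    def visit(g: Graph) -> None:
        for name in g.inputs_used:
            owner = resolve(g, name)
            if owner is not None and owner is not g:
                users.setdefault(name, []).append(g.name)
        for sub in g.subgraphs:
            visit(sub)

    visit(root)
    return users
```

An initializer that shows up with more than one consumer here lives once in the top-level graph, so a quantizer has to rewrite it there and keep every subgraph reference consistent rather than treating each subgraph as owning its own copy.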
Yifan Li	5c7e6b2e2a	[EP Perf] Add CI option to enable TRT-OSS parser (#19448 ) ### Description * Introduces a CI option to enable the TRT-OSS parser during EP perf testing: ![image](https://github.com/microsoft/onnxruntime/assets/109183385/a9ba6393-6b94-4b8f-8ca4-ba7bc7954504) If this option is enabled, the open-source onnx-tensorrt parser version listed in [cmake/deps.txt](https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt#L39-L40) is used by default. ### To verify this option and check the difference during the ORT image build: If this option is enabled: <img width="649" alt="image" src="https://github.com/microsoft/onnxruntime/assets/109183385/3b778583-451e-4617-ba8c-c064442e60fd"> If this option is not enabled (default): <img width="683" alt="image" src="https://github.com/microsoft/onnxruntime/assets/109183385/cd8383ba-eff4-4536-94ab-a1424bb858ab"> * Updates the default CMake/TRT versions to the latest. ### Motivation and Context Makes it easier to test the OSS parser and to find potential gaps between the TensorRT built-in and OSS parsers. Scheduled runs with the OSS parser will be set up after this PR is merged.	2024-02-12 23:04:08 -08:00
George Wu	5e70c6b3a6	allow protobuf lite build for TRT EP (#19498 ) Allow protobuf-lite builds with the TensorRT EP as long as it is built with the TRT built-in parser rather than the OSS parser. The built-in parser statically links protobuf, so it does not conflict with protobuf-lite.	2024-02-12 22:53:04 -08:00
Adrian Lizarraga	4dfba53bfb	[QNN EP] Build x64 python wheel for QNN EP (#19499 ) ### Description Adds a job to the python packaging pipeline that builds x64 python wheels for QNN EP. ### Motivation and Context Necessary to create a cached QNN model on Windows x64, which is done by creating a properly configured onnxruntime session with QNN EP.	2024-02-12 20:54:04 -08:00
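A cached QNN model is produced as a side effect of creating a session configured to dump an EP context. The sketch below assumes ONNX Runtime 1.17+ session config keys (`ep.context_enable`, `ep.context_file_path`, `ep.context_embed_mode`) and the QNN provider option `backend_path`; the key names, the `QnnHtp.dll` backend choice, and the helper names are assumptions to verify against your ONNX Runtime version, not something this commit states.

```python
from typing import Dict, Tuple


def qnn_context_cache_settings(ctx_path: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (session config entries, QNN provider options) for dumping a context model.

    Key names are assumed from ONNX Runtime 1.17+; verify them for your version.
    """
    session_config = {
        "ep.context_enable": "1",          # ask the EP to generate an EP-context model
        "ep.context_file_path": ctx_path,  # output path for the generated model
        "ep.context_embed_mode": "0",      # "0": keep the QNN binary in a separate file
    }
    qnn_options = {"backend_path": "QnnHtp.dll"}  # assumed HTP backend library on Windows
    return session_config, qnn_options


def create_caching_session(model_path: str, ctx_path: str):
    """Creating the session triggers context generation (needs a QNN-enabled wheel)."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependency

    session_config, qnn_options = qnn_context_cache_settings(ctx_path)
    so = ort.SessionOptions()
    for key, value in session_config.items():
        so.add_session_config_entry(key, value)
    return ort.InferenceSession(
        model_path,
        sess_options=so,
        providers=["QNNExecutionProvider"],
        provider_options=[qnn_options],
    )
```

Keeping the option construction separate from session creation makes the settings easy to inspect or log on machines that do not have the QNN wheel installed.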
Patrice Vignola	61e07a46e1	[DML EP] Support split hidden size for RotaryEmbedding (#18852 ) RotaryEmbedding now supports the `[batchSize, numHeads, sequenceLength, headSize]` format for its input, which is used in Mistral.	2024-02-12 19:36:08 -08:00
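The 3D and 4D input formats differ only in whether the hidden dimension has been split into heads. As a minimal pure-Python sketch (nested lists instead of tensors), the helpers below split a `[batch, seq, hidden]` input into `[batch, num_heads, seq, head_size]` and apply rotary embedding in that 4D layout. They assume the non-interleaved ("rotate-half") variant with base 10000 and compute the angles directly, whereas ONNX Runtime's RotaryEmbedding contrib op, as we understand it, takes precomputed cos/sin caches as inputs. `bsd_to_bnsh` and `rotary_embedding_bnsh` are hypothetical names, not DML EP code.

```python
import math


def bsd_to_bnsh(x, num_heads):
    """Split hidden size: [batch][seq][num_heads*head_size] -> [batch][num_heads][seq][head_size]."""
    result = []
    for batch in x:
        hidden = len(batch[0])
        if hidden % num_heads:
            raise ValueError("hidden size must be divisible by num_heads")
        head_size = hidden // num_heads
        result.append(
            [[row[n * head_size:(n + 1) * head_size] for row in batch] for n in range(num_heads)]
        )
    return result


def rotary_embedding_bnsh(x, base=10000.0, position_offset=0):
    """Apply non-interleaved RoPE to x shaped [batch][num_heads][seq_len][head_size]."""
    out = []
    for batch in x:
        out_b = []
        for head in batch:
            out_h = []
            for pos, vec in enumerate(head):
                head_size = len(vec)
                if head_size % 2:
                    raise ValueError("head_size must be even")
                half = head_size // 2
                p = pos + position_offset
                rotated = list(vec)
                for i in range(half):
                    # Pair element i with element i + half and rotate by angle p * base^(-2i/d)
                    theta = p * base ** (-2.0 * i / head_size)
                    c, s = math.cos(theta), math.sin(theta)
                    x1, x2 = vec[i], vec[i + half]
                    rotated[i] = x1 * c - x2 * s
                    rotated[i + half] = x2 * c + x1 * s
                out_h.append(rotated)
            out_b.append(out_h)
        out.append(out_b)
    return out
```

Because each pair is rotated, not scaled, position 0 is left unchanged and the norm of every rotated pair is preserved, which makes both properties convenient sanity checks when comparing the two layouts.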
Hector Li	a622710fe1	Add option to skip session run in perf_test tool (#19501 ) Enables an option to exit right after session creation, so users can measure session creation time and the impact of enabling initialization optimizations.	2024-02-12 19:11:40 -08:00
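The same measurement can be approximated from Python without the perf_test tool by timing only the construction of a session. This is a generic sketch, not the perf_test option itself; `time_creation` is a hypothetical helper that accepts any zero-argument factory, for example `lambda: ort.InferenceSession("model.onnx", sess_options=so)` with a placeholder model path.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_creation(create: Callable[[], object], repeats: int = 5) -> Dict[str, object]:
    """Call `create()` `repeats` times and report wall-clock creation times in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        obj = create()  # e.g. construct an InferenceSession; nothing is run on it
        samples.append(time.perf_counter() - start)
        del obj  # drop the object before taking the next sample
    return {"min_s": min(samples), "median_s": statistics.median(samples), "samples": samples}
```

Reporting the minimum alongside the median helps separate the steady-state cost from one-off effects such as a cold file cache on the first load.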
snadampal	7fa6f4fca4	add arm64 bfloat16 fastmath mode option for transformers benchmarking script (#19294 ) Add arm64 bfloat16 fastmath mode option for transformers benchmarking script. ### Motivation and Context onnxruntime now supports bfloat16 fastmath gemm kernels for arm64 platforms with bfloat16 instruction support. This PR updates benchmark scripts to test that mode.	2024-02-12 15:20:36 -08:00
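As we understand it, the fastmath mode runs float32 GEMMs on bfloat16 instructions, and to our knowledge it is enabled through the session config key `mlas.enable_gemm_fastmath_arm64_bfloat16` (check the session options config keys header for your version). bfloat16 keeps float32's 8-bit exponent but only 7 explicit mantissa bits, so inputs lose precision. The sketch below shows round-to-nearest-even conversion to bfloat16 with the standard `struct` module to illustrate that loss; it describes the number format only and is not how the MLAS kernels convert.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Round a float32 value to bfloat16 (round-to-nearest-even); NaN handling omitted."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1           # lowest bit kept after dropping 16 mantissa bits
    rounding_bias = 0x7FFF + lsb     # exact ties round toward an even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float32(b: int) -> float:
    """Widen a 16-bit bfloat16 pattern back to float32 by appending zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def to_bfloat16(x: float) -> float:
    """Round-trip x through bfloat16 to see the value the reduced precision keeps."""
    return bfloat16_bits_to_float32(float32_to_bfloat16_bits(x))
```

For example, 1 + 2^-7 survives the round trip exactly, while 1 + 2^-8 sits exactly halfway between two bfloat16 values and rounds to 1.0.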
Preetha Veeramalai	90e2e8561f	Ovep 1.17.1 (#19482 ) ### Description Fixes bugs affecting API backward compatibility. Updates OVEP to pass the ONNX model path, rather than the serialized ONNX model, to the OpenVINO compile_model API.	2024-02-12 12:31:08 -08:00
Changming Sun	9cb97ee507	Disable CPU EP's allocator's arena when address sanitizer is enabled (#19485 ) ### Description Disable the CPU EP allocator's arena when AddressSanitizer is enabled, because the arena masks memory problems. For example, the code in onnxruntime/test/quantization/quantization_test.cc has a memory leak: it allocates a buffer but never frees it. Most memory-leak checkers cannot detect this, because the buffer came from an arena and the arena itself was eventually freed. ### Motivation and Context Provide better memory-leak-check coverage.	2024-02-12 09:39:49 -08:00
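A toy Python model of why the arena hides the leak: a leak checker only sees calls to the system allocator, and the arena makes one large system allocation that it returns in full when destroyed, whether or not callers freed their slices. The classes are illustrative only and do not mirror ORT's BFCArena. Separately from this build-time change, the CPU arena can also be turned off per session from Python with `SessionOptions.enable_cpu_mem_arena = False`.

```python
class SystemAllocator:
    """Counts live allocations, like a leak checker hooking malloc/free (toy model)."""

    def __init__(self):
        self.live = 0

    def alloc(self, size):
        self.live += 1
        return bytearray(size)

    def free(self, buf):
        self.live -= 1


class Arena:
    """Takes one big chunk from the system allocator and hands out slices (toy model)."""

    def __init__(self, system, chunk_size):
        self.system = system
        self.chunk = system.alloc(chunk_size)
        self.offset = 0
        self.outstanding = 0  # slices handed out but not yet returned

    def alloc(self, size):
        if self.offset + size > len(self.chunk):
            raise MemoryError("arena chunk exhausted")
        view = memoryview(self.chunk)[self.offset:self.offset + size]
        self.offset += size
        self.outstanding += 1
        return view

    def free(self, view):
        self.outstanding -= 1

    def release(self):
        # Returns the whole chunk, even if some slices were never freed by callers.
        self.system.free(self.chunk)
```

With the arena, a slice that is never freed leaves the system allocator's live count at zero after `release()`, so the leak is invisible to the checker; allocating the same buffer directly from `SystemAllocator` leaves a live allocation the checker can report.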
Baiju Meswani	c831031ad5	Remove cuda gencode 90 to reduce onnxruntime-training package size (#19486 )	2024-02-12 09:24:36 -08:00


10575 commits