onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-19 19:00:47 +00:00

Author	SHA1	Message	Date
Rachel Guo	fc3a2a3771	[CoreML EP] Add Pad op support (#14946 ) ### Description As title. - Only support constant mode Pad for CoreML EP for now. - Enable Pad tests for CoreML with inputs as initializer types. CoreML Spec for reference: https://apple.github.io/coremltools/mlmodel/Format/NeuralNetwork.html#paddinglayerparams ### Motivation and Context Fill operator gaps for ClipChamp models. --------- Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-03-22 21:54:55 -07:00
pengwa	7bec80d92a	Fix reference count for autograd.Function (#15121 ) ### Fix reference count for autograd When the PythonOp kernel is initialized, `AddPointerScalarArgs` creates `const_args_`, which holds all non-tensor references (including ProcessGroup, string, or other user types). In the kernel's destructor, the reference count is decreased for every entry in `const_args_`. ``` void PythonOpBase::Clear() { for (auto ptr : const_args_) { auto obj = reinterpret_cast<PyObject*>(ptr); Py_DECREF(obj); } } ``` This means the count was never increased, only decreased, so running the unit test throws a segmentation fault. The simple fix is to remove the Py_DECREF that the kernel destructor triggers for those pointer-type constant inputs. NONTENSOR_OBJECT_POINTER_STORE is where the reference is increased during export, and the reference then remains until the Python program terminates. Additional tunings: 1. Move some logs from warning to verbose to avoid flooding training logs. 2. Move pointer-type reference holding from the Python side (NONTENSOR_OBJECT_POINTER_STORE) to orttraining/orttraining/core/framework/torch/custom_function_register.h. This gives a consistent approach to managing reference count increases and decreases for all PythonOp-related Python objects/methods.	2023-03-23 12:51:50 +08:00
Yulong Wang	f972d21e81	[js] upgrade dependencies and enable strict mode (#14930 ) ### Description This PR includes the following changes: - upgrade js dependencies - enable STRICT mode for web assembly build. - corresponding fix for cmake-js upgrade - corresponding fix for linter upgrade - upgrade default typescript compile option of: - `moduleResolution`: from `node` to `node16` - `target`: from `es2017` to `es2020` - fix ESM module import in commonJS source file ## change explanation ### changes to onnxruntime_webassembly.cmake `-s WASM=1` and `-s LLD_REPORT_UNDEFINED` are enabled by default in the latest version and are deprecated. ### changes to onnxruntime_node.cmake The npm package `cmake-js` updated the way it finds the file `node.lib`. Previously it downloaded this file from the Node.js public release channel; now it generates it from a definition file. The Node.js release channel does not contain a windows/arm64 version, so cmake-js previously failed to download `node.lib` for that platform. This is why we had special handling to download the unofficial binary for the build. It is no longer needed, so we removed it from the cmake file. ### changes to tsconfig.json `node16` module resolution supports async import and `es2020` as target supports top level await.	2023-03-22 15:05:04 -07:00
cloudhan	71b67ec1e2	Refactor ke register to be decentralized (#15036 ) So that we can remove all unnecessary header files	2023-03-22 14:49:26 +08:00
Baiju Meswani	0086f7590d	LSTM and LSTM gradient implementation for training (#15034 )	2023-03-21 21:44:08 -07:00
JiCheng	126e7bf15f	[AMX] add assembler check (#15055 ) ### Description AMX isn't supported until assembler 2.40 even though the GCC frontend supports it.	2023-03-22 07:57:22 +08:00
Tianlei Wu	3e2d453b64	Supports model > 2GB in fp16 conversion with onnx shape inference (#15067 ) (1) Allow model to be path, and use infer_shapes_path to fix https://github.com/microsoft/onnxruntime/issues/15063 (2) Add some logging for float data truncation (3) Add RandomUniformLike to default op_block_list (4) Some minor changes to use f string.	2023-03-21 15:08:28 -07:00
Yufeng Li	c7ced7a5e9	Add PackedAttention for packing mode (#14858 ) ### Description Transformer models can handle a batch of inputs at once. However, sequences in a batch usually have different lengths, so the shorter ones must be padded to the length of the longest. This is inefficient, especially for large batches with high length variance. This PR introduces a PackedAttention operator that takes packed sequences (no padding) as input and also produces output in packing mode. There will be another PR to use PackedAttention to implement the encoder in packing mode.	2023-03-21 12:59:29 -07:00
Faith Xu	ef76b3aeb8	Transformers tool - update readme to link to docs page (#14964 ) ### Description Transformers tool documentation has been moved to: https://onnxruntime.ai/docs/performance/transformers-optimization.html	2023-03-21 11:56:19 -07:00
Chen Fu	34175f0b7c	FP16 conv (#15062 ) ### Description Convolution for the fp16 datatype. Uses NHWC for computation. For NCHW input, it rearranges the input tensor to NHWC format before computing the result. Supports two optional fusions: 1. Activation 2. Add (not yet implemented) ### Motivation and Context Accelerating fp16 inference	2023-03-21 10:32:43 -07:00
Yi Zhang	a3570eb5bf	Add mac packages smoking test (#15122 ) ### Description Check the Mac x86_64 packages installation. ### Motivation and Context To avoid installation error, add packages smoking test before release.	2023-03-21 18:02:44 +08:00
Chi Lo	abb2418c02	TensorRT EP - timing cache [patch] (#15113 ) ### Description Patch https://github.com/microsoft/onnxruntime/pull/14767 so that the two provider options `force_timing_cache` and `detailed_build_log` can be updated. Otherwise, they only use their default values. `timing_cache_enable` is not affected.	2023-03-20 17:20:05 -07:00
Hariharan Seshadri	0ace27fdf7	Disable unit tests for decoder masked multihead attention on CC 5.2 or lower GPUs (#15114 )	2023-03-20 15:09:49 -07:00
Zhang Lei	226a691e05	one prefast warning fix (#14912 )	2023-03-20 10:38:27 -07:00
Justin Chu	bdd7bd084c	Remove the use of eval in test code (#15097 ) ### Description Remove the use of `eval` in test code so we don't (1) use eval and (2) create "unused" local vars that ruff will remove. Predecessor to #15085	2023-03-20 09:43:56 -07:00
Chi Lo	c964da7ea2	FasterTransformer model wrapper using custom op (#15013 ) ### Description We are introducing the FasterTransformer (FT) model-level integration using ORT [custom op runtime wrapper](https://github.com/microsoft/onnxruntime/pull/13427). To make the FT wrapper/integration work, two things need to be done: - New API `KernelInfoGetConstantInput_tensor`. (Done in this PR) During custom op kernel initialization, the kernel needs the model weights (saved as the node's constant inputs) to be ready for instantiating FT's weights. That's why we add this new API, which makes kernel info capable of getting constant inputs. - Custom op and custom op kernel to wrap the FT model. (Will be provided in onnxruntime extensions or inference examples) During custom op kernel initialization, it can fetch attributes from kernel info to determine which kind of FT model instance to create. During custom op kernel compute/inference, it can get input/output from the kernel context and then assign input/output buffers for the model instance to run.	2023-03-20 09:05:30 -07:00
PeixuanZuo	32a4eebc17	[ROCm] add rocm5.4.2 to python package pipeline (#15081 ) add rocm5.4.2 to python package pipeline: https://download.onnxruntime.ai/onnxruntime_nightly_rocm542.html	2023-03-20 10:30:14 +08:00
Adrian Lizarraga	e42f7487df	Add logging APIs for custom operators (#14416 ) ### Description Add logging APIs for custom ops. This PR introduces a `OrtLogger` type, which can be retrieved from a `OrtKernelInfo` or `OrtKernelContext`. The kernel info's logger is the session logger stored in the execution provider. The kernel context's logger is a run logger. ### Motivation and Context Allows custom ops to log information in a manner consistent with built-in ops. Example usage in custom op: ```C++ struct MyCustomKernel { MyCustomKernel(const OrtApi& api, const OrtKernelInfo* info) { Ort::ConstKernelInfo kinfo(info); this->logger_ = kinfo.GetLogger(); // ... ORT_CXX_LOGF_NOEXCEPT(this->logger_, OrtLoggingLevel::ORT_LOGGING_LEVEL_ERROR, "Error: %s", err_msg); } void Compute(OrtKernelContext* context) { ORT_CXX_LOG(this->logger_, OrtLoggingLevel::ORT_LOGGING_LEVEL_VERBOSE, "Calling compute..."); // ... } // ... private: Ort::Logger logger_; }; ```	2023-03-17 15:05:28 -07:00
Kevin Chen	2023836c2f	Fix CUDA tests for Ampere cards, and bump layernorm tests opset version (#14761 ) ### Description Three main changes: * `qOrdered` tests fail 100% on Ampere+ cards with cublas error. Disable them on these cards. * Bump LayerNormalization tests to opset 17, to be consistent with the ONNX specification. Mark TRT EP as a provider that does not do error checking on faulty LayerNorm definitions * Remove null tensor for optional `bias` input for LayerNorm. Optional inputs should be omitted entirely. ### Motivation and Context Streamlines testing for CUDA and TRT with later NVIDIA architectures. Signed-off-by: Kevin Chen <kevinch@nvidia.com>	2023-03-17 09:30:01 -07:00
cloudhan	98ab4a62d6	Fix ROCm 5.2.3 pipeline (#15073 ) Make CK optional again.	2023-03-17 15:59:57 +08:00
pengwa	1ccb79476c	Fix training gpu ci related to pl upgrade (#15092 ) ### Fix training gpu ci related to pl upgrade After a new version of pl (pytorch_lightning) was released, the old `gpus` parameter of `pl.Trainer` appears to be no longer supported, so we switch to the new parameters to make it work. ``` ['/home/onnxruntimedev/miniconda3/bin/python3', 'orttraining_test_ortmodule_torch_lightning_basic.py', '--train-steps=470', '--epochs=2', '--batch-size=256', '--data-dir', '/mnist'] /home/onnxruntimedev/miniconda3/lib/python3.8/site-packages/torch/onnx/utils.py:1794: FutureWarning: The first argument to symbolic functions is deprecated in 1.13 and will be removed in the future. Please annotate treat the first argument (g) as GraphContext and use context information from the object instead. warnings.warn( Traceback (most recent call last): File "orttraining_test_ortmodule_torch_lightning_basic.py", line 101, in <module> main() File "orttraining_test_ortmodule_torch_lightning_basic.py", line 96, in main trainer = pl.Trainer(kwargs) File "/home/onnxruntimedev/miniconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/argparse.py", line 69, in insert_env_defaults return fn(self, kwargs) TypeError: __init__() got an unexpected keyword argument 'gpus' ```	2023-03-17 13:26:58 +08:00
Yi-Hong Lyu	5ac3a37be5	Add float16 Tanh support (#15048 )	2023-03-16 18:57:38 -07:00
PeixuanZuo	4a8cd4256a	fix miopen new API cannot be supported by ROCm5.2.3 (#15077 ) miopenTensorLayout_t was added after MIOpen version 2.18.0. Define it in ORT when using a MIOpen version lower than 2.18.0.	2023-03-17 08:40:35 +08:00
PeixuanZuo	55174bb2e9	Fix Nhwcconv with asymmetric padding (#15050 ) 1. Fix Nhwcconv with asymmetric padding. The slice axes are (1,2) with NHWC layout. 2. For ROCm EP, move Addbias after SliceOutUnwantedOutputSection, because before that, the actual output of Conv is s_.y_data.	2023-03-17 08:38:25 +08:00
dependabot[bot]	6a6513f9c0	Bump @sideway/formula from 3.0.0 to 3.0.1 in /js/react_native (#15028 ) Bumps [@sideway/formula](https://github.com/sideway/formula) from 3.0.0 to 3.0.1. Commits: `5b44c1bffc` 3.0.1; `9fbc20a02d` chore: better number regex; `41ae98e042` Cleanup; `c59f35ec40` Move to Sideway. See full diff in the compare view: https://github.com/sideway/formula/compare/v3.0.0...v3.0.1 Maintainer changes: This version was pushed to npm by [marsup](https://www.npmjs.com/~marsup), a new releaser for `@sideway/formula` since your current version. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-03-16 10:17:38 -07:00
dependabot[bot]	3059a73b38	Bump @sideway/formula from 3.0.0 to 3.0.1 in /js/react_native/e2e (#15029 ) Bumps [@sideway/formula](https://github.com/sideway/formula) from 3.0.0 to 3.0.1. Commits: `5b44c1bffc` 3.0.1; `9fbc20a02d` chore: better number regex; `41ae98e042` Cleanup; `c59f35ec40` Move to Sideway. See full diff in the compare view: https://github.com/sideway/formula/compare/v3.0.0...v3.0.1 Maintainer changes: This version was pushed to npm by [marsup](https://www.npmjs.com/~marsup), a new releaser for `@sideway/formula` since your current version. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-03-16 10:16:40 -07:00
Pranav Sharma	6600fd792a	Fix CPU memory leak due to external weights not getting memory unmapped when using non-CPU EP. (#15040 )	2023-03-16 08:01:46 -07:00
Yi Zhang	1e7849c2c8	Add compilation cache in iOS pipeline (#15070 ) ### Motivation and Context The iOS pipeline duration could be reduced from over 90 minutes to just over 20 minutes. https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=921577&view=results ### Ref https://ccache.dev/manual/4.8.html#_c_modules	2023-03-16 21:43:18 +08:00
Vincent Wang	25e537770f	[CUDA] Fix Alignment of SkipLayerNorm Vectorized Kernel (#15054 ) Some of our vectorized kernels (including SkipLayerNorm) don't check the alignment of the data pointer. ORT's allocator may guarantee the alignment, but training uses PyTorch's allocator, which cannot guarantee it, so we need to add a data pointer check before calling any vectorized kernel. This PR fixes this data pointer alignment issue for SkipLayerNorm's vectorized kernel. We found the issue when running Hugging Face's swinv2 model. The PR also refactors the SkipLayerNorm kernel code to make it simpler.	2023-03-16 15:23:23 +08:00
cloudhan	a5ab88247b	ROCm Flash Attention (#14838 ) Adds flash attention via composable kernel for ROCm EP	2023-03-16 10:39:58 +08:00
Yi Zhang	881f3f6be3	[Fix] Error in Linux_Packaging_combined_GPU of nuget packaging pipeline (#15060 ) ### Motivation and Context The error was caused by #14958: the nuget packaging pipeline calls get_docker_image.py directly rather than through get-docker-image-steps.yml. To account for this difference, one parameter is added for compatibility. ### Test Link https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=288042&view=logs&j=505ca2b7-596d-550d-8417-9b1519e87977	2023-03-16 08:49:37 +08:00
Hariharan Seshadri	ed7ab1660d	[CUDA] Add option to use DecoderMaskedMultiheadAttention in BeamSearch (#14990 )	2023-03-15 17:16:32 -07:00
Yufeng Li	da084b0fc1	check axis range for LayerNorm (#14845 ) ### Description Add check on axis to make sure it is in a valid range	2023-03-15 14:44:59 -07:00
Changming Sun	5213546e62	Change how to find npm (#15001 )	2023-03-15 11:10:10 -07:00
wejoncy	32533dd1c2	fix	2023-03-15 13:23:56 +08:00
JiCheng	cc15ceef4e	Update onnxruntime/core/providers/nnapi/nnapi_builtin/builders/model_builder.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	6bdb03281a	clean comments	2023-03-15 13:23:56 +08:00
wejoncy	762ea2402e	fix	2023-03-15 13:23:56 +08:00
JiCheng	8db28d9139	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	8d00961321	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	c10462b5f5	ORT_UNUSED_PARAMETER	2023-03-15 13:23:56 +08:00
wejoncy	92fabf57ea	comments	2023-03-15 13:23:56 +08:00
JiCheng	cd3173d531	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	dad772ef09	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_api_helper.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
wejoncy	8aeed1e87d	amend	2023-03-15 13:23:56 +08:00
wejoncy	5fe61c53a3	comments	2023-03-15 13:23:56 +08:00
JiCheng	d236085845	Update onnxruntime/core/providers/nnapi/nnapi_builtin/model.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	d490908836	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	a8ed956fa7	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.cc Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
JiCheng	de5e58c077	Update onnxruntime/core/providers/nnapi/nnapi_builtin/nnapi_execution_provider.h Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-15 13:23:56 +08:00
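
Note on commit c7ced7a5e9 (PackedAttention): the commit message contrasts padded batches with packed sequences. The following is a minimal pure-Python sketch of that idea only. The function names `pack_sequences` and `unpack_sequences` and the cumulative-offset layout are illustrative assumptions, not the operator's actual inputs or the ONNX Runtime implementation.

```python
from itertools import accumulate


def pack_sequences(padded_batch, lengths):
    """Drop padding from a padded batch (hypothetical helper).

    Returns the concatenated real tokens plus cumulative offsets, so that
    sequence i occupies tokens[offsets[i]:offsets[i + 1]].
    """
    if len(padded_batch) != len(lengths):
        raise ValueError("one length is required per sequence")
    tokens = []
    for row, n in zip(padded_batch, lengths):
        if n < 0 or n > len(row):
            raise ValueError("length exceeds the padded row size")
        tokens.extend(row[:n])  # keep only the non-padding prefix
    offsets = [0] + list(accumulate(lengths))
    return tokens, offsets


def unpack_sequences(tokens, offsets):
    """Recover the individual sequences from the packed form."""
    return [tokens[start:end] for start, end in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    batch = [[1, 2, 3, 0], [4, 0, 0, 0], [5, 6, 0, 0]]  # 0 is padding
    tokens, offsets = pack_sequences(batch, [3, 1, 2])
    print(tokens)   # [1, 2, 3, 4, 5, 6]
    print(offsets)  # [0, 3, 4, 6]
```

In this toy batch the padded form stores 12 slots for 6 real tokens, which is the kind of waste the commit message describes for batches with high length variance.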


8374 commits
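
Note on commit fc3a2a3771 (CoreML EP Pad op): the commit message says only constant mode Pad is supported. As a rough illustration of what constant-mode padding means, here is a small pure-Python sketch for a 2-D input. The helper name `pad_constant_2d` and its argument order are hypothetical and do not reflect the CoreML EP code or the ONNX `pads` attribute layout.

```python
def pad_constant_2d(matrix, pad_top, pad_bottom, pad_left, pad_right, value=0):
    """Surround a rectangular 2-D list with a constant value (hypothetical helper)."""
    if min(pad_top, pad_bottom, pad_left, pad_right) < 0:
        raise ValueError("this sketch only handles non-negative padding")
    width = len(matrix[0]) if matrix else 0
    new_width = pad_left + width + pad_right
    # Rows made entirely of the constant value go above the data.
    out = [[value] * new_width for _ in range(pad_top)]
    # Each original row gets constant values on its left and right.
    for row in matrix:
        out.append([value] * pad_left + list(row) + [value] * pad_right)
    # Rows made entirely of the constant value go below the data.
    out.extend([value] * new_width for _ in range(pad_bottom))
    return out


if __name__ == "__main__":
    for row in pad_constant_2d([[1, 2], [3, 4]], 1, 0, 1, 2):
        print(row)
    # [0, 0, 0, 0, 0]
    # [0, 1, 2, 0, 0]
    # [0, 3, 4, 0, 0]
```

Other ONNX Pad modes (such as reflect and edge) derive the border from the input values instead of a fixed constant, which is why they would need separate handling.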