### Description
Disable the cache to save disk space for training_x64_debug.
### Motivation and Context
This is a first step to mitigate the disk space shortage in training_x64_debug.
Two modifications:
- After [TRT 8.5](https://github.com/microsoft/onnxruntime/pull/13867) was merged, we can manually set a timeout and have the TRT EP run only a small portion of the unit tests (`onnxruntime_SKIP_AND_PERFORM_FILTERED_TENSORRT_TESTS=ON`), because the additional TRT kernel overhead introduced by TRT 8.5 significantly increases test time. This PR modifies the check condition so that TensorRT CIs (which can enable the builder placeholder) still run most of the unit tests.
- Exclude the TRT EP from the [Resize Opset 18](https://github.com/microsoft/onnxruntime/pull/13890) unit tests, since TensorRT 8.5 supports operators only up to opset 17.
### Description
Allows the PostAnalysis@2 task for Windows CI jobs to continue even if
an error is encountered.
### Motivation and Context
This is a temporary workaround that enables the
`Windows_Packaging_CPU_x86_default` job within the Zip-Nuget-Java-NodeJS
packaging pipeline to finish. A recent update to dotnet 6 has broken the
PostAnalysis task for this job.
This task was originally added by
https://github.com/microsoft/onnxruntime/pull/13694
### Description
Adds the C APIs below to support custom ops that wrap an entire model to
be run with an external runtime. The current SNPE EP is an example of an
EP that could be ported to a custom op wrapper. For example: the custom op
stores the serialized SNPE DLC binary as a string attribute, the SNPE
model is built when the kernel is created, and the model is run with SNPE
APIs on each call to the kernel's compute method.
#### C APIs
| API | Description | Why |
| --- | --- | --- |
| `KernelInfo_GetInputCount` | Gets the number of inputs from `OrtKernelInfo`. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetOutputCount` | Gets the number of outputs from `OrtKernelInfo`. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetInputName` | Gets an input's name. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetOutputName` | Gets an output's name. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetInputTypeInfo` | Gets the type/shape information for an input. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetOutputTypeInfo` | Gets the type/shape information for an output. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfoGetAttribute_tensor` | Gets an `OrtValue` tensor stored as an attribute in the graph node. | Extract serialized models, weights, etc. |
| `GetSessionConfigEntry` | Gets a session configuration value. | Need to be able to read session-time configurations from within a custom op |
| `HasSessionConfigEntry` | Checks whether a session configuration entry exists. | Need to be able to read session-time configurations from within a custom op |
#### Why so many KernelInfo APIs?<sup>1</sup>
Similar APIs currently exist for `OrtKernelContext`, but not
`OrtKernelInfo`. Note that `OrtKernelContext` is passed to the custom op
on call to its kernel's compute() function. However, `OrtKernelInfo` is
available on kernel creation, which occurs when the session is created.
Having these APIs available from `OrtKernelInfo` allows an operator to
trade off compute time against session-creation time, and vice versa.
Operators that must build expensive state may prefer to do so at
session-creation time instead of at compute time.
SNPE is an example of an EP that needs to be able to query `KernelInfo`
for the name, type, and shape of inputs and outputs in order to build
the model from the serialized DLC data. This is an expensive operation.
Other providers (e.g., OpenVINO) are able to query I/O info from the
serialized model, so they do not strictly need these APIs. However, the
APIs can still be used to validate the expected I/O characteristics.
Additionally, several of our CPU contrib ops currently use the same
internal version of these KernelInfo APIs (Ex:
[qlinear_softmax](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/contrib_ops/cpu/quantization/qlinear_softmax.cc#L71)).
If custom ops are also meant to be a test bed for future ops, then all
custom ops (not just runtime wrappers) would benefit from the addition
of these public KernelInfo APIs (IMO).
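As a rough sketch (not code from this PR), the snippet below shows how a wrapper kernel could use these APIs at creation time, assuming the C++ wrappers (`Ort::ConstKernelInfo` and friends) expose the C APIs listed in the table above; the `WrapperKernel` type and its members are hypothetical.
```c++
#include <cstdint>
#include <string>
#include <vector>
#include "onnxruntime_cxx_api.h"

// Hypothetical kernel for a custom op that wraps an external runtime.
// It queries I/O characteristics once, at session creation, so the expensive
// model build does not have to happen on every Compute() call.
struct WrapperKernel {
  explicit WrapperKernel(const OrtKernelInfo* info) {
    Ort::ConstKernelInfo kernel_info{info};

    const size_t num_inputs = kernel_info.GetInputCount();    // KernelInfo_GetInputCount
    const size_t num_outputs = kernel_info.GetOutputCount();  // KernelInfo_GetOutputCount
    (void)num_outputs;

    for (size_t i = 0; i < num_inputs; ++i) {
      input_names_.push_back(kernel_info.GetInputName(i));        // KernelInfo_GetInputName
      Ort::TypeInfo type_info = kernel_info.GetInputTypeInfo(i);  // KernelInfo_GetInputTypeInfo
      input_shapes_.push_back(type_info.GetTensorTypeAndShapeInfo().GetShape());
    }
    // A runtime wrapper would typically also read a serialized model from a
    // node attribute (KernelInfoGetAttribute_tensor) and build its engine here.
  }

  std::vector<std::string> input_names_;
  std::vector<std::vector<int64_t>> input_shapes_;
};
```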
#### Example of usage in a custom OP
From
`onnxruntime/test/testdata/custom_op_openvino_wrapper_library/openvino_wrapper.h`
```c++
struct CustomOpOpenVINO : Ort::CustomOpBase<CustomOpOpenVINO, KernelOpenVINO> {
  explicit CustomOpOpenVINO(Ort::ConstSessionOptions session_options);

  CustomOpOpenVINO(const CustomOpOpenVINO&) = delete;
  CustomOpOpenVINO& operator=(const CustomOpOpenVINO&) = delete;

  void* CreateKernel(const OrtApi& api, const OrtKernelInfo* info) const;

  constexpr const char* GetName() const noexcept {
    return "OpenVINO_Wrapper";
  }

  constexpr const char* GetExecutionProviderType() const noexcept {
    return "CPUExecutionProvider";
  }

  // IMPORTANT: In order to wrap a generic runtime-specific model, the custom operator
  // must have a non-homogeneous variadic input and output.
  constexpr size_t GetInputTypeCount() const noexcept {
    return 1;
  }

  constexpr size_t GetOutputTypeCount() const noexcept {
    return 1;
  }

  constexpr ONNXTensorElementDataType GetInputType(size_t /* index */) const noexcept {
    return ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED;
  }

  constexpr ONNXTensorElementDataType GetOutputType(size_t /* index */) const noexcept {
    return ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED;
  }

  constexpr OrtCustomOpInputOutputCharacteristic GetInputCharacteristic(size_t /* index */) const noexcept {
    return INPUT_OUTPUT_VARIADIC;
  }

  constexpr OrtCustomOpInputOutputCharacteristic GetOutputCharacteristic(size_t /* index */) const noexcept {
    return INPUT_OUTPUT_VARIADIC;
  }

  constexpr bool GetVariadicInputHomogeneity() const noexcept {
    return false;  // heterogeneous
  }

  constexpr bool GetVariadicOutputHomogeneity() const noexcept {
    return false;  // heterogeneous
  }

  std::vector<std::string> GetSessionConfigKeys() const { return {"device_type"}; }

 private:
  std::unordered_map<std::string, std::string> session_configs_;
};
```
#### How to create a session:
```c++
Ort::Env env;
Ort::SessionOptions session_opts;
Ort::CustomOpConfigs custom_op_configs;
// Create local session config entries for the custom op.
custom_op_configs.AddConfig("OpenVINO_Wrapper", "device_type", "CPU");
// Register custom op library and pass in the custom op configs (optional).
session_opts.RegisterCustomOpsLibrary(lib_name, custom_op_configs);
Ort::Session session(env, model_path.data(), session_opts);
```
### Motivation and Context
Allows creation of simple "wrapper" EPs outside of the main ORT code
base.
### Description
Fix a security warning in the GemmInt8 CUDA kernel.
### Motivation and Context
It is for the issue:
https://dev.azure.com/aiinfra/ONNX%20Runtime/_workitems/edit/11158/
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
PartitionIntoStreams was incorrectly using `std::string` instead of
`PathString` for the config file argument when ORT_ENABLE_STREAM was not
defined.
Also incorporates changes from #14291 to fix build and test issues.
### Motivation and Context
Fix build error on Windows due to mismatched type.
### Description
The DML EP provider factory verifies the adapter id is a real GPU (not
some software emulation like WARP which would be quite slow or basic
display driver which lacks D3D compute ability), but the automated tests
sometimes erratically get run on a variety of ADO cloud machines that
lack a GPU or are in a bad state such that Windows fell back to software
emulation. In such cases, you end up reaching the `!IsSoftwareAdapter`
check in the provider factory ([line
132](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/dml/dml_provider_factory.cc#L132))
and seeing in the pipeline logs E_INVALIDARG. Let's return a more
immediately enlightening error code like
ERROR_GRAPHICS_INVALID_DISPLAY_ADAPTER rather than just E_INVALIDARG.
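As a rough illustration (not the actual provider-factory code), a minimal sketch of the error-code change described above; the helper name `ValidateHardwareAdapter` and its plumbing are hypothetical.
```c++
#include <windows.h>

// Hypothetical helper showing the change described above: a software adapter
// (e.g., WARP or a basic display driver) is rejected with an error code that
// is self-explanatory in pipeline logs rather than a bare E_INVALIDARG.
HRESULT ValidateHardwareAdapter(bool is_software_adapter) {
  if (is_software_adapter) {
    return HRESULT_FROM_WIN32(ERROR_GRAPHICS_INVALID_DISPLAY_ADAPTER);
  }
  return S_OK;
}
```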
### Motivation and Context
- *Why is this change required? What problem does it solve?* Pipeline noise.
- *If it fixes an open issue, please link to the issue here.* N/A.
### Description
### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/12843
Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
Fix the error:
"/onnxruntime_src/onnxruntime/core/providers/cuda/test/greedy_search_top_one.cc:34:9: error: typedef ‘using VK = struct std::pair<float, int>’ locally defined but not used [-Werror=unused-local-typedefs]
   34 | using VK = std::pair<float, int32_t>"
### Description
Updates the TensorRT and CANN EPs to use murmurhash3 from core/framework via
the provider bridge.
### Motivation and Context
A failure in a packaging pipeline required us to temporarily duplicate
murmurhash3 code for the TensorRT EP. This PR removes the duplicate
code. This is what is happening:
The original version of this code conditionally included a murmurhash
function for TensorRT only (not cuda) in the provider bridge. The
packaging pipeline selectively [copies binaries from two separate
builds](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/linux/extract_and_bundle_gpu_package.sh)
(a cuda-only build and a tensorrt build) into a single libs directory.
These are the files within the resulting libs directory:
- onnxruntime.so (copied from tensorrt build, implements murmurhash in
provider bridge host)
- onnxruntime_providers_shared.so (copied from tensorrt build)
- onnxruntime_providers_tensorrt.so (copied from tensorrt build)
- onnxruntime_providers_cuda.so (copied from **cuda-only build**,
expects a provider host w/o murmurhash)
The [squeezenet
example](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_cxx/squeezenet)
crashed when onnxruntime_providers_cuda.so was loaded because the cuda
ep tried to call functions from a `ProviderHost` object that did not
match what was actually implemented by onnxruntime.so.
I've confirmed that we _can_ prevent the crash by modifying the pipeline
to use the onnxruntime_providers_cuda.so file from the tensorrt build
(instead of the file from the cuda-only build). However, I don't think
that is necessarily correct. Instead, I think we should try to make sure
that the provider bridge exposes the same interface to any EP libraries
that can potentially coexist in the same application (like cuda and
tensorrt). Failing that, there's probably something we can do to
generate a better error message when an EP detects that the Provider
Host implements an unexpected interface.
Note that the above applies to the Windows build in the packaging
pipeline as well. I used the onnxruntime branch
[adrianl/test-trt-cuda-bridge-packaging-pipeline](https://github.com/microsoft/onnxruntime/tree/adrianl/test-trt-cuda-bridge-packaging-pipeline)
along with the onnxruntime-inference-examples branch
[adrianl/squeezenet_ld_debug](https://github.com/microsoft/onnxruntime-inference-examples/tree/adrianl/squeezenet_ld_debug)
to test that copying the onnxruntime_providers_cuda.so file from the
tensorrt build gets rid of the crash.
For the following failures, the folder containing convert_to_onnx should be
specified for the import in the source-code case:
FAILED test_gpt2_to_onnx.py::TestGpt2ConvertToOnnx::test_auto_mixed_precision
FAILED test_gpt2_to_onnx.py::TestGpt2ConvertToOnnx::test_stage1 - TypeError: ...
FAILED test_gpt2_to_onnx.py::TestGpt2ConvertToOnnx::test_stage2 - TypeError: ...
For the failure below, SkipLayerNormalization is fused:
FAILED test_optimizer.py::TestModelOptimization::test_huggingface_openaigpt_fusion
### Description
This change fixes a bug where, when running ORT with NCCL collective
operations on AMD, the NCCL operations cannot be traced.
### Motivation and Context
NCCL operations are missing from roctracer because roctracer uses a
whitelist of which APIs can be traced, and NCCL uses the hipExtLaunchKernel
API, which is not included in that whitelist. This fix adds
hipExtLaunchKernel to the whitelist so that NCCL operations can be traced.
### Description
When there are two QDQ pairs back to back, we want to delete one Q and one
DQ node.
For example:
Q->DQ->Q->DQ =====> Q->DQ
### Motivation and Context
### Description
Add a compilation cache to the Linux CPU Aten pipeline.
The pipeline can now complete in 6 minutes at best.
### Motivation and Context
1. Accelerate the pipeline.
2. It's the shortest pipeline that uses a Docker image. I'll use it to try
moving the storage of the Linux Docker image from ACR to the ADO pipeline cache.
### Description
If a user installs the Python debug libraries on Windows, the ORT Python
project file attempts to use the debug Python lib, which conflicts with a
pragma in pyconfig.h that wants the release lib (due to pybind11
undefining _DEBUG).
Explicitly use the release lib instead of Python::Module so the build
doesn't break.
### Motivation and Context
Fix obtuse build break.
The subgraph index in the TRT engine name keeps increasing when multiple
sessions are created for the same model, which causes the TRT engine not to
be reused and a new engine to be created each time. The issue is that
trt_model_id_generator_ is defined globally.
This PR makes the following changes and improvements:
1. Define the subgraph index as a local variable so it won't be shared
across sessions.
2. Decouple the subgraph index from the hash id generator.
3. Call the hash id generator once at the beginning of GetCapability, since
the hash id is shared between TRT subgraphs and there is no need to call it
for each subgraph.
Fixes https://github.com/microsoft/onnxruntime/issues/14269
### Description
Add a new install_shared_deps.sh
### Motivation and Context
Azcopy, Ninja, Node.js, and CCache are all needed, but their installation
steps are currently copied everywhere.
### Description
This code change allows the QLinearConv operator to sync batches into a
single parallel section. This makes the tasks of all batches available for
threads to execute. It is an alternative to the existing method, which
parallelizes the tasks of individual images separately and forces threads
to wait for an entire image's tasks to complete before continuing.
### Motivation and Context
For int8 convolution models where multiple batches are being utilized,
this patch delivers an inference improvement of up to 41% and 39% for
Mobilenet_edtpu (U8S8) and Resnet50 (U8S8), respectively, on systems with
higher core counts. The patch delivers the highest benefit on systems
with higher thread counts and when utilizing large batch sizes.
| Model | Metric | Batch 2 | Batch 4 | Batch 8 | Batch 16 | Batch 32 | Batch 64 |
| -- | -- | -- | -- | -- | -- | -- | -- |
| resnet50 | % Gain | 22% | 25% | 32% | 36% | 33% | 32% |
Optimize top-1 computation in GreedySearch.
For vocabulary size 50k on A100:
- batch size 1: from 220us to 10.4us.
- batch size 4: from 230us to 11.5us.
For example, when generating 50 tokens, this saves 50 * 0.2ms = 10ms.
### Description
Implement Resize opset 18.
This PR depends on https://github.com/microsoft/onnxruntime/pull/13765.
### Motivation and Context
### Description
Complete some missing parts of test cases for the Python bindings.
### Motivation and Context
Some test cases, like test_training_module_checkpoint and the optimizer
step test, were not completed before because we had no access to the
parameters to check whether they change after the optimizer step or whether
the checkpointed parameters remain the same.
Now that we have access to the vector of parameters through the exposed
get_contiguous_parameters() method, we can complete the tests.
### Description
Bug fixed: quantized models cannot be loaded into ort.InferenceSession
when DedicatedQDQPair is True in the extra_options of QDQQuantizer.
Solution: add a postfix to the node names of dedicated QDQ pairs, similar
to what is already done for their tensor names.
### Motivation and Context
Loading a quantized model fails when `DedicatedQDQPair` is set to `True`
in `extra_options`, raising the error below:
```
Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from mobilenetv2-opset10-quantized-dedicated.onnx failed:This is an invalid model. Error: two nodes with same node name (489_QuantizeLinear).
```
After visualizing the quantized model with Netron, we can see that both of
the dedicated QDQ pairs for tensor 489 have the same node name,
"489_QuantizeLinear". The cause is that in QDQQuantizer there is no unique
postfix for the node names of dedicated QDQ pairs.
<img width="1171" alt="image"
src="https://user-images.githubusercontent.com/12782861/212010296-f8cc05ce-c20e-4189-a692-aaf4bbac3a29.png">
Therefore, I add a postfix to the node names of QDQ pairs, similar to what
is done for tensor names. After this modification, the quantized model can
be loaded successfully and the dedicated QDQ pairs have different node names. 👌🏻
<img width="1037" alt="image"
src="https://user-images.githubusercontent.com/12782861/212010594-78eba39d-eab6-4d77-9ecd-b55f5303bcf4.png">
### Description
Add bindings for Android and iOS.
### Motivation and Context
Enable mobile apps to link against the ort-extensions library and register the custom ops with ORT.
### Description
If the number of trees is >= 100 and the batch size is >= 2000,
parallelization by trees becomes slower than parallelization by rows.
However, applying parallelization by trees over smaller chunks of data is
still better than parallelization by rows. The following script was used to
measure the performance with different thresholds:
[plot_gexternal_lightgbm_reg_per.zip](https://github.com/microsoft/onnxruntime/files/10149092/plot_gexternal_lightgbm_reg_per.zip)
The graphs discussed below were produced by this script.
* //N means parallelization by rows
* //T means parallelization by trees
* //T-128 means parallelization by trees every batch of 128 rows.
* //T-1024 means parallelization by trees every batch of 1024 rows.
The graphs show that parallelization by trees is better than
parallelization by rows on small batches only. It is also better to split
the input tensor into chunks of 128 rows and parallelize by trees on every
chunk. The proposed change implements that optimization.
It applies the same idea even when there is only one thread, and it makes
sure only one thread is used when the user requests only one.

```python
import pandas
import matplotlib.pyplot as plt

filenames = [
    ("//N", "plot_gexternal_lightgbm_reg_per_N.csv"),
    ("//T", "plot_gexternal_lightgbm_reg_per_T.csv"),
    ("//T-128", "plot_gexternal_lightgbm_reg_per_128.csv"),
    ("//T-1024", "plot_gexternal_lightgbm_reg_per_1024.csv"),
]

# Load each benchmark CSV and prefix its batch columns with the run name.
dfs = []
for name, filename in filenames:
    df = pandas.read_csv(filename)
    for c in df.columns:
        if "batch" in c:
            df[f"-{name}-{c}"] = df[c]
    dfs.append(df)

# Merge the renamed columns of every run into a single dataframe.
df = dfs[0][["N"]].copy()
for _df in dfs:
    for c in _df.columns:
        if c[0] == "-":
            df[c] = _df[c].copy()

# One subplot per tree count, comparing the parallelization strategies.
fig, ax = plt.subplots(1, 3, figsize=(14, 6))
Ts = [50, 500, 2000]
ga = df.set_index("N")
for i, nt in enumerate(Ts):
    cs = [c for c in ga.columns if c.endswith(f"-{nt}")]
    ga[cs].plot(ax=ax[i], title=f"Trees={nt}", logy=True, logx=True)
```
The monothread implementation also gains performance by looping over the
data in the inner loop.
### Motivation and Context
Performance.
Signed-off-by: xadupre <xadupre@microsoft.com>
### Description
Use pytest-xdist to distribute tests across multiple CPUs to speed up
test execution.
Use pytest-rerunfailures to rerun failed tests in case of a pytest-xdist
crash.
`pytest -n 16` can reduce pytest time from 80 minutes to 20 minutes.
### Motivation and Context
Currently, the kernel explorer pytest run in the ROCm CI takes nearly 1 hour
20 minutes. It will take even longer as we add more TunableOps in the future.
### Description
Change tolerance for tests involving MNIST and cuda to try and fix flaky
CI tests.
Errors from CI:
ModelTests/ModelTest.Run/cuda__models_zoo_opset8_MNIST_model
expected 4.0755 (40826a83), got 4.06948 (40823938), diff: 0.00601721,
tol=0.0050755 idx=4. 2 of 10 differ
ModelTests/ModelTest.Run/cuda__models_zoo_opset7_MNIST_model
expected 7.89851 (40fcc09e), got 7.88879 (40fc70f8), diff: 0.00972271,
tol=0.00889851 idx=4. 4 of 10 differ
ModelTests/ModelTest.Run/cuda__models_zoo_opset12_MNIST12_mnist12
expected -5.50068 (c0b00595), got -5.49023 (c0afaff0), diff: 0.0104547,
tol=0.00650068 idx=1. 1 of 10 differ
Use an rtol of 1e-2 if CUDA is enabled. Use the same for OpenVINO for
simplicity.
```
>>> expected = np.array([4.0755, 7.89851, -5.50068], dtype=np.float32)
>>> actual = np.array([4.06948, 7.88879, -5.49023], dtype=np.float32)
>>> np.isclose(expected, actual, rtol=1e-2, atol=1e-3)
array([ True, True, True])
```
Whitespace changes are from clang-format.
### Motivation and Context
CI fails semi-frequently causing unnecessary re-runs.
### Description
Right now, prepacking code is not compiled when training is enabled. Our
partners want a single build of ORT which can do both optimized inference
and training on device. This PR enables prepacking code in a training build
and controls whether it is enabled using the already existing session
option kOrtSessionOptionsConfigDisablePrepacking.
For inference scenarios, prepacking will be turned on by default, and this
behavior remains the same after this PR.
For training scenarios, prepacking will be disabled by default, and if the
user explicitly enables it then an error will be thrown.
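For reference, a minimal sketch (not part of this PR) of how an application can control prepacking through that session option; the literal config key string below is an assumption, so prefer the `kOrtSessionOptionsConfigDisablePrepacking` constant from `onnxruntime_session_options_config_keys.h` in real code.
```c++
#include "onnxruntime_cxx_api.h"

int main() {
  Ort::Env env;
  Ort::SessionOptions session_options;

  // Assumed string value of kOrtSessionOptionsConfigDisablePrepacking; set it
  // to "1" to turn prepacking off (it is on by default in inference builds).
  session_options.AddConfigEntry("session.disable_prepacking", "1");

  Ort::Session session(env, ORT_TSTR("model.onnx"), session_options);
  return 0;
}
```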
### Motivation and Context
Enable both optimized inference and on-device training in a single build.
For on-device training, use the flag --enable_training_apis.
### Description
Skip tests for opset18 models that we haven't implemented kernels for
yet.
Slice was checked in today so those failures should go away.
- Resize: #13890 (all Resize failures are fixed by this PR, as confirmed in the output
[here](https://dev.azure.com/aiinfra/530acbc4-21bc-487d-8cd8-348ff451d2ff/_apis/build/builds/264725/logs/729))
- Col2Im: #12311
- ScatterND and ScatterElements: #14224
- Pad (should also fix CenterCropPad failures): #14219
- Bitwise ops: #14197
- Optional: unknown whether we intend to support this in 1.14
Not sure about SoftPlus as that is failing due to `Could not find an
implementation for Exp(1)`. ORT supports Exp from opset 6 and on, and it
seems incorrect for the test model created for opset 18 to be using a
version of Exp that is so old. Would have expected it to use the latest
- Exp(13). @liqunfu is this something that requires a fix to the ONNX
model?
### Motivation and Context
Fix pipeline
### Description
1. Add an optional input to pass in a seed.
2. Add two UTs: one for top_p=0.5 and another for top_p=0.01 (which creates
a greedy-search result, in convert_generation.py).
3. Fix a bug in the CPU kernel.
### Motivation and Context
Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
This PR registers ScatterND-16 with the DML EP.
- A CPU fallback is added if the reduction attribute is in use, as this is
not yet supported by DML.
Co-authored-by: Numfor Mbiziwo-Tiapo <numform@microsoft.com>