Add benchmark script for Transformer models
* Set intra_op_num_threads=1 for CPU (version <= 1.2.0)
* Add percentiles for latency
* Use torch.set_num_threads (for intra-op parallelism) to get a fair comparison
* Allow exporting the ONNX model with a specified number of inputs
* Add fusion statistics
* Install transformers from source
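The latency percentiles added to the benchmark script can be sketched as follows. This is a minimal, stdlib-only illustration and not the script's actual code: the names `measure`, `percentile`, and `summarize` are hypothetical, and it assumes the nearest-rank percentile definition (the real script may interpolate differently, e.g. via NumPy).

```python
import math
import time
from typing import Callable, Dict, List


def percentile(sorted_ms: List[float], p: int) -> float:
    """Nearest-rank percentile of an ascending-sorted list, for 0 < p <= 100."""
    if not sorted_ms:
        raise ValueError("no latency samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_ms) / 100)  # 1-based rank
    return sorted_ms[rank - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Average latency plus the tail percentiles a benchmark typically reports."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("no latency samples")
    summary = {"average": sum(ordered) / len(ordered)}
    for p in (50, 90, 95, 99):
        summary[f"p{p}"] = percentile(ordered, p)
    return summary


def measure(run: Callable[[], object], repeat: int = 100) -> List[float]:
    """Call `run` `repeat` times and return each call's latency in milliseconds."""
    latencies = []
    for _ in range(repeat):
        start = time.perf_counter()
        run()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would run the ONNX Runtime or PyTorch model.
    print(summarize(measure(lambda: sum(range(10_000)))))
```

Tail percentiles such as p90 and p99 expose latency outliers that an average alone hides, which is why they are worth reporting next to it.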
* Outputs from model execution should always be returned either in a newly allocated buffer or in a pre-allocated buffer provided in fetches. When an initializer provides a graph output (e.g. as a result of constant folding), we were returning an OrtValue that pointed to the initializer rather than a separately allocated buffer containing a copy.
This was wrong because:
- the value wasn't returned in the pre-allocated fetch, so although the returned value was correct, it was returned in the wrong place
- the user could alter the data in the initializer via the returned value
* Add unit test with and without pre-allocated fetch.
* Add extra information about why we handle this special case.
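The fix for initializer-backed outputs can be illustrated with a toy model. `ToySession` and its `run(output_name, fetch)` signature are hypothetical stand-ins, not the ONNX Runtime API; the sketch only shows the intended rule that such an output is copied, either into the caller's pre-allocated fetch or into a new buffer, and is never returned by reference.

```python
from typing import Dict, List, Optional


class ToySession:
    """Hypothetical session whose graph outputs are all initializers
    (as can happen after constant folding)."""

    def __init__(self, initializers: Dict[str, List[float]]) -> None:
        self._initializers = initializers

    def run(self, output_name: str, fetch: Optional[List[float]] = None) -> List[float]:
        source = self._initializers[output_name]
        if fetch is not None:
            # Pre-allocated fetch: the result must land in the caller's buffer.
            if len(fetch) != len(source):
                raise ValueError("pre-allocated fetch has the wrong size")
            fetch[:] = source  # element-wise copy into the existing list
            return fetch
        # No fetch supplied: hand back a freshly allocated copy, so the caller
        # can never mutate the initializer through the returned value.
        return list(source)


if __name__ == "__main__":
    session = ToySession({"y": [1.0, 2.0]})
    out = session.run("y")
    out[0] = 99.0              # modifies only the caller's copy
    print(session.run("y"))    # the initializer is unchanged
```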
* Merged PR 4616739: Update QLinear Ops fix 1D support layout
Related work items: #26011523
* Merged PR 4617257: Gather operator DML EP fails with scalar indices and 1D inputs
Fix Gather with scalar indices.
The ONNX conformance test case is in another PR:
// 1D data, axis 0, rank 0 indices tensor (0D output)
{
"op_type": "Gather",
"axis": 0,
"data": [1,2,3],
"indices": 0,
"output": 1,
"T": "float32"
}
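Per the ONNX Gather definition, the output rank is q + r - 1 for indices of rank q and data of rank r, so rank-1 data with rank-0 (scalar) indices must produce a rank-0 output, which is the case the test above exercises. The sketch below, with hypothetical helper names, computes that output shape and the expected value in plain Python.

```python
from typing import Sequence, Tuple


def gather_output_shape(
    data_shape: Tuple[int, ...], indices_shape: Tuple[int, ...], axis: int
) -> Tuple[int, ...]:
    """Output shape of ONNX Gather: data dims before `axis`, then the indices
    dims, then data dims after `axis` (rank q + r - 1)."""
    r = len(data_shape)
    if axis < 0:
        axis += r  # negative axis counts from the back
    if not 0 <= axis < r:
        raise ValueError("axis out of range")
    return data_shape[:axis] + indices_shape + data_shape[axis + 1:]


def gather_1d_scalar(data: Sequence[float], index: int) -> float:
    """Gather along axis 0 of 1D data with a scalar index; the result is 0D."""
    n = len(data)
    if not -n <= index < n:
        raise IndexError("index out of range")
    return data[index]  # negative indices count from the end


if __name__ == "__main__":
    print(gather_output_shape((3,), (), axis=0))  # rank-0 output: ()
    print(gather_1d_scalar([1.0, 2.0, 3.0], 0))   # the test's expected value, 1
```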
* Merged PR 4632178: Re-enable ORT onnx_test_runner test case (DirectML ConvTranspose validation needs to be loosened to comply with ONNX definition of output_padding)
Re-enable 1D convolution tests.
Related work items: #23499747
* Merged PR 4656672: Make DML EP use Direct queue
While a Compute queue has benefits, Direct is consistent with WinML.
Related work items: #26324112
* Update DML nuget version
* Merged PR 4662079: Update DmlDev branch again from github master
Include Sheil's changes to fix namespace and header file include paths. Without this, the ONNX conformance tests all fail with E_NOTIMPL.
* Increment DML nuget version
Co-authored-by: Nick Feeney <nickfe@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
* Added aarch64 build pipeline
* Fix build error
* Remove auditwheel repair, which doesn't work with cross-compiling
* Statically link C++
* Add auditwheel repair back and fix stdlib.h
* Remove extra space
* Stop proceeding with constant folding if a CPU kernel is not found for a node
* Fix build
* PR feedback
* Fix typo
* Refine
* Remove unnecessary header inclusion
* Refine
* Fix build
* More changes
* More changes
* More changes
* Fix CentOS build
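The rule in the constant folding change above can be sketched as a pass that folds a node only when all of its inputs are constant and a CPU kernel is registered for its op type, and otherwise leaves the node in the graph instead of failing. `Node`, `constant_fold`, and the kernel table are hypothetical and far simpler than the actual ONNX Runtime optimizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    op_type: str
    inputs: List[str]
    output: str


def constant_fold(
    nodes: List[Node],
    constants: Dict[str, float],
    cpu_kernels: Dict[str, Callable[..., float]],
) -> List[str]:
    """Fold eligible nodes (adding their outputs to `constants`) and return the
    names of the folded nodes. `nodes` must be in topological order."""
    folded = []
    for node in nodes:
        if not all(name in constants for name in node.inputs):
            continue  # depends on a runtime input; cannot fold
        kernel = cpu_kernels.get(node.op_type)
        if kernel is None:
            continue  # no CPU kernel: skip this node instead of erroring out
        constants[node.output] = kernel(*(constants[name] for name in node.inputs))
        folded.append(node.name)
    return folded


if __name__ == "__main__":
    graph = [
        Node("add1", "Add", ["a", "b"], "c"),
        Node("custom", "MyCustomOp", ["c"], "d"),  # no CPU kernel registered
        Node("add2", "Add", ["c", "a"], "e"),
    ]
    consts = {"a": 1.0, "b": 2.0}
    print(constant_fold(graph, consts, {"Add": lambda x, y: x + y}))  # ['add1', 'add2']
```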
* Add signed nuget package to publish ort-nightly nuget feed
* Push managed nuget as well
* Indentation fix
* Indentation fix
* Update gpu.yml to also publish directml nuget
* Fix typo in naming of task
* dashboard integration - first phase
* change a field
* perf scripts
* addressing PR comments
* address comments and fix build
* minor
* make GetConfigFromData() const
* more update for comments
* addressing comments
* more on addressing comments
* minor
* fix build
* add condition check
* more on comments
* return status
* remove batch size
* on comments
* rename pkg path
* rename pkg path
* additional comments
Co-authored-by: Ethan Tao <ettao@microsoft.com>
* Do not register Dropout(12) as training ONLY kernel.
* Move Dropout forward implementation into the inference project.
* fix inference build test failures.
* remove fp16 test since fp16 is not supported on CPU.
* fix build break.
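Dropout can be registered for inference because, outside training mode, its forward pass is just a copy of the input. The following is a simplified Python sketch of Dropout-12 forward semantics (the `training_mode` input and the 1/(1 - ratio) scaling of kept elements); `dropout_forward` is a hypothetical name, and this is not the kernel code that was moved.

```python
import random
from typing import List, Optional, Sequence, Tuple


def dropout_forward(
    data: Sequence[float],
    ratio: float = 0.5,
    training_mode: bool = False,
    seed: Optional[int] = None,
) -> Tuple[List[float], List[bool]]:
    """Return (output, mask). In inference mode (or with ratio 0) the output is a
    copy of the input and every mask entry is True."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    if not training_mode or ratio == 0.0:
        return list(data), [True] * len(data)
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - ratio)  # keeps the expected value of each element unchanged
    mask = [rng.random() >= ratio for _ in data]  # keep with probability 1 - ratio
    output = [x * scale if keep else 0.0 for x, keep in zip(data, mask)]
    return output, mask
```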
* Fold Shape node in constant folding.
* bugfix
* Fix test failure.
* Bugfix for C++ frontend.
* Bugfix for C++ frontend.
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
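A Shape node's output depends only on the static shape of its input, not on the input's values, so constant folding can replace it with a constant whenever every dimension is known at graph-optimization time. The helper below is a hypothetical sketch of that check, modelling symbolic dimensions (e.g. a dynamic batch size) as strings.

```python
from typing import List, Optional, Sequence, Union

Dim = Union[int, str]  # int: static dimension; str: symbolic dimension such as "batch"


def try_fold_shape(input_shape: Optional[Sequence[Dim]]) -> Optional[List[int]]:
    """Return the constant value a Shape node would produce, or None when the
    node cannot be folded because the rank or some dimension is unknown."""
    if input_shape is None:
        return None  # rank unknown: nothing to fold
    if any(not isinstance(d, int) for d in input_shape):
        return None  # a symbolic dimension is only resolved at run time
    return list(input_shape)


if __name__ == "__main__":
    print(try_fold_shape([1, 3, 224, 224]))        # folds to [1, 3, 224, 224]
    print(try_fold_shape(["batch", 3, 224, 224]))  # None: node stays in the graph
```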
* add build inbox flag
* remove raw tests and wstring for utf filenames
* enable raw tests
* use ToWideString
* create new utf8 helper
* update string helper to utf8
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Fix C# log APIs. Fixes GitHub issue #3409.
* Fix build error due to accidental duplication of GraphOptimizationLevel
* Fix RunOptions
* Fix broken test. Add --blame switch to dotnet test cmd line to print the failed test in case of crash.
* java - adding support for custom op libraries.
* Adding support for RunOptions and additional methods for SessionOptions and OrtSession.
As a result, OrtEnvironment.LoggingLevel has moved to a top-level enum
called OrtLoggingLevel.
* java - adding unit tests for RunOptions and SessionOptions.
* java - removing unused releaseNamesHandle method
* java - add test for custom op library.
* java - adding log verbosity methods, and tests for the same.
* java - fixes for custom op loading test on Windows.
* Cleanup after rebase on master.