* Implement QLinearRelu and its unit test.
* Add logic to compute the table in the constructor when all parameters are constant.
* Fix test case rounding results related to the rounding mode.
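The constructor-time table can be sketched in Python as follows. This assumes uint8 input and output and the usual QLinear formulas: dequantize with `(q - zero_point) * scale`, then requantize with rounding and saturation. The function names are hypothetical and this is not the ORT kernel. Python's built-in `round()` rounds exact halfway cases to the nearest even integer, which matches the rounding ONNX QuantizeLinear specifies. A different rounding mode changes results at halfway points such as 0.5 and 2.5.

```python
from typing import List


def build_qlinear_relu_table(x_scale: float, x_zero_point: int,
                             y_scale: float, y_zero_point: int) -> List[int]:
    """Hypothetical helper: map every possible uint8 input to its uint8 ReLU output."""
    table = []
    for q in range(256):
        x = (q - x_zero_point) * x_scale         # dequantize the input value
        y = max(x, 0.0)                          # ReLU in the float domain
        q_y = round(y / y_scale) + y_zero_point  # requantize; round() is round-half-to-even
        table.append(min(max(q_y, 0), 255))      # saturate to the uint8 range
    return table


def qlinear_relu(data: List[int], table: List[int]) -> List[int]:
    """Apply the precomputed table: one lookup per element, no float math at run time."""
    return [table[q] for q in data]


if __name__ == "__main__":
    table = build_qlinear_relu_table(0.5, 128, 0.5, 0)
    print(qlinear_relu([100, 128, 130, 255], table))  # [0, 0, 2, 127]
```

When the scales and zero points are constant, the table is built once, and each inference step reduces to indexing.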
* Enable running PEP8 checks via flake8 as part of the build if flake8 is installed.
Update scripts in \tools and \onnxruntime\python. \onnxruntime\python\tools is excluded because it needs a lot more work to be PEP8 compliant; orttraining\tools is excluded for the same reason.
Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build.
Update coding standards doc.
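One way to make the check optional is to probe for flake8 before invoking it. The sketch below is an assumption about how such a build step could look, not the actual build script, and both function names are hypothetical. It uses `importlib.util.find_spec` to detect flake8 and runs it as `python -m flake8` with flake8's `--exclude` option (comma-separated patterns).

```python
import importlib.util
import subprocess
import sys
from typing import List, Optional


def build_flake8_command(paths: List[str], excludes: List[str]) -> Optional[List[str]]:
    """Return the flake8 command line, or None when flake8 is not installed."""
    if importlib.util.find_spec("flake8") is None:
        return None  # flake8 missing: the caller skips the check instead of failing
    cmd = [sys.executable, "-m", "flake8"]
    if excludes:
        cmd.append("--exclude=" + ",".join(excludes))  # comma-separated patterns
    return cmd + paths


def run_pep8_checks(paths: List[str], excludes: List[str]) -> int:
    """Hypothetical build step: returns flake8's exit code, or 0 when skipped."""
    cmd = build_flake8_command(paths, excludes)
    if cmd is None:
        print("flake8 is not installed; skipping PEP8 checks")
        return 0
    return subprocess.run(cmd, check=False).returncode
```

Splitting command construction from execution keeps the "installed or not" decision testable without running flake8.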
* Update BUILD doc for the ARM64 build with TensorRT support on Jetson devices
* minor revision
* JetPack 4.4 is in developer preview, so we suggest using JetPack 4.3
Detect OS and architecture and move the artifacts to a new folder.
Remove unnecessary JARs so we can focus on those we publish.
Add signing.
Make signature simpler.
Fix indent.
Halt on 32-bit arch.
Credits: @Craigacp
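These commits concern the Java (JAR) packaging. The sketch below only illustrates OS/architecture detection and the 32-bit halt, written in Python to match the other examples here. The `<os>-<arch>` folder naming scheme and both function names are hypothetical. `platform.system()` and `platform.machine()` are standard library calls, and checking `sys.maxsize` catches a 32-bit interpreter running on a 64-bit OS.

```python
import platform
import sys

# Hypothetical naming scheme: "<os>-<arch>", e.g. "linux-x64".
_OS_NAMES = {"Linux": "linux", "Darwin": "osx", "Windows": "win"}
_ARCH_NAMES = {"x86_64": "x64", "AMD64": "x64", "aarch64": "aarch64", "arm64": "aarch64"}


def native_folder_name(system: str, machine: str) -> str:
    """Map platform.system()/platform.machine() values to an artifact folder name."""
    if system not in _OS_NAMES:
        raise RuntimeError(f"unsupported OS: {system}")
    if machine not in _ARCH_NAMES:
        # 32-bit values such as "i386", "i686" or "armv7l" end up here.
        raise RuntimeError(f"unsupported or 32-bit architecture: {machine}")
    return f"{_OS_NAMES[system]}-{_ARCH_NAMES[machine]}"


def detect_native_folder() -> str:
    """Halt on a 32-bit interpreter even if the OS itself is 64-bit."""
    if sys.maxsize <= 2**32:
        raise RuntimeError("32-bit architectures are not supported")
    return native_folder_name(platform.system(), platform.machine())
```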
Add benchmark script for Transformer models
* Set intra_op_num_threads=1 for CPU (version <= 1.2.0)
* Add percentiles for latency
* Use torch.set_num_threads (for intra-op) to get a fair comparison
* Allow exporting an ONNX model with a specified number of inputs
* Add fusion statistics
* Install transformers from source
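The latency statistics can be sketched with the standard library alone. The benchmark script's exact percentiles are not stated here, so P50/P90/P95/P99 are an assumption, and the helper names are hypothetical. `percentile` uses linear interpolation between the two nearest ranks, which is the same rule as NumPy's default `numpy.percentile`.

```python
import time
from typing import Callable, Dict, List, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile over already-sorted samples."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = (len(sorted_values) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = rank - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def latency_summary(latencies_ms: List[float]) -> Dict[str, float]:
    """Average plus a few tail percentiles, all in milliseconds."""
    s = sorted(latencies_ms)
    return {
        "average": sum(s) / len(s),
        "p50": percentile(s, 50),
        "p90": percentile(s, 90),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
    }


def measure(run_once: Callable[[], None], repeat: int = 100) -> Dict[str, float]:
    """Time `repeat` calls of run_once with a monotonic high-resolution clock."""
    latencies = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_once()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latency_summary(latencies)
```

For the thread settings listed above, ONNX Runtime exposes `SessionOptions.intra_op_num_threads` and PyTorch exposes `torch.set_num_threads`. Pinning both to the same value is what makes the comparison fair.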
* Outputs from model execution should always be returned in a newly allocated buffer or in a pre-allocated buffer provided in fetches. When an initializer provides a graph output (e.g. constant folding may result in this), we were returning an OrtValue that pointed to the initializer rather than a separately allocated buffer holding a copy.
This was wrong because:
- the value wasn't returned in the pre-allocated fetch, so although the returned value was correct, it was returned in the wrong place
- the user could alter the data in the initializer via the returned value
* Add unit tests with and without a pre-allocated fetch.
* Add some extra info on why we're handling this special case.
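The rule can be illustrated with plain Python lists standing in for tensors. This is a hypothetical sketch of the behaviour, not ORT's execution code. `fetch_graph_output` either copies the initializer into the caller's pre-allocated buffer or returns a fresh copy, so the caller never receives an alias of the initializer.

```python
from typing import List, Optional


def fetch_graph_output(initializer: List[float],
                       preallocated: Optional[List[float]] = None) -> List[float]:
    """Return an initializer-backed graph output without exposing the initializer."""
    if preallocated is not None:
        if len(preallocated) != len(initializer):
            raise ValueError("pre-allocated fetch has the wrong size")
        preallocated[:] = initializer  # copy into the caller's buffer, in place
        return preallocated
    return list(initializer)  # fresh buffer, never an alias of the initializer
```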
* Merged PR 4616739: Update QLinear ops: fix 1D support layout
Related work items: #26011523
* Merged PR 4617257: Gather operator DML EP fails with scalar indices and 1D inputs
Fix Gather with scalar indices.
The ONNX conformance test case is in another PR:
// 0D, axis 0, rank 0 indices tensor
{
  "op_type": "Gather",
  "axis": 0,
  "data": [1,2,3],
  "indices": 0,
  "output": 1,
  "T": "float32"
}
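Per the ONNX Gather definition, data of rank r and indices of rank q produce an output of rank q + r - 1, with the gathered axis replaced by the shape of the indices. With 1D data and scalar (rank-0) indices, as in the test case above, the output is rank 0: a single element. A minimal Python sketch of that shape rule follows. The function names are hypothetical, and this is not the DML EP code.

```python
from typing import Sequence, Tuple


def gather_output_shape(data_shape: Sequence[int], indices_shape: Sequence[int],
                        axis: int = 0) -> Tuple[int, ...]:
    """Output shape = data_shape[:axis] + indices_shape + data_shape[axis + 1:]."""
    rank = len(data_shape)
    if axis < 0:
        axis += rank  # negative axis counts from the back
    if not 0 <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    return tuple(data_shape[:axis]) + tuple(indices_shape) + tuple(data_shape[axis + 1:])


def gather_1d_scalar(data: Sequence[float], index: int) -> float:
    """Gather on 1D data with a scalar index: returns one element (a rank-0 result)."""
    n = len(data)
    if index < 0:
        index += n  # negative indices count from the end
    if not 0 <= index < n:
        raise IndexError(f"index out of range for size {n}")
    return data[index]
```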
* Merged PR 4632178: Re-enable ORT onnx_test_runner test case (DirectML ConvTranspose validation needs to be loosened to comply with the ONNX definition of output_padding)
Re-enable 1D convolution tests.
Related work items: #23499747
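For reference, when pads are given explicitly, the ONNX specification defines the ConvTranspose output size along each spatial axis as `stride * (input_size - 1) + output_padding + ((kernel_size - 1) * dilation + 1) - pad_begin - pad_end`. The Python sketch below evaluates that formula for one axis. The function name is hypothetical.

```python
def conv_transpose_output_size(input_size: int, kernel_size: int, stride: int = 1,
                               dilation: int = 1, pad_begin: int = 0, pad_end: int = 0,
                               output_padding: int = 0) -> int:
    """ONNX ConvTranspose output size for one spatial axis (explicit pads)."""
    effective_kernel = (kernel_size - 1) * dilation + 1  # kernel extent after dilation
    return (stride * (input_size - 1) + output_padding + effective_kernel
            - pad_begin - pad_end)
```

For a 1D input of length 3 with kernel size 3 and stride 2, this gives 7 without `output_padding` and 8 with `output_padding=1`. The spec adds the extra elements on the side with higher coordinate indices.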
* Merged PR 4656672: Make DML EP use Direct queue
While a Compute queue has benefits, Direct is consistent with WinML.
Related work items: #26324112
* Update DML NuGet version
* Merged PR 4662079: Update DmlDev branch again from github master
Include Sheil's changes to fix the namespace and header file include paths. Without these changes, all ONNX conformance tests fail with E_NOTIMPL.
* Increment DML NuGet version
Co-authored-by: Nick Feeney <nickfe@microsoft.com>
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
* Add aarch64 build pipeline
* Fix build error
* Remove auditwheel repair, which doesn't work with cross-compiling
* Statically link C++
* Add auditwheel repair back and fix stdlib.h
* Remove extra space
* Stop proceeding with constant folding if a CPU kernel is not found for a node
* Fix build
* PR feedback
* Fix typo
* Refine
* Remove unnecessary header inclusion
* Refine
* Fix build
* More changes
* More changes
* More changes
* Fix CentOS build
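The following is a minimal sketch of the guard, based on a per-node reading of the message. A node whose inputs are all constant is folded only when a CPU kernel exists for its op type; otherwise it stays in the graph. The `Node` type, the kernel registry, and `constant_fold` are hypothetical stand-ins, not ORT's optimizer API. Nodes are also assumed to arrive in topological order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Kernel = Callable[..., Tuple[float, ...]]


@dataclass
class Node:  # hypothetical stand-in for a graph node
    name: str
    op_type: str
    inputs: List[str]
    outputs: List[str]


def constant_fold(nodes: Sequence[Node], constants: Dict[str, float],
                  cpu_kernels: Dict[str, Kernel]) -> List[str]:
    """Fold nodes with all-constant inputs; return the names of folded nodes."""
    folded: List[str] = []
    for node in nodes:
        if not all(name in constants for name in node.inputs):
            continue  # not every input is constant: nothing to fold
        kernel = cpu_kernels.get(node.op_type)
        if kernel is None:
            continue  # no CPU kernel: leave the node in the graph
        outputs = kernel(*(constants[name] for name in node.inputs))
        for name, value in zip(node.outputs, outputs):
            constants[name] = value  # outputs become constants for later nodes
        folded.append(node.name)
    return folded
```

Because a skipped node's outputs never become constants, nodes downstream of it are not folded either.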
* Add signed NuGet package to publish to the ort-nightly NuGet feed
* Push managed NuGet package as well
* Indentation fix
* Indentation fix
* Update gpu.yml to also publish the DirectML NuGet package
* Fix typo in task name