Fix longformer parity and perf regression (#6760)
Adding fp16 support for Einsum Cuda kernel (#6775)
Update DirectML 1.4.1 to 1.4.2 for ORT 1.7 (#6780)
Fix regression in constant folding optimizer (#6795)
Update transformers benchmark for transformers 4.3.* and ORT 1.7 (#6796)
Make keepdims to its default value when adding ReduceMin/ReduceMax fo (#6788)…
fix issues caused by quantize/calibrate changes (#6802)
6735 and 6728 already in release branch
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Ori Levari <orlevari@microsoft.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: stevenlix <38092805+stevenlix@users.noreply.github.com>
ONNX Runtime 1.7 will be the last release to publish MCR
container images for ONNX Runtime with the OpenVINO EP. From ONNX
Runtime 1.8 onwards, this will be discontinued. Users are advised
to switch to the PyPI packages or build their own containers
using the Dockerfiles.
* Fixes OpenVINO-EP Windows build
The OpenVINO EP build is broken on Windows. The issue
is that wchar_t is UTF-16 on Windows, while on other platforms
such as Linux and macOS wchar_t is UTF-32,
so a wide Unicode string must be converted to a UTF-8 string
on Windows.
This commit fixes the issue.
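The encoding difference described above can be illustrated with a minimal Python sketch. This is not the code from the commit (which is C++); Python has no wchar_t, so explicit `utf-16-le` and `utf-32-le` encodings stand in for the two platform representations, and the file name and the helper `wide_to_utf8` are hypothetical.

```python
# Illustration only: explicit encodings stand in for the platform-specific
# wchar_t representations described in the commit message.
text = "model_\U0001F600.onnx"  # hypothetical name containing a non-BMP character

utf16 = text.encode("utf-16-le")  # Windows-style wide string, 2-byte code units
utf32 = text.encode("utf-32-le")  # Linux/macOS-style wide string, 4-byte code units

# The non-BMP character takes two code units (a surrogate pair) in UTF-16
# but a single code unit in UTF-32, so the unit counts differ.
utf16_units = len(utf16) // 2
utf32_units = len(utf32) // 4


def wide_to_utf8(raw: bytes, encoding: str) -> bytes:
    """Decode a wide-character buffer with its actual encoding, then emit UTF-8."""
    return raw.decode(encoding).encode("utf-8")


# Both wide forms yield the same UTF-8 bytes once decoded with the right encoding.
utf8_from_16 = wide_to_utf8(utf16, "utf-16-le")
utf8_from_32 = wide_to_utf8(utf32, "utf-32-le")
```

The point of the sketch is that code which assumes one code unit per character, or a fixed unit width, behaves differently on the two platforms, which is why an explicit conversion to UTF-8 is the portable choice.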
* Add support for custom ops library to the ORT model conversion script
Simplify model conversion now that we read ops from the ORT format model.
Enable custom ops in the Python bindings if custom ops are turned on in a minimal build.
* Add test of model conversion involving custom ops.
* Integrate memory improvements from NVIDIA
* compute max_global_num before buffer allocation
* update conversion script to support transformers 4.0
* update benchmark script for creating dummy inputs for different batch_size
* Use a wrapper around the CUDA event to avoid a memory leak
* rename pipelines
* resync and rename
* resync master
* rename package id
* remove OrtPackageId which is for nuget
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Adding changes to enable ov_config_options
Enabling a flag to pass OpenVINO Runtime options
as a string argument on the command line.
* Enabling OpenVINO Runtime options for perftest
Enables OpenVINO EP runtime options in onnxruntime_perf_test.
These options can now be passed as an argument to the perf test C++
application as space-separated key-value pairs on the
command line.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* minor changes added
* Corrected Indentation
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* corrected indentation issues
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Making config options generic to all EPs
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
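As a rough illustration of the space-separated key-value option string described above, the following Python sketch parses such a string into a dictionary. It is not the perf test's parser (that is C++): the `|` delimiter between key and value, the function name `parse_ep_options`, and the option names are all assumptions made for this example, since the commit message only says the pairs are separated by a space.

```python
def parse_ep_options(option_string: str, kv_sep: str = "|") -> dict:
    """Parse 'key|value key|value' into a dict of strings.

    The '|' delimiter is an assumption for this sketch; the commit message
    only states that the pairs are separated by a space.
    """
    options = {}
    for token in option_string.split():
        key, sep, value = token.partition(kv_sep)
        # Reject tokens that lack the delimiter, the key, or the value.
        if not sep or not key or not value:
            raise ValueError(f"expected key{kv_sep}value, got {token!r}")
        if key in options:
            raise ValueError(f"duplicate option {key!r}")
        options[key] = value
    return options


# Hypothetical option names, for illustration only.
parsed = parse_ep_options("option_a|1 option_b|on")
```

Keeping the values as strings and leaving interpretation to each execution provider is one way such a parser could stay generic across EPs.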
1. For the previous OpenMP build, remove --use_openmp so that the thread pool becomes the default.
2. For the previous non-OpenMP build, add --use_openmp and rename the package to indicate the inclusion.
3. Add a Mac build with OpenMP enabled.
* Remove support for custom ops from the base minimal build, as they contribute too much binary growth to an Android build.
Add the ability to explicitly enable custom op support in a minimal build.
Change one minimal build CI to test adding custom op support (unit tests are run in that build to validate it).
* model building
* fix build
* winml adapter model building api
* model building
* make build
* make build again
* add model building with audio op
* inplace and inorder fft
* add ifft
* works!
* cleanup
* add comments
* switch to iterative rather than recursive and use parallelization
* batched parallelization
* fft->dft
* cleanup
* window functions
* add melweightmatrix op
* updates to make spectrogram test work
* push latest
* add onesided
* cleanup
* Clean up building apis and fix mel
* cleanup
* cleanup
* naive stft
* fix test output
* middle c complete
* 3 tones
* cleanup
* signal def new line
* Add save functionality
* Perf improvements, 10x improvement
* cleanup
* use bitreverse lookup table for performance
* implement constant initializers for tensors
* small changes
* add matmul tests
* merge issues
* support add attribute
* add tests for double data type windowfunctions and minor cleanup
* stft onesided/and not tests
* cleanup
* cleanup
* clean up
* cleanup
* remove threading attribute
* forward declare orttypeinfo
* warnings
* fwd declare
* fix warnings
* 1 more warning
* remove saving to e drive...
* cleanup and fix stft test
* add opset picker
* small additions
* add onnxruntime tests
* add signed/unsigned
* fix warning
* fix warning
* finish onnxruntime tests
* make windows namespace build succeed
* add experimental flag
* add experimental api into nuget package
* add experimental api build flag and add to windows ai nuget package
* turn experimental for tests
* add minimum opset version to new experimental domain
* api cleanup
* disable ms experimental ops test when --ms_experimental is not enabled
* add macro behind flag
* remove unused x
* pr feedback
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
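Several of the steps above ("switch to iterative rather than recursive", "use bitreverse lookup table for performance", "add ifft") name a well-known technique: the iterative radix-2 FFT. The Python sketch below shows the general idea under stated assumptions. It is not the code from this change (which is C++ and parallelized); the function names are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath


def bit_reverse_table(n: int) -> list:
    """Return a table where table[i] is i with its log2(n) bits reversed."""
    bits = n.bit_length() - 1
    table = [0] * n
    for i in range(1, n):
        # Reuse the entry for i >> 1 and move the low bit of i to the top.
        table[i] = (table[i >> 1] >> 1) | ((i & 1) << (bits - 1))
    return table


def fft(values, inverse=False):
    """Iterative radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    table = bit_reverse_table(n)
    # Reorder the input once so the butterflies can update the buffer in place.
    out = [complex(values[table[i]]) for i in range(n)]
    sign = 1.0 if inverse else -1.0
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(sign * 2j * cmath.pi / size)  # twiddle factor increment
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                even = out[start + k]
                odd = out[start + k + half] * w
                out[start + k] = even + odd
                out[start + k + half] = even - odd
                w *= step
        size *= 2
    if inverse:
        out = [x / n for x in out]  # scale so fft(fft(x), inverse=True) returns x
    return out
```

Each block of `size` elements in the inner loops is independent of the others, which is the property that makes this formulation easier to parallelize than the recursive one.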
* Partial updating of ROCM reduction code.
* Update reduction_all.cu
* Add reduce template parameters.
* miopen common
* Reuse CUDA's reduction_functions.cc
* Reduction ops.
* Update remaining reduction ops to use MIOpen. The double data type is not supported, so disable those typed kernels.
* Disable a couple more unsupported tests.
* Code formatting.
* Delete ROCM-specific reduction code that is identical to CUDA reduction code.
* Fix scratch buffer early free.
* Fix merge conflict.
* first attempt nightly amd ci pipeline
* try fix bad yaml file
* try again with corrected model directory
* add convergence test as well
* update reference loss for amd mi100
* include mi100 test results csv
* update the mi100 convergence test reference values
* update batch sizes for mi100 32g
* fix gpu sku for run_convergence_test.py
* undo unrelated changes to master
* pr comments
* pr comment
Co-authored-by: Jesse Benson <jesseb@microsoft.com>