* optimize some lstm gate computation. Remove no need string constructions.
* change gcc optimization flags for computation bound logics in rnn_helpers
* better qgemm for M=1
* Some improve on avx512
* add condition to limit GCC related marcros
* Correct QGemm assembly for M=1 AVX2 optimization to pass mlas_test.
* Fix rnn_helper build issue for wasm.
* better asm code here according to feedbacks.
* Remove customized vectorize and unroll option for GCC.
Using restrict on some function to help GCC to correctly vectorize it.
Rewrite clip_add_bias() to let GCC correctly vectorize it.
* Better restrict semantic for merge_lstm_gates_to_memory() by adding in_place().
Add MSC __restrict for the clip_add_bias() mthod to vectorize correctly.
* Force CI restart as it stucked by the onnxruntime-python-checks-ci-pipeline which can not restart.
Bug #31652854 also repros on Qualcomm Adreno (down to the exact same pixel). This change disables this model test for Qualcomm, in addition to the existing disablement for Intel.
By default, not do enable subgraph quantization to make it consistent with existing behavior.
It should be OK to enable it at quantize_dynamic mode with extra_options.
* changes
* tile grad unsqueeze fix for opset 13
* clean up
* remove bool support for opset 2 to 12 for Pad as it is not supported.
* Copy OperatorKernels.md from artifacts of Windows CI build.
* Remove APIs unavailable in Store in #8349, #8178, #8065
* Add UWP stubs of C runtime functions
* Remove UWP incompatible tests from UWP build
* Remove incompatible tests from Store
* Use UWP stubs in store only
* Skip partition check outside of Windows
* Remove unused WRL include
* Workaround Windows header not including what it uses
* Fix precompiled header name clash
* Workaround SDK bugs
* DXCore workaround in Win7
* Fix warning
* Fix more warnings
* Bump WinML to target Windows 8
* Fix more warnings
* Remove unnecessary workarounds
* rename BertOptimizationOptions to FusionOptions
* remove disable_onnxruntime, and use opt_level to control whether onnxruntime graph optimization is used.
* Change default opt_level for backward compatible. When opt_level is not specified, default value is based on model type.
This change adds a new pipeline for checking Python code. Currently this pipeline only runs flake8.
flake8 is also run as part of the CMake project builds, but we can switch over completely to the new pipeline later.
The .flake8 config file was also updated to make it easier to run standalone (flake8 --config ./.flake8) and some Python formatting issues were addressed in files that were not previously scanned.
Re-enable tests that disabled in PR 8530
Update import of test_optimizer.py so that the test could run in source directory.
Add a parameter to disable symbolic shape inference in fp16 conversion since it throws exception for some model.
* integrate eager mode source codde; build with cmake and integrate the python test
* Adding the python path for importing libraries in the Eager mode
* fix clang break;check if training and python enabled
* handling the linking of torch libraries across multiple platforms
* merge and fix the naming
* add build instruction
Co-authored-by: Abhishek Jindal <abjindal@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: ajindal1 <abjindal@microsoft.com>
* updates for picking pnnx commit
* add tests filter to c# tests
* plus test fixes
* fix versioning for contrib ops
* fix tests
* test filter for optional ops
* more versioning related updates
* fix test
* fix layernorm spec
* more updates
* update docs
* add more test filters
* more filters
* update binary size threshold
* update docs
* plus more fixes
* updates per review
* update to release commit
* add filters for optional type tests
* plus updates