The current image cache cleanup is not removing many images. Upon examining the cache container registry logs, it appears there are some infrequent pulls of old images which may be made by something other than CI builds (perhaps some automated scan of the registry).
This change adds a minimum access count for images in the cache so that infrequently but periodically accessed images can be removed. The idea is that images used by CI builds that are worth caching will have a higher volume of accesses.
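
A minimal sketch of the minimum-access-count rule described above, assuming the registry logs have already been reduced to a flat list of pulled image names. The function name `images_to_remove` and its parameters are hypothetical and are not taken from the actual cleanup script.

```python
from collections import Counter
from typing import Iterable, Set


def images_to_remove(
    cached_images: Iterable[str],
    pulled_images: Iterable[str],
    min_access_count: int,
) -> Set[str]:
    """Return cached images pulled fewer than min_access_count times.

    pulled_images holds one entry per pull seen in the logs (hypothetical
    input format), so an image scanned once in a while still falls below
    the threshold and becomes eligible for removal.
    """
    counts = Counter(pulled_images)
    # Counter returns 0 for images that never appear in the logs.
    return {image for image in cached_images if counts[image] < min_access_count}
```

With a threshold of 2, an image pulled once by an occasional scan is removed along with images that were never pulled, while an image pulled repeatedly by CI builds is kept.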
* support gpt2 and longformer in profiler tool
* rename bert_profiler to profiler
* Add --basic_optimization to let the user select the basic level of graph optimization
* Add --kernel_time_only to filter kernel time and exclude fence time
* Add --threshold to filter out nodes with a low run time percentage.
* Initial running changes
* Checkpointing aggregation changes
* compare with older version
* initial cleanup
* Add zero test, minor fix
* Fix zero test, transform, formatting
* Review comments
* add more unit tests
* review comments
* Try fix CI
* Add additional check on just aggregation code
* Try fix ckpt gen
* Add pregenerated ckpt for CI, enable zero test in e2e
* Moving test to nightly, removing ckpt files
* Add tests to dist GPU CI
* Fix dist test
* Review comments
* Fix test
Update training Python packaging build to use get_docker_image.py.
Remove BUILD_EXTR_PAR docker build argument.
Update get_docker_image.py to check again for the image in the cache after building and before pushing to reduce the chance of a redundant push.
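
The check, build, re-check, push flow described above can be sketched roughly as follows. This is an illustration only: `ensure_image` and the three callables passed to it are hypothetical stand-ins and do not reflect the real structure of get_docker_image.py.

```python
from typing import Callable


def ensure_image(
    tag: str,
    exists_in_cache: Callable[[str], bool],
    build: Callable[[str], None],
    push: Callable[[str], None],
) -> str:
    """Build and push an image only when the cache does not already have it."""
    if exists_in_cache(tag):
        return "cached"

    build(tag)

    # A concurrent job may have pushed the same image while this one was
    # building, so look again before pushing to avoid a redundant push.
    if exists_in_cache(tag):
        return "built"

    push(tag)
    return "pushed"
```

The second lookup narrows the race window but does not close it: two jobs can still pass the re-check at the same moment, so this only reduces the chance of a redundant push.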
* initial implementation of longformer tools for onnx conversion and benchmark
* Support ONNX conversion for transformers 4.0
Add an option to optimize onnx model, and export fp16 model
* improves processing time by 10
* extend unit test coverage
* better implementation for the multi regression case
* better comment, keep parallelization by trees when not enough trees
The MatMulIntegerToFloat fusion currently fuses per-row and per-column MatMulInteger, which the MatMulIntegerToFloat kernel does not support yet. Limit the fusion to per-matrix only until per-channel is fully supported.
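
One way to picture the per-matrix restriction, assuming it is decided from the shapes of the quantization parameters (scales and zero points): a parameter is per-matrix when it holds exactly one element. The helpers `is_per_matrix` and `can_fuse` below are hypothetical and are not the fusion's actual code.

```python
from math import prod
from typing import Iterable, Sequence


def is_per_matrix(shape: Sequence[int]) -> bool:
    """True for a scalar (rank 0) or any tensor with exactly one element."""
    # prod(()) is 1, so a rank-0 shape counts as a single element.
    return prod(shape) == 1


def can_fuse(quant_param_shapes: Iterable[Sequence[int]]) -> bool:
    """Allow the fusion only when every scale and zero point is per-matrix."""
    return all(is_per_matrix(shape) for shape in quant_param_shapes)
```

A per-row or per-column parameter has more than one element (for example shape `(768,)`), so a graph using it would be left unfused under this check.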
* Add support for non-1d tensor for C of Gemm
* check android api level before adding squeeze
* Minor update
* Fix to accept C only in the format {1,1,...,1,n}
* Add suspend handler with new telemetry event
* Fix build warning
* Use cppwinrt from nuget
* Restore nuget packages
* add dependencies
* Add nuget_helpers
* Cleaned up
* Clean up
* Comment
* Add dependencies for the rest
* Remove unused line
* Update activation string
* PR comment to remove ALL
* Expand the documentation on using compiling EPs with a minimal build to call out a 'simple' option that is easier to use. Provide more background on what happens to help users choose the best option for them.
Tweak conversion script to be noisier about attempted usage of 'all' optimization level.
Co-authored-by: manashgoswami <magoswam@microsoft.com>
* optimize a bert model converted using tf2onnx
* add test data
* update
* remove comments
* format
* Revert "format"
This reverts commit f8ae88cb564bce5caf4780e56561403f3ba3d524.
* Revert "remove comments"
This reverts commit 59d8a693581a731fd0291b70fe2c9cec6c4950fe.
* add a squeeze node to convert a 3-d mask to 2-d
* update
* update
* verify and add comments
1. Make sure to free the output_shape vector even if the "Output names mismatch between OpenVINO and ONNX" exception is thrown
2. Piggyback on this PR to remove an unneeded call to the fstream close method
Authored-by: modav <modav@microsoft.com>