* [NupharEP] Enable parallel schedule
* Update TVM with the fix to TVM threadpool to use OpenMP if possible
* Add parallel schedule when trying to vectorize
With this change, BERT squad perf on a 4-core (8 HT) CPU goes from 187ms to 150ms
* Address CR, docs and cmake update
* Doc fix
* Fix mkl
* Fix TVM windows build when using mklml
* Implement Nuphar execution provider
Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.
* Fix submodules
* Fix TVM submodule
* Update Nuphar to latest and resolve confliction
* Remove stale files caused by merge -X theirs
* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test
* Fix bad merge
* Merge from Nuphar
* Fix warning treated as error, revert some unnecessary changes
* Revert some more test changes
* Some more test revert or comments to make review easier
New tests could be added later
* One more revert of unnecessary changes
* More change revert. Test could be added back later.