* investigate duplication of telemetry in winml and ort
* remove winml telemetry events
* telemetry executionProviderEvent
* remove unnecessary file and refactor code a little bit
* Revert TelemetryEvent, which sends up the ETW event.
* throw and handle onnxruntime exceptions
* handle exception thrown from ort in winmladapter
* undo changes in error.h
* add message to HRESULT
* add status error message
* comments for dml graph transformer
fixed ort value passing using the allocator info
* fixed and coded maps and sequences across the abi
* cleaned up w4's
cleaned up the model info ABI
delayload directml.dll from winml
* cleaned up namespace aliases.
renamed _winmla to winmla
this was good PR feedback from tiago a while back.
* moved files from inc to lib\api.core
cleaned up some of the cmake
* staged changes
* making windowsAI azure dev ops work.
* code review comments.
* revert changes
* Do not shut down protobuf after the ort environment gets destroyed. Lazy-load the lotus environment the first time it is needed
* comment typo
* pr comment about calling phoenix singleton
* Make lotus_environment static in winmladapter
* add centos tests to linux cpu ci pipeline
* Disable failing test
* use centos6 instead of centos7
* change back to centos7
* add dotnet runtime dependency
* fix dotnet runtime dependencies
* install dotnet sdk instead of runtimes
* add more dotnet dependencies
* temporarily skip failing test
* fix lib path
* reenable failing test
* turn devmode back on for winml builds
* fix some warnings. include protobuf in a way that disables some warnings
* undo protobufhelpers changes and just ignore 4100 errors in pb code
* attempt to isolate protobufhelpers errors
* add template specialization for getting tensor proto data
Add support of GPT2 model optimization:
* Match subgraph of Gelu Approximation (using Tanh).
* Fuse LayerNormalization if SkipLayerNormalization is not ready.
* Output model even if embedding layer is not fused.
* Improve Reshape Fusion to improve coverage.
* Refine constant input checking, and output fused op counter.
Update script according to latest op improvements:
* Fusion of Add Bias and Gelu.
* Fuse SkipLayerNormalization and Add Bias.
Other:
* Add ReduceSum for mask as intermediate step.
* Refactor verbose setting.
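For reference, the "Gelu Approximation (using Tanh)" matched above is the commonly used tanh form of Gelu. A minimal sketch of that formula next to the exact erf form follows; the function names are illustrative assumptions and are not taken from the optimization script.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact Gelu: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of Gelu, the pattern GPT2-style graphs usually contain."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Compare both forms on a few sample inputs.
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(value, gelu_exact(value), gelu_tanh(value))
```

The two forms agree closely on typical activations, which is why a fused Gelu op can stand in for the matched subgraph.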
* Constant folding bug fix/improvements
- Handle constant folding for node that is assigned to a non cpu EP
- Check for errors in optimizer execution frame setup
- Improve CUDA partitioning to look for initializers in parent graphs
- Add unit test
Fixes #2474
* [NupharEP] Add parallel schedule to JIT function name
Update Nuphar docker to use Python 3.6 and ubuntu 18.04
* Update notebook
* Avoid JIT cache file name conflict
* [NupharEP] Enable parallel schedule
* Update TVM with the fix to TVM threadpool to use OpenMP if possible
* Add parallel schedule when trying to vectorize
With this change, BERT squad perf on a 4-core (8 HT) CPU goes from 187ms to 150ms
* Address CR, docs and cmake update
* Doc fix
* Fix mkl
* Fix TVM windows build when using mklml