* Fix places where MinSizeRel wasn't getting the relevant flags added in the same way as Release and RelWithDebInfo
Enable LTO for minimal build. Clean up onnx_minimal.cmake to remove some things that are handled in CMakeLists.txt when LTO is enabled
* Only enable LTO for MSVC in a minimal build
* Nuget store packaging
* Move DNNL workaround to EP
* Fix warning as error
* Disable store tests
* Skip store tests
* msbuild target
* Cross compile protoc in Store
* Disable DML in store
* Move store builds to CPU queue
* Copy uap10 to final nuget
* Fix pep8 error
* Remove extra dml copies
* Fix argparse
* pep8
* Forward IsStoreBuild
* Apply is_store_build to duplicate generate_nuspec
* runtimes
* Refactor uap10
* Store .NET
* uap
* PR feedback
* cancel nightly build on pyop
* setup win cuda11 pipeline
* add debug build
* test base gpu settings
* setup pipelines to test cuda 10.2 and 11
* rename linux docker images
* rename docker image tag and add clean up job
* fix typo in cuda 11 config
* set cuda11 env
* update linux cuda 11 pipeline
* reset docker image name
* disable uninitialized warning from linux build
* change the way to silence uninitialized warning
* add flags to linux gpu pipeline
* switch docker image for linux cuda 10.2
* switch linux cuda 10.2 image
* test cuda11 with devtool8
* try latest built images
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
* Prototype NCCL P2P
* Clean code
* Fix NCCL path and some minor bugs
* Add path
* Fix path
* Try fix path
* Add missed files
* Address some comments
* Clean code
* Rename files
* Add MPI path back and fix a path
* Put MPI path under USE_NCCL flag
* Do not build Send and Recv when MPI is not installed
* add runtime session id to (de)tensorization events
* append start or stop to the event names and remove opcodes
* add appsessionguid to telemetry events
* initial test version
* update yml
* minor updates
* minor updates
* Test minimal build
* update with include ops for minimal build ut only
* error case to see build failure
* test no_exceptio
* Remove error cases
* address pr comments
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
* Remove serialization of outer scope node arg info in ORT format model. We don't currently need it in a minimal build as only SessionState calls Graph::IsConstantInitializer and it doesn't search outer scope. If we do need it in the future the information can be calculated at runtime (small binary size cost to do so).
Motivation: because of this, the ORT format model was 32% bigger for a BERT model with multiple levels of subgraph and a lot of nodes. With the change, the size is about 5% larger than the original ONNX model. ORT format has type/shape info for all nodes, and this model has 2000 nodes, so this seems reasonable.
Added example code to dump ORT format model to json.
Fixed misc bug in python test script around handling float and non-float expected output.
* match new/old api numbers
* new golden numbers for Roberta and MC
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* opset13 cuda kernels for BERT.
* add opset13 SoftmaxCrossEntropyLoss.
* opset13 size.
* fix argmax/min for ut.
* fix ut failure for argmax/min.
* OrtMemTypeCPUInput
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
* Add minimal build option to build.py
Group some of the build settings so binary size reduction options are all together
Make some cmake variable naming more consistent
Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used.
Add initial documentation and an ONNX to ORT model conversion script
Misc cleanups of minimal build breaks.
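Note on the std::hash change above: a hash that is persisted or compared across builds has to produce the same value on every platform, which an implementation-dependent hash does not guarantee. The following is a minimal sketch of the 32-bit MurmurHash3 variant in Python, for illustration only. It is not the code from this change, and the helper name `murmur3_32` and the sample input string are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_block(k: int) -> int:
    """Scramble one 32-bit block before it is folded into the state."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Hypothetical helper: MurmurHash3 (x86, 32-bit) of a byte string."""
    h = seed & MASK32
    n = len(data)
    full = n - (n % 4)

    # Body: consume the input four bytes at a time, little-endian.
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        h ^= _mix_block(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[full:]
    if tail:
        h ^= _mix_block(int.from_bytes(tail, "little"))

    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


if __name__ == "__main__":
    # The input string is an illustrative placeholder, not a real kernel key.
    print(hex(murmur3_32(b"example-kernel-signature")))
```

Because the algorithm is fully specified, the same bytes and seed give the same value on any compiler or platform, which is the property the change relies on.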
* Add SetLanguageProjection C Api and use it in four projections
* static cast enum languageprojection to uint32_t
* resolve comments
* fix typo and line added unintentionally
* revert unnecessary change
* reorder c# api
* add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis