After applying all the graph transformations the metadata and signature could have changes
(e.g.: new outputs got added, or the outputs/inputs got renamed). Therefore the local
copies of metadata and signature, that InferenceSession administrated for faster lookup, has to be updated.
For this the `SaveModelMetadata`, that now has to be idempotent, should be called after resolving the transformed graph
Make GatherElements kernel process 16 items each.
unroll the constant loop. Quit loops early for zero dividend.
Optimize Binary CompareFunction and remove Impl_Cast invocation.
This PR also includes:
* More LossScaler tests
* Minor LossScaler improvement
* Check model after extra post processing
* Improve basic training tests to include all optimizers
* Set rtol=1e-7 tolerance for Legacy vs Experimental frontend API tests
* Increase number of training tests for Legacy vs Experimental tests
* Minor refactoring on existing tests
* Fix Checkpoint API for Gradient Accumulation / fp16 scenarios
* Rename DeviceAllocatorRegistrationInfo to a more generic name; Remove OrtMemType; Simplify CreateAllocator interface.
* - fix builds
- fixed mixed aggregation + constructor calls (which were coded before this PR)
- changed default value of max_mem in API header
- added some validation of values for for arena_extend_strategy
* fix tensorrt and cuda tests
Improve quantization tools:
1. Support QAT
2. Make quantization tool to register Operators.
3. Make the API clear to use
Co-authored-by: t-yguo <t-yguo@microsoft.com>
* cancel night build on pyop
* setup ci pipeline for build of reduced ops
* add back c# test
* remove debugging print
* add testing model
* add more arg in pipeline script
* disable pipeline trigger temporarily
* fix yaml format
* fix yaml format
* fix pipeline error
* rid c# test
* add ops for test cases
* add Conv from domain com.microsoft.nchwc
* remove --reduce_ops
* fix typo
* remove --build_java
* add test case for excluded op
* update doc with --skip_test
* formatting code, renaming files and simplify yaml
* remove debug build from yaml
* remove surplus ops from included_ops.txt
* add MinSizeRel build to yaml
* rename test cases and models
* exclude ir test from minimum build
* restrict ir test to be only applied to reduced ops build
https://github.com/microsoft/onnxruntime/pull/4639 changed the default
behavior by removing optimizer state from state_dict/checkpoint APIs.
The reason for the previous change was to allow models trained on ORT to
be used for inference on PyTorch, which is an important feature.
Due to the change aforementioned, when resuming training from a checkpoint,
the optimizer would start with random weights, leading to a bad performance.
This behavior would also cause reproducibility issues, as the optimizer
wouldnt be able to resume from its previous state.
This PR adds a boolean flag to state_dict/save_xheckpoint API that
when True (default) it saves both model and optimizer state.
When False, only the model state is kept.
* enable rejecting models based on onnx opset
* enable unreleased opsets in linux and mac CI
* test fixes and more updates
* enable unreleased opsets in CI builds
* enable released opsets in linux cis
* try fix windows ci yml
* yml fixes
* update yml
* yml updates post master merge
* review comments
* bug fix
Disabling this test until it's intermittent failure is root caused, this is a function and does not have a dedicated op by itself. However, this op is not used in known model to the best of my knowledge to disabling this test for the sanity of CI until the investigation is over is probably reasonable.
* make tensorizer events measures
* throttle the events and add a new one SoftwareBitmapToGPUTensorTelemetryEvent
* factor out timing code into a class
* typo
* typo
* move eventimer class into its own header file
* add throttling to detensorization and remove variable timing
* make detensorization events measures as well
* add ConvertGPUTensorToSoftwareBitmapTelemetryEvent event
* de-duplicate event names
* fix comment
* PR feedback