* Update the operator documentation generation
- Make layout a little nicer
- Update to latest supported operators including training
- Fix some links that are broken when the docs content is copied to github-pages
- Fix incorrect usage of 'ai.onnx.ml' as the default domain
- ML ops are now separated from the real default domain of 'ai.onnx'
- Include CPU, CUDA and training kernels
- Exclude DNNL as it's not an EP we own
* There are separate paths for CUDA and CUDNN as they are not guaranteed to be in the same location on a Windows machine. Use the CUDNN path when looking for the CUDNN library.
* Enable validation of both contrib ops and operator kernels in build
Filter generation so it's deterministic
Add ability for CI to publish the md files as build artifacts if they differ so a developer can download and add to their PR to resolve any diffs.
Remove workarounds for github-pages as that will now link to the github docs which display correctly
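A minimal sketch of the deterministic-generation idea above, assuming the kernel list arrives in arbitrary order. The names `render_kernel_table` and `docs_differ` are hypothetical and are not the actual generator script in this repo; the point is only that de-duplicating and sorting before rendering makes the output stable, so a plain string comparison is enough for CI to decide whether to publish the md file as an artifact.

```python
def render_kernel_table(kernels):
    """Render (domain, op_name, provider) tuples as a markdown table.

    Hypothetical helper: entries are de-duplicated and sorted so that
    the same set of kernels always yields byte-identical output.
    """
    lines = ["| Domain | Operator | Provider |", "|---|---|---|"]
    for domain, op_name, provider in sorted(set(kernels)):
        lines.append(f"| {domain} | {op_name} | {provider} |")
    return "\n".join(lines) + "\n"


def docs_differ(generated, committed):
    """Return True when the freshly generated text does not match the checked-in text."""
    return generated != committed
```

With this shape, CI would call `docs_differ` on the generated text and the committed file contents and upload the generated file only when it returns True.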
* Test Pytorch DDP with ORTModule
* Remove unused MP model
* Update orttraining/orttraining/test/python/orttraining_test_ort_module_pytorch_ddp.py
* Update orttraining/orttraining/test/python/orttraining_test_ort_module_pytorch_ddp.py
* Change file name
* Fix import
* Skip a test
* Address a comment
* Add test back
Motivation:
As part of the OnnxConformance Backend tests, DynamicQuantizedLinear_max_adjusted_expanded is failing.
Root Cause:
- The test model has an `Identity` operator as one of its nodes. The input of this node has a non-float data type.
- In DML, the `Identity` operator is registered as an operator that requires floating-point input.
- As per `DirectMLSchema.h`, support for non-float input has been added for the `Identity` operator in DML, but this has not been reflected in `OperatorRegistration.cpp`.
Changes:
- Removed all traces of the requiresFloatFormatsForGraph flag, both its definition and its usage. This flag was only used for Identity and its related operators.
- Added a null check for the graphOutput nodeArg in GraphDescBuilder.cpp to stop the test from crashing.
Related work items: #33076298
* There are separate paths for CUDA and CUDNN as they are not guaranteed to be in the same location on a Windows machine. Use the CUDNN path when looking for the CUDNN library.
* Refine check
* Encapsulate children modules inside a ModuleAccessor object to prevent erroneous iteration over children while loading the state dictionary
* Add named_models, models, apply methods, change ModuleAccessor to ModuleMetadata and modify unit tests
* Change ModuleMetadata module getter logic, raise NotImplementedError for add_modules
* Add comment explaining why overriding _load_from_state_dict method is needed
Description:
Change the requantize interface so it can be processed block by block. This enables us to make requantize a post-processor of QGEMM.
Motivation and Context
Previous changes showed that we improve performance by parallelizing batch GEMM. Unfortunately, we could not parallelize the batch GEMM in quantize_linear_matmul because of the requantize operation at the end of each GEMM. By changing requantize to be a QGEMM post-processor, we can now parallelize the batch operation.
Co-authored-by: Chen Fu <fuchen@microsoft.com>
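A rough sketch of the post-processor idea described above, in plain Python rather than the actual MLAS interface. All names here (`requantize_block`, `qgemm`, `batch_qgemm`, `block_rows`) are hypothetical, and the rounding and clamping to uint8 are simplifying assumptions. It shows an integer GEMM handing each finished block of int32 accumulator rows to a requantize callback, so that each GEMM in a batch is self-contained and the batch can be dispatched in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def requantize_block(acc_rows, scale, zero_point):
    """Scale one block of int32 accumulator rows and clamp to the uint8 range."""
    return [
        [min(255, max(0, round(v * scale) + zero_point)) for v in row]
        for row in acc_rows
    ]


def qgemm(a, b, post_process, block_rows=2):
    """Integer matrix multiply that post-processes the output block by block."""
    cols = list(zip(*b))  # columns of b, for row-by-column dot products
    result = []
    for start in range(0, len(a), block_rows):
        block = [
            [sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a[start:start + block_rows]
        ]
        # Requantize runs here, inside the GEMM, not as a separate pass afterwards.
        result.extend(post_process(block))
    return result


def batch_qgemm(batch_a, batch_b, scale, zero_point):
    """Run every GEMM of the batch in a thread pool; no serial requantize step remains."""
    def run(pair):
        return qgemm(pair[0], pair[1],
                     lambda blk: requantize_block(blk, scale, zero_point))

    with ThreadPoolExecutor() as pool:
        return list(pool.map(run, zip(batch_a, batch_b)))
```

Because the callback only ever sees one block of rows, it needs no knowledge of the rest of the output, which is what lets the per-GEMM work be scheduled independently.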
* fixed bugs in packed mode and enabled packed mode tests in CI
* removed unnecessary space
* pr comments
* pr comments
* disable an average pool test
* try disabling another avg pool
* disable more avg pool tests
* disable maxpool tests
* Fix up constness in pybindings
Fix up return argument treatments.
Specifically, for all functions that return pointers or references
to the members of other pybind-registered classes, we do not want to copy
them. Instead, we internally bump up a reference to the hosting class so that it does not
disappear before the reference to the returned members is reclaimed.
This policy is applied by default to def_property and def_readwrite but not to def_readonly
and other def methods.
See https://pybind11-jagerman.readthedocs.io/en/stable/advanced.html#return-value-policies and https://pybind11.readthedocs.io/en/stable/advanced/functions.html#return-value-policies
Move OrtValue binding to a separate file
Move IOBinding into separate file.
* migrated changes to support running super resolution model using ortweb
* reverted benchmarking tool related changes which will be in a separate pr
* added kernel tests to op and node tests
* minor change to the order of variables
* added one more unit test for packed matmul