* Update Vitis-AI EP to support multiple DPU targets, specifically the arm64 dpuczdx8g target
* Fix Vitis AI docker and default PyXIR versions
Co-authored-by: Jorn Tuyls <jornt@xilinx.com>
Co-authored-by: Jorn Tuyls <jornt.tuyls@gmail.com>
* [js/web] support string tensor for wasm backend
* disable v9/test_cast_STRING_to_FLOAT: test data is wrong
* add non-string check
* Update session-handler.ts
1. Update the manylinux build scripts. This adds [PEP 600](https://www.python.org/dev/peps/pep-0600/) (`manylinux_x_y` tags) support. numpy has adopted this new feature, and we should do the same. The old build script files were copied from https://github.com/pypa/manylinux, but they have been deleted and replaced in the upstream repo, and the manylinux repo doesn't have a manylinux2014 branch anymore. So I'm removing the obsolete code and syncing the files with the latest master.
2. Update GPU CUDA version from 11.0 to 11.1 (after a discussion with PMs).
3. Delete tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda10_2. (Its content was merged into tools/ci_build/github/linux/docker/Dockerfile.manylinux2014_cuda11.)
4. Modernize the cmake code that locates the Python development files, as suggested in https://github.com/onnx/onnx/pull/1631 .
5. Remove `onnxruntime_MSVC_STATIC_RUNTIME` and `onnxruntime_GCC_STATIC_CPP_RUNTIME` build options. Now cmake has builtin support for it. Starting from cmake 3.15, we can use `CMAKE_MSVC_RUNTIME_LIBRARY` cmake variable to choose which MSVC runtime library we want to use.
6. Update the Ubuntu docker images used in our CI builds from Ubuntu 18.04 to Ubuntu 20.04.
7. Update GCC version in CUDA 11.1 pipelines from 8.x to 9.3.1
8. Split the Linux GPU CI pipeline into two jobs: build the code on a CPU machine, then run the tests on a separate GPU machine. In the past we didn't test our python packages; we only tested the pre-packed files, so we didn't catch the rpath issue in the CI build.
9. Add a CentOS machine pool and test our Linux GPU build on real CentOS machines.
10. Rework the ARM64 Linux GPU python packaging pipeline. Previously it used cross-compiling, so we had to link statically to the C runtime. But now we have the pluggable EP API, which doesn't support static linking, so I changed the pipeline to use qemu emulation instead. The build is now 10x slower than before, but it is more extensible.
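For reference, a minimal sketch of how the PEP 600 tags mentioned in item 1 relate to the older manylinux names. The helper names `parse_manylinux_tag` and `is_compatible` are hypothetical and not part of the build scripts; the legacy aliases are the ones listed in PEP 600. The compatibility check is simplified (same glibc major, system minor at least the tag minor) and ignores the `_manylinux` override module that PEP 600 also describes.

```python
import re

# Legacy names and the glibc versions PEP 600 maps them to.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}

_TAG_RE = re.compile(r"^manylinux_(\d+)_(\d+)_(\w+)$")
_LEGACY_RE = re.compile(r"^(manylinux(?:1|2010|2014))_(\w+)$")


def parse_manylinux_tag(tag):
    """Return (glibc_major, glibc_minor, arch) for a manylinux platform tag."""
    m = _TAG_RE.match(tag)
    if m:
        return int(m.group(1)), int(m.group(2)), m.group(3)
    m = _LEGACY_RE.match(tag)
    if m:
        major, minor = LEGACY_ALIASES[m.group(1)]
        return major, minor, m.group(2)
    raise ValueError(f"not a manylinux tag: {tag!r}")


def is_compatible(tag, system_glibc, system_arch):
    """Simplified check: same arch, same glibc major, system minor >= tag minor."""
    major, minor, arch = parse_manylinux_tag(tag)
    return (
        arch == system_arch
        and major == system_glibc[0]
        and minor <= system_glibc[1]
    )
```

With this sketch, `manylinux2014_x86_64` and `manylinux_2_17_x86_64` parse to the same triple, which is why the old per-year build scripts can be replaced by a single versioned scheme.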
* Add podspec template for ios package
* minor formatting update
* Add spec.source_files for header files
* Update spec.public_header_files to spec.source_files
* minor update
* Update the operator documentation generation
- Make layout a little nicer
- Update to latest supported operators including training
- Fix some links that are broken when the docs content is copied to github-pages
- Fix incorrect usage of 'onnx.ai.ml' as the default domain
- ML ops are now separated from the real default domain of 'onnx.ai'
- Include CPU, CUDA and training kernels
- exclude DNNL as it's not an EP we own
* There are separate paths for CUDA and CUDNN as they are not guaranteed to be in the same location on a Windows machine. Use the CUDNN path when looking for the CUDNN library.
* Enable validation of both contrib ops and operator kernels in build
Filter generation so it's deterministic
Add ability for CI to publish the md files as build artifacts if they differ so a developer can download and add to their PR to resolve any diffs.
Remove workarounds for github-pages as that will now link to the github docs which display correctly
* Test Pytorch DDP with ORTModule
* Remove unused MP model
* Update orttraining/orttraining/test/python/orttraining_test_ort_module_pytorch_ddp.py
* Change file name
* Fix import
* Skip a test
* Address a comment
* Add test back
* Refine check
* Encapsulate children modules inside a ModuleAccessor object to prevent erroneous iteration over children while loading the state dictionary
* Add named_models, models, apply methods, change ModuleAccessor to ModuleMetadata and modify unit tests
* Change ModuleMetadata module getter logic, raise NotImplementedError for add_modules
* Add comment explaining why overriding _load_from_state_dict method is needed
Description:
Change the requantize interface so it can process its input block by block. This enables us to make requantize a post-processor of QGEMM.
Motivation and Context
Previous changes showed that we can improve performance by parallelizing batch gemm. Unfortunately, we could not parallelize the batch gemm in quantize_linear_matmul because of the requantize operation at the end of each gemm. By changing requantize to be a qgemm post-processor, we can now parallelize the batch operation.
Co-authored-by: Chen Fu <fuchen@microsoft.com>
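The idea above can be illustrated with a small pure-Python sketch. It is not the MLAS implementation: `requantize_block`, `qgemm_blocked`, the tile size, and the uint8 output range are all assumptions chosen for illustration. The point it shows is that once requantize accepts a tile (row/column offset plus extent), it can run as a callback right after each tile of the integer GEMM is finished, instead of as a separate pass over the whole result.

```python
def requantize_block(acc, out, row0, col0, rows, cols, scale, zero_point):
    """Requantize one int32 tile of `acc` into saturated uint8 values in `out`."""
    for i in range(row0, row0 + rows):
        for j in range(col0, col0 + cols):
            q = round(acc[i][j] * scale) + zero_point
            out[i][j] = max(0, min(255, q))


def qgemm_blocked(a, b, block, post_processor):
    """Integer GEMM computed tile by tile; `post_processor` runs on each finished tile."""
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    for row0 in range(0, m, block):
        rows = min(block, m - row0)
        for col0 in range(0, n, block):
            cols = min(block, n - col0)
            for i in range(row0, row0 + rows):
                for j in range(col0, col0 + cols):
                    acc[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
            # The tile is complete, so it can be requantized immediately.
            post_processor(acc, row0, col0, rows, cols)
    return acc


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 0], [0, 1], [2, 2]]
scale, zero_point = 0.5, 3

# Requantize as a post-processor, one 2x2 tile at a time.
out = [[0] * 2 for _ in range(3)]
acc = qgemm_blocked(
    a, b, 2,
    lambda t, r, c, nr, nc: requantize_block(t, out, r, c, nr, nc, scale, zero_point),
)

# Reference: requantize the whole accumulator in a single pass afterwards.
reference = [[0] * 2 for _ in range(3)]
requantize_block(acc, reference, 0, 0, 3, 2, scale, zero_point)
```

Both paths produce the same output. Because each GEMM in a batch then finishes its own requantization internally, the batch entries no longer share a trailing step and can be dispatched independently.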
* fixed bugs in packed mode and enable pack mode tests in ci
* removed unnecessary space
* pr comments
* disable an average pool test
* try disabling another avg pool
* disable more avg pool tests
* disable maxpool tests
* Fix up constness in pybindings
Fix up return argument treatments.
Specifically, for all functions that return pointers or references
to the members of other pybind registered classes, we do not want to copy
them. Instead, we internally bump up a reference to the hosting class so it does not
disappear before the reference to the returned members is re-claimed.
This policy is applied by default to def_property and def_readwrite but not to def_readonly
and other def methods.
See https://pybind11-jagerman.readthedocs.io/en/stable/advanced.html#return-value-policies
and https://pybind11.readthedocs.io/en/stable/advanced/functions.html#return-value-policies
Move OrtValue binding to a separate file
Move IOBinding into separate file.