Description: Reinstate #11127 with CUDA fix.
Motivation and Context
Fixes #11511 (inference on ONNX with external data not working since PR 11320, location planning logic).
* Remove hand written add_.Tensor as it can now be generated.
* Generate .out for tensor version of basic math ops. Also add tests for add.out.
* Remove sin tests as they are covered by parameterized tests. Also, moved all parameterized tests to the end in their own section.
* Add binary ops tests for tensors. Scalar tests are calling the aten .out which is for tensor.
* Add support for scalar input to add, div, mul, and sub.
* Apply project formatting rules to ort_aten.cpp
Formatting was applied by running the formatter on the file in VS Code.
This file is under active development and the inconsistent formatting
was causing friction because:
1. The cpplint job in the pipeline was flagging many style issues,
resulting in noisy annotations.
2. Local edits would produce formatting changes that are not part of the core change.
While there are other files in this part of the source tree with
inconsistent formatting, this file was causing the most friction. We can
come back and address the other files later, which would be a much
larger change.
* Apply consistent pattern for invoker.Invoke(...)
* CPU AdamW optimizer implementation
* unit tests for CPU kernel pass
* refine based on comments
* parallelize the weights loop in PrepareForCompute.
* fix wrong test data path
* fix kernel hash
* fix rocm ci pipeline
Initialize generated tensor data in onnxruntime_perf_test to zeroes instead of leaving it uninitialized. String tensors were already being initialized.
MatMul allows multiplying batches of matrices. This change enables limited support of batch inputs in the NNAPI EP.
Some limitations:
- Broadcasting is not supported. A and B must have the same leading dimensions.
- Only float inputs are supported. QDQ MatMul or QLinearMatMul with batch inputs is not supported yet.
Note that NNAPI's ANEURALNETWORKS_BATCH_MATMUL does essentially what we need, but it is only available from NNAPI feature level 6. This change instead composes several NNAPI operations to achieve a similar result, which is not ideal.
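To illustrate the idea of composing a batched MatMul from simpler operations, here is a minimal sketch in plain Python. It is not the NNAPI EP code: the source does not say which NNAPI operations are composed, so the split, multiply, and re-stack decomposition below is an assumption, and `matmul_2d` and `batch_matmul` are hypothetical names. It mirrors the stated limitation that A and B must have the same leading dimensions.

```python
def matmul_2d(a, b):
    """Multiply an (M x K) matrix by a (K x N) matrix given as nested lists."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(n)] for row in a]


def batch_matmul(a_batches, b_batches):
    """Batched MatMul built from 2D MatMuls (illustrative assumption only).

    Broadcasting is not supported: both inputs must have the same batch size.
    """
    if len(a_batches) != len(b_batches):
        raise ValueError("broadcasting is not supported: batch sizes differ")
    # Take each pair of per-batch matrices, multiply them, and stack the results.
    return [matmul_2d(a, b) for a, b in zip(a_batches, b_batches)]


# Two batches of (2 x 2) matrices times two batches of (2 x 1) matrices.
a = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]]
b = [[[5.0], [6.0]], [[7.0], [8.0]]]
result = batch_matmul(a, b)
```

A native batched operation would avoid the per-batch loop, which is presumably why the composed approach is described as not ideal.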
* release cached cuda memory after temp model_copy run
* op schema change only: remove PythonOp forward output from PythonOpGrad inputs.
* always export model using torch.no_grad
* 1. Update PythonOp's "input_requires_grads" attribute according to the ORT gradient graph.
2. Remove PythonOp's "output_tensor_requires_grads" attribute because in torch.no_grad mode, the exported value is not correct.
3. [related to 2] Remove PythonOpGrad's "input_tensor_requires_grads" because it comes from the corresponding PythonOp's "output_tensor_requires_grads".
* fix unit tests
* refine based on wschin's comments and fix pylint
* fix comments
* fix unused variable
* Sort supported types order so we get a consistently generated order of types.
* Fix promote type to include all the input types and not just the first one.
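The last two items can be sketched as follows. This is a hedged illustration, not the generator's actual code: the `_RANK` table, the type names, and the helper names are hypothetical. It shows why folding over every input type differs from looking only at the first one, and how sorting gives a deterministic order.

```python
from functools import reduce

# Hypothetical promotion ranks, for illustration only (not ORT's actual table).
_RANK = {"bool": 0, "int32": 1, "int64": 2, "float16": 3, "float32": 4, "float64": 5}


def promote_pair(a, b):
    """Return whichever of the two types has the higher rank."""
    return a if _RANK[a] >= _RANK[b] else b


def promote_types(input_types):
    """Promote across all input types, not just the first one."""
    if not input_types:
        raise ValueError("at least one input type is required")
    return reduce(promote_pair, input_types)


def sorted_supported_types(types):
    """Deduplicate and sort so generated output has a stable order."""
    return sorted(set(types))


promoted = promote_types(["int32", "float32", "int64"])
stable = sorted_supported_types(["int64", "float32", "int32", "int64"])
```

Using only the first input here would have yielded "int32", while folding over all inputs yields "float32".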
Dev containers[1] provide a self-contained development environment that
can be tailored for a project. GitHub Codespaces[2] provide a cloud
hosted environment to run these containers in. This makes it easy to
provision a consistent development environment with developer tooling
already installed and configured, which provides the following benefits:
1. Developer onboarding is simplified.
   1. Easy to get environment setup and running
   2. Reference environment is available, if developer is having issues
      with local environment
2. Developer tooling is provided and automatically configured.
   1. Python / C++ build tooling
   2. Python / C++ code formatters / linters
3. Easy to provision cloud hosted environment via GitHub Codespace.
4. Easy to create ephemeral development environments to test new changes
   1. Can be used to provision environments to test changes
      and Pull Requests
This can ease several pain points that developers on-boarding to the
project can encounter. One of the problems I have seen with developers
new to the project (I am one of these) is having the baseline
development environment (Python / C++) and recommended tools (e.g. VS
Code Python / C++ extensions, linters, and autoformatters) installed and
configured to efficiently get started in the repository. For all
developers, this makes it easy to leverage ephemeral cloud hosted
development environments via GitHub Codespaces.
**Notes:**
- Compiling the project can run into trouble if the codespace has less
than 32 GB of RAM
[1] https://docs.github.com/en/codespaces/setting-up-your-project-for-codespaces/introduction-to-dev-containers
[2] https://docs.github.com/en/codespaces/overview
* consume onnx test data from github
* ensure tests
* update script and allow opset specification
* fix python format
* fix python format
* consume new filter format
* fix linting error
* use 3D grid to avoid the upper limit of grid dimension
* enrich tests
* Revert "use 3D grid to avoid the upper limit of grid dimension"
This reverts commit 2d5badf2fe8cd985f3f29ee2cb18fff13d07c2ab.
* change to a fix: switch the 1st and 2nd dim
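As a rough illustration of why swapping dimensions can sidestep a grid limit, the sketch below checks a launch configuration against the CUDA grid limits for compute capability 3.0 and later (x up to 2^31 - 1, y and z up to 65535). It assumes "dim" in the commits above refers to grid dimensions and that the large extent originally sat on the second dimension; the source does not confirm either, and the example sizes are made up.

```python
# CUDA grid size limits for compute capability 3.0 and later.
MAX_GRID_X = 2**31 - 1
MAX_GRID_Y = 65535
MAX_GRID_Z = 65535


def fits_grid(grid):
    """Return True if an (x, y, z) grid is within the CUDA limits."""
    x, y, z = grid
    return 1 <= x <= MAX_GRID_X and 1 <= y <= MAX_GRID_Y and 1 <= z <= MAX_GRID_Z


def swap_first_two(grid):
    """Swap the 1st and 2nd grid dimensions."""
    x, y, z = grid
    return (y, x, z)


# Hypothetical launch shape: the large extent sits on y and exceeds 65535.
original = (32, 100_000, 1)
swapped = swap_first_two(original)

original_ok = fits_grid(original)
swapped_ok = fits_grid(swapped)
```

With a swap like this, the kernel would also have to exchange its use of `blockIdx.x` and `blockIdx.y` so that each block still maps to the same work item.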