* Mention OrtCreateSessionFromArray in C API doc
* Fix perf test executable due to removal of certain C APIs
* fix linux build
* Avoid duplication
* Update coding guidelines to prefer using make_unique for heap allocations (unless where not possible).
* Implement Nuphar execution provider
Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.
* Fix submodules
* Fix TVM submodule
* Update Nuphar to latest and resolve conflicts
* Remove stale files caused by merge -X theirs
* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test
* Fix bad merge
* Merge from Nuphar
* Fix warning treated as error, revert some unnecessary changes
* Revert some more test changes
* More test reverts and comments to make review easier
New tests could be added later
* One more revert of unnecessary changes
* Revert more changes. Tests could be added back later.
Fix an issue where the CUDA EP falls back too many nodes to the CPU in some cases, which causes huge data copies.
https://github.com/microsoft/onnxruntime/issues/1675
Currently, if all of a node's inputs are initializers, the CUDA EP falls the node back to CPU, along with some of the nodes downstream of it. This can cause huge data copies. In the case reported by a user, the model has several Slice nodes whose inputs come from initializers, followed by a Concat op that concatenates the Slice outputs. The concatenated data is large (16MB), which makes the copy from CPU to GPU quite costly because it is a synchronous copy.
Fix
If all of a node's inputs are initializers, we shouldn't fall the node back to CPU.
* Support bilinear mode with actual 2D inputs in Resize and upsample
* Fix build break
* Fix build break
* Add test
* CUDA changes
* Resolve PR comments
* Resolve comments
* Use exec form of ENTRYPOINT for docker server
# Issue
The entrypoint currently uses the shell form, which prevents users from passing in any command-line arguments. Also, passing a model_path in means the server only works if the env var is set; however, this is not what the error message says!
```
$ docker run -v /home/rakelkar/try/onnxzoo/style:/mnt/models -it mcr.microsoft.com/onnxruntime/server --model_path /mnt/models/model.onnx
Version: local_build
Commit ID: default
model_path must be the location of a valid file
Allowed options:
-h [ --help ] Shows a help message and exits
--log_level arg (=info) Logging level. Allowed options (case sensitive):
verbose, info, warning, error, fatal
--model_path arg Path to ONNX model
--address arg (=0.0.0.0) The base HTTP address
--http_port arg (=8001) HTTP port to listen to requests
--num_http_threads arg (=4) Number of http threads
--grpc_port arg (=50051) GRPC port to listen to requests
```
# Fix
1. remove the env var
2. use the exec form
* Update readme to use model_path arg
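The difference between the two ENTRYPOINT forms can be illustrated outside Docker. The sketch below assumes a POSIX system with `/bin/sh` and `echo`, and uses `echo server` as a hypothetical stand-in for the server binary. The shell form is equivalent to running `/bin/sh -c "<command>"`, so extra arguments become positional parameters of the shell and never reach the command; the exec form appends them to the command's argv.

```python
import subprocess
from typing import List


def run_shell_form(command: str, extra_args: List[str]) -> str:
    # Shell form: the command is wrapped as /bin/sh -c "<command>".
    # Extra arguments become $0, $1, ... of the shell, not of the command.
    result = subprocess.run(
        ["/bin/sh", "-c", command, *extra_args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_exec_form(argv: List[str], extra_args: List[str]) -> str:
    # Exec form: extra arguments are appended directly to the command's argv.
    result = subprocess.run(
        [*argv, *extra_args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# "echo server" is a hypothetical stand-in for the real server executable.
args = ["--model_path", "/mnt/models/model.onnx"]
print(run_shell_form("echo server", args))       # arguments are dropped
print(run_exec_form(["echo", "server"], args))   # arguments are passed through
```

With the shell form the `--model_path` argument silently disappears, which matches the "model_path must be the location of a valid file" error shown above.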
Description: The change adds necessary quantization support on CPU with mixed int8/uint8, as well as int16, for matrix multiply operations that output int32
Motivation and Context
Integer operations are critical for the performance of quantized models.
The current MatMulInteger implementation on CPU only supports uint8 x uint8, while the spec supports int8 x uint8. Having a default CPU implementation that fully supports the spec would help accuracy verification.
Besides, some models may need to be quantized to int16, but the MatMulInteger op does not support that yet. A custom op, MatMulInteger16, is added to serve such models.
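For accuracy verification, the expected result of an integer matrix multiply can be computed with a small reference routine. The following is a minimal pure-Python sketch, not the ORT kernel: the function names and the `dtype` strings are hypothetical, zero points default to 0, and Python integers do not wrap, so it only models the int32 output when the accumulated sums fit in 32 bits.

```python
from typing import List

# Value ranges for the element types discussed above.
INT_RANGES = {"int8": (-128, 127), "uint8": (0, 255), "int16": (-32768, 32767)}

Matrix = List[List[int]]


def check_range(matrix: Matrix, dtype: str) -> None:
    # Reject values that cannot be represented in the declared element type.
    lo, hi = INT_RANGES[dtype]
    for row in matrix:
        for value in row:
            if not lo <= value <= hi:
                raise ValueError(f"{value} is out of range for {dtype}")


def matmul_integer(a: Matrix, a_dtype: str, b: Matrix, b_dtype: str,
                   a_zero_point: int = 0, b_zero_point: int = 0) -> Matrix:
    # Reference semantics: subtract zero points, then accumulate in wide integers.
    check_range(a, a_dtype)
    check_range(b, b_dtype)
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum((a[i][k] - a_zero_point) * (b[k][j] - b_zero_point)
             for k in range(inner))
         for j in range(cols)]
        for i in range(len(a))
    ]


# Mixed int8 x uint8 inputs.
print(matmul_integer([[-128, 127], [1, -1]], "int8",
                     [[255, 0], [1, 2]], "uint8"))
# int16 x int16 inputs, as a MatMulInteger16-style op would take.
print(matmul_integer([[32767]], "int16", [[32767]], "int16"))
```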
as long as these providers use the same allocator device
Description: Currently ORT throws an error when one input is used by different EPs. This change removes that restriction
Motivation and Context
It is now possible to share inputs across EPs because allocations are device-based instead of EP-based.
* Updates
* Remove preview texts
* Update README.md
* Updates
* Update README.md
* Update README.md
* Minor wording update
* Update README.md
* Update doc on CUDA version
* revert update
* Update readme for issue #1558
* Clean up example section
* Cosmetic updates
- Add an index of build instructions for browsability
- Update build CUDA version from 9.1 to 10
* Fix broken link
* Update README to reflect upgrade to pip requirement
* Update CuDNN version for Linux Python packages
* Clean up content
Update ordering and add table of contents
* Minor format fixes
* Move Android NNAPI under EP section
* Add link to operator support documentation
* Fix typo
* typo fix
* remove todo section
- Fix the Windows end-to-end test in NuGet CI
- Skip the TestModelSerialization, because it is failing on Linux. Must be fixed before API is released for use. Owner is notified.
Description: Make the default CPU allocator use the MLAS preferred alignment
Motivation and Context
This is needed for the C API to have an aligned default CPU allocator, the same as the one in the CPU provider.
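As a rough illustration of what an aligned allocation involves, the sketch below rounds an address up to the next multiple of a power-of-two alignment. It is only arithmetic, not the ORT allocator; the 64-byte value is an assumption chosen for the example, and the actual preferred alignment is whatever MLAS reports.

```python
# Assumed alignment for illustration only; the real value comes from MLAS.
ASSUMED_ALIGNMENT = 64


def align_up(address: int, alignment: int = ASSUMED_ALIGNMENT) -> int:
    # Alignment must be a positive power of two for the bit mask to be valid.
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Round the address up to the next multiple of the alignment.
    return (address + alignment - 1) & ~(alignment - 1)


print(align_up(1000))  # first aligned address at or after 1000
print(align_up(1024))  # already aligned, so unchanged
```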
* Mention OrtCreateSessionFromArray in C API doc
* Don't create the default allocator every single time. Rename API accordingly.
* Don't create the default allocator every single time. Rename API accordingly.
* updates...
* updates...
* PR comments
* fix typo in license header
* fix build
- Support bool-Tensor and int8-Tensor in input-output of C# api
- Support string-tensor as input in C# api
- Fix a bug in InferenceSession.Run() -- RunOptions was not passed into the native call
* Added check for unnecessary function initializations, and removed lock from unneeded areas of code.
* Added LRU cache to EP.
* Bugfixes for nGraph EP Optimization PR
* Changed default cache size to 500 and refactored mutexes for readability.
* Fixed unsafe environmental variable fetch for Windows.
* Cleaned up Windows environment functions and cleaned up mutexes.
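The LRU cache mentioned in this change can be sketched as follows. This is a minimal illustration under stated assumptions, not the nGraph EP code: the class name and methods are hypothetical, the default capacity of 500 is taken from the commit message above, and a single lock guards every access.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """Hypothetical thread-safe LRU cache; evicts the least recently used entry."""

    def __init__(self, capacity: int = 500):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._entries = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key not in self._entries:
                return None
            # Mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]

    def put(self, key, value) -> None:
        with self._lock:
            self._entries[key] = value
            self._entries.move_to_end(key)
            # Drop the oldest entry once the capacity is exceeded.
            if len(self._entries) > self._capacity:
                self._entries.popitem(last=False)

    def __len__(self) -> int:
        with self._lock:
            return len(self._entries)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now the most recently used entry
cache.put("c", 3)    # evicts "b", the least recently used entry
print(cache.get("b"), cache.get("a"), cache.get("c"))
```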
Fix the issue that cudnnRNNForwardInferenceEx doesn't support zero-length sequences in the batches
Solution:
Before calling cudnnRNNForwardInferenceEx, reset each zero sequence length in the batches to 1, and keep an array that tracks the ids of the batches that had a zero-length sequence. Once the result is available, call a CUDA kernel to mask the output using the batch ids tracked in the array.
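The two steps of this workaround can be sketched on the host side in plain Python. This is an illustration only: the function names are hypothetical, the output is modeled as nested lists indexed `[time][batch][hidden]`, and masking with zeros is an assumption about what the CUDA kernel writes.

```python
from typing import List, Tuple


def patch_zero_lengths(seq_lengths: List[int]) -> Tuple[List[int], List[int]]:
    # Record which batch entries have a zero-length sequence.
    zero_batch_ids = [b for b, n in enumerate(seq_lengths) if n == 0]
    # Replace each zero length with 1 so the RNN call accepts the batch.
    patched = [1 if n == 0 else n for n in seq_lengths]
    return patched, zero_batch_ids


def mask_zero_batches(output: List[List[List[float]]],
                      zero_batch_ids: List[int]) -> List[List[List[float]]]:
    # Overwrite the output of every tracked batch entry (assumed mask value: 0.0).
    zero_set = set(zero_batch_ids)
    return [
        [[0.0] * len(hidden) if b in zero_set else list(hidden)
         for b, hidden in enumerate(step)]
        for step in output
    ]


patched, zero_ids = patch_zero_lengths([3, 0, 2])
print(patched, zero_ids)

# Stand-in for the RNN output of one time step with batch size 3, hidden size 2.
fake_output = [[[0.5, 0.1], [0.9, 0.9], [0.2, 0.3]]]
print(mask_zero_batches(fake_output, zero_ids))
```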
* Mention OrtCreateSessionFromArray in C API doc
* Update perf tool documentation to reflect the new graph optimization enums. Relax constraint for enable_all.
* Update one more doc
* Update onnx test runner documentation
* Add default in the docs