* Improve CUDA kernel performance for Concat. Implement a dedicated kernel instead of calling cudaMemcpy in a loop.
* Update the index lookup part for Concat & Split
* init
* Update DNNLibrary
* Update DNNLibrary, set compiler flags, it compiles now
* Add more missing flags, add test
* Update DNNLibrary
* Update Compile method, fix allocator and some other bugs
* Update DNNLibrary
* Implement CopyTensor
* Do not delete state explicitly, since it is managed by unique_ptr
* Add the missing files when SingleUnitTestProject is ON
* misc changes
* Fix wrong name in provider factory
* Add my own test
* Update the code that adds nodes to the graph, and add the missing initializer to the graph
* Fix the bug where rebuilding the graph produces an extra output
* Update DNNLibrary
* Transpose nchw (ONNX) -> nhwc (NNAPI)
* Add license
* Add GetSupportedNodes method (implement it later)
* Rename onnxruntime_nnapi_test->onnxruntime_nnapi_squeezenet_test
* Update squeezenet_test.cpp after rebase master
* Remove squeezenet_test.cpp since it is almost same with the c++ sample
* Update DNNLibrary for GetSupportedNodes
* Update GetSupportedNodes
* Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample"
This reverts commit a97575fd9ff49e50ba1dc8d8154790d8cd86c48d.
* Update DNNLibrary
* Fix multiple outputs bug
* Remove GetKernelRegistry
* Revert "Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample""
This reverts commit 2a0670e9cbf10ea654111ce39e198a4be0ddd838.
* Set default memory type of NNAPI EP
* Add CPUOutput allocator
* Update DNNLibrary for multiple outputs
* Fix bug of nhwc->nchw
* Remove GetExecutionHandle()
* Update cuda for python wheels
* Update azure-pipelines-py-packaging.yml
* Update to cuda 10
* Only test win gpu
* Update cuda for python wheels
* Use manylinux2010 image to build linux python wheels
Allow wheels built to truly be compliant with a manylinux policy
* Add CUDA expand operator
* Reset counter variables when striding
* use fast_divmod and other PR comments
* Fix merge variable rename
* Fix indentation per PR comment
* Remove maxpool_argmax
* Reduce number of type templates for Expand operator
* removed all types
* Commit updated cuda_execution_provider.cc
* Check for non-existent initializers while fusing conv and add.
* Fix other places where initializer can be null
* Add check if initializer is an input
* Update the models to comply with the new ONNX spec.
In the new ONNX spec, initializers should not be listed in inputs.
* Fix previous temporary code
* Add negative test
* Revert changes to conv_bn_fusion and conv_mul_fusion
* making helper IsNodeArgConstant a little more general; updating remaining Conv*Fusion rules
* minor comment
* AllNodeInputsAreConstant to use new function
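The initializer checks in the fusion commits above can be illustrated with a minimal sketch. It assumes a graph is described by a dict of initializers and a set of graph input names. `is_node_arg_constant` and `can_fuse_conv_add` are hypothetical Python stand-ins, not the signatures of the actual C++ helpers.

```python
def is_node_arg_constant(name, initializers, graph_inputs):
    """Return True if `name` is an initializer that cannot be overridden.

    Hypothetical stand-in: an initializer that is also listed as a graph
    input may be replaced at run time, so it is not treated as constant.
    """
    return name in initializers and name not in graph_inputs


def can_fuse_conv_add(conv_weight, add_operand, initializers, graph_inputs):
    # Both operands must exist as constant initializers before fusing;
    # a missing initializer simply fails the membership test.
    return all(
        is_node_arg_constant(n, initializers, graph_inputs)
        for n in (conv_weight, add_operand)
    )
```

A name that is missing from `initializers`, or that also appears in `graph_inputs`, makes the fusion check return False instead of dereferencing a null initializer.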
Implementation of the MLAS changes for NCHWc convolution/pooling support. These changes adopt the blocking format used by MKL-DNN and other convolution libraries for better performance.
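As a rough illustration of the blocked layout, the sketch below reorders a flat NCHW buffer into NCHWc, where channels are grouped into blocks of `block` and the channel within a block becomes the innermost dimension. The function name and the pure-Python loops are illustrative assumptions, not MLAS code, and the sketch does not handle channel counts that need padding.

```python
def nchw_to_nchwc(data, n, c, h, w, block):
    """Reorder a flat NCHW buffer into a flat NCHWc buffer.

    Assumes `c` is a multiple of `block`; padding is not handled here.
    """
    if c % block != 0:
        raise ValueError("channel count must be a multiple of the block size")
    out = [0] * len(data)
    blocks = c // block
    for ni in range(n):
        for ci in range(c):
            # Split the channel index into (block index, lane within block).
            cb, lane = divmod(ci, block)
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi
                    dst = (((ni * blocks + cb) * h + hi) * w + wi) * block + lane
                    out[dst] = data[src]
    return out
```

With this layout, the `block` channels of one pixel are contiguous in memory, which is what lets a vectorized kernel load them in a single operation.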
Description:
The logic that removes duplicate Cast nodes was processing a node that had already been removed, so the same node was removed more than once, which caused an error. Add a check so that nodes marked for removal are skipped.
Motivation and Context
If a model has 3 Cast nodes in a row, the bug causes an exception to be thrown due to multiple calls to remove the same node. This breaks the latest optimized tf2onnx conversion of ssd_mobilenet.
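The failure mode and the fix can be sketched on a simplified linear chain. This is a hypothetical model, not the transformer's real data structures: nodes are `(node_id, op_type, to_type)` tuples, and a Cast counts as a duplicate when it directly follows a Cast to the same type.

```python
def remove_duplicate_casts(nodes):
    """Return the ids of redundant Cast nodes in a linear chain.

    `nodes` is a list of (node_id, op_type, to_type) tuples in topological
    order. This is a hypothetical simplification of the real graph.
    """
    removed = []    # removal order, to show each node is removed once
    marked = set()  # nodes already marked for removal
    for i, (node_id, op, to) in enumerate(nodes):
        if op != "Cast" or node_id in marked:
            continue  # skip nodes that an earlier iteration already removed
        j = i + 1
        while j < len(nodes) and nodes[j][1] == "Cast" and nodes[j][2] == to:
            if nodes[j][0] in marked:
                raise RuntimeError("node removed twice")
            marked.add(nodes[j][0])
            removed.append(nodes[j][0])
            j += 1
    return removed
```

Without the `node_id in marked` check, a chain of three Casts would visit the second Cast again and try to remove the third one a second time, which is the double removal described above.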
* move all contrib ops to one place
* namespace changes
* bug fix - remove redundant file after merge master
* plus more minor bug fixes
* bug fix
* fix extra space in include header + namespace fix
* fix linux build failure:
* fix test group names
* remove redundant test
* Simplify linux gpu pipeline
* Refactor win-gpu-ci-pipeline.yml
* Set cuda environment variables for testing and version
* Remove variables from starter script
* minor fix
* Add GPU Nuget pipeline
* Set DisableContribOps environment variable for Linux package tests
* Add ESRP tasks
* Add ESRP signing templates
* Test out hardcoded value of ESRP
* test variable expansion
* test out variable expansion
* update cpu pipeline to conditionally esrp sign
* Set C# GPU tests to run only if env var is set
* Refactor for easy parameter passing
* refactored esrp templates
* remove variables from template
* Add packaging variables back to pipelines
* update C# for cuda 10
* Merge vars and parameters for gpu pipeline
* remove vars from mklml pipeline
* display envvars on terminal
* Clean up C# cuda tests, and upgrade to Cuda10
* Introduce CUDNN_PATH pipeline variable
* YAML variables are always uppercased (not true with classic)
* Update C# GPU test to be more meaningful
* remove macos from gpu tests
* remove debugging info for DisableContribOps option
* Remove DisableContrib ops parameters -- use variables only
* Fix typo from = to -
* remove debug steps
* fix typo
* remove unused variable TESTONGPU from some templates
* clean up CUDA env setup scripts
* Remove CUDNN_PATH from setup_env_cuda.bat
- Introduce Docker build ARG `ONNXRUNTIME_REPO`
to allow building the Docker container from a different git repo.
Example docker build command:
```bash
cd dockerfiles
docker build -t onnx-runtime \
--build-arg ONNXRUNTIME_REPO=https://github.com/jthelin/onnxruntime \
--build-arg ONNXRUNTIME_SERVER_BRANCH=my-branch \
-f Dockerfile.server .
```
- Add a basic `.dockerignore` file, to cut down the number of files passed into the Docker build context.
Description: This fixes a null fused-function manager pointer when running a fused function inside a subgraph session state.
Motivation and Context
The bug occurs when running fused functions created by IExecutionProvider::Compile inside a subgraph (e.g. in Scan), which causes a crash.
The problem is that FuncInfo is collected into the main graph's session state before the subgraph session state is created.
The fix is to share FuncInfo between the main graph and the subgraph.
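The sharing idea can be sketched as follows. `FuncManager`, `SessionState`, and `create_subgraph_state` are hypothetical Python names chosen for illustration and do not mirror the actual C++ classes.

```python
class FuncManager:
    """Hypothetical registry mapping fused-node names to compiled functions."""

    def __init__(self):
        self._funcs = {}

    def add(self, name, func):
        self._funcs[name] = func

    def get(self, name):
        return self._funcs[name]


class SessionState:
    def __init__(self, func_manager=None):
        # A subgraph state receives the parent's manager instead of creating
        # an empty one, so functions registered earlier remain visible.
        self.func_manager = func_manager if func_manager is not None else FuncManager()

    def create_subgraph_state(self):
        return SessionState(self.func_manager)


main = SessionState()
main.func_manager.add("fused_0", lambda x: x + 1)  # registered before the subgraph exists
sub = main.create_subgraph_state()
result = sub.func_manager.get("fused_0")(41)
```

Because both states hold the same manager object, the order in which functions are registered and subgraph states are created no longer matters.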
* Move quantization tool from onnx to onnxruntime
* Fix some issues
* Use u8_s8 for asymmetric mode and u8_u8 for symmetric mode irrespective of whether inputs are initializers or from previous
* Address PR comments
* Fix error message formatting
* Separate static/dynamic and quantization mode
* Attempt to provide the correct rank for an output from a Loop node when there are no iterations.
For a loop output (as opposed to a loop-carried dependency), the first dimension is the iteration count, so it has a value of 0 and the output size is zero. Use the rank of the matching subgraph output if available.
If the subgraph output rank is not available, output a warning and use a rank 1 shape of {0}.
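A minimal sketch of the rule described above, under stated assumptions: the function name is hypothetical, and filling the trailing dimensions with 0 is a choice made for this sketch, since the description only specifies the rank.

```python
import warnings


def zero_iteration_output_shape(subgraph_output_rank):
    """Shape for a Loop output when the loop ran zero times.

    `subgraph_output_rank` is the rank of the matching subgraph output, or
    None when unknown. Filling the trailing dimensions with 0 is an
    assumption of this sketch; the description only specifies the rank.
    """
    if subgraph_output_rank is None:
        warnings.warn("subgraph output rank unknown; using shape [0]")
        return [0]
    # The leading dimension is the iteration count (0); the remaining
    # dimensions correspond to the per-iteration output.
    return [0] + [0] * subgraph_output_rank
```

For example, a subgraph output of rank 2 gives a rank 3 Loop output whose total size is zero.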