Create N-1 threads in a thread pool when configured with intra-op parallelism of N. This ensures we have N active threads, given that the main thread also runs work. To avoid ambiguity on the value returned, rename ThreadPool::NumThreads method to ThreadPool::DegreeOfParallelism, and make corresponding updates in MLAS and operators.
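The N-1 worker convention can be sketched as follows. This is a minimal illustration, not the onnxruntime ThreadPool: the class name `SimplePool`, the helper `NumWorkerThreads`, and the per-call thread spawning are assumptions made for brevity. Only the name `DegreeOfParallelism` comes from the change described above.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical, simplified pool used only to illustrate the counting rule.
// A real pool keeps its workers alive; this sketch spawns them per call.
class SimplePool {
 public:
  // Configured with parallelism N: N-1 workers, the caller is the Nth thread.
  explicit SimplePool(int degree_of_parallelism)
      : num_workers_(degree_of_parallelism > 1 ? degree_of_parallelism - 1 : 0) {}

  // Number of extra threads owned by the pool (assumed helper name).
  int NumWorkerThreads() const { return num_workers_; }

  // Threads that can run work at once, including the calling thread.
  int DegreeOfParallelism() const { return num_workers_ + 1; }

  // Runs fn(i) for i in [0, total), one contiguous block per thread.
  void ParallelFor(int total, const std::function<void(int)>& fn) const {
    assert(total >= 0);
    const int d = DegreeOfParallelism();
    const int block = (total + d - 1) / d;  // ceil(total / d)
    auto run_block = [&](int t) {
      const int begin = t * block;
      const int end = std::min(total, begin + block);
      for (int i = begin; i < end; ++i) fn(i);
    };
    std::vector<std::thread> workers;
    for (int t = 1; t < d; ++t) workers.emplace_back(run_block, t);
    run_block(0);  // the calling thread takes the first block
    for (auto& w : workers) w.join();
  }

 private:
  int num_workers_;
};
```

Callers that partition work should size their partitions by `DegreeOfParallelism()`, since dividing by the worker count alone would leave the calling thread's share unaccounted for.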
* Split ComputePadAndOutputShape into ComputePad and ComputeOutputShape
* update NNAPI conv output shape computation to use the shared ComputeOutputShape
* use references instead of pointers for ComputePadAndOutputShape
* Enable onnxruntime_test_all for NNAPI EP
* switch to use ninja for Android CI
* make android emulator boot faster in android ci
* simplify adb push
* more style change
* more tweaking on android ci
* build.py style update
- make size_ and data_ data members private
- rename GetCapacity() to Capacity() to be consistent (e.g., with Size())
- add static_assert for trivially copyable T because it is copied with memcpy
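The three points above can be sketched in one small class. The class name `TrivialBuffer` and the members `Append`, `Data`, and `capacity_` are hypothetical; only `Size()`, `Capacity()`, `size_`, `data_`, and the trivially-copyable check are taken from the list above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

// Hypothetical container for illustration only.
template <typename T>
class TrivialBuffer {
  // memcpy is only well defined for trivially copyable element types,
  // so reject anything else at compile time.
  static_assert(std::is_trivially_copyable<T>::value,
                "T must be trivially copyable because it is copied with memcpy");

 public:
  explicit TrivialBuffer(std::size_t capacity)
      : data_(new T[capacity]), capacity_(capacity) {}

  std::size_t Size() const { return size_; }
  std::size_t Capacity() const { return capacity_; }  // named to match Size()
  const T* Data() const { return data_.get(); }

  // Appends count elements; returns false if they do not fit.
  bool Append(const T* src, std::size_t count) {
    assert(src != nullptr || count == 0);
    if (count > capacity_ - size_) return false;
    if (count > 0) std::memcpy(data_.get() + size_, src, count * sizeof(T));
    size_ += count;
    return true;
  }

 private:
  // Private so callers cannot break the size_ <= capacity_ invariant.
  std::unique_ptr<T[]> data_;
  std::size_t capacity_;
  std::size_t size_ = 0;
};
```

With the assertion in place, instantiating the template with a type such as `std::string` fails to compile instead of silently copying object internals byte by byte.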
Extracting some common code related to "AutoPadType" from the cpu execution provider into "common.h".
Motivation and Context
* Sharing code with authors of other execution providers that need the same functionality.
* I didn't modify the code in shared_library or dnnl EP to avoid changing their dependency structure, so there is still a redundant copy of the AutoPadType code in there.
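For readers unfamiliar with this functionality, here is a rough sketch of what padding code keyed on `AutoPadType` typically computes for one spatial dimension, following the ONNX `auto_pad` semantics. The function names echo `ComputePad` and `ComputeOutputShape` mentioned earlier in this log, but the signatures, the `Pad` struct, and `EffectiveKernel` are assumptions and may not match the code in "common.h".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

enum class AutoPadType { NOTSET, VALID, SAME_UPPER, SAME_LOWER };

// Hypothetical pair of paddings for one spatial dimension.
struct Pad {
  std::int64_t head;
  std::int64_t tail;
};

// Extent covered by a dilated kernel along one dimension.
inline std::int64_t EffectiveKernel(std::int64_t kernel, std::int64_t dilation) {
  return dilation * (kernel - 1) + 1;
}

// Assumed signature: returns the padding implied by the auto-pad mode.
Pad ComputePad(std::int64_t in_dim, std::int64_t stride, std::int64_t kernel,
               std::int64_t dilation, AutoPadType mode, Pad explicit_pad) {
  assert(in_dim > 0 && stride > 0 && kernel > 0 && dilation > 0);
  switch (mode) {
    case AutoPadType::NOTSET:
      return explicit_pad;  // use the pads given by the model
    case AutoPadType::VALID:
      return Pad{0, 0};     // no padding at all
    case AutoPadType::SAME_UPPER:
    case AutoPadType::SAME_LOWER: {
      // Pad so that the output size is ceil(in_dim / stride).
      const std::int64_t target = (in_dim + stride - 1) / stride;
      const std::int64_t total = std::max<std::int64_t>(
          0, (target - 1) * stride + EffectiveKernel(kernel, dilation) - in_dim);
      // An odd total puts the extra element at the end (UPPER) or start (LOWER).
      const std::int64_t head =
          (mode == AutoPadType::SAME_LOWER) ? (total + 1) / 2 : total / 2;
      return Pad{head, total - head};
    }
  }
  return explicit_pad;  // not reached for valid enum values
}

// Assumed signature: output size for one dimension given the final padding.
std::int64_t ComputeOutputShape(std::int64_t in_dim, std::int64_t stride,
                                std::int64_t kernel, std::int64_t dilation, Pad pad) {
  return (in_dim + pad.head + pad.tail - EffectiveKernel(kernel, dilation)) / stride + 1;
}
```

Keeping the padding step separate from the shape step lets a caller that already knows its pads reuse only the shape computation.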
Handle model with multiple embed nodes:
* update embed layer norm fusion in onnxruntime
* Fix temp model path in optimizer
* Add unit test for model with multiple embed nodes.
* Add unit test for gpt2 fusion with past state and mask
* Add unit test for change input to int32
For the special case where all variadic inputs of a kernel are the same shape (i.e. no broadcasting is required) and there are few enough of them, we perform the entire computation in a single kernel. The general implementation (which was previously used for this special case) handles broadcasting by repeatedly invoking a binary kernel on successive inputs.
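The difference between the two strategies can be sketched on the CPU with a variadic sum. This is an illustration only: the function names are hypothetical, the real change concerns device kernels, and broadcasting is left out of the general path here to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// General strategy: fold the inputs pairwise. Each additional input costs
// one more full pass over the output (one binary "kernel" per input).
std::vector<float> SumRepeatedBinary(const std::vector<std::vector<float>>& inputs) {
  assert(!inputs.empty());
  std::vector<float> out = inputs[0];
  for (std::size_t k = 1; k < inputs.size(); ++k) {
    assert(inputs[k].size() == out.size());  // no broadcasting in this sketch
    for (std::size_t i = 0; i < out.size(); ++i) out[i] += inputs[k][i];
  }
  return out;
}

// Special case: all inputs share one shape, so every output element can be
// produced in a single pass that reads all inputs at that position.
std::vector<float> SumSinglePass(const std::vector<std::vector<float>>& inputs) {
  assert(!inputs.empty());
  const std::size_t n = inputs[0].size();
  for (const auto& in : inputs) assert(in.size() == n);
  std::vector<float> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    float acc = inputs[0][i];
    for (std::size_t k = 1; k < inputs.size(); ++k) acc += inputs[k][i];
    out[i] = acc;
  }
  return out;
}
```

Both functions accumulate in the same left-to-right order, so they return identical results; the single-pass form writes each output element once instead of once per input.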
* Fix avx2 load 32 bytes buffer overrun.
* Fix qladd buffer overrun for sse2 code.
* Fix QLinearAdd buffer overrun for arm.
* Add mlas test for qladd to cover overrun and more.
* Change API to save binary space. Add more tests in mlas to cover different zero points.
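One common way to avoid this class of overrun is to run the wide operation only on full blocks and route the remainder through a padded scratch buffer. The sketch below shows that pattern in portable C++ with a saturating uint8 add standing in for the vector code. It is not the MLAS fix itself, which may differ, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBlock = 32;  // bytes touched by one wide load/store

// Stand-in for a 32-byte vector operation: it always reads kBlock bytes
// from a and b and writes kBlock bytes to c.
void AddBlockSaturate(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* c) {
  for (std::size_t i = 0; i < kBlock; ++i) {
    const unsigned sum = static_cast<unsigned>(a[i]) + b[i];
    c[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
  }
}

// Adds n bytes without touching memory past a[n-1], b[n-1], or c[n-1].
void AddSaturate(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* c,
                 std::size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr && c != nullptr));
  std::size_t i = 0;
  for (; i + kBlock <= n; i += kBlock) AddBlockSaturate(a + i, b + i, c + i);
  const std::size_t tail = n - i;
  if (tail > 0) {
    // Copy the remainder into padded scratch buffers so the wide operation
    // never reads or writes beyond the caller's arrays.
    std::uint8_t ta[kBlock] = {0};
    std::uint8_t tb[kBlock] = {0};
    std::uint8_t tc[kBlock];
    std::memcpy(ta, a + i, tail);
    std::memcpy(tb, b + i, tail);
    AddBlockSaturate(ta, tb, tc);
    std::memcpy(c + i, tc, tail);
  }
}
```

A test for this kind of bug typically uses lengths that are not a multiple of the block size and checks that guard bytes after the output are unchanged.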
* add modern standards to function arguments
* code cleanup
* fix code formatting
* add element access convenience function
* change template type name to match rest of code
* remove new At() convenience function
* add better documentation message
* build e2e cppwinrt tests
* add use nuget task
* make all references to the package version prop/target-ified
* remove dupe props/targets reference
* work around project.assets.json error by deleting it
* powershell test invocation
* switch to batch script
* print debug info
* update x86->x64
* stdio.h
* pushd/popd
* add csharp tests
* package.config -> packages.config
* typo
* x86 -> anycpu
* debug is default
* add test path
* update csproj as well
* debug
* really replace all package versions
* debug output
* really use [PackageVersion]
* sleep instead of converting async operation to task and waiting
* dont close software bitmap
* switch to powershell script
* remove binding check
* continue on failure
* continue on error action
* continueOnError and errorActionPreference
* tabbing
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* init version to use graph instead of model_proto for IsOpSupported
* move add to modelbuilder to use graph node
* move the rest of model_builder to use graph instead of modelproto
* remove redundant code
* Clear some redundant code
* merge master and some minor style changes
* move the check for whether an initializer is external to individual ops instead of the whole graph
* Addressed comments
* Change GetType and GetShape to log warning info internally to simplify the caller, remove some redundant onnxruntime namespace qualifiers
* add squeeze op support, some more code style clean up
* fix a bug where duplicate output can be added to a subgraph, some other minor logging changes
* Add protobuf mutator library as a git submodule
* Added files and instructions to build the protobuf mutator library in CMake
* Added fuzzing flag to build system and added fuzzing dependency library. To run the fuzzing test, use the flags --fuzz_testing --build_shared_lib --use_full_protobuf --cmake_generator 'Visual Studio 16 2019'
* Added src files and build instructions for the main fuzzing engine
* Removed Random number generation test from inside the engine
* Added license header to files
* Removed all pep8 violations introduced by this change and other E501 violations