Fix issue #1591
Root Cause:
The CUDA Equal, Greater, and Less operators do not support multi-directional broadcast
Fix:
Add code to support the multi-directional broadcast
Also add tests to cover more cases.
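
To illustrate what multi-directional broadcast means for a binary operator such as Equal, Greater, or Less, here is a minimal pure-Python sketch of the output-shape rule. The function name `broadcast_shape` is hypothetical and this is not the CUDA kernel code; it only shows the case where both inputs must be stretched.

```python
def broadcast_shape(a, b):
    """Return the output shape for multi-directional (NumPy-style) broadcasting."""
    rank = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both shapes have the same rank.
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(out)


# Both inputs are stretched here: neither input shape equals the output shape.
print(broadcast_shape((3, 1), (1, 4)))   # (3, 4)
print(broadcast_shape((2, 3, 4), (4,)))  # (2, 3, 4)
```

A kernel that only handles one-directional broadcast covers the second call but not the first, since in the first call neither operand already has the output shape.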
* Mention OrtCreateSessionFromArray in C API doc
* Add C API for free dim override
* Add C API for free dim override, fix missing API mention in InferenceTest.cs, fix confusing print statement in perf_test.
* Remaining C# files
* fix c# build
* Run the tests in blame mode. This option is helpful in isolating a problematic test causing the test host to crash.
* fix order
* Avoid variable length stack array variables for VC++ compatibility
Use dynamically allocated arrays or vectors instead.
* windows enabling
* openvino windows build
* Update build instructions
* resolve conflicts for PR
* remove debug messages from cmake
* PR fix for Windows support
* Disabled Div unit test on GPU
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled div unit test for GPU in python backend tests
* Added more backends for OpenVINO
* Disabled div unit test in onnx_test_runner
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Disabled div for GPU_FP16 in python backend tests
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Handle std::bad_alloc when growing arena.
Allow more than one attempt at reducing the buffer size if allocation fails. More memory may have become available between attempts, so backing off only once can fail even when a large enough buffer could have been allocated.
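
The retry idea can be sketched in Python, with `MemoryError` standing in for `std::bad_alloc`. The names `allocate_with_backoff` and `fake_alloc`, the halving factor, and the `minimum` floor are all assumptions made for illustration; the arena's actual shrink policy is not stated here.

```python
def allocate_with_backoff(alloc, requested, minimum):
    """Call alloc(size), halving size after each MemoryError until minimum is reached."""
    size = requested
    while True:
        try:
            return alloc(size), size
        except MemoryError:
            if size <= minimum:
                raise  # cannot shrink any further, so report the failure
            size = max(minimum, size // 2)


# Hypothetical allocator that can only satisfy requests of up to 300 bytes.
def fake_alloc(size):
    if size > 300:
        raise MemoryError
    return bytearray(size)


buf, granted = allocate_with_backoff(fake_alloc, requested=1024, minimum=64)
print(granted)  # 256
```

Here the first two attempts (1024 and 512 bytes) fail and the third succeeds. A policy that gave up after a single reduction would have failed at 512.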
Description: Refine threading control options and move inter op thread pool to session state.
Added thread_utils.h/cc to centralize the decision around the thread pool size under various conditions.
Motivation and Context
Currently the thread pool size of the parallel executor is hardcoded to 32 for some reason. This PR makes the options to configure the thread pool sizes clearer.
* Initial commit
* Uncomment tests
* Updates
* Updates
* Disable CRD mode DepthToSpace for NGraph builds
* Disable test
* Update tests
* PR feedback
* Add unit test for CRD mode
* Reflect class variable in naming
* Add a test to NGRAPH disabled list
* Update main.cc
* Update main.cc
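
For readers unfamiliar with the CRD mode mentioned above, the following pure-Python sketch contrasts it with the default DCR mode of DepthToSpace, following the reshape/transpose definition in the ONNX operator spec. It handles a single image as nested lists; the function name and layout are illustrative and not taken from the onnxruntime kernel.

```python
def depth_to_space(x, blocksize, mode="DCR"):
    """Rearrange one image x[C][H][W] into [C / bs^2][H * bs][W * bs]."""
    bs = blocksize
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    c_out = c_in // (bs * bs)
    out = [[[None] * (w * bs) for _ in range(h * bs)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(bs):      # row offset inside the output block
            for j in range(bs):  # column offset inside the output block
                if mode == "DCR":
                    # Block offsets vary slowest across input channels.
                    src = (i * bs + j) * c_out + c
                elif mode == "CRD":
                    # Output channel varies slowest across input channels.
                    src = c * bs * bs + i * bs + j
                else:
                    raise ValueError(f"unknown mode: {mode}")
                for y in range(h):
                    for z in range(w):
                        out[c][y * bs + i][z * bs + j] = x[src][y][z]
    return out


# Eight 1x1 channels holding the values 0..7, block size 2.
x = [[[k]] for k in range(8)]
print(depth_to_space(x, 2, "DCR"))  # [[[0, 2], [4, 6]], [[1, 3], [5, 7]]]
print(depth_to_space(x, 2, "CRD"))  # [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
```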
* Fix symbolic shape inference for faster_rcnn, mask_rcnn, yolov3
Force merge when --auto_merge, on symbolic dims which sympy cannot simplify
Add symbolic inference for Resize opset 10
Add support for step != 1 in Slice
Add support for computed dim in TopK
Bug fixes in passing symbolic dims from subgraph
Fix an outdated comment in Nuphar provider header
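
As a rough illustration of what supporting step != 1 in Slice involves for shape inference, the output length along one axis can be computed as below for concrete integer dims. This sketch assumes NumPy-style clamping of start and end, which Python's `slice.indices` implements; the helper name `slice_output_dim` is hypothetical, and the symbolic case handled by the actual script is not shown.

```python
def slice_output_dim(dim, start, end, step):
    """Length of the result of slicing an axis of size dim with start:end:step."""
    if step == 0:
        raise ValueError("step must be non-zero")
    # slice.indices clamps start/end to the axis and resolves negative indices.
    return len(range(*slice(start, end, step).indices(dim)))


print(slice_output_dim(10, 1, 8, 2))    # 4  (indices 1, 3, 5, 7)
print(slice_output_dim(10, 8, 1, -3))   # 3  (indices 8, 5, 2)
print(slice_output_dim(5, 0, 100, 1))   # 5  (end clamped to the axis size)
```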
1. remove sudo from the cleanup step for Linux so that the vstsagent build user does not need sudo access
2. a minor fix in the install_ubuntu.sh to make the image smaller for openvino
* Revert "MlasGetMaximumThreadCount: plus 1 to the NumThreads from ORT thread pool (#1646)"
This reverts commit 413730365f.
* A small fix to the parallel for
* add mimalloc submodule
* basic hooks into execution provider header and build script option
* pull mimalloc into build
* windows has to use the override vcxproj already set up, and disable bfcarena when using mimalloc
* fix import_location
* generalize build msbuild command
* add mimalloc dependency to python package as well as various commenting cleanups
* update mimalloc commit as stop gap
* include mimalloc changes from master
* create the capi directory if it doesn't exist before copying mimalloc over
* disable runtime hooks and remove old comment
* temporary change to test CI
* fetch the mimalloc output name property
* uniformly call target_link_libraries
* query cmake to get the correct windows sdk to target
* revert change to trailing directory slash
* pickup windows sdk off msbuild path if possible
* copy the produced dll/so at install time, not configure time
* deal with mimalloc unimplemented atomic
* move to dev branch of mimalloc to avoid atomic issues on gcc
* for windows specify solution settings (x86) rather than individual project settings
* pin mimalloc submodule to updated commit
* typo
* Revert "temporary change to test CI"
This reverts commit 764867376936a5d307dded3cc37f00a34e3b0c96.
* add Python APIs for getting/setting execution providers and for checking available and all providers.
* add back deleted code.
* minimize peak memory consumption for sess.set_providers() api. need to remove references to underlying _sess object
* fix typo.
* add validation for set_providers(), address other review feedback.
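
The kind of check that validation for set_providers() might perform can be sketched as follows. The helper `validate_providers` is hypothetical and only an assumption about what such validation looks like; it is kept free of any onnxruntime import so it runs on its own.

```python
def validate_providers(requested, available):
    """Reject unknown providers and return the requested list without duplicates."""
    if not requested:
        raise ValueError("at least one provider is required")
    unknown = [p for p in requested if p not in available]
    if unknown:
        raise ValueError(f"providers not available: {unknown}")
    # dict.fromkeys drops duplicates while keeping the caller's priority order.
    return list(dict.fromkeys(requested))


available = ["CUDAExecutionProvider", "CPUExecutionProvider"]
print(validate_providers(["CPUExecutionProvider", "CPUExecutionProvider"], available))
# ['CPUExecutionProvider']
```

With onnxruntime installed, the `available` list would typically come from `onnxruntime.get_available_providers()`, and the validated list would be passed to `InferenceSession.set_providers(...)`.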
* Fix broken link and minor wording updates
* Update links to use relative paths
* Update sample section organization
* Fix a few more links
* Update links to relative paths
* Fix link urls
* Update links to relative paths
* Update link to perf test doc page
* Update links to relative paths
* Update to relative paths for links
* Update link
* use unaligned buffer for Nuphar in onnxruntime_perf_test to avoid crash
symbolic shape inference fixes to support more sophisticated models
remove useless code from model_quantizer
* Run symbolic shape inference for subgraphs in Loop/Scan
* Allow a symbolic dimension to merge with an int dimension
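
One plausible reading of merging a symbolic dimension with an int dimension is that the concrete value wins. The sketch below assumes exactly that; the function `merge_dims`, the use of strings for symbolic names, and the tie-breaking rule for two different symbols are illustrative assumptions, not the script's actual logic.

```python
def merge_dims(a, b):
    """Merge two dimensions, each either an int or a symbolic name (str)."""
    if a == b:
        return a
    a_is_int, b_is_int = isinstance(a, int), isinstance(b, int)
    if a_is_int and b_is_int:
        raise ValueError(f"conflicting concrete dims: {a} vs {b}")
    if a_is_int:
        return a  # assumption: a concrete value replaces the symbol
    if b_is_int:
        return b
    return a      # assumption: for two different symbols, keep the first


print(merge_dims("batch", 8))    # 8
print(merge_dims("N", "N"))      # N
```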
* Bump onnx to latest
Update onnx.in.proto with changes for SparseTensor.
* add temp skip tests
* remove passed tests from skip list
* skip more tests for new ops in opset 11
* skip crashing tests
* update handling of new attribute types sparse tensor and sparse tensors
* advance onnx commit and remove skip cpu_flaky_tests
* temporarily skip yolo3 model test due to resize opset10 shape inference regression
* update proto for onnxruntime server
* advance onnx commit further
1. Add an openvino GPU nightly build pipeline. This test runs on an Intel Up square Edge device. The devices are hosted locally, not on Azure VMs. We persist a smaller set of model test data on the Edge device.
2. Update the build condition for openvino GPU so it works for GPU_FP32, GPU_FP16
3. add an option to install_ubuntu.sh to exclude the packages used for nuphar, to save disk space, as Edge devices usually have limited disk space.
C/C++ Opaque APIs
Add new virtual interfaces for NonTensorType
Implement entry points.
Add shared header for the data container.
Add export symbols.
Add serialization/deserialization.
Implement model with Opaque types.
Rework opaque_api_test as a standalone executable.
It should always have outputs, but in case it doesn't (nothing currently fails if it doesn't, even though that makes it meaningless), make sure it also has a node.