* Modify CPU fallback logic
* Review comments, failing test
* Add test for topological order
* review comment
* Fix test for AMD CI
* fix build
* Fix AMD test
* Reduce the binary size growth from this change. The minimal build grew by 7 KB from this check-in.
First, simplify the checking logic a little. The same checks are still done, just without an extra layer of helpers.
The issue addressed by the original change only applies to a graph output whose shape could not be inferred, e.g. a Reshape node with a dynamic input causes downstream shapes to be unknown. Otherwise, MergeShapeInfo in graph.cc would have resolved any differences between a specified output shape and the inferred output shape during Graph::Resolve.
The issue does not apply to the execution frame used by the optimizer: the only time it creates a graph output is when it can constant fold all the way through, in which case MergeShapeInfo would have handled any difference as well.
Given these considerations, wiring a logger in at the IExecutionFrame level isn't necessary if VerifyOutputSizes is optionally overridden by an implementation that cares.
* Address PR comments
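The hook-based design described above can be sketched as a virtual method with a no-op default, so only the implementation that owns a logger pays for the check. This is a minimal sketch under stated assumptions: the class names `FrameBase`, `OptimizerFrame`, and `InferenceFrame`, the `Shape` alias with -1 for an uninferred dimension, and the warning vector standing in for a logger are all hypothetical, not onnxruntime's actual `IExecutionFrame` or `ExecutionFrame` declarations.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shape type: -1 marks a dimension that could not be inferred.
using Shape = std::vector<int64_t>;

// Hypothetical stand-in for IExecutionFrame. The hook is a no-op by default,
// so the base class needs no logger.
class FrameBase {
 public:
  virtual ~FrameBase() = default;

  void SetOutput(int index, const Shape& actual) {
    // ... store the output value here ...
    VerifyOutputSizes(index, actual);
  }

 protected:
  virtual void VerifyOutputSizes(int /*index*/, const Shape& /*actual*/) {}
};

// Hypothetical optimizer frame: it only yields a graph output after constant
// folding all the way through, so it keeps the no-op default.
class OptimizerFrame : public FrameBase {};

// Hypothetical inference frame: owns a "logger" and overrides the hook.
class InferenceFrame : public FrameBase {
 public:
  explicit InferenceFrame(std::vector<Shape> declared)
      : declared_(std::move(declared)) {}

  const std::vector<std::string>& warnings() const { return warnings_; }

 protected:
  void VerifyOutputSizes(int index, const Shape& actual) override {
    if (index < 0 || static_cast<std::size_t>(index) >= declared_.size()) return;
    const Shape& expected = declared_[static_cast<std::size_t>(index)];
    bool ok = expected.size() == actual.size();
    for (std::size_t i = 0; ok && i < expected.size(); ++i) {
      ok = expected[i] == -1 || expected[i] == actual[i];  // -1 matches anything
    }
    if (!ok) warnings_.push_back("output " + std::to_string(index) + " shape mismatch");
  }

 private:
  std::vector<Shape> declared_;
  std::vector<std::string> warnings_;  // stands in for a real logger
};
```

For example, an `InferenceFrame` declared with shape `{2, -1}` accepts an output of `{2, 5}` silently and records one warning for `{3, 5}`, while an `OptimizerFrame` never checks anything.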
-> Unset the CMAKE_MAP_IMPORTED_CONFIG that was set for the OpenVINO EP for RelWithDebInfo builds on Windows.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
Update the Objective-C API to be more usable from Swift, e.g., to allow Objective-C methods with a trailing NSError** parameter to be imported as throwing Swift methods.
Update the CMake Objective-C framework setup.
* Fix run-to-run nondeterminism bug.
* Remove non-deterministic logic in softmax
* Fix value differences introduced when removing the nondeterminism.
Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>
* Fix compiler warning in GistEncodeDecode.
* Fix other use of member variable.
* Make `compression_type_` const.
* Change floor to floorf in CUDA code.
* Statically cast size_t to int in GIST CUDA kernels
* Add explicit cast to `long` in gist.cc
Co-authored-by: Derek Murray <demurra@microsoft.com>
* test
* [gwang] make cmake compile work
* [gwang] enable building APKs
* some build update
* add simple sigmoid test android project and cmake
* add build.py
* refine and remove unused import lib
* address CR comments
* remove unnecessary files
* add README.md
* minor update
* remove
* minor change
* fix ci failure and minor update
* fix typo in project folder
* remove
* remove and minor update
* refine
* minor fix
* fix
* fix typo
* add gradle spotlessApply task to fix CI failure
* fix
* enable spotlessApply in build gradle
* revert some changes
* minor fix
* run spotless apply for format
* address CR comments and fix CI version and format
* refine
* Refine
* address comments
* refine
* refine
* modify
* reformat
* resolve version conflicts
* minor update
* minor update
* address comments
* minor update
Co-authored-by: Guoyu Wang <wanggy@outlook.com>
* Disable NNAPI for graphs with dynamic input shapes
* Add warning for multiple partitions
* minor update
* update the message logging
* Fix CoreML CI failure
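A check of the kind used to decline graphs with dynamic input shapes might look like the following. It is a sketch only: `Dim`, `InputShape`, and `HasDynamicInputShape` are hypothetical names, and a dimension is assumed to be dynamic when it has no fixed value (for example an ONNX `dim_param` such as a symbolic batch size); the real NNAPI EP check may differ.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical representation of one tensor dimension: a value if the
// dimension is fixed, std::nullopt if it is symbolic or unknown.
using Dim = std::optional<int64_t>;
using InputShape = std::vector<Dim>;

// Returns true if any graph input has a dimension without a fixed size.
// An EP could use a check like this to skip such graphs entirely.
bool HasDynamicInputShape(const std::vector<InputShape>& inputs) {
  for (const InputShape& shape : inputs) {
    for (const Dim& dim : shape) {
      if (!dim.has_value()) return true;
    }
  }
  return false;
}
```

A model whose only input is `[1, 3, 224, 224]` passes, while one whose input is `[N, 3]` with a symbolic `N` would be rejected.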
* ORTModule: enable run_symbolic_shape_infer by default
* Fix UTs by replacing Relu with Softmax
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Catch symbolic shape inference exceptions.
Do not prune the graph when it contains an inner graph (Loop/If/Scan).
Add a wrapper for numpy_helper.to_array so that we can debug an ONNX graph without external data.
Remove fuse_mask, which is no longer used, from onnx_model_bert_tf.py.
* Install and use conda on ortmodule CI pipelines
* Update build script to install onnxruntime wheel before running unit tests
* Remove python 3.5 from install_python_deps
* Pinning deepspeed version to 0.3.15
* Add output shape validation for the ORT CPU execution flow
* Skip the validation check if the output is not a tensor. Fixed conv_transpose test. Ignoring pad and reduction tests
* Comparison between signed and unsigned int. Removed const from a primitive variable
* Commented out the unused test function signature
* Removed the exception and log a warning instead, because many ORT tests were failing due to this validation
* Fixed warning condition and test
* Fixed test and addressed comment on the PR
* Output shape verification now happens only for the final output nodes of the model
* Changed variable names from camel case to snake case
* Enable the tests, as a validation failure now logs a warning instead of throwing an exception
* Resolve merge conflict
* Remove duplicate function "GetLogger()"
* Fixed typo in method name TestConvTransposeOpInitializer
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
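The validation rules listed above (skip non-tensor outputs, check only the model's final outputs, warn rather than throw) can be sketched as follows. `OutputInfo` and `ValidateOutputShapes` are hypothetical names, a declared dimension of `std::nullopt` is assumed to represent a symbolic dimension that matches any size, and returning warning strings stands in for logging.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-output metadata. A declared dimension of std::nullopt is
// symbolic (e.g. a batch dimension named "N") and matches any size.
struct OutputInfo {
  std::string name;
  bool is_tensor = true;        // non-tensor outputs (sequences, maps) are skipped
  bool is_graph_output = true;  // only the model's final outputs are checked
  std::vector<std::optional<int64_t>> declared;
};

// Returns one warning per mismatching output instead of throwing, so a model
// with an inaccurate declared output shape still runs.
std::vector<std::string> ValidateOutputShapes(
    const std::vector<OutputInfo>& infos,
    const std::vector<std::vector<int64_t>>& actual) {
  std::vector<std::string> warnings;
  for (std::size_t i = 0; i < infos.size() && i < actual.size(); ++i) {
    const OutputInfo& info = infos[i];
    if (!info.is_tensor || !info.is_graph_output) continue;
    bool match = info.declared.size() == actual[i].size();
    // size_t indices keep both sides unsigned (no signed/unsigned comparison).
    for (std::size_t d = 0; match && d < info.declared.size(); ++d) {
      const std::optional<int64_t>& dim = info.declared[d];
      match = !dim.has_value() || *dim == actual[i][d];
    }
    if (!match) {
      warnings.push_back("output '" + info.name + "': declared shape does not match actual shape");
    }
  }
  return warnings;
}
```

Returning warnings rather than throwing mirrors the decision above to log instead of fail, since many existing tests would otherwise break on this check.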
* LayerNorm function body v1
* LayerNorm function body
* LayerNorm function test
* Minor fixes
* Fix signed unsigned comparison
* Move contrib ops test
* Handle optional output parameters
* Add test case for optional outputs
* Handle float16 random generation
* Add function body to Gelu and FastGelu
* Add FastGelu test
* Fix comments
* Include cmath
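When checking a function-body expansion against expected values, scalar reference implementations of the underlying formulas can help. This sketch assumes LayerNorm normalizes over the last axis of a row-major `[rows, cols]` tensor and returns the per-row mean and inverse standard deviation as the optional outputs; `LayerNormRef`, `GeluRef`, and `FastGeluRef` are hypothetical helper names, and FastGelu is shown without its optional bias input.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference LayerNorm over the last axis of a [rows, cols] tensor:
// y = (x - mean) / sqrt(var + epsilon) * scale + bias.
// mean and inv_std_dev (one value per row) play the role of the optional outputs.
void LayerNormRef(const std::vector<float>& x, std::size_t rows, std::size_t cols,
                  const std::vector<float>& scale, const std::vector<float>& bias,
                  float epsilon, std::vector<float>& y,
                  std::vector<float>& mean, std::vector<float>& inv_std_dev) {
  y.assign(rows * cols, 0.0f);
  mean.assign(rows, 0.0f);
  inv_std_dev.assign(rows, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    const float* row = x.data() + r * cols;
    float sum = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) sum += row[c];
    const float m = sum / static_cast<float>(cols);
    float sq = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) sq += (row[c] - m) * (row[c] - m);
    const float inv = 1.0f / std::sqrt(sq / static_cast<float>(cols) + epsilon);
    for (std::size_t c = 0; c < cols; ++c) {
      y[r * cols + c] = (row[c] - m) * inv * scale[c] + bias[c];
    }
    mean[r] = m;
    inv_std_dev[r] = inv;
  }
}

// Exact Gelu: 0.5 * x * (1 + erf(x / sqrt(2))).
float GeluRef(float x) { return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f))); }

// Tanh approximation used by FastGelu:
// 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3))).
float FastGeluRef(float x) {
  const float kSqrt2OverPi = 0.7978845608f;
  return 0.5f * x * (1.0f + std::tanh(kSqrt2OverPi * (x + 0.044715f * x * x * x)));
}
```

For the row `[1, 2, 3, 4]` with unit scale and zero bias, the mean is 2.5 and the inverse standard deviation is `1 / sqrt(1.25)`; Gelu and FastGelu agree to within about 1e-3 at `x = 1`.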
[ PR previously merged as https://github.com//pull/7372, then reverted pending investigation of a lost-wake-up issue seen with ParallelExecutor. The issue was a missing test for new work pushed to a thread concurrently with a worker blocking. The change relative to 7372 is the addition of: https://github.com/microsoft/onnxruntime/blob/tiharr/dev-sticky-4/include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h#L1473-L1492 ]
Description: This change updates the heuristics used when a thread selects which worker threads to push work to on entering a parallel loop. Previously, worker threads would maintain a best-effort bitmap of "good worker hints" indicating the threads that were likely to be spinning waiting for work. This change uses a simpler heuristic where a thread records which workers ran its previous loop, and then re-submits its next loop to those same workers. The aim is to retain affinity between a thread and a set of workers, and to avoid maintaining the "good worker hints" bitmaps.
Motivation and Context: Profiling suggested that maintaining the "good worker hints" was taking an unexpected amount of time, particularly on NUMA systems. In addition, when running many concurrent workloads, the hints did not provide a way to help retain the locality of workers, and hence of data in caches. Testing was done to confirm no regressions on the microbenchmark (./build/Linux/Release/onnxruntime_benchmark --benchmark_filter=BM_ThreadPoolParallelFor) and on mobilenet_v1_1.0_224.onnx on Linux, comparing p50 and p99 latency with vs. without this change:
1 concurrent:
p50 0.0172s vs 0.0181s
p99 0.0204s vs 0.0216s
2 concurrent:
p50 0.0172s vs 0.0181s
p99 0.0213s vs 0.0221s
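The sticky worker-selection heuristic from the description can be sketched as a small per-thread helper: reuse the workers that ran the previous loop, then top up from the remaining workers. This is a single-threaded illustration, not the actual EigenNonBlockingThreadPool code; `StickyWorkerHint`, `Choose`, and `Record` are hypothetical names, and the fill-in order (lowest free index first) is an arbitrary assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-submitting-thread helper illustrating "sticky" worker
// selection. Not the actual EigenNonBlockingThreadPool implementation.
class StickyWorkerHint {
 public:
  explicit StickyWorkerHint(std::size_t num_workers) : num_workers_(num_workers) {}

  // Pick up to `wanted` distinct workers for the next parallel loop:
  // workers that ran the previous loop first, then the lowest-numbered
  // free workers (an arbitrary fill-in order chosen for this sketch).
  std::vector<std::size_t> Choose(std::size_t wanted) const {
    std::vector<std::size_t> chosen;
    std::vector<bool> taken(num_workers_, false);
    for (std::size_t w : previous_) {
      if (chosen.size() == wanted) break;
      if (w >= num_workers_ || taken[w]) continue;  // ignore stale or duplicate ids
      chosen.push_back(w);
      taken[w] = true;
    }
    for (std::size_t w = 0; w < num_workers_ && chosen.size() < wanted; ++w) {
      if (!taken[w]) {
        chosen.push_back(w);
        taken[w] = true;
      }
    }
    return chosen;
  }

  // Remember which workers actually ran work from the loop just finished.
  void Record(const std::vector<std::size_t>& ran) { previous_ = ran; }

 private:
  std::size_t num_workers_;
  std::vector<std::size_t> previous_;
};
```

In a real pool each submitting thread would own one such instance (for example via `thread_local`) and call `Record` with the workers that actually picked up its work, so repeated loops from the same thread keep landing on the same workers and their caches, without any shared bitmap to maintain.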