* Fix run-to-run non-determinism bug.
* Remove non-deterministic logic in softmax
* Fix value differences introduced by removing the non-deterministic logic.
Co-authored-by: Lei Zhang <zhang.huanning@hotmail.com>
* Fix compiler warning in GistEncodeDecode.
* Fix other use of member variable.
* Make `compression_type_` const.
* Change floor to floorf in CUDA code.
* Statically cast size_t to int in GIST CUDA kernels
* Add explicit cast to `long` in gist.cc
Co-authored-by: Derek Murray <demurra@microsoft.com>
* test
* [gwang] make cmake compile work
* [gwang] enable build apks
* some build update
* add simple sigmoid test android project and cmake
* add build.py
* refine and remove unused import lib
* address CR comments
* remove unnecessary files
* add README.md
* minor update
* remove
* minor change
* fix ci failure and minor update
* fix typo in project folder
* remove
* remove and minor update
* refine
* minor fix
* fix
* fix typo
* add gradle spotlessApply task to fix CI failure
* fix
* enable spotlessApply in build gradle
* revert some changes
* minor fix
* run spotless apply for format
* address CR comments and fix CI version and format
* refine
* Refine
* address comments
* refine
* refine
* modify
* reformat
* resolve version conflicts
* minor update
* minor update
* address comments
* minor update
Co-authored-by: Guoyu Wang <wanggy@outlook.com>
* disable nnapi for graph with dynamic input shape
* Add warning for multiple partitions
* minor update
* update the message logging
* Fix coreml ci failure
* ORTModule enable run_symbolic_shape_infer by default
* Fix UTs by replacing Relu with Softmax
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
catch symbolic shape inference exception.
no prune graph when there is inner graph (Loop/If/Scan)
add a wrapper for numpy_helper.to_array so that we can debug onnx graph without external data
remove fuse_mask that is not used any more in onnx_model_bert_tf.py
* Install and use conda on ortmodule CI pipelines
* Update build script to install onnxruntime wheel before running unit tests
* Remove python 3.5 from install_python_deps
* Pinning deepspeed version to 0.3.15
* Adding Output Shape Validation for ORT-CPU execution flow
* Skipping validation check in case the output is not a tensor. Fixed conv_transpose test. Ignoring pad and reduction tests
* Fixed comparison between signed and unsigned int. Removed const for a primitive variable
* Commented the un-used test function signature
* Log a warning instead of throwing an exception, because many ORT tests currently fail this validation
* Fixed warning condition and test
* Fixed test and addressed comment on the PR
* Output shape verification will happen only for final output nodes of the model
* Changed variable name from camel case to underscore style
* Enable the tests, as a validation failure now logs a warning instead of throwing an exception
* Resolve merge conflict
* Remove duplicate function "GetLogger()"
* Fixed typo in method name TestConvTransposeOpInitializer
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
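
A minimal Python sketch of the output shape validation described in the commits above: only tensor-like final outputs are checked, and a mismatch logs a warning instead of raising. The actual change lives in the ORT CPU execution flow; every name here (`validate_outputs`, `shapes_compatible`, `FakeTensor`) is hypothetical, and treating `None` as a dynamic dimension is an assumption made for illustration.

```python
import logging

logger = logging.getLogger("shape_validation_sketch")


class FakeTensor:
    """Hypothetical stand-in for a tensor: anything exposing a `shape` attribute."""

    def __init__(self, shape):
        self.shape = tuple(shape)


def shapes_compatible(expected, actual):
    """True if ranks match and every fixed expected dim equals the actual dim.

    A `None` entry in `expected` is assumed to mean "dynamic, accept anything".
    """
    if len(expected) != len(actual):
        return False
    return all(e is None or e == a for e, a in zip(expected, actual))


def validate_outputs(expected_shapes, outputs):
    """Warn (do not raise) for each final output whose shape mismatches.

    Outputs without a `shape` attribute are treated as non-tensors and skipped.
    Returns the names of the mismatching outputs.
    """
    mismatched = []
    for name, value in outputs.items():
        shape = getattr(value, "shape", None)
        if shape is None or name not in expected_shapes:
            continue  # non-tensor output, or nothing to compare against
        if not shapes_compatible(expected_shapes[name], tuple(shape)):
            logger.warning(
                "Output %s: expected shape %s, got %s",
                name, expected_shapes[name], tuple(shape),
            )
            mismatched.append(name)
    return mismatched


expected = {"Y": (None, 3)}
ok = validate_outputs(expected, {"Y": FakeTensor((2, 3))})
bad = validate_outputs(expected, {"Y": FakeTensor((2, 4)), "seq": [1, 2]})
```

Returning the mismatching names keeps the check testable while still letting execution continue, which mirrors the decision to downgrade the failure to a warning.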
* LayerNorm function body v1
* LayerNorm function body
* layernorm function test
* Minor fixes
* Fix signed unsigned comparison
* Move contrib ops test
* Handle optional output parameters
* Add test case for optional outputs
* Handle float16 random generation
* Add function body to Gelu and FastGelu
* Add FastGelu test
* Fix comments
* Include cmath
[ PR previously merged as https://github.com//pull/7372, then reverted pending investigation of lost-wake-up issue seen with ParallelExecutor. Issue was a missing test for new work pushed to thread concurrent with a worker blocking. Change from 7372 is the addition of: https://github.com/microsoft/onnxruntime/blob/tiharr/dev-sticky-4/include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h#L1473-L1492 ]
Description: This change updates the heuristics used when a thread selects which worker threads to push work to on entering a parallel loop. Previously, worker threads would maintain a best-effort bitmap of "good worker hints" indicating the threads that were likely to be spinning waiting for work. This change uses a simpler heuristic where a thread records which workers ran its previous loop, and then re-submits its next loop to those same workers. The aim is to retain affinity between a thread and a set of workers, and to avoid maintaining the "good worker hints" bitmaps.
Motivation and Context: Profiling suggested that maintaining the "good worker hints" was taking unexpected time, particularly on NUMA systems. In addition, when running many concurrent workloads, the hints did not provide a way to help retain locality of workers and hence data in caches. Testing to confirm no regressions on microbenchmark (./build/Linux/Release/onnxruntime_benchmark --benchmark_filter=BM_ThreadPoolParallelFor) and on Linux mobilenet_v1_1.0_224.onnx, comparing p50 and p99 with vs without this change:
1 concurrent:
p50 0.0172s vs 0.0181s
p99 0.0204s vs 0.0216s
2 concurrent:
p50 0.0172s vs 0.0181s
p99 0.0213s vs 0.0221s
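
The heuristic in the description above (a thread records which workers ran its previous loop and re-submits its next loop to them) can be sketched roughly as follows. This is an illustration in Python, not the C++ code in EigenNonBlockingThreadPool.h; the class name, the method names, and the round-robin fallback used when more workers are needed are all assumptions.

```python
class StickyWorkerChooser:
    """Hypothetical per-caller state: remembers the workers used by the previous loop."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.preferred = []     # workers that ran this caller's previous loop
        self.next_fallback = 0  # assumed round-robin cursor for extra workers

    def choose(self, n):
        """Pick up to `n` distinct workers, reusing the previous set first."""
        n = min(n, self.num_workers)
        chosen = self.preferred[:n]
        while len(chosen) < n:
            candidate = self.next_fallback
            self.next_fallback = (self.next_fallback + 1) % self.num_workers
            if candidate not in chosen:
                chosen.append(candidate)
        # Record the set so the next loop is pushed to the same workers.
        self.preferred = chosen
        return chosen


chooser = StickyWorkerChooser(num_workers=4)
first = chooser.choose(2)
second = chooser.choose(2)   # same workers as `first`
wider = chooser.choose(3)    # keeps the earlier workers, adds one more
```

The only state is a short per-caller list, so there is no shared bitmap to keep up to date, which is the cost the motivation section points at.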
* Use positivity everywhere; handle negative index in Slice
* limit positivity to inputs
* make handle_negative_index private
* strengthen sympy comparison
* further strengthen comparison and a minor refactoring
* Add flip test
* Fall through if -int_max in handle_negative_index()
* minor fix for infer_Concat to include initializers
* Add more tests
* use simplify
* more tests
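
As a rough illustration of the negative-index handling mentioned in the commits above, the sketch below normalizes a negative Slice index against a dimension bound and falls through for very large negative sentinels. It uses plain integers only; the real helper works on symbolic values, and both the `INT_MAX` value and the exact fall-through condition are assumptions here.

```python
INT_MAX = 2**31 - 1  # assumed sentinel magnitude, not taken from the source


def handle_negative_index(index, bound):
    """Map a negative index into the range of a dimension of size `bound`.

    Non-negative indices are returned unchanged. Indices at or below -INT_MAX
    are treated as "slice to the start" sentinels and returned unchanged so the
    caller can handle them separately.
    """
    if index >= 0:
        return index
    if index <= -INT_MAX:
        return index  # fall through: the sentinel is handled by the caller
    return bound + index


last = handle_negative_index(-1, 5)
unchanged = handle_negative_index(2, 5)
sentinel = handle_negative_index(-INT_MAX, 5)
```

A result can still be negative when the index is more negative than the bound; clamping that case is left to the caller in this sketch.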
* Check whether nvcc supports -Wstrict-aliasing before adding the compiler flag in CMakeLists.txt.
* Removed reinterpret_cast to not cause strict aliasing violation errors or require -Wno-strict-aliasing when it is not available.
* initial draft for kernel invoke api
* initial implementation of kernel invoker
* [eager] fix build on Mac
* [eager] increment input name in kernel invoker
* temp fix for type in eager mode
* use global default log manager
* rollback the previous commit since it break linux build
* Revert "rollback the previous commit since it break linux build"
This reverts commit 58c2c3423a.
* Eager Mode: fix linking on macOS
* optimizer_execution_frame: ignore unused lambda capture (model_path)
* fix link issue
* ORTInvoker: set correct input argument tensor element proto types
Do not set a type proto on output arguments to allow ORT to deduce them
* ORTInvoker: create only one logging manager
* Minor fix to set execution provider type correctly. (#7000)
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
* training fix
* support configuring output ML values in the frame, so we can use them to implement in-place updates
* Fix range loop error while building. (#7087)
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
* Conditionally link with nsync_cpp if not windows. (#7151)
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
* Fixed initialization order in ORT kernel invoker (#7342)
* Updated constructor of ort_kernel_invoker to take a logger.
* Changed linking order.
* Updated test.
* add inplace ut
* add build option
* Update include/onnxruntime/core/eager/ort_kernel_invoker.h
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
* resolve comments in pr
* fix build break; merge from master
* fix build break
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Aaron Bockover <abock@microsoft.com>
Co-authored-by: Chandru Ramakrishnan <41447659+chandru-r@users.noreply.github.com>
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>