* Check whether nvcc supports -Wstrict-aliasing before adding the compiler flag in CMakeList.txt.
* Removed reinterpret_cast to not cause strict aliasing violation errors or require -Wno-strict-aliasing when it is not available.
* initial draft for kernel invoke api
* initial implementation of kernel invoker
* [eager] fix build on Mac
* [eager] increment input name in kernel invoker
* temp fix for type in eager mode
* use global default log manager
* rollback the previous commit since it break linux build
* Revert "rollback the previous commit since it break linux build"
This reverts commit 58c2c3423a.
* Eager Mode: fix linking on macOS
* optimizer_execution_frame: ignore unused lambda capture (model_path)
* fix link issue
* ORTInvoker: set correct input argument tensor element proto types
Do not set a type proto on output arguments to allow ORT to deduce them
* ORTInvoker: create only one logging manager
* Minor fix to set execution provider type correctly. (#7000)
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
* training fix
* support config output ml values in frame, so we can use it to implement inplace update
* Fix range loop error while building. (#7087)
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
* Conditionally link with nsync_cpp if not windows. (#7151)
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
* Fixed initialization order in ORT kernel invoker (#7342)
* Updated constructor of ort_kernel_invoker to take a logger.
* Changed linking order.
* Updated test.
* add inplace ut
* add build option
* Update include/onnxruntime/core/eager/ort_kernel_invoker.h
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
* resolve comments in pr
* fix build break;merge from master
* fix build break
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Aaron Bockover <abock@microsoft.com>
Co-authored-by: Chandru Ramakrishnan <41447659+chandru-r@users.noreply.github.com>
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
Resize is spec'd to ignore the "roi" tensor in certain modes. For some reason, converters are specifying an arbitrary value for this tensor, even though it's optional.
This makes the graph partitioner skip a check for empty shape dimensions for tensors such as this, which the DML kernel registers as consuming as CPU inputs. Otherwise, the node is not included in DML graph partitions, because the DML graph doesn't handle empty dimensions.
Related work items: #32221164
A model from one of our partners regressed with a failure to evaluate due to the addition of strided 64-bit emulation in the DML EP for the Cast operator. Specifically, the model uses a Cast from int32 to int64 to produce the input shape to a Reshape node. When supplied with a shape dimension of -1 (int32 0xffffffff), the strided emulation in Cast ends up producing an int64 result of 0x00000000ffffffff. This is then fed into the Reshape operator, where it produces an incorrect tensor shape and a failure during evaluation.
Generally speaking we assume that using strided 64-bit emulation is safe if a node's inputs came from the DML EP itself. This isn't true in the general case for Cast, however - casting negative signed values can and will produce incorrect outputs with strided emulation.
After this change, Cast nodes with 64-bit types will fall back to CPU unless running on a GPU that native supports 64-bit datatypes.
Related work items: #31768166
* Include ORT format model conversion scripts and infrastructure in ORT python package.
- tweak existing script setup so it can be easily run directly and from the ORT python package
Add config file and readme for Android minimal build package
Update ORT Mobile doco
Disable warning if 'all' optimizations are enabled but NCHWc transformer is excluded (device specific optimizations don't apply in this scenario so the warning is moot).
* Address PR comments
fix crash when DeQuantizeLinear's output is graph output
propagate only when scale and zp are scalar.
fix bug for is_modified= is_modified || TryCancelOutDQQPair(graph, dq_node, q_node); in which TryCancelOutDQQPair wouldn't be invoked if is_modified is true
* check in early stop search as separate type
* rename to beam search configurations
* update do sample configuration flag help
* rename to configurable search step
* add option groups
* add more unit tests
Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>
This patch helps to set architecture as power, when processor
check output matches ppc64le*.
Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
**Description**:
- Fix SequenceInsert with last position, which is equal to the current sequence length.
- Implement Identity to support sequence input for opset 14.
**Motivation and Context**
- Required to export Huggingface/transformers T5 with beam search.
* Update symbolic_shape_infer.py
don't rely on static code infer in _infer_Squeeze_
* checking if dorpped axes might be =! 1
* Checking opset. Logging assumption that symbolic dimensions are unequal to 1.
* more checks
* LayerNorm function body v1
* LayerNorm function body
* layernorm function test
* Minor fixes
* Fix signed unsigned comparison
* Move contrib ops test
* Handle optional output parameters
* Add test case for optional outputs
* Handle float16 random generation
* Address PR feedback