* Merged PR 5195856: Fix broken cases of zero-size tensors in Cast/Reduce
MaskRCNN failed when `Cast` tried to execute `Xor` on an empty tensor (one with a zero in its dimensions). This is perfectly legal and should be treated as a nop.
Ultimately DML itself should treat this case as a nop, just as C's `memcpy` treats a count of 0 as a nop. For now I'm only addressing it in ORT, because enabling it in DML would require updating more operators to stay consistent (we should probably add a flag to tensor validation so operators can be opted in gradually).
Corresponding WindowsAI PR: https://microsoft.visualstudio.com/WindowsAI/_git/WindowsAI/pullrequest/5195850
Related work items: #27469839, #28761382
* Merged PR 5201369: Remove copy of initializers added in DMLXP refactor
When used in ORT, the common method shouldn't copy initializer data and return the copy.
Related work items: #29514403
Co-authored-by: Justin Stoecker <justoeck@microsoft.com>
Co-authored-by: Jeff Bloomfield <jeffbloo@microsoft.com>
* [java] Fixing the buffer semantics.
* Renaming bufferCapacity to bufferRemaining.
* Adding a cast to char* so the pointer arithmetic works on Windows.
* Rework broadcasting setup to decrease binary size. Push all the type-specific code down and separate out the broadcasting/parallelization logic.
Binary size reductions:
element_wise_ops: 521.0 KB -> 268.8 KB
where: 25.8 KB -> 17.3 KB
qlinear_binary_op: 28.1 KB -> 12.8 KB
* Place shape related nodes in CPU
* visit candidates in topological order
* Make CPU node placement a utility function
* skip placing on CPU if the data type is float16 or bfloat16
* Allow sharing of initializers between sessions.
* Allow sharing of initializers between sessions (2).
* Add test for C#
* Add test for C#; address PR comments
* Address PR comments
Moved AddInitializer logic to internal session options
Added tests for owned buffer
Clarified documentation
Fix bug where the memory info, rather than the device, was being compared
* Fix test
* Fix training build
* Add version 5 end marker and version 6 start marker; add scenario and usage examples.
* bias softmax kernel
* bias softmax kernel
* remove debug comments
* remove debug comment
* Windows build doesn't handle unary minus on an unsigned type
* int64 => int conversion treated as an error
* only support CUDA
* add bias softmax fusion tests
* PR comments
* more PR comments
* use MLTypeCallDispatcher
* break function into pieces
* add loop unroll and add to list for inference as well
* use std::min and move operator==
* revert std::min (doesn't work in the CI pipeline) and fix int to size_t error
* pr comments
* fixes for Windows CI
* fix for Windows CI
* pr comments on consistency
* p_model_
* fix formatting and add anonymous namespace
Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* remove shape inference and fix problem saving large models
* remove unnecessary import
* refine code and add external format for quantize_qat
* remove initializers in tensors_to_calibrate
* small refine
Co-authored-by: t-yguo <t-yguo@microsoft.com>
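The zero-size handling from PR 5195856 (treat an empty tensor as a nop, the way `memcpy` treats a count of 0) can be illustrated with a small sketch. This is not ORT or DML code: `element_count`, `cast_with_empty_early_out`, and the `dispatched` launch log are hypothetical names used only to show the early-out, and plain Python lists stand in for device tensors.

```python
from math import prod
from typing import Callable, List, Sequence, Tuple


def element_count(shape: Sequence[int]) -> int:
    """Number of elements in a tensor of the given shape.

    Any zero dimension makes the count 0; a scalar (shape ()) has 1 element.
    """
    return prod(shape)


def cast_with_empty_early_out(
    values: List[float],
    shape: Tuple[int, ...],
    convert: Callable[[float], object],
    dispatched: List[str],
) -> Tuple[List[object], Tuple[int, ...]]:
    """Hypothetical Cast wrapper that skips the kernel for zero-size tensors.

    `dispatched` records kernel launches so the early-out can be observed.
    """
    if len(values) != element_count(shape):
        raise ValueError("value count does not match shape")
    if element_count(shape) == 0:
        # Empty input is legal: like memcpy with a count of 0, there is
        # nothing to read or write, so return an empty output of the same shape.
        return [], shape
    dispatched.append("cast_kernel")
    return [convert(v) for v in values], shape
```

Calling it with shape `(2, 0, 3)` returns an empty output with the same shape and never records a kernel launch, while a non-empty input is converted element by element.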
This updates the NCHWc transformer so that it no longer interferes with quantized convolution models, based on observations from internal models. The tensor type for MaxPool must be float, and the input to GlobalAveragePool/GlobalMaxPool must already be in NCHWc format.
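The two NCHWc restrictions above can be expressed as a simple eligibility check. This is a sketch under stated assumptions, not the transformer's actual implementation: `PoolNode` and `should_convert_to_nchwc` are hypothetical, element types are represented as strings, and any operator other than the three pooling ops is simply treated as out of scope.

```python
from dataclasses import dataclass


@dataclass
class PoolNode:
    """Hypothetical, minimal view of a graph node for this sketch."""
    op_type: str
    elem_type: str        # element type of the input tensor, e.g. "float" or "uint8"
    input_is_nchwc: bool  # True if the producer already emits NCHWc layout


def should_convert_to_nchwc(node: PoolNode) -> bool:
    """Apply the two restrictions described above; other ops are out of scope here."""
    if node.op_type == "MaxPool":
        # Only float MaxPool is converted; quantized (e.g. uint8) MaxPool is left alone.
        return node.elem_type == "float"
    if node.op_type in ("GlobalAveragePool", "GlobalMaxPool"):
        # Only convert when the input is already in NCHWc format.
        return node.input_is_nchwc
    return False
```

Keeping the check per operator makes it easy to see that a quantized path (uint8 MaxPool, or a global pool fed by a tensor that was never converted to NCHWc) is left untouched.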