* Add asm statement to model.mm to force linker to link against CoreML.Framework.
Update targets.xml as per Rolf's suggestions
* Remove explicit numpy version from macos build. We don't specify it for other CIs and the version specified doesn't have a pre-built 3.10 wheel. This leads to the CI attempting to build numpy which fails.
Fix arithmetic overflow warning. Suggested fix by static analysis tool
Arithmetic overflow: Using operator '+' on a 4 byte value and then casting the result to a 8 byte value.
Cast the value to the wider type before calling operator '+' to avoid overflow (io.2).
* Fix SAME_UPPER/SAME_LOWER (auto_pad attribute) in ConvTranspose
* Bump ONNX 1.10.2 globally
* load ONNX_VERSION from VERSION_NUMBER
* /
* revert deprecate warning in ORT 1.12
* add a comment about why removing cntk_simple_seg
* correct the implem in DML as well
(1) Modify some lines to fit line length limit 120
(2) Adjust parameter order of LaunchAttentionKernel
(3) Format code with Clang-Format in VS Code
(4) Fix spelling errors
* Make ORT as Pytorch JIT backend
LORT likely doesn't work with aten fallback so we only test LORT in its own CI.
* Revert changes to enable external CUDA allocator. Will add it later.
Revert "Revert changes to enable external CUDA allocator. Will add it later."
This reverts commit d5487f2e193014c805505afae8fb577c53667658.
Fix external allocator
* Relax tolerance and remove commented code
* Print more information in CI
* Fix pointer
* Address comments.
1. Reuse ORT-eager mode's environment.
2. Remove unused ctor.
* Use Pytorch master branch as all PRs are merged
Fix
* Refine based on cpplint feedbacks
* Revert changes to allow custom CUDA allocator in public APIs
* Use torch.testing.assert_close
* Use unittest framework
* Switch docker repo
* Rename *.cpp to *.cc
* Address comments
* Add comment
* Use same pipeline file for eager and lort pipelines
* Address comments
* Add yaml comment
* Fix cmake files
* Address comments
* Rename flags, remove printing code, remove dead comment
* Remove ostream operator<< definitions for TensorShapeProto and TensorProto as they clash with ONNX definitions in onnx/defs/printer.h/cc.
Currently printer.h (unnecessarily) pulls in a number of other ONNX headers which causes naming clashes with parts of ORT. It is also excluded in a minimal build.
Instead convert the onnx::TensorShapeProto to onnxruntime::TensorShape so we use the existing ostream operator<< for TensorShape.
Make GetTensorShapeFromTensorProto consistent with GetTensorShapeFromTensorShapeProto so both return a TensorShape (as the name implies).
QDQ loss debug - Weights Matching
Part 2 of QDQ loss debugging tool: given a float model and its qdq model, return the matching of all weight tensors and their corresponding dequantized weights from the qdq model.
LLVM compiler complains the std::hash<const char*> and suggests std::hash<const void*>. But the intention is to hash the name string instead of the pointer. So use std::hash<std::string> to be explicit.
* add AddBiasTranspose kernel, new format of weights
* Use compact global_q in GEMM
* sequence_index from BxS to S; new stream for copy
* merge input and output pointers in scratch2
* update default benchmark tests
* add new format 0 for weight and bias
* avoid integer overflow
* check gpu memory
* output summary in benchmark
* add logging
* update unit tests with non empty bias value
* add rocblasGemmHelper and rocblasGemmStridedBatchedHelper for Rocm
* use std::variant for synthetic data storage.
* use std::variant to replace TypedCheckpointProperty
* Remvoe shared ptr for checkpoint property
* fix tests
* refine std::variant usage a bit
* remove CheckpointProperty data abstraction
* use InlinedVector and InlinedHashMap if possible
* fix comments
* fix build and test
* fix some comments
* use gsl::span
* fix tests
* refine based on comments
* fix win build
* fix build
* Add build option to link prebuilt TensorRT parser
* Test without the build option to link prebuilt TRTParser
* Minor: update name of build option
* Minor: update name of build option
Debugger for QDQ loss - activation matching
This is the first part of the QDQ debugger tool: activation matching, where we identify and match corresponding activations from the float model and the qdq model. The idea is that during quantization, we have an original float model and a qdq model. The debugger can run the two models side by side using the same input data. By comparing intermediate activations, we can help the model author figure out where the values differ, and take steps to reduce precision loss.