* Code refactor
* fix bug
* modify comment
* modify test for the new ORT TRT cache behavior
* update comment
* rename variable
* fix bug for not having trt context
* Custom parameters (#10964)
* get inputs independently for trtexec
* track one process only
* remove engine and profile files
* change time to commit time
* add runtime option for io binding
* move to commit date
* fixes
* add option for graph optimization
* cleanup docker script
* note second time creation
* allow for parameters to be configured from pipeline at runtime
* uncomment
* include optional arguments at runtime
* post second session creation
* update cmake version
* Revert "update cmake version"
This reverts commit 09a1364eae68610724c8e90eeea777b7ee03f74b.
* Move data format import
* Perf FasterRCNN + MaskRCNN (#11102)
* add faster mask
* fix paths
* add a test scenario: if an engine cache is present, the TRT EP should load the engine cache and run inference
* Revert "Merge branch 'trt_cache_refactor' of https://github.com/microsoft/onnxruntime into trt_cache_refactor"
This reverts commit 8edc574de1ea6055534f33a57b9365c721c2eb29, reversing
changes made to 0c92e5b2b1d453527001fe731ed4ccfc79e6adad.
Co-authored-by: Olivia Jain <oljain@microsoft.com>
Description: Format all python files under onnxruntime with black and isort.
After checking in, we can use .git-blame-ignore-revs to ignore the formatting PR in git blame.
#11315, #11316
Update the code to use OrtApis instead of the old onnxruntime::InferenceSession class, mainly because the old class doesn't support custom ops. We are trying to convert some EPs to custom ops, and hopefully they can continue to leverage this test set.
* increase timeout
* show mac agent info
* Revert "show mac agent info"
This reverts commit a646ebefff8940a3044f1984107856db33319eb8.
* increase timeout in PR test
Prior to this, certain shape and type errors were surfaced only when
the model was using the latest known op set version.
Providing users an explicit option allows for better testing of code
that produces models, which includes unit tests within this repo and
other repos such as the TF-ONNX and PT-ONNX converters.
Remove the previous behavior which seems quite counter-intuitive:
an otherwise identical model with a later op set version should be treated
identically in this regard.
The option defaults to false to avoid causing errors for users that
rely on the previous permissive behavior.
Turned on the strict enforcement by default in OpTester, which revealed a few
disagreements between ORT and ONNX on what the correct output shape should
be.
Fix a shape inference bug in ReduceSumTraining with noop_with_empty_axes=1,
which the strict enforcement revealed.
Fix TensorOpTest.Unsqueeze_scalar, which was testing negative axes on an
op set version where the op did not actually support negative axes.
Fixes #9506.
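The intent of the strict option can be sketched in a few lines. This is an illustration only, not ORT code: `report_inference_mismatch` and its `strict` parameter are hypothetical stand-ins for the real option, and the sketch assumes the permissive path only warns.

```python
import warnings


def report_inference_mismatch(message, strict=False):
    """Handle a shape/type inference disagreement (hypothetical helper).

    strict=False models the permissive default: warn and keep going
    (an assumption about what "permissive" means here).
    strict=True turns the same disagreement into a hard error.
    """
    if strict:
        raise ValueError(message)
    warnings.warn(message)
```

Defaulting `strict` to `False` keeps existing callers working, while tests can opt in to the hard error.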
* Update model test
* update comment
* create map to hold OnnxModelInfo so test doesn't need to reload the model again
* revert the code and use GTEST_SKIP() to skip test
* fix bug
* revert LATEST_ONNX_OPSET_SUPPORTED_BY_TENSORRT
TODO: Someone should investigate why the AARCH64 build takes 3+ hours and reduce it if possible. It is presumably using an emulator, given that the x64 build with the same arguments takes 13 minutes.
Move a check for a graph output (for the partition) prior to iterating the downstream nodes to avoid trying to get a NodeUnit for a node that is outside of the partition.
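The reordering can be illustrated with a toy model. Everything here is hypothetical (the function name, the dict-based graph, the `node_units` map); it only assumes that a node-unit lookup is valid for nodes inside the partition and fails for nodes outside it.

```python
def downstream_node_units(output, partition_outputs, consumers, node_units):
    """Return the node units consuming `output`, or None if it leaves the partition.

    `node_units` only covers nodes inside the partition, so looking up
    an outside node raises KeyError.
    """
    # Check for a partition output first: its consumers may live outside
    # the partition and have no entry in node_units.
    if output in partition_outputs:
        return None
    # Safe to iterate now: every consumer is inside the partition.
    return [node_units[node] for node in consumers.get(output, [])]
```

With the check placed after the loop instead, the second case in a call such as `downstream_node_units("relu_out", {"relu_out"}, ...)` would attempt a lookup for a node outside the partition.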
onnx.shape_inference.infer_shapes only works for models smaller than 2GB, while onnx.shape_inference.infer_shapes_path works for models of any size. This PR replaces infer_shapes with infer_shapes_path.
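A minimal sketch of the path-based call, assuming the `onnx` package is installed. The wrapper `infer_shapes_on_disk` and the `_inferred` naming helper are hypothetical and not part of this PR; only `infer_shapes_path` itself comes from onnx.

```python
from pathlib import Path


def inferred_model_path(model_path):
    """Hypothetical naming helper: model.onnx -> model_inferred.onnx."""
    path = Path(model_path)
    return path.with_name(path.stem + "_inferred" + path.suffix)


def infer_shapes_on_disk(model_path):
    """Run shape inference file-to-file; the model is passed by path
    rather than as an in-memory object."""
    # Third-party import, done lazily so the helper above needs only stdlib.
    from onnx import shape_inference

    output_path = inferred_model_path(model_path)
    shape_inference.infer_shapes_path(str(model_path), str(output_path))
    return output_path
```

Writing the result to a separate file leaves the original model untouched; callers can load the returned path afterwards if they need the inferred model in memory.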
* Disable training code in DNNL LayerNorm code
The capability code already does not claim the LayerNorm and
SkipLayerNorm that require more than one output. However,
building with training enabled was causing issues.
The training specific code has been removed even when building with
training enabled.
Signed-off-by: George Nash <george.nash@intel.com>
* Fix for DNNL FusedMatMul op.
The bug was in the transpose code.
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* Use agreed upon memory format type when running Pooling Gradient in dnnl ep
The dnnl ep does not currently have a way to pass memory_format information
from the forward pooling primitive to the backward pooling primitive.
This change explicitly sets the memory_format to match that of Onnxruntime
for both the forward and backward pooling code. This prevents using a mismatched
memory format that could result in an `unimplemented` error from dnnl ep.
Signed-off-by: George Nash <george.nash@intel.com>
* Update dnnl ep to use OneDNN v2.6
Do not run ReduceInfLogSum on the kDnnlExecutionProvider due to a
calculation bug when doing Log or infinity values. The fix for this
issue will be part of the next OneDNN release.
Signed-off-by: George Nash <george.nash@intel.com>
* Update PrintMemory function in dnnl ep
This modification can be used to enable/disable memory printing
for dnnl ep developers. This is considered a developer-only feature
and is disabled by default. It must be enabled and the code recompiled
before use.
Even if it is enabled, it will not actually print any memory because
the developer needs to take the extra step of specifying the memory
that will be printed to the screen.
Signed-off-by: George Nash <george.nash@intel.com>
* Update binary ops to run on intel GPU when using dnnl ep
Binary ops (i.e. Add, Div, Mul, and Sub) were updated to no longer
call GetMemoryAndReshape. In the past this call would move the memory from
CPU to the GPU. The extra call is no longer needed since that is taken
care of by the GetMemoryInOrtFormat call. Removing the GetMemoryAndReshape
call prevents copying the memory to GPU twice.
Signed-off-by: George Nash <george.nash@intel.com>
Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
* scale input
* more condition check
* alternative
* per comments
* fix comments
Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
* add trt node list consolidation
* add more log
* fix typo
* separate cycle detection and removal
* update
* change function name
Co-authored-by: Ubuntu <azureuser@orttrtlinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
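Keeping cycle detection and cycle removal as separate steps can be sketched as below. This is a generic illustration, not the TensorRT EP code: `find_cycle`, `remove_cycles`, the adjacency-dict graph, and the policy of dropping the last node of each detected cycle are all assumptions.

```python
def find_cycle(graph):
    """Detection only: return the nodes of one cycle, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph[node]:
            if nxt not in color:
                continue  # edge to a node that was already removed
            if color[nxt] == GRAY:
                return stack[stack.index(nxt):]  # back edge closes a cycle
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def remove_cycles(graph):
    """Removal only: drop one node per detected cycle until none remain."""
    graph = {node: set(succ) for node, succ in graph.items()}
    removed = []
    while (cycle := find_cycle(graph)) is not None:
        victim = cycle[-1]  # assumed policy: drop the last node on the cycle
        del graph[victim]
        removed.append(victim)
    return graph, removed
```

Because `find_cycle` has no side effects, it can be called on its own for logging or tests, and the removal policy can change without touching the detection logic.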
* Add cffconvert.yml to validate CITATION.cff
* Fix CITATION.cff by removing duplicate title and correcting the license
Co-authored-by: Abel Soares Siqueira <abel.s.siqueira@gmail.com>
Some micro-architectures of power-efficient cores in ARMv8 systems have narrow 64-bit load/store resources, which require specialized computing kernels in MLAS. We leverage the pytorch CPUinfo package for detecting these cores. Unfortunately, the CPUinfo package does not work on Windows.
This commit implements ARM64 micro-architecture detection.