The ARM A55 micro-architecture (with dot product instructions), like the A53, is widely used for the little cores in big.LITTLE configurations. The A55 has narrower load/store hardware: a 128b load instruction blocks the pipeline for 2 whole cycles, during which no other instruction can execute. A 64b load instruction, on the other hand, can be dual-issued with many other instructions.
This change adds a Symmetric QGEMM kernel for the A55 micro-architecture, in which we replace
    ldr q4,[x1],#16
with
    ldr d4,[x1],#8
    ldr x11,[x1],#8
    ins v4.d[1],x11
so that the memory load cycles can be hidden behind compute cycles in the kernel.
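
The equivalence of the two load sequences can be checked with a small portable model. This is only an illustrative sketch in Python, not part of the kernel: the function names `load_q` and `load_split` are hypothetical, and a byte buffer plus an integer offset stand in for memory and the `x1` pointer.

```python
import struct

def load_q(buf, offset):
    """Model `ldr q4,[x1],#16`: one 128-bit load, pointer advances by 16."""
    return buf[offset:offset + 16], offset + 16

def load_split(buf, offset):
    """Model the three-instruction replacement sequence."""
    # ldr d4,[x1],#8 : low 64 bits go to the vector register
    (low,) = struct.unpack_from("<Q", buf, offset)
    offset += 8
    # ldr x11,[x1],#8 : high 64 bits go to a general-purpose register
    (high,) = struct.unpack_from("<Q", buf, offset)
    offset += 8
    # ins v4.d[1],x11 : insert the high half into the upper lane of v4
    return struct.pack("<QQ", low, high), offset

data = bytes(range(32))
wide, off_wide = load_q(data, 0)
split, off_split = load_split(data, 0)
print(wide == split, off_wide, off_split)
```

Both paths yield the same 16 bytes and the same final pointer; the benefit on A55 comes only from how the narrower loads are scheduled, which this model does not capture.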
Co-authored-by: Chen Fu <fuchen@microsoft.com>
* change BeamSearch op to support encoder decoder model
* check model_type and decoder attribute
* fix
* update comments
* warn shape inference issue with onnx v1.11 or T5
* skip parity test when temperature != 1.0
* fix build
* Update dnnl Add, Mul, Sub, Div ops to handle scalar values
Signed-off-by: George Nash <george.nash@intel.com>
* Add additional scalar support for dnnl execution provider
This will add scalar support for:
Eltwise operators: Abs, Elu, Exp, LeakyRelu, Log, Relu, Round,
Sigmoid, Softplus, Sqrt, and Tanh
Gelu operators: BiasGelu, FastGelu, and Gelu
Softmax operator
Signed-off-by: George Nash <george.nash@intel.com>
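
As a rough illustration of what scalar handling involves, the sketch below models tensors as a shape tuple plus a flat list of values. It is an assumption-laden sketch, not the dnnl execution provider code: the helper names are hypothetical, and the approach shown (treat a rank-0 tensor as a single value, repeat it to match the other operand, and restore the rank-0 shape on output) is one plausible way to do it.

```python
import operator

def is_scalar(shape):
    # A rank-0 tensor has an empty shape and exactly one value.
    return len(shape) == 0

def run_eltwise(op_1d, shape, values):
    """Apply an op that works on flat value lists, preserving a rank-0 shape."""
    out = op_1d(values)
    return shape, out

def binary_with_scalar(fn, a_shape, a, b_shape, b):
    """Elementwise binary op where either operand may be a scalar."""
    # Repeat the single scalar value to match the other operand (assumed approach).
    if is_scalar(a_shape) and not is_scalar(b_shape):
        a, a_shape = a * len(b), b_shape
    elif is_scalar(b_shape) and not is_scalar(a_shape):
        b, b_shape = b * len(a), a_shape
    if a_shape != b_shape:
        raise ValueError("this sketch only handles scalar broadcasting")
    return a_shape, [fn(x, y) for x, y in zip(a, b)]

def relu_1d(values):
    return [max(0.0, v) for v in values]

added = binary_with_scalar(operator.add, (3,), [1.0, 2.0, 3.0], (), [10.0])
relu_scalar = run_eltwise(relu_1d, (), [-2.0])
print(added, relu_scalar)
```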
This code is valid only when -mcpu is set to target POWER9
or above. A POWER8-compatible version was created as well, but it
was not tuned for performance.
* get inputs independently for trtexec
* track one process only
* remove engine and profile files
* change time to commit time
* add runtime option for io binding
* move to commit date
* fixes
* add option for graph optimization
* cleanup docker script
* include remaining changes
* choose graph optimization option
* add space in option
* Add micro-benchmark for FastGelu
* Delete the bert-base case, as it is very similar to the bert-large one.
* Add argument parsing and more user-friendly provider type assertion.
* Change storage container, simplify build definition parameters.
* Remove explicit version from Objective-C docs.
* Increase timeout.
* Use real storage account.
* Get static website URL with az cli.
* Add android package build settings for full build
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Add restriction to first usage in allocation planner
* change phrases
* add UT
Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
* POWER10: QGEMM optimization
This patch makes use of the POWER10 MMA feature for the QGEMM function.
This optimization includes signed and unsigned cases. Tested with
gcc11 and clang-14; there are no new failures.
* Changes as per review comments
Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* add executor option (vm or graph) and support virtual machine methods
* nullptr check for compile and run methods (see also PR#10211 from microsoft:onnxruntime)
* get output shapes for VM
* remove run_with_benchmark. remove run methods from python api, get it from native side
* implement get outputs method for VM
* support multiple input for VM
* update python logging and exception
* small fix
* update tvm with patch for VM API
* update nhwc transformations for TVM EP
* add data alignment check and support set_input_zero_copy for GE in TVM EP
* fix logger name
* return to apache/tvm with VM fixes instead of local dev branch
* hide customized tvm logger until the issue is resolved. fix tvm warning related to target_host
* flake8 fix
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* skip browserstack test at release pipeline
* Update web-ci-pipeline.yml for Azure Pipelines
* pool name as a parameter to run at lotus
* create a packaging pipeline for web
* Update web-packaging-pipeline.yml for Azure Pipelines
* make web-ci-pipeline as a template
* change a parameter name checking a pipeline
* make a pool name changeable for react native pipeline
* disable code sign validation for react native
* fix react native package.json publish
* fix indentation
* remove unnecessary comment
* test onnxruntime-common package publish
* ts and js files use lf as eol for windows
* use Linux style of ending line break
* change newLine at only tsconfig.json
* restore a commented code
* fix git restore directory for npm packaging
* fix a typo
* force eol to lf on windows for js directory in CI
* Add microbench to benchmark single operators.
* Move to tool directory; separate data generation from io binding.
* Refactor.
* Clean up.
* Use precision instead for extensibility.
* Refactor the create_io_binding function to take in torch tensors
instead of numpy arrays; this reflects more accurately what
the function does, because it is torch tensors that get bound.
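
The overall shape of a single-operator micro-benchmark, with data generation kept separate from the timed region, might look like the sketch below. It is a pure-Python stand-in under stated assumptions, not the tool added here: `generate_inputs` and `benchmark` are hypothetical names, and `fast_gelu` uses the common tanh approximation of GELU rather than calling the actual operator.

```python
import math
import random
import time

def fast_gelu(x):
    # tanh approximation of GELU (assumed formula for illustration)
    return 0.5 * x * (1.0 + math.tanh(0.7978845608028654 * (x + 0.044715 * x ** 3)))

def generate_inputs(n, seed=0):
    # Data generation is kept outside the timed region.
    rng = random.Random(seed)
    return [rng.uniform(-3.0, 3.0) for _ in range(n)]

def benchmark(op, inputs, repeats=5):
    # Report the best of several runs to reduce noise from other processes.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            op(x)
        best = min(best, time.perf_counter() - start)
    return best

inputs = generate_inputs(10_000)
elapsed = benchmark(fast_gelu, inputs)
print(f"fast_gelu: {elapsed * 1e3:.3f} ms for {len(inputs)} elements")
```

Swapping in another operator only requires passing a different callable to `benchmark`; the input generator and timing loop stay unchanged.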