This code is valid only when -mcpu is set to utilize POWER9 technology
or above. A compatible code for POWER8 was created as well, but it
was not tuned for performance.
* get inputs independently for trtexec
* track one process only
* remove engine and profile files
* change time to commit time
* add runtime option for io binding
* move to commit date
* fixes
* add option for graph optimization
* cleanup docker script
* include remaining changes
* choose graph optimization option
* add space in option
* Add micro-benchmark for FastGelu
* Delete the bert-base case, as it is very similar to the bert-large one.
* Add argument parsing and more user-friendly provider type assertion.
* Change storage container, simplify build definition parameters.
* Remove explicit version from Objective-C docs.
* Increase timeout.
* Use real storage account.
* Get static website URL with az cli.
* Add android package build settings for full build
Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Add restriction to first usage in allocation planner
* change phrases
* add UT
Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
* POWER10: QGEMM optimization
This patch makes use of POWER10 MMA feature for QGEMM function.
This optimization includes signed and unsigned cases.Tested and
there are no new failures with gcc11 and clang-14.
* Changes as per review comments
Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
* add executor option (vm or graph) and support virtual machine methods
* nullptr check for compile and run methods (see also PR#10211 from microsoft:onnxruntime)
* get output shapes for VM
* remove run_with_benchmark. remove run methods from python api, get it from native side
* get outputs method for VM was implemented
* support multiple input for VM
* update python logging and exception
* small fix
* update tvm with patch for VM API
* update nhwc transformations for TVM EP
* add data alignment check and support set_input_zero_copy for GE in TVM EP
* fix logger name
* return back to apache/tvm with VM fixes instead of local dev branch
* hide customized tvm logger while issue is not resolved. fix tvm warning related to target_host
* flake8 fix
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
* skip browserstack test at release pipeline
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* pool name as a parameter to run at lotus
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* Update web-ci-pipeline.yml for Azure Pipelines
* create a packaging pipeline for web
* Update web-packaging-pipeline.yml for Azure Pipelines
* make web-ci-pipeline as a template
* make web-ci-pipeline as a template
* make web-ci-pipeline as a template
* make web-ci-pipeline as a template
* change a paramter name checking a pipeline
* make a pool name changable for react native pipeline
* disable code sign validation for react native
* fix react native package.json publish
* fix indentation
* remove unnecessary comment
* test onnxruntime-common package publish
* ts and js files use lf as eol for windows
* use Linux style of ending line break
* change newLine at only tsconfig.json
* restore a commented code
* fix git restore directory for npm packaging
* fix a typo
* force eol to lf on windows for js directory in CI
* Add microbench to benchmark single operators.
* Move to tool directory; seperate data genration from io binding.
* Refector.
* Clean up.
* Use precision instead for extensibility.
* Refactor the create_io_binding function to take in torch tensors
instead of numpy arrays; this reflects more accurately what
the function does, because it is torch tensors that got bound.
* Moved the init earlier to keep the cache coherent
* Move setting of w_desc later, and zero shape check later to catch all cacheable changes.
* Add comment
Fix CUDA 10.2 compile error due to inlined_containers.h inclusion
into a common CUDA header.
Use NumberOfNodes() to reserve space in a hash table
Prefer separate call to reserve() rather than passing in the
hash table constructor. They have somewhat different meaning.