onnxruntime/docs
Viswanath Boga afce0e2543
Attention kernel update to handle different Q,K,V hidden sizes (#8039)
* changes working to convert akv nodes

* changes to replace nodes

* changes to accommodate qkv hidden sizes as attributes

* kernel to accept qkv_hidden_size attributes

* Working till compute for varied dimension, todo applyattention()

* changes to make all regression tests work

* inference running successfully without prepack

* success inference with pre-pack weights

* add test for diff sizes

* bias shape need not be a multiple of 3

* get the output_hidden_size from input

* infer output shape from input

* merge with master

* cleaning up files that got merged wrong

* accuracy at accepted level

* added unit test case for different dimensions

* all unit tests passing

* packed weights working for attention

* prepacked weights working

* added test case for newly added extra qk input

* updated unit test to test only extra add qk

* fixing build error

* removing few debugs

* reverting test changes

* all python test passing

* cleaning up

* new unit test added, major clean up of code

* removed extra code

* minor

* minor fix to tests

* prepack weights code cleaned up

* compacted compute() in attention.cc

* reformat compute()

* making a parameter T

* adding 3 q,k,v buffers in all cases

* fixing build

* running tests only on cpu

* Updating docs

* trigger ci builds

* Addressing comments in PR

* addressing some more comments

* get add_qk_str from add_qk node directly

* updating docs, added extra check to verify attn inputs

* Optimized the extra add by parallelizing

* added attention_shape to symbolic_shape_infer.py

* minor refactoring to address comments
2021-07-19 12:21:33 -07:00
execution_providers/images Remove docs that have been migrated to https://onnxruntime.ai/docs (#6225) 2021-02-05 18:09:27 -08:00
images Expand the documentation on using compiling EPs with a minimal build (#5893) 2020-12-02 09:12:36 +10:00
python Upgrade tf 2.4.1 to 2.4.2 for component governance (#8036) 2021-06-14 09:30:58 -07:00
ABI_Dev_Notes.md Fix some typos. (#3582) 2020-04-18 14:18:05 -07:00
Android_testing.md Removed BUILD.md from master as source now lives in gh-pages (#6709) 2021-02-19 11:34:21 -08:00
C_API_Guidelines.md Add C API Guidelines document (#5686) 2020-11-04 18:50:31 -08:00
cmake_guideline.md Add a doc for cmake (#1524) 2019-08-06 07:51:53 -07:00
Coding_Conventions_and_Standards.md Change onnxruntime::make_unique to std::make_unique (#7502) 2021-04-29 17:04:53 -07:00
ContribOperators.md Attention kernel update to handle different Q,K,V hidden sizes (#8039) 2021-07-19 12:21:33 -07:00
FAQ.md Add FAQ page (#3324) 2020-05-06 15:43:32 -07:00
How_To_Update_ONNX_Dev_Notes.md CGManifest - add training entries and generate entries for submodules. (#3933) 2020-05-15 13:34:18 -07:00
Model_Test.md Renaming MKL-DNN as DNNL (#2515) 2019-12-03 07:34:23 -08:00
NotesOnThreading.md Support multi-loop parallel sections, use multi-loop sections in GRU (#5602) 2020-11-10 12:24:57 +00:00
ONNX_Runtime_Server_Usage.md Update docs/ONNX_Runtime_Server_Usage.md (#7818) 2021-05-26 16:17:20 -07:00
onnxruntime_dependencies.dot Update dependencies graph 2020-04-17 07:38:45 -07:00
onnxruntime_dependencies.png Update dependencies graph 2020-04-17 07:38:45 -07:00
onnxruntime_extensions.md Update submodule onnxruntime-extensions. (#8282) 2021-07-13 10:21:11 +08:00
OperatorKernels.md Attention kernel update to handle different Q,K,V hidden sizes (#8039) 2021-07-19 12:21:33 -07:00
ORTMobilePackageOperatorTypeSupport.md Add supported operators/types documentation for the ORT Mobile package (#7807) 2021-05-26 15:57:40 +10:00
PR_Guidelines.md Add guidelines for writing a good PR. (#3830) 2020-05-05 16:28:21 -07:00
Privacy.md [C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481) 2020-10-21 10:32:13 -07:00
Python_Dev_Notes.md Changes related to the release binaries requiring Visual C++ 2019 runtime (#3871) 2020-05-12 17:07:06 -07:00
Reduced_Operator_Kernel_build.md Support required types when excluding typed registrations (#6871) 2021-03-08 08:22:07 -08:00
ReleaseManagement.md Updated TPN for OpenMPI and cleanup (#3932) 2020-05-14 11:42:44 -07:00
Roadmap.md Doc updates for 1.5 (#5302) 2020-09-30 09:53:33 -07:00
Server.md Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172) 2020-12-18 02:00:42 -08:00
Versioning.md Update Version.md (#8021) 2021-06-13 18:52:40 +02:00
WinML_principles.md Winml_principles_change (#5727) 2020-11-12 10:39:24 -08:00