onnxruntime/docs
Ye Wang 6856619b18
Decoder Attention CUDA Op (#9792)
* add kernel interface

* register kernel

* add self/cross qkv projection without cache

* add LaunchTransQkv2 for (S,B,X,N,H) -> (X,B,N,S,H)

* refactor ConcatPastToPresent

* DecoderQkvToContext interface

* q,k,v buffer and cache as output

* qk, pv and transctx

* fix compiler error on linux machine

* key_padding_mask

* add test_parity file (not yet runnable)

* add partial unittest

* change some attributes to inputs

* --gen_doc

* change kernel interface, add more tests

* more parity tests

* fix test

* fix typo

* transpose optimizer has bug. remove it temporarily

* add input shape checks

* add type/shape inference

* fix cache shape check

* fix rocm build failure

* fix rocm build error

* review comments

* review comments
2021-11-19 19:25:36 -08:00
c_cxx Fix S360 issue by using "use strict" for javascript code. (#9128) 2021-09-20 20:32:44 -07:00
execution_providers/images Remove docs that have been migrated to https://onnxruntime.ai/docs (#6225) 2021-02-05 18:09:27 -08:00
images API Documentation (#8948) 2021-09-09 22:04:51 -07:00
python fixing pypi pipeline for release (#9716) 2021-11-10 17:33:51 -08:00
ABI_Dev_Notes.md Fix some typos. (#3582) 2020-04-18 14:18:05 -07:00
Android_testing.md Removed BUILD.md from master as source now lives in gh-pages (#6709) 2021-02-19 11:34:21 -08:00
C_API_Guidelines.md Add C API Guidelines document (#5686) 2020-11-04 18:50:31 -08:00
cmake_guideline.md
Coding_Conventions_and_Standards.md Change onnxruntime::make_unique to std::make_unique (#7502) 2021-04-29 17:04:53 -07:00
ContribOperators.md Decoder Attention CUDA Op (#9792) 2021-11-19 19:25:36 -08:00
FAQ.md Add FAQ page (#3324) 2020-05-06 15:43:32 -07:00
How_To_Update_ONNX_Dev_Notes.md Remove onnxruntime/core/protobuf (#8617) 2021-08-10 09:36:27 -07:00
Model_Test.md
NotesOnThreading.md Support multi-loop parallel sections, use multi-loop sections in GRU (#5602) 2020-11-10 12:24:57 +00:00
ONNX_Runtime_Server_Usage.md Update docs/ONNX_Runtime_Server_Usage.md (#7818) 2021-05-26 16:17:20 -07:00
onnxruntime_dependencies.dot Update dependencies graph 2020-04-17 07:38:45 -07:00
onnxruntime_dependencies.png Update dependencies graph 2020-04-17 07:38:45 -07:00
onnxruntime_extensions.md Enable linking in exception throwing support library when build onnxruntime wasm. (#8973) 2021-09-10 22:09:16 +08:00
OperatorKernels.md Decoder Attention CUDA Op (#9792) 2021-11-19 19:25:36 -08:00
ORTMobilePackageOperatorTypeSupport.md Add supported operators/types documentation for the ORT Mobile package (#7807) 2021-05-26 15:57:40 +10:00
PR_Guidelines.md Add guidelines for writing a good PR. (#3830) 2020-05-05 16:28:21 -07:00
Privacy.md [C# and Python APIs] Expose knobs to enable/disable platform telemetry collection (#5481) 2020-10-21 10:32:13 -07:00
Python_Dev_Notes.md Changes related to the release binaries requiring Visual C++ 2019 runtime (#3871) 2020-05-12 17:07:06 -07:00
Reduced_Operator_Kernel_build.md Support required types when excluding typed registrations (#6871) 2021-03-08 08:22:07 -08:00
ReleaseManagement.md Updated TPN for OpenMPI and cleanup (#3932) 2020-05-14 11:42:44 -07:00
Roadmap.md Doc updates for 1.5 (#5302) 2020-09-30 09:53:33 -07:00
Server.md Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172) 2020-12-18 02:00:42 -08:00
Versioning.md Bumping up to 1.10 (#9006) 2021-09-22 16:34:28 -07:00
WinML_principles.md Winml_principles_change (#5727) 2020-11-12 10:39:24 -08:00