onnxruntime/tools
Ye Wang 6856619b18
Decoder Attention CUDA Op (#9792)
* add kernel interface

* register kernel

* add self/cross qkv projection without cache

* add LaunchTransQkv2 for (S,B,X,N,H) -> (X,B,N,S,H)

* refactor ConcatPastToPresent

* DecoderQkvToContext interface

* q,k,v buffer and cache as output

* qk, pv and transctx

* fix compiler error on linux machine

* key_padding_mask

* add test_parity file (not runnable yet)

* add partial unittest

* convert some attributes to inputs

* --gen_doc

* change kernel interface, add more tests

* more parity tests

* fix test

* fix typo

* transpose optimizer has a bug; remove it temporarily

* add input shape checks

* add type/shape inference

* fix cache shape check

* fix rocm build failure

* fix rocm build error

* review comments

* review comments
2021-11-19 19:25:36 -08:00
..
ci_build Decoder Attention CUDA Op (#9792) 2021-11-19 19:25:36 -08:00
doc Add graphviz into Dockerfile images for Python API documentation (#7819) 2021-06-02 16:12:54 -07:00
nuget Enable building winml with --build_nuget (#9632) 2021-11-04 00:42:51 -07:00
perf_util Update mysql-connector-java (#5802) 2020-11-16 14:09:14 -08:00
python Update mobile prebuilt package ops to add support for opset 14 and 15 (#9717) 2021-11-18 10:44:39 +10:00