Commit graph

357 commits

Author SHA1 Message Date
Edward Chen
3bc91c2151
Move reduced ops files into build directory (#10030)
In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.
2021-12-28 19:04:20 -08:00
Vincent Wang
ceb17f82ff
Use FusedMatMul When Transpose is Between First Dim and Contiguous Batch Dims (#9734)
* fusedmatmul support transpose batches

* fix win build

* fix contrib op md

* more comments
2021-12-27 10:49:46 +08:00
Yufeng Li
12ee2e942f
add int8_t for Resize (#10067)
As we support quantization for format s8s8, we need Resize to support int8_t.
2021-12-17 15:36:09 -08:00
Tianlei Wu
ef36488df0
Add BeamSearch operator for GPT-2 decoding (#9680)
* Add BeamSearch operator and CPU implementation
* Add ONNX conversion script
2021-12-16 16:08:05 -08:00
Valery Chernov
b327e89efa
Standalone TVM Executor Provider (#10019)
* squashed commit for standalone tvm execution provider

* critical fix for correct python build with stvm ep

* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG

* updates and fixes

* update parsing of stvm provider options

* add support of external data for onnx model

* add conditional dump of subgraphs

* remove unused code

* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API

* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)

* add fp16

* add functionality of conversion of model layout to NHWC if needed. Necessary parameter was added to STVM provider options

* fix license text in header. fix log format

* small fixes

* fix issues from flake8

* remove model proto construction from GetCapability

* reserve memory for vector of DLTensors

* add simple tutorial for STVM EP

* STVM docs

* jroesch/tvm -> apache/tvm

* remove dead code, unnecessary logs and comments

* fix in readme

* improve tutorial notebook

* tvm update

* update STVM_EP.md

* fix default value

* update STVM_EP.md

* some TODOs for the future development

* shorten long lines

* add hyperlink to STVM_EP.md

* fix Linux CI error

* fix error in csharp test

Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2021-12-15 16:59:20 -08:00
jingyanwangms
8043a9facc
Bump master version to 1.11 (#9957)
* Bump master version to 1.11

* Update Windows AI version

* update version in onnxruntime_c_api.cc
2021-12-14 23:32:06 -08:00
Nat Kershaw (MSFT)
b4434c7694
Automate generation of C/C++ API docs (#9997)
2021-12-10 17:45:50 -08:00
Yufeng Li
ffdafb2012
add fallback of s8s8 support on x64 (#9995)
* add fallback of s8s8 support on x64
2021-12-10 11:33:19 -08:00
Scott McKay
00c979db4d
Update doc for operators/opsets supported by mobile package (#9899)
2021-12-02 13:51:22 +10:00
Sherlock
6de79d82c8
Fix Training Packaging pipeline (#9885)
* Fix Training Packaging pipeline
2021-11-30 15:26:10 -08:00
Yufeng Li
a0afd7303d
add int8_t support for pool operators (#9852)
* add int8_t support for pool operators
2021-11-29 18:43:43 -08:00
Ye Wang
6856619b18
Decoder Attention CUDA Op (#9792)
* add kernel interface

* register kernel

* add self/cross qkv projection without cache

* add LaunchTransQkv2 for (S,B,X,N,H) -> (X,B,N,S,H)

* refactor ConcatPastToPresent

* DecoderQkvToContext interface

* q,k,v buffer and cache as output

* qk, pv and transctx

* fix compiler error on linux machine

* key_padding_mask

* add test_parity file. However not runnable

* add partial unittest

* made partial attributes to inputs

* --gen_doc

* change kernel interface, add more tests

* more parity tests

* fix test

* fix typo

* transpose optimizer has bug. remove it temporarily

* add input shape checks

* add type/shape inference

* fix cache shape check

* fix rocm build failure

* fix rocm build error

* review comments

* review comments
2021-11-19 19:25:36 -08:00
Vincent Wang
f390347c11
Add CUDA Kernels of RandomNormal[Like], RandomUniform[Like] (#9761)
2021-11-19 08:18:34 +08:00
Viswanath Boga
9d84811fb6
fixing pypi pipeline for release (#9716)
* fixing pypi pipeline for release

* updated the script and correct python version

* updated the version correctly with script changes

* Remove 1.9.1
2021-11-10 17:33:51 -08:00
satyajandhyala
229c9a4e1c
Added Trilu CUDA kernel. (#9633)
* Added Trilu CUDA kernel.

* Added TriluGrad.

* Added a training testcase for Trilu.

* Added Trilu gradient checker test.
2021-11-09 11:20:17 -08:00
Hariharan Seshadri
bbeceb7541
Support optional type in ORT (#8339)
2021-11-04 15:01:42 -07:00
Viswanath Boga
85874bb315
embed layer fusion gpt2 (#9336)
* Changes to fuse embed layer for gpt2, kernel changes pending

* verified add output and regular add match

* Test added for additional output embedlayernorm, working on CUDA

* Test passing on CPU

* updated convert_to_onnx tool to check parity correctly

* removed some debugs

* couple of TODO left as in optimizer.py

* removed changes to optimizer.py

* fixing build

* fixing build

* updated order of initialization

* added a test case for float16

* updating the docs

* updating tests failing due to embed layer fusion

* update unit tests

* updating CUDA documentation in operatorkernels.md

* addressing comments

* OperatorKernels.md updated with CUDA

* adding TODO to qembed_layer

* minor edit

* updated docs

* addressing comments

* adding position ids to embed layer gpt2

* updating fused gpt2 model

* added extra test

* remove comments

* addressing comments

* contrib_defs.cc updated

* all tests passing

* fixing a typo

* minor edit

* trigger build

* qembedlayernorm checkinputs updated

* fixing build error

* fixing build error

* fixing build error
2021-10-28 11:06:26 -07:00
Bowen Bao
e983f37121
Bifurcation detector for aggressive decoding (#9432)
```
Component for aggressive decoding. Find the bifurcation index of predicted tokens, between source tokens,
starting from previous suffix match index, and predicted tokens.
Concat predicted tokens, starting from bifurcation index, to the back
of current tokens. This forms the output tokens.
Detect suffix match index in source tokens, between source tokens and output tokens.
Detection is based on finding the appearances of last n-gram in output tokens
in source tokens.
A match is considered found if source tokens contain a single matching n-gram.
Return the index of the start of the n-gram in source tokens.
No match is found if source tokens contain multiple or zero matching n-grams. Return -1.
```
2021-10-19 19:53:56 -07:00
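The suffix-match step described in the commit message above can be sketched in a few lines. This is only an illustration of the text as written, not the operator's actual kernel: the function name `suffix_match_index` and the fixed n-gram size `n` are assumptions introduced here, and the bifurcation step is left out.

```python
def suffix_match_index(src_tokens, out_tokens, n):
    """Return the start index in src_tokens of the last n-gram of out_tokens.

    Hypothetical helper: returns -1 unless src_tokens contains exactly one
    occurrence of that n-gram, mirroring the description above.
    """
    if n <= 0 or len(out_tokens) < n:
        return -1
    ngram = out_tokens[-n:]
    # Collect every position in the source where the n-gram appears.
    matches = [
        i
        for i in range(len(src_tokens) - n + 1)
        if src_tokens[i:i + n] == ngram
    ]
    # A match only counts if it is unique.
    return matches[0] if len(matches) == 1 else -1
```

With `src_tokens = [5, 6, 7, 8, 6, 9]`, the 2-gram `[7, 8]` occurs once, so its start index is returned, while the 1-gram `[6]` occurs twice and is rejected.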
Hariharan Seshadri
4698b73725
Fix output shape description of Attention op's schema (#9406)
2021-10-19 15:56:35 -07:00
Xavier Dupré
11f0081c1e
Remove tensorflow, tf2onnx from the list of dependencies for the documentation (#9221)
* Remove tensorflow, tf2onnx from the list of dependencies for the documentation
* improve documentation
* update API
2021-10-14 18:07:35 +02:00
mindest
f9cf62912a
Add same_shape case for BiasDropout (#9188)
* bias dropout improvement

* add transform case for same shape case

* combine kernel

* merge with vectorized kernel

* use "has_same_shape_bias"

* minor: a "N % 4 != 0" case

* add op UT for has_same_shape_bias

* address comments; add param case for 1d bias;
add param case tests for 1d and same-shape bias

* rewrite logic condition

Co-authored-by: Peng Wang <pengwa@microsoft.com>
2021-10-12 19:57:38 +08:00
ashbhandare
35c2102cfa
Fixes for GatherND, Multinomial (#9143)
* register gathernd kernel, aten multinomial

* fix CI, add test

* review comments
2021-10-05 14:51:58 -07:00
Ye Wang
4934455ab6
Bumping up to 1.10 (#9006)
* bump to 1.10

* Update Versioning.md

* Update README.rst

* Change opset version to 15
2021-09-22 16:34:28 -07:00
Jason
4e5bc8365b
Add Paddle2ONNX to Versioning.md (#9067)
* Add Paddle2ONNX to Versioning.md
2021-09-22 13:38:14 -07:00
Pranav Sharma
dae37dc946
Fix S360 issue by using "use strict" for javascript code. (#9128)
2021-09-20 20:32:44 -07:00
Ryan Hill
6ae5f7a244
C API Docs - Add build instructions (#9106)
* Update Doxyfile, add build instructions to header
* Update paths in README.md
2021-09-17 18:40:27 -07:00
Ryan Hill
280e79463a
Fill in more documentation (#9088)
Fix plural values with %s
Fix more symbol links
Add custom header for web metrics
2021-09-16 17:08:27 -07:00
Zuwei Zhao
ff66cfdfa6
Enable linking in exception throwing support library when build onnxruntime wasm. (#8973)
* Enable linking in exception throwing support library when build onnxruntime webassembly containing onnxruntime-extensions.

* Add flag in build.py to enable linking exceptions throwing library.

* Update onnxruntime-extensions document and bind custom_ops build flag with use_extensions.

* Update doc.

* Update cgmanifest.json.

Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-09-10 22:09:16 +08:00
Ryan Hill
2439ced3ec
API Documentation (#8948)
* Make help information compile properly
2021-09-09 22:04:51 -07:00
ytaous
0193490cbf
ReduceMin - add int64 cuda kernel support for opset12/13 (#8966)
* ReduceMin - int64 support

* fix doc

Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-09-07 17:01:26 -07:00
Ye Wang
e2194797a7
bumping up to version 1.9 (#8982)
* bump up version

* makes the windowAI column align with ORT version

* update the hardcoded version string

* fix a typo
2021-09-07 14:30:55 -07:00
Zuwei Zhao
89e8bff121
Enable selecting custom ops in onnxruntime-extensions. (#8826)
* Enable selecting custom ops in onnxruntime-extensions.

* Move cmake_helper.py.

* Remove over-indented spaces.

* Add doc.

* Remove onnxruntime-extensions from git submodules, and user should pass path of onnxruntime-extensions for build.

* Modify doc.

* Remove argument --enable_onnxruntime_extensions and use --onnxruntime_extensions_path.

* Fix build error.

* Fix build error.

* Use onnxruntime_extensions_path.

* support both submodule and external source folders

* refinement

* Update cgmanifest.json

* Support building onnxruntime-extensions from either git submodule or pre-pulled path.

* Update doc.

* more standard name

* update docs

* add the copyright header

Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-08-27 21:45:52 -07:00
Hariharan Seshadri
cee79526fd
Add opset 15 kernels for Pow, BatchNorm, and Shape (#8442)
2021-08-25 12:04:20 -07:00
Hariharan Seshadri
17b0664e34
Optimize sequence type usage on CUDA [2/n] (#8720)
2021-08-24 10:40:28 -07:00
XiyinOSS
19b82b438b
GridSample OP implementation for CPU and CUDA (#8551)
* GridSample OP implementation for CPU and CUDA

**Description**: This change contains implementation for torch grid_sample OP.
Cuda implementation contains contribution from Muscle Wu.

* Use interpolation for out-of-bound points in zero padding mode

Out-of-bound points in zeros padding mode changed from constant 0 to
interpolation of surrounding pixels. This aligns with Pytorch implementation.

A bug in CUDA batch offset calculation is fixed.

Custom op exporter type is added.

* Fix nearest bug in CPU

* Update per CI build finding and review comments

* Force float to avoid potential integer T issue

* Style update

* PR update

* Remove c++17 feature from cuda code
2021-08-20 12:37:38 -07:00
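The zero-padding behaviour described in the GridSample commit above (out-of-bound points interpolated from surrounding pixels rather than forced to 0) can be illustrated with a small bilinear sampler. This is a sketch under stated assumptions, not the ORT kernel: it handles a single-channel 2-D image, takes pixel coordinates rather than normalized grid values, and the name `bilinear_sample_zeros` is hypothetical.

```python
import math


def bilinear_sample_zeros(img, x, y):
    """Bilinear sample of img (list of rows) at pixel coordinates (x, y).

    Neighbours that fall outside the image contribute 0, so a point just
    outside the border still blends in the in-bound pixels next to it.
    """
    h, w = len(img), len(img[0])

    def pixel(ix, iy):
        # Zero padding: anything outside the image reads as 0.
        return img[iy][ix] if 0 <= ix < w and 0 <= iy < h else 0.0

    x0, y0 = math.floor(x), math.floor(y)
    wx, wy = x - x0, y - y0  # fractional offsets inside the cell
    top = (1 - wx) * pixel(x0, y0) + wx * pixel(x0 + 1, y0)
    bottom = (1 - wx) * pixel(x0, y0 + 1) + wx * pixel(x0 + 1, y0 + 1)
    return (1 - wy) * top + wy * bottom
```

For the image `[[1, 2], [3, 4]]`, sampling at `(-0.5, 0)` yields half of the border pixel instead of a constant 0, and a point far outside the image still yields 0.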
harshithapv
c24335246b
Support bool type for Pad Op and fix Unsqueeze in Tile grad for Opset 13 (#8602)
* changes

* tile grad unsqueeze fix for opset 13

* clean up

* remove bool support for opset 2 to 12 for Pad as it is not supported.

* Copy OperatorKernels.md from artifacts of Windows CI build.
2021-08-11 11:21:02 -07:00
Xavier Dupré
064a385b59
Support int8 for operator Split (#8615)
* Support int8 for operator Split
2021-08-10 23:04:16 +02:00
Changming Sun
ed17ca3595
Remove onnxruntime/core/protobuf (#8617)
* remove onnxruntime/core/protobuf

* Update How_To_Update_ONNX_Dev_Notes.md
2021-08-10 09:36:27 -07:00
Guoyu Wang
52a212e4f1
Bump ORT master version to 1.8.2 (#8646)
2021-08-09 11:10:29 -07:00
Yulong Wang
1b902d0227
doc: add ort-web related instructions to update onnx doc (#8500)
* doc: update instructions for ort web docs

* revise readme
2021-08-06 15:09:11 -07:00
Ashwini Khade
96eb9810ba
Update onnx (#8458)
* updates for picking onnx commit

* add tests filter to c# tests

* plus test fixes

* fix versioning for contrib ops

* fix tests

* test filter for optional ops

* more versioning related updates

* fix test

* fix layernorm spec

* more updates

* update docs

* add more test filters

* more filters

* update binary size threshold

* update docs

* plus more fixes

* updates per review

* update to release commit

* add filters for optional type tests

* plus updates
2021-08-05 09:21:44 -07:00
Chun-Wei Chen
9d88b1de78
correct supported ONNX version (#8590)
2021-08-05 06:49:50 -07:00
Yufeng Li
ceeb1a65d6
Add quantization support of GEMM directly with QGemm (#8447)
QGemm takes quantized A, B, and C, plus the quantization parameters of output Y; C and the quantization parameters of Y are optional. Its output can be quantized or full precision, depending on whether the quantization parameters of Y exist. If they are provided, the output is requantized; otherwise it is full precision.

Compared with QLinearMatMul and MatMulInteger, QGemm supports the transpose, alpha, and beta attributes.

The formula for quantized GEMM is:
Y = alpha * scale_a * scale_b * ((A_int8 - zp_a) * (B_int8 - zp_b) + C_int32), in which,
C_int32 is quantized with formula: C_int32 = (beta * C) / (alpha * scale_a * scale_b)
2021-07-27 21:21:49 -07:00
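The QGemm formula in the commit message above can be checked with a small pure-Python reference. This is a sketch under several assumptions that are not stated in the source: C is treated as a 1-D bias broadcast over rows, C_int32 is rounded to the nearest integer, the quantized output saturates to the int8 range, and the transpose attributes are ignored. The name `qgemm_reference` and its parameters are hypothetical.

```python
def qgemm_reference(a_q, b_q, scale_a, zp_a, scale_b, zp_b,
                    c=None, alpha=1.0, beta=1.0, y_scale=None, y_zp=0):
    """Y = alpha*scale_a*scale_b * ((A - zp_a) @ (B - zp_b) + C_int32)."""
    s = alpha * scale_a * scale_b
    rows, inner, cols = len(a_q), len(b_q), len(b_q[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Integer accumulation of the zero-point-corrected product.
            acc = sum((a_q[i][k] - zp_a) * (b_q[k][j] - zp_b)
                      for k in range(inner))
            if c is not None:
                # C_int32 = (beta * C) / (alpha * scale_a * scale_b)
                acc += round(beta * c[j] / s)
            y = s * acc  # full-precision result
            if y_scale is not None:
                # Assumed requantization: scale, shift, saturate to int8.
                y = max(-128, min(127, round(y / y_scale) + y_zp))
            row.append(y)
        out.append(row)
    return out
```

Leaving `y_scale` as `None` returns the full-precision result; passing it returns the requantized one, matching the optional output parameters described in the commit message.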
Xavier Dupré
a9fc3c448c
Improves documentation, show InferenceSession constructor attributes (#8494)
* include constructor parameters in the python documentation
* expose more classes into the documentation
2021-07-26 15:58:47 +02:00
Dmitri Smirnov
950fe5e28b
Implement SparseTensor and infrastructure support and advance ONNX commit (#8038)
SparseTensor support
  Implement Builder pattern
  Fix support for 1-D and 2-D COO indices
  Implement and test CSR support.
  Handle shape inference for SparseTensors
  Implement conversion for COO, CSR and tests.
  Address the case where constant sparse initializer is the output.
  Implement test infra for SparseTensors
  Implement SparseDenseMatMul for Csr and COO and tested it.
  Add hash for SparseToDenseMatMul
  Finish shared provider refactor
  Refactor GetOrCreate to Create
  Working on py interface
  Expose OrtDevice and use it in allocate_numpy
  Adjust Sparse interfaces, add support for string SparseTensor. Add tests.
  Add and test to_cuda()
  Add accessors to format specific indices
  Test values and indices views, read-only flag, after GC access
  Add sparse related methods to OrtValue
  Re-work SparseTensor wrapper, add OrtValue methods
  Rework numpy_array_to_cuda/to_cpu
  Add run_with_ort_values
  Add models and test sparse_mat_mul with run_with_ort_values
  Refactor sparse tensor to use a single buffer
  Ifdef x86 Eigen CSR sparse matmul implementation
  Exclude broken test, check for string type when copying cross device
  Split pybind schema, regenerate docs, add exclusion
  Conditionally exclude schema module
  Update docs fix cuda build
  Add test to a filter and regenerate JS docs
  Add conversion and test string support for sparse tensors
  Exclude conversion utils from minimal build
  Add CUDA Memcpy and adjust provider interfaces
2021-07-22 15:24:36 -07:00
DeyuHuang
4275055868
Add Gridsampler contrib op (#8372)
* add Gridsampler contrib op

* fix gridsampler_paddingmode_border test

* disable the tests until the kernel added

* fix CI failure

* change GridSampler to GridSample
2021-07-22 15:39:28 +08:00
harshithapv
0f989c6162
bumping onnxruntime version to 1.8.1 (#8429)
2021-07-19 16:48:56 -07:00
Viswanath Boga
afce0e2543
Attention kernel update to handle different Q,K,V hidden sizes (#8039)
* changes working to convert akv nodes

* changes to replace nodes

* changes to accommodate qkv hidden sizes as attributes

* kernel to accept qkv_hidden_size attributes

* Working till compute for varied dimension, todo applyattention()

* changes to make all regression tests work

* inference running successfully without prepack

* success inference with pre-pack weights

* add test for diff sizes

* bias shape need not be a mul of 3

* get the output_hidden_size from input

* infer output shape from input

* merge with master

* cleaning up files that got merged wrong

* accuracy at accepted level

* added unit test case for different dimensions

* all unit tests passing

* packed weights working for attention

* prepacked weights working

* added test case for newly added extra qk input

* updated unit test to test only extra add qk

* fixing build error

* removing few debugs

* reverting test changes

* all python test passing

* cleaning up

* new unit test added, major clean up of code

* removed extra code

* minor

* minor fix to tests

* prepack weights code cleaned up

* compacted compute() in attention.cc

* reformat compute()

* making a parameter T

* adding 3 q,k,v buffers in all cases

* fixing build

* running tests only on cpu

* Updating docs

* trigger ci builds

* Addressing comments in PR

* addressing some more comments

* get add_qk_str from add_qk node directly

* updating docs, added extra check to verify attn inputs

* Optimized the extra add by parallelizing

* added attention_shape to symbolic_shape_infer.py

* minor refactoring to address comments
2021-07-19 12:21:33 -07:00
Ye Wang
04297110c3
Support int64 in ReduceMin cuda op for Opset 14 (#8307)
* reducemin int64_t support

* fix xxcuda.so load error

* testtest

* refactor

* update doc

* propagate types to opset14

* re-generate doc

* rename macro
2021-07-13 16:18:06 -07:00
Zuwei Zhao
0a5b75f5cd
Update submodule onnxruntime-extensions. (#8282)
* Update submodule onnxruntime-extensions to latest.

* Add document for onnxruntime-extensions.

* Update cgmanifest.json for onnxruntime-extensions.

* Add example in JavaScript.

Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-07-13 10:21:11 +08:00