Commit graph

326 commits

Author SHA1 Message Date
Zuwei Zhao
89e8bff121
Enable selecting custom ops in onnxruntime-extensions. (#8826)
* Enable selecting custom ops in onnxruntime-extensions.

* Move cmake_helper.py.

* Remove over-indented spaces.

* Add doc.

* Remove onnxruntime-extensions from git submodules; the user should pass the path of onnxruntime-extensions for the build.

* Modify doc.

* Remove argument --enable_onnxruntime_extensions and use --onnxruntime_extensions_path.

* Fix build error.

* Fix build error.

* Use onnxruntime_extensions_path.

* support both submodule and external source folders

* refinement

* Update cgmanifest.json

* Support building onnxruntime-extensions from either git submodule or pre-pulled path.

* Update doc.

* more standard name

* update docs

* add the copyright header

Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
Co-authored-by: Wenbing Li <wenbingl@outlook.com>
Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
2021-08-27 21:45:52 -07:00
Hariharan Seshadri
cee79526fd
Add opset 15 kernels for Pow, BatchNorm, and Shape (#8442) 2021-08-25 12:04:20 -07:00
Hariharan Seshadri
17b0664e34
Optimize sequence type usage on CUDA [2/n] (#8720) 2021-08-24 10:40:28 -07:00
XiyinOSS
19b82b438b
GridSample OP implementation for CPU and CUDA (#8551)
* GridSample OP implementation for CPU and CUDA

**Description**: This change contains an implementation of the torch grid_sample op.
The CUDA implementation contains contributions from Muscle Wu.

* Use interpolation for out-of-bound points in zero padding mode

Out-of-bound points in zeros padding mode changed from constant 0 to
interpolation of surrounding pixels. This aligns with Pytorch implementation.
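The zeros-padding behavior described above can be sketched in NumPy (a hypothetical reference helper, not the actual kernel): out-of-bound corner pixels contribute zero, while in-bound corners keep their bilinear weights, instead of the whole sample collapsing to a constant 0.

```python
import numpy as np

def bilinear_sample_zeros(img, x, y):
    """Bilinearly sample img[H, W] at continuous (x, y).

    Zeros padding: out-of-bound corner pixels contribute 0, but the
    in-bound corners are still blended by their interpolation weights.
    """
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    wx1, wy1 = x - x0, y - y0          # fractional offsets
    wx0, wy0 = 1.0 - wx1, 1.0 - wy1

    def pix(yy, xx):
        # zeros padding for out-of-bound corners
        if 0 <= yy < h and 0 <= xx < w:
            return img[yy, xx]
        return 0.0

    return (pix(y0, x0) * wy0 * wx0 + pix(y0, x1) * wy0 * wx1 +
            pix(y1, x0) * wy1 * wx0 + pix(y1, x1) * wy1 * wx1)
```

A point half a pixel outside the image still blends the in-bound neighbors with weight 0.5 rather than returning 0, which matches the PyTorch behavior the commit aligns with.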

A bug in CUDA batch offset calculation is fixed.

Custom op exporter type is added.

* Fix nearest bug in CPU

* Update per CI build finding and review comments

* Force float to avoid potential integer T issue

* Style update

* PR update

* Remove c++17 feature from cuda code
2021-08-20 12:37:38 -07:00
harshithapv
c24335246b
Support bool type for Pad Op and fix Unsqueeze in Tile grad for Opset 13 (#8602)
* changes

* tile grad unsqueeze fix for opset 13

* clean up

* remove bool support for opset 2 to 12 for Pad as it is not supported.

* Copy OperatorKernels.md from artifacts of Windows CI build.
2021-08-11 11:21:02 -07:00
Xavier Dupré
064a385b59
Support int8 for operator Split (#8615)
* Support int8 for operator Split
2021-08-10 23:04:16 +02:00
Changming Sun
ed17ca3595
Remove onnxruntime/core/protobuf (#8617)
* remove onnxruntime/core/protobuf

* Update How_To_Update_ONNX_Dev_Notes.md
2021-08-10 09:36:27 -07:00
Guoyu Wang
52a212e4f1
Bump ORT master version to 1.8.2 (#8646) 2021-08-09 11:10:29 -07:00
Yulong Wang
1b902d0227
doc: add ort-web related instructions to update onnx doc (#8500)
* doc: update instructions for ort web docs

* revise readme
2021-08-06 15:09:11 -07:00
Ashwini Khade
96eb9810ba
Update onnx (#8458)
* updates for picking onnx commit

* add tests filter to c# tests

* plus test fixes

* fix versioning for contrib ops

* fix tests

* test filter for optional ops

* more versioning related updates

* fix test

* fix layernorm spec

* more updates

* update docs

* add more test filters

* more filters

* update binary size threshold

* update docs

* plus more fixes

* updates per review

* update to release commit

* add filters for optional type tests

* plus updates
2021-08-05 09:21:44 -07:00
Chun-Wei Chen
9d88b1de78
correct supported ONNX version (#8590) 2021-08-05 06:49:50 -07:00
Yufeng Li
ceeb1a65d6
Add quantization support of GEMM directly with QGemm (#8447)
QGemm takes in quantized A, B, C, and the quantization parameters of output Y; C and the quantization parameters of Y are optional. The output can be quantized or full precision, depending on whether the quantization parameters of Y are provided: if they are, the output is requantized; otherwise it is full precision.

Compared with QLinearMatMul and MatMulInteger, QGemm supports transpose, and the alpha and beta attributes.

The formula for quantized GEMM is:
Y = alpha * scale_a * scale_b * ((A_int8 - zp_a) * (B_int8 - zp_b) + C_int32), in which,
C_int32 is quantized with formula: C_int32 = (beta * C) / (alpha * scale_a * scale_b)
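The formula above can be checked with a full-precision NumPy reference (a sketch with hypothetical names, not the QGemm kernel itself): dequantize the int8 inputs, fold the optional bias C through the inverse output scale, and accumulate in int32.

```python
import numpy as np

def qgemm_dequant(a_q, zp_a, scale_a, b_q, zp_b, scale_b,
                  c=None, alpha=1.0, beta=1.0):
    """Reference for Y = alpha * scale_a * scale_b *
    ((A_int8 - zp_a) @ (B_int8 - zp_b) + C_int32), where
    C_int32 = (beta * C) / (alpha * scale_a * scale_b)."""
    acc = (a_q.astype(np.int32) - zp_a) @ (b_q.astype(np.int32) - zp_b)
    if c is not None:
        c_int32 = np.round(beta * c / (alpha * scale_a * scale_b)).astype(np.int32)
        acc = acc + c_int32
    return alpha * scale_a * scale_b * acc
```

For example, with A = [[0, 5]] quantized as [[10, 20]] (zp 10, scale 0.5) and B = [[0], [4]] quantized as [[4], [6]] (zp 4, scale 2.0), the reference recovers A @ B = 20.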
2021-07-27 21:21:49 -07:00
Xavier Dupré
a9fc3c448c
Improve documentation, show InferenceSession constructor attributes (#8494)
* include constructor parameters in the python documentation
* expose more classes into the documentation
2021-07-26 15:58:47 +02:00
Dmitri Smirnov
950fe5e28b
Implement SparseTensor and infrastructure support and advance ONNX commit (#8038)
SparseTensor support
  Implement Builder pattern
  Fix support for 1-D and 2-D COO indices
  Implement and test CSR support
  Handle shape inference for SparseTensors
  Implement conversion for COO, CSR and tests
  Address the case where a constant sparse initializer is the output
  Implement test infra for SparseTensors
  Implement and test SparseDenseMatMul for CSR and COO
  Add hash for SparseToDenseMatMul
  Finish shared provider refactor
  Refactor GetOrCreate to Create
  Work on py interface
  Expose OrtDevice and use it in allocate_numpy
  Adjust Sparse interfaces, add support for string SparseTensor, add tests
  Add and test to_cuda()
  Add accessors to format-specific indices
  Test values and indices views, read-only flag, after-GC access
  Add sparse related methods to OrtValue
  Re-work SparseTensor wrapper, add OrtValue methods
  Rework numpy_array_to_cuda/to_cpu
  Add run_with_ort_values
  Add models and test sparse_mat_mul with run_with_ort_values
  Refactor sparse tensor to use a single buffer
  Ifdef x86 Eigen CSR sparse matmul implementation
  Exclude broken test, check for string type when copying cross-device
  Split pybind schema, regenerate docs, add exclusion
  Conditionally exclude schema module
  Update docs, fix CUDA build
  Add test to a filter and regenerate JS docs
  Add conversion and test string support for sparse tensors
  Exclude conversion utils from minimal build
  Add CUDA Memcpy and adjust provider interfaces
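The COO and CSR formats this commit implements, and the sparse-times-dense matmul it tests, can be illustrated with a small NumPy sketch (illustrative helpers only, not ORT's SparseTensor API):

```python
import numpy as np

def dense_to_coo(dense):
    """COO: parallel arrays of (row, col) indices plus the nonzero values."""
    rows, cols = np.nonzero(dense)
    return rows, cols, dense[rows, cols]

def coo_to_csr(rows, cols, values, n_rows):
    """CSR: column indices and values sorted by row, plus row-pointer offsets."""
    order = np.argsort(rows, kind="stable")
    counts = np.zeros(n_rows + 1, dtype=np.int64)
    np.add.at(counts, rows + 1, 1)          # count nonzeros per row
    return np.cumsum(counts), cols[order], values[order]

def csr_matmul_dense(indptr, indices, values, dense):
    """Sparse (CSR) x dense matmul, accumulating row by row."""
    out = np.zeros((len(indptr) - 1, dense.shape[1]), dtype=dense.dtype)
    for i in range(len(indptr) - 1):
        for j in range(indptr[i], indptr[i + 1]):
            out[i] += values[j] * dense[indices[j]]
    return out
```

Multiplying the CSR form of a matrix by the identity reproduces the original dense matrix, which is a handy sanity check for the round-trip.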
2021-07-22 15:24:36 -07:00
DeyuHuang
4275055868
Add Gridsampler contrib op (#8372)
* add Gridsampler contrib op

* fix gridsampler_paddingmode_border test

* disable the tests until the kernel added

* fix CI failure

* change GridSampler to GridSample
2021-07-22 15:39:28 +08:00
harshithapv
0f989c6162
bumping onnxruntime version to 1.8.1 (#8429) 2021-07-19 16:48:56 -07:00
Viswanath Boga
afce0e2543
Attention kernel update to handle different Q,K,V hidden sizes (#8039)
* changes working to convert akv nodes

* changes to replace nodes

* changes to accommodate qkv hidden sizes as attributes

* kernel to accept qkv_hidden_size attributes

* Working till compute for varied dimension, todo applyattention()

* changes to make all regression tests work

* inference running successfully without prepack

* success inference with pre-pack weights

* add test for diff sizes

* bias shape need not be a mul of 3

* get the output_hidden_size from input

* infer output shape from input

* merge with master

* cleaning up files that got merged wrong

* accuracy at accepted level

* added unit test case for different dimensions

* all unit tests passing

* packed weights working for attention

* prepacked weights working

* added test case for newly added extra qk input

* updated unit test to test only extra add qk

* fixing build error

* removing few debugs

* reverting test changes

* all python test passing

* cleaning up

* new unit test added, major clean up of code

* removed extra code

* minor

* minor fix to tests

* prepack weights code cleaned up

* compacted compute() in attention.cc

* reformat compute()

* making a parameter T

* adding 3 q,k,v buffers in all cases

* fixing build

* running tests only on cpu

* Updating docs

* trigger ci builds

* Addressing comments in PR

* addressing some more comments

* get add_qk_str from add_qk node directly

* updating docs, added extra check to verify attn inputs

* Optimized the extra add by parallelizing

* added attention_shape to symbolic_shape_infer.py

* minor refactoring to address comments
2021-07-19 12:21:33 -07:00
Ye Wang
04297110c3
Support int64 in ReduceMin cuda op for Opset 14 (#8307)
* reducemin int64_t support

* fix xxcuda.so load error

* testtest

* refactor

* update doc

* propagate types to opset14

* re-generate doc

* rename macro
2021-07-13 16:18:06 -07:00
Zuwei Zhao
0a5b75f5cd
Update submodule onnxruntime-extensions. (#8282)
* Update submodule onnxruntime-extensions to latest.

* Add document for onnxruntime-extensions.

* Update cgmanifest.json for onnxruntime-extensions.

* Add example in JavaScript.

Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>
2021-07-13 10:21:11 +08:00
Hariharan Seshadri
5369821ad6
Support SpaceDepth ops in the CUDA and ROCM EPs (#7960) 2021-07-09 01:00:22 -07:00
Nick Kreeger
800b62a139
Create a quantized EmbedLayerNorm for ORT. (#8124)
Create a quantized EmbedLayerNorm Op for ORT
2021-06-25 17:51:43 -05:00
Negin Raoof
80b7b134bf
Adding optional ops in contrib ops (#7946)
* Added optional const spec
2021-06-24 13:16:31 -07:00
Bowen Bao
51c12a715b
Add NGramRepeatBlock contrib op (#8078)
**Description**: 
Enforce no repetition of n-grams. Scores are set to `-inf` for tokens that form a repeated n-gram if added to the back of the input_ids.

**Motivation and Context**
Needed by transformer models in sequence generation algorithms (greedy search and beam search). This module has heavy impact on performance, and can be highly parallelized.
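The no-repeat-n-gram rule described above can be sketched in plain Python (a hypothetical scalar reference; the contrib op runs the same check in parallel over the vocabulary): any token that would complete an n-gram already present in `input_ids` gets its score set to `-inf`.

```python
import math

def block_repeated_ngrams(input_ids, scores, n):
    """Ban tokens that would complete an n-gram already in input_ids.

    `scores` maps token id -> score for the next position; banned
    tokens are set to -inf so search can never select them.
    """
    if len(input_ids) < n - 1:
        return scores
    prefix = tuple(input_ids[-(n - 1):])   # last n-1 generated tokens
    for i in range(len(input_ids) - n + 1):
        # if a past window matches the current prefix, its next token is banned
        if tuple(input_ids[i:i + n - 1]) == prefix:
            scores[input_ids[i + n - 1]] = -math.inf
    return scores
```

For n = 3 and input_ids = [1, 2, 3, 1, 2], the prefix (1, 2) already occurred followed by 3, so token 3 is banned at the next step while other tokens keep their scores.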
2021-06-21 10:21:48 -07:00
Olivia Jain
c72a8c7ff4
Upgrade tf 2.4.1 to 2.4.2 for component governance (#8036)
* Upgrade tf 2.4.1 to 2.4.2 for component governance

* Trial run with tf 2.5.0
2021-06-14 09:30:58 -07:00
Xavier Dupré
6d7461795f
Update Version.md (#8021)
Correct the supported opset for 1.8.0.
2021-06-13 18:52:40 +02:00
RandySheriffH
1a5ee11dbd
Implement Sequence Ops GPU (#7863) 2021-06-07 15:30:26 -07:00
Thiago Crepaldi
c45ac166d3
Add graphviz into Dockerfile images for Python API documentation (#7819) 2021-06-02 16:12:54 -07:00
Scott McKay
0fbec1b9c1
Update the operator documentation generation (#7787)
* Update the operator documentation generation
  - Make layout a little nicer
  - Update to latest supported operators including training
  - Fix some links that are broken when the docs content is copied to github-pages
  - Fix incorrect usage of 'onnx.ai.ml' as the default domain
    - ML ops are now separated from the real default domain of 'onnx.ai'
  - Include CPU, CUDA and training kernels
    - exclude DNNL as it's not an EP we own

* There are separate paths for CUDA and CUDNN as they are not guaranteed to be in the same location on a Windows machine. Use the CUDNN path when looking for the CUDNN library.

* Enable validation of both contrib ops and operator kernels in build
Filter generation so it's deterministic
Add ability for CI to publish the md files as build artifacts if they differ so a developer can download and add to their PR to resolve any diffs.
Remove workarounds for github-pages as that will now link to the github docs which display correctly
2021-06-02 17:47:40 +10:00
Siva Popuri
c08bb4eee3
Update docs/ONNX_Runtime_Server_Usage.md (#7818)
Clarify the documentation to proactively inform users.
2021-05-26 16:17:20 -07:00
Scott McKay
57782b3463
Add supported operators/types documentation for the ORT Mobile package (#7807)
* Add ability to generate documentation for the ORT Mobile package using the build configuration as input.
2021-05-26 15:57:40 +10:00
Xueyun Zhu
e92b3c1394
bumping up version number to 1.8 (#7733)
* bump to 1.8

* fix windows AI
2021-05-18 09:03:37 -07:00
Thiago Crepaldi
4fe2ffae16
Fix ORTModule python doc generation (#7704)
* Fix ORTModule python doc generation

* Address comment
2021-05-17 09:55:49 -07:00
Yufeng Li
a74e41e47d
Add non-zero zp support for quant matmul and attention (#7570)
* add non-zero zp support
* support A and B scale with any dimensions
2021-05-14 16:50:31 -07:00
Zhang Lei
50c5edcf13
Add nhwc support for QLinearAveragePool operator (#7656)
* Add nhwc support for QLinearAveragePool operator

* Update ContribOperators.md

* Update OperatorKernels.md with cpu,dnnl and cuda enabled.
2021-05-13 22:05:30 -07:00
Faith Xu
7cb9077043
Fix readme page (#7659)
* Delete mobile page

Moved to: https://www.onnxruntime.ai/docs/how-to/deploy-on-mobile.html

* Delete ONNX_Runtime_Mobile_NNAPI_perf_considerations.md

Moved to: https://www.onnxruntime.ai/docs/reference/execution-providers/NNAPI-ExecutionProvider.html#performance-tuning

* Fix links to website docs

* Update some summary text

* Add space
2021-05-12 14:30:23 -07:00
Tracy Sharpe
16297a8e61
Implement NCHWc Upsample linear mode (#7623)
Extend the existing NCHWc Upsample operator to support linear modes too.
2021-05-10 12:16:16 -07:00
Ye Wang
803837df63
Add 4dmask support for attention cuda kernel (#7591)
* checkin

* add 4dmask support in attention cuda op

* trim

* add comments

* fix build/test error

* review comments and add tests

* sync doc

* review comments

* minor change
2021-05-07 20:17:29 -07:00
Scott McKay
d6df5764d7
Android package infrastructure (#7430)
* Include ORT format model conversion scripts and infrastructure in ORT python package.
  - tweak existing script setup so it can be easily run directly and from the ORT python package
Add config file and readme for Android minimal build package
Update ORT Mobile documentation
Disable warning if 'all' optimizations are enabled but NCHWc transformer is excluded (device specific optimizations don't apply in this scenario so the warning is moot).

* Address PR comments
2021-04-30 14:23:54 +10:00
Changming Sun
1012535dab
Change onnxruntime::make_unique to std::make_unique (#7502)
1. Change onnxruntime::make_unique to std::make_unique
2. Add "-std=c++14" to ROCM EP's build flags.
2021-04-29 17:04:53 -07:00
KeDengMS
8e21329206
Update nuphar notebook model download url (#7475) 2021-04-27 21:18:06 -07:00
Edward Chen
d21304ceb0
Initial Objective-C API (#7366)
Initial implementation of an Objective-C API.
2021-04-27 10:06:30 -07:00
Tracy Sharpe
d13e5b2fd9
NCHWc: ReorderInput improvements (#7442)
Implement various improvements related to reordering a tensor for use by NCHWc operations:

Relax the requirement that the input channel count must be a multiple of the NCHWc block size (either 8 or 16 depending on ISA). The requirement now is that the channel count must be a multiple of 4. The implementation of MlasReorderInputNchw would need further work to support relaxing this further, but I don't have any models where I've observed this to be necessary yet.
Support fusing a Transpose(NHWC->NCHW) into a following ReorderInput. ReorderInput now has a channels_last attribute as was done in the past for ReorderOutput. This helps with models converted from TF where the converter is unable to remove all Transpose operations.
Add threading support to ReorderInput to accelerate performance (ReorderOutput will come later).
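The blocked layout that ReorderInput produces can be sketched in NumPy (a simplified sketch, not MlasReorderInputNchw; it assumes the channel count is an exact multiple of the block size and ignores the channels_last fusion):

```python
import numpy as np

def reorder_input_nchwc(x, block=8):
    """Reorder an NCHW tensor into the NCHWc blocked layout:
    (N, C, H, W) -> (N, C // block, H, W, block), so each innermost
    vector holds `block` consecutive channels for SIMD-friendly access.
    """
    n, c, h, w = x.shape
    assert c % block == 0, "sketch requires C to be a multiple of the block size"
    return x.reshape(n, c // block, block, h, w).transpose(0, 1, 3, 4, 2)
```

In this layout, element (n, cb, h, w, i) corresponds to channel cb * block + i of the original tensor at the same spatial position.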
2021-04-26 19:16:39 -07:00
Zhang Lei
ada0fbbd2d
Implement qlinear concat and unit test. (#7341)
* Implement qlinear concat and unit test.
Add quantization tool support for QLinearConcat and its quantization tests.

* Add kernel def hash for QLinearConcat.

* Change according to PR. Add qdq transformer support for QLinearConcat.

* Add QDQ Transformer unittest. Fix typo on domain.

* remove duplicated, unused logic.

* fix x86 build error.

* Update operator docs.
2021-04-26 13:38:40 -07:00
Changming Sun
afa7b23609
Update docs/ContribOperators.md and the script that generates it. (#7399) 2021-04-21 16:20:56 -07:00
Changming Sun
5bd192c439
Update ContribOperators.md (#7246) 2021-04-05 17:11:33 -07:00
Thiago Crepaldi
867804bea1
Add auto doc gen for ORTModule API during CI build (#7046)
In addition to ORTModule auto documentation during packaging, this PR also update golden numbers to fix CI
2021-03-22 10:20:33 -07:00
Xavier Dupré
514444d820
Fix pipeline generating python documentation (#7027)
Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
2021-03-17 16:57:51 -07:00
Raduan Al-Shedivat
743a93faf3
Fix broken link in server usage and remove absolute path from dockerfiles readme (#6926) 2021-03-09 11:54:21 -08:00
Edward Chen
b6c4a7ac54
Support required types when excluding typed registrations (#6871) 2021-03-08 08:22:07 -08:00
Edward Chen
09a5d6a9dc
Update docs/ONNX_Runtime_for_Mobile_Platforms.md with info about op type reduction. (#6747) 2021-02-23 10:25:23 -08:00