onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-22 22:01:08 +00:00

Author	SHA1	Message	Date
RandySheriffH	62226d030f	Cherry-pick tagged commits to 1.12.0 release candidate (#12097 ) * Update ONNX to 1.12 (#11924) Follow-ups that need to happen after this and before the next ORT release: * Support SequenceMap with https://github.com/microsoft/onnxruntime/pull/11731 * Support signal ops with https://github.com/microsoft/onnxruntime/pull/11778 Follow-ups that need to happen after this but don't necessarily need to happen before the release: * Implement LayerNormalization kernel for opset version 17: https://github.com/microsoft/onnxruntime/issues/11916 Fixes #11640 * Dll version fix ovep4.1 (#11953) * Setting default version values for ovep dlls as well * Update backend_manager.cc Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: mohsin <mohsinx.mohammad@intel.com> * Optimize t5 encoder in beam search (#11926) * ooptimize t5 encoder * update * update * update * refactor expand impl * cuda tests passed * update * alignment * more alignments * review comments * Allow saving on CPU usage for infrequent inference requests by reducing thread spinning (#11841) Introduce Start/Stop threadpool spinning switch Add a session config option to force spinning stop at the end of the Run() * Restructure function inliner (#11731) * Add nested function call tests * Add overload for Specialize * Pass symboltable to onnx shape inference * Avoid renaming empty names * Enable sequence_map tests which failed before this change * Deprecate APIs returning raw ptrs and provide replacements (#11922) Provider better documentation * register signal ops for opset 17 (#11778) * Register signal ops for op set 17 Note code is mostly being moved, not added. These ops were previously only registered as Microsoft contrib ops and only built if `BUILD_MS_EXPERIMENTAL_OPS=1`. They've been added to the ai.onnx standard op set in version 17. Main components of this change: * Move the kernels from the conrib_ops directory to the core directory. * Add function bodies for ms experimental ops. This will allow old models that use the contrib ops to continue to function. All the function bodies consist of a single op (the new standard op), so performance overhead should be minimal. Minor clean-up also in this change: * De-duplicate get_scalar_value_from_tensor: put it in a new utils.h. * Fix some bugs that caused compilation errors with the experimental ops. Tested with `build.sh --ms_experimental` * Fix some spelling errors and lint violations. * Replace a couple of switch statements with `MLTypeCallDispatcher`. * Use `InlineVector` instead of `std::vector`. Unblocks https://github.com/microsoft/onnxruntime/issues/11640 * Include opset 15 in Conv+BatchNormalization fusion (#11960) * Fix WinML Tests are still targetting deprecated (deleted) experimental signal op definitions (#12006) * fix winml tests * remove legacy test * switch idft -> dft+inverse attr * upgrade opset 13->17 for signal ops tests * [C# Tests] Add support for double tensor output in TestPreTrainedModels. (#12008) Add support for double tensor output in TestPreTrainedModels. * DML EP ResNet50 opset 15 fails in ONNX checker for FusedBatchNormalization lacking training_mode attribute (#12010) FusedBatchNormalization include training_mode attribute * Generalize native op creation (#11539) * create op from ep * read input count from context * create holder to host nodes * fix typo * cast type before comparison * throw error on API fail * silence warning from minimal build * switch to unique_ptr with deleter to host nodes * fix typo * fix build err for minimal * fix build err for minimal * add UT for conv * enable test on CUDA * add comment * fix typo * use gsl::span and string view for Node constructor * Added two APIs - CopyKernelInfo and ReleaseKernelInfo * pass gsl::span by value * switch to span<NodeArg* const> to allow for reference to const containers * fix typo * fix reduced build err * fix reduced build err * refactoring node construction logic * rename exceptions * add input and output count as arguments for op creation * refactor static member * use ORT_CATCH instead of catch * cancel try catch * add static value name map * format input definition and set err code * fix comments * fix typo * [DML EP] Pad operator: Handle negative pad counts (#11974) * Pad fallback to CPU * Added queryPad in operatorRegistration.cpp * Acknowledged PR comments * Used any_of * used none_of instead of any_of Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com> * Add warning about future computation change for ConvTranspose with auto_pad (#11984) * Add warning about future computation change for Convtranspose with auto_pad * improve msg * update TODO to make lint happy * update more contents for warning and add if * valid was not infected * move it into kernel registration * parse auto_pad myself * try to use conv_transpose_attrs_.auto_pad directly * update roialign cuda impl to onnx opset16 (#12036) * roialign opset16 * fix * fix * Fix windows eager build break by pinning to torch version 1.11.0 (#12033) Fix windows and linux eager build to torch 1.11.0. * Skip Constant Folding for ops producing an optional type output (#11839) * Disable sequence-type tests since C# infra doesn't support well (#12037) * Extend lifetime of KernelDef when creating a standalone op (#12057) place tmp kernel def as local variable to cover the lifetime of kernel creation * Add targets files for new .net6 frameworks (#12016) * Add net6 targets. Remove maccatalyst as we don't have a native build targetting that. * Set platform in macos targets * Add targetFramework entries * Move NativeLib.DllName definition and set using preprocessor values for simplicity. Couldn't get it to build with the preprocessor based setup when it was in a separate file. Update the nuspec generation to set platform version for .net6 targets. TODO: Validate versions. I copied them from the managed nuget package the packaging pipeline generated prior to adding targets. Possibly w could/should lower some of the versions. Hopefully the need to specify a version goes away when the release version of VS2022 supports .net6. * Try android 31.1 as https://github.com/actions/virtual-environments/blob/main/images/win/Windows2022-Readme.md suggests that should be available on the CI machines * Fix patch version mismatch Add some extra debug info in case it helps * Debug nuget location in CI * Add workspace entry back in * Add steps * One more attempt with hardcoded nuget.exe path and original android31.0 version * Better fix - found explicit nuget download and updated version there. * flake8 fixes * Fix black complaints. * Exit Microsoft_ML_OnnxRuntime_CheckPrerequisites for net6 iOS. * Removed outdated comment * Fix DML custom operators which set descriptor heap to command list (#12059) * Make C# runtest.sh automatically set latest opset (#12039) * Update C# runtest.sh for opset 17 Should have been part of https://github.com/microsoft/onnxruntime/pull/11924 * get appropriate opset version from onnx doc * use absolute rather than relative path * fix typo in var name * Disable DML command list reuse for Xbox (#12063) disable cl reuse for xbox * Add data type check in ConvAddRelu fusion (#12058) * Add undocumented attribute to disable generation of Java bindings from the Android AAR. (#12075) The generated bindings causes C# build errors that require workaround code. Disabling generation should avoid the need for any workarounds. As the user has the C# ORT package with the C# to C bindings there's no need for binding generation that calls the ORT Java API (which is C# -> Java ->C). * enable the extensions custom build for java and android (#11823) * generate quantization parameter for outputs (#12089) * DML EP Update to DML 1.9 (#12090) * Update to DML 1.9 * Appease obnoxious Python formatting tool * Fix orttraining-linux-ci-pipeline - Symbolic shape infer (#11965) fix symbolic shape error due to upgraded numpy + legacy sympy * check consumers of dq node before swap dq and transpose (#12099) * check consumers of dq node before swap dq and transpose * add unit test Co-authored-by: Gary Miguel <garymiguel@microsoft.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: mohsin <mohsinx.mohammad@intel.com> Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: G. Ramalingam <grama@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> Co-authored-by: Sheil Kumar <smk2007@gmail.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: sumitsays <sumitagarwal330@gmail.com> Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com> Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: Wil Brady <25513670+WilBrady@users.noreply.github.com> Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com> Co-authored-by: Justin Stoecker <justoeck@microsoft.com> Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: pengwa <pengwa@microsoft.com>	2022-07-06 21:35:19 -07:00
Ye Wang	859ef277a0	apply zcode changes to the beam search op (#11880 ) * apply zcode changes to the beam search op * fix pipeline failure * add doc * workaround for C# * update * update * use name zcode * review comment * review comments * fix cpplint * review coments	2022-06-20 18:39:07 -07:00
Tianlei Wu	6ee2c1b5fc	Remove temperature input from BeamSearch operator (#11896 ) * remove temperature input * update index of remaining inputs	2022-06-20 09:50:45 -07:00
Vincent Wang	02724c54ff	[CUDA] Implement BitmaskDropout, BitmaskBiasDropout and BitmaskDropoutGrad (#11534 ) * Implement BitmaskDropout and associated unit tests. * Implement BitmaskDropoutGrad and associated unit tests. * Implement Dropout -> BitmaskDropout rewrite rule and associated unit tests. * Implement (Dropout,DropoutGrad) -> (BitmaskDropout,BitmaskDropoutGrad) rewrite rule. This commit does not yet include unit tests for this rewrite rule. This commit also introduces improved documentation for all changes which will be grouped into this PR. * bitmask dropout * fix win build * bugfix for rocm * bugfix * fix code format * fix ut * fix build break * fix ut in win * resolve comments * fix ut in trt * resolve comments * fix rocm build error * fix typo Co-authored-by: Aidan Beggs <aidanbeggs@microsoft.com>	2022-05-27 17:24:47 +08:00
Xavier Dupré	c37d2728bf	Implement TreeEnsemble for opset(ai.onnx.ml)==3 (#10821 ) * Implement TreeEnsemble for opset(ai.onnx.ml)==3 * use of InlineVector * refactoring * improve attributes retrieval * avoid creating a temporary buffer * modifies onnx.ml.cpu.json * use unordered_map * update docs/OperatorKernels.md * address PR comments (TH -> ThresholdType, ORT_RETURN...) * add a python unit test to load a TreeEnsembleRegressor following ai.onnx.ml==3 specifications	2022-03-30 12:53:12 +02:00
Vincent Wang	6a6840d5c6	Fuse LayerNormalization for Apex O2 (#10233 )	2022-03-29 21:22:04 +08:00
pengwa	89ef987ab1	Improve NonZero on CUDA/ROCM (#10307 ) * improve NonZero * fix megatron_fp16 optimzier, fix the doc * multi_tensor_applier * resolve comment * fix building warning * fix build error when enabling training and use tensorrt	2022-03-25 07:35:45 +08:00
Hariharan Seshadri	a9d9c6b486	Register CPU, CUDA and ROCM opset-16 kernels for some operators (#10643 )	2022-03-08 09:18:39 -08:00
liqun Fu	da885a72e8	update with onnx 1.11 release (#10441 )	2022-03-07 21:10:55 -08:00
Tianlei Wu	36c3271546	BeamSearch op cuda (#10556 ) Add BeamSearch cuda implementation with support of fp16 GPT-2 subgraph	2022-02-25 13:08:55 -08:00
Scott McKay	df841ee87d	Fix incorrect type constraint registration for operator kernels. (#10489 ) * Fix incorrect type constraint registration for RoiAlign. This led to the input type not actually being checked when matching a kernel as the invalid constraint name is treated as a missing optional input. * fix missing dependency for the unit test exe. Whilst it doesn't link against the CUDA providers lib, without the dependency VS doesn't know it needs to rebuild the library if there are changes. * Add check for invalid type constraints. * Fix invalid registrations for other kernels. * Add hash replacement logic to provide backwards compatibility in ORT format models when the registration is fixed. * Add tests	2022-02-18 16:55:32 +10:00
Viswanath Boga	ad9d2e2e89	Prefix match in first iteration of beam search OP (#10231 ) * Add BeamSearch op schema * Add ONNX conversion for beams search * remove attention_mask and change input order * add option to run baseline * add check data type NULL * applies VerifyNodeAndOpMatch to subgraph * update input_ids shape * Add node name for Cast node * expose API for topk * parse parameters * Add beam search scorer * output results * fix typo * use c++ template and format python * fix build pipeline errors * symbolic shape infer of input onnx * output scores * add kernel def hash * Handle vocab_mask; move CheckSubgraph * undo insert_cast_transformer.cc and fusion_utils.py * fix typo * fix merge * update doc * add repetition penalty * refactoring: add GptSubgraph class * move BeamSearchState from .h to .cc file * adjust logits processor order * add batch generation example * fix repetition penalty for dup words in sequence * Add test * Add no repeat ngram processor * refactoring: move logits processor to classes * fix build warning * show latency * use allocator in beam state * use allocator in sequences * fix build error * move next_positions to beam state * Changes for prefix matching * removing debugs * removing more debugs * clean up * clean up * cpu doc updated * Updated docs * updated prefix_vocab_mask dimension in convert script * changes to support bxs prefix_vocab_mask in beamsearchop kernel * doc update * OperatorKernels.md updated * matching docs from artifacts * minor change in logits processor * Addressing comments * Updated the prefix vocab mask usage properly Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2022-02-03 00:14:39 +05:30
Yufeng Li	1aa0789691	add qdq support for QGemm (#10414 ) * add qgemm in quantization tool * add qdq support for QGemm * fix build break * fix OperatorKernels.md	2022-02-02 10:35:29 -08:00
Yi-Hong Lyu	e27f2dc932	int8/uint8 support for Argmax for opset 1, 11, 12 (#10296 )	2022-01-18 14:37:34 -08:00
Vincent Wang	44e2db9397	CUDA BFloat16 Refactor (#10085 )	2022-01-14 19:38:56 +08:00
Yi-Hong Lyu	499f1d5fd7	Quantization of Argmax (#10213 ) This patch includes: * int8/uint8 support for Argmax * Quantization tool support for Argmax	2022-01-12 14:12:56 -08:00
Yufeng Li	12ee2e942f	add int8_t for Resize (#10067 ) As we support quantization for format s8s8, we need Resize to support int8_t.	2021-12-17 15:36:09 -08:00
Tianlei Wu	ef36488df0	Add BeamSearch operator for GPT-2 decoding (#9680 ) * Add BeamSearch operator and CPU implementation * Add ONNX conversion script	2021-12-16 16:08:05 -08:00
Yufeng Li	ffdafb2012	add fallback of s8s8 support on x64 (#9995 ) * add fallback of s8s8 support on x64	2021-12-10 11:33:19 -08:00
Yufeng Li	a0afd7303d	add int8_t support for pool operators (#9852 ) * add int8_t support for pool operators	2021-11-29 18:43:43 -08:00
Ye Wang	6856619b18	Decoder Attention CUDA Op (#9792 ) * add kernel interface * register kernel * add self/cross qkv projection without cache * add LaunchTransQkv2 for (S,B,X,N,H) -> (X,B,N,S,H) * refactor ConcatPastToPresent * DecoderQkvToContext interface * q,k,v buffer and cache as output * qk, pv and transctx * fix compiler error on linux machine * key_padding_mask * add test_parity file. However not runnable * add partial unittest * made partial attributes to inputs * --gen_doc * change kernel interface, add more tests * morre parity tests * fix test * fix typo * transpose optimizer has bug. remove it temporarily * add input shape checks * add type/shape inference * fix cache shape check * fix rocm build failure * fix rocm build error * review comments * review comments	2021-11-19 19:25:36 -08:00
Vincent Wang	f390347c11	Add CUDA Kernels of RandomNormal[Like], RandomUniform[Like] (#9761 )	2021-11-19 08:18:34 +08:00
satyajandhyala	229c9a4e1c	Added Trilu CUDA kernel. (#9633 ) * Added Trilu CUDA kernel. * Added TriluGrad. * Added a training testcase for Trilu. * Added Trilu gradient checker test.	2021-11-09 11:20:17 -08:00
Hariharan Seshadri	bbeceb7541	Support optional type in ORT (#8339 )	2021-11-04 15:01:42 -07:00
Viswanath Boga	85874bb315	embed layer fusion gpt2 (#9336 ) * Changes to fuse embed layer for gpt2, kernal changes pending * verified add output and regular add match * Test added for additional output embedlayernorm, working on CUDA * Test passing on CPU * updated convert_to_onnx toll to check parity correctly * removed some debugs * couple of TODO left as in optimizer.py * removed changes to optimizer.py * fixing build * fixing build * updated order of initilization * added a test case for float16 * updating the docs * updating tests failing due to embed layer fusion * update unit tests * updating CUDA documentation in operatorkernels.md * addressing comments * OperatorKernels.md updated with CUDA * adding TODO to qembed_layer * minor edit * updated docs * addressing comments * adding position ids to embed layer gpt2 * updating fused gpt2 model * added extra test * remove comments * addressing comments * contrib_defs.cc updated * all tests passing * fixing a typo * minor edit * trigger build * qembedlayernorm checkinputs updated * fixing build error * fixing build error * fixing build error	2021-10-28 11:06:26 -07:00
Bowen Bao	e983f37121	Bifurcation detector for aggressive decoding (#9432 ) ``` Component for aggressive decoding. Find the bifurcation index of predicted tokens, between source tokens, starting from previous suffix match index, and predicted tokens. Concat predicted tokens, starting from bifurcation index, to the back of current tokens. This forms the output tokens. Detect suffix match index in source tokens, between source tokens and output tokens. Detection is based on finding the appearances of last n-gram in output tokens in source tokens. A match is considered found if source tokens contain a single matching n-gram. Return the index of the start of the n-gram in source tokens. No matching if found if src tokens contain multiple or zero matching n-grams. Return -1. ```	2021-10-19 19:53:56 -07:00
ashbhandare	35c2102cfa	Fixes for GatherND, Multinomial (#9143 ) * register gathernd kernel, aten multinomial * fix CI, add test * review comments	2021-10-05 14:51:58 -07:00
ytaous	0193490cbf	ReduceMin - add int64 cuda kernel support for opset12/13 (#8966 ) * ReduceMin - int64 support * fix doc Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-09-07 17:01:26 -07:00
Hariharan Seshadri	cee79526fd	Add opset 15 kernels for Pow, BatchNorm, and Shape (#8442 )	2021-08-25 12:04:20 -07:00
Hariharan Seshadri	17b0664e34	Optimize sequence type usage on CUDA [2/n] (#8720 )	2021-08-24 10:40:28 -07:00
XiyinOSS	19b82b438b	GridSample OP implementation for CPU and CUDA (#8551 ) * GridSample OP implementation for CPU and CUDA Description: This change contains implementation for torch grid_sample OP. Cuda implementation contains contribution from Muscle Wu. * Use interpolation for out-of-bound points in zero padding mode Out-of-bound points in zeros padding mode changed from constant 0 to interpolation of surrounding pixels. This aligns with Pytorch implementation. A bug in CUDA batch offset calculation is fixed. Custom op exporter type is added. * Fix nearest bug in CPU * Update per CI build finding and review comments * Force float to avoid potential integer T issue * Style update * PR update * Remove c++17 feature from cuda code	2021-08-20 12:37:38 -07:00
harshithapv	c24335246b	Support bool type for Pad Op and fix Unsqueeze in Tile grad for Opset 13 (#8602 ) * changes * tile grad unsqueeze fix for opset 13 * clean up * remove bool support for opset 2 to 12 for Pad as it is not supported. * Copy OperatorKernels.md from artifacts of Windows CI build.	2021-08-11 11:21:02 -07:00
Xavier Dupré	064a385b59	Support int8 for operator Split (#8615 ) * Support int8 for operator Split	2021-08-10 23:04:16 +02:00
Ashwini Khade	96eb9810ba	Update onnx (#8458 ) * updates for picking pnnx commit * add tests filter to c# tests * plus test fixes * fix versioning for contrib ops * fix tests * test filter for optional ops * more versioning related updates * fix test * fix layernorm spec * more updates * update docs * add more test filters * more filters * update binary size threshold * update docs * plus more fixes * updates per review * update to release commit * add filters for optional type tests * plus updates	2021-08-05 09:21:44 -07:00
Yufeng Li	ceeb1a65d6	Add quantization support of GEMM directly with QGemm (#8447 ) QGemm takes in quantized A, B, C, and quantization parameters of output Y, in which C and quantization parameters of Y are optional. Its output can be quantized or full precision, which depends on whether quantization parameters of Y exists or not. If quant params of Y are provided, the output will be requantized or is full precision. Comparing with QLinearMatMul and MatMulInteger, QGemm supports transpose, apha and beta attribute. The formula for quantized GEMM is: Y = alpha * scale_a * scale_b * ((A_int8 - zp_a) * (B_int8 - zp_b) + C_int32), in which, C_int32 is quantized with formula: C_int32 = (beta * C) / (alpha * scale_a * scale_b)	2021-07-27 21:21:49 -07:00
Dmitri Smirnov	950fe5e28b	Implement SparseTensor and infrastructure suppport and advance ONNX commit (#8038 ) SparseTensor support Implement Builder pattern Fix support for 1-D and 2-D COO indices Implement and test CSR support. Handle shape inference for SparseTensors Implement conversion for COO, CSR and tests. Address the case where constant sparse initializer is the output. Implement test infra for SparseTensors Implement SparseDenseMatMul for Csr and COO and tested it. Add hash for SparseToDenseMatMul Finish shared provider refactor Refactor GetOrCreate to Create Working on py interface Expose OrtDevice and use it in allocate_numpy Adjust Sparse interfaces, add support for string SparseTensor. Add tests. Add and test to_cuda() Add accessors to format specific indices Test values and indices views, read-only flag, after GC access Add sparse related methods to OrtValue Re-work SparseTensor wrapper, add OrtValue methods Rework numpy_array_to_cuda/to_cpu Add run_with_ort_values Add models and test sparse_mat_mul with run_with_ort_values Refactor sparse tensor to use a single buffer Ifdef x86 Eigen CSR sparse matmul implementation Exclude broken test, check for string type when copying cross device Split pybind schema, regenerate docs, add exclusion Conditionally exclude schema module Update docs fix cuda build Add test to a filter and renerate JS docs Add conversion and test string support for sparse tensors Exclude conversion utils from minimal build Add CUDA Memcpy and adjust provider interfaces	2021-07-22 15:24:36 -07:00
Viswanath Boga	afce0e2543	Attention kernel update to handle different Q,K,V hidden sizes (#8039 ) * changes working to convert akv nodes * changes to replace nodes * changes to accomodate qkv hidden sizes as attributes * kernel to accept qkv_hidden_size attributes * Working till compute for varied dimension, todo applyattention() * changes to make all regression tests work * inference running successfully without prepack * success inference with pre-pack weights * add test for diff sizes * bias shape need not be a mul of 3 * get the output_hidden_size from input * infer output shape from input * merge with master * cleaning up files that got merged wrong * accurancy at accepted level * added unit test case for different dimensions * all unit tests passing * packed weights working for attention * prepacked weights working * added test case for newly added extra qk input * updated unit test to test only extra add qk * fixing build error * removing few debugs * reverting test changes * all python test passing * cleaning up * new unit test added, major clean up of code * removed extra code * minor * minor fix to tests * prepack weights code cleaned up * compacted compute() in attention.cc * reformat compute() * making a parameter T * adding 3 q,k,v buffers in all cases * fixing build * running tests only on cpu * Updating docs * trigger ci builds * Addressing comments in PR * addressing some more comments * get add_qk_str from add_qk node directly * updating docs, added extra check to verify attn inputs * Optimized the extra add by parallelizing * added attention_shape to symbolic_shape_infer.py * minor refactoring to address comments	2021-07-19 12:21:33 -07:00
Ye Wang	04297110c3	Support int64 in ReduceMin cuda op for Opset 14 (#8307 ) * reducemin int64_t support * fix xxcuda.so load error * testtest * refactor * update doc * propagate types to opset14 * re-generate doc * rename macro	2021-07-13 16:18:06 -07:00
Hariharan Seshadri	5369821ad6	Support SpaceDepth ops in the CUDA and ROCM EPs (#7960 )	2021-07-09 01:00:22 -07:00
Nick Kreeger	800b62a139	Create a quantized EmbedLayerNorm for ORT. (#8124 ) Create a quantized EmbedLayerNorm Op for ORT	2021-06-25 17:51:43 -05:00
Bowen Bao	51c12a715b	Add NGramRepeatBlock contrib op (#8078 ) Description: Enforce no repetition of n-grams. Scores are set to `-inf` for tokens that form a repeated n-gram if added to the back of the input_ids. Motivation and Context Needed by transformer models in sequence generation algorithms (greedy search and beam search). This module has heavy impact on performance, and can be highly parallelized.	2021-06-21 10:21:48 -07:00
RandySheriffH	1a5ee11dbd	Implement Sequence Ops GPU (#7863 )	2021-06-07 15:30:26 -07:00
Scott McKay	0fbec1b9c1	Update the operator documentation generation (#7787 ) * Update the operator documentation generation - Make layout a little nicer - Update to latest supported operators including training - Fix some links that are broken when the docs content is copied to github-pages - Fix incorrect usage of 'onnx.ai.ml' as the default domain - ML ops are now separated from the real default domain of 'onnx.ai' - Include CPU, CUDA and training kernels - exclude DNNL as it's not an EP we own * There are separate paths for CUDA and CUDNN as they are not guaranteed to be in the same location on a Windows machine. Use the CUDNN path when looking for the CUDNN library. * Enable validation of both contrib ops and operator kernels in build Filter generation so it's deterministic Add ability for CI to publish the md files as build artifacts if they differ so a developer can download and add to their PR to resolve any diffs. Remove workarounds for github-pages as that will now link to the github docs which display correctly	2021-06-02 17:47:40 +10:00
Zhang Lei	50c5edcf13	Add nhwc support for QLinearAveragePool operator (#7656 ) * Add nhwc support for QLinearAveragePool operator * Update ContribOperators.md * Update OperatorKernels.md with cpu,dnnl and cuda enabled.	2021-05-13 22:05:30 -07:00
Ye Wang	803837df63	Add 4dmask support for attention cuda kernel (#7591 ) * checkin * add 4dmask support in attention cuda op * trim * add comments * fix build/test error * review comments and add tests * sync doc * review comments * minor change	2021-05-07 20:17:29 -07:00
Zhang Lei	ada0fbbd2d	Implement qlinear concat and unit test. (#7341 ) * Implement qlinear concat and unit test. Add quantization tools for QLinearConcat and it quantization tests. * Add kernel def hash for QLinearConcat. * Change according to PR. Add qdq transformer support for QLinearConcat. * Add QDQ Transformer unittest. Fix typo on domain. * remove dup logic of no use. * fix x86 build error. * Update operator docs.	2021-04-26 13:38:40 -07:00
Nat Kershaw (MSFT)	8a03b6e5c7	Render Operator documentation as compliant markdown (#3658 )	2020-09-02 15:07:50 -07:00
Hariharan Seshadri	1599562016	Fix BatchNorm CUDA kernel definition	2020-04-18 17:21:29 -07:00
Hariharan Seshadri	b4457ecb7a	Fix `gen_doc` build option and refresh documentation (#3545 ) * Support listing keys in custom metadata map via C/C++ API * nit * PR feedback * Nit * Initial commit * More changes * Support listing keys in custom metadata map via C/C++ API * nit * PR feedback * Nit * Initial commit * More changes * Add md files * Doc changes * Update * revert cmake changes * Update * Doc change * Update * Update	2020-04-17 14:41:04 -07:00
Sreekanth Yalachigere	31ea11a696	Renaming MKL-DNN as DNNL (#2515 ) * DNNL: Moving Files to rename file names * DNNL name change * azure pipeline updated * disable ceil/dialation and enable Opset10 * disable ceil/dialation tests in Python * mlperf_ssd_resnet34_1200 disabled	2019-12-03 07:34:23 -08:00

1 2

51 commits