* Add flag to enable automatic generation of input for models with tensor inputs
* change wording of variable
* Naming convention changes to variables
* Handle free dimensions
* Comment with default allocator
* variable rename
* Remove input_count
* Cast to size_t to avoid warning
Co-authored-by: Ryan Lai <ryalai96@gamil.com>
This is required for running multithreaded with multiple GPUs. Without it, a worker thread defaults to GPU 0 even when the CUDAExecutionProvider is assigned to another GPU. That can cause a CUDA crash when resources that belong to GPU 0 are used on GPU N>0.
* Make CDist faster via Eigen squaredNorm and GEMM.
* Add call to abs() as the GEMM output may differ slightly due to floating point accuracy and result in a negative distance which returns NaN if sqrt() is applied to it.
* Update math::Gemm to use the type for alpha and beta instead of hardcoding to float. Matches the GemmEx definition.
* Provide Eigen based replication of the GEMM call on x86 if T=double.
* Make test model data deterministic.
* Do the GEMM first so we can avoid potentially subtracting two numbers that are very close to each other.
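The CDist notes above rely on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2(a . b), where the dot products for all pairs are what a GEMM call would produce. A minimal pure-Python sketch of that idea follows. The function names are hypothetical, the loops stand in for the Eigen and GEMM calls, and the order of the additions is only one reading of the "GEMM first" note, not the actual kernel.

```python
import math


def squared_norms(rows):
    """Squared Euclidean norm of each row (stand-in for Eigen's squaredNorm)."""
    return [sum(x * x for x in row) for row in rows]


def cdist_euclidean(a, b):
    """Pairwise Euclidean distances between the rows of a and the rows of b."""
    a_sq = squared_norms(a)
    b_sq = squared_norms(b)
    result = []
    for i, row_a in enumerate(a):
        out_row = []
        for j, row_b in enumerate(b):
            # Dot product of the two rows: one entry of the GEMM output a * b^T.
            dot = sum(x * y for x, y in zip(row_a, row_b))
            # Start from the GEMM term, then add the two squared norms.
            d2 = (-2.0 * dot + a_sq[i]) + b_sq[j]
            # Rounding can leave a tiny negative value for near-identical rows.
            # abs() keeps sqrt() valid (NaN in C++, ValueError in Python otherwise).
            out_row.append(math.sqrt(abs(d2)))
        result.append(out_row)
    return result
```

For example, `cdist_euclidean([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0]])` gives one distance per row of the first argument.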
* update C# API to optimize inference latency
* rename PinnedOnnxValue to FixedBufferOnnxValue and fix build break
* add more test cases
* add conditions on string tensors for pre-allocated outputs
* change to random inputs
* fix spelling
* resolve comments
* resolve comments
* remove FixedBufferOnnxValueTests.cs
* fix trivial typos in doc
* wangye/pivot (#3432)
* check in
* work version
* add ForecastingPivot kernel
* fix mac os and linux build error
* update FeaturizerLibrary Version
* resolve comments
* remove changes
* Add Kernel for LagLeadOperator & RollingWindowFeaturizer (#3434)
* update
* update todo
* resolve comments
* relax eps for TruncatedSVD transformer
* mute TruncatedSVD_transformer due to nondeterministic test result
* resolve comments
* update
* test
* update
* fix
* Make the Slice implementation based on type sizes and reduce templatized code to a minimum.
* Remove using 'dynamic' as a template param to Slice as well.
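One way to read "based on type sizes" is that the slice logic only needs to know how many bytes each element occupies, not its type, so a single code path can serve every type of the same width. The sketch below illustrates that idea for a 1-D buffer. The function `slice_1d_by_element_size` is hypothetical and is not the Slice kernel itself.

```python
import struct


def slice_1d_by_element_size(buffer, element_size, start, stop, step=1):
    """Copy elements start:stop:step from a raw 1-D buffer.

    Only the element size in bytes is used, so the same routine works for
    int32 and float32 (both 4 bytes), int64 and double (both 8 bytes), etc.
    """
    out = bytearray()
    for i in range(start, stop, step):
        offset = i * element_size
        out += buffer[offset:offset + element_size]
    return bytes(out)


# Usage: the same call handles int32 and float32 data because both are 4 bytes wide.
ints = struct.pack("<4i", 10, 20, 30, 40)
floats = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
sliced_ints = struct.unpack("<2i", slice_1d_by_element_size(ints, 4, 1, 4, 2))
sliced_floats = struct.unpack("<2f", slice_1d_by_element_size(floats, 4, 1, 4, 2))
```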
Commit 06fc9506fd, which refactored the CPU Pool class, broke the ACL EP build.
Also accounted for commit a4fe60c4d3, as it affects the new class as well.
Move the declaration of the new MaxPoolV8 CPU class into the header file. Implement MaxPool 8-11 in ACL EP.
Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
* Add CPU implementation for FastGelu operator
* Update optimization script to fuse Gelu or FastGelu according to whether Erf or Tanh is used in the graph.
* Merge BiasGelu and FastGelu into one class
* Enable FastGelu Fusion optimizer for CPU Execution Provider.
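The Erf versus Tanh distinction above corresponds to the two common Gelu formulations: the exact form built on the error function, and the tanh-based approximation usually associated with FastGelu. The sketch below shows both as scalar functions. The function names are hypothetical, and the constants are the widely used ones for the tanh approximation, assumed rather than taken from the kernel source.

```python
import math


def gelu_erf(x):
    """Exact Gelu: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def fast_gelu_tanh(x):
    """Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```

The two functions agree closely over typical activation ranges, which is why a graph pattern built on either Erf or Tanh can be replaced by a single fused node of the matching kind.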
Add opt_level option for graph optimization level in bert perf test.
Support BERT models that output each layer, where SkipLayerNormalization has more than 4 children.
Check weight and bias are 1D for layer norm fusion.
Add a dummy class Gpt2OnnxModel for further changes of GPT2 model.
An ExternOp's input needs buffers, so we cannot add compute_inline
schedule on it even if it's a scalar tensor. Instead, we need to
schedule it as compute_root.
* Rework SVMClassifier
- use GEMM for initial scoring
- minimize data allocations and copies
- parallelize the second half of the scoring for larger batches
1. Fix a bug in FunctionImpl::FunctionImpl. It set the wrong name for the new attribute.
2. Set error code to NOT_IMPLEMENTED if a function contains an unimplemented op.
Advance ONNX commit to pick up the latest ArgMax, ArgMin,
ReduceMax/ReduceMin, MaxPool
Declare new versions for CPU/CUDA.
Implement infrastructure support for int8/uint8.
Adjust GatherOp test for a new error.
Adjust Scan9.BadShape test.
Add exclusions for index out of bounds checks.
Rework result verification for SVDTransformer.
* Add benchmark script and notebook for GPT2
* Update Reshape fusion for GPT2 model
* Add opt_level option for bert_model_optimization to disable onnxruntime by setting --opt_level 0
* Fix keras optimization
1. Support the new fields for Constant in opset 12
2. Support SparseTensor in the Constant node by converting it to a dense tensor when lifting the Constant to an initializer. This makes a model with a sparse tensor in a Constant work, but it is not an especially efficient approach.
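The sparse-to-dense conversion mentioned in item 2 can be sketched as follows, assuming the sparse tensor stores its non-default values together with linearized (flattened, row-major) indices. The function `sparse_to_dense` and its parameters are hypothetical and only illustrate the idea and its cost, since the dense buffer is allocated in full regardless of how few values are set.

```python
def sparse_to_dense(values, linear_indices, dims, default=0.0):
    """Expand a sparse tensor into a flat row-major dense buffer.

    values:         the non-default values
    linear_indices: flattened position of each value in the dense tensor
    dims:           shape of the dense tensor
    """
    size = 1
    for d in dims:
        size *= d
    # The whole dense buffer is materialized, which is the inefficiency noted above.
    dense = [default] * size
    for value, index in zip(values, linear_indices):
        if not 0 <= index < size:
            raise IndexError(f"index {index} is out of range for shape {dims}")
        dense[index] = value
    return dense


# Usage: a 2x3 tensor with two non-zero entries.
dense_example = sparse_to_dense([5.0, 7.0], [1, 5], [2, 3])
```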