* Add Embed Layer Normalization and Skip Layer Normalization ops for bert optimization.
* add float16 test for skiplayernorm
* Add test for EmbedLayerNormalization op
* fix cpu build error
* fix build warning
* update HasCudaEnvironment function
* handle cuda error
* Mention OrtCreateSessionFromArray in C API doc
* fix seq of tensors
* changes on 9/30
* All tests passing
* Add SequenceAt op
* Fix shared_lib non_tensor_types test
* Address some PR comments
* Address PR comments
* Add support in python bindings to accept seq(tensor)
* Change data type from vector<Tensor> to TensorSeq
* Added some documentation
* Added missing test model
* Fix Linux build
* Fix Mac build
* Add Attention op for multi head self attention in BERT
* Add test cases
* Move op from kOnnxDomain to kMSDomain.
Limit test to run on the CUDA provider only.
* fix test
* Add float16 test
* fix cpu build error
* handle cuda error
* get last cuda error when failed
* save status: add tiling layout; add avx512 skylake cpuid info
* unit tests and matmul integer model passed on skylake, need to verify model
* save commit before update master
* fix check
* address comments
* Added GatherElements to Nuphar
This change added GatherElements (op_ver 11) to the Nuphar provider.
* address CR feedback
* create a utility function for accessing an index safely
* address more CR
* SafeIndex -> ClampIndex
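
The rename from SafeIndex to ClampIndex suggests the helper keeps an index inside the valid range instead of failing on it. A minimal sketch of that idea follows, assuming the index is clamped to `[0, size - 1]`. The names `clamp_index` and `safe_get` are hypothetical and do not come from the Nuphar sources, and the sketch does not model ONNX negative-index wrapping.

```python
def clamp_index(index: int, size: int) -> int:
    """Clamp `index` into [0, size - 1] (assumed semantics, for illustration)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return max(0, min(index, size - 1))


def safe_get(values, index):
    """Read values[index] without ever indexing out of bounds."""
    return values[clamp_index(index, len(values))]
```

Clamping trades a possible out-of-bounds read for a well-defined (if arbitrary) element, which is why the name describes the behavior more precisely than "safe".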
1. use MlasErf for Gelu. Eigen's erf is very slow.
2. change the ErfUpperAbsRange to 3.925 because MlasErf doesn't return 1 for 3.725
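
The two points above can be illustrated with a small sketch. It uses Python's `math.erf` as a stand-in for MlasErf (an assumption, since the two implementations differ) and rounds the result to float32 to show why the cutoff matters: a float32 result only rounds to exactly 1.0 once `1 - erf(x)` falls below half an ulp (`2**-25`). The helper names `to_float32` and `gelu` are illustrative.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def gelu(x: float) -> float:
    """Erf-based Gelu: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# erf evaluated at the two values mentioned above, rounded to float32.
erf_at_3_725 = to_float32(math.erf(3.725))
erf_at_3_925 = to_float32(math.erf(3.925))

# Only the larger bound saturates to exactly 1.0 in float32.
saturates_at_3_725 = erf_at_3_725 == 1.0
saturates_at_3_925 = erf_at_3_925 == 1.0
```

With a reference erf, the float32 value at 3.725 is still below 1, so clamping the input there and returning 1 would introduce a small jump; at 3.925 the rounded result is already 1.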
* Add Unique operator.
* Enable onnx tests. Disable one with incorrect expected output and add unit test to validate ORT behavior. Need onnx update to fix (will address that separately but don't want to block this checkin on that change).
* Added Scatter and ScatterElements to Nuphar
Implemented Scatter (op_ver 9 - 10) and ScatterElements (op_ver 11)
for Nuphar.
Because TVM's compute is output-oriented, our current implementation
uses extern calls for simplicity.
* fixed build issue after rebase
* remove dead code
* Address CR
* removed dead code
* use GetAttrOrDefault
* Address more CR feedback
* add GetStrides to codegen/common/utils.h
* added a unit test for Bool input data
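
The Scatter entry above notes that TVM's compute is output-oriented. A simplified 1-D sketch of what that distinction means, in plain Python rather than TVM: a gather can define each output element from its own index, while a scatter lets each update choose which output slot it writes. The function names are hypothetical, and this is not the Nuphar implementation.

```python
def gather_elements_1d(data, indices):
    """Output-oriented: output[i] depends only on its own position i."""
    return [data[j] for j in indices]


def scatter_elements_1d(data, indices, updates):
    """Input-oriented: update i decides which output slot it overwrites."""
    output = list(data)  # start from a copy of the input
    for i, j in enumerate(indices):
        output[j] = updates[i]
    return output
```

Because the scatter loop writes to data-dependent positions, it does not fit a per-output-element formula directly, which is consistent with the entry's choice of extern calls for simplicity.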
* Gelu contrib op & transformer
* Gelu kernels for CPU and CUDA
* Merged PR 5034: fix a condition for gelu transformer
ONNX models are not guaranteed to assign a unique name to each node, so the previous condition could fail.
(cherry picked from commit e335ef5466444cb0aae45f885ea3a825ed9f1088)
* Fix builds
* remove useless comments
* fix test failure when nocontribp
* Move implementation under kMSDomain
* fix comments
* fix linux build
* Fix a few comments
* fix linux build
* Update Cast op to use precision of 8 when casting floating point numbers to strings. This matches numpy precision.
Update unit tests to include non-trivial floats in the input.
Update onnx test infrastructure to document why the test cases are disabled
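
A minimal sketch of the Cast change above, assuming "precision of 8" means eight significant digits in general (`%g`-style) formatting. The helper names `to_float32` and `cast_float_to_string` are hypothetical and are not the ORT implementation.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cast_float_to_string(value: float) -> str:
    """Format with 8 significant digits, dropping trailing zeros."""
    return format(value, ".8g")


# float32 cannot store 0.1 exactly; 8 digits hides the representation noise,
# while 9 digits would expose it.
tenth = to_float32(0.1)
eight_digits = cast_float_to_string(tenth)
nine_digits = format(tenth, ".9g")
```

Non-trivial inputs such as `3.14159265358979` are rounded to eight significant digits, which is the kind of case the updated unit tests are described as covering.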
* py fallback initial commit.
* fixes.
* update NGRAPHCustomOp::Initialize() to return Status
* fixes in session.py
* FAIL status to EP_FAIL in ngraph custom op
* disable fallback for backend api
Remove gsl submodule and replace with a local copy of gsl-lite
Refactor for onnxruntime::make_unique
gsl::span size and index are now size_t
Remove lambda auto argument type detection.
Remove constexpr from fail_fast in gsl due to Linux not being happy.
Comment out std::stream support because the macOS std lib is broken.
Move make_unique into include/core/common so it is accessible for server builds.
Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
due to x86 build.
Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized