* Implement a more stable SoftMax
e^x overflows to infinity when x is large enough, for example 100.f in single precision. Infinity divided by infinity is NaN, so softmax returns NaN if one or more inputs are large enough.
The following identity is used to get a stable softmax:
e^xi / (e^x1 + ... + e^xn) = e^(xi - max) / (e^(x1 - max) + ... + e^(xn - max))
For convenience, max is forced to 0.f if all xi are negative.
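A minimal sketch of the max-subtraction trick described above, in plain Python rather than the actual kernel code. The function name `stable_softmax` and the `clamp_negative_max` flag are hypothetical and chosen only for illustration; the flag mirrors the "force max to 0 if all inputs are negative" convenience.

```python
import math


def stable_softmax(xs, clamp_negative_max=True):
    """Softmax over a non-empty list of floats using max subtraction.

    Illustrative only: names and the flag are assumptions, not the
    library's API.
    """
    m = max(xs)
    # Mirror the convenience rule: treat max as 0 when every input is negative.
    if clamp_negative_max and m < 0.0:
        m = 0.0
    # Every exponent is now <= 0 (or small), so exp() cannot overflow.
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# A naive e^1000 / (e^1000 + e^1000) would overflow; the shifted form does not.
print(stable_softmax([1000.0, 1000.0]))
```

With the clamp enabled, inputs that are all very negative can still underflow to a zero denominator; this sketch does not handle that case.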
* Fix C# handling of unicode strings
* more tests
* check for handle before freeing
* variable reuse efficiency
* refactor and cleanup utf8 to utf16 conversion block
* Add missing env variables for mac pipeline test (#2595)
* Java API for onnxruntime (#2215)
* Rename automl python tools folder to featurizer_ops. (#2593)
* Make sure a fenced tensor cannot reuse another tensor. (#2561)
* Add support for opset 11 in reshape fusion (#2592)
* Support opset 11 subgraph of Squad model in Embed Layer Normalization (#2605)
* Allow providers to be set for InferenceSession at construction (#2606)
* EmbedLayerNormalization Fusion For Dynamic Squad Model Opset 10 (#2613)
* Improve Embed Layer Norm Fusion for SQuAD with static input shape (#2621)
* Improve cuda expand() operator's performance. (#2624)
* Cuda pad optimize when no padding is needed. (#2625)
* Shortcut cuda Pad() when no padding is needed.
* Improve performance of resize() in Nearest mode (#2626)
* Optimize cuda scatter() on 2D compatible. (#2628)
* fix float16 comparison in initializer (#2629)
* epsilon attribute for layernormalization fusion (#2639)
* Fix memory exception in Layer Norm Fusion (#2644)
* change c++14 to c++11
* add ld lib path for centos
* enable csharp tests on macos
* fix C API test on MacOS + fix manylinux dotnet install
* fix manylinux dotnet install
* fix lib link
Rework TensorSeq in a manner consistent with Tensor and SparseTensor
in terms of type system setup.
Reduce templating. Introduce helpers to ensure the same
data type.
Make OrtValue __dtor not virtual.
Introduce ContainerChecker
* enable telemetry
* set enable telemetry as default
* for debugging
* remove log and set telemetry back to disabled by default
* delete private file while testing
* resolve comment: mainly add license header, rename macro and update docs
* rewording in privacy.md
* add centos tests to linux cpu ci pipeline
* Disable failing test
* use centos6 instead of centos7
* change back to centos7
* add dotnet runtime dependency
* fix dotnet runtime dependencies
* install dotnet sdk instead of runtimes
* add more dotnet dependencies
* temporarily skip failing test
* fix lib path
* reenable failing test
Add support of GPT2 model optimization:
* Match subgraph of Gelu Approximation (using Tanh).
* Fuse LayerNormalization if SkipLayerNormalization is not ready.
* Output model even if embedding layer is not fused.
* Improve Reshape Fusion to improve coverage.
* Refine constant input checking, and output fused op counter.
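For reference, the tanh-based Gelu approximation that the matched subgraph computes can be sketched as below. This is the commonly used formula, written in plain Python for illustration; the function names are hypothetical and this is not the fusion code itself.

```python
import math


def gelu_tanh(x):
    """Tanh approximation of Gelu (illustrative helper, not library code)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exact(x):
    """Exact Gelu via the error function, for comparison."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# The two forms agree closely over typical activation ranges.
for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(v, gelu_tanh(v), gelu_exact(v))
```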
Update script according to latest op improvements:
* Fusion of Add Bias and Gelu.
* Fuse SkipLayerNormalization and Add Bias.
Other:
* Add ReduceSum for mask as intermediate step.
* Refactor verbose setting.