* Implement Bitonic and Radix TopK
* remove needless print out
* fix com err
* add negative support
* fix comments
Co-authored-by: Randy <45701928+RandyShuai@users.noreply.github.com>
* add calibration tool
* add model for e2e example
* format readme
* some more formatting updates
* plus a few more updates
* plus review comments
* plus updates
* more updates
Disable DML in Windows GPU CI build for now, because there are some wired model test failure and I don't know how to fix it. Will seek help from WinML team.
* Add bert related transformers to doc
* Add execution provider and comment for bert optimizations
* Add comment about accuracy impact of approximation
* use correct type for for loop
* explicitly specify void for parameters of OrtGetApiBase because the function is defined in c, so when the function is just (), it is interpreted as having an unknown number of parameters. This was causing compiler warning C4276.
* Remove allocator type from the key comparison in ExecutionProviders.
Remove usage of DummyArena as it's no longer necessary.
* Fix x86 tests where arena allocator is disabled.
Make initialization of OrtMemoryInfo clearer by adding Invalid enum value.
* Make OrtValueNameIdxMap::MaxIdx more intuitive.
WinML would like to update the googletest submodule. They want some newer features (namely GTEST_SKIP to skip tests programmatically and be able to skip entire fixtures easily) and would need to update the submodule version.
However, because the new version of code hit a bug in gcc, even though the bug is already fixed in the latest gcc but we're using gcc 4.8.x and it won't get patched for the bug, so we have to do a compromise, change our code a little bit to make it work.
The gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51213
Add defs and imlementation for OneHotEncoders, adjuist date_time_transformer kernel and test.
Add OneHotEncoder kernel test.
Add HashOneHotVectorizerTransformer unit test.
This does not link due to multiple definitions of functions
that are included into header from a CPP file.
Make kernels non-template. Add input constraint for learnt data.
Fixup tests.
Add two more featurizers along with tests. Tests fail.
min_max_scalar_transformer
robust_scalar_transformer
Fix tests serialized stream by prepending version bytes.
Add inputation_marker_transfomer and the test.
Fix up float/double type designations.
Added label_encoder_transformer along with a test.
string_throw case is broken at the momement.
Fix labelencodertransfomer_test.cc string_throw case
Rename maxabsscalertransformer_test.cc
Add MissingDummiesTransformer along with the test.
Update manifest.
Add TimeSeriesImputerTransformer definition, implementation and tests
* don't run cuda tests if building with tensorrt
* remove unnecessary build options for win trt ci
* refactor win gpu tensorrt ci yml
* --numpy_version=1.17
* update
* update
* azcopy and cuda path
Added the optimized implementation for depthwise convolution for both ACL v19.02 and ACL 19.05.
Also the pointwise convolution seems to be more optimal in the CPU implementation so we opted for that instead.
When it is posible we use a fully connected layer instead of the gemm implementation.
This will let the library use the best implementation based on the input data.
* Implement a more stable SoftMax
e^x is represented as infinity if x is large enough, like 100.f. Infinity divided by Infinity is a NAN. Thus, softmax gets a NAN if one or more item are large enough.
A math transform as below is leveraged to get a stable softmax:
e^xi/(e^x1 + ...e^xn) = e^(xi - max) / (e^(x1 - max) + ... + e^(xn - max))
And for convenience, force max to 0.f if all xi are negative