* Simplify Normalizer as the spec only requires support for 2D input.
Tried using Eigen (lpNorm<1>() and norm()) on each row, but that was much slower.
* Remove unused variable
* Use MlasComputeLogistic instead of manually computing values.
* Update test script to allow the tolerance to be specified when checking float output from logreg_iris.onnx.
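For illustration only, a row-wise normalizer over 2D input might look like the sketch below. The function name `normalize_rows` is hypothetical, and the treatment of negative values (absolute values for L1 and MAX) and of all-zero rows is an assumption, not something this commit states.

```python
import math


def normalize_rows(rows, norm="L2"):
    """Normalize each row of a 2D list independently (hypothetical helper)."""
    result = []
    for row in rows:
        if norm == "MAX":
            denom = max(abs(v) for v in row)
        elif norm == "L1":
            denom = sum(abs(v) for v in row)
        elif norm == "L2":
            denom = math.sqrt(sum(v * v for v in row))
        else:
            raise ValueError("unknown norm: " + norm)
        # Assumption: an all-zero row is returned as zeros rather than dividing by zero.
        result.append([v / denom if denom != 0 else 0.0 for v in row])
    return result
```

Restricting the input to 2D means a single loop over rows is enough, with no stride or axis handling.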
1. Do not reuse the main thread.
2. Do not add one when MLAS calculates the number of tasks to schedule. (I was the one who put the plus one there.)
This is the second try of #1839
This change is known to have a negative performance impact on some models.
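As a rough sketch of the scheduling idea, the task count can be derived from the pool size alone, with no extra slot for the calling thread. The function `partition_work` and its parameters are hypothetical and do not mirror the actual MLAS code.

```python
def partition_work(total_items, pool_threads):
    """Split total_items into contiguous (start, end) ranges, one per task.

    Assumption: the task count is capped by the pool size, with no "+1"
    for the main thread, since the main thread is not reused for work.
    """
    task_count = max(1, min(pool_threads, total_items))
    base, extra = divmod(total_items, task_count)
    ranges = []
    start = 0
    for index in range(task_count):
        # The first `extra` tasks take one additional item each.
        size = base + (1 if index < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```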
* Avoid use of vectors for tracking reader/writer offsets as it adds too much overhead if there are a lot of readers or writers.
Tracy found improvements in resnet34-ssd1200 and BERT Squad with this approach.
* API governance draft
* Update CONTRIBUTING.md
updated for ABI proposals
* Update CONTRIBUTING.md
* Update CONTRIBUTING.md
* Incomplete, a draft iteration of 2 more changes - API docs and high-level design
* pushing to see how the picture size works on screen.
* added 2 charts on api choice and distribution choice
* details on contract checking
* lint cleanup and links
* PR feedback.
* fixed markdown and lists
* more markdown and lists
* fixed broken links
* PR feedback
* commas
* PR comments from nick
* PR feedback
* fixed build section
Co-authored-by: Nick Geisler <36938193+ngeisler11@users.noreply.github.com>
Discussed with Faith: because the data size is very small and changes are gradual, there is no need to delete the old data. We want to keep all the history.
* update GeluFusion to support pattern from PyTorch 1.4;
* Fix a bug where the check for an edge between mul2 and root was missing.
* update script to fuse gelu from PyTorch 1.4
* Add test for python optimizer
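To show why more than one subgraph pattern can be fused into the same Gelu node, the sketch below evaluates two orderings of the multiplications against the erf-based Gelu formula. The assumption here, which the commit message does not spell out, is that the PyTorch 1.4 export differs mainly in where the multiply by 0.5 sits. All function names are hypothetical.

```python
import math


def gelu_reference(x):
    """Gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_half_last(x):
    # Assumed earlier pattern: Div -> Erf -> Add(1) -> Mul(x) -> Mul(0.5).
    return (x * (math.erf(x / math.sqrt(2.0)) + 1.0)) * 0.5


def gelu_half_first(x):
    # Assumed PyTorch 1.4 pattern: Mul(x, 0.5) feeds the final Mul directly.
    return (0.5 * x) * (math.erf(x / math.sqrt(2.0)) + 1.0)
```

Both orderings agree with the reference up to floating-point rounding, so a fusion pass only has to recognize the graph shape, not change the math.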
This change fixes #3129. When running onnxruntime as a DLL on Windows, CUDA does some internal cleanup when the process exits. After this, any call to CUDA causes a crash. Delay-loading makes the thread_local destructor run after the CUDA cleanup, hence the crash.
Override native package name. Keep the managed package name the same.
Specify package name for validation purposes.
Fix up validation package name parameter.
(1) Add performance test tool for bert model.
(2) Add accuracy test tool to compare inference results of original and optimized bert models.
(3) Add test data generator tool to create test data for onnxruntime_perf_test.exe
(4) Improve bert optimization script: Verify model producer for model_type; Add warning if model is not fully optimized.
(5) Add shape optimizer tool to assist developing optimization script.
(6) Update readme.
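The accuracy comparison in (2) can be pictured as an element-wise tolerance check between the two sets of outputs. The sketch below is a minimal illustration, not the tool itself: the names `outputs_match` and `max_abs_diff` and the default tolerances are assumptions.

```python
import math


def outputs_match(baseline, optimized, rtol=1e-3, atol=1e-4):
    """Return True if two flat lists of floats agree within the tolerances."""
    if len(baseline) != len(optimized):
        return False
    return all(
        math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
        for a, b in zip(baseline, optimized)
    )


def max_abs_diff(baseline, optimized):
    """Largest absolute element-wise difference, useful for reporting."""
    return max((abs(a - b) for a, b in zip(baseline, optimized)), default=0.0)
```

Reporting the maximum difference alongside the pass/fail result helps when choosing a tolerance for a given model.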
Previously, we put the "bin" folders of all the CUDA versions in the system PATH, with 10.2 at the front. It was a mess.
So I've removed all of them from the system PATH env. But one of them has to be added back through the build scripts.
(The problem only affects the C# tests, not the C/C++ tests that are forked from build.py.)
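One way a build script can add a single CUDA "bin" folder back is to prepend it to PATH only in the environment handed to child processes. The helper `env_with_cuda_bin` below is hypothetical, as is any directory passed to it; it is a sketch of the idea rather than the change made to the scripts.

```python
import os


def env_with_cuda_bin(cuda_bin_dir, base_env=None):
    """Return a copy of the environment with cuda_bin_dir first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("PATH", "")
    # Prepend so this CUDA version wins over anything already on PATH.
    env["PATH"] = cuda_bin_dir + os.pathsep + current if current else cuda_bin_dir
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, which leaves the system-wide PATH untouched.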
* add dml gpu pipelines
* add x86 to the gpu dml dev build pipeline
* Enable DML x86 builds
* Fix uint64_t -> size_t warning
* fix warnings
* enable dml on x86 ci builds
* operatorHelper 773 error uint32_t vs uint64_t
* operatorHelper 773 error uint32_t vs uint64_t
* make x86 pipeline use the gpu pool
* more warnings
* fix x86 directml path
* make dml nuget package
* disable tf_pnasnet_large
* disable zfnet512
* make validation use wildcards
* disable x86 dml gpu tests
* add args.
* update gpu.yml
* change nupkg wildcard
* add debug statements
* package x86 dml nupkg
* don't drop managed nuget again from dml pipeline build
* Add DML EULA
* directml license should be renamed to not clobber the existing license
* casing on dml package....
* {} to ()
* fix license name
* disable dml from x86 ci
* typo and cr feedback
* remove featurizers
* ship the dml pdb as well
* Add a summary for each ExecutionProviderAppend method in SessionOptions.cs
The OnnxRuntime managed DLL is EP-agnostic, meaning it exposes all methods pertaining to all possible EPs supported by OnnxRuntime in general. Not all of these methods are really "available" to a .NET developer unless they have the corresponding native onnxruntime shared library. Adding a summary line so that IntelliSense points that out.
* remove empty line
* Initial commit
* More changes
* More changes
* More changes 3
* More changes 4
* More changes 5
* More changes 5
* More changes 6
* More changes 7
* More changes 8
* Remove C# ifdefs
* More changes 10
* More changes 11
* YAML changes for other release pipelines
* Add release notes metadata
* Props and Targets change
* Add CSharp proj
* More changes 12
* More changes
* Minor fix
* Minor fix
* Fix yaml
* Some missing logic for winml
* Minor update
* Fix casing for winmd file
* Fix casing
* Add targets and props for managed section into native nuget
* revert file
* a
* Switch to CUDA10.2
* Update win-gpu-tensorrt-ci-pipeline.yml
* Update win-gpu-tensorrt-ci-pipeline.yml
* remove dynamic_shape
* update onnx-tensorrt submodule
* check if input shape is specified for TensorRT subgraph input and enable some TensorRT unit tests
* fix format issue
* add shape inference instruction for TensorRT
* update according to the reviews
* Update win-gpu-tensorrt-ci-pipeline.yml
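The input-shape check mentioned above can be sketched as a scan for inputs whose shape is missing or contains a non-concrete dimension. The function `unspecified_inputs` and the dictionary representation of shapes are hypothetical and chosen for illustration; they are not the TensorRT execution provider's actual data structures.

```python
def unspecified_inputs(input_shapes):
    """Return names of inputs whose shape is missing or not fully concrete.

    Assumption: a shape is a list of dimensions, where a concrete dimension
    is a positive int and anything else (None, a symbolic name) is unknown.
    """
    missing = []
    for name, shape in input_shapes.items():
        if shape is None or any(
            not isinstance(dim, int) or dim <= 0 for dim in shape
        ):
            missing.append(name)
    return missing
```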