This change fixes #3129. When onnxruntime runs as a DLL on Windows, CUDA performs internal cleanup when the process exits, and any CUDA call made after that cleanup crashes. With delay loading, the thread_local destructor runs after the CUDA cleanup, which causes the crash.
Override the native package name. Keep the managed package name unchanged.
Specify the package name for validation purposes.
Fix up validation package name parameter.
(1) Add performance test tool for bert model.
(2) Add accuracy test tool to compare inference results of original and optimized bert models.
(3) Add test data generator tool to create test data for onnxruntime_perf_test.exe
(4) Improve bert optimization script: Verify model producer for model_type; Add warning if model is not fully optimized.
(5) Add shape optimizer tool to assist developing optimization script.
(6) Update readme.
Previously, the "bin" folders of all the CUDA versions were in the system PATH, with 10.2 at the front, which was a mess.
So I've removed all of them from the system PATH, and one of them now has to be added back through the build scripts.
(The problem only affects the C# tests, not the C/C++ tests that are forked from build.py.)
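
As a rough illustration of adding a single CUDA "bin" folder back for a test run, here is a minimal Python sketch. It is not the actual build script change: the helper name `env_with_cuda_bin` and the example directory are assumptions made for illustration only.

```python
import os


def env_with_cuda_bin(cuda_bin_dir, base_env=None):
    """Return a copy of the environment with cuda_bin_dir first on PATH.

    Hypothetical helper: the real build scripts may do this differently.
    """
    # Copy so the caller's (or the process's) environment is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    old_path = env.get("PATH", "")
    # Put the chosen CUDA version first so its DLLs are found before any other.
    env["PATH"] = cuda_bin_dir + (os.pathsep + old_path if old_path else "")
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the C# tests, so that only the selected CUDA version is visible to that child process and the system PATH stays clean.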
* add dml gpu pipelines
* add x86 to the gpu dml dev build pipeline
* Enable DML x86 builds
* Fix uint64_t -> size_t warning
* fix warnings
* enable dml on x86 ci builds
* operatorHelper 773 error uint32_t vs uint64_t
* operatorHelper 773 error uint32_t vs uint64_t
* make x86 pipeline use the gpu pool
* more warnings
* fix x86 directml path
* make dml nuget package
* disable tf_pnasnet_large
* disable zfnet512
* make validation use wildcards
* disable x86 dml gpu tests
* add args.
* update gpu.yml
* change nupkg wildcard
* add debug statements
* package x86 dml nupkg
* don't drop managed nuget again from dml pipeline build
* Add DML EULA
* directml license should be renamed to not clobber the existing license
* casing on dml package....
* {} to ()
* fix license name
* disable dml from x86 ci
* typo and cr feedback
* remove featurizers
* ship the dml pdb as well
* Add a summary for each ExecutionProviderAppend method in SessionOptions.cs
The OnnxRuntime managed dll is EP agnostic, meaning it exposes all methods pertaining to all possible EPs supported by OnnxRuntime in general. Not all of these methods are really "available" to a .NET developer unless they have the corresponding native onnxruntime shared library. Adding a summary line so that intellisense points that out.
* remove empty line
* Initial commit
* More changes
* More changes
* More changes 3
* More changes 4
* More changes 5
* More changes 5
* More changes 6
* More changes 7
* More changes 8
* Remove C# ifdefs
* More changes 10
* More changes 11
* YAML changes for other release pipelines
* Add release notes metadata
* Props and Targets change
* Add CSharp proj
* More changes 12
* More changes
* Minor fix
* Minor fix
* Fix yaml
* Some missing logic for winml
* Minor update
* Fix casing for winmd file
* Fix casing
* Add targets and props for managed section into native nuget
* revert file
* a
* Switch to CUDA10.2
* Update win-gpu-tensorrt-ci-pipeline.yml
* Update win-gpu-tensorrt-ci-pipeline.yml
* remove dynamic_shape
* update onnx-tensorrt submodule
* check if input shape is specified for TensorRT subgraph input and enable some TensorRT unit tests
* fix format issue
* add shape inference instruction for TensorRT
* update according to the reviews
* Update win-gpu-tensorrt-ci-pipeline.yml
* WIP: Re-enable x86 .NET testing in Release pipelines
Enabling x86 testing will make sure that ORT packages don't break customers' x86 projects
* Remove setting some env variables
* Comment out a test failing on x86 builds
* More changes
* Minor fix
* More changes
* More changes
* s
* s
* s
* Revert minor change
* More changes
* More changes
* More changes 2
* explicitly set platform target
* Delete bin and obj folders
* Clean output dirs
* Add back TargetFramework
* Disable x86 .net framework tests
* Skip x86 tests in MKLML pipeline
* Don't create a copy of model proto when checking to see if there is fp16 input
* PR comments about making functions const
* Loop through nodeargs in graph object to see if there are fp16 datatypes
* Rename check to checking only inputs
* port the mimalloc allocator
* hook mimalloc opt into common.h and reduction ops
* repurpose USE_MIMALLOC to only denote subbing in of default allocator with mimalloc and some refactoring
* fix unintended cherry pick diffs
* polish allocator_mimalloc
* explicitly disable mimalloc where it already had been disabled
* update mimalloc to pull in stl allocator
* switch mimalloc stl allocator to use mimalloc library version
* turn mimalloc on by default (only the stl changes are enabled, the python interacting ones are off already and shall remain so)
* move FastAllocVector into cpu specific code
* separate out defines into arena and stl changes
* the rest of the define renames
* bfc arena allocator
* some typos and rename the bfc arena allocator to fit existing class naming conventions
* adjustments in response to comments
* different template instantiations are friends
1. Add LTCG back. It was set to default OFF in my previous PR to speed up Windows build. It is only needed in release pipelines.
2. Remove --use_featurizers from all the packaging pipelines
3. Make sure all the packages have openmp
Use CUDA 10.1 for Linux build
(Windows change is already in)
Please note: cublas 10.2.1.243 is for CUDA SDK 10.1.243, not CUDA 10.2.x. CUDA 10.2.89 needs cublas 10.2.2.89. The two version strings match on their last component.
libcublas10-10.1.0.105 won't work!
The cuda docker image by viswamy is already using 10.1, no need to change.
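
The version pairing noted above can be sketched as a small check. This is only a heuristic derived from the three examples in this message, not an official NVIDIA rule, and the function names are hypothetical.

```python
def last_component(version):
    """Return the final dot-separated component of a version string."""
    return version.strip().split(".")[-1]


def cublas_matches_cuda(cublas_version, cuda_version):
    """Heuristic from the note above: versions pair up on the last component.

    Assumption: this only encodes the pattern observed in this message.
    """
    return last_component(cublas_version) == last_component(cuda_version)


# Pairs taken from the note above.
print(cublas_matches_cuda("10.2.1.243", "10.1.243"))  # cublas for CUDA 10.1.243
print(cublas_matches_cuda("10.2.2.89", "10.2.89"))    # cublas for CUDA 10.2.89
print(cublas_matches_cuda("10.1.0.105", "10.1.243"))  # the package that won't work
```

A check like this could run before a Linux image build to catch a mismatched libcublas package early.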