Warn when initializers appear in graph input
Provide a tool to move initializers out of graph input
Motivation and Context
Since IR_VERSION 4, ONNX treats initializers that appear in graph input as non-constant. This can break some graph optimizations, such as constant folding and operator fusion. Warn about this case and provide a tool.
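The core of such a tool is simple: drop every graph input whose name collides with an initializer, so optimizers can treat those initializers as constants again. The sketch below is a hypothetical illustration of that filtering logic on plain name lists (not the actual ORT tool's API); with the real `onnx` package you would apply it to `model.graph.input` against `model.graph.initializer` and save the model back.

```python
# Hypothetical sketch: since IR_VERSION 4, an initializer listed in
# graph.input is treated as a runtime input, so optimizers cannot assume
# it is constant. Moving it out of graph input means keeping only the
# inputs whose names do not match any initializer.

def strip_initializers_from_inputs(input_names, initializer_names):
    """Return the graph input names that are true runtime inputs."""
    initializers = set(initializer_names)
    return [name for name in input_names if name not in initializers]
```

For example, a model whose graph inputs are `["X", "W", "B"]` with initializers `["W", "B"]` keeps only `["X"]` as a runtime input.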
* added support for iOS cross-compilation under Linux
* reverted cmake generator change
* if --ios is added, protoc can be compiled for the host system
* accidentally reverted change to compile protoc for the host system for iOS if protoc exe is not set
* wdata is now used
* accidentally pasted CMAKE_OSX_ARCHITECTURES into CMakeLists.txt; also made a bad merge on build.py previously
* removed print
* fixed typo, deleted commented-out statements from earlier debugging
* reverted accidental delete
* added asmmacro.h for aarch64 asm
now MlasSgemmKernel**** gets an underscore added if needed
no need anymore to differentiate between iOS arm64 and normal arm64 builds
onnxruntime.cmake: added check if iOSCross is set to properly set RPATH
* removed 2 spaces
* fix: logical error fixed; now protoc gets compiled if not supplied with --path_to_protoc_exe
* removed unnecessarily added spaces
* removed some more spaces
* Fix WCOS/Win32 linking bugs
* Remove unused NODEFAULTLIB flags
* Avoid plain target_link_libraries signature
* Avoid plain target_link_libraries signature
* Fix library list escaping
* Use library list instead of string
* Remove duplicate link to windowsapp.lib
* Remove Win32 build workarounds
* Specify CMake policies before initializing language
* Expose Win32 header definitions during build
* Force set API family
* Enable Win32 APIs in featurizer
* Use MT dynamic CRT
* Expose Win32 specific functions
* Disable app container globally
* Disable default wide functions in featurizers
* Add featurizers to test include path
* Workaround https://gitlab.kitware.com/cmake/cmake/issues/19428
* Revert pipeline debugging hacks
* Skip /FI in CUDA sources
* Default to Win32 builds
* Enable WCOS when using WinML
* Use generator expression to apply CMAKE_MSVC_RUNTIME_LIBRARY to C++ only
* Add support for sessions to share a global threadpool.
* Fix build issues
* Add tests, fix build issues.
* Added some documentation
* Fix CentOS issue where threadpools become nullptr due to having only 1 core.
* Fix mac and x86 build issues
* Address some PR comments
* Disabled test for Android, added a few more tests, and addressed more PR comments.
* const_cast
1. Fix onnxruntime server Dockerfile build failure. Tested with the notebook in the ONNX tutorial; it works well.
2. Delete the Dockerfiles for the other EPs, because they currently don't work and I don't have enough time to update them.
Discussed with Faith: because the data size is very small and changes are gradual, there is no need to delete the old data. We want to keep all the history.
Override the native package name; keep the managed package name the same.
Specify the package name for validation purposes.
Fix up the validation package name parameter.
Previously, we put the "bin" folders of all the CUDA versions in the system PATH, with 10.2 at the front. It was a mess.
So I've removed all of them from the system PATH env var, but I need to add one of them back through the build scripts.
(The problem only affects the C# tests, not the C/C++ tests that are forked from build.py.)
* add dml gpu pipelines
* add x86 to the gpu dml dev build pipeline
* Enable DML x86 builds
* Fix uint64_t -> size_t warning
* fix warnings
* enable dml on x86 ci builds
* operatorHelper 773 error uint32_t vs uint64_t
* operatorHelper 773 error uint32_t vs uint64_t
* make x86 pipeline use the gpu pool
* more warnings
* fix x86 directml path
* make dml nuget package
* disable tf_pnasnet_large
* disable zfnet512
* make validation use wildcards
* disable x86 dml gpu tests
* add args.
* update gpu.yml
* change nupkg wildcard
* add debug statements
* package x86 dml nupkg
* don't drop managed nuget again from dml pipeline build
* Add DML EULA
* directml license should be renamed to not clobber the existing license
* casing on dml package....
* {} to ()
* fix license name
* disable dml from x86 ci
* typo and cr feedback
* remove featurizers
* ship the dml pdb as well
* Initial commit
* More changes
* More changes
* More changes 3
* More changes 4
* More changes 5
* More changes 5
* More changes 6
* More changes 7
* More changes 8
* Remove C# ifdefs
* More changes 10
* More changes 11
* YAML changes for other release pipelines
* Add release notes metadata
* Props and Targets change
* Add CSharp proj
* More changes 12
* More changes
* Minor fix
* Minor fix
* Fix yaml
* Some missing logic for winml
* Minor update
* Fix casing for winmd file
* Fix casing
* Add targets and props for managed section into native nuget
* revert file
* a
* Switch to CUDA10.2
* Update win-gpu-tensorrt-ci-pipeline.yml
* Update win-gpu-tensorrt-ci-pipeline.yml
* remove dynamic_shape
* update onnx-tensorrt submodule
* check if input shape is specified for TensorRT subgraph input and enable some TensorRT unit tests
* fix format issue
* add shape inference instruction for TensorRT
* update according to the reviews
* Update win-gpu-tensorrt-ci-pipeline.yml
* WIP: Re-enable x86 .NET testing in Release pipelines
Enabling x86 testing will make sure that ORT packages don't break customers' x86 projects
* Remove setting some env variables
* Comment out a test failing on x86 builds
* More changes
* Minor fix
* More changes
* More changes
* s
* s
* s
* Revert minor change
* More changes
* More changes
* More changes 2
* explicitly set platform target
* Delete bin and obj folders
* Clean output dirs
* Add back TargetFramework
* Disable x86 .net framework tests
* Skip x86 tests in MKLML pipeline
* port the mimalloc allocator
* hook mimalloc opt into common.h and reduction ops
* repurpose USE_MIMALLOC to only denote subbing in of default allocator with mimalloc and some refactoring
* fix unintended cherry pick diffs
* polish allocator_mimalloc
* explicitly disable mimalloc where it already had been disabled
* update mimalloc to pull in stl allocator
* switch mimalloc stl allocator to use mimalloc library version
* turn mimalloc on by default (only the STL changes are enabled; the Python-interacting ones are off already and shall remain so)
* move FastAllocVector into cpu specific code
* separate out defines into arena and stl changes
* the rest of the define renames
* bfc arena allocator
* some typos and rename the bfc arena allocator to fit existing class naming conventions
* adjustments in response to comments
* different template instantiations are friends
1. Add LTCG back. It was set to OFF by default in my previous PR to speed up the Windows build. It is only needed in release pipelines.
2. Remove --use_featurizers from all the packaging pipelines
3. Make sure all the packages have openmp
Use CUDA 10.1 for Linux build
(Windows change is already in)
Please note: cublas 10.2.1.243 is for CUDA SDK 10.1.243, not CUDA 10.2.x. CUDA 10.2.89 needs cublas 10.2.2.89. The versions match on their last component.
libcublas10-10.1.0.105 won't work!
The CUDA docker image by viswamy is already using 10.1; no need to change it.
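The pairing rule above can be captured in a small check (a hypothetical helper for illustration, not part of any build script): a cublas build targets the CUDA SDK whose version string shares the same final component.

```python
def cublas_matches_cuda(cublas_version, cuda_version):
    """Hypothetical check of the pairing rule: cublas and CUDA SDK
    versions correspond when their final dotted components agree,
    e.g. cublas 10.2.1.243 pairs with CUDA 10.1.243."""
    return cublas_version.split(".")[-1] == cuda_version.split(".")[-1]
```

So `cublas_matches_cuda("10.2.1.243", "10.1.243")` holds, while pairing that same cublas with CUDA 10.2.89 does not.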
* update onnx-tensorrt submodule to trt7 branch
* add fp16 option for TRT7
* switch to master branch of onnx tensorrt
* update submodule
* update to TensorRT7.0.0.11
* update to onnx-tensorrt for TensorRT7.0
* switch to private branch due to issues in master branch
* remove trt_onnxify
* disable warnings c4804 for TensorRT parser
* disable warnings c4702 for TensorRT parser
* add back sanity check of shape tensor input in the parser
* disable some warnings for TensorRT7
* change fp16 threshold for TensorRT
* update onnx-tensorrt parser
* fix cycle issue in faster-rcnn and add cycle detection in GetCapability
* Update TensorRT container to v20.01
* Update TensorRT image name
* Update linux-multi-gpu-tensorrt-ci-pipeline.yml
* Update linux-gpu-tensorrt-ci-pipeline.yml
* disable rnn tests for TensorRT
* disable rnn tests for TensorRT
* disabled some unit test for TensorRT
* update onnx-tensorrt submodule
* update build scripts for TensorRT
* formatting the code
* Update TensorRT-ExecutionProvider.md
* Update BUILD.md
* Update tensorrt_execution_provider.h
* Update tensorrt_execution_provider.cc
* Update win-gpu-tensorrt-ci-pipeline.yml
* use the GetEnvironmentVar function to get env variables and switch to the Win-GPU-2019 agent pool for the win CI build
* change tensorrt path
* change tensorrt path
* fix win ci build issue
* update code based on the reviews
* fix build issue
* roll back to cuda10.0
* add RemoveCycleTest for TensorRT
* fix windows ci build issues
* fix ci build issues
* fix file permission
* fix out of range issue for max_workspace_size_env
Provide an alternative std::mutex implementation on Windows. OrtMutex is no longer an alias of std::mutex.
We do this because:
1. The new implementation is faster and much simpler.
2. Static constructors are considered harmful; we should avoid them as much as we can.
* Enable ARM64 release builds
* Add ARM release
* Skip C# dll signing in ARM
* Copy ARM binaries to Nuget
* Restore nuget packages before ARM packaging
* wip
* Use host protoc at C# build
* Set ProtocDirectory on cross-compiled builds
* wip
* Fix typo