* Rework SVMClassifier
- use GEMM for initial scoring
- minimize data allocations and copies
- parallelize the second half of the scoring for larger batches
1. Fix a bug in FunctionImpl::FunctionImpl: it set the wrong name for the new attribute.
2. Set the error code to NOT_IMPLEMENTED if a function contains an op that is not implemented.
Advance ONNX commit to pick up the latest ArgMax, ArgMin,
ReduceMax/ReduceMin, MaxPool
Declare new versions for CPU/CUDA.
Implement infrastructure support for int8/uint8.
Adjust GatherOp test for a new error.
Adjust Scan9.BadShape test.
Add exclusions for index out of bounds checks.
Rework result verification for SVDTransformer.
* Add benchmark script and notebook for GPT2
* Update Reshape fusion for GPT2 model
* Add opt_level option to bert_model_optimization; setting --opt_level 0 disables onnxruntime
* Fix keras optimization
1. Support the new fields for Constant in opset 12
2. Support SparseTensor in the Constant node by converting it to a dense tensor when lifting the Constant to an initializer. This makes a model with a sparse tensor in a Constant work, but it is not an efficient approach.
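The densification in item 2 can be pictured with a minimal sketch. It assumes a COO-style layout (non-default values plus flat row-major indices); `sparse_to_dense` and its parameters are hypothetical names for illustration, not ORT or ONNX API.

```python
from math import prod

def sparse_to_dense(values, linear_indices, dims, fill=0):
    """Expand a COO-style sparse tensor into a flat row-major dense list.

    values         -- the non-default entries
    linear_indices -- flat row-major position of each entry (assumed layout)
    dims           -- shape of the dense tensor
    """
    size = prod(dims)
    if len(values) != len(linear_indices):
        raise ValueError("values and indices must have the same length")
    dense = [fill] * size  # allocates every element, including the defaults
    for value, idx in zip(values, linear_indices):
        if not 0 <= idx < size:
            raise IndexError(f"index {idx} out of range for shape {dims}")
        dense[idx] = value
    return dense

dense = sparse_to_dense([1.5, 2.5], [1, 5], dims=(2, 3))
```

The dense buffer holds every element of the shape no matter how few values are set, which is why the approach works but is not efficient.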
* added support for iOS cross-compilation under Linux
* reverted cmake generator change
* if --ios is added protoc can be compiled for host system
* accidentally reverted change to compile protoc for host system for iOS if protoc exe is not set
* wdata is now used
* accidentally pasted CMAKE_OSX_ARCHITECTURES into CMakeLists.txt, also made a bad merge on build.py previously
* removed print
* fixed typo, deleted commented statements for earlier debugging
* reverted accidental delete
* added asmmacro.h for aarch64 asm
now MlasSgemmKernel**** gets an underscore added if needed
no need anymore to differentiate between iOS arm64 and normal arm64 build
onnxruntime.cmake: added check if iOSCross is set to properly set RPATH
* removed 2 spaces
* fix: logical error fixed, now protoc gets compiled if not supplied with --path_to_protoc_exe
* removed unnecessarily added spaces
* removed some more spaces
Fix 3 bugs:
Node names were duplicated in calibration augment_graph if the name of the node to quantize was empty.
If output nodes were quantized, the output values were quantized and not dequantized back.
Gather with data type int64 should not be quantized.
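The first bug above comes down to unnamed nodes all sharing the empty name. The sketch below shows one way to hand out unique placeholder names. `assign_unique_names` and the `unnamed_node` prefix are hypothetical, and this is not necessarily how the calibration code resolves it.

```python
def assign_unique_names(node_names, prefix="unnamed_node"):
    """Return a copy of node_names where empty names get unique placeholders."""
    taken = {name for name in node_names if name}
    result = []
    counter = 0
    for name in node_names:
        if not name:
            # Skip candidates that collide with names already in the graph.
            candidate = f"{prefix}_{counter}"
            while candidate in taken:
                counter += 1
                candidate = f"{prefix}_{counter}"
            taken.add(candidate)
            name = candidate
        result.append(name)
    return result

names = assign_unique_names(["conv1", "", "", "unnamed_node_0"])
```

Names derived from a node (for example the names of added outputs) then stay distinct even when the source model leaves node names blank.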
* Avoid "infinite" loop in optimizer
When symbolic dimensions are present and can be overridden,
FreeDimensionOverrideTransformer always sets modified flag to true. As a
consequence, the optimizer loops until the iteration limit is reached.
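A minimal sketch of the looping behavior described above, assuming a driver that re-runs a pass until it reports no modification. `override_free_dims` and `run_until_stable` are hypothetical stand-ins, not the ORT classes, and the fix shown (setting the flag only on a real change) is one possible remedy rather than a confirmed description of this commit.

```python
def override_free_dims(shape, overrides):
    """Replace symbolic dims that have an override; report whether anything changed."""
    modified = False
    new_shape = []
    for dim in shape:
        if isinstance(dim, str) and dim in overrides:
            new_shape.append(overrides[dim])
            modified = True  # set only when a dim is actually rewritten
        else:
            new_shape.append(dim)
    return new_shape, modified

def run_until_stable(shape, overrides, max_iterations=10):
    """Re-run the pass until it reports no change or the iteration limit is hit."""
    iterations = 0
    for _ in range(max_iterations):
        iterations += 1
        shape, modified = override_free_dims(shape, overrides)
        if not modified:
            break
    return shape, iterations

final_shape, iterations = run_until_stable(["batch", 3, "seq"], {"batch": 1, "seq": 128})
```

Here the second pass finds nothing left to override and the loop stops after two iterations; a pass that always returned `modified = True` would run all `max_iterations`.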
1. Copy TensorFlow's thread pool class to ORT, so that we can get a better implementation of thread-pool-based parallel for
2. Copy Eigen's thread pool class to ORT
3. Support thread affinity
4. Remove RNN kernel’s private thread pool
5. Modify pool kernels to use the thread pool when OpenMP is disabled.
* skip optional inputs for scan subgraphs
We may have cases where the subgraph has optional inputs that appear
in both subgraph's input and initializer, but not in the node's input.
In such cases, the input model might be invalid, but let's not choke
on it. Instead, let's issue a warning, skip the optional inputs,
and keep going forward.
* address CR feedback
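The warn-and-skip behavior for optional Scan subgraph inputs described above might look roughly like this. The sketch assumes node inputs map to subgraph inputs by position; `resolve_subgraph_inputs` and its parameters are hypothetical names, not ORT code.

```python
import warnings

def resolve_subgraph_inputs(subgraph_inputs, initializer_names, node_input_count):
    """Keep the subgraph inputs the node feeds; skip optional extras.

    An input is treated as optional when it also has an initializer
    and the node supplies no value for its position (assumed mapping).
    """
    kept = []
    for position, name in enumerate(subgraph_inputs):
        supplied = position < node_input_count
        if not supplied and name in initializer_names:
            warnings.warn(f"skipping optional subgraph input '{name}'")
            continue
        kept.append(name)
    return kept

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = resolve_subgraph_inputs(["state", "x", "bias"], {"bias"}, node_input_count=2)
```

An unsupplied input without an initializer is deliberately kept in the result, so a later validation step can still reject a model that is missing a required input.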
* Fixed two issues in symbolic_shape_infer script
This change addressed #3293
There were two issues in the script:
* We need to handle a special case for infer_Reshape, where input_shape
is empty and the target shape_value is [-1]. In such a case, we need to
get sympy data for the output dim (or create one if it doesn't exist).
* We need to update computed dims for newly-created shape for Range op
* also call _update_computed_dims for _infer_Expand
addressed CR feedback
* added ai.onnx into opset list
* instead of manipulating _infer_Reshape, call _update_computed_dims
from _infer_Expand to update newly-computed dims
* Enable sequence of tensor
* add tests
* small updates
* There should only be 2 elements returned
* CR feedback, and another 6->2 check update in the test.
* missing semicolon...
* Add explicit to constructor taking pointer parameter
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Implement operator[] for TArray and simplify the code.
* fix a build error.
* add a constructor with std::vector input
* fix build error
* update based on code review feedback
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Update ReformatSourcePython.bat to use YAPF to format python code, and add onnxruntime\test directory to be formatted.
Add onnxruntime\.style.yapf for configuration. The style is based on google, except max column width 120.
Format python scripts using ReformatSourcePython.bat.
* Add notebook for bert squad model exported by python 1.4
* update bert performance test tool:
(1) set OpenMP environment variable before importing onnxruntime.
(2) launch new process for each test.
* Add notebook
Reduce combinations in perf test
* update readme
* fix quote
* Allow test multiple batch_size
* Add latency percentile
* Add warm up run
Reset logger for notebook
* refine default settings to test for cpu/gpu
* Add script to dump machine info
* Add notebooks for PyTorch SQuAD model GPU and CPU inference
* Update machineinfo.py: add license header; format by yapf
* Do not reset log handler. Skip adding handler if existed.
* Add comments about GPU result diff.
Filter rows of batch set to keep only one setting.
* update according to review feedback
* Download script from master branch
* Add notebook for bert model exported by keras2onnx
* format columns in result table
* re-run and update notebook
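The perf-test changes listed above (warm-up run, latency percentiles) can be sketched as follows. `measure_latency`, `summarize`, and the chosen percentiles are illustrative assumptions, not the actual test tool; `run_once` stands in for a single inference call.

```python
import statistics
import time

def measure_latency(run_once, warmup_runs=1, test_runs=20):
    """Time run_once() after untimed warm-up calls; return latencies in ms."""
    for _ in range(warmup_runs):
        run_once()  # warm-up calls are executed but not timed
    latencies = []
    for _ in range(test_runs):
        start = time.perf_counter()
        run_once()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def summarize(latencies_ms):
    """Report the mean and selected percentiles of a latency sample."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "average": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p90": cuts[89],
        "p99": cuts[98],
    }

calls = []  # records each invocation so the call count can be checked
latencies = measure_latency(lambda: calls.append(sum(range(1000))),
                            warmup_runs=2, test_runs=10)
summary = summarize(latencies)
```

Keeping warm-up calls out of the timed sample avoids counting one-time costs such as lazy initialization in the reported percentiles.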
* Fix WCOS/Win32 linking bugs
* Remove unused NODEFAULTLIB flags
* Avoid plain target_link_libraries signature
* Avoid plain target_link_libraries signature
* Fix library list escaping
* Use library list instead of string
* Remove duplicate link to windowsapp.lib
* Remove Win32 build workarounds
* Specify CMake policies before initializing language
* Expose Win32 header definitions during build
* Force set API family
* Enable Win32 APIs in featurizer
* Use MT dynamic CRT
* Expose Win32 specific functions
* Disable app container globally
* Disable default wide functions in featurizers
* Add featurizers to test include path
* Workaround https://gitlab.kitware.com/cmake/cmake/issues/19428
* Revert pipeline debugging hacks
* Skip /FI in CUDA sources
* Default to Win32 builds
* Enable WCOS when using WinML
* Use generator expression to apply CMAKE_MSVC_RUNTIME_LIBRARY to C++ only
* Add support for sessions to share a global threadpool.
* Fix build issues
* Add tests, fix build issues.
* Added some documentation
* Fix CentOS issue where threadpools become nullptr on a 1-core machine.
* Fix mac and x86 build issues
* Address some PR comments
* Disabled test for Android, added a few more tests and addressed more PR comments.
* const_cast
Moved path_lib.h/cc from onnxruntime/core/framework to onnxruntime/core/platform and from the onnxruntime_framework to the onnxruntime_common libraries.