* Added config flags for VPU Fast Recompile
* clean-up ifdefs
* Add VPU Fast compile config option
Adds an option that enables Fast compilation of models to VPU
hardware specific format.
* Add config option to choose specific device id for inference
Inference of all subgraphs will be scheduled only on this device
even if other devices of the same type are available.
* Add Python API to list available device IDs
* code cleanup
* Add second C/C++ API with settings string parameter
Adds an additional C/C++ API that allows passing multiple
key-value pairs for settings as a single string. Multiple
settings are delimited by '\n' while the key and value
within a setting are delimited by '|'.
* Append 'Ex' to the extended C/C++ API
* Use set_providers Py API to set config options.
Uses Session.set_providers Python API to set EP runtime config
options as key/val pairs
Deprecated older module function definitions for config settings.
Updates documentation.
* avoid globals for py config options where possible
Co-authored-by: intel <you@example.com>
(1) Save gpt2 test data during test generation.
(2) Use torch fp32 model as baseline when onnx model is fp16.
(3) Refine logic to compose onnx model path
* Add SetLanguageProjection C Api and use it in four projections
* static cast enum languageprojection to uint32_t
* resolve comments
* fix typo and line added unintentionally
* revert unecessary change
* reorder c# api
* add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis
* Changes to enable saving and loading an ORT format model via the public APIs.
Cleanup session.py to try and make slightly more understandable. More refactoring is needed here.
Couple of bug fixes
* Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info.
* Address PR comments
- tweak SessionOptions config to avoid double lookup
- merge duplicated functionality in python binding around registering an EP with optional options
Fix a couple of build issues.
* Update C API to be consistent with python API
- only load model in InferenceSession ctor if required
- support loading ORT model in minimal build
* Fix nodejs test.
We get an invalid path error from LoadInterOp first now
* Another attempt at fixing nodejs test.
Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent.
The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed.
* Fix couple of build issues.
* Disable test temporarily so PR can be checked in.
Will fix in separate PR that adds final pieces for minimal build as the test is required there.
* Give up on nodejs test and make the match simpler.
Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion.
* Fix call to Session.__init__ in TrainingSession.
Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.
Improve quantization tools:
1. Support QAT
2. Make quantization tool to register Operators.
3. Make the API clear to use
Co-authored-by: t-yguo <t-yguo@microsoft.com>
* Export GPT-2 ONNX model without postion_ids and attention_mask inputs
* allow benchmark_gpt2 on user's model
* refactor: get_dummy_inputs returns a data class.
* Next round of changes.
Remove inclusion of ONNX schema header
Exclude custom registry related things
Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.
* adding generic configurations for session options
* fix a build break on linux
* fix training ci build break
* fix training ci build break
* addressed CR comments
* fix traning ci build break
* move config_key from enum to string
* add c# api
* add python api
* fix build break
* move prepacking from 2 new api entries to session options configs
* fix traning ci build break
* add python test, update some comments, move const key definition to avoid build break
* addressed comments
* move definitions of keys to common.h
* move api to version 5
* remove accidental change in build.py
* remove pragma to avoid build break
* addressed CR comments
* fix the python build break, and move location of config keys definition
* small typo changes
(1) Add bert-base-cased and gpt2 benchmark results on V100
(2) Update list of supported models.
(3) Add comments to gpt2_helper.
(4) Use IO Binding in test parity by default.
* improve calibration tool
* modify calibration interface name
* modify calibration interface name
* refine calibrate and calibrate_user
* refine and add type info
* refine and add type info
* add e2e user example file
* remove unnecessary files
* remote test images no longer needed
* update readme document
Co-authored-by: t-yguo <t-yguo@microsoft.com>
* Add python API for specifying CUDA device id
* Modification for providing session based python api for specifying
device id
* When include header file pybind11/stl.h, conversion between c++
containers and Python list, vector and dict data structure are
automatically enabled.
https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html#
Therefore, refactor the code for better leverage this advantage.
* Make struct CudaDeviceOptions as default cuda device options
* Implement sess.set_providers(list_of_providers, list_of_provider_option_dicts)
But still stay consistent with existing sess.set_providers(list_of_provider)
* Add cuda provider option default setting
* Add support for setting cuda cuda_mem_limit and arena_extend_strategy.
Also resolved the merge conflict on session.py
* Use python ctypes to call cuda library to help python unittest
* Refine the code with reviewer's suggestions
* Add the capability of getting execution provider's configuration
- Once we introduced the capability to set execution provider's
configuration, it makes sense to add capability of getting ep's configuration.
* Modify the code with reviewer's suggestions.
* Using stoull() and stoul() depends on 32/64-bits architecture.
* Rewrite the testcases for testing setting CUDA device id
Note: We need to make sure every ORT process be run on one CUDA device
at a time.
* Make sure old session object is destroyed by python gc before new
session object is being created
* Move testcases to original onnxruntime_test_python.py
* Fix bugs to pass CI build
* Make it pass CI build (cont.)
* Make it pass CI build (cont.)