* remove shape inference and fix save large model problem
* remove unnecessary import
* refine code and add external format for quantize_qat
* remove initializers in tensors_to_calibrate
* small refine
Co-authored-by: t-yguo <t-yguo@microsoft.com>
* Initialize tensorrt perf script
* Add bert-squad dependencies
* Modified code to make ort inference with CUDA/Tensorrt
* Add get CUDA/TRT version
* uncomment bert-squad
* Add BERT-SQUAD inputs.json
* Add FastRCNN
* Make preprocess/validation in to common functions
* Add MaskRCNN and SSD and consolidate the code
* Add dependencies for MaskRCNN
* following modifications are made:
- create common fetch function to get inputs/outputs of model from ONNX model zoo.
- create common validation function to compare inference outputs with reference outputs from ONNX model zoo.
- move run/repeat time to argument list. (still working on other arguments, like fp16 or fp32, latency percentile).
- generate table in csv file to show the latency comparison (TRT vs CUDA) side by side.
* Add approache to analyze profling file and also update model related
settings
* Add models
* Add most of models from ONNX model zoo
* Add model input name and print all the model names at the end of run
* Add system info
* Add TRT fp16 support
* Refine the code
* Handle TRT fall back and modify the way to get input data
* Refine code
* Modify code
* Add more precise approach to measure inference
* Add io-binding
* Add YoLoV4
* Refine the code
* Refine the code
* Add models
* Add yolov4 notebook for jetson device
* Update notebook
* Update notebook
* Add CVS models
* Add missing model
* Add support of float16
* Add new way to get trt version
* Add "validate" and "benchmark" mode
* Add randomly generated input
* Refine perf script
* Refine the code.
* Add README
* Refine the code
* Update README.md
* Refine code
* Update README.md
* Remove all the model related python and instead using model_list.json as
models configuration.
Refine the benchmark.py
* Refine the code
Co-authored-by: Chi Lo <lochi@microsoft.com>
* Added config flags for VPU Fast Recompile
* clean-up ifdefs
* Add VPU Fast compile config option
Adds an option that enables Fast compilation of models to VPU
hardware specific format.
* Add config option to choose specific device id for inference
Inference of all subgraphs will be scheduled only on this device
even if other devices of the same type are available.
* Add Python API to list available device IDs
* code cleanup
* Add second C/C++ API with settings string parameter
Adds an additional C/C++ API that allows passing multiple
key-value pairs for settings as a single string. Multiple
settings are delimited by '\n' while the key and value
within a setting are delimited by '|'.
* Append 'Ex' to the extended C/C++ API
* Use set_providers Py API to set config options.
Uses Session.set_providers Python API to set EP runtime config
options as key/val pairs
Deprecated older module function definitions for config settings.
Updates documentation.
* avoid globals for py config options where possible
Co-authored-by: intel <you@example.com>
(1) Save gpt2 test data during test generation.
(2) Use torch fp32 model as baseline when onnx model is fp16.
(3) Refine logic to compose onnx model path
* Add SetLanguageProjection C Api and use it in four projections
* static cast enum languageprojection to uint32_t
* resolve comments
* fix typo and line added unintentionally
* revert unecessary change
* reorder c# api
* add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis
* Changes to enable saving and loading an ORT format model via the public APIs.
Cleanup session.py to try and make slightly more understandable. More refactoring is needed here.
Couple of bug fixes
* Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info.
* Address PR comments
- tweak SessionOptions config to avoid double lookup
- merge duplicated functionality in python binding around registering an EP with optional options
Fix a couple of build issues.
* Update C API to be consistent with python API
- only load model in InferenceSession ctor if required
- support loading ORT model in minimal build
* Fix nodejs test.
We get an invalid path error from LoadInterOp first now
* Another attempt at fixing nodejs test.
Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent.
The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed.
* Fix couple of build issues.
* Disable test temporarily so PR can be checked in.
Will fix in separate PR that adds final pieces for minimal build as the test is required there.
* Give up on nodejs test and make the match simpler.
Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion.
* Fix call to Session.__init__ in TrainingSession.
Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.
Improve quantization tools:
1. Support QAT
2. Make quantization tool to register Operators.
3. Make the API clear to use
Co-authored-by: t-yguo <t-yguo@microsoft.com>
* Export GPT-2 ONNX model without postion_ids and attention_mask inputs
* allow benchmark_gpt2 on user's model
* refactor: get_dummy_inputs returns a data class.
* Next round of changes.
Remove inclusion of ONNX schema header
Exclude custom registry related things
Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.
* adding generic configurations for session options
* fix a build break on linux
* fix training ci build break
* fix training ci build break
* addressed CR comments
* fix traning ci build break
* move config_key from enum to string
* add c# api
* add python api
* fix build break
* move prepacking from 2 new api entries to session options configs
* fix traning ci build break
* add python test, update some comments, move const key definition to avoid build break
* addressed comments
* move definitions of keys to common.h
* move api to version 5
* remove accidental change in build.py
* remove pragma to avoid build break
* addressed CR comments
* fix the python build break, and move location of config keys definition
* small typo changes
(1) Add bert-base-cased and gpt2 benchmark results on V100
(2) Update list of supported models.
(3) Add comments to gpt2_helper.
(4) Use IO Binding in test parity by default.
* improve calibration tool
* modify calibration interface name
* modify calibration interface name
* refine calibrate and calibrate_user
* refine and add type info
* refine and add type info
* add e2e user example file
* remove unnecessary files
* remote test images no longer needed
* update readme document
Co-authored-by: t-yguo <t-yguo@microsoft.com>