* Add netstandard2.0 to nuget managed package.
Re-does PR that was backed out due to packaging pipeline changes.
Allows deprecation of netstandard1.1 in the following release as netstandard2 is the preferred lowest level framework.
* copy changes from trt_and_mem
* second edits
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* change to cuda 11.4
* build with cuda 11.4
* Update Dockerfile.ubuntu_cuda11_1_tensorrt7_2
* add cmake extra defines
* cmake architectures
* fix cmake arch
* Delete ubuntu-18.04.Dockerfile
* Rename Dockerfile.ubuntu_cuda11_1_tensorrt7_2 to Dockerfile.ubuntu_cuda11_4_tensorrt7_2
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines
* removing previous ort args
* rename to cuda 11.4
* remove cuda 10_2
* delete trt 7.1
* remove 7.1
* Passing in cuda architecture to reduce build time
* always add submodule sync due to recursive cloning
* fix run command
* add and
* take away unused arms and share python installation script
* Update linux-gpu-tensorrt-ci-perf-pipeline.yml
* Update Dockerfile.tensorrt
* cleanup file
* install python directly on dockerfile - move to scripts in future
* Update Dockerfile.custom-trt-perf
* adding cuda 11.1 for missing Libnvrtc.so.11.1
* Delete install_python.sh
* Include pytorch_export_contrib_ops in inference builds
Rename / move it from tools/python/register_custom_ops_pytorch_exporter
to onnxruntime/python/tools/pytorch_export_contrib_ops.
Rationale for inclusion in inference builds:
This code is potentially useful for anyone using ORT, not just training.
Rationale for new name:
"Contrib op" is the nomenclature used within ORT to refer to the set of
ops that are not in the standard op set but are included by default with
ORT. This is more specific than "custom op", which is what the PyTorch
exporter uses to refer to any non-standard op.
Step 1 of addressing #8818. After this is merged I will update the docs.
* Enable test_pytorch_export_contrib_ops.py in CI
Fixes AB#1342330
* Use PROTOBUF_LIB instead of protobuf::libprotbuf
* Moved setdlopenflags to _pybind_state.py
* Copy the generated _pybind_state.py to required location for Windows.
- Move flatbuffers SessionState access code into helper functions instead of duplicating them between InferenceSession and SessionState.
- Trim VerifyEachNodeIsAssignedToAnEp(), e.g., disable verbose log output in a minimal build.
* special case concat and split when sizes are equal
* add tests for 16 and 32 inputs with same dim
* add tests for 16/64 inputs on concat or 16/64 outputs on split
* try eliminate windows warning
* outter => outer
* Change the strided copy to switch on data size not data type.
Move to header so we can reduce on the enabled types.
Setup type reduction for Concat now that it's using this implementation.
* test running hf bert-large
* try again
* try again
* include other models
* correct names
* disable deberta-v2-xxlarge
* avoid torch.distributed
* add compare json loss and perf for bert-large to test
* fix sed expression
* remove pytest
* add more models
* move unit tests u
* display samples/sec
* support register external ep lib inforation; make eager mode share the same ep pools with training workloads
* fix inference code
* fix build break
* fix the message