Commit graph

7863 commits

Author SHA1 Message Date
senysenyseny16
3be96f8a15
fix: import error in TrtTable::Dict method (#8940) 2021-09-07 00:28:49 -07:00
Ye Wang
5d47b2e431
Add Einsum and Reciprocal op support in symbolic shape inference (#8931)
* fix 1

* fix 2

* update

* support einsum

* format

* test

* format

* add test for einsum
2021-09-06 16:54:48 -07:00
Changming Sun
60c98a86b7
CMake file changes for macOS universal2 support (#8953) 2021-09-04 13:30:33 -07:00
stevenlix
a9776d1c70
Add QDQ model support in TensorRT EP (#8969)
* disable setting dynamic range for QDQ model

* update cgmanifest

* Update cgmanifest.json
2021-09-03 19:33:34 -07:00
ytaous
53eb79f9f6
Gemm/Transpose fusion - additional pattern coverage (#8941)
* gemm transpose fixes

* enforce condition

* add comments

* rm redundant code

Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-09-03 15:24:47 -07:00
Scott McKay
eebcc20f10
Add netstandard2.0 framework to nuget managed package. (#8960)
* Add netstandard2.0 to nuget managed package.
Re-does PR that was backed out due to packaging pipeline changes.
Allows deprecation of netstandard1.1 in the following release as netstandard2 is the preferred lowest level framework.
2021-09-04 08:01:46 +10:00
Olivia Jain
a0c9408f0d
Make TRT Version Configurable (#8864)
* copy changes from trt_and_mem

* second edits

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* change to cuda 11.4

* build with cuda 11.4

* Update Dockerfile.ubuntu_cuda11_1_tensorrt7_2

* add cmake extra defines

* cmake architectures

* fix cmake arch

* Delete ubuntu-18.04.Dockerfile

* Rename Dockerfile.ubuntu_cuda11_1_tensorrt7_2 to Dockerfile.ubuntu_cuda11_4_tensorrt7_2

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* removing previous ort args

* rename to cuda 11.4

* remove cuda 10_2

* delete trt 7.1

* remove 7.1

* Passing in cuda architecture to reduce build time

* always add submodule sync due to recursive cloning

* fix run command

* add and

* take away unused arms and share python installation script

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml

* Update Dockerfile.tensorrt

* cleanup file

* install python directly on dockerfile - move to scripts in future

* Update Dockerfile.custom-trt-perf

* adding cuda 11.1 for missing Libnvrtc.so.11.1

* Delete install_python.sh
2021-09-03 13:32:27 -07:00
Chi Lo
1f576e1766
Detect necessary files inside GPU packages (#8955)
* Rename files

* Update YAML files

* Update validation script and YAML
2021-09-03 13:28:28 -07:00
liqun Fu
a7f5bd226b
retarget torch181 to torch182 (#8947)
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-09-03 09:44:42 -07:00
baijumeswani
0cc2909573
Auto forward non method attribute lookups to the user's model and bind custom methods to ORTModule (#8798) 2021-09-03 08:25:44 -07:00
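The idea named in the commit title above (forwarding non-method attribute lookups to a wrapped model and binding its custom methods to the wrapper) can be sketched in plain Python. This is an illustration only, not ORTModule's actual code: `UserModel`, `Wrapper`, `hidden_size` and `describe` are hypothetical names invented for the example.

```python
import types


class UserModel:
    """Hypothetical stand-in for a user's model."""

    def __init__(self):
        self.hidden_size = 128  # plain (non-method) attribute

    def describe(self):
        return f"UserModel(hidden_size={self.hidden_size})"


class Wrapper:
    """Hypothetical wrapper; an assumption, not ORTModule itself."""

    def __init__(self, model):
        self._model = model
        # Bind the model's custom methods to the wrapper, so that `self`
        # inside them refers to the wrapper rather than the original model.
        for name, attr in vars(type(model)).items():
            if name.startswith("__") or not callable(attr):
                continue
            setattr(self, name, types.MethodType(attr, self))

    def __getattr__(self, name):
        # Only called when normal lookup fails, so attributes defined on
        # the wrapper itself always take priority.
        if name == "_model":
            # Avoid infinite recursion if _model has not been set yet.
            raise AttributeError(name)
        return getattr(self._model, name)


wrapped = Wrapper(UserModel())
print(wrapped.hidden_size)
print(wrapped.describe())
```

Because `__getattr__` is a fallback hook, the wrapper's own attributes are never shadowed by the model's; only names the wrapper lacks are forwarded.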
Vincent Wang
c343f7cb43
Add Algorithm Search for ConvGrad (#8613)
* algo search for conv grad

* global cache, bigger workspace size

* fix build error

* refactor

* refactor

* resolve comments

* fix rocm

* change lock places

* rename variable

* remove setting for inference

* resolve comments
2021-09-03 11:25:17 +08:00
Tianlei Wu
91f05f387a
Update embed layer norm fusion to work with transformers v4.9 (#8914) 2021-09-02 19:48:07 -07:00
Hariharan Seshadri
e348929019
Minor cleanup from #7592 (#8952) 2021-09-02 18:46:57 -07:00
Scott McKay
5f30be3e92
Exclude training support from BatchNorm in minimal build (#8939)
* Exclude changes to BatchNorm that are training specific from minimal build.

Previous changes [excluded](https://github.com/microsoft/onnxruntime/pull/7704) training specific code but that was recently [undone](https://github.com/microsoft/onnxruntime/pull/8269) to support a pytorch CI need that isn't relevant to minimal builds.
2021-09-03 08:02:19 +10:00
Gary Miguel
47435311f4
Include pytorch_export_contrib_ops in inference builds (#8878)
* Include pytorch_export_contrib_ops in inference builds

Rename / move it from tools/python/register_custom_ops_pytorch_exporter
to onnxruntime/python/tools/pytorch_export_contrib_ops.

Rationale for inclusion in inference builds:
This code is potentially useful for anyone using ORT, not just training.

Rationale for new name:
"Contrib op" is the nomenclature used within ORT to refer to the set of
ops that are not in the standard op set but are included by default with
ORT. This is more specific than "custom op", which is what the PyTorch
exporter uses to refer to any non-standard op.

Step 1 of addressing #8818. After this is merged I will update the docs.

* Enable test_pytorch_export_contrib_ops.py in CI

Fixes AB#1342330
2021-09-02 14:26:58 -07:00
Gary Miguel
06bb2ec561
ignore direnv configs (#8861)
https://direnv.net/ is a useful tool but its configs are developer-specific
2021-09-02 11:53:57 -07:00
Thiago Crepaldi
fe7f30aa14
Enable all-or-nothing fallback by default (#8911) 2021-09-02 10:45:14 -07:00
Changming Sun
1a34775fe9
Fix the benchmark code (#8926) 2021-09-02 10:36:24 -07:00
Tianlei Wu
6490191f58
Fix non-deterministic behavior of --input_int32 in transformer optimizer (#8927) 2021-09-02 10:20:48 -07:00
Ye Wang
7647caa520
update Tensorflow_Tf2onnx_Bert-Squad_OnnxRuntime_CPU.ipynb (#8898)
* init checkin

* update

* update

* Update Tensorflow_Tf2onnx_Bert-Squad_OnnxRuntime_CPU.ipynb

* Update Tensorflow_Tf2onnx_Bert-Squad_OnnxRuntime_CPU.ipynb

* use pretrained model

* re-run

* re-run
2021-09-02 09:59:40 -07:00
satyajandhyala
4570d85f20
Move setdlopenflags calls into _pybind_state.py (#8916)
* Use PROTOBUF_LIB instead of protobuf::libprotobuf

* Moved setdlopenflags to _pybind_state.py

* Copy the generated _pybind_state.py to required location for Windows.
2021-09-02 09:54:32 -07:00
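For readers unfamiliar with `setdlopenflags`, the commit above refers to Python's `sys.setdlopenflags`, which controls the flags passed to `dlopen()` when extension modules are loaded. A minimal sketch of scoping such a change is shown here; the helper name `dlopen_flags` and the choice of `RTLD_NOW | RTLD_GLOBAL` are assumptions for illustration and are not taken from `_pybind_state.py`.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def dlopen_flags(flags):
    """Temporarily change the flags used when loading extension modules."""
    if not hasattr(sys, "setdlopenflags"):
        # Platforms without dlopen (e.g. Windows): nothing to change.
        yield
        return
    previous = sys.getdlopenflags()
    sys.setdlopenflags(flags)
    try:
        yield
    finally:
        # Always restore the previous flags, even if the import fails.
        sys.setdlopenflags(previous)


if hasattr(os, "RTLD_GLOBAL"):
    with dlopen_flags(os.RTLD_NOW | os.RTLD_GLOBAL):
        # A native extension imported inside this block would be loaded
        # with the flags above; no real extension is imported here.
        pass
```

Keeping the flag change in one place and restoring it afterwards avoids leaking a process-wide setting into unrelated imports.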
Wei-Sheng Chin
f711d8992a
Not to calc memory for inference (#8935) 2021-09-02 09:49:54 -07:00
Changming Sun
fbb6f0f599
Fix an error in Nuget pipeline caused by merge conflict 2021-09-02 09:26:25 -07:00
Scott McKay
b058dee648
Fix a couple of issues mentioned in the PR comments. (#8936) 2021-09-02 17:58:29 +10:00
Hariharan Seshadri
ddbc8bc5fc
Fix CPU Xor implementation (#8934) 2021-09-01 21:38:55 -07:00
Edward Chen
1985616262
Trim InferenceSession binary size. (#8917)
- Move flatbuffers SessionState access code into helper functions instead of duplicating them between InferenceSession and SessionState.
- Trim VerifyEachNodeIsAssignedToAnEp(), e.g., disable verbose log output in a minimal build.
2021-09-01 18:18:32 -07:00
Sunghoon
332c2ba4f4
[js/web] Integrate ONNX Runtime Web CI with BrowserStack (#8859)
* Integrate ONNX Runtime Web CI with BrowserStack

* Rename a pipeline from browserstack to multi-platform
2021-09-01 17:25:57 -07:00
liqun Fu
757e9e6df7
do not post cuda version mismatch warning if cannot find local cudart version (#8924)
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-09-01 17:11:54 -07:00
liqun Fu
f126a12699
decouple pytorch from onnxruntime training build (#8815) 2021-09-01 16:31:53 -07:00
Tianlei Wu
9467f511ac
Disable some ORT graph optimizers in offline transformers optimization tool (#8923)
Work around "Unsupported operator FusedMatMul" during symbolic shape inference
2021-09-01 15:47:57 -07:00
Suffian Khan
225439193e
Optimize Concat and Split on CUDA to eliminate host-to-device copies when sizes are all the same (#8833)
* special case concat and split when sizes are equal

* add tests for 16 and 32 inputs with same dim

* add tests for 16/64 inputs on concat or 16/64 outputs on split

* try eliminate windows warning

* outter => outer
2021-09-01 15:25:45 -07:00
Scott McKay
858989293d
Reduce binary size of strided copy used by Concat (#8913)
* Change the strided copy to switch on data size not data type.
Move to header so we can reduce on the enabled types.
Setup type reduction for Concat now that it's using this implementation.
2021-09-02 08:19:20 +10:00
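The phrase "switch on data size not data type" in the commit above can be illustrated with a small sketch. It assumes the point is that a copy never interprets element values, so one routine per element width can serve every type of that width. The function name `strided_copy` and its parameters are hypothetical and are not the C++ implementation in the repository.

```python
import struct


def strided_copy(src, dst, element_size, count, src_stride, dst_stride):
    """Copy `count` elements between byte buffers; strides are in elements."""
    src_view = memoryview(src)
    dst_view = memoryview(dst)
    for i in range(count):
        s = i * src_stride * element_size
        d = i * dst_stride * element_size
        # Elements are treated as opaque byte blobs of `element_size` bytes.
        dst_view[d:d + element_size] = src_view[s:s + element_size]


# float32 and int32 are both 4 bytes wide, so they share one code path.
floats = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
ints = struct.pack("<4i", 1, 2, 3, 4)
out_f = bytearray(8)
out_i = bytearray(8)
strided_copy(floats, out_f, 4, 2, 2, 1)  # take every second element
strided_copy(ints, out_i, 4, 2, 2, 1)
print(struct.unpack("<2f", out_f))
print(struct.unpack("<2i", out_i))
```

In a templated C++ setting, dispatching on width rather than on type would mean fewer instantiations, which is presumably where the binary-size saving comes from.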
satyajandhyala
9e661b64ae
Fix cast propagation to not change casts from bool type. (#8925)
* Added new models to test bool->float and bool->float16 casts

* Fixed bool casts. Added new test cases.
2021-09-01 15:15:37 -07:00
Changming Sun
6299a60bf8
Nuget: splitting PDB files to a separated package (#8903) 2021-09-01 09:07:24 -07:00
Suffian Khan
00b0a9c127
Add hugging-face models loss curve and performance guards to ROCm CI pipeline. (#8915)
* test running hf bert-large

* try again

* try again

* include other models

* correct names

* disable deberta-v2-xxlarge

* avoid torch.distributed

* add compare json loss and perf for bert-large to test

* fix sed expression

* remove pytest

* add more models

* move unit tests u

* display samples/sec
2021-09-01 09:03:10 -07:00
Chi Lo
43d6951fa5
Add warning message for combined trt +cuda python pkg (#8906)
* Add warning message

* update message

* fix line too long

* fix flake8 issue
2021-09-01 07:28:01 -07:00
Hariharan Seshadri
acd9db7fad
Fix location planning for initializers used only in nested subgraphs (#8642) 2021-09-01 00:02:08 -07:00
Tang, Cheng
4dc0ddf606
support register external ep lib information (#8897)
* support register external ep lib information; make eager mode share the same ep pools with training workloads

* fix inference code

* fix build break

* fix the message
2021-08-31 20:51:22 -07:00
pengwa
3eb08d4dc7
custom autograd func memory (#8901)
* remove PythonOpGrad control dependency && avoid segmentation fault

* comment alignment

* fix bugs
2021-09-01 09:29:26 +08:00
Yulong Wang
feb747173e
[js/web] Update browser support table (#8900)
* [js/web] Update browser support table

update section 'Compatibility' for Edge browser

* update linux
2021-08-31 17:39:51 -07:00
Guoyu Wang
8404a2d011
Add NNAPI E2E test for Android java package (#8912)
* Add NNAPI E2E test for Android java package

* address cr comment
2021-08-31 17:34:33 -07:00
Changming Sun
a9a0d3f6fa
Update min supported macOS version to 10.14 2021-08-31 16:09:48 -07:00
baijumeswani
70ca03d491
Correctly set the skip check flags for ORTModule (#8891) 2021-08-31 15:28:04 -07:00
Corentin Schreiber
69ab4670f7
CUDA UpsampleNearest performance improvement (#7592)
* Made rank a template parameter of _UpampleNearestKernel

* Added error checking for rank specified to UpampleImpl

* Added __restrict__ keyword to input and output arrays in Upsample
2021-08-31 14:25:42 -07:00
Changming Sun
129722db37
Add android binary size monitor back (#8904) 2021-08-31 14:13:55 -07:00
ashbhandare
cd4b9f7753
Fix EP in transform (#8909) 2021-08-31 13:52:57 -07:00
George Nash
dc75a135c8
Add elementwise operators to DNNL execution provider (#8899)
The following ops have been added to the DNNL execution provider
Abs, Elu, Exp, Log, *Relu, Round, Sigmoid, Softplus, Sqrt, and Tanh

*Relu op was moved from its individual file to the elementwise operators

The error tolerance for the LogGrad unit test had to be decreased slightly
when using OneDNN. Still investigating why a different tolerance value is
needed.

The DnnlSubgraph::AddKernels() member function was moved to the top of the file.
Since it is edited every time a new operator is added to the execution
provider, placing the code at the top means less scrolling when
adding new kernels.

Signed-off-by: George Nash <george.nash@intel.com>
2021-08-31 12:20:49 -07:00
Zhang Lei
2e37fe3f68
Fuse HardSigmoid with conv. (#8674)
* Fuse HardSigmoid with conv.
Add transform test case and FusedConv testcase.

* Limit Conv/HardSigmoid fusion in CpuExecutionProvider.

* Fix typo for arm build.

* change format one place
2021-08-31 12:19:34 -07:00
Yulong Wang
206537936f
[js/web] enable proxy worker for wasm backend (#8862) 2021-08-31 10:23:42 -07:00
Olivia Jain
33c0b3e94b
Perf test fixes (#8863)
* fix anubis wheel upload and symbolic shape infer location

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml for Azure Pipelines

* fix symbolic path

* use master and call mem_test after build

* Update linux-gpu-tensorrt-ci-perf-pipeline.yml

* use installed symbolic shape infer TODO: check upon error

* catch symbolic shape errors
2021-08-31 10:03:47 -07:00