Commit graph

5077 commits

Author SHA1 Message Date
Thiago Crepaldi
5c2e1bbb0a
Fix input schema extractor for ORTModule (#8098) 2021-06-18 21:47:49 -07:00
baijumeswani
7701c8703e
Add module attribute to ORTModule to support HuggingFace Trainer save_model (#8088) 2021-06-18 13:13:45 -07:00
Hariharan Seshadri
08eeb8763d
Loosen validation checks in Concat to unblock execution of model in #8020 (#8080) 2021-06-18 11:14:36 -07:00
Olivia Jain
b2247ece25
Make Perf Test Configurable (#7836)
- Allow anyone to kick off a perf test here. Customize: branch, eps, model selection, cuda version.
- Only run shape inference when required.
- Kill errored out memory processes.
- Remove warmup run.
- Clean up script.
- Standalone_TRT is its own "EP" rather than an additional run with the TRT EP
2021-06-18 11:11:19 -07:00
Edward Chen
aa68157c3d
[Mobile package] Update required operator config with additional ops for wav2vec2. (#8079)
Add some additional ops to the mobile package that are needed for the wav2vec2 model.
2021-06-17 13:08:15 -07:00
Guoyu Wang
d83f7fd4aa
[NNAPI EP] Enable Slice support (#8031)
* Enable slice for NNAPI EP

* Add ANEURALNETWORKS_STRIDED_SLICE support

* Addressed CR comments

* Addressed CR comments, rename PrepareForCompute to PrepareForComputeHelper to avoid confusion
2021-06-17 12:36:12 -07:00
Changming Sun
96989b83ee
Create python packages for DML (#8061) 2021-06-16 16:59:12 -07:00
Nick Kreeger
d924fd205b
Create and move quantization tests to a shared Quantized utils file. (#8054)
* Create a shared quantization util for all unit tests.

* Cleanup qlinear_binary_op_test.cc

* save

* save

* save

* cleanup

* save

* cleanup for linux build
2021-06-16 17:00:36 -05:00
Guoyu Wang
32ef39be58
[Android] Use Gradle to add header files into the AAR (#8068)
* Use Gradle to add header files into the AAR

* fix gradle format violation
2021-06-16 12:03:42 -07:00
Ryan Hill
1d8edd0b5b
Fix missing files on linux (#8066) 2021-06-16 11:05:03 -07:00
Wei-Sheng Chin
c76172fab6
Fix PythonOp with input which has no gradient (#8011)
* Fix PythonOp with input has no gradient

* Fix another bug which happens when inputs require gradient

* Remove comments

Co-authored-by: Peng Wang <pengwa@microsoft.com>
2021-06-17 00:19:41 +08:00
Vincent Wang
de8f2ecda9
Reduce Kernel Optimization (#8067)
* reduce optimization

* bug fix

* add a check

* add ut

* refactor

* add ut cases for keepdims=true
2021-06-16 19:53:46 +08:00
Ryan Hill
0ebaa71f49
Improve Windows Platform system error messages (#8063) 2021-06-15 22:17:35 -07:00
Chen Fu
32e118bef0
Fix microbenchmark build failure (#8064)
Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-06-15 20:49:39 -07:00
Tang, Cheng
e31784b6cf
decouple the python module construction from pybind_state (#8060)
* fix broken tests

* decouple the module construction to a separate file
2021-06-15 18:52:26 -07:00
Changming Sun
96cf533c76
Remove DML from Windows GPU CUDA 10.2 pipeline 2021-06-15 16:53:24 -07:00
George Wu
25c49a5fe0
fix issue with cmake path (#8055) 2021-06-15 15:09:15 -07:00
iperov
07b166bb1b
fix PATH addition on Windows
should set PATH, not append a copy of PATH to its tail
2021-06-15 14:18:00 -07:00
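The commit message above does not show the changed code, so the following is only a sketch of the kind of bug it describes. Both helper names and the exact shape of the buggy expression are assumptions made for illustration; they are not taken from the ONNX Runtime sources.

```python
import os


def buggy_path_update(path_value: str, new_dir: str) -> str:
    # Assumed shape of the bug: an in-place "+=" whose right-hand side also
    # contains PATH, so a full copy of PATH ends up appended to its own tail.
    return path_value + os.pathsep + new_dir + os.pathsep + path_value


def fixed_path_update(path_value: str, new_dir: str) -> str:
    # Set PATH to "new_dir + separator + old PATH": one new entry, no copy.
    return new_dir + os.pathsep + path_value


old = os.pathsep.join(["dir_a", "dir_b"])
print(buggy_path_update(old, "new_dir").split(os.pathsep))
print(fixed_path_update(old, "new_dir").split(os.pathsep))
```

With the assumed buggy form every existing entry appears twice, which grows PATH each time the update runs.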
Sunghoon
887c3149e3
[js/react_native] Use a mobile ORT instead of a full ORT (#8042)
* Change full ort to mobile ort

* Update Android example to load mobile ort

* Change the format of test models to ort

* update ios to use mobile ort

* revise README

* use onnxruntime-mobile-c CocoaPods in a npm package
2021-06-15 13:36:05 -07:00
Nick Kreeger
6a1b000125
Fix unit test typo in test_op_embed_layernorm.py (#8056) 2021-06-15 15:27:44 -05:00
Changming Sun
07788e082e
Enable python GPU tests (#7854) 2021-06-15 10:24:58 -07:00
G. Ramalingam
8079c76383
Create ORT opschema library (#7903)
* Op schema library

* Create ORT opschema library and sample app

* delete message in cmake

* Fix cmake

* Address PR feedback and add dependency

* Add cmake dependency

* Cmake fix

* Add dependency for nsync

* Add dependency for nsync

* Reorder dependencies

* Testing for dependencies on all platforms

* Resolve dependencies on GetStackTrace, floatToHalf

* Compiler strict-aliasing warning

* Merge with master

* Minor cleanup
2021-06-14 14:02:33 -07:00
Olivia Jain
c72a8c7ff4
Upgrade tf 2.4.1 to 2.4.2 for component governance (#8036)
* Upgrade tf 2.4.1 to 2.4.2 for component governance

* Trial run with tf 2.5.0
2021-06-14 09:30:58 -07:00
George Nash
9acf93b90a
Take graph topology into account when creating dnnl subgraphs (#7910)
Check the inputs of all nodes are part of the subgraph for all
operators.  Previously the code assumed all operators only had
a single input except for the "Sum" operator.

This resolves an issue seen when adding new operators: a subgraph
was incorrectly accepting a node that it should have rejected,
because the check was not following the topology of the nodes.

Signed-off-by: George Nash <george.nash@intel.com>
2021-06-13 19:23:37 -07:00
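The difference between the old single-input assumption and the new all-inputs check can be sketched as below. This is a simplified Python illustration, not the DNNL EP's C++ code; the function names and the `available` set (tensor names the subgraph can already see) are hypothetical.

```python
from typing import Sequence, Set


def first_input_only(node_inputs: Sequence[str], available: Set[str]) -> bool:
    # Old assumption: an operator has a single input, so only input 0 is checked.
    return not node_inputs or node_inputs[0] in available


def all_inputs_in_subgraph(node_inputs: Sequence[str], available: Set[str]) -> bool:
    # New behaviour as described: every input must already belong to the subgraph.
    return all(name in available for name in node_inputs)


# "other" is produced outside the subgraph, so the node should be rejected.
available = {"x", "conv_out"}
node_inputs = ["conv_out", "other"]
print(first_input_only(node_inputs, available))        # accepts the node
print(all_inputs_in_subgraph(node_inputs, available))  # rejects the node
```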
Xavier Dupré
6d7461795f
Update Version.md (#8021)
Correct the supported opset for 1.8.0.
2021-06-13 18:52:40 +02:00
Pranav Sharma
ad6a306a7f
Add pragma once (#8040) 2021-06-11 23:47:26 -07:00
Scott McKay
96ead2be91
Avoid hashing the operator type in the GraphViewer priority node check unless the string has a chance of matching. (#7972)
* Avoid hashing the operator type in the GraphViewer priority node check unless the string has a chance of matching.

Below are perf numbers from a test that loads 16 models multiple times. I was checking that some unrelated changes didn't have unexpected perf cost and found the PriorityNodeCompare overwhelmed any contribution the other changes were making.

*Before*

CPU Time:74.678s

CPU Time for relevant Top Hotspots
std::_Hash_array_representation<char> 20.834s
onnxruntime::PriorityNodeCompare::IsHighPri 7.589s
onnxruntime::Graph::KahnsTopologicalSort 4.487s

*After*

CPU Time:47.103s

CPU Time for relevant Top Hotspots
onnxruntime::Graph::KahnsTopologicalSort 4.465s
onnxruntime::PriorityNodeCompare::IsHighPri 2.873s
2021-06-12 14:11:33 +10:00
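The idea of skipping the hash unless the string could match can be sketched as a cheap pre-filter in front of a set lookup. The commit message does not say which pre-check is used, so the length test below is an assumption, and `HIGH_PRI_OPS` is a hypothetical list. Python caches string hashes, so this only illustrates the control flow; the measured saving applies to the C++ code, where each lookup hashed the operator type.

```python
# Hypothetical set of "high priority" operator types; the real list lives in
# the ORT C++ sources and is not given in the commit message.
HIGH_PRI_OPS = frozenset({"Shape", "Size"})

# Precomputed once: the only string lengths that can possibly match.
CANDIDATE_LENGTHS = frozenset(len(name) for name in HIGH_PRI_OPS)


def is_high_pri(op_type: str) -> bool:
    # Cheap check first: most operator types fail here without being hashed.
    if len(op_type) not in CANDIDATE_LENGTHS:
        return False
    # Only strings that could match pay for the hash-based lookup.
    return op_type in HIGH_PRI_OPS


print([op for op in ["Conv", "Shape", "MatMul", "Size"] if is_high_pri(op)])
```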
Edward Chen
6e134c2cc3
[Objective-C API] Add support for documentation generation (#7999)
Adding support for generating API documentation with the Jazzy tool.
It's a manual process now, but we can eventually make it a part of the release pipeline.
2021-06-11 17:49:00 -07:00
Nick Kreeger
1d7f44a832
Add unit test for EmbedLayerNormalization quantization op. (#8033) 2021-06-11 17:33:55 -05:00
Ye Wang
e6225c62a5
transformers test CI pipeline fix (#8016)
* init checkin

* Restore initial environment

* -y

* testtest

* fix

* fix indent
2021-06-11 12:57:52 -07:00
sumitsays
43c45ddd66
Update DirectML EP changes from DmlDev as of 2021-06-07 (#7987)
* Merged PR 6093117: Fix test_DynamicQuantizedLinear_max_adjusted_expanded by allowing Identity operator to run on non-float inputs

Motivation:
As part of the OnnxConformance Backend tests, DynamicQuantizedLinear_max_adjusted_expanded is failing.

Root Cause:
- The test model has an `Identity` operator as one of its nodes. The input of this node is of a non-float data type.
- In DML, the `Identity` operator is registered as an operator that requires floating-point input.
- As per `DirectMLSchema.h`, support for non-float input has been added for `Identity` operator in DML but the same has not been reflected in the `OperatorRegistration.cpp`.

Changes:
- Removed all traces of the requiresFloatFormatsForGraph flag from its definition and usage. This flag was only used for Identity and its related operators.
- Added null check for the graphOutput nodeArg in GraphDescBuilder.cpp to stop the crash of the test.

Related work items: #33076298

* Merged PR 6103324: Remove usage of non-generic error code (FWP_E_NULL_POINTER)

Motivation:
Addressing Dwayne's comment on the previous PR. [Ref: [6093117](https://dev.azure.com/microsoft/WindowsAI/_git/onnxruntime/pullrequest/6093117?discussionId=44292162&path=%2Fonnxruntime%2Fcore%2Fproviders%2Fdml%2FDmlExecutionProvider%2Fsrc%2FGraphPartitioner.cpp)]

Changes:
Inside the DML EP, we should not use error codes specific to some other platform. Instead we should use an appropriate generic error code.

Related work items: #33076298

Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2021-06-11 11:09:48 -07:00
Vincent Wang
2f2aaf2cf6
Fix Memory Leak from DlpackToOrtValue (#8029) 2021-06-11 15:48:13 +08:00
Jeff Daily
d02de9c1bc
[ROCm] dockerfile updates (#7955)
* do not remove onnxruntime build directory in Dockerfile.rocm4.1.pytorch

* restore ONNX Runtime Training Examples to rocm 4.2 dockerfile
2021-06-10 23:50:19 -07:00
Scott McKay
00d48d9c30
Add enhanced partitioning utils for use by compiling EPs (#7991)
* Add enhanced partitioning utils and convert internal testing EP to use it. Will convert NNAPI EP once checked in.

Background:
Currently most EPs do their partitioning by iterating the model in the topologically sorted order. Whilst this works, it doesn’t ensure that all nodes which could possibly be added to the current group are added, because the group is closed as soon as the first unsupported node is seen.

Changes:
- Ask EP for all nodes it supports first
- Do a partitioning-aware topological sort
  - Groups nodes and flips between processing supported and unsupported nodes to maximise inputs that will be available for each partition
- Create groups of nodes for the partition using the new order of nodes
- Create ComputeCapability for each group

There’s also an additional ability to specify operators to stop at. The processing will find all downstream nodes from ‘stop at’ operators and exclude them. If NonMaxSuppression is specified we can prevent the post-processing from SSD Mobilenet and MobileDet attempting to use NNAPI (an easy way to have parity with the TF Lite behavior). I don’t think there’s an automated way to determine what, if any, ‘stop at’ operators are required for a model, so this will need to be a configuration parameter for the EP and we’ll need to document recommended values for popular models.
2021-06-11 15:23:21 +10:00
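The partitioning-aware sort described above can be sketched as a Kahn-style topological sort with two ready queues that alternate. This is a minimal Python illustration under stated assumptions, not the ORT C++ implementation: `partition_supported` and its arguments are hypothetical names, and the 'stop at' operator handling is left out.

```python
from collections import deque


def partition_supported(nodes, edges, supported):
    """Group supported nodes, flipping between supported and unsupported ones."""
    indegree = {n: 0 for n in nodes}
    consumers = {n: [] for n in nodes}
    for src, dst in edges:
        indegree[dst] += 1
        consumers[src].append(dst)

    # Nodes whose inputs are all available, split by whether the EP supports them.
    ready = {True: deque(), False: deque()}
    for n in nodes:
        if indegree[n] == 0:
            ready[n in supported].append(n)

    groups = []
    processing_supported = True
    while ready[True] or ready[False]:
        queue = ready[processing_supported]
        group = []
        # Drain everything reachable without crossing to the other kind of node.
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in consumers[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready[nxt in supported].append(nxt)
        if processing_supported and group:
            groups.append(group)
        processing_supported = not processing_supported
    return groups


# A plain walk over the order A, U, B, D, E would close the first group at U.
nodes = ["A", "U", "B", "D", "E"]
edges = [("A", "D"), ("B", "D"), ("U", "E")]
print(partition_supported(nodes, edges, supported={"A", "B", "D"}))
```

In this example the alternating queues keep A, B and D together in one group, whereas closing the group at the first unsupported node would yield two.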
Suffian Khan
35ca3c99d1
Fix ROCm wheels pipeline after changes to manylinux scripts (#8026)
* update

* try fix rocm pipeline

* avoid already installed error

* ignore python3.10 since build fails

* fix

* try setting user

* try again

* try again

* try again

* fix script

* disable inference docs generation

* try print device id

* fix name qual

* try again

* try again

* try again

* provider_options

* add device verify

* try again

* try again

* try again

* print video/render gid

* try again

* run as root

* try again with uid, gid

* cleanup

* run as root

* temp fix

* add /bin/bash

Co-authored-by: Changming Sun <chasun@microsoft.com>
2021-06-10 21:01:28 -07:00
Scott McKay
20579595c8
Make logic in InsertCastTransformer around forcing a node to fp32 more precise. (#8018)
* Address #7981

Reworked the logic around forcing a node to run on fp32 even if it was supported on fp16.

The github issue had multiple factors. In ORT 1.8 we remove Identity nodes that produce graph outputs as they're not needed. That resulted in a Loop node no longer having output nodes (it produces graph outputs instead), which meant the check in IsSingleInputNodeFloat16Node returned true as there was no longer a downstream Identity node processing fp16 data.

We should only force a node to fp32 in very specific circumstances, and the changes hopefully check for those more precisely.
2021-06-11 13:54:40 +10:00
Nat Kershaw (MSFT)
0237225117
Add @file annotation to support doxygen generation of C API docs (#7458) 2021-06-10 16:10:32 -07:00
baijumeswani
b2ed4fb0a4
Merge orttraining and ortmodule single gpu ci pipelines (#8022)
* Merge orttraining and ortmodule single gpu ci pipelines

* Remove Debug from orttrainer build config
2021-06-10 15:58:23 -07:00
Rachel Guo
4d1b48632c
[CoreML EP] Add ArgMax op support and modify OpBuilder interface (#7924)
* add argmax skip cast op support [initial]

* modify some op support related logic

* fix typo

* add cast node into the partition

* update cast op builder and add int32 graph output

* modify op_builder interface

* exclude unused header file

* clean minor update

* minor change

* address comments

* address more comments

* add UT test for the case

* address more comments

* minor change

* refine

* update

* refine

* make UT test run on CPU

* minor formatting

* update

* switch UT test implementation

* minor refine

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2021-06-10 14:39:15 -07:00
Changming Sun
b313c4581c
Remove CC/CXX env settings from C API packaging pipeline (#8014) 2021-06-10 11:36:52 -07:00
Sherlock
2a74f5e85b
Save module output for backward if needed (#8010)
* Save module output for backward if needed
2021-06-10 09:56:35 -07:00
Changming Sun
c74265667e
Remove CUDA architectures 35 and 86 from GPU packages (#8004)
Because our Python packages are oversized.
2021-06-09 17:47:34 -07:00
Ryan Hill
b03383f6d5
Add cuda provides files (#8002) 2021-06-09 15:31:24 -07:00
Guoyu Wang
f013b0c0eb
[NNAPI EP] Add support of Elu, merge in NNAPI updates for API level 30 (#8001)
* Add elu, integrate new Android NNAPI API changes

* add slice check

* update previous typo

* Move sdk level check to nnapi feature level check

* update readme
2021-06-09 12:39:02 -07:00
Changming Sun
aa45545af7
Update orttraining-linux-gpu-perf-test-ci-pipeline.yml (#8005)
I changed the OS version. It is now Ubuntu 20.04 + python 3.8, so I need to update the python command.
2021-06-09 10:22:14 -07:00
pengwa
cb5f411da3
Fix Python Packaging Pipeline && Build Clean Up (#7993)
* remove link to python

* revert orttraining-linux-ci build env change introduced by pr
https://github.com/microsoft/onnxruntime/pull/7993.

* fix builds

* fix builds

* clean up

* fix builds

* Fix unused params

* fix some comments.
2021-06-09 17:35:17 +08:00
Ye Wang
d433aa2459
Add transformers tool test to pipeline (#7959)
* checkin transformers pipeline

* add docker requirements

* only trigger linux cpu

* temp remove tf installation due to numpy version conflicts

* test numpy>=1.7

* revert numpy and disable transformers

* add coloredlogs

* enable shape_infer_helper and install transformers when needed

* pip3?

* testtest

* enable more tests

* line too long

* remove pytorch1.4 test and added back some onnx  files

* add tests

* copy dir

* disable 2 tests

* trim lines

* add missing onnx

* fix type

* fix  version conflicts

* install psutil

* change file path

* fix path

* remove cached files

* add back attention fusion test

* labeled the shape infer test as slow

* fix

* enable tf2onnx test and enable pytest

* refactor path

* fix typo

* add cwd
2021-06-08 19:43:59 -07:00
Vincent Wang
f0f3012666
Add SoftmaxCrossEntropyLossInternal to Support Dynamic ignore_index Input (#7899)
* add SoftmaxCrossEntropyLossInternal

* bugfix and ut

* fix ut

* fix ut

* support torch1.8.1

* function body for nll_loss_internal
2021-06-09 10:29:46 +08:00
baijumeswani
f38200e209
Override ORTModule named_modules to support extra arg (#7954) 2021-06-08 17:32:09 -07:00
Yulong Wang
1cc896c8ae
optimize js package folder structure (#7989)
* put generated .js files into dist/ folder

* format code
2021-06-08 16:49:06 -07:00