Commit graph

4142 commits

Author SHA1 Message Date
Jesse Benson
c4b6559be9 Update reduction_all.cu 2021-02-04 15:00:05 -08:00
Jesse Benson
5fc377f21e Partial updating of ROCM reduction code. 2021-02-04 15:00:05 -08:00
Edward Chen
318b82ca7e
Cast Op performance fix. (#6509)
Update CPU Cast implementation to fix performance regressions.
Update Cast unit tests for more coverage.
2021-02-04 14:52:37 -08:00
Edward Chen
2ef792ae6e
Don't resolve symlink in resolve_executable_path(). (#6540) 2021-02-04 12:32:03 -08:00
Changming Sun
aa31ba5774
Merge CPU packaging pipelines (#6480)
1. Merge Nuget CPU pipeline, Java CPU pipeline, C-API pipeline into a single one.
2. Enable compile warnings for cuda files(*.cu) on Windows.
3. Enable static code analysis for the Windows builds in these jobs. For example, this is our first time scanning the JNI code.
4. Fix some warnings in the training code.
5. Enable code sign for Java. Previously we forgot it.
6. Update TPN.txt to remove Jemalloc.
2021-02-04 08:38:56 -08:00
Guoyu Wang
0d35f0e2c0
[CoreML EP] Add support of Conv operator (#6510)
* [CoreML EP] Add support of Conv operator

* Ignore a corner case setting empty padding

* Add handle autopadding

* Addressed CR comments
2021-02-04 00:30:10 -08:00
Guoyu Wang
6cf54ff296
Switch Android CI java build to JDK 11 (#6552)
* switch to jdk11

* fix java

* Update
2021-02-03 17:49:23 -08:00
Ryan Lai
c7feb48083
Don't send out Runtime error telemetry when can't create LearningModelDevice on machine without hardware adapters (#6535)
* Checkpoint 1

* Remove global logruntime error telemetry. This isn't necessary and doesn't contain relevant information

* Make macro simpler

Co-authored-by: Ryan Lai <ryalai96@gamil.com>
2021-02-03 14:27:29 -08:00
Guoyu Wang
464dbef143
[NNAPI EP] add uint8 support for Transpose/Concat/Maxpool, add support of QLinearSigmoid (#6534)
* Init change

* Add QlinearSigmoid support

* Update tests

* Add resize int8 support

* Add version check for resize linear uint8 and add scale/zero point check for concat uint8

* Address CR comments

* minor fix and add test for uint8 handling

* Address CR comments

* Fixed an existing bug

* Fix the new UT break, due to different rounding of 0.5 in device and emulator
2021-02-03 13:45:49 -08:00
Scott McKay
6cb8f8c812
Support disabling a typed kernel registration that uses the output type (#6530)
* Update infrastructure to support disabling a typed kernel registration that uses output 0 for the type (vs. the normal use case of input 0).
2021-02-03 14:22:32 +10:00
Scott McKay
8d53ef69e5
Add type reduction support to Min, Max and Pow (#6519)
* Add type reduction support to Min, Max and Pow
Update the C++ type reduction infrastructure to allow specifying an opset for the supported types list, as those can change across opset versions.
Minor updates to the type usage tracking script
* Add 'all opsets' macros and constant
2021-02-03 06:51:35 +10:00
Thiago Crepaldi
fbb24b57d0
Update code owners for pytorch frontend team (#6329) 2021-02-02 11:09:10 -08:00
ashbhandare
85434273ff
Fix CUDA Reduction kernel for ArgMax/ArgMin when reduction dim=1 (#6490)
* Fix for when reduction dim=1

* Disable test for AMD GPUs

* Specify Async
2021-02-02 09:50:16 -08:00
Derek Murray
14f7d56c81
Add optimized version of ConvGrad for pointwise convolutions. (#6531)
Co-authored-by: Tracy Sharpe <tracysh@microsoft.com>
2021-02-02 08:09:20 -08:00
Cian Hayes
6fc5237d9e
Introduce --enable_training_ops build flag (#6523)
* minimal_build with training ops

* Removing redundant comment from an earlier attempt at a fix

* Fixing a bad merge conflict resolution

* Responding to PR feedback

* tweaking the makefiles based on feedback

* combining two enable_training blocks in CMakeLists.txt
2021-02-01 21:54:16 -08:00
Tracy Sharpe
9a6e71574a
MLAS: improve quantized depthwise convolution (#6513) 2021-02-01 21:22:27 -08:00
Scott McKay
588ddeb82f
Add level 1 optimized mnist model so that the required_ops.config includes the ops in that (which are used in mnist.level1_opt.ort). NNAPI unit tests need this. (#6514)
Update required_ops.config with automatically generated version. There have been some other changes to testdata which show up as diffs here.
2021-02-02 14:06:49 +10:00
Yufeng Li
7264a067a9
Implement QuantizeLinear with avx512 (#6260)
* Implement QuantizeLinear with AVX512
2021-02-01 14:33:44 -08:00
Scott McKay
5b69cbe80e
Fix Windows CI builds by updating test scripts to work with numpy 1.20. (#6518)
* Update onnxruntime_test_python.py to work with numpy 1.20.

Some aliases are deprecated in favor of the built-in python types. See https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

np.array with bytes for entries and dtype of np.void no longer automatically pads. Change a test to adjust for that.

* Fix another test script
2021-02-01 04:49:55 -08:00
Scott McKay
e5cbcec17f
Fix issues with ArmNN build setup (#6495)
* ArmNN build fixes
* Update BUILD.md to document that the ACL paths must be specified to build ArmNN
* Fix CUDA build error. We don't setup the link libraries correctly/consistently so improve that.
2021-01-31 09:01:32 +10:00
Guoyu Wang
f2872ffd64
Print a warning message for using newer c_api header on old binary (#6507) 2021-01-29 19:29:14 -08:00
Chi Lo
7c5bfbaaab
Lochi/refactor yolov3 quantization (#6290)
* Refactor the code and move data reader, preprocessing, evaluation to
E2E_example_mode

* Refactor the code.

Move data reader, preprocessing, evaluation to model specific example
under E2E_example_mode

* refactor code

* Move yolov3 example to specific folder and add additional pre/post
processing
2021-01-29 19:28:09 -08:00
George Nash
a36f627a4c
Dnnl training (#6045)
* Add ReluGrad and ConvGrad ops for the dnnl provider

* the mnist sample is updated to add the --use_dnnl option that
will cause the sample to use the dnnl execution provider for
nodes that exist in dnnl provider.

* Added the ability to find forward ops. Dnnl backward gradient
ops require the forward primitive description and workspace
from the forward operation.

* Enable specifying the execution provider for Gradient Checker Tests

* Prevent memory leak when running dnnl_provider in training mode

Prevent creating a SubgraphPrimitivePool when the code is built with the
ENABLE_TRAINING build flag. Instead create a SubgraphPrimitive directly.

The SubgraphPrimitivePool was causing a pool of SubgraphPrimitives to be
stashed in a map for reuse. Due to the way the Training Loop uses threads,
the pool of SubgraphPrimitives was not being reused; instead, a new pool of
SubgraphPrimitives was created each run. The old pool was not instantly
freed. This behavior could be a language error when using thread_local
memory.

Signed-off-by: George Nash <george.nash@intel.com>

* Added fixes to maxpoolgrad and memory leak.

Maxpoolgrad will now pass all unit tests.
With the conv and convgrad disabled for dnnl, mnist is able to train till 95%

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Fixed misc issues when testing training code with dnnl provider

* fix conv_grad dnnl tests with dilation to run dnnl execution provider

* update mnist training sample to accept convolution type models

  convolution models require the input shape to be {1, 28, 28}
  instead of the flat {784} image that is used for the gemm models

  this will enable models that require the different shape by adding
 `--model_type conv` to the command line when running the mnist sample.
 (while testing a workaround was used see #4762)

* Disable weight caching in dnnl conv operator when using training

  When training we can not use cached weights because the weight
  will be updated each run. This re-enables dnnl Conv and ConvGrad Ops.
  The weight caching was the source of the error from Conv when training.

* Fix issues found when building grad ops on Linux
  * The dnnl_convgrad code was overusing the scope operator,
    causing a compilation problem.
  * The dnnl_maxpoolgrad code had a logic error: it was
    comparing with the source description when it should have
    been comparing with the destination description.

* Update BUILD.md so it shows DNNL for training
  * Updated the table of contents. Since the same providers
    are listed twice, once for Inference and again for Training,
    an HTML anchor was added to distinguish the second header
    from the first for the TOC.

* Fix build failure when not using --enable-training build option

* reorganize the gradient operators so they are grouped together

* Fix issues found when running onnx_backend_test_series.py

* Pooling code only supports 2 outputs when built with --enable-training

* Address code review feedback
  * class member variables end in underscore_
  * use dst instead of dist to match the pattern used elsewhere in DNNL code.

* Remove workaround that was introduced to handle problems running
  convolution based training models. See issue #4762

Signed-off-by: George Nash <george.nash@intel.com>

* Isolate training code and code cleanup

* Do not build with dnnl_gpu_runtime if enable_training is set; the training code
  does not support dnnl_gpu_runtime yet.
* Isolated training code inside ifdefs so that it won't affect
  the project if built without training enabled
* Inadvertent changes in whitespace were removed to make code review simpler
* Undid some code reordering that was not needed
* Comments added to closing #endif statements to simplify reading complex ifdefs
* Modified the GetPrimitiveDesc functions to return shared_ptr instead of raw
  pointer. This matches what was done in Pool code and is safer memory code.

Signed-off-by: George Nash <george.nash@intel.com>

* Address code review issues

- whitespace changes caused by running clang-format on the code
- Several spelling errors fixed
- Removed/changed some ifdefs to improve readability
- other misc. changes in response to code review.

Signed-off-by: George Nash <george.nash@intel.com>

* Code changes to address code review

- Simplify iteration code using `auto` keyword
- remove C style cast that was not needed
- remove instance variable that was not needed [relugrad.h]
- added the execution providers to `ComputeGradientErrorInternal()`
  and `ComputeTheoreticalJacobianTranspose()` instead of using
  a pointer to an instance variable [gradient_checker.h/.cc]

Signed-off-by: George Nash <george.nash@intel.com>

* Combined the default gradient ops test and dnnl gradient ops test for ConvGrad and MaxPoolGrad into one function with the help of a helper function.
This will reduce repeated code.
Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Replaced the stack used by convgrad with a vector so that the vector (used as a stack) can be easily cleared every time the graph is created.
This will prevent memory leak from convolution kernels being pushed constantly onto the stack.
Signed-off-by: chethan.palangotu.keshava@intel.com

* Code clean up and formatting updates

 - Removed empty else statement
 - updated indentation of code that was causing double curly brackets to look unusual
 - Changed check for NumDimensions to Size in Relu and ReluGrad error checking code.
 - isolated training code

Signed-off-by: George Nash <george.nash@intel.com>

* Restore inadvertently removed ConvGrad tests

When combining the DNNL and CPU versions of the ConvGrad
tests, two tests were inadvertently excluded. This adds
back the Conv3d and Conv3d with strides test cases.

Signed-off-by: George Nash <george.nash@intel.com>

* Add validation to ConvGrad

This validates the dimensions of the ConvGrad match the
passed in Convolution forward primitive description.

The current code for DNNL ConvGrad makes the assumption that the ConvGrad
nodes will be visited in the reverse order from the corresponding Conv nodes

The added validation will return an error if this assumption is not true.

Signed-off-by: George Nash <george.nash@intel.com>

* Do not create new execution providers in provider_test_utils

This removes the code that generated new execution providers in the
OpTester::Run function. This was added because the std::move was
leaving the `entry` value empty so subsequent calls would cause a
segfault.

Problem is this potentially changed the execution_provider because it
would create the default provider dropping any custom arguments.

When the now removed code was originally added the std::move was causing
crashes when the GradientChecker unit tests were run.  However, it is no
longer causing problems even with the code removed.

Signed-off-by: George Nash <george.nash@intel.com>

* Change the forward conv stack to a forward conv map

This changes how the forward conv kernel is mapped to the bwd ConvGrad
kernel; the problematic stack is no longer used.

The convolution stack made the assumption that the corresponding
ConvGrad operator would be visited in reverse order of the forward
Conv operators.  This was always problematic and was unlikely to
work for inception models.

Important changes:
- The weight_name is added to the ConvGrad dnnl_node making it
  possible to use the weight_name as a lookup key to find the
  Conv forward Kernel
- the `std::vector fwd_conv_stack_` has been replaced with a
  `std::map fwd_conv_kernel_map_`
- Although they are not needed, lock_guards were added when writing
  to and reading from the fwd_conv_kernel_map_ as well as the
  fwd_kernel_map_. These should always be accessed by a single
  thread when preparing the dnnl subgraphs, so the guard should not
  be needed, but it's added just in case.
- Updated the comments in the ConvGrad.h code to no longer mention the
  stack. The error check is not removed. It will be good to verify
  there are no errors as we continue to test against more models.

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>
2021-01-29 16:05:58 -08:00
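The last sub-change in the Dnnl training entry above (#6045) replaces an order-dependent stack with a lookup keyed by weight name. The sketch below illustrates that idea only. The real code is C++ inside the DNNL execution provider; the class name, method names, weight names, and kernel strings here are hypothetical stand-ins, not the provider's API.

```python
import threading


class ForwardConvRegistry:
    """Hypothetical stand-in for a map from weight name to forward Conv kernel."""

    def __init__(self):
        self._kernels = {}
        # Defensive lock, mirroring the "just in case" guard in the commit text.
        self._lock = threading.Lock()

    def register(self, weight_name, kernel):
        with self._lock:
            self._kernels[weight_name] = kernel

    def lookup(self, weight_name):
        with self._lock:
            if weight_name not in self._kernels:
                raise KeyError(f"no forward Conv kernel for {weight_name!r}")
            return self._kernels[weight_name]


# Forward pass: Conv nodes are visited in graph order.
registry = ForwardConvRegistry()
stack = []
for weight, kernel in [("W1", "conv_a"), ("W2", "conv_b"), ("W3", "conv_c")]:
    registry.register(weight, kernel)
    stack.append(kernel)

# Backward pass: ConvGrad nodes arrive in an order that is not the exact reverse.
grad_order = ["W3", "W1", "W2"]
from_map = [registry.lookup(w) for w in grad_order]
from_stack = [stack.pop() for _ in grad_order]

print(from_map)    # keyed lookup pairs each ConvGrad with its own Conv
print(from_stack)  # stack pairing is wrong for W1 and W2
```

With a keyed lookup the pairing no longer depends on visit order, which is the property the commit message says the stack lacked for models such as inception.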
Ori Levari
3a30ad7b57
handle hr error conditions (#6449)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-29 15:01:08 -08:00
Ori Levari
531eb064ab
fix sdl bugs for uninitialized variables and returns (#6450)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-29 15:00:44 -08:00
Ori Levari
76f5d9edc6
add explicit barriers for buffer overread and overwrite (#6484)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-29 14:59:56 -08:00
Weixing Zhang
7f5731741d
Optimize GatherGrad for AMD GPU (#6381)
* optimize gathergrad

* address comments

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-01-29 13:50:08 -08:00
Suffian Khan
76bc0e479c
Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504)
* Permit dense seq optimization on BERT-L pytorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD

* enable Equal tests

* enable fast_matrix_reduction test case
2021-01-29 13:12:34 -08:00
Scott McKay
8c6d76a4c0
Update to match new test setup. (#6496)
* Update to match new test setup.

* Add Gemm(7) manually for now.
Will fix properly on Monday. It's used by mnist.ort as that is created by optimizing mnist.onnx to level 1 causing 2 nodes to be replaced by a Gemm and the op to be missing from the required list as that is created using the original onnx model.
2021-01-30 06:27:19 +10:00
Tianlei Wu
8306150e0e
Refine transformers profiler output (#6502)
* output nodes in the original order; grouped by node name
* add document for profiler
2021-01-29 12:14:50 -08:00
Guoyu Wang
06a6c63434
[CoreML EP] Add support for some activations/Transpose, move some shared helpers from NNAPI to shared space (#6498)
* Init change

* Move some helper from nnapi ep to shared

* Add transpose support

* Fix trt ci build break
2021-01-29 11:51:40 -08:00
RandySheriffH
a19c48f5cb
Fuse cuda conv with activation (#6351)
* optimize cuda conv by fused activation

* remove needless print out

* exclude test from cpu

* handle status error from cudnn 8.x

* add reference to base class

* add hipify
2021-01-29 10:58:10 -08:00
liqunfu
71389ff9ab
nuphar test to avoid test data download to improve passing rate (#6467)
nuphar test to avoid test data download to improve passing rate
2021-01-29 10:16:42 -08:00
Tianlei Wu
d3203adc26
Update document of transformer optimization (#6487) 2021-01-29 05:47:01 -08:00
Tim Harris
066520f6c1
Improve work distribution for Expand operator, and sharded LoopCounter configuration (#6454)
Description: This PR makes two changes identified while looking at a PGAN model.

First, it uses ThreadPool::TryParallelFor for the main parallel loops in the Expand operator. This lets the thread pool decide on the granularity at which to distribute work (unlike TrySimpleParallelFor). Profiling showed high costs when running "simple" loops with 4M iterations each of which copied only 4 bytes.

Second, it updates the sharded loop counter in the thread pool so that the number of shards is capped by the number of threads. This helps make the performance of any other high-contention "simple" loops more robust at low thread counts by letting each thread work on its own "home" shard for longer.

Motivation and Context

Profiling showed a PGAN model taking 2x+ longer with the non-OpenMP build. The root cause was that the OpenMP build uses simple static scheduling of loop iterations, while the non-OpenMP build uses dynamic scheduling. The combination of large numbers of tiny iterations is less significant with static scheduling --- although still desirable to avoid, given that each iteration incurs a std::function invocation.
2021-01-29 11:19:18 +00:00
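The second change described in the entry above (#6454) caps the number of loop-counter shards by the number of threads. A rough Python sketch of that capping rule follows. It is not the onnxruntime thread pool: the function names and the `max_shards` default are assumptions made for illustration, and the real implementation is C++ with atomic counters.

```python
def make_shards(total_iterations, num_threads, max_shards=8):
    """Split [0, total_iterations) into contiguous shards, at most one per thread.

    max_shards is a hypothetical upper bound used only in this sketch.
    """
    if total_iterations < 0 or num_threads < 1:
        raise ValueError("need total_iterations >= 0 and num_threads >= 1")
    # The cap: never create more shards than there are threads to own them.
    num_shards = max(1, min(max_shards, num_threads))
    base, extra = divmod(total_iterations, num_shards)
    shards, start = [], 0
    for index in range(num_shards):
        size = base + (1 if index < extra else 0)
        shards.append((start, start + size))
        start += size
    return shards


def home_shard(thread_index, shards):
    """Each thread starts on its own 'home' shard before looking elsewhere."""
    return thread_index % len(shards)


# With 2 threads, only 2 shards are created, so each thread keeps a large
# private range instead of contending on many small ones.
print(make_shards(10, 2))
print(len(make_shards(100, 16)))
```

Fewer, larger shards at low thread counts mean each thread stays on its home shard for longer, which is the robustness benefit the commit message describes.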
Zhang Lei
7abb5b667f
Support pad operator in quantization and quantized nhwc transformer. Fix Pad operator bug. (#6325)
Support pad operator in quantization tool.
Support pad operator in quantized nhwc transformer.
Fix pad() operator bug: when the pad input's innermost (rightmost) axis value is zero in Edge and Reflect mode, it copied the wrong value to the cells to be padded. Note that Constant mode does not trigger this bug, as Edge/Reflect need to copy values from the already copied array while Constant mode only fills a specified value.
Add more test cases to cover pad() operator bug fixed here.
Fix quantization tool's uint8/int8 value overflow issue when quantizing weights in Python.
2021-01-29 00:00:14 -08:00
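The Pad fix in the entry above (#6325) turns on the difference between Constant mode, which only writes a fill value, and Edge/Reflect modes, which copy from data already in place. The one-dimensional Python sketch below shows the three modes. It is an illustration under simplifying assumptions (a single axis, a hypothetical `pad_1d` helper), not the onnxruntime kernel, and it does not reproduce the bug itself.

```python
def pad_1d(values, before, after, mode="constant", fill=0):
    """Pad a 1-D sequence; pad_1d is a hypothetical helper for illustration."""
    values = list(values)
    n = len(values)
    if mode == "constant":
        # Constant mode never reads the input to produce padding cells.
        return [fill] * before + values + [fill] * after
    if n == 0:
        raise ValueError("edge/reflect padding needs at least one input value")
    if mode == "edge":
        # Edge mode repeats the first and last input values.
        return [values[0]] * before + values + [values[-1]] * after
    if mode == "reflect":
        # Reflect mode mirrors around the border without repeating it.
        if before >= n or after >= n:
            raise ValueError("reflect padding must be smaller than the input")
        left = [values[i] for i in range(before, 0, -1)]
        right = [values[n - 1 - i] for i in range(1, after + 1)]
        return left + values + right
    raise ValueError(f"unknown mode: {mode}")


print(pad_1d([1, 2, 3], 2, 2, "constant"))  # [0, 0, 1, 2, 3, 0, 0]
print(pad_1d([1, 2, 3], 2, 2, "edge"))      # [1, 1, 1, 2, 3, 3, 3]
print(pad_1d([1, 2, 3], 2, 2, "reflect"))   # [3, 2, 1, 2, 3, 2, 1]
```

Because Edge and Reflect read back from the input, an off-by-one in the source position shows up as wrong padded values, while Constant mode is unaffected.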
suryasidd
1a5b75a554
[OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)
* Removed OpenVINO 2020.2 support

* Updated documentation and build.py

* Removed unnecessary libraries from setup.py
2021-01-28 23:00:41 -08:00
Ori Levari
3b1227c5ce
SDL annotation fixes (#6448)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-28 22:34:10 -08:00
Ori Levari
21b4842c34
SDL fixes: add proper casts/format specifiers (#6446) 2021-01-28 22:33:04 -08:00
Guoyu Wang
d4e1f5ab78
Fix of support api version bug for [de]quantize (#6492) 2021-01-28 20:12:21 -08:00
Xiang Zhang
ce46f37ff2
expose learningmodelpixelrange property (#5877) 2021-01-28 15:29:55 -08:00
Guoyu Wang
3f60b27703
Speed up the Mac CI runs (#6483) 2021-01-28 15:13:44 -08:00
Sheil Kumar
ea2b560055
Fix test breaks in Windows ingestion pipeline (#6476)
* fix various build breaks with Windows build

* fix runtime errors loading libraries from system32

* add build_inbox check to winml_test_common

* use raw string

* cleanup

* fix dll load

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-01-28 14:37:15 -08:00
liqunfu
00afd00059
merge e2e with distributed pipeline (#6443)
merge e2e with distributed pipeline
2021-01-28 14:17:47 -08:00
Scott McKay
c84bb9df9f
Add ability to track per operator types in reduced build config. (#6428)
* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
  - Add python bindings for ORT format models
    - Add script to update bindings and help info
  - Add parsing of ORT format models
  - Add ability to enable type reduction to config generation
  - Update build.py to only allow operator/type reduction via config
    - simpler to require config to be generated first
    - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
  - Add script to create reduced build config
  - Update CIs
2021-01-29 07:59:51 +10:00
Guoyu Wang
752627c5bb
[CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481)
* Add macos coreml CI and coreml_flags

* Move save debugging model to use environment var

* Move pipeline off from macos CI template

* Fix an issue building using unix make, add parallel to build script

* Fixed build break for shared_lib and compile warning

* Fix a compile warning

* test

* Revert the accidental push from another branch

This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877.
2021-01-28 12:25:46 -08:00
baijumeswani
2e228d74d0
Increase the distributed tests pipeline timeout to 120 minutes (#6479) 2021-01-28 12:04:26 -08:00
Adam Pocock
77d0eb3f56
Fixing a leak in OnnxSequences with String keys or values. (#6473) 2021-01-28 11:28:56 -08:00
Edward Chen
d850fa63bf
Op kernel type reduction infrastructure. (#6466)
Add infrastructure to support type reduction in Op kernel implementations.
Update Cast and IsInf CPU kernels to use it.
2021-01-28 07:27:19 -08:00
Changming Sun
91b19b8364
Delete nuget extra configs (#6477) 2021-01-27 20:25:45 -08:00