7863 commits

Author SHA1 Message Date
Ori Levari
531eb064ab
fix sdl bugs for uninitialized variables and returns (#6450)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-29 15:00:44 -08:00
Ori Levari
76f5d9edc6
add explicit barriers for buffer overread and overwrite (#6484)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-29 14:59:56 -08:00
Weixing Zhang
7f5731741d
Optimize GatherGrad for AMD GPU (#6381)
* optimize gathergrad

* address comments

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-01-29 13:50:08 -08:00
Suffian Khan
76bc0e479c
Enable dense sequence optimized version of PyTorch exported BERT-L on AMD GPU (#6504)
* Permit dense seq optimization on BERT-L PyTorch export by enabling ReduceSumTraining, Equal, and NonZero on AMD

* enable Equal tests

* enable fast_matrix_reduction test case
2021-01-29 13:12:34 -08:00
Scott McKay
8c6d76a4c0
Update to match new test setup. (#6496)
* Update to match new test setup.

* Add Gemm(7) manually for now.
Will fix properly on Monday. Gemm is used by mnist.ort, which is created by optimizing mnist.onnx to level 1. That optimization replaces 2 nodes with a Gemm, and the op is missing from the required list because the list is created from the original ONNX model.
2021-01-30 06:27:19 +10:00
Tianlei Wu
8306150e0e
Refine transformers profiler output (#6502)
* output nodes in the original order; grouped by node name
* add document for profiler
2021-01-29 12:14:50 -08:00
Guoyu Wang
06a6c63434
[CoreML EP] Add support for some activations/Transpose, move some shared helpers from NNAPI to shared space (#6498)
* Init change

* Move some helpers from nnapi ep to shared

* Add transpose support

* Fix trt ci build break
2021-01-29 11:51:40 -08:00
RandySheriffH
a19c48f5cb
Fuse cuda conv with activation (#6351)
* optimize cuda conv by fused activation

* remove needless print out

* exclude test from cpu

* handle status error from cudnn 8.x

* add reference to base class

* add hipify
2021-01-29 10:58:10 -08:00
liqunfu
71389ff9ab
nuphar test to avoid test data download to improve passing rate (#6467)
2021-01-29 10:16:42 -08:00
Tianlei Wu
d3203adc26
Update document of transformer optimization (#6487) 2021-01-29 05:47:01 -08:00
Tim Harris
066520f6c1
Improve work distribution for Expand operator, and sharded LoopCounter configuration (#6454)
Description: This PR makes two changes identified while looking at a PGAN model.

First, it uses ThreadPool::TryParallelFor for the main parallel loops in the Expand operator. This lets the thread pool decide on the granularity at which to distribute work (unlike TrySimpleParallelFor). Profiling showed high costs when running "simple" loops with 4M iterations each of which copied only 4 bytes.

Second, it updates the sharded loop counter in the thread pool so that the number of shards is capped by the number of threads. This helps make the performance of any other high-contention "simple" loops more robust at low thread counts by letting each thread work on its own "home" shard for longer.

Motivation and Context

Profiling showed a PGAN model taking 2x+ longer with the non-OpenMP build. The root cause was that the OpenMP build uses simple static scheduling of loop iterations, while the non-OpenMP build uses dynamic scheduling. The combination of large numbers of tiny iterations is less significant with static scheduling --- although still desirable to avoid, given that each iteration incurs a std::function invocation.
2021-01-29 11:19:18 +00:00
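The sharded loop counter described in #6454 can be illustrated with a small sketch. This is not ONNX Runtime's implementation: the class name, the `max_shards` default of 8, and the use of per-shard locks in place of atomic counters are all assumptions made for illustration. It shows only the two ideas the message states: the shard count is capped by the thread count, and each thread drains its own "home" shard before moving on to others.

```python
import threading


class ShardedLoopCounter:
    """Hypothetical sketch of a sharded loop counter (not ORT's actual code)."""

    def __init__(self, num_iterations, num_threads, max_shards=8):
        # Cap the number of shards by the number of threads (max_shards is assumed).
        self.num_shards = max(1, min(max_shards, num_threads))
        base, extra = divmod(num_iterations, self.num_shards)
        self._next, self._end, self._locks = [], [], []
        start = 0
        for s in range(self.num_shards):
            size = base + (1 if s < extra else 0)
            self._next.append(start)
            self._end.append(start + size)
            self._locks.append(threading.Lock())
            start += size

    def claim(self, thread_index):
        """Return the next unclaimed iteration, or None when every shard is empty."""
        home = thread_index % self.num_shards
        # Try the home shard first, then the remaining shards in order.
        for offset in range(self.num_shards):
            s = (home + offset) % self.num_shards
            with self._locks[s]:
                if self._next[s] < self._end[s]:
                    i = self._next[s]
                    self._next[s] += 1
                    return i
        return None


def run_parallel(num_iterations, num_threads, max_shards=8):
    """Run a trivial loop body on num_threads threads; return (num_shards, hits)."""
    counter = ShardedLoopCounter(num_iterations, num_threads, max_shards)
    hits = [0] * num_iterations

    def worker(t):
        while True:
            i = counter.claim(t)
            if i is None:
                return
            hits[i] += 1  # safe: each index is claimed by exactly one thread

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return counter.num_shards, hits
```

With two threads the sketch creates two shards, so each thread works mostly on its own range and contention only appears once a thread has exhausted its home shard.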
Zhang Lei
7abb5b667f
Support pad operator in quantization and quantized nhwc transformer. Fix Pad operator bug. (#6325)
Support pad operator in quantization tool.
Support pad operator in quantized nhwc transformer.
Fix pad() operator bug: when the pad input's innermost (rightmost) axis value is zero in Edge and Reflect modes, it copied the wrong value to the cells to be padded. Constant mode does not trigger this bug, because Edge/Reflect need to copy values from the already copied array while Constant mode only fills in the specified value.
Add more test cases to cover pad() operator bug fixed here.
Fix quantization tool's uint8/int8 value overflow issue when quantizing weights in Python.
2021-01-29 00:00:14 -08:00
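The message for #6325 does not say how the uint8/int8 overflow was fixed. One common guard, shown here only as an assumption, is to saturate the quantized value to the target range before it is stored in an 8-bit type. The function name and signature below are hypothetical and are not taken from the quantization tool.

```python
def quantize_linear(values, scale, zero_point, signed=False):
    """Quantize floats to int8/uint8 range with saturation (illustrative sketch).

    Without the clamp, a value such as 300 would wrap around when stored
    in an 8-bit integer instead of saturating at the range limit.
    """
    qmin, qmax = (-128, 127) if signed else (0, 255)
    out = []
    for v in values:
        # Python's round() rounds halves to the nearest even integer.
        q = round(v / scale) + zero_point
        out.append(max(qmin, min(qmax, q)))
    return out
```

For example, `quantize_linear([0.0, 1.0, 300.0], 1.0, 0)` clamps the last value to 255 rather than letting it wrap.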
suryasidd
1a5b75a554
[OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493)
* Removed OpenVINO 2020.2 support

* Updated documentation and build.py

* Removed unnecessary libraries from setup.py
2021-01-28 23:00:41 -08:00
Ori Levari
3b1227c5ce
SDL annotation fixes (#6448)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-28 22:34:10 -08:00
Ori Levari
21b4842c34
SDL fixes: add proper casts/format specifiers (#6446) 2021-01-28 22:33:04 -08:00
Guoyu Wang
d4e1f5ab78
Fix of support api version bug for [de]quantize (#6492) 2021-01-28 20:12:21 -08:00
baijumeswani
785e51d22f
Export the model with torch.no_grad() context (#6472) 2021-01-28 18:18:26 -08:00
Xiang Zhang
ce46f37ff2
expose learningmodelpixelrange property (#5877) 2021-01-28 15:29:55 -08:00
baijumeswani
93aa72e468
Support inputs to ORTModule forward method that require gradient (#6420) 2021-01-28 15:24:26 -08:00
Guoyu Wang
3f60b27703
Speed up the Mac CI runs (#6483) 2021-01-28 15:13:44 -08:00
Sheil Kumar
ea2b560055
Fix test breaks in Windows ingestion pipeline (#6476)
* fix various build breaks with Windows build

* fix runtime errors loading libraries from system32

* add build_inbox check to winml_test_common

* use raw string

* cleanup

* fix dll load

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-01-28 14:37:15 -08:00
liqunfu
00afd00059
merge e2e with distributed pipeline (#6443)
2021-01-28 14:17:47 -08:00
Scott McKay
c84bb9df9f
Add ability to track per operator types in reduced build config. (#6428)
* Add ability to generate configuration that includes required types for individual operators, to allow build size reduction based on that.
  - Add python bindings for ORT format models
    - Add script to update bindings and help info
  - Add parsing of ORT format models
  - Add ability to enable type reduction to config generation
  - Update build.py to only allow operator/type reduction via config
    - simpler to require config to be generated first
    - can't mix a type aware (ORT format model only) and non-type aware config as that may result in insufficient types being enabled
  - Add script to create reduced build config
  - Update CIs
2021-01-29 07:59:51 +10:00
Guoyu Wang
752627c5bb
[CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481)
* Add macos coreml CI and coreml_flags

* Move save debugging model to use environment var

* Move pipeline off from macos CI template

* Fix an issue building using unix make, add parallel to build script

* Fixed build break for shared_lib and compile warning

* Fix a compile warning

* test

* Revert the accidental push from another branch

This reverts commit 472029ba25d50f9508474c9eeceb3454cead7877.
2021-01-28 12:25:46 -08:00
baijumeswani
2e228d74d0
Increase the distributed tests pipeline timeout to 120 minutes (#6479) 2021-01-28 12:04:26 -08:00
Adam Pocock
77d0eb3f56
Fixing a leak in OnnxSequences with String keys or values. (#6473) 2021-01-28 11:28:56 -08:00
Thiago Crepaldi
237b275bd8
Enable device change during training + minor forward() refactoring (#6417) 2021-01-28 10:42:42 -08:00
Edward Chen
d850fa63bf
Op kernel type reduction infrastructure. (#6466)
Add infrastructure to support type reduction in Op kernel implementations.
Update Cast and IsInf CPU kernels to use it.
2021-01-28 07:27:19 -08:00
Changming Sun
91b19b8364
Delete nuget extra configs (#6477) 2021-01-27 20:25:45 -08:00
Faith Xu
7a0ab9c450
Update pypi package metadata (#6354)
* Update setup file data

* add missing comma

* remove python 3.5

* fix typo bracket
2021-01-27 19:27:37 -08:00
Ori Levari
b6ac35fed3
use tickcount64 (#6447)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-27 15:52:35 -08:00
Yulong Wang
ed1ebd2e21
fix SDL rule (#6464) 2021-01-27 15:32:45 -08:00
Yulong Wang
1ce1a51d46
fix SDL native rule warning #6246 (#6461) 2021-01-27 15:32:34 -08:00
Adam Pocock
0100f336d7
[java] Adds support for OrtEnvironment thread pools (#6406)
* Updates for Gradle 7.

* Adding support for OrtThreadingOptions into the Java API.

* Fixing a typo in the JNI code.

* Adding a test for the environment's thread pool.

* Fix cuda test, add comment to failure.

* Updating build.gradle
2021-01-27 13:25:22 -08:00
Yufeng Li
f68eb35aed
dequantize 1st input of lstm back if it is quantized (#6444) 2021-01-27 13:13:57 -08:00
Sheil Kumar
d5f51c4033
Bug 31463811: Servicing: Redist (Nuget) conflicts with Microsoft.AI.MachineLearning starting 21H1+ (#6460)
* update load library code to have the fully qualified path

* make it work for syswow32

* git Revert "make it work for syswow32"

This reverts commit b9f594341b7cf07241b18d0c376af905edcabae3.

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-01-27 12:25:03 -08:00
Guoyu Wang
c05adb1147
Initial version of CoreML EP (#6392) 2021-01-27 10:43:17 -08:00
pengwa
fd43806252
fix max norm clipping test in python packaging pipeline test (#6468)
* fix python packaging pipeline

* make clip norm test compatible with both V100 and M60 GPUs
2021-01-28 01:09:12 +08:00
Hector Li
b5d1a49b30
Share allocator between CUDA EP & TRT EP. (#6332)
* Share allocator between CUDA EP & TRT EP.
limitation:
1. Does not cover the per-thread allocator created by CUDA EP, still need to figure out the way to remove it
2. Need to have more identifiers to make it able to share CPU allocator across all EPs
2021-01-27 00:14:43 -08:00
Ryota Tomioka
9835b46a1d
Add an option to save the training graph after optimization (#6410)
* expose optimized_model_filepath in SessionOptions as `debug.graph_save_paths.model_with_training_graph_after_optimization_path` in `ORTTrainerOptions`
2021-01-27 07:39:46 +00:00
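The dotted option name in #6410 suggests a nested options structure. The sketch below assumes that `ORTTrainerOptions` is built from a nested dictionary whose keys mirror that dotted name; the commit message does not confirm this. The output file name and the `get_option` helper are hypothetical. Only the option name itself comes from the message.

```python
# Hypothetical output path; only the nested option name comes from the commit message.
trainer_options = {
    "debug": {
        "graph_save_paths": {
            "model_with_training_graph_after_optimization_path": "optimized_training_graph.onnx",
        }
    }
}


def get_option(options, dotted_name):
    """Look up a dotted option name such as 'a.b.c' in a nested dict."""
    node = options
    for key in dotted_name.split("."):
        node = node[key]
    return node
```

Under that assumption, the dictionary would be passed when constructing the trainer options, and the optimized training graph would be written to the given path.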
Sheil Kumar
0d20104a72
only build experimental api in redist (#6465)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-01-26 22:56:30 -08:00
Tianlei Wu
afd7b8b3f7
add tool for generating test data for longformer (#6415) 2021-01-26 16:34:29 -08:00
stevenlix
76dbd88526
Expose graph ModelPath to TensorRT shared library (#6353)
* Update graph_viewer.cc

* Update tensorrt_execution_provider.cc

* Update graph_viewer.h

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update provider_api.h

* Update provider_bridge_ort.cc

* Update provider_interfaces.h

* Update provider_interfaces.h

* expose GraphViewer ModelPath API to TRT shared lib

* add modelpath to compile

* update

* add model_path to onnx tensorrt parser

* use GenerateMetaDefId to generate unique TRT kernel name

* use GenerateMetaDefId to generate unique TRT engine name

* fix issue

* Update tensorrt_execution_provider.cc

* remove GetVecHash

* Update tensorrt_execution_provider.h

* convert wchar_t to char for tensorrt parser

* update tensorrt parser to include latest changes

* fix issues

* Update tensorrt_execution_provider.cc

* merge trt parser latest change

* add PROVIDER_DISALLOW_ALL(Path)
2021-01-26 10:41:31 -08:00
Yufeng Li
7e42840298
fix null dereference warning (#6437) 2021-01-25 16:50:32 -08:00
M. Zeeshan Siddiqui
f3a0344f9a
Farewell TrainableDropout (#5793)
* Deprecate TrainableDropout kernel.

* Update bert_toy_postprocessed.onnx to opset 12.

* Add more dropout tests.

* Fix BiasDropout kernel.

Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
2021-01-25 16:37:42 -08:00
liqunfu
6ed12402a4
Liqun/liqun/enable pipeline parallel test2 (#6399)
* enable data and pipeline parallelism test

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-25 15:15:26 -08:00
Hariharan Seshadri
24f1bd6156
Minor cmake change (#6431) 2021-01-25 11:46:12 -08:00
Yufeng Li
c20965f9b2
enable pipeline to run quantization tests (#6416)
* enable pipeline to run quantization tests
setup test pipeline for quantization
2021-01-25 09:33:08 -08:00
Scott McKay
e1dc268e45
Add support for custom ops to minimal build. (#6228)
* Add support for custom ops to minimal build.
Cost is only ~8KB so including in base minimal build.
2021-01-25 10:41:00 +10:00
Ori Levari
6507b4f818
Reintroduce experimental api changes and fix remote build break (#6385)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-22 15:15:53 -08:00