4094 commits

Author SHA1 Message Date
Edward Chen
d850fa63bf
Op kernel type reduction infrastructure. (#6466)
Add infrastructure to support type reduction in Op kernel implementations.
Update Cast and IsInf CPU kernels to use it.
2021-01-28 07:27:19 -08:00
Changming Sun
91b19b8364
Delete nuget extra configs (#6477) 2021-01-27 20:25:45 -08:00
Faith Xu
7a0ab9c450
Update pypi package metadata (#6354)
* Update setup file data

* add missing comma

* remove python 3.5

* fix typo bracket
2021-01-27 19:27:37 -08:00
Ori Levari
b6ac35fed3
use tickcount64 (#6447)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-27 15:52:35 -08:00
Yulong Wang
ed1ebd2e21
fix SDL rule (#6464) 2021-01-27 15:32:45 -08:00
Yulong Wang
1ce1a51d46
fix SDL native rule warning #6246 (#6461) 2021-01-27 15:32:34 -08:00
Adam Pocock
0100f336d7
[java] Adds support for OrtEnvironment thread pools (#6406)
* Updates for Gradle 7.

* Adding support for OrtThreadingOptions into the Java API.

* Fixing a typo in the JNI code.

* Adding a test for the environment's thread pool.

* Fix cuda test, add comment to failure.

* Updating build.gradle
2021-01-27 13:25:22 -08:00
Yufeng Li
f68eb35aed
dequantize 1st input of lstm back if it is quantized (#6444) 2021-01-27 13:13:57 -08:00
Sheil Kumar
d5f51c4033
Bug 31463811: Servicing: Redist (Nuget) conflicts with Microsoft.AI.MachineLearning starting 21H1+ (#6460)
* update load library code to have the fully qualified path

* make it work for syswow32

* git Revert "make it work for syswow32"

This reverts commit b9f594341b7cf07241b18d0c376af905edcabae3.

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-01-27 12:25:03 -08:00
Guoyu Wang
c05adb1147
Initial version of CoreML EP (#6392) 2021-01-27 10:43:17 -08:00
pengwa
fd43806252
fix max norm clipping test in python packaging pipeline test (#6468)
* fix python packaging pipeline

* make clip norm test compatible with both V100 and M60 GPUs
2021-01-28 01:09:12 +08:00
Hector Li
b5d1a49b30
Share allocator between CUDA EP & TRT EP. (#6332)
* Share allocator between CUDA EP & TRT EP.
limitation:
1. Does not cover the per-thread allocator created by CUDA EP, still need to figure out the way to remove it
2. Need to have more identifiers to make it able to share CPU allocator across all EPs
2021-01-27 00:14:43 -08:00
Ryota Tomioka
9835b46a1d
Add an option to save the training graph after optimization (#6410)
* expose optimized_model_filepath in SessionOptions as `debug.graph_save_paths.model_with_training_graph_after_optimization_path` in `ORTTrainerOptions`
2021-01-27 07:39:46 +00:00
Sheil Kumar
0d20104a72
only build experimental api in redist (#6465)
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-01-26 22:56:30 -08:00
Tianlei Wu
afd7b8b3f7
add tool for generating test data for longformer (#6415) 2021-01-26 16:34:29 -08:00
stevenlix
76dbd88526
Expose graph ModelPath to TensorRT shared library (#6353)
* Update graph_viewer.cc

* Update tensorrt_execution_provider.cc

* Update graph_viewer.h

* Update tensorrt_execution_provider.cc

* Update tensorrt_execution_provider.cc

* Update provider_api.h

* Update provider_bridge_ort.cc

* Update provider_interfaces.h

* Update provider_interfaces.h

* expose GraphViewer ModelPath API to TRT shared lib

* add modelpath to compile

* update

* add model_path to onnx tensorrt parser

* use GenerateMetaDefId to generate unique TRT kernel name

* use GenerateMetaDefId to generate unique TRT engine name

* fix issue

* Update tensorrt_execution_provider.cc

* remove GetVecHash

* Update tensorrt_execution_provider.h

* convert wchar_t to char for tensorrt parser

* update tensorrt parser to include latest changes

* fix issues

* Update tensorrt_execution_provider.cc

* merge trt parser latest change

* add PROVIDER_DISALLOW_ALL(Path)
2021-01-26 10:41:31 -08:00
Yufeng Li
7e42840298
fix null dereference warning (#6437) 2021-01-25 16:50:32 -08:00
M. Zeeshan Siddiqui
f3a0344f9a
Farewell TrainableDropout (#5793)
* Deprecate TrainableDropout kernel.

* Update bert_toy_postprocessed.onnx to opset 12.

* Add more dropout tests.

* Fix BiasDropout kernel.

Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Sergii Dymchenko <sedymche@microsoft.com>
2021-01-25 16:37:42 -08:00
liqunfu
6ed12402a4
Liqun/liqun/enable pipeline parallel test2 (#6399)
* enable data and pipeline parallelism test

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-01-25 15:15:26 -08:00
Hariharan Seshadri
24f1bd6156
Minor cmake change (#6431) 2021-01-25 11:46:12 -08:00
Yufeng Li
c20965f9b2
enable pipeline to run quantization tests (#6416)
* enable pipeline to run quantization tests
setup test pipeline for quantization
2021-01-25 09:33:08 -08:00
Scott McKay
e1dc268e45
Add support for custom ops to minimal build. (#6228)
* Add support for custom ops to minimal build.
Cost is only ~8KB so including in base minimal build.
2021-01-25 10:41:00 +10:00
Ori Levari
6507b4f818
Reintroduce experimental api changes and fix remote build break (#6385)
Co-authored-by: Ori Levari <orlevari@microsoft.com>
2021-01-22 15:15:53 -08:00
M. Zeeshan Siddiqui
3c3d36334b
Continue memory planning when unknown shape tensor is encountered. (#6413) 2021-01-22 14:37:53 -08:00
Edward Chen
61ecf52c24
Fix generate_submodule_cgmanifest.py Windows issues. (#6404) 2021-01-22 12:25:55 -08:00
ashbhandare
60c772e2bc
Megatron checkpointing (#6293)
* Add bart fairseq run script

* Add frontend change to enable megatron

* Initial changes for checkpointing

* Megatron optim state loading, checkpoint aggregation, frontend distributed tests for H, D+H

* Add load_checkpoint changes

* Fix CI

* Cleanup

* Fix CI

* review comments

* review comments

* review comments:
2021-01-22 11:26:47 -08:00
S. Manohar Karlapalem
4442d94c6c
OpenVino docker file changes to bypass privileged mode
Description: Builds and installs libusb without UDEV support, which is used for communicating with the VPU device.

Motivation and Context

This enables the resulting docker container to be run without '--privileged' and '--network host' options which may not be suitable in deployment environments.
2021-01-22 09:43:47 -08:00
Changming Sun
bba185a582
Fix some compile warnings (#6316) 2021-01-21 16:40:42 -08:00
Martin Man
98cc7b5a9e
Load the model path correctly (#6369) 2021-01-21 09:23:50 -08:00
Vincent Wang
99a38f4023
fix build on cuda11 (#6394)
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
2021-01-21 16:37:07 +08:00
Guoyu Wang
eb946c4177
Unblock Android CI code coverage failure (#6393) 2021-01-20 21:26:10 -08:00
Hariharan Seshadri
8574854d23
[Perf] Optimize Tile CPU and CUDA kernels for a corner case (#6376) 2021-01-20 17:52:32 -08:00
Hariharan Seshadri
d9e4795385
Fix Windows x86 compiler warnings in the optimizers project (#6377) 2021-01-20 17:50:16 -08:00
Hariharan Seshadri
33f60a06d5
Don't use default string marshalling in C# (#6219) 2021-01-20 17:44:36 -08:00
Wenbing Li
69af0440b1
Add the custom op project information (#6334) 2021-01-20 15:23:24 -08:00
pengwa
453431f7bb
Add max_norm for gradient clipping. (#6289)
* add max_norm as user option for gradient clipping

* add adam and lamb test cases for clip norm

* add frontend tests
2021-01-21 01:01:11 +08:00
Hariharan Seshadri
a1b5bfc4f8
Fix SDL warning (#6390) 2021-01-20 08:35:42 -08:00
Hariharan Seshadri
d7bdd96425
Refine auto_pad based pad computation in ConvTranspose (#6305) 2021-01-19 19:01:49 -08:00
Ye Wang
ac36596fb8
fix convert_common version retrieval (#6382) 2021-01-19 13:56:34 -08:00
Tianlei Wu
baac7c91e2
Support MLFloat16 in CumSum Cuda op for Opset 14 (#6355)
* Add CumSum-14 for Cuda
2021-01-19 09:55:44 -08:00
wezuo
5b6753ce27
Wezuo/memory analysis (#5658)
* merged alloc_plan

* pass compilation

* Start running, incorrect allocation memory info

* add in comments

* fix a bug of recording pattern too early.

* debugging lifetime

* fix lifetime

* passed mnist

* in process of visualization

* Add code to generate chrome trace for allocations.

* in process of collecting fragmentation

* before rebuild

* passed mnist

* passed bert tiny

* fix the inplace reuse

* fix the exception of weight in pinned memory

* add guards to ensure the tensor is in AllocPlan

* add customized profiling

* debugging

* debugging

* fix the reuse of different location type

* add rank

* add the rank

* add fragmentation

* add time_step_trace

* Add summary for each execution step (total bytes, used/free bytes).

* add top k

* change type of top k parameter

* remove prints

* change heap to set

* add the name pattern

* add the usage for pattern

* add partition

* change to static class

* add custom group

* remove const

* update memory_info

* in process of adding it as runtime config

* change the memory profiling to be an argument

* add some comments

* add checks to record memory_info in training session

* set the "local rank setting" to correct argument.

* addressing comments

* format adjustment

* formatting

* remove alloc_interval

* update memory_info.cc to skip session when there is no tensor for a particular memory type

* fix memory_info multiple iteration seg-fault

* consolidate mainz changes

* fixed some minor errors

* guard by ORT_MINIMAL_BUILD

* add ORT_MEMORY_PROFILE flag

* added compiler flag to turn on/off memory profiling related code

* clean up the code regarding comments

* add comments

* revoke the onnx version

* clean up the code to match master

* clean up the code to match master

* clean up the code to match master

Co-authored-by: Jesse Benson <benson.jesse@gmail.com>
Co-authored-by: Wei Zuo <wezuo@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-mgtbby.eastus.cloudapp.azure.com>
Co-authored-by: wezuo <wezuo@az-eus-v100-32gb-5-worker-yclzsf.eastus.cloudapp.azure.com>
2021-01-19 08:30:55 -08:00
Ryan Lai
4db4982a5e
This added telemetry isn't needed (#6363) 2021-01-15 16:36:59 -08:00
stevenlix
eab164e1a5
Add python example of TensorRT INT8 inference on ResNet model (#6255)
* add trt int8 example on resnet model

* Update e2e_tensorrt_resnet_example.py

* remove keras dependency and update class names

* move ImageNetDataReader and ImageClassificationEvaluator to tensorrt resnet example

* simplify e2e_tensorrt_resnet_example.py

* Update preprocessing.py

* merge tensorrt_calibrate

* Update calibrate.py

* Update calibrate.py

* generalize calibrate

* Update calibrate.py

* fix issues

* fix formatting

* remove augment_all
2021-01-15 09:59:56 -08:00
Ashwini Khade
f5a4f7fc2a
fix -Wdangling-gsl (#6357) 2021-01-15 09:30:41 -08:00
Pranav Sharma
c8e37e3a36
Fix one more SDL warning (#6359) 2021-01-15 09:22:41 -08:00
Ryan Lai
961bb62ae4
Add create session to WinML telemetry to track WinML Usage (#6356) 2021-01-14 22:42:55 -08:00
Wei-Sheng Chin
8ce252caa9
Pipeline Parallel Experimental Python API (#5815) 2021-01-15 12:07:28 +08:00
Dmitri Smirnov
6d0fb3ebb3
Java: Set C language warnings to W4 and adjust JNI code (#6347)
Set /W3 for C language and fix up JNI warnings.
2021-01-14 15:04:47 -08:00
Scott McKay
e54e2f969d
Use readelf for minimal build binary size checks. (#6338)
* Use readelf for minimal build binary size checks.
The on-disk size grows in 4KB chunks which makes it hard to see how much growth an individual checkin causes.
Only downside is that the sum of the sections is larger than the on-disk size (presumably things get packed smaller on disk and some of the section alignment constraints can be ignored).

* Remove unused function
2021-01-15 07:46:02 +10:00
Ye Wang
5d9552cc8b
fix longformer benchmark io_binding output_buffers (#6345)
* fix longformer benchmark io_binding output_buffers

* format

* import benchmark_helper from parent directory.
2021-01-14 11:29:31 -08:00