Commit graph

2633 commits

Author SHA1 Message Date
Tracy Sharpe
3f7b97a63d
MLAS: more code cleanup (#4101)
Cleanup vector intrinsics, optimized SSE quantized GEMM.
2020-06-01 21:19:42 -07:00
Changming Sun
08e5f89b37
Fix the nuget gpu pipeline (#4106)
2020-06-01 20:42:15 -07:00
Dmitri Smirnov
afca0d15ee
Create Java publishing pipeline (#3944)
Create CPU and GPU Java publishing pipelines. Final jars are tested on all platforms. However, signing and publishing to Maven are manual steps.
2020-06-01 18:18:57 -07:00
Dwayne Robinson
51d78bc5e6
Fix DML EP doc link to C API (#4105)
Path used "\" instead of "/".
2020-06-01 16:49:17 -07:00
Pranav Sharma
6c1b2f33b7
Fix crash reported in #4070. (#4091)
* Fix crash reported in #4070.

* Add newline to warning message

* Add comment for using cout instead of the logger
2020-06-01 15:27:14 -07:00
Cecilia Liu
8813d205cc
Update GPT2 Model Benchmark Script to Support IO Binding (#4088)
GPT2 benchmark supports IO binding
2020-06-01 15:07:48 -07:00
edgchen1
ba74914c5a
Remove evaluation output from training e2e test baseline data. (#4092)
2020-06-01 15:06:21 -07:00
Changming Sun
3eaec57c38
Fix the daily pipeline failures (#4084)
1. Fix the nuget cpu pipeline and put the code coverage pipeline back.
2. Reduce onnx_test_runner's default logging level from WARNING to ERROR, because there are now too many log messages.
3. Enlarge the protobuf read buffer size for onnx_test_runner. This was missed in PR #4020.
2020-06-01 14:44:49 -07:00
Derek Murray
f54518bae9
Actually switch the spdlog submodule to the master branch. (#4100)
This is a follow-up to #4087, which did not fix the whole problem.

Fixes #4077.

Co-authored-by: Derek Murray <demurra@microsoft.com>
2020-06-01 14:32:16 -07:00
ytaous
72d508b7a0
New perf metric - e2e throughput (#4085)
* new metric

* on comments

* tab to spaces

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-01 12:11:34 -07:00
Ashwini Khade
70d91a8550
re-enable graph optimizations during build phase (#4044)
* re-enable graph optimizations during build phase

* fix

* re-enable optimizers for all provider tests
2020-06-01 10:32:42 -07:00
Changming Sun
ff16ca54e1
Fix the flake8 warning in generate_nuspec_for_native_nuget.py (#4089)
2020-06-01 10:32:22 -07:00
edgchen1
a715d55bcc
Training Python package fixes (#4063)
- Add support for ENABLE_LANGUAGE_INTEROP_OPS in training build which is enabled for nightly builds
- Fix passing of environment variables to `sudo docker run` in build definitions
- Fix setup.py package naming logic
2020-06-01 09:30:56 -07:00
Derek Murray
9d748afff1
Set spdlog submodule branch to "master" explicitly. (#4087)
The default branch for the spdlog repository on GitHub recently changed from "master"
to "v1.x", which has a different API for `syslog_sink::syslog_sink()`. This breaks
builds of the server for anyone who has checked out the submodules since that change.

Fixes #4077.

Co-authored-by: Derek Murray <demurra@microsoft.com>
2020-05-29 17:53:40 -07:00
Scott McKay
1d441f89ac
Re-enable PEP8 check in Win CI build (#4075)
* Add flake8 to Win CI build so it's re-enabled. It was in the static analysis build that is currently disabled so checks are not running.
Fix build.py to be compliant again.
Add prefix to flake8 output so it's (hopefully) easier to identify the errors in build output.

* Add to all builds in Windows CPU CI so they all fail quickly if there's an issue.
2020-05-30 09:10:05 +10:00
Scott McKay
b85805ed01
Handle edge case with implicit input and multiple levels of subgraphs (#4031)
* Handle edge case where an implicit input for a subgraph may not get wired in correctly.

Conditions required:
  - two or more levels of nested subgraph
  - an implicit input from above the bottom two levels is used in both levels of subgraph
    - this creates a NodeArg for the implicit input at both levels
  - something changes in the first level subgraph so that it no longer uses the implicit input
    - this could be constant folding, or partitioning of nodes could result in a copy of the implicit input being made to a different device

When that occurs we lose the wiring through to the second level of nested subgraph as there's a NodeArg in the first level but the implicit input is no longer used there. Fix that by doing a final check for outer scope values once we know all the outputs produced by the current graph.

Found by commenting out the CUDA implementations of the control flow nodes and running ssd_mobilenet_300 from the mlperf models.

* Add test case.
2020-05-30 07:08:21 +10:00
Sheil Kumar
c331d8cffc
WinML custom operator header is missing from nuget package. (#4083)
* publish mloperatorauthor.h in the nuget

* build dmlep into arm/arm64 builds

* update to not use --use_dml everywhere, but enable custom ops everywhere

* always download directml nuget in winml builds

* always build with dml

* don't build dml for arm

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-05-29 13:24:22 -07:00
Linnea May
6c7eaff676
fixed typo in readme (#4076)
2020-05-29 12:39:28 -07:00
Tixxx
6404aba5ae
Orttraining rc1 master merge (#4080)
* fixed seg fault when using concrete shape
disable gradient as output

* fix evaluation hang issue for multiple gpu run

* Remove dead code, ORTModel and improve docstrings (#3814)

* Refine ORTTrainer docstring descriptions (#3907)
2020-05-29 12:28:12 -07:00
Wei-Sheng Chin
e951b29a0b
Fix a macro and memory regression (#4068)
onnxruntime_training_bert can run the following command again.

./onnxruntime_training_bert --model_name=bert-large-uncased_L_24_H_1024_A_16_V_30528_S_512_Dp_0.1_optimized_layer_norm --num_train_steps=16 --train_batch_size=52 --mode=train --train_data_dir=/bert_data/128/books_wiki_en_corpus/train --test_data_dir=/bert_data/128/books_wiki_en_corpus/test --gradient_accumulation_steps=16 --optimizer=Lamb --learning_rate=3e-3 --max_seq_length=128 --max_predictions_per_seq=20 --warmup_ratio=0.2843 --warmup_mode=Poly --display_loss_steps=100  --use_mixed_precision=True --allreduce_in_fp16 --use_nccl
2020-05-29 09:24:40 -07:00
edgchen1
38d76cc904
Clean up training E2E test (#4078)
Update training E2E build to not go through CTest and call test scripts directly.
2020-05-29 09:20:47 -07:00
Prabhat
dd43623da2
Remove ONNX from requirements.txt (#4073)
* Avoid installing ONNX package on aarch64

* Removed onnx from requirements

* Add note in backend.py
2020-05-29 21:44:20 +05:30
KeDengMS
348ed698ec
Add more symbolic compute support in symbolic shape inference (#4057)
* Add more symbolic compute support in symbolic shape inference

* Refinements
2020-05-29 02:00:30 -07:00
Scott McKay
2a96be83f6
skottmckay/bugfix/SubgraphInput (#4004)
Description:
Fix 2 edge cases as described here: #3755 (comment)

Create a NodeArg for subgraph inputs even if they have no type. If they are only used as an implicit input to another level of nested subgraph, we will not create a NodeArg via any other path.

Allow an If output to have no shape. Obscure edge case where a loop carried dependency to a Loop node is passed through a nested If node subgraph (i.e. the Loop subgraph contains an If node with a nested subgraph for the else_branch/then_branch). We can't infer a shape for a loop carried dependency (they may change across iterations), which means we can't infer a shape for the nested If subgraph output either. We have delayed allocation support for If outputs so use that.

Motivation and Context
#3755
2020-05-29 14:48:07 +10:00
Hariharan Seshadri
c55634d2e6
Fix initial value of loop variable in RNN op (#4055)
2020-05-28 19:19:39 -07:00
pengwa
6d03470587
Add e2e measurement for training (#4049)
* add e2e measurement
2020-05-29 10:08:29 +08:00
Yufeng Li
26be762b35
Make CPU QuantizeLinear support optional zero point (#4065)
* Disable DequantizeLinear_Without_Zero_Point test for nGraph
* make quantizelinear support optional zero point
2020-05-28 14:33:26 -07:00
Tianlei Wu
60fa4b1f90
Update benchmark of gpt2 model with past state (#4043)
* update benchmark_gpt2 to use past state only
* update dynamic axes of input/output tensors
* Remove --use_openmp option since it is default for onnxruntime 1.3 cpu.
* Use same option names as benchmark.py
2020-05-28 13:55:43 -07:00
Ryan Lai
ed0a8e5b5c
Enable disabled tests and add fixed model (#4059)
Co-authored-by: Ryan Lai <ryalai96@gamil.com>
2020-05-28 13:24:12 -07:00
Brian Martin
279f9aa865
Update WinRT_API.md to reflect 1.3 release (#4074)
fix broken link, add new release to the release table, and point to the 1.3 nuget package
2020-05-28 11:01:49 -07:00
Changming Sun
c94d9685b6
Fix a problem in StacktraceTests::BasicTests (#4069)
result.size() could be zero; in that case, we shouldn't access result[0].
2020-05-28 10:06:16 -07:00
Changming Sun
a859dc422c
Delete google::protobuf::io::FileInputStream class from our source code (#4067)
This class is already part of the protobuf-lite library. We don't need a copy here.
And if we did keep a copy, we would have to ensure the signature of every function is exactly the same as the original. However, the upstream code may change over time. For example, protobuf recently added a "const" modifier to FileInputStream::GetErrno(), which may break the build if a user wants to use the latest protobuf.
2020-05-28 10:05:47 -07:00
Faith Xu
1e82ecfd5c
Fix link in readme (#4058)
2020-05-28 06:57:58 -07:00
Tianlei Wu
7f750b65ce
support model > 2GB in transformer optimizer (#4038)
* Enable optimizer on models with external data (>2GB)
* Refactoring optimizer: move fusion to separate file
* Update benchmark: (1) output datetime to csv (2) Add option --onnx_dir to benchmark.py for onnx model directory path (3) add gpt2-large (4) loosen thresholds for fp16 validation
* update optimizer (1) Add attribute of ConstantOfShape in fp16 conversion (2) Use OnnxRuntime level 1 optimization
* update bert_perf_test.py: rename --input_ids to --input_ids_name
2020-05-28 01:16:41 -07:00
edgchen1
9f7d245446
Add noexcept to various OrtCallback utility class methods to fix warnings. (#4056)
2020-05-27 18:03:58 -07:00
Yufeng Li
23c313cb73
fix crash in dequantizelinear/quantizelinear for optional zero point (#4047)
Fix issues #4032 and #3802 on the OnnxRuntime side. QuantizeLinear also needs a fix in ONNX type inference; that will be done in the ONNX repo.
2020-05-27 17:11:55 -07:00
liqunfu
6665d5e2bc
Liqun/a transformer example (#3845)
Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model

Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-05-27 15:21:35 -07:00
Matthieu Darbois
a983509ed3
Pad: Add support for all datatypes in opset-11 spec (#4021)
* Pad: Add support for all datatypes in opset-11 spec

Pad opset-11 implementation supports:
int32, int64, float & double

Per specification, Pad opset-11 also supports:
uint8, uint16, uint32, uint64, int8, int16 & float16

This commit adds support for those types to get full coverage of the Pad opset-11 operator.

* Pad: Remove 16-bit datatypes support

These types are unused at the moment and they increase binary size. Remove support for those types to lower binary size.
2020-05-28 08:05:13 +10:00
Tianlei Wu
930c6a59da
Allow the cast in embed layer norm to be optional. (#4040)
2020-05-27 14:55:03 -07:00
Yulong Wang
b3ec8035ee
[Node.js binding] add build flag for node.js binding (#3948)
2020-05-27 13:30:22 -07:00
edgchen1
ee6371d0a8
Clean up CUDAExecutionProvider's associated PerThreadContexts on destruction (#4017)
Clean up a CUDAExecutionProvider's associated PerThreadContext instances when that CUDAExecutionProvider is destroyed.

Revert workaround (introduced in #3767) to lazily initialize CUDA handles to avoid segmentation fault. For that case, the CUDA handle cleanup was happening quite a bit later than the CUDAExecutionProvider destructor. This should be a cleaner way to fix that.
2020-05-27 11:01:43 -07:00
Xueyun Zhu
633008b5ef
Add pipeline online partition logic for pipeline (#3996)
* online partition

* fix when multiple consumer nodes are in cut info

* fix windows build

* address feedback

* adding test

* feedback

* address feedback

* add parser for cut edge

* windows build
2020-05-26 17:44:09 -07:00
Tracy Sharpe
0d8abc1a99
MLAS: qgemm refactoring (#4030)
Treat U8U8 as U8S8 for VNNI for performance and optimize SSE2 kernel.
2020-05-26 17:27:32 -07:00
Tianlei Wu
abcd1576c9
Add Linux bash and Windows batch scripts for running transformers benchmarks (#3997)
2020-05-26 16:42:12 -07:00
Cecilia Liu
212efb6cde
Match New Pattern for Reshape Fusion (#3931)
Fuse reshape subgraph.
2020-05-26 14:10:42 -07:00
Paul Fultz II
7759136610
Add amd migraphx execution provider to onnx runtime (#2929)
* Add amd migraphx execution provider to onnx runtime

* rename MiGraphX to MIGraphX

* remove unnecessary changes in migraphx_execution_provider.cc

* add migraphx EP to tests

* add input requests of the batchnorm operator

* add support for the onnx operator PRelu

* update migraphx dockerfile and removed one unused line

* sync submodules with master branch

* fixed a small bug

* fix various bugs to run msft real models correctly

* some code cleanup

* fix python file format

* fixed a code style issue

* add default provider for migraphx execution provider

Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
2020-05-27 04:24:59 +08:00
Vincent Wang
9d0534c0eb
Optimize OneHot CUDA Kernel (#4012)
* Optimize for OneHot with zero off value.

* Add test cases for indices out of range.

Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-05-26 18:12:11 +08:00
Changming Sun
0a6d9dd301
Remove Openmp from the GPU docker files
2020-05-25 14:17:48 -07:00
Changming Sun
30efe65e95
Add use_openmp back to the docker files
2020-05-25 14:17:48 -07:00
Wenhao Hu
bd8993cb15
remove --use_openmp in build.sh
2020-05-25 14:17:48 -07:00