Commit graph

2651 commits

Author SHA1 Message Date
Miguel de Icaza
ea368f69db Add Swift/macOS sample, a port of the Windows MNist sample 2020-06-05 21:16:41 -07:00
Yulong Wang
2e58097f8f
fix build: pipeline Node.js version to 12.16.3 (#4145) 2020-06-05 17:56:03 -07:00
Bowen Bao
1e5307d458
Bug fix for parameter names of models not using wrapper (#4061)
* bug fix for models not using wrapper

* add test case for no wrapper case

* update test case to use internal learning rate

* fix bug with frozen weight update
2020-06-05 12:03:38 -07:00
Scott McKay
9790e19424
Handle mem pattern allocation failure better. Make BFCArena behavior more consistent (#4062)
* Fixes from investigating an issue running the BERT-Squad model with larger batch sizes. When the batch size gets large enough, the initial run will be successful (no memory pattern in use) but the second will fail to allocate the memory pattern block.

The cause of this failure is that we still have the smaller blocks from the first run allocated, as BFCArena has no logic to free those. This essentially results in 2x the memory being required to run the model.

There was inconsistency in BFCArena::Extend which on one path threw an exception if it couldn't do the allocation, and on another just returned false (resulting in Alloc returning a nullptr). Make the behavior consistent by always throwing if BFCArena fails to find a buffer to return. There are a huge number of places in the code where we assume Alloc returns a valid pointer so throwing will result in more correct behavior as a whole. It's also consistent with what happens when CUDA or the standard library fails to allocate memory.

Next, update ExecutionFrame to check for this failure and not insert a memory block entry if it happens. With the existing code if BFCArena Alloc returned a nullptr we happily inserted that in the blocks, delaying detection of the failure to when we attempted to use the block in AllocateMLValueTensorSelfOwnBufferHelper.

Finally, update AllocateMLValueTensorSelfOwnBufferHelper to expect that a location may not have a block. A log message is provided when the block allocation fails, so it's not necessary to log again on each individual allocation that would have used the block; those allocations fall through to the default behavior of doing a normal allocation.
2020-06-05 18:54:01 +10:00
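The consistency fix described above can be sketched in toy form (a hypothetical `ArenaSketch` class, not ONNX Runtime's actual BFCArena API): every allocation-failure path raises, instead of one path throwing and another returning a null result.

```python
class ArenaSketch:
    """Toy bump allocator illustrating the consistency fix: every
    allocation-failure path raises, rather than one path throwing an
    exception and another quietly returning None."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def alloc(self, size):
        if self.used + size > self.capacity:
            # Before the fix, one code path could hand back a null result
            # here, which callers rarely checked; now failure is always
            # signalled with an exception, matching what happens when the
            # standard library fails to allocate.
            raise MemoryError(
                f"arena exhausted: requested {size}, "
                f"free {self.capacity - self.used}")
        offset = self.used  # an offset standing in for a pointer
        self.used += size
        return offset
```

With this shape, callers that assume a valid pointer fail at the allocation site rather than later when the block is dereferenced, which mirrors the ExecutionFrame change in the same commit.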
Thiago Crepaldi
81101c9efd
Fix DropoutGrad op (#4052)
Dropout op was recently changed to accept a new input named
'training_mode', which is passed in to DropoutGrad automatically.

This PR updates the DropoutGrad schema to accommodate the new input.
Tests were also updated to reflect the API change.

Co-authored-by: Thiago Crepaldi <thiag.crepaldi@microsoft.com>
2020-06-04 15:00:02 -07:00
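The `training_mode` input mentioned above can be illustrated with a stdlib-only sketch (hypothetical helpers, not the actual ONNX Runtime kernels): Dropout is the identity when `training_mode` is false, and zeroes/rescales elements when it is true, producing the mask that the gradient op consumes — which is why DropoutGrad's schema needs the same flag.

```python
import random

def dropout(x, ratio=0.5, training_mode=False, seed=0):
    """Return (output, mask); identity with an all-True mask in inference."""
    if not training_mode or ratio == 0.0:
        return list(x), [True] * len(x)
    rng = random.Random(seed)
    mask = [rng.random() >= ratio for _ in x]
    scale = 1.0 / (1.0 - ratio)  # inverted-dropout rescaling
    return [v * scale if keep else 0.0 for v, keep in zip(x, mask)], mask

def dropout_grad(dy, mask, ratio=0.5, training_mode=False):
    """Gradient applies the same mask and scale, so it needs training_mode too."""
    if not training_mode or ratio == 0.0:
        return list(dy)
    scale = 1.0 / (1.0 - ratio)
    return [g * scale if keep else 0.0 for g, keep in zip(dy, mask)]
```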
Dmitri Smirnov
6199ef1375 Change group id to com.microsoft.onnxruntime per requirements. 2020-06-03 22:30:13 -07:00
Scott McKay
16cef90e29
General enhancements/cleanups to test exes (#4109)
* General enhancements/cleanups to test exes
  - Support running onnxruntime_perf_test with no output file
    - if you're profiling the output file is often unused and can be very large
  - Allow failure to override early success when doing multiple runs of a test using onnx_test_runner
    - e.g. if the second run fails that's more important as a final status
  - Clarify ownership semantics
  - Cleanup naming, line lengths, usage of references for required parameters etc.
2020-06-04 07:01:39 +10:00
Yufeng Li
197da135eb
Implement quantized Attention on cpu (#4111)
* Implement QAttention on CPU
* support QAttention in quantization tool
* refine attention code
* add more unit tests
2020-06-03 13:42:00 -07:00
Andrews548
62b44527e5
Add ArmNN Execution Provider (#3714)
* Add ArmNN Execution Provider

Add a new execution provider targeting Arm architecture based on ArmNN.
Validated on NXP i.MX8QM CPU with ResNet50, MobileNetv2 and VGG models.

reviewed-by: mike.caraman@nxp.com

* Minor fixes

- renamed onnxruntime_ARMNN_RELU_USECPU to onnxruntime_ARMNN_RELU_USE_CPU
- fixed acl typo

* remove extra includes. added exception for ArmNN in test

* fix indentation

* Separated the activation implementation from the CPU one and fixed the blockage from the endif

Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>
2020-06-03 22:57:51 +05:30
Scott McKay
62af8da3f6 Use OrtMutex and OrtCondVar everywhere instead of std::mutex/std::condition_variable for consistency.
Needed to change the MissingTrack enum naming due to ort_mutex.h including Windows.h which #defines TRUE and FALSE (via inclusion of fdi_fci_types.h), breaking usage of MissingTrack::TRUE and MissingTrack::FALSE.
2020-06-03 08:42:16 -07:00
liqunfu
905c535626
Still need to make the test stable. Lower the accuracy number a bit to make the test pass for now (#4117)
Co-authored-by: liqun fu <liqun@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2020-06-02 21:37:48 -07:00
KeDengMS
d63b90538e
Symbolic shape inference exit on models without onnx opset used (#4090)
* Symbolic shape inference exit on models without onnx opset used

* Temporary fix for ConvTranspose with symbolic input dims

Co-authored-by: Changming Sun <me@sunchangming.com>
2020-06-02 19:39:46 -07:00
KeDengMS
6f8a4f4cad Fix Nuphar test failure 2020-06-02 18:03:38 -07:00
KeDengMS
32d8a76f2f Fix Nuphar build in gcc 7 (Ubuntu 18.04) 2020-06-02 18:03:38 -07:00
ashbhandare
f18a99b245
Exclude non-trainable torch buffers from trainable weights (#4099)
* Initial changes

* Removed redundant fix

* Revert unintended formatting change.

* Add unit test
2020-06-02 14:05:44 -07:00
Faith Xu
e5cec7237d
Clarify telemetry collection (#4102) 2020-06-02 13:12:27 -07:00
S. Manohar Karlapalem
baa0697982
[OpenVINO-EP] Add missing dependency libs in Dockerfile (#4064)
* Fixed libjson-c_dev_fix and Updated Readme

* Fix VAD-M naming inconsistency in docs

* Avoid removal of sudo in install_common_deps

* Remove 'sudo' for wget in install_common_deps.sh for dockerfiles

'sudo' is not required, and hinders running script from within
proxy environments. Removing it also makes lines consistent with
each other (there are other wget lines without sudo).

Co-authored-by: gundaarx <mayax.vijayan@intel.com>
2020-06-02 02:42:58 -07:00
Yulong Wang
647a886587
[Nodejs binding] create a new pipeline to generate signed binaries (#4104)
* add yml files

* update pipeline

* fix yaml syntax

* yaml pop BuildCSharp

* update yaml

* do not stage codesign summary
2020-06-02 01:28:05 -07:00
Tracy Sharpe
3f7b97a63d
MLAS: more code cleanup (#4101)
Cleanup vector intrinsics, optimized SSE quantized GEMM.
2020-06-01 21:19:42 -07:00
Changming Sun
08e5f89b37
Fix the nuget gpu pipeline (#4106) 2020-06-01 20:42:15 -07:00
Dmitri Smirnov
afca0d15ee
Create Java publishing pipeline (#3944)
Create CPU and GPU Java publishing pipelines. Final jars are tested on all platforms. However, signing and publishing to Maven are manual steps.
2020-06-01 18:18:57 -07:00
Dwayne Robinson
51d78bc5e6
Fix DML EP doc link to C API (#4105)
Path used "\" instead of "/".
2020-06-01 16:49:17 -07:00
Pranav Sharma
6c1b2f33b7
Fix crash reported in #4070. (#4091)
* Fix crash reported in #4070.

* Add newline to warning message

* Add comment for using cout instead of the logger
2020-06-01 15:27:14 -07:00
Cecilia Liu
8813d205cc
Update GPT2 Model Benchmark Script to Support IO Binding (#4088)
GPT2 benchmark: support IO binding
2020-06-01 15:07:48 -07:00
edgchen1
ba74914c5a
Remove evaluation output from training e2e test baseline data. (#4092) 2020-06-01 15:06:21 -07:00
Changming Sun
3eaec57c38
Fix the daily pipeline failures (#4084)
1. Fix the nuget cpu pipeline and put code coverage pipeline back.
2. Reduce onnx_test_runner's default logging level from WARNING to ERROR, because there are too many log messages now.
3. Enlarge the protobuf read buffer size for onnx_test_runner. It was missed from PR #4020.
2020-06-01 14:44:49 -07:00
Derek Murray
f54518bae9
Actually switch the spdlog submodule to the master branch. (#4100)
This is a follow-up to #4087, which did not fix the whole problem.

Fixes #4077.

Co-authored-by: Derek Murray <demurra@microsoft.com>
2020-06-01 14:32:16 -07:00
ytaous
72d508b7a0
New perf metric - e2e throughput (#4085)
* new metric

* on comments

* tab to spaces

Co-authored-by: Ethan Tao <ettao@microsoft.com>
2020-06-01 12:11:34 -07:00
Ashwini Khade
70d91a8550
re-enable graph optimizations during build phase (#4044)
* re-enable graph optimizations during build phase

* fix

* re-enable optimizers for all provider tests
2020-06-01 10:32:42 -07:00
Changming Sun
ff16ca54e1
Fix the flake8 warning in generate_nuspec_for_native_nuget.py (#4089) 2020-06-01 10:32:22 -07:00
edgchen1
a715d55bcc
Training Python package fixes (#4063)
- Add support for ENABLE_LANGUAGE_INTEROP_OPS in training build which is enabled for nightly builds
- Fix passing of environment variables to `sudo docker run` in build definitions
- Fix setup.py package naming logic
2020-06-01 09:30:56 -07:00
Derek Murray
9d748afff1
Set spdlog submodule branch to "master" explicitly. (#4087)
The default branch for the spdlog repository on GitHub recently changed from "master"
to "v1.x", which has a different API for `syslog_sink::syslog_sink()`. This breaks
builds of the server for anyone who has checked out the submodules since that change.

Fixes #4077.

Co-authored-by: Derek Murray <demurra@microsoft.com>
2020-05-29 17:53:40 -07:00
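The submodule fix above amounts to pinning an explicit `branch` in `.gitmodules`. A hedged sketch of what such an entry looks like (the submodule path here is an assumption, not copied from the repository):

```ini
[submodule "cmake/external/spdlog"]
	path = cmake/external/spdlog
	url = https://github.com/gabime/spdlog.git
	branch = master
```

Without an explicit `branch` setting, `git submodule update --remote` tracks the remote's default branch, which is why the upstream default changing from "master" to "v1.x" broke fresh checkouts.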
Scott McKay
1d441f89ac
Re-enable PEP8 check in Win CI build (#4075)
* Add flake8 to Win CI build so it's re-enabled. It was in the static analysis build, which is currently disabled, so the checks were not running.
Fix build.py to be compliant again.
Add prefix to flake8 output so it's (hopefully) easier to identify the errors in build output.

* Add to all builds in Windows CPU CI so they all fail quickly if there's an issue.
2020-05-30 09:10:05 +10:00
Scott McKay
b85805ed01
Handle edge case with implicit input and multiple levels of subgraphs (#4031)
* Handle edge case where an implicit input for a subgraph may not get wired in correctly.

Conditions required:
  - two or more levels of nested subgraph
  - an implicit input from above the bottom two levels is used in both levels of subgraph
    - this creates a NodeArg for the implicit input at both levels
  - something changes to the first level subgraph to no longer use the implicit input
    - could be constant folding, or partitioning of nodes resulting in a copy of the implicit input being made to a different device

When that occurs we lose the wiring through to the second level of nested subgraph as there's a NodeArg in the first level but the implicit input is no longer used there. Fix that by doing a final check for outer scope values once we know all the outputs produced by the current graph.

Found by commenting out the CUDA implementations of the control flow nodes and running ssd_mobilenet_300 from the mlperf models.

* Add test case.
2020-05-30 07:08:21 +10:00
Sheil Kumar
c331d8cffc
WinML custom operator header is missing from nuget package. (#4083)
* publish mloperatorauthor.h in the nuget

* build dmlep into arm/arm64 builds

* update to not use --use_dml everywhere, but enable custom ops everywhere

* always download directml nuget in winml builds

* always build with dml

* don't build dml for arm

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2020-05-29 13:24:22 -07:00
Linnea May
6c7eaff676
fixed typo in readme (#4076) 2020-05-29 12:39:28 -07:00
Tixxx
6404aba5ae
Orttraining rc1 master merge (#4080)
* fixed seg fault when using concrete shape
disable gradient as output

* fix evaluation hang issue for multiple gpu run

* Remove dead code, ORTModel and improve docstrings (#3814)

* Refine ORTTrainer docstring descriptions (#3907)
2020-05-29 12:28:12 -07:00
Wei-Sheng Chin
e951b29a0b
Fix a macro and memory regression (#4068)
onnxruntime_training_bert can be run with the following command again.

./onnxruntime_training_bert --model_name=bert-large-uncased_L_24_H_1024_A_16_V_30528_S_512_Dp_0.1_optimized_layer_norm --num_train_steps=16 --train_batch_size=52 --mode=train --train_data_dir=/bert_data/128/books_wiki_en_corpus/train --test_data_dir=/bert_data/128/books_wiki_en_corpus/test --gradient_accumulation_steps=16 --optimizer=Lamb --learning_rate=3e-3 --max_seq_length=128 --max_predictions_per_seq=20 --warmup_ratio=0.2843 --warmup_mode=Poly --display_loss_steps=100  --use_mixed_precision=True --allreduce_in_fp16 --use_nccl
2020-05-29 09:24:40 -07:00
edgchen1
38d76cc904
Clean up training E2E test (#4078)
Update training E2E build to not go through CTest and call test scripts directly.
2020-05-29 09:20:47 -07:00
Prabhat
dd43623da2
Remove ONNX from requirements.txt (#4073)
* Avoid installing ONNX package on aarch64

* Removed onnx from requirements

* Add note in backend.py
2020-05-29 21:44:20 +05:30
KeDengMS
348ed698ec
Add more symbolic compute support in symbolic shape inference (#4057)
* Add more symbolic compute support in symbolic shape inference

* Refinements
2020-05-29 02:00:30 -07:00
Scott McKay
2a96be83f6
skottmckay/bugfix/SubgraphInput (#4004)
Description:
Fix 2 edge cases as described here: #3755 (comment)

Create a NodeArg for subgraph inputs even if they have no type. If they are only used as an implicit input to another level of nested subgraph, we will not create a NodeArg via any other path.

Allow an If output to have no shape. Obscure edge case where a loop carried dependency to a Loop node is passed through a nested If node subgraph (i.e. the Loop subgraph contains an If node with a nested subgraph for the else_branch/then_branch). We can't infer a shape for a loop carried dependency (they may change across iterations), which means we can't infer a shape for the nested If subgraph output either. We have delayed allocation support for If outputs so use that.

Motivation and Context
#3755
2020-05-29 14:48:07 +10:00
Hariharan Seshadri
c55634d2e6
Fix initial value of loop variable in RNN op (#4055) 2020-05-28 19:19:39 -07:00
pengwa
6d03470587
Add e2e measurement for training (#4049)
* add e2e measurement
2020-05-29 10:08:29 +08:00
Yufeng Li
26be762b35
Make CPU QuantizeLinear support optional zero point (#4065)
* Disable DequantizeLinear_Without_Zero_Point test for nGraph
* make quantizelinear support optional zero point
2020-05-28 14:33:26 -07:00
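An omitted zero point behaves as if it were 0. A minimal stdlib sketch of uint8 linear quantization with an optional zero point (a hypothetical helper, not the actual kernel):

```python
def quantize_linear(x, scale, zero_point=None):
    """Quantize floats to uint8: q = saturate(round(v / scale) + zero_point).

    A missing zero_point is treated as 0, matching the optional input."""
    zp = 0 if zero_point is None else zero_point
    return [max(0, min(255, round(v / scale) + zp)) for v in x]
```

For example, `quantize_linear([0.0, 1.0], 0.01)` produces the same result as passing an explicit zero point of 0, and values outside the representable range saturate to 0 or 255.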
Tianlei Wu
60fa4b1f90
Update benchmark of gpt2 model with past state (#4043)
* update benchmark_gpt2 to use past state only
* update dynamic axes of input/output tensors
* Remove --use_openmp option since it is default for onnxruntime 1.3 cpu.
* Use same option names as benchmark.py
2020-05-28 13:55:43 -07:00
Ryan Lai
ed0a8e5b5c
Enable disabled tests and add fixed model (#4059)
Co-authored-by: Ryan Lai <ryalai96@gamil.com>
2020-05-28 13:24:12 -07:00
Brian Martin
279f9aa865
Update WinRT_API.md to reflect 1.3 release (#4074)
fix broken link, add new release to the release table, and point to the 1.3 nuget package
2020-05-28 11:01:49 -07:00
Changming Sun
c94d9685b6
Fix a problem in StacktraceTests::BasicTests (#4069)
result.size() could be zero; in that case, we shouldn't access result[0].
2020-05-28 10:06:16 -07:00
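The pattern behind the fix above — check for emptiness before indexing the first element — in a minimal sketch (a hypothetical `first_frame` helper, not the actual test code):

```python
def first_frame(frames):
    """Return the top captured stack frame, or None when capture yielded nothing."""
    if not frames:  # the guard that was missing before accessing result[0]
        return None
    return frames[0]
```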
Changming Sun
a859dc422c
Delete google::protobuf::io::FileInputStream class from our source code (#4067)
This class is already part of the protobuf-lite library. We don't need a copy here.
And if we do, we must ensure the signature of every function is exactly the same as the original. However, the upstream code may change over time. For example, protobuf recently added a "const" modifier to FileInputStream::GetErrno(), which may break the build if a user wants to use the latest protobuf.
2020-05-28 10:05:47 -07:00