Commit graph

1725 commits

Author SHA1 Message Date
Brian Martin
09c9caab2d
Brianma/cpu (#2583)
* don't include dml stuff in cpu builds

* tests that link the image lib also need the telemetry lib now
2019-12-07 08:59:42 -08:00
Brian Martin
09ca58044e Merge branch 'layer_dev' into windowsai 2019-12-06 16:23:05 -08:00
Ori Levari
b3c568cf4d
Layer dev dml delayload (#2580) 2019-12-06 15:44:08 -08:00
Ori Levari
be48f05c64
Cmake and preprocessor fixes that were uncovered by building on agents without DML available via SDK 2019-12-06 13:30:19 -08:00
Paul McDaniel
56cbd82c71
Layer dev paulm (#2567)
* comments for dml graph transformer
fixed ort value passing using the allocator info

* fixed and coded maps and sequences across the abi

* cleaned up w4's
cleaned up the model info ABI
delayload directml.dll from winml

* cleaned up namespace aliases.
renamed _winmla to winmla
this was good PR feedback from tiago a while back.

* moved files from inc to lib\api.core
cleaned up some of the cmake

* staged changes

* making windowsAI azure dev ops work.

* code review comments.

* revert changes
2019-12-05 18:14:20 -08:00
Ryan Lai
9933b8a5d6
Fix custom ops scenario tests (#2562)
* Do not shutdown protobuf after ort environment gets destroyed. Lazy load lotus environment first time it is needed

* comment typo

* pr comment about calling phoenix singleton

* Make lotus_environment static in winmladapter
2019-12-05 15:41:01 -08:00
Ori Levari
8294fa72a4
various changes to unblock windowsai ADO build 2019-12-05 13:50:13 -08:00
Xiang Zhang
8fb7b88e0a
Handle exception thrown from all apis in WinMLAdapter (#2539) 2019-12-04 14:08:16 -08:00
Ori Levari
2b8d6d3e31 add missing namespace to winml_trace_logging_provider in lotusenvironment.h (#2542) 2019-12-04 11:30:45 -08:00
Paul McDaniel
a7cf316efb
Layer dev paulm (#2536)
ori said yes
2019-12-03 16:35:27 -08:00
Ryan Lai
3afb7a89fe
Spawn child process to run DeviceLostRecovery scenario test (#2530)
* Spawn child process to run DeviceLostRecovery scenario test
2019-12-03 15:38:04 -08:00
Paul McDaniel
c615002f5d
Layer dev paulm (#2533)
* comments for dml graph transformer
fixed ort value passing using the allocator info

* fixed and coded maps and sequences across the abi

* cleaned up w4's
cleaned up the model info ABI
delayload directml.dll from winml

* cleaned up namespace aliases.
renamed _winmla to winmla
this was good PR feedback from tiago a while back.

* moved files from inc to lib\api.core
cleaned up some of the cmake

* staged changes
2019-12-03 15:31:22 -08:00
Jeff Bloomfield
a437d43420 merge master 2019-12-03 13:50:20 -08:00
Ashwini Khade
e32eff826c
enable nuget package testing on centos7 (#2527)
* add centos tests to linux cpu ci pipeline

* Disable failing test

* use centos6 instead of centos7

* change back to centos7

* add dotnet runtime dependency

* fix dotnet runtime dependencies

* install dotnet sdk instead of runtimes

* add more dotnet dependencies

* temporary skip failing test

* fix lib path

* reenable failing test
2019-12-03 10:16:45 -08:00
Brian Martin
f54625f7c5
re-enable warnings for winml builds and fix the warnings that were hiding (#2526)
* turn devmode back on for winml builds

* fix some warnings. include protobuf in a way that disables some warnings

* undo protobufhelpers changes and just ignore 4100 errors in pb code

* attempt to isolate protobufhelpers errors

* add template specialization for getting tensor proto data
2019-12-03 09:57:56 -08:00
RandySheriffH
85a4ed8cf7
fix cuda kernel causing invalid mem access (#2523) 2019-12-03 09:16:00 -08:00
Tianlei Wu
66254eb25a
Update BERT model optimization python script (#2521)
Add support of GPT2 model optimization:
* Match subgraph of Gelu Approximation (using Tanh).
* Fuse LayerNormalization if SkipLayerNormalization is not ready.
* Output model even if embedding layer is not fused.
* Improve Reshape Fusion to improve coverage.
* Refine constant input checking, and output fused op counter.

Update script according to latest op improvements:
* Fusion of Add Bias and Gelu.
* Fuse SkipLayerNormalization and Add Bias.

Other:
* Add ReduceSum for mask as intermediate step.
* Refactor verbose setting.
2019-12-03 08:40:51 -08:00
Sreekanth Yalachigere
31ea11a696 Renaming MKL-DNN as DNNL (#2515)
* DNNL: Moving Files to rename file names

* DNNL name change

* azure pipeline updated

* disable ceil/dilation and enable Opset10

* disable ceil/dilation tests in Python

* mlperf_ssd_resnet34_1200 disabled
2019-12-03 07:34:23 -08:00
Changming Sun
3d627362a0
Upgrade Windows CPU CI pipeline to use VS 2019 (#2519) 2019-12-02 23:05:35 -08:00
Scott McKay
e8b327d657
Fix constant folding of node assigned to CUDA (#2510)
* Constant folding bug fix/improvements
  - Handle constant folding for node that is assigned to a non cpu EP
  - Check for errors in optimizer execution frame setup
  - Improve CUDA partitioning to look for initializers in parent graphs
  - Add unit test

Fixes #2474
2019-12-03 16:28:44 +10:00
Changming Sun
4354023913
Make link time optimization work on Linux (#2477) 2019-12-02 22:25:41 -08:00
baowenlei
25c260fdef Add parallel for tensorized gemm (#2517)
* add parallel for tensorize gemm

* add option to control parallel

* change to a more clean way to control
2019-12-02 22:05:46 -08:00
KeDengMS
c1be615c45
[NupharEP] refine parallel schedule control (#2514)
* [NupharEP] Add parallel schedule to JIT function name
Update Nuphar docker to use Python 3.6 and ubuntu 18.04

* Update notebook

* Avoid JIT cache file name conflict
2019-12-02 17:40:51 -08:00
Zhang Lei
784eca0dcd
Cuda pad() for opset 11 (#2490)
* Cuda pad opset 11.

* Handle type conversion issue in building.
2019-12-02 16:28:17 -08:00
Ori Levari
da897d76e7
add dml binaries to DirectML package and be more explicit about condition variables (#2520) 2019-12-02 16:10:38 -08:00
Jeff Bloomfield
b9faa0b6fd Fix kernel registry validation to reenable DML kernels 2019-12-02 15:43:44 -08:00
Scott McKay
ddaad86605
CUDA Loop (#2444)
* Implement CUDA Loop operator.

* Add control flow node implicit input handling to the memcpy transformer and allocation planner.
2019-12-03 08:29:21 +10:00
Zhang Lei
50eb140119
Cuda Resize Operator for opset 11. (#2484)
* Cuda Resize Operator for opset 11.
2019-12-02 13:42:21 -08:00
xavier dupré
c42148a0c3 Improves softmax function for standard ml 2019-12-02 10:48:46 -08:00
Dmitri Smirnov
ec88f6d8d6
Add DataFrameTool (#2456)
Add DataFrameTool to feed inputs from pandas DataFrame
2019-12-02 10:12:03 -08:00
Yulong Wang
89824b35e9
optimize CPU implementation of Attention (#2496) 2019-12-01 14:43:38 -08:00
Tianlei Wu
0f57e0a49e
Change mask input of EmbedLayerNormalization op to be optional (#2495)
Change mask input of EmbedLayerNormalization op to be optional
2019-12-01 08:36:06 -08:00
Tiago Koji Castro Shibata
092d8f2866
Make tests dependent on winml_dll (#2509) 2019-11-30 15:05:50 -08:00
Brian Martin
ecb3228e43 Merge branch 'windowsai' into layer_dev 2019-11-29 08:18:18 -08:00
Brian Martin
5adab88eed Merge branch 'master' into windowsai 2019-11-29 07:50:17 -08:00
liuziyue
0edd4ef6ca
EmbedLayerNormalization fusion (#2452)
Embed Layer Normalization Fusion
2019-11-28 14:03:58 -08:00
KeDengMS
60208463a9
[NupharEP] Enable parallel schedule (#2505)
* [NupharEP] Enable parallel schedule
* Update TVM with the fix to TVM threadpool to use OpenMP if possible
* Add parallel schedule when trying to vectorize
With this change, BERT squad perf on a 4-core (8 HT) CPU goes from 187ms to 150ms

* Address CR, docs and cmake update

* Doc fix

* Fix mkl

* Fix TVM windows build when using mklml
2019-11-28 08:35:56 -08:00
Yufeng Li
005305be6e
Implement AddGelu and SkipLayerNorm (#2487)
* Implement AddGelu and SkipLayerNorm
2019-11-28 08:29:59 -08:00
Zhang Lei
ee0bde6b69 Enable three types of Equal() to version 11. (#2508) 2019-11-28 03:03:43 -08:00
Paul McDaniel
301d407b39
Layer dev paulm (#2507)
* comments for dml graph transformer
fixed ort value passing using the allocator info

* fixed and coded maps and sequences across the abi

* cleaned up w4's
cleaned up the model info ABI
delayload directml.dll from winml

* cleaned up namespace aliases.
renamed _winmla to winmla
this was good PR feedback from tiago a while back.
2019-11-27 15:50:49 -08:00
Dmitri Smirnov
75b4747701
Fix a memleak in pybind. (#2503) 2019-11-27 15:32:05 -08:00
Ryan Lai
197fd9ea3d
Remove usage of IOBinding in WinML and use C_API Run method (#2504)
* remove usage of iobinding

* Change data structure to use vector of Ort::Values

* Polish bind input / output

* Use C API Run method

* Update providers on evaluate getresults

* Remove run and IObinding interface from WinMLAdapter

* Remove use of IObinding

* bind unbound outputs code moved to learningmodelbinding

* clean up unneeded istensor adapter function

* Fix comment

* Check if session is closed before binding and clearing

* PR feedback
2019-11-27 15:31:30 -08:00
Paul McDaniel
e8e285dd97
Layer dev paulm (#2506)
* comments for dml graph transformer
fixed ort value passing using the allocator info

* fixed and coded maps and sequences across the abi

* cleaned up w4's
cleaned up the model info ABI
delayload directml.dll from winml
2019-11-27 15:04:47 -08:00
Scott McKay
1fdf1006ac
Various fixes coming out of discussions in #2436 (#2497)
- Add --skip_tests option to build.py based on github feedback
  - Add debug output at end of run_subprocess so it's clearer when the output is from a different process running
  - Add check for scipy as it's required by gen_test_models.py for the onnx tests
  - Use log.warning instead of warnings.warn for consistency. We use the logger almost everywhere and somewhat randomly used warnings.warn in two places.
  - Add check for 'wheel' dependency not being found in setup.py and handle more gracefully
  - Fix invalid input name in Keras tests
2019-11-28 07:03:23 +10:00
Zhang Lei
04b6097db4
Cuda Clip() for op set 11. (#2411)
* Cuda Clip() for op set 11.
* make min_val and max_value input CPU memory directly.
* Remove original cu file useless "#pragma once"
* merge duplicate logic into one class.
2019-11-27 12:42:45 -08:00
Yulong Wang
ccbd778d0d optimize CPU implementation of EmbedLayerNorm (#2491)
* optimize CPU implementation of EmbedLayerNorm
* use atomic in parallelization
2019-11-27 12:34:57 -08:00
Ori Levari
2cfee5744b
Layer dev release pipeline (#2488)
Adds winml binaries to existing cpu nuget package, and creates new gpu dml nuget package with winml binaries and DML EP.
2019-11-27 11:36:20 -08:00
Tiago Koji Castro Shibata
fd8105640f
Link scenario tests to DML when it's enabled (#2502) 2019-11-27 11:29:09 -08:00
Tiago Koji Castro Shibata
9169c95a0e
Add CLI parameters to test runner, build WinML in ARM and x86 CI (#2479)
* Support test parameters through CLI arguments

* Add WinML to Windows x86/ARM CI builds

* Code style fixes

* Update googletest

Remove GPUTEST macros everywhere now that GTEST_SKIP is supported

* Refactor main.cpp

* Build scenario tests without DML
2019-11-27 10:33:00 -08:00
Tianlei Wu
e57b735bb9 Add a transformer to use Gelu approximation for cuda provider (#2480)
* Add Gelu Approximation Transformer to convert Gelu or AddGeluFusion to FastGelu to get better inference performance.
2019-11-27 10:15:50 -08:00