Ashwini Khade
75e054cd33
pick onnx release candidate ( #7177 )
...
* pick onnx release candidate
* fix typo
* filter batchnorm tests
* add implementation for reshape 14
* add identity op kernel for opset 14
* fix typo
* update onnx commit
* update commit to latest master
* add hashes for new kernel registrations and update 1
* TEST commit
* update onnx back to right commit
* Update onnx to latest in rel-1.9.0
* temp fix
* remove nonzeroshapesetter transformer
* pick rel branch latest commit
* fix build failures
* fix build failures
* fix build failures
* update the commit to latest in release branch
* add test filters for not implemented op14 ops in c# tests
* plus review comments
2021-04-22 23:57:09 -07:00
Guoyu Wang
d414039189
Add ios coreml ci, and speedup ios ci run ( #7420 )
2021-04-22 23:41:58 -07:00
sumitsays
d67c86265b
Enabled fp16-inception-v1 test ( #7406 )
...
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2021-04-22 23:05:03 -07:00
Yulong Wang
b56dd037d3
increase timeout for nodejs binding test ( #7422 )
...
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-22 21:40:40 -07:00
raviskolli
4c8513a627
SimplifiedLayerNormalization kernel for ROCM EP ( #7409 )
...
* Add SimplifiedLayerNormalization kernel to ROCM ep.
2021-04-22 21:25:09 -07:00
Changming Sun
6822ae95ec
Reduce the number of TensorRT tests needed to run ( #7419 )
2021-04-22 19:14:39 -07:00
Thiago Crepaldi
771a6d235b
Fix IsContiguousTensor check on backend ( #7391 )
2021-04-21 17:01:17 -07:00
Changming Sun
afa7b23609
Update docs/ContribOperators.md and the script that generates it. ( #7399 )
2021-04-21 16:20:56 -07:00
Brian Popow
1bbe538379
Update references
2021-04-21 13:36:10 -07:00
Brian Popow
aa1ce726aa
Remove unnecessary encoding step
2021-04-21 13:36:10 -07:00
Changming Sun
65b2b87f83
Update CI build docker images ( #7386 )
...
Update CI build docker images: delete ubuntu 16.04 support.
2021-04-21 13:18:34 -07:00
raviskolli
09313d9e1f
Added GreaterOrEqual and LessOrEqual Ops to RocmEP ( #7398 )
...
* Added GreaterOrEqual and LessOrEqual Ops to Rocm EP
2021-04-21 11:44:24 -07:00
Changming Sun
b4cfa88bf7
Update protobuf to the latest version ( #7396 )
2021-04-21 10:30:06 -07:00
Changming Sun
243713c464
Upload detailed code coverage result to azure blob storage ( #7392 )
2021-04-21 08:24:44 -07:00
Sherlock
16ca7677e6
Relax ConvGrad Test tol ( #7393 )
...
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-21 08:06:00 -07:00
Changming Sun
b5493d724c
Update rnn_helpers.cc: add #ifdef to DumpMatrixImpl ( #7389 )
2021-04-20 22:11:38 -07:00
Hariharan Seshadri
7b11283af0
Add ability to allocate initialized tensor memory from non-arena memory ( #7267 )
2021-04-20 20:27:48 -07:00
Thiago Crepaldi
8421124344
Add support to **kwargs in ORTModule forward() method ( #7360 )
2021-04-20 16:21:52 -07:00
ashbhandare
76cc118dbe
Gemm transpose fusion ( #7306 )
...
* Gemm transpose fusion
* Correct rewrite rule effect
* Add to inference transforms to trigger on gradient graph
2021-04-20 09:35:05 -07:00
Xiaoyu Liu
913ea8264b
GPT2 with one step beam search ( #7163 )
...
* beam search refactoring checkin
* add factory class and deduplicate code
* one step beam search works on gpu
Co-authored-by: Xiaoyu Liu <xiaoyu@xiaoyu-VM.z4vh1dzj5eoevgybsksdpz2izh.jx.internal.cloudapp.net>
2021-04-20 06:23:52 -07:00
mindest
1a3ddf0714
Add gradient registration and tests for Min/Max ( #7217 )
...
* Add gradient registration and tests for Min/Max
* Add helper function for min/max grad test
* limit Min/Max Grad to accept at most two inputs; modify test case accordingly
* resolve merge error
2021-04-20 18:14:31 +08:00
Sherlock
ce7ff27bac
Fix perf issue in Conv CUDA kernel ( #7348 )
...
* Fix perf issue in Conv CUDA kernel
* Read available memory from device
* assuming 10% fragmentation
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-19 23:37:05 -07:00
ashbhandare
ac346a1b90
Modify SimplifiedLayerNormFusion to allow fusion in the presence of Casts optionally ( #7352 )
...
* LN transform partial changes
* LN transform fix
* Make transform optional, remove unnecessary code
* Fix windows build
* review comment, windows CI fix
* review comments
2021-04-19 19:59:23 -07:00
ytaous
7abe1fd392
Identity elimination with graph output ( #7312 )
...
* Identity removal
* fix build
* fix build
* fix build
* fix build
* UTs
* fix UT
* fix UTs
* per comments
* fix UTs
* fix UTs
* per comments
Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-19 16:36:35 -07:00
Sheil Kumar
265db2ad96
Fix Microsoft.AI.MachineLearning .NET5 publishing and C# Store Release build ( #7373 )
...
* fix .net publishing
* make experimental api build with microsoft.ai.machinelearning.idl import
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-04-19 15:36:43 -07:00
satyajandhyala
bb1e417da0
Add logging support to Cast Propagation transformation from python ( #7353 )
...
* Fixes needed to PropagateCast transformation.
* Added number of passes to the logs.
* Added logging support to OrtModuleGraphBuilder.
* Added new testcases.
* Added NodeArgToConsumerMap
2021-04-19 12:14:30 -07:00
M. Zeeshan Siddiqui
6dda1e0681
Flag for tensor memory re-use in allocation planner. ( #7359 )
2021-04-16 17:53:25 -07:00
Guoyu Wang
96cdc65d57
Fix android CI failure after gradle updated to 7.0 ( #7364 )
...
* Fix android ci failure after gradle updated to 7.0
* minor update
2021-04-16 15:28:28 -07:00
Yulong Wang
009f342caf
[JS] refactor Javascript/Typescript libraries in ONNX Runtime ( #7308 )
...
* working on re-organizing js code for ortweb
* remove dup files
* move folder
* fix common references
* fix common es5
* add webpack to common
* split interface/impl
* use cjs for node
* add npmignore for common
* update sourcemap config for common
* update node
* adjust folder/path in CI and build
* update folder
* nit: readme
* add bundle for dev
* correct nodejs paths
* enable ORT_API_MANUAL_INIT
* set name for umd library
* correct name for commonjs export
* add priority into registerBackend()
* fix npm ci pwd
* update eslintrc
* revise code
* revert package-lock lockfileVersion 2->1
* update prebuild
* resolve comments
* update document
* revise eslint config
* update eslint for typescript rules
* revert changes by mistake in backend.ts
* add env
* resolve comments
2021-04-16 01:33:10 -07:00
Sunghoon
ded2b08380
WebAssembly multi-threads support. ( #7326 )
...
* WebAssembly multi-threads support.
* PROXY_TO_PTHREAD is not required for wasm library
* Remove an unnecessary line commented out
2021-04-15 21:46:11 -07:00
Guoyu Wang
28e229ac4c
Enable build dynamic framework for macOS/iOS ( #7343 )
...
* Enable build dynamic framework for macOS/iOS
* Address CR comments
2021-04-15 16:47:53 -07:00
Chen Fu
ef1aaa367a
Adding interface for batched integer gemm ( #7249 )
...
Parallelize MinMax, Quantize and batched quantize GEMM
A performance problem was identified in the quantized T5 decoder model, and the DynamicMatMul operator was found to be the culprit. This operator spends its time getting the MinMax of a tensor, quantizing a tensor, and performing a batched qgemm. All of these can be parallelized.
Currently GEMM is parallelized. However, in batched GEMM, we call GEMM sequentially multiple times. This starts and ends a parallel section for every call, which can be slow. So we made the following changes:
* Parallel task partition no longer depends on degree of parallelism, only on shape of the matrices.
* In a single GEMM, perform 2D partition of the multiplication, along panel lines, to reduce repeated packing.
* For batched GEMM, all parallel tasks are executed in a single parallel section, reducing the cost of starting threads and waiting for them to finish.
2021-04-15 10:25:31 -07:00
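The batched GEMM change described in #7249 can be illustrated with a minimal Python sketch. This is not the MLAS implementation: the tile sizes, the function names (`make_tasks`, `run_tile`, `batched_gemm`), and the use of `ThreadPoolExecutor` are all assumptions made for illustration, and packing is not modeled. It only shows the two ideas stated in the commit message: the task list depends on the matrix shapes alone, and every tile of every GEMM in the batch is dispatched in one parallel section.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tile sizes; a real kernel would choose these along panel lines.
TILE_M = 2
TILE_N = 2


def make_tasks(batch, m, n):
    """Build a 2D tile partition from the shapes only, never from the thread count."""
    return [(b, i, min(i + TILE_M, m), j, min(j + TILE_N, n))
            for b in range(batch)
            for i in range(0, m, TILE_M)
            for j in range(0, n, TILE_N)]


def run_tile(task, a, b_mats, c):
    """Compute one output tile of one GEMM in the batch."""
    b, i0, i1, j0, j1 = task
    k = len(b_mats[b])  # shared inner dimension
    for i in range(i0, i1):
        for j in range(j0, j1):
            c[b][i][j] = sum(a[b][i][p] * b_mats[b][p][j] for p in range(k))


def batched_gemm(a, b_mats, workers=4):
    """Multiply a[b] by b_mats[b] for every b, using a single parallel section."""
    batch, m, n = len(a), len(a[0]), len(b_mats[0][0])
    c = [[[0] * n for _ in range(m)] for _ in range(batch)]
    tasks = make_tasks(batch, m, n)
    # One pool for the whole batch: threads start and join once, not once per GEMM.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda t: run_tile(t, a, b_mats, c), tasks))
    return c


if __name__ == "__main__":
    a = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
    b = [[[5, 6], [7, 8]], [[9, 10], [11, 12]]]
    print(batched_gemm(a, b))
```

Because `make_tasks` never looks at `workers`, the same task list is produced whatever the degree of parallelism; only the scheduling of those tasks changes.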
Changming Sun
f1c1c38d44
Delete an unused var in nuget pipelines ( #7345 )
2021-04-15 07:29:52 -07:00
Tianlei Wu
aa9ab565f5
FastGelu fusion for Megatron model ( #7344 )
...
* add a fastgelu pattern from Megatron model
* update comment
* add test
2021-04-15 00:39:33 -07:00
satyajandhyala
0da085ed48
Propagate Cast operations to maximize lower precision (float16) computation ( #7191 )
...
* Added propagate_cast_ops option and PropagateCastOps transformation.
* Added test cases to propagate Cast operations.
* Expose GraphTransformerConfiguration to python interface and added propagate_cast_ops options.
* Added functionality to propagate Cast operations.
* Added logging.
* Apply cast propagation to the subgraphs.
2021-04-14 20:54:24 -07:00
Jesse Benson
be79575c6a
Use built-in reduce_sum() for simple reduction cases, specifically reduce all to a scalar.
2021-04-14 08:55:35 -07:00
Brian Martin
3eb2d349a6
fix typo in scenariotestscppwinrt.cpp ( #7334 )
...
the word is spelled, "resetting".
2021-04-14 08:26:55 -07:00
Oliver Rausch
87bd836886
Fixes in symbolic shape inference ( #7258 )
...
* Add symbolic shape inference for Transpose
* Support steps in symbolic shape inference for Slice
* Add inference for BatchNormalization
* Address review changes
* Address review changes
2021-04-13 22:17:30 -07:00
liqunfu
75d8319286
Liqun/ort package name2 ( #7337 )
2021-04-13 20:36:24 -07:00
Zhang Lei
f62db1a09c
quantization tools support qlinear average pool ( #7309 )
2021-04-13 18:22:42 -07:00
liqunfu
4c862c73ed
for training to use new python package naming convention to explicitl… ( #7204 )
2021-04-13 16:19:42 -07:00
ashbhandare
6ceee5d131
IsInf ReduceSum transform ( #7188 )
...
* IsInf ReduceSum transform
* Revert unnecessary changes, add isinf_only and isnan_only attr
* add tests, review comments
* Disable test for non-cuda
* Move IsAllFinite from training to contrib op
* review comments
* Review comment, formatting
* Enable test for ROCm EP
2021-04-13 16:05:21 -07:00
G. Ramalingam
f8a36dd6b3
Add DropoutGrad function body ( #7310 )
...
* Add DropoutGrad function body
* Add DropoutGrad function body
* Fix documentation and add test cases
* Fix template specialization
* Check expansion for float16 and bfloat16
2021-04-13 14:31:53 -07:00
harshithapv
a5d3a52d1a
Add Tile grad ( #7289 )
...
* tile grad
* fixed bugs
* added tile grad test
* bug fix
* Added tests. Addressed comments
* added optimization recommended and addressed comments
* fixed comment
2021-04-13 12:54:45 -07:00
Edward Chen
ce9cd6ad9a
Update usage of generator expression $<COMPILE_LANGUAGE:L1,L2> which is not available in CMake 3.14. ( #7318 )
2021-04-13 11:18:34 -07:00
Hariharan Seshadri
2c96050336
Fix SDL warning ( #7331 )
2021-04-13 11:14:43 -07:00
Ahmad Zakaria
f34468a309
Fix TRT EP memory leak ( #7195 revisited) ( #7276 )
...
* pass trt_profile by pointer to pointer to avoid memory leak
* have 1 optimization profile per state instead of 1 per provider instance
2021-04-13 09:43:19 -07:00
Zhang Lei
f616ea632e
remove mlas unittest.cpp which is already refactored. ( #7319 )
2021-04-13 09:24:56 -07:00
Guoyu Wang
fce67e2b9b
Create Android Package pipeline ( #7295 )
...
* Create Android Package pipeline
* address CR comments
* Switch to jdk11
2021-04-12 17:56:25 -07:00
Sheil Kumar
b7c89ce78a
User/sheilk/add api usage telemetry ( #7320 )
...
* winml telemetry
* change name to ApiUsage
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-04-12 17:51:25 -07:00