Commit graph

649 commits

Author SHA1 Message Date
Chandru Ramakrishnan
d8bcb3d6a4
Added virtual destructor to adasum_interface.h (#7882) 2021-05-30 11:11:10 -04:00
Ryan Hill
5a63904aa9
Remove some templated versions of functions that are no longer needed (#7868)
* Switch to non template version of function
2021-05-28 13:22:45 -07:00
baijumeswani
ddf4aaaae1
Resolve issue with wrapped ORTModule load_state_dict (#7847)
* Encapsulate children modules inside a ModuleAccessor object to prevent erroneous iteration over children while loading the state dictionary

* Add named_models, models, apply methods, change ModuleAccessor to ModuleMetadata and modify unit tests

* Change ModuleMetadata module getter logic, raise NotImplementedError for add_modules

* Add comment explaining why overriding _load_from_state_dict method is needed
2021-05-27 16:11:37 -07:00
Edward Chen
45a7352622
Update Mac CI builds to use macOS-10.15 image, Xcode 12.4. (#7437)
2021-05-27 09:39:34 -07:00
Sherlock
fc472a04be
Relax tol for Conv1D fp16 test (#7844)
* Relax tol for Conv1D fp16 test

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-26 17:04:35 -07:00
Thiago Crepaldi
c5ea5907c0
Fix permission error for ORTModule lock file (#7814) 2021-05-26 14:18:25 -07:00
harshithapv
4fe59c8b29
delete model_copy to save memory allocated in forward call (#7832)
* delete model copy

* add flag

* address comments

* address flag comment

Co-authored-by: root <root@OrtTrainingDev0.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-25 22:22:13 -07:00
Jesse Benson
29c68888af
Update BERT convergence baseline. 2021-05-25 17:11:46 -07:00
ytaous
ff655175ff
Eliminate no op node - add 0 (#7798)
* eliminate add 0

* typo

* rank check

* fix build

Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-25 13:01:34 -07:00
baijumeswani
13a129054f
Prevent unnecessary re-initialization of the graph when model has unused parameters (#7799) 2021-05-22 20:52:26 -07:00
baijumeswani
a6ca9f0a40
Use list comprehensions instead of list appends where possible (#7753)
* Use list comprehensions instead of list appends where possible

* Add OrtValueVector class as an opaque object in pybind

* Add dlpack methods to the OrtValueVector pybind class
2021-05-21 10:28:09 -07:00
Sherlock
2a02871157
Disable reuse for YieldOp's inputs (FW partial graph's output) (#7767)
* Disable reuse for YieldOp's input

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-20 21:39:36 -07:00
Peng
c2435d24ec
Clean up ROCm4.1 Dockerfile build directory (#7732)
* Clean up ROCm4.1 Dockerfile build directory

* remove the UCX and OMPI build directories after installation
2021-05-20 10:04:49 -07:00
Ryan Hill
c99aa3a3f3
Ryanunderhill/cuda shared (#7626)
* First iteration of making cuda a shared provider.
Separated out shared OpKernel change, so doing this to merge with that change.

* More cuda shared library refactoring

* More cuda shared library refactoring

* More build options tested, converted the training ops over.

* Fix merge breaks

* Fix submodules

* Fix submodules

* Fix submodules

* Fix python

* Fix compile errors

* Duplicate symbol fix

* Test fix for ROCM provider

* Another ROCM test workaround

* ROCM Build Test

* ROCM build fix

* ROCM

* ROCM

* ROCM

* ROCM

* ROCM

* ROCM test

* Reduce header dependencies

* Remove redundant namespace

* Test fix for linux

* Fix linux build

* Fix Eigen build error

* Fix unused parameter warning

* Test link error

* Another linker test

* Linker test

* Linker test

* Another test

* Another build test

* Fix linux link error

* Build test

* Fix control flow ops to use common base class with core code

* Remove extra qualifiers

* Fix template syntax for linux

* Fix cuda memory leak

* Fix pybind

* Test disabling cast

* Cleanup

* Restore cuda in test

* Remove more header dependencies

* Test not adding cuda provider to session

* Make GetProviderInfo_CUDA throw

* No-op cuda provider creation

* Fix some setup issues

* Fix memory cleanup on unload

* Diagnostics

* Don't unload library

* Add diagnostics

* Fix deleting registry at right time.

* Test disabling profiler

* Fix merge break

* Revert profiler change

* Move unloading of shared providers into Environment

* Free more global allocations before library unloads

* Add more diagnostics

* Move unloading back to the OrtEnv as there are multiple Environments created during a session.

Remove some library dependencies for tests.

* Fix more cmake files

* ERROR -> WARNING

* Fix python shutdown

* Test not using dml in pipeline

* Change python version and disable dml

* Update python version

* Test adding unload method for shared providers

* Disable DLL test

* Python test

* Revert "Python test"

This reverts commit c7ec2cfe98.

* Revert "Disable DLL test"

This reverts commit e901cb93aa.

* Revert "Test adding unload method for shared providers"

This reverts commit c427b78799.

* Point to RyanWinGPU

* Revert python version

* Fix id_to_allocator_map

* Another python exit test

* Remove extra debug messages
Try a more clean python shutdown through DllMain

* Revert DllMain idea, it didn't work

* Merge conflicts

* Fix merge with master issues.

* Comments

* Undo edit to file

* Cleanup + new training ops

* Revert yml changes

* Fix another merge error

* ROCM fix

* ROCM fix v2

* Put back Linux hack, it is necessary

* Stupid fixes

* Fix submodule out of sync

* ROCM fix 3

* ROCM 4

* Test java fix

* Fix typos

* Java test on my VM

* Fix build error

* Spotless fix

* Leave temp file around to load properly

* Fix cleanup on exit

* Fix break

* Java comments

* Remove LongformerAttentionBase workaround

* Spotless fix

* Switch yml back to regular build pool

* Revert "Switch yml back to regular build pool"

This reverts commit be35fc2a5a.

* Code review feedback

* Fix errors due to merge

* Spotless fix

* Fix minimal build

* Java fix for non cuda case

* Java fix for CPU build

* Fix Nuphar?

* Fix nuphar 2

* Fix formatting

* Revert "Remove LongformerAttentionBase workaround"

This reverts commit 648679b370.

* Training fix

* Another java fix

* Formatting

* Formatting

* For orttraining

* Last orttraining build fix...

* training fixes

* Fix test provider error

* Missing pass command

* Removed in wrong spot

* Python typo

* Python typos

* Python crash on exit, possibly due to unloading of libraries.

* Remove test_execution_provider from training build
Only enable python atexit on windows
Remove assert on provider library exit

* Still can't unload providers in python, alas.

* Disable Nvtx temporarily

* MPI Kernels for Training

* MPI Kernels part 2

* Patch through INcclService

* Oops, wrong CMakeLists

* Missing namespace

* Fix missing ()

* Move INcclService::GetInstance around to link nicer

* Missing }

* Missing MPI libraries for Cuda

* Add extra GetType functions used by MPI

* Missing Nccl library

* Remove LOGS statements as a test

* Add in a couple more missing GetType methods

* Update comments

* Missed a logging reference in mpi_context.h

* Convert aten_op to shared (due to merge with master)

* Test moving DistributedRunContext instance into shared provider layer
(with an intentional error to verify it's being built properly)

* Test passed, now with fix

* Missing static

* Oops, scope DistributedRunContext to just NCCL

* Merge related issues and code review feedback.

* Merge error

* Bump to rel-1.9.1 (#7684)

* Formatting

* Code review feedback for Java build on non Windows

* Remove cupti library dependency from core library

* Test Java pipeline fix

* Linux build fix

* Revert "Linux build fix"

This reverts commit a73a811516.

* Revert "Remove cupti library dependency from core library"

This reverts commit 6a889ee8bf.

* Packaging pipeline fixes to copy cuda shared provider for tensorrt & standard packages

* Add cuda to Tensorrt nuget package

* onnxruntime_common still has a cuda header dependency

Co-authored-by: ashbhandare <ash.bhandare@gmail.com>
2021-05-20 07:53:47 -07:00
Vincent Wang
47b3cc4bde
GatherGrad Bugfix (#7752)
* gathergrad bugfix

* fix win build
2021-05-19 18:53:57 +08:00
Thiago Crepaldi
e05b15175d
Add cpp ext lock file check during ORTModule init (#7740)
* Add cpp ext lock file check during ORTModule init

* Address comments
2021-05-18 12:57:05 -07:00
baijumeswani
e161213f8e
Handle model with no parameters (#7736)
* Handle model with no parameters

* Set the minimum module_output_grads as 0 to handle parameterless models
2021-05-18 09:33:57 -07:00
baijumeswani
c873f5589d
Fix bug where the output names were sorted lexicographically (#7709) 2021-05-17 10:27:20 -07:00
Thiago Crepaldi
6c41ed597b
Add custom autograd function to prevent input passthrough on ORTModule (#7694)
* Changes for investigation

* Gradient for Identity

* Keep Identity between YieldOp and GraphOutput

* Revert debugging changes

* Add custom autograd fn to prevent input passthrough on ORTModule

* Add comment

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-17 09:56:02 -07:00
Thiago Crepaldi
4fe2ffae16
Fix ORTModule python doc generation (#7704)
* Fix ORTModule python doc generation

* Address comment
2021-05-17 09:55:49 -07:00
ashbhandare
bfbcc89db1
Add MLFloat16 support for SoftmaxCrossEntropyLoss for CUDA EP (#7679)
* Forward op changes

* Add tests, improve kernel

* add opset 13 registration, remove unnecessary changes

* Add fp16 grad for SCELoss, review comments
2021-05-14 09:00:27 -07:00
baijumeswani
37f69fcee5
Regain performance by caching initializer names in ORTModule (#7685) 2021-05-13 20:54:49 -07:00
raviskolli
4b37901f10
Aten support for rocm (#7680)
* Aten support for rocm

* Removed aten_ops.cc as it is reused from the cuda version
2021-05-13 15:56:03 -07:00
Aswin John Mathews
4afdc19958
ROCm optimized layernorm for MI100 (#7682)
* layernorm optimizations

* Changed HIP flag from HIP_VERSION to __HIP_PLATFORM_HCC__
2021-05-13 15:54:06 -07:00
satyajandhyala
d90a99aad5
Fix the build on dev machines by replacing two-argument std::tuple with std::pair (#7683) 2021-05-13 15:11:51 -07:00
harshithapv
31ca21b782
Replace Where Grad "Mul" with "Where" (#7672)
* replace where grad mul with where

* clean up

* auto formatting

* remove not for second input
2021-05-13 08:54:43 -07:00
Vincent Wang
dac24f7d63
Add ATenOp and call aten::embedding and its Backward Op from ORT (#7590)
* build with libtorch and impl torchembedding

* fix op shape infer

* local commit

* atenfunctionop

* call aten operator from online extension

* rollback build.py

* resolve comments

* bugfix

* fix build

* fix ortmodule test

* remove external outputs, resolve comments

* resolve comments

* export embedding to microsoft::atenop

* bugfix
2021-05-13 09:24:27 +08:00
Weixing Zhang
9241f62e4c
enable MatMulScale and cast propagation for ROCm EP. (#7657) 2021-05-12 13:43:24 -07:00
M. Zeeshan Siddiqui
5d9885f706
Fix BadNames. (#7658) 2021-05-11 16:06:10 -07:00
baijumeswani
c5aeaa9419
Support for unused model initializers (#7631)
* Support for unused model initializers

* Change graph_info.initializer* to sets
2021-05-11 12:26:56 -07:00
satyajandhyala
9f69b2f291
Added InsertAndReduce strategy to PropagateCastOps transformation in addition to FloodFill strategy (#7454)
* Moved GraphTransformerConfiguration to a separate file and added strategy option to PropagateCastOps transformation.

* Added tests for both FloodFill and InsertAndReduce strategies for cast propagation.

* Added AddConsumer and RemoveConsumer functions to graph.h for efficient graph editing.

* Added PropagateCastOps code documentation

* Added GraphTransformationConfiguration class hierarchy information

* Added RemoveInputOutputUpDownCasts
2021-05-10 20:46:28 -07:00
baijumeswani
08fbfe9607
Resolve issue where a registered buffer was parsed incorrectly as a user input (#7617) 2021-05-10 19:04:27 -07:00
Pranav Prakash
a684e9aa52
Add pre-training transform to convert BatchNorm to BatchNormInternal (#7539)
* Add transformer for BatchNorm -> BN Internal

* Add test for BN replacement transformer
2021-05-10 15:13:59 -07:00
baijumeswani
88c95ef06b
Support for primitive types in ortmodule (#7588) 2021-05-10 10:59:47 -07:00
Hariharan Seshadri
4b691a5c0d
Add ability for memory arenas to "shrink" periodically (#7284) 2021-05-08 07:53:21 -07:00
Scott McKay
9fc4116d51
Use ASSERT_STATUS_OK so the error message is output if there's a failure. (#7515) 2021-05-07 20:23:34 +10:00
Vincent Wang
0c91b643fe
Bugfix for Scatter and GatherElementsGrad (#7593)
* bugfix for scatter and gather elements grad

* resolve comments
2021-05-07 14:02:26 +08:00
Derek Murray
94c97ac8c2
Fix compiler warnings treated as errors in GistEncodeDecode. (#7568)
* Fix compiler warning in GistEncodeDecode.

* Fix other use of member variable.

* Make `compression_type_` const.

* Change floor to floorf in CUDA code.

* Statically cast size_t to int in GIST CUDA kernels

* Add explicit cast to `long` in gist.cc

Co-authored-by: Derek Murray <demurra@microsoft.com>
2021-05-05 09:05:11 -07:00
Xavier Dupré
ade6ed51eb
Speed up Reduce operators for consecutive reduced axes (#7206)
* Improves Reduction for three specific configurations
* Support ReduceMean
* add ReduceMax, ReduceMin
* refactoring
2021-05-05 09:14:00 +02:00
Sergii Dymchenko
a647da3e1a
Fix 2 input Gemm grad (#7561)
* Add test for 2 input Gemm grad.

* Fix 2 input Gemm grad.
2021-05-04 12:00:14 -07:00
harshithapv
d812354ebd
Tile grad fix (#7556)
* tile grad fix

* code clean up
2021-05-04 11:16:26 -07:00
Fanny Nina Paravecino
c3c4db2c1b
Upgrade GIST memory compression nodes, kernels, optimizer rule, and cli (#6262)
* Add gist nodes, kernels, optimizer rule, and cli

* Add Gist CUDA kernels

* Added/updated gist compression cli to bert, gpt2, mnist

* Fix decode priority generator for large models

* Fix hardcoded decode priority generator, update gist training test

* Fix incomplete if/else sequence for CI build

* Added MSFP15 for gist compression type

* fix Msfp15 bug

* Resolved azure pipeline errors - unsupported ORT_RETURN macro format, cudastream argument

* Resolved hardcoded cudastream argument, Pack8 zero error

* Resolved PR comments - except gist tests

* Added TypeInference to Gist Nodes, To attribute to Gist Decoder, Updated Gist Test Cases

* Reverted error in merge commit

* Updated logger usage in Gist rule, Updated GistPackMSFP15 compressed tensor's explanation

* Converted onnxruntime::make_unique to std::make_unique based on PR 7502

Co-authored-by: Fanny Nina Paravecino <faninapa@microsoft.com>
Co-authored-by: Aayush Ankit <aayushankit@microsoft.com>
Co-authored-by: Aayush Ankit <Aayush-Ankit@users.noreply.github.com>
Co-authored-by: Fanny Nina Paravecino <fanny.nina@microsoft.com>
2021-05-04 10:33:35 -07:00
Sherlock
c1ed647170
ORTModule enable run_symbolic_shape_infer by default (#7423)
* ORTModule enable run_symbolic_shape_infer by default

* Fix UTs by replacing Relu with Softmax

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-04 10:08:14 -07:00
Sherlock
6714f2f85d
Improve tol value logging in ORTModule test (#7544)
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-05-03 09:43:40 -07:00
Pranav Prakash
8ba6ed953f
Fix batch norm training op on CPU (#6946)
* Fix batch norm training op on CPU

* Add BatchNorm 14 Op Support

* Update hashes for BN

* Exclude TRT and OpenVINO for BatchNorm training test
2021-05-01 11:25:19 -07:00
Sherlock
668a65f1a7
Complete GetGlobalAveragePoolGradient (#7514)
* Improve GetGlobalAveragePoolGradient

Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-04-30 18:04:01 -07:00
Thiago Crepaldi
9ba9da0c95
Fix unused registered buffers issue on ORTModule (#7525) 2021-04-30 13:50:23 -07:00
Tang, Cheng
54db6648af
kernel invoker api for eager mode (#7473)
* initial draft for kernel invoke api

* initial implementation of kernel invoker

* [eager] fix build on Mac

* [eager] increment input name in kernel invoker

* temp fix for type in eager mode

* use global default log manager

* rollback the previous commit since it break linux build

* Revert "rollback the previous commit since it break linux build"

This reverts commit 58c2c3423a.

* Eager Mode: fix linking on macOS

* optimizer_execution_frame: ignore unused lambda capture (model_path)

* fix link issue

* ORTInvoker: set correct input argument tensor element proto types

Do not set a type proto on output arguments to allow ORT to deduce them

* ORTInvoker: create only one logging manager

* Minor fix to set execution provider type correctly. (#7000)

Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>

* training fix

* support config output ml values in frame, so we can use it to implement inplace update

* Fix range loop error while building. (#7087)

Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>

* Conditionally link with nsync_cpp if not windows. (#7151)

Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>

* Fixed initialization order in ORT kernel invoker (#7342)

* Updated constructor of ort_kernel_invoker to take a logger.

* Changed linking order.

* Updated test.

* add inplace ut

* add build option

* Update include/onnxruntime/core/eager/ort_kernel_invoker.h

Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>

* resolve comments in pr

* fix build break; merge from master

* fix build break

Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Aaron Bockover <abock@microsoft.com>
Co-authored-by: Chandru Ramakrishnan <41447659+chandru-r@users.noreply.github.com>
Co-authored-by: Chandru Ramakrishnan <chandru-r@github.com>
Co-authored-by: Derek Murray <Derek.Murray@microsoft.com>
2021-04-30 13:33:58 -07:00
Changming Sun
1012535dab
Change onnxruntime::make_unique to std::make_unique (#7502)
1. Change onnxruntime::make_unique to std::make_unique
2. Add "-std=c++14" to ROCM EP's build flags.
2021-04-29 17:04:53 -07:00
sabreshao
e6a3308db7
Optimize cuComputeGradInput performance. (#7479)
Move the checking of gamma to host and specialize both cases through a template.
2021-04-28 17:08:31 -07:00