* publish mloperatorauthor.h in the nuget
* build dmlep into arm/arm64 builds
* update to not use --use_dml everywhere, but enable custom ops everywhere
* always download directml nuget in winml builds
* always build with dml
* dont build dml for arm
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* fixed seg fault when using concrete shape
disable gradient as output
* fix evaluation hang issue for multiple gpu run
* Remove dead code, ORTModel and improve docstrings (#3814)
* Refine ORTTrainer docstring descriptions (#3907)
Description:
Fix 2 edge cases as described here: #3755 (comment)
Create a NodeArg for subgraph inputs even if they have no type. If they are only used as an implicit input to another level of nested subgraph we will not create a NodeArg via any other path
Allow an If output to have no shape. Obscure edge case where a loop carried dependency to a Loop node is passed through a nested If node subgraph (i.e. the Loop subgraph contains an If node with a nested subgraph for the else_branch/then_branch). We can't infer a shape for a loop carried dependency (they may change across iterations), which means we can't infer a shape for the nested If subgraph output either. We have delayed allocation support for If outputs so use that.
Motivation and Context
#3755
* update benchmark_gpt2 to use past state only
* update dynamic axes of input/output tensors
* Remove --use_openmp option since it is default for onnxruntime 1.3 cpu.
* Use same option names as benchmark.py
This class is already part of the protobuf-lite library. We don't need a copy here.
And if we do, we must ensure the signature of every function is exactly the same as the original. However, the upstream code may get changed over time. For example, recently protobuf added a "const" modifier to the FileInputStream::GetErrno(), which may break the build if a user want to use the latest protobuf.
* Enable optimizer on models with external data (>2GB)
* Refactoring optimizer: move fusion to separate file
* Update benchmark: (1) output datatime to csv (2) Add option --onnx_dir to benchmark.py for onnx model directory path (3) add gpt2-large (4) loose thrsholds for fp16 validation
* update optimizer (1) Add attribute of ConstantOfShape in fp16 conversion (2) Use OnnxRuntime level 1 optimization
* update bert_perf_test.py: rename --input_ids to --input_ids_name
Add transformer glue test example to show how to use ORTTrainer to fine-tune a transformer model
Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Pad: Add support for all datatypes in opset-11 spec
Pad opset-11 implementation supports:
int32, int64, float & double
Per specification, Pad opset-11 also supports:
uint8, uint16, uint32, uint64, int8, int16 & float16
This commit add support for those types to get full coverage of Pad opset-11 operator.
* Pad: Remove 16-bit datatypes support
These types are unused at the moment and binary size is impacted. Remove support for those type to lower binary size.
Clean up a CUDAExecutionProvider's associated PerThreadContext instances when that CUDAExecutionProvider is destroyed.
Revert workaround (introduced in #3767) to lazily initialize CUDA handles to avoid segmentation fault. For that case, the CUDA handle cleanup was happening quite a bit later than the CUDAExecutionProvider destructor. This should be a cleaner way to fix that.
* online partition
* fix when multiple consumer nodes is in cut info
* fix windows build
* address feedback
* adding test
* feedback
* address feedback
* add parser for cut edge
* windows build
* Add amd migraphx execution provider to onnx runtime
* rename MiGraphX to MIGraphX
* remove unnecessary changes in migraphx_execution_provider.cc
* add migraphx EP to tests
* add input requests of the batchnorm operator
* add to support an onnx operator PRelu
* update migrapx dockerfile and removed one unused line
* sync submodules with mater branch
* fixed a small bug
* fix various bugs to run msft real models correctly
* some code cleanup
* fix python file format
* fixed a code style issue
* add default provider for migraphx execution provider
Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>
* Optimize for OneHot with zero off value.
* Add test cases for indices out of range.
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
In this PR, we
1. create some APIs for creating NVTX objects
2. apply those APIs in pipeline-related operators and sequential executor.
As a result, we can explicitly see how a pipeline schedule is run by GPUs in
Nvidia's visual profiler. Note that these APIs are Linux only due to Nvidia's
limited support.
* Remove 'model_.' prefix for onnx model initializers in training
* fix test case remove redundant device test
* rename
* Fix state_dict/load_state_dict with frozen_weight
* nit
* Add monkey patch for pt opset 10
* remove pt patch in CI
* nit: newline
onnxruntime init failure due to wrong path of reading native libraries. In OS X 64 system, the arch name is detected as x86 which generates invalid path to read native libraries.
Exception java.lang.UnsatisfiedLinkError: no onnxruntime in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
at ai.onnxruntime.OnnxRuntime.load(OnnxRuntime.java:174)
at ai.onnxruntime.OnnxRuntime.init(OnnxRuntime.java:81)
at ai.onnxruntime.OrtEnvironment.<clinit>(OrtEnvironment.java:24)
Change training perf test build to use "docker" instead of "sudo docker". The training perf test build runs in an environment that supports calling "docker" and not "sudo docker".
* gpt2 training perf
* gpt2 training perf
* debug
* debug
* debug
* fix bug
* minor
* on comments
* dynamic sql
* fix build
* minor
* linked hash
* on comments
* minor
* mem
* minor
Co-authored-by: Ethan Tao <ettao@microsoft.com>