Commit graph

6653 commits

Author SHA1 Message Date
pengwa
9765ef8b4e
fix build warnings (#11213)
* fix build warning
2022-04-18 21:09:09 +08:00
Vincent Wang
0bad5b1b5a
[CUDA] Rollback TileMemcpy and TileBatchedMemcpy when Block Size is Small (#11187) 2022-04-16 07:46:43 +08:00
George Nash
d9eeb48393
One dnn v2.6 update (#11220)
* Disable training code in DNNL LayerNorm code

The capability code already does not claim LayerNorm and
SkipLayerNorm nodes that require more than one output. However,
building with training enabled was causing issues.

The training-specific code has been removed, even when building with
training enabled.

Signed-off-by: George Nash <george.nash@intel.com>

* Fix for DNNL FusedMatMul op.
The bug was in the transpose code.

Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>

* Use agreed-upon memory format type when running Pooling Gradient in dnnl ep

The dnnl ep does not currently have a way to pass memory_format information
from the forward pooling primitive to the backward pooling primitive.

This change explicitly sets the memory_format to match that of Onnxruntime
for both the forward and backward pooling code. This prevents using an unmatched
memory format that could result in an `unimplemented` error from dnnl ep.

Signed-off-by: George Nash <george.nash@intel.com>

* Update dnnl ep to use OneDNN v2.6

Do not run ReduceInfLogSum on the kDnnlExecutionProvider due to a
calculation bug when computing the Log of infinity values. The fix for this
issue will be part of the next OneDNN release.

Signed-off-by: George Nash <george.nash@intel.com>

* Update PrintMemory function in dnnl ep

This modification can be used to enable/disable memory printing
for dnnl ep developers. This is considered a developer-only feature
and is disabled by default. It must be enabled and the code recompiled
to use it.

Even if it is enabled, it will not actually print any memory because
the developer needs to take the extra step of specifying the memory
that will be printed to the screen.

Signed-off-by: George Nash <george.nash@intel.com>

* Update binary ops to run on intel GPU when using dnnl ep

Binary ops (i.e. Add, Div, Mul, and Sub) were updated to no longer
call GetMemoryAndReshape. In the past, this call moved the memory from
the CPU to the GPU. It is no longer needed since this is taken
care of by the GetMemoryInOrtFormat call. Removing GetMemoryAndReshape
prevents copying the memory to the GPU twice.

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com>
2022-04-15 12:51:11 -07:00
sumitsays
227bc7264e
Fixed compilation error for ARM architecture (#11223)
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
2022-04-15 09:24:21 -07:00
ytaous
bc296c706e
MatMulScaleFusion - handling scale input (#11121)
* scale input

* more condition check

* alternative

* per comments

* fix comments

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-04-14 21:54:04 -07:00
Yi Zhang
94032357e2
use int storage (#11185) 2022-04-15 09:56:36 +08:00
Ahmad Zakaria
63ff391b16
add AppendExecutionProvider_CUDA_V2 to the C++ api (#11153) 2022-04-14 17:33:27 -07:00
chausner
c2b4054c74
Fix typos 2022-04-14 13:53:50 -07:00
stevenlix
5216a43c9d
Consolidate TensorRT subgraphs to reduce inference overhead (#11211)
* add trt node list consolidation

* add more log

* fix typo

* separate cycle detection and removal

* update

* change function name

Co-authored-by: Ubuntu <azureuser@orttrtlinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
2022-04-14 11:05:27 -07:00
Faruk D
a00d24066a
Fix CITATION.cff and add automatic validation of your citation metadata (#10478)
* Add cffconvert.yml to validate CITATION.cff

* Fix CITATION.cff by removing duplicate title and correcting the license

Co-authored-by: Abel Soares Siqueira <abel.s.siqueira@gmail.com>
2022-04-13 10:03:52 -07:00
Vincent Wang
9707181257
fix build error (#11199) 2022-04-13 13:09:19 +08:00
Scott McKay
3b3b23bcf9
Add new python helper dirs to wheel. (#11196) 2022-04-13 13:34:07 +10:00
Chen Fu
0d0edc071f
Detecting ARM64 CPU core micro-architectures in Windows (#11145)
Some micro-architectures of power-efficient cores in ARMv8 systems have narrow 64-bit load/store resources, which require specialized computing kernels in MLAS. We leverage the PyTorch cpuinfo package to detect these cores. Unfortunately, the cpuinfo package does not work on Windows.

This commit implements ARM64 micro-architecture detection on Windows.
2022-04-12 16:47:11 -07:00
ashbhandare
ddb17294b2
Fix gradient builder for Cast (#11008)
* fix grad builder for cast

* review comments

Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-04-12 16:08:21 -07:00
Gary Miguel
e84c338989
minor improvements to CONTRIBUTING doc (#11080) 2022-04-12 15:22:34 -07:00
Faith Xu
5337972f92
Update to use teams instead of individual GH handles (#11163)
* Update to use teams instead of individual GH handles

* Fix typo

* Update CODEOWNERS

* Update CODEOWNERS

* Update team name
2022-04-12 12:06:12 -07:00
Edward Chen
38e67e66a2
Add script and Dockerfile to build custom Android package (#11144)
* Handle relative paths in --include_ops_by_config.

* Add dockerfile.

* update comments

* refine

* update perms

* refine

* wording

* Change readme to md file, add link to docs site.
2022-04-12 10:16:10 -07:00
RajalakshmiSR
e397d8e63e
POWER: Optimize MlasTranspose functions (#11172)
This patch makes use of POWER vector intrinsics to improve performance
of MlasTranspose functions.

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2022-04-12 09:51:20 -07:00
Xavier Dupré
833f5d5604
Remove dependency on EP TVM in unit test project (#11170) 2022-04-12 09:03:57 +02:00
Ryan Hill
625cc0ab99
Add Initialize() to shared providers to allow for reload (#11066) 2022-04-11 22:58:50 -07:00
Changming Sun
8237568b65
Fix the rocm packaging pipeline package upload problem (#11174)
In #11114, I changed the script to use azcopy instead of the Azure Blob Storage Python APIs. However, that doesn't work for the AMD ROCm pipeline, because:

1. The machines do not have azcopy installed.
2. The machines are not in Azure, so they don't have an Azure managed identity and still need to use SAS.

Therefore, this PR brings back the old Python file, but uses it only in the AMD pipeline.
2022-04-11 13:59:44 -07:00
dependabot[bot]
04fe1bd2ed
Bump electron from 12.2.3 to 13.6.6 in /js/web (#10978)
Bumps [electron](https://github.com/electron/electron) from 12.2.3 to 13.6.6.
- [Release notes](https://github.com/electron/electron/releases)
- [Changelog](https://github.com/electron/electron/blob/main/docs/breaking-changes.md)
- [Commits](https://github.com/electron/electron/compare/v12.2.3...v13.6.6)

---
updated-dependencies:
- dependency-name: electron
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-11 12:51:56 -07:00
Olivia Jain
ae243c2bb5
Pull Nightly Wheel File and Cleanup Perf (#11164)
* delete unused files

* only use one dockerfile, otherwise install

* Update pipeline file

* get other changes

* minimal packages

* update pull nightly variable

* try logical boolean

* test boolean

* have build ort as boolean

* case sensitive

* use the current head not the previous commit

* add helpful note
2022-04-11 11:41:11 -07:00
Yi-Hong Lyu
749c0ddd1e
Upsample support NHWC (#10824)
This patch implements bilinear interpolation for Upsample/Resize 4-D input with
the outermost and innermost scales (usually the batch and channel dimensions of NHWC) equal to 1. It is
parallelized over output_height * output_width instead of over one dimension only.

Besides, I also reverted HandleResize back to the original implementation for
the TransposeOptimizerTests.TestResize* tests.

Finally, I added the microbenchmark BM_NhwcUpsampleBilinear.
2022-04-11 11:39:17 -07:00
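The NHWC Upsample change above handles 4-D input whose batch and channel scales are 1 and parallelizes over output_height * output_width. A minimal pure-Python sketch of that loop structure follows, assuming Upsample's asymmetric coordinate mapping (input coordinate = output coordinate / scale) with clamping at the border; the function name and nested-list layout are hypothetical, and this is not the ORT kernel.

```python
def upsample_bilinear_nhwc(x, scale_h, scale_w):
    """Bilinear upsample of a 4-D NHWC nested list; N and C scales are assumed to be 1."""
    n, in_h, in_w, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    out_h, out_w = int(in_h * scale_h), int(in_w * scale_w)
    y = [[[[0.0] * c for _ in range(out_w)] for _ in range(out_h)] for _ in range(n)]
    for b in range(n):
        # One flat index over output_height * output_width; every iteration is
        # independent, which is what makes this loop easy to parallelize.
        for idx in range(out_h * out_w):
            oy, ox = divmod(idx, out_w)
            # Asymmetric mapping: input coordinate = output coordinate / scale,
            # clamped to the last valid input row/column.
            iy = min(oy / scale_h, in_h - 1)
            ix = min(ox / scale_w, in_w - 1)
            y0, x0 = int(iy), int(ix)
            y1, x1 = min(y0 + 1, in_h - 1), min(x0 + 1, in_w - 1)
            dy, dx = iy - y0, ix - x0
            # Channels are contiguous in NHWC, so they form the innermost loop.
            for ch in range(c):
                top = x[b][y0][x0][ch] * (1 - dx) + x[b][y0][x1][ch] * dx
                bot = x[b][y1][x0][ch] * (1 - dx) + x[b][y1][x1][ch] * dx
                y[b][oy][ox][ch] = top * (1 - dy) + bot * dy
    return y
```

Because each flat output index is independent, the `for idx` loop is the natural one to split across threads, while the channel loop stays innermost to match the NHWC memory order.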
Edward Chen
269be2fe63
Remove unnecessary option from convert_onnx_models_to_ort.py, fix old instructions. (#11088)
Remove unnecessary --nnapi_partitioning_stop_ops option from convert_onnx_models_to_ort.py, fix old instructions.
2022-04-11 11:19:21 -07:00
Tianlei Wu
00b595e389
move longformer and t5 to models subdirectory (#11161)
* move longformer scripts to models subdirectory
* Copy transformers\models\t5 to python package as well
2022-04-09 22:35:14 -07:00
Erick Muñoz
f24523e0eb
Enable LayerNorm and SkipLayerNorm in OneDNN EP (#11128) 2022-04-08 23:10:13 -07:00
liqun Fu
d96230065e
fix code error in function.cc (#11148) 2022-04-08 10:04:21 -07:00
Dmitri Smirnov
12c687f594
Rework initializer.cc to eliminate code duplication (#11131)
Rework initializer.cc to eliminate code duplication and add type enforcement.
Address review comments. Add literal operators for MLFloat16 and BFloat16, and tests.
2022-04-08 09:42:31 -07:00
Vincent Wang
bcc62e0cbf
move some process out of training step (#11150) 2022-04-08 17:30:11 +08:00
Lukas
4c37f15c1b
Find boost, nsync, json, cpuinfo system libs with CMake option onnxruntime_PREFER_SYSTEM_LIB (#11146) 2022-04-08 00:11:02 -07:00
Weixing Zhang
0aaf3a676a
Update reduce norm1/norm2 and layernorm kernels with ROCm 4.3.1 (#9399)
* update layernorm to reflect the fix in ROCm 4.3.1

* fix UT

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-04-07 22:54:12 -07:00
Lukas
1b664e6d4c
Link cpuinfo only if supported (#11147)
* Remove unnecessary target_include_directories for cpuinfo

Headers already exposed as public by CMake target: 5916273f79/CMakeLists.txt (L213)

* Link to cpuinfo library only if supported
2022-04-07 21:32:12 -07:00
IkerAriz
541eff8d89
Directly use memory mapped data for external data initializers (#11127) 2022-04-07 18:00:43 -07:00
Baiju Meswani
5637f17189
Remove python frontend codeowners (#11143) 2022-04-07 15:57:30 -07:00
Justin Stoecker
7609694464
Enable building with a GDK (#11126) 2022-04-07 15:06:31 -07:00
Changming Sun
4983d6e5d6
Call pluggable EP's shutdown function in Environment::~Environment() (#11120)
I disabled some tests temporarily. I will move them to a separate executable file in another PR.

In the future, I want to combine the onnxruntime::Environment and OrtEnv classes. Currently we have 3 env classes, which is too confusing:

1. onnxruntime::Env
2. onnxruntime::Environment
3. OrtEnv

Our python binding uses onnxruntime::Environment, while all other language bindings use OrtEnv. So python doesn't unload EPs but the others do. It's better to make them consistent.

Please note that even though I added the call, the unload function is still a no-op on Linux. So, currently, on Windows we must unload the EPs, while on Linux we must not.
2022-04-07 14:11:29 -07:00
Dmitri Smirnov
2700261f7c
Provide an API to supply external initializers data from user buffers (#11109)
Implement AddExternalInitializers
2022-04-07 12:21:53 -07:00
ytaous
eec5187801
Remove Rocm 4.2 from CI Build (#11130)
* remove rocm42 CI

* update torch to v1.11.0

Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-04-07 11:42:09 -07:00
Vincent Wang
e612018127
[CUDA] Tile Kernel Optimization (#11053)
* tile cuda kernel optimization

* resolve comments and fix win build error
2022-04-07 14:59:54 +08:00
Changming Sun
26fceca90f
Update tools/ci_build/upload_python_package_to_azure_storage.py to not use the azure blob storage python package (#11114) 2022-04-06 14:30:51 -07:00
Maajid khan
81fa28bc56
OpenVINO-EP v4.0 Release PR with OpenVINO 2022.1 (#11025)
* Enabling ov-ep for 2022.1 Release

->Added ov-ep 2022.1 flow
->Validated CPU Unit tests with OV
Master using onnxruntime_test_all unit
tests.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix for output mismatch b/w OpenVINO and ONNX

Refer:
https://jira.devtools.intel.com/browse/CVS-60310

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enabling Adobe ops

->Enable Resize op for iGPU
->Enable Add op for iGPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing irrelevant conditions

->Removing some conditions from
GetCapability() which are now not
required. (Removed conditions for
OV version support less than 2021.2)

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable upsample op

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Enable Adobe proxy-e model

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Removing any extra conditions for Opset13 ops

* Opset13 changes

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Exception handling for devices

* Added comments

* Implement GPU Throttling feature

*Added GPU Throttling feature for iGPUs.
When the user enables it as a runtime option,
it helps reduce the overall CPU usage
of the application.

*Added changes to exercise this option
using the onnxruntime_perf_test application.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Renaming the runtime config option

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added the user to video and users group

* Handling_GPU.0_GPU.1

* Handling special conditions

->Handling corner cases for
device_type checks

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Modification to include new api 2.0 changes in the code

* Added opset13 changes

->Enabled Few ops
->Added Debug info for case 3b in getcapability()

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Log comments updated

* Changes to enable 2.0 api

* Fix build issue

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes issues

*Fixes compiler warnings c4458 on windows.
*Fixes the bug in device_type check logic
*Adds print info for enable_opencl_throttling
option in onnxruntime_perf_test

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* commit to make openvino_2021.4 compatible

* Fixed IO Buffer Optimization

* Fix output names issue

* Fix 2021.3 branch

* Bug Fix for Multiple inputs/outputs

- Assigns the right output_name and
input_name for the graph when
returned by CompiledModel::inputs()
OV function.

- Also takes care of the output mismatch
issue b/w OpenVINO output and ONNX
output

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add comments for the changes made

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* IO Buffer Changes

* Commit for Disabling GPU Throttling for 2021.4

* Updated branch

* Fix windows build

->Fixed windows build in debug mode
->Disabled scatternd3_tensor_int64

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed CPP Unit tests for CPU

-Fixed shrink, MVN, ReduceL2, Maxpool,
upsample, scatter, slice, reshape,
unsqueeze.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed first set of GPU Tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed additional failing tests on GPU

->Added conditions to disable certain ops
under certain conditions

->Disabled certain tests

->Added some op supports for no_dimension
supported

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added Expand op support for CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added condition for squeeze op

->Shape can't have empty axes attribute

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add support for LessOrEqual op function

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* OV Interface wait for replaced by indefinite wait call

* use names from ONNX model to access OV tensors

This change uses the input/output names
retrieved from the original ONNX model to access
OV tensors and to check if there's any input
or output name mismatch b/w ONNX naming
and OV naming.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes Myriad unit tests and other issues

->Fixes Myriad CPP unit tests
->Fixes output mismatch issue with models with
sub graph partitioning

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fix segfault issue

->Fixed case 3b condition in get_capability()
which was causing the segfault issue

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed build issue with ov 2021.4 with I/O buffer

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disables performance counters for I/O Buffer

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed inputs/outputs mismatch for HDDL with 2022.1

Signed-off-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>

* Fix to enable GPU FP16

* Enabled mlperf_ssd_mobilenet_300 model fully on CPU

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added ov version specific dll packaging for nuget

* Fixed conditions for few ops

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Dockerfile updates

* Updated License Info

-Updated the copyrights License Info
-modified FP16 transformations with OV 2022.1

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disabling mlperf_ssd_mobilenet_300 model

->Disabled this model for openvino. The
test is failing in Internal_CI pipelines.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Disabling failing python CPU Tests

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed flake8 python errors

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: hdgx <harinix.d.g@intel.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>
Co-authored-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>
2022-04-06 13:30:33 -07:00
Xavier Dupré
3f42665a40
Improve transfer time from ort to torch (#9610)
* Improve transfer time from ort to torch
* Use static_cast
* fix call to Python API for python <= 3.8
* investigation
* fix ref counts
* disable import if no training
* one function to convert multiple ortvalues
* add proto_type
* enforce dlpack->deleter to be not null
* fix _ortvalues_to_torch_tensor for eager mode
* rename proto_type into element_type in the Python API
* conversion from ort to torch 2x times faster
* fix conversion of list of OrtValue
* replace has_bool_tensor by bool_tensor_indices
* introduce _ortvalues_to_torch_tensor_list
* use _ortvalues_to_torch_tensor_list for cache
* fix ambiguity between c and python classes

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2022-04-06 09:12:58 +02:00
Scott McKay
58d97691ac
Set dims for constant with multiple values (#11116)
* Also fix issue with data transfer not handling Tensor<std::string> correctly.
2022-04-06 07:39:07 +10:00
Abhishek Jindal
91c940b619
adding fill scalar for torch ones direct initialization on ort device (#10898)
* adding fill scalar for torch ones direct initialization on device and adding test case for it

* using ConstantOfShape for implementing fill Scalar in atenops

* adding case for handling at::Tensor attribute

* handling the at::Tensor type for ConstantOfShape

* handling the at::Tensor type for ConstantOfShape with attr type

* handling the at::Tensor type case

* converting the data to tensor in case of aten tensor mapping is needed

* handling aten tensor case

* handling aten tensor case and reversing the string case

* changing type of scalar
2022-04-05 11:17:25 -07:00
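The commit above implements a scalar fill (as used by `torch.ones`) with the ONNX ConstantOfShape op. Per the ONNX spec, ConstantOfShape takes a 1-D int64 input holding the output shape and a one-element `value` attribute (default float 0), and returns a tensor of that shape filled with the value; an empty shape yields a scalar. A minimal pure-Python sketch of those semantics using nested lists, where the function name is hypothetical:

```python
def constant_of_shape(shape, value=0.0):
    """Return a nested list of the given shape with every element set to value."""
    if len(shape) == 0:
        return value  # an empty shape input yields a scalar (rank-0) result
    if any(d < 0 for d in shape):
        raise ValueError("shape entries must be non-negative")

    def build(dims):
        # Innermost dimension: a flat list of copies of the fill value.
        if len(dims) == 1:
            return [value] * dims[0]
        # Outer dimensions: independent sub-lists, so rows never alias each other.
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(list(shape))
```

For example, `constant_of_shape([2, 3], 1.0)` gives the nested-list equivalent of `torch.ones(2, 3)`.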
G. Ramalingam
2c2408814f
Add function body for SoftmaxCrossEntropyLossGrad (#10779)
* Add function definition for SoftmaxCrossEntropyLossGrad

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Cleanup

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Eliminate unused variable

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix index of weight tensor

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* A few fixes to handle typing and weight

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix for zero D dimensions

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Add function body to internal op also

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* A few fixes

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix type variable name

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix type constraint var

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix ignore_index handling in testcase

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

* Add fun def for SoftmaxCrossEntropyLossInternal

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

* Specify opset

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

* Handle opset in NLL function

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Address PR feedback

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Modify onehot

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Eliminate duplicate statement

Co-authored-by: Ganesan Ramalingam <grama@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2022-04-05 10:52:40 -07:00
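For reference, with reduction='mean', an optional per-class weight and ignore_index, the gradient that a SoftmaxCrossEntropyLossGrad function body has to express is dL/dlogits[i] = dY * w[label_i] * (softmax(logits[i]) - onehot(label_i)) / sum(w[label_k]) over non-ignored rows, with ignored rows getting zero gradient. The pure-Python sketch below states that formula and can be checked against finite differences; it is not the ONNX function body added in #10779, and the function names are hypothetical.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def sce_loss_mean(logits, labels, weight=None, ignore_index=None):
    """Forward: weighted mean of -log softmax(logits)[label], skipping ignore_index.
    Assumes at least one label is not ignored."""
    num = den = 0.0
    for row, lab in zip(logits, labels):
        if lab == ignore_index:
            continue
        w = 1.0 if weight is None else weight[lab]
        num += -w * math.log(softmax(row)[lab])
        den += w
    return num / den


def sce_loss_mean_grad(dy, logits, labels, weight=None, ignore_index=None):
    """Backward: dL/dlogits = dy * w[label] * (softmax - onehot(label)) / sum(w)."""
    den = sum((1.0 if weight is None else weight[lab])
              for lab in labels if lab != ignore_index)
    grad = []
    for row, lab in zip(logits, labels):
        if lab == ignore_index:
            grad.append([0.0] * len(row))  # ignored rows contribute no gradient
            continue
        w = 1.0 if weight is None else weight[lab]
        p = softmax(row)
        grad.append([dy * w * (p[j] - (1.0 if j == lab else 0.0)) / den
                     for j in range(len(row))])
    return grad
```

Since the weights depend only on the labels, they are constants with respect to the logits, which is why they appear as plain multipliers in the gradient.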
Ben Niu
20fbf603d3
Fix ARM64EC build breaks (#11111)
Apply commit 4c015dbb49 to fix ARM64EC build breaks.
2022-04-05 10:00:42 -07:00
Erick Muñoz
25fdf8b167
Add Dequantize Linear operator on OneDNN EP (#11036) 2022-04-05 08:32:26 -07:00
Baiju Meswani
8db180c245
orttraining cuda 10.2 to not build for compute_80 (#11103) 2022-04-04 17:22:05 -07:00
Jack·Boos·Yu
01631893cd
[cmake] Re-factor pre-compile header usage (#11093) 2022-04-04 16:28:34 -07:00