onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-09 17:28:58 +00:00

Author	SHA1	Message	Date
Rachel Guo	db4e664f7c	Re-enable react native e2e android unit test for CI and upgrade targetSDK level for test project (#14989 ) ### Description Re-enable the react native e2e android unit test for react native CI, as the recent change of specifying `default` instead of `google-apis` in android emulator CI tests gives fairly stable results for now. Upgrade the targetSDKversion for gradle test project in react-native/android to meet minimum target api level requirement for Google Play apps. https://support.google.com/googleplay/android-developer/answer/11926878?hl=en ### Motivation and Context React Native CI issue.	2023-03-14 13:35:38 -07:00
Alex Kogan	8b09702b88	Enable parallel computation in Clip ops (#14925 ) ### Description This PR speeds up Clip operations by replacing their sequential implementation with a parallelized one. The parallelization is achieved by dividing the input data into chunks of size N and using a thread pool to process the chunks in parallel. The chunk size N is set to 16K based on performance evaluation on input tensors of 10^i elements for i in [1 .. 6]. ### Motivation and Context The Clip operation is frequently executed in image processing models. Its implementation can be easily parallelized and therefore sped up when executed on a multi-core machine. On long inputs (>= 100K elements) this PR achieves speedup of over 2x. On shorter inputs, this PR does not introduce any substantial performance change.	2023-03-14 09:41:44 -07:00
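The chunking scheme described in the Clip commit above can be sketched as follows. This is a minimal illustration, not the ORT kernel: the names `clip_chunk` and `parallel_clip` and the `max_workers` parameter are hypothetical, and only the 16K chunk size comes from the commit message. Pure-Python threads will not reproduce the reported speedup; the sketch only shows how the input is partitioned and reassembled in order.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 16 * 1024  # 16K elements per chunk, the value quoted in the commit message


def clip_chunk(data, start, end, lo, hi):
    """Clip data[start:end] into the range [lo, hi] and return the clipped slice."""
    return [min(max(x, lo), hi) for x in data[start:end]]


def parallel_clip(data, lo, hi, chunk_size=CHUNK_SIZE, max_workers=4):
    """Split data into fixed-size chunks and clip each chunk on a thread pool."""
    # Half-open [start, end) bounds; the last chunk may be shorter than chunk_size.
    bounds = [(s, min(s + chunk_size, len(data)))
              for s in range(0, len(data), chunk_size)]
    out = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order, so chunks stay in sequence.
        for part in pool.map(lambda b: clip_chunk(data, b[0], b[1], lo, hi), bounds):
            out.extend(part)
    return out
```

Inputs shorter than one chunk go through a single task, which matches the commit's note that short inputs see no substantial change.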
PeixuanZuo	2ff7f3e93a	[ROCm] support optimized Stable Diffusion model (#14980 ) Add BiasSplitGelu/BiasAdd/GroupNorm/NhwcConv operator for ROCm EP. 1. BiasSplitGelu and BiasAdd operators can be automatically hipified from CUDA EP. 2. GroupNorm was hipified from CUDA EP and modified to build. 3. NhwcConv is similar to NhwcConv in CUDA EP, But the MIOpen API and cuDnn API are different. `miopenConvolutionForwardbias` and `miopenOpTensor` of MIOpen doesn't support NHWC layout now, use BinaryElementwise to replace miopenConvolutionForwardbias(NHWC layout).	2023-03-14 23:15:37 +08:00
PeixuanZuo	ff2850029b	[ROCm] refact SkipLayernorm long if-elseif statements (#14795 ) Refact SkipLayernorm long if-elseif statements.	2023-03-14 23:04:55 +08:00
Ye Wang	0fa00429d5	[T5 optimization] script fusions and fixes (#14967 ) ### Description 1. added script for t5 encoder self attention and t5 decoder self/cross attention fusions. 2. added simplified layernorm fusion for --external_data_format scenario. (otherwise relying on ORT optimizer) 3. added rel_pos_bias shape inference code, modified attention/mha shape inference script. 4. reworked graph_topologic_sort() because the current implementation is not functioning correctly. also added an option to topo-sort the graph in a deterministic way to let tests pass. note: 1. the t5-beamsearch export code is slightly modified. specifically, encoder_hidden_states(ehs) is no longer an input to the t5 decoder since the ehs is not actually used in the graph execution. 2. recent PRs do not add optimizations to t5 on cpu. 3. the fp32 model(encoder and decoder) for t5-small, t5-base and t5-large can get a parity of e-5 and the corresponding beam search models generate the same results as pytorch. 4. fp16(mixed-precision) models, however, get a parity around 3e-2 and some have a maximum diff a bit over 3e-2. But the beam search models still generate the same results as pytorch (based on limited input data) 5. mt-5 model has a parity issue at the moment, even before any optimization. will investigate later. ### Motivation and Context --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-03-13 23:35:56 -07:00
Christian Veenhuis	59dfcfdce7	Fix typos in sources: operater, tranform, neccessary, trainig (#14907 ) ### Description While browsing the sources I found several typos here and there. I collected them to a single PR and fixed them. Namely these typos are: operater, tranform, neccessary, trainig. After fixing none of them was found anymore: $ git grep "operater" $ git grep "tranform" $ git grep "neccessary" $ git grep "trainig" $ ### Motivation and Context Since some of the typos are in example notebooks and markdown files, users can see them.	2023-03-13 22:45:04 -07:00
Ye Wang	538d64891a	[t5 optimization] kernel changes to t5 (#14928 ) ### Description 1. support optional bias in Attention op (used in T5 encoder) 2. support broadcasting rel_pos_bias in attention_softmax.h 3. add scale in MHA op's attributes 4. support past_key/past_value and present_key/present_value in MHA 5. UT and parity tests are added 6. fix an issue: https://github.com/microsoft/onnxruntime/issues/14920 note: the fusions will be in another PR since mt5 needs to be tested and an issue from github will be investigated. Future works: 1. support shared buffer for past/present 2. enable trt kernels when possible and investigate (trt/cutlass)kernels with rel_pos_bias) 3. support KV/QKV packing with past/present ### Motivation and Context --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-03-13 14:29:16 -07:00
Dmitri Smirnov	b34e570ad0	Enable LeakyRelu latest and refactor fast_gelu_fusion to enable the script (#15003 ) ### Description Enable LeakyRelu latest since the last version differs only in type support. Refactor `fast_gelu_fusion` to enable the script, because our script is unable to check if any of the optimizers are outdated and no longer in effect. ### Motivation and Context We do not want to lose performance. Next step is to file improvement issues if any are required.	2023-03-13 14:20:11 -07:00
Nat Kershaw (MSFT)	a5d814008c	Fix API docs deploy so that a PR is not required (#15011 ) Fixes this [issue](https://github.com/microsoft/onnxruntime/actions/runs/4387534694/jobs/7682945415#step:12:534) and removes the extra PR step in the workflow. Also logs the commit of the main branch that the docs were generated from to a file called version.txt at the root of the API docs tree. Tested for Java API docs and results staged here: https://natke.github.io/onnxruntime/docs/api/java/index.html If approved, I can migrate all of the other API docs generation workflows to use this scheme.	2023-03-13 09:36:08 -07:00
pengwa	44dda08b51	Renaming files (#15015 ) ### Renaming files for compute optimizer ### Motivation and Context A follow up for https://github.com/microsoft/onnxruntime/pull/14832	2023-03-13 17:07:59 +08:00
PeixuanZuo	c55f347689	[ROCm] change miopen_conv_use_max_workspace=true (#14982 ) Change miopen_conv_use_max_workspace=true to get best algorithm during `miopenFindConvolutionForwardAlgorithm` process.	2023-03-13 16:19:23 +08:00
pengwa	448e989df8	Op slicing upstream refactor (#14832 ) ### Slice op upstream refactor A refactor work for https://github.com/microsoft/onnxruntime/pull/13672. ### Motivation and Context There is a similar optimization opportunity for other operator upstreaming, to reduce compute flops. So refactor the existing code base for making it easier to support other ops. The changes in this PR are mainly about renaming and moving. - Move common logic (from compute_optimizer.h/cc) into upstream_transformer_base.h/cc and shared_utils.h/cc. - For upstream common logic, they are moved into upstream_transformer_base.h/cc - For shared utilities, they are moved to shared_utils.h/cc. - After the move, compute_optimizer.h/cc mainly for upstreaming gather implementation (inheriting upstream_transformer_base.h/cc). Ideally it should be renamed, but for easier review this time, I keep its name.	2023-03-13 08:19:32 +08:00
Yi-Hong Lyu	cce9e0eaad	Add float32 hardsigmoid tests (#14948 )	2023-03-12 10:56:29 -07:00
G. Ramalingam	930e009567	[WIP] Update call to GetFunction (#14949 ) ### Description OpSchema::GetFunction() changed in ONNX to support opset-version-dependent function-body. Update the call to GetFunction appropriately. ### Motivation and Context Motivated by https://github.com/microsoft/onnxruntime/issues/14810 --------- Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>	2023-03-11 07:04:17 -08:00
Yi Zhang	ca315b9148	Use ADO cache to cache docker image instead of ACR (#14496 ) ### Description Now, we only enable image cache in pipeline cache for Linux Aten Pipeline. It'll be enabled in other Linux pipelines gradually. ### Motivation and Context Fixed [AB#13143](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13143) ### Verification 1. No Image Cache in Pipeline https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=904531&view=results 2. Use Cached Image in Pipeline https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=904533&view=results	2023-03-11 10:32:02 +08:00
Vincent Wang	7950189920	[CUDA] Optimize Perf for AtomicAdd of Half Type (#14992 )	2023-03-11 08:52:01 +08:00
Changming Sun	a8ad0edbeb	BUG FIX: the if...else in telemetry-steps.yml does not really work (#14972 ) ### Description BUG FIX: the if...else in telemetry-steps.yml does not really work. It always says "Telemetry is disabled." even though the pipeline doesn't have the pipeline variable. ### Motivation and Context For example, recently I set up a new pipeline in https://dev.azure.com/onnxruntime/onnxruntime/_build without setting the ADO variable, but the powershell code still thinks that we have enabled telemetry. See: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=910107&view=results It didn't work because when the pipeline variable ("TELEMETRYGUID") doesn't exist, the occurrence of "$(TELEMETRYGUID)" is not replaced with anything; it remains as is.	2023-03-10 15:39:07 -08:00
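The pitfall described in the telemetry commit above is that an undefined pipeline variable leaves the literal text `$(TELEMETRYGUID)` in the script, so a plain non-empty check passes. A minimal sketch of a check that accounts for this, assuming the script receives the already-substituted string; the function name `telemetry_enabled` is hypothetical and this is not the fix the PR applied:

```python
def telemetry_enabled(expanded_value, variable_name="TELEMETRYGUID"):
    """Return True only if the pipeline variable was actually substituted.

    When the variable is undefined, the macro text "$(NAME)" is left untouched,
    so the unexpanded placeholder must be treated the same as an empty value.
    """
    placeholder = "$(" + variable_name + ")"
    value = expanded_value.strip()
    return value != "" and value != placeholder
```

For example, `telemetry_enabled("$(TELEMETRYGUID)")` is `False`, while any substituted value is treated as enabled.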
Adrian Lizarraga	d8ddd25272	Add InstanceNormalization operator to QNN EP (#14867 ) ### Description QNN EP: - Adds the [InstanceNormalization](https://onnx.ai/onnx/operators/onnx__InstanceNormalization.html) operator to QNN EP. - Fixes graph composition bug when Transpose node is the last node in a graph. - Adds check for input shape when GetCapability is called (before and after layout transformation) - Should add similar checks for other layout sensitive ops (conv, pool, ...) in a separate PR - Adds initial QNN op tests for QDQ conv and QDQ InstanceNormalization - Should add tests for other ops in a separate PR Optimizer: - Makes InstanceNormalization a layout sensitive operator. - Adds a custom QDQ group selector for InstanceNormalization. Quantization tool: - Adds QDQ support for InstanceNormalization operator. - Adds python unit test for InstanceNormalization quantization. ### Motivation and Context Needed to support stable diffusion models with QNN. --------- Co-authored-by: Hector Li <hecli@microsoft.com>	2023-03-10 14:42:41 -08:00
Ryan Hill	a5c436e148	Fix prefast warnings (#14975 ) ### Description In transpose.cc: Arithmetic overflow: Using operator '-' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '-' to avoid overflow (io.2). In cuda_provider_factory.h: The type 'struct onnxruntime::CUDA_Provider' with a virtual function needs either public virtual or protected non-virtual destructor (c.35).	2023-03-10 14:31:55 -08:00
Dmitri Smirnov	0d7855ea5a	Re-work global objects dependencies in pybind layer. (#14941 ) ### Description Re-work handling of static objects in pybind. Make sure we ref-count Environment from Sessions. The following has been done: - Make global objects function static. This ensures that the objects are constructed on demand. The first object constructed is destructed last. This is platform independent. - Make global objects ownership shared as suggested by pybind since they are not surfaced at Python level, and they cannot be referred to by dependent python objects. Verified that all python objects are GCed before globals are destroyed. This takes care of inference session dependency on environment and its default logger and this is also platform independent. - Utilize pybind atexit mechanism to clear execution providers and unload CUDA libraries (as suggested by https://github.com/microsoft/onnxruntime/pull/14903) . Since this is registered for module exit, it takes place before any other globals are destroyed and clears shared objects state or even unloads the libraries. This should also work in a platform independent way. ### Motivation and Context - Global object destruction order is managed manually and that becomes a source of trouble. We want to make it deterministic and platform independent. - Frequent hangs in Python layer due to the static object's destruction order. Some of the Python session objects are being garbage collected after main exits and they require ORT environment to be alive. (Use after free)	2023-03-10 13:55:31 -08:00
Adrian Lizarraga	e2febe87f6	[QNN EP] Update QNN SDK to 2.8 (#14978 ) ### Description - Add QNN 2.8 SDK - Make QNN SDK version a pipeline template parameter for QNN pipelines. ### Motivation and Context Updates to latest QNN SDK version, and allows testing different QNN SDK versions without modifying yaml files.	2023-03-10 13:21:19 -08:00
Edward Chen	bd142bfb04	Gradle clean up (#14973 ) - Use java/gradlew directly in .github/workflows/publish-java-apidocs.yml. - Remove use of deleted step from tools/ci_build/github/azure-pipelines/android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml. - Remove Gradle installations and PATH updates from Dockerfiles and scripts. Now Gradle wrapper is used so a system Gradle installation is not needed.	2023-03-10 10:50:32 -08:00
Baiju Meswani	748758c135	Address issue with uninitialized variable (#14988 )	2023-03-10 09:24:04 -08:00
Maximilian Müller	ad4db12699	TensorRT EP - timing cache (#14767 ) ### Description This will enable a user to use a TensorRT timing cache based on #10297 to accelerate build times on a device with the same compute capability. This will work across models as it simply stores kernel runtimes for specific configurations. Those files are usually very small (only a few MB) which makes them very easy to ship with an application to accelerate the build time on the user end. ### Motivation and Context Especially for workstation use cases TRT build times can be a roadblock. With a few models from the ONNX model zoo I evaluated speedups when a timing cache is present. `./build/onnxruntime_perf_test -e tensorrt -I -t 5 -i "trt_timing_cache_enable|true" <onnx_path>` Results (no cache vs. with cache): efficientnet-lite4-11: 34.6 s vs. 7.7 s; yolov4: 108.62 s vs. 9.4 s. To capture this I had to modify the onnxruntime_perf_test. The time is sometimes not captured within "Session creation time cost:" which is why I introduced "First inference time cost:". --------- Co-authored-by: Chi Lo <Chi.Lo@microsoft.com>	2023-03-10 09:02:27 -08:00
Yi Zhang	acbb7ad453	enable cache in orttraining-mac-ci (#14979 ) ### Description enable compilation cache in orttraining-mac-ci ### Motivation and Context The workflow duration can be reduced to 12 minutes from about 100 minutes at best. https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=911536&view=results	2023-03-10 07:34:25 +08:00
Yulong Wang	1187d4ade6	[wasm] extend build timeout for static lib (#14952 ) ### Description extend build timeout for web assembly static lib.	2023-03-09 15:03:34 -08:00
Preetha Veeramalai	79d47c1530	Enable sorting of initializers (#14631 ) Add initializers to model proto in sorted order. ### Motivation and Context Onnxruntime OpenVino Execution Provider interacts with Openvino API by passing onnx serialised model proto. Current flow is that onnx serialised model proto will be passed into Read_model() API of OpenVino that creates an OpenVino execution network that's passed to compile_model() API. As part of optimizations we have combined the API's (Read_model and Compile_model) into single compile_model() API that directly accepts serialized onnx model proto. A hash function will be computed on this serialized input for internal Openvino optimizations. This requires the model_proto to be deterministic for each inference request. With the current flow, the [initializers are added to model_proto](`c1ff4b468d/onnxruntime/core/graph/graph_proto_serializer.cc (L48)`) from an [unordered_map data structure](`8ed3dfe063/onnxruntime/core/providers/shared_library/provider_interfaces.h (L93)`) that brings in random ordering of these initializers for inference runs. The proposed solution is to add these initializers by iterating through a sorted [vector consisting of the initializer names](`2c7146cef8/onnxruntime/core/graph/graph_proto_serializer.cc (L49)`).	2023-03-09 12:12:46 -08:00
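Why sorted iteration makes the hash stable can be sketched as follows. This is a minimal illustration, not the ORT or OpenVINO code: the byte layout and the names `serialize_initializers` and `model_hash` are hypothetical, and a Python dict (which iterates in insertion order) stands in for a container whose iteration order is not fixed.

```python
import hashlib


def serialize_initializers(initializers):
    """Serialize a name -> bytes mapping, visiting names in sorted order."""
    buf = bytearray()
    for name in sorted(initializers):  # sorted names make the byte stream order-independent
        key = name.encode("utf-8")
        payload = initializers[name]
        # Length-prefix each field so adjacent entries cannot be confused.
        buf += len(key).to_bytes(4, "little") + key
        buf += len(payload).to_bytes(4, "little") + payload
    return bytes(buf)


def model_hash(initializers):
    """Hash the serialized initializers, as a downstream cache key might."""
    return hashlib.sha256(serialize_initializers(initializers)).hexdigest()
```

Two mappings with the same entries inserted in different orders then produce the same hash, which is the determinism the commit message asks for.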
Jian Chen	b4fe98ac2e	Update to MacOS-12 (#14924 ) ### Description Update to MacOS-12 ### Motivation and Context Fixed [AB#13233](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/13233)	2023-03-09 10:18:14 -08:00
cloudhan	51b67fa15c	Make ROCm Attention biased+masked and biased+nomask scaling logic consistent (#14976 ) The biased+masked and biased+nomask have different scaling logic in current ROCm implementation Currently, biased + masked: (QK'+ bias) * scale + convert(mask) biased + nomask: QK' * scale + bias which is not correct. What we want is QK' * scale [+ bias] That is, bias should not be scaled. This effectively follows https://github.com/microsoft/onnxruntime/pull/14517/files?w=1#diff-e4768ce15a73499f584f9cd7d71adcb1ff2ed8d68ad7e496723a4775cbc35e33	2023-03-09 23:37:50 +08:00
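The intended formula from the commit above, QK' * scale [+ bias], can be sketched in plain Python. The function name `attention_scores` and the nested-list representation are hypothetical; masking and softmax are left out.

```python
def attention_scores(q, k, scale, bias=None):
    """Compute scores[i][j] = dot(q[i], k[j]) * scale, plus bias[i][j] if given.

    The bias is added after scaling, so it is never multiplied by scale.
    """
    scores = []
    for i, q_row in enumerate(q):
        row = []
        for j, k_row in enumerate(k):
            s = sum(a * b for a, b in zip(q_row, k_row)) * scale
            if bias is not None:
                s += bias[i][j]  # unscaled bias
            row.append(s)
        scores.append(row)
    return scores
```

With q = [[1, 2]], k = [[3, 4]], scale = 0.5 and bias = [[10]], this gives 11 * 0.5 + 10 = 15.5, whereas scaling the bias as well, (11 + 10) * 0.5, would give 10.5.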
mindest	f83923d5df	fix rocBLAS extensions API issue; add batched- and strided_batched- cases (#14883 ) ### Description For rocBLAS extensions API: * fix `alpha`/`beta` dtype mismatch in `rocblas_gemm_ex()`, which should be the same as `compute_type`. * add support for `BatchedGemm` and `StridedBatchedGemm` cases.	2023-03-09 23:23:35 +08:00
mindest	bf2cc808a1	[ROCm] SkipLayerNorm: add more configs for block size; loosen constraints (#14900 ) ### Description * add more configs for `threads_per_block` in SkipLayerNorm, also in kernel explorer. * loosen constraints for hidden_size, so that `SkipLayerNormSmallOp` can be selected for larger hidden sizes. * add flag for optional output in kernel_explorer ### Motivation and Context	2023-03-09 22:27:01 +08:00
Yi Zhang	d55ae490e1	detach patch manylinux from get_docker_image (#14958 ) ### Description Make patch manylinux one single step. ### Motivation and Context If we want to use hash of docker-related files as the cache key, the files should keep consistent before and after docker build. And changes in generated build_scripts should trigger rebuilding the image as well.	2023-03-09 15:40:58 +08:00
zhijiang	80e25ad6ac	fix cg issue (#14372 ) ### Description tensorboard depends on rsa>=3.1.4, while rsa 4.5 has vuln issue, so pin it to higher version as suggested Fixed [AB#7352](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/7352) ### Motivation and Context	2023-03-09 15:28:11 +08:00
Yulong Wang	3c4efd2e77	[js/common] allows polyfill for bigint (#14921 ) ### Description This change delays the execution of checking whether bigint is available in the context. This allows polyfill for `BigInt64Array`/`BigUint64Array` (if there is any)	2023-03-08 15:29:04 -08:00
Yulong Wang	8844474083	[js] remove 'npm bin' (#14943 ) ### Description 'npm bin' is deprecated in latest version. use 'npx' instead. This PR resolves #14934	2023-03-08 15:03:27 -08:00
Ye Wang	d8d96f0788	Fix a build issue (#14944 ) ### Description ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/14940	2023-03-08 13:05:49 -08:00
Edward Chen	c46c7ccba5	Update Gradle version (#14862 ) - Update Gradle version used in most places from 6.8.3 to 8.0.1. Update Android Gradle Plugin version where applicable. Not updated in this change: React Native Android projects (under `js/react_native/`). That can be done later along with updating the React Native projects. - Add Gradle wrapper in `java/` to make it easier to consistently use a specific Gradle version.	2023-03-08 12:22:06 -08:00
Changming Sun	d9436407b6	Use safe allocator for JNI code (#13999 ) ### Description Use a customized allocarray function to replace the original malloc calls to avoid integer overflow. ### Motivation and Context Fix Prefast warnings. Fixed [AB#8990](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/8990) Fixed [AB#8991](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/8991) Fixed [AB#9016](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/9016)	2023-03-08 11:40:55 -08:00
Adam Pocock	47f00b5d49	[Java] Initial on device training support (#14027 ) contributor: @Craigacp	2023-03-08 10:01:08 -08:00
Ashwini Khade	f14ab63c19	fix prefast warnings (#14931 ) ### Description Fixes prefast warnings Fixed [AB#11328](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11328) Fixed [AB#11329](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11329)	2023-03-08 09:49:15 -08:00
Hariharan Seshadri	112a4d215a	[CUDA] Support decoding multihead self-attention implementation (#14848 )	2023-03-08 09:17:54 -08:00
Kyushick Lee	c696392f0c	Support external output tensors for DORT (#14516 ) ### Description Support externally-managed output tensors (torch Tensors) for dort. Add `preallocate_output` option to OrtBackend to rely on externally-managed output tensors for dort. ### Motivation and Context DORT currently allocates and returns output ortvalues and converts them to torch Tensors. The conversion based on dlpack does not support torch Tensors for custom Aten backends, and it is not yet possible to transfer the ownership from ortvalue to external handle (torch Tensor). To avoid this issue, the PR change provides an option (`preallocate_output`) to allocate output tensors externally in pytorch, which creates torch Tensor for an Aten backend, and lets dort take pointers from torch Tensors to construct output ortvalues instead of allocating them inside InferenceSession.	2023-03-07 21:32:23 -08:00
edgchen1	2ef25a2200	Update CODEOWNERS file.	2023-03-07 17:56:37 -08:00
edgchen1	5b3f79a11a	Add gradle wrapper validation workflow.	2023-03-07 17:56:37 -08:00
Ashwini Khade	f71ac9859e	Update acpt image in the training pipeline (#14855 ) ### Description Current pipeline refers to an old image which is causing test failures. Updating the image to the latest one. ### Motivation and Context Fixes pipeline failure: https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=198	2023-03-07 14:10:32 -08:00
pengwa	5d8ce817cb	Fix simplified layer norm fusion for training (#14866 ) ### Fix simplified layer norm fusion for training Co-author with @prathikr. Fix bug identified by @prathikr. https://github.com/microsoft/onnxruntime/issues/14822. Running T5 model enabling deepspeed, we see simplified layer norm is not fused because the device check did not pass `b7fde84341/onnxruntime/core/optimizer/layer_norm_fusion.cc (L568)`. Since during pretraining optimization pass, there is no device placement, so the device check not fulfilled is expected. On the other hand, the device check is still valid to avoid simplified layer norm fusion works correctly for CPU runs. As a mitigation, added a flag to indicate whether the fusion is triggered by pre-training optimization or not. There is a risk though, when we run ORTModule training with CPU EP, but I feel the risk can be much reduced if we check CUDA/ROCM is enabled for the build. ``` CUDA_VISIBLE_DEVICES=0 python examples/onnxruntime/training/summarization/run_summarization.py --model_name_or_path t5-small --do_train --dataset_name cnn_dailymail --dataset_config "3.0.0" --source_prefix "summarize: " --predict_with_generate --overwrite_output_dir --output_dir /bert_ort/pengwa/output --fp16 --max_steps 1 --logging_steps 1 --deepspeed aml_ds_config_zero_1.json ``` ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-03-07 13:59:20 -08:00
Patrice Vignola	65f1f840f6	[DML EP] Fix Attention regression caused by removing transposes (#14908 ) By removing the transposes and using strides instead, the metacommands are not able to be reached anymore since it's not using NCHW layout.	2023-03-07 11:17:28 -08:00
Xavier Dupré	6b604521a6	Fix tree implementation when left, right node have lower index (#14839 ) ### Description Previous implementation did not support left or right node of a node to have an index lower than the node itself. This condition would forbid the tree to enter an infinite loop. Lightgbm does not follow that rule. The changes do not change the algorithm but remove the test enforcing that condition. ### Motivation and Context It fixes a regression introduced by #14670.	2023-03-07 19:47:12 +01:00
Hitesh Shah	66101c02a2	Implement AllToAll collective op	2023-03-07 10:17:07 -08:00
Adam Pocock	150043f74f	Adds a Java accessor for GetVersionString (#14876 ) ### Description Java part of #14873.	2023-03-07 09:46:56 -08:00


8301 commits