### Description
- Implement Objective-C binding for `ORTTrainingSession`
- Add `ORTUtils` utility class to handle conversion between C++ and
Objective-C types
- Add test case for saving checkpoint
- Add unit test cases for `ORTTrainingSession`
### Motivation and Context
This PR is part of implementing Objective-C bindings for training API.
It implements the Objective-C binding for the training session. The Objective-C
API closely resembles the C++ API.
---------
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
This PR implements a backward-compatible way to define custom operators
with fallible compute functions. The templated C++ API gained an
optional `Fallible` argument. Closes #14287
### Motivation and Context
#14287 contains more context. The gist is that the current C-API defines
compute operations of custom operators as functions returning `void`
rather than an `OrtStatusPtr`. Currently, errors are often propagated
across the C-ABI using C++ exceptions. That is unsafe and undefined
behavior. Moreover, it is difficult for languages other than C++ to use
this approach even if they wanted to. A C-compliant, sound, and safe way
to propagate errors allows for non-C++ fallible custom operators.
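For illustration, a fallible compute function can report failure by returning an `OrtStatusPtr` across the C ABI instead of throwing. A minimal sketch (the kernel struct and the error condition are hypothetical, not the exact API added by this PR):

```cpp
#include <onnxruntime_c_api.h>

// Sketch of a fallible compute function: instead of returning void and
// throwing a C++ exception across the C ABI, it returns an OrtStatusPtr
// (nullptr on success).
struct MyKernel {
  const OrtApi* api;

  OrtStatusPtr Compute(OrtKernelContext* context) {
    const OrtValue* input = nullptr;
    if (OrtStatusPtr status = api->KernelContext_GetInput(context, 0, &input)) {
      return status;  // propagate the failure instead of throwing
    }
    if (input == nullptr) {
      return api->CreateStatus(ORT_INVALID_ARGUMENT, "missing required input 0");
    }
    // ... actual computation ...
    return nullptr;  // success
  }
};
```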
### An example in action
https://github.com/cbourjau/ort-custom-op/pull/6/files is a
demonstration of how this PR can be used to write safe and fallible
custom operators in Rust.
### Description
New logic to share allocators among the module, optimizer, and eval
sessions for the training scenario.
### Motivation and Context
Previously, on-device training shared allocators by sharing the EP. Now,
with the new allocator-sharing mechanism, we need to explicitly register
the allocator in the environment.
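For illustration, with the new mechanism the allocator is registered once in the environment and each session opts in; a minimal sketch using the C++ environment-allocator API (the arena values are placeholders, and exact wrapper signatures may vary by ORT version):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "training"};

  // Register a shared CPU arena allocator in the environment once...
  Ort::MemoryInfo mem_info =
      Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::ArenaCfg arena_cfg{0 /*max_mem: default*/, -1 /*extend strategy: default*/,
                          -1 /*initial chunk size: default*/, -1 /*max dead bytes: default*/};
  env.CreateAndRegisterAllocator(mem_info, arena_cfg);

  // ...then let each session (module/optimizer/eval here) opt in to it.
  Ort::SessionOptions so;
  so.AddConfigEntry("session.use_env_allocators", "1");
  // Ort::Session session{env, "model.onnx", so};
  return 0;
}
```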
---------
Co-authored-by: Lei Cao <leca@microsoft.com>
### Description
Make BeamScorer run on the GPU instead of the CPU.
Brief overview:
- Adds a CUDA `CudaBeamSearchScorer` implementation of `IBeamScorer`.
- Instead of a 'done' flag per beam, there is one single 'not done'
variable that is copied to the CPU every iteration.
- Removes some of the extra CPU-side buffers and parameters that are no
longer needed.

Remaining future optimizations:
- CPU-copied beam indices are still used in the non-DecoderMaskedSelfAttention
case. An extra kernel can be written so that `PickGptPasteState` (called from
`UpdateGptFeeds`) no longer needs CPU-copied beam indices.
### Motivation and Context
It's faster to keep the work on the GPU to avoid GPU->CPU->GPU copies of
data.
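A minimal sketch of the single-flag pattern described above (names are hypothetical, not the actual `CudaBeamSearchScorer` code):

```cpp
#include <cuda_runtime.h>

// Instead of copying a per-beam 'done' array to the CPU each iteration,
// keep one device-side 'not done' flag and copy back only that single value.
bool BeamSearchNotDone(const int* d_not_done, int* h_not_done /*pinned host*/,
                       cudaStream_t stream) {
  cudaMemcpyAsync(h_not_done, d_not_done, sizeof(int),
                  cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);  // one small sync per iteration
  return *h_not_done != 0;
}
```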
### Description
Fixes a typo that prevents skipping a test that targets the QNN HTP
backend on Windows x64.
### Motivation and Context
- Windows x64 machines cannot load/run the QNN HTP backend. Therefore,
we need to skip such tests on Windows x64.
- Fixes the QNN_Nuget_Windows pipeline.
### Description
As title.
### Motivation and Context
Works with local onnxruntime-c pod in js/rn/e2e test.
A couple of places in onnxruntime used `float_t` data type alias as an
alternative to `float`. However, this is not entirely correct, since
`float_t` is an implementation-defined type alias, which may be `float`,
`double`, `long double` or some other implementation-defined data type,
depending on the state of the internal `FLT_EVAL_METHOD` macro:
https://en.cppreference.com/w/c/numeric/math/float_t
On most major platforms and compilers (clang, GCC, MSVC) this is a
purely cosmetic change that does not alter behavior. However, the icpx
compiler (and legacy icc) tends to substitute `float_t` with `long double`,
resulting in a linker error (unresolved reference) against the base onnx
library, which only contains the `ParseData` function for `float` and
`double`, as in
[here](9264e09367/onnx/defs/tensor_proto_util.cc (L133-L134)).
Overall, this PR cleans up the implementation-defined behaviour and
enables building onnxruntime with icpx.
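A small standalone program makes the implementation-defined nature of `float_t` visible:

```cpp
#include <cfloat>   // FLT_EVAL_METHOD
#include <cmath>    // std::float_t
#include <cstdio>

int main() {
  // float_t's identity follows FLT_EVAL_METHOD: 0 -> float, 1 -> double,
  // 2 -> long double. Under icpx it may not be float, which is exactly
  // why plain `float` is the correct type to use here.
  std::printf("FLT_EVAL_METHOD = %d, sizeof(std::float_t) = %zu\n",
              FLT_EVAL_METHOD, sizeof(std::float_t));
  return 0;
}
```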
The image for the onnxruntime-CI-nightly-ort-pipeline is too old.
The ort package in the image is older than the latest test code in the
nightly CI, which causes the nightly CI to fail.
### Manage ORTModule options
Move all env vars used to switch features ON/OFF into runtime options for
consistent management.
Note: the feature switches are assigned in two phases: default values
first, then overwritten by env vars (if specified by the user). So an env
var takes the highest priority when both phases explicitly set a value for
one feature.
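A language-agnostic sketch of the two-phase precedence (the option and env var names are hypothetical):

```cpp
#include <cstdlib>
#include <string>

// Phase 1: the option carries a default value.
struct RuntimeOptions {
  bool enable_feature_x = false;  // default
};

// Phase 2: an env var, if present, overwrites the default -- so the env
// var always wins when both phases set a value explicitly.
inline void OverrideFromEnv(RuntimeOptions& opts) {
  if (const char* v = std::getenv("ORTMODULE_ENABLE_FEATURE_X")) {
    opts.enable_feature_x = (std::string(v) == "1");
  }
}
```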
### Motivation and Context
### Use PadAndUnflatten to replace GatherGrad for restore
### Motivation and Context
### Description
Fix the nullptr check so that it checks the actual existence of the
engine/context.
(Currently, it checks the address of the `unique_ptr`, which is always
non-null. Thanks @jslhcl for pointing that out.)
> A quick recall of struct
[trt_state](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.h#L104):
> ```cpp
> std::unique_ptr<nvinfer1::ICudaEngine>* engine = nullptr;
> std::unique_ptr<nvinfer1::IExecutionContext>* context = nullptr;
> ```
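Given that struct, the corrected check must dereference before testing; roughly (a self-contained sketch with stand-in types, not the exact EP code):

```cpp
#include <memory>

struct TensorrtState {  // abbreviated stand-in for the quoted trt_state fields
  std::unique_ptr<int>* engine = nullptr;   // stand-in for nvinfer1::ICudaEngine
  std::unique_ptr<int>* context = nullptr;  // stand-in for nvinfer1::IExecutionContext
};

bool EngineExists(const TensorrtState* trt_state) {
  // trt_state->engine is a pointer to a unique_ptr; the pointer itself is
  // essentially always non-null, so the old check never fired. The engine
  // actually exists only if the pointed-to unique_ptr owns an object.
  return trt_state->engine != nullptr && *trt_state->engine != nullptr;
}
```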
### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/15982
The incorrect check couldn't stop the TRT EP from loading an incompatible
engine cache, which invokes an unhandled exception.
### Description
Eliminate the Cast operator when Shape is the next node.
### Motivation and Context
#### Cast
When working with onnx opset 15 and above, the Shape operator accepts
all data types.
This change is documented in the [onnx
Changelog](https://github.com/onnx/onnx/blob/main/docs/Changelog.md#Shape-15).
As a result, casting variables right before the Shape operation becomes
unnecessary.
Removing these unnecessary casts simplifies the graph and can provide
performance gains.
## Results
On:
```
torchrun examples/onnxruntime/training/language-modeling/run_clm.py \
  --model_name_or_path gpt2 --do_train --overwrite_output_dir --output_dir ./outputs/ \
  --seed 1337 --fp16 True --per_device_train_batch_size 4 --num_train_epochs 1 \
  --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
  --learning_rate 2e-5 --report_to none --optim adamw_ort_fused
```
Train metrics, without vs. with my changes:

| metric | without changes | with my changes |
| --- | --- | --- |
| epoch | 1.0 | 1.0 |
| train_loss | 3.2981 | 3.2981 |
| train_runtime | 0:02:13.29 | 0:02:08.98 |
| train_samples | 2318 | 2318 |
| train_samples_per_second | 17.39 | 17.971 |
| train_steps_per_second | 4.351 | 4.497 |

We see around a 3% gain.
---------
Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
### Description
Update file list to adjust for recent changes to test infra.
### Motivation and Context
Some CI jobs may be interrupted unexpectedly and not execute the
unmount-data step. The data left on the host device will cause
`device or resource busy` errors and make subsequent CI jobs fail.
Move the mount-data step into the docker container, so the host machine
will not be occupied when CI jobs exit incorrectly.
Enhance cast propagation so that Softmax can be kept at fp16 when the data
flow is cast-to-fp32 > Softmax > cast-to-fp16.
This optimization saves GPU memory and provides a performance gain.
### Description
Before transformers 4.27, the causal mask used the uint8 data type, so
there was an extra Cast node to convert it to bool. This adds a pattern
without the Cast node to support attention fusion for GPT-2 models exported
with transformers >= 4.27.
### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/16453
Python script to modify an ONNX model to align it with a converted QNN
model.
### Description
The ONNX Runtime QNN EP can support context binary files generated by the QNN tool chain. However, a QNN-generated context binary file uses channel-last layout and 8-bit or 16-bit input and output. This script gets the QNN model input & output information from the QNN-converted model_net.json file and inserts Cast and Transpose nodes into the ONNX model if required.
🛠️ __Changes in this pull request:__
This pull request introduces two significant changes to the project:
- Changing the on-device training checkpoint format: The current
implementation stores the on-device training checkpoint as a sequence of
tensors in multiple files inside a checkpoint folder, which can be
inefficient in terms of storage and performance. In this PR, I have
modified the checkpoint format to use a flatbuffer table to save
the checkpoint to a single file, providing a more compact and efficient
representation. The changes around this are twofold:
- Add the checkpoint flatbuffer schema that will generate the necessary
checkpoint source files.
- Update the checkpoint saving and loading functionality to use the new
format.
- Adding support for onnxruntime minimal build: To support scenarios
where binary size is a constraint, I made changes to ensure that the
training build can work well with the minimal build.
🔍 __Open Issues:__
- In order to extract the optimizer type, the existing implementation
re-loaded the onnx optimizer model and parsed it. This is no longer
possible, since the model format can either be onnx or ort. One idea is
to do the same for ort format optimizer model. This needs some
investigation.
- Changes to the offline tooling to generate ort format training
artifacts.
- End-to-end training example showcasing the use of the minimal training
build.
- Add support for exporting the model for inferencing in a minimal build.
### Description
protobuf's CopyFrom doesn't work for models > 2GB in version 4.23. This PR
removes the copy in the Calibrator.
Currently, the Calibrator copies the ModelProto to avoid changing it. The
reason is that quantize_static passes a ModelProto to the Calibrator to
calibrate quantization parameters, and then uses it for quantization. If
the Calibrator changed the ModelProto, quantization wouldn't work.
This PR changes quantize_static to pass a model path to the Calibrator
instead of a ModelProto, and makes the Calibrator only take a model path as
input, which is how it is used in most cases.
This PR also removes the optimization step from quantization. Users need to
call pre-processing to optimize the model.
### Description
This PR fixes a typo in assigning the `repetition_penalty` input in
the Whisper export with the beam search model. It is a follow-up to the
[export stabilization
PR](https://github.com/microsoft/onnxruntime/pull/16297).
### Motivation and Context
The `repetition_penalty` input should be set to `repetition_penalty`
instead of `input_features`.
### Description
Revert docker base image to
nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04@sha256:b754c43fe9d62e88862d168c4ab9282618a376dbc54871467870366cacfa456e
### Motivation and Context
The default image environment of nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
had a minor upgrade, which makes the Linux MultiGPU TensorRT CI (NV12
instance with a Maxwell GPU) fail on three CApiTestGlobalThreadPoolsWithProvider
tests (these three tests have higher error, above the tolerance).
That minor upgrade includes cuDNN 8.7.0 -> 8.9.0, which might be a factor
that makes the Maxwell GPU generate higher error. CIs with T4 GPUs are not
affected.
### Description
Modify the creation of the WebGL context.
Previous behavior:
- STEP 1: create a canvas (document.createElement); if it fails, go to step 2, else step 3.
- STEP 2: create an OffscreenCanvas; if it fails, abort.
- STEP 3: use the canvas created in step 1 or 2 to create the WebGL context; if successful, return the context, else abort.

New behavior:
- STEP 1: create an OffscreenCanvas; if it fails, go to step 3.
- STEP 2: use it to create the WebGL context; if successful, return the context.
- STEP 3: create a canvas (document.createElement); if it fails, abort.
- STEP 4: use it to create the WebGL context; if successful, return the context, else abort.

Motivation:
We found that in some environments normalCanvas.getContext() returns null
while offscreenCanvas.getContext() returns the context object, and when
OffscreenCanvas is available it is a good idea to always prefer it.
### Description
DORT support for custom ops
### Motivation and Context
Custom ops registered via a custom-ops shared library cannot currently be
run using DORT. This PR enables them by:
1. registering the custom ops supported in DORT
2. plumbing down, from OrtBackend when creating the InferenceSession, the
session_options that were used to register the custom-ops shared library
via `session_options.register_custom_ops_library(shared_library)`
The CUDA EP already supports [CUDA
graphs](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs),
and we observed that some models can benefit from using CUDA graphs with
`trtexec`. Therefore, this PR enables CUDA graph support for the TRT EP.
The implementation is based on
https://github.com/microsoft/onnxruntime/pull/9978 with the same
[constraints](https://github.com/microsoft/onnxruntime/pull/9978) as
below:
- Models with control-flow ops (i.e. If, Loop and Scan ops) are not
supported.
- Usage of CUDA Graphs is limited to models wherein all the model ops
(graph nodes) can be partitioned to the TRT EP.
- The input/output types of models need to be tensors.
- Shapes of inputs/outputs cannot change across inference calls.
- IOBinding is required.
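For illustration, usage would look roughly like the sketch below (the `trt_cuda_graph_enable` provider-option key and the binding placeholders are assumptions, not verified against the final API):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "trt-cuda-graph"};
  Ort::SessionOptions so;

  // Enable CUDA graph capture on the TRT EP (option key is an assumption).
  OrtTensorRTProviderOptionsV2* trt_options = nullptr;
  Ort::GetApi().CreateTensorRTProviderOptions(&trt_options);
  const char* keys[] = {"trt_cuda_graph_enable"};
  const char* values[] = {"1"};
  Ort::GetApi().UpdateTensorRTProviderOptions(trt_options, keys, values, 1);
  so.AppendExecutionProvider_TensorRT_V2(*trt_options);

  Ort::Session session{env, "model.onnx", so};

  // IOBinding is required: bind device-resident inputs/outputs once and
  // reuse them, since shapes must stay fixed across inference calls.
  Ort::MemoryInfo cuda_mem{"Cuda", OrtDeviceAllocator, /*device_id*/ 0, OrtMemTypeDefault};
  Ort::IoBinding binding{session};
  // binding.BindInput("input", input_value);   // pre-allocated device tensor
  // binding.BindOutput("output", cuda_mem);
  // session.Run(Ort::RunOptions{}, binding);   // first run captures; later runs replay

  Ort::GetApi().ReleaseTensorRTProviderOptions(trt_options);
  return 0;
}
```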
### Description
As title.
### Motivation and Context
### Description
Set onnxruntime-c local pod path environment variable for react native
e2e tests on react-native-ci.yml
### Motivation and Context
Previously the E2E test project was not properly consuming a locally built
onnxruntime-c pod.
https://github.com/microsoft/onnxruntime/pull/16411#issuecomment-1598512816
### Description
Add "_smXX" to trtep engine cache file name, which "sm" stands for
"Streaming Multiprocessor".
> The GPU compute capability version is prefixed with "SM" because
NVIDIA typically improves and updates the SM in each new GPU
architecture.
### Motivation and Context
Github issue: https://github.com/microsoft/onnxruntime/issues/15982
Reduce the chance of misusing an incompatible engine cache when the user
switches between GPU devices with different compute capability.
* The prevention can't be 100%, as model size & GPU memory size could be
other factors that make a cache incompatible.
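For illustration, such a suffix can be derived from the device's compute capability via the CUDA runtime; a hypothetical sketch (not the exact TRT EP code):

```cpp
#include <cuda_runtime.h>
#include <string>

// Build an "_smXX" suffix from the device's compute capability,
// e.g. compute capability 8.6 -> "_sm86".
std::string GetSmSuffix(int device_id) {
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, device_id);
  return "_sm" + std::to_string(prop.major) + std::to_string(prop.minor);
}
```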
### Description
1. Add a new test lib `onnxruntime_providers_cuda_ut`, which is similar
to `onnxruntime_providers_cuda`, but `onnxruntime_providers_cuda_ut` is
only built if `onnxruntime_BUILD_UNIT_TESTS` is set. We can call all
CUDA UTs through this ut lib without affecting the production lib
`onnxruntime_providers_cuda`.
2. Move all test cases from `core/providers/cuda/test/` to
`test/providers/cuda/`. These test cases are built into the lib
`onnxruntime_providers_cuda_ut` and run by `./onnxruntime_test_all
--gtest_filter="*CUDA_EP_Unittest*"`. Since the lib is only for tests, we
can use gtest macros in the test cases. The previous implementation did not
support using the gtest lib in the CUDA UT cases.
3. The cmake code in `cmake/onnxruntime_providers.cmake` is refactored a
bit. A new function `onnxruntime_add_object_library` builds an object
target. The 2 libs `onnxruntime_providers_cuda_ut` &
`onnxruntime_providers_cuda` share most of the code, so the object files
can be used in both libs, which helps reduce build time. Another
function `config_cuda_provider_shared_module` is used to configure all 3
similar targets
(onnxruntime_providers_cuda_obj/onnxruntime_providers_cuda/onnxruntime_providers_cuda_ut).
4. Refactored the test to call `testing::InitGoogleTest` &
`RUN_ALL_TESTS` in `libonnxruntime_providers_cuda_ut.so`'s `TestAll`.
After this change, we can see all the cases running in
`CUDA_EP_Unittest.All`:

### Motivation and Context
After https://github.com/microsoft/onnxruntime/pull/13016, there are
still test files in test/providers/cuda/ that are not moved to
core/providers/cuda/test/, and the test cases are disabled. This PR helps
clean up the unfinished TODOs.
Even though onnxruntime_shared_lib_test covers some tests for the CUDA
provider, it works like a coarse-grained end-to-end test for the CUDA
provider. A CUDA unit test that can run cases for a single component would
be helpful for CUDA developers.
---------
Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>