onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-24 02:47:54 +00:00

Author	SHA1	Message	Date
Chi Lo	b713855a98	Release 1.11.0 cherry pick round 1 (#10915) * Update to flatbuffers v2.0.0 (#10866) * Fix Reduced ops pipeline (#10861) * Fix a couple of issues with the python package tools (#10858) * Tweaks to the model utils * Add handling for a dim_value of -1 when replacing the entire input shape. This occurs in models exported from PaddlePaddle * make pytorch helpers accessible in package * make QDQ helpers accessible in package * Fix wrong percentile values returned during calibration (#10847) * Use numpy.percentile to get the lookup value. * Use 1.0 as float value rather than integer. * Add missing cdf parameter for `np.percentile`. * Use 100. instead of 1.0 * Remove print. * Update from @yufenglee * Add support for opset 16 to transpose optimizer. (#10841) * Add support for opset 16 to transpose optimizer. Only change required is for GridSample to be added to the layout sensitive ops. The existing handling for layout transpose works with that as the first input and first output are layout sensitive. Update the optimizer to be able to return an error message if it fails. * Use separate build directories for full and mobile iOS packages. (#10835) * Address performance issue with abseil flat_hash_table. (#10819) When returning by value in a cross DLL call, the hash table even though containing all the entries that are originally there can not find at least some of them. Reverting to std::unordered_set pending further investigation. * Mark end of version 11 C API. (#10803) * Mark end of version 11 C API * Add static_assert * avoid using LocalFree on FormatMessageW buffer (#10796) * remove local free * Remove local free from onnxruntime * don't allocate * Change to use constexpr to satisfy CPU build warning * Integrate C-API tests into Pipelines for release packages (#10794) * add c-api test for package * fix bug for running c-api test for package * refine run application script * remove redundant code * include CUDA test * Remove testing CUDA EP temporarily * fix bug * Code refactor * try to fix YAML bug * try to fix YAML bug * try to fix YAML bug * fix bug for multiple directories in Pipelines * fix bug * add comments and fix bug * Update c-api-noopenmp-packaging-pipelines.yml * Remove failOnStandardError flag in Pipelines * Detect runtime CUDA JIT and warn the user (#10781) * Use cudaMalloc vs cudaDeviceSynchronize and show the total time * Update convert_onnx_models_to_ort.py to support runtime optimizations. (#10765) Add runtime optimization support to ONNX -> ORT format conversion script. Replace `--optimization_level`, `--use_nnapi`, and `--use_coreml` with a new `--optimization_style` option. * Add multithreading test and put a lock on nvinfer1::createInferRuntime() for TRT EP (#10714) * Add multithread unit test and put lock on library call * update code * remove debug code * add comment * add one session multi-threads inference * Put lock for build engine all the time * Update naming and comment * remove unnecessary lock * Revert "remove unnecessary lock" This reverts commit 9c2317b1d2273dec0ebdeb52160bc757839e5edc. * Fix handling of nodes inserted by NHWC transformer. (#10904) (#10925) * Revert "Upsample support NHWC (#10554)" (#10917) This reverts commit `bd08f11a58`. Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> * [python API] Change raise import error when `C:\Windows\System32\vcruntime140_1.dll` is not found to warning (#10927) * remove throw if C:\\Windows\\System32\\vcruntime140_1.dll cannot be found * Add comments and update warning message * adding back accidentally removed line Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com> * [js] Create npm packaging pipeline (#10886) * create npm packaging pipeline * fix indentations * Update npm-packaging-pipeline.yml for Azure Pipelines * Update npm-packaging-pipeline.yml for Azure Pipelines * Update npm-packaging-pipeline.yml for Azure Pipelines * react-native-ci as a template * fix typos * fix template paths * add a dependency * change a stage name * set different artifact name for each package * fix typo * Update npm-packaging-pipeline.yml for Azure Pipelines Set a build Id for node npm package as a parameter * Update npm-packaging-pipeline.yml for Azure Pipelines Set a build Id for node npm package as a parameter * Update npm-packaging-pipeline.yml for Azure Pipelines * Follow up update for python API checking if `vcruntime140_1.dll` is available (#10927) (#10933) Co-authored-by: Hariharan Seshadri <hasesh@microsoft.com> Co-authored-by: Scott McKay <skottmckay@gmail.com> Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: Pranav Sharma <prs@microsoft.com> Co-authored-by: Ryan Lai <rylai@microsoft.com> Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com> Co-authored-by: Yi-Hong Lyu <yilyu@microsoft.com> Co-authored-by: Yufeng Li <liyufeng1987@gmail.com> Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com> Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com> Co-authored-by: Sunghoon <35605090+hanbitmyths@users.noreply.github.com>	2022-03-18 11:16:30 -07:00
Dmitri Smirnov	58521fb822	Make training CUDA kernels to adhere established code structure patterns (#10735) Current training optimizer kernels include CPU headers that affect changes that we can make in the CPU code with C++14 compiler and other refactoring efforts. Rearrange the kernel according to the established patterns and do not include headers that are not needed.	2022-03-09 09:06:45 -08:00
liqun Fu	da885a72e8	update with onnx 1.11 release (#10441)	2022-03-07 21:10:55 -08:00
PeixuanZuo	55af7a96a7	update the amd ci pipeline (#10723) * [TEST] test to get amd pipeline information * [FIX] lower the threshold * [UPDATE] add retry task * [UPDATE] add retry task * [ERROR] error to occur retry * [FIX] error * [UPDATE] update retryCountOnTaskFailure to 1 time * [UPDATE] add showmeminfo	2022-03-07 18:39:42 +08:00
Abhishek Kulkarni	c2c85dd6b1	Add an option to export ONNX graphs in ORTModule tests (#10579) Co-authored-by: Abhishek Kulkarni <abkulkarni@microsoft.com>	2022-03-03 16:56:19 -08:00
Hubert Lu	fe8d867efa	Optimize BinaryElementWise and BiasGeluGrad kernels for AMD (#10594) * Optimize elementwise and biasgelugrad kernels for AMD * Clean up for BiasGeluGradDxKernel	2022-03-03 08:07:15 -08:00
Baiju Meswani	f9b6eef05f	orttraining packaging pipeline for rocm 5.0.1 (#10725)	2022-03-02 12:32:14 -08:00
Vincent Wang	9a22b5d253	Strided Tensor Support for Eager Mode (#10578) * strided tensor for eager mode * fix build and resolve comments * fix win x86 build	2022-03-01 14:25:31 +08:00
Thiago Crepaldi	e788cc2a23	Convert com.microsoft::ATen into org.pytorch.aten::ATen onnx op (#10060) Signed-off-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-02-28 14:14:45 -05:00
harshithapv	037f08f1ff	Fix unsqueeze for opset 13 for ReduceMean Grad (#10668) * fix unsqueeze for opset 13 for reducemean grad * fix input for reduce mean	2022-02-28 09:55:52 -08:00
David Fan	617474e298	Stop gradient edges for aten::argmax (#10650)	2022-02-24 21:14:53 -08:00
Dmitri Smirnov	2679711bee	Refactor transformers and other code to reduce memory allocation calls (#10523) Work on minimizing memory management calls by reducing number of allocations and copies. Replace std::unordered_set with InlinedHashSet and add usage of InlinedVector. Employ std::move() to minimize copying and memory allocations. Remove copying of the const shared data into each of the PropagateCast transformer instances. Move inlined_containers.h header to include/common Adjust AsSpan implementation for C++ < 17	2022-02-24 16:17:14 -08:00
Tang, Cheng	7660eeef3e	fix ortmodule's output device info when it runs on ort device (#10616) Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-02-24 10:22:55 -08:00
Justin D. Harris	742694f679	[python] [orttraining] Add utility to export a graph to compute gradients (#8125)	2022-02-18 14:00:49 -08:00
Scott McKay	df841ee87d	Fix incorrect type constraint registration for operator kernels. (#10489) * Fix incorrect type constraint registration for RoiAlign. This led to the input type not actually being checked when matching a kernel as the invalid constraint name is treated as a missing optional input. * fix missing dependency for the unit test exe. Whilst it doesn't link against the CUDA providers lib, without the dependency VS doesn't know it needs to rebuild the library if there are changes. * Add check for invalid type constraints. * Fix invalid registrations for other kernels. * Add hash replacement logic to provide backwards compatibility in ORT format models when the registration is fixed. * Add tests	2022-02-18 16:55:32 +10:00
Pallavi Deshmukh	ccd7a2d840	Fix build failure when using clang compiler	2022-02-16 17:52:45 -08:00
ytaous	4f76c38686	Revert "Reduce max gradient (#9859)" (#10574) This reverts commit `7443edb0bf`.	2022-02-16 16:02:30 -08:00
Anh Nguyen	7443edb0bf	Reduce max gradient (#9859) * ReduceMax gradient builder * Update gradient_builder.cc * Add CI fix * Remove whitespace * Update gradient_builder.cc * Update gradient_ops_test.cc * Fix Windows CI tests Co-authored-by: root <tuananhnguyen7198@gmail.com>	2022-02-15 22:38:19 -08:00
Anh Nguyen	0c3e88944d	Fix create ort value hardcoded memory info to CPU (#10510) * Fix create ort value hardcoded memory info to CPU * Remove unneeded check * Remove unneeded header * Remove unneeded header * Update ort_ops.cpp * Update ort_ops.cpp * Update ort_ops.cpp * Update ort_ops.cpp Co-authored-by: root <root@QTM-ANHNGUYEN-1.northamerica.corp.microsoft.com>	2022-02-15 10:40:44 -08:00
Baiju Meswani	7691e7ed12	Introduce load balancing dataset samplers (#10163)	2022-02-14 13:46:14 -08:00
ytaous	4e2a974090	[ROCm] UTs and code clean up (#10511) * Fix UT * UT * UTs * enable ROCm UT * fix build attempt * minor * fix UT * fix UT * fix UTs Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>	2022-02-11 08:23:25 -08:00
Edward Chen	f92e47e95b	Remove onnxruntime_util dependency on onnxruntime_framework (#10512) There's a circular dependency between onnxruntime_util and onnxruntime_framework. Remove onnxruntime_util's dependency on onnxruntime_framework.	2022-02-10 19:17:08 -08:00
Hubert Lu	c9fbd0b15a	Optimize cuComputePartGradGammaBeta kernel for MI100 (#10475) * Optimize cuComputePartGradGammaBeta kernel for MI100 Co-authored-by: root <root@gb-sjc2-10.local.lan> Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2022-02-09 12:51:06 -08:00
ashbhandare	7e5d68eea6	gradient and test (#10455) Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-02-08 10:18:22 -08:00
ytaous	435e14d60a	[ROCm] BFloat16 support (#10465) * bf16 support * minor clean up * UTs * fix build * UTs * UTs * merge commit 6b5504c * minor * ROCm code cleanup * fix build * fix build * minor Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>	2022-02-07 22:55:15 -08:00
ytaous	63198a6566	[ROCm] BFloat16 support (#10447) * bf16 support * bf16 support * UTs * fix build * fix UTs Co-authored-by: root <root@GCRAMDRR1-MI100-087.redmond.corp.microsoft.com>	2022-02-03 11:31:14 -08:00
Changming Sun	ec4362f8f3	Enable more static analysis warnings and enable the analyzer for training cpu (#10176)	2022-01-27 11:17:20 -08:00
ashbhandare	cf13b9dd5e	Symbolic export for numpy_T (#10390) * Export numpy_T as onnx transpose * further fixes, test Co-authored-by: Aishwarya Bhandare <aibhanda@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-01-26 14:14:42 -08:00
Tang, Cheng	9aa51379c9	[eager mode]: add configuration for ort virtual device count (#10346) * add configuration for ort virtual device count * fix build break * fix ci build break Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-01-25 16:15:54 -08:00
pallavides	790c3be7e9	Fix Reshape issue when shape size is -1 (#10356) * Fix Reshape issue (in_place) when shape size is -1	2022-01-24 19:30:52 -08:00
Dmitri Smirnov	7e092a7e3f	Reduce number of memory allocations based on a customer profiling case (#10193) Add abseil and inlined containers typedefs Introduce TensorShapeVector for shape building. Use gsl::span<const T> to make interfaces accept different types of vector-like args. Introduce InineShapeVectorT for shape capacity typed instantiations Refactor cuda slice along with provider shared interfaces Refactor Concat, Conv, Pad Build with Conv Einsum and ConvTranspose refactored. Remove TensorShape::GetDimsAsVector() Refactor SliceIterator and SliceIteratorBase Refactor broadcast Refactor Pads for twice as long Remove memory planner intermediate shapes vector Refactor orttraining Fix passing TensorShapeVector to tests Remove abseil copy and submodule, use FetchContent_Declare/Fetch Path with separate command Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.	2022-01-24 10:40:46 -08:00
Baiju Meswani	141606534c	Add support for FusedAdam to be mathematically equivalent to pytorch/AdamW (#10106)	2022-01-21 13:37:59 -08:00
Cheng Tang	13e277525c	fix whitelist	2022-01-21 13:30:53 -08:00
Tang, Cheng	2dcb69685e	support type promotion in binary operators in eager mode (#10285) Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-01-20 10:06:09 -08:00
Baiju Meswani	c67594694c	Add ability to set onnx opset version from json config (#10223)	2022-01-20 09:10:19 -08:00
Abhishek Jindal	4aa7cee0d8	Abjindal/clean eager backend (#10055) * clearing map for eager mode backends * clearing map for eager mode backends manager * making OrtBackendsManager an extern variable and trying to delete it * cleaning backends manager when the python interpreter exits * adding ifdef for eager mode code * disabling warning for pybind state file * disabling warning for python module file * running clang auto format and reducing redundancy * remove new line * moving declaration to a new header file * adding the header file for eager mode for python module * removing source files for eager mode * add source file for python module in eager mode * Update orttraining/orttraining/python/orttraining_python_module_eager.h Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-01-19 14:20:09 -08:00
jingyanwangms	a656c55a75	Add _force_exportable_set and pass debug_options (#10282) * Add _force_exportable_set and pass debug_options * Update orttraining/orttraining/python/training/ortmodule/experimental/hierarchical_ortmodule/_hierarchical_ortmodule.py Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> * nit fix * Update orttraining/orttraining/python/training/ortmodule/experimental/hierarchical_ortmodule/_hierarchical_ortmodule.py Co-authored-by: Wei-Sheng Chin <wschin@outlook.com> Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>	2022-01-19 10:26:27 -08:00
David Fan	7b14c70cfe	[ortmodule] Ensure contiguous tensor into forward pass (#10315)	2022-01-18 22:06:37 -08:00
pengwa	e365ad7f3a	fix deadlock in model.train mode forward run only (#9960) * fix deadlock in model.train model forward run only * fix tests * clear the grad_fns before every forward run * add clean up on exit * fix * refine code comments	2022-01-14 13:53:29 -08:00
Vincent Wang	44e2db9397	CUDA BFloat16 Refactor (#10085)	2022-01-14 19:38:56 +08:00
Vincent Wang	3ea7fb0f9f	fix mem leak (#10272)	2022-01-14 14:54:19 +08:00
ashari4	aff96ce081	remove hardcoded type (#10251)	2022-01-12 10:00:34 -08:00
PeixuanZuo	7d93498e0e	[FIX] register softmaxgrad_13/logsoftmaxgrad_13 for rocm (#10177) * [FIX] register softmaxgrad_13/logsoftmaxgrad_13 for rocm * [FIX] update softmaxgrad_13/logsoftmaxgrad_13 implementation for rocm	2022-01-10 11:33:46 +08:00
Abhishek Jindal	4ac3277743	adding definition of concat operator for mapping it to onnx (#10062) * adding definition of concat operator for mapping it to onnx * adding the opgen generator file to include tensorlist type for eager mode	2022-01-06 14:56:35 -08:00
ashari4	4ab891999a	fix hardcoded type (#10205)	2022-01-06 09:28:22 -08:00
ashari4	7b5464ed7b	aten add_ op supports bf16 (#10084) * hand implemented add_	2022-01-05 09:33:28 -08:00
Tang, Cheng	97659495d9	fix aten view op (#10050) * fix aten view op * add test case * fix signature * fix the build Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-01-04 08:29:30 -08:00
Edward Chen	3bc91c2151	Move reduced ops files into build directory (#10030) In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.	2021-12-28 19:04:20 -08:00
Vincent Wang	f780f06240	ConcatGrad for OpSet13 (#10109)	2021-12-24 10:02:52 +08:00
satyajandhyala	bd4fb4c5da	Coding style fix. (#10080)	2021-12-18 12:05:48 -08:00

905 commits