onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-04 23:59:56 +00:00

Author	SHA1	Message	Date
ashbhandare	2aa89989c4	Not-where fusion (#7182 ) * Not-where fusion * Change to rewrite rule * Add to inference transforms * Support multiple where consumers * review comments	2021-04-06 16:12:26 -07:00
Yufeng Li	790fc11e60	QDQ: type conversion and more ops support (#7243 ) * QDQ: add int8_t to uint8_t conversion and Relu/AveragePool support	2021-04-06 15:30:31 -07:00
raviskolli	5d759e182b	Allocate external Rocm allocator via PyBind (#7148 ) * Enabled rocm support for graph transformations * Support for external Hip allocator * Added const_cast to reinterpret_cast to fix compiler issue * Another crack at fixing the compile error * More compilation fixes * Added compilation flags to load_inline extension * Added ROCM, ROCM_PINNED constants * Changes to address PR comments * Changed gpu identifier from ROCM to CUDA * Added HIP compilation flag for torch inline functions * Fixed a typo in header allocator string formatting * Fix for runtime error with external_cuda_allocator * Removed cuda/rocm specific code paths for allocators * More name changes to generic gpu from rocm/cuda * Removed duplicate allocator creation * Rename cuda_external_ config options as gpu_external_ * Rename hip_mem_limit to gpu_mem_limit * Rename cuda_mem_limit to gpu_mem_limit	2021-04-06 15:23:51 -07:00
Derek Murray	6308e709cc	Update opset for other training graphs to 12. (#7259 ) Co-authored-by: Derek Murray <demurra@microsoft.com>	2021-04-06 13:02:59 -07:00
G. Ramalingam	a9ff4c29e5	Add function body to GeluGrad schema (#7190 ) * Add GeluGrad function definition * complete gelugrad function definition * add opset to function definition	2021-04-06 12:40:59 -07:00
Zhang Lei	dbcfc4bee6	Add mlas_bench tools. Starting with sconv bench and sgemm bench. (#7139 ) * Add mlas_bench tools. Starting with sconv bench and sgemm bench. * Some update with build related.	2021-04-06 10:30:18 -07:00
ashari4	56b22c1c6b	Fix assert that the tensor's device type is 'cpu' #7248	2021-04-06 09:08:32 -07:00
ashbhandare	e9ffcfa247	Add cuda kernels for GreaterOrEqual, LessOrEqual, Where; modify Clip to avoid memcpy (#7187 ) * Where and Clip cuda kernel support * GreaterOrEqual and LessOrEqual cuda kernels * Clip input GPU mem * review comments * Add CPU kernel as well * review comment * Add kernel def hash for new op kernels * Fix CI	2021-04-06 09:04:38 -07:00
Derek Murray	c85657cfd7	Update test_training_model.onnx to opset 12. (#7251 ) Co-authored-by: Derek Murray <demurra@microsoft.com>	2021-04-06 07:49:58 -07:00
Tracy Sharpe	a9dbb511fb	MLAS: fix qgemm bus error with Android + ARM32 (#7250 )	2021-04-05 22:46:04 -07:00
Olivia Jain	fb40602ea2	Mem trt (#6868 ) * adding trt comparison and memory consumption * creating separate docker file	2021-04-05 22:16:12 -07:00
Changming Sun	2fcd69d644	Cleanup build.py (#7245 )	2021-04-05 18:49:29 -07:00
Changming Sun	5bd192c439	Update ContribOperators.md (#7246 )	2021-04-05 17:11:33 -07:00
Pranav Prakash	3b16afc0db	Make dW optional for convgrad (#7083 )	2021-04-05 17:05:20 -07:00
Guoyu Wang	c5973fbbac	Update the build script for Android AAR package (#7229 ) * Update the build script for Android AAR package * Address CR comments	2021-04-05 16:37:22 -07:00
Suffian Khan	9f14af9809	Add BERT-L perf regression test on MI100 and re-enable batch size test (#7240 ) * restore bs test and add perf test * update perf number and fix path to results	2021-04-05 15:51:52 -07:00
Ryan Lai	10102c09b6	Add better model test error messaging (#7239 )	2021-04-05 14:59:19 -07:00
Ashwini Khade	e7c5dcd572	Fix Zip-Nuget-Java Packaging Pipeline (#7208 ) * Ignore test failures due to opset support * skip identity sequence test * plus fixes	2021-04-05 10:58:13 -07:00
Chun-Wei Chen	3ee9b0ec4d	Add detailed assertion error message (#7232 )	2021-04-05 10:05:40 -07:00
Marek Šuppa	008065aab1	Update README.md (#7043 ) * Fix the precision type (switch from nonexistent `int32` to `fp32`).	2021-04-05 10:03:14 -07:00
ashbhandare	2b8513539e	Div mul fusion (#7183 ) * Div mul fusion * Change to rewrite rule * Add to inference transformers	2021-04-05 09:35:30 -07:00
Weixing Zhang	74ee24cf7f	rename cuda_mem_limit and hip_mem_limit to gpu_mem_limit for both CUDA EP and ROCm EP (#7226 ) With this change, differentiating CUDA EP and ROCm EP is not needed in training script when mem_limit option needs to be set. Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-05 09:04:04 -07:00
baijumeswani	68b12a6179	Support for saving and loading pytorch compatible state dictionaries (#7220 ) * Override methods on torch.nn.Module to get direct access to the methods on the original module.	2021-04-05 03:40:41 -07:00
Yufeng Li	8d737f9770	handle optional input in quant topo sort (#7223 )	2021-04-02 20:42:48 -07:00
Weixing Zhang	59b57d8322	HSA_NO_SCRATCH_RECLAIM and RCCL_ALLTOALL_KERNEL_DISABLE are not needed for ROCm 4.1 (#7224 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-02 18:19:11 -07:00
Ahmad Zakaria	ba5f056b09	move trt_profile to TensorrtFuncState and reuse it (#7195 ) use unordered_set instead of unordered_map to keep track of dynamic shape tensors with shape updates fix: insert input_name in the set of input_names move trt_profile to TensorrtFuncState and reuse it	2021-04-02 17:09:03 -07:00
Weixing Zhang	ef88dc912c	enable more unit tests for ROCM EP (#7222 )	2021-04-02 15:57:08 -07:00
Guoyu Wang	afbbeaa30a	[NNAPI/CoreML EP] Add Onnx opset 14 support (#7211 ) * Add opset 14 support for nnapi/coreml ep * Address CR comments	2021-04-02 13:18:47 -07:00
Sherlock	a98c2ebb8c	Enable saving optimized models in OrtModule (#7214 ) * Enable saving optimized models in OrtModule Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-04-02 12:37:05 -07:00
RandySheriffH	ebde320950	Add cupti path for python gpu packaging pipeline (#7200 ) * add cupti dll path for py3.8 * correct path * add prints * replace path join * add all path * restore pipeline * format * expand path only for python 38&39 * add all cupti path Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2021-04-02 12:12:46 -07:00
Weixing Zhang	2d352056cf	Support SkipLayerNorm for ROCm EP (#7210 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-02 09:03:30 -07:00
Weixing Zhang	a3f17c8b0d	update lamb and GatherGrad kernel for ROCm EP (#7184 ) With ROCm4.1, the CUDA implementation of Lamb and GatherGrad can be utilized for ROCm EP.	2021-04-02 09:02:49 -07:00
Weixing Zhang	17f91ff410	remove un-needed header file. (#7193 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com>	2021-04-01 21:05:58 -07:00
Ryan Hill	5a6d477625	Make IDataTransfer be directly shared with shared providers (#7215 )	2021-04-01 20:39:16 -07:00
Edward Chen	0ebeaf529d	Check kernel def hashes (#7120 ) Add unit test for verifying kernel def hashes. Add way to add new types to kernel definition without changing hash.	2021-04-01 17:42:58 -07:00
ashbhandare	15c67ddbf0	Make output 1 of ConcatTraining Optional and place on CPU (#7199 ) * Optional input 1 on CPU ConcatTraining * Rename output_1	2021-04-01 16:05:17 -07:00
Jesse Benson	4543459984	MIOpen supports MIOPEN_REDUCE_TENSOR_AVG now.	2021-04-01 16:00:34 -07:00
Yufeng Li	34a8b22186	disable prepacking in training (#7201 ) * disable prepacking in training	2021-04-01 14:03:47 -07:00
sfatimar	52bcef4d4f	Openvino ep 2021.3 (#7180 ) * Integrate openvino-ep-2021.3 * operators type * changed the myriad as it is case sensitive * logging information for openvino-ep-2021.3 * Unit test fix * Resize operator added for myriad * Fixed python tests for CPU and GPU * data commit for loop tile and gatherelements failure * adding checks for Where * fixing gatherelements and loop tests * disabling instance normalization test for now as there seems to be a myriad bug, putting loop in ops supported only because all the tests fail * gather elements op test taking care of warning message * condition needs to be an intializers * Disabled python test for Myriad * Disable compilation warning for MSVC windows compiler * softmax_test, threedimaxis0 and 1 test give accuracy mismatch tensoroptest disables test gives accuracy mismatch gather test gives accuracy mismatch * Updated with ov version 2021.3 * Updated with ov version 2021.3 * Updated README * Disabling python tests for cpu * Disabling python tests with accuracy mismatch on cpu * Added fix for Linux CI Pipeline failure -> Disabled tests that were throwing segfault Co-authored-by: sfatimar <sahar.fatima@intel/com> Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: Aravind <aravindx.gunda@intel.com>	2021-04-01 11:28:54 -07:00
baijumeswani	249a2c14ef	Pin version of pytorch to 1.8.1 for ORTModule CI pipeline (#7167 ) * Pin version of pytorch to 1.8.1 for ORTModule CI pipeline * Use pytorch-lightning stable version 1.2.5 * Revert to cuda 10.1	2021-04-01 09:37:47 -07:00
George Wu	fc6ac5bfac	dnnl fixes (#7202 )	2021-04-01 07:34:18 -07:00
Scott McKay	329fd03bb4	Add int32_t as required type to some operators (#7192 ) * Updates to some operators to always support int32 and int64 based on testing of Android package build config with a minimal build. If an operator can be used for shape manipulation (int64) it is frequently used for indices manipulation (int32), so we enable both types for that set of ops. - e.g. BERT models take indices as input - Scatter/Gather ops utilize indices Misc. fix to python bindings to exclude call that fails in a minimal build.	2021-04-01 19:32:34 +10:00
Edward Chen	04679e31ab	Specify CUDA compute capability 7.5 in Linux GPU build (#7203 ) Recently a build agent pool was changed to use T4 GPUs (CUDA compute capability 7.5). Updating some CUDA build options to accommodate that.	2021-03-31 18:51:44 -07:00
Hariharan Seshadri	0e0dd50e39	Support int32 type for TopK CPU op (#7089 )	2021-03-31 18:08:21 -07:00
Xavier Dupré	b370ddbf5e	Removes unnecessary transpose in operator Einsum (#7141 ) * remove one unnecessary transpose * add more unit test	2021-03-31 09:59:08 +02:00
Guoyu Wang	d500c5952b	Add Android AAR packaging script for ORT-Mobile (#7138 ) * Add Android aar packaging script for ORT-Mobile * Address CR comments	2021-03-30 18:42:18 -07:00
Yulong Wang	0fdef1bf47	[Node.js binding] upgrade y18n to v4.0.1 (#7185 )	2021-03-30 16:09:04 -07:00
Negin Raoof	45cb0cae8c	Adding TorchEmbedding contrib op (#7136 ) * Adding TorchEmbedding contrib op * Update contrib_defs.cc * Shape fix * Update shape_inference_test_helper.h * Fix typo * Fix test * Fix for test code * Merge * Fix CI * Fix for CI * Fix CI no-contrib	2021-03-30 14:33:25 -07:00
liqunfu	e545604499	. (#7165 )	2021-03-30 13:58:30 -07:00
RandySheriffH	d880578537	Exclude cpuid.h from Mac non x86 arch (#7166 ) * add ifdef to exclude inclusion from non x86 arch * exclude calling of __cpuid_count Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2021-03-30 11:50:42 -07:00

4585 commits