onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-06 04:28:32 +00:00

Author	SHA1	Message	Date
Changming Sun	370d194db7	Add a docker file for CI build CUDA 10.2 (#5065 )	2020-09-04 16:28:45 -07:00
Zhang Lei	ec88f14a7a	Implement QLinearMul in mlas (#4593 ) * Implement QLinearMul	2020-09-04 15:02:19 -07:00
Scott McKay	b5c2932ae8	Last major set of ORT format model changes (#5056 ) * Add minimal build option to build.py Group some of the build settings so binary size reduction options are all together Make some cmake variable naming more consistent Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used. Add initial doco and ONNX to ORT model conversion script Misc cleanups of minimal build breaks.	2020-09-05 07:59:01 +10:00
Du Li	6134994db9	Parallelizing elementwise kernels (#4577 ) * Parallelizing unary elementwise ops. * Parallelizing binary elementwise ops. * Accommodating PR comments.	2020-09-04 14:45:43 -07:00
Xiang Zhang	0dad79b495	Add SetLanguageProjection C Api and use it in four projections (#5023 ) * Add SetLanguageProjection C Api and use it in four projections * static cast enum languageprojection to uint32_t * resolve comments * fix typo and line added unintentionally * revert unnecessary change * reorder c# api * add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis	2020-09-04 14:26:39 -07:00
Bowen Bao	6dd4af3936	Fix initializer name only when wrapper is applied (#4920 ) * Fix initializer name only when wrapper is applied * fix inspect import	2020-09-04 12:08:07 -07:00
Ryan Hill	d792af776d	Remove Cuda dependency from TensorRT shared provider (#5014 )	2020-09-04 11:35:02 -07:00
Zhang Lei	78bb53381b	optimize resize op for NN mode for some fasterrcnn model (#4825 ) Also Add test case for 5-D. Disable 5d test for Cuda Provider.	2020-09-04 10:35:36 -07:00
Zhang Lei	8289981f0e	Implement QLinearSigmoid. (#5015 ) Refactor QLinearLeakyRelu using QLinearLookupBase. Parallelizing the lookup phase.	2020-09-04 09:37:17 -07:00
Thiago Crepaldi	0fc9c504fe	Re-enable CI tests for the new PyTorch frontend (#5017 ) This PR includes: * Re-enable CI tests for new PyTorch frontend * Re-enable fp16 and adjust tolerances for number matching	2020-09-04 09:36:24 -07:00
Andrews548	bd215b79a2	ACL v20.02 (#4981 ) * Add ACL version 20.02 * fix logging typo * check depthwise operation based on group param * Generate ArmNN runtime inside class constructor * Update to the latest ONNX operation set * Update BUILD.md Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>	2020-09-03 20:44:27 -07:00
Bowen Bao	73456f10cd	Fix contrib ops unregister to match pytorch behavior (#5052 )	2020-09-03 16:32:42 -07:00
Nat Kershaw (MSFT)	d7502eff8f	Add nodejs samples README (#5005 )	2020-09-03 15:58:44 -07:00
xkszltl	4b9b5b6146	Imported protoc cannot have compile options. (#5030 )	2020-09-03 15:20:00 -07:00
Sergii Dymchenko	d7984fe6ba	Add packages from training docker to cgmanifest. (#5033 )	2020-09-03 13:11:41 -07:00
liqunfu	bb13b52291	to allow parallel training with mpi4py (#4942 ) to allow parallel training with mpi4py Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-03 12:47:12 -07:00
Thiago Crepaldi	9388d49c0d	Add warning to non pickable models (#5037 )	2020-09-03 11:53:56 -07:00
Thiago Crepaldi	9d1bdef195	Update CODEOWNERS and minor docstring fix (#5002 ) This PR includes: * Previous CODEOWNERS was encompassing more files than just training files * Polynomial optimizer config is missing part of its docstring	2020-09-03 11:52:38 -07:00
Suffian Khan	546965c2da	Add deterministic path for AllReduceL2 (used to compute gradient norm) (#5027 ) * add deterministic path for reduce l2 * add unit tests * memset zero size off by one * eliminate windows warning as error Co-authored-by: suffian khan <sukha@OrtTrainingDev1.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-03 10:02:41 -07:00
Ashwini Khade	9ba2cfb71b	fix py packaging pipeline (#5038 ) * add test skip logic when opset > allowed opset * fix attribute error * plus fix	2020-09-03 09:32:10 -07:00
Bowen Bao	22ba266bd6	Add flag to _internal_use to control export of contrib ops in ort trainer (#4968 )	2020-09-03 09:11:47 -07:00
Scott McKay	28445c88f9	Changes to enable saving and loading an ORT format model (#4995 ) * Changes to enable saving and loading an ORT format model via the public APIs. Cleanup session.py to try and make slightly more understandable. More refactoring is needed here. Couple of bug fixes * Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info. * Address PR comments - tweak SessionOptions config to avoid double lookup - merge duplicated functionality in python binding around registering an EP with optional options Fix a couple of build issues. * Update C API to be consistent with python API - only load model in InferenceSession ctor if required - support loading ORT model in minimal build * Fix nodejs test. We get an invalid path error from LoadInterOp first now * Another attempt at fixing nodejs test. Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent. The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed. * Fix couple of build issues. * Disable test temporarily so PR can be checked in. Will fix in separate PR that adds final pieces for minimal build as the test is required there. * Give up on nodejs test and make the match simpler. Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion. * Fix call to Session.__init__ in TrainingSession. Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.	2020-09-03 09:10:48 -07:00
Tim Harris	bbb9d92a5f	Remove SchedulingParams variants of ThreadPool::TryParallelFor (#5050 )	2020-09-03 09:04:31 -07:00
gwang-msft	fde7a2c848	Temporarily switch SafeInt to a fork for an option to disable exceptions (#5041 ) * Removed submodule * Add safeint fork	2020-09-02 23:21:39 -07:00
Ryan Hill	e0d1cf19a6	Fix allocator bug (#5042 )	2020-09-02 21:21:18 -07:00
Weixing Zhang	3268717615	Enable TF32 for training on A100 (#4914 ) * enable TF32 for training on A100 it can be disabled by env: NVIDIA_TF32_OVERRIDE = 0	2020-09-02 19:21:54 -07:00
Hariharan Seshadri	a9db287bd7	Return windows error code for library loading and unloading failure (#5036 )	2020-09-02 18:07:36 -07:00
Ye Wang	b4e9e98cee	Add more huggingface models in benchmark tools (#4986 ) * checkin more huggingface models * review comments * review comments	2020-09-02 16:41:58 -07:00
Sherlock	a935731bd3	Neg Gradient (#5022 ) Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-09-02 15:54:17 -07:00
Dudeldu	4a0f6595eb	Enable metadata and signature changes in graph transformers (#4783 ) After applying all the graph transformations the metadata and signature could have changes (e.g.: new outputs got added, or the outputs/inputs got renamed). Therefore the local copies of metadata and signature, that InferenceSession administrated for faster lookup, has to be updated. For this the `SaveModelMetadata`, that now has to be idempotent, should be called after resolving the transformed graph	2020-09-02 15:46:36 -07:00
Hariharan Seshadri	4fd4b74149	Change session option values if they don't work with EPs being registered for the session (#4991 )	2020-09-02 15:13:23 -07:00
Nat Kershaw (MSFT)	8a03b6e5c7	Render Operator documentation as compliant markdown (#3658 )	2020-09-02 15:07:50 -07:00
Dmitri Smirnov	e1901a7e10	Improve performance of CUDA implementations for GatherElements and Greater, Equal and Less (#4989 ) Make GatherElements kernel process 16 items each. unroll the constant loop. Quit loops early for zero dividend. Optimize Binary CompareFunction and remove Impl_Cast invocation.	2020-09-02 10:17:39 -07:00
Changming Sun	d5d5e37e76	Build system enhancements (#5012 ) 1. Add a docker file for CUDA11 2. Support setting CUDA_ARCHITECTURES from command line.	2020-09-02 10:13:26 -07:00
Thiago Crepaldi	aabed34d5c	Fix checkpoint API and improve loss scaler handling (#4950 ) This PR also includes: * More LossScaler tests * Minor LossScaler improvement * Check model after extra post processing * Improve basic training tests to include all optimizers * Set rtol=1e-7 tolerance for Legacy vs Experimental frontend API tests * Increase number of training tests for Legacy vs Experimental tests * Minor refactoring on existing tests * Fix Checkpoint API for Gradient Accumulation / fp16 scenarios	2020-09-02 09:38:02 -07:00
Thiago Crepaldi	eebc2cccce	Fix fetches when eval_step's input is a subset of train_step's input (#4966 ) This PR also includes MNIST sample using the new frontend as a sample	2020-09-02 08:57:44 -07:00
Vincent Wang	a6e219deff	Pass Model Path to TensorProtoToMLValue from Constant Folding for External Inputs (#5000 ) * Don't constant fold external inputs. * pass model_path to TensorProtoToMLValue Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-09-02 21:54:40 +08:00
gwang-msft	5651c23271	Fix for Android ORT android initOsArch exception (#5006 )	2020-09-02 00:48:06 -07:00
xkszltl	44b3accb74	Missing header for `std::once_flag` and `std::call_once`. (#5010 )	2020-09-02 00:46:59 -07:00
Changming Sun	9902b57090	Fix a warning in global_thread_pools/test_inference.cc (#4987 ) * Fix a warning in global_thread_pools/test_inference.cc	2020-09-01 20:45:22 -07:00
Thiago Crepaldi	f38f2d5b54	Port #4920 into the new pytorch frontend (#4965 )	2020-09-01 19:00:49 -07:00
Hariharan Seshadri	d30dd41c0e	Remove public default ctor in PyInferenceSession and replace it with a protected ctor (#4990 )	2020-09-01 17:10:36 -07:00
Ryan Lai	c6a3620ba8	Remove evaluate telemetry due to redundancy (#4996 ) * Remove evaluate start / stop from telemetry * Remove eval telemetry * remove check for evaluate time delay * add comment * remove const Co-authored-by: Ryan Lai <ryalai96@gamil.com>	2020-09-01 17:02:00 -07:00
Tianlei Wu	a47cae031f	Use raw attention mask in BERT related fusions (#4889 ) * Use raw attention mask in fusion * update python scripts to use raw attention mask by default	2020-09-01 13:22:20 -07:00
liqunfu	d79af260bb	Liqun/new api orttraining test transformers (#4982 ) * matching transformer model test with Lamb * increase epochs * use atol 1e-6 to pass full precision test	2020-09-01 13:11:06 -07:00
gwang-msft	64237d999c	Add Cmake config for onnxruntime_NO_EXCEPTIONS (#4975 ) * additional noexception setting, added compile options * more no exception changes * addressed PR comments * Fix build issue when MSVC static library is used. * Clarify comment * add fatal message for onnxruntime_NO_EXCEPTIONS enabled without onnxruntime_MINIMAL_BUILD Co-authored-by: Scott McKay <skottmckay@gmail.com>	2020-09-01 10:17:50 -07:00
Pranav Sharma	ad1701dfb1	Rename DeviceAllocatorRegistrationInfo to a more generic name; Use OrtArenaCfg for arena members; Remove unused OrtMemType; Simplify CreateAllocator interface. (#4970 ) * Rename DeviceAllocatorRegistrationInfo to a more generic name; Remove OrtMemType; Simplify CreateAllocator interface. * - fix builds - fixed mixed aggregation + constructor calls (which were coded before this PR) - changed default value of max_mem in API header - added some validation of values for arena_extend_strategy * fix tensorrt and cuda tests	2020-09-01 09:25:32 -07:00
Yufeng Li	ffc2b25a3a	Quantization tool improvement (#4933 ) Improve quantization tools: 1. Support QAT 2. Make quantization tool to register Operators. 3. Make the API clear to use Co-authored-by: t-yguo <t-yguo@microsoft.com>	2020-09-01 09:07:46 -07:00
Zhang Lei	464bbd27a9	Zhalei/optimize nms (#4875 ) * double the speed of non_max_suppression for cpu. * handle edge case in test case.	2020-08-31 23:33:54 -07:00
Zhang Lei	cf1b74396a	Fix build break for microbench. (#4960 )	2020-08-31 23:29:07 -07:00

3287 commits