See [this](../onnxruntime/test/shared_lib/test_inference.cc) for the MyCustomOp and SliceCustomOp examples, which use the C++ helper API (onnxruntime_cxx_api.h).
You can also compile the custom ops into a shared library and use that to run a model via the C++ API. The same test file contains an example.
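For orientation, the sketch below shows roughly what such a custom op looks like with the C++ helper API, modeled on the MyCustomOp example: a kernel struct with a `Compute` method, an op struct describing its schema, and registration through a custom op domain. The op name `MyCustomAdd`, the domain name `mydomain`, and the model path are illustrative, and the exact helper signatures (e.g. `Ort::CustomOpApi`) can vary between onnxruntime versions, so treat the linked tests as the authoritative reference.

```cpp
#include <vector>
#include "onnxruntime_cxx_api.h"

// Kernel: element-wise addition of two float tensors of the same shape.
struct MyCustomKernel {
  MyCustomKernel(Ort::CustomOpApi ort, const OrtKernelInfo* /*info*/) : ort_(ort) {}

  void Compute(OrtKernelContext* context) {
    // Fetch the two inputs.
    const OrtValue* input_X = ort_.KernelContext_GetInput(context, 0);
    const OrtValue* input_Y = ort_.KernelContext_GetInput(context, 1);
    const float* X = ort_.GetTensorData<float>(input_X);
    const float* Y = ort_.GetTensorData<float>(input_Y);

    // The output has the same shape as the first input.
    OrtTensorTypeAndShapeInfo* shape_info = ort_.GetTensorTypeAndShape(input_X);
    std::vector<int64_t> dims = ort_.GetTensorShape(shape_info);
    size_t element_count = ort_.GetTensorShapeElementCount(shape_info);
    ort_.ReleaseTensorTypeAndShapeInfo(shape_info);

    OrtValue* output = ort_.KernelContext_GetOutput(context, 0, dims.data(), dims.size());
    float* out = ort_.GetTensorMutableData<float>(output);

    for (size_t i = 0; i < element_count; ++i) {
      out[i] = X[i] + Y[i];
    }
  }

 private:
  Ort::CustomOpApi ort_;
};

// Schema: operator name, input/output counts and element types.
struct MyCustomOp : Ort::CustomOpBase<MyCustomOp, MyCustomKernel> {
  void* CreateKernel(Ort::CustomOpApi api, const OrtKernelInfo* info) const {
    return new MyCustomKernel(api, info);
  }
  const char* GetName() const { return "MyCustomAdd"; }
  size_t GetInputTypeCount() const { return 2; }
  ONNXTensorElementDataType GetInputType(size_t /*index*/) const {
    return ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT;
  }
  size_t GetOutputTypeCount() const { return 1; }
  ONNXTensorElementDataType GetOutputType(size_t /*index*/) const {
    return ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT;
  }
};

// Register the op in a custom domain and attach it to the session options
// before creating the session.
void RunModelWithCustomOp(Ort::Env& env) {
  static MyCustomOp my_custom_op;  // must outlive the session
  Ort::CustomOpDomain domain{"mydomain"};
  domain.Add(&my_custom_op);

  Ort::SessionOptions session_options;
  session_options.Add(domain);
  Ort::Session session(env, ORT_TSTR("model_with_custom_op.onnx"), session_options);
  // ... run the session as usual ...
}
```

Note that the op instance has to stay alive for as long as any session uses it, which is why the sketch gives it static storage duration and keeps the domain in scope while the session is created.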
The source code for a sample custom op shared library containing two custom kernels is [here](../onnxruntime/test/testdata/custom_op_library/custom_op_library.cc).
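When the ops are packaged as a shared library, the library exposes an exported `RegisterCustomOps` entry point that onnxruntime calls when the library is registered. Below is a rough sketch of that entry point, assuming the `MyCustomOp` type from the previous sketch is available in the same translation unit; the domain name `test.customop` is illustrative, and the sample library linked above is the reference implementation.

```cpp
#include "onnxruntime_c_api.h"

static MyCustomOp c_my_custom_op;  // must outlive every session that uses it

// Exported entry point called by onnxruntime when the library is registered.
extern "C" ORT_EXPORT OrtStatus* ORT_API_CALL RegisterCustomOps(OrtSessionOptions* options,
                                                                const OrtApiBase* api_base) {
  const OrtApi* ort_api = api_base->GetApi(ORT_API_VERSION);

  OrtCustomOpDomain* domain = nullptr;
  if (OrtStatus* status = ort_api->CreateCustomOpDomain("test.customop", &domain)) {
    return status;  // propagate the failure to the caller
  }
  if (OrtStatus* status = ort_api->CustomOpDomain_Add(domain, &c_my_custom_op)) {
    return status;
  }
  // Hand the populated domain to the session options.
  return ort_api->AddCustomOpDomain(options, domain);
}
```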
See the testRegisterCustomOpsLibrary test in [this](../onnxruntime/test/python/onnxruntime_test_python.py) for an example that uses the Python API
to register a shared library containing custom op kernels.
Currently, the only supported Execution Providers (EPs) for custom ops registered via this approach are the `CUDA` and the `CPU` EPs.
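From C++ application code, the analogue of that Python registration is the `RegisterCustomOpsLibrary` C API call, which loads the shared library into the session options before the session is created. The following is a minimal sketch; the library and model paths are illustrative, and the availability of the `Ort::GetApi`/`Ort::ThrowOnError` helpers depends on the onnxruntime version.

```cpp
#include "onnxruntime_cxx_api.h"

// Load a custom-op shared library into the session options, then create the session.
void CreateSessionWithCustomOpLibrary(Ort::Env& env) {
  Ort::SessionOptions session_options;

  // onnxruntime loads the library (dlopen/LoadLibrary) and returns its handle here.
  // The application must keep the handle valid until the session is destroyed and
  // is responsible for closing it afterwards.
  void* library_handle = nullptr;
  Ort::ThrowOnError(Ort::GetApi().RegisterCustomOpsLibrary(
      session_options, "./libcustom_op_library.so", &library_handle));

  Ort::Session session(env, ORT_TSTR("custom_op_model.onnx"), session_options);
  // ... run the session as usual ...
}
```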
Note that when a model is run on a GPU, onnxruntime will insert a MemcpyToHost op before a CPU-based custom op and a MemcpyFromHost op after it, to make sure the tensor(s) are accessible throughout the call; no extra effort is required from the custom op developer in this case.
To facilitate custom operator development, sharing, and release, please check the [onnxruntime custom operator library](https://github.com/microsoft/ort-customops) project for more information.