Mirror of https://github.com/saymrwulf/onnxruntime.git, synced 2026-05-14 20:48:00 +00:00
Implement CloudEP for hybrid inferencing. The PR introduces no new API; customers can configure session and run options to run inference against an Azure [Triton endpoint](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-with-triton?tabs=azure-cli%2Cendpoint). A sample configuration in Python:

```python
sess_opt.add_session_config_entry('cloud.endpoint_type', 'triton')
sess_opt.add_session_config_entry('cloud.uri', 'https://cloud.com')
sess_opt.add_session_config_entry('cloud.model_name', 'detection2')
sess_opt.add_session_config_entry('cloud.model_version', '7')  # optional, default '1'
sess_opt.add_session_config_entry('cloud.verbose', '1')  # optional, default '0' (no verbose output)
...
run_opt.add_run_config_entry('use_cloud', '1')  # '0' for local inferencing, '1' for the cloud endpoint
run_opt.add_run_config_entry('cloud.auth_key', '...')
...
sess.run(None, {'input': input_}, run_opt)
```

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
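The `cloud.*` session-config entries above can be assembled in one place before being applied. The sketch below does this with a small helper; `make_cloud_config` and its defaults are hypothetical illustrations for this PR description, not part of the onnxruntime API, and the endpoint values are the placeholder ones from the sample above.

```python
# Hypothetical helper that collects the CloudEP session-config entries
# shown in the sample above. All values are strings, matching the
# signature of SessionOptions.add_session_config_entry(key, value).
def make_cloud_config(endpoint_type, uri, model_name,
                      model_version='1', verbose='0'):
    """Return the 'cloud.*' key/value pairs for the session options."""
    return {
        'cloud.endpoint_type': endpoint_type,
        'cloud.uri': uri,
        'cloud.model_name': model_name,
        'cloud.model_version': model_version,  # optional, default '1'
        'cloud.verbose': verbose,              # optional, default '0'
    }

if __name__ == '__main__':
    cfg = make_cloud_config('triton', 'https://cloud.com', 'detection2',
                            model_version='7')
    # With a real onnxruntime.SessionOptions object, one would apply these as:
    #   for key, value in cfg.items():
    #       sess_opt.add_session_config_entry(key, value)
    for key, value in cfg.items():
        print(f'{key}={value}')
```

Keeping the entries in a plain dict also makes it easy to validate or log the configuration before the session is created.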
Directory contents:

- eager
- post_to_dashboard
- bundle_dlls_gpu.bat
- bundle_nuget_with_native_headers.bat
- extract_nuget_files.ps1
- extract_nuget_files_gpu.ps1
- extract_zip_files_gpu.ps1
- helpers.ps1
- install_third_party_deps.ps1
- jar_gpu_packaging.ps1
- jar_packaging.ps1
- post_binary_sizes_to_dashboard.py
- post_code_coverage_to_dashboard.py
- setup_env.bat
- setup_env_cloud.bat
- setup_env_cuda_11.bat
- setup_env_gpu.bat
- setup_env_trt.bat
- setup_env_x86.bat