**Current issue:** Once ORT gets the capabilities from the EP's `GetCapability()`, it creates a graph viewer for each capability: `viewers.push_back(std::make_unique<GraphViewer>(graph, *cur_capability.sub_graph));` (see the code [here](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/graph_partitioner.cc#L458)). The graph viewer computes `nodes_in_topological_order_` by calling [Graph::ReverseDFSFrom](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/graph/graph_viewer.cc#L107), and the resulting order can differ from the node order of the original model. During `Compile()`, the EP calls [GraphViewerToProto()](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/graph/graph_proto_serializer.cc#L37), which serializes nodes following `nodes_in_topological_order_`, so the generated model proto can end up with a different node ordering.

This is a problem for the TRT EP when refitting weights into a "weightless" engine. The engine is built from the model proto provided by the TRT EP, while the weights come from the original ONNX model. Because the two models differ in node ordering, TRT complains during refitting.

**The original model (subgraph of ResNet50):**

<img width="442" alt="image" src="https://github.com/microsoft/onnxruntime/assets/54722500/bb9a641d-f2f2-46c3-aebf-4084a08ff289">

**The serialized model proto generated by TRT EP:** (The highlighted part has a different node order from the original model.)

<img width="340" alt="image" src="https://github.com/microsoft/onnxruntime/assets/54722500/bbc6bf34-f960-4753-9474-a18ebc2dc48b">

**Solution 1:** Change the default comparator to `NodeCompare::operator() {return n1->Index() > n2->Index();}`.

The root cause of the different node order between the original model and the EP-generated model is that the graph viewer [generates](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/graph/graph_viewer.cc#L107) a different `nodes_in_topological_order_`. Modifying the sort comparator `NodeCompare::operator()` can fix the problem. `NodeCompare::operator()` is used in [Graph::ReverseDFSFrom](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/graph/graph.cc#L1760), where the input nodes of the current node are [sorted](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/graph/graph.cc#L1802) by node index. Because the sorted nodes are pushed onto a stack, which later determines the final topological order in first-in, last-out fashion, the node with the larger index should be pushed first. This yields a topological order in which nodes with smaller indices come first.

**Solution 2 (used by this PR):** Use a priority-based BFS for the topological sort in `GraphViewerToProto()`.
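The idea behind Solution 2 can be illustrated with a minimal sketch. This is not ORT's `GraphViewerToProto()` code: the adjacency-list `Graph` alias and the `PriorityTopoSort` function are hypothetical, and the sketch assumes node indices follow the original model's node order. It runs Kahn's algorithm with a min-heap keyed on node index, so among all nodes whose producers have already been emitted, the one with the smallest index is emitted next. This produces the lexicographically smallest topological order, which is exactly the ascending index order whenever that order is itself topologically valid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical minimal graph: edges[i] lists the consumers of node i.
// Node indices are assumed to follow the original model's node order.
using Graph = std::vector<std::vector<std::size_t>>;

// Kahn's algorithm with a min-heap keyed on node index: among all nodes
// whose producers have already been emitted, always emit the one with the
// smallest index next.
std::vector<std::size_t> PriorityTopoSort(const Graph& edges) {
  const std::size_t n = edges.size();

  // Count how many producers feed each node.
  std::vector<std::size_t> in_degree(n, 0);
  for (const auto& consumers : edges) {
    for (std::size_t c : consumers) ++in_degree[c];
  }

  // Min-heap of nodes that are ready to be emitted.
  std::priority_queue<std::size_t, std::vector<std::size_t>,
                      std::greater<std::size_t>>
      ready;
  for (std::size_t i = 0; i < n; ++i) {
    if (in_degree[i] == 0) ready.push(i);
  }

  std::vector<std::size_t> order;
  order.reserve(n);
  while (!ready.empty()) {
    const std::size_t node = ready.top();
    ready.pop();
    order.push_back(node);
    // Releasing this node may make some of its consumers ready.
    for (std::size_t c : edges[node]) {
      if (--in_degree[c] == 0) ready.push(c);
    }
  }

  // ONNX graphs must be acyclic; a cycle would leave nodes unemitted.
  assert(order.size() == n && "input graph must be acyclic");
  return order;
}
```

For example, `PriorityTopoSort({{1, 3}, {2}, {4}, {4}, {}})`, a residual-block-like DAG where node 0 feeds nodes 1 and 3 and nodes 2 and 3 both feed node 4, returns `0 1 2 3 4`. A plain FIFO queue on the same graph would emit `0 1 3 2 4`, which is equally valid topologically but no longer matches the original node order.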

ONNX Runtime is a cross-platform inference and training machine-learning accelerator.

ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms.

ONNX Runtime training can accelerate model training on multi-node NVIDIA GPUs for transformer models with a one-line addition to existing PyTorch training scripts.
**Get Started & Resources**

- General Information: onnxruntime.ai
- Usage documentation and tutorials: onnxruntime.ai/docs
- YouTube video tutorials: youtube.com/@ONNXRuntime
- Companion sample repositories:
  - ONNX Runtime Inferencing: microsoft/onnxruntime-inference-examples
  - ONNX Runtime Training: microsoft/onnxruntime-training-examples
**Data/Telemetry**

Windows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.

**Contributions and Feedback**

We welcome contributions! Please see the contribution guidelines.

For feature requests or bug reports, please file a GitHub Issue.

For general discussion or questions, please use GitHub Discussions.

**Code of Conduct**

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

**License**

This project is licensed under the MIT License.