### Tune logging experience a bit
After last time we update the ORTModule log experience, we found few
issues:
1. `INFO` level output too many things, including PyTorch exporter
verbose logs (tracing graphs) on every ranks. On this level, we only
want to
- Output a little bit more information to Users than `WARNING` level,
for example the memory recomputation recommendations or other
not-fully-ready features.
- Output a little bit more information for a quick diagnostic, collected
on rank-0 only.
2. ONNX Runtime logging filter during graph build, session init
sometimes will hide the issues (for example segement fault), there is no
useful information in `WARNING`/`INFO` for users to report to us. This
is not good!
3. Some of our devs like using `pdb` to debug Python code, but if we add
`import pdb; pdb.set_trace()` in models' code might hang when they use
`INFO` or `WARNING`, where exporter happens and all output got
redirected due to log filtering. The only workaround is to switch to
VERBOSE, which output toooooooooooo many logs.
The corresponding changes proposed here are:
1. For `INFO` logging,
- We only logs rank-0.
- We restricted the ORT backend logging level to be WARNING in this
case, because ORT backend code output way too many logs that should be
under verbose, while we cannot guarantee we can get them cleaned up
immediately once they are added.
- We output the PyTorch exporter verbose log (including tracing graph),
which is useful for a quick diagnostic when an issue happens.
2. Remove all logging filtering on ORT backend, then the segment fault
issue details will not be hidden once it happens again.
3. Introduced a `DEVINFO` logging,
- Log logs on all ranks
- Log ORT backend logging level INFO
- PyTorch exporter logging filtering are all turned OFF (to unblock the
pdb debugging).
4. Currently, to use Memory Optimizer, need use DEVINFO (which will
output ORT backend INFO log). So update memory optimizer document to
reflect this. https://github.com/microsoft/onnxruntime/pull/17481 will
update the requirement back to INFO for show memory optimization infos.
You can check
https://github.com/microsoft/onnxruntime/blob/pengwa/devinfo_level/docs/ORTModule_Training_Guidelines.md#log-level-explanations
for a better view of different log levels.
This PR also extract some changes from a bigger one
https://github.com/microsoft/onnxruntime/pull/17481, to reduce its
complexity for review.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: mindest <30493312+mindest@users.noreply.github.com>
|
||
|---|---|---|
| .config | ||
| .devcontainer | ||
| .gdn | ||
| .github | ||
| .pipelines | ||
| .vscode | ||
| cgmanifests | ||
| cmake | ||
| csharp | ||
| dockerfiles | ||
| docs | ||
| include/onnxruntime/core | ||
| java | ||
| js | ||
| objectivec | ||
| onnxruntime | ||
| orttraining | ||
| rust | ||
| samples | ||
| tools | ||
| winml | ||
| .clang-format | ||
| .clang-tidy | ||
| .dockerignore | ||
| .gitattributes | ||
| .gitignore | ||
| .gitmodules | ||
| .lintrunner.toml | ||
| build.bat | ||
| build.sh | ||
| CITATION.cff | ||
| CODEOWNERS | ||
| CONTRIBUTING.md | ||
| lgtm.yml | ||
| LICENSE | ||
| NuGet.config | ||
| ort.wprp | ||
| ORT_icon_for_light_bg.png | ||
| packages.config | ||
| pyproject.toml | ||
| README.md | ||
| requirements-dev.txt | ||
| requirements-doc.txt | ||
| requirements-lintrunner.txt | ||
| requirements-training.txt | ||
| requirements.txt.in | ||
| SECURITY.md | ||
| setup.py | ||
| ThirdPartyNotices.txt | ||
| VERSION_NUMBER | ||

ONNX Runtime is a cross-platform inference and training machine-learning accelerator.
ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. Learn more →
ONNX Runtime training can accelerate the model training time on multi-node NVIDIA GPUs for transformer models with a one-line addition for existing PyTorch training scripts. Learn more →
Get Started & Resources
-
General Information: onnxruntime.ai
-
Usage documention and tutorials: onnxruntime.ai/docs
-
YouTube video tutorials: youtube.com/@ONNXRuntime
-
Companion sample repositories:
- ONNX Runtime Inferencing: microsoft/onnxruntime-inference-examples
- ONNX Runtime Training: microsoft/onnxruntime-training-examples
Builtin Pipeline Status
| System | Inference | Training |
|---|---|---|
| Windows | ||
| Linux | ||
| Mac | ||
| Android | ||
| iOS | ||
| Web | ||
| Other |
Third-party Pipeline Status
| System | Inference | Training |
|---|---|---|
| Linux |
Data/Telemetry
Windows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.
Contributions and Feedback
We welcome contributions! Please see the contribution guidelines.
For feature requests or bug reports, please file a GitHub Issue.
For general discussion or questions, please use GitHub Discussions.
Code of Conduct
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
License
This project is licensed under the MIT License.