mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-04 23:59:56 +00:00
### Description

The PR implements FloatE4M3FN, FloatE5M2, FloatE4M3FNUZ, FloatE5M2FNUZ as described in PR https://github.com/onnx/onnx/pull/4805. It uses the CUDA API to cast float/half to float8 when CUDA >= 11.8, and a custom implementation when CUDA < 11.8.

* It implements Cast, QuantizeLinear, DequantizeLinear for all types on CPU, and only for types FloatE4M3FN, FloatE5M2 on CUDA.
* It extends the supported types for Shape, Reshape, Identity and the control flow operators If, Loop, Scan.
* It implements Equal(19).
* Cast, QuantizeLinear, DequantizeLinear operators now support a parameter `saturate`, only valid for float 8 types. It is true by default. In that case, any value out of range is converted into the maximum float 8 value. If false, the value becomes infinite.
* QuantizeLinear, DequantizeLinear now support multiple scales on CUDA (and ROCm by extension): scale = 1D tensor with one scale per channel.

### Motivation and Context

Supports the latest onnx version. Fixes [AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395)

---------

Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
||
|---|---|---|
| .. | ||
| nodejs/templates | ||
| nuget/templates | ||
| templates | ||
| android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml | ||
| android-x86_64-crosscompile-ci-pipeline.yml | ||
| binary-size-checks-pipeline.yml | ||
| build-perf-test-binaries-pipeline.yml | ||
| c-api-noopenmp-packaging-pipelines.yml | ||
| clean-build-docker-image-cache-pipeline.yml | ||
| linux-ci-pipeline.yml | ||
| linux-cpu-aten-pipeline.yml | ||
| linux-cpu-eager-pipeline.yml | ||
| linux-cpu-minimal-build-ci-pipeline.yml | ||
| linux-dnnl-ci-pipeline.yml | ||
| linux-gpu-ci-pipeline.yml | ||
| linux-gpu-tensorrt-ci-pipeline.yml | ||
| linux-gpu-tensorrt-daily-perf-pipeline.yml | ||
| linux-migraphx-ci-pipeline.yml | ||
| linux-multi-gpu-ci-pipeline.yml | ||
| linux-multi-gpu-tensorrt-ci-pipeline.yml | ||
| linux-openvino-ci-pipeline.yml | ||
| linux-openvino-nightly-pipeline.yml | ||
| linux-qnn-ci-pipeline.yml | ||
| mac-ci-pipeline.yml | ||
| mac-coreml-ci-pipeline.yml | ||
| mac-ios-ci-pipeline.yml | ||
| mac-ios-packaging-pipeline.yml | ||
| mac-objc-static-analysis-ci-pipeline.yml | ||
| mac-react-native-ci-pipeline.yml | ||
| npm-packaging-pipeline.yml | ||
| orttraining-linux-ci-pipeline.yml | ||
| orttraining-linux-external-custom-ops.yml | ||
| orttraining-linux-gpu-amd-e2e-test-ci-pipeline.yml | ||
| orttraining-linux-gpu-ci-pipeline.yml | ||
| orttraining-linux-gpu-distributed-e2e-test-pipeline.yml | ||
| orttraining-linux-gpu-docker-release-pipeline.yml | ||
| orttraining-linux-gpu-ortmodule-distributed-test-ci-pipeline.yml | ||
| orttraining-linux-gpu-ortmodule-test-clear-cache-pipeline.yml | ||
| orttraining-linux-gpu-training-apis.yml | ||
| orttraining-linux-nightly-ortmodule-test-pipeline.yml | ||
| orttraining-mac-ci-pipeline.yml | ||
| orttraining-pai-ci-pipeline.yml | ||
| orttraining-py-packaging-pipeline-cpu.yml | ||
| orttraining-py-packaging-pipeline-cuda.yml | ||
| orttraining-py-packaging-pipeline-rocm.yml | ||
| post-merge-jobs.yml | ||
| py-package-build-pipeline.yml | ||
| py-package-test-pipeline.yml | ||
| py-packaging-pipeline.yml | ||
| qnn-ep-nuget-packaging-pipeline.yml | ||
| sign_ov_ep_binaries.yml | ||
| snpe-ep-nuget-packaging-pipeline.yml | ||
| web-ci-pipeline.yml | ||
| web-packaging-pipeline.yml | ||
| win-ci-fuzz-testing.yml | ||
| win-ci-pipeline.yml | ||
| win-gpu-ci-pipeline.yml | ||
| win-gpu-reduce-op-ci-pipeline.yml | ||
| win-gpu-tensorrt-ci-pipeline.yml | ||
| win-qnn-arm64-ci-pipeline.yml | ||
| win-qnn-ci-pipeline.yml | ||
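
The `saturate` rule from the commit description at the top of this listing can be sketched in a few lines. This is only an illustration of the range handling, not the code from the PR: the function name `saturating_range_e5m2` is hypothetical, the sketch covers FloatE5M2 only (largest finite value 57344), and rounding to float 8 precision is deliberately not modelled.

```python
import math

# Largest finite magnitude representable in FloatE5M2.
E5M2_MAX = 57344.0


def saturating_range_e5m2(x: float, saturate: bool = True) -> float:
    """Apply the out-of-range rule described for `saturate` (hypothetical helper).

    Only the range handling is shown; the value is not rounded to
    float 8 precision.
    """
    # NaN and in-range values are passed through unchanged.
    if math.isnan(x) or abs(x) <= E5M2_MAX:
        return x
    if saturate:
        # Out of range, saturate=True: clamp to the maximum value, keeping the sign.
        return math.copysign(E5M2_MAX, x)
    # Out of range, saturate=False: the result is infinite, keeping the sign.
    return math.copysign(math.inf, x)
```

For example, `saturating_range_e5m2(1e6)` yields the maximum value, while `saturating_range_e5m2(1e6, saturate=False)` yields positive infinity.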