From 927c3e39ec1fb78e571c0ec2521ae59ed05720f2 Mon Sep 17 00:00:00 2001
From: Jacky Lee <39754370+jla524@users.noreply.github.com>
Date: Tue, 17 Dec 2024 09:33:50 -0800
Subject: [PATCH] Fix image preview in multi-GPU inference docs (#35303)

fix: link for img
---
 docs/source/en/perf_infer_gpu_multi.md | 2 +-
 docs/source/zh/perf_infer_gpu_multi.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/en/perf_infer_gpu_multi.md b/docs/source/en/perf_infer_gpu_multi.md
index 997509441..ea9421747 100644
--- a/docs/source/en/perf_infer_gpu_multi.md
+++ b/docs/source/en/perf_infer_gpu_multi.md
@@ -64,5 +64,5 @@ You can benefit from considerable speedups for inference, especially for inputs
 For a single forward pass on [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel) with a sequence length of 512 and various batch sizes, the expected speedup is as follows:
 
 <div style="text-align: center">
-    <img src="huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Meta-Llama-3-8B-Instruct%2C%20seqlen%20%3D%20512%2C%20python%2C%20w_%20compile.png">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Meta-Llama-3-8B-Instruct%2C%20seqlen%20%3D%20512%2C%20python%2C%20w_%20compile.png">
 </div>
diff --git a/docs/source/zh/perf_infer_gpu_multi.md b/docs/source/zh/perf_infer_gpu_multi.md
index ee523bc60..35e5bac46 100644
--- a/docs/source/zh/perf_infer_gpu_multi.md
+++ b/docs/source/zh/perf_infer_gpu_multi.md
@@ -64,5 +64,5 @@ torchrun --nproc-per-node 4 demo.py
 以下是 [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel) 模型在序列长度为 512 且不同批量大小情况下的单次前向推理的预期加速效果:
 
 <div style="text-align: center">
-    <img src="huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Meta-Llama-3-8B-Instruct%2C%20seqlen%20%3D%20512%2C%20python%2C%20w_%20compile.png">
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Meta-Llama-3-8B-Instruct%2C%20seqlen%20%3D%20512%2C%20python%2C%20w_%20compile.png">
 </div>