From 59af1d6413b98eee35050dcc4c5ad61ab1368486 Mon Sep 17 00:00:00 2001
From: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Date: Wed, 24 Jul 2024 17:32:57 -0400
Subject: [PATCH] [Doc][AMD][ROCm] Added tips to refer to MI300x tuning guide
 for MI300x users (#6754)

---
 docs/source/getting_started/amd-installation.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/source/getting_started/amd-installation.rst b/docs/source/getting_started/amd-installation.rst
index 71d7527a3e706..1c7d274b7c47e 100644
--- a/docs/source/getting_started/amd-installation.rst
+++ b/docs/source/getting_started/amd-installation.rst
@@ -142,3 +142,10 @@ Alternatively, wheels intended for vLLM use can be accessed under the releases.
 - Triton flash attention does not currently support sliding window attention. If using half precision, please use CK flash-attention for sliding window support.
 - To use CK flash-attention or PyTorch naive attention, please use this flag ``export VLLM_USE_TRITON_FLASH_ATTN=0`` to turn off triton flash attention.
 - The ROCm version of PyTorch, ideally, should match the ROCm driver version.
+
+
+.. tip::
+   - For MI300x (gfx942) users, to achieve optimal performance, please refer to the `MI300x tuning guide `_ for system- and workflow-level performance optimization and tuning tips.
+     For vLLM, please refer to `vLLM performance optimization `_.
+
+
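For reviewers, a minimal sketch of how the ``VLLM_USE_TRITON_FLASH_ATTN`` flag mentioned in the hunk's context lines is used in practice (the echo check is illustrative, not part of the documented workflow):

```shell
# Turn off Triton flash attention so vLLM falls back to CK flash-attention
# (or PyTorch naive attention), per the attention notes in the patched docs.
export VLLM_USE_TRITON_FLASH_ATTN=0

# Illustrative sanity check: confirm the setting before launching vLLM.
echo "VLLM_USE_TRITON_FLASH_ATTN=${VLLM_USE_TRITON_FLASH_ATTN}"
```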