
vLLM Kunlun: a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU

vLLM Kunlun is a community-maintained hardware plugin for Kunlun XPUs that enables the vLLM framework to run seamlessly and efficiently on Kunlun XPU hardware. By providing a pluggable hardware interface, vLLM Kunlun decouples Kunlun XPU support from the vLLM core, allowing mainstream open-source large models, including Transformer-based, Mixture-of-Experts (MoE), embedding, and multimodal LLMs, to run on this hardware.

vLLM Kunlun supports generative models such as Qwen, Llama, and GLM, as well as multimodal models such as Qianfan-VL and InternVL. It offers key features including quantization, LoRA, and segmented Kunlun graphs, and delivers high performance on Kunlun Chip 3 P800 hardware.

Prerequisites for Running vLLM Kunlun

  • Hardware: Kunlun Chip 3 P800
  • Operating System: Ubuntu 22.04
  • Software Environment:
    • Python version ≥ 3.10
    • PyTorch version ≥ 2.5.1
    • vLLM (must match the vllm-kunlun version)
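Before installing, it can help to verify the software prerequisites listed above. The helper below is a minimal sketch that compares dotted version strings against the required minimums; the torch check runs only if PyTorch is already installed.

```python
import sys


def meets_minimum(version: str, minimum: tuple[int, ...]) -> bool:
    """True if a dotted version string (e.g. '2.5.1') meets a minimum version."""
    # Drop local build suffixes such as '+cu121' before parsing.
    parts = tuple(int(p) for p in version.split("+")[0].split(".") if p.isdigit())
    return parts >= minimum


# Python >= 3.10 and PyTorch >= 2.5.1 are required by vLLM Kunlun.
python_ok = sys.version_info[:2] >= (3, 10)
try:
    import torch
    torch_ok = meets_minimum(torch.__version__, (2, 5, 1))
except ImportError:
    torch_ok = None  # torch not installed yet
print(f"python_ok={python_ok}, torch_ok={torch_ok}")
```

Note that tuple comparison gives the expected semantics: `"2.5"` does not satisfy a `(2, 5, 1)` minimum, while `"2.5.1"` and `"2.6.0"` do.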

Supported Models

Generative Models

| Model | Support Status | Quantization | LoRA | Segmented Kunlun Graph | Notes |
|---|---|---|---|---|---|
| Qwen2/2.5 | ✅ | ✅ | - | - | |
| Qwen3 | ✅ | ✅ | - | - | |
| Qwen3-Moe/Coder | ✅ | ✅ | ✅ | - | |
| QwQ-32B | ✅ | - | - | - | |
| Llama2/3/3.1 | ✅ | - | - | - | |
| GLM-4.5/Air | ✅ | ✅ | ✅ | - | |
| Qwen3-next | ⚠️ | - | - | - | Coming soon |
| GPT OSS | ⚠️ | - | - | - | Coming soon |
| DeepSeek-v3/3.2 | ⚠️ | - | - | - | Coming soon |

Multimodal Language Models

| Model | Support Status | Quantization | LoRA | Segmented Kunlun Graph | Notes |
|---|---|---|---|---|---|
| Qianfan-VL | ✅ | - | - | - | |
| Qwen2.5-VL | ✅ | - | - | - | |
| InternVL2.5/3/3.5 | ✅ | - | - | - | |
| InternS1 | ✅ | - | - | - | |
| Qwen2.5-Omni | ⚠️ | - | - | - | Coming soon |
| Qwen3-VL | ⚠️ | - | - | - | Coming soon |
| GLM-4.5V | ✅ | - | - | - | |
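Any of the supported models above can be served through vLLM's OpenAI-compatible HTTP endpoint and queried with plain HTTP. The sketch below only builds the request; the host, port, path, and model name are illustrative assumptions to be adjusted for a real deployment.

```python
import json
import urllib.request

# Hypothetical endpoint: adjust host/port and model name to your deployment.
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "Qwen/Qwen3-8B",  # any model from the tables above
    "messages": [{"role": "user", "content": "Hello from Kunlun!"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Run against a live server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Using the standard OpenAI-compatible API means existing client code works unchanged regardless of the hardware backend behind the server.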

Performance

Throughput was measured on the Kunlun Chip 3 P800 with 16 concurrent requests and input/output lengths of 2048. The measured throughput for each model is as follows:

  • Qwen3-30B-A3B: 1927.4
  • Qwen3-14B: 1781.1
  • Qwen3-8B: 1779.8
  • Qwen2.5-14B-Instruct: 1592.7
  • Qwen3-32B: 927.7
  • Qwen3-235B-A22B: 927.5
  • Qwen2.5-32B-Instruct: 916.5
  • Qwen2.5-72B-Instruct: 819.5
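The figures above are aggregate rates across all concurrent requests. As a sanity check, a benchmark's throughput can be reproduced from token counts and wall-clock time; the elapsed time in the example below is made up for illustration, not taken from the measurements above.

```python
def throughput_tokens_per_s(
    num_requests: int, tokens_per_request: int, elapsed_s: float
) -> float:
    """Aggregate throughput across all completed requests."""
    return (num_requests * tokens_per_request) / elapsed_s


# Hypothetical run: 16 concurrent requests, 2048 tokens each,
# finishing in 17 seconds of wall-clock time.
rate = throughput_tokens_per_s(16, 2048, 17.0)
print(f"{rate:.1f} tokens/s")
```

Dividing by wall-clock time (rather than summing per-request times) is what makes the aggregate rate exceed any single request's generation speed.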

Documentation

  • QuickStart
  • Installation

Visit baidu/vLLM-Kunlun to access the source code and obtain more information.