vLLM Kunlun is a community-maintained hardware plugin for Kunlun XPUs that enables the vLLM framework to run seamlessly and efficiently on Kunlun XPU hardware. Through a pluggable hardware interface, vLLM Kunlun decouples Kunlun XPU support from the vLLM core, allowing mainstream open-source large models, including Transformer-based, Mixture-of-Experts (MoE), embedding, and multimodal LLMs, to run on this architecture.
vLLM Kunlun supports generative models such as Qwen, Llama, and GLM, as well as multimodal models such as Qianfan-VL and InternVL. Key features include quantization, LoRA, and segmented Kunlun graphs, which together deliver high inference performance on Kunlun 3 P800 hardware.
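A supported model can be served through vLLM's standard OpenAI-compatible entry point. The sketch below is illustrative: the model name is an example from the support matrix, and the flags follow upstream vLLM conventions, which this plugin may extend or restrict.

```shell
# Launch an OpenAI-compatible server on Kunlun XPU hardware.
# Model name and flags are illustrative; they follow upstream vLLM
# and may differ slightly in this plugin.
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --max-model-len 8192 \
    --enable-lora   # optional: serve LoRA adapters alongside the base model
```

Once the server is up, any OpenAI-compatible client can send completion or chat requests to it.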
Supported generative models:

| Model | Support Status | Quantization | LoRA | Segmented Kunlun Graph | Notes |
|---|---|---|---|---|---|
| Qwen2/2.5 | ✅ | - | ✅ | ✅ | - |
| Qwen3 | ✅ | - | ✅ | ✅ | - |
| Qwen3-MoE/Coder | ✅ | ✅ | ✅ | ✅ | - |
| QwQ-32B | ✅ | - | - | ✅ | - |
| Llama2/3/3.1 | ✅ | - | - | ✅ | - |
| GLM-4.5/Air | ✅ | ✅ | ✅ | ✅ | - |
| Qwen3-Next | ⚠️ | - | - | - | Coming soon |
| GPT OSS | ⚠️ | - | - | - | Coming soon |
| DeepSeek-v3/3.2 | ⚠️ | - | - | - | Coming soon |
Supported multimodal models:

| Model | Support Status | Quantization | LoRA | Segmented Kunlun Graph | Notes |
|---|---|---|---|---|---|
| Qianfan-VL | ✅ | - | - | ✅ | - |
| Qwen2.5-VL | ✅ | - | - | ✅ | - |
| InternVL2.5/3/3.5 | ✅ | - | - | ✅ | - |
| InternS1 | ✅ | - | - | ✅ | - |
| Qwen2.5-Omni | ⚠️ | - | - | - | Coming soon |
| Qwen3-VL | ⚠️ | - | - | - | Coming soon |
| GLM-4.5V | ✅ | - | - | ✅ | - |
On the Kunlun 3 P800, the supported models deliver efficient computation. The test environment used 16 concurrent requests with input and output lengths of 2048 tokens each. Measured throughput for each model is as follows:
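A comparable run can be reproduced against a running server with vLLM's serving benchmark client. This is a sketch assuming the upstream `benchmark_serving.py` interface; the model name is illustrative and exact flag names may vary by vLLM version.

```shell
# Sketch of a throughput run matching the setup above:
# 16 concurrent requests, 2048-token input and 2048-token output.
# Assumes upstream vLLM's serving benchmark client.
python benchmarks/benchmark_serving.py \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset-name random \
    --random-input-len 2048 \
    --random-output-len 2048 \
    --max-concurrency 16 \
    --num-prompts 64
```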