
vllm-project/vllm

Rising · Maintained

A high-throughput and memory-efficient inference and serving engine for LLMs

llm · inference · serving · cuda
32.5k stars (+780 this week) · 4.8k forks · Python · Apache-2.0 · Updated Dec 20

What it is

vLLM is an inference and serving engine for large language models. It gets its throughput from continuous batching of incoming requests and its memory efficiency from PagedAttention, which manages the KV cache in fixed-size blocks, and it can serve models behind an OpenAI-compatible API.

Best for

Infra · LLM

Quick Start

# Install
pip install vllm
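
After installing, a minimal offline-inference sketch looks like the following; the model id and sampling parameters here are illustrative, and any supported Hugging Face model can be substituted.

# Offline batch inference with vLLM (model id is illustrative)
from vllm import LLM, SamplingParams

# Load the model into the engine
llm = LLM(model="facebook/opt-125m")

# Sampling settings for generation
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts
outputs = llm.generate(["The capital of France is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)

For online serving, vLLM also ships an OpenAI-compatible HTTP server (e.g. `vllm serve <model>` in recent releases).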