vllm-project/vllm
Rising · Maintained
llm · inference · serving · cuda
32.5k stars (+780 this week) · 4.8k forks · Python · Apache-2.0 · Updated Dec 20
What it is
A high-throughput and memory-efficient inference and serving engine for LLMs
Best for
Infra · LLM
Quick Start
# Install
pip install vllm
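
# Run a prompt (a minimal offline-inference sketch; the model name is only an example)
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                      # load a small model for a quick test
params = SamplingParams(temperature=0.8, max_tokens=64)   # basic sampling settings
outputs = llm.generate(["Hello, my name is"], params)     # batch of one prompt
print(outputs[0].outputs[0].text)                         # generated completion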