vllm-project/vllm
Rising · Maintained
llm · inference · serving · cuda
32.5k stars (+780 this week) · 4.8k forks · Python · Apache-2.0 · Updated Dec 20
What it is
A high-throughput and memory-efficient inference and serving engine for LLMs
Best for
Infra · LLM
Quick Start
# Install
pip install vllm
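
# Run a prompt (a minimal offline-inference sketch; the model name is only an example)
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                      # load a small model for a quick test
params = SamplingParams(temperature=0.8, max_tokens=64)   # basic sampling settings
outputs = llm.generate(["Hello, my name is"], params)     # batch of one prompt
print(outputs[0].outputs[0].text)                         # generated completion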