Get Mini-SGLang up and running in less than 5 minutes. This guide shows you the fastest way to install, launch a server, and make your first inference request.

## Documentation Index

Fetch the complete documentation index at: https://mintlify.com/sgl-project/mini-sglang/llms.txt
Use this file to discover all available pages before exploring further.
Platform Requirements: Mini-SGLang supports Linux only (x86_64 and aarch64). For Windows users, use WSL2. macOS is not supported due to dependencies on Linux-specific CUDA kernels.
## Installation and First Run
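The install command itself is covered in the Installation Guide; as a sketch, assuming the project is published on PyPI under the name `mini-sglang` (the actual package name may differ — confirm it against the Installation Guide, which also covers Docker and WSL2 options):

```shell
# Hypothetical package name -- confirm against the Installation Guide.
pip install mini-sglang
```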
### Launch the server
Start an OpenAI-compatible API server with a single command. The server listens on http://localhost:1919 by default; once startup finishes, you'll see output indicating the server is ready to accept requests.

### Alternative: Interactive Shell
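The launch command is documented in the Installation Guide; as a hedged sketch, assuming a `launch_server` entry point in the style of SGLang (the module name, flags, and model shown here are assumptions — adjust them to your install):

```shell
# Hypothetical entry point, flags, and model -- adjust to your install.
python -m mini_sglang.launch_server \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --port 1919
```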
For quick testing and exploration, launch the interactive shell mode. Inside the shell, type `/reset` to clear the chat history or `/exit` to quit.
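A hedged sketch of entering the shell, assuming a `shell` entry point and the same model flag as above (both are assumptions — the real invocation may differ):

```shell
# Hypothetical invocation -- the actual entry point may differ.
python -m mini_sglang.shell --model Qwen/Qwen2.5-0.5B-Instruct
```

Once inside, `/reset` and `/exit` work as described above.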
## Quick Examples
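Because the server speaks the OpenAI chat-completions protocol, any HTTP client works. A minimal sketch using only the Python standard library, assuming the default address http://localhost:1919 (the model name `default` is a placeholder — use whatever model you launched the server with):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1919/v1"  # default Mini-SGLang address

# Standard OpenAI chat-completions payload; "default" is a placeholder
# model name -- substitute the model you launched the server with.
payload = {
    "model": "default",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}

request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(request, timeout=30) as response:
        reply = json.load(response)
        print(reply["choices"][0]["message"]["content"])
except OSError as exc:
    # urllib raises URLError (a subclass of OSError) if the server
    # is not reachable -- make sure it is running first.
    print(f"Request failed -- is the server running? ({exc})")
```

The official `openai` Python client also works: point its `base_url` at http://localhost:1919/v1 and use any placeholder API key.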
## Next Steps
Now that you have Mini-SGLang running, explore more capabilities:

- Installation Guide - Detailed installation options including Docker and WSL2
- Server Configuration - Configure advanced options like Tensor Parallelism and attention backends
- API Reference - Complete OpenAI-compatible API documentation
- Core Concepts - Learn about Radix Cache, Chunked Prefill, and other optimizations
If you encounter network issues downloading models from HuggingFace, pass `--model-source modelscope` to download from ModelScope instead.
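Combined with a launch command, the tip above would look like this (the `--model-source modelscope` flag comes from the tip; the entry point and model name are assumptions):

```shell
# --model-source is from the tip above; entry point and model are hypothetical.
python -m mini_sglang.launch_server \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --model-source modelscope
```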