The SamplingParams dataclass configures how tokens are sampled during text generation.

Class Definition

from minisgl.core import SamplingParams

Fields

temperature
float
default:"0.0"
Controls randomness in sampling:
  • 0.0: Greedy decoding (always pick most likely token)
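  • 0.0 < temperature < 1.0: Less random than the model's own distribution (probability mass concentrates on the most likely tokens)
  • 1.0: Sample according to the model's unscaled probabilities
  • > 1.0: More random (the distribution flattens toward uniform)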
Lower values make output more deterministic and focused.
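The effect of temperature can be illustrated with a small standalone sketch. It assumes the common convention of dividing logits by the temperature before applying softmax; the helper name softmax_with_temperature is hypothetical and is not part of minisgl.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0.0:
        # Assumption: non-positive temperature means greedy decoding,
        # so all probability mass goes to the most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.5)     # sharper than the model
neutral = softmax_with_temperature(logits, 1.0)  # the model's own distribution
hot = softmax_with_temperature(logits, 2.0)      # flatter than the model
```

In this sketch the top token's probability shrinks as the temperature rises, which is why low values read as more focused and high values as more varied.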
top_k
int
default:"-1"
Limits sampling to the K most likely tokens:
  • -1: Disabled (consider all tokens)
  • 1: Equivalent to greedy decoding
  • > 1: Sample from top K tokens only
Helps prevent sampling low-probability tokens.
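As a rough illustration of top-k filtering, the sketch below keeps the K largest probabilities and renormalizes them. The function top_k_filter is a hypothetical helper written for this page, not minisgl's implementation, and it treats any non-positive K as "disabled" to match the -1 default.

```python
def top_k_filter(probs, k):
    """Keep the k largest probabilities, zero the rest, and renormalize."""
    if k <= 0 or k >= len(probs):
        return list(probs)  # filter disabled, or k already covers every token
    # Indices of the k most likely tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

probs = [0.5, 0.3, 0.15, 0.05]
top2 = top_k_filter(probs, 2)    # only the two most likely tokens survive
greedy = top_k_filter(probs, 1)  # a single candidate: same choice as greedy
```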
top_p
float
default:"1.0"
Nucleus sampling: samples from the smallest set of most likely tokens whose cumulative probability exceeds P:
  • 1.0: Disabled (consider all tokens)
  • 0.0 < top_p < 1.0: Sample only from the most likely tokens whose cumulative probability reaches P
Dynamically adjusts the number of candidate tokens based on the probability distribution.
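A minimal sketch of nucleus filtering follows. The helper top_p_filter is hypothetical, and the boundary convention (stop once the running total is greater than or equal to P) is an assumption; implementations differ on whether the comparison is strict.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of likely tokens whose cumulative mass reaches top_p."""
    if top_p >= 1.0:
        return list(probs)  # filter disabled
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # assumed boundary convention
            break
    total = sum(probs[i] for i in kept)
    keep = set(kept)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(probs, 0.9)  # three tokens are needed to reach 0.9
narrow = top_p_filter(probs, 0.7)   # two tokens already cover 0.7
```

Unlike top_k, the number of surviving tokens here depends on how peaked the distribution is: a confident model keeps few candidates, an uncertain one keeps many.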
ignore_eos
bool
default:"False"
Whether to ignore end-of-sequence tokens:
  • False: Stop generation when EOS token is sampled
  • True: Continue generating even after EOS token
Useful for forcing generation to reach max_tokens.
max_tokens
int
default:"1024"
Maximum number of tokens to generate (excluding the input prompt). Generation stops when either:
  • max_tokens tokens have been generated, or
  • EOS token is sampled (unless ignore_eos=True)
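The two stopping conditions above can be sketched as a simple loop. Everything here is illustrative: generate, next_token, and the EOS id of 0 are hypothetical stand-ins, and whether the EOS token itself is included in the output is an assumption of this sketch rather than documented minisgl behavior.

```python
def generate(next_token, eos_id, max_tokens=1024, ignore_eos=False):
    """Collect tokens until max_tokens is reached or EOS is sampled."""
    output = []
    while len(output) < max_tokens:
        token = next_token()  # hypothetical callable returning one token id
        output.append(token)
        if token == eos_id and not ignore_eos:
            break  # EOS ends generation unless ignore_eos is set
    return output

EOS = 0  # hypothetical EOS token id
stream = [5, 7, EOS, 9, 9, 9]  # fixed token stream standing in for a model

stopped = generate(iter(stream).__next__, EOS, max_tokens=5)
forced = generate(iter(stream).__next__, EOS, max_tokens=5, ignore_eos=True)
```

With ignore_eos=False the loop ends at the EOS token after three tokens; with ignore_eos=True it keeps going until the max_tokens budget of five is used up.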

Properties

is_greedy

@property
def is_greedy(self) -> bool
Returns True if the configuration will result in greedy (deterministic) decoding. Conditions:
  • temperature <= 0.0 or top_k == 1
  • AND top_p == 1.0
params = SamplingParams(temperature=0.0)
print(params.is_greedy)  # True

params = SamplingParams(temperature=0.8)
print(params.is_greedy)  # False
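To experiment with the rule without installing the package, the documented fields and conditions can be mirrored in a standalone dataclass. SamplingParamsSketch below is a hypothetical stand-in built only from the defaults and conditions listed on this page; it is not the library's source.

```python
from dataclasses import dataclass

@dataclass
class SamplingParamsSketch:
    """Stand-in mirroring the documented fields and defaults."""
    temperature: float = 0.0
    top_k: int = -1
    top_p: float = 1.0
    ignore_eos: bool = False
    max_tokens: int = 1024

    @property
    def is_greedy(self) -> bool:
        # (temperature <= 0.0 or top_k == 1) AND top_p == 1.0, as listed above.
        return (self.temperature <= 0.0 or self.top_k == 1) and self.top_p == 1.0

defaults = SamplingParamsSketch()                            # greedy by default
single = SamplingParamsSketch(temperature=0.8, top_k=1)      # greedy via top_k
nucleus = SamplingParamsSketch(temperature=0.0, top_p=0.9)   # not greedy: top_p set
```

Note that under the stated conditions, setting top_p below 1.0 makes is_greedy False even when temperature is 0.0.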

Usage Examples

Greedy Decoding

from minisgl.core import SamplingParams

# Most deterministic output
params = SamplingParams(
    temperature=0.0,
    max_tokens=100
)

Balanced Sampling

# Good for most applications
params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=256
)

Creative Generation

# More diverse and creative outputs
params = SamplingParams(
    temperature=0.9,
    top_p=0.95,
    top_k=50,
    max_tokens=512
)

Force Complete Generation

# Generate exactly max_tokens, ignoring EOS
params = SamplingParams(
    temperature=0.8,
    max_tokens=100,
    ignore_eos=True
)

Constrained Sampling

# Restrict to top 40 most likely tokens
params = SamplingParams(
    temperature=0.8,
    top_k=40,
    top_p=0.9,  # Both can be used together
    max_tokens=200
)

Common Patterns

Use Case | Temperature | Top-K | Top-P | Notes
Factual answers | 0.0 | -1 | 1.0 | Greedy, deterministic
Code generation | 0.2 | -1 | 0.95 | Low randomness
General chat | 0.7 | -1 | 0.9 | Balanced
Creative writing | 0.9 | 50 | 0.95 | High diversity
Brainstorming | 1.0 | 100 | 0.98 | Maximum creativity

Notes

  • When both top_k and top_p are set, both filters are applied sequentially
  • temperature=0.0 is equivalent to top_k=1 (but more efficient)
  • For reproducible results, use temperature=0.0
  • Very high temperature values (> 1.5) can produce nonsensical output
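How the parameters might combine in a single sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not minisgl's sampler: the function sample_token is hypothetical, the order (temperature, then top-k, then top-p) is assumed since this page only says the filters run sequentially, and the top-p cutoff is computed over the tokens that survive top-k.

```python
import math
import random

def sample_token(logits, temperature=0.0, top_k=-1, top_p=1.0, rng=random):
    """Pick one token index from logits using temperature, top-k, and top-p."""
    if temperature <= 0.0:
        # Greedy shortcut: always return the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature-scaled softmax (max subtracted for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Candidate indices, most likely first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Filter 1 (assumed to run first): keep the top_k most likely tokens.
    if top_k > 0:
        order = order[:top_k]

    # Filter 2: nucleus cutoff over the renormalized survivors.
    if top_p < 1.0:
        mass = sum(probs[i] for i in order)
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i] / mass
            if cumulative >= top_p:
                break
        order = kept

    # random.choices accepts unnormalized weights.
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]

logits = [1.0, 3.0, 2.0]
rng = random.Random(0)  # seeded generator so repeated runs draw the same tokens
draws = [sample_token(logits, temperature=1.0, top_k=2, rng=rng) for _ in range(200)]
```

With top_k=2 the least likely token (index 0) can never be drawn, while temperature=0.0 or top_k=1 always returns index 1.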