Runtime Configuration
The library and CLI share the same runtime configuration model.
RuntimeConfig
Use RuntimeConfig to describe:
- the model reference
- local materialization root
- device
- backend override
- multimodal enablement
- specialization enablement
- cache strategy, cache lifecycle, adaptation mode, cache root, and offload behavior
- sliding-window token budget when bounded-history KV is selected
The KV scaffolding currently distinguishes the following fields:
- strategy_selector_profile: the deterministic selector profile (balanced, latency, capacity, or bounded-window)
- kv_cache_strategy: optional explicit strategy override; when omitted, the selector chooses a concrete preset
- kv_cache_window_tokens: bounded recent-context token budget for sliding-window-ring-buffer; omitted for full-history strategies
- dense_projection_chunk_rows: optional explicit row budget for dense optimized-native MLP chunking; when omitted, the dense Llama, Gemma3, and Voxtral paths keep the current 16384-row ceiling but derive smaller chunks only when accelerator headroom is tight
- kv_cache_lifecycle: runtime-scoped or explicit persistent reuse semantics; resident requires runtime-scoped
- kv_cache_adaptation_mode: disabled, observe-only, or automatic; observe-only recommendation rules exist, but live switching is still disabled
- offload_cpu_layers: requested CPU offload layer budget
- offload_cpu_policy: CPU offload placement policy (auto, prefix, suffix, or middle-band)
- offload_gpu_layers: requested GPU offload layer budget
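The value sets and cross-field rules listed above can be sketched as a small consistency check. This is only an illustration: KvSettingsSketch is a hypothetical stand-in rather than the library's RuntimeConfig, the defaults are assumptions, and the real class may validate differently or accept other constructor arguments. Only the field names and allowed values come from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values as listed in the field descriptions above.
SELECTOR_PROFILES = {"balanced", "latency", "capacity", "bounded-window"}
LIFECYCLES = {"runtime-scoped", "persistent"}
ADAPTATION_MODES = {"disabled", "observe-only", "automatic"}
OFFLOAD_CPU_POLICIES = {"auto", "prefix", "suffix", "middle-band"}


@dataclass
class KvSettingsSketch:
    """Hypothetical stand-in for illustration; not the library's RuntimeConfig."""

    # Defaults below are assumptions, not documented defaults.
    strategy_selector_profile: str = "balanced"
    kv_cache_strategy: Optional[str] = None       # None: the selector picks a preset
    kv_cache_window_tokens: Optional[int] = None  # only for sliding-window-ring-buffer
    dense_projection_chunk_rows: Optional[int] = None
    kv_cache_lifecycle: str = "runtime-scoped"
    kv_cache_adaptation_mode: str = "disabled"
    offload_cpu_policy: str = "auto"

    def problems(self) -> List[str]:
        """Return rule violations; an empty list means the settings are consistent."""
        found = []
        if self.strategy_selector_profile not in SELECTOR_PROFILES:
            found.append("unknown strategy_selector_profile")
        if self.kv_cache_lifecycle not in LIFECYCLES:
            found.append("unknown kv_cache_lifecycle")
        if self.kv_cache_adaptation_mode not in ADAPTATION_MODES:
            found.append("unknown kv_cache_adaptation_mode")
        if self.offload_cpu_policy not in OFFLOAD_CPU_POLICIES:
            found.append("unknown offload_cpu_policy")
        # "resident requires runtime-scoped"
        if (self.kv_cache_strategy == "resident"
                and self.kv_cache_lifecycle != "runtime-scoped"):
            found.append("resident requires the runtime-scoped lifecycle")
        # The window budget is omitted for full-history strategies.
        if (self.kv_cache_window_tokens is not None
                and self.kv_cache_strategy not in (None, "sliding-window-ring-buffer")):
            found.append("kv_cache_window_tokens only applies to sliding-window-ring-buffer")
        return found
```

For example, KvSettingsSketch(kv_cache_strategy="resident", kv_cache_lifecycle="persistent").problems() reports the lifecycle conflict, while the default instance reports nothing.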
Current offload truth:
- offload_cpu_layers requires an accelerator runtime device
- offload_cpu_layers and offload_gpu_layers cannot be combined in the current implementation
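The two offload rules above amount to a short pre-flight check. The sketch below is a hypothetical helper, not part of the library. It assumes that device is a plain string, that any value other than "cpu" names an accelerator, and that layer budgets are non-negative integers; none of this is confirmed by the documentation.

```python
def check_offload(device: str, offload_cpu_layers: int = 0,
                  offload_gpu_layers: int = 0) -> None:
    """Raise ValueError when the documented offload rules are violated.

    Hypothetical helper for illustration only.
    """
    # Assumption: budgets are non-negative layer counts.
    if offload_cpu_layers < 0 or offload_gpu_layers < 0:
        raise ValueError("offload layer budgets must be non-negative")
    # Rule: the two budgets cannot be combined in the current implementation.
    if offload_cpu_layers > 0 and offload_gpu_layers > 0:
        raise ValueError("offload_cpu_layers and offload_gpu_layers cannot be combined")
    # Rule: CPU offload needs an accelerator runtime device.
    # Assumption: any device string other than "cpu" is an accelerator.
    if offload_cpu_layers > 0 and device == "cpu":
        raise ValueError("offload_cpu_layers requires an accelerator runtime device")
```

Under these assumptions, check_offload("cuda", offload_cpu_layers=4) passes silently, while check_offload("cpu", offload_cpu_layers=4) raises ValueError.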
Current selector truth:
- selector-default candidates are paged, resident, and quantized-cold-tier
- sliding-window-ring-buffer remains explicit bounded-history opt-in only
- streamed-segmented, log-structured-journal, and tiered-write-back remain explicit override choices, not selector defaults
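The three groups above can be written down as a lookup, which makes it easy to see which strategies the selector may choose on its own. The strategy names come from the list above. The constant names, the group labels, and the classify_strategy helper are hypothetical and only illustrate the grouping, not how the selector itself is implemented.

```python
# Strategy groups as described in the selector notes above.
SELECTOR_DEFAULT_CANDIDATES = ("paged", "resident", "quantized-cold-tier")
BOUNDED_HISTORY_OPT_IN = ("sliding-window-ring-buffer",)
EXPLICIT_OVERRIDE_ONLY = (
    "streamed-segmented",
    "log-structured-journal",
    "tiered-write-back",
)


def classify_strategy(name: str) -> str:
    """Return the group a strategy name belongs to (hypothetical labels)."""
    if name in SELECTOR_DEFAULT_CANDIDATES:
        return "selector-default"
    if name in BOUNDED_HISTORY_OPT_IN:
        return "bounded-history-opt-in"
    if name in EXPLICIT_OVERRIDE_ONLY:
        return "explicit-override"
    raise ValueError("unknown KV cache strategy: %r" % (name,))


def selector_may_pick(name: str) -> bool:
    """True only for strategies the selector can choose without an override."""
    return classify_strategy(name) == "selector-default"
```

With this grouping, selector_may_pick("paged") is True and selector_may_pick("tiered-write-back") is False.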
GenerationConfig
Use GenerationConfig to describe:
- token limits
- sampling controls
- seeding
- streaming
See: