High-level Client

RuntimeClient is the recommended high-level Python entry point.

Use it when you want:

  • model resolution without loading the model
  • planning and inspection output
  • one-shot prompting
  • reusable chat sessions
  • the same runtime semantics as the CLI

from pathlib import Path

from ollm import GenerationConfig, RuntimeClient, RuntimeConfig

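client = RuntimeClient()

# Runtime settings: which model to use, where models are stored,
# and which device and backend to run on.
runtime_config = RuntimeConfig(
    model_reference="Qwen/Qwen2.5-7B-Instruct",
    models_dir=Path("models"),
    device="cpu",
    backend="transformers-generic",
    use_specialization=False,
)

# Planning and inspection output for this configuration.
plan = client.describe_plan(runtime_config)

# One-shot prompt using the same runtime configuration.
response = client.prompt(
    "List planets",
    runtime_config=runtime_config,
    generation_config=GenerationConfig(stream=False, max_new_tokens=64),
)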
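
The two calls above can be wrapped so that a plan is always produced before a one-shot prompt runs with the same configuration. The following is a minimal sketch, not part of ollm: plan_then_prompt is a hypothetical helper name, and it assumes only the two call shapes shown above, describe_plan(runtime_config) and prompt(text, runtime_config=..., generation_config=...). It treats the plan and response as opaque objects and makes no assumption about their fields.

```python
from typing import Any, Tuple


def plan_then_prompt(
    client: Any,
    prompt_text: str,
    runtime_config: Any,
    generation_config: Any,
) -> Tuple[Any, Any]:
    """Request the plan, then run a one-shot prompt with the same config.

    Hypothetical helper (an assumption for illustration, not an ollm API).
    It relies only on the describe_plan(...) and prompt(...) calls shown
    in the example above.
    """
    # Ask for the plan first so it is available for inspection or logging.
    plan = client.describe_plan(runtime_config)

    # Reuse the exact same runtime_config for generation.
    response = client.prompt(
        prompt_text,
        runtime_config=runtime_config,
        generation_config=generation_config,
    )
    return plan, response
```

With the objects from the example above, a call would look like plan_then_prompt(client, "List planets", runtime_config, GenerationConfig(stream=False, max_new_tokens=64)). Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object in tests.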

See the generated API docs for the full symbol reference: