
Runtime Planning and Inspection

The resolver and planner are intentionally inspectable. oLLM does not treat backend selection as opaque magic.

Plan-only surfaces

These commands can print the runtime plan without loading a backend:

ollm prompt --plan-json --model llama3-8B-chat
ollm chat --plan-json --model llama3-8B-chat
ollm doctor --plan-json --model llama3-8B-chat
ollm models info llama3-8B-chat --plan-json
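Because each of these surfaces prints the plan as JSON, a script can capture and inspect it. Here is a minimal sketch. It assumes the plan JSON is the only thing written to standard output and that `ollm` is on `PATH`. The names `SURFACES`, `run_cli`, `load_plan`, and the `runner` parameter are hypothetical helpers, not part of oLLM.

```python
import json
import subprocess
from typing import Any, Callable, Dict, List, Sequence

# Argument lists for the four plan-only surfaces (labels are our own choice).
SURFACES: Dict[str, Callable[[str], List[str]]] = {
    "prompt": lambda model: ["ollm", "prompt", "--plan-json", "--model", model],
    "chat": lambda model: ["ollm", "chat", "--plan-json", "--model", model],
    "doctor": lambda model: ["ollm", "doctor", "--plan-json", "--model", model],
    "models-info": lambda model: ["ollm", "models", "info", model, "--plan-json"],
}


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its standard output; raise if it fails."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def load_plan(
    model: str,
    surface: str = "doctor",
    runner: Callable[[Sequence[str]], str] = run_cli,
) -> Dict[str, Any]:
    """Print the runtime plan through a plan-only surface and decode it.

    Assumption: the plan JSON is the only output on stdout.
    `runner` is a hypothetical hook so this can be exercised without oLLM.
    """
    argv = SURFACES[surface](model)
    return json.loads(runner(argv))
```

For example, `load_plan("llama3-8B-chat", surface="prompt")` would return the decoded plan as a dictionary. Injecting `runner` keeps the parsing logic testable without spawning a real process.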

What a runtime plan contains

A plan includes:

  • resolved backend id
  • support level
  • generic model kind when applicable
  • disk/offload support flags
  • specialization enablement and state
  • specialization provider id
  • planned specialization pass ids
  • fallback reason when applicable
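One way to hold these fields in client code is a typed record. The sketch below is illustrative only. Every field name, the split of "disk/offload support flags" into two booleans, and the `describe` method are assumptions, not oLLM's actual plan schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RuntimePlan:
    """Illustrative model of a runtime plan; every field name is hypothetical."""

    backend_id: str                            # resolved backend id
    support_level: str                         # support level
    disk_supported: bool                       # disk/offload support flags
    offload_supported: bool
    specialization_enabled: bool               # specialization enablement...
    specialization_state: str                  # ...and state, e.g. "planned"
    generic_model_kind: Optional[str] = None   # only when applicable
    specialization_provider_id: Optional[str] = None
    specialization_pass_ids: Tuple[str, ...] = ()
    fallback_reason: Optional[str] = None      # only when applicable

    def describe(self) -> str:
        """Return a one-line summary suitable for logs."""
        text = (
            f"{self.backend_id} [{self.support_level}] "
            f"specialization={self.specialization_state}"
        )
        if self.specialization_pass_ids:
            text += f" passes={','.join(self.specialization_pass_ids)}"
        if self.fallback_reason:
            text += f" fallback_reason={self.fallback_reason}"
        return text
```

Optional fields default to `None` so that "when applicable" entries can simply be omitted.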

Specialization states

  • not-planned
  • planned
  • applied
  • fallback

Plan-only surfaces report the planned state. The metadata attached to an actual prompt response reports the finalized execution state.
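The split between planned and finalized states can be sketched as a small state machine. This goes beyond what the text states by assuming that a plan-only surface emits only `not-planned` or `planned`. It also assumes that execution turns `planned` into either `applied` or `fallback`. `SpecializationState` and `finalize` are hypothetical names, not oLLM's API.

```python
from enum import Enum


class SpecializationState(str, Enum):
    NOT_PLANNED = "not-planned"
    PLANNED = "planned"
    APPLIED = "applied"
    FALLBACK = "fallback"


# Assumption: these are the only states a plan-only surface reports.
PLANNING_STATES = frozenset(
    {SpecializationState.NOT_PLANNED, SpecializationState.PLANNED}
)


def finalize(planned: SpecializationState, applied_ok: bool) -> SpecializationState:
    """Map a planned state to a finalized execution state (hypothetical helper)."""
    if planned not in PLANNING_STATES:
        raise ValueError(f"{planned.value!r} is already a finalized state")
    if planned is SpecializationState.NOT_PLANNED:
        return planned  # nothing was planned, so nothing to apply or fall back from
    return SpecializationState.APPLIED if applied_ok else SpecializationState.FALLBACK
```

Subclassing `str` lets the enum be built directly from the string values, as in `SpecializationState("not-planned")`.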

Why this matters

This makes it possible to distinguish:

  • what oLLM resolved
  • why it picked that backend
  • whether specialization was only planned or actually applied
  • whether execution had to fall back to the generic path
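Answering these questions amounts to comparing the plan against the response metadata. Here is a minimal sketch. It assumes both have been decoded into dictionaries that share the keys `backend_id`, `support_level`, `specialization_state`, and `fallback_reason`. Those key names and the `explain` helper are hypothetical, not oLLM's documented schema.

```python
from typing import Any, Dict, Mapping


def explain(plan: Mapping[str, Any], executed: Mapping[str, Any]) -> Dict[str, Any]:
    """Compare a runtime plan with finalized response metadata.

    All keys read here are hypothetical names used for illustration.
    """
    planned_state = plan.get("specialization_state", "not-planned")
    # If the response carries no state, fall back to what was planned.
    final_state = executed.get("specialization_state", planned_state)
    return {
        "backend": plan.get("backend_id"),
        "support_level": plan.get("support_level"),
        "specialization_planned": planned_state == "planned",
        "specialization_applied": final_state == "applied",
        "fell_back": final_state == "fallback",
        "fallback_reason": executed.get("fallback_reason") or plan.get("fallback_reason"),
    }
```

A result with `specialization_planned` true and `fell_back` true is exactly the case the plan alone cannot reveal: specialization was intended, but execution took the generic path.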