Runtime Planning and Inspection
The resolver and planner are intentionally inspectable. oLLM does not treat backend selection as opaque magic.
Plan-only surfaces
These commands can print the runtime plan without loading a backend:
ollm prompt --plan-json --model llama3-8B-chat
ollm chat --plan-json --model llama3-8B-chat
ollm doctor --plan-json --model llama3-8B-chat
ollm models info llama3-8B-chat --plan-json
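For scripting, the four commands above can be wrapped in a small helper. The following is a minimal sketch, not part of oLLM: the helper names `plan_command` and `load_plan` are hypothetical, and it assumes that `--plan-json` writes a single JSON document to stdout, which the flag name suggests but this page does not state.

```python
import json
import subprocess

MODEL = "llama3-8B-chat"


def plan_command(surface: str, model: str = MODEL) -> list[str]:
    """Build the argv for one of the four plan-only commands listed above."""
    if surface == "models info":
        # `models info` takes the model as a positional argument.
        return ["ollm", "models", "info", model, "--plan-json"]
    if surface in ("prompt", "chat", "doctor"):
        return ["ollm", surface, "--plan-json", "--model", model]
    raise ValueError(f"not a plan-only surface: {surface!r}")


def load_plan(surface: str, model: str = MODEL) -> dict:
    """Run the command and parse stdout as JSON (assumed output format)."""
    result = subprocess.run(
        plan_command(surface, model),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with a non-zero status
    )
    return json.loads(result.stdout)
```

Calling `load_plan("doctor")` requires the `ollm` executable on `PATH`; `plan_command` can be used on its own to inspect the command line that would be run.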
What a runtime plan contains
A plan includes:
- resolved backend id
- support level
- generic model kind when applicable
- disk/offload support flags
- specialization enablement and state
- specialization provider id
- planned specialization pass ids
- fallback reason when applicable
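As an illustration, the items above could be modeled as a typed record. This is only a sketch: every field name and every sample value below is hypothetical, chosen to mirror the list, and may not match the keys oLLM actually emits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimePlan:
    """Hypothetical container mirroring the plan contents listed above."""

    backend_id: str                       # resolved backend id
    support_level: str                    # support level
    generic_model_kind: Optional[str]     # only set when applicable
    supports_disk: bool                   # disk support flag
    supports_offload: bool                # offload support flag
    specialization_enabled: bool          # specialization enablement
    specialization_state: str             # specialization state
    specialization_provider_id: Optional[str]
    planned_pass_ids: tuple[str, ...]     # planned specialization pass ids
    fallback_reason: Optional[str]        # only set when applicable


# Placeholder values for illustration only; they are not real oLLM output.
example = RuntimePlan(
    backend_id="example-backend",
    support_level="example-level",
    generic_model_kind=None,
    supports_disk=True,
    supports_offload=True,
    specialization_enabled=True,
    specialization_state="planned",
    specialization_provider_id="example-provider",
    planned_pass_ids=("example-pass",),
    fallback_reason=None,
)
```

Marking the optional items as `Optional` keeps the "when applicable" fields distinguishable from fields that are always present.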
Specialization states
- not-planned
- planned
- applied
- fallback
Plan-only surfaces report the planned state. The response metadata from an actual prompt reports the finalized execution state.
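When consuming these values in code, it can help to parse them into an enumeration so that an unexpected string fails loudly. A minimal sketch, assuming the four state names (`not-planned`, `planned`, `applied`, `fallback`) appear verbatim as strings; the class and function names are hypothetical:

```python
from enum import Enum


class SpecializationState(Enum):
    """The four specialization states, keyed by their assumed string form."""

    NOT_PLANNED = "not-planned"
    PLANNED = "planned"
    APPLIED = "applied"
    FALLBACK = "fallback"


def parse_state(raw: str) -> SpecializationState:
    """Convert a raw state string; raises ValueError for unknown values."""
    return SpecializationState(raw)
```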
Why this matters
This makes it possible to distinguish:
- what oLLM resolved
- why it picked that backend
- whether specialization was only planned or actually applied
- whether execution had to fall back to the generic path
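The last two distinctions can be sketched as a comparison between the state reported by a plan-only surface and the state reported in the response metadata. The function name, its parameters, and the message wording are hypothetical; only the state strings come from this page.

```python
from typing import Optional


def explain(
    planned_state: str,
    final_state: str,
    fallback_reason: Optional[str] = None,
) -> str:
    """Summarize what happened to specialization between plan and execution."""
    if final_state == "applied":
        return "specialization was applied"
    if final_state == "fallback":
        reason = fallback_reason or "no reason reported"
        return f"execution fell back to the generic path ({reason})"
    if planned_state == "planned":
        # Planned, but the finalized state is neither applied nor fallback.
        return "specialization was only planned"
    return "no specialization was planned"
```

For example, `explain("planned", "fallback", "example reason")` yields a message that names the generic path and carries the reason through, while `explain("planned", "planned")` reports that specialization was only planned.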