# Runtime Plan API

The runtime plan captures backend selection, support level, and specialization state before execution begins.

## SpecializationState

Bases: `str`, `Enum`

Describe whether specialization is absent, planned, applied, or replaced by fallback.
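
As a point of reference, a `str`-backed enum covering this lifecycle might look like the sketch below. The member names and values are assumptions for illustration; this reference does not show the actual definitions.

```python
from enum import Enum

class SpecializationState(str, Enum):  # assumed member names, not the real ones
    NONE = "none"          # specialization is absent
    PLANNED = "planned"    # specialization is planned but not yet applied
    APPLIED = "applied"    # specialization has been applied
    FALLBACK = "fallback"  # specialization was replaced by fallback

# Because the enum subclasses str, members compare equal to their values:
assert SpecializationState.APPLIED == "applied"
```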

## RuntimePlan

Describe how oLLM intends to execute a resolved model reference.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `resolved_model` | `ResolvedModel` | Final resolved model metadata for the plan. |
| `backend_id` | `str \| None` | Selected backend identifier when the plan is executable. |
| `model_path` | `Path \| None` | Local materialized model path when one exists. |
| `support_level` | `SupportLevel` | Planned support level. |
| `generic_model_kind` | `GenericModelKind \| None` | Generic execution family when one applies. |
| `supports_disk_cache` | `bool` | Whether the selected backend supports disk KV cache behavior. |
| `supports_cpu_offload` | `bool` | Whether CPU offload controls are supported. |
| `supports_gpu_offload` | `bool` | Whether GPU offload controls are supported. |
| `specialization_enabled` | `bool` | Whether specialization is enabled for the current request. |
| `specialization_applied` | `bool` | Whether specialization has already been applied. |
| `specialization_provider_id` | `str \| None` | Matching specialization provider identifier. |
| `specialization_state` | `SpecializationState` | Current specialization lifecycle state. |
| `reason` | `str` | Human-readable plan summary. |
| `specialization_pass_ids` | `tuple[SpecializationPassId, ...]` | Planned specialization passes. |
| `applied_specialization_pass_ids` | `tuple[SpecializationPassId, ...]` | Applied specialization passes. |
| `fallback_reason` | `str \| None` | Fallback reason when specialization failed. |
| `details` | `dict[str, str]` | Extra serialized inspection details. |
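
A caller might read these fields to summarize a plan before execution. The helper below is a minimal sketch: the `describe_plan` function is illustrative and not part of oLLM, and `plan` is assumed to come from the library's planning step.

```python
def describe_plan(plan) -> str:
    """Summarize a RuntimePlan for logs (illustrative helper, not part of oLLM)."""
    parts = [
        f"backend={plan.backend_id}",
        f"support={plan.support_level}",
        f"specialization={plan.specialization_state}",
        f"passes={list(plan.specialization_pass_ids)}",
    ]
    if plan.fallback_reason is not None:
        parts.append(f"fallback_reason={plan.fallback_reason}")
    return ", ".join(parts)
```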

### is_executable

```python
is_executable() -> bool
```

Return whether the plan resolved to a runnable backend.

Returns:

| Type | Description |
| --- | --- |
| `bool` | True when a backend ID was selected. |
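
A typical call site gates execution on this check. In the sketch below, `resolve_plan` is a hypothetical stand-in for whatever oLLM call produces a `RuntimePlan`; only `is_executable` and `reason` come from this reference.

```python
# `resolve_plan` is a hypothetical placeholder, not a documented oLLM API.
plan = resolve_plan("my-model")
if not plan.is_executable():
    # No backend was selected; `reason` carries the human-readable summary.
    raise RuntimeError(f"cannot execute plan: {plan.reason}")
```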

### as_dict

```python
as_dict() -> dict[str, object]
```

Return a JSON-serializable representation of the runtime plan.

Returns:

| Type | Description |
| --- | --- |
| `dict[str, object]` | Serialized runtime plan payload. |
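
Since the payload is JSON-serializable, it can be dumped directly for logging or bug reports. Here `plan` is again an assumed `RuntimePlan` instance from an earlier planning step.

```python
import json

# Serialize the plan for diagnostics; `plan` is an assumed RuntimePlan instance.
print(json.dumps(plan.as_dict(), indent=2, sort_keys=True))
```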