Runtime Plan API
The runtime plan captures backend selection, support level, and specialization state before execution begins.
Bases: str, Enum
Describe whether specialization is absent, planned, applied, or replaced by fallback.
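Because the state enum derives from both `str` and `Enum`, its members behave like plain strings in comparisons and in JSON output. The following is a minimal sketch of that behavior, not the library's definition: the class name `SpecializationStateSketch` and all four member names and values are assumptions, since the source names only the four conditions.

```python
import json
from enum import Enum


class SpecializationStateSketch(str, Enum):
    """Hypothetical stand-in; the real member names and values may differ."""

    ABSENT = "absent"
    PLANNED = "planned"
    APPLIED = "applied"
    FALLBACK = "fallback"


state = SpecializationStateSketch.PLANNED

# A str-based Enum member compares equal to its raw string value.
is_planned = state == "planned"

# json.dumps treats the member as a str, so no custom encoder is needed.
encoded = json.dumps({"specialization_state": state})

# Parsing a raw string back into a member uses the normal Enum value lookup.
restored = SpecializationStateSketch("fallback")
```

Mixing in `str` is a common choice when enum values need to appear in serialized payloads or logs without a conversion step.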
Describe how oLLM intends to execute a resolved model reference.
Attributes:
| Name | Type | Description |
|---|---|---|
| `resolved_model` | `ResolvedModel` | Final resolved model metadata for the plan. |
| `backend_id` | `str \| None` | Selected backend identifier when the plan is executable. |
| `model_path` | `Path \| None` | Local materialized model path when one exists. |
| `support_level` | `SupportLevel` | Planned support level. |
| `generic_model_kind` | `GenericModelKind \| None` | Generic execution family when one applies. |
| `supports_disk_cache` | `bool` | Whether the selected backend supports disk KV cache behavior. |
| `supports_cpu_offload` | `bool` | Whether CPU offload controls are supported. |
| `supports_gpu_offload` | `bool` | Whether GPU offload controls are supported. |
| `specialization_enabled` | `bool` | Whether specialization is enabled for the current request. |
| `specialization_applied` | `bool` | Whether specialization has already been applied. |
| `specialization_provider_id` | `str \| None` | Matching specialization provider identifier. |
| `specialization_state` | `SpecializationState` | Current specialization lifecycle state. |
| `reason` | `str` | Human-readable plan summary. |
| `specialization_pass_ids` | `tuple[SpecializationPassId, ...]` | Planned specialization passes. |
| `applied_specialization_pass_ids` | `tuple[SpecializationPassId, ...]` | Applied specialization passes. |
| `fallback_reason` | `str \| None` | Fallback reason when specialization failed. |
| `details` | `dict[str, str]` | Extra serialized inspection details. |
is_executable
is_executable() -> bool
Return whether the plan resolved to a runnable backend.
Returns:
| Type | Description |
|---|---|
| `bool` | `True` if the plan resolved to a runnable backend, otherwise `False`. |
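A caller would typically check `is_executable()` before attempting to run a plan. The sketch below illustrates that pattern with a hypothetical stand-in class, `PlanSketch`, which carries only two of the documented attributes. It assumes the plan is runnable exactly when `backend_id` is set; the attribute description suggests this, but the source does not confirm the real rule. The backend name `example-backend` and the helper `describe` are also illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlanSketch:
    """Hypothetical stand-in carrying two of the documented attributes."""

    backend_id: str | None
    reason: str

    def is_executable(self) -> bool:
        # Assumption: a plan is runnable exactly when a backend was selected.
        return self.backend_id is not None


def describe(plan: PlanSketch) -> str:
    """Return a short status line instead of failing later at execution time."""
    if not plan.is_executable():
        return f"cannot run: {plan.reason}"
    return f"running on {plan.backend_id}"


runnable = PlanSketch(backend_id="example-backend", reason="backend selected")
blocked = PlanSketch(backend_id=None, reason="no backend matched")
```

Surfacing `reason` in the non-executable branch gives the user the plan's own explanation rather than a generic error.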
as_dict
as_dict() -> dict[str, object]
Return a JSON-serializable representation of the runtime plan.
Returns:
| Type | Description |
|---|---|
| `dict[str, object]` | Serialized runtime plan payload. |
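Since the payload is described as JSON-serializable, it can be handed to `json.dumps` for logging or inspection. The sketch below uses a hypothetical stand-in class, `PlanSketch`, with a few of the documented attributes. How the real `as_dict()` converts non-JSON types such as `Path` is an assumption here, as are the key names and sample values.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class PlanSketch:
    """Hypothetical stand-in with a few of the documented attributes."""

    backend_id: str | None
    model_path: Path | None
    supports_disk_cache: bool
    details: dict[str, str] = field(default_factory=dict)

    def as_dict(self) -> dict[str, object]:
        # Assumption: non-JSON types such as Path are converted to strings.
        return {
            "backend_id": self.backend_id,
            "model_path": None if self.model_path is None else str(self.model_path),
            "supports_disk_cache": self.supports_disk_cache,
            "details": dict(self.details),
        }


plan = PlanSketch(
    backend_id="example-backend",
    model_path=None,
    supports_disk_cache=True,
    details={"source": "local"},
)

# Serialize with stable key order, then parse back to confirm a clean round trip.
payload = json.dumps(plan.as_dict(), sort_keys=True)
round_tripped = json.loads(payload)
```

Sorting keys keeps the serialized output stable across runs, which helps when plans are compared in logs or tests.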