Local Server API Reference¶

oLLM's optional server transport is a local-only FastAPI application. Start it with ollm serve.

Schema and docs endpoints¶

The OpenAI-compatible surface currently targets text chat, text responses, and model discovery.
Chat-completions requests currently support plain string content and structured text-part arrays only.
Responses requests support:
plain string input
message arrays with text, image, audio, and file-reference content parts
function_call_output tool-result input items
custom type=function tool definitions plus tool_choice
POST /v1/chat/completions supports both standard JSON responses and SSE streaming responses.
POST /v1/responses supports both standard JSON responses and typed SSE response events.
GET /v1/responses/{response_id} and previous_response_id require a configured response-store backend.
DELETE /v1/responses/{response_id} also requires a configured response-store backend.
Responses output items include assistant messages and function_call items.
The server continues to expose native oLLM-only runtime planning, prompt, and session endpoints beside the compatibility layer.

The server bind is local-only by default.
OpenAI-compatible streaming responses use text/event-stream chat-completion chunks plus a final data: [DONE] line.
Responses streaming uses typed SSE events such as response.created, response.in_progress, response.output_item.added, response.content_part.added, response.output_text.delta, response.output_text.done, response.content_part.done, response.function_call_arguments.delta, response.function_call_arguments.done, response.output_item.done, response.completed, and response.failed.
Native streaming responses still use oLLM's SSE event family.
Server-side sessions are in-memory only.
Responses storage is disabled by default; enable a response-store backend when you want retrieval or previous_response_id chaining.
The HTTP transport reuses the same ApplicationService runtime planning and prompt execution logic as the CLI.