# Local Server API Reference

oLLM's optional server transport is a local-only FastAPI application. Start it
with `ollm serve`.
## Schema and docs endpoints

- `/openapi.json`
- `/docs`
- `/redoc`
## OpenAI-compatible routes

- `GET /v1/models`
- `GET /v1/models/{model_id}`
- `POST /v1/chat/completions`
- `POST /v1/responses`
- `GET /v1/responses/{response_id}`
- `DELETE /v1/responses/{response_id}`
## Native oLLM routes

- `GET /v1/health`
- `GET /v1/ollm/models`
- `GET /v1/ollm/models/{model_reference}`
- `POST /v1/plan`
- `POST /v1/prompt`
- `POST /v1/prompt/stream`
- `POST /v1/sessions`
- `GET /v1/sessions/{session_id}`
- `POST /v1/sessions/{session_id}/prompt`
- `POST /v1/sessions/{session_id}/prompt/stream`
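A minimal client sketch for the native read-only routes, using only the standard library. The base URL is an assumption (the page does not state the default bind address or port); adjust it to match your `ollm serve` configuration.

```python
import json
import urllib.request

# Assumed local bind; this page does not document the default host/port.
BASE_URL = "http://127.0.0.1:8000"


def get_health(base_url: str = BASE_URL) -> dict:
    """GET /v1/health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/health") as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_native_models(base_url: str = BASE_URL) -> dict:
    """GET /v1/ollm/models and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/ollm/models") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the server bind is local-only by default, these helpers only work from the same machine the server runs on.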
## Compatibility scope
- The OpenAI-compatible surface currently targets text chat, text responses, and model discovery.
- Chat-completions requests currently support plain string content and structured text-part arrays only.
- Responses requests support:
    - plain string input
    - message arrays with text, image, audio, and file-reference content parts
    - `function_call_output` tool-result input items
    - custom `type=function` tool definitions plus `tool_choice`
- `POST /v1/chat/completions` supports both standard JSON responses and SSE streaming responses.
- `POST /v1/responses` supports both standard JSON responses and typed SSE response events.
- `GET /v1/responses/{response_id}` and `previous_response_id` require a configured response-store backend.
- `DELETE /v1/responses/{response_id}` also requires a configured response-store backend.
- Responses output items include assistant messages and `function_call` items.
- The server continues to expose native oLLM-only runtime planning, prompt, and session endpoints alongside the compatibility layer.
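The two content shapes the chat-completions route accepts can be sketched as request bodies. This assumes the standard OpenAI chat-completions schema; the model name is a placeholder for illustration.

```python
import json

# Plain string content (model name is a hypothetical placeholder)
chat_plain = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Structured text-part array content -- the only structured form
# the compatibility layer currently accepts for chat completions
chat_parts = {
    "model": "example-model",
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "Hello"}],
        }
    ],
}

# Either dict serializes to a valid JSON body for POST /v1/chat/completions
body_plain = json.dumps(chat_plain)
body_parts = json.dumps(chat_parts)
```

Image, audio, or file-reference parts belong in `POST /v1/responses` message arrays instead; chat completions rejects them.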
## Semantics
- The server bind is local-only by default.
- OpenAI-compatible streaming responses use `text/event-stream` chat-completion chunks plus a final `data: [DONE]` line.
- Responses streaming uses typed SSE events such as `response.created`, `response.in_progress`, `response.output_item.added`, `response.content_part.added`, `response.output_text.delta`, `response.output_text.done`, `response.content_part.done`, `response.function_call_arguments.delta`, `response.function_call_arguments.done`, `response.output_item.done`, `response.completed`, and `response.failed`.
- Native streaming responses still use oLLM's SSE event family.
- Server-side sessions are in-memory only.
- Responses storage is disabled by default; enable a response-store backend when you want retrieval or `previous_response_id` chaining.
- The HTTP transport reuses the same `ApplicationService` runtime planning and prompt-execution logic as the CLI.
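The chat-completions stream semantics above can be sketched as a small consumer: parse each SSE `data:` line as a chunk and stop at the `data: [DONE]` sentinel. The chunk payload fields (`choices[0].delta.content`) follow the standard OpenAI chunk format, which is an assumption beyond what this page states.

```python
import json


def iter_chunks(lines):
    """Yield parsed chat-completion chunk objects from SSE `data:` lines,
    stopping at the `data: [DONE]` sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # terminal sentinel, per the semantics above
        yield json.loads(payload)


# Hypothetical lines as they might arrive over text/event-stream
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_chunks(sample)
)
# text == "Hello"
```

The `/v1/responses` stream instead carries the typed events listed above (`response.output_text.delta`, `response.completed`, and so on), so a consumer for that route would dispatch on each event's type rather than look for `[DONE]`.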