Onyx AI endpoint reference
The Onyx AI endpoint mirrors Ollama's chat contracts so that existing OpenAI- and Ollama-compatible clients can connect with minimal changes. Use this guide to authenticate, call the chat endpoints, handle streamed tool calls, and approve mutating scripts before they execute in your environment.
Base URL
https://ai.onyx.dev
All examples in this reference target the production cluster. For self-hosted deployments, replace the base URL accordingly.
Overview
The primary chat surface lives at POST /api/chat, backed by the same contracts that Ollama exposes for its chat, generate, embeddings, and model management endpoints. Requests accept Ollama-compatible payloads, including optional tools, and responses return message content plus tool call metadata.
Append ?databaseId=... when you want the copilot to ground itself on a specific Onyx Cloud database. When present, the service loads the matching schema before creating the agent and tracks token usage against that database for billing analytics.
Authentication
Provide credentials with either a bearer token or Onyx API key headers. The service inspects the Authorization: Bearer <token> header, the x-onyx-key/x-onyx-secret pair, or the signed access key issued by the Onyx Cloud Console, and automatically derives the script execution fingerprint from whichever credential is available.
Authorization: Bearer <onyx-cloud-jwt>
x-onyx-key: <database-api-key>
x-onyx-secret: <database-api-secret>
Supplying at least one of those options ensures that downstream tool executions inherit the correct permissions and that any script approvals can be tied back to the approving identity.
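For example, the same chat call can be authenticated with the database API key headers instead of a bearer token:

curl "https://ai.onyx.dev/api/chat" \
  -H "x-onyx-key: <database-api-key>" \
  -H "x-onyx-secret: <database-api-secret>" \
  -H "Content-Type: application/json" \
  -d '{"model": "onyx-chat", "messages": [{"role": "user", "content": "Hello"}]}'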
Quickstart
The quickest way to confirm connectivity is to send a minimal chat completion request. This example enables streaming so that you can observe incremental tokens and tool calls as they arrive.
curl "https://ai.onyx.dev/api/chat?databaseId=<database-id>" \
  -H "Authorization: Bearer $ONYX_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "onyx-chat",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Give me a breakdown of my website traffic per month for the last year."}
    ]
  }'
The response will stream newline-delimited JSON chunks that conform to the Ollama ChatCompletionChunk schema. Close the connection when you receive a chunk with "done": true.
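A minimal way to consume that stream from the shell is sketched below, assuming bash and jq are available; it prints each content delta and stops at the final chunk:

curl -sN "https://ai.onyx.dev/api/chat?databaseId=<database-id>" \
  -H "Authorization: Bearer $ONYX_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "onyx-chat", "stream": true, "messages": [{"role": "user", "content": "Summarise last month."}]}' |
while IFS= read -r chunk; do
  # Print the content delta; keep-alive pings carry no content.
  printf '%s' "$(jq -r '.message.content // empty' <<<"$chunk")"
  # The final chunk signals completion with "done": true.
  [ "$(jq -r '.done' <<<"$chunk")" = "true" ] && { echo; break; }
done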
Chat completions
Send a JSON body that matches Ollama's ChatCompletionRequest. When you omit stream, the service buffers the final assistant turn before returning a ChatCompletionResponse object.
POST /api/chat
Content-Type: application/json
{
  "model": "onyx-chat",
  "messages": [
    {"role": "system", "content": "You are an expert in tracking website analytics."},
    {"role": "user", "content": "List the unique number of visitors for the last month."}
  ]
}
When a databaseId query parameter is present, the router resolves the schema before instantiating the agent so the response can include database-aware guidance.
Non-agent models (anything outside the onyx-* family) are proxied directly to the configured Ollama base URL. Agent models automatically emit token usage statistics that are recorded against the supplied database for billing purposes.
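For reference, an abbreviated non-streaming response is sketched below; the statistics field names are assumed to follow Ollama's response schema, and the values shown are illustrative:

{
  "model": "onyx-chat",
  "created_at": "2024-05-01T12:00:00Z",
  "message": {
    "role": "assistant",
    "content": "Last month you had 4,212 unique visitors."
  },
  "done": true,
  "prompt_eval_count": 132,
  "eval_count": 87
}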
Streaming responses
Set "stream": true
to receive newline-delimited chunks withTransfer-Encoding: chunked
. Agent models schedule keep-alive ping chunks once per second so that long-running tools do not break idle connections. Each chunk includes the current delta, and the final payload has "done": true
plus the aggregated token counts when available.
{
"model": "onyx-chat",
"created_at": "2024-05-01T12:00:00Z",
"message": {
"role": "assistant",
"content": "Here is the summary...",
"tool_calls": [
{
"id": "call_123",
"function": {
"name": "generate-script",
"arguments": {"requirements": "..."}
}
}
]
},
"done": false
}
Tool deltas are coalesced automatically, so once done becomes true you can assume each tool_calls entry contains a fully assembled function payload ready for dispatch.
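An illustrative final chunk is shown below; the token-count field names are an assumption based on Ollama's statistics schema:

{
  "model": "onyx-chat",
  "created_at": "2024-05-01T12:00:05Z",
  "message": {"role": "assistant", "content": ""},
  "done": true,
  "prompt_eval_count": 132,
  "eval_count": 87
}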
Tool invocations
The tool_calls array follows the same shape as Ollama's tool calling schema. When the assistant requests a tool, send back a role: "tool" message with the matching tool_call_id so the agent can continue reasoning with the result.
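A sketch of that round trip is shown below. The conversation is abbreviated, and the get-data-sample arguments and tool output are illustrative assumptions rather than the authoritative schema:

POST /api/chat
Content-Type: application/json

{
  "model": "onyx-chat",
  "messages": [
    {"role": "user", "content": "List the unique number of visitors for the last month."},
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {"id": "call_123", "function": {"name": "get-data-sample", "arguments": {"table": "visitors"}}}
      ]
    },
    {"role": "tool", "tool_call_id": "call_123", "content": "[{\"visitorId\": 1, \"page\": \"/home\"}]"}
  ]
}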
Core tool identifiers
- generate-script – server-managed generator that iterates until it produces a validated script for the current database context.
- execute-script – executes approved scripts inside the Script Runner. This tool enforces mutation approvals before running.
- request-script-approval – client-handled prompt asking an administrator to approve a mutating script. Respond by recording an approval (see below) and then emitting a tool message that captures the user's decision.
- get-data-sample – fetches a five-row sample from the referenced table to give the model concrete column and type information.
- get-similar-scripts – surfaces recently stored scripts that resemble the current request so the model can re-use vetted logic.
- web-search and read-web-page – perform a web search and fetch the body of a result for summarisation.
Unless a tool explicitly marks itself as client-managed, the platform executes it server-side and emits a synthetic role: "tool" message back into the stream before the agent resumes the conversation.
Mutations and approvals
The Script Mutation Guard inspects generated code for persistence verbs such as save, update, insert, delete, truncate, or transaction blocks. When any of those are detected, the agent issues a request-script-approval tool call containing the normalised script plus a list of findings so that the console can present an approval dialog.
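The resulting tool call might look like the sketch below; the argument field names are assumptions for illustration, carrying the normalised script and detected findings described above:

{
  "id": "call_456",
  "function": {
    "name": "request-script-approval",
    "arguments": {
      "script": "db.save({...})",
      "findings": ["save"]
    }
  }
}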
To approve a script, call POST /api/script-approvals with the same credentials you use for chat and include the script content in the payload.
POST /api/script-approvals
Authorization: Bearer <onyx-cloud-jwt>
Content-Type: application/json
{
  "script": "db.save({...})"
}
The response indicates whether approval was required. When approval is needed, Onyx records a fingerprint scoped to the approving identity and returns an expiration timestamp (currently two minutes). The subsequent execute-script call must arrive before the approval expires; otherwise the tool throws a mutation guard error and the agent will ask for a fresh confirmation.
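A sketch of the approval response is shown below; the field names are illustrative assumptions, not the authoritative schema:

{
  "approvalRequired": true,
  "expiresAt": "2024-05-01T12:02:00Z"
}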
After you persist the approval, reply to the pending request-script-approval tool call with a role: "tool" message whose JSON body includes the user's decision (for example {"approved": true}). The agent will resume and, if approved, emit a follow-up execute-script tool call that runs the code inside the secured runner.
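For example, the tool message reporting the decision could look like this (the call id is illustrative):

{"role": "tool", "tool_call_id": "call_456", "content": "{\"approved\": true}"}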
Additional endpoints
Beyond /api/chat, the Onyx AI service exposes the rest of the Ollama-compatible surface area:
- POST /api/embeddings – create vector embeddings using the configured model.
- POST /api/generate and POST /api/generate/stream – run prompt-completion requests with optional streaming, recording token usage when a databaseId is provided.
- GET /api/tags – enumerate available models, including the Onyx agent shims.
- POST /api/show – return metadata for a specific model, with synthetic entries for Onyx agent identifiers.
- GET /api/version – report the upstream Ollama version.
These routes accept the same payloads that the upstream Ollama client expects, making it straightforward to swap the base URL in existing tooling.
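For instance, a minimal embeddings request mirrors the standard Ollama call; the model name below is a placeholder for whichever embedding model your deployment exposes:

curl "https://ai.onyx.dev/api/embeddings" \
  -H "Authorization: Bearer $ONYX_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "<embedding-model>", "prompt": "monthly website traffic"}'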
Need help?
If you have any questions or need assistance:
- Email: support@onyx.dev
- Documentation: Visit our Help Center for tutorials and FAQs.