
Onyx AI endpoint reference

The Onyx AI endpoint mirrors Ollama's chat contracts so that existing OpenAI and Ollama compatible clients can connect with minimal changes. Use this guide to authenticate, call the chat endpoints, handle streamed tool calls, and approve mutating scripts before they execute in your environment.

Base URL

https://ai.onyx.dev

All examples in this reference target the production cluster. For self-hosted deployments, replace the base URL accordingly.

Overview

The primary chat surface lives at POST /api/chat, backed by the same contracts that Ollama exposes for its chat, generate, embeddings, and model management endpoints. Requests accept Ollama-compatible payloads, including optional tools, and responses return message content plus tool call metadata.

Append ?databaseId=... when you want the copilot to ground itself on a specific Onyx Cloud database. When present, the service loads the matching schema before creating the agent and tracks token usage against that database for billing analytics.

Authentication

Provide credentials with either a bearer token or Onyx API key headers. The service inspects the Authorization: Bearer <token> header, the x-onyx-key/x-onyx-secret pair, or the signed access key issued by the Onyx Cloud Console and automatically derives the script execution fingerprint from whichever credential is available.

Authorization: Bearer <onyx-cloud-jwt>
x-onyx-key: <database-api-key>
x-onyx-secret: <database-api-secret>

Supplying at least one of those options ensures that downstream tool executions inherit the correct permissions and that any script approvals can be tied back to the approving identity.

Quickstart

The quickest way to confirm connectivity is to send a minimal chat completion request. This example enables streaming so that you can observe incremental tokens and tool calls as they arrive.

curl "https://ai.onyx.dev/api/chat?databaseId=<database-id>" \
  -H "Authorization: Bearer $ONYX_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "onyx-chat",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Give me a breakdown of my website traffic per month for the last year."}
    ]
  }'

The response will stream newline-delimited JSON chunks that conform to Ollama's ChatCompletionChunk schema. Close the connection when you receive a chunk with "done": true.
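A minimal consumer of that stream can be sketched in Python. The generator below only parses lines; pair it with any HTTP client that can iterate a chunked response (for example, requests with stream=True).

```python
import json


def iter_chat_chunks(lines):
    """Parse newline-delimited JSON chunks, stopping at done: true.

    `lines` is any iterable of raw JSON lines, such as
    requests.post(url, json=body, stream=True).iter_lines().
    """
    for raw in lines:
        if not raw:
            continue  # ignore empty lines between chunks
        chunk = json.loads(raw)
        yield chunk
        if chunk.get("done"):
            break  # final chunk received; stop reading the stream
```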

Chat completions

Send a JSON body that matches Ollama's ChatCompletionRequest. When you omit stream, the service buffers the final assistant turn before returning a ChatCompletionResponse object.

POST /api/chat
Content-Type: application/json

{
  "model": "onyx-chat",
  "messages": [
    {"role": "system", "content": "You are an expert in tracking website analytics."},
    {"role": "user", "content": "List the unique number of visitors for the last month."}
  ]
}

When a databaseId query parameter is present, the router resolves the schema before instantiating the agent so the response can include database-aware guidance.

Non-agent models (anything outside the onyx-* family) are proxied directly to the configured Ollama base URL. Agent models automatically emit token usage statistics that are recorded against the supplied database for billing purposes.

Streaming responses

Set "stream": true to receive newline-delimited chunks with Transfer-Encoding: chunked. Agent models schedule keep-alive ping chunks once per second so that long-running tools do not break idle connections. Each chunk includes the current delta, and the final payload has "done": true plus the aggregated token counts when available.

{
  "model": "onyx-chat",
  "created_at": "2024-05-01T12:00:00Z",
  "message": {
    "role": "assistant",
    "content": "Here is the summary...",
    "tool_calls": [
      {
        "id": "call_123",
        "function": {
          "name": "generate-script",
          "arguments": {"requirements": "..."}
        }
      }
    ]
  },
  "done": false
}

Tool deltas are coalesced automatically, so once done becomes true you can assume each tool_calls entry contains a fully assembled function payload ready for dispatch.

Tool invocations

The tool_calls array follows the same shape as Ollama's tool calling schema. When the assistant requests a tool, send back a role: "tool" message with the matching tool_call_id so the agent can continue reasoning with the result.

Core tool identifiers

  • generate-script – server-managed generator that iterates until it produces a validated script for the current database context.
  • execute-script – executes approved scripts inside the Script Runner. This tool enforces mutation approvals before running.
  • request-script-approval – client-handled prompt asking an administrator to approve a mutating script. Respond by recording an approval (see below) and then emitting a tool message that captures the user's decision.
  • get-data-sample – fetches a five-row sample from the referenced table to give the model concrete column and type information.
  • get-similar-scripts – surfaces recently stored scripts that resemble the current request so the model can re-use vetted logic.
  • web-search and read-web-page – perform a web search and fetch the body of a result for summarisation.

Unless a tool explicitly marks itself as client-managed, the platform executes it server-side and emits a synthetic role: "tool" message back into the stream before the agent resumes the conversation.
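For a client-handled tool, the continuation message can be built as follows. This is a sketch; the field names follow the Ollama tool-calling schema referenced above.

```python
import json


def tool_result_message(tool_call: dict, result) -> dict:
    """Build the role: "tool" message that answers a pending tool call,
    echoing its id so the agent can continue reasoning with the result."""
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }
```

Append the returned message to your messages array and re-POST the conversation to /api/chat so the agent can resume.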

Mutations and approvals

The Script Mutation Guard inspects generated code for persistence verbs such as save, update, insert, delete, truncate, or transaction blocks. When any of those are detected, the agent issues a request-script-approval tool call containing the normalised script plus a list of findings so that the console can present an approval dialog.
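The verb scan can be pictured roughly as follows. This is an illustrative approximation, not the guard's actual implementation; the real guard also normalises the script before inspection.

```python
import re

# Persistence verbs listed in this reference; the real guard may detect
# transaction blocks and other mutation patterns more precisely.
MUTATION_PATTERN = re.compile(
    r"\b(save|update|insert|delete|truncate|transaction)\b",
    re.IGNORECASE,
)


def needs_approval(script: str) -> bool:
    """Return True when a script contains a persistence verb and should
    trigger a request-script-approval tool call."""
    return MUTATION_PATTERN.search(script) is not None
```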

To approve a script, call POST /api/script-approvals with the same credentials you use for chat and include the script content in the payload.

POST /api/script-approvals
Authorization: Bearer <onyx-cloud-jwt>
Content-Type: application/json

{
  "script": "db.save({...})"
}

The response indicates whether approval was required. When approval is needed, Onyx records a fingerprint scoped to the approving identity and returns an expiration timestamp (currently two minutes). The subsequent execute-script call must arrive before the approval expires, otherwise the tool throws a mutation guard error and the agent will ask for a fresh confirmation.
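Because approvals currently expire after two minutes, a client may want to check freshness before dispatching execute-script. The helper and its TTL default are illustrative; the server remains the source of truth via the expiration timestamp it returns.

```python
from datetime import datetime, timedelta, timezone


def approval_is_fresh(approved_at: datetime, ttl_seconds: int = 120) -> bool:
    """True while an approval is still inside its validity window.
    Expired approvals cause execute-script to raise a mutation guard
    error, and the agent will ask for a fresh confirmation."""
    return datetime.now(timezone.utc) - approved_at < timedelta(seconds=ttl_seconds)
```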

After you persist the approval, reply to the pending request-script-approval tool call with a role: "tool" message whose JSON body includes the user's decision (for example {"approved": true}). The agent will resume and, if approved, emit a follow-up execute-script tool call that runs the code inside the secured runner.
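Concretely, the decision reply might be assembled like this. The helper is a sketch; the decision body shape follows the example above.

```python
import json


def approval_decision(tool_call_id: str, approved: bool) -> dict:
    """Answer a pending request-script-approval tool call with the
    administrator's decision."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps({"approved": approved}),
    }
```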

Additional endpoints

Beyond /api/chat, the Onyx AI service exposes the rest of the Ollama-compatible surface area:

  • POST /api/embeddings – create vector embeddings using the configured model.
  • POST /api/generate and POST /api/generate/stream – run prompt-completion requests with optional streaming, recording token usage when a databaseId is provided.
  • GET /api/tags – enumerate available models, including the Onyx agent shims.
  • POST /api/show – return metadata for a specific model, with synthetic entries for Onyx agent identifiers.
  • GET /api/version – report the upstream Ollama version.

These routes accept the same payloads that the upstream Ollama client expects, making it straightforward to swap the base URL in existing tooling.

Need help?

If you have any questions or need assistance: