Onyx Cloud Admin
Onyx Model Builder guide for database-backed neural networks
The Onyx Model Builder lets you create Onyx-backed neural networks directly from database queries. This guide documents the full screen, including model setup, training data, input and output mappings, transform pipelines, neural network architecture, asynchronous training runs, published model versions, and prediction testing.
Build from saved queries
Train Onyx models
Publish predictions
Overview
Model Builder is a dashboard workspace for teams that want machine learning close to operational data. A model is scoped to the selected organization and database. Its training data comes from a saved query, its fields are mapped into feature and target tensors, and its published versions can be called through Onyx prediction endpoints.
The feature is best for tabular prediction, scoring, classification, multi-label outputs, vector regression, text or embedding experiments, and attention-based architectures over database rows. The screen is designed so model definitions, training history, and published artifacts stay attached to the database that produced them.
Core workflow
Select an organization and database, create a model, choose a training query, map inputs and outputs, build the architecture, start a training run, publish a completed artifact, then test or call the published model.
Access and Permissions
Open /model-builder in the Onyx Cloud Admin console after signing in. The workspace requires an active organization and database from the top context bar. The model list, saved query selector, and training actions are scoped to that database.
- Read model definitions: requires database read access.
- Create, edit, delete, train, complete, cancel, fine tune, and publish: require maintainer-level database access.
- Published model prediction: uses the database data API with read-only access and accepts either bearer auth or database API key credentials.
- Saved queries: create them in the Query Editor before selecting them as training data or prediction scripts.
Workspace Tour
The Model Builder screen is organized as a persistent model workspace. The header shows the selected model name, model count, configured layer count, expanded layer count, active organization, active database, and save status. Changes auto-save after a short debounce, and pending changes are flushed before training starts or the page unloads.
Header actions
Model switcher
Training Setup
Inputs, outputs, and architecture
Model Lifecycle
Create a model
Select New Model or Create model, provide a name and description, then save. New models start with two numeric input mappings, one numeric output mapping, and a simple dense architecture.
Switch and edit model metadata
Use the model dropdown to switch definitions. Use Edit to rename the model or update the description. Model metadata is stored with the same database scope as the architecture.
Delete only when artifacts are no longer needed
Delete removes the saved architecture for the current organization and database. Published models and run artifacts should be reviewed before deleting a model used by applications.
Training Setup
Training Setup connects the model definition to data. Choose a saved query from the Query Editor. The query can return row objects directly, or return a query definition that the backend streams before applying mappings.
Training query shape
The backend expects a list of objects or an object containing records. Each returned row should expose the paths referenced by Inputs and Outputs. For example, a mapping path of row.features[0] reads the first element in the features array when the query returns row objects.
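To make the mapping concrete, here is a minimal sketch of how a path such as row.features[0] could be resolved against a returned row object. The path grammar (dot-separated keys with optional [index] segments) and the helper name are illustrative assumptions, not the Onyx implementation.

```python
import re

def resolve_path(row: dict, path: str):
    """Walk a dotted path with optional [i] index segments over a row dict.

    Illustrative sketch only: assumes paths like "row.features[0]" where
    the leading "row" key refers to the returned row object.
    """
    value = {"row": row}
    for key, index in re.findall(r"(\w+)(?:\[(\d+)\])?", path):
        value = value[key]
        if index:
            value = value[int(index)]
    return value

row = {"features": [0.42, 0.1], "customer": {"score": 7}}
print(resolve_path(row, "row.features[0]"))     # 0.42
print(resolve_path(row, "row.customer.score"))  # 7
```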
Loss Function
| Loss Function | Use | Notes |
|---|---|---|
| Mean Squared Error | Continuous regression targets, such as scores, forecasts, quantities, or risk values. | Outputs are trained as dense numeric values. Use linear final activations unless you intentionally need bounded output. |
| Binary Cross Entropy | Binary classification and multi-label boolean targets. | The backend trains this as a logits-based objective. Use linear logits in the final dense layer and apply thresholds during inference. |
| Cosine Similarity Loss | Vector targets where direction matters more than magnitude. | Useful for matching embeddings, similarity spaces, and directional vector objectives. |
| Embedding Loss | Embedding-style outputs produced by the Onyx model-builder embedding objective. | Use when the model is learning a vector representation rather than a scalar prediction. |
| Parameter | Description |
|---|---|
| Batch Size | Rows per training batch. |
| Max Epochs | Maximum passes through the training data. |
| Patience | Number of epochs without improvement allowed before early stopping ends the run. |
| Test Fraction | Fraction of rows held out for validation/testing, from 0 to 1. |
| Grad Accum Steps | Number of gradient accumulation steps used by the streaming training flow. |
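Taken together, these parameters determine the work per optimizer step and the evaluation split. A quick sketch of the arithmetic, using illustrative values rather than Onyx-specific internals:

```python
# Illustrative values; the relationships below are standard training
# arithmetic, not Onyx-specific behavior.
batch_size = 32
grad_accum_steps = 4
test_fraction = 0.2
total_rows = 1000

# Gradient accumulation applies one optimizer step per N batches,
# so the effective batch size is batch_size * grad_accum_steps.
effective_batch_size = batch_size * grad_accum_steps

# Test Fraction holds rows out of training for evaluation.
test_rows = int(total_rows * test_fraction)
train_rows = total_rows - test_rows

print(effective_batch_size, train_rows, test_rows)  # 128 800 200
```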
Inputs and Outputs
Inputs and Outputs map query-row fields into model tensors. Inputs become features. Outputs become targets during training and structured prediction fields after publish. Each mapping has a name, row path, source type, invalid/null policy, optional default value, optional vector dimensions, and a transform pipeline.
| Source Type | Use | Notes |
|---|---|---|
| number | Floating point or integer values from the training row. | Common transforms include Mean / Std, Min / Max, Robust Scaler, Log, and Quantile. |
| boolean | True/false feature flags or labels. | Boolean outputs pair naturally with Binary Cross Entropy. Boolean feature transforms are centered for training. |
| categorical | String-like categories such as country, plan, status, type, or segment. | Categorical values are indexed with an unknown bucket so future values can still be handled. |
| datetime | Timestamp values that should become model features. | Use consistent source formats. Time Decay is available when recency should affect model input. |
| text | Free-form natural language or descriptive fields. | Use Text Encoding to expand text into token embeddings before optional child transforms. |
| vector | Arrays of numeric features, embeddings, or indicator vectors. | Set Vector Dimensions when the feature tokenizer expands vector dimensions into separate tokens. |
Row path syntax
row.customer.score or row.directions[0]. The leading row key can reference the returned row object.
Invalid or null policy
Vector dimensions
Transforms
Transform pipelines prepare raw row values for training. Transforms are ordered, can be moved up or down, and can be removed. Text Encoding and Column Pipeline can contain child transforms, allowing post-encoding or nested column processing.
| Transform | Use | Notes |
|---|---|---|
| Default | Pass values through using the default numeric normalization path. | The initial transform added to new mappings. |
| Boolean | Convert booleans into trainable numeric values. | Best for true/false flags and binary labels. |
| Categorical | Index categories discovered during training. | The model retains categorical metadata so published predictions can decode consistently. |
| Log | Compress skewed positive numeric values. | Use only when the field domain makes sense for logarithmic scaling. |
| Mean / Std | Standardize numeric columns around their training distribution. | A strong default for dense numeric features. |
| Time Decay | Turn timestamps or elapsed values into recency-weighted features. | Configure Lambda to tune the decay curve. |
| L2 Normalization | Normalize vector-like feature groups. | Useful for directional similarity and embedding inputs. |
| Robust Scaler | Scale numeric data while reducing sensitivity to outliers. | Prefer for long-tailed operational data. |
| Quantile | Map numeric values through distribution-aware quantile bins. | Useful when rank is more stable than raw magnitude. |
| Min / Max | Scale numeric values into a bounded range. | Good when min and max are meaningful and stable. |
| Max Abs | Scale by maximum absolute value while preserving sign. | Useful for sparse or centered numeric features. |
| Text Encoding | Expand text into token embeddings. | Configure Max Tokens and Embedding Size. Child transforms run after text expansion. |
| Column Pipeline | Group child transforms into a nested column pipeline. | Use for advanced transform sequences that should be treated as one mapping stage. |
Text encoding rule
Text Encoding is intended for text mappings. If a text mapping uses Text Encoding, post-encoding transforms should be placed inside that encoding transform rather than as sibling transforms.
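As a concrete illustration of one pipeline stage, a Mean / Std transform standardizes a numeric column around its training distribution. This sketch is an assumption about the behavior, not the Onyx transform code:

```python
from statistics import mean, pstdev

class MeanStd:
    """Sketch of a Mean / Std transform: standardize around training stats."""

    def fit(self, values):
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0  # guard against constant columns
        return self

    def apply(self, value):
        return (value - self.mu) / self.sigma

t = MeanStd().fit([10.0, 20.0, 30.0])
print(t.apply(20.0))  # 0.0 — the training mean standardizes to zero
```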
Architecture Editor
The Architecture editor defines the neural network graph. Add layers or blocks, configure each node, reorder nodes with the arrow controls, and remove nodes as needed. Container nodes such as Residual, Repeat Block, and Multi-Head Output open nested editors for child layers.
| Layer | Use | Notes |
|---|---|---|
| Feature Tokenizer | Creates learned tokens from mapped features so attention layers can consume tabular rows. | Must be the first executable layer. Configure model size, CLS token, and vector strategy. |
| Dense | Fully connected feed-forward layer. | Configure input size, output size, activation, dropout, and scaled sigmoid bounds when used. |
| Reshape Interval | Converts flat input into token rows by interval count and features per interval. | Use when a flat vector is naturally segmented before attention. |
| Multi-Head Attention | Runs attention across token rows. | Configure tokens per sample, model size, head count, causal mask, and RoPE base. |
| Token Pooling | Pools token rows back to a flat representation. | Pooling modes are mean, firstToken, and cls. CLS pooling requires the tokenizer CLS token. |
| Intervals To Flat | Flattens token rows into one dense row per sample. | Match intervals and token width to the upstream token shape. |
| Layer Normalization | Normalizes flat or token widths inside deeper architectures. | Configure size, layer norm epsilon, and Adam epsilon. |
| Multi-Head Output | Fans out a shared trunk into named output heads. | Only one is supported and it must be final in its sequence. |
| Residual | Wraps child layers in a residual branch. | The branch must preserve the incoming shape. Residual Scale controls branch contribution. |
| Repeat Block | Repeats a child layer sequence without manually duplicating layers. | Expanded layer count reflects the repeat count. |
Dense layers work on flat tensors. Attention layers work on token rows. To move between those shapes, use Feature Tokenizer or Reshape Interval to create tokens, then Token Pooling or Intervals To Flat before returning to Dense layers or output heads.
Architecture Checks
Architecture Checks appear above the root architecture when the graph violates shape or ordering rules. They are meant to catch errors before a training run starts.
- Feature Tokenizer first: a feature tokenizer must be the first executable layer when present.
- Attention needs tokens: Multi-Head Attention should follow Feature Tokenizer, Reshape Interval, or another token-producing layer.
- Token counts must match: Multi-Head Attention, Token Pooling, and Intervals To Flat must match upstream token count and token width.
- Dense layers need flat input: add Token Pooling or Intervals To Flat before Dense if the upstream shape is tokens.
- Residual branches preserve shape: a residual child sequence must return the same shape it received.
- Multi-Head Output is final: only one Multi-Head Output is supported, it must be final, and each output mapping can belong to only one head.
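The checks above can be sketched as a small shape tracker over a layer sequence. The layer dicts and the flat/tokens shape labels are illustrative assumptions, not the Onyx validator:

```python
TOKEN_PRODUCERS = {"FeatureTokenizer", "ReshapeInterval"}
TOKEN_CONSUMERS_TO_FLAT = {"TokenPooling", "IntervalsToFlat"}

def check_sequence(layers):
    """Return a list of shape/ordering problems for a layer sequence."""
    shape, errors = "flat", []
    for i, layer in enumerate(layers):
        kind = layer["type"]
        if kind == "FeatureTokenizer" and i != 0:
            errors.append("Feature Tokenizer must be the first executable layer")
        if kind in TOKEN_PRODUCERS:
            shape = "tokens"
        elif kind == "MultiHeadAttention" and shape != "tokens":
            errors.append("Multi-Head Attention needs a token-producing layer first")
        elif kind in TOKEN_CONSUMERS_TO_FLAT:
            shape = "flat"
        elif kind == "Dense" and shape != "flat":
            errors.append("Dense needs flat input: pool or flatten tokens first")
    return errors

# A Dense layer followed by attention violates the token rule:
print(check_sequence([{"type": "Dense"}, {"type": "MultiHeadAttention"}]))
```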
Recommended Patterns
Dense regression or classification baseline
Start with a Dense or Residual MLP before adding attention. Use numeric, boolean, categorical, or vector mappings, normalize inputs, then emit one or more outputs with a final linear layer. For binary labels, choose Binary Cross Entropy and keep the final activation linear so the model emits logits.
Dense
inputSize: feature width
outputSize: 256
activation: GELU
dropoutRate: 0.10
Layer Normalization
size: 256
Dense
inputSize: 256
outputSize: target width
activation: LINEAR
Attention over tabular or vector features
Use Feature Tokenizer when each feature should become a token. For vector inputs, choose Project vector to one token when the entire vector should be one learned token, or Expand vector dimensions when each vector element should become its own token.
Feature Tokenizer
modelSize: 64
includeClsToken: true
vectorStrategy: Expand vector dimensions
Multi-Head Attention
tokensPerSample: feature token count + 1 CLS token
modelSize: 64
headCount: 8
Token Pooling
poolingMode: cls
tokensPerSample: same token count
modelSize: 64
Dense
inputSize: 64
outputSize: target width
activation: LINEAR
Multi-output models
Use Multi-Head Output when the model has a shared trunk but needs separate heads for separate targets. Keep the output head mapping order aligned with the Outputs list so structured predictions decode correctly.
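Decoding per-head output relies on that same ordering. A minimal sketch, with hypothetical output names, of why head order must match the Outputs list:

```python
# Hypothetical output names and raw head values, used only to illustrate
# decode ordering; the real names come from your Outputs list.
output_names = ["churnRisk", "upsellScore"]   # order of the Outputs list
raw_heads = [[0.19], [0.82]]                  # one raw vector per head

structured = {name: head[0] for name, head in zip(output_names, raw_heads)}
print(structured)  # {'churnRisk': 0.19, 'upsellScore': 0.82}
```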
Training Runs
Start Training saves any pending edits, opens the Training dialog, and starts an asynchronous run. A model can have only one active running session at a time. The dialog separates in-process runs from previous training results.
| Metric | Meaning | Notes |
|---|---|---|
| Last Batch Loss | Most recent batch-level loss emitted by the training process. | Use for live trend direction, not as the final model quality score. |
| Current Evaluation Loss | Cumulative loss for the current epoch evaluation window. | Watch this together with epoch progress and batch count. |
| Last Epoch Loss | Average loss from the last completed epoch. | Useful for deciding whether to complete or fine tune. |
| Best Epoch Loss | Lowest retained epoch loss. | The most useful single number for comparing runs in the UI. |
| Per-Head Loss | Loss summary for each output head. | Important for multi-target models where one head can lag the others. |
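Patience and Best Epoch Loss interact through early stopping. The loop below is a generic sketch of that mechanism, not the Onyx trainer:

```python
def should_stop(epoch_losses, patience):
    """Stop once `patience` consecutive epochs fail to beat the best loss."""
    best, epochs_since_best = float("inf"), 0
    for loss in epoch_losses:
        if loss < best:
            best, epochs_since_best = loss, 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            return True
    return False

print(should_stop([0.9, 0.7, 0.71, 0.72, 0.73], patience=3))  # True
print(should_stop([0.9, 0.8, 0.7, 0.6], patience=3))          # False
```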
Complete Now
Fine Tune
Publish Model
Delete run or artifact
Published Models
Published Models are promoted versions created from completed training runs. Each card shows the published model name, run ID, version, publish time, model ID, best epoch loss, final epoch loss, and training time. Copy the Model ID when wiring applications to the prediction API.
Use Test Model to validate a published version without leaving the admin console. Raw Inputs mode creates a nested input object from the feature mapping row paths. Saved Script mode runs a saved query script, fills template parameters such as {{propertyId}}, and uses the script output rows as model inputs.
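In Saved Script mode, template parameters such as {{propertyId}} are filled before the script runs. Here is a regex-based sketch of that substitution; the exact Onyx templating rules are an assumption:

```python
import re

def fill_template(script: str, params: dict) -> str:
    """Replace {{name}} placeholders with values from params."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), script)

print(fill_template("visits WHERE propertyId = {{propertyId}}",
                    {"propertyId": "prop-42"}))
# visits WHERE propertyId = prop-42
```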
Prediction output in the test dialog
The response includes Structured Predictions, Prediction Inputs, and Raw Tensor Output. Use structured predictions for application behavior and raw tensor output when debugging activation ranges, logits, or output shape.
Prediction API
Published models can be called through the Onyx data API. The raw prediction endpoint accepts one object or an array of objects. The script prediction endpoint executes a saved script first, then maps its returned rows through the published model's feature mappings.
| Purpose | Endpoint |
|---|---|
| Predict from raw inputs | POST /data/{databaseId}/model-builder/published-model/{publishedModelId}/predict |
| Predict from a saved script | POST /data/{databaseId}/model-builder/published-model/{publishedModelId}/predict/script |
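A minimal client-side sketch that assembles the raw-prediction request body from the table above. The database and model IDs are placeholders, and authentication headers are omitted:

```python
import json

database_id = "my-database"                 # placeholder
published_model_id = "published-model-123"  # placeholder

url = (f"/data/{database_id}/model-builder/"
       f"published-model/{published_model_id}/predict")

# One object per row; keys mirror the feature mapping row paths.
body = {"inputs": [{"row": {"account_age_days": 420,
                            "is_enterprise": True,
                            "country_code": "US"}}]}

payload = json.dumps(body)
print(url)
```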
POST /data/my-database/model-builder/published-model/published-model-123/predict
Content-Type: application/json
{
"inputs": [
{
"row": {
"account_age_days": 420,
"is_enterprise": true,
"country_code": "US"
}
}
]
}
POST /data/my-database/model-builder/published-model/published-model-123/predict/script
Content-Type: application/json
{
"scriptId": "training-row-script",
"scriptParameters": {
"customerId": "customer-123"
}
}
{
"publishedModelId": "published-model-123",
"modelId": "model-abc",
"inputCount": 1,
"inputs": [
{
"row": {
"account_age_days": 420,
"is_enterprise": true,
"country_code": "US"
}
}
],
"predictions": [
{
"churnRisk": 0.1875
}
],
"rawPredictions": [[0.1875]]
}
Troubleshooting
Most Model Builder issues come from query shape, mapping paths, transform assumptions, or mismatched tensor dimensions. Keep Architecture Checks visible while iterating and run the saved query directly before starting expensive training runs.
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Start Training is disabled | No saved training query is selected or a training run is already running. | Select a Training Data Query in Training Setup and wait for any active run to finish. |
| No usable rows | The saved query returned no records after mapping and invalid value handling. | Run the query directly, confirm row paths, then decide whether invalid values should fail, skip, or use defaults. |
| Tokenizer token count mismatch | Attention or pooling expects a different token count than the upstream tokenizer emits. | For vector inputs, decide whether vectors should project to one token or expand each dimension into a token. |
| Dense input size mismatch | A dense layer is receiving a flat width different from its configured input size. | Update the dense input size or add token pooling / intervals to flat before dense layers. |
| Output head order warning | Multi-head output mappings do not match the model value mapping order. | Assign each output mapping to exactly one head and keep heads ordered like the Outputs list. |
| Published prediction returns unexpected values | Raw model output differs from structured prediction decoding. | Inspect both Structured Predictions and Raw Tensor Output in Test Model. Confirm output mappings and transforms. |
FAQ
What is the Onyx Model Builder?
The Onyx Model Builder is the Onyx Cloud Admin workspace for creating database-scoped neural networks from saved Onyx queries, training them on query rows, and publishing model versions for prediction.
Do I need to export training data before using Model Builder?
No. The training query runs inside the Onyx environment and the backend maps the returned rows into model features and targets.
Can a published model be called from an application?
Yes. Published models expose prediction endpoints that accept raw input objects or execute a saved script and use its returned rows as inputs.
Can one model predict multiple outputs?
Yes. Add multiple Outputs and use a Multi-Head Output layer when each target should have its own named output head.
Can I fine tune a trained model?
Yes. Completed runs with a retained trained model artifact can be fine tuned from the Training Runs panel.
Next Steps
Need Help?
If you have any questions or need assistance:
- Email: support@onyx.dev
- Documentation: Visit our Help Center for tutorials and FAQs.