Onyx Cloud Admin

Onyx Model Builder guide for database-backed neural networks

The Onyx Model Builder lets you create Onyx-backed neural networks directly from database queries. This guide documents the full screen, including model setup, training data, input and output mappings, transform pipelines, neural network architecture, asynchronous training runs, published model versions, and prediction testing.

Build from saved queries

Use query-row fields as features and targets without exporting training data.

Train Onyx models

Configure neural network layers, loss functions, batching, validation, and model outputs from the admin console.

Publish predictions

Promote completed runs into database-scoped published models and test predictions with raw inputs or saved scripts.

Overview

Model Builder is a dashboard workspace for teams that want machine learning close to operational data. A model is scoped to the selected organization and database. Its training data comes from a saved query, its fields are mapped into feature and target tensors, and its published versions can be called through Onyx prediction endpoints.

The feature is best for tabular prediction, scoring, classification, multi-label outputs, vector regression, text or embedding experiments, and attention-based architectures over database rows. The screen is designed so model definitions, training history, and published artifacts stay attached to the database that produced them.

Core workflow

Select an organization and database, create a model, choose a training query, map inputs and outputs, build the architecture, start a training run, publish a completed artifact, then test or call the published model.

Access and Permissions

Open /model-builder in the Onyx Cloud Admin console after signing in. The workspace requires an active organization and database from the top context bar. The model list, saved query selector, and training actions are scoped to that database.

  • Read model definitions: requires database read access.
  • Create, edit, delete, train, complete, cancel, fine tune, and publish: require maintainer-level database access.
  • Published model prediction: uses the database data API with read-only access and accepts either bearer auth or database API key credentials.
  • Saved queries: create them in the Query Editor before selecting them as training data or prediction scripts.

Workspace Tour

The Model Builder screen is organized as a persistent model workspace. The header shows the selected model name, model count, configured layer count, expanded layer count, active organization, active database, and save status. Changes auto-save after a short debounce, and pending changes are flushed before training starts or the page unloads.

Header actions

Start Training, open the Models dialog, edit model metadata, delete the selected model, or create a new model.

Model switcher

Switch between database-scoped models from the dropdown. The menu also offers quick access to create a new model.

Training Setup

Select the saved training query, objective, learning rate, batch size, epochs, patience, test fraction, and gradient accumulation.

Inputs, outputs, and architecture

Define row mappings, transform pipelines, and the actual neural network graph.

Model Lifecycle

1

Create a model

Select New Model or Create model, provide a name and description, then save. New models start with two numeric input mappings, one numeric output mapping, and a simple dense architecture.

2

Switch and edit model metadata

Use the model dropdown to switch definitions. Use Edit to rename the model or update the description. Model metadata is stored with the same database scope as the architecture.

3

Delete only when artifacts are no longer needed

Delete removes the saved architecture for the current organization and database. Published models and run artifacts should be reviewed before deleting a model used by applications.

Training Setup

Training Setup connects the model definition to data. Choose a saved query from the Query Editor. The query can return row objects directly, or return a query definition that the backend streams before applying mappings.

Training query shape

The backend expects a list of objects or an object containing records. Each returned row should expose the paths referenced by Inputs and Outputs. For example, a mapping path of row.features[0] reads the first element in the features array when the query returns row objects.
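
As an illustration, a returned row set of the following shape would satisfy that mapping path. The field names here are hypothetical, not part of any Onyx schema:

```python
# Hypothetical rows as a saved training query might return them.
rows = [
    {"features": [0.42, 1.7], "label": 1.0},
    {"features": [0.13, 0.9], "label": 0.0},
]

# A mapping path of row.features[0] reads the first array element of each row.
first_features = [row["features"][0] for row in rows]
print(first_features)  # [0.42, 0.13]
```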

Objective

Choose the loss objective that matches the target type:

  • Mean Squared Error: Continuous regression targets, such as scores, forecasts, quantities, or risk values. Outputs are trained as dense numeric values; use linear final activations unless you intentionally need bounded output.
  • Binary Cross Entropy: Binary classification and multi-label boolean targets. The backend trains this as a logits-based objective; use linear logits in the final dense layer and apply thresholds during inference.
  • Cosine Similarity Loss: Vector targets where direction matters more than magnitude. Useful for matching embeddings, similarity spaces, and directional vector objectives.
  • Embedding Loss: Embedding-style outputs produced by the Onyx model-builder embedding objective. Use when the model is learning a vector representation rather than a scalar prediction.

Batch Size

Rows per training batch.

Max Epochs

Maximum passes through the training data.

Patience

Number of evaluation epochs without loss improvement before early stopping ends the run.

Test Fraction

Validation/test split from 0 to 1.

Grad Accum Steps

Number of gradient accumulation steps used by the streaming training flow.
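
Gradient accumulation sums gradients across several batches before each optimizer update, so the effective batch size is roughly Batch Size times Grad Accum Steps. The update count per epoch can be sketched as follows (illustrative arithmetic, not the Onyx implementation):

```python
# Generic gradient-accumulation arithmetic (not Onyx internals): gradients
# from accum_steps consecutive batches are combined into one parameter update.
def updates_per_epoch(total_rows, batch_size, accum_steps):
    batches = (total_rows + batch_size - 1) // batch_size
    # One optimizer update per accum_steps batches; a trailing partial
    # group still produces an update.
    return (batches + accum_steps - 1) // accum_steps

print(updates_per_epoch(1000, 32, 4))  # 8
```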

Inputs and Outputs

Inputs and Outputs map query-row fields into model tensors. Inputs become features. Outputs become targets during training and structured prediction fields after publish. Each mapping has a name, row path, source type, invalid/null policy, optional default value, optional vector dimensions, and a transform pipeline.

Supported source types:

  • number: Floating point or integer values from the training row. Common transforms include Mean / Std, Min / Max, Robust Scaler, Log, and Quantile.
  • boolean: True/false feature flags or labels. Boolean outputs pair naturally with Binary Cross Entropy; boolean feature transforms are centered for training.
  • categorical: String-like categories such as country, plan, status, type, or segment. Categorical values are indexed with an unknown bucket so future values can still be handled.
  • datetime: Timestamp values that should become model features. Use consistent source formats; Time Decay is available when recency should affect model input.
  • text: Free-form natural language or descriptive fields. Use Text Encoding to expand text into token embeddings before optional child transforms.
  • vector: Arrays of numeric features, embeddings, or indicator vectors. Set Vector Dimensions when the feature tokenizer expands vector dimensions into separate tokens.

Row path syntax

Paths support object keys, dots, and numeric array indexes, such as row.customer.score or row.directions[0]. The leading row key references the returned row object.
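
A minimal sketch of how such a path could resolve against a returned row (an illustration only, not the backend's actual parser):

```python
import re

def resolve_path(row, path):
    # Split "row.customer.score" or "row.directions[0]" into keys and indexes.
    parts = re.findall(r"[^.\[\]]+", path)
    value = {"row": row}  # the leading "row" key references the row object
    for part in parts:
        value = value[int(part)] if part.isdigit() else value[part]
    return value

row = {"customer": {"score": 87}, "directions": [0.5, -0.5]}
print(resolve_path(row, "row.customer.score"))  # 87
print(resolve_path(row, "row.directions[0]"))   # 0.5
```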

Invalid or null policy

Choose Fail Training, Skip Row, or Use Default Value. Strict failures are useful while debugging, while defaults or skipped rows help tolerate data quality issues in production.
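
The three policies can be sketched as follows (hypothetical helper, not Onyx code):

```python
def apply_policy(value, policy, default=None):
    # Illustration of the invalid/null policies for a single mapped value.
    if value is not None:
        return ("keep", value)
    if policy == "fail":
        # Fail Training: stop the run as soon as a bad value appears.
        raise ValueError("null value encountered; training would fail")
    if policy == "skip":
        # Skip Row: the whole row is dropped from the training set.
        return ("skip", None)
    # Use Default Value: substitute the configured default.
    return ("keep", default)

print(apply_policy(None, "skip"))          # ('skip', None)
print(apply_policy(None, "default", 0.0))  # ('keep', 0.0)
```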

Vector dimensions

Vector mappings should set Vector Dimensions when the vector width is known. This is required when Feature Tokenizer uses Expand vector dimensions.

Transforms

Transform pipelines prepare raw row values for training. Transforms are ordered, can be moved up or down, and can be removed. Text Encoding and Column Pipeline can contain child transforms, allowing post-encoding or nested column processing.

  • Default: Pass values through using the default numeric normalization path. The initial transform added to new mappings.
  • Boolean: Convert booleans into trainable numeric values. Best for true/false flags and binary labels.
  • Categorical: Index categories discovered during training. The model retains categorical metadata so published predictions can decode consistently.
  • Log: Compress skewed positive numeric values. Use only when the field domain makes sense for logarithmic scaling.
  • Mean / Std: Standardize numeric columns around their training distribution. A strong default for dense numeric features.
  • Time Decay: Turn timestamps or elapsed values into recency-weighted features. Configure Lambda to tune the decay curve.
  • L2 Normalization: Normalize vector-like feature groups. Useful for directional similarity and embedding inputs.
  • Robust Scaler: Scale numeric data while reducing sensitivity to outliers. Prefer it for long-tailed operational data.
  • Quantile: Map numeric values through distribution-aware quantile bins. Useful when rank is more stable than raw magnitude.
  • Min / Max: Scale numeric values into a bounded range. Good when min and max are meaningful and stable.
  • Max Abs: Scale by maximum absolute value while preserving sign. Useful for sparse or centered numeric features.
  • Text Encoding: Expand text into token embeddings. Configure Max Tokens and Embedding Size; child transforms run after text expansion.
  • Column Pipeline: Group child transforms into a nested column pipeline. Use for advanced transform sequences that should be treated as one mapping stage.
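
For intuition, two of the transforms above can be sketched with their common textbook formulas. The exact Onyx implementations may differ:

```python
import math

def mean_std(values):
    # Standardize around the training distribution: (v - mean) / std.
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]

def time_decay(age, lam):
    # Recency weighting: newer values (smaller age) approach 1.0,
    # and Lambda controls how fast the weight decays.
    return math.exp(-lam * age)

print([round(v, 2) for v in mean_std([1.0, 2.0, 3.0])])  # [-1.22, 0.0, 1.22]
print(round(time_decay(10, 0.1), 3))                     # 0.368
```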

Text encoding rule

Text Encoding is intended for text mappings. If a text mapping uses Text Encoding, post-encoding transforms should be placed inside that encoding transform rather than as sibling transforms.

Architecture Editor

The Architecture editor defines the neural network graph. Add layers or blocks, configure each node, reorder nodes with the arrow controls, and remove nodes as needed. Container nodes such as Residual, Repeat Block, and Multi-Head Output open nested editors for child layers.

  • Feature Tokenizer: Creates learned tokens from mapped features so attention layers can consume tabular rows. Must be the first executable layer; configure model size, CLS token, and vector strategy.
  • Dense: Fully connected feed-forward layer. Configure input size, output size, activation, dropout, and scaled sigmoid bounds when used.
  • Reshape Interval: Converts flat input into token rows by interval count and features per interval. Use when a flat vector is naturally segmented before attention.
  • Multi-Head Attention: Runs attention across token rows. Configure tokens per sample, model size, head count, causal mask, and RoPE base.
  • Token Pooling: Pools token rows back to a flat representation. Pooling modes are mean, firstToken, and cls; CLS pooling requires the tokenizer CLS token.
  • Intervals To Flat: Flattens token rows into one dense row per sample. Match intervals and token width to the upstream token shape.
  • Layer Normalization: Normalizes flat or token widths inside deeper architectures. Configure size, layer norm epsilon, and Adam epsilon.
  • Multi-Head Output: Fans out a shared trunk into named output heads. Only one is supported, and it must be final in its sequence.
  • Residual: Wraps child layers in a residual branch. The branch must preserve the incoming shape; Residual Scale controls branch contribution.
  • Repeat Block: Repeats a child layer sequence without manually duplicating layers. Expanded layer count reflects the repeat count.

Dense layers work on flat tensors. Attention layers work on token rows. To move between those shapes, use Feature Tokenizer or Reshape Interval to create tokens, then Token Pooling or Intervals To Flat before returning to Dense layers or output heads.
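
The flat-versus-token rule can be sketched as a shape trace. The layer names and fields below are illustrative, not the saved architecture format:

```python
# Trace how each layer kind changes the running tensor shape, mirroring
# the rules above: tokenizers create tokens, attention preserves them,
# pooling returns to a flat width, and dense layers require flat input.
def trace(layers, flat_width):
    shape = ("flat", flat_width)
    for layer in layers:
        kind = layer["type"]
        if kind == "featureTokenizer":
            shape = ("tokens", layer["tokens"], layer["modelSize"])
        elif kind == "attention":
            assert shape[0] == "tokens", "attention needs token input"
        elif kind == "tokenPooling":
            assert shape[0] == "tokens", "pooling needs token input"
            shape = ("flat", shape[2])
        elif kind == "dense":
            assert shape[0] == "flat", "dense needs flat input"
            assert shape[1] == layer["inputSize"], "dense input size mismatch"
            shape = ("flat", layer["outputSize"])
    return shape

layers = [
    {"type": "featureTokenizer", "tokens": 12, "modelSize": 32},
    {"type": "attention"},
    {"type": "tokenPooling"},
    {"type": "dense", "inputSize": 32, "outputSize": 1},
]
print(trace(layers, 12))  # ('flat', 1)
```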

Architecture Checks

Architecture Checks appear above the root architecture when the graph violates shape or ordering rules. They are meant to catch errors before a training run starts.

  • Feature Tokenizer first: a feature tokenizer must be the first executable layer when present.
  • Attention needs tokens: Multi-Head Attention should follow Feature Tokenizer, Reshape Interval, or another token-producing layer.
  • Token counts must match: Multi-Head Attention, Token Pooling, and Intervals To Flat must match upstream token count and token width.
  • Dense layers need flat input: add Token Pooling or Intervals To Flat before Dense if the upstream shape is tokens.
  • Residual branches preserve shape: a residual child sequence must return the same shape it received.
  • Multi-Head Output is final: only one Multi-Head Output is supported, it must be final, and each output mapping can belong to only one head.

Training Runs

Start Training saves any pending edits, opens the Training dialog, and starts an asynchronous run. A model can have only one active running session at a time. The dialog separates in-process runs from previous training results.

  • Last Batch Loss: Most recent batch-level loss emitted by the training process. Use it for live trend direction, not as the final model quality score.
  • Current Evaluation Loss: Cumulative loss for the current epoch evaluation window. Watch it together with epoch progress and batch count.
  • Last Epoch Loss: Average loss from the last completed epoch. Useful for deciding whether to complete or fine tune.
  • Best Epoch Loss: Lowest retained epoch loss. The most useful single number for comparing runs in the UI.
  • Per-Head Loss: Loss summary for each output head. Important for multi-target models where one head can lag the others.

Complete Now

Completes a running session early and writes the current model weights to disk.

Fine Tune

Starts a follow-up run from a completed run that still has a trained model artifact.

Publish Model

Turns a completed trained artifact into a database-scoped published model version.

Delete run or artifact

Delete run history, delete an unpublished trained artifact, or keep history while removing weights that are no longer needed.

Published Models

Published Models are promoted versions created from completed training runs. Each card shows the published model name, run ID, version, publish time, model ID, best epoch loss, final epoch loss, and training time. Copy the Model ID when wiring applications to the prediction API.

Use Test Model to validate a published version without leaving the admin console. Raw Inputs mode creates a nested input object from the feature mapping row paths. Saved Script mode runs a saved query script, fills template parameters such as {{propertyId}}, and uses the script output rows as model inputs.
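
Template parameters of the {{name}} form can be thought of as simple value substitutions before the script runs. The sketch below is a hypothetical illustration, not the actual script engine:

```python
import re

def fill_template(params, text):
    # Replace every {{name}} placeholder with the matching parameter value.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), text)

query = "SELECT * FROM Property WHERE id = {{propertyId}}"
print(fill_template({"propertyId": 42}, query))
# SELECT * FROM Property WHERE id = 42
```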

Prediction output in the test dialog

The response includes Structured Predictions, Prediction Inputs, and Raw Tensor Output. Use structured predictions for application behavior and raw tensor output when debugging activation ranges, logits, or output shape.

Prediction API

Published models can be called through the Onyx data API. The raw prediction endpoint accepts one object or an array of objects. The script prediction endpoint executes a saved script first, then maps its returned rows through the published model's feature mappings.

  • Predict from raw inputs: POST /data/{databaseId}/model-builder/published-model/{publishedModelId}/predict
  • Predict from a saved script: POST /data/{databaseId}/model-builder/published-model/{publishedModelId}/predict/script

Example request (raw inputs):

POST /data/my-database/model-builder/published-model/published-model-123/predict
Content-Type: application/json

{
  "inputs": [
    {
      "row": {
        "account_age_days": 420,
        "is_enterprise": true,
        "country_code": "US"
      }
    }
  ]
}

Example request (saved script):

POST /data/my-database/model-builder/published-model/published-model-123/predict/script
Content-Type: application/json

{
  "scriptId": "training-row-script",
  "scriptParameters": {
    "customerId": "customer-123"
  }
}

Example response:

{
  "publishedModelId": "published-model-123",
  "modelId": "model-abc",
  "inputCount": 1,
  "inputs": [
    {
      "row": {
        "account_age_days": 420,
        "is_enterprise": true,
        "country_code": "US"
      }
    }
  ],
  "predictions": [
    {
      "churnRisk": 0.1875
    }
  ],
  "rawPredictions": [[0.1875]]
}
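
A minimal client sketch that builds a raw prediction request from rows. The base URL is a placeholder and auth headers are omitted; supply bearer or database API key credentials per your deployment:

```python
import json

def build_predict_request(base_url, database_id, published_model_id, rows):
    # Endpoint shape from the table above; each row becomes one input object.
    url = (f"{base_url}/data/{database_id}/model-builder/"
           f"published-model/{published_model_id}/predict")
    body = json.dumps({"inputs": [{"row": row} for row in rows]})
    return url, body

url, body = build_predict_request(
    "https://api.example.com", "my-database", "published-model-123",
    [{"account_age_days": 420, "is_enterprise": True, "country_code": "US"}],
)
print(url)
```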

Troubleshooting

Most Model Builder issues come from query shape, mapping paths, transform assumptions, or mismatched tensor dimensions. Keep Architecture Checks visible while iterating and run the saved query directly before starting expensive training runs.

  • Start Training is disabled: No saved training query is selected, or a training run is already in progress. Select a Training Data Query in Training Setup and wait for any active run to finish.
  • No usable rows: The saved query returned no records after mapping and invalid-value handling. Run the query directly, confirm row paths, then decide whether invalid values should fail, skip, or use defaults.
  • Tokenizer token count mismatch: Attention or pooling expects a different token count than the upstream tokenizer emits. For vector inputs, decide whether vectors should project to one token or expand each dimension into a token.
  • Dense input size mismatch: A dense layer is receiving a flat width different from its configured input size. Update the dense input size or add Token Pooling or Intervals To Flat before dense layers.
  • Output head order warning: Multi-head output mappings do not match the model value mapping order. Assign each output mapping to exactly one head and keep heads ordered like the Outputs list.
  • Published prediction returns unexpected values: Raw model output differs from structured prediction decoding. Inspect both Structured Predictions and Raw Tensor Output in Test Model, and confirm output mappings and transforms.

FAQ

What is the Onyx Model Builder?

The Onyx Model Builder is the Onyx Cloud Admin workspace for creating database-scoped neural networks from saved Onyx queries, training them on query rows, and publishing model versions for prediction.

Do I need to export training data before using Model Builder?

No. The training query runs inside the Onyx environment and the backend maps the returned rows into model features and targets.

Can a published model be called from an application?

Yes. Published models expose prediction endpoints that accept raw input objects or execute a saved script and use its returned rows as inputs.

Can one model predict multiple outputs?

Yes. Add multiple Outputs and use a Multi-Head Output layer when each target should have its own named output head.

Can I fine tune a trained model?

Yes. Completed runs with a retained trained model artifact can be fine tuned from the Training Runs panel.
