Onyx Cloud Admin
Onyx Model Builder guide for database-backed neural networks
The Onyx Model Builder lets you create Onyx-backed neural networks directly from database queries. This guide documents the full screen, including model setup, training data, input and output mappings, transform pipelines, neural network architecture, asynchronous training runs, published model versions, and prediction testing.
Build from saved queries
Train Onyx models
Publish predictions
Overview
Model Builder is a dashboard workspace for teams that want machine learning close to operational data. A model is scoped to the selected organization and database. Its training data comes from a saved query, its fields are mapped into feature and target tensors, and its published versions can be called through Onyx prediction endpoints.
The feature is best for tabular prediction, scoring, classification, multi-label outputs, vector regression, text or embedding experiments, and attention-based architectures over database rows. The screen is designed so model definitions, training history, and published artifacts stay attached to the database that produced them.
Core workflow
Select an organization and database, create a model, choose a training query, map inputs and outputs, build the architecture, start a training run, publish a completed artifact, then test or call the published model.
Access and Permissions
Open /model-builder in the Onyx Cloud Admin console after signing in. The workspace requires an active organization and database from the top context bar. The model list, saved query selector, and training actions are scoped to that database.
- Read model definitions: requires database read access.
- Create, edit, delete, train, complete, cancel, fine tune, and publish: require maintainer-level database access.
- Published model prediction: uses the database data API with read-only access and accepts either bearer auth or database API key credentials.
- Saved queries: create them in the Query Editor before selecting them as training data or prediction scripts.
Workspace Tour
The Model Builder screen is organized as a persistent model workspace. The header shows the selected model name, model count, configured layer count, expanded layer count, active organization, active database, and save status. Changes auto-save after a short debounce, and pending changes are flushed before training starts or the page unloads.
Header actions
Model switcher
Training Setup
Inputs, outputs, and architecture
Model Lifecycle
Create a model
Select New Model or Create model, provide a name and description, then save. New models start with two numeric input mappings, one numeric output mapping, and a simple dense architecture.
Switch and edit model metadata
Use the model dropdown to switch definitions. Use Edit to rename the model or update the description. Model metadata is stored with the same database scope as the architecture.
Delete only when artifacts are no longer needed
Delete removes the saved architecture for the current organization and database. Published models and run artifacts should be reviewed before deleting a model used by applications.
Training Setup
Training Setup connects the model definition to data. Choose a saved query from the Query Editor. The query can return row objects directly, or return a query definition that the backend streams before applying mappings.
Training query shape
The backend expects a list of objects or an object containing records. Each returned row should expose the paths referenced by Inputs and Outputs. For example, a mapping path of row.features[0] reads the first element in the features array when the query returns row objects.
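To make the mapping concrete, here is a minimal sketch of how a path such as row.features[0] could be resolved against a returned row object. The path grammar (dot-separated keys with optional [index] segments) and the helper name are illustrative assumptions, not the Onyx implementation.

```python
import re

def resolve_path(row: dict, path: str):
    """Walk a dotted path with optional [i] index segments over a row dict.

    Illustrative sketch only: assumes paths like "row.features[0]" where
    the leading "row" key refers to the returned row object.
    """
    value = {"row": row}
    for key, index in re.findall(r"(\w+)(?:\[(\d+)\])?", path):
        value = value[key]
        if index:
            value = value[int(index)]
    return value

row = {"features": [0.42, 0.1], "customer": {"score": 7}}
print(resolve_path(row, "row.features[0]"))     # 0.42
print(resolve_path(row, "row.customer.score"))  # 7
```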
Loss Function
| Loss Function | Use | Notes |
|---|---|---|
| Mean Squared Error | Continuous regression targets, such as scores, forecasts, quantities, or risk values. | Outputs are trained as dense numeric values. Use linear final activations unless you intentionally need bounded output. |
| Binary Cross Entropy | Binary classification and multi-label boolean targets. | The backend trains this as a logits-based objective. Use linear logits in the final dense layer and apply thresholds during inference. |
| Cosine Similarity Loss | Vector targets where direction matters more than magnitude. | Useful for matching embeddings, similarity spaces, and directional vector objectives. |
| Embedding Loss | Embedding-style outputs produced by the Onyx model-builder embedding objective. | Use when the model is learning a vector representation rather than a scalar prediction. |
| Parameter | Description |
|---|---|
| Batch Size | Rows per training batch. |
| Max Epochs | Maximum passes through the training data. |
| Patience | Number of epochs without improvement allowed before early stopping ends the run. |
| Test Fraction | Fraction of rows held out for validation/testing, from 0 to 1. |
| Grad Accum Steps | Number of gradient accumulation steps used by the streaming training flow. |
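Taken together, these parameters determine the work per optimizer step and the evaluation split. A quick sketch of the arithmetic, using illustrative values rather than Onyx-specific internals:

```python
# Illustrative values; the relationships below are standard training
# arithmetic, not Onyx-specific behavior.
batch_size = 32
grad_accum_steps = 4
test_fraction = 0.2
total_rows = 1000

# Gradient accumulation applies one optimizer step per N batches,
# so the effective batch size is batch_size * grad_accum_steps.
effective_batch_size = batch_size * grad_accum_steps

# Test Fraction holds rows out of training for evaluation.
test_rows = int(total_rows * test_fraction)
train_rows = total_rows - test_rows

print(effective_batch_size, train_rows, test_rows)  # 128 800 200
```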
Inputs and Outputs
Inputs and Outputs map query-row fields into model tensors. Inputs become features. Outputs become targets during training and structured prediction fields after publish. Each mapping has a name, row path, source type, invalid/null policy, optional default value, optional vector dimensions, and a transform pipeline.
| Source Type | Use | Notes |
|---|---|---|
| number | Floating point or integer values from the training row. | Common transforms include Mean / Std, Min / Max, Robust Scaler, Log, and Quantile. |
| boolean | True/false feature flags or labels. | Boolean outputs pair naturally with Binary Cross Entropy. Boolean feature transforms are centered for training. |
| categorical | String-like categories such as country, plan, status, type, or segment. | Categorical values are indexed with an unknown bucket so future values can still be handled. |
| datetime | Timestamp values that should become model features. | Use consistent source formats. Time Decay is available when recency should affect model input. |
| text | Free-form natural language or descriptive fields. | Use Text Encoding to expand text into token embeddings before optional child transforms. |
| vector | Arrays of numeric features, embeddings, or indicator vectors. | Set Vector Dimensions when the feature tokenizer expands vector dimensions into separate tokens. |
Row path syntax
row.customer.score or row.directions[0]. The leading row key can reference the returned row object.
Invalid or null policy
Vector dimensions
Transforms
Transform pipelines prepare raw row values for training. Transforms are ordered, can be moved up or down, and can be removed. Text Encoding and Column Pipeline can contain child transforms, allowing post-encoding or nested column processing.
| Transform | Use | Notes |
|---|---|---|
| Default | Pass values through using the default numeric normalization path. | The initial transform added to new mappings. |
| Boolean | Convert booleans into trainable numeric values. | Best for true/false flags and binary labels. |
| Categorical | Index categories discovered during training. | The model retains categorical metadata so published predictions can decode consistently. |
| Log | Compress skewed positive numeric values. | Use only when the field domain makes sense for logarithmic scaling. |
| Mean / Std | Standardize numeric columns around their training distribution. | A strong default for dense numeric features. |
| Time Decay | Turn timestamps or elapsed values into recency-weighted features. | Configure Lambda to tune the decay curve. |
| L2 Normalization | Normalize vector-like feature groups. | Useful for directional similarity and embedding inputs. |
| Robust Scaler | Scale numeric data while reducing sensitivity to outliers. | Prefer for long-tailed operational data. |
| Quantile | Map numeric values through distribution-aware quantile bins. | Useful when rank is more stable than raw magnitude. |
| Min / Max | Scale numeric values into a bounded range. | Good when min and max are meaningful and stable. |
| Max Abs | Scale by maximum absolute value while preserving sign. | Useful for sparse or centered numeric features. |
| Text Encoding | Expand text into token embeddings. | Configure Max Tokens and Embedding Size. Child transforms run after text expansion. |
| Column Pipeline | Group child transforms into a nested column pipeline. | Use for advanced transform sequences that should be treated as one mapping stage. |
Text encoding rule
Text Encoding is intended for text mappings. If a text mapping uses Text Encoding, post-encoding transforms should be placed inside that encoding transform rather than as sibling transforms.
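As a concrete illustration of one pipeline stage, a Mean / Std transform standardizes a numeric column around its training distribution. This sketch is an assumption about the behavior, not the Onyx transform code:

```python
from statistics import mean, pstdev

class MeanStd:
    """Sketch of a Mean / Std transform: standardize around training stats."""

    def fit(self, values):
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0  # guard against constant columns
        return self

    def apply(self, value):
        return (value - self.mu) / self.sigma

t = MeanStd().fit([10.0, 20.0, 30.0])
print(t.apply(20.0))  # 0.0 — the training mean standardizes to zero
```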
Architecture Editor
The Architecture editor defines the neural network graph. Add layers or blocks, configure each node, reorder nodes with the arrow controls, and remove nodes as needed. Container nodes such as Residual, Repeat Block, and Multi-Head Output open nested editors for child layers.
| Layer | Use | Notes |
|---|---|---|
| Feature Tokenizer | Creates learned tokens from mapped features so attention layers can consume tabular rows. | Must be the first executable layer. Configure model size, CLS token, and vector strategy. |
| Dense | Fully connected feed-forward layer. | Configure input size, output size, activation, dropout, and scaled sigmoid bounds when used. |
| Reshape Interval | Converts flat input into token rows by interval count and features per interval. | Use when a flat vector is naturally segmented before attention. |
| Multi-Head Attention | Runs attention across token rows. | Configure tokens per sample, model size, head count, causal mask, and RoPE base. |
| Token Pooling | Pools token rows back to a flat representation. | Pooling modes are mean, firstToken, and cls. CLS pooling requires the tokenizer CLS token. |
| Intervals To Flat | Flattens token rows into one dense row per sample. | Match intervals and token width to the upstream token shape. |
| Layer Normalization | Normalizes flat or token widths inside deeper architectures. | Configure size, layer norm epsilon, and Adam epsilon. |
| Multi-Head Output | Fans out a shared trunk into named output heads. | Only one is supported and it must be final in its sequence. |
| Residual | Wraps child layers in a residual branch. | The branch must preserve the incoming shape. Residual Scale controls branch contribution. |
| Repeat Block | Repeats a child layer sequence without manually duplicating layers. | Expanded layer count reflects the repeat count. |
Dense layers work on flat tensors. Attention layers work on token rows. To move between those shapes, use Feature Tokenizer or Reshape Interval to create tokens, then Token Pooling or Intervals To Flat before returning to Dense layers or output heads.
Architecture Checks
Architecture Checks appear above the root architecture when the graph violates shape or ordering rules. They are meant to catch errors before a training run starts.
- Feature Tokenizer first: a feature tokenizer must be the first executable layer when present.
- Attention needs tokens: Multi-Head Attention should follow Feature Tokenizer, Reshape Interval, or another token-producing layer.
- Token counts must match: Multi-Head Attention, Token Pooling, and Intervals To Flat must match upstream token count and token width.
- Dense layers need flat input: add Token Pooling or Intervals To Flat before Dense if the upstream shape is tokens.
- Residual branches preserve shape: a residual child sequence must return the same shape it received.
- Multi-Head Output is final: only one Multi-Head Output is supported, it must be final, and each output mapping can belong to only one head.
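The checks above can be sketched as a small shape tracker over a layer sequence. The layer dicts and the flat/tokens shape labels are illustrative assumptions, not the Onyx validator:

```python
TOKEN_PRODUCERS = {"FeatureTokenizer", "ReshapeInterval"}
TOKEN_CONSUMERS_TO_FLAT = {"TokenPooling", "IntervalsToFlat"}

def check_sequence(layers):
    """Return a list of shape/ordering problems for a layer sequence."""
    shape, errors = "flat", []
    for i, layer in enumerate(layers):
        kind = layer["type"]
        if kind == "FeatureTokenizer" and i != 0:
            errors.append("Feature Tokenizer must be the first executable layer")
        if kind in TOKEN_PRODUCERS:
            shape = "tokens"
        elif kind == "MultiHeadAttention" and shape != "tokens":
            errors.append("Multi-Head Attention needs a token-producing layer first")
        elif kind in TOKEN_CONSUMERS_TO_FLAT:
            shape = "flat"
        elif kind == "Dense" and shape != "flat":
            errors.append("Dense needs flat input: pool or flatten tokens first")
    return errors

# A Dense layer followed by attention violates the token rule:
print(check_sequence([{"type": "Dense"}, {"type": "MultiHeadAttention"}]))
```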
Recommended Patterns
Dense regression or classification baseline
Start with a Dense or Residual MLP before adding attention. Use numeric, boolean, categorical, or vector mappings, normalize inputs, then emit one or more outputs with a final linear layer. For binary labels, choose Binary Cross Entropy and keep the final activation linear so the model emits logits.
Dense
inputSize: feature width
outputSize: 256
activation: GELU
dropoutRate: 0.10
Layer Normalization
size: 256
Dense
inputSize: 256
outputSize: target width
activation: LINEAR
Attention over tabular or vector features
Use Feature Tokenizer when each feature should become a token. For vector inputs, choose Project vector to one token when the entire vector should be one learned token, or Expand vector dimensions when each vector element should become its own token.
Feature Tokenizer
modelSize: 64
includeClsToken: true
vectorStrategy: Expand vector dimensions
Multi-Head Attention
tokensPerSample: feature token count + 1 CLS token
modelSize: 64
headCount: 8
Token Pooling
poolingMode: cls
tokensPerSample: same token count
modelSize: 64
Dense
inputSize: 64
outputSize: target width
activation: LINEAR
Multi-output models
Use Multi-Head Output when the model has a shared trunk but needs separate heads for separate targets. Keep the output head mapping order aligned with the Outputs list so structured predictions decode correctly.
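Decoding per-head output relies on that same ordering. A minimal sketch, with hypothetical output names, of why head order must match the Outputs list:

```python
# Hypothetical output names and raw head values, used only to illustrate
# decode ordering; the real names come from your Outputs list.
output_names = ["churnRisk", "upsellScore"]   # order of the Outputs list
raw_heads = [[0.19], [0.82]]                  # one raw vector per head

structured = {name: head[0] for name, head in zip(output_names, raw_heads)}
print(structured)  # {'churnRisk': 0.19, 'upsellScore': 0.82}
```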
Training Runs
Start Training saves any pending edits, opens the Training dialog, and starts an asynchronous run. A model can have only one active running session at a time. The dialog separates in-process runs from previous training results.
| Metric | Meaning | Notes |
|---|---|---|
| Last Batch Loss | Most recent batch-level loss emitted by the training process. | Use for live trend direction, not as the final model quality score. |
| Current Evaluation Loss | Cumulative loss for the current epoch evaluation window. | Watch this together with epoch progress and batch count. |
| Last Epoch Loss | Average loss from the last completed epoch. | Useful for deciding whether to complete or fine tune. |
| Best Epoch Loss | Lowest retained epoch loss. | The most useful single number for comparing runs in the UI. |
| Per-Head Loss | Loss summary for each output head. | Important for multi-target models where one head can lag the others. |
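Patience and Best Epoch Loss interact through early stopping. The loop below is a generic sketch of that mechanism, not the Onyx trainer:

```python
def should_stop(epoch_losses, patience):
    """Stop once `patience` consecutive epochs fail to beat the best loss."""
    best, epochs_since_best = float("inf"), 0
    for loss in epoch_losses:
        if loss < best:
            best, epochs_since_best = loss, 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            return True
    return False

print(should_stop([0.9, 0.7, 0.71, 0.72, 0.73], patience=3))  # True
print(should_stop([0.9, 0.8, 0.7, 0.6], patience=3))          # False
```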
Complete Now
Fine Tune
Publish Model
Delete run or artifact
Published Models
Published Models are promoted versions created from completed training runs. Each card shows the published model name, run ID, version, publish time, model ID, best epoch loss, final epoch loss, and training time. Copy the Model ID when wiring applications to the prediction API.
Use Test Model to validate a published version without leaving the admin console. Raw Inputs mode creates a nested input object from the feature mapping row paths. Saved Script mode runs a saved query script, fills template parameters such as {{propertyId}}, and uses the script output rows as model inputs.
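In Saved Script mode, template parameters such as {{propertyId}} are filled before the script runs. Here is a regex-based sketch of that substitution; the exact Onyx templating rules are an assumption:

```python
import re

def fill_template(script: str, params: dict) -> str:
    """Replace {{name}} placeholders with values from params."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), script)

print(fill_template("visits WHERE propertyId = {{propertyId}}",
                    {"propertyId": "prop-42"}))
# visits WHERE propertyId = prop-42
```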
Prediction output in the test dialog
The response includes Structured Predictions, Prediction Inputs, and Raw Tensor Output. Use structured predictions for application behavior and raw tensor output when debugging activation ranges, logits, or output shape.
Prediction API
Published models can be called through the Onyx data API. The raw prediction endpoint accepts one object or an array of objects. The script prediction endpoint executes a saved script first, then maps its returned rows through the published model's feature mappings.
| Purpose | Endpoint |
|---|---|
| Predict from raw inputs | POST /data/{databaseId}/model-builder/published-model/{publishedModelId}/predict |
| Predict from a saved script | POST /data/{databaseId}/model-builder/published-model/{publishedModelId}/predict/script |
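A minimal client-side sketch that assembles the raw-prediction request body from the table above. The database and model IDs are placeholders, and authentication headers are omitted:

```python
import json

database_id = "my-database"                 # placeholder
published_model_id = "published-model-123"  # placeholder

url = (f"/data/{database_id}/model-builder/"
       f"published-model/{published_model_id}/predict")

# One object per row; keys mirror the feature mapping row paths.
body = {"inputs": [{"row": {"account_age_days": 420,
                            "is_enterprise": True,
                            "country_code": "US"}}]}

payload = json.dumps(body)
print(url)
```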
POST /data/my-database/model-builder/published-model/published-model-123/predict
Content-Type: application/json
{
"inputs": [
{
"row": {
"account_age_days": 420,
"is_enterprise": true,
"country_code": "US"
}
}
]
}
POST /data/my-database/model-builder/published-model/published-model-123/predict/script
Content-Type: application/json
{
"scriptId": "training-row-script",
"scriptParameters": {
"customerId": "customer-123"
}
}
{
"publishedModelId": "published-model-123",
"modelId": "model-abc",
"inputCount": 1,
"inputs": [
{
"row": {
"account_age_days": 420,
"is_enterprise": true,
"country_code": "US"
}
}
],
"predictions": [
{
"churnRisk": 0.1875
}
],
"rawPredictions": [[0.1875]]
}
Troubleshooting
Most Model Builder issues come from query shape, mapping paths, transform assumptions, or mismatched tensor dimensions. Keep Architecture Checks visible while iterating and run the saved query directly before starting expensive training runs.
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Start Training is disabled | No saved training query is selected or a training run is already running. | Select a Training Data Query in Training Setup and wait for any active run to finish. |
| No usable rows | The saved query returned no records after mapping and invalid value handling. | Run the query directly, confirm row paths, then decide whether invalid values should fail, skip, or use defaults. |
| Tokenizer token count mismatch | Attention or pooling expects a different token count than the upstream tokenizer emits. | For vector inputs, decide whether vectors should project to one token or expand each dimension into a token. |
| Dense input size mismatch | A dense layer is receiving a flat width different from its configured input size. | Update the dense input size or add token pooling / intervals to flat before dense layers. |
| Output head order warning | Multi-head output mappings do not match the model value mapping order. | Assign each output mapping to exactly one head and keep heads ordered like the Outputs list. |
| Published prediction returns unexpected values | Raw model output differs from structured prediction decoding. | Inspect both Structured Predictions and Raw Tensor Output in Test Model. Confirm output mappings and transforms. |
FAQ
What is the Onyx Model Builder?
The Onyx Model Builder is the Onyx Cloud Admin workspace for creating database-scoped neural networks from saved Onyx queries, training them on query rows, and publishing model versions for prediction.
Do I need to export training data before using Model Builder?
No. The training query runs inside the Onyx environment and the backend maps the returned rows into model features and targets.
Can a published model be called from an application?
Yes. Published models expose prediction endpoints that accept raw input objects or execute a saved script and use its returned rows as inputs.
Can one model predict multiple outputs?
Yes. Add multiple Outputs and use a Multi-Head Output layer when each target should have its own named output head.
Can I fine tune a trained model?
Yes. Completed runs with a retained trained model artifact can be fine tuned from the Training Runs panel.
Next Steps
Need Help?
If you have any questions or need assistance:
- Email: support@onyx.dev
- Documentation: Visit our Help Center for tutorials and FAQs.