
Track Experiments

MLOps Desktop includes full experiment tracking: run history, experiment grouping, model registry with versioning, and side-by-side comparison. All data stays local in SQLite.

Time to complete: ~10 minutes

Before you begin, you should have:

  • Several pipeline runs with different configurations
  • Familiarity with the Quickstart

The Runs tab shows all pipeline executions with their metrics.

Runs appear in a single list, sorted by date (newest first), with the following columns:

Column      Description
Name        Run display name (editable)
Pipeline    Pipeline that was executed
Experiment  Associated experiment (if any)
Metrics     Key metric (Accuracy or R²)
Duration    Execution time
Status      Success/Failed

Click a run to:

  • View metrics — Full metrics display in Metrics tab
  • View explain data — Load SHAP/importance from this run
  • Edit metadata — Change display name, add notes, tags
  • Register model — Add to model registry
  • Delete — Remove run (with confirmation)

Experiments group related runs for organization.

To create an experiment:

  1. Click the Experiment selector dropdown in the toolbar
  2. Click + New Experiment
  3. Enter a name and description
  4. Click Create

New runs will automatically be associated with the active experiment.

Status     Meaning
Active     Currently running experiments
Completed  Finished experiments (archive candidates)
Archived   Hidden from default view

Filter runs by experiment status in the Runs tab.
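
If you want the same grouping outside the app, experiments and runs can be joined directly in the local database described under Storage below. This is a minimal sketch, not the documented schema: it assumes runs carry an experiment_id foreign key and that experiments have id and name columns, so verify the actual column names first.

import os
import sqlite3
import pandas as pd

db_path = os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
)
conn = sqlite3.connect(db_path)

# Count runs per experiment. Column names (id, name, experiment_id) are
# assumptions -- inspect the schema first if this query fails.
summary = pd.read_sql(
    """
    SELECT e.name AS experiment, COUNT(r.id) AS run_count
    FROM experiments e
    LEFT JOIN runs r ON r.experiment_id = e.id
    GROUP BY e.name
    """,
    conn,
)
conn.close()
print(summary)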

Compare 2-5 runs side-by-side:

  1. In the Runs tab, select multiple runs (Cmd+click)
  2. Click Compare
  3. View the comparison modal
               Run 1   Run 2   Run 3
Accuracy       0.82    0.85    0.87 ✓
Precision      0.79    0.83    0.85 ✓
Recall         0.78    0.81    0.84 ✓
F1 Score       0.78    0.82    0.84 ✓

Parameters:
n_estimators   100     150     200
max_depth      10      15      15
learning_rate  -       -       0.1

The best value in each row is highlighted.
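
You can rebuild a comparison like this outside the app from the same local database (see Storage below). The following is a minimal sketch that assumes run_metrics exposes run_id, name, and value columns for scalar metrics; the real column names are not documented here, so check the schema before relying on it.

import os
import sqlite3
import pandas as pd

db_path = os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
)
conn = sqlite3.connect(db_path)
# Assumed columns: run_id, name, value (scalar metrics only).
metrics = pd.read_sql("SELECT run_id, name, value FROM run_metrics", conn)
conn.close()

# One row per metric, one column per run -- mirrors the comparison modal.
comparison = metrics.pivot(index="name", columns="run_id", values="value")
print(comparison)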

The Models tab manages your model versions.

To register a model from a run:

  1. In the Runs tab, click a successful run
  2. Click Register Model
  3. Enter model name (or select existing)
  4. Add description, tags
  5. Click Register

Each model version has a stage:

Stage       Purpose               Typical Use
None        Just registered       New models
Staging     Testing/validation    Pre-production
Production  Serving live traffic  Current best model
Archived    No longer active      Old versions

Promote models through stages:

None → Staging → Production → Archived

For each registered model in the Models tab, you can:
  • View versions — Expand to see all versions
  • Compare versions — Side-by-side metrics comparison
  • Promote/demote — Change stage
  • View details — File path, format, metrics snapshot
  • Delete — Remove version (with confirmation)
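
Stages are stored in the local model_versions table (see Storage below), so checks such as "what is currently in Production" can be scripted. A minimal sketch, assuming a stage column, which you should verify against your schema:

import os
import sqlite3
import pandas as pd

db_path = os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
)
conn = sqlite3.connect(db_path)
# Assumed column: stage, with values matching the UI labels -- verify first.
production = pd.read_sql(
    "SELECT * FROM model_versions WHERE stage = 'Production'", conn
)
conn.close()
print(production)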

Add tags to runs and model versions for organization:

Tags: baseline, tuned, production-candidate, best-so-far

Filter by tags in both Runs and Models tabs.

Tag            Use For
baseline       Reference model to beat
tuned          After hyperparameter optimization
production     Currently deployed
experiment-v2  Group related experiments
best           Your top performer

Add notes to runs for context:

Display Name: "RF with feature engineering"
Notes: |
  - Added age_group feature
  - Removed cabin column (too many nulls)
  - Stratified by Survived
Tags: [feature-engineering, v2]

Access via Edit Metadata on any run.

Group related runs into experiments:

Experiment: "Titanic - Model Selection"
  Run 1: Logistic Regression
  Run 2: Random Forest
  Run 3: Gradient Boosting

Experiment: "Titanic - RF Tuning"
  Run 1: n_estimators=100
  Run 2: n_estimators=200
  Run 3: Optuna 50 trials

When you get a good result, tag it immediately. It’s easy to forget which run was best.

Before promoting a model:

  1. Compare against current production
  2. Verify improvement on all key metrics
  3. Check explain data for reasonable feature importance

Delete failed or test runs to keep history clean. Use filters to find them:

  • Status: Failed
  • Duration: < 5s (likely errors)
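
If the list is long, the same filters can be expressed as a query against the local database (see Storage below). A minimal sketch, assuming the runs table's status values match the UI labels and that duration is stored in seconds; verify both before trusting the results:

import os
import sqlite3
import pandas as pd

db_path = os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
)
conn = sqlite3.connect(db_path)
# Assumed: status values match the UI labels, duration is in seconds.
candidates = pd.read_sql(
    "SELECT * FROM runs WHERE status = 'Failed' OR duration < 5", conn
)
conn.close()
print(candidates)

Do the actual deletion in the Runs tab so it goes through the normal confirmation.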

All tracking data is stored locally:

~/Library/Application Support/com.mlops.desktop/settings.db

Tables:

  • runs — Run metadata, status, duration
  • run_metrics — Metrics per run (JSON for complex data)
  • models — Model registry
  • model_versions — Versions with stage, tags
  • experiments — Experiment metadata
  • run_annotations — Display names, notes, tags
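
Exact column names are not documented here and may change between releases; a quick way to inspect them before writing queries:

import os
import sqlite3

db_path = os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
)
conn = sqlite3.connect(db_path)
# Print the CREATE TABLE statement for every table in the database.
for name, sql in conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
):
    print(sql)
conn.close()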

Export run history for external analysis:

import os
import sqlite3
import pandas as pd

# sqlite3 will not expand "~", so resolve the full path first
db_path = os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
)
conn = sqlite3.connect(db_path)

runs = pd.read_sql("SELECT * FROM runs", conn)
metrics = pd.read_sql("SELECT * FROM run_metrics", conn)
conn.close()
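
From there the two frames can be joined for ad-hoc analysis. Assuming the runs table's primary key is id and run_metrics references it as run_id (verify with the schema check above), something like:

history = runs.merge(metrics, left_on="id", right_on="run_id")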