Track Experiments
MLOps Desktop includes full experiment tracking: run history, experiment grouping, model registry with versioning, and side-by-side comparison. All data stays local in SQLite.
Time to complete: ~10 minutes
Prerequisites
- Run several pipelines with different configurations
- Familiarity with the Quickstart
Runs Tab
The Runs tab shows all pipeline executions with metrics.
View Modes
List view: all runs in a single list, sorted by date (newest first).
| Column | Description |
|---|---|
| Name | Run display name (editable) |
| Pipeline | Pipeline that was executed |
| Experiment | Associated experiment (if any) |
| Metrics | Key metric (Accuracy or R²) |
| Duration | Execution time |
| Status | Success/Failed |
Grouped view: runs grouped by experiment.
```
▼ Titanic Classification (3 runs)
    Run 1: Accuracy 0.82
    Run 2: Accuracy 0.85
    Run 3: Accuracy 0.87
▼ Iris Baseline (2 runs)
    Run 1: Accuracy 0.96
    Run 2: Accuracy 0.97
▼ Unassigned (5 runs)
    ...
```
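Because every run is stored in the local SQLite database described under Data Storage below, the grouped view can also be approximated programmatically. This is only a sketch: it relies on the documented runs and experiments tables, while the column names in the commented-out lines are guesses to verify against the real schema first.

```python
import os
import sqlite3

import pandas as pd

# Path documented in the Data Storage section of this page
db_path = os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
)
conn = sqlite3.connect(db_path)

runs = pd.read_sql("SELECT * FROM runs", conn)
experiments = pd.read_sql("SELECT * FROM experiments", conn)

# Inspect the real column names before relying on them
print(runs.columns.tolist())
print(experiments.columns.tolist())

# Grouped view, assuming runs reference experiments via an "experiment_id"
# column (a guess, not the documented schema):
# grouped = runs.merge(experiments, left_on="experiment_id", right_on="id",
#                      suffixes=("_run", "_exp"))
# print(grouped.groupby("name_exp").size())
```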
Run Actions
Click a run to:
- View metrics — Full metrics display in Metrics tab
- View explain data — Load SHAP/importance from this run
- Edit metadata — Change display name, add notes, tags
- Register model — Add to model registry
- Delete — Remove run (with confirmation)
Experiments
Experiments group related runs for organization.
Create an Experiment
- Click the Experiment selector dropdown in the toolbar
- Click + New Experiment
- Enter a name and description
- Click Create
New runs will automatically be associated with the active experiment.
Experiment Status
| Status | Meaning |
|---|---|
| Active | Currently running experiments |
| Completed | Finished experiments (archive candidates) |
| Archived | Hidden from default view |
Filter runs by experiment status in the Runs tab.
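The same filter can be reproduced against the local database. The query below assumes an experiments.status column and a runs.experiment_id foreign key; these names are inferred from the table descriptions under Data Storage, not a documented schema.

```python
import os
import sqlite3

import pandas as pd

conn = sqlite3.connect(os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
))

# "e.status", "e.id" and "r.experiment_id" are assumed names -- verify with
# PRAGMA table_info(experiments) / PRAGMA table_info(runs) first.
active_runs = pd.read_sql(
    """
    SELECT r.*
    FROM runs r
    JOIN experiments e ON e.id = r.experiment_id
    WHERE e.status = 'Active'
    """,
    conn,
)
print(len(active_runs), "runs in active experiments")
```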
Compare Runs
Compare 2-5 runs side-by-side:
- In the Runs tab, select multiple runs (Cmd+click)
- Click Compare
- View the comparison modal
Comparison View
```
              Run 1   Run 2   Run 3
Accuracy      0.82    0.85    0.87 ✓
Precision     0.79    0.83    0.85 ✓
Recall        0.78    0.81    0.84 ✓
F1 Score      0.78    0.82    0.84 ✓

Parameters:
n_estimators    100     150     200
max_depth       10      15      15
learning_rate   -       -       0.1
```
The best value in each row is highlighted.
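If you prefer to build a comparison table outside the app, the run_metrics table can be pivoted into the same side-by-side layout. This sketch assumes run_metrics stores one row per run and metric with name/value columns; as noted under Data Storage, complex data may be stored as JSON, so check the actual layout first.

```python
import os
import sqlite3

import pandas as pd

conn = sqlite3.connect(os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
))

metrics = pd.read_sql("SELECT * FROM run_metrics", conn)
print(metrics.columns.tolist())   # confirm the real layout first

# Assuming long format with run_id / name / value columns (a guess):
# wide = metrics.pivot(index="name", columns="run_id", values="value")
# print(wide[[1, 2, 3]])          # the run IDs you want side by side
```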
Model Registry
The Models tab manages your model versions.
Register a Model
- In the Runs tab, click a successful run
- Click Register Model
- Enter model name (or select existing)
- Add description, tags
- Click Register
Model Stages
Each model version has a stage:
| Stage | Purpose | Typical Use |
|---|---|---|
| None | Just registered | New models |
| Staging | Testing/validation | Pre-production |
| Production | Serving live traffic | Current best model |
| Archived | No longer active | Old versions |
Promote models through stages:
```
None → Staging → Production → Archived
```
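Outside the UI, the current stage of each version can be read from the model_versions table. The stage column name and the literal stage value below are assumptions inferred from the table description under Data Storage; verify them before relying on this query.

```python
import os
import sqlite3

import pandas as pd

conn = sqlite3.connect(os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
))

# "stage" and the literal 'Production' are assumptions -- check with
# PRAGMA table_info(model_versions) and a quick SELECT DISTINCT stage.
production = pd.read_sql(
    "SELECT * FROM model_versions WHERE stage = 'Production'", conn
)
print(production)
```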
Model Actions
- View versions — Expand to see all versions
- Compare versions — Side-by-side metrics comparison
- Promote/demote — Change stage
- View details — File path, format, metrics snapshot
- Delete — Remove version (with confirmation)
Tagging
Add tags to runs and model versions for organization:
```
Tags: baseline, tuned, production-candidate, best-so-far
```
Filter by tags in both Runs and Models tabs.
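Run tags live in the run_annotations table (see Data Storage). How they are serialized there is not documented, so the filter below is a sketch that assumes a tags column containing a JSON or comma-separated string.

```python
import os
import sqlite3

import pandas as pd

conn = sqlite3.connect(os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
))

annotations = pd.read_sql("SELECT * FROM run_annotations", conn)
print(annotations.columns.tolist())   # confirm how tags are stored

# Assuming a text "tags" column, a substring match finds candidates:
# baseline = annotations[annotations["tags"].fillna("").str.contains("baseline")]
# print(baseline)
```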
Tag Best Practices
| Tag | Use For |
|---|---|
| baseline | Reference model to beat |
| tuned | After hyperparameter optimization |
| production | Currently deployed |
| experiment-v2 | Group related experiments |
| best | Your top performer |
Annotations
Add notes to runs for context:
```
Display Name: "RF with feature engineering"
Notes: |
  - Added age_group feature
  - Removed cabin column (too many nulls)
  - Stratified by Survived
Tags: [feature-engineering, v2]
```
Access via Edit Metadata on any run.
Best Practices
Always Use Experiments
Group related runs into experiments:
Experiment: "Titanic - Model Selection" Run 1: Logistic Regression Run 2: Random Forest Run 3: Gradient Boosting
Experiment: "Titanic - RF Tuning" Run 1: n_estimators=100 Run 2: n_estimators=200 Run 3: Optuna 50 trialsTag Important Runs
When you get a good result, tag it immediately. It’s easy to forget which run was best.
Compare Before Deploying
Before promoting a model:
- Compare against current production
- Verify improvement on all key metrics
- Check explain data for reasonable feature importance
Clean Up Failed Runs
Delete failed or test runs to keep history clean. Use filters to find them:
- Status: Failed
- Duration: < 5s (likely errors)
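To locate these runs in bulk, the same filters can be expressed against the runs table. Column names and the duration unit below are assumptions; this sketch only lists suspects, and deleting them is still best done from the Runs tab so related records stay consistent.

```python
import os
import sqlite3

import pandas as pd

conn = sqlite3.connect(os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
))

runs = pd.read_sql("SELECT * FROM runs", conn)
print(runs.columns.tolist())   # confirm "status" / "duration" names and units

# Assuming a "status" text column and a "duration" column in seconds:
# suspects = runs[(runs["status"] == "Failed") | (runs["duration"] < 5)]
# print(suspects)
```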
Data Storage
All tracking data is stored locally:
```
~/Library/Application Support/com.mlops.desktop/settings.db
```
Tables:
- runs — Run metadata, status, duration
- run_metrics — Metrics per run (JSON for complex data)
- models — Model registry
- model_versions — Versions with stage, tags
- experiments — Experiment metadata
- run_annotations — Display names, notes, tags
Export Data
Export run history for external analysis:
```python
import os
import sqlite3

import pandas as pd

# sqlite3 does not expand "~", so resolve the full path first
db_path = os.path.expanduser(
    "~/Library/Application Support/com.mlops.desktop/settings.db"
)
conn = sqlite3.connect(db_path)

runs = pd.read_sql("SELECT * FROM runs", conn)
metrics = pd.read_sql("SELECT * FROM run_metrics", conn)
```
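Continuing the snippet above, the DataFrames can be written out for use in other tools (the CSV file names here are arbitrary):

```python
# Continues from the export snippet above
runs.to_csv("runs_export.csv", index=False)
metrics.to_csv("run_metrics_export.csv", index=False)
conn.close()
```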