Train a Classifier
Pick an algorithm, set params, click Train. This tutorial walks through building a complete classification pipeline using the Iris dataset. You’ll learn how to:
- Load and preview data
- Configure a classifier
- Evaluate model performance
- Interpret results
Time to complete: ~10 minutes
Prerequisites
- MLOps Desktop installed
- Python packages:

  ```sh
  pip install scikit-learn pandas
  ```
Prepare the Dataset
First, create a sample dataset by running this Python script:

```python
from sklearn.datasets import load_iris
from pathlib import Path
import pandas as pd

iris = load_iris(as_frame=True)
df = iris.frame
df.to_csv(Path.home() / "Desktop" / "iris.csv", index=False)
print(f"Saved iris.csv with {len(df)} rows")
```

This creates iris.csv on your Desktop with 150 samples and 5 columns:

- sepal length (cm), sepal width (cm), petal length (cm), petal width (cm) — features
- target — class label (0 = setosa, 1 = versicolor, 2 = virginica)
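Before wiring the file into MLOps Desktop, you can sanity-check the frame directly in Python (plain scikit-learn, independent of the app):

```python
from sklearn.datasets import load_iris

# Inspect the same frame the script above saved to iris.csv
df = load_iris(as_frame=True).frame

print(df.shape)                               # (150, 5): 150 samples, 4 features + target
print(list(df.columns))                       # the four measurements plus 'target'
print(df["target"].value_counts().to_dict())  # 50 samples of each class
```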
Build the Pipeline
1. Create a new pipeline

   Open MLOps Desktop. You’ll see an empty canvas.

2. Add a DataLoader node

   Click + Add Node → DataLoader.

   Click the node to select it, then in the properties panel:

   - Click Browse and select iris.csv
   - The preview shows your data columns

3. Add a Trainer node

   Click + Add Node → Trainer.

   Connect the DataLoader’s right handle to the Trainer’s left handle.

4. Configure the Trainer

   Click the Trainer node. In the properties panel:

   | Setting | Value |
   |---|---|
   | Model Type | Random Forest Classifier |
   | Target Column | target |
   | Test Size | 0.2 |
   | Random State | 42 |

   Random Forest settings:

   | Parameter | Value | Why |
   |---|---|---|
   | n_estimators | 100 | Number of trees (more = better, slower) |
   | max_depth | 10 | Limit tree depth to prevent overfitting |
   | min_samples_split | 2 | Minimum samples to split a node |

5. Add an Evaluator node

   Click + Add Node → Evaluator.

   Connect the Trainer’s right handle to the Evaluator’s left handle.

6. Run the pipeline

   Click Run in the toolbar.

   Watch the output panel as each node executes:

   ```
   [DataLoader] Loaded iris.csv: 150 rows, 5 columns
   [Trainer] Training Random Forest on 120 samples...
   [Trainer] Training complete
   [Evaluator] Accuracy: 0.967 (29/30 correct)
   ```
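A run like this can be reproduced outside the app with plain scikit-learn. The sketch below mirrors the pipeline's settings as described, not MLOps Desktop's actual internals; the model's random_state is an added assumption for reproducibility:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the same data the DataLoader node reads
df = load_iris(as_frame=True).frame
X, y = df.drop(columns="target"), df["target"]

# Same split settings as the Trainer node: Test Size 0.2, Random State 42
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same Random Forest settings as the properties panel
model = RandomForestClassifier(
    n_estimators=100, max_depth=10, min_samples_split=2, random_state=42
)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {acc:.3f}")
```

Exact accuracy depends on the split, so small differences from the app's output are expected.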
Understanding Classification Metrics
After running, the Evaluator shows several metrics:
Accuracy
The simplest metric: percentage of correct predictions.
    Accuracy = Correct Predictions / Total Predictions = 29 / 30 = 0.967 (96.7%)

Precision, Recall, F1

For each class:
| Metric | Formula | Interpretation |
|---|---|---|
| Precision | TP / (TP + FP) | “Of all predicted positives, how many were correct?” |
| Recall | TP / (TP + FN) | “Of all actual positives, how many did we find?” |
| F1 Score | 2 × (P × R) / (P + R) | Harmonic mean of precision and recall |
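These formulas can be checked numerically with scikit-learn's metric functions. A small sketch on invented binary labels:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy binary labels, invented for illustration
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

# TP = 3, FP = 1, FN = 1
p = precision_score(y_true, y_pred)  # 3 / (3 + 1) = 0.75
r = recall_score(y_true, y_pred)     # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)        # 2 * (0.75 * 0.75) / (0.75 + 0.75) = 0.75
print(p, r, f1)
```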
Confusion Matrix
A 3×3 matrix showing predictions vs actual labels:

```
            Predicted
             0   1   2
Actual 0  [ 10   0   0 ]  ← Setosa (perfect!)
       1  [  0   9   1 ]  ← Versicolor (1 misclassified as Virginica)
       2  [  0   0  10 ]  ← Virginica (perfect!)
```

Diagonal values = correct predictions. Off-diagonal = errors.
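The same matrix can be reproduced with scikit-learn's confusion_matrix; the labels below are constructed to mirror the 30-sample test set above:

```python
from sklearn.metrics import confusion_matrix

# Actual labels: 10 samples of each class
y_true = [0] * 10 + [1] * 10 + [2] * 10
# Predictions: one versicolor (class 1) misclassified as virginica (class 2)
y_pred = [0] * 10 + [1] * 9 + [2] * 11

# Rows = actual class, columns = predicted class; diagonal = correct
cm = confusion_matrix(y_true, y_pred)
print(cm)
```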
Try Different Models
Experiment with other classifiers:
| Model | Best For |
|---|---|
| Logistic Regression | Linear decision boundaries, interpretable |
| Random Forest | Complex relationships, handles outliers |
| Gradient Boosting | Maximum accuracy, slower training |
| SVM | High-dimensional data, binary classification |
To switch models:
- Click the Trainer node
- Change Model Type
- Click Run again
Compare accuracy across models to find the best one for your data.
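This compare-and-pick loop can also be scripted. A sketch evaluating the same four model families with scikit-learn defaults (the hyperparameters here are illustrative, not the app's exact settings):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

df = load_iris(as_frame=True).frame
X, y = df.drop(columns="target"), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "SVM": SVC(),
}

# Train each model on the same split and compare test accuracy
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```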
Save Your Pipeline
Click Save and name it “iris-classifier”.
Your pipeline is now saved and can be reloaded anytime from the Load dropdown.
Next Steps
Troubleshooting:

- “Column ‘target’ not found” — Check that your CSV has a target column, or select the correct column name
- Low accuracy — Try increasing n_estimators or changing the model type
- “Not enough samples” — Ensure test_size leaves enough data for training (at least 50+ samples)
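For the first error, a quick pre-flight check of the CSV header can save a failed run. A sketch using pandas (the helper name is made up for illustration):

```python
import pandas as pd

def check_target_column(csv_path, target="target"):
    """Hypothetical helper: return True if the CSV has the expected label column."""
    # Read only the header row, not the whole file
    cols = pd.read_csv(csv_path, nrows=0).columns
    if target not in cols:
        print(f"'{target}' not found; available columns: {list(cols)}")
        return False
    return True
```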