# Train a Classifier
This tutorial walks through building a complete classification pipeline using the Iris dataset. You’ll learn how to:
- Load and preview data
- Configure a classifier
- Evaluate model performance
- Interpret results
Time to complete: ~10 minutes
## Prerequisites

- MLOps Desktop installed
- Python packages:

```shell
pip install scikit-learn pandas
```
## Prepare the Dataset

First, create a sample dataset. Open a terminal, start Python, and run:
```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris(as_frame=True)
df = iris.frame
df.to_csv("~/Desktop/iris.csv", index=False)
print(f"Saved iris.csv with {len(df)} rows")
```

This creates iris.csv on your Desktop with 150 samples and 5 columns:

- `sepal length (cm)`, `sepal width (cm)`, `petal length (cm)`, `petal width (cm)` — features
- `target` — class label (0 = setosa, 1 = versicolor, 2 = virginica)
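Before loading the file into MLOps Desktop, it can be worth sanity-checking the dataset in Python. A minimal sketch (using the same `load_iris` data the script above saved):

```python
from sklearn.datasets import load_iris

# Load the same data the CSV was generated from.
iris = load_iris(as_frame=True)
df = iris.frame

# The Iris dataset is perfectly balanced: 50 samples per class.
print(df.shape)                              # (150, 5)
print(df["target"].value_counts().to_dict()) # {0: 50, 1: 50, 2: 50}
print(df.columns.tolist())
```

A balanced class distribution like this is why plain accuracy is a reasonable headline metric later in this tutorial.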
## Build the Pipeline

- **Create a new pipeline**

  Open MLOps Desktop. You'll see an empty canvas.
- **Add a DataLoader node**

  Click **+ Add Node** → **DataLoader**.

  Click the node to select it, then in the properties panel:

  - Click **Browse** and select `iris.csv`
  - The preview shows your data columns
- **Add a Trainer node**

  Click **+ Add Node** → **Trainer**.

  Connect the DataLoader's right handle to the Trainer's left handle.
- **Configure the Trainer**

  Click the Trainer node. In the properties panel:

  | Setting | Value |
  |---|---|
  | Model Type | Random Forest Classifier |
  | Target Column | target |
  | Test Size | 0.2 |
  | Random State | 42 |

  Random Forest settings:

  | Parameter | Value | Why |
  |---|---|---|
  | n_estimators | 100 | Number of trees (more = better, slower) |
  | max_depth | 10 | Limit tree depth to prevent overfitting |
  | min_samples_split | 2 | Minimum samples to split a node |
- **Add an Evaluator node**

  Click **+ Add Node** → **Evaluator**.

  Connect the Trainer's right handle to the Evaluator's left handle.
- **Run the pipeline**

  Click **Run** in the toolbar.

  Watch the output panel as each node executes:

  ```
  [DataLoader] Loaded iris.csv: 150 rows, 5 columns
  [Trainer] Training Random Forest on 120 samples...
  [Trainer] Training complete
  [Evaluator] Accuracy: 0.967 (29/30 correct)
  ```
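The steps above can be reproduced outside the GUI. This is a sketch of roughly what the DataLoader → Trainer → Evaluator chain does in scikit-learn terms, using the same settings as the Trainer node; it is not MLOps Desktop's actual implementation:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# DataLoader: load the dataset (here directly from scikit-learn).
X, y = load_iris(return_X_y=True)

# Trainer: same split settings as the node (Test Size 0.2, Random State 42),
# which yields 120 training and 30 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(
    n_estimators=100, max_depth=10, min_samples_split=2, random_state=42
)
clf.fit(X_train, y_train)

# Evaluator: accuracy on the held-out 30 samples.
print(f"Accuracy: {clf.score(X_test, y_test):.3f}")
```

Exact accuracy depends on the split and the forest's random seed, so your number may differ slightly from the 0.967 shown in the output panel.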
## Understanding Classification Metrics

After running, the Evaluator shows several metrics:
### Accuracy

The simplest metric: the percentage of correct predictions.

```
Accuracy = Correct Predictions / Total Predictions = 29 / 30 = 0.967 (96.7%)
```

### Precision, Recall, F1

For each class:
| Metric | Formula | Interpretation |
|---|---|---|
| Precision | TP / (TP + FP) | “Of all predicted positives, how many were correct?” |
| Recall | TP / (TP + FN) | “Of all actual positives, how many did we find?” |
| F1 Score | 2 × (P × R) / (P + R) | Harmonic mean of precision and recall |
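The formulas above can be checked on a small hand-made binary example (a sketch; the Evaluator computes these per class automatically). With the labels below there are TP = 2, FP = 1, FN = 1:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]  # TP=2, FP=1, FN=1

print(precision_score(y_true, y_pred))  # 2 / (2 + 1) ≈ 0.667
print(recall_score(y_true, y_pred))     # 2 / (2 + 1) ≈ 0.667
print(f1_score(y_true, y_pred))         # 2 × (P × R) / (P + R) ≈ 0.667
```

When precision and recall are equal, as here, the F1 score equals both of them.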
### Confusion Matrix

A 3×3 matrix showing predictions vs. actual labels:

```
              Predicted
               0    1    2
Actual 0  [   10    0    0 ]  ← Setosa (perfect!)
       1  [    0    9    1 ]  ← Versicolor (1 misclassified as Virginica)
       2  [    0    0   10 ]  ← Virginica (perfect!)
```

Diagonal values = correct predictions. Off-diagonal = errors.
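You can reconstruct the same matrix with scikit-learn. This sketch uses hypothetical label vectors that reproduce the one Versicolor-as-Virginica error shown above:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a 30-sample test set: 10 of each class,
# with one actual Versicolor (1) predicted as Virginica (2).
y_true = [0] * 10 + [1] * 10 + [2] * 10
y_pred = [0] * 10 + [1] * 9 + [2] * 1 + [2] * 10

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[10  0  0]
#  [ 0  9  1]
#  [ 0  0 10]]
```

The diagonal sums to 29, matching the Evaluator's 29/30 = 0.967 accuracy.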
## Try Different Models

Experiment with other classifiers:
| Model | Best For |
|---|---|
| Logistic Regression | Linear decision boundaries, interpretable |
| Random Forest | Complex relationships, handles outliers |
| Gradient Boosting | Maximum accuracy, slower training |
| SVM | High-dimensional data, binary classification |
To switch models:

- Click the Trainer node
- Change **Model Type**
- Click **Run** again
Compare accuracy across models to find the best one for your data.
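The same comparison can be done in plain scikit-learn. A sketch using 5-fold cross-validation rather than a single train/test split (the exact estimator settings MLOps Desktop uses for each Model Type are an assumption here):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One candidate per row of the table above.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "SVM": SVC(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Cross-validation averages over five different splits, so it gives a more stable comparison than the single 80/20 split the Trainer node uses.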
## Save Your Pipeline

Click **Save** and name it "iris-classifier".
Your pipeline is now saved and can be reloaded anytime from the Load dropdown.
## Next Steps

- Find optimal model settings with Optuna
- **Explain Predictions** — understand why your model makes predictions
Troubleshooting:

- **"Column 'target' not found"** — check that your CSV has a `target` column, or select the correct column name
- **Low accuracy** — try increasing `n_estimators` or changing the model type
- **"Not enough samples"** — ensure the test size leaves enough data for training (at least 50+ samples)