
Train a Classifier

This tutorial walks through building a complete classification pipeline using the Iris dataset. You’ll learn how to:

  • Load and preview data
  • Configure a classifier
  • Evaluate model performance
  • Interpret results

Time to complete: ~10 minutes

Prerequisites:

  • MLOps Desktop installed
  • Python packages: pip install scikit-learn pandas

First, create a sample dataset. Run the following in a Python session (for example, start python3 in Terminal):

from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris(as_frame=True)
df = iris.frame
df.to_csv("~/Desktop/iris.csv", index=False)
print(f"Saved iris.csv with {len(df)} rows")

This creates iris.csv on your Desktop with 150 samples and 5 columns:

  • sepal length (cm), sepal width (cm), petal length (cm), petal width (cm) — features
  • target — class label (0=setosa, 1=versicolor, 2=virginica)
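If you want to sanity-check the file's contents and the label-to-name mapping before building the pipeline, you can inspect the same frame in Python (a quick check, not part of the pipeline itself):

```python
from sklearn.datasets import load_iris

# Rebuild the same frame in memory to inspect it
iris = load_iris(as_frame=True)
df = iris.frame

print(df.shape)                                  # (150, 5)
print(df["target"].value_counts().sort_index())  # 50 samples per class
print(dict(enumerate(iris.target_names)))        # maps 0/1/2 to species names
```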

Next, build the pipeline in MLOps Desktop:

  1. Create a new pipeline

    Open MLOps Desktop. You’ll see an empty canvas.

  2. Add a DataLoader node

    Click + Add Node → DataLoader.

    Click the node to select it, then in the properties panel:

    • Click Browse and select iris.csv
    • The preview shows your data columns
  3. Add a Trainer node

    Click + Add Node → Trainer.

    Connect the DataLoader’s right handle to the Trainer’s left handle.

  4. Configure the Trainer

    Click the Trainer node. In the properties panel:

    Setting        Value
    Model Type     Random Forest Classifier
    Target Column  target
    Test Size      0.2
    Random State   42

    Random Forest settings:

    Parameter          Value  Why
    n_estimators       100    Number of trees (more = better, slower)
    max_depth          10     Limit tree depth to prevent overfitting
    min_samples_split  2      Minimum samples to split a node
  5. Add an Evaluator node

    Click + Add Node → Evaluator.

    Connect the Trainer’s right handle to the Evaluator’s left handle.

  6. Run the pipeline

    Click Run in the toolbar.

    Watch the output panel as each node executes:

    [DataLoader] Loaded iris.csv: 150 rows, 5 columns
    [Trainer] Training Random Forest on 120 samples...
    [Trainer] Training complete
    [Evaluator] Accuracy: 0.967 (29/30 correct)
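If you'd like to reproduce this run outside the app, the pipeline corresponds roughly to the following scikit-learn sketch. This assumes the Trainer wraps scikit-learn's RandomForestClassifier and a standard train/test split; the app's exact splitting may differ, so your accuracy may not match 0.967 exactly:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
X = iris.frame.drop(columns="target")
y = iris.frame["target"]

# Mirror the Trainer settings: Test Size 0.2, Random State 42
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Mirror the Random Forest settings from step 4
clf = RandomForestClassifier(
    n_estimators=100, max_depth=10, min_samples_split=2, random_state=42
)
clf.fit(X_train, y_train)  # 120 training samples

acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Accuracy: {acc:.3f} ({int(round(acc * len(y_test)))}/{len(y_test)} correct)")
```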

After running, the Evaluator shows several metrics:

Accuracy is the simplest metric: the percentage of correct predictions.

Accuracy = Correct Predictions / Total Predictions
         = 29 / 30
         = 0.967 (96.7%)

The Evaluator also reports three metrics for each class:

Metric     Formula                Interpretation
Precision  TP / (TP + FP)         “Of all predicted positives, how many were correct?”
Recall     TP / (TP + FN)         “Of all actual positives, how many did we find?”
F1 Score   2 × (P × R) / (P + R)  Harmonic mean of precision and recall
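These formulas can be checked with scikit-learn on a small toy example (hypothetical labels chosen for illustration, not the tutorial's actual predictions):

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy labels: one class-1 sample is misclassified as class 2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0
)
for cls, p, r, f in zip([0, 1, 2], precision, recall, f1):
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For class 1 here, precision is 1/1 = 1.0 (its only prediction was correct) but recall is 1/2 = 0.5 (one of its two actual samples was missed), so F1 = 2 × (1.0 × 0.5) / 1.5 ≈ 0.67.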

The confusion matrix is a 3×3 grid showing predictions vs. actual labels:

           Predicted
            0   1   2
Actual 0 [ 10   0   0 ]  ← Setosa (perfect!)
       1 [  0   9   1 ]  ← Versicolor (1 misclassified as Virginica)
       2 [  0   0  10 ]  ← Virginica (perfect!)

Diagonal values = correct predictions. Off-diagonal = errors.
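The matrix above can be reproduced with scikit-learn, using labels that match the counts shown (the exact ordering of samples is hypothetical):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# 10 samples per class; one versicolor (1) predicted as virginica (2)
y_true = [0] * 10 + [1] * 10 + [2] * 10
y_pred = [0] * 10 + [1] * 9 + [2] * 1 + [2] * 10

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)
# Diagonal = correct predictions; everything else = errors
print("correct:", np.trace(cm), "of", cm.sum())  # correct: 29 of 30
```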

Experiment with other classifiers:

Model                Best For
Logistic Regression  Linear decision boundaries, interpretable
Random Forest        Complex relationships, handles outliers
Gradient Boosting    Maximum accuracy, slower training
SVM                  High-dimensional data, binary classification

To switch models:

  1. Click the Trainer node
  2. Change Model Type
  3. Click Run again

Compare accuracy across models to find the best one for your data.
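The same comparison can be scripted in plain scikit-learn, assuming these four models map to their standard scikit-learn estimators (default hyperparameters here, so results may differ from the app's):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris(as_frame=True)
X = iris.frame.drop(columns="target")
y = iris.frame["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "SVM": SVC(),
}

scores = {}
for name, model in models.items():
    # Fit each model on the same split and score it on the held-out test set
    scores[name] = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name:20s} accuracy={scores[name]:.3f}")
```

All four models do well on Iris; on harder datasets the gaps between them become more meaningful.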

Finally, save your work: click Save in the toolbar and name the pipeline “iris-classifier”.

Your pipeline is now saved and can be reloaded anytime from the Load dropdown.


Troubleshooting:

  • “Column ‘target’ not found” — Check your CSV has a target column, or select the correct column name
  • Low accuracy — Try increasing n_estimators or changing the model type
  • “Not enough samples” — Ensure test_size leaves enough data for training (roughly 50 or more samples)