Trainer Node
The Trainer node is the core of model training in MLOps Desktop. It supports three modes: training new models, loading pre-trained models, and hyperparameter tuning with Optuna.
Overview
| Property | Value |
|---|---|
| Type | Processing node |
| Inputs | DataFrame (from DataLoader or DataSplit) |
| Outputs | Trained model |
| Library | scikit-learn |
| Modes | Train, Load, Tune |
Operating Modes
The Trainer node has three modes, selected via toggle buttons:
Train — Train a new model from scratch.
Configuration:
- Model Type — Select from 12 algorithms
- Target Column — Column to predict
- Test Split — Ratio for train/test split (if not using DataSplit node)
Best for initial model development and quick experiments.
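Conceptually, Train mode performs a standard scikit-learn fit on a train/test split. A minimal sketch, assuming a Random Forest Classifier is selected; the column names, tiny DataFrame, and 0.2 split ratio are illustrative, not the node's actual internals:

```python
# Sketch of what Train mode does conceptually: fit a chosen
# scikit-learn model on a train/test split of a DataFrame.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "feature_b": [8, 7, 6, 5, 4, 3, 2, 1],
    "target":    [0, 0, 0, 0, 1, 1, 1, 1],
})

X = df.drop(columns=["target"])   # everything except the Target Column
y = df["target"]                  # the Target Column setting

# Test Split setting: hold out 20% for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

If a DataSplit node feeds the Trainer, the split step is already done upstream and the Test Split setting is not used.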
Load — Load a pre-trained model from disk.
Configuration:
- Model File Path — Path to a .joblib, .pkl, or .pickle file
Best for using models trained elsewhere or resuming work.
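Load mode amounts to deserializing a previously saved estimator. A sketch using joblib; the temp-file path stands in for the node's Model File Path setting, and the save step only simulates a model "trained elsewhere" (.pkl/.pickle files can be read the same way with the pickle module):

```python
# Sketch of Load mode: deserialize a model saved with joblib
# and reuse it without retraining.
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Simulate a model trained elsewhere and saved to disk
X, y = make_classification(n_samples=50, n_features=4, random_state=0)
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(LogisticRegression(max_iter=200).fit(X, y), path)

# What Load mode does: read the file and get back the fitted model
model = joblib.load(path)
print(model.predict(X[:3]))
```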
Tune — Automatically find optimal hyperparameters with Optuna.
Configuration:
- Model Type — Model to tune (Linear Regression is disabled; it has no tunable parameters)
- Target Column — Column to predict
- Tuning Config — Search strategy, trials, CV folds, search space
Best for maximizing model performance before production.
Supported Models
Classification Models (6)
| Model | Description | Key Hyperparameters |
|---|---|---|
| Logistic Regression | Linear classifier, interpretable | C, max_iter |
| Random Forest Classifier | Ensemble of decision trees | n_estimators, max_depth, min_samples_split |
| Gradient Boosting Classifier | Sequential boosting, high accuracy | n_estimators, learning_rate, max_depth |
| SVM (SVC) | Support vector machine | C, kernel, gamma |
| KNN Classifier | Distance-based classification | n_neighbors, weights, metric |
| MLP Classifier | Neural network | hidden_layer_sizes, alpha, learning_rate_init |
Regression Models (6)
| Model | Description | Key Hyperparameters |
|---|---|---|
| Linear Regression | Simple linear model | None (no tuning) |
| Random Forest Regressor | Ensemble for regression | n_estimators, max_depth, min_samples_split |
| Gradient Boosting Regressor | Boosted trees for regression | n_estimators, learning_rate, max_depth |
| SVM (SVR) | Support vector regression | C, kernel, gamma |
| KNN Regressor | Distance-based regression | n_neighbors, weights, metric |
| MLP Regressor | Neural network for regression | hidden_layer_sizes, alpha, learning_rate_init |
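The two tables map one-to-one onto scikit-learn estimator classes. A hypothetical registry showing that mapping; the dict structure is illustrative, not the node's actual source code:

```python
# Hypothetical registry: the node's model names → scikit-learn classes.
# The keys mirror the two tables above (6 classifiers, 6 regressors).
from sklearn.ensemble import (
    GradientBoostingClassifier, GradientBoostingRegressor,
    RandomForestClassifier, RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.svm import SVC, SVR

MODELS = {
    "classification": {
        "Logistic Regression": LogisticRegression,
        "Random Forest Classifier": RandomForestClassifier,
        "Gradient Boosting Classifier": GradientBoostingClassifier,
        "SVM (SVC)": SVC,
        "KNN Classifier": KNeighborsClassifier,
        "MLP Classifier": MLPClassifier,
    },
    "regression": {
        "Linear Regression": LinearRegression,
        "Random Forest Regressor": RandomForestRegressor,
        "Gradient Boosting Regressor": GradientBoostingRegressor,
        "SVM (SVR)": SVR,
        "KNN Regressor": KNeighborsRegressor,
        "MLP Regressor": MLPRegressor,
    },
}

# Instantiate by task and name, e.g.:
model = MODELS["classification"]["Random Forest Classifier"](n_estimators=100)
```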
Hyperparameter Tuning
When Tune mode is selected, click the tuning config button to open the TuningPanel.
Search Strategies
| Strategy | Description | Best For |
|---|---|---|
| Bayesian (TPE) | Tree-structured Parzen Estimator, learns from past trials | Most cases (default) |
| Random | Uniform random sampling | Baseline comparison |
| Grid | Exhaustive enumeration of all combinations | Small, discrete spaces |
Tuning Configuration
| Setting | Range | Default | Description |
|---|---|---|---|
| Number of Trials | 1-1000 | 50 | How many configurations to try |
| CV Folds | 2-10 | 3 | Cross-validation folds |
| Scoring Metric | varies | accuracy/r2 | Metric to optimize |
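The CV Folds and Scoring Metric settings together determine how each trial is scored: presumably a mean cross-validation score, as sketched below with scikit-learn (the toy dataset and fixed hyperparameters are illustrative):

```python
# Sketch of the per-trial scoring implied by the settings above:
# mean cross-validation score with the chosen folds and metric.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=120, n_features=6, random_state=0)

scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y,
    cv=3,                # CV Folds (default 3)
    scoring="accuracy",  # Scoring Metric (classification default)
)
print(scores.mean())     # the value a trial would report
```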
Search Spaces by Model
Each model has predefined search ranges:
Random Forest:
- n_estimators: 50-300 (step 50)
- max_depth: [null, 10, 15, 20, 30]
- min_samples_split: 2-10 (step 2)
- min_samples_leaf: 1-4 (step 1)

Gradient Boosting:
- n_estimators: 50-300 (step 50)
- learning_rate: 0.01-0.3 (log scale)
- max_depth: 3-8 (step 1)
- subsample: 0.7-1.0 (uniform)

SVM (SVC/SVR):
- C: 0.1-100 (log scale)
- kernel: [rbf, linear, poly]
- gamma: [scale, auto]

KNN:
- n_neighbors: 3-21 (step 2)
- weights: [uniform, distance]
- metric: [euclidean, manhattan, minkowski]

MLP Neural Network:
- hidden_layer_sizes: [(50,), (100,), (100,50), (100,100)]
- alpha: 0.0001-0.1 (log scale)
- learning_rate_init: 0.0001-0.1 (log scale)
- max_iter: 200-1000 (step 100)

Scoring Metrics
Classification:
- Accuracy
- F1 Score
- Precision
- Recall
- ROC AUC
Regression:
- R² Score
- Neg MSE
- Neg MAE
- Neg RMSE
Automatic Preprocessing
The Trainer automatically handles:
- Missing Values — Numeric columns filled with median, categorical with mode
- Categorical Encoding — Label encoding for all categorical columns
- ID Column Filtering — Drops columns like id, index, name, ticket, cabin
- High-Cardinality Columns — Drops columns with >50 unique values
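The four rules above can be sketched as a small pandas routine. This is an illustrative reimplementation, not the Trainer's actual code; the `preprocess` helper, the toy DataFrame, and the numeric-dtype check are assumptions, while the median/mode fills, label encoding, ID-like names, and the 50-unique-value threshold come from the list above:

```python
# Sketch of the automatic preprocessing rules described above.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

ID_LIKE = {"id", "index", "name", "ticket", "cabin"}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Rule: drop ID-like columns
    df = df.drop(columns=[c for c in df.columns if c.lower() in ID_LIKE])
    for col in list(df.columns):
        if df[col].dtype.kind in "biufc":
            # Rule: fill numeric missing values with the median
            df[col] = df[col].fillna(df[col].median())
        else:
            # Rule: drop high-cardinality categoricals (>50 unique values)
            if df[col].nunique() > 50:
                df = df.drop(columns=[col])
                continue
            # Rule: fill categorical missing values with the mode,
            # then label-encode
            df[col] = df[col].fillna(df[col].mode().iloc[0])
            df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    return df


raw = pd.DataFrame({
    "id": [1, 2, 3],
    "age": [22.0, None, 35.0],
    "sex": ["m", "f", None],
    "target": [0, 1, 0],
})
clean = preprocess(raw)
print(clean)
```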
Connections
| Direction | Node Types |
|---|---|
| Input from | DataLoader, DataSplit |
| Output to | Evaluator, ModelExporter |
Typical pipeline:
DataLoader → DataSplit → Trainer → Evaluator

Trials Panel
When tuning, results appear in the Trials tab:
| Column | Description |
|---|---|
| Trial # | Trial number |
| Score | Cross-validation score |
| Parameters | Hyperparameter values used |
| Duration | Time taken |
| Status | Complete, Pruned, or Failed |
The best trial is highlighted with a star icon.
Common Issues
“Linear Regression cannot be tuned”
Linear Regression has no tunable hyperparameters. Use Train mode instead, or select a different model.
“Optuna not installed”
Install Optuna:
pip install optuna

Tuning is slow
- Reduce number of trials
- Use Random search instead of Grid
- Reduce CV folds (minimum 2)
- Use a faster model (Logistic Regression vs MLP)
“Target column not found”
Check that the column name matches exactly (case-sensitive). Use the DataLoader preview to verify column names.
Related Nodes
- DataLoader — Load training data
- DataSplit — Split into train/test sets
- Evaluator — Evaluate trained models
- ModelExporter — Export models