# Exporter Node

The Exporter node saves trained models to disk in various formats for deployment or later use.
## Overview

| Property | Value |
|---|---|
| Type | Terminal node |
| Inputs | Trained model (from Trainer) |
| Outputs | Model file(s) on disk |
| Formats | joblib, pickle, ONNX (coming soon) |
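At its core, exporting with joblib or pickle just serializes the fitted estimator to a file that can be reopened later. Below is a minimal stdlib sketch of that round trip using pickle; `TinyModel` is a hypothetical stand-in, since in practice the node writes a real fitted scikit-learn model:

```python
import pickle

class TinyModel:
    """Hypothetical stand-in for a trained estimator (illustration only)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        return [1 if x > self.threshold else 0 for x in xs]

model = TinyModel(threshold=0.5)

# Export: serialize the fitted model to a file
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (e.g. in a deployment environment): load it back and predict
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded.predict([0.2, 0.9]))  # [0, 1]
```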
## Configuration

### Output Path

Where to save the model file. For example:

```
~/Desktop/my_model.joblib
/path/to/models/classifier_v1.pkl
```

### Export Format
#### joblib

Best for: Python applications

- Efficient for NumPy arrays
- Smaller file sizes for large models
- Fast loading

```python
import joblib

model = joblib.load("model.joblib")
```

#### pickle

Best for: Python applications (built-in)

- No extra dependencies
- Standard Python format
- Works everywhere

```python
import pickle

with open("model.pkl", "rb") as f:
    model = pickle.load(f)
```

#### ONNX

Best for: Cross-platform deployment

- Run in any language
- Optimized inference
- GPU acceleration

```python
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
```

### Include Metadata
When enabled, saves a companion `.json` file with:

```json
{
  "model_type": "RandomForestClassifier",
  "sklearn_version": "1.3.0",
  "feature_names": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
  "target_names": ["setosa", "versicolor", "virginica"],
  "n_features": 4,
  "n_classes": 3,
  "training_date": "2024-01-15T10:30:00Z",
  "metrics": {
    "accuracy": 0.967,
    "f1_weighted": 0.965
  },
  "hyperparameters": {
    "n_estimators": 100,
    "max_depth": 10
  }
}
```

## Output Files
When you export `model.joblib` with metadata:

| File | Contents |
|---|---|
| `model.joblib` | The trained model |
| `model_meta.json` | Metadata (features, metrics, etc.) |
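Because the companion file is plain JSON, downstream code can sanity-check it before using the model. A small sketch, assuming the metadata layout shown above (only a few keys reproduced here):

```python
import json

# A trimmed-down example of the companion metadata file
meta = json.loads("""
{
  "model_type": "RandomForestClassifier",
  "n_features": 4,
  "feature_names": ["sepal_length", "sepal_width", "petal_length", "petal_width"]
}
""")

# A consistent companion file lets consumers validate inputs before predicting
assert meta["n_features"] == len(meta["feature_names"])
print(meta["model_type"])  # RandomForestClassifier
```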
## Loading Exported Models

### In Python

```python
import joblib
import json

# Load model
model = joblib.load("model.joblib")

# Load metadata
with open("model_meta.json") as f:
    meta = json.load(f)

# Validate input
expected_features = meta["feature_names"]
print(f"Model expects {len(expected_features)} features: {expected_features}")

# Make predictions
import numpy as np

X_new = np.array([[5.1, 3.5, 1.4, 0.2]])
prediction = model.predict(X_new)
print(f"Predicted: {meta['target_names'][prediction[0]]}")
```

### In a Flask API
```python
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)
model = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    data = request.json
    X = np.array(data["features"])
    predictions = model.predict(X)
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

## Model Versioning
Best practices for managing model versions:

### File Naming Convention

```
{model_name}_v{version}_{algorithm}_{date}.joblib
```

Examples:

```
churn_model_v1_rf_2024-01-15.joblib
churn_model_v2_gb_tuned_2024-01-20.joblib
```
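The convention above is easy to generate programmatically. A small helper sketch; `versioned_filename` is a hypothetical name, not part of the node:

```python
from datetime import date

def versioned_filename(model_name, version, algorithm, day):
    """Build a filename following {model_name}_v{version}_{algorithm}_{date}.joblib."""
    return f"{model_name}_v{version}_{algorithm}_{day.isoformat()}.joblib"

print(versioned_filename("churn_model", 1, "rf", date(2024, 1, 15)))
# churn_model_v1_rf_2024-01-15.joblib
```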
### Directory Structure

```
models/
├── production/
│   └── churn_model_current.joblib -> ../v2/churn_model.joblib
├── v1/
│   ├── churn_model.joblib
│   └── churn_model_meta.json
└── v2/
    ├── churn_model.joblib
    └── churn_model_meta.json
```

### Git LFS for Large Models

For models larger than 100 MB, use Git LFS:

```shell
git lfs install
git lfs track "*.joblib"
git add .gitattributes
git add models/
git commit -m "Add trained model"
```

## Common Issues
### "Incompatible sklearn version"

Models saved with one scikit-learn version may not load with another.

Solution: Check the `sklearn_version` in the metadata and install the matching version:

```shell
pip install scikit-learn==1.3.0
```

### Model file is too large
Large models (especially Random Forest models with many trees) can take up significant disk space.

Solutions:

- Reduce `n_estimators`
- Use compression: `joblib.dump(model, "model.joblib", compress=3)`
- Export to ONNX for smaller files (coming soon)
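To see why the `compress` option helps, note that model files often contain highly repetitive data. A rough stdlib illustration using pickle and gzip; joblib's `compress` uses similar zlib-family compression, so the effect is comparable:

```python
import gzip
import pickle

# Stand-in for a large model: repetitive parameters compress very well
params = [0.0] * 100_000

raw = pickle.dumps(params)
compressed = gzip.compress(raw, compresslevel=3)

print(len(raw), len(compressed))
assert len(compressed) < len(raw)
```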
### "ModuleNotFoundError" when loading

The loading environment must have the same packages as the training environment.

Solution: Include a `requirements.txt`:

```
scikit-learn==1.3.0
pandas==2.0.0
numpy==1.24.0
```

## Security Considerations
Pickle-based formats (including joblib) can execute arbitrary code when a file is loaded, so treat model files like executable code.

Best practices:

- Only load models you created or trust
- Verify file integrity with checksums
- Use ONNX for sharing models publicly (safer format)
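For the checksum step, here is a small sketch using SHA-256; `file_sha256` is a hypothetical helper name:

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Hash the file in chunks so large models need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Record the digest when you export the model and compare it before loading; a mismatch means the file was altered or corrupted in transit.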
## Generated Code

```python
import joblib
import json
from datetime import datetime

# Save model
joblib.dump(model, "/path/to/model.joblib")

# Save metadata
metadata = {
    "model_type": type(model).__name__,
    "feature_names": list(X.columns),
    "training_date": datetime.now().isoformat(),
    "metrics": {
        "accuracy": accuracy,
        "f1_score": f1
    }
}

with open("/path/to/model_meta.json", "w") as f:
    json.dump(metadata, f, indent=2)

print("Model saved to /path/to/model.joblib")
```