DataLoader Node
The DataLoader node is the starting point for most pipelines. It loads data files and passes them to downstream nodes for processing.
Overview
| Property | Value |
|---|---|
| Type | Source node |
| Inputs | None |
| Outputs | DataFrame |
| Supported formats | CSV, Parquet |
Configuration
File Path
The path to your data file. Click Browse to select a file, or enter the path manually.
```
/Users/yourname/Desktop/data.csv
```

Supported Formats
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | Comma-separated values |
| Parquet | .parquet | Columnar storage format |
Output
The DataLoader outputs a pandas DataFrame with:
- All columns from the source file
- Inferred data types (numeric, string, datetime)
- Original row order preserved
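As a sketch of what "inferred data types" means in pandas terms: numeric columns are detected automatically, while datetime columns are only parsed when named explicitly via `parse_dates` (the column names below are made up for illustration):

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file on disk
csv_text = """id,name,signup_date,score
1,alice,2024-01-15,9.5
2,bob,2024-02-20,7.0
"""

# Numeric columns are inferred automatically; datetime columns
# must be named in parse_dates to be inferred
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["signup_date"])
print(df.dtypes)
```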
Data Preview
After selecting a file, the node shows a preview:
- Rows: Total number of rows
- Columns: Column names and types
- Sample: First 5 rows of data
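The preview fields map directly onto simple pandas operations; a minimal sketch (the inline CSV is hypothetical):

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("a,b\n1,x\n2,y\n3,z\n"))

n_rows = len(df)                           # Rows: total number of rows
columns = df.dtypes.astype(str).to_dict()  # Columns: names and types
sample = df.head(5)                        # Sample: first 5 rows

print(n_rows, columns)
```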
Usage Examples
Basic Loading
Section titled “Basic Loading”- Add a DataLoader node to the canvas
- Click the node to open the properties panel
- Click Browse and select your CSV file
- Connect to a Trainer or Script node
Large Files
For files over 100MB:
- Consider converting to Parquet format first
- Use a Script node to sample or filter data
- Check available memory before loading
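One way to sample a large CSV in a Script node without loading it all at once is chunked reading; a sketch, assuming a roughly 1% sample is enough (the in-memory CSV stands in for a large file path):

```python
import io

import pandas as pd

# Stand-in for a large CSV; on disk you would pass a file path instead
big_csv = "value\n" + "\n".join(str(i) for i in range(10_000))

# Stream the file in 1,000-row chunks and keep a 1% sample of each,
# so the full file never sits in memory at once
chunks = pd.read_csv(io.StringIO(big_csv), chunksize=1_000)
sample = pd.concat(chunk.sample(frac=0.01, random_state=0) for chunk in chunks)
print(len(sample))  # 100 rows kept out of 10,000
```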
Common Issues
”File not found”
- Check that the file path is correct
- Ensure the file hasn’t been moved or renamed
- Use absolute paths (starting with /)
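These checks can be scripted before wiring up the node; a small sketch with a hypothetical path:

```python
import os

path = "/Users/yourname/Desktop/data.csv"  # hypothetical path

print(os.path.isabs(path))   # True for absolute paths (starting with /)
print(os.path.exists(path))  # False if the file was moved, renamed, or mistyped
```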
“Encoding error”
- CSV files should use UTF-8 encoding
- If your file uses a different encoding, convert it first:
```
iconv -f ISO-8859-1 -t UTF-8 input.csv > output.csv
```

“Memory error”
- File is too large for available RAM
- Sample the data or use a smaller subset
- Consider using Parquet format
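When the file itself cannot shrink, loading less of it often resolves the error; a sketch using `usecols` and `nrows` (the column names are made up):

```python
import io

import pandas as pd

csv_text = "a,b,c\n" + "\n".join(f"{i},{i},{i}" for i in range(100))

# Load only the columns you need, and cap the row count,
# so less of the file has to fit in RAM
df = pd.read_csv(io.StringIO(csv_text), usecols=["a", "b"], nrows=10)
print(df.shape)  # (10, 2)
```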
Generated Code
When the pipeline runs, DataLoader generates:
```python
import pandas as pd

# Load the data
df = pd.read_csv("/path/to/data.csv")

# Display info
print(f"Loaded {len(df)} rows, {len(df.columns)} columns")
print(df.dtypes)
```

Best Practices
- Use descriptive file names: `customer_churn_2024.csv`, not `data.csv`
- Keep data files in a consistent location: avoid moving files after creating pipelines
- Check data quality: preview the data to ensure it loaded correctly
- Use Parquet for large files: faster loading and smaller file size