
DataLoader Node

The DataLoader node is the starting point for most pipelines. It loads data files and passes them to downstream nodes for processing.

Property            Value
Type                Source node
Inputs              None
Outputs             DataFrame
Supported formats   CSV, Parquet

The path to your data file. Click Browse to select a file, or enter the path manually, for example:

/Users/yourname/Desktop/data.csv

Format    Extension   Notes
CSV       .csv        Comma-separated values
Parquet   .parquet    Columnar storage format
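
Outside the pipeline, both formats load to the same kind of DataFrame with pandas. A minimal sketch (the paths are placeholders; reading Parquet assumes pyarrow or fastparquet is installed):

import pandas as pd

# CSV and Parquet produce the same kind of DataFrame once loaded.
df_csv = pd.read_csv("data.csv")
df_parquet = pd.read_parquet("data.parquet")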

The DataLoader outputs a pandas DataFrame with:

  • All columns from the source file
  • Inferred data types (numeric, string, datetime)
  • Original row order preserved

After selecting a file, the node shows a preview:

  • Rows: Total number of rows
  • Columns: Column names and types
  • Sample: First 5 rows of data
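
The preview corresponds roughly to these pandas calls (a sketch; the path is a placeholder):

import pandas as pd

df = pd.read_csv("/path/to/data.csv")
print(f"Rows: {len(df)}")  # total number of rows
print(df.dtypes)           # column names and inferred types
print(df.head())           # first 5 rows of data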

To use the node:

  1. Add a DataLoader node to the canvas
  2. Click the node to open the properties panel
  3. Click Browse and select your data file
  4. Connect the output to a Trainer or Script node

For files over 100MB:

  1. Consider converting to Parquet format first (see the sketch below)
  2. Use a Script node to sample or filter data
  3. Check available memory before loading
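
Conversion can be a one-time step with pandas, for example in a Script node. A minimal sketch (file names are placeholders; writing Parquet assumes pyarrow or fastparquet is installed):

import pandas as pd

# Read the large CSV once, then write a Parquet copy for faster reloads.
df = pd.read_csv("large_file.csv")
df.to_parquet("large_file.parquet")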

If the file can’t be found:

  • Check that the file path is correct
  • Ensure the file hasn’t been moved or renamed
  • Use absolute paths (starting with /)

If you see encoding errors:

  • CSV files should use UTF-8 encoding
  • If your file uses a different encoding, convert it first:

iconv -f ISO-8859-1 -t UTF-8 input.csv > output.csv
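
Alternatively, pandas can re-encode the file in a Script node or a one-off script. A sketch (the file names and source encoding are placeholders):

import pandas as pd

# Read with the original encoding, then write back out as UTF-8.
df = pd.read_csv("input.csv", encoding="ISO-8859-1")
df.to_csv("output.csv", index=False, encoding="utf-8")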

If loading fails with a memory error:

  • The file may be too large for available RAM
  • Sample the data or use a smaller subset (see the sketch below)
  • Consider using Parquet format
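
With pandas (for example in a Script node), you can cap or stream the load. A sketch (the path, row counts, and the "value" column are placeholders):

import pandas as pd

# Load only the first 100,000 rows to stay within memory...
sample = pd.read_csv("large_file.csv", nrows=100_000)

# ...or stream the file in chunks and keep only a filtered subset.
chunks = pd.read_csv("large_file.csv", chunksize=50_000)
subset = pd.concat(chunk[chunk["value"] > 0] for chunk in chunks)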

When the pipeline runs, DataLoader generates:

import pandas as pd
# Load the data
df = pd.read_csv("/path/to/data.csv")
# Display info
print(f"Loaded {len(df)} rows, {len(df.columns)} columns")
print(df.dtypes)

Best practices:

  1. Use descriptive file names — customer_churn_2024.csv, not data.csv
  2. Keep data files in a consistent location — Avoid moving files after creating pipelines
  3. Check data quality — Preview the data to ensure it loaded correctly
  4. Use Parquet for large files — Faster loading and smaller file size

Related nodes:

  • Trainer — Train models on loaded data
  • Script — Custom data preprocessing