Module pipeline

Module pipeline 

Source
Expand description

DataFrame-centric pipeline/transforms backed by a Polars lazy plan.

This module provides a small, engine-delegated pipeline API that compiles to a Polars [polars::prelude::LazyFrame] and then collects results back into our in-memory crate::types::DataSet.

Design goals for Phase 1:

  • Keep the public API in our own types (no Polars types in signatures)
  • Support a minimal set of transformation primitives needed for parity/benchmarks
  • Provide deterministic, testable behavior (null handling, missing column errors)

§Examples

use rust_data_processing::pipeline::{Agg, DataFrame, JoinKind, Predicate};
use rust_data_processing::types::{DataSet, DataType, Field, Schema, Value};

let ds = DataSet::new(
    Schema::new(vec![
        Field::new("id", DataType::Int64),
        Field::new("active", DataType::Bool),
        Field::new("score", DataType::Int64),
        Field::new("grp", DataType::Utf8),
    ]),
    vec![
        vec![Value::Int64(1), Value::Bool(true), Value::Int64(10), Value::Utf8("A".to_string())],
        vec![Value::Int64(2), Value::Bool(true), Value::Null, Value::Utf8("A".to_string())],
    ],
);

// Rename + cast + fill nulls.
let cleaned = DataFrame::from_dataset(&ds)?
    .rename(&[("score", "score_i")])?
    .cast("score_i", DataType::Float64)?
    .fill_null("score_i", Value::Float64(0.0))?;

// Filter + group_by.
let _out = cleaned
    .filter(Predicate::Eq {
        column: "active".to_string(),
        value: Value::Bool(true),
    })?
    .group_by(
        &["grp"],
        &[Agg::Sum {
            column: "score_i".to_string(),
            alias: "sum_score".to_string(),
        }],
    )?
    .collect()?;

// Join two DataFrames.
let left = DataFrame::from_dataset(&ds)?;
let right = DataFrame::from_dataset(&ds)?;
let _joined = left.join(right, &["id"], &["id"], JoinKind::Inner)?;

Structs§

DataFrame
A DataFrame-centric pipeline compiled into a lazy plan.

Enums§

Agg
Aggregations for DataFrame::group_by.
CastMode
Casting behavior for DataFrame::cast_with_mode.
JoinKind
Join behavior for DataFrame::join.
Predicate
A predicate used by DataFrame::filter.

Type Aliases§

PolarsPipeline
Backwards-compatible alias for earlier naming.