Expand description
DataFrame-centric pipeline/transforms backed by a Polars lazy plan.
This module provides a small, engine-delegated pipeline API that compiles to a Polars
[polars::prelude::LazyFrame] and then collects results back into our in-memory crate::types::DataSet.
Design goals for Phase 1:
- Keep the public API in our own types (no Polars types in signatures)
- Support a minimal set of transformation primitives needed for parity/benchmarks
- Provide deterministic, testable behavior (null handling, missing column errors)
§Examples
use rust_data_processing::pipeline::{Agg, DataFrame, JoinKind, Predicate};
use rust_data_processing::types::{DataSet, DataType, Field, Schema, Value};
let ds = DataSet::new(
Schema::new(vec![
Field::new("id", DataType::Int64),
Field::new("active", DataType::Bool),
Field::new("score", DataType::Int64),
Field::new("grp", DataType::Utf8),
]),
vec![
vec![Value::Int64(1), Value::Bool(true), Value::Int64(10), Value::Utf8("A".to_string())],
vec![Value::Int64(2), Value::Bool(true), Value::Null, Value::Utf8("A".to_string())],
],
);
// Rename + cast + fill nulls.
let cleaned = DataFrame::from_dataset(&ds)?
.rename(&[("score", "score_i")])?
.cast("score_i", DataType::Float64)?
.fill_null("score_i", Value::Float64(0.0))?;
// Filter + group_by.
let _out = cleaned
.filter(Predicate::Eq {
column: "active".to_string(),
value: Value::Bool(true),
})?
.group_by(
&["grp"],
&[Agg::Sum {
column: "score_i".to_string(),
alias: "sum_score".to_string(),
}],
)?
.collect()?;
// Join two DataFrames.
let left = DataFrame::from_dataset(&ds)?;
let right = DataFrame::from_dataset(&ds)?;
let _joined = left.join(right, &["id"], &["id"], JoinKind::Inner)?;Structs§
- Data
Frame - A DataFrame-centric pipeline compiled into a lazy plan.
Enums§
- Agg
- Aggregations for
DataFrame::group_by. - Cast
Mode - Casting behavior for
DataFrame::cast_with_mode. - Join
Kind - Join behavior for
DataFrame::join. - Predicate
- A predicate used by
DataFrame::filter.
Type Aliases§
- Polars
Pipeline - Backwards-compatible alias for earlier naming.