pub struct DataFrame { /* private fields */ }
A DataFrame-centric pipeline compiled into a lazy plan.
The public API uses only this crate’s own types. The engine is currently implemented with Polars, but callers do not need to depend on Polars types.
Implementations
impl DataFrame
pub fn from_dataset(ds: &DataSet) -> IngestionResult<Self>
Build a pipeline starting from an in-memory DataSet.
Note: this eagerly converts the dataset into a Polars DataFrame first; the transformations added after that are planned lazily.
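A minimal sketch of the eager-then-lazy split, assuming a toy table of named Float64 columns where None stands for null. Table, Op and LazyPlan are hypothetical names, not this crate’s types, and unlike the real methods these builders return Self rather than an IngestionResult. Builder calls only record operations; everything runs in collect(), which is also where a missing column is first reported.

```rust
use std::collections::BTreeMap;

// Hypothetical toy table: column name -> values, with None standing for null.
type Table = BTreeMap<String, Vec<Option<f64>>>;

// Each builder call only records one of these; nothing is executed yet.
#[derive(Clone, Debug)]
enum Op {
    MulF64 { column: String, factor: f64 },
    Select(Vec<String>),
}

struct LazyPlan {
    source: Table, // materialized eagerly, like the initial DataSet conversion
    ops: Vec<Op>,  // planned lazily
}

impl LazyPlan {
    fn from_table(source: Table) -> Self {
        LazyPlan { source, ops: Vec::new() }
    }

    fn multiply_f64(mut self, column: &str, factor: f64) -> Self {
        self.ops.push(Op::MulF64 { column: column.to_string(), factor });
        self
    }

    fn select(mut self, columns: &[&str]) -> Self {
        self.ops.push(Op::Select(columns.iter().map(|c| c.to_string()).collect()));
        self
    }

    // All recorded operations run here, so errors surface only at collect time.
    fn collect(self) -> Result<Table, String> {
        let mut table = self.source;
        for op in self.ops {
            match op {
                Op::MulF64 { column, factor } => {
                    let values = table
                        .get_mut(&column)
                        .ok_or_else(|| format!("column not found: {column}"))?;
                    for v in values.iter_mut() {
                        *v = (*v).map(|x| x * factor); // nulls stay null
                    }
                }
                Op::Select(columns) => {
                    let mut out = Table::new();
                    for c in columns {
                        let values = table
                            .remove(&c)
                            .ok_or_else(|| format!("column not found: {c}"))?;
                        out.insert(c, values);
                    }
                    table = out;
                }
            }
        }
        Ok(table)
    }
}
```

Recording a bad column name succeeds; only collect() fails, which matches the notes on cast and join below.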
pub fn filter(self, predicate: Predicate) -> IngestionResult<Self>
Add a filter predicate.
pub fn multiply_f64(self, column: &str, factor: f64) -> IngestionResult<Self>
Multiply a Float64 column by a constant factor (nulls remain null).
pub fn add_f64(self, column: &str, delta: f64) -> IngestionResult<Self>
Add a constant Float64 value to a column (nulls remain null).
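A sketch of the null-preserving arithmetic used by multiply_f64 and add_f64, assuming a Float64 column can be modelled as a slice of Option<f64> with None standing for null. The free functions below are hypothetical illustrations of the documented semantics, not this crate’s implementation.

```rust
// Multiply every non-null value by `factor`; None (null) entries stay None.
fn multiply_f64(values: &[Option<f64>], factor: f64) -> Vec<Option<f64>> {
    values.iter().map(|v| v.map(|x| x * factor)).collect()
}

// Add `delta` to every non-null value; None (null) entries stay None.
fn add_f64(values: &[Option<f64>], delta: f64) -> Vec<Option<f64>> {
    values.iter().map(|v| v.map(|x| x + delta)).collect()
}
```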
pub fn with_mul_f64(
    self,
    name: &str,
    source: &str,
    factor: f64,
) -> IngestionResult<Self>
Add a derived Float64 column: name = source * factor (nulls remain null).
pub fn with_add_f64(
    self,
    name: &str,
    source: &str,
    delta: f64,
) -> IngestionResult<Self>
Add a derived Float64 column: name = source + delta (nulls remain null).
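A sketch of the derived-column semantics, assuming a table modelled as a BTreeMap from column name to Vec<Option<f64>>. with_derived, with_mul_f64 and with_add_f64 here are hypothetical free functions. The source column is left untouched; replacing an existing column that already has the target name mirrors Polars’ with_column and is an assumption about this crate.

```rust
use std::collections::BTreeMap;

// Hypothetical toy table: column name -> values, with None standing for null.
type Table = BTreeMap<String, Vec<Option<f64>>>;

// Apply `f` to every non-null value of `source` and store the result as `name`.
// `source` is not modified; an existing `name` column is replaced (assumption).
fn with_derived(
    mut table: Table,
    name: &str,
    source: &str,
    f: impl Fn(f64) -> f64,
) -> Result<Table, String> {
    let derived: Vec<Option<f64>> = table
        .get(source)
        .ok_or_else(|| format!("column not found: {source}"))?
        .iter()
        .map(|v| v.map(&f)) // nulls remain null
        .collect();
    table.insert(name.to_string(), derived);
    Ok(table)
}

// name = source * factor
fn with_mul_f64(table: Table, name: &str, source: &str, factor: f64) -> Result<Table, String> {
    with_derived(table, name, source, |x| x * factor)
}

// name = source + delta
fn with_add_f64(table: Table, name: &str, source: &str, delta: f64) -> Result<Table, String> {
    with_derived(table, name, source, |x| x + delta)
}
```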
pub fn select(self, columns: &[&str]) -> IngestionResult<Self>
Select a subset of columns (in the provided order).
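A sketch of ordered selection, assuming a table modelled as a Vec of (name, values) pairs so that column order is visible. The select function is hypothetical; it returns columns in the order requested, not the input order, and fails on an unknown name.

```rust
// Hypothetical ordered table: (column name, values) pairs; None stands for null.
type Columns = Vec<(String, Vec<Option<f64>>)>;

// Return the requested columns in the order given by `columns`.
fn select(table: &Columns, columns: &[&str]) -> Result<Columns, String> {
    columns
        .iter()
        .map(|&name| {
            table
                .iter()
                .find(|(n, _)| n.as_str() == name)
                .cloned()
                .ok_or_else(|| format!("column not found: {name}"))
        })
        .collect()
}
```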
pub fn rename(self, pairs: &[(&str, &str)]) -> IngestionResult<Self>
Rename columns, given as (from, to) pairs.
This uses Polars’ rename(..., strict=true) behavior: every from column must exist.
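A sketch of strict renaming, assuming a table modelled as a Vec of (name, values) pairs. rename_strict is a hypothetical helper: it validates every from name before renaming anything, so a single missing column fails the whole call, and it matches pairs against the original names so renames apply simultaneously (an assumption about how chained pairs behave).

```rust
// Hypothetical ordered table: (column name, values) pairs; None stands for null.
type Columns = Vec<(String, Vec<Option<f64>>)>;

// Strict rename: every `from` name must exist, otherwise nothing is renamed.
fn rename_strict(table: Columns, pairs: &[(&str, &str)]) -> Result<Columns, String> {
    // Validate all `from` names before changing anything.
    for &(from, _) in pairs {
        if !table.iter().any(|(name, _)| name.as_str() == from) {
            return Err(format!("column not found: {from}"));
        }
    }
    // Look each column up against the original names only.
    let renamed: Columns = table
        .into_iter()
        .map(|(name, values)| {
            let new_name = match pairs.iter().find(|&&(from, _)| from == name.as_str()) {
                Some(&(_, to)) => to.to_string(),
                None => name,
            };
            (new_name, values)
        })
        .collect();
    Ok(renamed)
}
```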
pub fn cast(self, column: &str, to: DataType) -> IngestionResult<Self>
Cast a column to a target type.
Note: cast errors (e.g. invalid parses) surface at collect() time.
pub fn cast_with_mode(
    self,
    column: &str,
    to: DataType,
    mode: CastMode,
) -> IngestionResult<Self>
Cast a column with an explicit mode (strict vs lossy).
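A sketch of strict versus lossy casting for a string-to-Float64 cast. Mode is a hypothetical stand-in for CastMode, and the lossy behavior (unparsable values become null) is an assumption modelled on Polars’ non-strict cast; in strict mode the first unparsable value fails the whole cast. Existing nulls stay null in both modes.

```rust
// Hypothetical stand-in for CastMode; the crate's real variants may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Strict,
    Lossy,
}

// Cast string cells to f64.
// Strict: any unparsable value is an error. Lossy: it becomes None (null).
fn cast_str_to_f64(values: &[Option<&str>], mode: Mode) -> Result<Vec<Option<f64>>, String> {
    values
        .iter()
        .map(|v| match v {
            None => Ok(None),
            Some(s) => match s.parse::<f64>() {
                Ok(x) => Ok(Some(x)),
                Err(_) if mode == Mode::Lossy => Ok(None),
                Err(_) => Err(format!("cannot cast {s:?} to f64")),
            },
        })
        .collect()
}
```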
pub fn drop(self, columns: &[&str]) -> IngestionResult<Self>
Drop columns by name.
pub fn fill_null(self, column: &str, value: Value) -> IngestionResult<Self>
Fill nulls in a column with a literal.
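A sketch of fill_null semantics, assuming a column is a slice of Option<T>, where the generic T stands in for the crate’s Value. The helper is hypothetical: every None is replaced by the literal and non-null entries are left as they are.

```rust
// Replace every null (None) with a clone of `value`; other entries are unchanged.
fn fill_null<T: Clone>(values: &[Option<T>], value: T) -> Vec<Option<T>> {
    values
        .iter()
        .map(|v| Some(v.clone().unwrap_or_else(|| value.clone())))
        .collect()
}
```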
pub fn with_literal(self, name: &str, value: Value) -> IngestionResult<Self>
Add a derived column with a literal value.
pub fn group_by(self, keys: &[&str], aggs: &[Agg]) -> IngestionResult<Self>
Group rows by keys and compute aggregations.
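A sketch of grouping with aggregations, assuming a single string key column and one Float64 value column. GroupStats and group_by_sum are hypothetical; the Agg variants this crate supports are not shown here. In this sketch a group whose values are all null sums to 0.0, which may differ from the engine, and BTreeMap orders groups by key, whereas the engine’s group order may differ.

```rust
use std::collections::BTreeMap;

// Hypothetical aggregation result for one group.
#[derive(Debug, PartialEq)]
struct GroupStats {
    count: usize, // rows in the group, including rows whose value is null
    sum: f64,     // sum of the non-null values
}

// Group row i by keys[i] and aggregate values[i] into that group.
fn group_by_sum(keys: &[&str], values: &[Option<f64>]) -> BTreeMap<String, GroupStats> {
    let mut groups: BTreeMap<String, GroupStats> = BTreeMap::new();
    for (key, value) in keys.iter().zip(values) {
        let entry = groups
            .entry(key.to_string())
            .or_insert(GroupStats { count: 0, sum: 0.0 });
        entry.count += 1;
        if let Some(v) = value {
            entry.sum += v; // nulls are skipped
        }
    }
    groups
}
```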
pub fn join(
    self,
    other: DataFrame,
    left_on: &[&str],
    right_on: &[&str],
    how: JoinKind,
) -> IngestionResult<Self>
Join this pipeline with another DataFrame on key columns.
Note: join planning is infallible; missing-column errors surface at collect() time.
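A sketch of what an inner join on one key column computes, using a hash index on the right side. inner_join is a hypothetical helper and the inner semantics are an assumption, since the JoinKind variants are not listed here. Each left row is paired with every matching right row, unmatched rows are dropped, and output follows left-row order.

```rust
use std::collections::HashMap;

// Inner join of (key, value) rows on equal keys.
fn inner_join(left: &[(&str, i64)], right: &[(&str, &str)]) -> Vec<(String, i64, String)> {
    // Index the right side by key so each left row is matched in O(1) on average.
    let mut index: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(key, value) in right {
        index.entry(key).or_default().push(value);
    }
    let mut out = Vec::new();
    for &(key, left_value) in left {
        if let Some(matches) = index.get(key) {
            for &right_value in matches {
                out.push((key.to_string(), left_value, right_value.to_string()));
            }
        }
    }
    out
}
```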
pub fn collect(self) -> IngestionResult<DataSet>
Collect the pipeline into an in-memory DataSet.
pub fn collect_with_schema(self, schema: &Schema) -> IngestionResult<DataSet>
Collect the pipeline into an in-memory DataSet, enforcing an explicit output schema.
pub fn reduce(
    self,
    column: &str,
    op: ReduceOp,
) -> IngestionResult<Option<Value>>
Reduce a column using a built-in ReduceOp (Polars-backed).
Returns None if the column does not exist (aligned with crate::processing::reduce).
pub fn sum(self, column: &str) -> IngestionResult<Option<Value>>
Sum a numeric column (nulls are ignored; an all-null column yields null).
Returns None if the column does not exist (aligned with crate::processing::reduce).
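A sketch of the three outcomes of sum, assuming a table modelled as a HashMap of Float64 columns. sum_column is hypothetical, and Option<Option<f64>> stands in for Option<Value>: the outer None means the column is missing, Some(None) plays the role of a null result, and Some(Some(x)) is the sum of the non-null values. Treating an empty column like an all-null one is an assumption of this sketch.

```rust
use std::collections::HashMap;

// None            -> the column does not exist
// Some(None)      -> the column exists but has no non-null values (null result)
// Some(Some(sum)) -> the sum of the non-null values
fn sum_column(table: &HashMap<String, Vec<Option<f64>>>, column: &str) -> Option<Option<f64>> {
    let values = table.get(column)?;
    let mut total: Option<f64> = None;
    for v in values.iter().flatten() {
        // flatten() skips nulls
        total = Some(total.unwrap_or(0.0) + v);
    }
    Some(total)
}
```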
pub fn feature_wise_mean_std(
    self,
    columns: &[&str],
    std_kind: VarianceKind,
) -> IngestionResult<Vec<(String, FeatureMeanStd)>>
Computes the mean and standard deviation of each column in a single Polars collect (std_kind maps to Polars’ ddof). Columns are cast to Float64 first, consistent with the scalar reduces.
Returns an error if any column name is missing from the lazy schema.
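A sketch of how a ddof choice changes the standard deviation for one column. Kind is a hypothetical stand-in for VarianceKind (its real variant names are not shown here), mapped to ddof = 0 (divide by n) or ddof = 1 (divide by n - 1). Nulls are skipped, and returning None when n <= ddof is an assumption of this sketch.

```rust
// Hypothetical stand-in for VarianceKind.
#[derive(Clone, Copy)]
enum Kind {
    Population, // ddof = 0: divide by n
    Sample,     // ddof = 1: divide by n - 1
}

// Mean and standard deviation of the non-null values,
// or None if there are not more than ddof values.
fn mean_std(values: &[Option<f64>], kind: Kind) -> Option<(f64, f64)> {
    let xs: Vec<f64> = values.iter().flatten().copied().collect();
    let n = xs.len();
    let ddof = match kind {
        Kind::Population => 0,
        Kind::Sample => 1,
    };
    if n <= ddof {
        return None;
    }
    let mean = xs.iter().sum::<f64>() / n as f64;
    let sum_sq: f64 = xs.iter().map(|x| (x - mean).powi(2)).sum();
    let std = (sum_sq / (n - ddof) as f64).sqrt();
    Some((mean, std))
}
```

For [2, 4, 4, 4, 5, 5, 7, 9] the squared deviations sum to 32, so the population std is sqrt(32 / 8) = 2 and the sample std is sqrt(32 / 7).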
Trait Implementations
Auto Trait Implementations
impl !Freeze for DataFrame
impl !RefUnwindSafe for DataFrame
impl Send for DataFrame
impl Sync for DataFrame
impl Unpin for DataFrame
impl !UnwindSafe for DataFrame
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.