dataframe-operations-1.1.0.3: Column operations, expression DSL, and statistics for the dataframe ecosystem.
Safe Haskell: None
Language: Haskell2010

DataFrame.Operations.SetOps

Description

These operations treat a DataFrame as a set of rows and implement the relational-algebra set operations union, intersect, difference, and symmetricDifference. On the subobject lattice of rows, union and intersect are the join and meet. Every result is deduplicated, and each operation has the schema-preserving shape DataFrame -> DataFrame -> DataFrame.

Row equality is the same hash-based notion used by distinct (see DataFrame.Operations.Aggregation), so these operations and distinct agree on what "the same row" means. Both inputs are expected to share a schema; the typed layer (Typed) enforces that statically.
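The static schema check can be pictured with a phantom type parameter. The sketch below is hypothetical and is not the Typed module's API: TypedFrame, the People and Cities schemas, and the list-backed representation are all assumptions made for illustration. Because union requires both arguments to carry the same schema index s, combining frames with different schemas becomes a type error rather than a runtime failure.

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}

import qualified Data.Set as Set
import GHC.TypeLits (Symbol)

-- Hypothetical typed frame (not the library's type): the schema, a
-- type-level list of column names, is a phantom index; rows are plain
-- lists of cells.
newtype TypedFrame (schema :: [Symbol]) = TypedFrame [[String]]
  deriving (Eq, Show)

-- Hypothetical example schemas.
type People = '["name", "age"]
type Cities = '["city"]

-- Both arguments and the result share the schema index s, so the type
-- checker rejects inputs whose schemas differ.
union :: TypedFrame s -> TypedFrame s -> TypedFrame s
union (TypedFrame xs) (TypedFrame ys) = TypedFrame (dedup (xs ++ ys))
  where
    -- First-occurrence deduplication; Data.Set (Ord) stands in for the
    -- library's hash-based row equality.
    dedup = go Set.empty
    go _ [] = []
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise = r : go (Set.insert r seen) rs

people :: TypedFrame People
people = TypedFrame [["ann", "30"], ["bob", "41"]]

cities :: TypedFrame Cities
cities = TypedFrame [["Oslo"]]

-- union people cities   -- does not compile: People and Cities differ
```

Uncommenting the last line should produce a kind-correct but mismatched-type error, which is the compile-time guarantee the untyped DataFrame -> DataFrame -> DataFrame signatures cannot express.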


Documentation

union :: DataFrame -> DataFrame -> DataFrame

All rows that appear in either dataframe, deduplicated.

union a b is distinct (a <> b): the set union of the two row sets.
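The identity union a b = distinct (a <> b) can be sketched on a toy model. This is a hypothetical illustration, not the library's code: Row, the list-backed DataFrame newtype, and its Semigroup instance are assumptions, and an Ord-based Data.Set stands in for the library's hash-based row equality.

```haskell
import qualified Data.Set as Set

-- Hypothetical model, not the library's types: a row is a list of cells
-- and a frame is an ordered list of rows that share one schema.
type Row = [String]

newtype DataFrame = DataFrame [Row] deriving (Eq, Show)

-- Assumed row-wise concatenation, standing in for the library's (<>).
instance Semigroup DataFrame where
  DataFrame xs <> DataFrame ys = DataFrame (xs ++ ys)

-- Keep the first occurrence of each row; Data.Set (Ord) stands in for
-- the library's hash-based row equality.
distinct :: DataFrame -> DataFrame
distinct (DataFrame rows) = DataFrame (go Set.empty rows)
  where
    go _ [] = []
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise = r : go (Set.insert r seen) rs

-- Set union of the two row sets: concatenate, then deduplicate.
union :: DataFrame -> DataFrame -> DataFrame
union a b = distinct (a <> b)
```

With a = DataFrame [["1","x"], ["2","y"], ["1","x"]] and b = DataFrame [["2","y"], ["3","z"]], this sketch gives union a b == DataFrame [["1","x"], ["2","y"], ["3","z"]]. Keeping first-occurrence order is a choice of the sketch; the documentation does not specify result order.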

intersect :: DataFrame -> DataFrame -> DataFrame

Rows that appear in both dataframes, deduplicated.

A row survives iff an equal row is present in each input.
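A minimal sketch of the survival rule, assuming the same hypothetical list-backed model (Row, DataFrame, and distinct are illustrative stand-ins, not the library's definitions): build a set from the right input, keep each left row that is a member, then deduplicate.

```haskell
import qualified Data.Set as Set

-- Hypothetical model, not the library's types.
type Row = [String]

newtype DataFrame = DataFrame [Row] deriving (Eq, Show)

-- First-occurrence deduplication; Data.Set (Ord) stands in for hashing.
distinct :: DataFrame -> DataFrame
distinct (DataFrame rows) = DataFrame (go Set.empty rows)
  where
    go _ [] = []
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise = r : go (Set.insert r seen) rs

-- Keep a left row iff an equal row occurs in the right input,
-- then deduplicate the survivors.
intersect :: DataFrame -> DataFrame -> DataFrame
intersect (DataFrame xs) (DataFrame ys) =
  distinct (DataFrame (filter (`Set.member` right) xs))
  where
    right = Set.fromList ys
```

Building the set from the right input makes each membership test logarithmic in this sketch; a hash-based implementation would make it expected constant time instead.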

difference :: DataFrame -> DataFrame -> DataFrame

Rows present in the left dataframe but absent from the right, deduplicated.

difference a b is the relational EXCEPT: the relative complement of b in a.
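The EXCEPT semantics can be sketched the same way as intersection with the membership test negated. As before, Row, DataFrame, and distinct are hypothetical stand-ins, and Data.Set replaces the library's hash-based equality.

```haskell
import qualified Data.Set as Set

-- Hypothetical model, not the library's types.
type Row = [String]

newtype DataFrame = DataFrame [Row] deriving (Eq, Show)

-- First-occurrence deduplication; Data.Set (Ord) stands in for hashing.
distinct :: DataFrame -> DataFrame
distinct (DataFrame rows) = DataFrame (go Set.empty rows)
  where
    go _ [] = []
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise = r : go (Set.insert r seen) rs

-- Keep a left row iff no equal row occurs in the right input,
-- then deduplicate the survivors.
difference :: DataFrame -> DataFrame -> DataFrame
difference (DataFrame xs) (DataFrame ys) =
  distinct (DataFrame (filter (`Set.notMember` right) xs))
  where
    right = Set.fromList ys
```

Unlike union and intersect, difference is not symmetric: swapping the arguments changes which input supplies the surviving rows.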

symmetricDifference :: DataFrame -> DataFrame -> DataFrame

Rows present in exactly one of the two dataframes, deduplicated.

symmetricDifference a b is union (difference a b) (difference b a).
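The defining equation can be transcribed directly onto the hypothetical list-backed model (Row, DataFrame, its Semigroup instance, and distinct are illustrative stand-ins, not the library's code), with union and difference built as sketched for the individual operations.

```haskell
import qualified Data.Set as Set

-- Hypothetical model, not the library's types.
type Row = [String]

newtype DataFrame = DataFrame [Row] deriving (Eq, Show)

-- Assumed row-wise concatenation.
instance Semigroup DataFrame where
  DataFrame xs <> DataFrame ys = DataFrame (xs ++ ys)

-- First-occurrence deduplication; Data.Set (Ord) stands in for hashing.
distinct :: DataFrame -> DataFrame
distinct (DataFrame rows) = DataFrame (go Set.empty rows)
  where
    go _ [] = []
    go seen (r : rs)
      | r `Set.member` seen = go seen rs
      | otherwise = r : go (Set.insert r seen) rs

union :: DataFrame -> DataFrame -> DataFrame
union a b = distinct (a <> b)

difference :: DataFrame -> DataFrame -> DataFrame
difference (DataFrame xs) (DataFrame ys) =
  distinct (DataFrame (filter (`Set.notMember` Set.fromList ys) xs))

-- Rows in exactly one input: what a has that b lacks, plus the reverse.
symmetricDifference :: DataFrame -> DataFrame -> DataFrame
symmetricDifference a b = union (difference a b) (difference b a)
```

For a = DataFrame [["1","x"], ["2","y"]] and b = DataFrame [["2","y"], ["3","z"]], the shared row ["2","y"] drops out and the sketch returns DataFrame [["1","x"], ["3","z"]]. The two differences are disjoint, so the outer union only concatenates here; its deduplication matters only if the definition is reused on inputs that overlap.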