
narwhals.LazyFrame

Narwhals LazyFrame, backed by a native lazyframe.

Warning

This class is not meant to be instantiated directly. Instead, use narwhals.from_native with a native object that is a lazy dataframe from one of the supported backends (e.g. polars.LazyFrame, dask_expr._collection.DataFrame):

narwhals.from_native(native_lazyframe)

columns: list[str] property

Get column names.

Returns:

Type Description
list[str]

The column names stored in a list.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrame
>>>
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)

We define a library agnostic function:

>>> def agnostic_columns(df_native: IntoFrame) -> list[str]:
...     df = nw.from_native(df_native)
...     return df.columns

We can then pass either pandas or Polars to agnostic_columns:

>>> agnostic_columns(df_pd)
['foo', 'bar', 'ham']
>>> agnostic_columns(lf_pl)
['foo', 'bar', 'ham']

implementation: Implementation property

Return implementation of native frame.

This can be useful when you need to do some special-casing for particular libraries, for features outside of Narwhals' scope, for example when dealing with pandas' Period dtype.

Returns:

Type Description
Implementation

Implementation.

Examples:

>>> import narwhals as nw
>>> import polars as pl
>>> lf_native = pl.LazyFrame({"a": [1, 2, 3]})
>>> lf = nw.from_native(lf_native)
>>> lf.implementation
<Implementation.POLARS: 6>
>>> lf.implementation.is_pandas()
False
>>> lf.implementation.is_polars()
True

schema: Schema property

Get an ordered mapping of column names to their data type.

Returns:

Type Description
Schema

A Narwhals Schema object that maps column names to their data types.

Examples:

>>> import polars as pl
>>> import narwhals as nw
>>> lf_pl = pl.LazyFrame(
...     {
...         "foo": [1, 2, 3],
...         "bar": [6.0, 7.0, 8.0],
...         "ham": ["a", "b", "c"],
...     }
... )
>>> lf = nw.from_native(lf_pl)
>>> lf.schema
Schema({'foo': Int64, 'bar': Float64, 'ham': String})

clone()

Create a copy of this LazyFrame.

Returns:

Type Description
Self

An identical copy of the original LazyFrame.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function in which we copy the DataFrame:

>>> def agnostic_clone(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.clone().to_native()
>>> agnostic_clone(df_pd)
   a  b
0  1  3
1  2  4
>>> agnostic_clone(lf_pl).collect()
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘

collect()

Materialize this LazyFrame into a DataFrame.

Returns:

Type Description
DataFrame[Any]

DataFrame

Examples:

>>> import narwhals as nw
>>> import polars as pl
>>> lf_pl = pl.LazyFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [1, 2, 3, 4, 5, 6],
...         "c": [6, 5, 4, 3, 2, 1],
...     }
... )
>>> lf = nw.from_native(lf_pl)
>>> lf
┌───────────────────────────────────────┐
| Narwhals LazyFrame                    |
| Use `.to_native` to see native output |
└───────────────────────────────────────┘
>>> df = lf.group_by("a").agg(nw.all().sum()).collect()
>>> df.to_native().sort("a")
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 4   ┆ 10  │
│ b   ┆ 11  ┆ 10  │
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘
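
Because a LazyFrame only records operations until collect is called, the idea can be sketched with a toy, pure-Python model. The class ToyLazyFrame and its methods are hypothetical and only illustrate deferred execution; they are not how Narwhals or Polars implement query plans.

```python
class ToyLazyFrame:
    """Hypothetical toy model of deferred execution (not the Narwhals API)."""

    def __init__(self, rows, plan=()):
        self._rows = rows  # source data: a list of dicts
        self._plan = plan  # recorded operations, applied only in collect()

    def filter(self, predicate):
        # Record the operation and return a new frame; nothing runs yet.
        return ToyLazyFrame(self._rows, self._plan + (("filter", predicate),))

    def select(self, *names):
        return ToyLazyFrame(self._rows, self._plan + (("select", names),))

    def collect(self):
        # Materialize: replay the recorded plan over the source rows.
        rows = self._rows
        for op, arg in self._plan:
            if op == "filter":
                rows = [row for row in rows if arg(row)]
            elif op == "select":
                rows = [{name: row[name] for name in arg} for row in rows]
        return rows
```

Calling filter or select returns immediately; the predicate only runs inside collect, which is what gives a real lazy engine the chance to inspect the whole plan before executing it.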

collect_schema()

Get an ordered mapping of column names to their data type.

Returns:

Type Description
Schema

A Narwhals Schema object that maps column names to their data types.

Examples:

>>> import polars as pl
>>> import narwhals as nw
>>> lf_pl = pl.LazyFrame(
...     {
...         "foo": [1, 2, 3],
...         "bar": [6.0, 7.0, 8.0],
...         "ham": ["a", "b", "c"],
...     }
... )
>>> lf = nw.from_native(lf_pl)
>>> lf.collect_schema()
Schema({'foo': Int64, 'bar': Float64, 'ham': String})

drop(*columns, strict=True)

Remove columns from the LazyFrame.

Parameters:

Name Type Description Default
*columns str | Iterable[str]

Names of the columns that should be removed from the dataframe.

()
strict bool

Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema.

True

Returns:

Type Description
Self

The LazyFrame with the specified columns removed.

Warning

The strict argument is ignored for polars<1.0.0.

Please consider upgrading to a newer version or switching to eager mode.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

We define a library agnostic function:

>>> def agnostic_drop(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.drop("ham").to_native()

We can then pass either pandas or Polars to agnostic_drop:

>>> agnostic_drop(df_pd)
   foo  bar
0    1  6.0
1    2  7.0
2    3  8.0
>>> agnostic_drop(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 6.0 │
│ 2   ┆ 7.0 │
│ 3   ┆ 8.0 │
└─────┴─────┘

Use positional arguments to drop multiple columns.

>>> def agnostic_drop(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.drop("foo", "ham").to_native()
>>> agnostic_drop(df_pd)
   bar
0  6.0
1  7.0
2  8.0
>>> agnostic_drop(lf_pl).collect()
shape: (3, 1)
┌─────┐
│ bar │
│ --- │
│ f64 │
╞═════╡
│ 6.0 │
│ 7.0 │
│ 8.0 │
└─────┘
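
The effect of strict can be modelled on a plain dict of columns. The helper drop_columns below is hypothetical and only mirrors the documented behaviour: with strict=True a missing name is an error, otherwise it is silently ignored. The sketch raises KeyError; the exception type Narwhals itself raises is not specified here.

```python
def drop_columns(frame, *columns, strict=True):
    """Hypothetical model: `frame` maps column name -> list of values."""
    to_drop = set()
    for col in columns:
        # Accept both single names and iterables of names, like `*columns`.
        if isinstance(col, str):
            to_drop.add(col)
        else:
            to_drop.update(col)
    missing = to_drop - frame.keys()
    if strict and missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    # Non-strict mode ignores names that are not present.
    return {name: values for name, values in frame.items() if name not in to_drop}
```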

drop_nulls(subset=None)

Drop rows that contain null values.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) for which null values are considered. If set to None (default), use all columns.

None

Returns:

Type Description
Self

The LazyFrame with the rows that contained null values removed.

Notes

pandas handles null values differently from Polars and PyArrow. See null_handling for reference.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1.0, 2.0, None], "ba": [1.0, None, 2.0]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_drop_nulls(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.drop_nulls().to_native()

We can then pass any supported library such as pandas or Polars to agnostic_drop_nulls:

>>> agnostic_drop_nulls(df_pd)
     a   ba
0  1.0  1.0
>>> agnostic_drop_nulls(lf_pl).collect()
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ ba  │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 1.0 │
└─────┴─────┘
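
The role of subset can be sketched on a list of row dicts. drop_nulls_rows is a hypothetical helper; it only treats None as null, so it does not reproduce the pandas-specific null handling mentioned in the Notes.

```python
def drop_nulls_rows(rows, subset=None):
    """Hypothetical model: keep rows with no None in the considered columns."""
    if isinstance(subset, str):
        subset = [subset]
    kept = []
    for row in rows:
        # subset=None means every column is checked.
        cols = row.keys() if subset is None else subset
        if all(row[c] is not None for c in cols):
            kept.append(row)
    return kept
```

With the data above, checking all columns keeps only the first row, while subset="a" also keeps the second row, whose null sits in "ba".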

explode(columns, *more_columns)

Explode the dataframe to long format by exploding the given columns.

Notes

Multiple columns can be exploded together only if they have matching element counts.

Parameters:

Name Type Description Default
columns str | Sequence[str]

Column names. The underlying columns being exploded must be of the List data type.

required
*more_columns str

Additional names of columns to explode, specified as positional arguments.

()

Returns:

Type Description
Self

New LazyFrame

Examples:

>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> import polars as pl
>>> data = {
...     "a": ["x", "y", "z", "w"],
...     "lst1": [[1, 2], None, [None], []],
...     "lst2": [[3, None], None, [42], []],
... }

We define a library agnostic function:

>>> def agnostic_explode(df_native: IntoFrameT) -> IntoFrameT:
...     return (
...         nw.from_native(df_native)
...         .with_columns(nw.col("lst1", "lst2").cast(nw.List(nw.Int32())))
...         .explode("lst1", "lst2")
...         .to_native()
...     )

We can then pass a Polars LazyFrame to agnostic_explode:

>>> agnostic_explode(pl.LazyFrame(data)).collect()
shape: (5, 3)
┌─────┬──────┬──────┐
│ a   ┆ lst1 ┆ lst2 │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i32  ┆ i32  │
╞═════╪══════╪══════╡
│ x   ┆ 1    ┆ 3    │
│ x   ┆ 2    ┆ null │
│ y   ┆ null ┆ null │
│ z   ┆ null ┆ 42   │
│ w   ┆ null ┆ null │
└─────┴──────┴──────┘
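
The row-by-row result above can be modelled in pure Python. explode_rows is a hypothetical helper; treating a null or empty list as a single null element, and checking element counts after that substitution, are assumptions of this sketch chosen to reproduce the table above. Backends may validate differently.

```python
def explode_rows(rows, *columns):
    """Hypothetical model of exploding list columns in a list of row dicts."""
    out = []
    for row in rows:
        # None and [] both become [None], yielding one row with a null.
        lists = [row[c] if row[c] else [None] for c in columns]
        if len({len(lst) for lst in lists}) > 1:
            raise ValueError("exploded columns must have matching element counts")
        # Pair up the i-th elements of every exploded column.
        for values in zip(*lists):
            new_row = dict(row)
            new_row.update(zip(columns, values))
            out.append(new_row)
    return out
```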

filter(*predicates, **constraints)

Filter the rows in the LazyFrame based on a predicate expression.

The original order of the remaining rows is preserved.

Parameters:

Name Type Description Default
*predicates IntoExpr | Iterable[IntoExpr] | list[bool]

Expression that evaluates to a boolean Series. Can also be a (single!) boolean list.

()
**constraints Any

Column filters; use name = value to filter columns by the supplied value. Each constraint will behave the same as nw.col(name).eq(value), and will be implicitly joined with the other filter conditions using &.

{}

Returns:

Type Description
Self

The filtered LazyFrame.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6, 7, 8],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function in which we filter on one condition.

>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter(nw.col("foo") > 1).to_native()

We can then pass either pandas or Polars to agnostic_filter:

>>> agnostic_filter(df_pd)
   foo  bar ham
1    2    7   b
2    3    8   c
>>> agnostic_filter(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘
>>> agnostic_filter(lf_pl).collect()
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

Filter on multiple conditions:

>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter((nw.col("foo") < 3) & (nw.col("ham") == "a")).to_native()
>>>
>>> agnostic_filter(df_pd)
   foo  bar ham
0    1    6   a
>>> agnostic_filter(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘
>>> agnostic_filter(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘

Provide multiple filters using *args syntax:

>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter(
...         nw.col("foo") == 1,
...         nw.col("ham") == "a",
...     ).to_native()
>>>
>>> agnostic_filter(df_pd)
   foo  bar ham
0    1    6   a
>>> agnostic_filter(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘
>>> agnostic_filter(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘

Filter on an OR condition:

>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter(
...         (nw.col("foo") == 1) | (nw.col("ham") == "c")
...     ).to_native()
>>>
>>> agnostic_filter(df_pd)
   foo  bar ham
0    1    6   a
2    3    8   c
>>> agnostic_filter(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘
>>> agnostic_filter(lf_pl).collect()
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

Provide multiple filters using **kwargs syntax:

>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter(foo=2, ham="b").to_native()
>>>
>>> agnostic_filter(df_pd)
   foo  bar ham
1    2    7   b
>>> agnostic_filter(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
└─────┴─────┴─────┘
>>> agnostic_filter(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
└─────┴─────┴─────┘
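
How positional predicates and **constraints combine can be sketched with plain Python callables. filter_rows is hypothetical: each name=value constraint becomes an equality check, and all checks are joined with logical AND while preserving row order, as the parameter descriptions state.

```python
def filter_rows(rows, *predicates, **constraints):
    """Hypothetical model: keep rows that satisfy every predicate and constraint."""
    # name=value behaves like the predicate row[name] == value.
    # Default arguments bind each name/value pair at lambda creation time.
    checks = list(predicates) + [
        (lambda row, n=name, v=value: row[n] == v)
        for name, value in constraints.items()
    ]
    # All checks are combined with AND; the original order is kept.
    return [row for row in rows if all(check(row) for check in checks)]
```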

gather_every(n, offset=0)

Take every nth row in the LazyFrame and return them as a new LazyFrame.

Parameters:

Name Type Description Default
n int

Gather every n-th row.

required
offset int

Starting index.

0

Returns:

Type Description
Self

The LazyFrame containing only the selected rows.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function in which we gather every 2 rows, starting from an offset of 1:

>>> def agnostic_gather_every(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.gather_every(n=2, offset=1).to_native()
>>> agnostic_gather_every(df_pd)
   a  b
1  2  6
3  4  8
>>> agnostic_gather_every(lf_pl).collect()
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2   ┆ 6   │
│ 4   ┆ 8   │
└─────┴─────┘
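
On a plain Python sequence, gather_every(n, offset) corresponds to the slice [offset::n]. The helper below is a hypothetical sketch; rejecting a non-positive n is its own choice, not documented Narwhals behaviour.

```python
def gather_every_rows(rows, n, offset=0):
    """Hypothetical model: every n-th element, starting at index `offset`."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slicing starts at `offset` and steps by `n`.
    return rows[offset::n]
```

For the column "a" above, gather_every_rows([1, 2, 3, 4], n=2, offset=1) gives [2, 4], matching the example output.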

group_by(*keys, drop_null_keys=False)

Start a group by operation.

Parameters:

Name Type Description Default
*keys str | Iterable[str]

Column(s) to group by. Accepts expression input. Strings are parsed as column names.

()
drop_null_keys bool

If True, groups where any key is null are not included in the result.

False

Returns:

Name Type Description
LazyGroupBy LazyGroupBy[Self]

Object which can be used to perform aggregations.

Examples:

Group by one column and call agg to compute the grouped sum of another column.

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> df = {
...     "a": ["a", "b", "a", "b", "c"],
...     "b": [1, 2, 1, 3, 3],
...     "c": [5, 4, 3, 2, 1],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)

Let's define a dataframe-agnostic function in which we group by one column and call agg to compute the grouped sum of another column.

>>> def agnostic_group_by_agg(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.group_by("a").agg(nw.col("b").sum()).sort("a").to_native()

We can then pass either pandas or Polars to agnostic_group_by_agg:

>>> agnostic_group_by_agg(df_pd)
   a  b
0  a  2
1  b  5
2  c  3
>>> agnostic_group_by_agg(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 2   │
│ b   ┆ 5   │
│ c   ┆ 3   │
└─────┴─────┘
>>> agnostic_group_by_agg(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 2   │
│ b   ┆ 5   │
│ c   ┆ 3   │
└─────┴─────┘

Group by multiple columns by passing a list of column names.

>>> def agnostic_group_by_agg(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return (
...         df.group_by(["a", "b"]).agg(nw.max("c")).sort(["a", "b"]).to_native()
...     )
>>>
>>> agnostic_group_by_agg(df_pd)
   a  b  c
0  a  1  5
1  b  2  4
2  b  3  2
3  c  3  1
>>> agnostic_group_by_agg(df_pl)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ b   ┆ 2   ┆ 4   │
│ b   ┆ 3   ┆ 2   │
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘
>>> agnostic_group_by_agg(lf_pl).collect()
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ b   ┆ 2   ┆ 4   │
│ b   ┆ 3   ┆ 2   │
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘
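
The examples above do not exercise drop_null_keys. The hypothetical helper grouped_sum sketches its documented effect on a list of row dicts. That a null key forms its own group by default is an assumption of this sketch, inferred from the parameter description; with drop_null_keys=True such groups are left out.

```python
def grouped_sum(rows, keys, value, drop_null_keys=False):
    """Hypothetical group-by-sum over a list of dicts (not the Narwhals API)."""
    totals = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        # With drop_null_keys=True, skip rows whose key contains a null.
        if drop_null_keys and any(part is None for part in key):
            continue
        totals[key] = totals.get(key, 0) + row[value]
    return totals  # dict order follows first appearance of each key
```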

head(n=5)

Get the first n rows.

Parameters:

Name Type Description Default
n int

Number of rows to return.

5

Returns:

Type Description
Self

A subset of the LazyFrame of shape (n, n_columns).

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
...     "a": [1, 2, 3, 4, 5, 6],
...     "b": [7, 8, 9, 10, 11, 12],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function that gets the first 3 rows.

>>> def agnostic_head(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.head(3).to_native()

We can then pass either pandas or Polars to agnostic_head:

>>> agnostic_head(df_pd)
   a  b
0  1  7
1  2  8
2  3  9
>>> agnostic_head(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘
>>> agnostic_head(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘

join(other, on=None, how='inner', *, left_on=None, right_on=None, suffix='_right')

Add a join operation to the Logical Plan.

Parameters:

Name Type Description Default
other Self

Lazy DataFrame to join with.

required
on str | list[str] | None

Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None.

None
how Literal['inner', 'left', 'cross', 'semi', 'anti']

Join strategy.

  • inner: Returns rows that have matching values in both tables.
  • left: Returns all rows from the left table, and the matched rows from the right table.
  • cross: Returns the Cartesian product of rows from both tables.
  • semi: Filter rows that have a match in the right table.
  • anti: Filter rows that do not have a match in the right table.
'inner'
left_on str | list[str] | None

Join column of the left DataFrame.

None
right_on str | list[str] | None

Join column of the right DataFrame.

None
suffix str

Suffix to append to columns with a duplicate name.

'_right'

Returns:

Type Description
Self

A new joined LazyFrame.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6.0, 7.0, 8.0],
...     "ham": ["a", "b", "c"],
... }
>>> data_other = {
...     "apple": ["x", "y", "z"],
...     "ham": ["a", "b", "d"],
... }
>>> df_pd = pd.DataFrame(data)
>>> other_pd = pd.DataFrame(data_other)
>>> lf_pl = pl.LazyFrame(data)
>>> other_pl = pl.LazyFrame(data_other)

Let's define a dataframe-agnostic function in which we join over "ham" column:

>>> def agnostic_join_on_ham(
...     df_native: IntoFrameT,
...     other_native: IntoFrameT,
... ) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     other = nw.from_native(other_native)
...     return df.join(other, left_on="ham", right_on="ham").to_native()

We can now pass either pandas or Polars to the function:

>>> agnostic_join_on_ham(df_pd, other_pd)
   foo  bar ham apple
0    1  6.0   a     x
1    2  7.0   b     y
>>> agnostic_join_on_ham(lf_pl, other_pl).collect()
shape: (2, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ str ┆ str   │
╞═════╪═════╪═════╪═══════╡
│ 1   ┆ 6.0 ┆ a   ┆ x     │
│ 2   ┆ 7.0 ┆ b   ┆ y     │
└─────┴─────┴─────┴───────┘
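
The inner, left, semi and anti strategies can be contrasted on lists of row dicts. join_rows is a hypothetical single-key helper; treating null keys as never matching is an assumption of this sketch, and cross is omitted for brevity.

```python
from collections import defaultdict


def join_rows(left, right, on, how="inner", suffix="_right"):
    """Hypothetical single-key join over lists of dicts (not the Narwhals API)."""
    # Index right rows by key; null keys are never indexed, so they never match.
    index = defaultdict(list)
    for rrow in right:
        if rrow[on] is not None:
            index[rrow[on]].append(rrow)
    right_cols = [c for c in (right[0] if right else {}) if c != on]

    out = []
    for lrow in left:
        matches = index.get(lrow[on], [])
        if how == "semi":
            if matches:
                out.append(dict(lrow))  # keep left columns only
        elif how == "anti":
            if not matches:
                out.append(dict(lrow))
        elif how in ("inner", "left"):
            if not matches and how == "left":
                matches = [dict.fromkeys(right_cols)]  # null-filled right side
            for rrow in matches:
                row = dict(lrow)
                for c in right_cols:
                    # Duplicate column names get `suffix`.
                    row[c + suffix if c in lrow else c] = rrow[c]
                out.append(row)
        else:
            raise ValueError(f"unsupported strategy: {how!r}")
    return out
```

Note that semi and anti never add columns from the right table; they only decide which left rows survive.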

join_asof(other, *, left_on=None, right_on=None, on=None, by_left=None, by_right=None, by=None, strategy='backward')

Perform an asof join.

This is similar to a left join, except that we match on the nearest key rather than on equal keys.

Both DataFrames must be sorted by the asof join key.

Parameters:

Name Type Description Default
other Self

DataFrame to join with.

required
left_on str | None

Name of the left join column.

None
right_on str | None

Name of the right join column.

None
on str | None

Join column of both DataFrames. If set, left_on and right_on should be None.

None
by_left str | list[str] | None

Join on these columns of the left DataFrame before performing the asof join.

None
by_right str | list[str] | None

Join on these columns of the right DataFrame before performing the asof join.

None
by str | list[str] | None

Join on these columns of both DataFrames before performing the asof join.

None
strategy Literal['backward', 'forward', 'nearest']

Join strategy. The default is "backward".

  • backward: selects the last row in the right DataFrame whose "on" key is less than or equal to the left's key.
  • forward: selects the first row in the right DataFrame whose "on" key is greater than or equal to the left's key.
  • nearest: selects the last row in the right DataFrame whose value is nearest to the left's key.
'backward'

Returns:

Type Description
Self

A new joined LazyFrame.

Examples:

>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> from typing import Literal
>>> from narwhals.typing import IntoFrameT
>>>
>>> data_gdp = {
...     "datetime": [
...         datetime(2016, 1, 1),
...         datetime(2017, 1, 1),
...         datetime(2018, 1, 1),
...         datetime(2019, 1, 1),
...         datetime(2020, 1, 1),
...     ],
...     "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
...     "datetime": [
...         datetime(2016, 3, 1),
...         datetime(2018, 8, 1),
...         datetime(2019, 1, 1),
...     ],
...     "population": [82.19, 82.66, 83.12],
... }
>>> gdp_pd = pd.DataFrame(data_gdp)
>>> population_pd = pd.DataFrame(data_population)
>>> gdp_pl = pl.LazyFrame(data_gdp).sort("datetime")
>>> population_pl = pl.LazyFrame(data_population).sort("datetime")

Let's define a dataframe-agnostic function in which we join over "datetime" column:

>>> def agnostic_join_asof_datetime(
...     df_native: IntoFrameT,
...     other_native: IntoFrameT,
...     strategy: Literal["backward", "forward", "nearest"],
... ) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     other = nw.from_native(other_native)
...     return df.join_asof(other, on="datetime", strategy=strategy).to_native()

We can now pass either pandas or Polars to the function:

>>> agnostic_join_asof_datetime(population_pd, gdp_pd, strategy="backward")
    datetime  population   gdp
0 2016-03-01       82.19  4164
1 2018-08-01       82.66  4566
2 2019-01-01       83.12  4696
>>> agnostic_join_asof_datetime(
...     population_pl, gdp_pl, strategy="backward"
... ).collect()
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime            ┆ population ┆ gdp  │
│ ---                 ┆ ---        ┆ ---  │
│ datetime[μs]        ┆ f64        ┆ i64  │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19      ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66      ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12      ┆ 4696 │
└─────────────────────┴────────────┴──────┘

Here is a real-world time-series example that uses the by argument.

>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> from narwhals.typing import IntoFrameT
>>>
>>> data_quotes = {
...     "datetime": [
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 30),
...         datetime(2016, 5, 25, 13, 30, 0, 41),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 49),
...         datetime(2016, 5, 25, 13, 30, 0, 72),
...         datetime(2016, 5, 25, 13, 30, 0, 75),
...     ],
...     "ticker": [
...         "GOOG",
...         "MSFT",
...         "MSFT",
...         "MSFT",
...         "GOOG",
...         "AAPL",
...         "GOOG",
...         "MSFT",
...     ],
...     "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
...     "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
... }
>>> data_trades = {
...     "datetime": [
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 38),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...     ],
...     "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
...     "price": [51.95, 51.95, 720.77, 720.92, 98.0],
...     "quantity": [75, 155, 100, 100, 100],
... }
>>> quotes_pd = pd.DataFrame(data_quotes)
>>> trades_pd = pd.DataFrame(data_trades)
>>> quotes_pl = pl.LazyFrame(data_quotes).sort("datetime")
>>> trades_pl = pl.LazyFrame(data_trades).sort("datetime")

Let's define a dataframe-agnostic function in which we join over "datetime" and by "ticker" columns:

>>> def agnostic_join_asof_datetime_by_ticker(
...     df_native: IntoFrameT,
...     other_native: IntoFrameT,
... ) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     other = nw.from_native(other_native)
...     return df.join_asof(other, on="datetime", by="ticker").to_native()

We can now pass either pandas or Polars to the function:

>>> agnostic_join_asof_datetime_by_ticker(trades_pd, quotes_pd)
                    datetime ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.000023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.000038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.000048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.000048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.000048   AAPL   98.00       100     NaN     NaN
>>> agnostic_join_asof_datetime_by_ticker(trades_pl, quotes_pl).collect()
shape: (5, 6)
┌────────────────────────────┬────────┬────────┬──────────┬───────┬────────┐
│ datetime                   ┆ ticker ┆ price  ┆ quantity ┆ bid   ┆ ask    │
│ ---                        ┆ ---    ┆ ---    ┆ ---      ┆ ---   ┆ ---    │
│ datetime[μs]               ┆ str    ┆ f64    ┆ i64      ┆ f64   ┆ f64    │
╞════════════════════════════╪════════╪════════╪══════════╪═══════╪════════╡
│ 2016-05-25 13:30:00.000023 ┆ MSFT   ┆ 51.95  ┆ 75       ┆ 51.95 ┆ 51.96  │
│ 2016-05-25 13:30:00.000038 ┆ MSFT   ┆ 51.95  ┆ 155      ┆ 51.97 ┆ 51.98  │
│ 2016-05-25 13:30:00.000048 ┆ GOOG   ┆ 720.77 ┆ 100      ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ GOOG   ┆ 720.92 ┆ 100      ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ AAPL   ┆ 98.0   ┆ 100      ┆ null  ┆ null   │
└────────────────────────────┴────────┴────────┴──────────┴───────┴────────┘
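
The three strategy rules can be sketched with the standard-library bisect module on a sorted list of right keys. asof_match is a hypothetical helper that returns, for each left key, the index of the matched right row (or None). It ignores by, by_left and by_right, and breaking a nearest tie in favour of the backward candidate is an assumption of this sketch.

```python
from bisect import bisect_left, bisect_right


def asof_match(left_keys, right_keys, strategy="backward"):
    """Hypothetical model: index of the matched right key per left key, or None."""
    n = len(right_keys)
    out = []
    for key in left_keys:
        back = bisect_right(right_keys, key) - 1  # last right key <= key
        fwd = bisect_left(right_keys, key)        # first right key >= key
        if strategy == "backward":
            out.append(back if back >= 0 else None)
        elif strategy == "forward":
            out.append(fwd if fwd < n else None)
        elif strategy == "nearest":
            candidates = [j for j in (back, fwd) if 0 <= j < n]
            # min() keeps the first candidate on ties, i.e. the backward one.
            out.append(
                min(candidates, key=lambda j: abs(right_keys[j] - key))
                if candidates else None
            )
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return out
```

Applied to the population and GDP dates from the first example, the backward strategy selects indices [0, 2, 3], i.e. the GDP values 4164, 4566 and 4696 shown in the tables above.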

lazy()

Lazify the DataFrame (if possible).

If a library does not support lazy execution, then this is a no-op.

Returns:

Type Description
Self

A LazyFrame.

Examples:

Construct pandas and Polars objects:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)

We define a library agnostic function:

>>> def agnostic_lazy(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.lazy().to_native()

Note that the pandas DataFrame stays eager, and the Polars LazyFrame stays lazy:

>>> agnostic_lazy(df_pd)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> agnostic_lazy(lf_pl)
<LazyFrame ...>

pipe(function, *args, **kwargs)

Pipe function call.

Parameters:

Name Type Description Default
function Callable[[Any], Self]

Function to apply.

required
args Any

Positional arguments to pass to function.

()
kwargs Any

Keyword arguments to pass to function.

{}

Returns:

Type Description
Self

The original object with the function applied.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2, 3], "ba": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_pipe(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.pipe(lambda _df: _df.select("a")).to_native()

We can then pass either pandas or Polars:

>>> agnostic_pipe(df_pd)
   a
0  1
1  2
2  3
>>> agnostic_pipe(lf_pl).collect()
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘
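
The examples above only pass a function, but pipe also forwards *args and **kwargs, so frame.pipe(f, x, y=1) amounts to f(frame, x, y=1). The Frame class and the add_suffix helper below are hypothetical and only model this forwarding.

```python
class Frame:
    """Hypothetical stand-in for a frame that only tracks column names."""

    def __init__(self, columns):
        self.columns = list(columns)

    def pipe(self, function, *args, **kwargs):
        # Call `function` with this frame first, then any extra arguments.
        return function(self, *args, **kwargs)


def add_suffix(frame, suffix, *, skip=()):
    """Hypothetical transformation used to demonstrate argument forwarding."""
    return Frame(c if c in skip else c + suffix for c in frame.columns)
```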

rename(mapping)

Rename column names.

Parameters:

Name Type Description Default
mapping dict[str, str]

Key value pairs that map from old name to new name, or a function that takes the old name as input and returns the new name.

required

Returns:

Type Description
Self

The LazyFrame with the specified columns renamed.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

We define a library agnostic function:

>>> def agnostic_rename(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.rename({"foo": "apple"}).to_native()

We can then pass either pandas or Polars to agnostic_rename:

>>> agnostic_rename(df_pd)
   apple  bar ham
0      1    6   a
1      2    7   b
2      3    8   c
>>> agnostic_rename(lf_pl).collect()
shape: (3, 3)
┌───────┬─────┬─────┐
│ apple ┆ bar ┆ ham │
│ ---   ┆ --- ┆ --- │
│ i64   ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 1     ┆ 6   ┆ a   │
│ 2     ┆ 7   ┆ b   │
│ 3     ┆ 8   ┆ c   │
└───────┴─────┴─────┘

select(*exprs, **named_exprs)

Select columns from this LazyFrame.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names.

()
**named_exprs IntoExpr

Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.

{}

Returns:

Type Description
Self

The LazyFrame containing only the selected columns.

Notes

If you'd like to select a column whose name isn't a string (for example, if you're working with pandas) then you should explicitly use nw.col instead of just passing the column name. For example, to select a column named 0 use df.select(nw.col(0)), not df.select(0).

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> df = {
...     "foo": [1, 2, 3],
...     "bar": [6, 7, 8],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)

Let's define a dataframe-agnostic function in which we pass the name of a column to select that column.

>>> def agnostic_select(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.select("foo").to_native()

We can then pass either pandas or Polars to agnostic_select:

>>> agnostic_select(df_pd)
   foo
0    1
1    2
2    3
>>> agnostic_select(df_pl)
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘
>>> agnostic_select(lf_pl).collect()
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

Multiple columns can be selected by passing a list of column names.

>>> def agnostic_select(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.select(["foo", "bar"]).to_native()
>>>
>>> agnostic_select(df_pd)
   foo  bar
0    1    6
1    2    7
2    3    8
>>> agnostic_select(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 6   │
│ 2   ┆ 7   │
│ 3   ┆ 8   │
└─────┴─────┘
>>> agnostic_select(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 6   │
│ 2   ┆ 7   │
│ 3   ┆ 8   │
└─────┴─────┘

Multiple columns can also be selected using positional arguments instead of a list. Expressions are also accepted.

>>> def agnostic_select(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.select(nw.col("foo"), nw.col("bar") + 1).to_native()
>>>
>>> agnostic_select(df_pd)
   foo  bar
0    1    7
1    2    8
2    3    9
>>> agnostic_select(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘
>>> agnostic_select(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘

Use keyword arguments to easily name your expression inputs.

>>> def agnostic_select(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.select(threshold=nw.col("foo") * 2).to_native()
>>>
>>> agnostic_select(df_pd)
   threshold
0          2
1          4
2          6
>>> agnostic_select(df_pl)
shape: (3, 1)
┌───────────┐
│ threshold │
│ ---       │
│ i64       │
╞═══════════╡
│ 2         │
│ 4         │
│ 6         │
└───────────┘
>>> agnostic_select(lf_pl).collect()
shape: (3, 1)
┌───────────┐
│ threshold │
│ ---       │
│ i64       │
╞═══════════╡
│ 2         │
│ 4         │
│ 6         │
└───────────┘
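As an informal mental model only (this is not how Narwhals implements select), the sketch below treats a frame as a plain dict of equal-length lists. Positional strings pick existing columns, and keyword arguments add computed columns named after the keyword. Ordinary Python callables stand in for Narwhals expressions, and select_model is a hypothetical name.

```python
from __future__ import annotations

from typing import Callable


def select_model(
    frame: dict[str, list], *names: str, **named: Callable[[dict[str, list]], list]
) -> dict[str, list]:
    """Hypothetical model of select on a column-oriented dict of lists."""
    out: dict[str, list] = {}
    for name in names:
        # Positional strings are treated as existing column names.
        out[name] = list(frame[name])
    for alias, compute in named.items():
        # Keyword arguments produce a new column named after the keyword.
        out[alias] = compute(frame)
    return out


data = {"foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"]}
result = select_model(data, "foo", threshold=lambda f: [x * 2 for x in f["foo"]])
print(result)  # {'foo': [1, 2, 3], 'threshold': [2, 4, 6]}
```

Columns that are not requested (here bar and ham) are simply absent from the result.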

sort(by, *more_by, descending=False, nulls_last=False)

Sort the LazyFrame by the given columns.

Parameters:

Name Type Description Default
by str | Iterable[str]

Column name(s) to sort by.

required
*more_by str

Additional columns to sort by, specified as positional arguments.

()
descending bool | Sequence[bool]

Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.

False
nulls_last bool

Place null values last. Unlike descending, this takes a single boolean, which applies to all columns being sorted.

False

Returns:

Type Description
Self

The sorted LazyFrame.

Warning

Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last to control per-column behaviour. Instead, a single boolean is applied to all by columns.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
...     "a": [1, 2, None],
...     "b": [6.0, 5.0, 4.0],
...     "c": ["a", "c", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function in which we sort by multiple columns in different orders:

>>> def agnostic_sort(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.sort("c", "a", descending=[False, True]).to_native()

We can then pass either pandas or Polars to agnostic_sort:

>>> agnostic_sort(df_pd)
     a    b  c
0  1.0  6.0  a
2  NaN  4.0  b
1  2.0  5.0  c
>>> agnostic_sort(lf_pl).collect()
shape: (3, 3)
┌──────┬─────┬─────┐
│ a    ┆ b   ┆ c   │
│ ---  ┆ --- ┆ --- │
│ i64  ┆ f64 ┆ str │
╞══════╪═════╪═════╡
│ 1    ┆ 6.0 ┆ a   │
│ null ┆ 4.0 ┆ b   │
│ 2    ┆ 5.0 ┆ c   │
└──────┴─────┴─────┘
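The interaction between per-column descending and the single nulls_last flag can be modelled with a stable multi-key sort. The sketch below is an illustration on a dict of lists, with None standing in for null and sort_model a hypothetical helper. It is not the Narwhals or Polars implementation.

```python
from __future__ import annotations


def sort_model(
    frame: dict[str, list],
    by: list[str],
    descending: bool | list[bool] = False,
    nulls_last: bool = False,
) -> dict[str, list]:
    """Hypothetical model: stable multi-key sort with one nulls_last flag."""
    if isinstance(descending, bool):
        # A single boolean applies to every sort key.
        descending = [descending] * len(by)
    n_rows = len(frame[by[0]])
    order = list(range(n_rows))
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields a lexicographic multi-key ordering.
    for name, desc in reversed(list(zip(by, descending))):
        values = frame[name]
        present = [i for i in order if values[i] is not None]
        missing = [i for i in order if values[i] is None]
        present.sort(key=lambda i: values[i], reverse=desc)
        # The same nulls_last flag is used for every key.
        order = present + missing if nulls_last else missing + present
    return {name: [col[i] for i in order] for name, col in frame.items()}


data = {"a": [1, 2, None], "b": [6.0, 5.0, 4.0], "c": ["a", "c", "b"]}
print(sort_model(data, ["c", "a"], descending=[False, True])["a"])  # [1, None, 2]
print(sort_model(data, ["a"], nulls_last=True)["a"])  # [1, 2, None]
print(sort_model(data, ["a"], descending=True)["a"])  # [None, 2, 1]
```

Sorting in passes from the least significant key to the most significant one only works because each pass is stable; reverse=True in Python keeps that stability.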

tail(n=5)

Get the last n rows.

Parameters:

Name Type Description Default
n int

Number of rows to return.

5

Returns:

Type Description
Self

A LazyFrame containing the last n rows and all columns.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
...     "a": [1, 2, 3, 4, 5, 6],
...     "b": [7, 8, 9, 10, 11, 12],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function that gets the last 3 rows.

>>> def agnostic_tail(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.tail(3).to_native()

We can then pass either pandas or Polars to agnostic_tail:

>>> agnostic_tail(df_pd)
   a   b
3  4  10
4  5  11
5  6  12
>>> agnostic_tail(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 4   ┆ 10  │
│ 5   ┆ 11  │
│ 6   ┆ 12  │
└─────┴─────┘
>>> agnostic_tail(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 4   ┆ 10  │
│ 5   ┆ 11  │
│ 6   ┆ 12  │
└─────┴─────┘

to_native()

Convert Narwhals LazyFrame to native one.

Returns:

Type Description
FrameT

Object of the class that the user started with.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>>
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Calling to_native on a Narwhals LazyFrame returns the native object:

>>> nw.from_native(df_pd).lazy().to_native()
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> nw.from_native(lf_pl).to_native().collect()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘

unique(subset=None, *, keep='any', maintain_order=False)

Drop duplicate rows from this LazyFrame.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) to consider when identifying duplicate rows. If set to None, use all columns.

None
keep Literal['any', 'first', 'last', 'none']

Which of the duplicate rows to keep:

  • 'any': Does not give any guarantee of which row is kept. This allows more optimizations.
  • 'none': Don't keep duplicate rows.
  • 'first': Keep first unique row.
  • 'last': Keep last unique row.
'any'
maintain_order bool

Keep the same order as the original LazyFrame. This may be more expensive to compute. Setting this to True prevents the query from running on the Polars streaming engine.

False

Returns:

Name Type Description
LazyFrame Self

LazyFrame with unique rows.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
...     "foo": [1, 2, 3, 1],
...     "bar": ["a", "a", "a", "a"],
...     "ham": ["b", "b", "b", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

We define a library agnostic function:

>>> def agnostic_unique(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.unique(["bar", "ham"]).to_native()

We can then pass either pandas or Polars to agnostic_unique:

>>> agnostic_unique(df_pd)
   foo bar ham
0    1   a   b
>>> agnostic_unique(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘
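To make the keep strategies concrete, here is a small model of unique on a dict of lists. It is a hypothetical helper (unique_model), not Narwhals code, and it assumes maintain_order=True, i.e. surviving rows keep their original relative order. For keep='any' it happens to keep the first row, but Narwhals gives no such guarantee.

```python
from __future__ import annotations

from collections import Counter


def unique_model(
    frame: dict[str, list],
    subset: str | list[str] | None = None,
    keep: str = "any",
) -> dict[str, list]:
    """Hypothetical model of LazyFrame.unique with maintain_order=True."""
    if isinstance(subset, str):
        subset = [subset]
    columns = subset if subset is not None else list(frame)
    n_rows = len(next(iter(frame.values()), []))
    # One hashable key per row, built only from the subset columns.
    keys = [tuple(frame[c][i] for c in columns) for i in range(n_rows)]
    if keep in ("first", "any"):
        # 'any' makes no guarantee; this model happens to keep the first row.
        first: dict[tuple, int] = {}
        for i, key in enumerate(keys):
            first.setdefault(key, i)
        kept = sorted(first.values())
    elif keep == "last":
        # Later rows overwrite earlier ones, so each key maps to its last row.
        last = {key: i for i, key in enumerate(keys)}
        kept = sorted(last.values())
    elif keep == "none":
        # Drop every row whose key appears more than once.
        counts = Counter(keys)
        kept = [i for i, key in enumerate(keys) if counts[key] == 1]
    else:
        raise ValueError(f"unknown keep strategy: {keep!r}")
    # Kept rows stay in their original relative order.
    return {name: [col[i] for i in kept] for name, col in frame.items()}


data = {"foo": [1, 2, 3, 1], "bar": ["a"] * 4, "ham": ["b"] * 4}
print(unique_model(data, ["bar", "ham"], keep="first")["foo"])  # [1]
print(unique_model(data, "foo", keep="last")["foo"])  # [2, 3, 1]
print(unique_model(data, "foo", keep="none")["foo"])  # [2, 3]
```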

unpivot(on=None, *, index=None, variable_name=None, value_name=None)

Unpivot a LazyFrame from wide to long format.

Identifier columns (index) can optionally be kept in the output.

This function is useful to massage a LazyFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.

Parameters:

Name Type Description Default
on str | list[str] | None

Column(s) to use as value variables; if on is empty, all columns that are not in index will be used.

None
index str | list[str] | None

Column(s) to use as identifier variables.

None
variable_name str | None

Name to give to the variable column. Defaults to "variable".

None
value_name str | None

Name to give to the value column. Defaults to "value".

None

Returns:

Type Description
Self

The unpivoted LazyFrame.

Notes

If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.

Examples:

>>> import narwhals as nw
>>> import polars as pl
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
...     "a": ["x", "y", "z"],
...     "b": [1, 3, 5],
...     "c": [2, 4, 6],
... }
>>> lf_pl = pl.LazyFrame(data)

We define a library agnostic function:

>>> def agnostic_unpivot(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return (
...         df.unpivot(on=["b", "c"], index="a").sort(["variable", "a"])
...     ).to_native()
>>> agnostic_unpivot(lf_pl).collect()
shape: (6, 3)
┌─────┬──────────┬───────┐
│ a   ┆ variable ┆ value │
│ --- ┆ ---      ┆ ---   │
│ str ┆ str      ┆ i64   │
╞═════╪══════════╪═══════╡
│ x   ┆ b        ┆ 1     │
│ y   ┆ b        ┆ 3     │
│ z   ┆ b        ┆ 5     │
│ x   ┆ c        ┆ 2     │
│ y   ┆ c        ┆ 4     │
│ z   ┆ c        ┆ 6     │
└─────┴──────────┴───────┘
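The wide-to-long reshaping described for unpivot can be sketched in plain Python. unpivot_model below is a hypothetical helper operating on a dict of lists. It emits one block of rows per unpivoted column, which is an assumption about row order that Narwhals does not promise (hence the explicit sort in the example above).

```python
from __future__ import annotations


def unpivot_model(
    frame: dict[str, list],
    on: str | list[str] | None = None,
    index: str | list[str] | None = None,
    variable_name: str = "variable",
    value_name: str = "value",
) -> dict[str, list]:
    """Hypothetical wide-to-long reshape on a dict of equal-length lists."""
    index = [index] if isinstance(index, str) else list(index or [])
    on = [on] if isinstance(on, str) else list(on or [])
    if not on:
        # If on is empty, every column not in index is unpivoted.
        on = [c for c in frame if c not in index]
    n_rows = len(next(iter(frame.values()), []))
    out: dict[str, list] = {name: [] for name in [*index, variable_name, value_name]}
    for column in on:  # one block of rows per unpivoted column
        for i in range(n_rows):
            for name in index:
                out[name].append(frame[name][i])  # identifiers are repeated
            out[variable_name].append(column)
            out[value_name].append(frame[column][i])
    return out


data = {"a": ["x", "y", "z"], "b": [1, 3, 5], "c": [2, 4, 6]}
long = unpivot_model(data, on=["b", "c"], index="a")
print(long["variable"])  # ['b', 'b', 'b', 'c', 'c', 'c']
print(long["value"])  # [1, 3, 5, 2, 4, 6]
```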

with_columns(*exprs, **named_exprs)

Add columns to this LazyFrame.

Added columns will replace existing columns with the same name.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

()
**named_exprs IntoExpr

Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.

{}

Returns:

Name Type Description
LazyFrame Self

A new LazyFrame with the columns added.

Note

Creating a new LazyFrame using this method does not create a new copy of existing data.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> df = {
...     "a": [1, 2, 3, 4],
...     "b": [0.5, 4, 10, 13],
...     "c": [True, True, False, True],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)

Let's define a dataframe-agnostic function in which we pass an expression to add it as a new column:

>>> def agnostic_with_columns(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.with_columns((nw.col("a") * 2).alias("2a")).to_native()

We can then pass either pandas or Polars to agnostic_with_columns:

>>> agnostic_with_columns(df_pd)
   a     b      c  2a
0  1   0.5   True   2
1  2   4.0   True   4
2  3  10.0  False   6
3  4  13.0   True   8
>>> agnostic_with_columns(df_pl)
shape: (4, 4)
┌─────┬──────┬───────┬─────┐
│ a   ┆ b    ┆ c     ┆ 2a  │
│ --- ┆ ---  ┆ ---   ┆ --- │
│ i64 ┆ f64  ┆ bool  ┆ i64 │
╞═════╪══════╪═══════╪═════╡
│ 1   ┆ 0.5  ┆ true  ┆ 2   │
│ 2   ┆ 4.0  ┆ true  ┆ 4   │
│ 3   ┆ 10.0 ┆ false ┆ 6   │
│ 4   ┆ 13.0 ┆ true  ┆ 8   │
└─────┴──────┴───────┴─────┘
>>> agnostic_with_columns(lf_pl).collect()
shape: (4, 4)
┌─────┬──────┬───────┬─────┐
│ a   ┆ b    ┆ c     ┆ 2a  │
│ --- ┆ ---  ┆ ---   ┆ --- │
│ i64 ┆ f64  ┆ bool  ┆ i64 │
╞═════╪══════╪═══════╪═════╡
│ 1   ┆ 0.5  ┆ true  ┆ 2   │
│ 2   ┆ 4.0  ┆ true  ┆ 4   │
│ 3   ┆ 10.0 ┆ false ┆ 6   │
│ 4   ┆ 13.0 ┆ true  ┆ 8   │
└─────┴──────┴───────┴─────┘
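The replace-or-append rule, and the Note about not copying existing data, can be illustrated with a dict-of-lists model. with_columns_model is a hypothetical helper and Python callables stand in for expressions. This is a sketch of the documented behaviour, not the Narwhals implementation.

```python
from __future__ import annotations

from typing import Callable


def with_columns_model(
    frame: dict[str, list], **named: Callable[[dict[str, list]], list]
) -> dict[str, list]:
    """Hypothetical model of with_columns on a dict of equal-length lists."""
    # Shallow copy: untouched columns are shared with the input, not duplicated.
    out = dict(frame)
    for name, compute in named.items():
        # Each new column is computed from the input frame, so one keyword
        # argument cannot see another's result. Assigning to an existing key
        # replaces it in place; a new key is appended at the end.
        out[name] = compute(frame)
    return out


data = {"a": [1, 2, 3, 4], "b": [0.5, 4, 10, 13]}
result = with_columns_model(
    data,
    a2=lambda f: [x * 2 for x in f["a"]],  # new column, appended at the end
    b=lambda f: [x + 1 for x in f["b"]],  # existing name, replaced in place
)
print(list(result))  # ['a', 'b', 'a2']
print(result["b"])  # [1.5, 5, 11, 14]
print(result["a"] is data["a"])  # True
```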

with_row_index(name='index')

Insert a column which enumerates rows, starting from 0.

Parameters:

Name Type Description Default
name str

The name of the column as a string. The default is "index".

'index'

Returns:

Type Description
Self

A new LazyFrame with the row index inserted as the first column.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_with_row_index(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.with_row_index().to_native()

We can then pass either pandas or Polars:

>>> agnostic_with_row_index(df_pd)
   index  a  b
0      0  1  4
1      1  2  5
2      2  3  6
>>> agnostic_with_row_index(lf_pl).collect()
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- │
│ u32   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0     ┆ 1   ┆ 4   │
│ 1     ┆ 2   ┆ 5   │
│ 2     ┆ 3   ┆ 6   │
└───────┴─────┴─────┘
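A dict-of-lists sketch of the same idea: a 0-based counter column placed before the existing columns. with_row_index_model is hypothetical, and raising on a clashing column name is an assumption of this sketch, not documented Narwhals behaviour.

```python
from __future__ import annotations


def with_row_index_model(
    frame: dict[str, list], name: str = "index"
) -> dict[str, list]:
    """Hypothetical model: prepend a 0-based row counter to a dict of lists."""
    if name in frame:
        # Assumption: refuse to silently overwrite an existing column.
        raise ValueError(f"column {name!r} already exists")
    n_rows = len(next(iter(frame.values()), []))
    # The index column comes first, followed by the original columns.
    return {name: list(range(n_rows)), **frame}


print(with_row_index_model({"a": [1, 2, 3], "b": [4, 5, 6]}))
# {'index': [0, 1, 2], 'a': [1, 2, 3], 'b': [4, 5, 6]}
```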