Skip to content

narwhals.LazyFrame

Narwhals DataFrame, backed by a native dataframe.

The native dataframe might be pandas.DataFrame, polars.LazyFrame, ...

This class is not meant to be instantiated directly - instead, use narwhals.from_native.

columns: list[str] property

Get column names.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.columns

We can then pass either pandas or Polars to func:

>>> func(df_pd)
['foo', 'bar', 'ham']
>>> func(lf_pl)
['foo', 'bar', 'ham']

schema: Schema property

Get an ordered mapping of column names to their data type.

Examples:

>>> import polars as pl
>>> import narwhals as nw
>>> lf_pl = pl.LazyFrame(
...     {
...         "foo": [1, 2, 3],
...         "bar": [6.0, 7.0, 8.0],
...         "ham": ["a", "b", "c"],
...     }
... )
>>> lf = nw.from_native(lf_pl)
>>> lf.schema
Schema({'foo': Int64, 'bar': Float64, 'ham', String})

clone()

Create a copy of this DataFrame.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function in which we copy the DataFrame:

>>> @nw.narwhalify
... def func(df):
...     return df.clone()
>>> func(df_pd)
   a  b
0  1  3
1  2  4
>>> func(df_pl).collect()
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘

collect()

Materialize this LazyFrame into a DataFrame.

Returns:

Type Description
DataFrame[Any]

DataFrame

Examples:

>>> import narwhals as nw
>>> import polars as pl
>>> lf_pl = pl.LazyFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [1, 2, 3, 4, 5, 6],
...         "c": [6, 5, 4, 3, 2, 1],
...     }
... )
>>> lf = nw.from_native(lf_pl)
>>> lf
┌───────────────────────────────────────┐
| Narwhals LazyFrame                    |
| Use `.to_native` to see native output |
└───────────────────────────────────────┘
>>> df = lf.group_by("a").agg(nw.all().sum()).collect()
>>> df.to_native().sort("a")
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 4   ┆ 10  │
│ b   ┆ 11  ┆ 10  │
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘

collect_schema()

Get an ordered mapping of column names to their data type.

Examples:

>>> import polars as pl
>>> import narwhals as nw
>>> lf_pl = pl.LazyFrame(
...     {
...         "foo": [1, 2, 3],
...         "bar": [6.0, 7.0, 8.0],
...         "ham": ["a", "b", "c"],
...     }
... )
>>> lf = nw.from_native(lf_pl)
>>> lf.collect_schema()
Schema({'foo': Int64, 'bar': Float64, 'ham': String})

drop(*columns, strict=True)

Remove columns from the LazyFrame.

Parameters:

Name Type Description Default
*columns str | Iterable[str]

Names of the columns that should be removed from the dataframe.

()
strict bool

Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema.

True
Warning

strict argument is ignored for polars<1.0.0.

Please consider upgrading to a newer version or pass to eager mode.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.drop("ham")

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo  bar
0    1  6.0
1    2  7.0
2    3  8.0
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 6.0 │
│ 2   ┆ 7.0 │
│ 3   ┆ 8.0 │
└─────┴─────┘

Use positional arguments to drop multiple columns.

>>> @nw.narwhalify
... def func(df):
...     return df.drop("foo", "ham")
>>> func(df_pd)
   bar
0  6.0
1  7.0
2  8.0
>>> func(lf_pl).collect()
shape: (3, 1)
┌─────┐
│ bar │
│ --- │
│ f64 │
╞═════╡
│ 6.0 │
│ 7.0 │
│ 8.0 │
└─────┘

drop_nulls(subset=None)

Drop null values.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) for which null values are considered. If set to None (default), use all columns.

None
Notes

pandas and Polars handle null values differently. Polars distinguishes between NaN and Null, whereas pandas doesn't.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": [1.0, 2.0, None], "ba": [1.0, None, 2.0]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.drop_nulls()

We can then pass either pandas or Polars:

>>> func(df_pd)
     a   ba
0  1.0  1.0
>>> func(df_pl).collect()
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ ba  │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 1.0 │
└─────┴─────┘

filter(*predicates)

Filter the rows in the LazyFrame based on a predicate expression.

The original order of the remaining rows is preserved.

Parameters:

Name Type Description Default
*predicates IntoExpr | Iterable[IntoExpr] | list[bool]

Expression that evaluates to a boolean Series. Can also be a (single!) boolean list.

()

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6, 7, 8],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function in which we filter on one condition.

>>> @nw.narwhalify
... def func(df):
...     return df.filter(nw.col("foo") > 1)

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo  bar ham
1    2    7   b
2    3    8   c
>>> func(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

Filter on multiple conditions:

>>> @nw.narwhalify
... def func(df):
...     return df.filter((nw.col("foo") < 3) & (nw.col("ham") == "a"))
>>> func(df_pd)
   foo  bar ham
0    1    6   a
>>> func(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘

Provide multiple filters using *args syntax:

>>> @nw.narwhalify
... def func(df):
...     dframe = df.filter(
...         nw.col("foo") == 1,
...         nw.col("ham") == "a",
...     )
...     return dframe
>>> func(df_pd)
   foo  bar ham
0    1    6   a
>>> func(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘

Filter on an OR condition:

>>> @nw.narwhalify
... def func(df):
...     return df.filter((nw.col("foo") == 1) | (nw.col("ham") == "c"))
>>> func(df_pd)
   foo  bar ham
0    1    6   a
2    3    8   c
>>> func(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

gather_every(n, offset=0)

Take every nth row in the DataFrame and return as a new DataFrame.

Parameters:

Name Type Description Default
n int

Gather every n-th row.

required
offset int

Starting index.

0

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function in which gather every 2 rows, starting from a offset of 1:

>>> @nw.narwhalify
... def func(df):
...     return df.gather_every(n=2, offset=1)
>>> func(df_pd)
   a  b
1  2  6
3  4  8
>>> func(lf_pl).collect()
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2   ┆ 6   │
│ 4   ┆ 8   │
└─────┴─────┘

group_by(*keys, drop_null_keys=False)

Start a group by operation.

Parameters:

Name Type Description Default
*keys str | Iterable[str]

Column(s) to group by. Accepts expression input. Strings are parsed as column names.

()
drop_null_keys bool

if True, then groups where any key is null won't be included in the result.

False

Examples:

Group by one column and call agg to compute the grouped sum of another column.

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {
...     "a": ["a", "b", "a", "b", "c"],
...     "b": [1, 2, 1, 3, 3],
...     "c": [5, 4, 3, 2, 1],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)

Let's define a dataframe-agnostic function in which we group by one column and call agg to compute the grouped sum of another column.

>>> @nw.narwhalify
... def func(df):
...     return df.group_by("a").agg(nw.col("b").sum()).sort("a")

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   a  b
0  a  2
1  b  5
2  c  3
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 2   │
│ b   ┆ 5   │
│ c   ┆ 3   │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 2   │
│ b   ┆ 5   │
│ c   ┆ 3   │
└─────┴─────┘

Group by multiple columns by passing a list of column names.

>>> @nw.narwhalify
... def func(df):
...     return df.group_by(["a", "b"]).agg(nw.max("c")).sort(["a", "b"])
>>> func(df_pd)
   a  b  c
0  a  1  5
1  b  2  4
2  b  3  2
3  c  3  1
>>> func(df_pl)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ b   ┆ 2   ┆ 4   │
│ b   ┆ 3   ┆ 2   │
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ b   ┆ 2   ┆ 4   │
│ b   ┆ 3   ┆ 2   │
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘

head(n=5)

Get the first n rows.

Parameters:

Name Type Description Default
n int

Number of rows to return.

5

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
...     "a": [1, 2, 3, 4, 5, 6],
...     "b": [7, 8, 9, 10, 11, 12],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function that gets the first 3 rows.

>>> @nw.narwhalify
... def func(df):
...     return df.head(3)

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   a  b
0  1  7
1  2  8
2  3  9
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘

join(other, on=None, how='inner', *, left_on=None, right_on=None, suffix='_right')

Add a join operation to the Logical Plan.

Parameters:

Name Type Description Default
other Self

Lazy DataFrame to join with.

required
on str | list[str] | None

Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None.

None
how Literal['inner', 'left', 'cross', 'semi', 'anti']

Join strategy.

  • inner: Returns rows that have matching values in both tables.
  • left: Returns all rows from the left table, and the matched rows from the right table.
  • cross: Returns the Cartesian product of rows from both tables.
  • semi: Filter rows that have a match in the right table.
  • anti: Filter rows that do not have a match in the right table.
'inner'
left_on str | list[str] | None

Join column of the left DataFrame.

None
right_on str | list[str] | None

Join column of the right DataFrame.

None
suffix str

Suffix to append to columns with a duplicate name.

'_right'

Returns:

Type Description
Self

A new joined LazyFrame

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6.0, 7.0, 8.0],
...     "ham": ["a", "b", "c"],
... }
>>> data_other = {
...     "apple": ["x", "y", "z"],
...     "ham": ["a", "b", "d"],
... }
>>> df_pd = pd.DataFrame(data)
>>> other_pd = pd.DataFrame(data_other)
>>> df_pl = pl.LazyFrame(data)
>>> other_pl = pl.LazyFrame(data_other)

Let's define a dataframe-agnostic function in which we join over "ham" column:

>>> @nw.narwhalify
... def join_on_ham(df, other_any):
...     return df.join(other_any, left_on="ham", right_on="ham")

We can now pass either pandas or Polars to the function:

>>> join_on_ham(df_pd, other_pd)
   foo  bar ham apple
0    1  6.0   a     x
1    2  7.0   b     y
>>> join_on_ham(df_pl, other_pl).collect()
shape: (2, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ str ┆ str   │
╞═════╪═════╪═════╪═══════╡
│ 1   ┆ 6.0 ┆ a   ┆ x     │
│ 2   ┆ 7.0 ┆ b   ┆ y     │
└─────┴─────┴─────┴───────┘

join_asof(other, *, left_on=None, right_on=None, on=None, by_left=None, by_right=None, by=None, strategy='backward')

Perform an asof join.

This is similar to a left-join except that we match on nearest key rather than equal keys.

Both DataFrames must be sorted by the asof_join key.

Parameters:

Name Type Description Default
other Self

DataFrame to join with.

required
left_on str | None

Name(s) of the left join column(s).

None
right_on str | None

Name(s) of the right join column(s).

None
on str | None

Join column of both DataFrames. If set, left_on and right_on should be None.

None
by_left str | list[str] | None

join on these columns before doing asof join

None
by_right str | list[str] | None

join on these columns before doing asof join

None
by str | list[str] | None

join on these columns before doing asof join

None
strategy Literal['backward', 'forward', 'nearest']

Join strategy. The default is "backward".

  • backward: selects the last row in the right DataFrame whose "on" key is less than or equal to the left's key.
  • forward: selects the first row in the right DataFrame whose "on" key is greater than or equal to the left's key.
  • nearest: search selects the last row in the right DataFrame whose value is nearest to the left's key.
'backward'

Returns:

Type Description
Self

A new joined DataFrame

Examples:

>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data_gdp = {
...     "datetime": [
...         datetime(2016, 1, 1),
...         datetime(2017, 1, 1),
...         datetime(2018, 1, 1),
...         datetime(2019, 1, 1),
...         datetime(2020, 1, 1),
...     ],
...     "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
...     "datetime": [
...         datetime(2016, 3, 1),
...         datetime(2018, 8, 1),
...         datetime(2019, 1, 1),
...     ],
...     "population": [82.19, 82.66, 83.12],
... }
>>> gdp_pd = pd.DataFrame(data_gdp)
>>> population_pd = pd.DataFrame(data_population)
>>> gdp_pl = pl.LazyFrame(data_gdp).sort("datetime")
>>> population_pl = pl.LazyFrame(data_population).sort("datetime")

Let's define a dataframe-agnostic function in which we join over "datetime" column:

>>> @nw.narwhalify
... def join_asof_datetime(df, other_any, strategy):
...     return df.join_asof(other_any, on="datetime", strategy=strategy)

We can now pass either pandas or Polars to the function:

>>> join_asof_datetime(population_pd, gdp_pd, strategy="backward")
    datetime  population   gdp
0 2016-03-01       82.19  4164
1 2018-08-01       82.66  4566
2 2019-01-01       83.12  4696
>>> join_asof_datetime(population_pl, gdp_pl, strategy="backward").collect()
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime            ┆ population ┆ gdp  │
│ ---                 ┆ ---        ┆ ---  │
│ datetime[μs]        ┆ f64        ┆ i64  │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19      ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66      ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12      ┆ 4696 │
└─────────────────────┴────────────┴──────┘

Here is a real-world times-series example that uses by argument.

>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data_quotes = {
...     "datetime": [
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 30),
...         datetime(2016, 5, 25, 13, 30, 0, 41),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 49),
...         datetime(2016, 5, 25, 13, 30, 0, 72),
...         datetime(2016, 5, 25, 13, 30, 0, 75),
...     ],
...     "ticker": [
...         "GOOG",
...         "MSFT",
...         "MSFT",
...         "MSFT",
...         "GOOG",
...         "AAPL",
...         "GOOG",
...         "MSFT",
...     ],
...     "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
...     "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
... }
>>> data_trades = {
...     "datetime": [
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 38),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...     ],
...     "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
...     "price": [51.95, 51.95, 720.77, 720.92, 98.0],
...     "quantity": [75, 155, 100, 100, 100],
... }
>>> quotes_pd = pd.DataFrame(data_quotes)
>>> trades_pd = pd.DataFrame(data_trades)
>>> quotes_pl = pl.LazyFrame(data_quotes).sort("datetime")
>>> trades_pl = pl.LazyFrame(data_trades).sort("datetime")

Let's define a dataframe-agnostic function in which we join over "datetime" and by "ticker" columns:

>>> @nw.narwhalify
... def join_asof_datetime_by_ticker(df, other_any):
...     return df.join_asof(other_any, on="datetime", by="ticker")

We can now pass either pandas or Polars to the function:

>>> join_asof_datetime_by_ticker(trades_pd, quotes_pd)
                    datetime ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.000023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.000038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.000048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.000048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.000048   AAPL   98.00       100     NaN     NaN
>>> join_asof_datetime_by_ticker(trades_pl, quotes_pl).collect()
shape: (5, 6)
┌────────────────────────────┬────────┬────────┬──────────┬───────┬────────┐
│ datetime                   ┆ ticker ┆ price  ┆ quantity ┆ bid   ┆ ask    │
│ ---                        ┆ ---    ┆ ---    ┆ ---      ┆ ---   ┆ ---    │
│ datetime[μs]               ┆ str    ┆ f64    ┆ i64      ┆ f64   ┆ f64    │
╞════════════════════════════╪════════╪════════╪══════════╪═══════╪════════╡
│ 2016-05-25 13:30:00.000023 ┆ MSFT   ┆ 51.95  ┆ 75       ┆ 51.95 ┆ 51.96  │
│ 2016-05-25 13:30:00.000038 ┆ MSFT   ┆ 51.95  ┆ 155      ┆ 51.97 ┆ 51.98  │
│ 2016-05-25 13:30:00.000048 ┆ GOOG   ┆ 720.77 ┆ 100      ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ GOOG   ┆ 720.92 ┆ 100      ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ AAPL   ┆ 98.0   ┆ 100      ┆ null  ┆ null   │
└────────────────────────────┴────────┴────────┴──────────┴───────┴────────┘

lazy()

Lazify the DataFrame (if possible).

If a library does not support lazy execution, then this is a no-op.

Examples:

Construct pandas and Polars objects:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.LazyFrame(df)

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.lazy()

Note that then, pandas dataframe stay eager, and the Polars LazyFrame stays lazy:

>>> func(df_pd)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> func(df_pl)
<LazyFrame ...>

pipe(function, *args, **kwargs)

Pipe function call.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": [1, 2, 3], "ba": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.pipe(lambda _df: _df.select("a"))

We can then pass either pandas or Polars:

>>> func(df_pd)
   a
0  1
1  2
2  3
>>> func(df_pl).collect()
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

rename(mapping)

Rename column names.

Parameters:

Name Type Description Default
mapping dict[str, str]

Key value pairs that map from old name to new name, or a function that takes the old name as input and returns the new name.

required

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.rename({"foo": "apple"})

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   apple  bar ham
0      1    6   a
1      2    7   b
2      3    8   c
>>> func(lf_pl).collect()
shape: (3, 3)
┌───────┬─────┬─────┐
│ apple ┆ bar ┆ ham │
│ ---   ┆ --- ┆ --- │
│ i64   ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 1     ┆ 6   ┆ a   │
│ 2     ┆ 7   ┆ b   │
│ 3     ┆ 8   ┆ c   │
└───────┴─────┴─────┘

select(*exprs, **named_exprs)

Select columns from this LazyFrame.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names.

()
**named_exprs IntoExpr

Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.

{}
Notes

If you'd like to select a column whose name isn't a string (for example, if you're working with pandas) then you should explicitly use nw.col instead of just passing the column name. For example, to select a column named 0 use df.select(nw.col(0)), not df.select(0).

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {
...     "foo": [1, 2, 3],
...     "bar": [6, 7, 8],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)

Let's define a dataframe-agnostic function in which we pass the name of a column to select that column.

>>> @nw.narwhalify
... def func(df):
...     return df.select("foo")

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo
0    1
1    2
2    3
>>> func(df_pl)
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘
>>> func(lf_pl).collect()
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

Multiple columns can be selected by passing a list of column names.

>>> @nw.narwhalify
... def func(df):
...     return df.select(["foo", "bar"])
>>> func(df_pd)
   foo  bar
0    1    6
1    2    7
2    3    8
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 6   │
│ 2   ┆ 7   │
│ 3   ┆ 8   │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 6   │
│ 2   ┆ 7   │
│ 3   ┆ 8   │
└─────┴─────┘

Multiple columns can also be selected using positional arguments instead of a list. Expressions are also accepted.

>>> @nw.narwhalify
... def func(df):
...     return df.select(nw.col("foo"), nw.col("bar") + 1)
>>> func(df_pd)
   foo  bar
0    1    7
1    2    8
2    3    9
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘

Use keyword arguments to easily name your expression inputs.

>>> @nw.narwhalify
... def func(df):
...     return df.select(threshold=nw.col("foo") * 2)
>>> func(df_pd)
   threshold
0          2
1          4
2          6
>>> func(df_pl)
shape: (3, 1)
┌───────────┐
│ threshold │
│ ---       │
│ i64       │
╞═══════════╡
│ 2         │
│ 4         │
│ 6         │
└───────────┘
>>> func(lf_pl).collect()
shape: (3, 1)
┌───────────┐
│ threshold │
│ ---       │
│ i64       │
╞═══════════╡
│ 2         │
│ 4         │
│ 6         │
└───────────┘

sort(by, *more_by, descending=False, nulls_last=False)

Sort the LazyFrame by the given columns.

Parameters:

Name Type Description Default
by str | Iterable[str]

Column(s) names to sort by.

required
*more_by str

Additional columns to sort by, specified as positional arguments.

()
descending bool | Sequence[bool]

Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.

False
nulls_last bool

Place null values last; can specify a single boolean applying to all columns or a sequence of booleans for per-column control.

False
Warning

Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
...     "a": [1, 2, None],
...     "b": [6.0, 5.0, 4.0],
...     "c": ["a", "c", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_lf = pl.LazyFrame(data)

Let's define a dataframe-agnostic function in which we sort by multiple columns in different orders

>>> @nw.narwhalify
... def func(df):
...     return df.sort("c", "a", descending=[False, True])

We can then pass either pandas or Polars to func:

>>> func(df_pd)
     a    b  c
0  1.0  6.0  a
2  NaN  4.0  b
1  2.0  5.0  c
>>> func(df_lf).collect()
shape: (3, 3)
┌──────┬─────┬─────┐
│ a    ┆ b   ┆ c   │
│ ---  ┆ --- ┆ --- │
│ i64  ┆ f64 ┆ str │
╞══════╪═════╪═════╡
│ 1    ┆ 6.0 ┆ a   │
│ null ┆ 4.0 ┆ b   │
│ 2    ┆ 5.0 ┆ c   │
└──────┴─────┴─────┘

tail(n=5)

Get the last n rows.

Parameters:

Name Type Description Default
n int

Number of rows to return.

5

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
...     "a": [1, 2, 3, 4, 5, 6],
...     "b": [7, 8, 9, 10, 11, 12],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function that gets the last 3 rows.

>>> @nw.narwhalify
... def func(df):
...     return df.tail(3)

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   a   b
3  4  10
4  5  11
5  6  12
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 4   ┆ 10  │
│ 5   ┆ 11  │
│ 6   ┆ 12  │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 4   ┆ 10  │
│ 5   ┆ 11  │
│ 6   ┆ 12  │
└─────┴─────┘

to_native()

Convert Narwhals LazyFrame to native one.

Returns:

Type Description
FrameT

Object of class that user started with.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)
>>> df_pa = pa.table(data)

Calling to_native on a Narwhals DataFrame returns the native object:

>>> nw.from_native(df_pd).lazy().to_native()
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> nw.from_native(df_pl).to_native().collect()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘

unique(subset=None, *, keep='any', maintain_order=False)

Drop duplicate rows from this LazyFrame.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) to consider when identifying duplicate rows. If set to None, use all columns.

None
keep Literal['any', 'first', 'last', 'none']

{'first', 'last', 'any', 'none'} Which of the duplicate rows to keep.

  • 'any': Does not give any guarantee of which row is kept. This allows more optimizations.
  • 'none': Don't keep duplicate rows.
  • 'first': Keep first unique row.
  • 'last': Keep last unique row.
'any'
maintain_order bool

Keep the same order as the original DataFrame. This may be more expensive to compute. Settings this to True blocks the possibility to run on the streaming engine for Polars.

False

Returns:

Name Type Description
LazyFrame Self

LazyFrame with unique rows.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {
...     "foo": [1, 2, 3, 1],
...     "bar": ["a", "a", "a", "a"],
...     "ham": ["b", "b", "b", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.unique(["bar", "ham"])

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo bar ham
0    1   a   b
>>> func(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘

unpivot(on=None, *, index=None, variable_name=None, value_name=None)

Unpivot a DataFrame from wide to long format.

Optionally leaves identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis leaving just two non-identifier columns, 'variable' and 'value'.

Parameters:

Name Type Description Default
on str | list[str] | None

Column(s) to use as values variables; if on is empty all columns that are not in index will be used.

None
index str | list[str] | None

Column(s) to use as identifier variables.

None
variable_name str | None

Name to give to the variable column. Defaults to "variable".

None
value_name str | None

Name to give to the value column. Defaults to "value".

None
Notes

If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.

Examples:

>>> import narwhals as nw
>>> import polars as pl
>>> data = {
...     "a": ["x", "y", "z"],
...     "b": [1, 3, 5],
...     "c": [2, 4, 6],
... }

We define a library agnostic function:

>>> @nw.narwhalify
... def func(lf):
...     return (
...         lf.unpivot(on=["b", "c"], index="a").sort(["variable", "a"]).collect()
...     )
>>> func(pl.LazyFrame(data))
shape: (6, 3)
┌─────┬──────────┬───────┐
│ a   ┆ variable ┆ value │
│ --- ┆ ---      ┆ ---   │
│ str ┆ str      ┆ i64   │
╞═════╪══════════╪═══════╡
│ x   ┆ b        ┆ 1     │
│ y   ┆ b        ┆ 3     │
│ z   ┆ b        ┆ 5     │
│ x   ┆ c        ┆ 2     │
│ y   ┆ c        ┆ 4     │
│ z   ┆ c        ┆ 6     │
└─────┴──────────┴───────┘

with_columns(*exprs, **named_exprs)

Add columns to this LazyFrame.

Added columns will replace existing columns with the same name.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

()
**named_exprs IntoExpr

Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.

{}

Returns:

Name Type Description
LazyFrame Self

A new LazyFrame with the columns added.

Note

Creating a new LazyFrame using this method does not create a new copy of existing data.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {
...     "a": [1, 2, 3, 4],
...     "b": [0.5, 4, 10, 13],
...     "c": [True, True, False, True],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)

Let's define a dataframe-agnostic function in which we pass an expression to add it as a new column:

>>> @nw.narwhalify
... def func(df):
...     return df.with_columns((nw.col("a") * 2).alias("2a"))

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   a     b      c  2a
0  1   0.5   True   2
1  2   4.0   True   4
2  3  10.0  False   6
3  4  13.0   True   8
>>> func(df_pl)
shape: (4, 4)
┌─────┬──────┬───────┬─────┐
│ a   ┆ b    ┆ c     ┆ 2a  │
│ --- ┆ ---  ┆ ---   ┆ --- │
│ i64 ┆ f64  ┆ bool  ┆ i64 │
╞═════╪══════╪═══════╪═════╡
│ 1   ┆ 0.5  ┆ true  ┆ 2   │
│ 2   ┆ 4.0  ┆ true  ┆ 4   │
│ 3   ┆ 10.0 ┆ false ┆ 6   │
│ 4   ┆ 13.0 ┆ true  ┆ 8   │
└─────┴──────┴───────┴─────┘
>>> func(lf_pl).collect()
shape: (4, 4)
┌─────┬──────┬───────┬─────┐
│ a   ┆ b    ┆ c     ┆ 2a  │
│ --- ┆ ---  ┆ ---   ┆ --- │
│ i64 ┆ f64  ┆ bool  ┆ i64 │
╞═════╪══════╪═══════╪═════╡
│ 1   ┆ 0.5  ┆ true  ┆ 2   │
│ 2   ┆ 4.0  ┆ true  ┆ 4   │
│ 3   ┆ 10.0 ┆ false ┆ 6   │
│ 4   ┆ 13.0 ┆ true  ┆ 8   │
└─────┴──────┴───────┴─────┘

with_row_index(name='index')

Insert column which enumerates rows.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)

Let's define a dataframe-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.with_row_index()

We can then pass either pandas or Polars:

>>> func(df_pd)
   index  a  b
0      0  1  4
1      1  2  5
2      2  3  6
>>> func(df_pl).collect()
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- │
│ u32   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0     ┆ 1   ┆ 4   │
│ 1     ┆ 2   ┆ 5   │
│ 2     ┆ 3   ┆ 6   │
└───────┴─────┴─────┘