narwhals.LazyFrame
Narwhals LazyFrame, backed by a native dataframe.
The native dataframe might be pandas.DataFrame, polars.LazyFrame, ...
This class is not meant to be instantiated directly. Instead, use narwhals.from_native.
columns: list[str]
property
Get column names.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)
We define a library agnostic function:
>>> @nw.narwhalify
... def func(df):
...     return df.columns
We can then pass either pandas or Polars to func:
>>> func(df_pd)
['foo', 'bar', 'ham']
>>> func(lf_pl)
['foo', 'bar', 'ham']
schema: Schema
property
Get an ordered mapping of column names to their data type.
Examples:
>>> import polars as pl
>>> import narwhals as nw
>>> lf_pl = pl.LazyFrame(
... {
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ["a", "b", "c"],
... }
... )
>>> lf = nw.from_native(lf_pl)
>>> lf.schema
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
clone()
Create a copy of this DataFrame.
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)
Let's define a dataframe-agnostic function in which we copy the DataFrame:
>>> @nw.narwhalify
... def func(df):
...     return df.clone()
>>> func(df_pd)
a b
0 1 3
1 2 4
>>> func(df_pl).collect()
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 3 │
│ 2 ┆ 4 │
└─────┴─────┘
collect()
Materialize this LazyFrame into a DataFrame.
Returns:
Type | Description |
---|---|
DataFrame[Any] | DataFrame |
Examples:
>>> import narwhals as nw
>>> import polars as pl
>>> lf_pl = pl.LazyFrame(
... {
... "a": ["a", "b", "a", "b", "b", "c"],
... "b": [1, 2, 3, 4, 5, 6],
... "c": [6, 5, 4, 3, 2, 1],
... }
... )
>>> lf = nw.from_native(lf_pl)
>>> lf
┌───────────────────────────────────────┐
| Narwhals LazyFrame |
| Use `.to_native` to see native output |
└───────────────────────────────────────┘
>>> df = lf.group_by("a").agg(nw.all().sum()).collect()
>>> df.to_native().sort("a")
shape: (3, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a ┆ 4 ┆ 10 │
│ b ┆ 11 ┆ 10 │
│ c ┆ 6 ┆ 1 │
└─────┴─────┴─────┘
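To make the idea of materialization concrete, here is a minimal sketch of deferred execution in plain Python. The TinyLazy class and its methods are hypothetical and are not part of Narwhals or Polars; they only illustrate that operations are recorded first and run once collect() is called.

```python
class TinyLazy:
    """Hypothetical toy lazy frame: rows are dicts, steps are deferred."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Record the step; nothing is computed yet.
        return TinyLazy(self._rows, self._steps + (("filter", predicate),))

    def select(self, *names):
        # Record the step; nothing is computed yet.
        return TinyLazy(self._rows, self._steps + (("select", names),))

    def collect(self):
        # Run every recorded step, in order, and return concrete rows.
        rows = list(self._rows)
        for kind, arg in self._steps:
            if kind == "filter":
                rows = [row for row in rows if arg(row)]
            else:  # "select"
                rows = [{name: row[name] for name in arg} for row in rows]
        return rows


lazy = TinyLazy([{"a": 1, "b": 6}, {"a": 2, "b": 5}, {"a": 3, "b": 4}])
plan = lazy.filter(lambda row: row["a"] > 1).select("b")  # still lazy
result = plan.collect()  # the work happens here
```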
collect_schema()
Get an ordered mapping of column names to their data type.
Examples:
>>> import polars as pl
>>> import narwhals as nw
>>> lf_pl = pl.LazyFrame(
... {
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ["a", "b", "c"],
... }
... )
>>> lf = nw.from_native(lf_pl)
>>> lf.collect_schema()
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
drop(*columns, strict=True)
Remove columns from the LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*columns | str \| Iterable[str] | Names of the columns that should be removed from the dataframe. | () |
strict | bool | Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema. | True |
Warning
The strict argument is ignored for polars<1.0.0.
Please consider upgrading to a newer version or switching to eager mode.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)
We define a library agnostic function:
>>> @nw.narwhalify
... def func(df):
...     return df.drop("ham")
We can then pass either pandas or Polars to func:
>>> func(df_pd)
foo bar
0 1 6.0
1 2 7.0
2 3 8.0
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1 ┆ 6.0 │
│ 2 ┆ 7.0 │
│ 3 ┆ 8.0 │
└─────┴─────┘
Use positional arguments to drop multiple columns.
>>> @nw.narwhalify
... def func(df):
...     return df.drop("foo", "ham")
>>> func(df_pd)
bar
0 6.0
1 7.0
2 8.0
>>> func(lf_pl).collect()
shape: (3, 1)
┌─────┐
│ bar │
│ --- │
│ f64 │
╞═════╡
│ 6.0 │
│ 7.0 │
│ 8.0 │
└─────┘
drop_nulls(subset=None)
Drop null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset | str \| list[str] \| None | Column name(s) for which null values are considered. If set to None (default), use all columns. | None |
Notes
pandas and Polars handle null values differently. Polars distinguishes between NaN and Null, whereas pandas doesn't.
Examples:
>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": [1.0, 2.0, None], "ba": [1.0, None, 2.0]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)
Let's define a dataframe-agnostic function:
>>> @nw.narwhalify
... def func(df):
...     return df.drop_nulls()
We can then pass either pandas or Polars:
>>> func(df_pd)
a ba
0 1.0 1.0
>>> func(df_pl).collect()
shape: (1, 2)
┌─────┬─────┐
│ a ┆ ba │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 1.0 │
└─────┴─────┘
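The distinction drawn in the note above can be pictured in plain Python, where None plays the role of a null and float("nan") is an ordinary float value. This is only a sketch of the semantics, not how either library is implemented; the drop_nulls helper and its nan_is_null flag are hypothetical.

```python
import math


def drop_nulls(rows, subset=None, nan_is_null=False):
    """Keep rows whose checked columns hold no missing value.

    nan_is_null is a hypothetical flag used only in this sketch to mimic
    a library that does not separate NaN from null.
    """

    def is_missing(value):
        if value is None:
            return True
        return nan_is_null and isinstance(value, float) and math.isnan(value)

    kept = []
    for row in rows:
        columns = row.keys() if subset is None else subset
        if not any(is_missing(row[column]) for column in columns):
            kept.append(row)
    return kept


rows = [
    {"a": 1.0, "ba": 1.0},
    {"a": 2.0, "ba": None},
    {"a": float("nan"), "ba": 2.0},
]
null_only = drop_nulls(rows)  # NaN counts as a value, not a null
nan_as_null = drop_nulls(rows, nan_is_null=True)  # NaN treated as missing
```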
filter(*predicates)
Filter the rows in the LazyFrame based on a predicate expression.
The original order of the remaining rows is preserved.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*predicates | IntoExpr \| Iterable[IntoExpr] \| list[bool] | Expression that evaluates to a boolean Series. Can also be a (single!) boolean list. | () |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)
Let's define a dataframe-agnostic function in which we filter on one condition.
>>> @nw.narwhalify
... def func(df):
...     return df.filter(nw.col("foo") > 1)
We can then pass either pandas or Polars to func:
>>> func(df_pd)
foo bar ham
1 2 7 b
2 3 8 c
>>> func(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2 ┆ 7 ┆ b │
│ 3 ┆ 8 ┆ c │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2 ┆ 7 ┆ b │
│ 3 ┆ 8 ┆ c │
└─────┴─────┴─────┘
Filter on multiple conditions:
>>> @nw.narwhalify
... def func(df):
...     return df.filter((nw.col("foo") < 3) & (nw.col("ham") == "a"))
>>> func(df_pd)
foo bar ham
0 1 6 a
>>> func(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘
Provide multiple filters using *args syntax:
>>> @nw.narwhalify
... def func(df):
...     dframe = df.filter(
...         nw.col("foo") == 1,
...         nw.col("ham") == "a",
...     )
...     return dframe
>>> func(df_pd)
foo bar ham
0 1 6 a
>>> func(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘
Filter on an OR condition:
>>> @nw.narwhalify
... def func(df):
...     return df.filter((nw.col("foo") == 1) | (nw.col("ham") == "c"))
>>> func(df_pd)
foo bar ham
0 1 6 a
2 3 8 c
>>> func(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
│ 3 ┆ 8 ┆ c │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
│ 3 ┆ 8 ┆ c │
└─────┴─────┴─────┘
gather_every(n, offset=0)
Take every nth row in the DataFrame and return as a new DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Gather every n-th row. | required |
offset | int | Starting index. | 0 |
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)
Let's define a dataframe-agnostic function in which we gather every 2nd row, starting from an offset of 1:
>>> @nw.narwhalify
... def func(df):
...     return df.gather_every(n=2, offset=1)
>>> func(df_pd)
a b
1 2 6
3 4 8
>>> func(lf_pl).collect()
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2 ┆ 6 │
│ 4 ┆ 8 │
└─────┴─────┘
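The roles of n and offset can be sketched with ordinary list slicing. The gather_every function below is a hypothetical plain-Python stand-in that works on a list of rows; it is meant to show the selection rule only.

```python
def gather_every(rows, n, offset=0):
    """Return every n-th row, starting at position offset."""
    return rows[offset::n]


rows = [{"a": 1, "b": 5}, {"a": 2, "b": 6}, {"a": 3, "b": 7}, {"a": 4, "b": 8}]
picked = gather_every(rows, n=2, offset=1)
```

With n=2 and offset=1 the sketch keeps the rows at positions 1 and 3, which are the same rows shown in the example output above.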
group_by(*keys, drop_null_keys=False)
Start a group by operation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*keys | str \| Iterable[str] | Column(s) to group by. Accepts expression input. Strings are parsed as column names. | () |
drop_null_keys | bool | If True, then groups where any key is null won't be included in the result. | False |
Examples:
Group by one column and call agg to compute the grouped sum of another column.
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {
... "a": ["a", "b", "a", "b", "c"],
... "b": [1, 2, 1, 3, 3],
... "c": [5, 4, 3, 2, 1],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)
Let's define a dataframe-agnostic function in which we group by one column and call agg to compute the grouped sum of another column.
>>> @nw.narwhalify
... def func(df):
...     return df.group_by("a").agg(nw.col("b").sum()).sort("a")
We can then pass either pandas or Polars to func:
>>> func(df_pd)
a b
0 a 2
1 b 5
2 c 3
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a ┆ 2 │
│ b ┆ 5 │
│ c ┆ 3 │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a ┆ 2 │
│ b ┆ 5 │
│ c ┆ 3 │
└─────┴─────┘
Group by multiple columns by passing a list of column names.
>>> @nw.narwhalify
... def func(df):
...     return df.group_by(["a", "b"]).agg(nw.max("c")).sort(["a", "b"])
>>> func(df_pd)
a b c
0 a 1 5
1 b 2 4
2 b 3 2
3 c 3 1
>>> func(df_pl)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a ┆ 1 ┆ 5 │
│ b ┆ 2 ┆ 4 │
│ b ┆ 3 ┆ 2 │
│ c ┆ 3 ┆ 1 │
└─────┴─────┴─────┘
>>> func(lf_pl).collect()
shape: (4, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a ┆ 1 ┆ 5 │
│ b ┆ 2 ┆ 4 │
│ b ┆ 3 ┆ 2 │
│ c ┆ 3 ┆ 1 │
└─────┴─────┴─────┘
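The effect of drop_null_keys is not shown in the examples above, so here is a small plain-Python sketch of a grouped sum. The group_sum helper is hypothetical and uses None to stand in for a null key; it illustrates the documented behaviour rather than the library's implementation.

```python
def group_sum(rows, keys, value, drop_null_keys=False):
    """Sum the value column per distinct combination of key columns."""
    totals = {}
    for row in rows:
        group = tuple(row[key] for key in keys)
        if drop_null_keys and any(part is None for part in group):
            continue  # skip groups in which any key is null
        totals[group] = totals.get(group, 0) + row[value]
    return totals


rows = [
    {"a": "a", "b": 1},
    {"a": "b", "b": 2},
    {"a": "a", "b": 1},
    {"a": "b", "b": 3},
    {"a": None, "b": 3},
]
with_nulls = group_sum(rows, keys=["a"], value="b")
without_nulls = group_sum(rows, keys=["a"], value="b", drop_null_keys=True)
```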
head(n=5)
Get the first n rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Number of rows to return. | 5 |
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
... "a": [1, 2, 3, 4, 5, 6],
... "b": [7, 8, 9, 10, 11, 12],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)
Let's define a dataframe-agnostic function that gets the first 3 rows.
>>> @nw.narwhalify
... def func(df):
...     return df.head(3)
We can then pass either pandas or Polars to func:
>>> func(df_pd)
a b
0 1 7
1 2 8
2 3 9
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 7 │
│ 2 ┆ 8 │
│ 3 ┆ 9 │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 7 │
│ 2 ┆ 8 │
│ 3 ┆ 9 │
└─────┴─────┘
join(other, on=None, how='inner', *, left_on=None, right_on=None, suffix='_right')
Add a join operation to the Logical Plan.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Self | Lazy DataFrame to join with. | required |
on | str \| list[str] \| None | Name(s) of the join columns in both DataFrames. If set, | None |
how | Literal['inner', 'left', 'cross', 'semi', 'anti'] | Join strategy. | 'inner' |
left_on | str \| list[str] \| None | Join column of the left DataFrame. | None |
right_on | str \| list[str] \| None | Join column of the right DataFrame. | None |
suffix | str | Suffix to append to columns with a duplicate name. | '_right' |
Returns:
Type | Description |
---|---|
Self | A new joined LazyFrame |
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ["a", "b", "c"],
... }
>>> data_other = {
... "apple": ["x", "y", "z"],
... "ham": ["a", "b", "d"],
... }
>>> df_pd = pd.DataFrame(data)
>>> other_pd = pd.DataFrame(data_other)
>>> df_pl = pl.LazyFrame(data)
>>> other_pl = pl.LazyFrame(data_other)
Let's define a dataframe-agnostic function in which we join on the "ham" column:
>>> @nw.narwhalify
... def join_on_ham(df, other_any):
...     return df.join(other_any, left_on="ham", right_on="ham")
We can now pass either pandas or Polars to the function:
>>> join_on_ham(df_pd, other_pd)
foo bar ham apple
0 1 6.0 a x
1 2 7.0 b y
>>> join_on_ham(df_pl, other_pl).collect()
shape: (2, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ str │
╞═════╪═════╪═════╪═══════╡
│ 1 ┆ 6.0 ┆ a ┆ x │
│ 2 ┆ 7.0 ┆ b ┆ y │
└─────┴─────┴─────┴───────┘
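The five values accepted by how can be compared side by side with a plain-Python sketch. The join function below is hypothetical: it works on lists of dicts, joins on a single shared column, and ignores suffix handling, so it only illustrates which rows each strategy keeps.

```python
def join(left, right, on, how="inner"):
    """Join two lists of dict rows on one shared column (sketch only)."""
    if how == "cross":
        # Cartesian product: every left row paired with every right row.
        return [{**l_row, **r_row} for l_row in left for r_row in right]

    by_key = {}
    for r_row in right:
        by_key.setdefault(r_row[on], []).append(r_row)
    # Right-only columns, filled with None when a left row has no match.
    first_right = right[0] if right else {}
    empty_right = {name: None for name in first_right if name != on}

    out = []
    for l_row in left:
        found = by_key.get(l_row[on], [])
        if how == "inner":  # matching rows only, columns from both sides
            out.extend({**l_row, **r_row} for r_row in found)
        elif how == "left":  # every left row, matched or not
            out.extend({**l_row, **r_row} for r_row in found)
            if not found:
                out.append({**l_row, **empty_right})
        elif how == "semi":  # left rows that have a match, left columns only
            if found:
                out.append(dict(l_row))
        elif how == "anti":  # left rows that have no match
            if not found:
                out.append(dict(l_row))
        else:
            raise ValueError(f"unknown join strategy: {how!r}")
    return out


left = [{"foo": 1, "ham": "a"}, {"foo": 2, "ham": "b"}, {"foo": 3, "ham": "c"}]
right = [
    {"apple": "x", "ham": "a"},
    {"apple": "y", "ham": "b"},
    {"apple": "z", "ham": "d"},
]
inner = join(left, right, on="ham")
anti = join(left, right, on="ham", how="anti")
```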
join_asof(other, *, left_on=None, right_on=None, on=None, by_left=None, by_right=None, by=None, strategy='backward')
Perform an asof join.
This is similar to a left-join except that we match on nearest key rather than equal keys.
Both DataFrames must be sorted by the asof_join key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Self | DataFrame to join with. | required |
left_on | str \| None | Name(s) of the left join column(s). | None |
right_on | str \| None | Name(s) of the right join column(s). | None |
on | str \| None | Join column of both DataFrames. If set, left_on and right_on should be None. | None |
by_left | str \| list[str] \| None | Join on these columns before doing asof join. | None |
by_right | str \| list[str] \| None | Join on these columns before doing asof join. | None |
by | str \| list[str] \| None | Join on these columns before doing asof join. | None |
strategy | Literal['backward', 'forward', 'nearest'] | Join strategy. The default is "backward". | 'backward' |
Returns:
Type | Description |
---|---|
Self | A new joined DataFrame |
Examples:
>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data_gdp = {
... "datetime": [
... datetime(2016, 1, 1),
... datetime(2017, 1, 1),
... datetime(2018, 1, 1),
... datetime(2019, 1, 1),
... datetime(2020, 1, 1),
... ],
... "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
... "datetime": [
... datetime(2016, 3, 1),
... datetime(2018, 8, 1),
... datetime(2019, 1, 1),
... ],
... "population": [82.19, 82.66, 83.12],
... }
>>> gdp_pd = pd.DataFrame(data_gdp)
>>> population_pd = pd.DataFrame(data_population)
>>> gdp_pl = pl.LazyFrame(data_gdp).sort("datetime")
>>> population_pl = pl.LazyFrame(data_population).sort("datetime")
Let's define a dataframe-agnostic function in which we join on the "datetime" column:
>>> @nw.narwhalify
... def join_asof_datetime(df, other_any, strategy):
...     return df.join_asof(other_any, on="datetime", strategy=strategy)
We can now pass either pandas or Polars to the function:
>>> join_asof_datetime(population_pd, gdp_pd, strategy="backward")
datetime population gdp
0 2016-03-01 82.19 4164
1 2018-08-01 82.66 4566
2 2019-01-01 83.12 4696
>>> join_asof_datetime(population_pl, gdp_pl, strategy="backward").collect()
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime ┆ population ┆ gdp │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i64 │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19 ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66 ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12 ┆ 4696 │
└─────────────────────┴────────────┴──────┘
Here is a real-world time-series example that uses the by argument.
>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data_quotes = {
... "datetime": [
... datetime(2016, 5, 25, 13, 30, 0, 23),
... datetime(2016, 5, 25, 13, 30, 0, 23),
... datetime(2016, 5, 25, 13, 30, 0, 30),
... datetime(2016, 5, 25, 13, 30, 0, 41),
... datetime(2016, 5, 25, 13, 30, 0, 48),
... datetime(2016, 5, 25, 13, 30, 0, 49),
... datetime(2016, 5, 25, 13, 30, 0, 72),
... datetime(2016, 5, 25, 13, 30, 0, 75),
... ],
... "ticker": [
... "GOOG",
... "MSFT",
... "MSFT",
... "MSFT",
... "GOOG",
... "AAPL",
... "GOOG",
... "MSFT",
... ],
... "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
... "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
... }
>>> data_trades = {
... "datetime": [
... datetime(2016, 5, 25, 13, 30, 0, 23),
... datetime(2016, 5, 25, 13, 30, 0, 38),
... datetime(2016, 5, 25, 13, 30, 0, 48),
... datetime(2016, 5, 25, 13, 30, 0, 48),
... datetime(2016, 5, 25, 13, 30, 0, 48),
... ],
... "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
... "price": [51.95, 51.95, 720.77, 720.92, 98.0],
... "quantity": [75, 155, 100, 100, 100],
... }
>>> quotes_pd = pd.DataFrame(data_quotes)
>>> trades_pd = pd.DataFrame(data_trades)
>>> quotes_pl = pl.LazyFrame(data_quotes).sort("datetime")
>>> trades_pl = pl.LazyFrame(data_trades).sort("datetime")
Let's define a dataframe-agnostic function in which we join on the "datetime" column, matching by the "ticker" column:
>>> @nw.narwhalify
... def join_asof_datetime_by_ticker(df, other_any):
...     return df.join_asof(other_any, on="datetime", by="ticker")
We can now pass either pandas or Polars to the function:
>>> join_asof_datetime_by_ticker(trades_pd, quotes_pd)
datetime ticker price quantity bid ask
0 2016-05-25 13:30:00.000023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.000038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.000048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.000048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.000048 AAPL 98.00 100 NaN NaN
>>> join_asof_datetime_by_ticker(trades_pl, quotes_pl).collect()
shape: (5, 6)
┌────────────────────────────┬────────┬────────┬──────────┬───────┬────────┐
│ datetime ┆ ticker ┆ price ┆ quantity ┆ bid ┆ ask │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ f64 ┆ i64 ┆ f64 ┆ f64 │
╞════════════════════════════╪════════╪════════╪══════════╪═══════╪════════╡
│ 2016-05-25 13:30:00.000023 ┆ MSFT ┆ 51.95 ┆ 75 ┆ 51.95 ┆ 51.96 │
│ 2016-05-25 13:30:00.000038 ┆ MSFT ┆ 51.95 ┆ 155 ┆ 51.97 ┆ 51.98 │
│ 2016-05-25 13:30:00.000048 ┆ GOOG ┆ 720.77 ┆ 100 ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ GOOG ┆ 720.92 ┆ 100 ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ AAPL ┆ 98.0 ┆ 100 ┆ null ┆ null │
└────────────────────────────┴────────┴────────┴──────────┴───────┴────────┘
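The "backward" and "forward" strategies can be pictured without any dataframe library. The asof_lookup function below is a hypothetical sketch that assumes the right-hand keys are already sorted, as the method requires; it reuses the GDP and population dates from the first example. The "nearest" strategy is left out to keep the sketch short.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def asof_lookup(left_keys, right_keys, right_values, strategy="backward"):
    """For each left key, pick a value from the sorted right side."""
    out = []
    for key in left_keys:
        if strategy == "backward":
            # Last right key that is less than or equal to the left key.
            i = bisect_right(right_keys, key) - 1
        elif strategy == "forward":
            # First right key that is greater than or equal to the left key.
            i = bisect_left(right_keys, key)
        else:
            raise ValueError(f"strategy not covered by this sketch: {strategy!r}")
        # No eligible right row: the joined value is missing.
        out.append(right_values[i] if 0 <= i < len(right_keys) else None)
    return out


gdp_dates = [date(year, 1, 1) for year in range(2016, 2021)]
gdp = [4164, 4411, 4566, 4696, 4827]
population_dates = [date(2016, 3, 1), date(2018, 8, 1), date(2019, 1, 1)]

backward = asof_lookup(population_dates, gdp_dates, gdp, "backward")
forward = asof_lookup(population_dates, gdp_dates, gdp, "forward")
```

The backward result reproduces the gdp column of the first example's output; an exact key match (2019-01-01) is picked by both strategies.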
lazy()
Lazify the DataFrame (if possible).
If a library does not support lazy execution, then this is a no-op.
Examples:
Construct pandas and Polars objects:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.LazyFrame(df)
We define a library agnostic function:
>>> @nw.narwhalify
... def func(df):
...     return df.lazy()
Note that the pandas dataframe stays eager, and the Polars LazyFrame stays lazy:
>>> func(df_pd)
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
>>> func(df_pl)
<LazyFrame ...>
pipe(function, *args, **kwargs)
Pipe function call.
Examples:
>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": [1, 2, 3], "ba": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)
Let's define a dataframe-agnostic function:
>>> @nw.narwhalify
... def func(df):
...     return df.pipe(lambda _df: _df.select("a"))
We can then pass either pandas or Polars:
>>> func(df_pd)
a
0 1
1 2
2 3
>>> func(df_pl).collect()
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
└─────┘
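The way extra positional and keyword arguments are forwarded can be sketched with a tiny stand-in class. Frame and keep_first are hypothetical names used only for this illustration; the sketch assumes that pipe calls the function with the frame as its first argument.

```python
class Frame:
    """Hypothetical minimal container used only to illustrate pipe."""

    def __init__(self, rows):
        self.rows = rows

    def pipe(self, function, *args, **kwargs):
        # Equivalent to calling function(self, *args, **kwargs).
        return function(self, *args, **kwargs)


def keep_first(frame, n, reverse=False):
    """Return a new Frame holding the first n rows, optionally reversed first."""
    rows = frame.rows[::-1] if reverse else frame.rows
    return Frame(rows[:n])


frame = Frame([1, 2, 3, 4])
piped = frame.pipe(keep_first, 2, reverse=True)
direct = keep_first(frame, 2, reverse=True)
```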
rename(mapping)
Rename column names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mapping | dict[str, str] | Key value pairs that map from old name to new name, or a function that takes the old name as input and returns the new name. | required |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)
We define a library agnostic function:
>>> @nw.narwhalify
... def func(df):
...     return df.rename({"foo": "apple"})
We can then pass either pandas or Polars to func:
>>> func(df_pd)
apple bar ham
0 1 6 a
1 2 7 b
2 3 8 c
>>> func(lf_pl).collect()
shape: (3, 3)
┌───────┬─────┬─────┐
│ apple ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
│ 2 ┆ 7 ┆ b │
│ 3 ┆ 8 ┆ c │
└───────┴─────┴─────┘
select(*exprs, **named_exprs)
Select columns from this LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*exprs | IntoExpr \| Iterable[IntoExpr] | Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names. | () |
**named_exprs | IntoExpr | Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used. | {} |
Notes
If you'd like to select a column whose name isn't a string (for example, if you're working with pandas) then you should explicitly use nw.col instead of just passing the column name. For example, to select a column named 0 use df.select(nw.col(0)), not df.select(0).
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)
Let's define a dataframe-agnostic function in which we pass the name of a column to select that column.
>>> @nw.narwhalify
... def func(df):
...     return df.select("foo")
We can then pass either pandas or Polars to func:
>>> func(df_pd)
foo
0 1
1 2
2 3
>>> func(df_pl)
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
└─────┘
>>> func(lf_pl).collect()
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
└─────┘
Multiple columns can be selected by passing a list of column names.
>>> @nw.narwhalify
... def func(df):
...     return df.select(["foo", "bar"])
>>> func(df_pd)
foo bar
0 1 6
1 2 7
2 3 8
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 6 │
│ 2 ┆ 7 │
│ 3 ┆ 8 │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 6 │
│ 2 ┆ 7 │
│ 3 ┆ 8 │
└─────┴─────┘
Multiple columns can also be selected using positional arguments instead of a list. Expressions are also accepted.
>>> @nw.narwhalify
... def func(df):
...     return df.select(nw.col("foo"), nw.col("bar") + 1)
>>> func(df_pd)
foo bar
0 1 7
1 2 8
2 3 9
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 7 │
│ 2 ┆ 8 │
│ 3 ┆ 9 │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 7 │
│ 2 ┆ 8 │
│ 3 ┆ 9 │
└─────┴─────┘
Use keyword arguments to easily name your expression inputs.
>>> @nw.narwhalify
... def func(df):
...     return df.select(threshold=nw.col("foo") * 2)
>>> func(df_pd)
threshold
0 2
1 4
2 6
>>> func(df_pl)
shape: (3, 1)
┌───────────┐
│ threshold │
│ --- │
│ i64 │
╞═══════════╡
│ 2 │
│ 4 │
│ 6 │
└───────────┘
>>> func(lf_pl).collect()
shape: (3, 1)
┌───────────┐
│ threshold │
│ --- │
│ i64 │
╞═══════════╡
│ 2 │
│ 4 │
│ 6 │
└───────────┘
sort(by, *more_by, descending=False, nulls_last=False)
Sort the LazyFrame by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
by | str \| Iterable[str] | Column name(s) to sort by. | required |
*more_by | str | Additional columns to sort by, specified as positional arguments. | () |
descending | bool \| Sequence[bool] | Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans. | False |
nulls_last | bool | Place null values last. A single boolean applies to all columns. | False |
Warning
Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
... "a": [1, 2, None],
... "b": [6.0, 5.0, 4.0],
... "c": ["a", "c", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_lf = pl.LazyFrame(data)
Let's define a dataframe-agnostic function in which we sort by multiple columns in different orders:
>>> @nw.narwhalify
... def func(df):
...     return df.sort("c", "a", descending=[False, True])
We can then pass either pandas or Polars to func:
>>> func(df_pd)
a b c
0 1.0 6.0 a
2 NaN 4.0 b
1 2.0 5.0 c
>>> func(df_lf).collect()
shape: (3, 3)
┌──────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪═════╪═════╡
│ 1 ┆ 6.0 ┆ a │
│ null ┆ 4.0 ┆ b │
│ 2 ┆ 5.0 ┆ c │
└──────┴─────┴─────┘
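How per-column descending flags combine with a single nulls_last flag can be sketched in plain Python. The sort_rows helper is hypothetical; it uses None for null and assumes that null placement does not depend on the sort direction, which is an assumption of this sketch rather than a statement about every backend.

```python
def sort_rows(rows, by, descending=False, nulls_last=False):
    """Sort dict rows by several columns; None stands in for null."""
    if isinstance(descending, bool):
        descending = [descending] * len(by)
    result = list(rows)
    # Stable sorts applied from the last key to the first give a multi-key sort.
    for column, desc in reversed(list(zip(by, descending))):
        nulls = [row for row in result if row[column] is None]
        rest = [row for row in result if row[column] is not None]
        rest.sort(key=lambda row: row[column], reverse=desc)
        # One nulls_last flag is applied to every column.
        result = rest + nulls if nulls_last else nulls + rest
    return result


rows = [
    {"a": 1, "c": "a"},
    {"a": 2, "c": "c"},
    {"a": None, "c": "b"},
]
by_c_then_a = sort_rows(rows, ["c", "a"], descending=[False, True])
nulls_first = sort_rows(rows, ["a"], descending=True)
nulls_at_end = sort_rows(rows, ["a"], descending=True, nulls_last=True)
```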
tail(n=5)
Get the last n rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Number of rows to return. | 5 |
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
... "a": [1, 2, 3, 4, 5, 6],
... "b": [7, 8, 9, 10, 11, 12],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)
Let's define a dataframe-agnostic function that gets the last 3 rows.
>>> @nw.narwhalify
... def func(df):
...     return df.tail(3)
We can then pass either pandas or Polars to func:
>>> func(df_pd)
a b
3 4 10
4 5 11
5 6 12
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 4 ┆ 10 │
│ 5 ┆ 11 │
│ 6 ┆ 12 │
└─────┴─────┘
>>> func(lf_pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 4 ┆ 10 │
│ 5 ┆ 11 │
│ 6 ┆ 12 │
└─────┴─────┘
to_native()
Convert Narwhals LazyFrame to native one.
Returns:
Type | Description |
---|---|
FrameT | Object of class that user started with. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)
>>> df_pa = pa.table(data)
Calling to_native on a Narwhals LazyFrame returns the native object:
>>> nw.from_native(df_pd).lazy().to_native()
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
>>> nw.from_native(df_pl).to_native().collect()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 7.0 ┆ b │
│ 3 ┆ 8.0 ┆ c │
└─────┴─────┴─────┘
unique(subset=None, *, keep='any', maintain_order=False)
Drop duplicate rows from this LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset | str \| list[str] \| None | Column name(s) to consider when identifying duplicate rows. If set to | None |
keep | Literal['any', 'first', 'last', 'none'] | {'first', 'last', 'any', 'none'} Which of the duplicate rows to keep. | 'any' |
maintain_order | bool | Keep the same order as the original DataFrame. This may be more expensive to compute. Setting this to | False |
Returns:
Name | Type | Description |
---|---|---|
LazyFrame | Self | LazyFrame with unique rows. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {
... "foo": [1, 2, 3, 1],
... "bar": ["a", "a", "a", "a"],
... "ham": ["b", "b", "b", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> lf_pl = pl.LazyFrame(data)
We define a library agnostic function:
>>> @nw.narwhalify
... def func(df):
...     return df.unique(["bar", "ham"])
We can then pass either pandas or Polars to func:
>>> func(df_pd)
foo bar ham
0 1 a b
>>> func(lf_pl).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ a ┆ b │
└─────┴─────┴─────┘
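The four keep options can be compared with a short plain-Python sketch. The unique_rows helper is hypothetical and works on a list of dicts; for "any" it simply returns the first row of each group, which is one permitted choice among several.

```python
from collections import Counter


def unique_rows(rows, subset, keep="any"):
    """Drop duplicate dict rows, comparing only the columns in subset."""

    def key(row):
        return tuple(row[column] for column in subset)

    if keep == "none":
        # Drop every row whose key occurs more than once.
        counts = Counter(key(row) for row in rows)
        return [row for row in rows if counts[key(row)] == 1]
    if keep not in ("any", "first", "last"):
        raise ValueError(f"unknown keep option: {keep!r}")

    # "any" may return any one row per group; this sketch picks the first.
    ordered = list(reversed(rows)) if keep == "last" else rows
    seen, kept = set(), []
    for row in ordered:
        if key(row) not in seen:
            seen.add(key(row))
            kept.append(row)
    return kept[::-1] if keep == "last" else kept


rows = [
    {"foo": 1, "bar": "a"},
    {"foo": 2, "bar": "a"},
    {"foo": 3, "bar": "b"},
]
first = unique_rows(rows, ["bar"], keep="first")
last = unique_rows(rows, ["bar"], keep="last")
none = unique_rows(rows, ["bar"], keep="none")
```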
unpivot(on=None, *, index=None, variable_name=None, value_name=None)
Unpivot a DataFrame from wide to long format.
Optionally leaves identifiers set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis leaving just two non-identifier columns, 'variable' and 'value'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
on | str \| list[str] \| None | Column(s) to use as values variables; if | None |
index | str \| list[str] \| None | Column(s) to use as identifier variables. | None |
variable_name | str \| None | Name to give to the | None |
value_name | str \| None | Name to give to the | None |
Notes
If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars.
In other frameworks, you might know this operation as pivot_longer.
Examples:
>>> import narwhals as nw
>>> import polars as pl
>>> data = {
... "a": ["x", "y", "z"],
... "b": [1, 3, 5],
... "c": [2, 4, 6],
... }
We define a library agnostic function:
>>> @nw.narwhalify
... def func(lf):
...     return (
...         lf.unpivot(on=["b", "c"], index="a").sort(["variable", "a"]).collect()
...     )
>>> func(pl.LazyFrame(data))
shape: (6, 3)
┌─────┬──────────┬───────┐
│ a ┆ variable ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═════╪══════════╪═══════╡
│ x ┆ b ┆ 1 │
│ y ┆ b ┆ 3 │
│ z ┆ b ┆ 5 │
│ x ┆ c ┆ 2 │
│ y ┆ c ┆ 4 │
│ z ┆ c ┆ 6 │
└─────┴──────────┴───────┘
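The wide-to-long reshaping can also be written out by hand, which may help when reading the output above. The unpivot function below is a hypothetical plain-Python sketch over a list of dicts; it hard-codes the 'variable' and 'value' column names and emits one block of rows per unpivoted column.

```python
def unpivot(rows, on, index):
    """Turn each column in on into rows, keeping the index columns."""
    long_rows = []
    for column in on:  # one block of output rows per unpivoted column
        for row in rows:
            record = {name: row[name] for name in index}
            record["variable"] = column
            record["value"] = row[column]
            long_rows.append(record)
    return long_rows


wide = [
    {"a": "x", "b": 1, "c": 2},
    {"a": "y", "b": 3, "c": 4},
    {"a": "z", "b": 5, "c": 6},
]
long = unpivot(wide, on=["b", "c"], index=["a"])
```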
with_columns(*exprs, **named_exprs)
Add columns to this LazyFrame.
Added columns will replace existing columns with the same name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*exprs | IntoExpr \| Iterable[IntoExpr] | Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals. | () |
**named_exprs | IntoExpr | Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used. | {} |
Returns:
Name | Type | Description |
---|---|---|
LazyFrame | Self | A new LazyFrame with the columns added. |
Note
Creating a new LazyFrame using this method does not create a new copy of existing data.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {
... "a": [1, 2, 3, 4],
... "b": [0.5, 4, 10, 13],
... "c": [True, True, False, True],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> lf_pl = pl.LazyFrame(df)
Let's define a dataframe-agnostic function in which we pass an expression to add it as a new column:
>>> @nw.narwhalify
... def func(df):
...     return df.with_columns((nw.col("a") * 2).alias("2a"))
We can then pass either pandas or Polars to func:
>>> func(df_pd)
a b c 2a
0 1 0.5 True 2
1 2 4.0 True 4
2 3 10.0 False 6
3 4 13.0 True 8
>>> func(df_pl)
shape: (4, 4)
┌─────┬──────┬───────┬─────┐
│ a ┆ b ┆ c ┆ 2a │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ bool ┆ i64 │
╞═════╪══════╪═══════╪═════╡
│ 1 ┆ 0.5 ┆ true ┆ 2 │
│ 2 ┆ 4.0 ┆ true ┆ 4 │
│ 3 ┆ 10.0 ┆ false ┆ 6 │
│ 4 ┆ 13.0 ┆ true ┆ 8 │
└─────┴──────┴───────┴─────┘
>>> func(lf_pl).collect()
shape: (4, 4)
┌─────┬──────┬───────┬─────┐
│ a ┆ b ┆ c ┆ 2a │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ bool ┆ i64 │
╞═════╪══════╪═══════╪═════╡
│ 1 ┆ 0.5 ┆ true ┆ 2 │
│ 2 ┆ 4.0 ┆ true ┆ 4 │
│ 3 ┆ 10.0 ┆ false ┆ 6 │
│ 4 ┆ 13.0 ┆ true ┆ 8 │
└─────┴──────┴───────┴─────┘
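Two points from the description, that same-named columns are replaced and that existing data is not copied, can be pictured with a dict of column lists. The with_columns function below is a hypothetical plain-Python sketch and says nothing about how any backend stores its data.

```python
def with_columns(columns, **named):
    """Return a new column mapping; same-named columns are replaced."""
    result = dict(columns)  # shallow copy: existing lists are shared, not copied
    result.update(named)
    return result


frame = {"a": [1, 2, 3], "b": [0.5, 4.0, 10.0]}
added = with_columns(frame, double_a=[value * 2 for value in frame["a"]])
replaced = with_columns(frame, b=[0, 0, 0])
```

In this sketch the original mapping is left untouched, and the untouched column "a" in the new mapping is the very same list object as in the original.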
with_row_index(name='index')
Insert column which enumerates rows.
Examples:
>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.LazyFrame(data)
Let's define a dataframe-agnostic function:
>>> @nw.narwhalify
... def func(df):
...     return df.with_row_index()
We can then pass either pandas or Polars:
>>> func(df_pd)
index a b
0 0 1 4
1 1 2 5
2 2 3 6
>>> func(df_pl).collect()
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a ┆ b │
│ --- ┆ --- ┆ --- │
│ u32 ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0 ┆ 1 ┆ 4 │
│ 1 ┆ 2 ┆ 5 │
│ 2 ┆ 3 ┆ 6 │
└───────┴─────┴─────┘