Series
In the dataframe section, you learned how to write a dataframe-agnostic function.
We only used DataFrame methods there - but what if we need to operate on its columns?
Note that Polars does not have lazy columns. If you need to operate on columns as part of
a dataframe operation, you should use expressions - but if you need to extract a single
column, you need to ensure that you start with an eager DataFrame. To do that, you need
to pass eager_only=True to nw.from_native.
Example 1: filter based on a column's values
This can stay lazy, so we just use expressions:
import narwhals as nw
from narwhals.typing import IntoFrameT
def my_func(df: IntoFrameT) -> IntoFrameT:
    return nw.from_native(df).filter(nw.col("a") > 0).to_native()
import narwhals as nw
from narwhals.typing import FrameT
@nw.narwhalify
def my_func(df: FrameT) -> FrameT:
    return df.filter(nw.col("a") > 0)
and call it on either an eager or a lazy dataframe:
import pandas as pd
df = pd.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
   a  b
1  1  5
2  3 -3
import polars as pl
df = pl.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 5   │
│ 3   ┆ -3  │
└─────┴─────┘
import polars as pl
df = pl.LazyFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df).collect())
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 5   │
│ 3   ┆ -3  │
└─────┴─────┘
import pyarrow as pa
table = pa.table({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(table))
pyarrow.Table
a: int64
b: int64
----
a: [[1,3]]
b: [[5,-3]]
Example 2: multiply a column's values by a constant
Let's write a dataframe-agnostic function which multiplies the values in column
'a' by 2. This can also stay lazy, and can use expressions:
import narwhals as nw
from narwhals.typing import IntoFrameT
def my_func(df: IntoFrameT) -> IntoFrameT:
    return nw.from_native(df).with_columns(nw.col("a") * 2).to_native()
import narwhals as nw
from narwhals.typing import FrameT
@nw.narwhalify
def my_func(df: FrameT) -> FrameT:
    return df.with_columns(nw.col("a") * 2)
and call it on either an eager or a lazy dataframe:
import pandas as pd
df = pd.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
   a  b
0 -2  3
1  2  5
2  6 -3
import polars as pl
df = pl.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ -2  ┆ 3   │
│ 2   ┆ 5   │
│ 6   ┆ -3  │
└─────┴─────┘
import polars as pl
df = pl.LazyFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df).collect())
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ -2  ┆ 3   │
│ 2   ┆ 5   │
│ 6   ┆ -3  │
└─────┴─────┘
import pyarrow as pa
table = pa.table({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(table))
pyarrow.Table
a: int64
b: int64
----
a: [[-2,2,6]]
b: [[3,5,-3]]
Note that column 'a' was overwritten. If we had wanted to add a new column called 'c'
containing column 'a''s values multiplied by 2, we could have used Expr.alias:
import narwhals as nw
from narwhals.typing import FrameT
@nw.narwhalify
def my_func(df: FrameT) -> FrameT:
    return df.with_columns((nw.col("a") * 2).alias("c"))
import pandas as pd
df = pd.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
   a  b  c
0 -1  3 -2
1  1  5  2
2  3 -3  6
import polars as pl
df = pl.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ -1  ┆ 3   ┆ -2  │
│ 1   ┆ 5   ┆ 2   │
│ 3   ┆ -3  ┆ 6   │
└─────┴─────┴─────┘
import polars as pl
df = pl.LazyFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df).collect())
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ -1  ┆ 3   ┆ -2  │
│ 1   ┆ 5   ┆ 2   │
│ 3   ┆ -3  ┆ 6   │
└─────┴─────┴─────┘
import pyarrow as pa
table = pa.table({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(table))
pyarrow.Table
a: int64
b: int64
c: int64
----
a: [[-1,1,3]]
b: [[3,5,-3]]
c: [[-2,2,6]]
Example 3: finding the mean of a column as a scalar
Now, we want to find the mean of column 'a', and we need it as a Python scalar.
This means that computation cannot stay lazy - it must execute!
Therefore, we'll pass eager_only=True to nw.from_native (or nw.narwhalify),
and then, instead of using expressions, we'll extract a Series.
import narwhals as nw
from narwhals.typing import IntoDataFrameT
def my_func(df: IntoDataFrameT) -> float | None:
    return nw.from_native(df, eager_only=True)["a"].mean()
import narwhals as nw
from narwhals.typing import DataFrameT
@nw.narwhalify(eager_only=True)
def my_func(df: DataFrameT) -> float | None:
    return df["a"].mean()
Now we can call it on eager dataframes only:
import pandas as pd
df = pd.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
1.0
import polars as pl
df = pl.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
1.0
import pyarrow as pa
table = pa.table({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(table))
1.0
Note that, even though the output of our function is neither a dataframe nor a series, we can
still use narwhalify.