
Series

In the dataframe section, you learned how to write a dataframe-agnostic function.

We only used DataFrame methods there - but what if we need to operate on its columns?

Note that Polars does not have lazy columns. If you need to operate on columns as part of a dataframe operation, use expressions. If you need to extract a single column as a Series, you must start with an eager DataFrame; to ensure that, pass eager_only=True to nw.from_native.

Example 1: filter based on a column's values

This can stay lazy, so we just use nw.narwhalify (which wraps nw.from_native) and expressions:

import narwhals as nw
from narwhals.typing import FrameT


@nw.narwhalify
def my_func(df: FrameT) -> FrameT:
    return df.filter(nw.col("a") > 0)
import pandas as pd

df = pd.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
   a  b
1  1  5
2  3 -3
import polars as pl

df = pl.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 5   │
│ 3   ┆ -3  │
└─────┴─────┘
import polars as pl

df = pl.LazyFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df).collect())
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 5   │
│ 3   ┆ -3  │
└─────┴─────┘
import pyarrow as pa

table = pa.table({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(table))
pyarrow.Table
a: int64
b: int64
----
a: [[1,3]]
b: [[5,-3]]

Example 2: multiply a column's values by a constant

Let's write a dataframe-agnostic function which multiplies the values in column 'a' by 2. This can also stay lazy, and can use expressions:

import narwhals as nw
from narwhals.typing import FrameT


@nw.narwhalify
def my_func(df: FrameT) -> FrameT:
    return df.with_columns(nw.col("a") * 2)
import pandas as pd

df = pd.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
   a  b
0 -2  3
1  2  5
2  6 -3
import polars as pl

df = pl.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ -2  ┆ 3   │
│ 2   ┆ 5   │
│ 6   ┆ -3  │
└─────┴─────┘
import polars as pl

df = pl.LazyFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df).collect())
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ -2  ┆ 3   │
│ 2   ┆ 5   │
│ 6   ┆ -3  │
└─────┴─────┘
import pyarrow as pa

table = pa.table({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(table))
pyarrow.Table
a: int64
b: int64
----
a: [[-2,2,6]]
b: [[3,5,-3]]

Note that column 'a' was overwritten. If we had wanted to add a new column called 'c' containing the values of column 'a' multiplied by 2, we could have used Expr.alias:

import narwhals as nw
from narwhals.typing import FrameT


@nw.narwhalify
def my_func(df: FrameT) -> FrameT:
    return df.with_columns((nw.col("a") * 2).alias("c"))
import pandas as pd

df = pd.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
   a  b  c
0 -1  3 -2
1  1  5  2
2  3 -3  6
import polars as pl

df = pl.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ -1  ┆ 3   ┆ -2  │
│ 1   ┆ 5   ┆ 2   │
│ 3   ┆ -3  ┆ 6   │
└─────┴─────┴─────┘
import polars as pl

df = pl.LazyFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df).collect())
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ -1  ┆ 3   ┆ -2  │
│ 1   ┆ 5   ┆ 2   │
│ 3   ┆ -3  ┆ 6   │
└─────┴─────┴─────┘
import pyarrow as pa

table = pa.table({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(table))
pyarrow.Table
a: int64
b: int64
c: int64
----
a: [[-1,1,3]]
b: [[3,5,-3]]
c: [[-2,2,6]]

Example 3: finding the mean of a column as a scalar

Now, we want to find the mean of column 'a', and we need it as a Python scalar. This means that computation cannot stay lazy - it must execute! Therefore, we'll pass eager_only=True to nw.narwhalify, and then, instead of using expressions, we'll extract a Series.

from __future__ import annotations
import narwhals as nw


@nw.narwhalify(eager_only=True)
def my_func(df: nw.DataFrame) -> float | None:
    return df["a"].mean()
import pandas as pd

df = pd.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
1.0
import polars as pl

df = pl.DataFrame({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(df))
1.0
import pyarrow as pa

table = pa.table({"a": [-1, 1, 3], "b": [3, 5, -3]})
print(my_func(table))
1.0

Note that, even though the output of our function is neither a dataframe nor a series, we can still use narwhalify.