
Null/NaN handling

pandas doesn't distinguish between null and NaN values the way Polars and PyArrow do.

Depending on the data type of the underlying data structure, `np.nan`, `pd.NaT`, `None`, and `pd.NA` can all encode missing data in pandas.
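A minimal sketch of these dtype-dependent sentinels (the variable names are illustrative):

```python
import pandas as pd

# float64 columns represent missing data as NaN (None is coerced on input)
s_float = pd.Series([1.0, None])
# datetime columns represent missing data as NaT
s_time = pd.Series([pd.Timestamp("2020-01-01"), None])
# nullable extension dtypes represent missing data as pd.NA
s_int = pd.Series([1, None], dtype="Int64")

print(s_float)  # second value displayed as NaN
print(s_time)   # second value displayed as NaT
print(s_int)    # second value displayed as <NA>
```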

Polars and PyArrow, by contrast, treat NaN as a valid floating-point value: one that is rare to encounter and more often produced as the result of a computation than set explicitly during data initialization. They reserve null as the missing-data indicator, regardless of the data type.

As a result, `is_null` in Narwhals behaves differently across backends (and so do `drop_nulls`, `fill_null` and `null_count`):

```python
import narwhals as nw
import numpy as np
from narwhals.typing import IntoFrameT

data = {"a": [1.4, float("nan"), np.nan, 4.2, None]}


def check_null_behavior(df: IntoFrameT) -> IntoFrameT:
    return nw.from_native(df).with_columns(a_is_null=nw.col("a").is_null()).to_native()
```
```python
import pandas as pd

df = pd.DataFrame(data)
print(check_null_behavior(df))
```

```
     a  a_is_null
0  1.4      False
1  NaN       True
2  NaN       True
3  4.2      False
4  NaN       True
```
```python
import polars as pl

df = pl.DataFrame(data)
print(check_null_behavior(df))
```

```
shape: (5, 2)
┌──────┬───────────┐
│ a    │ a_is_null │
│ ---  │ ---       │
│ f64  │ bool      │
╞══════╪═══════════╡
│ 1.4  │ false     │
│ NaN  │ false     │
│ NaN  │ false     │
│ 4.2  │ false     │
│ null │ true      │
└──────┴───────────┘
```
```python
import pyarrow as pa

df = pa.table(data)
print(check_null_behavior(df))
```

```
pyarrow.Table
a: double
a_is_null: bool
----
a: [[1.4,nan,nan,4.2,null]]
a_is_null: [[false,false,false,false,true]]
```