Boolean columns

Generally speaking, Narwhals operations preserve null values. For example, if you do nw.col('a')*2, then:

  • Values which were non-null get multiplied by 2.
  • Null values stay null.
import narwhals as nw
from narwhals.typing import FrameT

data = {"a": [1.4, None, 4.2]}


def multiplication(df: FrameT) -> FrameT:
    return nw.from_native(df).with_columns((nw.col("a") * 2).alias("a*2")).to_native()
import pandas as pd

df = pd.DataFrame(data)
print(multiplication(df))
     a  a*2
0  1.4  2.8
1  NaN  NaN
2  4.2  8.4
import polars as pl

df = pl.DataFrame(data)
print(multiplication(df))
shape: (3, 2)
┌──────┬──────┐
│ a    ┆ a*2  │
│ ---  ┆ ---  │
│ f64  ┆ f64  │
╞══════╪══════╡
│ 1.4  ┆ 2.8  │
│ null ┆ null │
│ 4.2  ┆ 8.4  │
└──────┴──────┘
import pyarrow as pa

table = pa.table(data)
print(multiplication(table))
pyarrow.Table
a: double
a*2: double
----
a: [[1.4,null,4.2]]
a*2: [[2.8,null,8.4]]

What happens, however, when the result column is boolean, as with nw.col('a') > 2? Unfortunately, the answer is backend-dependent:

  • for all backends except pandas, null values are preserved
  • for pandas, this depends on the dtype backend:
    • for PyArrow dtypes and pandas nullable dtypes, null values are preserved
    • for the classic NumPy dtypes, null values are typically filled in with False.

pandas is generally moving towards nullable dtypes, and they may become the default in the future. We therefore hope that the lack of null support in the classic NumPy dtypes is a temporary legacy pandas issue that will eventually go away.

def comparison(df: FrameT) -> FrameT:
    return nw.from_native(df).with_columns((nw.col("a") > 2).alias("a>2")).to_native()
import pandas as pd

df = pd.DataFrame(data)
print(comparison(df))
     a    a>2
0  1.4  False
1  NaN  False
2  4.2   True
import polars as pl

df = pl.DataFrame(data)
print(comparison(df))
shape: (3, 2)
┌──────┬───────┐
│ a    ┆ a>2   │
│ ---  ┆ ---   │
│ f64  ┆ bool  │
╞══════╪═══════╡
│ 1.4  ┆ false │
│ null ┆ null  │
│ 4.2  ┆ true  │
└──────┴───────┘
import pyarrow as pa

table = pa.table(data)
print(comparison(table))
pyarrow.Table
a: double
a>2: bool
----
a: [[1.4,null,4.2]]
a>2: [[false,null,true]]
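
The pandas output above uses the classic NumPy float64 dtype, which is why the null row comes back as False. As a minimal sketch of the dtype-backend difference described earlier, the snippet below runs the same comparison in plain pandas (without Narwhals, to isolate the underlying behaviour) on a classic column and on a column using the pandas nullable Float64 dtype. The variable names are illustrative only.

```python
import pandas as pd

data = {"a": [1.4, None, 4.2]}

# Classic NumPy-backed float64: the missing value is stored as NaN,
# and the comparison NaN > 2 evaluates to False.
classic = pd.DataFrame(data)
classic_result = classic["a"] > 2

# pandas nullable dtype: the missing value is stored as pd.NA,
# and the comparison propagates it into a nullable "boolean" column.
nullable = pd.DataFrame({"a": pd.array(data["a"], dtype="Float64")})
nullable_result = nullable["a"] > 2

print(classic_result.dtype, classic_result.tolist())
print(nullable_result.dtype, nullable_result.isna().tolist())
```

With the nullable dtype, the second row of the result is missing rather than False, which matches what Polars and PyArrow return.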