Skip to content

Boolean columns

Null preservation

Generally speaking, Narwhals operations preserve null values. For example, if you do nw.col('a')*2, then:

  • Values which were non-null get multiplied by 2.
  • Null values stay null.
import narwhals as nw
from narwhals.typing import IntoFrameT

data = {"a": [1.4, None, 4.2]}


def multiplication(df: IntoFrameT) -> IntoFrameT:
    return nw.from_native(df).with_columns((nw.col("a") * 2).alias("a*2")).to_native()
import pandas as pd

df = pd.DataFrame(data)
print(multiplication(df))
     a  a*2
0  1.4  2.8
1  NaN  NaN
2  4.2  8.4
import polars as pl

df = pl.DataFrame(data)
print(multiplication(df))
shape: (3, 2)
┌──────┬──────┐
 a     a*2  
 ---   ---  
 f64   f64  
╞══════╪══════╡
 1.4   2.8  
 null  null 
 4.2   8.4  
└──────┴──────┘
import pyarrow as pa

table = pa.table(data)
print(multiplication(table))
pyarrow.Table
a: double
a*2: double
----
a: [[1.4,null,4.2]]
a*2: [[2.8,null,8.4]]

What do we do, however, when the result column is boolean? For example, nw.col('a') > 0? Unfortunately, this is backend-dependent:

  • for all backends except pandas, null values are preserved
  • for pandas, this depends on the dtype backend:
    • for PyArrow dtypes and pandas nullable dtypes, null values are preserved
    • for the classic NumPy dtypes, null values are typically filled in with False.

pandas is generally moving towards nullable dtypes, and they may become the default in the future, so we hope that the classical NumPy dtypes not supporting null values will just be a temporary legacy pandas issue which will eventually go away anyway.

from narwhals.typing import FrameT


def comparison(df: FrameT) -> FrameT:
    return nw.from_native(df).with_columns((nw.col("a") > 2).alias("a>2")).to_native()
import pandas as pd

df = pd.DataFrame(data)
print(comparison(df))
     a    a>2
0  1.4  False
1  NaN  False
2  4.2   True
import polars as pl

df = pl.DataFrame(data)
print(comparison(df))
shape: (3, 2)
┌──────┬───────┐
 a     a>2   
 ---   ---   
 f64   bool  
╞══════╪═══════╡
 1.4   false 
 null  null  
 4.2   true  
└──────┴───────┘
import pyarrow as pa

table = pa.table(data)
print(comparison(table))
pyarrow.Table
a: double
a>2: bool
----
a: [[1.4,null,4.2]]
a>2: [[false,null,true]]

Kleene logic

Generally speaking, if we have two boolean columns 'a' and 'b', then nw.col('a') | nw.col('b') and nw.col('a') & nw.col('b') follow Kleene logic. That is to say:

nw.col('a') nw.col('b') nw.col('a') | nw.col('b') nw.col('a') & nw.col('b')
True True True True
True False True False
True None True None
False True True False
False False False False
False None None False
None True True None
None False None False
None None None None

Here, too, pandas backed by NumPy types differs, as its boolean columns cannot store null values:

  • For nw.col('a') | nw.col('b'), pandas returns True if at least one column contains a True value, and False otherwise.
  • For nw.col('a') & nw.col('b'), pandas returns True if both columns contain True values, and False otherwise.

In any_horizontal and all_horizontal there is an ignore_nulls argument, which behaves as follows:

  • If True, then null values are ignored and contribute nothing to the final result. If there are no values, the result is:

    • False for any_horizontal.
    • True for all_horizontal.
    • If False, then Kleene logic is followed. If using pandas backed by classical NumPy types, then this option is not supported.