Skip to content

What about the pandas Index?

There are two types of pandas users:

  • The ones who make full use of the Index's power.
  • The .reset_index(drop=True) ones, who would rather not think about the Index.

Narwhals aims to accommodate both!

  • If you'd rather not think about the Index, then don't worry: it's not part of the Narwhals public API, and you'll never have to worry about resetting the index or about pandas doing funky index alignment for you.
  • If you want your library to cater to Index powerusers who would be very angry if you reset their beautiful Index on their behalf, then don't worry: Narwhals makes certain promises with regards to the Index.

Let's learn about what Narwhals promises.

1. Narwhals will preserve your index for dataframe operations

import narwhals as nw
from narwhals.typing import IntoFrameT


def my_func(df: IntoFrameT) -> IntoFrameT:
    df = nw.from_native(df)
    df = df.with_columns(a_plus_one=nw.col("a") + 1)
    return nw.to_native(df)

Let's start with a dataframe with an Index with values [7, 8, 9].

import pandas as pd

df = pd.DataFrame({"a": [2, 1, 3], "b": [3, 5, -3]}, index=[7, 8, 9])
print(my_func(df))
   a  b  a_plus_one
7  2  3           3
8  1  5           2
9  3 -3           4

Note how the result still has the original index - Narwhals did not modify it.

2. Index alignment follows the left-hand-rule

pandas automatically aligns indices for users. For example:

import pandas as pd

df_pd = pd.DataFrame({"a": [2, 1, 3], "b": [4, 5, 6]})
s_pd = df_pd["a"].sort_values()
df_pd["a_sorted"] = s_pd

Reading the code, you might expect that 'a_sorted' will contain the values [1, 2, 3].

However, here's what actually happens:

print(df_pd)
   a  b  a_sorted
0  2  4         2
1  1  5         1
2  3  6         3

In other words, pandas' index alignment undid the sort_values operation!

Narwhals, on the other hand, preserves the index of the left-hand-side argument. Everything else will be inserted positionally, just like Polars would do:

import narwhals as nw

df = nw.from_native(df_pd)
s = nw.from_native(s_pd, allow_series=True)
df = df.with_columns(a_sorted=s.sort())
print(nw.to_native(df))
   a  b  a_sorted
0  2  4         1
1  1  5         2
2  3  6         3

If you keep these two rules in mind, then Narwhals will both help you avoid Index-related surprises whilst letting you preserve the Index for the subset of your users who consciously make great use of it.