
narwhals.DataFrame

Narwhals DataFrame, backed by a native dataframe.

The native dataframe might be pandas.DataFrame, polars.DataFrame, ...

This class is not meant to be instantiated directly; instead, use narwhals.from_native.

columns: list[str] property

Get column names.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.columns

We can pass any supported library such as pandas, Polars, or PyArrow to func:

>>> func(df_pd)
['foo', 'bar', 'ham']
>>> func(df_pl)
['foo', 'bar', 'ham']
>>> func(df_pa)
['foo', 'bar', 'ham']

schema: Schema property

Get an ordered mapping of column names to their data type.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6.0, 7.0, 8.0],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

We define a library-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.schema

You can pass either pandas or Polars to func:

>>> df_pd_schema = func(df_pd)
>>> df_pd_schema
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> df_pl_schema = func(df_pl)
>>> df_pl_schema
Schema({'foo': Int64, 'bar': Float64, 'ham': String})

shape: tuple[int, int] property

Get the shape of the DataFrame.

Examples:

Construct pandas and polars DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>>
>>> data = {"foo": [1, 2, 3, 4, 5]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library-agnostic function:

>>> def agnostic_shape(df_native: IntoDataFrame) -> tuple[int, int]:
...     df = nw.from_native(df_native)
...     return df.shape

We can then pass either pandas, Polars or PyArrow to agnostic_shape:

>>> agnostic_shape(df_pd)
(5, 1)
>>> agnostic_shape(df_pl)
(5, 1)
>>> agnostic_shape(df_pa)
(5, 1)

__arrow_c_stream__(requested_schema=None)

Export a DataFrame via the Arrow PyCapsule Interface.

  • If the underlying dataframe implements the interface, that is returned directly.
  • Otherwise, to_arrow is called first and PyArrow's implementation is used.

See PyCapsule Interface for more.

__getitem__(item)

__getitem__(item: tuple[Sequence[int], slice]) -> Self
__getitem__(
    item: tuple[Sequence[int], Sequence[int]]
) -> Self
__getitem__(item: tuple[slice, Sequence[int]]) -> Self
__getitem__(item: tuple[Sequence[int], str]) -> Series
__getitem__(item: tuple[slice, str]) -> Series
__getitem__(
    item: tuple[Sequence[int], Sequence[str]]
) -> Self
__getitem__(item: tuple[slice, Sequence[str]]) -> Self
__getitem__(item: tuple[Sequence[int], int]) -> Series
__getitem__(item: tuple[slice, int]) -> Series
__getitem__(item: Sequence[int]) -> Self
__getitem__(item: str) -> Series
__getitem__(item: Sequence[str]) -> Self
__getitem__(item: slice) -> Self
__getitem__(item: tuple[slice, slice]) -> Self

Extract column or slice of DataFrame.

Parameters:

  • item: str | slice | Sequence[int] | Sequence[str] | tuple[Sequence[int], str | int] | tuple[slice, str | int] | tuple[slice | Sequence[int], Sequence[int] | Sequence[str] | slice] | tuple[slice, slice] (required)

    How to slice the dataframe. What happens depends on what is passed. It's easiest to explain by example. Suppose we have a DataFrame df:

      • df['a'] extracts column 'a' and returns a Series.
      • df[0:2] extracts the first two rows and returns a DataFrame.
      • df[0:2, 'a'] extracts the first two rows from column 'a' and returns a Series.
      • df[0:2, 0] extracts the first two rows from the first column and returns a Series.
      • df[[0, 1], [0, 1, 2]] extracts the first two rows and the first three columns and returns a DataFrame.
      • df[:, [0, 1, 2]] extracts all rows from the first three columns and returns a DataFrame.
      • df[:, ['a', 'c']] extracts all rows and columns 'a' and 'c' and returns a DataFrame.
      • df[['a', 'c']] extracts all rows and columns 'a' and 'c' and returns a DataFrame.
      • df[0:2, ['a', 'c']] extracts the first two rows and columns 'a' and 'c' and returns a DataFrame.
      • df[:, 0:2] extracts all rows from the first two columns and returns a DataFrame.
      • df[:, 'a':'c'] extracts all rows and all columns positioned between 'a' and 'c' inclusive and returns a DataFrame. For example, if the columns are 'a', 'd', 'c', 'b', then that would extract columns 'a', 'd', and 'c'.

Notes

  • Integers are always interpreted as positions.
  • Strings are always interpreted as column names.

In contrast with Polars, pandas allows non-string column names. If you don't know whether the column name you're trying to extract is definitely a string (e.g. df[df.columns[0]]) then you should use DataFrame.get_column instead.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>>
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library-agnostic function:

>>> def agnostic_slice(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native)
...     return df["a"].to_native()

We can then pass either pandas, Polars or PyArrow to agnostic_slice:

>>> agnostic_slice(df_pd)
0    1
1    2
Name: a, dtype: int64
>>> agnostic_slice(df_pl)
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> agnostic_slice(df_pa)
<pyarrow.lib.ChunkedArray object at ...>
[
  [
    1,
    2
  ]
]

clone()

Create a copy of this DataFrame.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function in which we clone the DataFrame:

>>> @nw.narwhalify
... def func(df):
...     return df.clone()
>>> func(df_pd)
   a  b
0  1  3
1  2  4
>>> func(df_pl)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘

collect_schema()

Get an ordered mapping of column names to their data type.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6.0, 7.0, 8.0],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

We define a library-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.collect_schema()

You can pass either pandas or Polars to func:

>>> df_pd_schema = func(df_pd)
>>> df_pd_schema
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> df_pl_schema = func(df_pl)
>>> df_pl_schema
Schema({'foo': Int64, 'bar': Float64, 'ham': String})

drop(*columns, strict=True)

Remove columns from the dataframe.

Parameters:

  • *columns: str | Iterable[str] (default: ())

    Names of the columns that should be removed from the dataframe.

  • strict: bool (default: True)

    Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

We define a library-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.drop("ham")

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo  bar
0    1  6.0
1    2  7.0
2    3  8.0
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 6.0 │
│ 2   ┆ 7.0 │
│ 3   ┆ 8.0 │
└─────┴─────┘

Use positional arguments to drop multiple columns.

>>> @nw.narwhalify
... def func(df):
...     return df.drop("foo", "ham")
>>> func(df_pd)
   bar
0  6.0
1  7.0
2  8.0
>>> func(df_pl)
shape: (3, 1)
┌─────┐
│ bar │
│ --- │
│ f64 │
╞═════╡
│ 6.0 │
│ 7.0 │
│ 8.0 │
└─────┘

drop_nulls(subset=None)

Drop null values.

Parameters:

  • subset: str | list[str] | None (default: None)

    Column name(s) for which null values are considered. If set to None (default), use all columns.
Notes

pandas and Polars handle null values differently. Polars distinguishes between NaN and Null, whereas pandas doesn't.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": [1.0, 2.0, None], "ba": [1.0, None, 2.0]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.drop_nulls()

We can then pass either pandas or Polars:

>>> func(df_pd)
     a   ba
0  1.0  1.0
>>> func(df_pl)
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ ba  │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 1.0 │
└─────┴─────┘

filter(*predicates)

Filter the rows in the DataFrame based on one or more predicate expressions.

The original order of the remaining rows is preserved.

Parameters:

  • *predicates: IntoExpr | Iterable[IntoExpr] | list[bool] (default: ())

    Expression(s) that evaluate to a boolean Series. Can also be a (single!) boolean list.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6, 7, 8],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function in which we filter on one condition.

>>> @nw.narwhalify
... def func(df):
...     return df.filter(nw.col("foo") > 1)

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo  bar ham
1    2    7   b
2    3    8   c
>>> func(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

Filter on multiple conditions, combined with and/or operators:

>>> @nw.narwhalify
... def func(df):
...     return df.filter((nw.col("foo") < 3) & (nw.col("ham") == "a"))
>>> func(df_pd)
   foo  bar ham
0    1    6   a
>>> func(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘
>>> @nw.narwhalify
... def func(df):
...     return df.filter((nw.col("foo") == 1) | (nw.col("ham") == "c"))
>>> func(df_pd)
   foo  bar ham
0    1    6   a
2    3    8   c
>>> func(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

Provide multiple filters using *args syntax:

>>> @nw.narwhalify
... def func(df):
...     dframe = df.filter(
...         nw.col("foo") <= 2,
...         ~nw.col("ham").is_in(["b", "c"]),
...     )
...     return dframe
>>> func(df_pd)
   foo  bar ham
0    1    6   a
>>> func(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘

gather_every(n, offset=0)

Take every nth row in the DataFrame and return as a new DataFrame.

Parameters:

  • n: int (required)

    Gather every n-th row.

  • offset: int (default: 0)

    Starting index.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function in which we gather every 2 rows, starting from an offset of 1:

>>> @nw.narwhalify
... def func(df):
...     return df.gather_every(n=2, offset=1)
>>> func(df_pd)
   a  b
1  2  6
3  4  8
>>> func(df_pl)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2   ┆ 6   │
│ 4   ┆ 8   │
└─────┴─────┘

get_column(name)

Get a single column by name.

Notes

Although name is typed as str, pandas does allow non-string column names, and they will work when passed to this function if the narwhals.DataFrame is backed by a pandas dataframe with non-string columns. This function can only be used to extract a column by name, so there is no risk of ambiguity.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>>
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

We define a library-agnostic function:

>>> def agnostic_get_column(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native)
...     name = df.columns[0]
...     return df.get_column(name).to_native()

We can then pass either pandas or Polars to agnostic_get_column:

>>> agnostic_get_column(df_pd)
0    1
1    2
Name: a, dtype: int64
>>> agnostic_get_column(df_pl)
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]

group_by(*keys, drop_null_keys=False)

Start a group by operation.

Parameters:

  • *keys: str | Iterable[str] (default: ())

    Column(s) to group by. Accepts multiple column names as a list.

  • drop_null_keys: bool (default: False)

    If True, groups where any key is null won't be included in the result.

Returns:

  • GroupBy[Self]: Object which can be used to perform aggregations.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {
...     "a": ["a", "b", "a", "b", "c"],
...     "b": [1, 2, 1, 3, 3],
...     "c": [5, 4, 3, 2, 1],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function in which we group by one column and call agg to compute the grouped sum of another column.

>>> @nw.narwhalify
... def func(df):
...     return df.group_by("a").agg(nw.col("b").sum()).sort("a")

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   a  b
0  a  2
1  b  5
2  c  3
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 2   │
│ b   ┆ 5   │
│ c   ┆ 3   │
└─────┴─────┘

Group by multiple columns by passing a list of column names.

>>> @nw.narwhalify
... def func(df):
...     return df.group_by(["a", "b"]).agg(nw.max("c")).sort("a", "b")
>>> func(df_pd)
   a  b  c
0  a  1  5
1  b  2  4
2  b  3  2
3  c  3  1
>>> func(df_pl)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ b   ┆ 2   ┆ 4   │
│ b   ┆ 3   ┆ 2   │
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘

head(n=5)

Get the first n rows.

Parameters:

  • n: int (default: 5)

    Number of rows to return. If a negative value is passed, return all rows except the last abs(n).

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {
...     "foo": [1, 2, 3, 4, 5],
...     "bar": [6, 7, 8, 9, 10],
...     "ham": ["a", "b", "c", "d", "e"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function that gets the first 3 rows.

>>> @nw.narwhalify
... def func(df):
...     return df.head(3)

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo  bar ham
0    1    6   a
1    2    7   b
2    3    8   c
>>> func(df_pl)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘

is_duplicated()

Get a mask of all duplicated rows in this DataFrame.

Returns:

  • Series: A boolean mask with one value per row; True marks duplicated rows.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> df_pd = pd.DataFrame(
...     {
...         "a": [1, 2, 3, 1],
...         "b": ["x", "y", "z", "x"],
...     }
... )
>>> df_pl = pl.DataFrame(
...     {
...         "a": [1, 2, 3, 1],
...         "b": ["x", "y", "z", "x"],
...     }
... )

Let's define a dataframe-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.is_duplicated()

We can then pass either pandas or Polars to func:

>>> func(df_pd)
0     True
1    False
2    False
3     True
dtype: bool
>>> func(df_pl)
shape: (4,)
Series: '' [bool]
[
    true
    false
    false
    true
]

is_empty()

Check if the dataframe is empty.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl

Let's define a dataframe-agnostic function that filters rows in which "foo" values are greater than 10, and then checks if the result is empty or not:

>>> @nw.narwhalify
... def func(df):
...     return df.filter(nw.col("foo") > 10).is_empty()

We can then pass either pandas or Polars to func:

>>> df_pd = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
>>> df_pl = pl.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
>>> func(df_pd), func(df_pl)
(True, True)
>>> df_pd = pd.DataFrame({"foo": [100, 2, 3], "bar": [4, 5, 6]})
>>> df_pl = pl.DataFrame({"foo": [100, 2, 3], "bar": [4, 5, 6]})
>>> func(df_pd), func(df_pl)
(False, False)

is_unique()

Get a mask of all unique rows in this DataFrame.

Returns:

  • Series: A boolean mask with one value per row; True marks unique rows.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> df_pd = pd.DataFrame(
...     {
...         "a": [1, 2, 3, 1],
...         "b": ["x", "y", "z", "x"],
...     }
... )
>>> df_pl = pl.DataFrame(
...     {
...         "a": [1, 2, 3, 1],
...         "b": ["x", "y", "z", "x"],
...     }
... )

Let's define a dataframe-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.is_unique()

We can then pass either pandas or Polars to func:

>>> func(df_pd)
0    False
1     True
2     True
3    False
dtype: bool
>>> func(df_pl)
shape: (4,)
Series: '' [bool]
[
    false
     true
     true
    false
]

item(row=None, column=None)

Return the DataFrame as a scalar, or return the element at the given row/column.

Notes

If row/column are not provided, this is equivalent to df[0, 0], with a check that the shape is (1, 1). With row/column, this is equivalent to df[row, column].

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function that returns the item at the given row/column:

>>> @nw.narwhalify
... def func(df, row, column):
...     return df.item(row, column)

We can then pass either pandas or Polars to func:

>>> func(df_pd, 1, 1), func(df_pd, 2, "b")
(np.int64(5), np.int64(6))
>>> func(df_pl, 1, 1), func(df_pl, 2, "b")
(5, 6)

iter_rows(*, named=False, buffer_size=512)

iter_rows(
    *, named: Literal[False], buffer_size: int = ...
) -> Iterator[tuple[Any, ...]]
iter_rows(
    *, named: Literal[True], buffer_size: int = ...
) -> Iterator[dict[str, Any]]
iter_rows(
    *, named: bool, buffer_size: int = ...
) -> Iterator[tuple[Any, ...]] | Iterator[dict[str, Any]]

Return an iterator over the rows of the DataFrame, as tuples or dictionaries of Python-native values.

Parameters:

  • named: bool (default: False)

    By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead.

  • buffer_size: int (default: 512)

    Determines the number of rows that are buffered internally while iterating over the data. See https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.iter_rows.html
Notes

cuDF doesn't support this method.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

We define a library-agnostic function:

>>> @nw.narwhalify
... def func(df, *, named):
...     return df.iter_rows(named=named)

We can then pass either pandas or Polars to func:

>>> [row for row in func(df_pd, named=False)]
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> [row for row in func(df_pd, named=True)]
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> [row for row in func(df_pl, named=False)]
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> [row for row in func(df_pl, named=True)]
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]

join(other, on=None, how='inner', *, left_on=None, right_on=None, suffix='_right')

Join in SQL-like fashion.

Parameters:

  • other: Self (required)

    DataFrame to join with.

  • on: str | list[str] | None (default: None)

    Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None.

  • how: Literal['inner', 'left', 'cross', 'semi', 'anti'] (default: 'inner')

    Join strategy.

      • inner: Returns rows that have matching values in both tables.
      • left: Returns all rows from the left table, and the matched rows from the right table.
      • cross: Returns the Cartesian product of rows from both tables.
      • semi: Filter rows that have a match in the right table.
      • anti: Filter rows that do not have a match in the right table.

  • left_on: str | list[str] | None (default: None)

    Join column of the left DataFrame.

  • right_on: str | list[str] | None (default: None)

    Join column of the right DataFrame.

  • suffix: str (default: '_right')

    Suffix to append to columns with a duplicate name.

Returns:

  • Self: A new joined DataFrame.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6.0, 7.0, 8.0],
...     "ham": ["a", "b", "c"],
... }
>>> data_other = {
...     "apple": ["x", "y", "z"],
...     "ham": ["a", "b", "d"],
... }
>>> df_pd = pd.DataFrame(data)
>>> other_pd = pd.DataFrame(data_other)
>>> df_pl = pl.DataFrame(data)
>>> other_pl = pl.DataFrame(data_other)

Let's define a dataframe-agnostic function in which we join over "ham" column:

>>> @nw.narwhalify
... def join_on_ham(df, other_any):
...     return df.join(other_any, left_on="ham", right_on="ham")

We can now pass either pandas or Polars to the function:

>>> join_on_ham(df_pd, other_pd)
   foo  bar ham apple
0    1  6.0   a     x
1    2  7.0   b     y
>>> join_on_ham(df_pl, other_pl)
shape: (2, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ str ┆ str   │
╞═════╪═════╪═════╪═══════╡
│ 1   ┆ 6.0 ┆ a   ┆ x     │
│ 2   ┆ 7.0 ┆ b   ┆ y     │
└─────┴─────┴─────┴───────┘

join_asof(other, *, left_on=None, right_on=None, on=None, by_left=None, by_right=None, by=None, strategy='backward')

Perform an asof join.

This is similar to a left-join except that we match on nearest key rather than equal keys.

Both DataFrames must be sorted by the asof join key.

Parameters:

  • other: Self (required)

    DataFrame to join with.

  • left_on: str | None (default: None)

    Name(s) of the left join column(s).

  • right_on: str | None (default: None)

    Name(s) of the right join column(s).

  • on: str | None (default: None)

    Join column of both DataFrames. If set, left_on and right_on should be None.

  • by_left: str | list[str] | None (default: None)

    Join on these columns before performing the asof join.

  • by_right: str | list[str] | None (default: None)

    Join on these columns before performing the asof join.

  • by: str | list[str] | None (default: None)

    Join on these columns before performing the asof join.

  • strategy: Literal['backward', 'forward', 'nearest'] (default: 'backward')

    Join strategy. The default is "backward".

      • backward: selects the last row in the right DataFrame whose "on" key is less than or equal to the left's key.
      • forward: selects the first row in the right DataFrame whose "on" key is greater than or equal to the left's key.
      • nearest: selects the last row in the right DataFrame whose value is nearest to the left's key.

Returns:

  • Self: A new joined DataFrame.

Examples:

>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data_gdp = {
...     "datetime": [
...         datetime(2016, 1, 1),
...         datetime(2017, 1, 1),
...         datetime(2018, 1, 1),
...         datetime(2019, 1, 1),
...         datetime(2020, 1, 1),
...     ],
...     "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
...     "datetime": [
...         datetime(2016, 3, 1),
...         datetime(2018, 8, 1),
...         datetime(2019, 1, 1),
...     ],
...     "population": [82.19, 82.66, 83.12],
... }
>>> gdp_pd = pd.DataFrame(data_gdp)
>>> population_pd = pd.DataFrame(data_population)
>>> gdp_pl = pl.DataFrame(data_gdp).sort("datetime")
>>> population_pl = pl.DataFrame(data_population).sort("datetime")

Let's define a dataframe-agnostic function in which we join over "datetime" column:

>>> @nw.narwhalify
... def join_asof_datetime(df, other_any, strategy):
...     return df.join_asof(other_any, on="datetime", strategy=strategy)

We can now pass either pandas or Polars to the function:

>>> join_asof_datetime(population_pd, gdp_pd, strategy="backward")
    datetime  population   gdp
0 2016-03-01       82.19  4164
1 2018-08-01       82.66  4566
2 2019-01-01       83.12  4696
>>> join_asof_datetime(population_pl, gdp_pl, strategy="backward")
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime            ┆ population ┆ gdp  │
│ ---                 ┆ ---        ┆ ---  │
│ datetime[μs]        ┆ f64        ┆ i64  │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19      ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66      ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12      ┆ 4696 │
└─────────────────────┴────────────┴──────┘

Here is a real-world time-series example that uses the by argument.

>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data_quotes = {
...     "datetime": [
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 30),
...         datetime(2016, 5, 25, 13, 30, 0, 41),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 49),
...         datetime(2016, 5, 25, 13, 30, 0, 72),
...         datetime(2016, 5, 25, 13, 30, 0, 75),
...     ],
...     "ticker": [
...         "GOOG",
...         "MSFT",
...         "MSFT",
...         "MSFT",
...         "GOOG",
...         "AAPL",
...         "GOOG",
...         "MSFT",
...     ],
...     "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
...     "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
... }
>>> data_trades = {
...     "datetime": [
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 38),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...     ],
...     "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
...     "price": [51.95, 51.95, 720.77, 720.92, 98.0],
...     "quantity": [75, 155, 100, 100, 100],
... }
>>> quotes_pd = pd.DataFrame(data_quotes)
>>> trades_pd = pd.DataFrame(data_trades)
>>> quotes_pl = pl.DataFrame(data_quotes).sort("datetime")
>>> trades_pl = pl.DataFrame(data_trades).sort("datetime")

Let's define a dataframe-agnostic function in which we join over "datetime" and by "ticker" columns:

>>> @nw.narwhalify
... def join_asof_datetime_by_ticker(df, other_any):
...     return df.join_asof(other_any, on="datetime", by="ticker")

We can now pass either pandas or Polars to the function:

>>> join_asof_datetime_by_ticker(trades_pd, quotes_pd)
                    datetime ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.000023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.000038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.000048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.000048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.000048   AAPL   98.00       100     NaN     NaN
>>> join_asof_datetime_by_ticker(trades_pl, quotes_pl)
shape: (5, 6)
┌────────────────────────────┬────────┬────────┬──────────┬───────┬────────┐
│ datetime                   ┆ ticker ┆ price  ┆ quantity ┆ bid   ┆ ask    │
│ ---                        ┆ ---    ┆ ---    ┆ ---      ┆ ---   ┆ ---    │
│ datetime[μs]               ┆ str    ┆ f64    ┆ i64      ┆ f64   ┆ f64    │
╞════════════════════════════╪════════╪════════╪══════════╪═══════╪════════╡
│ 2016-05-25 13:30:00.000023 ┆ MSFT   ┆ 51.95  ┆ 75       ┆ 51.95 ┆ 51.96  │
│ 2016-05-25 13:30:00.000038 ┆ MSFT   ┆ 51.95  ┆ 155      ┆ 51.97 ┆ 51.98  │
│ 2016-05-25 13:30:00.000048 ┆ GOOG   ┆ 720.77 ┆ 100      ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ GOOG   ┆ 720.92 ┆ 100      ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ AAPL   ┆ 98.0   ┆ 100      ┆ null  ┆ null   │
└────────────────────────────┴────────┴────────┴──────────┴───────┴────────┘

lazy()

Lazify the DataFrame (if possible).

If a library does not support lazy execution, then this is a no-op.

Returns:

  • LazyFrame[Any]: A new LazyFrame.

Examples:

Construct pandas, Polars and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrame
>>>
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library-agnostic function:

>>> def agnostic_lazy(df_native: IntoFrame) -> IntoFrame:
...     df = nw.from_native(df_native)
...     return df.lazy().to_native()

Note that the pandas and PyArrow dataframes stay eager, while the Polars DataFrame becomes a Polars LazyFrame:

>>> agnostic_lazy(df_pd)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> agnostic_lazy(df_pl)
<LazyFrame ...>
>>> agnostic_lazy(df_pa)
pyarrow.Table
foo: int64
bar: double
ham: string
----
foo: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]

null_count()

Create a new DataFrame that shows the null counts per column.

Notes

pandas and Polars handle null values differently. Polars distinguishes between NaN and Null, whereas pandas doesn't.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> df_pd = pd.DataFrame(
...     {
...         "foo": [1, None, 3],
...         "bar": [6, 7, None],
...         "ham": ["a", "b", "c"],
...     }
... )
>>> df_pl = pl.DataFrame(
...     {
...         "foo": [1, None, 3],
...         "bar": [6, 7, None],
...         "ham": ["a", "b", "c"],
...     }
... )

Let's define a dataframe-agnostic function that returns the null count of each column:

>>> @nw.narwhalify
... def func(df):
...     return df.null_count()

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo  bar  ham
0    1    1    0
>>> func(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 │
╞═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 0   │
└─────┴─────┴─────┘

pipe(function, *args, **kwargs)

Pipe function call.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": [1, 2, 3], "ba": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.pipe(
...         lambda _df: _df.select([x for x in _df.columns if len(x) == 1])
...     )

We can then pass either pandas or Polars:

>>> func(df_pd)
   a
0  1
1  2
2  3
>>> func(df_pl)
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

pivot(on, *, index=None, values=None, aggregate_function=None, maintain_order=True, sort_columns=False, separator='_')

Create a spreadsheet-style pivot table as a DataFrame.

Parameters:

Name Type Description Default
on str | list[str]

Name of the column(s) whose values will be used as the header of the output DataFrame.

required
index str | list[str] | None

One or multiple keys to group by. If None, all remaining columns not specified in on and values will be used. At least one of index and values must be specified.

None
values str | list[str] | None

Column(s) whose values will be aggregated into the new columns. If None, all remaining columns not specified in on and index will be used. At least one of index and values must be specified.

None
aggregate_function Literal['min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'] | None

Choose from:

  • None: no aggregation takes place; an error is raised if any group contains multiple values.
  • A predefined aggregate function string, one of {'min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'}.

None
maintain_order bool

Sort the grouped keys so that the output order is predictable.

True
sort_columns bool

Sort the transposed columns by name. Default is by order of discovery.

False
separator str

Used as separator/delimiter in generated column names in case of multiple values columns.

'_'

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
...     "ix": [1, 1, 2, 2, 1, 2],
...     "col": ["a", "a", "a", "a", "b", "b"],
...     "foo": [0, 1, 2, 2, 7, 1],
...     "bar": [0, 2, 0, 0, 9, 4],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.pivot("col", index="ix", aggregate_function="sum")

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   ix  foo_a  foo_b  bar_a  bar_b
0   1      1      7      2      9
1   2      4      1      0      4
>>> func(df_pl)
shape: (2, 5)
┌─────┬───────┬───────┬───────┬───────┐
│ ix  ┆ foo_a ┆ foo_b ┆ bar_a ┆ bar_b │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
│ i64 ┆ i64   ┆ i64   ┆ i64   ┆ i64   │
╞═════╪═══════╪═══════╪═══════╪═══════╡
│ 1   ┆ 1     ┆ 7     ┆ 2     ┆ 9     │
│ 2   ┆ 4     ┆ 1     ┆ 0     ┆ 4     │
└─────┴───────┴───────┴───────┴───────┘

rename(mapping)

Rename column names.

Parameters:

Name Type Description Default
mapping dict[str, str]

Key value pairs that map from old name to new name.

required

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {"foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.rename({"foo": "apple"})

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   apple  bar ham
0      1    6   a
1      2    7   b
2      3    8   c
>>> func(df_pl)
shape: (3, 3)
┌───────┬─────┬─────┐
│ apple ┆ bar ┆ ham │
│ ---   ┆ --- ┆ --- │
│ i64   ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 1     ┆ 6   ┆ a   │
│ 2     ┆ 7   ┆ b   │
│ 3     ┆ 8   ┆ c   │
└───────┴─────┴─────┘

row(index)

Get values at given row.

Note

You should NEVER use this method to iterate over a DataFrame; if you require row iteration, strongly prefer iter_rows() instead.

Parameters:

Name Type Description Default
index int

Row number.

required
Notes

cuDF doesn't support this method.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a library-agnostic function to get the second row.

>>> @nw.narwhalify
... def func(df):
...     return df.row(1)

We can then pass pandas / Polars / any other supported library:

>>> func(df_pd)
(2, 5)
>>> func(df_pl)
(2, 5)

rows(*, named=False)

rows(
    *, named: Literal[False] = False
) -> list[tuple[Any, ...]]
rows(*, named: Literal[True]) -> list[dict[str, Any]]
rows(
    *, named: bool
) -> list[tuple[Any, ...]] | list[dict[str, Any]]

Returns all data in the DataFrame as a list of rows of python-native values.

Parameters:

Name Type Description Default
named bool

By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead.

False

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df, *, named):
...     return df.rows(named=named)

We can then pass either pandas or Polars to func:

>>> func(df_pd, named=False)
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> func(df_pd, named=True)
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> func(df_pl, named=False)
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> func(df_pl, named=True)
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]

sample(n=None, *, fraction=None, with_replacement=False, seed=None)

Sample from this DataFrame.

Parameters:

Name Type Description Default
n int | None

Number of items to return. Cannot be used with fraction.

None
fraction float | None

Fraction of items to return. Cannot be used with n.

None
with_replacement bool

Allow values to be sampled more than once.

False
seed int | None

Seed for the random number generator. If set to None (default), a random seed is generated for each sample operation.

None
Notes

The results may not be consistent across libraries.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"a": [1, 2, 3, 4], "b": ["x", "y", "x", "y"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.sample(n=2, seed=123)

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   a  b
3  4  y
0  1  x
>>> func(df_pl)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 2   ┆ y   │
│ 3   ┆ x   │
└─────┴─────┘

As you can see, when using the same seed, the result is consistent within the same backend, but not necessarily across different backends.

select(*exprs, **named_exprs)

Select columns from this DataFrame.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

()
**named_exprs IntoExpr

Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.

{}

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {
...     "foo": [1, 2, 3],
...     "bar": [6, 7, 8],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)

Let's define a dataframe-agnostic function in which we pass the name of a column to select that column.

>>> @nw.narwhalify
... def func(df):
...     return df.select("foo")

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo
0    1
1    2
2    3
>>> func(df_pl)
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

Multiple columns can be selected by passing a list of column names.

>>> @nw.narwhalify
... def func(df):
...     return df.select(["foo", "bar"])
>>> func(df_pd)
   foo  bar
0    1    6
1    2    7
2    3    8
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 6   │
│ 2   ┆ 7   │
│ 3   ┆ 8   │
└─────┴─────┘

Multiple columns can also be selected using positional arguments instead of a list. Expressions are also accepted.

>>> @nw.narwhalify
... def func(df):
...     return df.select(nw.col("foo"), nw.col("bar") + 1)
>>> func(df_pd)
   foo  bar
0    1    7
1    2    8
2    3    9
>>> func(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘

Use keyword arguments to easily name your expression inputs.

>>> @nw.narwhalify
... def func(df):
...     return df.select(threshold=nw.col("foo") * 2)
>>> func(df_pd)
   threshold
0          2
1          4
2          6
>>> func(df_pl)
shape: (3, 1)
┌───────────┐
│ threshold │
│ ---       │
│ i64       │
╞═══════════╡
│ 2         │
│ 4         │
│ 6         │
└───────────┘

sort(by, *more_by, descending=False, nulls_last=False)

Sort the dataframe by the given columns.

Parameters:

Name Type Description Default
by str | Iterable[str]

Column(s) names to sort by.

required
*more_by str

Additional columns to sort by, specified as positional arguments.

()
descending bool | Sequence[bool]

Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.

False
nulls_last bool

Place null values last.

False
Warning

Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {
...     "a": [1, 2, None],
...     "b": [6.0, 5.0, 4.0],
...     "c": ["a", "c", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function in which we sort by multiple columns in different orders

>>> @nw.narwhalify
... def func(df):
...     return df.sort("c", "a", descending=[False, True])

We can then pass either pandas or Polars to func:

>>> func(df_pd)
     a    b  c
0  1.0  6.0  a
2  NaN  4.0  b
1  2.0  5.0  c
>>> func(df_pl)
shape: (3, 3)
┌──────┬─────┬─────┐
│ a    ┆ b   ┆ c   │
│ ---  ┆ --- ┆ --- │
│ i64  ┆ f64 ┆ str │
╞══════╪═════╪═════╡
│ 1    ┆ 6.0 ┆ a   │
│ null ┆ 4.0 ┆ b   │
│ 2    ┆ 5.0 ┆ c   │
└──────┴─────┴─────┘

tail(n=5)

Get the last n rows.

Parameters:

Name Type Description Default
n int

Number of rows to return. If a negative value is passed, return all rows except the first abs(n).

5

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {
...     "foo": [1, 2, 3, 4, 5],
...     "bar": [6, 7, 8, 9, 10],
...     "ham": ["a", "b", "c", "d", "e"],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)

Let's define a dataframe-agnostic function that gets the last 3 rows.

>>> @nw.narwhalify
... def func(df):
...     return df.tail(3)

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo  bar ham
2    3    8   c
3    4    9   d
4    5   10   e
>>> func(df_pl)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 3   ┆ 8   ┆ c   │
│ 4   ┆ 9   ┆ d   │
│ 5   ┆ 10  ┆ e   │
└─────┴─────┴─────┘

to_arrow()

Convert to arrow table.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = {"foo": [1, 2, 3], "bar": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function that converts to arrow table:

>>> @nw.narwhalify
... def func(df):
...     return df.to_arrow()
>>> func(df_pd)
pyarrow.Table
foo: int64
bar: string
----
foo: [[1,2,3]]
bar: [["a","b","c"]]
>>> func(df_pl)
pyarrow.Table
foo: int64
bar: large_string
----
foo: [[1,2,3]]
bar: [["a","b","c"]]

to_dict(*, as_series=True)

to_dict(
    *, as_series: Literal[True] = ...
) -> dict[str, Series]
to_dict(
    *, as_series: Literal[False]
) -> dict[str, list[Any]]
to_dict(
    *, as_series: bool
) -> dict[str, Series] | dict[str, list[Any]]

Convert DataFrame to a dictionary mapping column name to values.

Parameters:

Name Type Description Default
as_series bool

If set to True, the values are Narwhals Series; otherwise, the values are lists of Python-native values.

True

Returns:

Type Description
dict[str, Series] | dict[str, list[Any]]

A mapping from column name to values / Series.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>>
>>> df = {
...     "A": [1, 2, 3, 4, 5],
...     "fruits": ["banana", "banana", "apple", "apple", "banana"],
...     "B": [5, 4, 3, 2, 1],
...     "animals": ["beetle", "fly", "beetle", "beetle", "beetle"],
...     "optional": [28, 300, None, 2, -30],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> df_pa = pa.table(df)

We define a library agnostic function:

>>> def agnostic_to_dict(
...     df_native: IntoDataFrame,
... ) -> dict[str, list[int | str | float | None]]:
...     df = nw.from_native(df_native)
...     return df.to_dict(as_series=False)

We can then pass either pandas, Polars or PyArrow to agnostic_to_dict:

>>> agnostic_to_dict(df_pd)
{'A': [1, 2, 3, 4, 5], 'fruits': ['banana', 'banana', 'apple', 'apple', 'banana'], 'B': [5, 4, 3, 2, 1], 'animals': ['beetle', 'fly', 'beetle', 'beetle', 'beetle'], 'optional': [28.0, 300.0, nan, 2.0, -30.0]}
>>> agnostic_to_dict(df_pl)
{'A': [1, 2, 3, 4, 5], 'fruits': ['banana', 'banana', 'apple', 'apple', 'banana'], 'B': [5, 4, 3, 2, 1], 'animals': ['beetle', 'fly', 'beetle', 'beetle', 'beetle'], 'optional': [28, 300, None, 2, -30]}
>>> agnostic_to_dict(df_pa)
{'A': [1, 2, 3, 4, 5], 'fruits': ['banana', 'banana', 'apple', 'apple', 'banana'], 'B': [5, 4, 3, 2, 1], 'animals': ['beetle', 'fly', 'beetle', 'beetle', 'beetle'], 'optional': [28, 300, None, 2, -30]}

to_native()

Convert Narwhals DataFrame to native one.

Returns:

Type Description
DataFrameT

Object of the class that the user started with.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Calling to_native on a Narwhals DataFrame returns the native object:

>>> nw.from_native(df_pd).to_native()
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> nw.from_native(df_pl).to_native()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘
>>> nw.from_native(df_pa).to_native()
pyarrow.Table
foo: int64
bar: double
ham: string
----
foo: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]

to_numpy()

Convert this DataFrame to a NumPy ndarray.

Examples:

Construct pandas, Polars and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> import numpy as np
>>> from narwhals.typing import IntoDataFrame
>>>
>>> df = {"foo": [1, 2, 3], "bar": [6.5, 7.0, 8.5], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> df_pa = pa.table(df)

We define a library agnostic function:

>>> def agnostic_to_numpy(df_native: IntoDataFrame) -> np.ndarray:
...     df = nw.from_native(df_native)
...     return df.to_numpy()

We can then pass either pandas, Polars or PyArrow to agnostic_to_numpy:

>>> agnostic_to_numpy(df_pd)
array([[1, 6.5, 'a'],
       [2, 7.0, 'b'],
       [3, 8.5, 'c']], dtype=object)
>>> agnostic_to_numpy(df_pl)
array([[1, 6.5, 'a'],
       [2, 7.0, 'b'],
       [3, 8.5, 'c']], dtype=object)
>>> agnostic_to_numpy(df_pa)
array([[1, 6.5, 'a'],
       [2, 7.0, 'b'],
       [3, 8.5, 'c']], dtype=object)

to_pandas()

Convert this DataFrame to a pandas DataFrame.

Examples:

Construct pandas, Polars (eager) and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>>
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> df_pa = pa.table(df)

We define a library agnostic function:

>>> def agnostic_to_pandas(df_native: IntoDataFrame) -> pd.DataFrame:
...     df = nw.from_native(df_native)
...     return df.to_pandas()

We can then pass any supported library such as pandas, Polars (eager), or PyArrow to agnostic_to_pandas:

>>> agnostic_to_pandas(df_pd)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> agnostic_to_pandas(df_pl)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> agnostic_to_pandas(df_pa)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c

unique(subset=None, *, keep='any', maintain_order=False)

Drop duplicate rows from this dataframe.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) to consider when identifying duplicate rows.

None
keep Literal['any', 'first', 'last', 'none']

Which of the duplicate rows to keep. Choose from:

  • 'any': Does not give any guarantee of which row is kept. This allows more optimizations.
  • 'none': Don't keep duplicate rows.
  • 'first': Keep first unique row.
  • 'last': Keep last unique row.
'any'
maintain_order bool

Keep the same order as the original DataFrame. This may be more expensive to compute. Setting this to True prevents Polars from running the operation on its streaming engine.

False

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data = {
...     "foo": [1, 2, 3, 1],
...     "bar": ["a", "a", "a", "a"],
...     "ham": ["b", "b", "b", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.unique(["bar", "ham"])

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   foo bar ham
0    1   a   b
>>> func(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘

unpivot(on=None, *, index=None, variable_name=None, value_name=None)

Unpivot a DataFrame from wide to long format.

Optionally leaves identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis leaving just two non-identifier columns, 'variable' and 'value'.

Parameters:

Name Type Description Default
on str | list[str] | None

Column(s) to use as values variables; if on is empty all columns that are not in index will be used.

None
index str | list[str] | None

Column(s) to use as identifier variables.

None
variable_name str | None

Name to give to the variable column. Defaults to "variable".

None
value_name str | None

Name to give to the value column. Defaults to "value".

None
Notes

If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> data = {
...     "a": ["x", "y", "z"],
...     "b": [1, 3, 5],
...     "c": [2, 4, 6],
... }

We define a library agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.unpivot(on=["b", "c"], index="a")

We can pass any supported library such as pandas, Polars or PyArrow to func:

>>> func(pl.DataFrame(data))
shape: (6, 3)
┌─────┬──────────┬───────┐
│ a   ┆ variable ┆ value │
│ --- ┆ ---      ┆ ---   │
│ str ┆ str      ┆ i64   │
╞═════╪══════════╪═══════╡
│ x   ┆ b        ┆ 1     │
│ y   ┆ b        ┆ 3     │
│ z   ┆ b        ┆ 5     │
│ x   ┆ c        ┆ 2     │
│ y   ┆ c        ┆ 4     │
│ z   ┆ c        ┆ 6     │
└─────┴──────────┴───────┘
>>> func(pd.DataFrame(data))
   a variable  value
0  x        b      1
1  y        b      3
2  z        b      5
3  x        c      2
4  y        c      4
5  z        c      6
>>> func(pa.table(data))
pyarrow.Table
a: string
variable: string
value: int64
----
a: [["x","y","z"],["x","y","z"]]
variable: [["b","b","b"],["c","c","c"]]
value: [[1,3,5],[2,4,6]]

with_columns(*exprs, **named_exprs)

Add columns to this DataFrame.

Added columns will replace existing columns with the same name.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

()
**named_exprs IntoExpr

Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.

{}

Returns:

Name Type Description
DataFrame Self

A new DataFrame with the columns added.

Note

Creating a new DataFrame using this method does not create a new copy of existing data.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df = {
...     "a": [1, 2, 3, 4],
...     "b": [0.5, 4, 10, 13],
...     "c": [True, True, False, True],
... }
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)

Let's define a dataframe-agnostic function in which we pass an expression to add it as a new column:

>>> @nw.narwhalify
... def func(df):
...     return df.with_columns((nw.col("a") * 2).alias("a*2"))

We can then pass either pandas or Polars to func:

>>> func(df_pd)
   a     b      c  a*2
0  1   0.5   True    2
1  2   4.0   True    4
2  3  10.0  False    6
3  4  13.0   True    8
>>> func(df_pl)
shape: (4, 4)
┌─────┬──────┬───────┬─────┐
│ a   ┆ b    ┆ c     ┆ a*2 │
│ --- ┆ ---  ┆ ---   ┆ --- │
│ i64 ┆ f64  ┆ bool  ┆ i64 │
╞═════╪══════╪═══════╪═════╡
│ 1   ┆ 0.5  ┆ true  ┆ 2   │
│ 2   ┆ 4.0  ┆ true  ┆ 4   │
│ 3   ┆ 10.0 ┆ false ┆ 6   │
│ 4   ┆ 13.0 ┆ true  ┆ 8   │
└─────┴──────┴───────┴─────┘

with_row_index(name='index')

Insert a column which enumerates rows.

Examples:

Construct pandas and Polars DataFrames:

>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function:

>>> @nw.narwhalify
... def func(df):
...     return df.with_row_index()

We can then pass either pandas or Polars:

>>> func(df_pd)
   index  a  b
0      0  1  4
1      1  2  5
2      2  3  6
>>> func(df_pl)
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- │
│ u32   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0     ┆ 1   ┆ 4   │
│ 1     ┆ 2   ┆ 5   │
│ 2     ┆ 3   ┆ 6   │
└───────┴─────┴─────┘

write_csv(file=None)

Write dataframe to comma-separated values (CSV) file.

Examples:

Construct pandas, Polars and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> df_pa = pa.table(df)

We define a library agnostic function:

>>> def func(df):
...     df = nw.from_native(df)
...     return df.write_csv()

We can pass any supported library such as pandas, Polars or PyArrow to func:

>>> func(df_pd)
'foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n'
>>> func(df_pl)
'foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n'
>>> func(df_pa)
'"foo","bar","ham"\n1,6,"a"\n2,7,"b"\n3,8,"c"\n'

If we had passed a file name to write_csv, it would have been written to that file.

write_parquet(file)

Write dataframe to parquet file.

Examples:

Construct pandas, Polars and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(df)
>>> df_pl = pl.DataFrame(df)
>>> df_pa = pa.table(df)

We define a library agnostic function:

>>> def func(df):
...     df = nw.from_native(df)
...     df.write_parquet("foo.parquet")

We can then pass either pandas, Polars or PyArrow to func:

>>> func(df_pd)
>>> func(df_pl)
>>> func(df_pa)