narwhals.DataFrame

Narwhals DataFrame, backed by a native eager dataframe.

Warning

This class is not meant to be instantiated directly - instead:

  • If the native object is an eager dataframe from one of the supported backends (e.g. pandas.DataFrame, polars.DataFrame, pyarrow.Table), you can use narwhals.from_native:

    narwhals.from_native(native_dataframe)
    narwhals.from_native(native_dataframe, eager_only=True)
    

  • If the object is a dictionary mapping column names to generic sequences (e.g. dict[str, list]), you can create a DataFrame via narwhals.from_dict:

    narwhals.from_dict(
        data={"a": [1, 2, 3]},
        native_namespace=narwhals.get_native_namespace(another_object),
    )
    

columns property

Get column names.

Returns:

Type Description
list[str]

The column names stored in a list.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_columns(df_native: IntoFrame) -> list[str]:
...     df = nw.from_native(df_native)
...     return df.columns

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_columns:

>>> agnostic_columns(df_pd)
['foo', 'bar', 'ham']
>>> agnostic_columns(df_pl)
['foo', 'bar', 'ham']
>>> agnostic_columns(df_pa)
['foo', 'bar', 'ham']

implementation property

Return implementation of native frame.

This can be useful when you need to use special-casing for features outside of Narwhals' scope - for example, when dealing with pandas' Period Dtype.

Returns:

Type Description
Implementation

The Implementation enum member identifying the backend of the native frame (e.g. Implementation.PANDAS).

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> df_native = pd.DataFrame({"a": [1, 2, 3]})
>>> df = nw.from_native(df_native)
>>> df.implementation
<Implementation.PANDAS: 1>
>>> df.implementation.is_pandas()
True
>>> df.implementation.is_pandas_like()
True
>>> df.implementation.is_polars()
False

schema property

Get an ordered mapping of column names to their data type.

Returns:

Type Description
Schema

A Narwhals Schema object mapping column names to their data types.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.schema import Schema
>>> from narwhals.typing import IntoFrame
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6.0, 7.0, 8.0],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_schema(df_native: IntoFrame) -> Schema:
...     df = nw.from_native(df_native)
...     return df.schema

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_schema:

>>> agnostic_schema(df_pd)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> agnostic_schema(df_pl)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> agnostic_schema(df_pa)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})

shape property

Get the shape of the DataFrame.

Returns:

Type Description
tuple[int, int]

The shape of the dataframe as a (number of rows, number of columns) tuple.

Examples:

Construct pandas, Polars, and PyArrow dataframes:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3, 4, 5]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_shape(df_native: IntoDataFrame) -> tuple[int, int]:
...     df = nw.from_native(df_native)
...     return df.shape

We can then pass any of pandas, Polars, or PyArrow to agnostic_shape:

>>> agnostic_shape(df_pd)
(5, 1)
>>> agnostic_shape(df_pl)
(5, 1)
>>> agnostic_shape(df_pa)
(5, 1)

__arrow_c_stream__(requested_schema=None)

Export a DataFrame via the Arrow PyCapsule Interface.

  • if the underlying dataframe implements the interface, it'll return that
  • else, it'll call to_arrow and then defer to PyArrow's implementation

See PyCapsule Interface for more.

__getitem__(item)

Extract column or slice of DataFrame.

Parameters:

Name Type Description Default
item str | slice | Sequence[int] | Sequence[str] | tuple[Sequence[int], str | int] | tuple[slice, str | int] | tuple[slice | Sequence[int], Sequence[int] | Sequence[str] | slice] | tuple[slice, slice]

How to slice the dataframe. What happens depends on what is passed; it's easiest to explain by example. Suppose we have a DataFrame df:

  • df['a'] extracts column 'a' and returns a Series.
  • df[0:2] extracts the first two rows and returns a DataFrame.
  • df[0:2, 'a'] extracts the first two rows from column 'a' and returns a Series.
  • df[0:2, 0] extracts the first two rows from the first column and returns a Series.
  • df[[0, 1], [0, 1, 2]] extracts the first two rows and the first three columns and returns a DataFrame.
  • df[:, [0, 1, 2]] extracts all rows from the first three columns and returns a DataFrame.
  • df[:, ['a', 'c']] extracts all rows and columns 'a' and 'c' and returns a DataFrame.
  • df[['a', 'c']] extracts all rows and columns 'a' and 'c' and returns a DataFrame.
  • df[0:2, ['a', 'c']] extracts the first two rows and columns 'a' and 'c' and returns a DataFrame.
  • df[:, 0:2] extracts all rows from the first two columns and returns a DataFrame.
  • df[:, 'a':'c'] extracts all rows and all columns positioned between 'a' and 'c' inclusive and returns a DataFrame. For example, if the columns are 'a', 'd', 'c', 'b', then that would extract columns 'a', 'd', and 'c'.
required

Returns:

Type Description
Series[Any] | Self

A Narwhals Series (backed by a native series) when a single column is selected, otherwise a Narwhals DataFrame.

Notes
  • Integers are always interpreted as positions.
  • Strings are always interpreted as column names.

In contrast with Polars, pandas allows non-string column names. If you don't know whether the column name you're trying to extract is definitely a string (e.g. df[df.columns[0]]) then you should use DataFrame.get_column instead.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_slice(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native)
...     return df["a"].to_native()

We can then pass any of pandas, Polars, or PyArrow to agnostic_slice:

>>> agnostic_slice(df_pd)
0    1
1    2
Name: a, dtype: int64
>>> agnostic_slice(df_pl)
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> agnostic_slice(df_pa)
<pyarrow.lib.ChunkedArray object at ...>
[
  [
    1,
    2
  ]
]

clone()

Create a copy of this DataFrame.

Returns:

Type Description
Self

An identical copy of the original dataframe.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)

Let's define a dataframe-agnostic function in which we clone the DataFrame:

>>> def agnostic_clone(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.clone().to_native()

We can then pass any supported library such as pandas or Polars to agnostic_clone:

>>> agnostic_clone(df_pd)
   a  b
0  1  3
1  2  4
>>> agnostic_clone(df_pl)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘

collect_schema()

Get an ordered mapping of column names to their data type.

Returns:

Type Description
Schema

A Narwhals Schema object mapping column names to their data types.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.schema import Schema
>>> from narwhals.typing import IntoFrame
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6.0, 7.0, 8.0],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_collect_schema(df_native: IntoFrame) -> Schema:
...     df = nw.from_native(df_native)
...     return df.collect_schema()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_collect_schema:

>>> agnostic_collect_schema(df_pd)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> agnostic_collect_schema(df_pl)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> agnostic_collect_schema(df_pa)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})

drop(*columns, strict=True)

Remove columns from the dataframe.

Returns:

Type Description
Self

The dataframe with the specified columns removed.

Parameters:

Name Type Description Default
*columns str | Iterable[str]

Names of the columns that should be removed from the dataframe.

()
strict bool

If True, raise an exception when any of the given column names does not exist in the schema. If False, such names are ignored.

True

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_drop(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).drop("ham").to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_drop:

>>> agnostic_drop(df_pd)
   foo  bar
0    1  6.0
1    2  7.0
2    3  8.0
>>> agnostic_drop(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 6.0 │
│ 2   ┆ 7.0 │
│ 3   ┆ 8.0 │
└─────┴─────┘
>>> agnostic_drop(df_pa)
pyarrow.Table
foo: int64
bar: double
----
foo: [[1,2,3]]
bar: [[6,7,8]]

Use positional arguments to drop multiple columns.

>>> def agnostic_drop_multi(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).drop("foo", "ham").to_native()
>>> agnostic_drop_multi(df_pd)
   bar
0  6.0
1  7.0
2  8.0
>>> agnostic_drop_multi(df_pl)
shape: (3, 1)
┌─────┐
│ bar │
│ --- │
│ f64 │
╞═════╡
│ 6.0 │
│ 7.0 │
│ 8.0 │
└─────┘
>>> agnostic_drop_multi(df_pa)
pyarrow.Table
bar: double
----
bar: [[6,7,8]]

drop_nulls(subset=None)

Drop rows that contain null values.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) for which null values are considered. If set to None (default), use all columns.

None

Returns:

Type Description
Self

A new dataframe without the rows that contained null values.

Notes

pandas handles null values differently from Polars and PyArrow. See null_handling for reference.

Examples:

>>> import polars as pl
>>> import pandas as pd
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1.0, 2.0, None], "ba": [1.0, None, 2.0]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_drop_nulls(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.drop_nulls().to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_drop_nulls:

>>> agnostic_drop_nulls(df_pd)
     a   ba
0  1.0  1.0
>>> agnostic_drop_nulls(df_pl)
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ ba  │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 1.0 │
└─────┴─────┘
>>> agnostic_drop_nulls(df_pa)
pyarrow.Table
a: double
ba: double
----
a: [[1]]
ba: [[1]]

estimated_size(unit='b')

Return an estimation of the total (heap) allocated size of the DataFrame.

Estimated size is given in the specified unit (bytes by default).

Parameters:

Name Type Description Default
unit SizeUnit

'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes', 'gigabytes', or 'terabytes'.

'b'

Returns:

Type Description
int | float

The estimated size: an integer when unit is 'b' or 'bytes', otherwise a float.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6.0, 7.0, 8.0],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_estimated_size(df_native: IntoDataFrameT) -> int | float:
...     df = nw.from_native(df_native)
...     return df.estimated_size()

We can then pass any of pandas, Polars, or PyArrow to agnostic_estimated_size:

>>> agnostic_estimated_size(df_pd)
np.int64(330)
>>> agnostic_estimated_size(df_pl)
51
>>> agnostic_estimated_size(df_pa)
63

explode(columns, *more_columns)

Explode the dataframe to long format by exploding the given columns.

Notes

Multiple columns can be exploded together only if, in every row, they have matching element counts.

Parameters:

Name Type Description Default
columns str | Sequence[str]

Column names. The underlying columns being exploded must be of the List data type.

required
*more_columns str

Additional names of columns to explode, specified as positional arguments.

()

Returns:

Type Description
Self

A new DataFrame with the given columns exploded.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "a": ["x", "y", "z", "w"],
...     "lst1": [[1, 2], None, [None], []],
...     "lst2": [[3, None], None, [42], []],
... }

We define a library agnostic function:

>>> def agnostic_explode(df_native: IntoFrameT) -> IntoFrameT:
...     return (
...         nw.from_native(df_native)
...         .with_columns(nw.col("lst1", "lst2").cast(nw.List(nw.Int32())))
...         .explode("lst1", "lst2")
...         .to_native()
...     )

We can then pass any supported library such as pandas, Polars (eager), or PyArrow to agnostic_explode:

>>> agnostic_explode(pd.DataFrame(data))
   a  lst1  lst2
0  x     1     3
0  x     2  <NA>
1  y  <NA>  <NA>
2  z  <NA>    42
3  w  <NA>  <NA>
>>> agnostic_explode(pl.DataFrame(data))
shape: (5, 3)
┌─────┬──────┬──────┐
│ a   ┆ lst1 ┆ lst2 │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i32  ┆ i32  │
╞═════╪══════╪══════╡
│ x   ┆ 1    ┆ 3    │
│ x   ┆ 2    ┆ null │
│ y   ┆ null ┆ null │
│ z   ┆ null ┆ 42   │
│ w   ┆ null ┆ null │
└─────┴──────┴──────┘

filter(*predicates, **constraints)

Filter the rows in the DataFrame based on one or more predicate expressions.

The original order of the remaining rows is preserved.

Parameters:

Name Type Description Default
*predicates IntoExpr | Iterable[IntoExpr] | list[bool]

Expression(s) that evaluate to a boolean Series. Alternatively, a single boolean list can be passed.

()
**constraints Any

Column filters; use name = value to filter columns by the supplied value. Each constraint will behave the same as nw.col(name).eq(value), and will be implicitly joined with the other filter conditions using &.

{}

Returns:

Type Description
Self

The filtered dataframe.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6, 7, 8],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function in which we filter on one condition.

>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter(nw.col("foo") > 1).to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_filter:

>>> agnostic_filter(df_pd)
   foo  bar ham
1    2    7   b
2    3    8   c
>>> agnostic_filter(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[2,3]]
bar: [[7,8]]
ham: [["b","c"]]

Filter on multiple conditions, combined with and/or operators:

>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter((nw.col("foo") < 3) & (nw.col("ham") == "a")).to_native()
>>> agnostic_filter(df_pd)
   foo  bar ham
0    1    6   a
>>> agnostic_filter(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[1]]
bar: [[6]]
ham: [["a"]]
>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     dframe = df.filter(
...         (nw.col("foo") == 1) | (nw.col("ham") == "c")
...     ).to_native()
...     return dframe
>>> agnostic_filter(df_pd)
   foo  bar ham
0    1    6   a
2    3    8   c
>>> agnostic_filter(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[1,3]]
bar: [[6,8]]
ham: [["a","c"]]

Provide multiple filters using *args syntax:

>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     dframe = df.filter(
...         nw.col("foo") <= 2,
...         ~nw.col("ham").is_in(["b", "c"]),
...     ).to_native()
...     return dframe
>>> agnostic_filter(df_pd)
   foo  bar ham
0    1    6   a
>>> agnostic_filter(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[1]]
bar: [[6]]
ham: [["a"]]

Provide multiple filters using **kwargs syntax:

>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter(foo=2, ham="b").to_native()
>>> agnostic_filter(df_pd)
   foo  bar ham
1    2    7   b
>>> agnostic_filter(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 7   ┆ b   │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[2]]
bar: [[7]]
ham: [["b"]]

gather_every(n, offset=0)

Take every nth row in the DataFrame and return as a new DataFrame.

Parameters:

Name Type Description Default
n int

Gather every n-th row.

required
offset int

Starting index.

0

Returns:

Type Description
Self

The dataframe containing only the selected rows.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function in which we gather every 2nd row, starting from an offset of 1:

>>> def agnostic_gather_every(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.gather_every(n=2, offset=1).to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_gather_every:

>>> agnostic_gather_every(df_pd)
   a  b
1  2  6
3  4  8
>>> agnostic_gather_every(df_pl)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2   ┆ 6   │
│ 4   ┆ 8   │
└─────┴─────┘
>>> agnostic_gather_every(df_pa)
pyarrow.Table
a: int64
b: int64
----
a: [[2,4]]
b: [[6,8]]

get_column(name)

Get a single column by name.

Parameters:

Name Type Description Default
name str

The column name as a string.

required

Returns:

Type Description
Series[Any]

A Narwhals Series, backed by a native series.

Notes

Although name is typed as str, pandas does allow non-string column names, and they will work when passed to this function if the narwhals.DataFrame is backed by a pandas dataframe with non-string columns. This function can only be used to extract a column by name, so there is no risk of ambiguity.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_get_column(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native)
...     name = df.columns[0]
...     return df.get_column(name).to_native()

We can then pass any of pandas, Polars, or PyArrow to agnostic_get_column:

>>> agnostic_get_column(df_pd)
0    1
1    2
Name: a, dtype: int64
>>> agnostic_get_column(df_pl)
shape: (2,)
Series: 'a' [i64]
[
    1
    2
]
>>> agnostic_get_column(df_pa)
<pyarrow.lib.ChunkedArray object at ...>
[
  [
    1,
    2
  ]
]

group_by(*keys, drop_null_keys=False)

Start a group by operation.

Parameters:

Name Type Description Default
*keys str | Iterable[str]

Column(s) to group by. Multiple column names can also be passed as a list.

()
drop_null_keys bool

If True, groups where any key is null won't be included in the result.

False

Returns:

Name Type Description
GroupBy GroupBy[Self]

Object which can be used to perform aggregations.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> data = {
...     "a": ["a", "b", "a", "b", "c"],
...     "b": [1, 2, 1, 3, 3],
...     "c": [5, 4, 3, 2, 1],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function in which we group by one column and call agg to compute the grouped sum of another column.

>>> def agnostic_group_by_agg(df_native: IntoDataFrameT) -> IntoDataFrameT:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.group_by("a").agg(nw.col("b").sum()).sort("a").to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_group_by_agg:

>>> agnostic_group_by_agg(df_pd)
   a  b
0  a  2
1  b  5
2  c  3
>>> agnostic_group_by_agg(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 2   │
│ b   ┆ 5   │
│ c   ┆ 3   │
└─────┴─────┘
>>> agnostic_group_by_agg(df_pa)
pyarrow.Table
a: string
b: int64
----
a: [["a","b","c"]]
b: [[2,5,3]]

Group by multiple columns by passing a list of column names.

>>> def agnostic_group_by_agg(df_native: IntoDataFrameT) -> IntoDataFrameT:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.group_by(["a", "b"]).agg(nw.max("c")).sort("a", "b").to_native()
>>> agnostic_group_by_agg(df_pd)
   a  b  c
0  a  1  5
1  b  2  4
2  b  3  2
3  c  3  1
>>> agnostic_group_by_agg(df_pl)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ b   ┆ 2   ┆ 4   │
│ b   ┆ 3   ┆ 2   │
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘
>>> agnostic_group_by_agg(df_pa)
pyarrow.Table
a: string
b: int64
c: int64
----
a: [["a","b","b","c"]]
b: [[1,2,3,3]]
c: [[5,4,2,1]]

head(n=5)

Get the first n rows.

Parameters:

Name Type Description Default
n int

Number of rows to return. If a negative value is passed, return all rows except the last abs(n).

5

Returns:

Type Description
Self

A dataframe containing the first n rows (for negative n, all rows except the last abs(n)).

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "foo": [1, 2, 3, 4, 5],
...     "bar": [6, 7, 8, 9, 10],
...     "ham": ["a", "b", "c", "d", "e"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function that gets the first 3 rows.

>>> def agnostic_head(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).head(3).to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_head:

>>> agnostic_head(df_pd)
   foo  bar ham
0    1    6   a
1    2    7   b
2    3    8   c
>>> agnostic_head(df_pl)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
└─────┴─────┴─────┘
>>> agnostic_head(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]

is_duplicated()

Get a mask of all duplicated rows in this DataFrame.

Returns:

Type Description
Series[Any]

A boolean Series that is True for every row that occurs more than once in the DataFrame.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>> data = {
...     "a": [1, 2, 3, 1],
...     "b": ["x", "y", "z", "x"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_is_duplicated(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.is_duplicated().to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_is_duplicated:

>>> agnostic_is_duplicated(df_pd)
0     True
1    False
2    False
3     True
dtype: bool
>>> agnostic_is_duplicated(df_pl)
shape: (4,)
Series: '' [bool]
[
    true
    false
    false
    true
]
>>> agnostic_is_duplicated(df_pa)
<pyarrow.lib.ChunkedArray object at ...>
[
  [
    true,
    false,
    false,
    true
  ]
]

is_empty()

Check if the dataframe is empty.

Returns:

Type Description
bool

A boolean indicating whether the dataframe is empty (True) or not (False).

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame

Let's define a dataframe-agnostic function that filters rows in which "foo" values are greater than 10, and then checks if the result is empty or not:

>>> def agnostic_is_empty(df_native: IntoDataFrame) -> bool:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.filter(nw.col("foo") > 10).is_empty()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_is_empty:

>>> data = {"foo": [1, 2, 3], "bar": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
>>> agnostic_is_empty(df_pd), agnostic_is_empty(df_pl), agnostic_is_empty(df_pa)
(True, True, True)
>>> data = {"foo": [100, 2, 3], "bar": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
>>> agnostic_is_empty(df_pd), agnostic_is_empty(df_pl), agnostic_is_empty(df_pa)
(False, False, False)

is_unique()

Get a mask of all unique rows in this DataFrame.

Returns:

Type Description
Series[Any]

A boolean Series that is True for every row that occurs exactly once in the DataFrame.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>> data = {
...     "a": [1, 2, 3, 1],
...     "b": ["x", "y", "z", "x"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_is_unique(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.is_unique().to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_is_unique:

>>> agnostic_is_unique(df_pd)
0    False
1     True
2     True
3    False
dtype: bool
>>> agnostic_is_unique(df_pl)
shape: (4,)
Series: '' [bool]
[
    false
    true
    true
    false
]
>>> agnostic_is_unique(df_pa)
<pyarrow.lib.ChunkedArray object at ...>
[
  [
    false,
    true,
    true,
    false
  ]
]

item(row=None, column=None)

Return the DataFrame as a scalar, or return the element at the given row/column.

Parameters:

Name Type Description Default
row int | None

Index of the row to select.

None
column int | str | None

The column selected via an integer or a string (column name).

None

Returns:

Type Description
Any

A scalar or the specified element in the dataframe.

Notes

If row and column are not provided, this is equivalent to df[0, 0], with a check that the shape is (1, 1). With row and column, this is equivalent to df[row, column].

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function that returns the item at the given row/column:

>>> def agnostic_item(
...     df_native: IntoDataFrame, row: int | None, column: int | str | None
... ):
...     df = nw.from_native(df_native, eager_only=True)
...     return df.item(row, column)

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_item:

>>> agnostic_item(df_pd, 1, 1), agnostic_item(df_pd, 2, "b")
(np.int64(5), np.int64(6))
>>> agnostic_item(df_pl, 1, 1), agnostic_item(df_pl, 2, "b")
(5, 6)
>>> agnostic_item(df_pa, 1, 1), agnostic_item(df_pa, 2, "b")
(5, 6)

iter_rows(*, named=False, buffer_size=512)

Return an iterator over the rows of the DataFrame, as Python-native values.

Parameters:

Name Type Description Default
named bool

By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead.

False
buffer_size int

Determines the number of rows that are buffered internally while iterating over the data. See https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.iter_rows.html

512

Returns:

Type Description
Iterator[tuple[Any, ...]] | Iterator[dict[str, Any]]

An iterator over the DataFrame of rows.

Notes

cuDF doesn't support this method.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_iter_rows(df_native: IntoDataFrame, *, named: bool):
...     return nw.from_native(df_native, eager_only=True).iter_rows(named=named)

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_iter_rows:

>>> [row for row in agnostic_iter_rows(df_pd, named=False)]
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> [row for row in agnostic_iter_rows(df_pd, named=True)]
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> [row for row in agnostic_iter_rows(df_pl, named=False)]
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> [row for row in agnostic_iter_rows(df_pl, named=True)]
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> [row for row in agnostic_iter_rows(df_pa, named=False)]
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> [row for row in agnostic_iter_rows(df_pa, named=True)]
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
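The effect of named and buffer_size can be modelled in plain Python. The sketch below is only an illustration under stated assumptions. The input is a hypothetical dict of equal-length column lists, iter_rows_sketch is a made-up helper, and buffer_size is modelled as the number of rows materialised per chunk. Narwhals itself delegates iteration to the native backend.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_rows_sketch(
    columns: dict[str, list[Any]], *, named: bool = False, buffer_size: int = 512
) -> Iterator[tuple[Any, ...] | dict[str, Any]]:
    """Yield rows from a dict of equal-length column lists (illustrative only)."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    for start in range(0, n_rows, buffer_size):
        stop = min(start + buffer_size, n_rows)
        # Materialise at most `buffer_size` rows at a time from column slices.
        buffer = list(zip(*(columns[name][start:stop] for name in names)))
        for values in buffer:
            # named=True pairs each value with its column name.
            yield dict(zip(names, values)) if named else values


data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
print(list(iter_rows_sketch(data, buffer_size=2)))
# [(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
print(next(iter_rows_sketch(data, named=True)))
# {'foo': 1, 'bar': 6.0, 'ham': 'a'}
```

Because rows are produced lazily, a consumer that stops early only pays for the buffers it has actually touched.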

join(other, on=None, how='inner', *, left_on=None, right_on=None, suffix='_right')

Join in SQL-like fashion.

Parameters:

Name Type Description Default
other Self

DataFrame to join with.

required
on str | list[str] | None

Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None.

None
how Literal['inner', 'left', 'cross', 'semi', 'anti']

Join strategy.

  • inner: Returns rows that have matching values in both tables.
  • left: Returns all rows from the left table, and the matched rows from the right table.
  • cross: Returns the Cartesian product of rows from both tables.
  • semi: Filter rows that have a match in the right table.
  • anti: Filter rows that do not have a match in the right table.
'inner'
left_on str | list[str] | None

Name(s) of the join column(s) of the left DataFrame.

None
right_on str | list[str] | None

Name(s) of the join column(s) of the right DataFrame.

None
suffix str

Suffix to append to columns with a duplicate name.

'_right'

Returns:

Type Description
Self

A new joined DataFrame.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6.0, 7.0, 8.0],
...     "ham": ["a", "b", "c"],
... }
>>> data_other = {
...     "apple": ["x", "y", "z"],
...     "ham": ["a", "b", "d"],
... }
>>> df_pd = pd.DataFrame(data)
>>> other_pd = pd.DataFrame(data_other)
>>> df_pl = pl.DataFrame(data)
>>> other_pl = pl.DataFrame(data_other)
>>> df_pa = pa.table(data)
>>> other_pa = pa.table(data_other)

Let's define a dataframe-agnostic function in which we join on the "ham" column:

>>> def agnostic_join_on_ham(
...     df_native: IntoFrameT, other_native: IntoFrameT
... ) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     other = nw.from_native(other_native)
...     return df.join(other, left_on="ham", right_on="ham").to_native()

We can then pass any supported library such as Pandas, Polars, or PyArrow to agnostic_join_on_ham:

>>> agnostic_join_on_ham(df_pd, other_pd)
   foo  bar ham apple
0    1  6.0   a     x
1    2  7.0   b     y
>>> agnostic_join_on_ham(df_pl, other_pl)
shape: (2, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ ---   │
│ i64 ┆ f64 ┆ str ┆ str   │
╞═════╪═════╪═════╪═══════╡
│ 1   ┆ 6.0 ┆ a   ┆ x     │
│ 2   ┆ 7.0 ┆ b   ┆ y     │
└─────┴─────┴─────┴───────┘
>>> agnostic_join_on_ham(df_pa, other_pa)
pyarrow.Table
foo: int64
bar: double
ham: string
apple: string
----
foo: [[1,2]]
bar: [[6,7]]
ham: [["a","b"]]
apple: [["x","y"]]
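The join strategies listed above can be sketched in plain Python on lists of row dicts. join_sketch, merge and to_rows are hypothetical helpers written for illustration only. They support a single key column, ignore on for cross joins, and mimic the suffix rule by appending suffix to right-hand columns whose names clash. Backends implement joins very differently (for example with hash joins), so treat this as a model of the semantics, not of Narwhals' code.

```python
from __future__ import annotations

from itertools import product
from typing import Any

Row = dict[str, Any]


def to_rows(columns: dict[str, list[Any]]) -> list[Row]:
    """Turn a dict of equal-length column lists into a list of row dicts."""
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


def merge(left_row: Row, right_row: Row, key: str | None, suffix: str) -> Row:
    """Combine two rows, dropping the right key and suffixing clashing names."""
    merged = dict(left_row)
    for name, value in right_row.items():
        if name == key:
            continue
        merged[name + suffix if name in left_row else name] = value
    return merged


def join_sketch(
    left: list[Row], right: list[Row], on: str, how: str = "inner", suffix: str = "_right"
) -> list[Row]:
    """Single-key join over lists of row dicts (illustrative, not Narwhals' code)."""
    if how == "cross":
        # Cartesian product: every left row paired with every right row.
        return [merge(lr, rr, None, suffix) for lr, rr in product(left, right)]
    # Index right rows by key value so each lookup is a dict access.
    index: dict[Any, list[Row]] = {}
    for rr in right:
        index.setdefault(rr[on], []).append(rr)
    out: list[Row] = []
    for lr in left:
        matches = index.get(lr[on], [])
        if how == "semi":
            if matches:  # keep left rows that have a match, left columns only
                out.append(dict(lr))
        elif how == "anti":
            if not matches:  # keep left rows without any match
                out.append(dict(lr))
        elif how in ("inner", "left"):
            out.extend(merge(lr, rr, on, suffix) for rr in matches)
            if how == "left" and not matches:
                # Keep the unmatched left row, filling right-hand columns with None.
                null_row = dict.fromkeys(right[0] if right else [])
                out.append(merge(lr, null_row, on, suffix))
        else:
            raise ValueError(f"unsupported join strategy: {how!r}")
    return out


left = to_rows({"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]})
right = to_rows({"apple": ["x", "y", "z"], "ham": ["a", "b", "d"]})
print(join_sketch(left, right, on="ham"))
# [{'foo': 1, 'bar': 6.0, 'ham': 'a', 'apple': 'x'}, {'foo': 2, 'bar': 7.0, 'ham': 'b', 'apple': 'y'}]
print([row["foo"] for row in join_sketch(left, right, on="ham", how="anti")])
# [3]
```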

join_asof(other, *, left_on=None, right_on=None, on=None, by_left=None, by_right=None, by=None, strategy='backward')

Perform an asof join.

This is similar to a left join, except that rows are matched on the nearest key rather than on equal keys.

Both DataFrames must be sorted by the as-of join key.
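One way to see why sorted input matters: on sorted keys, a binary search finds the candidate match for each left row. Below is a minimal sketch of the "backward" and "forward" strategies using Python's bisect module. asof_lookup is a hypothetical helper, "nearest" is deliberately left out, and real backends use their own algorithms.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from datetime import datetime
from typing import Any, Sequence


def asof_lookup(
    left_keys: Sequence[Any],
    right_keys: Sequence[Any],
    right_values: Sequence[Any],
    strategy: str = "backward",
) -> list[Any]:
    """Match each left key to a right value; right_keys must be sorted ascending."""
    matched: list[Any] = []
    for key in left_keys:
        if strategy == "backward":
            # Position of the last right key that is <= key (binary search).
            i = bisect_right(right_keys, key) - 1
            matched.append(right_values[i] if i >= 0 else None)
        elif strategy == "forward":
            # Position of the first right key that is >= key.
            i = bisect_left(right_keys, key)
            matched.append(right_values[i] if i < len(right_keys) else None)
        else:
            raise ValueError(f"strategy not covered by this sketch: {strategy!r}")
    return matched


gdp_dates = [datetime(year, 1, 1) for year in range(2016, 2021)]
gdp = [4164, 4411, 4566, 4696, 4827]
population_dates = [datetime(2016, 3, 1), datetime(2018, 8, 1), datetime(2019, 1, 1)]
print(asof_lookup(population_dates, gdp_dates, gdp))  # [4164, 4566, 4696]
print(asof_lookup(population_dates, gdp_dates, gdp, strategy="forward"))  # [4411, 4696, 4696]
```

The "backward" result matches the gdp column in the example output below.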

Parameters:

Name Type Description Default
other Self

DataFrame to join with.

required
left_on str | None

Name of the left join column.

None
right_on str | None

Name of the right join column.

None
on str | None

Join column of both DataFrames. If set, left_on and right_on should be None.

None
by_left str | list[str] | None

Column(s) of the left DataFrame to join on (exact match) before performing the as-of join.

None
by_right str | list[str] | None

Column(s) of the right DataFrame to join on (exact match) before performing the as-of join.

None
by str | list[str] | None

Column(s) of both DataFrames to join on (exact match) before performing the as-of join.

None
strategy Literal['backward', 'forward', 'nearest']

Join strategy. The default is "backward".

  • backward: selects the last row in the right DataFrame whose "on" key is less than or equal to the left's key.
  • forward: selects the first row in the right DataFrame whose "on" key is greater than or equal to the left's key.
  • nearest: selects the last row in the right DataFrame whose value is nearest to the left's key.
'backward'

Returns:

Type Description
Self

A new joined DataFrame.

Examples:

>>> from datetime import datetime
>>> from typing import Literal
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data_gdp = {
...     "datetime": [
...         datetime(2016, 1, 1),
...         datetime(2017, 1, 1),
...         datetime(2018, 1, 1),
...         datetime(2019, 1, 1),
...         datetime(2020, 1, 1),
...     ],
...     "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
...     "datetime": [
...         datetime(2016, 3, 1),
...         datetime(2018, 8, 1),
...         datetime(2019, 1, 1),
...     ],
...     "population": [82.19, 82.66, 83.12],
... }
>>> gdp_pd = pd.DataFrame(data_gdp)
>>> population_pd = pd.DataFrame(data_population)
>>> gdp_pl = pl.DataFrame(data_gdp).sort("datetime")
>>> population_pl = pl.DataFrame(data_population).sort("datetime")

Let's define a dataframe-agnostic function in which we join on the "datetime" column:

>>> def agnostic_join_asof_datetime(
...     df_native: IntoFrameT,
...     other_native: IntoFrameT,
...     strategy: Literal["backward", "forward", "nearest"],
... ) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     other = nw.from_native(other_native)
...     return df.join_asof(other, on="datetime", strategy=strategy).to_native()

We can then pass any supported library such as Pandas or Polars to agnostic_join_asof_datetime:

>>> agnostic_join_asof_datetime(population_pd, gdp_pd, strategy="backward")
    datetime  population   gdp
0 2016-03-01       82.19  4164
1 2018-08-01       82.66  4566
2 2019-01-01       83.12  4696
>>> agnostic_join_asof_datetime(population_pl, gdp_pl, strategy="backward")
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime            ┆ population ┆ gdp  │
│ ---                 ┆ ---        ┆ ---  │
│ datetime[μs]        ┆ f64        ┆ i64  │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19      ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66      ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12      ┆ 4696 │
└─────────────────────┴────────────┴──────┘

Here is a real-world time-series example that uses the by argument:

>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data_quotes = {
...     "datetime": [
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 30),
...         datetime(2016, 5, 25, 13, 30, 0, 41),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 49),
...         datetime(2016, 5, 25, 13, 30, 0, 72),
...         datetime(2016, 5, 25, 13, 30, 0, 75),
...     ],
...     "ticker": [
...         "GOOG",
...         "MSFT",
...         "MSFT",
...         "MSFT",
...         "GOOG",
...         "AAPL",
...         "GOOG",
...         "MSFT",
...     ],
...     "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
...     "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
... }
>>> data_trades = {
...     "datetime": [
...         datetime(2016, 5, 25, 13, 30, 0, 23),
...         datetime(2016, 5, 25, 13, 30, 0, 38),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...         datetime(2016, 5, 25, 13, 30, 0, 48),
...     ],
...     "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
...     "price": [51.95, 51.95, 720.77, 720.92, 98.0],
...     "quantity": [75, 155, 100, 100, 100],
... }
>>> quotes_pd = pd.DataFrame(data_quotes)
>>> trades_pd = pd.DataFrame(data_trades)
>>> quotes_pl = pl.DataFrame(data_quotes).sort("datetime")
>>> trades_pl = pl.DataFrame(data_trades).sort("datetime")

Let's define a dataframe-agnostic function in which we join on the "datetime" column and by the "ticker" column:

>>> def agnostic_join_asof_datetime_by_ticker(
...     df_native: IntoFrameT, other_native: IntoFrameT
... ) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     other = nw.from_native(other_native)
...     return df.join_asof(other, on="datetime", by="ticker").to_native()

We can now pass either pandas or Polars to the function:

>>> agnostic_join_asof_datetime_by_ticker(trades_pd, quotes_pd)
                    datetime ticker   price  quantity     bid     ask
0 2016-05-25 13:30:00.000023   MSFT   51.95        75   51.95   51.96
1 2016-05-25 13:30:00.000038   MSFT   51.95       155   51.97   51.98
2 2016-05-25 13:30:00.000048   GOOG  720.77       100  720.50  720.93
3 2016-05-25 13:30:00.000048   GOOG  720.92       100  720.50  720.93
4 2016-05-25 13:30:00.000048   AAPL   98.00       100     NaN     NaN
>>> agnostic_join_asof_datetime_by_ticker(trades_pl, quotes_pl)
shape: (5, 6)
┌────────────────────────────┬────────┬────────┬──────────┬───────┬────────┐
│ datetime                   ┆ ticker ┆ price  ┆ quantity ┆ bid   ┆ ask    │
│ ---                        ┆ ---    ┆ ---    ┆ ---      ┆ ---   ┆ ---    │
│ datetime[μs]               ┆ str    ┆ f64    ┆ i64      ┆ f64   ┆ f64    │
╞════════════════════════════╪════════╪════════╪══════════╪═══════╪════════╡
│ 2016-05-25 13:30:00.000023 ┆ MSFT   ┆ 51.95  ┆ 75       ┆ 51.95 ┆ 51.96  │
│ 2016-05-25 13:30:00.000038 ┆ MSFT   ┆ 51.95  ┆ 155      ┆ 51.97 ┆ 51.98  │
│ 2016-05-25 13:30:00.000048 ┆ GOOG   ┆ 720.77 ┆ 100      ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ GOOG   ┆ 720.92 ┆ 100      ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ AAPL   ┆ 98.0   ┆ 100      ┆ null  ┆ null   │
└────────────────────────────┴────────┴────────┴──────────┴───────┴────────┘
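Conceptually, by first restricts candidate matches to rows with an equal by key, and only then applies the as-of rule within that group. The minimal sketch below models the "backward" strategy with a by key. asof_backward_by is a hypothetical helper, and integer microsecond offsets stand in for the datetimes above. It reproduces the bid column of the output, including the null for AAPL, whose only quote comes after the trade.

```python
from __future__ import annotations

from bisect import bisect_right
from collections import defaultdict
from typing import Any


def asof_backward_by(
    left: list[tuple[Any, Any]],
    right: list[tuple[Any, Any, Any]],
) -> list[Any]:
    """Backward as-of match within groups of equal `by` keys.

    Left rows are (on_key, by_key); right rows are (on_key, by_key, value).
    """
    # Partition the right side on the exact-match `by` key, each group sorted on `on`.
    groups: dict[Any, list[tuple[Any, Any]]] = defaultdict(list)
    for on_key, by_key, value in sorted(right, key=lambda row: row[0]):
        groups[by_key].append((on_key, value))
    group_keys = {by_key: [k for k, _ in rows] for by_key, rows in groups.items()}

    matched: list[Any] = []
    for on_key, by_key in left:
        keys = group_keys.get(by_key, [])
        # Last right row in the same group whose `on` key is <= the left key.
        i = bisect_right(keys, on_key) - 1
        matched.append(groups[by_key][i][1] if i >= 0 else None)
    return matched


# (microseconds after 13:30:00, ticker, bid)
quotes = [
    (23, "GOOG", 720.50), (23, "MSFT", 51.95), (30, "MSFT", 51.97),
    (41, "MSFT", 51.99), (48, "GOOG", 720.50), (49, "AAPL", 97.99),
    (72, "GOOG", 720.50), (75, "MSFT", 52.01),
]
trades = [(23, "MSFT"), (38, "MSFT"), (48, "GOOG"), (48, "GOOG"), (48, "AAPL")]
print(asof_backward_by(trades, quotes))  # [51.95, 51.97, 720.5, 720.5, None]
```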

lazy()

Lazify the DataFrame (if possible).

If a library does not support lazy execution, then this is a no-op.

Returns:

Type Description
LazyFrame[Any]

A new LazyFrame.

Examples:

Construct pandas, Polars and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_lazy(df_native: IntoFrame) -> IntoFrame:
...     df = nw.from_native(df_native)
...     return df.lazy().to_native()

Note that pandas and PyArrow dataframes stay eager, but a Polars DataFrame becomes a Polars LazyFrame:

>>> agnostic_lazy(df_pd)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> agnostic_lazy(df_pl)
<LazyFrame ...>
>>> agnostic_lazy(df_pa)
pyarrow.Table
foo: int64
bar: double
ham: string
----
foo: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]

null_count()

Create a new DataFrame that shows the null counts per column.

Returns:

Type Description
Self

A dataframe of shape (1, n_columns).

Notes

pandas handles null values differently from Polars and PyArrow. See null_handling for reference.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "foo": [1, None, 3],
...     "bar": [6, 7, None],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function that returns the null count of each column:

>>> def agnostic_null_count(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.null_count().to_native()

We can then pass any supported library such as Pandas, Polars, or PyArrow to agnostic_null_count:

>>> agnostic_null_count(df_pd)
   foo  bar  ham
0    1    1    0
>>> agnostic_null_count(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 │
╞═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 0   │
└─────┴─────┴─────┘
>>> agnostic_null_count(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: int64
----
foo: [[1]]
bar: [[1]]
ham: [[0]]
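What null_count computes can be expressed in a few lines of plain Python. In this sketch null_count_sketch and its nan_is_null flag are hypothetical; nan_is_null is not a Narwhals parameter. The flag only illustrates the pandas difference mentioned in the Notes, where a floating-point NaN is also treated as missing.

```python
from __future__ import annotations

import math
from typing import Any


def null_count_sketch(
    columns: dict[str, list[Any]], *, nan_is_null: bool = False
) -> dict[str, list[int]]:
    """Count missing values per column, as a one-row result (illustrative only)."""

    def is_null(value: Any) -> bool:
        if value is None:
            return True
        # Optionally treat float NaN as missing too, as pandas does.
        return nan_is_null and isinstance(value, float) and math.isnan(value)

    return {name: [sum(is_null(v) for v in values)] for name, values in columns.items()}


data = {"foo": [1, None, 3], "bar": [6, 7, None], "ham": ["a", "b", "c"]}
print(null_count_sketch(data))  # {'foo': [1], 'bar': [1], 'ham': [0]}
```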

pipe(function, *args, **kwargs)

Pipe the DataFrame into a function call, passing along any extra positional and keyword arguments.

Parameters:

Name Type Description Default
function Callable[[Any], Self]

Function to apply.

required
args Any

Positional arguments to pass to function.

()
kwargs Any

Keyword arguments to pass to function.

{}

Returns:

Type Description
Self

The result of calling function on the DataFrame.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2, 3], "ba": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_pipe(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.pipe(
...         lambda _df: _df.select(
...             [x for x in _df.columns if len(x) == 1]
...         ).to_native()
...     )

We can then pass either pandas, Polars or PyArrow to agnostic_pipe:

>>> agnostic_pipe(df_pd)
   a
0  1
1  2
2  3
>>> agnostic_pipe(df_pl)
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘
>>> agnostic_pipe(df_pa)
pyarrow.Table
a: int64
----
a: [[1,2,3]]
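pipe is most useful for slotting reusable functions, with extra arguments, into a method chain. The sketch below models it with a hypothetical Frame class, which is not Narwhals' own class. It assumes that pipe(function, *args, **kwargs) simply returns function(frame, *args, **kwargs).

```python
from __future__ import annotations

from typing import Any, Callable


class Frame:
    """Hypothetical minimal frame, used only to illustrate pipe semantics."""

    def __init__(self, data: dict[str, list[Any]]) -> None:
        self.data = data

    @property
    def columns(self) -> list[str]:
        return list(self.data)

    def select(self, names: list[str]) -> Frame:
        return Frame({name: self.data[name] for name in names})

    def pipe(self, function: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Forward the frame plus any extra arguments to the user function.
        return function(self, *args, **kwargs)


def keep_short_names(frame: Frame, max_len: int) -> Frame:
    """Keep only the columns whose names are at most max_len characters long."""
    return frame.select([c for c in frame.columns if len(c) <= max_len])


result = Frame({"a": [1, 2, 3], "ba": [4, 5, 6]}).pipe(keep_short_names, max_len=1)
print(result.columns)  # ['a']
```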

pivot(on, *, index=None, values=None, aggregate_function=None, maintain_order=None, sort_columns=False, separator='_')

Create a spreadsheet-style pivot table as a DataFrame.

Parameters:

Name Type Description Default
on str | list[str]

Name of the column(s) whose values will be used as the header of the output DataFrame.

required
index str | list[str] | None

One or multiple keys to group by. If None, all remaining columns not specified in on and values will be used. At least one of index and values must be specified.

None
values str | list[str] | None

Column(s) whose values will be aggregated. If None, all remaining columns not specified in on and index will be used. At least one of index and values must be specified.

None
aggregate_function Literal['min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'] | None

Choose from:

  • None: no aggregation takes place; an error is raised if a group contains multiple values.
  • A predefined aggregate function string, one of {'min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'}
None
maintain_order bool | None

Has no effect and is kept around only for backwards-compatibility.

None
sort_columns bool

Sort the transposed columns by name. Default is by order of discovery.

False
separator str

Used as separator/delimiter in generated column names in case of multiple values columns.

'_'

Returns:

Type Description
Self

A new dataframe.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> data = {
...     "ix": [1, 1, 2, 2, 1, 2],
...     "col": ["a", "a", "a", "a", "b", "b"],
...     "foo": [0, 1, 2, 2, 7, 1],
...     "bar": [0, 2, 0, 0, 9, 4],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_pivot(df_native: IntoDataFrameT) -> IntoDataFrameT:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.pivot("col", index="ix", aggregate_function="sum").to_native()

We can then pass any supported library such as Pandas or Polars to agnostic_pivot:

>>> agnostic_pivot(df_pd)
   ix  foo_a  foo_b  bar_a  bar_b
0   1      1      7      2      9
1   2      4      1      0      4
>>> agnostic_pivot(df_pl)
shape: (2, 5)
┌─────┬───────┬───────┬───────┬───────┐
│ ix  ┆ foo_a ┆ foo_b ┆ bar_a ┆ bar_b │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
│ i64 ┆ i64   ┆ i64   ┆ i64   ┆ i64   │
╞═════╪═══════╪═══════╪═══════╪═══════╡
│ 1   ┆ 1     ┆ 7     ┆ 2     ┆ 9     │
│ 2   ┆ 4     ┆ 1     ┆ 0     ┆ 4     │
└─────┴───────┴───────┴───────┴───────┘
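The pivot above can be reproduced in plain Python to show where the output names and numbers come from. pivot_sum_sketch is a hypothetical helper that only supports aggregate_function="sum". It names columns as "{value}{separator}{on_value}", matching the output above where there are several values columns, and fills combinations that never occur with None.

```python
from __future__ import annotations

from typing import Any


def pivot_sum_sketch(
    rows: list[dict[str, Any]],
    on: str,
    index: str,
    values: list[str],
    separator: str = "_",
) -> list[dict[str, Any]]:
    """Pivot with sum aggregation over a list of row dicts (illustrative only)."""
    on_values: list[Any] = []  # distinct `on` values, in order of discovery
    groups: dict[Any, dict[str, Any]] = {}  # running totals per index value
    for row in rows:
        if row[on] not in on_values:
            on_values.append(row[on])
        totals = groups.setdefault(row[index], {})
        for value in values:
            # Output names combine the values column and the `on` value.
            name = f"{value}{separator}{row[on]}"
            totals[name] = totals.get(name, 0) + row[value]
    columns = [f"{v}{separator}{o}" for v in values for o in on_values]
    # (index value, on value) combinations that never occur are left as None.
    return [
        {index: key, **{c: sums.get(c) for c in columns}}
        for key, sums in groups.items()
    ]


data = {
    "ix": [1, 1, 2, 2, 1, 2],
    "col": ["a", "a", "a", "a", "b", "b"],
    "foo": [0, 1, 2, 2, 7, 1],
    "bar": [0, 2, 0, 0, 9, 4],
}
rows = [dict(zip(data, vals)) for vals in zip(*data.values())]
for out_row in pivot_sum_sketch(rows, on="col", index="ix", values=["foo", "bar"]):
    print(out_row)
# {'ix': 1, 'foo_a': 1, 'foo_b': 7, 'bar_a': 2, 'bar_b': 9}
# {'ix': 2, 'foo_a': 4, 'foo_b': 1, 'bar_a': 0, 'bar_b': 4}
```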

rename(mapping)

Rename column names.

Parameters:

Name Type Description Default
mapping dict[str, str]

Key-value pairs that map from old name to new name.

required

Returns:

Type Description
Self

The dataframe with the specified columns renamed.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_rename(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).rename({"foo": "apple"}).to_native()

We can then pass any supported library such as Pandas, Polars, or PyArrow to agnostic_rename:

>>> agnostic_rename(df_pd)
   apple  bar ham
0      1    6   a
1      2    7   b
2      3    8   c
>>> agnostic_rename(df_pl)
shape: (3, 3)
┌───────┬─────┬─────┐
│ apple ┆ bar ┆ ham │
│ ---   ┆ --- ┆ --- │
│ i64   ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 1     ┆ 6   ┆ a   │
│ 2     ┆ 7   ┆ b   │
│ 3     ┆ 8   ┆ c   │
└───────┴─────┴─────┘
>>> agnostic_rename(df_pa)
pyarrow.Table
apple: int64
bar: int64
ham: string
----
apple: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]

row(index)

Get the values at the given row.

Warning

You should NEVER use this method to iterate over a DataFrame; if you require row iteration, you should strongly prefer iter_rows() instead.

Parameters:

Name Type Description Default
index int

Row number.

required

Returns:

Type Description
tuple[Any, ...]

A tuple of the values in the selected row.

Notes

cuDF doesn't support this method.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> from narwhals.typing import IntoDataFrame
>>> from typing import Any
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a library-agnostic function to get the second row.

>>> def agnostic_row(df_native: IntoDataFrame) -> tuple[Any, ...]:
...     return nw.from_native(df_native).row(1)

We can then pass either pandas, Polars or PyArrow to agnostic_row:

>>> agnostic_row(df_pd)
(2, 5)
>>> agnostic_row(df_pl)
(2, 5)
>>> agnostic_row(df_pa)
(<pyarrow.Int64Scalar: 2>, <pyarrow.Int64Scalar: 5>)

rows(*, named=False)

Return all data in the DataFrame as a list of rows of Python-native values.

Parameters:

Name Type Description Default
named bool

By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead.

False

Returns:

Type Description
list[tuple[Any, ...]] | list[dict[str, Any]]

The data as a list of rows.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_rows(df_native: IntoDataFrame, *, named: bool):
...     return nw.from_native(df_native, eager_only=True).rows(named=named)

We can then pass any supported library such as Pandas, Polars, or PyArrow to agnostic_rows:

>>> agnostic_rows(df_pd, named=False)
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> agnostic_rows(df_pd, named=True)
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> agnostic_rows(df_pl, named=False)
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> agnostic_rows(df_pl, named=True)
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> agnostic_rows(df_pa, named=False)
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> agnostic_rows(df_pa, named=True)
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]

sample(n=None, *, fraction=None, with_replacement=False, seed=None)

Sample from this DataFrame.

Parameters:

Name Type Description Default
n int | None

Number of items to return. Cannot be used with fraction.

None
fraction float | None

Fraction of items to return. Cannot be used with n.

None
with_replacement bool

Allow values to be sampled more than once.

False
seed int | None

Seed for the random number generator. If set to None (default), a random seed is generated for each sample operation.

None

Returns:

Type Description
Self

A new dataframe.

Notes

The results may not be consistent across libraries.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> data = {"a": [1, 2, 3, 4], "b": ["x", "y", "x", "y"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_sample(df_native: IntoDataFrameT) -> IntoDataFrameT:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.sample(n=2, seed=123).to_native()

We can then pass any supported library such as Pandas, Polars, or PyArrow to agnostic_sample:

>>> agnostic_sample(df_pd)
   a  b
3  4  y
0  1  x
>>> agnostic_sample(df_pl)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 2   ┆ y   │
│ 3   ┆ x   │
└─────┴─────┘
>>> agnostic_sample(df_pa)
pyarrow.Table
a: int64
b: string
----
a: [[1,3]]
b: [["x","x"]]

As you can see, when the same seed is used, the result is consistent within the same backend, but not necessarily across different backends.
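A seed pins down a specific random number generator, but each backend has its own generator and sampling procedure. The sketch below uses Python's random module as a stand-in. sample_a and sample_b are hypothetical "backends" that both honour the seed yet consume it differently, so each is reproducible on its own while their results need not agree.

```python
from __future__ import annotations

import random
from typing import Any


def sample_a(
    rows: list[Any], n: int, seed: int | None = None, with_replacement: bool = False
) -> list[Any]:
    """Hypothetical backend A: draws directly with the seeded generator."""
    rng = random.Random(seed)
    # choices() may repeat rows; sample() never does.
    return rng.choices(rows, k=n) if with_replacement else rng.sample(rows, n)


def sample_b(rows: list[Any], n: int, seed: int | None = None) -> list[Any]:
    """Hypothetical backend B: shuffles a copy, then keeps the first n rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n]


rows = [1, 2, 3, 4]
print(sample_a(rows, 2, seed=123) == sample_a(rows, 2, seed=123))  # True
print(sample_b(rows, 2, seed=123) == sample_b(rows, 2, seed=123))  # True
print(sample_a(rows, 2, seed=123), sample_b(rows, 2, seed=123))  # need not match
```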

select(*exprs, **named_exprs)

Select columns from this DataFrame.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

()
**named_exprs IntoExpr

Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.

{}

Returns:

Type Description
Self

The dataframe containing only the selected columns.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "foo": [1, 2, 3],
...     "bar": [6, 7, 8],
...     "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function in which we pass the name of a column to select that column.

>>> def agnostic_single_select(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).select("foo").to_native()

We can then pass any supported library such as Pandas, Polars, or PyArrow to agnostic_single_select:

>>> agnostic_single_select(df_pd)
   foo
0    1
1    2
2    3
>>> agnostic_single_select(df_pl)
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘
>>> agnostic_single_select(df_pa)
pyarrow.Table
foo: int64
----
foo: [[1,2,3]]

Multiple columns can be selected by passing a list of column names.

>>> def agnostic_multi_select(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).select(["foo", "bar"]).to_native()
>>> agnostic_multi_select(df_pd)
   foo  bar
0    1    6
1    2    7
2    3    8
>>> agnostic_multi_select(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 6   │
│ 2   ┆ 7   │
│ 3   ┆ 8   │
└─────┴─────┘
>>> agnostic_multi_select(df_pa)
pyarrow.Table
foo: int64
bar: int64
----
foo: [[1,2,3]]
bar: [[6,7,8]]

Multiple columns can also be selected using positional arguments instead of a list. Expressions are also accepted.

>>> def agnostic_select(df_native: IntoFrameT) -> IntoFrameT:
...     return (
...         nw.from_native(df_native)
...         .select(nw.col("foo"), nw.col("bar") + 1)
...         .to_native()
...     )
>>> agnostic_select(df_pd)
   foo  bar
0    1    7
1    2    8
2    3    9
>>> agnostic_select(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 7   │
│ 2   ┆ 8   │
│ 3   ┆ 9   │
└─────┴─────┘
>>> agnostic_select(df_pa)
pyarrow.Table
foo: int64
bar: int64
----
foo: [[1,2,3]]
bar: [[7,8,9]]

Use keyword arguments to easily name your expression inputs.

>>> def agnostic_select_w_kwargs(df_native: IntoFrameT) -> IntoFrameT:
...     return (
...         nw.from_native(df_native)
...         .select(threshold=nw.col("foo") * 2)
...         .to_native()
...     )
>>> agnostic_select_w_kwargs(df_pd)
   threshold
0          2
1          4
2          6
>>> agnostic_select_w_kwargs(df_pl)
shape: (3, 1)
┌───────────┐
│ threshold │
│ ---       │
│ i64       │
╞═══════════╡
│ 2         │
│ 4         │
│ 6         │
└───────────┘
>>> agnostic_select_w_kwargs(df_pa)
pyarrow.Table
threshold: int64
----
threshold: [[2,4,6]]

sort(by, *more_by, descending=False, nulls_last=False)

Sort the dataframe by the given columns.

Parameters:

Name Type Description Default
by str | Iterable[str]

Name(s) of the column(s) to sort by.

required
*more_by str

Additional columns to sort by, specified as positional arguments.

()
descending bool | Sequence[bool]

Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.

False
nulls_last bool

Place null values last.

False

Returns:

Type Description
Self

The sorted dataframe.

Warning

Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "a": [1, 2, None],
...     "b": [6.0, 5.0, 4.0],
...     "c": ["a", "c", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function in which we sort by multiple columns in different orders:

>>> def agnostic_sort(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.sort("c", "a", descending=[False, True]).to_native()

We can then pass any supported library such as Pandas, Polars, or PyArrow to agnostic_sort:

>>> agnostic_sort(df_pd)
     a    b  c
0  1.0  6.0  a
2  NaN  4.0  b
1  2.0  5.0  c
>>> agnostic_sort(df_pl)
shape: (3, 3)
┌──────┬─────┬─────┐
│ a    ┆ b   ┆ c   │
│ ---  ┆ --- ┆ --- │
│ i64  ┆ f64 ┆ str │
╞══════╪═════╪═════╡
│ 1    ┆ 6.0 ┆ a   │
│ null ┆ 4.0 ┆ b   │
│ 2    ┆ 5.0 ┆ c   │
└──────┴─────┴─────┘
>>> agnostic_sort(df_pa)
pyarrow.Table
a: int64
b: double
c: string
----
a: [[1,null,2]]
b: [[6,4,5]]
c: [["a","b","c"]]
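A per-column descending flag can be modelled with repeated stable sorts, applied from the least significant key to the most significant one. In the sketch below sort_rows is a hypothetical helper and not how any backend actually sorts. Following the Warning above, a single nulls_last flag applies to every key.

```python
from __future__ import annotations

from operator import itemgetter
from typing import Any, Sequence


def sort_rows(
    rows: list[dict[str, Any]],
    by: Sequence[str],
    descending: Sequence[bool],
    *,
    nulls_last: bool = False,
) -> list[dict[str, Any]]:
    """Multi-column sort via repeated stable sorts, least significant key first."""
    result = list(rows)
    for column, desc in reversed(list(zip(by, descending))):
        present = [r for r in result if r[column] is not None]
        missing = [r for r in result if r[column] is None]
        # list.sort is stable even with reverse=True, so ties keep the order
        # produced by the previous (less significant) key.
        present.sort(key=itemgetter(column), reverse=desc)
        # One nulls_last flag is applied to every key.
        result = present + missing if nulls_last else missing + present
    return result


rows = [
    {"a": 1, "b": 6.0, "c": "a"},
    {"a": 2, "b": 5.0, "c": "c"},
    {"a": None, "b": 4.0, "c": "b"},
]
print([r["c"] for r in sort_rows(rows, ["c", "a"], [False, True])])  # ['a', 'b', 'c']
print([r["a"] for r in sort_rows(rows, ["a"], [False], nulls_last=True)])  # [1, 2, None]
```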

tail(n=5)

Get the last n rows.

Parameters:

Name Type Description Default
n int

Number of rows to return. If a negative value is passed, return all rows except the first abs(n).

5

Returns:

Type Description
Self

A subset of the dataframe of shape (n, n_columns).

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "foo": [1, 2, 3, 4, 5],
...     "bar": [6, 7, 8, 9, 10],
...     "ham": ["a", "b", "c", "d", "e"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function that gets the last 3 rows.

>>> def agnostic_tail(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).tail(3).to_native()

We can then pass any supported library such as Pandas, Polars, or PyArrow to agnostic_tail:

>>> agnostic_tail(df_pd)
   foo  bar ham
2    3    8   c
3    4    9   d
4    5   10   e
>>> agnostic_tail(df_pl)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 3   ┆ 8   ┆ c   │
│ 4   ┆ 9   ┆ d   │
│ 5   ┆ 10  ┆ e   │
└─────┴─────┴─────┘
>>> agnostic_tail(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[3,4,5]]
bar: [[8,9,10]]
ham: [["c","d","e"]]
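The rule for negative n can be stated with Python slicing. tail_sketch is a hypothetical helper operating on a plain list. Note that n == 0 needs care, because rows[-0:] returns every element rather than none.

```python
from __future__ import annotations

from typing import Any


def tail_sketch(rows: list[Any], n: int = 5) -> list[Any]:
    """Last n rows; a negative n drops the first abs(n) rows instead."""
    # For n >= 0, clamp the start so that an n larger than the row count keeps
    # everything, and n == 0 keeps nothing.
    start = max(len(rows) - n, 0) if n >= 0 else -n
    return rows[start:]


rows = [1, 2, 3, 4, 5]
print(tail_sketch(rows, 3))   # [3, 4, 5]
print(tail_sketch(rows, -2))  # [3, 4, 5]
print(tail_sketch(rows, 0))   # []
```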

to_arrow()

Convert to a PyArrow Table.

Returns:

Type Description
Table

A new PyArrow table.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function that converts to a PyArrow Table:

>>> def agnostic_to_arrow(df_native: IntoDataFrame) -> pa.Table:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.to_arrow()

We can then pass any supported library such as Pandas, Polars, or PyArrow to agnostic_to_arrow:

>>> agnostic_to_arrow(df_pd)
pyarrow.Table
foo: int64
bar: string
----
foo: [[1,2,3]]
bar: [["a","b","c"]]
>>> agnostic_to_arrow(df_pl)
pyarrow.Table
foo: int64
bar: large_string
----
foo: [[1,2,3]]
bar: [["a","b","c"]]
>>> agnostic_to_arrow(df_pa)
pyarrow.Table
foo: int64
bar: string
----
foo: [[1,2,3]]
bar: [["a","b","c"]]

to_dict(*, as_series=True)

Convert DataFrame to a dictionary mapping column name to values.

Parameters:

Name Type Description Default
as_series bool

If True, the values are Narwhals Series; otherwise, the values are Python lists.

True

Returns:

Type Description
dict[str, Series[Any]] | dict[str, list[Any]]

A mapping from column name to values / Series.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {
...     "A": [1, 2, 3, 4, 5],
...     "fruits": ["banana", "banana", "apple", "apple", "banana"],
...     "B": [5, 4, 3, 2, 1],
...     "animals": ["beetle", "fly", "beetle", "beetle", "beetle"],
...     "optional": [28, 300, None, 2, -30],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_to_dict(
...     df_native: IntoDataFrame,
... ) -> dict[str, list[int | str | float | None]]:
...     df = nw.from_native(df_native)
...     return df.to_dict(as_series=False)

We can then pass either pandas, Polars or PyArrow to agnostic_to_dict:

>>> agnostic_to_dict(df_pd)
{'A': [1, 2, 3, 4, 5], 'fruits': ['banana', 'banana', 'apple', 'apple', 'banana'], 'B': [5, 4, 3, 2, 1], 'animals': ['beetle', 'fly', 'beetle', 'beetle', 'beetle'], 'optional': [28.0, 300.0, nan, 2.0, -30.0]}
>>> agnostic_to_dict(df_pl)
{'A': [1, 2, 3, 4, 5], 'fruits': ['banana', 'banana', 'apple', 'apple', 'banana'], 'B': [5, 4, 3, 2, 1], 'animals': ['beetle', 'fly', 'beetle', 'beetle', 'beetle'], 'optional': [28, 300, None, 2, -30]}
>>> agnostic_to_dict(df_pa)
{'A': [1, 2, 3, 4, 5], 'fruits': ['banana', 'banana', 'apple', 'apple', 'banana'], 'B': [5, 4, 3, 2, 1], 'animals': ['beetle', 'fly', 'beetle', 'beetle', 'beetle'], 'optional': [28, 300, None, 2, -30]}
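to_dict(as_series=False) is column-oriented, whereas rows(named=True) holds the same values row by row. Below is a plain-Python sketch of converting between the two shapes. The helper names are hypothetical, and it assumes every column has the same length.

```python
from __future__ import annotations

from typing import Any


def columns_to_rows(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Column-oriented mapping (like to_dict(as_series=False)) to row dicts."""
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Row dicts (like rows(named=True)) back to a column-oriented mapping."""
    names = list(rows[0]) if rows else []
    return {name: [row[name] for row in rows] for name in names}


columns = {"A": [1, 2, 3], "fruits": ["banana", "banana", "apple"]}
rows = columns_to_rows(columns)
print(rows[0])  # {'A': 1, 'fruits': 'banana'}
print(rows_to_columns(rows) == columns)  # True
```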

to_native()

Convert Narwhals DataFrame to native one.

Returns:

Type Description
DataFrameT

The native object, of the same class the user started with.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Calling to_native on a Narwhals DataFrame returns the native object:

>>> nw.from_native(df_pd).to_native()
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> nw.from_native(df_pl).to_native()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘
>>> nw.from_native(df_pa).to_native()
pyarrow.Table
foo: int64
bar: double
ham: string
----
foo: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]

to_numpy()

Convert this DataFrame to a NumPy ndarray.

Returns:

Type Description
ndarray

A NumPy ndarray.

Examples:

Construct pandas, Polars and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> import numpy as np
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.5, 7.0, 8.5], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_to_numpy(df_native: IntoDataFrame) -> np.ndarray:
...     df = nw.from_native(df_native)
...     return df.to_numpy()

We can then pass either pandas, Polars or PyArrow to agnostic_to_numpy:

>>> agnostic_to_numpy(df_pd)
array([[1, 6.5, 'a'],
       [2, 7.0, 'b'],
       [3, 8.5, 'c']], dtype=object)
>>> agnostic_to_numpy(df_pl)
array([[1, 6.5, 'a'],
       [2, 7.0, 'b'],
       [3, 8.5, 'c']], dtype=object)
>>> agnostic_to_numpy(df_pa)
array([[1, 6.5, 'a'],
       [2, 7.0, 'b'],
       [3, 8.5, 'c']], dtype=object)

to_pandas()

Convert this DataFrame to a pandas DataFrame.

Returns:

Type Description
DataFrame

A pandas DataFrame.

Examples:

Construct pandas, Polars (eager) and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_to_pandas(df_native: IntoDataFrame) -> pd.DataFrame:
...     df = nw.from_native(df_native)
...     return df.to_pandas()

We can then pass any supported library such as pandas, Polars (eager), or PyArrow to agnostic_to_pandas:

>>> agnostic_to_pandas(df_pd)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> agnostic_to_pandas(df_pl)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c
>>> agnostic_to_pandas(df_pa)
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c

to_polars()

Convert this DataFrame to a Polars DataFrame.

Returns:

Type Description
DataFrame

A Polars DataFrame.

Examples:

Construct pandas, Polars (eager) and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_to_polars(df_native: IntoDataFrame) -> pl.DataFrame:
...     df = nw.from_native(df_native)
...     return df.to_polars()

We can then pass any supported library such as pandas, Polars (eager), or PyArrow to agnostic_to_polars:

>>> agnostic_to_polars(df_pd)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘
>>> agnostic_to_polars(df_pl)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘
>>> agnostic_to_polars(df_pa)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘

unique(subset=None, *, keep='any', maintain_order=False)

Drop duplicate rows from this dataframe.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) to consider when identifying duplicate rows.

None
keep Literal['any', 'first', 'last', 'none']

Which of the duplicate rows to keep:

  • 'any': No guarantee is given about which of the duplicates is kept. This allows more optimizations.
  • 'none': Drop every row that has a duplicate; only rows that occur once are kept.
  • 'first': Keep the first occurrence of each set of duplicates.
  • 'last': Keep the last occurrence of each set of duplicates.
'any'
maintain_order bool

Keep the same order as the original DataFrame. This may be more expensive to compute.

False

Returns:

Type Description
Self

The dataframe with the duplicate rows removed.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "foo": [1, 2, 3, 1],
...     "bar": ["a", "a", "a", "a"],
...     "ham": ["b", "b", "b", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_unique(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).unique(["bar", "ham"]).to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_unique:

>>> agnostic_unique(df_pd)
   foo bar ham
0    1   a   b
>>> agnostic_unique(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘
>>> agnostic_unique(df_pa)
pyarrow.Table
foo: int64
bar: string
ham: string
----
foo: [[1]]
bar: [["a"]]
ham: [["b"]]
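The example above uses subset=["bar", "ham"], where every row is a duplicate, so the keep strategies are hard to tell apart. The pure-Python sketch below illustrates the documented semantics of 'first', 'last' and 'none' on a list of row dicts; it is not how Narwhals or any backend implements unique, and unique_rows is a hypothetical name. It always returns rows in their original order, which corresponds to maintain_order=True; 'any' is not modelled because it makes no promise about which row survives.

```python
from typing import Any, Literal


def unique_rows(
    rows: list[dict[str, Any]],
    subset: list[str],
    keep: Literal["first", "last", "none"] = "first",
) -> list[dict[str, Any]]:
    """Hypothetical sketch of the keep strategies, preserving the input row order."""
    # Group row positions by their values in the subset columns.
    positions: dict[tuple[Any, ...], list[int]] = {}
    for i, row in enumerate(rows):
        key = tuple(row[col] for col in subset)
        positions.setdefault(key, []).append(i)

    kept: set[int] = set()
    for idxs in positions.values():
        if keep == "first":
            kept.add(idxs[0])
        elif keep == "last":
            kept.add(idxs[-1])
        elif keep == "none":
            if len(idxs) == 1:  # only rows without any duplicate survive
                kept.add(idxs[0])
        else:
            raise ValueError(f"unsupported keep={keep!r}")
    return [row for i, row in enumerate(rows) if i in kept]


# Same data as the example above.
rows = [
    {"foo": 1, "bar": "a", "ham": "b"},
    {"foo": 2, "bar": "a", "ham": "b"},
    {"foo": 3, "bar": "a", "ham": "b"},
    {"foo": 1, "bar": "a", "ham": "b"},
]
print([r["foo"] for r in unique_rows(rows, ["foo"], keep="first")])  # [1, 2, 3]
print([r["foo"] for r in unique_rows(rows, ["foo"], keep="last")])  # [2, 3, 1]
print([r["foo"] for r in unique_rows(rows, ["foo"], keep="none")])  # [2, 3]
```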

unpivot(on=None, *, index=None, variable_name=None, value_name=None)

Unpivot a DataFrame from wide to long format.

Optionally leaves identifier columns in place.

This function is useful for reshaping a DataFrame so that one or more columns act as identifier variables (index), while all other columns, treated as measured variables (on), are "unpivoted" to the row axis, leaving just two non-identifier columns: 'variable' and 'value' (their names can be changed with variable_name and value_name).

Parameters:

Name Type Description Default
on str | list[str] | None

Column(s) to use as value variables. If on is None, all columns that are not in index are used.

None
index str | list[str] | None

Column(s) to use as identifier variables.

None
variable_name str | None

Name to give to the variable column. Defaults to "variable".

None
value_name str | None

Name to give to the value column. Defaults to "value".

None

Returns:

Type Description
Self

The unpivoted dataframe.

Notes

If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "a": ["x", "y", "z"],
...     "b": [1, 3, 5],
...     "c": [2, 4, 6],
... }

We define a library agnostic function:

>>> def agnostic_unpivot(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.unpivot(on=["b", "c"], index="a").to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_unpivot:

>>> agnostic_unpivot(pl.DataFrame(data))
shape: (6, 3)
┌─────┬──────────┬───────┐
│ a   ┆ variable ┆ value │
│ --- ┆ ---      ┆ ---   │
│ str ┆ str      ┆ i64   │
╞═════╪══════════╪═══════╡
│ x   ┆ b        ┆ 1     │
│ y   ┆ b        ┆ 3     │
│ z   ┆ b        ┆ 5     │
│ x   ┆ c        ┆ 2     │
│ y   ┆ c        ┆ 4     │
│ z   ┆ c        ┆ 6     │
└─────┴──────────┴───────┘
>>> agnostic_unpivot(pd.DataFrame(data))
   a variable  value
0  x        b      1
1  y        b      3
2  z        b      5
3  x        c      2
4  y        c      4
5  z        c      6
>>> agnostic_unpivot(pa.table(data))
pyarrow.Table
a: string
variable: string
value: int64
----
a: [["x","y","z"],["x","y","z"]]
variable: [["b","b","b"],["c","c","c"]]
value: [[1,3,5],[2,4,6]]
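The row layout in the outputs above (one block of rows per column in on, with the index values repeated in each block) can be sketched in pure Python on a dict of column lists. This only illustrates the documented semantics and is not how Narwhals or any backend implements unpivot; unpivot_columns is a hypothetical name, and unlike unpivot it takes lists rather than single strings for on and index.

```python
from __future__ import annotations

from typing import Any


def unpivot_columns(
    data: dict[str, list[Any]],
    on: list[str] | None = None,
    index: list[str] | None = None,
    variable_name: str = "variable",
    value_name: str = "value",
) -> dict[str, list[Any]]:
    """Hypothetical pure-Python sketch of wide-to-long unpivoting on column lists."""
    index = index or []
    if on is None:
        # Documented default: use every column that is not an identifier.
        on = [name for name in data if name not in index]
    n_rows = len(next(iter(data.values()), []))
    out: dict[str, list[Any]] = {name: [] for name in [*index, variable_name, value_name]}
    # Emit one block of n_rows output rows per measured column, in the order of on.
    for col in on:
        for i in range(n_rows):
            for name in index:
                out[name].append(data[name][i])
            out[variable_name].append(col)
            out[value_name].append(data[col][i])
    return out


data = {"a": ["x", "y", "z"], "b": [1, 3, 5], "c": [2, 4, 6]}
long_format = unpivot_columns(data, on=["b", "c"], index=["a"])
print(long_format["variable"])  # ['b', 'b', 'b', 'c', 'c', 'c']
print(long_format["value"])  # [1, 3, 5, 2, 4, 6]
```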

with_columns(*exprs, **named_exprs)

Add columns to this DataFrame.

Added columns will replace existing columns with the same name.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

()
**named_exprs IntoExpr

Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.

{}

Returns:

Type Description
Self

A new DataFrame with the columns added.

Note

Creating a new DataFrame using this method does not create a new copy of existing data.
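The replacement rule and the no-copy note can be pictured with plain Python dicts. The sketch below is an analogy only, not how Narwhals or its backends work internally: with_columns_sketch is a hypothetical name, it assumes (as in Polars) that a replaced column keeps its original position, and it shows untouched columns being shared with the input rather than copied.

```python
from typing import Any


def with_columns_sketch(
    columns: dict[str, list[Any]], **new_columns: list[Any]
) -> dict[str, list[Any]]:
    """Hypothetical sketch: replace same-named columns in place, append the rest."""
    result = dict(columns)  # shallow copy: untouched column lists are shared, not copied
    result.update(new_columns)  # an existing key keeps its position; new keys go last
    return result


data = {"a": [1, 2, 3, 4], "b": [0.5, 4, 10, 13]}
out = with_columns_sketch(data, a=[x * 10 for x in data["a"]], b2=[x * 2 for x in data["b"]])
print(list(out))  # ['a', 'b', 'b2']
print(out["b"] is data["b"])  # True: column b was not copied
```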

Examples:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
...     "a": [1, 2, 3, 4],
...     "b": [0.5, 4, 10, 13],
...     "c": [True, True, False, True],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function in which we pass an expression to add it as a new column:

>>> def agnostic_with_columns(df_native: IntoFrameT) -> IntoFrameT:
...     return (
...         nw.from_native(df_native)
...         .with_columns((nw.col("a") * 2).alias("a*2"))
...         .to_native()
...     )

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_with_columns:

>>> agnostic_with_columns(df_pd)
   a     b      c  a*2
0  1   0.5   True    2
1  2   4.0   True    4
2  3  10.0  False    6
3  4  13.0   True    8
>>> agnostic_with_columns(df_pl)
shape: (4, 4)
┌─────┬──────┬───────┬─────┐
│ a   ┆ b    ┆ c     ┆ a*2 │
│ --- ┆ ---  ┆ ---   ┆ --- │
│ i64 ┆ f64  ┆ bool  ┆ i64 │
╞═════╪══════╪═══════╪═════╡
│ 1   ┆ 0.5  ┆ true  ┆ 2   │
│ 2   ┆ 4.0  ┆ true  ┆ 4   │
│ 3   ┆ 10.0 ┆ false ┆ 6   │
│ 4   ┆ 13.0 ┆ true  ┆ 8   │
└─────┴──────┴───────┴─────┘
>>> agnostic_with_columns(df_pa)
pyarrow.Table
a: int64
b: double
c: bool
a*2: int64
----
a: [[1,2,3,4]]
b: [[0.5,4,10,13]]
c: [[true,true,false,true]]
a*2: [[2,4,6,8]]

with_row_index(name='index')

Insert a column that enumerates the rows.

Parameters:

Name Type Description Default
name str

The name of the column as a string. The default is "index".

'index'

Returns:

Type Description
Self

A new object of the same type, with the index column inserted as the first column.

Examples:

Construct pandas, Polars and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

Let's define a dataframe-agnostic function:

>>> def agnostic_with_row_index(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.with_row_index().to_native()

We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_with_row_index:

>>> agnostic_with_row_index(df_pd)
   index  a  b
0      0  1  4
1      1  2  5
2      2  3  6
>>> agnostic_with_row_index(df_pl)
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a   ┆ b   │
│ ---   ┆ --- ┆ --- │
│ u32   ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0     ┆ 1   ┆ 4   │
│ 1     ┆ 2   ┆ 5   │
│ 2     ┆ 3   ┆ 6   │
└───────┴─────┴─────┘
>>> agnostic_with_row_index(df_pa)
pyarrow.Table
index: int64
a: int64
b: int64
----
index: [[0,1,2]]
a: [[1,2,3]]
b: [[4,5,6]]
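The shape of the result (a 0-based counter placed before the original columns, named by the name parameter) can be sketched on a dict of column lists. This is a hypothetical illustration, not the Narwhals implementation; with_row_index_sketch is an invented name, and its choice to raise when the name is already taken is an assumption of the sketch, not documented Narwhals behavior.

```python
from typing import Any


def with_row_index_sketch(
    columns: dict[str, list[Any]], name: str = "index"
) -> dict[str, list[Any]]:
    """Hypothetical sketch: prepend a 0-based row counter to a dict of column lists."""
    if name in columns:
        # This sketch refuses to overwrite an existing column.
        raise ValueError(f"column {name!r} already exists")
    n_rows = len(next(iter(columns.values()), []))
    # Dict insertion order puts the new column first, as in the outputs above.
    return {name: list(range(n_rows)), **columns}


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
indexed = with_row_index_sketch(data)
print(indexed)  # {'index': [0, 1, 2], 'a': [1, 2, 3], 'b': [4, 5, 6]}
print(list(with_row_index_sketch(data, name="row_nr")))  # ['row_nr', 'a', 'b']
```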

write_csv(file=None)

Write the dataframe to a comma-separated values (CSV) file.

Parameters:

Name Type Description Default
file str | Path | BytesIO | None

String, path object or file-like object to which the dataframe will be written. If None, the CSV content is returned as a string instead.

None

Returns:

Type Description
str | None

The CSV content as a string if file is None, otherwise None.

Examples:

Construct pandas, Polars (eager) and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_write_csv(df_native: IntoDataFrame) -> str:
...     df = nw.from_native(df_native)
...     return df.write_csv()

We can pass any supported library such as pandas, Polars or PyArrow to agnostic_write_csv:

>>> agnostic_write_csv(df_pd)
'foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n'
>>> agnostic_write_csv(df_pl)
'foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n'
>>> agnostic_write_csv(df_pa)
'"foo","bar","ham"\n1,6,"a"\n2,7,"b"\n3,8,"c"\n'

If we had passed a file name to write_csv, it would have been written to that file.
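The PyArrow output above differs textually from the pandas and Polars ones: it quotes every string field and writes 6 rather than 6.0. A minimal sketch using only the standard library csv module shows that both strings still parse to the same header and values. The two strings are copied from the examples above, and parse_csv is a hypothetical helper, not part of Narwhals.

```python
import csv
import io


def parse_csv(text: str) -> list[list[str]]:
    """Parse CSV text into rows of string fields; csv.reader removes the quoting."""
    return list(csv.reader(io.StringIO(text)))


pandas_csv = "foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n"
pyarrow_csv = '"foo","bar","ham"\n1,6,"a"\n2,7,"b"\n3,8,"c"\n'

pd_rows, pa_rows = parse_csv(pandas_csv), parse_csv(pyarrow_csv)
print(pd_rows[0] == pa_rows[0])  # True: same header once quotes are removed
# The bar column differs as text ("6.0" vs "6") but agrees once converted to float.
print([float(r[1]) for r in pd_rows[1:]] == [float(r[1]) for r in pa_rows[1:]])  # True
```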

write_parquet(file)

Write the dataframe to a Parquet file.

Parameters:

Name Type Description Default
file str | Path | BytesIO

String, path object or file-like object to which the dataframe will be written.

required

Returns:

Type Description
None

None.

Examples:

Construct pandas, Polars and PyArrow DataFrames:

>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)

We define a library agnostic function:

>>> def agnostic_write_parquet(df_native: IntoDataFrame) -> None:
...     df = nw.from_native(df_native)
...     df.write_parquet("foo.parquet")

We can then pass either pandas, Polars or PyArrow to agnostic_write_parquet; each call writes foo.parquet to the current working directory:

>>> agnostic_write_parquet(df_pd)
>>> agnostic_write_parquet(df_pl)
>>> agnostic_write_parquet(df_pa)