`narwhals.DataFrame`

Narwhals DataFrame, backed by a native eager dataframe.

Warning

This class is not meant to be instantiated directly. Instead:

If the native object is an eager dataframe from one of the supported backends (e.g. pandas.DataFrame, polars.DataFrame, pyarrow.Table), you can use narwhals.from_native:
```
narwhals.from_native(native_dataframe)
narwhals.from_native(native_dataframe, eager_only=True)
```
If the object is a dictionary mapping column names to generic sequences (e.g. dict[str, list]), you can create a DataFrame via narwhals.from_dict:
```
narwhals.from_dict(
    data={"a": [1, 2, 3]},
    backend=narwhals.get_native_namespace(another_object),
)
```

columns `property`

columns: list[str]

Get column names.

Returns:

Type	Description
`list[str]`	The column names stored in a list.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> nw.from_native(df_native).columns
['foo', 'bar']

implementation `property`

implementation: Implementation

Return implementation of native frame.

This can be useful when you need to use special-casing for features outside of Narwhals' scope - for example, when dealing with pandas' Period Dtype.

Returns:

Type	Description
`Implementation`	The `Implementation` member corresponding to the native backend.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> df_native = pd.DataFrame({"a": [1, 2, 3]})
>>> df = nw.from_native(df_native)
>>> df.implementation
<Implementation.PANDAS: 'pandas'>
>>> df.implementation.is_pandas()
True
>>> df.implementation.is_pandas_like()
True
>>> df.implementation.is_polars()
False

schema `property`

schema: Schema

Get an ordered mapping of column names to their data type.

Returns:

Type	Description
`Schema`	A Narwhals Schema object that maps column names to their data types.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> nw.from_native(df_native).schema
Schema({'foo': Int64, 'bar': Float64})

shape `property`

shape: tuple[int, int]

Get the shape of the DataFrame.

Returns:

Type	Description
`tuple[int, int]`	The shape of the dataframe as a tuple of (number of rows, number of columns).

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [1, 2]})
>>> df = nw.from_native(df_native)
>>> df.shape
(2, 1)

__arrow_c_stream__

__arrow_c_stream__(
    requested_schema: object | None = None,
) -> object

Export a DataFrame via the Arrow PyCapsule Interface.

If the underlying dataframe implements the interface, its implementation is used; otherwise, `to_arrow` is called and the export is deferred to PyArrow's implementation.

See PyCapsule Interface for more.

__getitem__

__getitem__(
    item: tuple[SingleIndexSelector, SingleColSelector],
) -> Any

__getitem__(
    item: (
        str | tuple[MultiIndexSelector, SingleColSelector]
    ),
) -> Series[Any]

__getitem__(
    item: (
        SingleIndexSelector
        | MultiIndexSelector
        | MultiColSelector
        | tuple[SingleIndexSelector, MultiColSelector]
        | tuple[MultiIndexSelector, MultiColSelector]
    ),
) -> Self

__getitem__(
    item: (
        SingleIndexSelector
        | SingleColSelector
        | MultiColSelector
        | MultiIndexSelector
        | tuple[SingleIndexSelector, SingleColSelector]
        | tuple[SingleIndexSelector, MultiColSelector]
        | tuple[MultiIndexSelector, SingleColSelector]
        | tuple[MultiIndexSelector, MultiColSelector]
    ),
) -> Series[Any] | Self | Any

Extract column or slice of DataFrame.

Parameters:

Name	Type	Description	Default
`item`	`SingleIndexSelector \| SingleColSelector \| MultiColSelector \| MultiIndexSelector \| tuple[SingleIndexSelector, SingleColSelector] \| tuple[SingleIndexSelector, MultiColSelector] \| tuple[MultiIndexSelector, SingleColSelector] \| tuple[MultiIndexSelector, MultiColSelector]`	How to slice the dataframe. What happens depends on what is passed; it's easiest to explain by example. Suppose we have a DataFrame `df`: `df['a']` extracts column `'a'` and returns a `Series`. `df[0:2]` extracts the first two rows and returns a `DataFrame`. `df[0:2, 'a']` extracts the first two rows from column `'a'` and returns a `Series`. `df[0:2, 0]` extracts the first two rows from the first column and returns a `Series`. `df[[0, 1], [0, 1, 2]]` extracts the first two rows and the first three columns and returns a `DataFrame`. `df[:, [0, 1, 2]]` extracts all rows from the first three columns and returns a `DataFrame`. `df[:, ['a', 'c']]` extracts all rows and columns `'a'` and `'c'` and returns a `DataFrame`. `df[['a', 'c']]` extracts all rows and columns `'a'` and `'c'` and returns a `DataFrame`. `df[0:2, ['a', 'c']]` extracts the first two rows and columns `'a'` and `'c'` and returns a `DataFrame`. `df[:, 0:2]` extracts all rows from the first two columns and returns a `DataFrame`. `df[:, 'a':'c']` extracts all rows and all columns positioned between `'a'` and `'c'`, inclusive, and returns a `DataFrame`. For example, if the columns are `'a', 'd', 'c', 'b'`, then that would extract columns `'a'`, `'d'`, and `'c'`.	required

Returns:

Type	Description
`Series[Any] \| Self \| Any`	A Narwhals Series, a Narwhals DataFrame, or a scalar, depending on `item`.

Notes

Integers are always interpreted as positions.
Strings are always interpreted as column names.

In contrast with Polars, pandas allows non-string column names. If you don't know whether the column name you're trying to extract is definitely a string (e.g. df[df.columns[0]]) then you should use DataFrame.get_column instead.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2]})
>>> df = nw.from_native(df_native)
>>> df["a"].to_native()
0    1
1    2
Name: a, dtype: int64
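The selector rules above can be summarised as a small classifier. This is a plain-Python sketch of the `__getitem__` overloads listed above, not Narwhals' own dispatch code; `result_kind` and `name_slice` are hypothetical helpers.

```python
from typing import Any


def result_kind(item: Any) -> str:
    """Classify what df[item] returns, following the __getitem__ overloads."""
    if isinstance(item, tuple):
        rows, cols = item
        if isinstance(cols, (int, str)):  # a single column selector
            # one row and one column -> scalar; several rows -> Series
            return "scalar" if isinstance(rows, int) else "Series"
        return "DataFrame"  # several columns, whatever the rows
    if isinstance(item, str):
        return "Series"  # a bare string is always a column name
    return "DataFrame"  # an int, slice or sequence selects rows or columns


def name_slice(columns: list[str], start: str, stop: str) -> list[str]:
    """Columns picked by df[:, start:stop]; unlike positional slices, both ends are inclusive."""
    return columns[columns.index(start) : columns.index(stop) + 1]


print(result_kind("a"), result_kind((slice(0, 2), 0)), result_kind((0, "a")))  # Series Series scalar
print(name_slice(["a", "d", "c", "b"], "a", "c"))  # ['a', 'd', 'c']
```

Note that a single integer on its own, such as `df[0]`, still returns a one-row DataFrame rather than a scalar.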

clone

clone() -> Self

Create a copy of this DataFrame.

Returns:

Type	Description
`Self`	An identical copy of the original dataframe.

collect_schema

collect_schema() -> Schema

Get an ordered mapping of column names to their data type.

Returns:

Type	Description
`Schema`	A Narwhals Schema object that maps column names to their data types.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> nw.from_native(df_native).collect_schema()
Schema({'foo': Int64, 'bar': Float64})

drop

drop(
    *columns: str | Iterable[str], strict: bool = True
) -> Self

Remove columns from the dataframe.

Returns:

Type	Description
`Self`	The dataframe with the specified columns removed.

Parameters:

Name	Type	Description	Default
`*columns`	`str \| Iterable[str]`	Names of the columns that should be removed from the dataframe.	`()`
`strict`	`bool`	Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema.	`True`

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
...     {"foo": [1, 2], "bar": [6.0, 7.0], "ham": ["a", "b"]}
... )
>>> nw.from_native(df_native).drop("ham").to_native()
   foo  bar
0    1  6.0
1    2  7.0

drop_nulls

drop_nulls(subset: str | list[str] | None = None) -> Self

Drop rows that contain null values.

Parameters:

Name	Type	Description	Default
`subset`	`str \| list[str] \| None`	Column name(s) for which null values are considered. If set to None (default), use all columns.	`None`

Returns:

Type	Description
`Self`	A new dataframe with the rows that contained null values removed.

Notes

pandas handles null values differently from Polars and PyArrow. See null_handling for reference.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"a": [1.0, None], "ba": [1.0, 2.0]})
>>> nw.from_native(df_native).drop_nulls().to_native()
pyarrow.Table
a: double
ba: double
----
a: [[1]]
ba: [[1]]
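The role of `subset` can be modelled in plain Python. This sketch assumes a toy frame stored as a dict of column lists, with `None` standing in for null; it is not Narwhals' implementation, and it ignores the pandas-specific NaN behaviour mentioned in the note above.

```python
from __future__ import annotations

from typing import Any

Frame = dict[str, list[Any]]  # toy frame: column name -> column values


def drop_nulls(frame: Frame, subset: str | list[str] | None = None) -> Frame:
    """Keep only the rows with no None in the subset columns (all columns if subset is None)."""
    if subset is None:
        cols = list(frame)
    elif isinstance(subset, str):
        cols = [subset]
    else:
        cols = subset
    n_rows = len(next(iter(frame.values()), []))
    keep = [i for i in range(n_rows) if all(frame[c][i] is not None for c in cols)]
    return {name: [values[i] for i in keep] for name, values in frame.items()}


data = {"a": [1.0, None], "ba": [1.0, 2.0]}
print(drop_nulls(data))               # {'a': [1.0], 'ba': [1.0]}
print(drop_nulls(data, subset="ba"))  # {'a': [1.0, None], 'ba': [1.0, 2.0]}
```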

estimated_size

estimated_size(unit: SizeUnit = 'b') -> int | float

Return an estimation of the total (heap) allocated size of the DataFrame.

Estimated size is given in the specified unit (bytes by default).

Parameters:

Name	Type	Description	Default
`unit`	`SizeUnit`	'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes', 'gigabytes', or 'terabytes'.	`'b'`

Returns:

Type	Description
`int \| float`	The estimated size in the requested unit, as an integer or a float.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.estimated_size()
32

explode

explode(
    columns: str | Sequence[str], *more_columns: str
) -> Self

Explode the dataframe to long format by exploding the given columns.

Notes

It is possible to explode multiple columns only if they have matching element counts in every row.

Parameters:

Name	Type	Description	Default
`columns`	`str \| Sequence[str]`	Column names. The underlying columns being exploded must be of the `List` data type.	required
`*more_columns`	`str`	Additional names of columns to explode, specified as positional arguments.	`()`

Returns:

Type	Description
`Self`	A new DataFrame.

Examples:

>>> import polars as pl
>>> import narwhals as nw
>>> data = {"a": ["x", "y"], "b": [[1, 2], [3]]}
>>> df_native = pl.DataFrame(data)
>>> nw.from_native(df_native).explode("b").to_native()
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ x   ┆ 1   │
│ x   ┆ 2   │
│ y   ┆ 3   │
└─────┴─────┘
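The matching-count requirement for multiple columns follows from how explode pairs list elements row by row. A plain-Python sketch, assuming a toy frame stored as a dict of column lists; `explode` here is a hypothetical helper, and it does not model empty or null lists, which backends treat specially.

```python
from typing import Any

Frame = dict[str, list[Any]]  # toy frame: column name -> column values


def explode(frame: Frame, columns: list[str]) -> Frame:
    """Give each list element its own row, repeating the values of the other columns."""
    out: Frame = {name: [] for name in frame}
    n_rows = len(next(iter(frame.values()), []))
    for i in range(n_rows):
        counts = {len(frame[c][i]) for c in columns}
        if len(counts) > 1:
            # the exploded lists are zipped element-wise, so their lengths must agree
            raise ValueError(f"row {i}: exploded columns have different element counts")
        for j in range(counts.pop()):
            for name, values in frame.items():
                out[name].append(values[i][j] if name in columns else values[i])
    return out


print(explode({"a": ["x", "y"], "b": [[1, 2], [3]]}, ["b"]))
# {'a': ['x', 'x', 'y'], 'b': [1, 2, 3]}
```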

filter

filter(
    *predicates: IntoExpr | Iterable[IntoExpr] | list[bool],
    **constraints: Any
) -> Self

Filter the rows in the DataFrame based on one or more predicate expressions.

The original order of the remaining rows is preserved.

Parameters:

Name	Type	Description	Default
`*predicates`	`IntoExpr \| Iterable[IntoExpr] \| list[bool]`	Expression(s) that evaluate to a boolean Series. Can also be a (single!) boolean list.	`()`
`**constraints`	`Any`	Column filters; use `name = value` to filter columns by the supplied value. Each constraint will behave the same as `nw.col(name).eq(value)`, and will be implicitly joined with the other filter conditions using &.	`{}`

Returns:

Type	Description
`Self`	The filtered dataframe.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
...     {"foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"]}
... )

Filter on one condition

>>> nw.from_native(df_native).filter(nw.col("foo") > 1).to_native()
   foo  bar ham
1    2    7   b
2    3    8   c

Filter on multiple conditions with implicit &

>>> nw.from_native(df_native).filter(
...     nw.col("foo") < 3, nw.col("ham") == "a"
... ).to_native()
   foo  bar ham
0    1    6   a

Filter on multiple conditions with |

>>> nw.from_native(df_native).filter(
...     (nw.col("foo") == 1) | (nw.col("ham") == "c")
... ).to_native()
   foo  bar ham
0    1    6   a
2    3    8   c

Filter using **kwargs syntax

>>> nw.from_native(df_native).filter(foo=2, ham="b").to_native()
   foo  bar ham
1    2    7   b
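How positional predicates and `**constraints` combine can be modelled in plain Python. In this sketch, predicates are Python callables on a row dict instead of Narwhals expressions, and `filter_rows` is a hypothetical helper; each `name=value` constraint becomes an equality check, and everything is joined with an implicit AND.

```python
from collections.abc import Callable
from typing import Any

Row = dict[str, Any]  # one row of a toy frame: column name -> value


def filter_rows(rows: list[Row], *predicates: Callable[[Row], bool], **constraints: Any) -> list[Row]:
    """Keep the rows that satisfy every predicate and every name=value constraint."""
    # bind k and v as defaults so each lambda keeps its own constraint
    checks = list(predicates) + [(lambda r, k=k, v=v: r[k] == v) for k, v in constraints.items()]
    # the original row order is preserved
    return [r for r in rows if all(check(r) for check in checks)]


rows = [
    {"foo": 1, "bar": 6, "ham": "a"},
    {"foo": 2, "bar": 7, "ham": "b"},
    {"foo": 3, "bar": 8, "ham": "c"},
]
print(filter_rows(rows, foo=2, ham="b"))  # [{'foo': 2, 'bar': 7, 'ham': 'b'}]
```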

gather_every

gather_every(n: int, offset: int = 0) -> Self

Take every nth row in the DataFrame and return as a new DataFrame.

Parameters:

Name	Type	Description	Default
`n`	`int`	Gather every n-th row.	required
`offset`	`int`	Starting index.	`0`

Returns:

Type	Description
`Self`	The dataframe containing only the selected rows.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, None, 2, 3]})
>>> nw.from_native(df_native).gather_every(2)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  foo: int64      |
|  ----            |
|  foo: [[1,2]]    |
└──────────────────┘
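In plain Python, the selection corresponds to an extended slice starting at `offset` with step `n`. The sketch below applies this to a single column list; `gather_every` here is a hypothetical helper, and its guard on `n` is this sketch's own choice.

```python
def gather_every(values: list, n: int, offset: int = 0) -> list:
    """Rows offset, offset + n, offset + 2n, ... of a single column."""
    if n <= 0:
        raise ValueError("n must be positive")  # guard added for this sketch
    return values[offset::n]


print(gather_every([1, None, 2, 3], 2))            # [1, 2]
print(gather_every([1, None, 2, 3], 2, offset=1))  # [None, 3]
```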

get_column

get_column(name: str) -> Series[Any]

Get a single column by name.

Parameters:

Name	Type	Description	Default
`name`	`str`	The column name as a string.	required

Returns:

Type	Description
`Series[Any]`	A Narwhals Series, backed by a native series.

Notes

Although name is typed as str, pandas does allow non-string column names, and they will work when passed to this function if the narwhals.DataFrame is backed by a pandas dataframe with non-string columns. This function can only be used to extract a column by name, so there is no risk of ambiguity.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2]})
>>> df = nw.from_native(df_native)
>>> df.get_column("a").to_native()
0    1
1    2
Name: a, dtype: int64

group_by

group_by(
    *keys: IntoExpr | Iterable[IntoExpr],
    drop_null_keys: Literal[False] = ...
) -> GroupBy[Self]

group_by(
    *keys: str | Iterable[str],
    drop_null_keys: Literal[True]
) -> GroupBy[Self]

group_by(
    *keys: IntoExpr | Iterable[IntoExpr],
    drop_null_keys: bool = False
) -> GroupBy[Self]

Start a group by operation.

Parameters:

Name	Type	Description	Default
`*keys`	`IntoExpr \| Iterable[IntoExpr]`	Column(s) to group by. Accepts expression input. Strings are parsed as column names.	`()`
`drop_null_keys`	`bool`	If True, then groups where any key is null won't be included in the result.	`False`

Returns:

Name	Type	Description
`GroupBy`	`GroupBy[Self]`	Object which can be used to perform aggregations.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
...     {
...         "a": ["a", "b", "a", "b", "c"],
...         "b": [1, 2, 1, 3, 3],
...         "c": [5, 4, 3, 2, 1],
...     }
... )

Group by one column and compute the sum of another column

>>> nw.from_native(df_native, eager_only=True).group_by("a").agg(
...     nw.col("b").sum()
... ).sort("a").to_native()
   a  b
0  a  2
1  b  5
2  c  3

Group by multiple columns and compute the max of another column

>>> (
...     nw.from_native(df_native, eager_only=True)
...     .group_by(["a", "b"])
...     .agg(nw.max("c"))
...     .sort("a", "b")
...     .to_native()
... )
   a  b  c
0  a  1  5
1  b  2  4
2  b  3  2
3  c  3  1

Expressions are also accepted.

>>> nw.from_native(df_native, eager_only=True).group_by(
...     "a", nw.col("b") // 2
... ).agg(nw.col("c").mean()).to_native()
   a  b    c
0  a  0  4.0
1  b  1  3.0
2  c  1  1.0
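For a single string key, the first example and the `drop_null_keys` flag can be modelled in plain Python. `group_sum` is a hypothetical helper that sums one column per group in order of first appearance; real backends do not necessarily preserve group order, which is why the examples above call `.sort`.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]  # one row of a toy frame: column name -> value


def group_sum(rows: list[Row], key: str, value: str, drop_null_keys: bool = False) -> dict[Any, Any]:
    """Sum `value` within each group of `key`, keyed by group in order of first appearance."""
    sums: dict[Any, Any] = defaultdict(int)
    for r in rows:
        if drop_null_keys and r[key] is None:
            continue  # groups whose key is null are left out of the result
        sums[r[key]] += r[value]
    return dict(sums)


data = {"a": ["a", "b", "a", "b", "c"], "b": [1, 2, 1, 3, 3], "c": [5, 4, 3, 2, 1]}
rows = [dict(zip(data, vals)) for vals in zip(*data.values())]
print(group_sum(rows, "a", "b"))  # {'a': 2, 'b': 5, 'c': 3}
```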

head

head(n: int = 5) -> Self

Get the first n rows.

Parameters:

Name	Type	Description	Default
`n`	`int`	Number of rows to return. If a negative value is passed, return all rows except the last `abs(n)`.	`5`

Returns:

Type	Description
`Self`	A subset of the dataframe of shape (n, n_columns).

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2], "b": [0.5, 4.0]})
>>> nw.from_native(df_native).head(1).to_native()
   a    b
0  1  0.5
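The negative-`n` behaviour matches Python's own slicing, as this one-line sketch on a single column list shows (`head` here is a hypothetical helper, not the Narwhals method).

```python
def head(values: list, n: int = 5) -> list:
    """First n rows; a negative n returns all rows except the last abs(n)."""
    return values[:n]  # Python slicing already has exactly these semantics


print(head([1, 2, 3], 2))   # [1, 2]
print(head([1, 2, 3], -1))  # [1, 2]
```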

is_duplicated

is_duplicated() -> Series[Any]

Get a mask of all duplicated rows in this DataFrame.

Returns:

Type	Description
`Series[Any]`	A boolean Series that is True for every row that occurs more than once.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [2, 2, 2], "bar": [6.0, 6.0, 7.0]})
>>> nw.from_native(df_native).is_duplicated()
┌───────────────┐
|Narwhals Series|
|---------------|
|  0     True   |
|  1     True   |
|  2    False   |
|  dtype: bool  |
└───────────────┘

is_empty

is_empty() -> bool

Check if the dataframe is empty.

Returns:

Type	Description
`bool`	A boolean indicating whether the dataframe is empty (True) or not (False).

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [2, 2, 2], "bar": [6.0, 6.0, 7.0]})
>>> nw.from_native(df_native).is_empty()
False

is_unique

is_unique() -> Series[Any]

Get a mask of all unique rows in this DataFrame.

Returns:

Type	Description
`Series[Any]`	A boolean Series that is True for every row that occurs exactly once.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [2, 2, 2], "bar": [6.0, 6.0, 7.0]})
>>> nw.from_native(df_native).is_unique()
┌───────────────┐
|Narwhals Series|
|---------------|
|  0    False   |
|  1    False   |
|  2     True   |
|  dtype: bool  |
└───────────────┘
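As the two examples above show, `is_duplicated` flags every occurrence of a repeated row, not only the later ones (unlike pandas' `duplicated` with its default `keep="first"`), and `is_unique` is its complement. A plain-Python sketch that counts whole-row tuples, assuming a toy frame stored as a dict of column lists; the helper names are hypothetical.

```python
from collections import Counter
from typing import Any

Frame = dict[str, list[Any]]  # toy frame: column name -> column values


def row_counts(frame: Frame) -> list[int]:
    """How many times each row (a tuple across all columns) occurs in the frame."""
    rows = list(zip(*frame.values()))
    counts = Counter(rows)
    return [counts[row] for row in rows]


def is_duplicated(frame: Frame) -> list[bool]:
    return [c > 1 for c in row_counts(frame)]


def is_unique(frame: Frame) -> list[bool]:
    return [c == 1 for c in row_counts(frame)]


df = {"foo": [2, 2, 2], "bar": [6.0, 6.0, 7.0]}
print(is_duplicated(df))  # [True, True, False]
print(is_unique(df))      # [False, False, True]
```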

item

item(
    row: int | None = None, column: int | str | None = None
) -> Any

Return the DataFrame as a scalar, or return the element at the given row/column.

Parameters:

Name	Type	Description	Default
`row`	`int \| None`	The index of the row.	`None`
`column`	`int \| str \| None`	The column selected via an integer or a string (column name).	`None`

Returns:

Type	Description
`Any`	A scalar or the specified element in the dataframe.

Notes

If row and column are not provided, this is equivalent to df[0, 0], with a check that the shape is (1, 1). With row and column, this is equivalent to df[row, column].

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, None], "bar": [2, 3]})
>>> nw.from_native(df_native).item(0, 1)
2
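The two calling modes from the note above can be modelled in plain Python. This sketch assumes a toy frame stored as a dict of column lists; `item` here is a hypothetical helper, and rejecting a call that passes only one of `row`/`column` is an assumption of the sketch.

```python
from __future__ import annotations

from typing import Any

Frame = dict[str, list[Any]]  # toy frame: column name -> column values


def item(frame: Frame, row: int | None = None, column: int | str | None = None) -> Any:
    """Return frame[row, column], or the only value of a (1, 1) frame when both are omitted."""
    names = list(frame)
    if row is None and column is None:
        shape = (len(frame[names[0]]) if names else 0, len(names))
        if shape != (1, 1):
            raise ValueError(f"item() without arguments needs a (1, 1) frame, got {shape}")
        return frame[names[0]][0]
    if row is None or column is None:
        # requiring both arguments together is an assumption of this sketch
        raise ValueError("pass both row and column, or neither")
    name = names[column] if isinstance(column, int) else column  # integers are positions
    return frame[name][row]


df = {"foo": [1, None], "bar": [2, 3]}
print(item(df, 0, 1))     # 2
print(item({"x": [42]}))  # 42
```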

iter_columns

iter_columns() -> Iterator[Series[Any]]

Returns an iterator over the columns of this DataFrame.

Yields:

Type	Description
`Series[Any]`	A Narwhals Series, backed by a native series.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> iter_columns = nw.from_native(df_native).iter_columns()
>>> next(iter_columns)
┌───────────────────────┐
|    Narwhals Series    |
|-----------------------|
|0    1                 |
|1    2                 |
|Name: foo, dtype: int64|
└───────────────────────┘
>>> next(iter_columns)
┌─────────────────────────┐
|     Narwhals Series     |
|-------------------------|
|0    6.0                 |
|1    7.0                 |
|Name: bar, dtype: float64|
└─────────────────────────┘

iter_rows

iter_rows(
    *, named: Literal[False], buffer_size: int = ...
) -> Iterator[tuple[Any, ...]]

iter_rows(
    *, named: Literal[True], buffer_size: int = ...
) -> Iterator[dict[str, Any]]

iter_rows(
    *, named: bool, buffer_size: int = ...
) -> Iterator[tuple[Any, ...]] | Iterator[dict[str, Any]]

iter_rows(
    *, named: bool = False, buffer_size: int = 512
) -> Iterator[tuple[Any, ...]] | Iterator[dict[str, Any]]

Returns an iterator over the rows of the DataFrame, as Python-native values.

Parameters:

Name	Type	Description	Default
`named`	`bool`	By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead.	`False`
`buffer_size`	`int`	Determines the number of rows that are buffered internally while iterating over the data. See https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.iter_rows.html	`512`

Returns:

Type	Description
`Iterator[tuple[Any, ...]] \| Iterator[dict[str, Any]]`	An iterator over the DataFrame of rows.

Notes

cuDF doesn't support this method.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> iter_rows = nw.from_native(df_native).iter_rows()
>>> next(iter_rows)
(1, 6.0)
>>> next(iter_rows)
(2, 7.0)
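The difference between `named=False` and `named=True` can be sketched in plain Python over a toy frame stored as a dict of column lists. `iter_rows` here is a hypothetical helper and does not model `buffer_size`.

```python
from collections.abc import Iterator
from typing import Any

Frame = dict[str, list[Any]]  # toy frame: column name -> column values


def iter_rows(frame: Frame, *, named: bool = False) -> Iterator[Any]:
    """Yield each row as a tuple in column order, or as a {name: value} dict if named."""
    names = list(frame)
    for values in zip(*frame.values()):
        yield dict(zip(names, values)) if named else values


data = {"foo": [1, 2], "bar": [6.0, 7.0]}
print(next(iter_rows(data)))          # (1, 6.0)
print(list(iter_rows(data, named=True)))
# [{'foo': 1, 'bar': 6.0}, {'foo': 2, 'bar': 7.0}]
```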

join

join(
    other: Self,
    on: str | list[str] | None = None,
    how: JoinStrategy = "inner",
    *,
    left_on: str | list[str] | None = None,
    right_on: str | list[str] | None = None,
    suffix: str = "_right"
) -> Self

Join in SQL-like fashion.

Parameters:

Name	Type	Description	Default
`other`	`Self`	DataFrame to join with.	required
`on`	`str \| list[str] \| None`	Name(s) of the join columns in both DataFrames. If set, `left_on` and `right_on` should be None.	`None`
`how`	`JoinStrategy`	Join strategy. inner: Returns rows that have matching values in both tables. left: Returns all rows from the left table, and the matched rows from the right table. full: Returns all rows in both dataframes, with the suffix appended to the right join keys. cross: Returns the Cartesian product of rows from both tables. semi: Filter rows that have a match in the right table. anti: Filter rows that do not have a match in the right table.	`'inner'`
`left_on`	`str \| list[str] \| None`	Join column of the left DataFrame.	`None`
`right_on`	`str \| list[str] \| None`	Join column of the right DataFrame.	`None`
`suffix`	`str`	Suffix to append to columns with a duplicate name.	`'_right'`

Returns:

Type	Description
`Self`	A new joined DataFrame.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_1_native = pd.DataFrame({"id": ["a", "b"], "price": [6.0, 7.0]})
>>> df_2_native = pd.DataFrame({"id": ["a", "b", "c"], "qty": [1, 2, 3]})
>>> nw.from_native(df_1_native).join(nw.from_native(df_2_native), on="id")
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|   id  price  qty |
| 0  a    6.0    1 |
| 1  b    7.0    2 |
└──────────────────┘
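Several of the `how` strategies can be modelled with lists of row dicts joined on a single key. `join` here is a hypothetical helper covering only inner, left, semi and anti; it assumes that no non-key column names overlap, so it does not apply `suffix`.

```python
from typing import Any

Row = dict[str, Any]  # one row of a toy frame: column name -> value


def join(left: list[Row], right: list[Row], on: str, how: str = "inner") -> list[Row]:
    """Single-key join of two lists of rows for the inner, left, semi and anti strategies."""
    if how in ("semi", "anti"):
        # semi/anti only filter the left rows; no right-hand columns are added
        right_keys = {rrow[on] for rrow in right}
        keep_matches = how == "semi"
        return [lrow for lrow in left if (lrow[on] in right_keys) == keep_matches]
    if how not in ("inner", "left"):
        raise ValueError(f"{how!r} is not modelled in this sketch")
    right_cols = {col for rrow in right for col in rrow if col != on}
    out: list[Row] = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[on] == lrow[on]]
        if matches:
            out.extend({**lrow, **rrow} for rrow in matches)
        elif how == "left":
            # unmatched left rows are kept, with nulls for the right-hand columns
            out.append({**lrow, **dict.fromkeys(right_cols)})
    return out


df_1 = [{"id": "a", "price": 6.0}, {"id": "b", "price": 7.0}]
df_2 = [{"id": "a", "qty": 1}, {"id": "b", "qty": 2}, {"id": "c", "qty": 3}]
print(join(df_1, df_2, on="id"))
# [{'id': 'a', 'price': 6.0, 'qty': 1}, {'id': 'b', 'price': 7.0, 'qty': 2}]
print(join(df_2, df_1, on="id", how="anti"))  # [{'id': 'c', 'qty': 3}]
```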

join_asof

join_asof(
    other: Self,
    *,
    left_on: str | None = None,
    right_on: str | None = None,
    on: str | None = None,
    by_left: str | list[str] | None = None,
    by_right: str | list[str] | None = None,
    by: str | list[str] | None = None,
    strategy: AsofJoinStrategy = "backward",
    suffix: str = "_right"
) -> Self

Perform an asof join.

This is similar to a left join, except that rows are matched on the nearest key rather than on equal keys.

For Polars, both DataFrames must be sorted by the on key (within each by group if specified).

Parameters:

Name	Type	Description	Default
`other`	`Self`	DataFrame to join with.	required
`left_on`	`str \| None`	Name of the left join column.	`None`
`right_on`	`str \| None`	Name of the right join column.	`None`
`on`	`str \| None`	Join column of both DataFrames. If set, left_on and right_on should be None.	`None`
`by_left`	`str \| list[str] \| None`	Join on these columns of the left DataFrame before doing the asof join.	`None`
`by_right`	`str \| list[str] \| None`	Join on these columns of the right DataFrame before doing the asof join.	`None`
`by`	`str \| list[str] \| None`	Join on these columns of both DataFrames before doing the asof join.	`None`
`strategy`	`AsofJoinStrategy`	Join strategy. The default is "backward". backward: selects the last row in the right DataFrame whose "on" key is less than or equal to the left's key. forward: selects the first row in the right DataFrame whose "on" key is greater than or equal to the left's key. nearest: selects the last row in the right DataFrame whose value is nearest to the left's key.	`'backward'`
`suffix`	`str`	Suffix to append to columns with a duplicate name.	`'_right'`

Returns:

Type	Description
`Self`	A new joined DataFrame.

Examples:

>>> from datetime import datetime
>>> import pandas as pd
>>> import narwhals as nw
>>> data_gdp = {
...     "datetime": [
...         datetime(2016, 1, 1),
...         datetime(2017, 1, 1),
...         datetime(2018, 1, 1),
...         datetime(2019, 1, 1),
...         datetime(2020, 1, 1),
...     ],
...     "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
...     "datetime": [
...         datetime(2016, 3, 1),
...         datetime(2018, 8, 1),
...         datetime(2019, 1, 1),
...     ],
...     "population": [82.19, 82.66, 83.12],
... }
>>> gdp_native = pd.DataFrame(data_gdp)
>>> population_native = pd.DataFrame(data_population)
>>> gdp = nw.from_native(gdp_native)
>>> population = nw.from_native(population_native)
>>> population.join_asof(gdp, on="datetime", strategy="backward")
┌──────────────────────────────┐
|      Narwhals DataFrame      |
|------------------------------|
|    datetime  population   gdp|
|0 2016-03-01       82.19  4164|
|1 2018-08-01       82.66  4566|
|2 2019-01-01       83.12  4696|
└──────────────────────────────┘
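The three strategies can be modelled with binary search over the sorted right-hand keys. This plain-Python sketch, using the standard `bisect` module, finds which right row each left key matches; `asof_indices` is a hypothetical helper, not Narwhals' implementation, and its tie-breaking for "nearest" (prefer the backward match) is an assumption.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime
from typing import Any, Optional


def asof_indices(left_keys: list[Any], right_keys: list[Any], strategy: str = "backward") -> list[Optional[int]]:
    """Index of the matching right row for each left key (None if there is no match).

    right_keys must be sorted in ascending order.
    """
    out: list[Optional[int]] = []
    for key in left_keys:
        back = bisect_right(right_keys, key) - 1  # last right key <= key
        fwd = bisect_left(right_keys, key)        # first right key >= key
        back_idx = back if back >= 0 else None
        fwd_idx = fwd if fwd < len(right_keys) else None
        if strategy == "backward":
            out.append(back_idx)
        elif strategy == "forward":
            out.append(fwd_idx)
        elif strategy == "nearest":
            candidates = [i for i in (back_idx, fwd_idx) if i is not None]
            # min() keeps the first candidate on a tie, so ties favour the backward match
            out.append(min(candidates, key=lambda i: abs(right_keys[i] - key)) if candidates else None)
        else:
            raise ValueError(f"unknown strategy {strategy!r}")
    return out


gdp_dates = [datetime(year, 1, 1) for year in range(2016, 2021)]
gdp = [4164, 4411, 4566, 4696, 4827]
population_dates = [datetime(2016, 3, 1), datetime(2018, 8, 1), datetime(2019, 1, 1)]
for strategy in ("backward", "forward", "nearest"):
    idx = asof_indices(population_dates, gdp_dates, strategy)
    print(strategy, [gdp[i] for i in idx])
# backward [4164, 4566, 4696]
# forward [4411, 4696, 4696]
# nearest [4164, 4696, 4696]
```

The "backward" row reproduces the `gdp` column of the example above.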

lazy

lazy(
    backend: (
        ModuleType | Implementation | str | None
    ) = None,
) -> LazyFrame[Any]

Restrict available API methods to lazy-only ones.

If backend is specified, then a conversion between different backends might be triggered.

If a library does not support lazy execution and backend is not specified, then this will only restrict the API to lazy-only operations. This is useful if you want to ensure that you write dataframe-agnostic code that can run entirely lazily.

Parameters:

Name	Type	Description	Default
`backend`	`ModuleType \| Implementation \| str \| None`	Which lazy backend to convert to. This will be the underlying backend for the resulting Narwhals LazyFrame. If not specified, and the given library does not support lazy execution, then this will restrict the API to lazy-only operations. `backend` can be specified in various ways: as `Implementation.<BACKEND>` with `BACKEND` being `DASK`, `DUCKDB` or `POLARS`; as a string: `"dask"`, `"duckdb"` or `"polars"`; or directly as a module: `dask.dataframe`, `duckdb` or `polars`.	`None`

Returns:

Type	Description
`LazyFrame[Any]`	A new LazyFrame.

Examples:

>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pl.DataFrame({"a": [1, 2], "b": [4, 6]})
>>> df = nw.from_native(df_native)

If we call df.lazy, we get a narwhals.LazyFrame backed by a Polars LazyFrame.

>>> df.lazy()
┌─────────────────────────────┐
|     Narwhals LazyFrame      |
|-----------------------------|
|<LazyFrame at 0x7F52B9937230>|
└─────────────────────────────┘

We can also pass DuckDB as the backend, and then we'll get a narwhals.LazyFrame backed by a duckdb.DuckDBPyRelation.

>>> df.lazy(backend=nw.Implementation.DUCKDB)
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int64 │ int64 │ |
|├───────┼───────┤ |
|│     1 │     4 │ |
|│     2 │     6 │ |
|└───────┴───────┘ |
└──────────────────┘

null_count

null_count() -> Self

Create a new DataFrame that shows the null counts per column.

Returns:

Type	Description
`Self`	A dataframe of shape (1, n_columns).

Notes

pandas handles null values differently from Polars and PyArrow. See null_handling for reference.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, None], "bar": [2, 3]})
>>> nw.from_native(df_native).null_count()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  foo: int64      |
|  bar: int64      |
|  ----            |
|  foo: [[1]]      |
|  bar: [[0]]      |
└──────────────────┘

pipe

pipe(
    function: Callable[Concatenate[Self, PS], R],
    *args: PS.args,
    **kwargs: PS.kwargs
) -> R

Apply a function to the DataFrame, passing along any extra arguments.

Parameters:

Name	Type	Description	Default
`function`	`Callable[Concatenate[Self, PS], R]`	Function to apply.	required
`args`	`PS.args`	Positional arguments to pass to function.	`()`
`kwargs`	`PS.kwargs`	Keyword arguments to pass to function.	`{}`

Returns:

Type	Description
`R`	The result of calling function on the DataFrame.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2], "ba": [4, 5]})
>>> nw.from_native(df_native).pipe(
...     lambda _df: _df.select(
...         [x for x in _df.columns if len(x) == 1]
...     ).to_native()
... )
   a
0  1
1  2

pivot

pivot(
    on: str | list[str],
    *,
    index: str | list[str] | None = None,
    values: str | list[str] | None = None,
    aggregate_function: PivotAgg | None = None,
    maintain_order: bool | None = None,
    sort_columns: bool = False,
    separator: str = "_"
) -> Self

Create a spreadsheet-style pivot table as a DataFrame.

Parameters:

Name	Type	Description	Default
`on`	`str \| list[str]`	Name of the column(s) whose values will be used as the header of the output DataFrame.	required
`index`	`str \| list[str] \| None`	One or multiple keys to group by. If None, all remaining columns not specified in `on` and `values` will be used. At least one of `index` and `values` must be specified.	`None`
`values`	`str \| list[str] \| None`	Column(s) whose values will be aggregated into the cells of the output DataFrame. If None, all remaining columns not specified in `on` and `index` will be used. At least one of `index` and `values` must be specified.	`None`
`aggregate_function`	`PivotAgg \| None`	Choose from: None, in which case no aggregation takes place and an error is raised if a group contains multiple values; or a predefined aggregate function string, one of {'min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'}.	`None`
`maintain_order`	`bool \| None`	Has no effect and is kept around only for backwards-compatibility.	`None`
`sort_columns`	`bool`	Sort the transposed columns by name. Default is by order of discovery.	`False`
`separator`	`str`	Used as separator/delimiter in generated column names in case of multiple `values` columns.	`'_'`

Returns:

Type	Description
`Self`	A new dataframe.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> data = {
...     "ix": [1, 1, 2, 2, 1, 2],
...     "col": ["a", "a", "a", "a", "b", "b"],
...     "foo": [0, 1, 2, 2, 7, 1],
...     "bar": [0, 2, 0, 0, 9, 4],
... }
>>> df_native = pd.DataFrame(data)
>>> nw.from_native(df_native).pivot(
...     "col", index="ix", aggregate_function="sum"
... )
┌─────────────────────────────────┐
|       Narwhals DataFrame        |
|---------------------------------|
|   ix  foo_a  foo_b  bar_a  bar_b|
|0   1      1      7      2      9|
|1   2      4      1      0      4|
└─────────────────────────────────┘
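The roles of `on`, `index` and `values` can be modelled in plain Python for the `"sum"` aggregation. `pivot_sum` is a hypothetical helper over a list of row dicts; it always names output columns `{value}{separator}{on_value}`, which matches the example above, whereas backends may name columns differently when there is a single `values` column.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]  # one row of a toy frame: column name -> value


def pivot_sum(rows: list[Row], on: str, index: str, values: list[str], separator: str = "_") -> list[Row]:
    """Spread the distinct values of `on` into columns, summing `values` per (index, on) cell."""
    cells: dict[tuple, Any] = defaultdict(int)
    index_keys: list[Any] = []  # distinct index values, in order of discovery
    headers: list[Any] = []     # distinct `on` values, in order of discovery
    for r in rows:
        if r[index] not in index_keys:
            index_keys.append(r[index])
        if r[on] not in headers:
            headers.append(r[on])
        for v in values:
            cells[(r[index], v, r[on])] += r[v]
    out = []
    for key in index_keys:
        row: Row = {index: key}
        for v in values:
            for h in headers:
                # combinations that never occur become None (null)
                row[f"{v}{separator}{h}"] = cells.get((key, v, h))
        out.append(row)
    return out


data = {
    "ix": [1, 1, 2, 2, 1, 2],
    "col": ["a", "a", "a", "a", "b", "b"],
    "foo": [0, 1, 2, 2, 7, 1],
    "bar": [0, 2, 0, 0, 9, 4],
}
rows = [dict(zip(data, vals)) for vals in zip(*data.values())]
for row in pivot_sum(rows, on="col", index="ix", values=["foo", "bar"]):
    print(row)
# {'ix': 1, 'foo_a': 1, 'foo_b': 7, 'bar_a': 2, 'bar_b': 9}
# {'ix': 2, 'foo_a': 4, 'foo_b': 1, 'bar_a': 0, 'bar_b': 4}
```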

rename

rename(mapping: dict[str, str]) -> Self

Rename column names.

Parameters:

Name	Type	Description	Default
`mapping`	`dict[str, str]`	Key-value pairs that map each old name to its new name.	required

Returns:

Type	Description
`Self`	The dataframe with the specified columns renamed.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6, 7]})
>>> nw.from_native(df_native).rename({"foo": "apple"}).to_native()
pyarrow.Table
apple: int64
bar: int64
----
apple: [[1,2]]
bar: [[6,7]]

row

row(index: int) -> tuple[Any, ...]

Get values at given row.

Warning

You should NEVER use this method to iterate over a DataFrame; if you require row iteration, use iter_rows() instead.

Parameters:

Name	Type	Description	Default
`index`	`int`	Row number.	required

Returns:

Type	Description
`tuple[Any, ...]`	A tuple of the values in the selected row.

Notes

cuDF doesn't support this method.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"a": [1, 2], "b": [4, 5]})
>>> nw.from_native(df_native).row(1)
(<pyarrow.Int64Scalar: 2>, <pyarrow.Int64Scalar: 5>)

rows

rows(
    *, named: Literal[False] = False
) -> list[tuple[Any, ...]]

rows(*, named: Literal[True]) -> list[dict[str, Any]]

rows(
    *, named: bool
) -> list[tuple[Any, ...]] | list[dict[str, Any]]

rows(
    *, named: bool = False
) -> list[tuple[Any, ...]] | list[dict[str, Any]]

Returns all data in the DataFrame as a list of rows of Python-native values.

Parameters:

Name	Type	Description	Default
`named`	`bool`	By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead.	`False`

Returns:

Type	Description
`list[tuple[Any, ...]] \| list[dict[str, Any]]`	The data as a list of rows.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> nw.from_native(df_native).rows()
[(1, 6.0), (2, 7.0)]

sample

sample(
    n: int | None = None,
    *,
    fraction: float | None = None,
    with_replacement: bool = False,
    seed: int | None = None
) -> Self

Sample from this DataFrame.

Parameters:

Name	Type	Description	Default
`n`	`int \| None`	Number of items to return. Cannot be used with fraction.	`None`
`fraction`	`float \| None`	Fraction of items to return. Cannot be used with n.	`None`
`with_replacement`	`bool`	Allow values to be sampled more than once.	`False`
`seed`	`int \| None`	Seed for the random number generator. If set to None (default), a random seed is generated for each sample operation.	`None`

Returns:

Type	Description
`Self`	A new dataframe.

Notes

The results may not be consistent across libraries.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [1, 2, 3], "bar": [19, 32, 4]})
>>> nw.from_native(df_native).sample(n=2)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|      foo  bar    |
|   2    3    4    |
|   1    2   32    |
└──────────────────┘

select

select(
    *exprs: IntoExpr | Iterable[IntoExpr],
    **named_exprs: IntoExpr
) -> Self

Select columns from this DataFrame.

Parameters:

Name	Type	Description	Default
`*exprs`	`IntoExpr \| Iterable[IntoExpr]`	Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.	`()`
`**named_exprs`	`IntoExpr`	Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.	`{}`

Returns:

Type	Description
`Self`	The dataframe containing only the selected columns.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"a": [1, 2], "b": [3, 4]})
>>> nw.from_native(df_native).select("a", a_plus_1=nw.col("a") + 1)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|pyarrow.Table     |
|a: int64          |
|a_plus_1: int64   |
|----              |
|a: [[1,2]]        |
|a_plus_1: [[2,3]] |
└──────────────────┘

sort

sort(
    by: str | Iterable[str],
    *more_by: str,
    descending: bool | Sequence[bool] = False,
    nulls_last: bool = False
) -> Self

Sort the dataframe by the given columns.

Parameters:

Name	Type	Description	Default
`by`	`str \| Iterable[str]`	Column name(s) to sort by.	required
`*more_by`	`str`	Additional columns to sort by, specified as positional arguments.	`()`
`descending`	`bool \| Sequence[bool]`	Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.	`False`
`nulls_last`	`bool`	Place null values last.	`False`

Returns:

Type	Description
`Self`	The sorted dataframe.

Note

Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead, a single boolean is applied to all `by` columns.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
...     {"foo": [2, 1], "bar": [6.0, 7.0], "ham": ["a", "b"]}
... )
>>> nw.from_native(df_native).sort("foo")
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|    foo  bar ham  |
| 1    1  7.0   b  |
| 0    2  6.0   a  |
└──────────────────┘

tail

tail(n: int = 5) -> Self

Get the last n rows.

Parameters:

Name	Type	Description	Default
`n`	`int`	Number of rows to return. If a negative value is passed, return all rows except the first `abs(n)`.	`5`

Returns:

Type	Description
`Self`	A subset of the dataframe of shape (n, n_columns).

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2], "b": [0.5, 4.0]})
>>> nw.from_native(df_native).tail(1)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|       a    b     |
|    1  2  4.0     |
└──────────────────┘

to_arrow

to_arrow() -> pa.Table

Convert this DataFrame to a PyArrow table.

Returns:

Type	Description
`Table`	A new PyArrow table.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [1, None], "bar": [2, 3]})
>>> nw.from_native(df_native).to_arrow()
pyarrow.Table
foo: double
bar: int64
----
foo: [[1,null]]
bar: [[2,3]]

to_dict

to_dict(
    *, as_series: Literal[True] = ...
) -> dict[str, Series[Any]]

to_dict(
    *, as_series: Literal[False]
) -> dict[str, list[Any]]

to_dict(
    *, as_series: bool
) -> dict[str, Series[Any]] | dict[str, list[Any]]

to_dict(
    *, as_series: bool = True
) -> dict[str, Series[Any]] | dict[str, list[Any]]

Convert DataFrame to a dictionary mapping column name to values.

Parameters:

Name	Type	Description	Default
`as_series`	`bool`	If `True`, the values are Narwhals Series; otherwise, they are Python lists.	`True`

Returns:

Type	Description
`dict[str, Series[Any]] \| dict[str, list[Any]]`	A mapping from column name to values / Series.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"A": [1, 2], "fruits": ["banana", "apple"]})
>>> df = nw.from_native(df_native)
>>> df.to_dict(as_series=False)
{'A': [1, 2], 'fruits': ['banana', 'apple']}

to_native

to_native() -> DataFrameT

Convert this Narwhals DataFrame to its native counterpart.

Returns:

Type	Description
`DataFrameT`	An object of the same native class the user started with.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
...     {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
... )

Calling to_native on a Narwhals DataFrame returns the native object:

>>> nw.from_native(df_native).to_native()
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c

to_numpy

to_numpy() -> _2DArray

Convert this DataFrame to a NumPy ndarray.

Returns:

Type	Description
`_2DArray`	A two-dimensional NumPy ndarray.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [1, 2], "bar": [6.5, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.to_numpy()
array([[1. , 6.5],
       [2. , 7. ]])

to_pandas

to_pandas() -> pd.DataFrame

Convert this DataFrame to a pandas DataFrame.

Returns:

Type	Description
`DataFrame`	A pandas DataFrame.

Examples:

>>> import polars as pl
>>> import narwhals as nw
>>> df_native = pl.DataFrame(
...     {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
... )
>>> df = nw.from_native(df_native)
>>> df.to_pandas()
   foo  bar ham
0    1  6.0   a
1    2  7.0   b
2    3  8.0   c

to_polars

to_polars() -> pl.DataFrame

Convert this DataFrame to a Polars DataFrame.

Returns:

Type	Description
`DataFrame`	A Polars DataFrame.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.to_polars()
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 6.0 │
│ 2   ┆ 7.0 │
└─────┴─────┘

unique

unique(
    subset: str | list[str] | None = None,
    *,
    keep: UniqueKeepStrategy = "any",
    maintain_order: bool = False
) -> Self

Drop duplicate rows from this dataframe.

Parameters:

Name	Type	Description	Default
`subset`	`str \| list[str] \| None`	Column name(s) to consider when identifying duplicate rows.	`None`
`keep`	`UniqueKeepStrategy`	One of {'first', 'last', 'any', 'none'}: which of the duplicate rows to keep. 'any': no guarantee is given about which row is kept, which allows more optimizations. 'none': drop every row that has a duplicate. 'first': keep the first unique row. 'last': keep the last unique row.	`'any'`
`maintain_order`	`bool`	Keep the same order as the original DataFrame. This may be more expensive to compute.	`False`

Returns:

Type	Description
`Self`	The dataframe with the duplicate rows removed.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
...     {"foo": [1, 2], "bar": ["a", "a"], "ham": ["b", "b"]}
... )
>>> nw.from_native(df_native).unique(["bar", "ham"]).to_native()
   foo bar ham
0    1   a   b

unpivot

unpivot(
    on: str | list[str] | None = None,
    *,
    index: str | list[str] | None = None,
    variable_name: str = "variable",
    value_name: str = "value"
) -> Self

Unpivot a DataFrame from wide to long format.

Optionally, some columns can be kept as identifiers.

This function is useful to reshape a DataFrame into a format where one or more columns are identifier variables (`index`), while all other columns, considered measured variables (`on`), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.

Parameters:

Name	Type	Description	Default
`on`	`str \| list[str] \| None`	Column(s) to use as value variables; if `on` is empty, all columns that are not in `index` will be used.	`None`
`index`	`str \| list[str] \| None`	Column(s) to use as identifier variables.	`None`
`variable_name`	`str`	Name to give to the `variable` column. Defaults to "variable".	`'variable'`
`value_name`	`str`	Name to give to the `value` column. Defaults to "value".	`'value'`

Returns:

Type	Description
`Self`	The unpivoted dataframe.

Notes

If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> data = {"a": ["x", "y", "z"], "b": [1, 3, 5], "c": [2, 4, 6]}
>>> df_native = pd.DataFrame(data)
>>> nw.from_native(df_native).unpivot(["b", "c"], index="a")
┌────────────────────┐
| Narwhals DataFrame |
|--------------------|
|   a variable  value|
|0  x        b      1|
|1  y        b      3|
|2  z        b      5|
|3  x        c      2|
|4  y        c      4|
|5  z        c      6|
└────────────────────┘

with_columns

with_columns(
    *exprs: IntoExpr | Iterable[IntoExpr],
    **named_exprs: IntoExpr
) -> Self

Add columns to this DataFrame.

Added columns will replace existing columns with the same name.

Parameters:

Name	Type	Description	Default
`*exprs`	`IntoExpr \| Iterable[IntoExpr]`	Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.	`()`
`**named_exprs`	`IntoExpr`	Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.	`{}`

Returns:

Type	Description
`Self`	A new DataFrame with the columns added.

Note

Creating a new DataFrame using this method does not create a new copy of existing data.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2], "b": [0.5, 4.0]})
>>> (
...     nw.from_native(df_native)
...     .with_columns((nw.col("a") * 2).alias("a*2"))
...     .to_native()
... )
   a    b  a*2
0  1  0.5    2
1  2  4.0    4

with_row_index

with_row_index(
    name: str = "index",
    *,
    order_by: str | Sequence[str] | None = None
) -> Self

Insert a column that enumerates rows.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the column as a string. The default is "index".	`'index'`
`order_by`	`str \| Sequence[str] \| None`	Column(s) to order by when computing the row index.	`None`

Returns:

Type	Description
`Self`	A new dataframe with the row index column added.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"a": [1, 2], "b": [4, 5]})
>>> nw.from_native(df_native).with_row_index().to_native()
pyarrow.Table
index: int64
a: int64
b: int64
----
index: [[0,1]]
a: [[1,2]]
b: [[4,5]]

write_csv

write_csv(file: None = None) -> str

write_csv(file: str | Path | BytesIO) -> None

write_csv(
    file: str | Path | BytesIO | None = None,
) -> str | None

Write the dataframe to a comma-separated values (CSV) file.

Parameters:

Name	Type	Description	Default
`file`	`str \| Path \| BytesIO \| None`	String, path object or file-like object to which the dataframe will be written. If None, the resulting csv format is returned as a string.	`None`

Returns:

Type	Description
`str \| None`	The CSV content as a string if `file` is None, otherwise None.

Examples:

>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
...     {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
... )
>>> df = nw.from_native(df_native)
>>> df.write_csv()
'foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n'

If we had passed a file name to write_csv, it would have been written to that file.

write_parquet

write_parquet(file: str | Path | BytesIO) -> None

Write the dataframe to a Parquet file.

Parameters:

Name	Type	Description	Default
`file`	`str \| Path \| BytesIO`	String, path object or file-like object to which the dataframe will be written.	required

Returns:

Type	Description
`None`	None.

Examples:

>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.write_parquet("out.parquet")
