narwhals.DataFrame
Narwhals DataFrame, backed by a native eager dataframe.
Warning
This class is not meant to be instantiated directly. Instead:
- If the native object is an eager dataframe from one of the supported backends (e.g. pandas.DataFrame, polars.DataFrame, pyarrow.Table), you can use narwhals.from_native:
  narwhals.from_native(native_dataframe)
  narwhals.from_native(native_dataframe, eager_only=True)
- If the object is a dictionary mapping column names to generic sequences (e.g. dict[str, list]), you can create a DataFrame via narwhals.from_dict:
  narwhals.from_dict(
      data={"a": [1, 2, 3]},
      native_namespace=narwhals.get_native_namespace(another_object),
  )
columns
property
Get column names.
Returns:
Type | Description |
---|---|
list[str]
|
The column names stored in a list. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_columns(df_native: IntoFrame) -> list[str]:
...     df = nw.from_native(df_native)
...     return df.columns
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_columns:
>>> agnostic_columns(df_pd)
['foo', 'bar', 'ham']
>>> agnostic_columns(df_pl)
['foo', 'bar', 'ham']
>>> agnostic_columns(df_pa)
['foo', 'bar', 'ham']
implementation
property
Return implementation of native frame.
This can be useful when you need to use special-casing for features outside of Narwhals' scope - for example, when dealing with pandas' Period Dtype.
Returns:
Type | Description |
---|---|
Implementation
|
Implementation. |
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> df_native = pd.DataFrame({"a": [1, 2, 3]})
>>> df = nw.from_native(df_native)
>>> df.implementation
<Implementation.PANDAS: 1>
>>> df.implementation.is_pandas()
True
>>> df.implementation.is_pandas_like()
True
>>> df.implementation.is_polars()
False
schema
property
Get an ordered mapping of column names to their data type.
Returns:
Type | Description |
---|---|
Schema
|
A Narwhals Schema object that displays the mapping of column names. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.schema import Schema
>>> from narwhals.typing import IntoFrame
>>> data = {
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_schema(df_native: IntoFrame) -> Schema:
...     df = nw.from_native(df_native)
...     return df.schema
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_schema:
>>> agnostic_schema(df_pd)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> agnostic_schema(df_pl)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> agnostic_schema(df_pa)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
shape
property
Get the shape of the DataFrame.
Returns:
Type | Description |
---|---|
tuple[int, int]
|
The shape of the dataframe as a tuple. |
Examples:
Construct pandas and Polars DataFrames and a PyArrow Table:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3, 4, 5]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_shape(df_native: IntoDataFrame) -> tuple[int, int]:
...     df = nw.from_native(df_native)
...     return df.shape
We can then pass either pandas, Polars, or PyArrow to agnostic_shape:
>>> agnostic_shape(df_pd)
(5, 1)
>>> agnostic_shape(df_pl)
(5, 1)
>>> agnostic_shape(df_pa)
(5, 1)
__arrow_c_stream__(requested_schema=None)
Export a DataFrame via the Arrow PyCapsule Interface.
- if the underlying dataframe implements the interface, it'll return that
- else, it'll call to_arrow and then defer to PyArrow's implementation
See PyCapsule Interface for more.
__getitem__(item)
Extract column or slice of DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
item
|
str | slice | Sequence[int] | Sequence[str] | tuple[Sequence[int], str | int] | tuple[slice, str | int] | tuple[slice | Sequence[int], Sequence[int] | Sequence[str] | slice] | tuple[slice, slice]
|
How to slice dataframe. What happens depends on what is passed. It's easiest
to explain by example. Suppose we have a Dataframe
|
required |
Returns:
Type | Description |
---|---|
Series[Any] | Self
|
A Narwhals Series, backed by a native series, or a Narwhals DataFrame, depending on item. |
Notes
- Integers are always interpreted as positions.
- Strings are always interpreted as column names.
In contrast with Polars, pandas allows non-string column names. If you don't know whether the column name you're trying to extract is definitely a string (e.g. df[df.columns[0]]), then you should use DataFrame.get_column instead.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_slice(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native)
...     return df["a"].to_native()
We can then pass either pandas, Polars, or PyArrow to agnostic_slice:
>>> agnostic_slice(df_pd)
0 1
1 2
Name: a, dtype: int64
>>> agnostic_slice(df_pl)
shape: (2,)
Series: 'a' [i64]
[
1
2
]
>>> agnostic_slice(df_pa)
<pyarrow.lib.ChunkedArray object at ...>
[
[
1,
2
]
]
clone()
Create a copy of this DataFrame.
Returns:
Type | Description |
---|---|
Self
|
An identical copy of the original dataframe. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
Let's define a dataframe-agnostic function in which we clone the DataFrame:
>>> def agnostic_clone(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.clone().to_native()
We can then pass any supported library such as pandas or Polars to agnostic_clone:
>>> agnostic_clone(df_pd)
a b
0 1 3
1 2 4
>>> agnostic_clone(df_pl)
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 3 │
│ 2 ┆ 4 │
└─────┴─────┘
collect_schema()
Get an ordered mapping of column names to their data type.
Returns:
Type | Description |
---|---|
Schema
|
A Narwhals Schema object that displays the mapping of column names. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.schema import Schema
>>> from narwhals.typing import IntoFrame
>>> data = {
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_collect_schema(df_native: IntoFrame) -> Schema:
...     df = nw.from_native(df_native)
...     return df.collect_schema()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_collect_schema:
>>> agnostic_collect_schema(df_pd)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> agnostic_collect_schema(df_pl)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
>>> agnostic_collect_schema(df_pa)
Schema({'foo': Int64, 'bar': Float64, 'ham': String})
drop(*columns, strict=True)
Remove columns from the dataframe.
Returns:
Type | Description |
---|---|
Self
|
The dataframe with the specified columns removed. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*columns
|
str | Iterable[str]
|
Names of the columns that should be removed from the dataframe. |
()
|
strict
|
bool
|
Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema. |
True
|
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_drop(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).drop("ham").to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_drop:
>>> agnostic_drop(df_pd)
foo bar
0 1 6.0
1 2 7.0
2 3 8.0
>>> agnostic_drop(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1 ┆ 6.0 │
│ 2 ┆ 7.0 │
│ 3 ┆ 8.0 │
└─────┴─────┘
>>> agnostic_drop(df_pa)
pyarrow.Table
foo: int64
bar: double
----
foo: [[1,2,3]]
bar: [[6,7,8]]
Use positional arguments to drop multiple columns:
>>> def agnostic_drop_multi(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).drop("foo", "ham").to_native()
>>> agnostic_drop_multi(df_pd)
bar
0 6.0
1 7.0
2 8.0
>>> agnostic_drop_multi(df_pl)
shape: (3, 1)
┌─────┐
│ bar │
│ --- │
│ f64 │
╞═════╡
│ 6.0 │
│ 7.0 │
│ 8.0 │
└─────┘
>>> agnostic_drop_multi(df_pa)
pyarrow.Table
bar: double
----
bar: [[6,7,8]]
drop_nulls(subset=None)
Drop rows that contain null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset
|
str | list[str] | None
|
Column name(s) for which null values are considered. If set to None (default), use all columns. |
None
|
Returns:
Type | Description |
---|---|
Self
|
The original object with the rows removed that contained the null values. |
Notes
pandas handles null values differently from Polars and PyArrow. See null_handling for reference.
Examples:
>>> import polars as pl
>>> import pandas as pd
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1.0, 2.0, None], "ba": [1.0, None, 2.0]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_drop_nulls(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.drop_nulls().to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_drop_nulls:
>>> agnostic_drop_nulls(df_pd)
a ba
0 1.0 1.0
>>> agnostic_drop_nulls(df_pl)
shape: (1, 2)
┌─────┬─────┐
│ a ┆ ba │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 1.0 │
└─────┴─────┘
>>> agnostic_drop_nulls(df_pa)
pyarrow.Table
a: double
ba: double
----
a: [[1]]
ba: [[1]]
estimated_size(unit='b')
Return an estimation of the total (heap) allocated size of the DataFrame.
Estimated size is given in the specified unit (bytes by default).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
unit
|
SizeUnit
|
'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes', 'gigabytes', or 'terabytes'. |
'b'
|
Returns:
Type | Description |
---|---|
int | float
|
Integer or Float. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> data = {
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_estimated_size(df_native: IntoDataFrameT) -> int | float:
...     df = nw.from_native(df_native)
...     return df.estimated_size()
We can then pass either pandas, Polars, or PyArrow to agnostic_estimated_size:
>>> agnostic_estimated_size(df_pd)
np.int64(330)
>>> agnostic_estimated_size(df_pl)
51
>>> agnostic_estimated_size(df_pa)
63
explode(columns, *more_columns)
Explode the dataframe to long format by exploding the given columns.
Notes
It is possible to explode multiple columns only if these columns have matching element counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns
|
str | Sequence[str]
|
Column names. The underlying columns being exploded must be of the |
required |
*more_columns
|
str
|
Additional names of columns to explode, specified as positional arguments. |
()
|
Returns:
Type | Description |
---|---|
Self
|
New DataFrame |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "a": ["x", "y", "z", "w"],
... "lst1": [[1, 2], None, [None], []],
... "lst2": [[3, None], None, [42], []],
... }
We define a library-agnostic function:
>>> def agnostic_explode(df_native: IntoFrameT) -> IntoFrameT:
...     return (
...         nw.from_native(df_native)
...         .with_columns(nw.col("lst1", "lst2").cast(nw.List(nw.Int32())))
...         .explode("lst1", "lst2")
...         .to_native()
...     )
We can then pass any supported library such as pandas, Polars (eager), or PyArrow to agnostic_explode:
>>> agnostic_explode(pd.DataFrame(data))
a lst1 lst2
0 x 1 3
0 x 2 <NA>
1 y <NA> <NA>
2 z <NA> 42
3 w <NA> <NA>
>>> agnostic_explode(pl.DataFrame(data))
shape: (5, 3)
┌─────┬──────┬──────┐
│ a ┆ lst1 ┆ lst2 │
│ --- ┆ --- ┆ --- │
│ str ┆ i32 ┆ i32 │
╞═════╪══════╪══════╡
│ x ┆ 1 ┆ 3 │
│ x ┆ 2 ┆ null │
│ y ┆ null ┆ null │
│ z ┆ null ┆ 42 │
│ w ┆ null ┆ null │
└─────┴──────┴──────┘
filter(*predicates, **constraints)
Filter the rows in the DataFrame based on one or more predicate expressions.
The original order of the remaining rows is preserved.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*predicates
|
IntoExpr | Iterable[IntoExpr] | list[bool]
|
Expression(s) that evaluates to a boolean Series. Can also be a (single!) boolean list. |
()
|
**constraints
|
Any
|
Column filters; use |
{}
|
Returns:
Type | Description |
---|---|
Self
|
The filtered dataframe. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function in which we filter on one condition:
>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter(nw.col("foo") > 1).to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_filter:
>>> agnostic_filter(df_pd)
foo bar ham
1 2 7 b
2 3 8 c
>>> agnostic_filter(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2 ┆ 7 ┆ b │
│ 3 ┆ 8 ┆ c │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[2,3]]
bar: [[7,8]]
ham: [["b","c"]]
Filter on multiple conditions, combined with and/or operators:
>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter((nw.col("foo") < 3) & (nw.col("ham") == "a")).to_native()
>>> agnostic_filter(df_pd)
foo bar ham
0 1 6 a
>>> agnostic_filter(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[1]]
bar: [[6]]
ham: [["a"]]
>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     dframe = df.filter(
...         (nw.col("foo") == 1) | (nw.col("ham") == "c")
...     ).to_native()
...     return dframe
>>> agnostic_filter(df_pd)
foo bar ham
0 1 6 a
2 3 8 c
>>> agnostic_filter(df_pl)
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
│ 3 ┆ 8 ┆ c │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[1,3]]
bar: [[6,8]]
ham: [["a","c"]]
Provide multiple filters using *args syntax:
>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     dframe = df.filter(
...         nw.col("foo") <= 2,
...         ~nw.col("ham").is_in(["b", "c"]),
...     ).to_native()
...     return dframe
>>> agnostic_filter(df_pd)
foo bar ham
0 1 6 a
>>> agnostic_filter(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[1]]
bar: [[6]]
ham: [["a"]]
Provide multiple filters using **kwargs syntax:
>>> def agnostic_filter(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.filter(foo=2, ham="b").to_native()
>>> agnostic_filter(df_pd)
foo bar ham
1 2 7 b
>>> agnostic_filter(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 2 ┆ 7 ┆ b │
└─────┴─────┴─────┘
>>> agnostic_filter(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[2]]
bar: [[7]]
ham: [["b"]]
gather_every(n, offset=0)
Take every nth row in the DataFrame and return as a new DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n
|
int
|
Gather every n-th row. |
required |
offset
|
int
|
Starting index. |
0
|
Returns:
Type | Description |
---|---|
Self
|
The dataframe containing only the selected rows. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function in which we gather every 2nd row, starting from an offset of 1:
>>> def agnostic_gather_every(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.gather_every(n=2, offset=1).to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_gather_every:
>>> agnostic_gather_every(df_pd)
a b
1 2 6
3 4 8
>>> agnostic_gather_every(df_pl)
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2 ┆ 6 │
│ 4 ┆ 8 │
└─────┴─────┘
>>> agnostic_gather_every(df_pa)
pyarrow.Table
a: int64
b: int64
----
a: [[2,4]]
b: [[6,8]]
get_column(name)
Get a single column by name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The column name as a string. |
required |
Returns:
Type | Description |
---|---|
Series[Any]
|
A Narwhals Series, backed by a native series. |
Notes
Although name is typed as str, pandas does allow non-string column names, and they will work when passed to this function if the narwhals.DataFrame is backed by a pandas dataframe with non-string columns. This function can only be used to extract a column by name, so there is no risk of ambiguity.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_get_column(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native)
...     name = df.columns[0]
...     return df.get_column(name).to_native()
We can then pass either pandas, Polars, or PyArrow to agnostic_get_column:
>>> agnostic_get_column(df_pd)
0 1
1 2
Name: a, dtype: int64
>>> agnostic_get_column(df_pl)
shape: (2,)
Series: 'a' [i64]
[
1
2
]
>>> agnostic_get_column(df_pa)
<pyarrow.lib.ChunkedArray object at ...>
[
[
1,
2
]
]
group_by(*keys, drop_null_keys=False)
Start a group by operation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*keys
|
str | Iterable[str]
|
Column(s) to group by. Accepts multiple column names as a list. |
()
|
drop_null_keys
|
bool
|
If True, then groups where any key is null won't be included in the result. |
False
|
Returns:
Name | Type | Description |
---|---|---|
GroupBy |
GroupBy[Self]
|
Object which can be used to perform aggregations. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> data = {
... "a": ["a", "b", "a", "b", "c"],
... "b": [1, 2, 1, 3, 3],
... "c": [5, 4, 3, 2, 1],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function in which we group by one column and call agg to compute the grouped sum of another column:
>>> def agnostic_group_by_agg(df_native: IntoDataFrameT) -> IntoDataFrameT:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.group_by("a").agg(nw.col("b").sum()).sort("a").to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_group_by_agg:
>>> agnostic_group_by_agg(df_pd)
a b
0 a 2
1 b 5
2 c 3
>>> agnostic_group_by_agg(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a ┆ 2 │
│ b ┆ 5 │
│ c ┆ 3 │
└─────┴─────┘
>>> agnostic_group_by_agg(df_pa)
pyarrow.Table
a: string
b: int64
----
a: [["a","b","c"]]
b: [[2,5,3]]
Group by multiple columns by passing a list of column names:
>>> def agnostic_group_by_agg(df_native: IntoDataFrameT) -> IntoDataFrameT:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.group_by(["a", "b"]).agg(nw.max("c")).sort("a", "b").to_native()
>>> agnostic_group_by_agg(df_pd)
a b c
0 a 1 5
1 b 2 4
2 b 3 2
3 c 3 1
>>> agnostic_group_by_agg(df_pl)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a ┆ 1 ┆ 5 │
│ b ┆ 2 ┆ 4 │
│ b ┆ 3 ┆ 2 │
│ c ┆ 3 ┆ 1 │
└─────┴─────┴─────┘
>>> agnostic_group_by_agg(df_pa)
pyarrow.Table
a: string
b: int64
c: int64
----
a: [["a","b","b","c"]]
b: [[1,2,3,3]]
c: [[5,4,2,1]]
head(n=5)
Get the first n rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n
|
int
|
Number of rows to return. If a negative value is passed, return all rows
except the last |
5
|
Returns:
Type | Description |
---|---|
Self
|
A subset of the dataframe of shape (n, n_columns). |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "foo": [1, 2, 3, 4, 5],
... "bar": [6, 7, 8, 9, 10],
... "ham": ["a", "b", "c", "d", "e"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function that gets the first 3 rows:
>>> def agnostic_head(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).head(3).to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_head:
>>> agnostic_head(df_pd)
foo bar ham
0 1 6 a
1 2 7 b
2 3 8 c
>>> agnostic_head(df_pl)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
│ 2 ┆ 7 ┆ b │
│ 3 ┆ 8 ┆ c │
└─────┴─────┴─────┘
>>> agnostic_head(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]
is_duplicated()
Get a mask of all duplicated rows in this DataFrame.
Returns:
Type | Description |
---|---|
Series[Any]
|
A new Series. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>> data = {
... "a": [1, 2, 3, 1],
... "b": ["x", "y", "z", "x"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_is_duplicated(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.is_duplicated().to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_is_duplicated:
>>> agnostic_is_duplicated(df_pd)
0 True
1 False
2 False
3 True
dtype: bool
>>> agnostic_is_duplicated(df_pl)
shape: (4,)
Series: '' [bool]
[
true
false
false
true
]
>>> agnostic_is_duplicated(df_pa)
<pyarrow.lib.ChunkedArray object at ...>
[
[
true,
false,
false,
true
]
]
is_empty()
Check if the dataframe is empty.
Returns:
Type | Description |
---|---|
bool
|
A boolean indicating whether the dataframe is empty (True) or not (False). |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
Let's define a dataframe-agnostic function that filters rows in which "foo" values are greater than 10, and then checks if the result is empty or not:
>>> def agnostic_is_empty(df_native: IntoDataFrame) -> bool:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.filter(nw.col("foo") > 10).is_empty()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_is_empty:
>>> data = {"foo": [1, 2, 3], "bar": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
>>> agnostic_is_empty(df_pd), agnostic_is_empty(df_pl), agnostic_is_empty(df_pa)
(True, True, True)
>>> data = {"foo": [100, 2, 3], "bar": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
>>> agnostic_is_empty(df_pd), agnostic_is_empty(df_pl), agnostic_is_empty(df_pa)
(False, False, False)
is_unique()
Get a mask of all unique rows in this DataFrame.
Returns:
Type | Description |
---|---|
Series[Any]
|
A new Series. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from narwhals.typing import IntoSeries
>>> data = {
... "a": [1, 2, 3, 1],
... "b": ["x", "y", "z", "x"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_is_unique(df_native: IntoDataFrame) -> IntoSeries:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.is_unique().to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_is_unique:
>>> agnostic_is_unique(df_pd)
0 False
1 True
2 True
3 False
dtype: bool
>>> agnostic_is_unique(df_pl)
shape: (4,)
Series: '' [bool]
[
false
true
true
false
]
>>> agnostic_is_unique(df_pa)
<pyarrow.lib.ChunkedArray object at ...>
[
[
false,
true,
true,
false
]
]
item(row=None, column=None)
Return the DataFrame as a scalar, or return the element at the given row/column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
row
|
int | None
|
The n-th row. |
None
|
column
|
int | str | None
|
The column selected via an integer or a string (column name). |
None
|
Returns:
Type | Description |
---|---|
Any
|
A scalar or the specified element in the dataframe. |
Notes
If row and column are not provided, this is equivalent to df[0,0], with a check that the shape is (1,1). With row and column, this is equivalent to df[row,col].
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function that returns the item at a given row/column:
>>> def agnostic_item(
...     df_native: IntoDataFrame, row: int | None, column: int | str | None
... ):
...     df = nw.from_native(df_native, eager_only=True)
...     return df.item(row, column)
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_item:
>>> agnostic_item(df_pd, 1, 1), agnostic_item(df_pd, 2, "b")
(np.int64(5), np.int64(6))
>>> agnostic_item(df_pl, 1, 1), agnostic_item(df_pl, 2, "b")
(5, 6)
>>> agnostic_item(df_pa, 1, 1), agnostic_item(df_pa, 2, "b")
(5, 6)
iter_rows(*, named=False, buffer_size=512)
Return an iterator over the rows of the DataFrame as Python-native values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
named
|
bool
|
By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead. |
False
|
buffer_size
|
int
|
Determines the number of rows that are buffered internally while iterating over the data. See https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.iter_rows.html |
512
|
Returns:
Type | Description |
---|---|
Iterator[tuple[Any, ...]] | Iterator[dict[str, Any]]
|
An iterator over the DataFrame of rows. |
Notes
cuDF doesn't support this method.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_iter_rows(df_native: IntoDataFrame, *, named: bool):
...     return nw.from_native(df_native, eager_only=True).iter_rows(named=named)
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_iter_rows:
>>> [row for row in agnostic_iter_rows(df_pd, named=False)]
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> [row for row in agnostic_iter_rows(df_pd, named=True)]
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> [row for row in agnostic_iter_rows(df_pl, named=False)]
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> [row for row in agnostic_iter_rows(df_pl, named=True)]
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> [row for row in agnostic_iter_rows(df_pa, named=False)]
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> [row for row in agnostic_iter_rows(df_pa, named=True)]
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
join(other, on=None, how='inner', *, left_on=None, right_on=None, suffix='_right')
Join in SQL-like fashion.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other
|
Self
|
DataFrame to join with. |
required |
on
|
str | list[str] | None
|
Name(s) of the join columns in both DataFrames. If set, |
None
|
how
|
Literal['inner', 'left', 'cross', 'semi', 'anti']
|
Join strategy.
|
'inner'
|
left_on
|
str | list[str] | None
|
Join column of the left DataFrame. |
None
|
right_on
|
str | list[str] | None
|
Join column of the right DataFrame. |
None
|
suffix
|
str
|
Suffix to append to columns with a duplicate name. |
'_right'
|
Returns:
Type | Description |
---|---|
Self
|
A new joined DataFrame |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ["a", "b", "c"],
... }
>>> data_other = {
... "apple": ["x", "y", "z"],
... "ham": ["a", "b", "d"],
... }
>>> df_pd = pd.DataFrame(data)
>>> other_pd = pd.DataFrame(data_other)
>>> df_pl = pl.DataFrame(data)
>>> other_pl = pl.DataFrame(data_other)
>>> df_pa = pa.table(data)
>>> other_pa = pa.table(data_other)
Let's define a dataframe-agnostic function in which we join on the "ham" column:
>>> def agnostic_join_on_ham(
...     df_native: IntoFrameT, other_native: IntoFrameT
... ) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     other = nw.from_native(other_native)
...     return df.join(other, left_on="ham", right_on="ham").to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_join_on_ham:
>>> agnostic_join_on_ham(df_pd, other_pd)
foo bar ham apple
0 1 6.0 a x
1 2 7.0 b y
>>> agnostic_join_on_ham(df_pl, other_pl)
shape: (2, 4)
┌─────┬─────┬─────┬───────┐
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ str │
╞═════╪═════╪═════╪═══════╡
│ 1 ┆ 6.0 ┆ a ┆ x │
│ 2 ┆ 7.0 ┆ b ┆ y │
└─────┴─────┴─────┴───────┘
>>> agnostic_join_on_ham(df_pa, other_pa)
pyarrow.Table
foo: int64
bar: double
ham: string
apple: string
----
foo: [[1,2]]
bar: [[6,7]]
ham: [["a","b"]]
apple: [["x","y"]]
join_asof(other, *, left_on=None, right_on=None, on=None, by_left=None, by_right=None, by=None, strategy='backward')
Perform an asof join.
This is similar to a left-join except that we match on nearest key rather than equal keys.
Both DataFrames must be sorted by the asof_join key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Self | DataFrame to join with. | required |
left_on | str \| None | Name of the left join column. | None |
right_on | str \| None | Name of the right join column. | None |
on | str \| None | Join column of both DataFrames. If set, left_on and right_on should be None. | None |
by_left | str \| list[str] \| None | Column(s) of the left DataFrame to join on before doing the asof join. | None |
by_right | str \| list[str] \| None | Column(s) of the right DataFrame to join on before doing the asof join. | None |
by | str \| list[str] \| None | Column(s) of both DataFrames to join on before doing the asof join. | None |
strategy | Literal['backward', 'forward', 'nearest'] | Join strategy. The default is "backward". | 'backward' |
Returns:
Type | Description |
---|---|
Self | A new joined DataFrame. |
Examples:
>>> from datetime import datetime
>>> from typing import Literal
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data_gdp = {
... "datetime": [
... datetime(2016, 1, 1),
... datetime(2017, 1, 1),
... datetime(2018, 1, 1),
... datetime(2019, 1, 1),
... datetime(2020, 1, 1),
... ],
... "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
... "datetime": [
... datetime(2016, 3, 1),
... datetime(2018, 8, 1),
... datetime(2019, 1, 1),
... ],
... "population": [82.19, 82.66, 83.12],
... }
>>> gdp_pd = pd.DataFrame(data_gdp)
>>> population_pd = pd.DataFrame(data_population)
>>> gdp_pl = pl.DataFrame(data_gdp).sort("datetime")
>>> population_pl = pl.DataFrame(data_population).sort("datetime")
Let's define a dataframe-agnostic function in which we join on the "datetime" column:
>>> def agnostic_join_asof_datetime(
...     df_native: IntoFrameT,
...     other_native: IntoFrameT,
...     strategy: Literal["backward", "forward", "nearest"],
... ) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     other = nw.from_native(other_native)
...     return df.join_asof(other, on="datetime", strategy=strategy).to_native()
We can then pass any supported library such as pandas or Polars to agnostic_join_asof_datetime:
>>> agnostic_join_asof_datetime(population_pd, gdp_pd, strategy="backward")
datetime population gdp
0 2016-03-01 82.19 4164
1 2018-08-01 82.66 4566
2 2019-01-01 83.12 4696
>>> agnostic_join_asof_datetime(population_pl, gdp_pl, strategy="backward")
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime ┆ population ┆ gdp │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i64 │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19 ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66 ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12 ┆ 4696 │
└─────────────────────┴────────────┴──────┘
Here is a real-world time-series example that uses the by argument.
>>> from datetime import datetime
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data_quotes = {
... "datetime": [
... datetime(2016, 5, 25, 13, 30, 0, 23),
... datetime(2016, 5, 25, 13, 30, 0, 23),
... datetime(2016, 5, 25, 13, 30, 0, 30),
... datetime(2016, 5, 25, 13, 30, 0, 41),
... datetime(2016, 5, 25, 13, 30, 0, 48),
... datetime(2016, 5, 25, 13, 30, 0, 49),
... datetime(2016, 5, 25, 13, 30, 0, 72),
... datetime(2016, 5, 25, 13, 30, 0, 75),
... ],
... "ticker": [
... "GOOG",
... "MSFT",
... "MSFT",
... "MSFT",
... "GOOG",
... "AAPL",
... "GOOG",
... "MSFT",
... ],
... "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
... "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03],
... }
>>> data_trades = {
... "datetime": [
... datetime(2016, 5, 25, 13, 30, 0, 23),
... datetime(2016, 5, 25, 13, 30, 0, 38),
... datetime(2016, 5, 25, 13, 30, 0, 48),
... datetime(2016, 5, 25, 13, 30, 0, 48),
... datetime(2016, 5, 25, 13, 30, 0, 48),
... ],
... "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
... "price": [51.95, 51.95, 720.77, 720.92, 98.0],
... "quantity": [75, 155, 100, 100, 100],
... }
>>> quotes_pd = pd.DataFrame(data_quotes)
>>> trades_pd = pd.DataFrame(data_trades)
>>> quotes_pl = pl.DataFrame(data_quotes).sort("datetime")
>>> trades_pl = pl.DataFrame(data_trades).sort("datetime")
Let's define a dataframe-agnostic function in which we join on the "datetime" column, matching rows by the "ticker" column:
>>> def agnostic_join_asof_datetime_by_ticker(
...     df_native: IntoFrameT, other_native: IntoFrameT
... ) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     other = nw.from_native(other_native)
...     return df.join_asof(other, on="datetime", by="ticker").to_native()
We can now pass either pandas or Polars to the function:
>>> agnostic_join_asof_datetime_by_ticker(trades_pd, quotes_pd)
datetime ticker price quantity bid ask
0 2016-05-25 13:30:00.000023 MSFT 51.95 75 51.95 51.96
1 2016-05-25 13:30:00.000038 MSFT 51.95 155 51.97 51.98
2 2016-05-25 13:30:00.000048 GOOG 720.77 100 720.50 720.93
3 2016-05-25 13:30:00.000048 GOOG 720.92 100 720.50 720.93
4 2016-05-25 13:30:00.000048 AAPL 98.00 100 NaN NaN
>>> agnostic_join_asof_datetime_by_ticker(trades_pl, quotes_pl)
shape: (5, 6)
┌────────────────────────────┬────────┬────────┬──────────┬───────┬────────┐
│ datetime ┆ ticker ┆ price ┆ quantity ┆ bid ┆ ask │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ f64 ┆ i64 ┆ f64 ┆ f64 │
╞════════════════════════════╪════════╪════════╪══════════╪═══════╪════════╡
│ 2016-05-25 13:30:00.000023 ┆ MSFT ┆ 51.95 ┆ 75 ┆ 51.95 ┆ 51.96 │
│ 2016-05-25 13:30:00.000038 ┆ MSFT ┆ 51.95 ┆ 155 ┆ 51.97 ┆ 51.98 │
│ 2016-05-25 13:30:00.000048 ┆ GOOG ┆ 720.77 ┆ 100 ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ GOOG ┆ 720.92 ┆ 100 ┆ 720.5 ┆ 720.93 │
│ 2016-05-25 13:30:00.000048 ┆ AAPL ┆ 98.0 ┆ 100 ┆ null ┆ null │
└────────────────────────────┴────────┴────────┴──────────┴───────┴────────┘
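To make the three strategy values more concrete, here is a conceptual pure-Python sketch of how each one could pick a row from the right frame for a given left key. It is not how Narwhals or any backend implements join_asof; the helper name asof_match, the fractional-year keys and the tie-breaking rule for "nearest" are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def asof_match(left_keys, right_keys, strategy="backward"):
    """For each left key, return the index of the matched right key, or None.

    Conceptual sketch only; `right_keys` must be sorted in ascending order.
    """
    matches = []
    for key in left_keys:
        back = bisect_right(right_keys, key) - 1  # last right key <= key
        fwd = bisect_left(right_keys, key)  # first right key >= key
        back_idx = back if back >= 0 else None
        fwd_idx = fwd if fwd < len(right_keys) else None
        if strategy == "backward":
            matches.append(back_idx)
        elif strategy == "forward":
            matches.append(fwd_idx)
        elif strategy == "nearest":
            if back_idx is None or fwd_idx is None:
                matches.append(back_idx if fwd_idx is None else fwd_idx)
            else:
                # Ties go to the earlier key in this sketch; backends may differ.
                back_gap = key - right_keys[back_idx]
                fwd_gap = right_keys[fwd_idx] - key
                matches.append(back_idx if back_gap <= fwd_gap else fwd_idx)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return matches


# Keys written as fractional years, loosely mirroring the GDP example above.
gdp_years = [2016.0, 2017.0, 2018.0, 2019.0, 2020.0]
population_years = [2016.2, 2018.6, 2019.0]

backward = asof_match(population_years, gdp_years, "backward")
forward = asof_match(population_years, gdp_years, "forward")
nearest = asof_match(population_years, gdp_years, "nearest")
```

With "backward", the matched positions are 0, 2 and 3, which correspond to the 2016, 2018 and 2019 GDP rows shown in the first example.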
lazy()
Lazify the DataFrame (if possible).
If a library does not support lazy execution, then this is a no-op.
Returns:
Type | Description |
---|---|
LazyFrame[Any] | A new LazyFrame. |
Examples:
Construct pandas, Polars and PyArrow DataFrames:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_lazy(df_native: IntoFrame) -> IntoFrame:
...     df = nw.from_native(df_native)
...     return df.lazy().to_native()
Note that the pandas and PyArrow dataframes stay eager, whereas the Polars DataFrame becomes a Polars LazyFrame:
>>> agnostic_lazy(df_pd)
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
>>> agnostic_lazy(df_pl)
<LazyFrame ...>
>>> agnostic_lazy(df_pa)
pyarrow.Table
foo: int64
bar: double
ham: string
----
foo: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]
null_count()
Create a new DataFrame that shows the null counts per column.
Returns:
Type | Description |
---|---|
Self | A dataframe of shape (1, n_columns). |
Notes
pandas handles null values differently from Polars and PyArrow. See null_handling for reference.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "foo": [1, None, 3],
... "bar": [6, 7, None],
... "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function that returns the null count of each column:
>>> def agnostic_null_count(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.null_count().to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_null_count:
>>> agnostic_null_count(df_pd)
foo bar ham
0 1 1 0
>>> agnostic_null_count(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 │
╞═════╪═════╪═════╡
│ 1 ┆ 1 ┆ 0 │
└─────┴─────┴─────┘
>>> agnostic_null_count(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: int64
----
foo: [[1]]
bar: [[1]]
ham: [[0]]
pipe(function, *args, **kwargs)
Pipe function call.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
function | Callable[[Any], Self] | Function to apply. | required |
args | Any | Positional arguments to pass to function. | () |
kwargs | Any | Keyword arguments to pass to function. | {} |
Returns:
Type | Description |
---|---|
Self | The original object with the function applied. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2, 3], "ba": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_pipe(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.pipe(
...         lambda _df: _df.select(
...             [x for x in _df.columns if len(x) == 1]
...         ).to_native()
...     )
We can then pass either pandas, Polars or PyArrow to agnostic_pipe:
>>> agnostic_pipe(df_pd)
a
0 1
1 2
2 3
>>> agnostic_pipe(df_pl)
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
└─────┘
>>> agnostic_pipe(df_pa)
pyarrow.Table
a: int64
----
a: [[1,2,3]]
pivot(on, *, index=None, values=None, aggregate_function=None, maintain_order=None, sort_columns=False, separator='_')
Create a spreadsheet-style pivot table as a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
on | str \| list[str] | Name of the column(s) whose values will be used as the header of the output DataFrame. | required |
index | str \| list[str] \| None | One or multiple keys to group by. If None, all remaining columns not specified in on and values will be used. At least one of index and values must be specified. | None |
values | str \| list[str] \| None | Column(s) whose values are aggregated into the new columns. If None, all remaining columns not specified in on and index will be used. At least one of index and values must be specified. | None |
aggregate_function | Literal['min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'] \| None | Aggregation to apply. Choose from None (no aggregation) or one of the function names listed in the type. | None |
maintain_order | bool \| None | Has no effect and is kept around only for backwards-compatibility. | None |
sort_columns | bool | Sort the transposed columns by name. Default is by order of discovery. | False |
separator | str | Used as separator/delimiter in generated column names in case of multiple values columns. | '_' |
Returns:
Type | Description |
---|---|
Self | A new dataframe. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> data = {
... "ix": [1, 1, 2, 2, 1, 2],
... "col": ["a", "a", "a", "a", "b", "b"],
... "foo": [0, 1, 2, 2, 7, 1],
... "bar": [0, 2, 0, 0, 9, 4],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_pivot(df_native: IntoDataFrameT) -> IntoDataFrameT:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.pivot("col", index="ix", aggregate_function="sum").to_native()
We can then pass any supported library such as pandas or Polars to agnostic_pivot:
>>> agnostic_pivot(df_pd)
ix foo_a foo_b bar_a bar_b
0 1 1 7 2 9
1 2 4 1 0 4
>>> agnostic_pivot(df_pl)
shape: (2, 5)
┌─────┬───────┬───────┬───────┬───────┐
│ ix ┆ foo_a ┆ foo_b ┆ bar_a ┆ bar_b │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═══════╪═══════╪═══════╪═══════╡
│ 1 ┆ 1 ┆ 7 ┆ 2 ┆ 9 │
│ 2 ┆ 4 ┆ 1 ┆ 0 ┆ 4 │
└─────┴───────┴───────┴───────┴───────┘
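To see where the numbers and column names in the output above come from, here is a conceptual pure-Python sketch of a sum-aggregated pivot over the same data. It is not Narwhals' implementation; the helper pivot_sum is hypothetical and only supports a single on column and a single index column.

```python
from collections import defaultdict

data = {
    "ix": [1, 1, 2, 2, 1, 2],
    "col": ["a", "a", "a", "a", "b", "b"],
    "foo": [0, 1, 2, 2, 7, 1],
    "bar": [0, 2, 0, 0, 9, 4],
}
# Turn the column-oriented dict into a list of row dicts.
rows = [dict(zip(data, values)) for values in zip(*data.values())]


def pivot_sum(rows, on, index, values, separator="_"):
    """Hypothetical helper: group by `index`, spread `on` into columns, sum."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for value_col in values:
            # New column names combine the value column and the `on` value.
            name = f"{value_col}{separator}{row[on]}"
            table[row[index]][name] += row[value_col]
    return [{index: key, **cols} for key, cols in table.items()]


pivoted = pivot_sum(rows, on="col", index="ix", values=["foo", "bar"])
```

Each output row corresponds to one value of ix, and each new column such as foo_a holds the sum of foo over the rows where col is "a".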
rename(mapping)
Rename column names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mapping | dict[str, str] | Key value pairs that map from old name to new name. | required |
Returns:
Type | Description |
---|---|
Self | The dataframe with the specified columns renamed. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_rename(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).rename({"foo": "apple"}).to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_rename:
>>> agnostic_rename(df_pd)
apple bar ham
0 1 6 a
1 2 7 b
2 3 8 c
>>> agnostic_rename(df_pl)
shape: (3, 3)
┌───────┬─────┬─────┐
│ apple ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
│ 2 ┆ 7 ┆ b │
│ 3 ┆ 8 ┆ c │
└───────┴─────┴─────┘
>>> agnostic_rename(df_pa)
pyarrow.Table
apple: int64
bar: int64
ham: string
----
apple: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]
row(index)
Get values at given row.
Warning
You should NEVER use this method to iterate over a DataFrame; if you require row-iteration you should strongly prefer use of iter_rows() instead.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int | Row number. | required |
Returns:
Type | Description |
---|---|
tuple[Any, ...] | A tuple of the values in the selected row. |
Notes
cuDF doesn't support this method.
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> from narwhals.typing import IntoDataFrame
>>> from typing import Any
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a library-agnostic function to get the second row.
>>> def agnostic_row(df_native: IntoDataFrame) -> tuple[Any, ...]:
...     return nw.from_native(df_native).row(1)
We can then pass either pandas, Polars or PyArrow to agnostic_row:
>>> agnostic_row(df_pd)
(2, 5)
>>> agnostic_row(df_pl)
(2, 5)
>>> agnostic_row(df_pa)
(<pyarrow.Int64Scalar: 2>, <pyarrow.Int64Scalar: 5>)
rows(*, named=False)
Returns all data in the DataFrame as a list of rows of python-native values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
named | bool | By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead. | False |
Returns:
Type | Description |
---|---|
list[tuple[Any, ...]] \| list[dict[str, Any]] | The data as a list of rows. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_rows(df_native: IntoDataFrame, *, named: bool):
...     return nw.from_native(df_native, eager_only=True).rows(named=named)
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_rows:
>>> agnostic_rows(df_pd, named=False)
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> agnostic_rows(df_pd, named=True)
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> agnostic_rows(df_pl, named=False)
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> agnostic_rows(df_pl, named=True)
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
>>> agnostic_rows(df_pa, named=False)
[(1, 6.0, 'a'), (2, 7.0, 'b'), (3, 8.0, 'c')]
>>> agnostic_rows(df_pa, named=True)
[{'foo': 1, 'bar': 6.0, 'ham': 'a'}, {'foo': 2, 'bar': 7.0, 'ham': 'b'}, {'foo': 3, 'bar': 8.0, 'ham': 'c'}]
sample(n=None, *, fraction=None, with_replacement=False, seed=None)
Sample from this DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int \| None | Number of items to return. Cannot be used with fraction. | None |
fraction | float \| None | Fraction of items to return. Cannot be used with n. | None |
with_replacement | bool | Allow values to be sampled more than once. | False |
seed | int \| None | Seed for the random number generator. If set to None (default), a random seed is generated for each sample operation. | None |
Returns:
Type | Description |
---|---|
Self | A new dataframe. |
Notes
The results may not be consistent across libraries.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> data = {"a": [1, 2, 3, 4], "b": ["x", "y", "x", "y"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_sample(df_native: IntoDataFrameT) -> IntoDataFrameT:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.sample(n=2, seed=123).to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_sample:
>>> agnostic_sample(df_pd)
a b
3 4 y
0 1 x
>>> agnostic_sample(df_pl)
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 2 ┆ y │
│ 3 ┆ x │
└─────┴─────┘
>>> agnostic_sample(df_pa)
pyarrow.Table
a: int64
b: string
----
a: [[1,3]]
b: [["x","x"]]
As you can see, using the same seed gives consistent results within the same backend, but not necessarily across different backends.
select(*exprs, **named_exprs)
Select columns from this DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*exprs | IntoExpr \| Iterable[IntoExpr] | Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals. | () |
**named_exprs | IntoExpr | Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used. | {} |
Returns:
Type | Description |
---|---|
Self | The dataframe containing only the selected columns. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ["a", "b", "c"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function in which we pass the name of a column to select that column.
>>> def agnostic_single_select(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).select("foo").to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_single_select:
>>> agnostic_single_select(df_pd)
foo
0 1
1 2
2 3
>>> agnostic_single_select(df_pl)
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
│ 3 │
└─────┘
>>> agnostic_single_select(df_pa)
pyarrow.Table
foo: int64
----
foo: [[1,2,3]]
Multiple columns can be selected by passing a list of column names.
>>> def agnostic_multi_select(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).select(["foo", "bar"]).to_native()
>>> agnostic_multi_select(df_pd)
foo bar
0 1 6
1 2 7
2 3 8
>>> agnostic_multi_select(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 6 │
│ 2 ┆ 7 │
│ 3 ┆ 8 │
└─────┴─────┘
>>> agnostic_multi_select(df_pa)
pyarrow.Table
foo: int64
bar: int64
----
foo: [[1,2,3]]
bar: [[6,7,8]]
Multiple columns can also be selected using positional arguments instead of a list. Expressions are also accepted.
>>> def agnostic_select(df_native: IntoFrameT) -> IntoFrameT:
...     return (
...         nw.from_native(df_native)
...         .select(nw.col("foo"), nw.col("bar") + 1)
...         .to_native()
...     )
>>> agnostic_select(df_pd)
foo bar
0 1 7
1 2 8
2 3 9
>>> agnostic_select(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 7 │
│ 2 ┆ 8 │
│ 3 ┆ 9 │
└─────┴─────┘
>>> agnostic_select(df_pa)
pyarrow.Table
foo: int64
bar: int64
----
foo: [[1,2,3]]
bar: [[7,8,9]]
Use keyword arguments to easily name your expression inputs.
>>> def agnostic_select_w_kwargs(df_native: IntoFrameT) -> IntoFrameT:
...     return (
...         nw.from_native(df_native)
...         .select(threshold=nw.col("foo") * 2)
...         .to_native()
...     )
>>> agnostic_select_w_kwargs(df_pd)
threshold
0 2
1 4
2 6
>>> agnostic_select_w_kwargs(df_pl)
shape: (3, 1)
┌───────────┐
│ threshold │
│ --- │
│ i64 │
╞═══════════╡
│ 2 │
│ 4 │
│ 6 │
└───────────┘
>>> agnostic_select_w_kwargs(df_pa)
pyarrow.Table
threshold: int64
----
threshold: [[2,4,6]]
sort(by, *more_by, descending=False, nulls_last=False)
Sort the dataframe by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
by | str \| Iterable[str] | Column name(s) to sort by. | required |
*more_by | str | Additional columns to sort by, specified as positional arguments. | () |
descending | bool \| Sequence[bool] | Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans. | False |
nulls_last | bool | Place null values last. | False |
Returns:
Type | Description |
---|---|
Self | The sorted dataframe. |
Warning
Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "a": [1, 2, None],
... "b": [6.0, 5.0, 4.0],
... "c": ["a", "c", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function in which we sort by multiple columns in different orders:
>>> def agnostic_sort(df_native: IntoFrameT) -> IntoFrameT:
...     df = nw.from_native(df_native)
...     return df.sort("c", "a", descending=[False, True]).to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_sort:
>>> agnostic_sort(df_pd)
a b c
0 1.0 6.0 a
2 NaN 4.0 b
1 2.0 5.0 c
>>> agnostic_sort(df_pl)
shape: (3, 3)
┌──────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞══════╪═════╪═════╡
│ 1 ┆ 6.0 ┆ a │
│ null ┆ 4.0 ┆ b │
│ 2 ┆ 5.0 ┆ c │
└──────┴─────┴─────┘
>>> agnostic_sort(df_pa)
pyarrow.Table
a: int64
b: double
c: string
----
a: [[1,null,2]]
b: [[6,4,5]]
c: [["a","b","c"]]
tail(n=5)
Get the last n rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Number of rows to return. If a negative value is passed, return all rows except the first abs(n). | 5 |
Returns:
Type | Description |
---|---|
Self | A subset of the dataframe of shape (n, n_columns). |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "foo": [1, 2, 3, 4, 5],
... "bar": [6, 7, 8, 9, 10],
... "ham": ["a", "b", "c", "d", "e"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function that gets the last 3 rows.
>>> def agnostic_tail(df_native: IntoFrameT) -> IntoFrameT:
...     return nw.from_native(df_native).tail(3).to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_tail:
>>> agnostic_tail(df_pd)
foo bar ham
2 3 8 c
3 4 9 d
4 5 10 e
>>> agnostic_tail(df_pl)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 3 ┆ 8 ┆ c │
│ 4 ┆ 9 ┆ d │
│ 5 ┆ 10 ┆ e │
└─────┴─────┴─────┘
>>> agnostic_tail(df_pa)
pyarrow.Table
foo: int64
bar: int64
ham: string
----
foo: [[3,4,5]]
bar: [[8,9,10]]
ham: [["c","d","e"]]
to_arrow()
Convert to a PyArrow table.
Returns:
Type | Description |
---|---|
Table | A new PyArrow table. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function that converts to arrow table:
>>> def agnostic_to_arrow(df_native: IntoDataFrame) -> pa.Table:
...     df = nw.from_native(df_native, eager_only=True)
...     return df.to_arrow()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_to_arrow:
>>> agnostic_to_arrow(df_pd)
pyarrow.Table
foo: int64
bar: string
----
foo: [[1,2,3]]
bar: [["a","b","c"]]
>>> agnostic_to_arrow(df_pl)
pyarrow.Table
foo: int64
bar: large_string
----
foo: [[1,2,3]]
bar: [["a","b","c"]]
>>> agnostic_to_arrow(df_pa)
pyarrow.Table
foo: int64
bar: string
----
foo: [[1,2,3]]
bar: [["a","b","c"]]
to_dict(*, as_series=True)
Convert DataFrame to a dictionary mapping column name to values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
as_series | bool | If set to True, the values are Narwhals Series; otherwise they are lists of values. | True |
Returns:
Type | Description |
---|---|
dict[str, Series[Any]] \| dict[str, list[Any]] | A mapping from column name to values / Series. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {
... "A": [1, 2, 3, 4, 5],
... "fruits": ["banana", "banana", "apple", "apple", "banana"],
... "B": [5, 4, 3, 2, 1],
... "animals": ["beetle", "fly", "beetle", "beetle", "beetle"],
... "optional": [28, 300, None, 2, -30],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_to_dict(
...     df_native: IntoDataFrame,
... ) -> dict[str, list[int | str | float | None]]:
...     df = nw.from_native(df_native)
...     return df.to_dict(as_series=False)
We can then pass either pandas, Polars or PyArrow to agnostic_to_dict:
>>> agnostic_to_dict(df_pd)
{'A': [1, 2, 3, 4, 5], 'fruits': ['banana', 'banana', 'apple', 'apple', 'banana'], 'B': [5, 4, 3, 2, 1], 'animals': ['beetle', 'fly', 'beetle', 'beetle', 'beetle'], 'optional': [28.0, 300.0, nan, 2.0, -30.0]}
>>> agnostic_to_dict(df_pl)
{'A': [1, 2, 3, 4, 5], 'fruits': ['banana', 'banana', 'apple', 'apple', 'banana'], 'B': [5, 4, 3, 2, 1], 'animals': ['beetle', 'fly', 'beetle', 'beetle', 'beetle'], 'optional': [28, 300, None, 2, -30]}
>>> agnostic_to_dict(df_pa)
{'A': [1, 2, 3, 4, 5], 'fruits': ['banana', 'banana', 'apple', 'apple', 'banana'], 'B': [5, 4, 3, 2, 1], 'animals': ['beetle', 'fly', 'beetle', 'beetle', 'beetle'], 'optional': [28, 300, None, 2, -30]}
to_native()
Convert Narwhals DataFrame to native one.
Returns:
Type | Description |
---|---|
DataFrameT | The native object, of the class that the user started with. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Calling to_native on a Narwhals DataFrame returns the native object:
>>> nw.from_native(df_pd).to_native()
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
>>> nw.from_native(df_pl).to_native()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 7.0 ┆ b │
│ 3 ┆ 8.0 ┆ c │
└─────┴─────┴─────┘
>>> nw.from_native(df_pa).to_native()
pyarrow.Table
foo: int64
bar: double
ham: string
----
foo: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]
to_numpy()
Convert this DataFrame to a NumPy ndarray.
Returns:
Type | Description |
---|---|
ndarray | A NumPy ndarray. |
Examples:
Construct pandas, Polars and PyArrow DataFrames:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> import numpy as np
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.5, 7.0, 8.5], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_to_numpy(df_native: IntoDataFrame) -> np.ndarray:
...     df = nw.from_native(df_native)
...     return df.to_numpy()
We can then pass either pandas, Polars or PyArrow to agnostic_to_numpy:
>>> agnostic_to_numpy(df_pd)
array([[1, 6.5, 'a'],
[2, 7.0, 'b'],
[3, 8.5, 'c']], dtype=object)
>>> agnostic_to_numpy(df_pl)
array([[1, 6.5, 'a'],
[2, 7.0, 'b'],
[3, 8.5, 'c']], dtype=object)
>>> agnostic_to_numpy(df_pa)
array([[1, 6.5, 'a'],
[2, 7.0, 'b'],
[3, 8.5, 'c']], dtype=object)
to_pandas()
Convert this DataFrame to a pandas DataFrame.
Returns:
Type | Description |
---|---|
DataFrame | A pandas DataFrame. |
Examples:
Construct pandas, Polars (eager) and PyArrow DataFrames:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_to_pandas(df_native: IntoDataFrame) -> pd.DataFrame:
...     df = nw.from_native(df_native)
...     return df.to_pandas()
We can then pass any supported library such as pandas, Polars (eager), or PyArrow to agnostic_to_pandas:
>>> agnostic_to_pandas(df_pd)
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
>>> agnostic_to_pandas(df_pl)
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
>>> agnostic_to_pandas(df_pa)
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
to_polars()
Convert this DataFrame to a Polars DataFrame.
Returns:
Type | Description |
---|---|
DataFrame | A Polars DataFrame. |
Examples:
Construct pandas, Polars (eager) and PyArrow DataFrames:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library-agnostic function:
>>> def agnostic_to_polars(df_native: IntoDataFrame) -> pl.DataFrame:
...     df = nw.from_native(df_native)
...     return df.to_polars()
We can then pass any supported library such as pandas, Polars (eager), or PyArrow to agnostic_to_polars:
>>> agnostic_to_polars(df_pd)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 7.0 ┆ b │
│ 3 ┆ 8.0 ┆ c │
└─────┴─────┴─────┘
>>> agnostic_to_polars(df_pl)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 7.0 ┆ b │
│ 3 ┆ 8.0 ┆ c │
└─────┴─────┴─────┘
>>> agnostic_to_polars(df_pa)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 7.0 ┆ b │
│ 3 ┆ 8.0 ┆ c │
└─────┴─────┴─────┘
unique(subset=None, *, keep='any', maintain_order=False)
Drop duplicate rows from this dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset
|
str | list[str] | None
|
Column name(s) to consider when identifying duplicate rows. |
None
|
keep
|
Literal['any', 'first', 'last', 'none']
|
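Which of the duplicate rows to keep. 'any': no guarantee of which row is kept, which allows more optimizations. 'none': don't keep duplicate rows. 'first': keep the first unique row. 'last': keep the last unique row.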
|
'any'
|
maintain_order
|
bool
|
Keep the same order as the original DataFrame. This may be more expensive to compute. |
False
|
Returns:
Type | Description |
---|---|
Self
|
The dataframe with the duplicate rows removed. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "foo": [1, 2, 3, 1],
... "bar": ["a", "a", "a", "a"],
... "ham": ["b", "b", "b", "b"],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library agnostic function:
>>> def agnostic_unique(df_native: IntoFrameT) -> IntoFrameT:
... return nw.from_native(df_native).unique(["bar", "ham"]).to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_unique:
>>> agnostic_unique(df_pd)
foo bar ham
0 1 a b
>>> agnostic_unique(df_pl)
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ a ┆ b │
└─────┴─────┴─────┘
>>> agnostic_unique(df_pa)
pyarrow.Table
foo: int64
bar: string
ham: string
----
foo: [[1]]
bar: [["a"]]
ham: [["b"]]
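The effect of the keep options is easiest to see on rows that differ outside the subset. The following is a plain-Python sketch of those semantics on a list of dicts, not Narwhals code: the helper name unique_rows is hypothetical, and treating 'any' like 'first' is an assumption made only for this illustration, since 'any' gives no guarantee about which row survives.

```python
from collections import Counter


def unique_rows(rows, subset, keep="any"):
    """Illustrate the `keep` options on a list of dicts (hypothetical helper)."""
    # Identify each row by its values in the subset columns.
    keys = [tuple(row[col] for col in subset) for row in rows]

    if keep == "none":
        # Drop every row whose key occurs more than once.
        counts = Counter(keys)
        return [row for row, key in zip(rows, keys) if counts[key] == 1]

    # For "last", scan from the end so the last occurrence is seen first.
    # "first" scans from the start; "any" is treated like "first" here
    # (an assumption of this sketch, the real method gives no guarantee).
    pairs = list(zip(rows, keys))
    if keep == "last":
        pairs.reverse()

    seen = set()
    kept = []
    for row, key in pairs:
        if key not in seen:
            seen.add(key)
            kept.append(row)

    # Restore the original row order after a reversed scan.
    if keep == "last":
        kept.reverse()
    return kept


rows = [
    {"foo": 1, "bar": "a"},
    {"foo": 2, "bar": "a"},
    {"foo": 3, "bar": "b"},
]
first = unique_rows(rows, ["bar"], keep="first")
last = unique_rows(rows, ["bar"], keep="last")
dropped = unique_rows(rows, ["bar"], keep="none")
```

Here first keeps the foo=1 row for bar="a", last keeps the foo=2 row, and dropped contains only the bar="b" row because every duplicated key is removed.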
unpivot(on=None, *, index=None, variable_name=None, value_name=None)
Unpivot a DataFrame from wide to long format.
Optionally leaves identifiers set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis leaving just two non-identifier columns, 'variable' and 'value'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
on
|
str | list[str] | None
|
Column(s) to use as values variables; if on is empty, all columns that are not in index will be used. |
None
|
index
|
str | list[str] | None
|
Column(s) to use as identifier variables. |
None
|
variable_name
|
str | None
|
Name to give to the variable column. Defaults to "variable". |
None
|
value_name
|
str | None
|
Name to give to the value column. Defaults to "value". |
None
|
Returns:
Type | Description |
---|---|
Self
|
The unpivoted dataframe. |
Notes
If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars.
In other frameworks, you might know this operation as pivot_longer.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "a": ["x", "y", "z"],
... "b": [1, 3, 5],
... "c": [2, 4, 6],
... }
We define a library agnostic function:
>>> def agnostic_unpivot(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.unpivot(on=["b", "c"], index="a").to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_unpivot:
>>> agnostic_unpivot(pl.DataFrame(data))
shape: (6, 3)
┌─────┬──────────┬───────┐
│ a ┆ variable ┆ value │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═════╪══════════╪═══════╡
│ x ┆ b ┆ 1 │
│ y ┆ b ┆ 3 │
│ z ┆ b ┆ 5 │
│ x ┆ c ┆ 2 │
│ y ┆ c ┆ 4 │
│ z ┆ c ┆ 6 │
└─────┴──────────┴───────┘
>>> agnostic_unpivot(pd.DataFrame(data))
a variable value
0 x b 1
1 y b 3
2 z b 5
3 x c 2
4 y c 4
5 z c 6
>>> agnostic_unpivot(pa.table(data))
pyarrow.Table
a: string
variable: string
value: int64
----
a: [["x","y","z"],["x","y","z"]]
variable: [["b","b","b"],["c","c","c"]]
value: [[1,3,5],[2,4,6]]
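To make the wide-to-long reshaping concrete, here is a plain-Python sketch on a dict of column lists. It is an illustration only, not how Narwhals or any backend implements unpivot: the helper name unpivot_columns is hypothetical, and it assumes that on=None means "every column not listed in index" and that the output column names default to "variable" and "value".

```python
def unpivot_columns(data, on=None, index=None,
                    variable_name="variable", value_name="value"):
    """Reshape a dict of equal-length column lists from wide to long (sketch)."""
    # Accept a single column name or a list of names for `index` and `on`.
    index = [index] if isinstance(index, str) else list(index or [])
    if on is None:
        # Assumption of this sketch: unpivot everything that is not an identifier.
        on = [col for col in data if col not in index]
    elif isinstance(on, str):
        on = [on]

    n_rows = len(next(iter(data.values()))) if data else 0

    result = {col: [] for col in index}
    result[variable_name] = []
    result[value_name] = []

    # Emit one block of rows per unpivoted column, repeating the identifiers.
    for col in on:
        for i in range(n_rows):
            for id_col in index:
                result[id_col].append(data[id_col][i])
            result[variable_name].append(col)
            result[value_name].append(data[col][i])
    return result


data = {"a": ["x", "y", "z"], "b": [1, 3, 5], "c": [2, 4, 6]}
long_form = unpivot_columns(data, on=["b", "c"], index="a")
```

With the data from the example above, long_form holds the identifier column a repeated once per unpivoted column, followed by the variable and value columns, in the same row order as the outputs shown.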
with_columns(*exprs, **named_exprs)
Add columns to this DataFrame.
Added columns will replace existing columns with the same name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*exprs
|
IntoExpr | Iterable[IntoExpr]
|
Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals. |
()
|
**named_exprs
|
IntoExpr
|
Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used. |
{}
|
Returns:
Name | Type | Description |
---|---|---|
DataFrame |
Self
|
A new DataFrame with the columns added. |
Note
Creating a new DataFrame using this method does not create a new copy of existing data.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {
... "a": [1, 2, 3, 4],
... "b": [0.5, 4, 10, 13],
... "c": [True, True, False, True],
... }
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function in which we pass an expression to add it as a new column:
>>> def agnostic_with_columns(df_native: IntoFrameT) -> IntoFrameT:
... return (
... nw.from_native(df_native)
... .with_columns((nw.col("a") * 2).alias("a*2"))
... .to_native()
... )
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_with_columns:
>>> agnostic_with_columns(df_pd)
a b c a*2
0 1 0.5 True 2
1 2 4.0 True 4
2 3 10.0 False 6
3 4 13.0 True 8
>>> agnostic_with_columns(df_pl)
shape: (4, 4)
┌─────┬──────┬───────┬─────┐
│ a ┆ b ┆ c ┆ a*2 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ bool ┆ i64 │
╞═════╪══════╪═══════╪═════╡
│ 1 ┆ 0.5 ┆ true ┆ 2 │
│ 2 ┆ 4.0 ┆ true ┆ 4 │
│ 3 ┆ 10.0 ┆ false ┆ 6 │
│ 4 ┆ 13.0 ┆ true ┆ 8 │
└─────┴──────┴───────┴─────┘
>>> agnostic_with_columns(df_pa)
pyarrow.Table
a: int64
b: double
c: bool
a*2: int64
----
a: [[1,2,3,4]]
b: [[0.5,4,10,13]]
c: [[true,true,false,true]]
a*2: [[2,4,6,8]]
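Two points from the description above, that a column with an existing name is replaced and that untouched columns are not copied, can be pictured with a plain-Python analogy. This is only a sketch on a dict of lists; the helper name with_columns_sketch is hypothetical and the code says nothing about how any backend actually stores its data.

```python
def with_columns_sketch(data, **named_columns):
    """Return a new mapping with columns added or replaced (analogy only)."""
    # Shallow copy: the new mapping points at the same column lists,
    # so existing column data is shared rather than duplicated.
    result = dict(data)
    # Keyword names become column names; an existing name is replaced.
    result.update(named_columns)
    return result


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
out = with_columns_sketch(
    data,
    b=[x * 10 for x in data["b"]],  # replaces the existing "b"
    c=[x * 2 for x in data["a"]],   # adds a new column "c"
)
shares_a = out["a"] is data["a"]
```

After the call, out has a replaced b and a new c, the original data mapping is unchanged, and shares_a is True because column a was never copied.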
with_row_index(name='index')
Insert column which enumerates rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The name of the column as a string. The default is "index". |
'index'
|
Returns:
Type | Description |
---|---|
Self
|
The original object with the column added. |
Examples:
Construct pandas, Polars and PyArrow DataFrames:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_with_row_index(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.with_row_index().to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to agnostic_with_row_index:
>>> agnostic_with_row_index(df_pd)
index a b
0 0 1 4
1 1 2 5
2 2 3 6
>>> agnostic_with_row_index(df_pl)
shape: (3, 3)
┌───────┬─────┬─────┐
│ index ┆ a ┆ b │
│ --- ┆ --- ┆ --- │
│ u32 ┆ i64 ┆ i64 │
╞═══════╪═════╪═════╡
│ 0 ┆ 1 ┆ 4 │
│ 1 ┆ 2 ┆ 5 │
│ 2 ┆ 3 ┆ 6 │
└───────┴─────┴─────┘
>>> agnostic_with_row_index(df_pa)
pyarrow.Table
index: int64
a: int64
b: int64
----
index: [[0,1,2]]
a: [[1,2,3]]
b: [[4,5,6]]
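As the outputs above show, the new column comes first and counts rows from 0. A plain-Python sketch of that behaviour on a dict of column lists might look like the following; the helper name with_row_index_sketch is hypothetical and this is not Narwhals code.

```python
def with_row_index_sketch(data, name="index"):
    """Prepend a 0-based row counter to a dict of equal-length column lists."""
    n_rows = len(next(iter(data.values()))) if data else 0
    # Build a new mapping so the counter column is the first key.
    return {name: list(range(n_rows)), **data}


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
indexed = with_row_index_sketch(data)
renamed = with_row_index_sketch(data, name="row_nr")
```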
write_csv(file=None)
Write dataframe to comma-separated values (CSV) file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file
|
str | Path | BytesIO | None
|
String, path object or file-like object to which the dataframe will be written. If None, the resulting csv format is returned as a string. |
None
|
Returns:
Type | Description |
---|---|
str | None
|
The CSV data as a string if file is None, otherwise None. |
Examples:
Construct pandas, Polars (eager) and PyArrow DataFrames:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library agnostic function:
>>> def agnostic_write_csv(df_native: IntoDataFrame) -> str:
... df = nw.from_native(df_native)
... return df.write_csv()
We can pass any supported library such as pandas, Polars or PyArrow to agnostic_write_csv:
>>> agnostic_write_csv(df_pd)
'foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n'
>>> agnostic_write_csv(df_pl)
'foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n'
>>> agnostic_write_csv(df_pa)
'"foo","bar","ham"\n1,6,"a"\n2,7,"b"\n3,8,"c"\n'
If we had passed a file name to write_csv, it would have been written to that file.
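The two return modes (a string when file is None, nothing when a target is given) can be sketched with the standard library alone. This is an illustration of the contract, not Narwhals' implementation: the helper name write_csv_sketch is hypothetical, it accepts only a path (not a file-like object), and the "\n" line terminator is chosen to match the pandas and Polars outputs shown above.

```python
import csv
import io
import os
import tempfile


def write_csv_sketch(data, file=None):
    """Serialise a dict of column lists to CSV (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(data.keys())           # header row
    writer.writerows(zip(*data.values()))  # one line per row
    text = buffer.getvalue()

    if file is None:
        # No target given: hand the CSV back as a string.
        return text

    # Target given: write to it and return None.
    with open(file, "w", encoding="utf-8", newline="") as fh:
        fh.write(text)
    return None


data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
csv_text = write_csv_sketch(data)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    returned = write_csv_sketch(data, path)
    with open(path, encoding="utf-8", newline="") as fh:
        on_disk = fh.read()
```

In this sketch csv_text equals the string returned for the pandas and Polars inputs above, returned is None, and on_disk holds the same text that would otherwise have been returned.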
write_parquet(file)
Write dataframe to parquet file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file
|
str | Path | BytesIO
|
String, path object or file-like object to which the dataframe will be written. |
required |
Returns:
Type | Description |
---|---|
None
|
None. |
Examples:
Construct pandas, Polars and PyArrow DataFrames:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library agnostic function:
>>> def agnostic_write_parquet(df_native: IntoDataFrame) -> None:
... df = nw.from_native(df_native)
... df.write_parquet("foo.parquet")
We can then pass either pandas, Polars or PyArrow to agnostic_write_parquet:
>>> agnostic_write_parquet(df_pd)
>>> agnostic_write_parquet(df_pl)
>>> agnostic_write_parquet(df_pa)