narwhals.DataFrame
Narwhals DataFrame, backed by a native eager dataframe.
Warning
This class is not meant to be instantiated directly - instead:
- If the native object is an eager dataframe from one of the supported backends (e.g. pandas.DataFrame, polars.DataFrame, pyarrow.Table), you can use narwhals.from_native:
  narwhals.from_native(native_dataframe)
  narwhals.from_native(native_dataframe, eager_only=True)
- If the object is a mapping of column names to generic sequences (e.g. dict[str, list]), you can create a DataFrame via narwhals.from_dict:
  narwhals.from_dict(
      data={"a": [1, 2, 3]},
      native_namespace=narwhals.get_native_namespace(another_object),
  )
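For example, wrapping a native pandas frame and converting back (a minimal sketch, assuming pandas is installed):
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2, 3]})
>>> df = nw.from_native(df_native, eager_only=True)
>>> df.select(nw.col("a") + 1).to_native()
   a
0  2
1  3
2  4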
columns: list[str]
property
Get column names.
Returns:
| Type | Description |
|---|---|
| `list[str]` | The column names stored in a list. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> nw.from_native(df_native).columns
['foo', 'bar']
implementation: Implementation
property
Return implementation of native frame.
This can be useful when you need to use special-casing for features outside of Narwhals' scope - for example, when dealing with pandas' Period Dtype.
Returns:
| Type | Description |
|---|---|
| `Implementation` | Implementation. |
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> df_native = pd.DataFrame({"a": [1, 2, 3]})
>>> df = nw.from_native(df_native)
>>> df.implementation
<Implementation.PANDAS: 1>
>>> df.implementation.is_pandas()
True
>>> df.implementation.is_pandas_like()
True
>>> df.implementation.is_polars()
False
schema: Schema
property
Get an ordered mapping of column names to their data type.
Returns:
| Type | Description |
|---|---|
| `Schema` | A Narwhals Schema object that maps column names to data types. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> nw.from_native(df_native).schema
Schema({'foo': Int64, 'bar': Float64})
shape: tuple[int, int]
property
Get the shape of the DataFrame.
Returns:
| Type | Description |
|---|---|
| `tuple[int, int]` | The shape of the dataframe as a tuple. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [1, 2]})
>>> df = nw.from_native(df_native)
>>> df.shape
(2, 1)
__arrow_c_stream__(requested_schema: object | None = None) -> object
Export a DataFrame via the Arrow PyCapsule Interface.
- if the underlying dataframe implements the interface, it'll return that
- else, it'll call to_arrow and then defer to PyArrow's implementation
See PyCapsule Interface for more.
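Examples:
A sketch of what this enables: assuming a PyArrow version recent enough to consume the PyCapsule Interface (pyarrow>=14), a Narwhals DataFrame can be passed directly to pyarrow.table.
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df = nw.from_native(pa.table({"a": [1, 2]}))
>>> pa.table(df)
pyarrow.Table
a: int64
----
a: [[1,2]]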
__getitem__(item: str | int | slice | Sequence[int] | Sequence[str] | _1DArray | tuple[slice | Sequence[int] | _1DArray, int | str] | tuple[slice | Sequence[int] | _1DArray, slice | Sequence[int] | Sequence[str]]) -> Series[Any] | Self
Extract column or slice of DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `str \| int \| slice \| Sequence[int] \| Sequence[str] \| _1DArray \| tuple[slice \| Sequence[int] \| _1DArray, int \| str] \| tuple[slice \| Sequence[int] \| _1DArray, slice \| Sequence[int] \| Sequence[str]]` | How to slice the dataframe. What happens depends on what is passed: a string selects a column by name, while integers, slices, sequences, and two-element (rows, columns) tuples perform positional selection. See the notes and examples below. | required |
Returns:
| Type | Description |
|---|---|
| `Series[Any] \| Self` | A Narwhals Series (for a single-column selection) or DataFrame, backed by the corresponding native object. |
Notes
- Integers are always interpreted as positions.
- Strings are always interpreted as column names.
In contrast with Polars, pandas allows non-string column names. If you don't know whether the column name you're trying to extract is definitely a string (e.g. df[df.columns[0]]), then you should use DataFrame.get_column instead.
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2]})
>>> df = nw.from_native(df_native)
>>> df["a"].to_native()
0 1
1 2
Name: a, dtype: int64
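A row slice returns a DataFrame rather than a Series; continuing the sketch above:
>>> df[0:1].to_native()
   a
0  1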
clone() -> Self
Create a copy of this DataFrame.
Returns:
| Type | Description |
|---|---|
| `Self` | An identical copy of the original dataframe. |
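Examples:
A minimal sketch, assuming a pandas-backed frame:
>>> import pandas as pd
>>> import narwhals as nw
>>> df = nw.from_native(pd.DataFrame({"a": [1, 2]}))
>>> df.clone().to_native()
   a
0  1
1  2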
collect_schema() -> Schema
Get an ordered mapping of column names to their data type.
Returns:
| Type | Description |
|---|---|
| `Schema` | A Narwhals Schema object that maps column names to data types. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> nw.from_native(df_native).collect_schema()
Schema({'foo': Int64, 'bar': Float64})
drop(*columns: str | Iterable[str], strict: bool = True) -> Self
Remove columns from the dataframe.
Returns:
| Type | Description |
|---|---|
| `Self` | The dataframe with the specified columns removed. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*columns` | `str \| Iterable[str]` | Names of the columns that should be removed from the dataframe. | `()` |
| `strict` | `bool` | Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema. | `True` |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
... {"foo": [1, 2], "bar": [6.0, 7.0], "ham": ["a", "b"]}
... )
>>> nw.from_native(df_native).drop("ham").to_native()
foo bar
0 1 6.0
1 2 7.0
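With strict=False, column names missing from the schema are ignored instead of raising; a quick sketch ("pork" is a hypothetical, non-existent column):
>>> nw.from_native(df_native).drop("pork", strict=False).columns
['foo', 'bar', 'ham']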
drop_nulls(subset: str | list[str] | None = None) -> Self
Drop rows that contain null values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `subset` | `str \| list[str] \| None` | Column name(s) for which null values are considered. If set to None (default), use all columns. | `None` |
Returns:
| Type | Description |
|---|---|
| `Self` | The original object with the rows that contained null values removed. |
Notes
pandas handles null values differently from Polars and PyArrow. See null_handling for reference.
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"a": [1.0, None], "ba": [1.0, 2.0]})
>>> nw.from_native(df_native).drop_nulls().to_native()
pyarrow.Table
a: double
ba: double
----
a: [[1]]
ba: [[1]]
estimated_size(unit: SizeUnit = 'b') -> int | float
Return an estimation of the total (heap) allocated size of the DataFrame.
Estimated size is given in the specified unit (bytes by default).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `unit` | `SizeUnit` | 'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes', 'gigabytes', or 'terabytes'. | `'b'` |
Returns:
| Type | Description |
|---|---|
| `int \| float` | Integer or Float. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.estimated_size()
32
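Other units simply rescale the byte count; for instance, given the 32-byte estimate above:
>>> df.estimated_size(unit="kb")
0.03125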
explode(columns: str | Sequence[str], *more_columns: str) -> Self
Explode the dataframe to long format by exploding the given columns.
Notes
It is possible to explode multiple columns only if these columns have matching element counts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `columns` | `str \| Sequence[str]` | Column names. The underlying columns being exploded must be of the List data type. | required |
| `*more_columns` | `str` | Additional names of columns to explode, specified as positional arguments. | `()` |
Returns:
| Type | Description |
|---|---|
| `Self` | New DataFrame |
Examples:
>>> import polars as pl
>>> import narwhals as nw
>>> data = {"a": ["x", "y"], "b": [[1, 2], [3]]}
>>> df_native = pl.DataFrame(data)
>>> nw.from_native(df_native).explode("b").to_native()
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ x ┆ 1 │
│ x ┆ 2 │
│ y ┆ 3 │
└─────┴─────┘
filter(*predicates: IntoExpr | Iterable[IntoExpr] | list[bool], **constraints: Any) -> Self
Filter the rows in the DataFrame based on one or more predicate expressions.
The original order of the remaining rows is preserved.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*predicates` | `IntoExpr \| Iterable[IntoExpr] \| list[bool]` | Expression(s) that evaluate to a boolean Series. Can also be a (single!) boolean list. | `()` |
| `**constraints` | `Any` | Column filters; use `name = value` to filter columns by the supplied value. | `{}` |
Returns:
| Type | Description |
|---|---|
| `Self` | The filtered dataframe. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
... {"foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"]}
... )
Filter on one condition
>>> nw.from_native(df_native).filter(nw.col("foo") > 1).to_native()
foo bar ham
1 2 7 b
2 3 8 c
Filter on multiple conditions with implicit &
>>> nw.from_native(df_native).filter(
... nw.col("foo") < 3, nw.col("ham") == "a"
... ).to_native()
foo bar ham
0 1 6 a
Filter on multiple conditions with |
>>> nw.from_native(df_native).filter(
... (nw.col("foo") == 1) | (nw.col("ham") == "c")
... ).to_native()
foo bar ham
0 1 6 a
2 3 8 c
Filter using **kwargs syntax
>>> nw.from_native(df_native).filter(foo=2, ham="b").to_native()
foo bar ham
1 2 7 b
gather_every(n: int, offset: int = 0) -> Self
Take every nth row in the DataFrame and return as a new DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | Gather every n-th row. | required |
| `offset` | `int` | Starting index. | `0` |
Returns:
| Type | Description |
|---|---|
| `Self` | The dataframe containing only the selected rows. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, None, 2, 3]})
>>> nw.from_native(df_native).gather_every(2)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| pyarrow.Table |
| foo: int64 |
| ---- |
| foo: [[1,2]] |
└──────────────────┘
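Passing an offset starts the gather from a later row; for instance:
>>> nw.from_native(df_native).gather_every(2, offset=1).get_column("foo").to_list()
[None, 3]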
get_column(name: str) -> Series[Any]
Get a single column by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The column name as a string. | required |
Returns:
| Type | Description |
|---|---|
| `Series[Any]` | A Narwhals Series, backed by a native series. |
Notes
Although name is typed as str, pandas does allow non-string column names, and they will work when passed to this function if the narwhals.DataFrame is backed by a pandas dataframe with non-string columns. This function can only be used to extract a column by name, so there is no risk of ambiguity.
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2]})
>>> df = nw.from_native(df_native)
>>> df.get_column("a").to_native()
0 1
1 2
Name: a, dtype: int64
group_by(*keys: str | Iterable[str], drop_null_keys: bool = False) -> GroupBy[Self]
Start a group by operation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*keys` | `str \| Iterable[str]` | Column(s) to group by. Accepts multiple column names as a list. | `()` |
| `drop_null_keys` | `bool` | If True, then groups where any key is null won't be included in the result. | `False` |
Returns:
| Name | Type | Description |
|---|---|---|
| GroupBy | `GroupBy[Self]` | Object which can be used to perform aggregations. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
... {
... "a": ["a", "b", "a", "b", "c"],
... "b": [1, 2, 1, 3, 3],
... "c": [5, 4, 3, 2, 1],
... }
... )
Group by one column and compute the sum of another column
>>> nw.from_native(df_native, eager_only=True).group_by("a").agg(
... nw.col("b").sum()
... ).sort("a").to_native()
a b
0 a 2
1 b 5
2 c 3
Group by multiple columns and compute the max of another column
>>> (
... nw.from_native(df_native, eager_only=True)
... .group_by(["a", "b"])
... .agg(nw.max("c"))
... .sort("a", "b")
... .to_native()
... )
a b c
0 a 1 5
1 b 2 4
2 b 3 2
3 c 3 1
head(n: int = 5) -> Self
Get the first n rows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | Number of rows to return. If a negative value is passed, return all rows except the last abs(n). | `5` |
Returns:
| Type | Description |
|---|---|
| `Self` | A subset of the dataframe of shape (n, n_columns). |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2], "b": [0.5, 4.0]})
>>> nw.from_native(df_native).head(1).to_native()
a b
0 1 0.5
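A negative n returns all rows except the last abs(n); for instance:
>>> nw.from_native(df_native).head(-1).to_native()
   a    b
0  1  0.5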
is_duplicated() -> Series[Any]
Get a mask of all duplicated rows in this DataFrame.
Returns:
| Type | Description |
|---|---|
| `Series[Any]` | A new Series. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [2, 2, 2], "bar": [6.0, 6.0, 7.0]})
>>> nw.from_native(df_native).is_duplicated()
┌───────────────┐
|Narwhals Series|
|---------------|
| 0 True |
| 1 True |
| 2 False |
| dtype: bool |
└───────────────┘
is_empty() -> bool
Check if the dataframe is empty.
Returns:
| Type | Description |
|---|---|
| `bool` | A boolean indicating whether the dataframe is empty (True) or not (False). |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [2, 2, 2], "bar": [6.0, 6.0, 7.0]})
>>> nw.from_native(df_native).is_empty()
False
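A frame whose rows have all been filtered out is empty; for instance:
>>> nw.from_native(df_native).filter(nw.col("foo") > 10).is_empty()
True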
is_unique() -> Series[Any]
Get a mask of all unique rows in this DataFrame.
Returns:
| Type | Description |
|---|---|
| `Series[Any]` | A new Series. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [2, 2, 2], "bar": [6.0, 6.0, 7.0]})
>>> nw.from_native(df_native).is_unique()
┌───────────────┐
|Narwhals Series|
|---------------|
| 0 False |
| 1 False |
| 2 True |
| dtype: bool |
└───────────────┘
item(row: int | None = None, column: int | str | None = None) -> Any
Return the DataFrame as a scalar, or return the element at the given row/column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `row` | `int \| None` | The n-th row. | `None` |
| `column` | `int \| str \| None` | The column selected via an integer or a string (column name). | `None` |
Returns:
| Type | Description |
|---|---|
| `Any` | A scalar or the specified element in the dataframe. |
Notes
If row/col not provided, this is equivalent to df[0,0], with a check that the shape is (1,1). With row/col, this is equivalent to df[row,col].
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, None], "bar": [2, 3]})
>>> nw.from_native(df_native).item(0, 1)
2
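Called with no arguments, item requires the frame to have shape (1, 1); a small sketch:
>>> nw.from_native(pa.table({"foo": [7]})).item()
7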
iter_rows(*, named: bool = False, buffer_size: int = 512) -> Iterator[tuple[Any, ...]] | Iterator[dict[str, Any]]
Returns an iterator over the DataFrame of rows of python-native values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `named` | `bool` | By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead. | `False` |
| `buffer_size` | `int` | Determines the number of rows that are buffered internally while iterating over the data. See https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.iter_rows.html | `512` |
Returns:
| Type | Description |
|---|---|
| `Iterator[tuple[Any, ...]] \| Iterator[dict[str, Any]]` | An iterator over the DataFrame of rows. |
Notes
cuDF doesn't support this method.
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> iter_rows = nw.from_native(df_native).iter_rows()
>>> next(iter_rows)
(1, 6.0)
>>> next(iter_rows)
(2, 7.0)
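With named=True, rows are yielded as dictionaries instead:
>>> next(nw.from_native(df_native).iter_rows(named=True))
{'foo': 1, 'bar': 6.0}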
join(other: Self, on: str | list[str] | None = None, how: Literal['inner', 'left', 'cross', 'semi', 'anti'] = 'inner', *, left_on: str | list[str] | None = None, right_on: str | list[str] | None = None, suffix: str = '_right') -> Self
Join in SQL-like fashion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `other` | `Self` | DataFrame to join with. | required |
| `on` | `str \| list[str] \| None` | Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None. | `None` |
| `how` | `Literal['inner', 'left', 'cross', 'semi', 'anti']` | Join strategy. | `'inner'` |
| `left_on` | `str \| list[str] \| None` | Join column of the left DataFrame. | `None` |
| `right_on` | `str \| list[str] \| None` | Join column of the right DataFrame. | `None` |
| `suffix` | `str` | Suffix to append to columns with a duplicate name. | `'_right'` |
Returns:
| Type | Description |
|---|---|
| `Self` | A new joined DataFrame |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_1_native = pd.DataFrame({"id": ["a", "b"], "price": [6.0, 7.0]})
>>> df_2_native = pd.DataFrame({"id": ["a", "b", "c"], "qty": [1, 2, 3]})
>>> nw.from_native(df_1_native).join(nw.from_native(df_2_native), on="id")
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| id price qty |
| 0 a 6.0 1 |
| 1 b 7.0 2 |
└──────────────────┘
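Other strategies follow SQL semantics; for instance, an anti-join keeps only rows of the left frame with no match in the right (a sketch reusing the same frames):
>>> nw.from_native(df_2_native).join(
...     nw.from_native(df_1_native), on="id", how="anti"
... ).rows()
[('c', 3)]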
join_asof(other: Self, *, left_on: str | None = None, right_on: str | None = None, on: str | None = None, by_left: str | list[str] | None = None, by_right: str | list[str] | None = None, by: str | list[str] | None = None, strategy: Literal['backward', 'forward', 'nearest'] = 'backward', suffix: str = '_right') -> Self
Perform an asof join.
This is similar to a left-join except that we match on nearest key rather than equal keys.
Both DataFrames must be sorted by the asof join key.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `other` | `Self` | DataFrame to join with. | required |
| `left_on` | `str \| None` | Name(s) of the left join column(s). | `None` |
| `right_on` | `str \| None` | Name(s) of the right join column(s). | `None` |
| `on` | `str \| None` | Join column of both DataFrames. If set, left_on and right_on should be None. | `None` |
| `by_left` | `str \| list[str] \| None` | Join on these columns before doing asof join. | `None` |
| `by_right` | `str \| list[str] \| None` | Join on these columns before doing asof join. | `None` |
| `by` | `str \| list[str] \| None` | Join on these columns before doing asof join. | `None` |
| `strategy` | `Literal['backward', 'forward', 'nearest']` | Join strategy. The default is "backward". | `'backward'` |
| `suffix` | `str` | Suffix to append to columns with a duplicate name. | `'_right'` |
Returns:
| Type | Description |
|---|---|
| `Self` | A new joined DataFrame |
Examples:
>>> from datetime import datetime
>>> import pandas as pd
>>> import narwhals as nw
>>> data_gdp = {
... "datetime": [
... datetime(2016, 1, 1),
... datetime(2017, 1, 1),
... datetime(2018, 1, 1),
... datetime(2019, 1, 1),
... datetime(2020, 1, 1),
... ],
... "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
... "datetime": [
... datetime(2016, 3, 1),
... datetime(2018, 8, 1),
... datetime(2019, 1, 1),
... ],
... "population": [82.19, 82.66, 83.12],
... }
>>> gdp_native = pd.DataFrame(data_gdp)
>>> population_native = pd.DataFrame(data_population)
>>> gdp = nw.from_native(gdp_native)
>>> population = nw.from_native(population_native)
>>> population.join_asof(gdp, on="datetime", strategy="backward")
┌──────────────────────────────┐
| Narwhals DataFrame |
|------------------------------|
| datetime population gdp|
|0 2016-03-01 82.19 4164|
|1 2018-08-01 82.66 4566|
|2 2019-01-01 83.12 4696|
└──────────────────────────────┘
lazy(backend: ModuleType | Implementation | str | None = None) -> LazyFrame[Any]
Restrict available API methods to lazy-only ones.
If backend is specified, then a conversion between different backends might be triggered.
If a library does not support lazy execution and backend is not specified, then this will only restrict the API to lazy-only operations. This is useful if you want to ensure that you write dataframe-agnostic code which can all run entirely lazily.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `backend` | `ModuleType \| Implementation \| str \| None` | Which lazy backend to convert to. This will be the underlying backend for the resulting Narwhals LazyFrame. If not specified, and the given library does not support lazy execution, then this will restrict the API to lazy-only operations. | `None` |
Returns:
| Type | Description |
|---|---|
| `LazyFrame[Any]` | A new LazyFrame. |
Examples:
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pl.DataFrame({"a": [1, 2], "b": [4, 6]})
>>> df = nw.from_native(df_native)
If we call df.lazy, we get a narwhals.LazyFrame backed by a Polars LazyFrame.
>>> df.lazy()
┌─────────────────────────────┐
| Narwhals LazyFrame |
|-----------------------------|
|<LazyFrame at 0x7F52B9937230>|
└─────────────────────────────┘
We can also pass DuckDB as the backend, and then we'll get a narwhals.LazyFrame backed by a duckdb.DuckDBPyRelation.
>>> df.lazy(backend=nw.Implementation.DUCKDB)
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│ a │ b │ |
|│ int64 │ int64 │ |
|├───────┼───────┤ |
|│ 1 │ 4 │ |
|│ 2 │ 6 │ |
|└───────┴───────┘ |
└──────────────────┘
null_count() -> Self
Create a new DataFrame that shows the null counts per column.
Returns:
| Type | Description |
|---|---|
| `Self` | A dataframe of shape (1, n_columns). |
Notes
pandas handles null values differently from Polars and PyArrow. See null_handling for reference.
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, None], "bar": [2, 3]})
>>> nw.from_native(df_native).null_count()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| pyarrow.Table |
| foo: int64 |
| bar: int64 |
| ---- |
| foo: [[1]] |
| bar: [[0]] |
└──────────────────┘
pipe(function: Callable[Concatenate[Self, PS], R], *args: PS.args, **kwargs: PS.kwargs) -> R
Pipe function call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `function` | `Callable[Concatenate[Self, PS], R]` | Function to apply. | required |
| `args` | `args` | Positional arguments to pass to function. | `()` |
| `kwargs` | `kwargs` | Keyword arguments to pass to function. | `{}` |
Returns:
| Type | Description |
|---|---|
| `R` | The original object with the function applied. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2], "ba": [4, 5]})
>>> nw.from_native(df_native).pipe(
... lambda _df: _df.select(
... [x for x in _df.columns if len(x) == 1]
... ).to_native()
... )
a
0 1
1 2
pivot(on: str | list[str], *, index: str | list[str] | None = None, values: str | list[str] | None = None, aggregate_function: Literal['min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'] | None = None, maintain_order: bool | None = None, sort_columns: bool = False, separator: str = '_') -> Self
Create a spreadsheet-style pivot table as a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `on` | `str \| list[str]` | Name of the column(s) whose values will be used as the header of the output DataFrame. | required |
| `index` | `str \| list[str] \| None` | One or multiple keys to group by. If None, all remaining columns not specified in `on` and `values` will be used. | `None` |
| `values` | `str \| list[str] \| None` | Column(s) of values to aggregate. If None, all remaining columns not specified in `on` and `index` will be used. | `None` |
| `aggregate_function` | `Literal['min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'] \| None` | Aggregation to apply when there are duplicate entries; choose from 'min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'. | `None` |
| `maintain_order` | `bool \| None` | Has no effect and is kept around only for backwards-compatibility. | `None` |
| `sort_columns` | `bool` | Sort the transposed columns by name. Default is by order of discovery. | `False` |
| `separator` | `str` | Used as separator/delimiter in generated column names in case of multiple `values` columns. | `'_'` |
Returns:
| Type | Description |
|---|---|
| `Self` | A new dataframe. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {
... "ix": [1, 1, 2, 2, 1, 2],
... "col": ["a", "a", "a", "a", "b", "b"],
... "foo": [0, 1, 2, 2, 7, 1],
... "bar": [0, 2, 0, 0, 9, 4],
... }
>>> df_native = pd.DataFrame(data)
>>> nw.from_native(df_native).pivot(
... "col", index="ix", aggregate_function="sum"
... )
┌─────────────────────────────────┐
| Narwhals DataFrame |
|---------------------------------|
| ix foo_a foo_b bar_a bar_b|
|0 1 1 7 2 9|
|1 2 4 1 0 4|
└─────────────────────────────────┘
rename(mapping: dict[str, str]) -> Self
Rename column names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mapping` | `dict[str, str]` | Key value pairs that map from old name to new name. | required |
Returns:
| Type | Description |
|---|---|
| `Self` | The dataframe with the specified columns renamed. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6, 7]})
>>> nw.from_native(df_native).rename({"foo": "apple"}).to_native()
pyarrow.Table
apple: int64
bar: int64
----
apple: [[1,2]]
bar: [[6,7]]
row(index: int) -> tuple[Any, ...]
Get values at given row.
Warning
You should NEVER use this method to iterate over a DataFrame; if you require row-iteration you should strongly prefer use of iter_rows() instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int` | Row number. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[Any, ...]` | A tuple of the values in the selected row. |
Notes
cuDF doesn't support this method.
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"a": [1, 2], "b": [4, 5]})
>>> nw.from_native(df_native).row(1)
(<pyarrow.Int64Scalar: 2>, <pyarrow.Int64Scalar: 5>)
rows(*, named: bool = False) -> list[tuple[Any, ...]] | list[dict[str, Any]]
Returns all data in the DataFrame as a list of rows of python-native values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `named` | `bool` | By default, each row is returned as a tuple of values given in the same order as the frame columns. Setting named=True will return rows of dictionaries instead. | `False` |
Returns:
| Type | Description |
|---|---|
| `list[tuple[Any, ...]] \| list[dict[str, Any]]` | The data as a list of rows. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> nw.from_native(df_native).rows()
[(1, 6.0), (2, 7.0)]
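With named=True, the rows come back as dictionaries:
>>> nw.from_native(df_native).rows(named=True)
[{'foo': 1, 'bar': 6.0}, {'foo': 2, 'bar': 7.0}]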
sample(n: int | None = None, *, fraction: float | None = None, with_replacement: bool = False, seed: int | None = None) -> Self
Sample from this DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int \| None` | Number of items to return. Cannot be used with fraction. | `None` |
| `fraction` | `float \| None` | Fraction of items to return. Cannot be used with n. | `None` |
| `with_replacement` | `bool` | Allow values to be sampled more than once. | `False` |
| `seed` | `int \| None` | Seed for the random number generator. If set to None (default), a random seed is generated for each sample operation. | `None` |
Returns:
| Type | Description |
|---|---|
| `Self` | A new dataframe. |
Notes
The results may not be consistent across libraries.
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [1, 2, 3], "bar": [19, 32, 4]})
>>> nw.from_native(df_native).sample(n=2)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| foo bar |
| 2 3 4 |
| 1 2 32 |
└──────────────────┘
select(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr) -> Self
Select columns from this DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*exprs` | `IntoExpr \| Iterable[IntoExpr]` | Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals. | `()` |
| `**named_exprs` | `IntoExpr` | Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used. | `{}` |
Returns:
| Type | Description |
|---|---|
| `Self` | The dataframe containing only the selected columns. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"a": [1, 2], "b": [3, 4]})
>>> nw.from_native(df_native).select("a", a_plus_1=nw.col("a") + 1)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|pyarrow.Table |
|a: int64 |
|a_plus_1: int64 |
|---- |
|a: [[1,2]] |
|a_plus_1: [[2,3]] |
└──────────────────┘
sort(by: str | Iterable[str], *more_by: str, descending: bool | Sequence[bool] = False, nulls_last: bool = False) -> Self
Sort the dataframe by the given columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `by` | `str \| Iterable[str]` | Column name(s) to sort by. | required |
| `*more_by` | `str` | Additional columns to sort by, specified as positional arguments. | `()` |
| `descending` | `bool \| Sequence[bool]` | Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans. | `False` |
| `nulls_last` | `bool` | Place null values last. | `False` |
Returns:
| Type | Description |
|---|---|
| `Self` | The sorted dataframe. |
Note
Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
... {"foo": [2, 1], "bar": [6.0, 7.0], "ham": ["a", "b"]}
... )
>>> nw.from_native(df_native).sort("foo")
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| foo bar ham |
| 1 1 7.0 b |
| 0 2 6.0 a |
└──────────────────┘
tail(n: int = 5) -> Self
Get the last n rows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | Number of rows to return. If a negative value is passed, return all rows except the first abs(n). | `5` |
Returns:
| Type | Description |
|---|---|
| `Self` | A subset of the dataframe of shape (n, n_columns). |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2], "b": [0.5, 4.0]})
>>> nw.from_native(df_native).tail(1)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| a b |
| 1 2 4.0 |
└──────────────────┘
to_arrow() -> pa.Table
Convert to arrow table.
Returns:
| Type | Description |
|---|---|
| `Table` | A new PyArrow table. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [1, None], "bar": [2, 3]})
>>> nw.from_native(df_native).to_arrow()
pyarrow.Table
foo: double
bar: int64
----
foo: [[1,null]]
bar: [[2,3]]
to_dict(*, as_series: bool = True) -> dict[str, Series[Any]] | dict[str, list[Any]]
Convert DataFrame to a dictionary mapping column name to values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `as_series` | `bool` | If set to True, values are returned as Narwhals Series; otherwise, as lists of Python values. | `True` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Series[Any]] \| dict[str, list[Any]]` | A mapping from column name to values / Series. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"A": [1, 2], "fruits": ["banana", "apple"]})
>>> df = nw.from_native(df_native)
>>> df.to_dict(as_series=False)
{'A': [1, 2], 'fruits': ['banana', 'apple']}
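With the default as_series=True, each value is a Narwhals Series; a quick sketch:
>>> df.to_dict()["A"].to_list()
[1, 2]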
to_native() -> DataFrameT
Convert Narwhals DataFrame to native one.
Returns:
| Type | Description |
|---|---|
| `DataFrameT` | Object of class that user started with. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Calling to_native on a Narwhals DataFrame returns the native object:
>>> nw.from_native(df_pd).to_native()
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
>>> nw.from_native(df_pl).to_native()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6.0 ┆ a │
│ 2 ┆ 7.0 ┆ b │
│ 3 ┆ 8.0 ┆ c │
└─────┴─────┴─────┘
>>> nw.from_native(df_pa).to_native()
pyarrow.Table
foo: int64
bar: double
ham: string
----
foo: [[1,2,3]]
bar: [[6,7,8]]
ham: [["a","b","c"]]
to_numpy() -> _2DArray
Convert this DataFrame to a NumPy ndarray.
Returns:
| Type | Description |
|---|---|
| `_2DArray` | A NumPy ndarray. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"foo": [1, 2], "bar": [6.5, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.to_numpy()
array([[1. , 6.5],
[2. , 7. ]])
to_pandas() -> pd.DataFrame
Convert this DataFrame to a pandas DataFrame.
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A pandas DataFrame. |
Examples:
Construct pandas, Polars (eager) and PyArrow DataFrames:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library agnostic function:
>>> def agnostic_to_pandas(df_native: IntoDataFrame) -> pd.DataFrame:
... df = nw.from_native(df_native)
... return df.to_pandas()
We can then pass any supported library such as pandas, Polars (eager), or PyArrow to agnostic_to_pandas:
>>> agnostic_to_pandas(df_pd)
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
>>> agnostic_to_pandas(df_pl)
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
>>> agnostic_to_pandas(df_pa)
foo bar ham
0 1 6.0 a
1 2 7.0 b
2 3 8.0 c
to_polars() -> pl.DataFrame
Convert this DataFrame to a polars DataFrame.
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A polars DataFrame. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.to_polars()
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1 ┆ 6.0 │
│ 2 ┆ 7.0 │
└─────┴─────┘
unique(subset: str | list[str] | None = None, *, keep: Literal['any', 'first', 'last', 'none'] = 'any', maintain_order: bool = False) -> Self
Drop duplicate rows from this dataframe.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `subset` | `str \| list[str] \| None` | Column name(s) to consider when identifying duplicate rows. | `None` |
| `keep` | `Literal['any', 'first', 'last', 'none']` | Which of the duplicate rows to keep: 'any', 'first', 'last', or 'none'. | `'any'` |
| `maintain_order` | `bool` | Keep the same order as the original DataFrame. This may be more expensive to compute. | `False` |
Returns:
| Type | Description |
|---|---|
| `Self` | The dataframe with the duplicate rows removed. |
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame(
... {"foo": [1, 2], "bar": ["a", "a"], "ham": ["b", "b"]}
... )
>>> nw.from_native(df_native).unique(["bar", "ham"]).to_native()
foo bar ham
0 1 a b
unpivot(on: str | list[str] | None = None, *, index: str | list[str] | None = None, variable_name: str = 'variable', value_name: str = 'value') -> Self
Unpivot a DataFrame from wide to long format.
Optionally leaves identifiers set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis leaving just two non-identifier columns, 'variable' and 'value'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `on` | `str \| list[str] \| None` | Column(s) to use as values variables; if None, all columns that are not in `index` will be used. | `None` |
| `index` | `str \| list[str] \| None` | Column(s) to use as identifier variables. | `None` |
| `variable_name` | `str` | Name to give to the `variable` column. | `'variable'` |
| `value_name` | `str` | Name to give to the `value` column. | `'value'` |
Returns:
| Type | Description |
|---|---|
| `Self` | The unpivoted dataframe. |
Notes
If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars.
In other frameworks, you might know this operation as pivot_longer.
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> data = {
... "a": ["x", "y", "z"],
... "b": [1, 3, 5],
... "c": [2, 4, 6],
... }
>>> df_native = pd.DataFrame(data)
>>> nw.from_native(df_native).unpivot(["b", "c"], index="a")
┌────────────────────┐
| Narwhals DataFrame |
|--------------------|
| a variable value|
|0 x b 1|
|1 y b 3|
|2 z b 5|
|3 x c 2|
|4 y c 4|
|5 z c 6|
└────────────────────┘
with_columns(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr) -> Self
Add columns to this DataFrame.
Added columns will replace existing columns with the same name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `*exprs` | `IntoExpr \| Iterable[IntoExpr]` | Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals. | `()` |
| `**named_exprs` | `IntoExpr` | Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used. | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| DataFrame | `Self` | A new DataFrame with the columns added. |
Note
Creating a new DataFrame using this method does not create a new copy of existing data.
Examples:
>>> import pandas as pd
>>> import narwhals as nw
>>> df_native = pd.DataFrame({"a": [1, 2], "b": [0.5, 4.0]})
>>> (
... nw.from_native(df_native)
... .with_columns((nw.col("a") * 2).alias("a*2"))
... .to_native()
... )
a b a*2
0 1 0.5 2
1 2 4.0 4
with_row_index(name: str = 'index') -> Self
Insert column which enumerates rows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the column as a string. The default is "index". | `'index'` |
Returns:
| Type | Description |
|---|---|
| `Self` | The original object with the column added. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"a": [1, 2], "b": [4, 5]})
>>> nw.from_native(df_native).with_row_index().to_native()
pyarrow.Table
index: int64
a: int64
b: int64
----
index: [[0,1]]
a: [[1,2]]
b: [[4,5]]
write_csv(file: str | Path | BytesIO | None = None) -> str | None
Write dataframe to comma-separated values (CSV) file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file` | `str \| Path \| BytesIO \| None` | String, path object or file-like object to which the dataframe will be written. If None, the resulting csv format is returned as a string. | `None` |
Returns:
| Type | Description |
|---|---|
| `str \| None` | String or None. |
Examples:
Construct pandas, Polars (eager) and PyArrow DataFrames:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> data = {"foo": [1, 2, 3], "bar": [6.0, 7.0, 8.0], "ham": ["a", "b", "c"]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
We define a library agnostic function:
>>> def agnostic_write_csv(df_native: IntoDataFrame) -> str:
... df = nw.from_native(df_native)
... return df.write_csv()
We can pass any supported library such as pandas, Polars or PyArrow to agnostic_write_csv:
>>> agnostic_write_csv(df_pd)
'foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n'
>>> agnostic_write_csv(df_pl)
'foo,bar,ham\n1,6.0,a\n2,7.0,b\n3,8.0,c\n'
>>> agnostic_write_csv(df_pa)
'"foo","bar","ham"\n1,6,"a"\n2,7,"b"\n3,8,"c"\n'
If we had passed a file name to write_csv, it would have been written to that file.
write_parquet(file: str | Path | BytesIO) -> None
Write dataframe to parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file` | `str \| Path \| BytesIO` | String, path object or file-like object to which the dataframe will be written. | required |
Returns:
| Type | Description |
|---|---|
| `None` | None. |
Examples:
>>> import pyarrow as pa
>>> import narwhals as nw
>>> df_native = pa.table({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.write_parquet("out.parquet")
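Writing to a file-like object also works; a sketch of an in-memory round trip (assuming pyarrow.parquet is available to read the result back):
>>> import io
>>> import pyarrow.parquet as pq
>>> buffer = io.BytesIO()
>>> df.write_parquet(buffer)
>>> _ = buffer.seek(0)
>>> pq.read_table(buffer)
pyarrow.Table
foo: int64
bar: double
----
foo: [[1,2]]
bar: [[6,7]]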