Top-level functions
Here are the top-level functions available in Narwhals.
all() -> Expr
Instantiate an expression representing all columns.
Returns:

| Type | Description |
|---|---|
| Expr | A new expression. |
Examples:
>>> import polars as pl
>>> import pandas as pd
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_all(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.all() * 2).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_all`:
>>> agnostic_all(df_pd)
a b
0 2 8
1 4 10
2 6 12
>>> agnostic_all(df_pl)
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2 ┆ 8 │
│ 4 ┆ 10 │
│ 6 ┆ 12 │
└─────┴─────┘
>>> agnostic_all(df_pa)
pyarrow.Table
a: int64
b: int64
----
a: [[2,4,6]]
b: [[8,10,12]]
all_horizontal(*exprs: IntoExpr | Iterable[IntoExpr]) -> Expr
Compute the bitwise AND horizontally across columns.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| exprs | IntoExpr \| Iterable[IntoExpr] | Name(s) of the columns to use in the aggregation function. Accepts expression input. | () |

Returns:

| Type | Description |
|---|---|
| Expr | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
... "a": [False, False, True, True, False, None],
... "b": [False, True, True, None, None, None],
... }
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data).convert_dtypes(dtype_backend="pyarrow")
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function:
>>> def agnostic_all_horizontal(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select("a", "b", all=nw.all_horizontal("a", "b")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_all_horizontal`:
>>> agnostic_all_horizontal(df_pd)
a b all
0 False False False
1 False True False
2 True True True
3 True <NA> <NA>
4 False <NA> False
5 <NA> <NA> <NA>
>>> agnostic_all_horizontal(df_pl)
shape: (6, 3)
┌───────┬───────┬───────┐
│ a ┆ b ┆ all │
│ --- ┆ --- ┆ --- │
│ bool ┆ bool ┆ bool │
╞═══════╪═══════╪═══════╡
│ false ┆ false ┆ false │
│ false ┆ true ┆ false │
│ true ┆ true ┆ true │
│ true ┆ null ┆ null │
│ false ┆ null ┆ false │
│ null ┆ null ┆ null │
└───────┴───────┴───────┘
>>> agnostic_all_horizontal(df_pa)
pyarrow.Table
a: bool
b: bool
all: bool
----
a: [[false,false,true,true,false,null]]
b: [[false,true,true,null,null,null]]
all: [[false,false,true,null,false,null]]
any_horizontal(*exprs: IntoExpr | Iterable[IntoExpr]) -> Expr
Compute the bitwise OR horizontally across columns.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| exprs | IntoExpr \| Iterable[IntoExpr] | Name(s) of the columns to use in the aggregation function. Accepts expression input. | () |

Returns:

| Type | Description |
|---|---|
| Expr | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
... "a": [False, False, True, True, False, None],
... "b": [False, True, True, None, None, None],
... }
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data).convert_dtypes(dtype_backend="pyarrow")
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function:
>>> def agnostic_any_horizontal(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select("a", "b", any=nw.any_horizontal("a", "b")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_any_horizontal`:
>>> agnostic_any_horizontal(df_pd)
a b any
0 False False False
1 False True True
2 True True True
3 True <NA> True
4 False <NA> <NA>
5 <NA> <NA> <NA>
>>> agnostic_any_horizontal(df_pl)
shape: (6, 3)
┌───────┬───────┬───────┐
│ a ┆ b ┆ any │
│ --- ┆ --- ┆ --- │
│ bool ┆ bool ┆ bool │
╞═══════╪═══════╪═══════╡
│ false ┆ false ┆ false │
│ false ┆ true ┆ true │
│ true ┆ true ┆ true │
│ true ┆ null ┆ true │
│ false ┆ null ┆ null │
│ null ┆ null ┆ null │
└───────┴───────┴───────┘
>>> agnostic_any_horizontal(df_pa)
pyarrow.Table
a: bool
b: bool
any: bool
----
a: [[false,false,true,true,false,null]]
b: [[false,true,true,null,null,null]]
any: [[false,true,true,true,null,null]]
col(*names: str | Iterable[str]) -> Expr
Creates an expression that references one or more columns by their name(s).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| names | str \| Iterable[str] | Name(s) of the columns to use. | () |

Returns:

| Type | Description |
|---|---|
| Expr | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data)
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function:
>>> def agnostic_col(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.col("a") * nw.col("b")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_col`:
>>> agnostic_col(df_pd)
a
0 3
1 8
>>> agnostic_col(df_pl)
shape: (2, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 3 │
│ 8 │
└─────┘
>>> agnostic_col(df_pa)
pyarrow.Table
a: int64
----
a: [[3,8]]
concat(items: Iterable[DataFrame[IntoDataFrameT] | LazyFrame[IntoFrameT]], *, how: Literal['horizontal', 'vertical', 'diagonal'] = 'vertical') -> DataFrame[IntoDataFrameT] | LazyFrame[IntoFrameT]
Concatenate multiple DataFrames or LazyFrames into a single entity.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| items | Iterable[DataFrame[IntoDataFrameT] \| LazyFrame[IntoFrameT]] | DataFrames, LazyFrames to concatenate. | required |
| how | Literal['horizontal', 'vertical', 'diagonal'] | Concatenating strategy: 'vertical', 'horizontal', or 'diagonal'. | 'vertical' |

Returns:

| Type | Description |
|---|---|
| DataFrame[IntoDataFrameT] \| LazyFrame[IntoFrameT] | A new DataFrame or LazyFrame resulting from the concatenation. |

Raises:

| Type | Description |
|---|---|
| TypeError | The items to concatenate should either all be eager, or all lazy. |
Examples:
Let's take an example of vertical concatenation:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data_1 = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> data_2 = {"a": [5, 2], "b": [1, 4]}
>>> df_pd_1 = pd.DataFrame(data_1)
>>> df_pd_2 = pd.DataFrame(data_2)
>>> df_pl_1 = pl.DataFrame(data_1)
>>> df_pl_2 = pl.DataFrame(data_2)
Let's define a dataframe-agnostic function:
>>> @nw.narwhalify
... def agnostic_vertical_concat(df1, df2):
... return nw.concat([df1, df2], how="vertical")
>>> agnostic_vertical_concat(df_pd_1, df_pd_2)
a b
0 1 4
1 2 5
2 3 6
0 5 1
1 2 4
>>> agnostic_vertical_concat(df_pl_1, df_pl_2)
shape: (5, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
│ 5 ┆ 1 │
│ 2 ┆ 4 │
└─────┴─────┘
Now let's look at horizontal concatenation:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data_1 = {"a": [1, 2, 3], "b": [4, 5, 6]}
>>> data_2 = {"c": [5, 2], "d": [1, 4]}
>>> df_pd_1 = pd.DataFrame(data_1)
>>> df_pd_2 = pd.DataFrame(data_2)
>>> df_pl_1 = pl.DataFrame(data_1)
>>> df_pl_2 = pl.DataFrame(data_2)
Defining a dataframe-agnostic function:
>>> @nw.narwhalify
... def agnostic_horizontal_concat(df1, df2):
... return nw.concat([df1, df2], how="horizontal")
>>> agnostic_horizontal_concat(df_pd_1, df_pd_2)
a b c d
0 1 4 5.0 1.0
1 2 5 2.0 4.0
2 3 6 NaN NaN
>>> agnostic_horizontal_concat(df_pl_1, df_pl_2)
shape: (3, 4)
┌─────┬─────┬──────┬──────┐
│ a ┆ b ┆ c ┆ d │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪══════╪══════╡
│ 1 ┆ 4 ┆ 5 ┆ 1 │
│ 2 ┆ 5 ┆ 2 ┆ 4 │
│ 3 ┆ 6 ┆ null ┆ null │
└─────┴─────┴──────┴──────┘
Finally, let's look at diagonal concatenation:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> data_1 = {"a": [1, 2], "b": [3.5, 4.5]}
>>> data_2 = {"a": [3, 4], "z": ["x", "y"]}
>>> df_pd_1 = pd.DataFrame(data_1)
>>> df_pd_2 = pd.DataFrame(data_2)
>>> df_pl_1 = pl.DataFrame(data_1)
>>> df_pl_2 = pl.DataFrame(data_2)
Defining a dataframe-agnostic function:
>>> @nw.narwhalify
... def agnostic_diagonal_concat(df1, df2):
... return nw.concat([df1, df2], how="diagonal")
>>> agnostic_diagonal_concat(df_pd_1, df_pd_2)
a b z
0 1 3.5 NaN
1 2 4.5 NaN
0 3 NaN x
1 4 NaN y
>>> agnostic_diagonal_concat(df_pl_1, df_pl_2)
shape: (4, 3)
┌─────┬──────┬──────┐
│ a ┆ b ┆ z │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪══════╪══════╡
│ 1 ┆ 3.5 ┆ null │
│ 2 ┆ 4.5 ┆ null │
│ 3 ┆ null ┆ x │
│ 4 ┆ null ┆ y │
└─────┴──────┴──────┘
concat_str(exprs: IntoExpr | Iterable[IntoExpr], *more_exprs: IntoExpr, separator: str = '', ignore_nulls: bool = False) -> Expr
Horizontally concatenate columns into a single string column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| exprs | IntoExpr \| Iterable[IntoExpr] | Columns to concatenate into a single string column. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals. Non-String columns are cast to String. | required |
| *more_exprs | IntoExpr | Additional columns to concatenate into a single string column, specified as positional arguments. | () |
| separator | str | String that will be used to separate the values of each column. | '' |
| ignore_nulls | bool | Ignore null values (default is False). | False |

Returns:

| Type | Description |
|---|---|
| Expr | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
... "a": [1, 2, 3],
... "b": ["dogs", "cats", None],
... "c": ["play", "swim", "walk"],
... }
We define a dataframe-agnostic function that computes the horizontal string concatenation of different columns:
>>> def agnostic_concat_str(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(
... nw.concat_str(
... [
... nw.col("a") * 2,
... nw.col("b"),
... nw.col("c"),
... ],
... separator=" ",
... ).alias("full_sentence")
... ).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_concat_str`:
>>> agnostic_concat_str(pd.DataFrame(data))
full_sentence
0 2 dogs play
1 4 cats swim
2 None
>>> agnostic_concat_str(pl.DataFrame(data))
shape: (3, 1)
┌───────────────┐
│ full_sentence │
│ --- │
│ str │
╞═══════════════╡
│ 2 dogs play │
│ 4 cats swim │
│ null │
└───────────────┘
>>> agnostic_concat_str(pa.table(data))
pyarrow.Table
full_sentence: string
----
full_sentence: [["2 dogs play","4 cats swim",null]]
from_arrow(native_frame: ArrowStreamExportable, *, native_namespace: ModuleType) -> DataFrame[Any]
Construct a DataFrame from an object which supports the PyCapsule Interface.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| native_frame | ArrowStreamExportable | Object which implements `__arrow_c_stream__`. | required |
| native_namespace | ModuleType | The native library to use for DataFrame creation. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame[Any] | A new DataFrame. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
Let's define a dataframe-agnostic function which creates a PyArrow Table.
>>> def agnostic_to_arrow(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return nw.from_arrow(df, native_namespace=pa).to_native()
Let's see what happens when passing pandas / Polars input:
>>> agnostic_to_arrow(pd.DataFrame(data))
pyarrow.Table
a: int64
b: int64
----
a: [[1,2,3]]
b: [[4,5,6]]
>>> agnostic_to_arrow(pl.DataFrame(data))
pyarrow.Table
a: int64
b: int64
----
a: [[1,2,3]]
b: [[4,5,6]]
from_dict(data: dict[str, Any], schema: dict[str, DType] | Schema | None = None, *, backend: ModuleType | Implementation | str | None = None, native_namespace: ModuleType | None = None) -> DataFrame[Any]
Instantiate DataFrame from dictionary.
Indexes (if present, for pandas-like backends) are aligned following the left-hand-rule.
Notes
For pandas-like dataframes, conversion to schema is applied after dataframe creation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | dict[str, Any] | Dictionary to create DataFrame from. | required |
| schema | dict[str, DType] \| Schema \| None | The DataFrame schema as Schema or dict of {name: type}. | None |
| backend | ModuleType \| Implementation \| str \| None | Specifies which eager backend to instantiate. Only necessary if inputs are not Narwhals Series. | None |
| native_namespace | ModuleType \| None | The native library to use for DataFrame creation. Deprecated (v1.26.0): please use `backend` instead. | None |

Returns:

| Type | Description |
|---|---|
| DataFrame[Any] | A new DataFrame. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
Let's create a new dataframe and specify the backend argument.
>>> def agnostic_from_dict(backend: str) -> IntoFrameT:
... data = {"c": [5, 2], "d": [1, 4]}
... return nw.from_dict(data, backend=backend).to_native()
Let's see what happens when passing pandas, Polars or PyArrow input:
>>> agnostic_from_dict(backend="pandas")
c d
0 5 1
1 2 4
>>> agnostic_from_dict(backend="polars")
shape: (2, 2)
┌─────┬─────┐
│ c ┆ d │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 5 ┆ 1 │
│ 2 ┆ 4 │
└─────┴─────┘
>>> agnostic_from_dict(backend="pyarrow")
pyarrow.Table
c: int64
d: int64
----
c: [[5,2]]
d: [[1,4]]
from_native(native_object: IntoFrameT | IntoSeriesT | IntoFrame | IntoSeries | T, *, strict: bool | None = None, pass_through: bool | None = None, eager_only: bool = False, series_only: bool = False, allow_series: bool | None = None) -> LazyFrame[IntoFrameT] | DataFrame[IntoFrameT] | Series[IntoSeriesT] | T
Convert `native_object` to a Narwhals DataFrame, LazyFrame, or Series.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| native_object | IntoFrameT \| IntoSeriesT \| IntoFrame \| IntoSeries \| T | Raw object from user. Depending on the other arguments, the input object can be a native DataFrame, LazyFrame, or Series. | required |
| strict | bool \| None | Determine what happens if the object can't be converted to Narwhals. Deprecated (v1.13.0): please use `pass_through` instead. | None |
| pass_through | bool \| None | Determine what happens if the object can't be converted to Narwhals. | None |
| eager_only | bool | Whether to only allow eager objects. | False |
| series_only | bool | Whether to only allow Series. | False |
| allow_series | bool \| None | Whether to allow Series (default is only DataFrame / LazyFrame). | None |

Returns:

| Type | Description |
|---|---|
| LazyFrame[IntoFrameT] \| DataFrame[IntoFrameT] \| Series[IntoSeriesT] \| T | DataFrame, LazyFrame, Series, or original object, depending on which combination of parameters was passed. |
from_numpy(data: np.ndarray, schema: dict[str, DType] | Schema | list[str] | None = None, *, native_namespace: ModuleType) -> DataFrame[Any]
Construct a DataFrame from a NumPy ndarray.
Notes
Only row orientation is currently supported.
For pandas-like dataframes, conversion to schema is applied after dataframe creation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | ndarray | Two-dimensional data represented as a NumPy ndarray. | required |
| schema | dict[str, DType] \| Schema \| list[str] \| None | The DataFrame schema as Schema, dict of {name: type}, or a list of str. | None |
| native_namespace | ModuleType | The native library to use for DataFrame creation. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame[Any] | A new DataFrame. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> import numpy as np
>>> from narwhals.typing import IntoFrameT
>>> data = {"a": [1, 2], "b": [3, 4]}
Let's create a new dataframe of the same class as the dataframe we started with, from a NumPy ndarray of new data:
>>> def agnostic_from_numpy(df_native: IntoFrameT) -> IntoFrameT:
... new_data = np.array([[5, 2, 1], [1, 4, 3]])
... df = nw.from_native(df_native)
... native_namespace = nw.get_native_namespace(df)
... return nw.from_numpy(
... new_data, native_namespace=native_namespace
... ).to_native()
Let's see what happens when passing pandas, Polars or PyArrow input:
>>> agnostic_from_numpy(pd.DataFrame(data))
column_0 column_1 column_2
0 5 2 1
1 1 4 3
>>> agnostic_from_numpy(pl.DataFrame(data))
shape: (2, 3)
┌──────────┬──────────┬──────────┐
│ column_0 ┆ column_1 ┆ column_2 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞══════════╪══════════╪══════════╡
│ 5 ┆ 2 ┆ 1 │
│ 1 ┆ 4 ┆ 3 │
└──────────┴──────────┴──────────┘
>>> agnostic_from_numpy(pa.table(data))
pyarrow.Table
column_0: int64
column_1: int64
column_2: int64
----
column_0: [[5,1]]
column_1: [[2,4]]
column_2: [[1,3]]
Let's specify the column names:
>>> def agnostic_from_numpy(df_native: IntoFrameT) -> IntoFrameT:
... new_data = np.array([[5, 2, 1], [1, 4, 3]])
... schema = ["c", "d", "e"]
... df = nw.from_native(df_native)
... native_namespace = nw.get_native_namespace(df)
... return nw.from_numpy(
... new_data, native_namespace=native_namespace, schema=schema
... ).to_native()
Let's see the modified outputs:
>>> agnostic_from_numpy(pd.DataFrame(data))
c d e
0 5 2 1
1 1 4 3
>>> agnostic_from_numpy(pl.DataFrame(data))
shape: (2, 3)
┌─────┬─────┬─────┐
│ c ┆ d ┆ e │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 5 ┆ 2 ┆ 1 │
│ 1 ┆ 4 ┆ 3 │
└─────┴─────┴─────┘
>>> agnostic_from_numpy(pa.table(data))
pyarrow.Table
c: int64
d: int64
e: int64
----
c: [[5,1]]
d: [[2,4]]
e: [[1,3]]
Let's modify the function so that it specifies the schema:
>>> def agnostic_from_numpy(df_native: IntoFrameT) -> IntoFrameT:
... new_data = np.array([[5, 2, 1], [1, 4, 3]])
... schema = {"c": nw.Int16(), "d": nw.Float32(), "e": nw.Int8()}
... df = nw.from_native(df_native)
... native_namespace = nw.get_native_namespace(df)
... return nw.from_numpy(
... new_data, native_namespace=native_namespace, schema=schema
... ).to_native()
Let's see the outputs:
>>> agnostic_from_numpy(pd.DataFrame(data))
c d e
0 5 2.0 1
1 1 4.0 3
>>> agnostic_from_numpy(pl.DataFrame(data))
shape: (2, 3)
┌─────┬─────┬─────┐
│ c ┆ d ┆ e │
│ --- ┆ --- ┆ --- │
│ i16 ┆ f32 ┆ i8 │
╞═════╪═════╪═════╡
│ 5 ┆ 2.0 ┆ 1 │
│ 1 ┆ 4.0 ┆ 3 │
└─────┴─────┴─────┘
>>> agnostic_from_numpy(pa.table(data))
pyarrow.Table
c: int16
d: float
e: int8
----
c: [[5,1]]
d: [[2,4]]
e: [[1,3]]
generate_temporary_column_name(n_bytes: int, columns: list[str]) -> str
Generates a unique column name that is not present in the given list of columns.
It relies on Python's `secrets.token_hex` function to generate a hexadecimal string from `n_bytes` random bytes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| n_bytes | int | The number of bytes to generate for the token. | required |
| columns | list[str] | The list of columns to check for uniqueness. | required |

Returns:

| Type | Description |
|---|---|
| str | A unique token that is not present in the given list of columns. |

Raises:

| Type | Description |
|---|---|
| AssertionError | If a unique token cannot be generated after 100 attempts. |
Examples:
>>> import narwhals as nw
>>> columns = ["abc", "xyz"]
>>> nw.generate_temporary_column_name(n_bytes=8, columns=columns) not in columns
True
get_level(obj: DataFrame[Any] | LazyFrame[Any] | Series[IntoSeriesT]) -> Literal['full', 'lazy', 'interchange']
Level of support Narwhals has for the current object.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| obj | DataFrame[Any] \| LazyFrame[Any] \| Series[IntoSeriesT] | Dataframe or Series. | required |

Returns:

| Type | Description |
|---|---|
| Literal['full', 'lazy', 'interchange'] | This can be one of 'full', 'lazy', or 'interchange'. |
get_native_namespace(obj: DataFrame[Any] | LazyFrame[Any] | Series[Any] | pd.DataFrame | pd.Series | pl.DataFrame | pl.LazyFrame | pl.Series | pa.Table | pa.ChunkedArray) -> Any
Get native namespace from object.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| obj | DataFrame[Any] \| LazyFrame[Any] \| Series[Any] \| pd.DataFrame \| pd.Series \| pl.DataFrame \| pl.LazyFrame \| pl.Series \| pa.Table \| pa.ChunkedArray | Dataframe, Lazyframe, or Series. | required |

Returns:

| Type | Description |
|---|---|
| Any | Native module. |
Examples:
>>> import polars as pl
>>> import pandas as pd
>>> import narwhals as nw
>>> df = nw.from_native(pd.DataFrame({"a": [1, 2, 3]}))
>>> nw.get_native_namespace(df)
<module 'pandas'...>
>>> df = nw.from_native(pl.DataFrame({"a": [1, 2, 3]}))
>>> nw.get_native_namespace(df)
<module 'polars'...>
is_ordered_categorical(series: Series[Any]) -> bool
Return whether indices of categories are semantically meaningful.
This is a convenience function for accessing what would otherwise be the `is_ordered` property from the DataFrame Interchange Protocol; see https://data-apis.org/dataframe-protocol/latest/API.html.
- For Polars:
  - Enums are always ordered.
  - Categoricals are ordered if `dtype.ordering == "physical"`.
- For pandas-like APIs:
  - Categoricals are ordered if `dtype.cat.ordered == True`.
- For PyArrow table:
  - Categoricals are ordered if `dtype.type.ordered == True`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| series | Series[Any] | Input Series. | required |

Returns:

| Type | Description |
|---|---|
| bool | Whether the Series is an ordered categorical. |
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> import polars as pl
>>> data = ["x", "y"]
>>> s_pd = pd.Series(data, dtype=pd.CategoricalDtype(ordered=True))
>>> s_pl = pl.Series(data, dtype=pl.Categorical(ordering="physical"))
Let's define a library-agnostic function:
>>> @nw.narwhalify
... def func(s):
... return nw.is_ordered_categorical(s)
Then, we can pass any supported library to `func`:
>>> func(s_pd)
True
>>> func(s_pl)
True
len() -> Expr
Return the number of rows.
Returns:

| Type | Description |
|---|---|
| Expr | A new expression. |
Examples:
>>> import polars as pl
>>> import pandas as pd
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2], "b": [5, 10]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_len(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.len()).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_len`:
>>> agnostic_len(df_pd)
len
0 2
>>> agnostic_len(df_pl)
shape: (1, 1)
┌─────┐
│ len │
│ --- │
│ u32 │
╞═════╡
│ 2 │
└─────┘
>>> agnostic_len(df_pa)
pyarrow.Table
len: int64
----
len: [[2]]
lit(value: Any, dtype: DType | type[DType] | None = None) -> Expr
Return an expression representing a literal value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| value | Any | The value to use as literal. | required |
| dtype | DType \| type[DType] \| None | The data type of the literal value. If not provided, the data type will be inferred. | None |

Returns:

| Type | Description |
|---|---|
| Expr | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2]}
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data)
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function:
>>> def agnostic_lit(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.with_columns(nw.lit(3)).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_lit`:
>>> agnostic_lit(df_pd)
a literal
0 1 3
1 2 3
>>> agnostic_lit(df_pl)
shape: (2, 2)
┌─────┬─────────┐
│ a ┆ literal │
│ --- ┆ --- │
│ i64 ┆ i32 │
╞═════╪═════════╡
│ 1 ┆ 3 │
│ 2 ┆ 3 │
└─────┴─────────┘
>>> agnostic_lit(df_pa)
pyarrow.Table
a: int64
literal: int64
----
a: [[1,2]]
literal: [[3,3]]
max(*columns: str) -> Expr
Return the maximum value.
Note
Syntactic sugar for `nw.col(columns).max()`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| columns | str | Name(s) of the columns to use in the aggregation function. | () |

Returns:

| Type | Description |
|---|---|
| Expr | A new expression. |
Examples:
>>> import polars as pl
>>> import pandas as pd
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2], "b": [5, 10]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_max(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.max("a")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_max`:
>>> agnostic_max(df_pd)
a
0 2
>>> agnostic_max(df_pl)
shape: (1, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 2 │
└─────┘
>>> agnostic_max(df_pa)
pyarrow.Table
a: int64
----
a: [[2]]
max_horizontal(*exprs: IntoExpr | Iterable[IntoExpr]) -> Expr
Get the maximum value horizontally across columns.
Notes
We support `max_horizontal` over numeric columns only.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| exprs | IntoExpr \| Iterable[IntoExpr] | Name(s) of the columns to use in the aggregation function. Accepts expression input. | () |

Returns:

| Type | Description |
|---|---|
| Expr | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
... "a": [1, 8, 3],
... "b": [4, 5, None],
... "c": ["x", "y", "z"],
... }
We define a dataframe-agnostic function that computes the horizontal max of "a" and "b" columns:
>>> def agnostic_max_horizontal(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.max_horizontal("a", "b")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_max_horizontal`:
>>> agnostic_max_horizontal(pd.DataFrame(data))
a
0 4.0
1 8.0
2 3.0
>>> agnostic_max_horizontal(pl.DataFrame(data))
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 4 │
│ 8 │
│ 3 │
└─────┘
>>> agnostic_max_horizontal(pa.table(data))
pyarrow.Table
a: int64
----
a: [[4,8,3]]
maybe_align_index(lhs: FrameOrSeriesT, rhs: Series[Any] | DataFrame[Any] | LazyFrame[Any]) -> FrameOrSeriesT
Align `lhs` to the Index of `rhs`, if they're both pandas-like.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| lhs | FrameOrSeriesT | Dataframe or Series. | required |
| rhs | Series[Any] \| DataFrame[Any] \| LazyFrame[Any] | Dataframe or Series to align with. | required |

Returns:

| Type | Description |
|---|---|
| FrameOrSeriesT | Same type as input. |
Notes
This is only really intended for backwards-compatibility purposes,
for example if your library already aligns indices for users.
If you're designing a new library, we highly encourage you to not
rely on the Index.
For non-pandas-like inputs, this only checks that `lhs` and `rhs` are the same length.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df_pd = pd.DataFrame({"a": [1, 2]}, index=[3, 4])
>>> s_pd = pd.Series([6, 7], index=[4, 3])
>>> df = nw.from_native(df_pd)
>>> s = nw.from_native(s_pd, series_only=True)
>>> nw.to_native(nw.maybe_align_index(df, s))
a
4 2
3 1
maybe_convert_dtypes(obj: FrameOrSeriesT, *args: bool, **kwargs: bool | str) -> FrameOrSeriesT
Convert columns or Series to the best possible dtypes using dtypes supporting `pd.NA`, if `obj` is pandas-like.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| obj | FrameOrSeriesT | DataFrame or Series. | required |
| *args | bool | Additional arguments which get passed through. | () |
| **kwargs | bool \| str | Additional arguments which get passed through. | {} |

Returns:

| Type | Description |
|---|---|
| FrameOrSeriesT | Same type as input. |
Notes
For non-pandas-like inputs, this is a no-op.
Also, `args` and `kwargs` just get passed down to the underlying library as-is.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> import numpy as np
>>> df_pd = pd.DataFrame(
... {
... "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
... "b": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
... }
... )
>>> df = nw.from_native(df_pd)
>>> nw.to_native(
... nw.maybe_convert_dtypes(df)
... ).dtypes
a Int32
b boolean
dtype: object
maybe_get_index(obj: DataFrame[Any] | LazyFrame[Any] | Series[Any]) -> Any | None
Get the index of a DataFrame or a Series, if it's pandas-like.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| obj | DataFrame[Any] \| LazyFrame[Any] \| Series[Any] | Dataframe or Series. | required |

Returns:

| Type | Description |
|---|---|
| Any \| None | The index of the input if it is pandas-like, otherwise None. |
Notes
This is only really intended for backwards-compatibility purposes,
for example if your library already aligns indices for users.
If you're designing a new library, we highly encourage you to not
rely on the Index.
For non-pandas-like inputs, this returns None.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df_pd = pd.DataFrame({"a": [1, 2], "b": [4, 5]})
>>> df = nw.from_native(df_pd)
>>> nw.maybe_get_index(df)
RangeIndex(start=0, stop=2, step=1)
>>> series_pd = pd.Series([1, 2])
>>> series = nw.from_native(series_pd, series_only=True)
>>> nw.maybe_get_index(series)
RangeIndex(start=0, stop=2, step=1)
maybe_reset_index(obj: FrameOrSeriesT) -> FrameOrSeriesT
Reset the index to the default integer index of a DataFrame or a Series, if it's pandas-like.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| obj | FrameOrSeriesT | Dataframe or Series. | required |

Returns:

| Type | Description |
|---|---|
| FrameOrSeriesT | Same type as input. |
Notes
This is only really intended for backwards-compatibility purposes, for example if your library already resets the index for users. If you're designing a new library, we highly encourage you to not rely on the Index. For non-pandas-like inputs, this is a no-op.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df_pd = pd.DataFrame({"a": [1, 2], "b": [4, 5]}, index=([6, 7]))
>>> df = nw.from_native(df_pd)
>>> nw.to_native(nw.maybe_reset_index(df))
a b
0 1 4
1 2 5
>>> series_pd = pd.Series([1, 2])
>>> series = nw.from_native(series_pd, series_only=True)
>>> nw.to_native(nw.maybe_reset_index(series))
0    1
1    2
dtype: int64
maybe_set_index(obj: FrameOrSeriesT, column_names: str | list[str] | None = None, *, index: Series[IntoSeriesT] | list[Series[IntoSeriesT]] | None = None) -> FrameOrSeriesT
Set the index of a DataFrame or a Series, if it's pandas-like.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| obj | FrameOrSeriesT | Object for which to maybe set the index (can be either a Narwhals DataFrame or Series). | required |
| column_names | str \| list[str] \| None | Name or list of names of the columns to set as index. For dataframes, only one of `column_names` and `index` can be specified. | None |
| index | Series[IntoSeriesT] \| list[Series[IntoSeriesT]] \| None | Series or list of series to set as index. | None |

Returns:

| Type | Description |
|---|---|
| FrameOrSeriesT | Same type as input. |

Raises:

| Type | Description |
|---|---|
| ValueError | If an invalid combination of `column_names` and `index` is passed. |
Notes
This is only really intended for backwards-compatibility purposes, for example if your library already aligns indices for users. If you're designing a new library, we highly encourage you to not rely on the Index.
For non-pandas-like inputs, this is a no-op.
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import narwhals as nw
>>> df_pd = pd.DataFrame({"a": [1, 2], "b": [4, 5]})
>>> df = nw.from_native(df_pd)
>>> nw.to_native(nw.maybe_set_index(df, "b"))
a
b
4 1
5 2
mean(*columns: str) -> Expr
Get the mean value.
Note
Syntactic sugar for `nw.col(columns).mean()`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`columns` | `str` | Name(s) of the columns to use in the aggregation function. | () |
Returns:
Type | Description |
---|---|
`Expr` | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 8, 3]}
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data)
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function:
>>> def agnostic_mean(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.mean("a")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_mean`:
>>> agnostic_mean(df_pd)
a
0 4.0
>>> agnostic_mean(df_pl)
shape: (1, 1)
┌─────┐
│ a │
│ --- │
│ f64 │
╞═════╡
│ 4.0 │
└─────┘
>>> agnostic_mean(df_pa)
pyarrow.Table
a: double
----
a: [[4]]
mean_horizontal(*exprs: IntoExpr | Iterable[IntoExpr]) -> Expr
Compute the mean of all values horizontally across columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`exprs` | `IntoExpr \| Iterable[IntoExpr]` | Name(s) of the columns to use in the aggregation function. Accepts expression input. | () |
Returns:
Type | Description |
---|---|
`Expr` | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
... "a": [1, 8, 3],
... "b": [4, 5, None],
... "c": ["x", "y", "z"],
... }
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data)
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function that computes the horizontal mean of "a" and "b" columns:
>>> def agnostic_mean_horizontal(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.mean_horizontal("a", "b")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_mean_horizontal`:
>>> agnostic_mean_horizontal(df_pd)
a
0 2.5
1 6.5
2 3.0
>>> agnostic_mean_horizontal(df_pl)
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ f64 │
╞═════╡
│ 2.5 │
│ 6.5 │
│ 3.0 │
└─────┘
>>> agnostic_mean_horizontal(df_pa)
pyarrow.Table
a: double
----
a: [[2.5,6.5,3]]
median(*columns: str) -> Expr
Get the median value.
Notes
- Syntactic sugar for `nw.col(columns).median()`.
- Results might slightly differ across backends due to differences in the underlying algorithms used to compute the median.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`columns` | `str` | Name(s) of the columns to use in the aggregation function. | () |
Returns:
Type | Description |
---|---|
`Expr` | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [4, 5, 2]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_median(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.median("a")).to_native()
We can then pass any supported library such as pandas, Polars, or PyArrow to `agnostic_median`:
>>> agnostic_median(df_pd)
a
0 4.0
>>> agnostic_median(df_pl)
shape: (1, 1)
┌─────┐
│ a │
│ --- │
│ f64 │
╞═════╡
│ 4.0 │
└─────┘
>>> agnostic_median(df_pa)
pyarrow.Table
a: double
----
a: [[4]]
min(*columns: str) -> Expr
Return the minimum value.
Note
Syntactic sugar for `nw.col(columns).min()`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`columns` | `str` | Name(s) of the columns to use in the aggregation function. | () |
Returns:
Type | Description |
---|---|
`Expr` | A new expression. |
Examples:
>>> import polars as pl
>>> import pandas as pd
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2], "b": [5, 10]}
>>> df_pd = pd.DataFrame(data)
>>> df_pl = pl.DataFrame(data)
>>> df_pa = pa.table(data)
Let's define a dataframe-agnostic function:
>>> def agnostic_min(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.min("b")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_min`:
>>> agnostic_min(df_pd)
b
0 5
>>> agnostic_min(df_pl)
shape: (1, 1)
┌─────┐
│ b │
│ --- │
│ i64 │
╞═════╡
│ 5 │
└─────┘
>>> agnostic_min(df_pa)
pyarrow.Table
b: int64
----
b: [[5]]
min_horizontal(*exprs: IntoExpr | Iterable[IntoExpr]) -> Expr
Get the minimum value horizontally across columns.
Notes
We support `min_horizontal` over numeric columns only.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`exprs` | `IntoExpr \| Iterable[IntoExpr]` | Name(s) of the columns to use in the aggregation function. Accepts expression input. | () |
Returns:
Type | Description |
---|---|
`Expr` | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {
... "a": [1, 8, 3],
... "b": [4, 5, None],
... "c": ["x", "y", "z"],
... }
We define a dataframe-agnostic function that computes the horizontal min of "a" and "b" columns:
>>> def agnostic_min_horizontal(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.min_horizontal("a", "b")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_min_horizontal`:
>>> agnostic_min_horizontal(pd.DataFrame(data))
a
0 1.0
1 5.0
2 3.0
>>> agnostic_min_horizontal(pl.DataFrame(data))
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 5 │
│ 3 │
└─────┘
>>> agnostic_min_horizontal(pa.table(data))
pyarrow.Table
a: int64
----
a: [[1,5,3]]
narwhalify(func: Callable[..., Any] | None = None, *, strict: bool | None = None, pass_through: bool | None = None, eager_only: bool = False, series_only: bool = False, allow_series: bool | None = True) -> Callable[..., Any]
Decorate function so it becomes dataframe-agnostic.
This will try to convert any dataframe/series-like object into the Narwhals
respective DataFrame/Series, while leaving the other parameters as they are.
Similarly, if the output of the function is a Narwhals DataFrame or Series, it will be
converted back to the original dataframe/series type, while if the output is another
type it will be left as is.
If you set `pass_through=False`, then every input and every output will be required to be a dataframe/series-like object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`func` | `Callable[..., Any] \| None` | Function to wrap. | None |
`strict` | `bool \| None` | Deprecated (v1.13.0): please use `pass_through` instead. Determine what happens if the object can't be converted to Narwhals. | None |
`pass_through` | `bool \| None` | Determine what happens if the object can't be converted to Narwhals. | None |
`eager_only` | `bool` | Whether to only allow eager objects. | False |
`series_only` | `bool` | Whether to only allow Series. | False |
`allow_series` | `bool \| None` | Whether to allow Series (default is only DataFrame / LazyFrame). | True |
Returns:
Type | Description |
---|---|
`Callable[..., Any]` | Decorated function. |
Examples:
Instead of writing
>>> import narwhals as nw
>>> def agnostic_group_by_sum(df):
... df = nw.from_native(df, pass_through=True)
... df = df.group_by("a").agg(nw.col("b").sum())
... return nw.to_native(df)
you can just write
>>> @nw.narwhalify
... def agnostic_group_by_sum(df):
... return df.group_by("a").agg(nw.col("b").sum())
new_series(name: str, values: Any, dtype: DType | type[DType] | None = None, *, native_namespace: ModuleType) -> Series[Any]
Instantiate a Narwhals Series from an iterable (e.g. a list or array).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | Name of resulting Series. | required |
`values` | `Any` | Values to make the Series from. | required |
`dtype` | `DType \| type[DType] \| None` | (Narwhals) dtype. If not provided, the native library may auto-infer it from `values`. | None |
`native_namespace` | `ModuleType` | The native library to use for Series creation. | required |
Returns:
Type | Description |
---|---|
`Series[Any]` | A new Series. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT, IntoSeriesT
>>> data = {"a": [1, 2, 3], "b": [4, 5, 6]}
Let's define a dataframe-agnostic function:
>>> def agnostic_new_series(df_native: IntoFrameT) -> IntoSeriesT:
... values = [4, 1, 2, 3]
... native_namespace = nw.get_native_namespace(df_native)
... return nw.new_series(
... name="a",
... values=values,
... dtype=nw.Int32,
... native_namespace=native_namespace,
... ).to_native()
We can then pass any supported eager library, such as pandas / Polars / PyArrow:
>>> agnostic_new_series(pd.DataFrame(data))
0 4
1 1
2 2
3 3
Name: a, dtype: int32
>>> agnostic_new_series(pl.DataFrame(data))
shape: (4,)
Series: 'a' [i32]
[
4
1
2
3
]
>>> agnostic_new_series(pa.table(data))
<pyarrow.lib.ChunkedArray object at ...>
[
[
4,
1,
2,
3
]
]
nth(*indices: int | Sequence[int]) -> Expr
Creates an expression that references one or more columns by their index(es).
Notes
`nth` is not supported for Polars versions below 1.0.0. Please use `narwhals.col` instead.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`indices` | `int \| Sequence[int]` | One or more indices representing the columns to retrieve. | () |
Returns:
Type | Description |
---|---|
`Expr` | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data)
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function:
>>> def agnostic_nth(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.nth(0) * 2).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_nth`:
>>> agnostic_nth(df_pd)
a
0 2
1 4
>>> agnostic_nth(df_pl)
shape: (2, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 2 │
│ 4 │
└─────┘
>>> agnostic_nth(df_pa)
pyarrow.Table
a: int64
----
a: [[2,4]]
read_csv(source: str, *, native_namespace: ModuleType, **kwargs: Any) -> DataFrame[Any]
Read a CSV file into a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`source` | `str` | Path to a file. | required |
`native_namespace` | `ModuleType` | The native library to use for DataFrame creation. | required |
`kwargs` | `Any` | Extra keyword arguments which are passed to the native CSV reader. For example, you could pass `engine="pyarrow"` when the native namespace is pandas. | {} |
Returns:
Type | Description |
---|---|
`DataFrame[Any]` | DataFrame. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from types import ModuleType
Let's create an agnostic function that reads a csv file with a specified native namespace:
>>> def agnostic_read_csv(native_namespace: ModuleType) -> IntoDataFrame:
... return nw.read_csv(
... "file.csv", native_namespace=native_namespace
... ).to_native()
Then we can read the file by passing pandas, Polars or PyArrow namespaces:
>>> agnostic_read_csv(native_namespace=pd)
a b
0 1 4
1 2 5
2 3 6
>>> agnostic_read_csv(native_namespace=pl)
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
>>> agnostic_read_csv(native_namespace=pa)
pyarrow.Table
a: int64
b: int64
----
a: [[1,2,3]]
b: [[4,5,6]]
read_parquet(source: str, *, native_namespace: ModuleType, **kwargs: Any) -> DataFrame[Any]
Read into a DataFrame from a parquet file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`source` | `str` | Path to a file. | required |
`native_namespace` | `ModuleType` | The native library to use for DataFrame creation. | required |
`kwargs` | `Any` | Extra keyword arguments which are passed to the native parquet reader. For example, you could pass `engine="pyarrow"` when the native namespace is pandas. | {} |
Returns:
Type | Description |
---|---|
`DataFrame[Any]` | DataFrame. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> from types import ModuleType
Let's create an agnostic function that reads a parquet file with a specified native namespace:
>>> def agnostic_read_parquet(native_namespace: ModuleType) -> IntoDataFrame:
... return nw.read_parquet(
... "file.parquet", native_namespace=native_namespace
... ).to_native()
Then we can read the file by passing pandas, Polars or PyArrow namespaces:
>>> agnostic_read_parquet(native_namespace=pd)
a b
0 1 4
1 2 5
2 3 6
>>> agnostic_read_parquet(native_namespace=pl)
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
>>> agnostic_read_parquet(native_namespace=pa)
pyarrow.Table
a: int64
b: int64
----
a: [[1,2,3]]
b: [[4,5,6]]
scan_csv(source: str, *, native_namespace: ModuleType, **kwargs: Any) -> LazyFrame[Any]
Lazily read from a CSV file.
For the libraries that do not support lazy dataframes, the function reads a csv file eagerly and then converts the resulting dataframe to a lazyframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`source` | `str` | Path to a file. | required |
`native_namespace` | `ModuleType` | The native library to use for DataFrame creation. | required |
`kwargs` | `Any` | Extra keyword arguments which are passed to the native CSV reader. For example, you could pass `separator=";"` when the native namespace is Polars. | {} |
Returns:
Type | Description |
---|---|
`LazyFrame[Any]` | LazyFrame. |
Examples:
>>> import dask.dataframe as dd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrame
>>> from types import ModuleType
Let's create an agnostic function that lazily reads a csv file with a specified native namespace:
>>> def agnostic_scan_csv(native_namespace: ModuleType) -> IntoFrame:
... return nw.scan_csv(
... "file.csv", native_namespace=native_namespace
... ).to_native()
Then we can read the file by passing, for example, Polars or Dask namespaces:
>>> agnostic_scan_csv(native_namespace=pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
>>> agnostic_scan_csv(native_namespace=dd).compute()
a b
0 1 4
1 2 5
2 3 6
scan_parquet(source: str, *, native_namespace: ModuleType, **kwargs: Any) -> LazyFrame[Any]
Lazily read from a parquet file.
For the libraries that do not support lazy dataframes, the function reads a parquet file eagerly and then converts the resulting dataframe to a lazyframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`source` | `str` | Path to a file. | required |
`native_namespace` | `ModuleType` | The native library to use for DataFrame creation. | required |
`kwargs` | `Any` | Extra keyword arguments which are passed to the native parquet reader. For example, you could pass `n_rows=10` when the native namespace is Polars. | {} |
Returns:
Type | Description |
---|---|
`LazyFrame[Any]` | LazyFrame. |
Examples:
>>> import dask.dataframe as dd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrame
>>> from types import ModuleType
Let's create an agnostic function that lazily reads a parquet file with a specified native namespace:
>>> def agnostic_scan_parquet(native_namespace: ModuleType) -> IntoFrame:
... return nw.scan_parquet(
... "file.parquet", native_namespace=native_namespace
... ).to_native()
Then we can read the file by passing, for example, Polars or Dask namespaces:
>>> agnostic_scan_parquet(native_namespace=pl).collect()
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
>>> agnostic_scan_parquet(native_namespace=dd).compute()
a b
0 1 4
1 2 5
2 3 6
sum(*columns: str) -> Expr
Sum all values.
Note
Syntactic sugar for `nw.col(columns).sum()`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`columns` | `str` | Name(s) of the columns to use in the aggregation function. | () |
Returns:
Type | Description |
---|---|
`Expr` | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2]}
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data)
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function:
>>> def agnostic_sum(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.sum("a")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_sum`:
>>> agnostic_sum(df_pd)
a
0 3
>>> agnostic_sum(df_pl)
shape: (1, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 3 │
└─────┘
>>> agnostic_sum(df_pa)
pyarrow.Table
a: int64
----
a: [[3]]
sum_horizontal(*exprs: IntoExpr | Iterable[IntoExpr]) -> Expr
Sum all values horizontally across columns.
Warning
Unlike Polars, we support horizontal sum over numeric columns only.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`exprs` | `IntoExpr \| Iterable[IntoExpr]` | Name(s) of the columns to use in the aggregation function. Accepts expression input. | () |
Returns:
Type | Description |
---|---|
`Expr` | A new expression. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2, 3], "b": [5, 10, None]}
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data)
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function:
>>> def agnostic_sum_horizontal(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.select(nw.sum_horizontal("a", "b")).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_sum_horizontal`:
>>> agnostic_sum_horizontal(df_pd)
a
0 6.0
1 12.0
2 3.0
>>> agnostic_sum_horizontal(df_pl)
shape: (3, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 6 │
│ 12 │
│ 3 │
└─────┘
>>> agnostic_sum_horizontal(df_pa)
pyarrow.Table
a: int64
----
a: [[6,12,3]]
show_versions() -> None
Print useful debugging information.
Examples:
>>> from narwhals import show_versions
>>> show_versions()
to_native(narwhals_object: DataFrame[IntoDataFrameT] | LazyFrame[IntoFrameT] | Series[IntoSeriesT], *, strict: bool | None = None, pass_through: bool | None = None) -> IntoDataFrameT | IntoFrameT | IntoSeriesT | Any
Convert Narwhals object to native one.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`narwhals_object` | `DataFrame[IntoDataFrameT] \| LazyFrame[IntoFrameT] \| Series[IntoSeriesT]` | Narwhals object. | required |
`strict` | `bool \| None` | Deprecated (v1.13.0): please use `pass_through` instead. Determine what happens if `narwhals_object` is not a Narwhals class. | None |
`pass_through` | `bool \| None` | Determine what happens if `narwhals_object` is not a Narwhals class. | None |
Returns:
Type | Description |
---|---|
`IntoDataFrameT \| IntoFrameT \| IntoSeriesT \| Any` | Object of the class that the user started with. |
to_py_scalar(scalar_like: Any) -> Any
If a scalar is not Python-native, convert it to a Python-native scalar.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`scalar_like` | `Any` | Scalar-like value. | required |
Returns:
Type | Description |
---|---|
`Any` | Python scalar. |
Raises:
Type | Description |
---|---|
`ValueError` | If the object is not convertible to a scalar. |
Examples:
>>> import narwhals as nw
>>> import pandas as pd
>>> df = nw.from_native(pd.DataFrame({"a": [1, 2, 3]}))
>>> nw.to_py_scalar(df["a"].item(0))
1
>>> import pyarrow as pa
>>> df = nw.from_native(pa.table({"a": [1, 2, 3]}))
>>> nw.to_py_scalar(df["a"].item(0))
1
>>> nw.to_py_scalar(1)
1
when(*predicates: IntoExpr | Iterable[IntoExpr]) -> When
Start a `when-then-otherwise` expression.
Expression similar to an `if-else` statement in Python. Always initiated by a `nw.when(<condition>).then(<value if condition>)`, and optionally followed by chaining one or more `.when(<condition>).then(<value>)` statements.
Chained when-then operations should be read as Python `if, elif, ... elif` blocks, not as `if, if, ... if`, i.e. the first condition that evaluates to `True` will be picked.
If none of the conditions are `True`, an optional `.otherwise(<value if all statements are false>)` can be appended at the end. If not appended, and none of the conditions are `True`, `None` will be returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`predicates` | `IntoExpr \| Iterable[IntoExpr]` | Condition(s) that must be met in order to apply the subsequent statement. Accepts one or more boolean expressions, which are implicitly combined with `&`. | () |
Returns:
Type | Description |
---|---|
`When` | A "when" object, on which `.then` can be called. |
Examples:
>>> import pandas as pd
>>> import polars as pl
>>> import pyarrow as pa
>>> import narwhals as nw
>>> from narwhals.typing import IntoFrameT
>>>
>>> data = {"a": [1, 2, 3], "b": [5, 10, 15]}
>>> df_pl = pl.DataFrame(data)
>>> df_pd = pd.DataFrame(data)
>>> df_pa = pa.table(data)
We define a dataframe-agnostic function:
>>> def agnostic_when_then_otherwise(df_native: IntoFrameT) -> IntoFrameT:
... df = nw.from_native(df_native)
... return df.with_columns(
... nw.when(nw.col("a") < 3).then(5).otherwise(6).alias("a_when")
... ).to_native()
We can pass any supported library such as pandas, Polars, or PyArrow to `agnostic_when_then_otherwise`:
>>> agnostic_when_then_otherwise(df_pd)
a b a_when
0 1 5 5
1 2 10 5
2 3 15 6
>>> agnostic_when_then_otherwise(df_pl)
shape: (3, 3)
┌─────┬─────┬────────┐
│ a ┆ b ┆ a_when │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i32 │
╞═════╪═════╪════════╡
│ 1 ┆ 5 ┆ 5 │
│ 2 ┆ 10 ┆ 5 │
│ 3 ┆ 15 ┆ 6 │
└─────┴─────┴────────┘
>>> agnostic_when_then_otherwise(df_pa)
pyarrow.Table
a: int64
b: int64
a_when: int64
----
a: [[1,2,3]]
b: [[5,10,15]]
a_when: [[5,5,6]]