DataFrame
To write a dataframe-agnostic function, the steps you'll want to follow are:
-
Initialise a Narwhals DataFrame or LazyFrame by passing your dataframe to
nw.from_native
. All the calculations stay lazy if we start with a lazy dataframe - Narwhals will never automatically trigger computation without you asking it to.Note: if you need eager execution, make sure to pass
eager_only=True
tonw.from_native
. -
Express your logic using the subset of the Polars API supported by Narwhals.
- If you need to return a dataframe to the user in its original library, call
nw.to_native
.
Let's try writing a simple example.
Example 1: descriptive statistics
Just like in Polars, we can pass expressions to
DataFrame.select
or LazyFrame.select
.
Make a Python file with the following content:
import narwhals as nw
def func(df_any):
# 1. Create a Narwhals dataframe
df = nw.from_native(df_any)
# 2. Use the subset of the Polars API supported by Narwhals
df = df.select(
a_sum=nw.col('a').sum(),
a_mean=nw.col('a').mean(),
a_std=nw.col('a').std(),
)
# 3. Return a library from the user's original library
return nw.to_native(df)
import pandas as pd
df = pd.DataFrame({'a': [1, 1, 2]})
print(func(df))
a_sum a_mean a_std
0 4 1.333333 0.57735
import polars as pl
df = pl.DataFrame({'a': [1, 1, 2]})
print(func(df))
shape: (1, 3)
┌───────┬──────────┬─────────┐
│ a_sum ┆ a_mean ┆ a_std │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 │
╞═══════╪══════════╪═════════╡
│ 4 ┆ 1.333333 ┆ 0.57735 │
└───────┴──────────┴─────────┘
import polars as pl
df = pl.LazyFrame({'a': [1, 1, 2]})
print(func(df).collect())
shape: (1, 3)
┌───────┬──────────┬─────────┐
│ a_sum ┆ a_mean ┆ a_std │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 │
╞═══════╪══════════╪═════════╡
│ 4 ┆ 1.333333 ┆ 0.57735 │
└───────┴──────────┴─────────┘
Example 2: group-by and mean
Make a Python file with the following content:
import narwhals as nw
def func(df_any):
# 1. Create a Narwhals dataframe
df = nw.from_native(df_any)
# 2. Use the subset of the Polars API supported by Narwhals
df = df.group_by('a').agg(nw.col('b').mean()).sort('a')
# 3. Return a library from the user's original library
return nw.to_native(df)
import pandas as pd
df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df))
a b
0 1 4.5
1 2 6.0
import polars as pl
df = pl.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df))
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1 ┆ 4.5 │
│ 2 ┆ 6.0 │
└─────┴─────┘
import polars as pl
df = pl.LazyFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df).collect())
shape: (2, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1 ┆ 4.5 │
│ 2 ┆ 6.0 │
└─────┴─────┘
Example 3: horizontal sum
Expressions can be free-standing functions which accept other
expressions as inputs. For example, we can compute a horizontal
sum using nw.sum_horizontal
.
Make a Python file with the following content:
import narwhals as nw
def func(df_any):
# 1. Create a Narwhals dataframe
df = nw.from_native(df_any)
# 2. Use the subset of the Polars API supported by Narwhals
df = df.with_columns(a_plus_b=nw.sum_horizontal('a', 'b'))
# 3. Return a library from the user's original library
return nw.to_native(df)
import pandas as pd
df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df))
a b a_plus_b
0 1 4 5
1 1 5 6
2 2 6 8
import polars as pl
df = pl.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df))
shape: (3, 3)
┌─────┬─────┬──────────┐
│ a ┆ b ┆ a_plus_b │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪══════════╡
│ 1 ┆ 4 ┆ 5 │
│ 1 ┆ 5 ┆ 6 │
│ 2 ┆ 6 ┆ 8 │
└─────┴─────┴──────────┘
import polars as pl
df = pl.LazyFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
print(func(df).collect())
shape: (3, 3)
┌─────┬─────┬──────────┐
│ a ┆ b ┆ a_plus_b │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪══════════╡
│ 1 ┆ 4 ┆ 5 │
│ 1 ┆ 5 ┆ 6 │
│ 2 ┆ 6 ┆ 8 │
└─────┴─────┴──────────┘