narwhals.LazyFrame
Narwhals LazyFrame, backed by a native lazyframe.
Warning
This class is not meant to be instantiated directly - instead, use narwhals.from_native with a native object that is a lazy dataframe from one of the supported backends (e.g. polars.LazyFrame, dask_expr._collection.DataFrame):
narwhals.from_native(native_lazyframe)
columns
property
columns: list[str]
Get column names.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).columns
['a', 'b']
implementation
property
implementation: Implementation
Return implementation of native frame.
This can be useful when you need to use special-casing for features outside of Narwhals' scope - for example, when dealing with pandas' Period Dtype.
Examples:
>>> import narwhals as nw
>>> import dask.dataframe as dd
>>> lf_native = dd.from_dict({"a": [1, 2]}, npartitions=1)
>>> nw.from_native(lf_native).implementation
<Implementation.DASK: 'dask'>
schema
property
schema: Schema
Get an ordered mapping of column names to their data type.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).schema
Schema({'a': Int32, 'b': Decimal})
collect
collect(
backend: (
IntoBackend[Polars | Pandas | Arrow] | None
) = None,
**kwargs: Any
) -> DataFrame[Any]
Materialize this LazyFrame into a DataFrame.
As each underlying lazyframe has different arguments to set when materializing it into a dataframe, we allow passing them as kwargs (see the examples below for how to generalize the specification).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
backend | IntoBackend[Polars \| Pandas \| Arrow] \| None | Specifies which eager backend to collect to. This will be the underlying backend for the resulting Narwhals DataFrame. If None, a backend-specific default conversion is applied (for example, a DuckDB relation collects to PyArrow, as in the example below). | None |
kwargs | Any | Backend-specific kwargs to pass along. For details, check the backend-specific documentation. | {} |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> lf = nw.from_native(lf_native)
>>> lf
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│ a │ b │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│ 1 │ 2 │ |
|│ 3 │ 4 │ |
|└───────┴───────┘ |
└──────────────────┘
>>> lf.collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| pyarrow.Table |
| a: int32 |
| b: int32 |
| ---- |
| a: [[1,3]] |
| b: [[2,4]] |
└──────────────────┘
collect_schema
collect_schema() -> Schema
Get an ordered mapping of column names to their data type.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).collect_schema()
Schema({'a': Int32, 'b': Decimal})
drop
drop(
*columns: str | Iterable[str], strict: bool = True
) -> Self
Remove columns from the LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*columns | str \| Iterable[str] | Names of the columns that should be removed from the dataframe. | () |
strict | bool | Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema. | True |
Warning
The strict argument is ignored for polars<1.0.0. Please consider upgrading to a newer version, or switching to eager mode.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop("a").to_native()
┌───────┐
│ b │
│ int32 │
├───────┤
│ 2 │
│ 4 │
└───────┘
drop_nulls
drop_nulls(subset: str | list[str] | None = None) -> Self
Drop rows that contain null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset | str \| list[str] \| None | Column name(s) for which null values are considered. If set to None (default), use all columns. | None |
Notes
pandas handles null values differently from Polars and PyArrow. See null_handling for reference.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, NULL), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop_nulls()
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│ a │ b │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│ 3 │ 4 │ |
|└───────┴───────┘ |
└──────────────────┘
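The row-wise semantics can be sketched in plain Python, representing nulls as `None` (a simplification relative to the backend-specific null handling noted above):

```python
def drop_nulls(rows, subset=None):
    # Sketch of drop_nulls semantics: keep a row only if every considered
    # column (all columns when subset is None) is non-null.
    cols = subset if subset is not None else (list(rows[0]) if rows else [])
    return [r for r in rows if all(r[c] is not None for c in cols)]
```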
explode
explode(
columns: str | Sequence[str], *more_columns: str
) -> Self
Explode the dataframe to long format by exploding the given columns.
Notes
It is possible to explode multiple columns only if these columns have matching element counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | str \| Sequence[str] | Column names. The underlying columns being exploded must be of the List data type. | required |
*more_columns | str | Additional names of columns to explode, specified as positional arguments. | () |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
... "SELECT * FROM VALUES ('x', [1, 2]), ('y', [3, 4]), ('z', [5, 6]) df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.explode("b").to_native()
┌─────────┬───────┐
│ a │ b │
│ varchar │ int32 │
├─────────┼───────┤
│ x │ 1 │
│ x │ 2 │
│ y │ 3 │
│ y │ 4 │
│ z │ 5 │
│ z │ 6 │
└─────────┴───────┘
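The wide-to-long expansion shown above can be sketched in plain Python: each element of the exploded list column becomes its own row, with the other fields repeated.

```python
def explode(rows, key):
    # Sketch of explode semantics for a single list-valued column.
    out = []
    for row in rows:
        for item in row[key]:
            out.append({**row, key: item})
    return out
```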
filter
filter(
*predicates: IntoExpr | Iterable[IntoExpr] | list[bool],
**constraints: Any
) -> Self
Filter the rows in the LazyFrame based on a predicate expression.
The original order of the remaining rows is preserved.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*predicates | IntoExpr \| Iterable[IntoExpr] \| list[bool] | Expression that evaluates to a boolean Series. Can also be a (single!) boolean list. | () |
**constraints | Any | Column filters; use `name = value` to filter columns by the supplied value. | {} |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql('''
... SELECT * FROM VALUES
... (1, 6, 'a'),
... (2, 7, 'b'),
... (3, 8, 'c')
... df(foo, bar, ham)
... ''')
Filter on one condition
>>> nw.from_native(df_native).filter(nw.col("foo") > 1).to_native()
┌───────┬───────┬─────────┐
│ foo │ bar │ ham │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│ 2 │ 7 │ b │
│ 3 │ 8 │ c │
└───────┴───────┴─────────┘
Filter on multiple conditions with implicit &
>>> nw.from_native(df_native).filter(
... nw.col("foo") < 3, nw.col("ham") == "a"
... ).to_native()
┌───────┬───────┬─────────┐
│ foo │ bar │ ham │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│ 1 │ 6 │ a │
└───────┴───────┴─────────┘
Filter on multiple conditions with |
>>> nw.from_native(df_native).filter(
... (nw.col("foo") == 1) | (nw.col("ham") == "c")
... ).to_native()
┌───────┬───────┬─────────┐
│ foo │ bar │ ham │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│ 1 │ 6 │ a │
│ 3 │ 8 │ c │
└───────┴───────┴─────────┘
Filter using **kwargs
syntax
>>> nw.from_native(df_native).filter(foo=2, ham="b").to_native()
┌───────┬───────┬─────────┐
│ foo │ bar │ ham │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│ 2 │ 7 │ b │
└───────┴───────┴─────────┘
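The combination rules from the examples above (positional predicates and keyword constraints are all AND-ed together, with keyword constraints acting as equality filters) can be sketched in plain Python:

```python
def filter_rows(rows, *predicates, **constraints):
    # Sketch of filter semantics: predicates are callables row -> bool;
    # keyword constraints are equality filters; everything is AND-ed.
    def keep(row):
        return all(p(row) for p in predicates) and all(
            row[k] == v for k, v in constraints.items()
        )
    return [row for row in rows if keep(row)]
```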
group_by
group_by(
*keys: IntoExpr | Iterable[IntoExpr],
drop_null_keys: Literal[False] = ...
) -> LazyGroupBy[Self]
group_by(
*keys: str | Iterable[str],
drop_null_keys: Literal[True]
) -> LazyGroupBy[Self]
group_by(
*keys: IntoExpr | Iterable[IntoExpr],
drop_null_keys: bool = False
) -> LazyGroupBy[Self]
Start a group by operation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*keys | IntoExpr \| Iterable[IntoExpr] | Column(s) to group by. Accepts expression input. Strings are parsed as column names. | () |
drop_null_keys | bool | If True, groups where any key is null won't be included in the result. | False |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
... "SELECT * FROM VALUES (1, 'a'), (2, 'b'), (3, 'a') df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.group_by("b").agg(nw.col("a").sum()).sort("b").to_native()
┌─────────┬────────┐
│ b │ a │
│ varchar │ int128 │
├─────────┼────────┤
│ a │ 4 │
│ b │ 2 │
└─────────┴────────┘
Expressions are also accepted.
>>> df.group_by(nw.col("b").str.len_chars()).agg(
... nw.col("a").sum()
... ).to_native()
┌───────┬────────┐
│ b │ a │
│ int64 │ int128 │
├───────┼────────┤
│ 1 │ 6 │
└───────┴────────┘
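The first example's group-and-sum can be sketched in plain Python to make the grouping semantics concrete:

```python
from collections import defaultdict

def group_by_sum(rows, key, value):
    # Sketch of group_by(key).agg(col(value).sum()) semantics.
    sums = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
    return dict(sums)
```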
head
head(n: int = 5) -> Self
Get the first n rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Number of rows to return. | 5 |
Examples:
>>> import dask.dataframe as dd
>>> import narwhals as nw
>>> lf_native = dd.from_dict({"a": [1, 2, 3], "b": [4, 5, 6]}, npartitions=1)
>>> nw.from_native(lf_native).head(2).collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| a b |
| 0 1 4 |
| 1 2 5 |
└──────────────────┘
join
join(
other: Self,
on: str | list[str] | None = None,
how: JoinStrategy = "inner",
*,
left_on: str | list[str] | None = None,
right_on: str | list[str] | None = None,
suffix: str = "_right"
) -> Self
Add a join operation to the Logical Plan.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Self | Lazy DataFrame to join with. | required |
on | str \| list[str] \| None | Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None. | None |
how | JoinStrategy | Join strategy. | 'inner' |
left_on | str \| list[str] \| None | Join column of the left DataFrame. | None |
right_on | str \| list[str] \| None | Join column of the right DataFrame. | None |
suffix | str | Suffix to append to columns with a duplicate name. | '_right' |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native1 = duckdb.sql(
... "SELECT * FROM VALUES (1, 'a'), (2, 'b') df(a, b)"
... )
>>> df_native2 = duckdb.sql(
... "SELECT * FROM VALUES (1, 'x'), (3, 'y') df(a, c)"
... )
>>> df1 = nw.from_native(df_native1)
>>> df2 = nw.from_native(df_native2)
>>> df1.join(df2, on="a")
┌─────────────────────────────┐
| Narwhals LazyFrame |
|-----------------------------|
|┌───────┬─────────┬─────────┐|
|│ a │ b │ c │|
|│ int32 │ varchar │ varchar │|
|├───────┼─────────┼─────────┤|
|│ 1 │ a │ x │|
|└───────┴─────────┴─────────┘|
└─────────────────────────────┘
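The inner-join and suffixing semantics can be sketched in plain Python (a naive nested loop, not how any backend actually executes the join):

```python
def inner_join(left, right, on, suffix="_right"):
    # Sketch of inner-join semantics: match rows on equal keys, and append
    # `suffix` to right-hand columns whose name already exists on the left.
    out = []
    for l in left:
        for r in right:
            if l[on] == r[on]:
                row = dict(l)
                for k, v in r.items():
                    if k == on:
                        continue
                    row[k + suffix if k in l else k] = v
                out.append(row)
    return out
```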
join_asof
join_asof(
other: Self,
*,
left_on: str | None = None,
right_on: str | None = None,
on: str | None = None,
by_left: str | list[str] | None = None,
by_right: str | list[str] | None = None,
by: str | list[str] | None = None,
strategy: AsofJoinStrategy = "backward",
suffix: str = "_right"
) -> Self
Perform an asof join.
This is similar to a left join, except that we match on the nearest key rather than equal keys.
For Polars, both DataFrames must be sorted by the on key (within each by group, if specified).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Self | DataFrame to join with. | required |
left_on | str \| None | Name(s) of the left join column(s). | None |
right_on | str \| None | Name(s) of the right join column(s). | None |
on | str \| None | Join column of both DataFrames. If set, left_on and right_on should be None. | None |
by_left | str \| list[str] \| None | Join on these columns before doing the asof join. | None |
by_right | str \| list[str] \| None | Join on these columns before doing the asof join. | None |
by | str \| list[str] \| None | Join on these columns before doing the asof join. | None |
strategy | AsofJoinStrategy | Join strategy. The default is "backward". | 'backward' |
suffix | str | Suffix to append to columns with a duplicate name. | '_right' |
Examples:
>>> from datetime import datetime
>>> import polars as pl
>>> import narwhals as nw
>>> data_gdp = {
... "datetime": [
... datetime(2016, 1, 1),
... datetime(2017, 1, 1),
... datetime(2018, 1, 1),
... datetime(2019, 1, 1),
... datetime(2020, 1, 1),
... ],
... "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
... "datetime": [
... datetime(2016, 3, 1),
... datetime(2018, 8, 1),
... datetime(2019, 1, 1),
... ],
... "population": [82.19, 82.66, 83.12],
... }
>>> gdp_native = pl.DataFrame(data_gdp)
>>> population_native = pl.DataFrame(data_population)
>>> gdp = nw.from_native(gdp_native)
>>> population = nw.from_native(population_native)
>>> population.join_asof(gdp, on="datetime", strategy="backward").to_native()
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime ┆ population ┆ gdp │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i64 │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19 ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66 ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12 ┆ 4696 │
└─────────────────────┴────────────┴──────┘
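The "backward" strategy in the example (each population date picks up the most recent GDP reading at or before it) can be sketched in plain Python with a binary search over the sorted right-hand keys:

```python
from bisect import bisect_right

def asof_backward(left_keys, right_keys, right_values):
    # Sketch of "backward" asof semantics: for each left key, take the value
    # at the nearest right key that is <= it (None if no such key exists).
    # right_keys must be sorted ascending.
    out = []
    for k in left_keys:
        i = bisect_right(right_keys, k) - 1
        out.append(right_values[i] if i >= 0 else None)
    return out
```

With the example's data reduced to fractional years, the 2016-03 and 2018-08 observations match the 2016 and 2018 GDP values respectively, as in the table above.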
lazy
lazy() -> Self
Restrict available API methods to lazy-only ones.
This is a no-op, and exists only for compatibility with DataFrame.lazy.
pipe
pipe(
function: Callable[Concatenate[Self, PS], R],
*args: args,
**kwargs: kwargs
) -> R
Pipe function call.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
function | Callable[Concatenate[Self, PS], R] | Function to apply. | required |
args | args | Positional arguments to pass to function. | () |
kwargs | kwargs | Keyword arguments to pass to function. | {} |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).pipe(lambda x: x.select("a")).to_native()
┌───────┐
│ a │
│ int32 │
├───────┤
│ 1 │
│ 3 │
└───────┘
rename
rename(mapping: dict[str, str]) -> Self
Rename column names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mapping | dict[str, str] | Key-value pairs that map from old name to new name. | required |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).rename({"a": "c"})
┌────────────────────────┐
| Narwhals LazyFrame |
|------------------------|
|┌───────┬──────────────┐|
|│ c │ b │|
|│ int32 │ decimal(2,1) │|
|├───────┼──────────────┤|
|│ 1 │ 4.5 │|
|│ 3 │ 2.0 │|
|└───────┴──────────────┘|
└────────────────────────┘
select
select(
*exprs: IntoExpr | Iterable[IntoExpr],
**named_exprs: IntoExpr
) -> Self
Select columns from this LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*exprs | IntoExpr \| Iterable[IntoExpr] | Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names. | () |
**named_exprs | IntoExpr | Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used. | {} |
Notes
If you'd like to select a column whose name isn't a string (for example, if you're working with pandas) then you should explicitly use nw.col instead of just passing the column name. For example, to select a column named 0, use df.select(nw.col(0)), not df.select(0).
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).select("a", a_plus_1=nw.col("a") + 1)
┌────────────────────┐
| Narwhals LazyFrame |
|--------------------|
|┌───────┬──────────┐|
|│ a │ a_plus_1 │|
|│ int32 │ int32 │|
|├───────┼──────────┤|
|│ 1 │ 2 │|
|│ 3 │ 4 │|
|└───────┴──────────┘|
└────────────────────┘
sink_parquet
sink_parquet(file: str | Path | BytesIO) -> None
Write LazyFrame to Parquet file.
This may allow larger-than-RAM datasets to be written to disk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | str \| Path \| BytesIO | String, path object or file-like object to which the dataframe will be written. | required |
Examples:
>>> import polars as pl
>>> import narwhals as nw
>>> df_native = pl.LazyFrame({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.sink_parquet("out.parquet")
sort
sort(
by: str | Iterable[str],
*more_by: str,
descending: bool | Sequence[bool] = False,
nulls_last: bool = False
) -> Self
Sort the LazyFrame by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
by | str \| Iterable[str] | Name(s) of the column(s) to sort by. | required |
*more_by | str | Additional columns to sort by, specified as positional arguments. | () |
descending | bool \| Sequence[bool] | Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans. | False |
nulls_last | bool | Place null values last. | False |
Warning
Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead, a single boolean is applied for all by columns.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
... "SELECT * FROM VALUES (1, 6.0, 'a'), (2, 5.0, 'c'), (NULL, 4.0, 'b') df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.sort("a")
┌──────────────────────────────────┐
| Narwhals LazyFrame |
|----------------------------------|
|┌───────┬──────────────┬─────────┐|
|│ a │ b │ c │|
|│ int32 │ decimal(2,1) │ varchar │|
|├───────┼──────────────┼─────────┤|
|│ NULL │ 4.0 │ b │|
|│ 1 │ 6.0 │ a │|
|│ 2 │ 5.0 │ c │|
|└───────┴──────────────┴─────────┘|
└──────────────────────────────────┘
top_k
top_k(
k: int,
*,
by: str | Iterable[str],
reverse: bool | Sequence[bool] = False
) -> Self
Return the k largest rows.
Non-null elements are always preferred over null elements, regardless of the value of reverse. The output is not guaranteed to be in any particular order; sort the output afterwards if you need it sorted.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
k | int | Number of rows to return. | required |
by | str \| Iterable[str] | Column(s) used to determine the top rows. Accepts expression input. Strings are parsed as column names. | required |
reverse | bool \| Sequence[bool] | Consider the k smallest elements of the by column(s) (instead of the k largest). This can be specified per column by passing a sequence of booleans. | False |
Returns:
Type | Description |
---|---|
Self | The LazyFrame with the k largest rows. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
... "SELECT * FROM VALUES ('a', 2), ('b', 1), ('a', 1), ('b', 3), (NULL, 2), ('c', 1) df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.top_k(4, by=["b", "a"])
┌───────────────────┐
|Narwhals LazyFrame |
|-------------------|
|┌─────────┬───────┐|
|│ a │ b │|
|│ varchar │ int32 │|
|├─────────┼───────┤|
|│ b │ 3 │|
|│ a │ 2 │|
|│ NULL │ 2 │|
|│ c │ 1 │|
|└─────────┴───────┘|
└───────────────────┘
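The "non-null preferred over null" rule can be sketched in plain Python for a single by column (nulls represented as `None`; real backends may order ties differently):

```python
def top_k(rows, k, by, reverse=False):
    # Sketch of top_k semantics on one column: null rows are always least
    # preferred; among non-null rows take the k largest (smallest if reverse).
    non_null = [r for r in rows if r[by] is not None]
    null = [r for r in rows if r[by] is None]
    non_null.sort(key=lambda r: r[by], reverse=not reverse)
    return (non_null + null)[:k]
```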
to_native
to_native() -> LazyFrameT
Convert Narwhals LazyFrame to native one.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).to_native()
┌───────┬───────┐
│ a │ b │
│ int32 │ int32 │
├───────┼───────┤
│ 1 │ 2 │
│ 3 │ 4 │
└───────┴───────┘
unique
unique(
subset: str | list[str] | None = None,
*,
keep: LazyUniqueKeepStrategy = "any"
) -> Self
Drop duplicate rows from this LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset | str \| list[str] \| None | Column name(s) to consider when identifying duplicate rows. If set to None (default), use all columns. | None |
keep | LazyUniqueKeepStrategy | {'any', 'none'}. Which of the duplicate rows to keep: 'any' keeps one (arbitrary) row per group of duplicates; 'none' drops all rows that have duplicates. | 'any' |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 1), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).unique("a").sort("a", descending=True)
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│ a │ b │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│ 3 │ 4 │ |
|│ 1 │ 1 │ |
|└───────┴───────┘ |
└──────────────────┘
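The difference between the two keep strategies can be sketched in plain Python for a single subset column (note that 'any' makes no guarantee about which row survives; this sketch deterministically keeps the first seen):

```python
from collections import Counter

def unique(rows, subset, keep="any"):
    # Sketch of unique semantics on one key column:
    # 'any'  -> keep one row per key (first seen, in this sketch),
    # 'none' -> keep only rows whose key occurs exactly once.
    counts = Counter(row[subset] for row in rows)
    if keep == "any":
        seen, out = set(), []
        for row in rows:
            if row[subset] not in seen:
                seen.add(row[subset])
                out.append(row)
        return out
    return [row for row in rows if counts[row[subset]] == 1]
```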
unpivot
unpivot(
on: str | list[str] | None = None,
*,
index: str | list[str] | None = None,
variable_name: str = "variable",
value_name: str = "value"
) -> Self
Unpivot a DataFrame from wide to long format.
Optionally leaves identifiers set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis leaving just two non-identifier columns, 'variable' and 'value'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
on | str \| list[str] \| None | Column(s) to use as values variables; if None, all columns that are not in index will be used. | None |
index | str \| list[str] \| None | Column(s) to use as identifier variables. | None |
variable_name | str | Name to give to the variable column. | 'variable' |
value_name | str | Name to give to the value column. | 'value' |
Notes
If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
... "SELECT * FROM VALUES ('x', 1, 2), ('y', 3, 4), ('z', 5, 6) df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.unpivot(on=["b", "c"], index="a").sort("a", "variable").to_native()
┌─────────┬──────────┬───────┐
│ a │ variable │ value │
│ varchar │ varchar │ int32 │
├─────────┼──────────┼───────┤
│ x │ b │ 1 │
│ x │ c │ 2 │
│ y │ b │ 3 │
│ y │ c │ 4 │
│ z │ b │ 5 │
│ z │ c │ 6 │
└─────────┴──────────┴───────┘
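The wide-to-long reshaping above can be sketched in plain Python (simplified to a single identifier column):

```python
def unpivot(rows, on, index, variable_name="variable", value_name="value"):
    # Sketch of unpivot semantics: each "on" column contributes one
    # long-format row per input row, keyed by the identifier column.
    out = []
    for row in rows:
        for col in on:
            out.append(
                {index: row[index], variable_name: col, value_name: row[col]}
            )
    return out
```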
with_columns
with_columns(
*exprs: IntoExpr | Iterable[IntoExpr],
**named_exprs: IntoExpr
) -> Self
Add columns to this LazyFrame.
Added columns will replace existing columns with the same name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*exprs | IntoExpr \| Iterable[IntoExpr] | Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals. | () |
**named_exprs | IntoExpr | Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used. | {} |
Note
Creating a new LazyFrame using this method does not create a new copy of existing data.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).with_columns(c=nw.col("a") + 1)
┌────────────────────────────────┐
| Narwhals LazyFrame |
|--------------------------------|
|┌───────┬──────────────┬───────┐|
|│ a │ b │ c │|
|│ int32 │ decimal(2,1) │ int32 │|
|├───────┼──────────────┼───────┤|
|│ 1 │ 4.5 │ 2 │|
|│ 3 │ 2.0 │ 4 │|
|└───────┴──────────────┴───────┘|
└────────────────────────────────┘
with_row_index
with_row_index(
name: str = "index", *, order_by: str | Sequence[str]
) -> Self
Insert column which enumerates rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the column as a string. The default is "index". | 'index' |
order_by | str \| Sequence[str] | Column(s) to order by when computing the row index. | required |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 5), (2, 4) df(a, b)")
>>> nw.from_native(lf_native).with_row_index(order_by="a").sort("a").collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| pyarrow.Table |
| index: int64 |
| a: int32 |
| b: int32 |
| ---- |
| index: [[0,1]] |
| a: [[1,2]] |
| b: [[5,4]] |
└──────────────────┘
>>> nw.from_native(lf_native).with_row_index(order_by="b").sort("a").collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| pyarrow.Table |
| index: int64 |
| a: int32 |
| b: int32 |
| ---- |
| index: [[1,0]] |
| a: [[1,2]] |
| b: [[5,4]] |
└──────────────────┘
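The effect of order_by in the two examples (the index enumerates rows in order_by order, while the rows themselves keep their positions) can be sketched in plain Python for a single ordering column:

```python
def with_row_index(rows, name="index", *, order_by):
    # Sketch of with_row_index semantics: rank each row by its position in
    # the order_by ordering, without reordering the rows themselves.
    order = sorted(range(len(rows)), key=lambda i: rows[i][order_by])
    indexed = [dict(row) for row in rows]
    for idx, i in enumerate(order):
        indexed[i][name] = idx
    return indexed
```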