
narwhals.LazyFrame

Narwhals LazyFrame, backed by a native lazyframe.

Warning

This class is not meant to be instantiated directly - instead, use narwhals.from_native with a native object that is a lazy dataframe from one of the supported backends (e.g. polars.LazyFrame, dask_expr._collection.DataFrame):

narwhals.from_native(native_lazyframe)

columns property

columns: list[str]

Get column names.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).columns
['a', 'b']

implementation property

implementation: Implementation

Return the implementation of the native frame.

This can be useful when you need to use special-casing for features outside of Narwhals' scope - for example, when dealing with pandas' Period Dtype.

Examples:

>>> import narwhals as nw
>>> import dask.dataframe as dd
>>> lf_native = dd.from_dict({"a": [1, 2]}, npartitions=1)
>>> nw.from_native(lf_native).implementation
<Implementation.DASK: 'dask'>

schema property

schema: Schema

Get an ordered mapping of column names to their data type.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).schema
Schema({'a': Int32, 'b': Decimal})

collect

collect(
    backend: (
        IntoBackend[Polars | Pandas | Arrow] | None
    ) = None,
    **kwargs: Any
) -> DataFrame[Any]

Materialize this LazyFrame into a DataFrame.

As each underlying lazyframe has different arguments to set when materializing it into a dataframe, these can be passed through as kwargs (see the examples below for how to generalize the specification).

Parameters:

Name Type Description Default
backend IntoBackend[Polars | Pandas | Arrow] | None

Specifies which eager backend to collect to. This will be the underlying backend for the resulting Narwhals DataFrame. If None, the following default conversions will be applied:

  • polars.LazyFrame -> polars.DataFrame
  • dask.DataFrame -> pandas.DataFrame
  • duckdb.PyRelation -> pyarrow.Table
  • pyspark.DataFrame -> pyarrow.Table

backend can be specified in various ways:

  • As Implementation.<BACKEND> with BACKEND being PANDAS, PYARROW or POLARS.
  • As a string: "pandas", "pyarrow" or "polars"
  • Directly as a module pandas, pyarrow or polars.
None
kwargs Any

Backend-specific kwargs to pass along. For more details, check the backend-specific documentation.

{}

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> lf = nw.from_native(lf_native)
>>> lf
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│     1 │     2 │ |
|│     3 │     4 │ |
|└───────┴───────┘ |
└──────────────────┘
>>> lf.collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  a: int32        |
|  b: int32        |
|  ----            |
|  a: [[1,3]]      |
|  b: [[2,4]]      |
└──────────────────┘

collect_schema

collect_schema() -> Schema

Get an ordered mapping of column names to their data type.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).collect_schema()
Schema({'a': Int32, 'b': Decimal})

drop

drop(
    *columns: str | Iterable[str], strict: bool = True
) -> Self

Remove columns from the LazyFrame.

Parameters:

Name Type Description Default
*columns str | Iterable[str]

Names of the columns that should be removed from the dataframe.

()
strict bool

Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema.

True
Warning

The strict argument is ignored for polars<1.0.0.

Please consider upgrading to a newer version, or switching to eager mode.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop("a").to_native()
┌───────┐
│   b   │
│ int32 │
├───────┤
│     2 │
│     4 │
└───────┘

drop_nulls

drop_nulls(subset: str | list[str] | None = None) -> Self

Drop rows that contain null values.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) for which null values are considered. If set to None (default), use all columns.

None
Notes

pandas handles null values differently from Polars and PyArrow. See null_handling for reference.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, NULL), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop_nulls()
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│     3 │     4 │ |
|└───────┴───────┘ |
└──────────────────┘

explode

explode(
    columns: str | Sequence[str], *more_columns: str
) -> Self

Explode the dataframe to long format by exploding the given columns.

Notes

It is possible to explode multiple columns only if these columns have matching element counts.

Parameters:

Name Type Description Default
columns str | Sequence[str]

Column names. The underlying columns being exploded must be of the List data type.

required
*more_columns str

Additional names of columns to explode, specified as positional arguments.

()

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES ('x', [1, 2]), ('y', [3, 4]), ('z', [5, 6]) df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.explode("b").to_native()
┌─────────┬───────┐
│    a    │   b   │
│ varchar │ int32 │
├─────────┼───────┤
│ x       │     1 │
│ x       │     2 │
│ y       │     3 │
│ y       │     4 │
│ z       │     5 │
│ z       │     6 │
└─────────┴───────┘

filter

filter(
    *predicates: IntoExpr | Iterable[IntoExpr] | list[bool],
    **constraints: Any
) -> Self

Filter the rows in the LazyFrame based on a predicate expression.

The original order of the remaining rows is preserved.

Parameters:

Name Type Description Default
*predicates IntoExpr | Iterable[IntoExpr] | list[bool]

Expression that evaluates to a boolean Series. Can also be a (single!) boolean list.

()
**constraints Any

Column filters; use name = value to filter columns by the supplied value. Each constraint will behave the same as nw.col(name).eq(value), and will be implicitly joined with the other filter conditions using &.

{}

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql('''
...     SELECT * FROM VALUES
...         (1, 6, 'a'),
...         (2, 7, 'b'),
...         (3, 8, 'c')
...     df(foo, bar, ham)
... ''')

Filter on one condition

>>> nw.from_native(df_native).filter(nw.col("foo") > 1).to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     2 │     7 │ b       │
│     3 │     8 │ c       │
└───────┴───────┴─────────┘

Filter on multiple conditions with implicit &

>>> nw.from_native(df_native).filter(
...     nw.col("foo") < 3, nw.col("ham") == "a"
... ).to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     1 │     6 │ a       │
└───────┴───────┴─────────┘

Filter on multiple conditions with |

>>> nw.from_native(df_native).filter(
...     (nw.col("foo") == 1) | (nw.col("ham") == "c")
... ).to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     1 │     6 │ a       │
│     3 │     8 │ c       │
└───────┴───────┴─────────┘

Filter using **kwargs syntax

>>> nw.from_native(df_native).filter(foo=2, ham="b").to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     2 │     7 │ b       │
└───────┴───────┴─────────┘

group_by

group_by(
    *keys: IntoExpr | Iterable[IntoExpr],
    drop_null_keys: Literal[False] = ...
) -> LazyGroupBy[Self]
group_by(
    *keys: str | Iterable[str],
    drop_null_keys: Literal[True]
) -> LazyGroupBy[Self]
group_by(
    *keys: IntoExpr | Iterable[IntoExpr],
    drop_null_keys: bool = False
) -> LazyGroupBy[Self]

Start a group by operation.

Parameters:

Name Type Description Default
*keys IntoExpr | Iterable[IntoExpr]

Column(s) to group by. Accepts expression input. Strings are parsed as column names.

()
drop_null_keys bool

If True, groups where any key is null won't be included in the result.

False

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES (1, 'a'), (2, 'b'), (3, 'a') df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.group_by("b").agg(nw.col("a").sum()).sort("b").to_native()
┌─────────┬────────┐
│    b    │   a    │
│ varchar │ int128 │
├─────────┼────────┤
│ a       │      4 │
│ b       │      2 │
└─────────┴────────┘

Expressions are also accepted.

>>> df.group_by(nw.col("b").str.len_chars()).agg(
...     nw.col("a").sum()
... ).to_native()
┌───────┬────────┐
│   b   │   a    │
│ int64 │ int128 │
├───────┼────────┤
│     1 │      6 │
└───────┴────────┘

head

head(n: int = 5) -> Self

Get n rows.

Parameters:

Name Type Description Default
n int

Number of rows to return.

5

Examples:

>>> import dask.dataframe as dd
>>> import narwhals as nw
>>> lf_native = dd.from_dict({"a": [1, 2, 3], "b": [4, 5, 6]}, npartitions=1)
>>> nw.from_native(lf_native).head(2).collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|        a  b      |
|     0  1  4      |
|     1  2  5      |
└──────────────────┘

join

join(
    other: Self,
    on: str | list[str] | None = None,
    how: JoinStrategy = "inner",
    *,
    left_on: str | list[str] | None = None,
    right_on: str | list[str] | None = None,
    suffix: str = "_right"
) -> Self

Add a join operation to the Logical Plan.

Parameters:

Name Type Description Default
other Self

Lazy DataFrame to join with.

required
on str | list[str] | None

Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None.

None
how JoinStrategy

Join strategy.

  • inner: Returns rows that have matching values in both tables.
  • left: Returns all rows from the left table, and the matched rows from the right table.
  • full: Returns all rows in both dataframes, with the suffix appended to the right join keys.
  • cross: Returns the Cartesian product of rows from both tables.
  • semi: Filter rows that have a match in the right table.
  • anti: Filter rows that do not have a match in the right table.
'inner'
left_on str | list[str] | None

Join column of the left DataFrame.

None
right_on str | list[str] | None

Join column of the right DataFrame.

None
suffix str

Suffix to append to columns with a duplicate name.

'_right'

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native1 = duckdb.sql(
...     "SELECT * FROM VALUES (1, 'a'), (2, 'b') df(a, b)"
... )
>>> df_native2 = duckdb.sql(
...     "SELECT * FROM VALUES (1, 'x'), (3, 'y') df(a, c)"
... )
>>> df1 = nw.from_native(df_native1)
>>> df2 = nw.from_native(df_native2)
>>> df1.join(df2, on="a")
┌─────────────────────────────┐
|     Narwhals LazyFrame      |
|-----------------------------|
|┌───────┬─────────┬─────────┐|
|│   a   │    b    │    c    │|
|│ int32 │ varchar │ varchar │|
|├───────┼─────────┼─────────┤|
|│     1 │ a       │ x       │|
|└───────┴─────────┴─────────┘|
└─────────────────────────────┘

join_asof

join_asof(
    other: Self,
    *,
    left_on: str | None = None,
    right_on: str | None = None,
    on: str | None = None,
    by_left: str | list[str] | None = None,
    by_right: str | list[str] | None = None,
    by: str | list[str] | None = None,
    strategy: AsofJoinStrategy = "backward",
    suffix: str = "_right"
) -> Self

Perform an asof join.

This is similar to a left join, except that rows are matched on the nearest key rather than on equal keys.

For Polars, both DataFrames must be sorted by the on key (within each by group if specified).

Parameters:

Name Type Description Default
other Self

DataFrame to join with.

required
left_on str | None

Name(s) of the left join column(s).

None
right_on str | None

Name(s) of the right join column(s).

None
on str | None

Join column of both DataFrames. If set, left_on and right_on should be None.

None
by_left str | list[str] | None

Join on these columns before performing the asof join.

None
by_right str | list[str] | None

Join on these columns before performing the asof join.

None
by str | list[str] | None

Join on these columns before performing the asof join.

None
strategy AsofJoinStrategy

Join strategy. The default is "backward".

  • backward: selects the last row in the right DataFrame whose "on" key is less than or equal to the left's key.
  • forward: selects the first row in the right DataFrame whose "on" key is greater than or equal to the left's key.
  • nearest: selects the row in the right DataFrame whose "on" key is nearest to the left's key.
'backward'
suffix str

Suffix to append to columns with a duplicate name.

'_right'

Examples:

>>> from datetime import datetime
>>> import polars as pl
>>> import narwhals as nw
>>> data_gdp = {
...     "datetime": [
...         datetime(2016, 1, 1),
...         datetime(2017, 1, 1),
...         datetime(2018, 1, 1),
...         datetime(2019, 1, 1),
...         datetime(2020, 1, 1),
...     ],
...     "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
...     "datetime": [
...         datetime(2016, 3, 1),
...         datetime(2018, 8, 1),
...         datetime(2019, 1, 1),
...     ],
...     "population": [82.19, 82.66, 83.12],
... }
>>> gdp_native = pl.DataFrame(data_gdp)
>>> population_native = pl.DataFrame(data_population)
>>> gdp = nw.from_native(gdp_native)
>>> population = nw.from_native(population_native)
>>> population.join_asof(gdp, on="datetime", strategy="backward").to_native()
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime            ┆ population ┆ gdp  │
│ ---                 ┆ ---        ┆ ---  │
│ datetime[μs]        ┆ f64        ┆ i64  │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19      ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66      ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12      ┆ 4696 │
└─────────────────────┴────────────┴──────┘

lazy

lazy() -> Self

Restrict available API methods to lazy-only ones.

This is a no-op, and exists only for compatibility with DataFrame.lazy.

pipe

pipe(
    function: Callable[Concatenate[Self, PS], R],
    *args: args,
    **kwargs: kwargs
) -> R

Pipe function call.

Parameters:

Name Type Description Default
function Callable[Concatenate[Self, PS], R]

Function to apply.

required
args args

Positional arguments to pass to function.

()
kwargs kwargs

Keyword arguments to pass to function.

{}

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).pipe(lambda x: x.select("a")).to_native()
┌───────┐
│   a   │
│ int32 │
├───────┤
│     1 │
│     3 │
└───────┘

rename

rename(mapping: dict[str, str]) -> Self

Rename column names.

Parameters:

Name Type Description Default
mapping dict[str, str]

Key-value pairs that map from old name to new name.

required

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).rename({"a": "c"})
┌────────────────────────┐
|   Narwhals LazyFrame   |
|------------------------|
|┌───────┬──────────────┐|
|│   c   │      b       │|
|│ int32 │ decimal(2,1) │|
|├───────┼──────────────┤|
|│     1 │          4.5 │|
|│     3 │          2.0 │|
|└───────┴──────────────┘|
└────────────────────────┘

select

select(
    *exprs: IntoExpr | Iterable[IntoExpr],
    **named_exprs: IntoExpr
) -> Self

Select columns from this LazyFrame.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names.

()
**named_exprs IntoExpr

Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.

{}
Notes

If you'd like to select a column whose name isn't a string (for example, if you're working with pandas) then you should explicitly use nw.col instead of just passing the column name. For example, to select a column named 0 use df.select(nw.col(0)), not df.select(0).

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).select("a", a_plus_1=nw.col("a") + 1)
┌────────────────────┐
| Narwhals LazyFrame |
|--------------------|
|┌───────┬──────────┐|
|│   a   │ a_plus_1 │|
|│ int32 │  int32   │|
|├───────┼──────────┤|
|│     1 │        2 │|
|│     3 │        4 │|
|└───────┴──────────┘|
└────────────────────┘

sink_parquet

sink_parquet(file: str | Path | BytesIO) -> None

Write LazyFrame to Parquet file.

This may allow larger-than-RAM datasets to be written to disk.

Parameters:

Name Type Description Default
file str | Path | BytesIO

String, path object or file-like object to which the dataframe will be written.

required

Examples:

>>> import polars as pl
>>> import narwhals as nw
>>> df_native = pl.LazyFrame({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.sink_parquet("out.parquet")

sort

sort(
    by: str | Iterable[str],
    *more_by: str,
    descending: bool | Sequence[bool] = False,
    nulls_last: bool = False
) -> Self

Sort the LazyFrame by the given columns.

Parameters:

Name Type Description Default
by str | Iterable[str]

Name(s) of the column(s) to sort by.

required
*more_by str

Additional columns to sort by, specified as positional arguments.

()
descending bool | Sequence[bool]

Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.

False
nulls_last bool

Place null values last.

False
Warning

Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES (1, 6.0, 'a'), (2, 5.0, 'c'), (NULL, 4.0, 'b') df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.sort("a")
┌──────────────────────────────────┐
|        Narwhals LazyFrame        |
|----------------------------------|
|┌───────┬──────────────┬─────────┐|
|│   a   │      b       │    c    │|
|│ int32 │ decimal(2,1) │ varchar │|
|├───────┼──────────────┼─────────┤|
|│  NULL │          4.0 │ b       │|
|│     1 │          6.0 │ a       │|
|│     2 │          5.0 │ c       │|
|└───────┴──────────────┴─────────┘|
└──────────────────────────────────┘

top_k

top_k(
    k: int,
    *,
    by: str | Iterable[str],
    reverse: bool | Sequence[bool] = False
) -> Self

Return the k largest rows.

Non-null elements are always preferred over null elements, regardless of the value of reverse. The output is not guaranteed to be in any particular order; sort afterwards if you wish the output to be sorted.

Parameters:

Name Type Description Default
k int

Number of rows to return.

required
by str | Iterable[str]

Column(s) used to determine the top rows. Accepts expression input. Strings are parsed as column names.

required
reverse bool | Sequence[bool]

Consider the k smallest elements of the by column(s) (instead of the k largest). This can be specified per column by passing a sequence of booleans.

False

Returns:

Type Description
Self

The LazyFrame with the k largest rows.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES ('a', 2), ('b', 1), ('a', 1), ('b', 3), (NULL, 2), ('c', 1) df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.top_k(4, by=["b", "a"])
┌───────────────────┐
|Narwhals LazyFrame |
|-------------------|
|┌─────────┬───────┐|
|│    a    │   b   │|
|│ varchar │ int32 │|
|├─────────┼───────┤|
|│ b       │     3 │|
|│ a       │     2 │|
|│ NULL    │     2 │|
|│ c       │     1 │|
|└─────────┴───────┘|
└───────────────────┘

to_native

to_native() -> LazyFrameT

Convert a Narwhals LazyFrame to its native counterpart.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).to_native()
┌───────┬───────┐
│   a   │   b   │
│ int32 │ int32 │
├───────┼───────┤
│     1 │     2 │
│     3 │     4 │
└───────┴───────┘

unique

unique(
    subset: str | list[str] | None = None,
    *,
    keep: LazyUniqueKeepStrategy = "any"
) -> Self

Drop duplicate rows from this LazyFrame.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) to consider when identifying duplicate rows. If set to None, use all columns.

None
keep LazyUniqueKeepStrategy

Which of the duplicate rows to keep ({'any', 'none'}).

  • 'any': Does not give any guarantee of which row is kept.
  • 'none': Don't keep duplicate rows.
'any'

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 1), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).unique("a").sort("a", descending=True)
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│     3 │     4 │ |
|│     1 │     1 │ |
|└───────┴───────┘ |
└──────────────────┘

unpivot

unpivot(
    on: str | list[str] | None = None,
    *,
    index: str | list[str] | None = None,
    variable_name: str = "variable",
    value_name: str = "value"
) -> Self

Unpivot a DataFrame from wide to long format.

Optionally leaves identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (index), while all other columns, considered measured variables (on), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.

Parameters:

Name Type Description Default
on str | list[str] | None

Column(s) to use as value variables; if on is None, all columns that are not in index will be used.

None
index str | list[str] | None

Column(s) to use as identifier variables.

None
variable_name str

Name to give to the variable column. Defaults to "variable".

'variable'
value_name str

Name to give to the value column. Defaults to "value".

'value'
Notes

If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES ('x', 1, 2), ('y', 3, 4), ('z', 5, 6) df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.unpivot(on=["b", "c"], index="a").sort("a", "variable").to_native()
┌─────────┬──────────┬───────┐
│    a    │ variable │ value │
│ varchar │ varchar  │ int32 │
├─────────┼──────────┼───────┤
│ x       │ b        │     1 │
│ x       │ c        │     2 │
│ y       │ b        │     3 │
│ y       │ c        │     4 │
│ z       │ b        │     5 │
│ z       │ c        │     6 │
└─────────┴──────────┴───────┘

with_columns

with_columns(
    *exprs: IntoExpr | Iterable[IntoExpr],
    **named_exprs: IntoExpr
) -> Self

Add columns to this LazyFrame.

Added columns will replace existing columns with the same name.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

()
**named_exprs IntoExpr

Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.

{}
Note

Creating a new LazyFrame using this method does not create a new copy of existing data.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).with_columns(c=nw.col("a") + 1)
┌────────────────────────────────┐
|       Narwhals LazyFrame       |
|--------------------------------|
|┌───────┬──────────────┬───────┐|
|│   a   │      b       │   c   │|
|│ int32 │ decimal(2,1) │ int32 │|
|├───────┼──────────────┼───────┤|
|│     1 │          4.5 │     2 │|
|│     3 │          2.0 │     4 │|
|└───────┴──────────────┴───────┘|
└────────────────────────────────┘

with_row_index

with_row_index(
    name: str = "index", *, order_by: str | Sequence[str]
) -> Self

Insert a column which enumerates rows.

Parameters:

Name Type Description Default
name str

The name of the column as a string. The default is "index".

'index'
order_by str | Sequence[str]

Column(s) to order by when computing the row index.

required

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 5), (2, 4) df(a, b)")
>>> nw.from_native(lf_native).with_row_index(order_by="a").sort("a").collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  index: int64    |
|  a: int32        |
|  b: int32        |
|  ----            |
|  index: [[0,1]]  |
|  a: [[1,2]]      |
|  b: [[5,4]]      |
└──────────────────┘
>>> nw.from_native(lf_native).with_row_index(order_by="b").sort("a").collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  index: int64    |
|  a: int32        |
|  b: int32        |
|  ----            |
|  index: [[1,0]]  |
|  a: [[1,2]]      |
|  b: [[5,4]]      |
└──────────────────┘