`narwhals.LazyFrame`

Narwhals LazyFrame, backed by a native lazyframe.

Warning

This class is not meant to be instantiated directly. Instead, use narwhals.from_native with a native lazy dataframe from one of the supported backends (e.g. polars.LazyFrame, dask_expr._collection.DataFrame):

narwhals.from_native(native_lazyframe)

columns `property`

columns: list[str]

Get column names.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).columns
['a', 'b']

implementation `class-attribute` `instance-attribute`

implementation: _Implementation = _Implementation()

Return narwhals.Implementation of native frame.

This can be useful when you need to use special-casing for features outside of Narwhals' scope - for example, when dealing with pandas' Period Dtype.

Examples:

>>> import narwhals as nw
>>> import pandas as pd
>>> df_native = pd.DataFrame({"a": [1, 2, 3]})
>>> df = nw.from_native(df_native)
>>> df.implementation
<Implementation.PANDAS: 'pandas'>
>>> df.implementation.is_pandas()
True
>>> df.implementation.is_pandas_like()
True
>>> df.implementation.is_polars()
False

schema `property`

schema: Schema

Get an ordered mapping of column names to their data type.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).schema
Schema({'a': Int32, 'b': Decimal})

collect

collect(
    backend: (
        IntoBackend[Polars | Pandas | Arrow] | None
    ) = None,
    **kwargs: Any
) -> DataFrame[Any]

Materialize this LazyFrame into a DataFrame.

Each underlying lazyframe takes different arguments when it is materialized into a dataframe, so these can be passed as kwargs (see examples below for how to generalize the specification).

Parameters:

Name	Type	Description	Default
`backend`	`IntoBackend[Polars \| Pandas \| Arrow] \| None`	Specifies which eager backend to collect to. This will be the underlying backend for the resulting Narwhals DataFrame. If None, the following default conversions are applied: `polars.LazyFrame` -> `polars.DataFrame`; `dask.DataFrame` -> `pandas.DataFrame`; `duckdb.PyRelation` -> `pyarrow.Table`; `pyspark.DataFrame` -> `pyarrow.Table`. `backend` can be specified in various ways: as `Implementation.<BACKEND>`, with `BACKEND` being `PANDAS`, `PYARROW` or `POLARS`; as a string, `"pandas"`, `"pyarrow"` or `"polars"`; or directly as a module, `pandas`, `pyarrow` or `polars`.	`None`
`kwargs`	`Any`	Backend-specific kwargs to pass along. For details, see the backend-specific documentation: polars.LazyFrame.collect, dask.dataframe.DataFrame.compute.	`{}`

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> lf = nw.from_native(lf_native)
>>> lf
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│     1 │     2 │ |
|│     3 │     4 │ |
|└───────┴───────┘ |
└──────────────────┘
>>> lf.collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  a: int32        |
|  b: int32        |
|  ----            |
|  a: [[1,3]]      |
|  b: [[2,4]]      |
└──────────────────┘

collect_schema

collect_schema() -> Schema

Get an ordered mapping of column names to their data type.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).collect_schema()
Schema({'a': Int32, 'b': Decimal})

drop

drop(
    *columns: str | Iterable[str], strict: bool = True
) -> Self

Remove columns from the LazyFrame.

Parameters:

Name	Type	Description	Default
`*columns`	`str \| Iterable[str]`	Names of the columns that should be removed from the dataframe.	`()`
`strict`	`bool`	Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema.	`True`

Warning

The strict argument is ignored for polars<1.0.0.

Please consider upgrading to a newer version or switching to eager mode.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop("a").to_native()
┌───────┐
│   b   │
│ int32 │
├───────┤
│     2 │
│     4 │
└───────┘

drop_nulls

drop_nulls(subset: str | list[str] | None = None) -> Self

Drop rows that contain null values.

Parameters:

Name	Type	Description	Default
`subset`	`str \| list[str] \| None`	Column name(s) for which null values are considered. If set to None (default), use all columns.	`None`

Notes

pandas handles null values differently from Polars and PyArrow. See null_handling for reference.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, NULL), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop_nulls()
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│     3 │     4 │ |
|└───────┴───────┘ |
└──────────────────┘

explode

explode(
    columns: str | Sequence[str], *more_columns: str
) -> Self

Explode the dataframe to long format by exploding the given columns.

Notes

It is possible to explode multiple columns only if these columns have matching element counts.

Parameters:

Name	Type	Description	Default
`columns`	`str \| Sequence[str]`	Column names. The underlying columns being exploded must be of the `List` data type.	required
`*more_columns`	`str`	Additional names of columns to explode, specified as positional arguments.	`()`
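The matching-element-count rule can be illustrated with a plain-Python sketch over rows stored as dicts. This is not how Narwhals or any backend implements explode; `explode_rows` is a hypothetical helper, and it ignores null or empty lists for simplicity.

```python
from typing import Any


def explode_rows(rows: list[dict[str, Any]], *columns: str) -> list[dict[str, Any]]:
    """Emit one output row per list element of the exploded columns."""
    out = []
    for row in rows:
        # All exploded columns must hold lists of the same length in this row.
        lengths = {len(row[col]) for col in columns}
        if len(lengths) != 1:
            raise ValueError("exploded columns must have matching element counts")
        for i in range(lengths.pop()):
            new_row = dict(row)  # non-exploded columns are repeated as-is
            for col in columns:
                new_row[col] = row[col][i]
            out.append(new_row)
    return out


rows = [{"a": "x", "b": [1, 2], "c": [10, 20]}, {"a": "y", "b": [3], "c": [30]}]
exploded = explode_rows(rows, "b", "c")
```

Exploding `b` and `c` together works here because both lists have the same length in every row; a row such as `{"b": [1, 2], "c": [10]}` would raise.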

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES ('x', [1, 2]), ('y', [3, 4]), ('z', [5, 6]) df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.explode("b").to_native()
┌─────────┬───────┐
│    a    │   b   │
│ varchar │ int32 │
├─────────┼───────┤
│ x       │     1 │
│ x       │     2 │
│ y       │     3 │
│ y       │     4 │
│ z       │     5 │
│ z       │     6 │
└─────────┴───────┘

filter

filter(
    *predicates: IntoExpr | Iterable[IntoExpr],
    **constraints: Any
) -> Self

Filter the rows in the LazyFrame based on a predicate expression.

The original order of the remaining rows is preserved.

Parameters:

Name	Type	Description	Default
`*predicates`	`IntoExpr \| Iterable[IntoExpr]`	Expression(s) that evaluate to a boolean Series.	`()`
`**constraints`	`Any`	Column filters; use `name = value` to filter columns by the supplied value. Each constraint will behave the same as `nw.col(name).eq(value)`, and will be implicitly joined with the other filter conditions using &.	`{}`
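How predicates and constraints combine can be sketched in plain Python, with rows as dicts and predicates as callables. This is only an illustration of the semantics described above, not Narwhals' implementation; `filter_rows` is a hypothetical name.

```python
from typing import Any, Callable

Row = dict[str, Any]


def filter_rows(
    rows: list[Row],
    *predicates: Callable[[Row], bool],
    **constraints: Any,
) -> list[Row]:
    """Keep rows for which every predicate and every name=value constraint holds."""
    kept = []
    for row in rows:
        ok_predicates = all(pred(row) for pred in predicates)
        # Each constraint behaves like an equality check on the named column.
        ok_constraints = all(row[name] == value for name, value in constraints.items())
        if ok_predicates and ok_constraints:  # everything is joined with AND
            kept.append(row)  # original row order is preserved
    return kept


rows = [
    {"foo": 1, "bar": 6, "ham": "a"},
    {"foo": 2, "bar": 7, "ham": "b"},
    {"foo": 3, "bar": 8, "ham": "c"},
]
mixed = filter_rows(rows, lambda r: r["foo"] < 3, ham="a")
by_kwargs = filter_rows(rows, foo=2, ham="b")
```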

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql('''
...     SELECT * FROM VALUES
...         (1, 6, 'a'),
...         (2, 7, 'b'),
...         (3, 8, 'c')
...     df(foo, bar, ham)
... ''')

Filter on one condition

>>> nw.from_native(df_native).filter(nw.col("foo") > 1).to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     2 │     7 │ b       │
│     3 │     8 │ c       │
└───────┴───────┴─────────┘

Filter on multiple conditions with implicit &

>>> nw.from_native(df_native).filter(
...     nw.col("foo") < 3, nw.col("ham") == "a"
... ).to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     1 │     6 │ a       │
└───────┴───────┴─────────┘

Filter on multiple conditions with |

>>> nw.from_native(df_native).filter(
...     (nw.col("foo") == 1) | (nw.col("ham") == "c")
... ).to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     1 │     6 │ a       │
│     3 │     8 │ c       │
└───────┴───────┴─────────┘

Filter using **kwargs syntax

>>> nw.from_native(df_native).filter(foo=2, ham="b").to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     2 │     7 │ b       │
└───────┴───────┴─────────┘

gather_every

gather_every(n: int, offset: int = 0) -> Self

group_by

group_by(
    *keys: IntoExpr | Iterable[IntoExpr],
    drop_null_keys: Literal[False] = ...
) -> LazyGroupBy[Self]

group_by(
    *keys: str | Iterable[str],
    drop_null_keys: Literal[True]
) -> LazyGroupBy[Self]

group_by(
    *keys: IntoExpr | Iterable[IntoExpr],
    drop_null_keys: bool = False
) -> LazyGroupBy[Self]

Start a group by operation.

Parameters:

Name	Type	Description	Default
`*keys`	`IntoExpr \| Iterable[IntoExpr]`	Column(s) to group by. Accepts expression input. Strings are parsed as column names.	`()`
`drop_null_keys`	`bool`	If True, groups where any key is null are excluded from the result.	`False`
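The effect of `drop_null_keys` can be sketched in plain Python for a single key column and a sum aggregation. `group_sum` is a hypothetical helper used only for illustration; it does not reflect how any backend performs the grouping.

```python
from typing import Any


def group_sum(
    rows: list[dict[str, Any]],
    key: str,
    value: str,
    drop_null_keys: bool = False,
) -> dict[Any, int]:
    """Sum `value` per distinct `key`, optionally skipping rows whose key is null."""
    totals: dict[Any, int] = {}
    for row in rows:
        k = row[key]
        if k is None and drop_null_keys:
            continue  # the null-key group is left out of the result
        totals[k] = totals.get(k, 0) + row[value]
    return totals


rows = [{"a": 1, "b": "a"}, {"a": 2, "b": None}, {"a": 3, "b": "a"}]
with_nulls = group_sum(rows, "b", "a")
without_nulls = group_sum(rows, "b", "a", drop_null_keys=True)
```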

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES (1, 'a'), (2, 'b'), (3, 'a') df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.group_by("b").agg(nw.col("a").sum()).sort("b").to_native()
┌─────────┬────────┐
│    b    │   a    │
│ varchar │ int128 │
├─────────┼────────┤
│ a       │      4 │
│ b       │      2 │
└─────────┴────────┘

Expressions are also accepted.

>>> df.group_by(nw.col("b").str.len_chars()).agg(
...     nw.col("a").sum()
... ).to_native()
┌───────┬────────┐
│   b   │   a    │
│ int64 │ int128 │
├───────┼────────┤
│     1 │      6 │
└───────┴────────┘

head

head(n: int = 5) -> Self

Get the first n rows.

Parameters:

Name	Type	Description	Default
`n`	`int`	Number of rows to return.	`5`

Examples:

>>> import dask.dataframe as dd
>>> import narwhals as nw
>>> lf_native = dd.from_dict({"a": [1, 2, 3], "b": [4, 5, 6]}, npartitions=1)
>>> nw.from_native(lf_native).head(2).collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|        a  b      |
|     0  1  4      |
|     1  2  5      |
└──────────────────┘

join

join(
    other: Self,
    on: str | list[str] | None = None,
    how: JoinStrategy = "inner",
    *,
    left_on: str | list[str] | None = None,
    right_on: str | list[str] | None = None,
    suffix: str = "_right"
) -> Self

Add a join operation to the Logical Plan.

Parameters:

Name	Type	Description	Default
`other`	`Self`	Lazy DataFrame to join with.	required
`on`	`str \| list[str] \| None`	Name(s) of the join columns in both DataFrames. If set, `left_on` and `right_on` should be None.	`None`
`how`	`JoinStrategy`	Join strategy. inner: Returns rows that have matching values in both tables. left: Returns all rows from the left table, and the matched rows from the right table. full: Returns all rows in both dataframes, with the suffix appended to the right join keys. cross: Returns the Cartesian product of rows from both tables. semi: Filter rows that have a match in the right table. anti: Filter rows that do not have a match in the right table.	`'inner'`
`left_on`	`str \| list[str] \| None`	Join column of the left DataFrame.	`None`
`right_on`	`str \| list[str] \| None`	Join column of the right DataFrame.	`None`
`suffix`	`str`	Suffix to append to columns with a duplicate name.	`'_right'`
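The difference between the inner, left, semi and anti strategies can be sketched in plain Python on a single join key. `join_rows` is a hypothetical helper; it omits the full and cross strategies, suffix handling and backend-specific null-key behaviour, so treat it as an illustration of the descriptions above rather than an implementation.

```python
from typing import Any

Row = dict[str, Any]


def join_rows(left: list[Row], right: list[Row], on: str, how: str = "inner") -> list[Row]:
    """Join two lists of dict rows on one key column."""
    right_keys = {r[on] for r in right}
    if how == "semi":  # keep left rows that have a match; no right columns added
        return [l for l in left if l[on] in right_keys]
    if how == "anti":  # keep left rows that have no match
        return [l for l in left if l[on] not in right_keys]
    if how not in ("inner", "left"):
        raise ValueError(f"strategy not covered by this sketch: {how}")
    right_cols = [c for c in (right[0] if right else {}) if c != on]
    out = []
    for l in left:
        matches = [r for r in right if r[on] == l[on]]
        for r in matches:
            out.append({**l, **{c: r[c] for c in right_cols}})
        if not matches and how == "left":
            # Unmatched left rows are kept, with nulls for the right columns.
            out.append({**l, **{c: None for c in right_cols}})
    return out


df1 = [{"a": 1, "b": "a"}, {"a": 2, "b": "b"}]
df2 = [{"a": 1, "c": "x"}, {"a": 3, "c": "y"}]
inner = join_rows(df1, df2, on="a")
left = join_rows(df1, df2, on="a", how="left")
semi = join_rows(df1, df2, on="a", how="semi")
anti = join_rows(df1, df2, on="a", how="anti")
```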

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native1 = duckdb.sql(
...     "SELECT * FROM VALUES (1, 'a'), (2, 'b') df(a, b)"
... )
>>> df_native2 = duckdb.sql(
...     "SELECT * FROM VALUES (1, 'x'), (3, 'y') df(a, c)"
... )
>>> df1 = nw.from_native(df_native1)
>>> df2 = nw.from_native(df_native2)
>>> df1.join(df2, on="a")
┌─────────────────────────────┐
|     Narwhals LazyFrame      |
|-----------------------------|
|┌───────┬─────────┬─────────┐|
|│   a   │    b    │    c    │|
|│ int32 │ varchar │ varchar │|
|├───────┼─────────┼─────────┤|
|│     1 │ a       │ x       │|
|└───────┴─────────┴─────────┘|
└─────────────────────────────┘

join_asof

join_asof(
    other: Self,
    *,
    left_on: str | None = None,
    right_on: str | None = None,
    on: str | None = None,
    by_left: str | list[str] | None = None,
    by_right: str | list[str] | None = None,
    by: str | list[str] | None = None,
    strategy: AsofJoinStrategy = "backward",
    suffix: str = "_right"
) -> Self

Perform an asof join.

This is similar to a left-join except that we match on nearest key rather than equal keys.

For Polars, both DataFrames must be sorted by the on key (within each by group if specified).

Parameters:

Name	Type	Description	Default
`other`	`Self`	DataFrame to join with.	required
`left_on`	`str \| None`	Name of the left join column.	`None`
`right_on`	`str \| None`	Name of the right join column.	`None`
`on`	`str \| None`	Join column of both DataFrames. If set, left_on and right_on should be None.	`None`
`by_left`	`str \| list[str] \| None`	Join on these columns before doing the asof join.	`None`
`by_right`	`str \| list[str] \| None`	Join on these columns before doing the asof join.	`None`
`by`	`str \| list[str] \| None`	Join on these columns before doing the asof join.	`None`
`strategy`	`AsofJoinStrategy`	Join strategy. The default is "backward". backward: selects the last row in the right DataFrame whose "on" key is less than or equal to the left's key. forward: selects the first row in the right DataFrame whose "on" key is greater than or equal to the left's key. nearest: selects the last row in the right DataFrame whose value is nearest to the left's key.	`'backward'`
`suffix`	`str`	Suffix to append to columns with a duplicate name.	`'_right'`
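The three strategies can be sketched with the standard-library `bisect` module on a sorted list of right-hand keys. `asof_match` is a hypothetical helper, not part of Narwhals, and the tie-breaking for "nearest" (preferring the earlier key when both are equally close) is an arbitrary choice made for this sketch.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime
from typing import Optional


def asof_match(left_key, right_keys, strategy: str = "backward") -> Optional[int]:
    """Return the index in sorted `right_keys` matched to `left_key`, or None."""
    back = bisect_right(right_keys, left_key) - 1  # last key <= left_key
    fwd = bisect_left(right_keys, left_key)  # first key >= left_key
    back_idx = back if back >= 0 else None
    fwd_idx = fwd if fwd < len(right_keys) else None
    if strategy == "backward":
        return back_idx
    if strategy == "forward":
        return fwd_idx
    if strategy == "nearest":
        if back_idx is None or fwd_idx is None:
            return fwd_idx if back_idx is None else back_idx
        d_back = left_key - right_keys[back_idx]
        d_fwd = right_keys[fwd_idx] - left_key
        return back_idx if d_back <= d_fwd else fwd_idx  # tie rule: assumption
    raise ValueError(f"unknown strategy: {strategy}")


gdp_keys = [datetime(y, 1, 1) for y in (2016, 2017, 2018, 2019, 2020)]
gdp = [4164, 4411, 4566, 4696, 4827]
pop_keys = [datetime(2016, 3, 1), datetime(2018, 8, 1), datetime(2019, 1, 1)]

backward = [gdp[asof_match(k, gdp_keys, "backward")] for k in pop_keys]
forward = [gdp[asof_match(k, gdp_keys, "forward")] for k in pop_keys]
nearest = [gdp[asof_match(k, gdp_keys, "nearest")] for k in pop_keys]
```

The `backward` list reproduces the `gdp` column of the example below, which uses the same data.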

Examples:

>>> from datetime import datetime
>>> import polars as pl
>>> import narwhals as nw
>>> data_gdp = {
...     "datetime": [
...         datetime(2016, 1, 1),
...         datetime(2017, 1, 1),
...         datetime(2018, 1, 1),
...         datetime(2019, 1, 1),
...         datetime(2020, 1, 1),
...     ],
...     "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
...     "datetime": [
...         datetime(2016, 3, 1),
...         datetime(2018, 8, 1),
...         datetime(2019, 1, 1),
...     ],
...     "population": [82.19, 82.66, 83.12],
... }
>>> gdp_native = pl.DataFrame(data_gdp)
>>> population_native = pl.DataFrame(data_population)
>>> gdp = nw.from_native(gdp_native)
>>> population = nw.from_native(population_native)
>>> population.join_asof(gdp, on="datetime", strategy="backward").to_native()
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime            ┆ population ┆ gdp  │
│ ---                 ┆ ---        ┆ ---  │
│ datetime[μs]        ┆ f64        ┆ i64  │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19      ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66      ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12      ┆ 4696 │
└─────────────────────┴────────────┴──────┘

lazy

lazy() -> Self

Restrict available API methods to lazy-only ones.

This is a no-op, and exists only for compatibility with DataFrame.lazy.

pipe

pipe(
    function: Callable[Concatenate[Self, PS], R],
    *args: args,
    **kwargs: kwargs
) -> R

Pipe function call.

Parameters:

Name	Type	Description	Default
`function`	`Callable[Concatenate[Self, PS], R]`	Function to apply.	required
`args`	`args`	Positional arguments to pass to function.	`()`
`kwargs`	`kwargs`	Keyword arguments to pass to function.	`{}`

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).pipe(lambda x: x.select("a")).to_native()
┌───────┐
│   a   │
│ int32 │
├───────┤
│     1 │
│     3 │
└───────┘

rename

rename(mapping: dict[str, str]) -> Self

Rename column names.

Parameters:

Name	Type	Description	Default
`mapping`	`dict[str, str]`	Key-value pairs that map from old name to new name.	required

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).rename({"a": "c"})
┌────────────────────────┐
|   Narwhals LazyFrame   |
|------------------------|
|┌───────┬──────────────┐|
|│   c   │      b       │|
|│ int32 │ decimal(2,1) │|
|├───────┼──────────────┤|
|│     1 │          4.5 │|
|│     3 │          2.0 │|
|└───────┴──────────────┘|
└────────────────────────┘

select

select(
    *exprs: IntoExpr | Iterable[IntoExpr],
    **named_exprs: IntoExpr
) -> Self

Select columns from this LazyFrame.

Parameters:

Name	Type	Description	Default
`*exprs`	`IntoExpr \| Iterable[IntoExpr]`	Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names.	`()`
`**named_exprs`	`IntoExpr`	Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.	`{}`

Notes

If you'd like to select a column whose name isn't a string (for example, if you're working with pandas) then you should explicitly use nw.col instead of just passing the column name. For example, to select a column named 0 use df.select(nw.col(0)), not df.select(0).

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).select("a", a_plus_1=nw.col("a") + 1)
┌────────────────────┐
| Narwhals LazyFrame |
|--------------------|
|┌───────┬──────────┐|
|│   a   │ a_plus_1 │|
|│ int32 │  int32   │|
|├───────┼──────────┤|
|│     1 │        2 │|
|│     3 │        4 │|
|└───────┴──────────┘|
└────────────────────┘

sink_parquet

sink_parquet(file: str | Path | BytesIO) -> None

Write LazyFrame to Parquet file.

This may allow larger-than-RAM datasets to be written to disk.

Parameters:

Name	Type	Description	Default
`file`	`str \| Path \| BytesIO`	String, path object or file-like object to which the dataframe will be written.	required

Examples:

>>> import polars as pl
>>> import narwhals as nw
>>> df_native = pl.LazyFrame({"foo": [1, 2], "bar": [6.0, 7.0]})
>>> df = nw.from_native(df_native)
>>> df.sink_parquet("out.parquet")

sort

sort(
    by: str | Iterable[str],
    *more_by: str,
    descending: bool | Sequence[bool] = False,
    nulls_last: bool = False
) -> Self

Sort the LazyFrame by the given columns.

Parameters:

Name	Type	Description	Default
`by`	`str \| Iterable[str]`	Column(s) names to sort by.	required
`*more_by`	`str`	Additional columns to sort by, specified as positional arguments.	`()`
`descending`	`bool \| Sequence[bool]`	Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.	`False`
`nulls_last`	`bool`	Place null values last. A single boolean is applied to all `by` columns.	`False`

Warning

Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES (1, 6.0, 'a'), (2, 5.0, 'c'), (NULL, 4.0, 'b') df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.sort("a")
┌──────────────────────────────────┐
|        Narwhals LazyFrame        |
|----------------------------------|
|┌───────┬──────────────┬─────────┐|
|│   a   │      b       │    c    │|
|│ int32 │ decimal(2,1) │ varchar │|
|├───────┼──────────────┼─────────┤|
|│  NULL │          4.0 │ b       │|
|│     1 │          6.0 │ a       │|
|│     2 │          5.0 │ c       │|
|└───────┴──────────────┴─────────┘|
└──────────────────────────────────┘

tail

tail(n: int) -> Self

to_native

to_native() -> LazyFrameT

Convert Narwhals LazyFrame to native one.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).to_native()
┌───────┬───────┐
│   a   │   b   │
│ int32 │ int32 │
├───────┼───────┤
│     1 │     2 │
│     3 │     4 │
└───────┴───────┘

top_k

top_k(
    k: int,
    *,
    by: str | Iterable[str],
    reverse: bool | Sequence[bool] = False
) -> Self

Return the k largest rows.

Non-null elements are always preferred over null elements, regardless of the value of reverse. The output is not guaranteed to be in any particular order; sort it afterwards if you need sorted output.

Parameters:

Name	Type	Description	Default
`k`	`int`	Number of rows to return.	required
`by`	`str \| Iterable[str]`	Column(s) used to determine the top rows. Accepts expression input. Strings are parsed as column names.	required
`reverse`	`bool \| Sequence[bool]`	Consider the k smallest elements of the by column(s) (instead of the k largest). This can be specified per column by passing a sequence of booleans.	`False`
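The rule that non-null values are preferred regardless of `reverse` can be sketched in plain Python using a stable multi-pass sort. `top_k_rows` is a hypothetical helper for illustration only; unlike the real method, it happens to return rows in sorted order.

```python
from typing import Any, Sequence, Union


def top_k_rows(
    rows: list[dict[str, Any]],
    k: int,
    by: Sequence[str],
    reverse: Union[bool, Sequence[bool]] = False,
) -> list[dict[str, Any]]:
    """Return the k largest rows by `by`, always ranking nulls after non-nulls."""
    flags = [reverse] * len(by) if isinstance(reverse, bool) else list(reverse)
    ordered = list(rows)
    # Sort by the least significant column first; stable sorts keep earlier passes.
    for col, smallest_first in reversed(list(zip(by, flags))):
        non_null = [r for r in ordered if r[col] is not None]
        nulls = [r for r in ordered if r[col] is None]
        non_null.sort(key=lambda r: r[col], reverse=not smallest_first)
        ordered = non_null + nulls  # nulls go last in either direction
    return ordered[:k]


rows = [
    {"a": "a", "b": 2},
    {"a": "b", "b": 1},
    {"a": "a", "b": 1},
    {"a": "b", "b": 3},
    {"a": None, "b": 2},
    {"a": "c", "b": 1},
]
largest = top_k_rows(rows, 4, by=["b", "a"])
smallest = top_k_rows(rows, 3, by=["b", "a"], reverse=True)
```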

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES ('a', 2), ('b', 1), ('a', 1), ('b', 3), (NULL, 2), ('c', 1) df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.top_k(4, by=["b", "a"])
┌───────────────────┐
|Narwhals LazyFrame |
|-------------------|
|┌─────────┬───────┐|
|│    a    │   b   │|
|│ varchar │ int32 │|
|├─────────┼───────┤|
|│ b       │     3 │|
|│ a       │     2 │|
|│ NULL    │     2 │|
|│ c       │     1 │|
|└─────────┴───────┘|
└───────────────────┘

unique

unique(
    subset: str | list[str] | None = None,
    *,
    keep: UniqueKeepStrategy = "any",
    order_by: str | Sequence[str] | None = None
) -> Self

Drop duplicate rows from this LazyFrame.

Parameters:

Name	Type	Description	Default
`subset`	`str \| list[str] \| None`	Column name(s) to consider when identifying duplicate rows. If set to `None`, use all columns.	`None`
`keep`	`UniqueKeepStrategy`	{'any', 'none', 'first', 'last'} Which of the duplicate rows to keep. 'any': Does not give any guarantee of which row is kept. 'none': Don't keep duplicate rows. 'first': Keep the first row. Requires `order_by` to be specified. 'last': Keep the last row. Requires `order_by` to be specified.	`'any'`
`order_by`	`str \| Sequence[str] \| None`	Column(s) to order by when computing the row index.	`None`
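The four `keep` strategies can be sketched in plain Python. `unique_rows` is a hypothetical helper that takes a single `order_by` column; for 'any' it simply returns the first row it encounters, which is one valid choice since that strategy gives no guarantee.

```python
from typing import Any, Optional


def unique_rows(
    rows: list[dict[str, Any]],
    subset: list[str],
    keep: str = "any",
    order_by: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Drop duplicates on `subset` according to the `keep` strategy."""
    if keep in ("first", "last"):
        if order_by is None:
            raise ValueError("'first' and 'last' require order_by")
        rows = sorted(rows, key=lambda r: r[order_by])  # defines first/last
    groups: dict[tuple, list[dict[str, Any]]] = {}
    for row in rows:
        groups.setdefault(tuple(row[c] for c in subset), []).append(row)
    if keep == "none":  # drop every row that has a duplicate
        return [g[0] for g in groups.values() if len(g) == 1]
    if keep == "last":
        return [g[-1] for g in groups.values()]
    if keep in ("first", "any"):
        return [g[0] for g in groups.values()]
    raise ValueError(f"unknown keep strategy: {keep}")


rows = [{"a": 1, "b": 4}, {"a": 1, "b": 3}, {"a": 2, "b": 5}]
first = unique_rows(rows, ["a"], keep="first", order_by="b")
last = unique_rows(rows, ["a"], keep="last", order_by="b")
none = unique_rows(rows, ["a"], keep="none")
```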

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 3), (1, 4) df(a, b)")
>>> nw.from_native(lf_native).unique("a").sort("a", descending=True)
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│     1 │     3 │ |
|└───────┴───────┘ |
└──────────────────┘

unpivot

unpivot(
    on: str | list[str] | None = None,
    *,
    index: str | list[str] | None = None,
    variable_name: str = "variable",
    value_name: str = "value"
) -> Self

Unpivot a DataFrame from wide to long format.

Optionally leaves identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis leaving just two non-identifier columns, 'variable' and 'value'.

Parameters:

Name	Type	Description	Default
`on`	`str \| list[str] \| None`	Column(s) to use as value variables; if `on` is empty, all columns that are not in `index` will be used.	`None`
`index`	`str \| list[str] \| None`	Column(s) to use as identifier variables.	`None`
`variable_name`	`str`	Name to give to the `variable` column. Defaults to "variable".	`'variable'`
`value_name`	`str`	Name to give to the `value` column. Defaults to "value".	`'value'`

Notes

If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.
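The wide-to-long reshaping can be sketched in plain Python over rows stored as dicts. `unpivot_rows` is a hypothetical helper, not Narwhals code, and the row order it produces is incidental, since backends do not guarantee any particular order.

```python
from typing import Any


def unpivot_rows(
    rows: list[dict[str, Any]],
    on: list[str],
    index: list[str],
    variable_name: str = "variable",
    value_name: str = "value",
) -> list[dict[str, Any]]:
    """Turn each `on` column of each row into its own (variable, value) row."""
    out = []
    for row in rows:
        for col in on:
            new_row = {i: row[i] for i in index}  # identifier columns are kept
            new_row[variable_name] = col  # the former column name
            new_row[value_name] = row[col]  # the former cell value
            out.append(new_row)
    return out


rows = [
    {"a": "x", "b": 1, "c": 2},
    {"a": "y", "b": 3, "c": 4},
    {"a": "z", "b": 5, "c": 6},
]
long_rows = unpivot_rows(rows, on=["b", "c"], index=["a"])
```

Three input rows with two `on` columns yield six output rows, each carrying the `a` identifier plus one variable/value pair.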

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES ('x', 1, 2), ('y', 3, 4), ('z', 5, 6) df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.unpivot(on=["b", "c"], index="a").sort("a", "variable").to_native()
┌─────────┬──────────┬───────┐
│    a    │ variable │ value │
│ varchar │ varchar  │ int32 │
├─────────┼──────────┼───────┤
│ x       │ b        │     1 │
│ x       │ c        │     2 │
│ y       │ b        │     3 │
│ y       │ c        │     4 │
│ z       │ b        │     5 │
│ z       │ c        │     6 │
└─────────┴──────────┴───────┘

with_columns

with_columns(
    *exprs: IntoExpr | Iterable[IntoExpr],
    **named_exprs: IntoExpr
) -> Self

Add columns to this LazyFrame.

Added columns will replace existing columns with the same name.

Parameters:

Name	Type	Description	Default
`*exprs`	`IntoExpr \| Iterable[IntoExpr]`	Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.	`()`
`**named_exprs`	`IntoExpr`	Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.	`{}`

Note

Creating a new LazyFrame using this method does not create a new copy of existing data.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).with_columns(c=nw.col("a") + 1)
┌────────────────────────────────┐
|       Narwhals LazyFrame       |
|--------------------------------|
|┌───────┬──────────────┬───────┐|
|│   a   │      b       │   c   │|
|│ int32 │ decimal(2,1) │ int32 │|
|├───────┼──────────────┼───────┤|
|│     1 │          4.5 │     2 │|
|│     3 │          2.0 │     4 │|
|└───────┴──────────────┴───────┘|
└────────────────────────────────┘

with_row_index

with_row_index(
    name: str = "index", *, order_by: str | Sequence[str]
) -> Self

Insert column which enumerates rows.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the column as a string. The default is "index".	`'index'`
`order_by`	`str \| Sequence[str]`	Column(s) to order by when computing the row index.	required
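The role of `order_by` can be sketched in plain Python: the index reflects each row's position when ordered by that column, independent of where the row currently sits. `add_row_index` is a hypothetical helper that takes a single `order_by` column.

```python
from typing import Any


def add_row_index(
    rows: list[dict[str, Any]], order_by: str, name: str = "index"
) -> list[dict[str, Any]]:
    """Prepend a column holding each row's 0-based rank under `order_by`."""
    # Positions of the rows, arranged in ascending order of the order_by column.
    order = sorted(range(len(rows)), key=lambda i: rows[i][order_by])
    rank = {pos: idx for idx, pos in enumerate(order)}
    return [{name: rank[i], **row} for i, row in enumerate(rows)]


rows = [{"a": 1, "b": 5}, {"a": 2, "b": 4}]
by_a = add_row_index(rows, order_by="a")
by_b = add_row_index(rows, order_by="b")
```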

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 5), (2, 4) df(a, b)")
>>> nw.from_native(lf_native).with_row_index(order_by="a").sort("a").collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  index: int64    |
|  a: int32        |
|  b: int32        |
|  ----            |
|  index: [[0,1]]  |
|  a: [[1,2]]      |
|  b: [[5,4]]      |
└──────────────────┘
>>> nw.from_native(lf_native).with_row_index(order_by="b").sort("a").collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  index: int64    |
|  a: int32        |
|  b: int32        |
|  ----            |
|  index: [[1,0]]  |
|  a: [[1,2]]      |
|  b: [[5,4]]      |
└──────────────────┘
