narwhals.LazyFrame

Narwhals LazyFrame, backed by a native lazyframe.

Warning

This class is not meant to be instantiated directly; instead, use narwhals.from_native with a native object that is a lazy dataframe from one of the supported backends (e.g. polars.LazyFrame, dask_expr._collection.DataFrame):

narwhals.from_native(native_lazyframe)

columns: list[str] property

Get column names.

Returns:

Type Description
list[str]

The column names stored in a list.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).columns
['a', 'b']

implementation: Implementation property

Return implementation of native frame.

This can be useful when you need to use special-casing for features outside of Narwhals' scope - for example, when dealing with pandas' Period Dtype.

Returns:

Type Description
Implementation

Implementation.

Examples:

>>> import narwhals as nw
>>> import dask.dataframe as dd
>>> lf_native = dd.from_dict({"a": [1, 2]}, npartitions=1)
>>> nw.from_native(lf_native).implementation
<Implementation.DASK: 7>

schema: Schema property

Get an ordered mapping of column names to their data type.

Returns:

Type Description
Schema

A Narwhals Schema object that maps each column name to its data type.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).schema
Schema({'a': Int32, 'b': Decimal})

clone() -> Self

Create a copy of this LazyFrame.

Returns:

Type Description
Self

An identical copy of the original LazyFrame.

collect(backend: ModuleType | Implementation | str | None = None, **kwargs: Any) -> DataFrame[Any]

Materialize this LazyFrame into a DataFrame.

Since each underlying lazyframe accepts different arguments when materializing into a dataframe, backend-specific arguments can be passed through as kwargs.

Parameters:

Name Type Description Default
backend ModuleType | Implementation | str | None

Specifies which eager backend to collect to. This will be the underlying backend for the resulting Narwhals DataFrame. If None, then the following default conversions will be applied:

  • polars.LazyFrame -> polars.DataFrame
  • dask.DataFrame -> pandas.DataFrame
  • duckdb.PyRelation -> pyarrow.Table
  • pyspark.DataFrame -> pyarrow.Table

backend can be specified in various ways:

  • As Implementation.<BACKEND> with BACKEND being PANDAS, PYARROW or POLARS.
  • As a string: "pandas", "pyarrow" or "polars"
  • Directly as a module pandas, pyarrow or polars.
None
kwargs Any

Backend-specific kwargs to pass along. To learn more, check the documentation of the corresponding backend.

{}

Returns:

Type Description
DataFrame[Any]

DataFrame

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> lf = nw.from_native(lf_native)
>>> lf
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│     1 │     2 │ |
|│     3 │     4 │ |
|└───────┴───────┘ |
└──────────────────┘
>>> lf.collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  pyarrow.Table   |
|  a: int32        |
|  b: int32        |
|  ----            |
|  a: [[1,3]]      |
|  b: [[2,4]]      |
└──────────────────┘

collect_schema() -> Schema

Get an ordered mapping of column names to their data type.

Returns:

Type Description
Schema

A Narwhals Schema object that maps each column name to its data type.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).collect_schema()
Schema({'a': Int32, 'b': Decimal})

drop(*columns: str | Iterable[str], strict: bool = True) -> Self

Remove columns from the LazyFrame.

Parameters:

Name Type Description Default
*columns str | Iterable[str]

Names of the columns that should be removed from the dataframe.

()
strict bool

Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema.

True

Returns:

Type Description
Self

The LazyFrame with the specified columns removed.

Warning

The strict argument is ignored for polars<1.0.0.

Please consider upgrading to a newer version of Polars or switching to eager mode.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop("a").to_native()
┌───────┐
│   b   │
│ int32 │
├───────┤
│     2 │
│     4 │
└───────┘
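To make the effect of strict concrete, here is a plain-Python model of drop on a dict-of-lists frame (a sketch, not Narwhals' implementation). The helper name drop_columns, the dict-of-lists representation and the use of KeyError are assumptions for illustration; Narwhals raises its own exception type, and this sketch accepts only individual column names rather than iterables.

```python
def drop_columns(frame: dict[str, list], *columns: str, strict: bool = True) -> dict[str, list]:
    """Hypothetical model of LazyFrame.drop on a dict-of-lists frame."""
    to_drop = set(columns)
    missing = to_drop - set(frame)
    if strict and missing:
        # strict=True: asking for a column that does not exist is an error.
        raise KeyError(f"columns not found: {sorted(missing)}")
    # strict=False: unknown names are silently ignored.
    # The remaining columns keep their original order.
    return {name: values for name, values in frame.items() if name not in to_drop}


frame = {"a": [1, 3], "b": [2, 4]}
print(drop_columns(frame, "a"))                     # {'b': [2, 4]}
print(drop_columns(frame, "a", "z", strict=False))  # {'b': [2, 4]}
```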

drop_nulls(subset: str | list[str] | None = None) -> Self

Drop rows that contain null values.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) for which null values are considered. If set to None (default), use all columns.

None

Returns:

Type Description
Self

The LazyFrame with the rows that contained null values removed.

Notes

pandas handles null values differently from Polars and PyArrow. See null_handling for reference.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, NULL), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop_nulls()
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│     3 │     4 │ |
|└───────┴───────┘ |
└──────────────────┘
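The subset parameter decides which columns are checked for nulls. A minimal row-wise sketch, assuming a frame is a list of dicts and that Python's None stands in for null (the pandas NaN handling mentioned in the note above is not modeled); drop_nulls here is a hypothetical plain-Python function:

```python
from __future__ import annotations


def drop_nulls(rows: list[dict], subset: str | list[str] | None = None) -> list[dict]:
    """Hypothetical row-wise model of LazyFrame.drop_nulls, with None standing in for null."""
    if subset is None:
        # Default: a null in any column removes the row.
        columns = list(rows[0]) if rows else []
    else:
        columns = [subset] if isinstance(subset, str) else list(subset)
    # Only nulls in the considered columns count.
    return [row for row in rows if all(row[c] is not None for c in columns)]


rows = [{"a": 1, "b": None}, {"a": None, "b": 4}, {"a": 3, "b": 4}]
print(drop_nulls(rows))       # [{'a': 3, 'b': 4}]
print(drop_nulls(rows, "b"))  # [{'a': None, 'b': 4}, {'a': 3, 'b': 4}]
```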

explode(columns: str | Sequence[str], *more_columns: str) -> Self

Explode the dataframe to long format by exploding the given columns.

Notes

It is possible to explode multiple columns only if these columns have matching element counts.

Parameters:

Name Type Description Default
columns str | Sequence[str]

Column names. The underlying columns being exploded must be of the List data type.

required
*more_columns str

Additional names of columns to explode, specified as positional arguments.

()

Returns:

Type Description
Self

New LazyFrame

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES ('x', [1, 2]), ('y', [3, 4]), ('z', [5, 6]) df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.explode("b").to_native()
┌─────────┬───────┐
│    a    │   b   │
│ varchar │ int32 │
├─────────┼───────┤
│ x       │     1 │
│ x       │     2 │
│ y       │     3 │
│ y       │     4 │
│ z       │     5 │
│ z       │     6 │
└─────────┴───────┘
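Conceptually, exploding repeats every non-exploded value once per list element. The row-wise sketch below models this on a list of dicts, including the matching-element-count rule from the note above. It is not Narwhals' implementation, the function name is hypothetical, and edge cases such as empty or null lists are not modeled faithfully.

```python
def explode(rows: list[dict], *columns: str) -> list[dict]:
    """Hypothetical row-wise model of LazyFrame.explode for list-valued columns."""
    if not columns:
        raise ValueError("at least one column to explode is required")
    out = []
    for row in rows:
        lengths = {len(row[c]) for c in columns}
        if len(lengths) > 1:
            # Exploding several columns at once requires matching element counts.
            raise ValueError(f"element counts differ across {columns}: {row}")
        (n,) = lengths
        for i in range(n):
            # Exploded columns contribute their i-th element; other columns are repeated.
            out.append({k: (v[i] if k in columns else v) for k, v in row.items()})
    return out


rows = [{"a": "x", "b": [1, 2]}, {"a": "y", "b": [3, 4]}]
print(explode(rows, "b"))
# [{'a': 'x', 'b': 1}, {'a': 'x', 'b': 2}, {'a': 'y', 'b': 3}, {'a': 'y', 'b': 4}]
```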

filter(*predicates: IntoExpr | Iterable[IntoExpr] | list[bool], **constraints: Any) -> Self

Filter the rows in the LazyFrame based on a predicate expression.

The original order of the remaining rows is preserved.

Parameters:

Name Type Description Default
*predicates IntoExpr | Iterable[IntoExpr] | list[bool]

Expression that evaluates to a boolean Series. Can also be a (single!) boolean list.

()
**constraints Any

Column filters; use name = value to filter columns by the supplied value. Each constraint will behave the same as nw.col(name).eq(value), and will be implicitly joined with the other filter conditions using &.

{}

Returns:

Type Description
Self

The filtered LazyFrame.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql('''
...     SELECT * FROM VALUES
...         (1, 6, 'a'),
...         (2, 7, 'b'),
...         (3, 8, 'c')
...     df(foo, bar, ham)
... ''')

Filter on one condition

>>> nw.from_native(df_native).filter(nw.col("foo") > 1).to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     2 │     7 │ b       │
│     3 │     8 │ c       │
└───────┴───────┴─────────┘

Filter on multiple conditions with implicit &

>>> nw.from_native(df_native).filter(
...     nw.col("foo") < 3, nw.col("ham") == "a"
... ).to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     1 │     6 │ a       │
└───────┴───────┴─────────┘

Filter on multiple conditions with |

>>> nw.from_native(df_native).filter(
...     (nw.col("foo") == 1) | (nw.col("ham") == "c")
... ).to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     1 │     6 │ a       │
│     3 │     8 │ c       │
└───────┴───────┴─────────┘

Filter using **kwargs syntax

>>> nw.from_native(df_native).filter(foo=2, ham="b").to_native()
┌───────┬───────┬─────────┐
│  foo  │  bar  │   ham   │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│     2 │     7 │ b       │
└───────┴───────┴─────────┘
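The keyword form is shorthand for equality predicates combined with &. The plain-Python sketch below models that reduction, assuming rows are dicts and predicates are ordinary Python callables standing in for Narwhals expressions; filter_rows is a hypothetical name, and null semantics are not modeled.

```python
from typing import Any, Callable


def filter_rows(rows: list[dict], *predicates: Callable[[dict], bool], **constraints: Any) -> list[dict]:
    """Hypothetical model of LazyFrame.filter: all predicates and constraints are AND-ed."""
    # Each name=value constraint becomes an equality check, like nw.col(name) == value.
    # Default arguments bind the current name/value, avoiding Python's late-binding pitfall.
    checks = list(predicates) + [
        (lambda row, name=name, value=value: row[name] == value)
        for name, value in constraints.items()
    ]
    # A row survives only if every check passes; surviving rows keep their original order.
    return [row for row in rows if all(check(row) for check in checks)]


rows = [
    {"foo": 1, "bar": 6, "ham": "a"},
    {"foo": 2, "bar": 7, "ham": "b"},
    {"foo": 3, "bar": 8, "ham": "c"},
]
print(filter_rows(rows, foo=2, ham="b"))                   # [{'foo': 2, 'bar': 7, 'ham': 'b'}]
print(filter_rows(rows, lambda r: r["foo"] < 3, ham="a"))  # [{'foo': 1, 'bar': 6, 'ham': 'a'}]
```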

gather_every(n: int, offset: int = 0) -> Self

Take every nth row in the LazyFrame and return it as a new LazyFrame.

Parameters:

Name Type Description Default
n int

Gather every n-th row.

required
offset int

Starting index.

0

Returns:

Type Description
Self

The LazyFrame containing only the selected rows.
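No example is given for gather_every, so here is a plain-Python model of which rows it selects, assuming the rows have a well-defined order: rows offset, offset + n, offset + 2n, and so on, which corresponds to Python's extended slice rows[offset::n]. The function name is hypothetical, and rejecting a non-positive n is a choice of this sketch.

```python
def gather_every(rows: list, n: int, offset: int = 0) -> list:
    """Hypothetical model of LazyFrame.gather_every: rows offset, offset + n, offset + 2n, ..."""
    if n <= 0:
        # This sketch rejects non-positive steps; slicing with step 0 would fail anyway.
        raise ValueError("n must be a positive integer")
    # An extended slice starts at `offset` and takes every n-th element.
    return rows[offset::n]


print(gather_every(list(range(10)), 3))            # [0, 3, 6, 9]
print(gather_every(list(range(10)), 3, offset=1))  # [1, 4, 7]
```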

group_by(*keys: str | Iterable[str], drop_null_keys: bool = False) -> LazyGroupBy[Self]

Start a group by operation.

Parameters:

Name Type Description Default
*keys str | Iterable[str]

Column(s) to group by. Accepts expression input. Strings are parsed as column names.

()
drop_null_keys bool

If True, groups where any key is null won't be included in the result.

False

Returns:

Type Description
LazyGroupBy[Self]

Object which can be used to perform aggregations.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES (1, 'a'), (2, 'b'), (3, 'a') df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.group_by("b").agg(nw.col("a").sum()).sort("b").to_native()
┌─────────┬────────┐
│    b    │   a    │
│ varchar │ int128 │
├─────────┼────────┤
│ a       │      4 │
│ b       │      2 │
└─────────┴────────┘
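The effect of drop_null_keys is easiest to see with a null key. The sketch below is a plain-Python model of a grouped sum over row dicts, with None standing in for null; group_sum is a hypothetical helper, and in this sketch (as in Polars) a null key forms its own group unless drop_null_keys is true.

```python
from collections import defaultdict


def group_sum(rows: list[dict], keys: list[str], value: str, drop_null_keys: bool = False) -> dict:
    """Hypothetical model of group_by(*keys, drop_null_keys=...).agg(nw.col(value).sum())."""
    sums = defaultdict(int)
    for row in rows:
        key = tuple(row[k] for k in keys)
        if drop_null_keys and any(part is None for part in key):
            # drop_null_keys=True: a row with a null in any key column joins no group.
            continue
        # Otherwise a null key value simply forms its own group.
        sums[key] += row[value]
    return dict(sums)


rows = [{"a": 1, "b": "a"}, {"a": 2, "b": "b"}, {"a": 3, "b": "a"}, {"a": 5, "b": None}]
print(group_sum(rows, ["b"], "a"))                       # {('a',): 4, ('b',): 2, (None,): 5}
print(group_sum(rows, ["b"], "a", drop_null_keys=True))  # {('a',): 4, ('b',): 2}
```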

head(n: int = 5) -> Self

Get the first n rows.

Parameters:

Name Type Description Default
n int

Number of rows to return.

5

Returns:

Type Description
Self

A subset of the LazyFrame of shape (n, n_columns).

Examples:

>>> import dask.dataframe as dd
>>> import narwhals as nw
>>> lf_native = dd.from_dict({"a": [1, 2, 3], "b": [4, 5, 6]}, npartitions=1)
>>> nw.from_native(lf_native).head(2).collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|        a  b      |
|     0  1  4      |
|     1  2  5      |
└──────────────────┘

join(other: Self, on: str | list[str] | None = None, how: Literal['inner', 'left', 'cross', 'semi', 'anti'] = 'inner', *, left_on: str | list[str] | None = None, right_on: str | list[str] | None = None, suffix: str = '_right') -> Self

Add a join operation to the Logical Plan.

Parameters:

Name Type Description Default
other Self

Lazy DataFrame to join with.

required
on str | list[str] | None

Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None.

None
how Literal['inner', 'left', 'cross', 'semi', 'anti']

Join strategy.

  • inner: Returns rows that have matching values in both tables.
  • left: Returns all rows from the left table, and the matched rows from the right table.
  • cross: Returns the Cartesian product of rows from both tables.
  • semi: Filter rows that have a match in the right table.
  • anti: Filter rows that do not have a match in the right table.
'inner'
left_on str | list[str] | None

Join column of the left DataFrame.

None
right_on str | list[str] | None

Join column of the right DataFrame.

None
suffix str

Suffix to append to columns with a duplicate name.

'_right'

Returns:

Type Description
Self

A new joined LazyFrame.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native1 = duckdb.sql(
...     "SELECT * FROM VALUES (1, 'a'), (2, 'b') df(a, b)"
... )
>>> df_native2 = duckdb.sql(
...     "SELECT * FROM VALUES (1, 'x'), (3, 'y') df(a, c)"
... )
>>> df1 = nw.from_native(df_native1)
>>> df2 = nw.from_native(df_native2)
>>> df1.join(df2, on="a")
┌─────────────────────────────┐
|     Narwhals LazyFrame      |
|-----------------------------|
|┌───────┬─────────┬─────────┐|
|│   a   │    b    │    c    │|
|│ int32 │ varchar │ varchar │|
|├───────┼─────────┼─────────┤|
|│     1 │ a       │ x       │|
|└───────┴─────────┴─────────┘|
└─────────────────────────────┘
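The strategies listed above differ mainly in which left rows survive and whether right-hand columns are attached. Below is a nested-loop sketch in plain Python using the same data as the example; it models the semantics only, not how any backend executes a join. join is a hypothetical helper, and cross joins, suffix handling and null keys are not modeled.

```python
def join(left: list[dict], right: list[dict], on: str, how: str = "inner") -> list[dict]:
    """Hypothetical nested-loop model of LazyFrame.join on a single key column."""
    # Right-hand columns other than the key (taken from the first right row).
    right_cols = [c for c in right[0] if c != on] if right else []
    out = []
    for lrow in left:
        matches = [r for r in right if r[on] == lrow[on]]
        if how == "semi":
            # Keep the left row once, with left columns only, if any match exists.
            if matches:
                out.append(dict(lrow))
        elif how == "anti":
            # Keep the left row, with left columns only, if no match exists.
            if not matches:
                out.append(dict(lrow))
        elif how in ("inner", "left"):
            # One output row per matching right row.
            for r in matches:
                out.append({**lrow, **{c: r[c] for c in right_cols}})
            if how == "left" and not matches:
                # Unmatched left rows are kept, with nulls for the right-hand columns.
                out.append({**lrow, **{c: None for c in right_cols}})
        else:
            raise ValueError(f"strategy not modeled in this sketch: {how!r}")
    return out


left = [{"a": 1, "b": "a"}, {"a": 2, "b": "b"}]
right = [{"a": 1, "c": "x"}, {"a": 3, "c": "y"}]
for how in ("inner", "left", "semi", "anti"):
    print(how, join(left, right, on="a", how=how))
# inner [{'a': 1, 'b': 'a', 'c': 'x'}]
# left [{'a': 1, 'b': 'a', 'c': 'x'}, {'a': 2, 'b': 'b', 'c': None}]
# semi [{'a': 1, 'b': 'a'}]
# anti [{'a': 2, 'b': 'b'}]
```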

join_asof(other: Self, *, left_on: str | None = None, right_on: str | None = None, on: str | None = None, by_left: str | list[str] | None = None, by_right: str | list[str] | None = None, by: str | list[str] | None = None, strategy: Literal['backward', 'forward', 'nearest'] = 'backward', suffix: str = '_right') -> Self

Perform an asof join.

This is similar to a left join, except that rows are matched on the nearest key rather than on equal keys.

Both DataFrames must be sorted by the asof_join key.

Parameters:

Name Type Description Default
other Self

DataFrame to join with.

required
left_on str | None

Name of the left join column.

None
right_on str | None

Name of the right join column.

None
on str | None

Join column of both DataFrames. If set, left_on and right_on should be None.

None
by_left str | list[str] | None

Join on these columns of the left DataFrame before performing the asof join.

None
by_right str | list[str] | None

Join on these columns of the right DataFrame before performing the asof join.

None
by str | list[str] | None

Join on these columns of both DataFrames before performing the asof join.

None
strategy Literal['backward', 'forward', 'nearest']

Join strategy. The default is "backward".

  • backward: selects the last row in the right DataFrame whose "on" key is less than or equal to the left's key.
  • forward: selects the first row in the right DataFrame whose "on" key is greater than or equal to the left's key.
  • nearest: selects the last row in the right DataFrame whose value is nearest to the left's key.
'backward'
suffix str

Suffix to append to columns with a duplicate name.

'_right'

Returns:

Type Description
Self

A new joined LazyFrame.

Examples:

>>> from datetime import datetime
>>> import polars as pl
>>> import narwhals as nw
>>> data_gdp = {
...     "datetime": [
...         datetime(2016, 1, 1),
...         datetime(2017, 1, 1),
...         datetime(2018, 1, 1),
...         datetime(2019, 1, 1),
...         datetime(2020, 1, 1),
...     ],
...     "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
...     "datetime": [
...         datetime(2016, 3, 1),
...         datetime(2018, 8, 1),
...         datetime(2019, 1, 1),
...     ],
...     "population": [82.19, 82.66, 83.12],
... }
>>> gdp_native = pl.LazyFrame(data_gdp)
>>> population_native = pl.LazyFrame(data_population)
>>> gdp = nw.from_native(gdp_native)
>>> population = nw.from_native(population_native)
>>> population.join_asof(gdp, on="datetime", strategy="backward").collect().to_native()
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime            ┆ population ┆ gdp  │
│ ---                 ┆ ---        ┆ ---  │
│ datetime[μs]        ┆ f64        ┆ i64  │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19      ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66      ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12      ┆ 4696 │
└─────────────────────┴────────────┴──────┘
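The three strategies can be modeled with binary search over the sorted right-hand keys. The sketch below uses Python's bisect module on the same dates as the example and returns, for each left key, the index of the matched right row; asof_match is a hypothetical helper, the by columns are not modeled, and the tie-breaking rule for nearest is an assumption of this sketch rather than documented behaviour. Its backward result reproduces the gdp column shown above.

```python
import bisect
from datetime import datetime


def asof_match(left_keys: list, right_keys: list, strategy: str = "backward") -> list:
    """Hypothetical model: for each left key, the index of the matched right row, or None.

    Both key lists must already be sorted ascending, as join_asof requires.
    """
    out = []
    for key in left_keys:
        back = bisect.bisect_right(right_keys, key) - 1  # last right key <= key
        fwd = bisect.bisect_left(right_keys, key)  # first right key >= key
        if strategy == "backward":
            out.append(back if back >= 0 else None)
        elif strategy == "forward":
            out.append(fwd if fwd < len(right_keys) else None)
        elif strategy == "nearest":
            candidates = [i for i in (back, fwd) if 0 <= i < len(right_keys)]
            # Assumption of this sketch: on equal distance, the backward candidate wins.
            best = min(candidates, key=lambda i: abs(right_keys[i] - key), default=None)
            out.append(best)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return out


gdp_dates = [datetime(year, 1, 1) for year in range(2016, 2021)]
gdp = [4164, 4411, 4566, 4696, 4827]
population_dates = [datetime(2016, 3, 1), datetime(2018, 8, 1), datetime(2019, 1, 1)]
for strategy in ("backward", "forward", "nearest"):
    matched = asof_match(population_dates, gdp_dates, strategy)
    print(strategy, [gdp[i] if i is not None else None for i in matched])
# backward [4164, 4566, 4696]
# forward [4411, 4696, 4696]
# nearest [4164, 4696, 4696]
```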

lazy() -> Self

Restrict available API methods to lazy-only ones.

This is a no-op, and exists only for compatibility with DataFrame.lazy.

Returns:

Type Description
Self

A LazyFrame.

pipe(function: Callable[Concatenate[Self, PS], R], *args: PS.args, **kwargs: PS.kwargs) -> R

Pipe function call.

Parameters:

Name Type Description Default
function Callable[Concatenate[Self, PS], R]

Function to apply.

required
*args PS.args

Positional arguments to pass to function.

()
**kwargs PS.kwargs

Keyword arguments to pass to function.

{}

Returns:

Type Description
R

The result of applying function to this LazyFrame.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).pipe(lambda x: x.select("a")).to_native()
┌───────┐
│   a   │
│ int32 │
├───────┤
│     1 │
│     3 │
└───────┘

rename(mapping: dict[str, str]) -> Self

Rename column names.

Parameters:

Name Type Description Default
mapping dict[str, str]

Key value pairs that map from old name to new name, or a function that takes the old name as input and returns the new name.

required

Returns:

Type Description
Self

The LazyFrame with the specified columns renamed.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).rename({"a": "c"})
┌────────────────────────┐
|   Narwhals LazyFrame   |
|------------------------|
|┌───────┬──────────────┐|
|│   c   │      b       │|
|│ int32 │ decimal(2,1) │|
|├───────┼──────────────┤|
|│     1 │          4.5 │|
|│     3 │          2.0 │|
|└───────┴──────────────┘|
└────────────────────────┘

select(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr) -> Self

Select columns from this LazyFrame.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names.

()
**named_exprs IntoExpr

Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.

{}

Returns:

Type Description
Self

The LazyFrame containing only the selected columns.

Notes

If you'd like to select a column whose name isn't a string (for example, if you're working with pandas) then you should explicitly use nw.col instead of just passing the column name. For example, to select a column named 0 use df.select(nw.col(0)), not df.select(0).

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).select("a", a_plus_1=nw.col("a") + 1)
┌────────────────────┐
| Narwhals LazyFrame |
|--------------------|
|┌───────┬──────────┐|
|│   a   │ a_plus_1 │|
|│ int32 │  int32   │|
|├───────┼──────────┤|
|│     1 │        2 │|
|│     3 │        4 │|
|└───────┴──────────┘|
└────────────────────┘

sort(by: str | Iterable[str], *more_by: str, descending: bool | Sequence[bool] = False, nulls_last: bool = False) -> Self

Sort the LazyFrame by the given columns.

Parameters:

Name Type Description Default
by str | Iterable[str]

Column name(s) to sort by.

required
*more_by str

Additional columns to sort by, specified as positional arguments.

()
descending bool | Sequence[bool]

Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.

False
nulls_last bool

Place null values last.

False

Returns:

Type Description
Self

The sorted LazyFrame.

Warning

Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES (1, 6.0, 'a'), (2, 5.0, 'c'), (NULL, 4.0, 'b') df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.sort("a")
┌──────────────────────────────────┐
|        Narwhals LazyFrame        |
|----------------------------------|
|┌───────┬──────────────┬─────────┐|
|│   a   │      b       │    c    │|
|│ int32 │ decimal(2,1) │ varchar │|
|├───────┼──────────────┼─────────┤|
|│  NULL │          4.0 │ b       │|
|│     1 │          6.0 │ a       │|
|│     2 │          5.0 │ c       │|
|└───────┴──────────────┴─────────┘|
└──────────────────────────────────┘
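Per-column descending flags combined with a single nulls_last flag can be modeled by a sequence of stable sorts, from the last key to the first. The plain-Python sketch below does this on row dicts with None as null, using the data from the example; sort_rows is a hypothetical helper, and placing nulls by nulls_last alone, independent of the sort direction, is an assumption of this sketch.

```python
from __future__ import annotations

from collections.abc import Sequence


def sort_rows(
    rows: list[dict],
    by: list[str],
    descending: bool | Sequence[bool] = False,
    nulls_last: bool = False,
) -> list[dict]:
    """Hypothetical model of LazyFrame.sort: per-column descending, one nulls_last flag."""
    flags = [descending] * len(by) if isinstance(descending, bool) else list(descending)
    if len(flags) != len(by):
        raise ValueError("descending needs one flag per sort column")
    result = list(rows)
    # Python's sort is stable, so sorting by the last key first gives a multi-key sort.
    for col, desc in reversed(list(zip(by, flags))):
        non_null = sorted(
            (r for r in result if r[col] is not None),
            key=lambda r: r[col],
            reverse=desc,
        )
        nulls = [r for r in result if r[col] is None]
        # The single nulls_last flag decides where nulls go for every column.
        result = non_null + nulls if nulls_last else nulls + non_null
    return result


rows = [{"a": 1, "c": "a"}, {"a": 2, "c": "c"}, {"a": None, "c": "b"}]
print([r["c"] for r in sort_rows(rows, ["a"])])                   # ['b', 'a', 'c']
print([r["c"] for r in sort_rows(rows, ["a"], nulls_last=True)])  # ['a', 'c', 'b']
print([r["c"] for r in sort_rows(rows, ["a"], descending=True)])  # ['b', 'c', 'a']
```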

tail(n: int = 5) -> Self

Get the last n rows.

Warning

LazyFrame.tail is deprecated and will be removed in a future version. Note: this will remain available in narwhals.stable.v1. See stable api for more information.

Parameters:

Name Type Description Default
n int

Number of rows to return.

5

Returns:

Type Description
Self

A subset of the LazyFrame of shape (n, n_columns).

to_native() -> FrameT

Convert Narwhals LazyFrame to native one.

Returns:

Type Description
FrameT

An object of the class that the user started with.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).to_native()
┌───────┬───────┐
│   a   │   b   │
│ int32 │ int32 │
├───────┼───────┤
│     1 │     2 │
│     3 │     4 │
└───────┴───────┘

unique(subset: str | list[str] | None = None, *, keep: Literal['any', 'none'] = 'any', maintain_order: bool | None = None) -> Self

Drop duplicate rows from this LazyFrame.

Parameters:

Name Type Description Default
subset str | list[str] | None

Column name(s) to consider when identifying duplicate rows. If set to None, use all columns.

None
keep Literal['any', 'none']

{'any', 'none'} Which of the duplicate rows to keep.

  • 'any': Does not give any guarantee of which row is kept. This allows more optimizations.
  • 'none': Don't keep duplicate rows.
'any'
maintain_order bool | None

Has no effect and is kept around only for backwards-compatibility.

None

Returns:

Type Description
Self

The LazyFrame with unique rows.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 1), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).unique("a").sort("a", descending=True)
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│   a   │   b   │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│     3 │     4 │ |
|│     1 │     1 │ |
|└───────┴───────┘ |
└──────────────────┘
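The difference between keep='any' and keep='none' shows up when a key is duplicated. In the plain-Python sketch below (unique_rows is a hypothetical helper over row dicts with hashable values), 'none' removes every copy of a duplicated key, while 'any' keeps a single copy; which copy survives is unspecified in Narwhals, and keeping the first one is merely what this sketch does.

```python
from __future__ import annotations

from collections import Counter


def unique_rows(rows: list[dict], subset: list[str] | None = None, keep: str = "any") -> list[dict]:
    """Hypothetical model of LazyFrame.unique on a list of row dicts (values must be hashable)."""
    cols = subset if subset is not None else (list(rows[0]) if rows else [])

    def key(row: dict) -> tuple:
        return tuple(row[c] for c in cols)

    if keep == "none":
        # Drop every row whose key appears more than once.
        counts = Counter(key(r) for r in rows)
        return [r for r in rows if counts[key(r)] == 1]
    if keep == "any":
        # One row per key survives; Narwhals gives no guarantee which one.
        # This sketch happens to keep the first occurrence.
        seen: set = set()
        out = []
        for r in rows:
            if key(r) not in seen:
                seen.add(key(r))
                out.append(r)
        return out
    raise ValueError(f"unsupported keep value: {keep!r}")


rows = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 3, "b": 4}]
print(unique_rows(rows, ["a"]))               # [{'a': 1, 'b': 1}, {'a': 3, 'b': 4}]
print(unique_rows(rows, ["a"], keep="none"))  # [{'a': 3, 'b': 4}]
```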

unpivot(on: str | list[str] | None = None, *, index: str | list[str] | None = None, variable_name: str = 'variable', value_name: str = 'value') -> Self

Unpivot a DataFrame from wide to long format.

Optionally leaves identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis leaving just two non-identifier columns, 'variable' and 'value'.

Parameters:

Name Type Description Default
on str | list[str] | None

Column(s) to use as value variables; if on is not given, all columns that are not in index will be used.

None
index str | list[str] | None

Column(s) to use as identifier variables.

None
variable_name str

Name to give to the variable column. Defaults to "variable".

'variable'
value_name str

Name to give to the value column. Defaults to "value".

'value'

Returns:

Type Description
Self

The unpivoted LazyFrame.

Notes

If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
...     "SELECT * FROM VALUES ('x', 1, 2), ('y', 3, 4), ('z', 5, 6) df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.unpivot(on=["b", "c"], index="a").sort("a", "variable").to_native()
┌─────────┬──────────┬───────┐
│    a    │ variable │ value │
│ varchar │ varchar  │ int32 │
├─────────┼──────────┼───────┤
│ x       │ b        │     1 │
│ x       │ c        │     2 │
│ y       │ b        │     3 │
│ y       │ c        │     4 │
│ z       │ b        │     5 │
│ z       │ c        │     6 │
└─────────┴──────────┴───────┘
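The reshaping can be modeled in a few lines: each on column contributes one output row per input row, tagged with its column name in the variable column. The sketch below works on a dict-of-lists frame with the data from the example; unpivot here is a hypothetical plain-Python function, not the Narwhals method, and the order in which it emits rows is an arbitrary choice (the example above sorts its result before displaying it).

```python
from __future__ import annotations


def unpivot(
    frame: dict[str, list],
    on: str | list[str] | None = None,
    index: str | list[str] | None = None,
    variable_name: str = "variable",
    value_name: str = "value",
) -> dict[str, list]:
    """Hypothetical model of LazyFrame.unpivot on a dict-of-lists frame."""
    index = [index] if isinstance(index, str) else list(index or [])
    if on is None:
        # Default: every column that is not an identifier gets unpivoted.
        on = [c for c in frame if c not in index]
    elif isinstance(on, str):
        on = [on]
    n_rows = len(next(iter(frame.values()), []))
    out: dict[str, list] = {name: [] for name in (*index, variable_name, value_name)}
    for col in on:
        for i in range(n_rows):
            # Identifier values are repeated for each unpivoted column.
            for idx in index:
                out[idx].append(frame[idx][i])
            out[variable_name].append(col)
            out[value_name].append(frame[col][i])
    return out


frame = {"a": ["x", "y", "z"], "b": [1, 3, 5], "c": [2, 4, 6]}
long = unpivot(frame, on=["b", "c"], index="a")
print(sorted(zip(long["a"], long["variable"], long["value"])))
# [('x', 'b', 1), ('x', 'c', 2), ('y', 'b', 3), ('y', 'c', 4), ('z', 'b', 5), ('z', 'c', 6)]
```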

with_columns(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr) -> Self

Add columns to this LazyFrame.

Added columns will replace existing columns with the same name.

Parameters:

Name Type Description Default
*exprs IntoExpr | Iterable[IntoExpr]

Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

()
**named_exprs IntoExpr

Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.

{}

Returns:

Type Description
Self

A new LazyFrame with the columns added.

Note

Creating a new LazyFrame using this method does not create a new copy of existing data.

Examples:

>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).with_columns(c=nw.col("a") + 1)
┌────────────────────────────────┐
|       Narwhals LazyFrame       |
|--------------------------------|
|┌───────┬──────────────┬───────┐|
|│   a   │      b       │   c   │|
|│ int32 │ decimal(2,1) │ int32 │|
|├───────┼──────────────┼───────┤|
|│     1 │          4.5 │     2 │|
|│     3 │          2.0 │     4 │|
|└───────┴──────────────┴───────┘|
└────────────────────────────────┘
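The replace-or-append rule and the no-copy note can be pictured with a plain-Python model over a dict-of-lists frame. In this sketch (with_columns is a hypothetical helper, and new columns are given as ready-made lists rather than expressions), a replaced column keeps its position, new columns are appended, and the untouched column lists are shared with the input rather than copied.

```python
def with_columns(frame: dict[str, list], **new_columns: list) -> dict[str, list]:
    """Hypothetical model of LazyFrame.with_columns on a dict-of-lists frame."""
    # Shallow copy: the input frame is not modified, and untouched column lists are shared.
    out = dict(frame)
    for name, values in new_columns.items():
        # Assigning to an existing key replaces that column but keeps its position;
        # a new key is appended after the existing columns.
        out[name] = values
    return out


frame = {"a": [1, 3], "b": [4.5, 2.0]}
print(with_columns(frame, c=[x + 1 for x in frame["a"]]))  # {'a': [1, 3], 'b': [4.5, 2.0], 'c': [2, 4]}
print(with_columns(frame, a=[10, 30]))                     # {'a': [10, 30], 'b': [4.5, 2.0]}
```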

with_row_index(name: str = 'index') -> Self

Insert column which enumerates rows.

Parameters:

Name Type Description Default
name str

The name of the column as a string. The default is "index".

'index'

Returns:

Type Description
Self

The LazyFrame with the row index column inserted.

Examples:

>>> import dask.dataframe as dd
>>> import narwhals as nw
>>> lf_native = dd.from_dict({"a": [1, 2], "b": [4, 5]}, npartitions=1)
>>> nw.from_native(lf_native).with_row_index().collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|     index  a  b  |
|  0      0  1  4  |
|  1      1  2  5  |
└──────────────────┘