narwhals.LazyFrame
Narwhals LazyFrame, backed by a native lazyframe.
Warning
This class is not meant to be instantiated directly. Instead, use narwhals.from_native with a native object that is a lazy dataframe from one of the supported backends (e.g. polars.LazyFrame, dask_expr._collection.DataFrame):
narwhals.from_native(native_lazyframe)
columns: list[str]
property
Get column names.
Returns:
Type | Description |
---|---|
list[str]
|
The column names stored in a list. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).columns
['a', 'b']
implementation: Implementation
property
Return implementation of native frame.
This can be useful when you need to use special-casing for features outside of Narwhals' scope - for example, when dealing with pandas' Period Dtype.
Returns:
Type | Description |
---|---|
Implementation
|
Implementation. |
Examples:
>>> import narwhals as nw
>>> import dask.dataframe as dd
>>> lf_native = dd.from_dict({"a": [1, 2]}, npartitions=1)
>>> nw.from_native(lf_native).implementation
<Implementation.DASK: 7>
schema: Schema
property
Get an ordered mapping of column names to their data type.
Returns:
Type | Description |
---|---|
Schema
|
A Narwhals Schema object that displays the mapping of column names to their data types. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).schema
Schema({'a': Int32, 'b': Decimal})
clone() -> Self
Create a copy of this LazyFrame.
Returns:
Type | Description |
---|---|
Self
|
An identical copy of the original LazyFrame. |
collect(backend: ModuleType | Implementation | str | None = None, **kwargs: Any) -> DataFrame[Any]
Materialize this LazyFrame into a DataFrame.
Each underlying lazyframe accepts different arguments when it is materialized into a dataframe, so these can be passed as kwargs (see examples below for how to generalize the specification).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
backend
|
ModuleType | Implementation | str | None
|
Specifies which eager backend to collect to. This will be the underlying backend for the resulting Narwhals DataFrame. If None, a backend-dependent default conversion is applied.
|
None
|
kwargs
|
Any
|
Backend-specific kwargs to pass along. For details, check the documentation of the backend in use. |
{}
|
Returns:
Type | Description |
---|---|
DataFrame[Any]
|
DataFrame |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> lf = nw.from_native(lf_native)
>>> lf
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│ a │ b │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│ 1 │ 2 │ |
|│ 3 │ 4 │ |
|└───────┴───────┘ |
└──────────────────┘
>>> lf.collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| pyarrow.Table |
| a: int32 |
| b: int32 |
| ---- |
| a: [[1,3]] |
| b: [[2,4]] |
└──────────────────┘
collect_schema() -> Schema
Get an ordered mapping of column names to their data type.
Returns:
Type | Description |
---|---|
Schema
|
A Narwhals Schema object that displays the mapping of column names to their data types. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).collect_schema()
Schema({'a': Int32, 'b': Decimal})
drop(*columns: str | Iterable[str], strict: bool = True) -> Self
Remove columns from the LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*columns
|
str | Iterable[str]
|
Names of the columns that should be removed from the dataframe. |
()
|
strict
|
bool
|
Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema. |
True
|
Returns:
Type | Description |
---|---|
Self
|
The LazyFrame with the specified columns removed. |
Warning
The strict argument is ignored for polars<1.0.0. Please consider upgrading to a newer version or switching to eager mode.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop("a").to_native()
┌───────┐
│ b │
│ int32 │
├───────┤
│ 2 │
│ 4 │
└───────┘
drop_nulls(subset: str | list[str] | None = None) -> Self
Drop rows that contain null values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset
|
str | list[str] | None
|
Column name(s) for which null values are considered. If set to None (default), use all columns. |
None
|
Returns:
Type | Description |
---|---|
Self
|
The original object with the rows that contained null values removed. |
Notes
pandas handles null values differently from Polars and PyArrow. See null_handling for reference.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, NULL), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).drop_nulls()
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│ a │ b │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│ 3 │ 4 │ |
|└───────┴───────┘ |
└──────────────────┘
explode(columns: str | Sequence[str], *more_columns: str) -> Self
Explode the dataframe to long format by exploding the given columns.
Notes
It is possible to explode multiple columns only if these columns have matching element counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns
|
str | Sequence[str]
|
Column names. The underlying columns being exploded must be of the |
required |
*more_columns
|
str
|
Additional names of columns to explode, specified as positional arguments. |
()
|
Returns:
Type | Description |
---|---|
Self
|
New LazyFrame |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
... "SELECT * FROM VALUES ('x', [1, 2]), ('y', [3, 4]), ('z', [5, 6]) df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.explode("b").to_native()
┌─────────┬───────┐
│ a │ b │
│ varchar │ int32 │
├─────────┼───────┤
│ x │ 1 │
│ x │ 2 │
│ y │ 3 │
│ y │ 4 │
│ z │ 5 │
│ z │ 6 │
└─────────┴───────┘
filter(*predicates: IntoExpr | Iterable[IntoExpr] | list[bool], **constraints: Any) -> Self
Filter the rows in the LazyFrame based on a predicate expression.
The original order of the remaining rows is preserved.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*predicates
|
IntoExpr | Iterable[IntoExpr] | list[bool]
|
Expression that evaluates to a boolean Series. Can also be a (single!) boolean list. |
()
|
**constraints
|
Any
|
Column filters; use |
{}
|
Returns:
Type | Description |
---|---|
Self
|
The filtered LazyFrame. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql('''
... SELECT * FROM VALUES
... (1, 6, 'a'),
... (2, 7, 'b'),
... (3, 8, 'c')
... df(foo, bar, ham)
... ''')
Filter on one condition
>>> nw.from_native(df_native).filter(nw.col("foo") > 1).to_native()
┌───────┬───────┬─────────┐
│ foo │ bar │ ham │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│ 2 │ 7 │ b │
│ 3 │ 8 │ c │
└───────┴───────┴─────────┘
Filter on multiple conditions with implicit &
>>> nw.from_native(df_native).filter(
... nw.col("foo") < 3, nw.col("ham") == "a"
... ).to_native()
┌───────┬───────┬─────────┐
│ foo │ bar │ ham │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│ 1 │ 6 │ a │
└───────┴───────┴─────────┘
Filter on multiple conditions with |
>>> nw.from_native(df_native).filter(
... (nw.col("foo") == 1) | (nw.col("ham") == "c")
... ).to_native()
┌───────┬───────┬─────────┐
│ foo │ bar │ ham │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│ 1 │ 6 │ a │
│ 3 │ 8 │ c │
└───────┴───────┴─────────┘
Filter using **kwargs
syntax
>>> nw.from_native(df_native).filter(foo=2, ham="b").to_native()
┌───────┬───────┬─────────┐
│ foo │ bar │ ham │
│ int32 │ int32 │ varchar │
├───────┼───────┼─────────┤
│ 2 │ 7 │ b │
└───────┴───────┴─────────┘
gather_every(n: int, offset: int = 0) -> Self
Take every nth row in the LazyFrame and return it as a new LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n
|
int
|
Gather every n-th row. |
required |
offset
|
int
|
Starting index. |
0
|
Returns:
Type | Description |
---|---|
Self
|
The LazyFrame containing only the selected rows. |
group_by(*keys: str | Iterable[str], drop_null_keys: bool = False) -> LazyGroupBy[Self]
Start a group by operation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*keys
|
str | Iterable[str]
|
Column(s) to group by. Accepts expression input. Strings are parsed as column names. |
()
|
drop_null_keys
|
bool
|
If True, groups where any key is null won't be included in the result. |
False
|
Returns:
Type | Description |
---|---|
LazyGroupBy[Self]
|
Object which can be used to perform aggregations. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
... "SELECT * FROM VALUES (1, 'a'), (2, 'b'), (3, 'a') df(a, b)"
... )
>>> df = nw.from_native(df_native)
>>> df.group_by("b").agg(nw.col("a").sum()).sort("b").to_native()
┌─────────┬────────┐
│ b │ a │
│ varchar │ int128 │
├─────────┼────────┤
│ a │ 4 │
│ b │ 2 │
└─────────┴────────┘
head(n: int = 5) -> Self
Get the first n rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n
|
int
|
Number of rows to return. |
5
|
Returns:
Type | Description |
---|---|
Self
|
A subset of the LazyFrame of shape (n, n_columns). |
Examples:
>>> import dask.dataframe as dd
>>> import narwhals as nw
>>> lf_native = dd.from_dict({"a": [1, 2, 3], "b": [4, 5, 6]}, npartitions=1)
>>> nw.from_native(lf_native).head(2).collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| a b |
| 0 1 4 |
| 1 2 5 |
└──────────────────┘
join(other: Self, on: str | list[str] | None = None, how: Literal['inner', 'left', 'cross', 'semi', 'anti'] = 'inner', *, left_on: str | list[str] | None = None, right_on: str | list[str] | None = None, suffix: str = '_right') -> Self
Add a join operation to the Logical Plan.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other
|
Self
|
LazyFrame to join with. |
required |
on
|
str | list[str] | None
|
Name(s) of the join columns in both LazyFrames. If set, left_on and right_on should be None. |
None
|
how
|
Literal['inner', 'left', 'cross', 'semi', 'anti']
|
Join strategy.
|
'inner'
|
left_on
|
str | list[str] | None
|
Join column of the left DataFrame. |
None
|
right_on
|
str | list[str] | None
|
Join column of the right DataFrame. |
None
|
suffix
|
str
|
Suffix to append to columns with a duplicate name. |
'_right'
|
Returns:
Type | Description |
---|---|
Self
|
A new joined LazyFrame. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native1 = duckdb.sql(
... "SELECT * FROM VALUES (1, 'a'), (2, 'b') df(a, b)"
... )
>>> df_native2 = duckdb.sql(
... "SELECT * FROM VALUES (1, 'x'), (3, 'y') df(a, c)"
... )
>>> df1 = nw.from_native(df_native1)
>>> df2 = nw.from_native(df_native2)
>>> df1.join(df2, on="a")
┌─────────────────────────────┐
| Narwhals LazyFrame |
|-----------------------------|
|┌───────┬─────────┬─────────┐|
|│ a │ b │ c │|
|│ int32 │ varchar │ varchar │|
|├───────┼─────────┼─────────┤|
|│ 1 │ a │ x │|
|└───────┴─────────┴─────────┘|
└─────────────────────────────┘
join_asof(other: Self, *, left_on: str | None = None, right_on: str | None = None, on: str | None = None, by_left: str | list[str] | None = None, by_right: str | list[str] | None = None, by: str | list[str] | None = None, strategy: Literal['backward', 'forward', 'nearest'] = 'backward', suffix: str = '_right') -> Self
Perform an asof join.
This is similar to a left-join except that we match on nearest key rather than equal keys.
Both DataFrames must be sorted by the asof_join key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other
|
Self
|
LazyFrame to join with. |
required |
left_on
|
str | None
|
Name(s) of the left join column(s). |
None
|
right_on
|
str | None
|
Name(s) of the right join column(s). |
None
|
on
|
str | None
|
Join column of both DataFrames. If set, left_on and right_on should be None. |
None
|
by_left
|
str | list[str] | None
|
Join on these columns before doing the asof join. |
None
|
by_right
|
str | list[str] | None
|
Join on these columns before doing the asof join. |
None
|
by
|
str | list[str] | None
|
Join on these columns before doing the asof join. |
None
|
strategy
|
Literal['backward', 'forward', 'nearest']
|
Join strategy. The default is "backward".
|
'backward'
|
suffix
|
str
|
Suffix to append to columns with a duplicate name. |
'_right'
|
Returns:
Type | Description |
---|---|
Self
|
A new joined LazyFrame. |
Examples:
>>> from datetime import datetime
>>> import polars as pl
>>> import narwhals as nw
>>> data_gdp = {
... "datetime": [
... datetime(2016, 1, 1),
... datetime(2017, 1, 1),
... datetime(2018, 1, 1),
... datetime(2019, 1, 1),
... datetime(2020, 1, 1),
... ],
... "gdp": [4164, 4411, 4566, 4696, 4827],
... }
>>> data_population = {
... "datetime": [
... datetime(2016, 3, 1),
... datetime(2018, 8, 1),
... datetime(2019, 1, 1),
... ],
... "population": [82.19, 82.66, 83.12],
... }
>>> gdp_native = pl.DataFrame(data_gdp)
>>> population_native = pl.DataFrame(data_population)
>>> gdp = nw.from_native(gdp_native)
>>> population = nw.from_native(population_native)
>>> population.join_asof(gdp, on="datetime", strategy="backward").to_native()
shape: (3, 3)
┌─────────────────────┬────────────┬──────┐
│ datetime ┆ population ┆ gdp │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i64 │
╞═════════════════════╪════════════╪══════╡
│ 2016-03-01 00:00:00 ┆ 82.19 ┆ 4164 │
│ 2018-08-01 00:00:00 ┆ 82.66 ┆ 4566 │
│ 2019-01-01 00:00:00 ┆ 83.12 ┆ 4696 │
└─────────────────────┴────────────┴──────┘
lazy() -> Self
Restrict available API methods to lazy-only ones.
This is a no-op, and exists only for compatibility with DataFrame.lazy.
Returns:
Type | Description |
---|---|
Self
|
A LazyFrame. |
pipe(function: Callable[Concatenate[Self, PS], R], *args: PS.args, **kwargs: PS.kwargs) -> R
Pipe function call.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
function
|
Callable[Concatenate[Self, PS], R]
|
Function to apply. |
required |
args
|
args
|
Positional arguments to pass to function. |
()
|
kwargs
|
kwargs
|
Keyword arguments to pass to function. |
{}
|
Returns:
Type | Description |
---|---|
R
|
The original object with the function applied. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).pipe(lambda x: x.select("a")).to_native()
┌───────┐
│ a │
│ int32 │
├───────┤
│ 1 │
│ 3 │
└───────┘
rename(mapping: dict[str, str]) -> Self
Rename column names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mapping
|
dict[str, str]
|
Key-value pairs that map from old name to new name. |
required |
Returns:
Type | Description |
---|---|
Self
|
The LazyFrame with the specified columns renamed. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).rename({"a": "c"})
┌────────────────────────┐
| Narwhals LazyFrame |
|------------------------|
|┌───────┬──────────────┐|
|│ c │ b │|
|│ int32 │ decimal(2,1) │|
|├───────┼──────────────┤|
|│ 1 │ 4.5 │|
|│ 3 │ 2.0 │|
|└───────┴──────────────┘|
└────────────────────────┘
select(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr) -> Self
Select columns from this LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*exprs
|
IntoExpr | Iterable[IntoExpr]
|
Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names. |
()
|
**named_exprs
|
IntoExpr
|
Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used. |
{}
|
Returns:
Type | Description |
---|---|
Self
|
The LazyFrame containing only the selected columns. |
Notes
If you'd like to select a column whose name isn't a string (for example, if you're working with pandas) then you should explicitly use nw.col instead of just passing the column name. For example, to select a column named 0 use df.select(nw.col(0)), not df.select(0).
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).select("a", a_plus_1=nw.col("a") + 1)
┌────────────────────┐
| Narwhals LazyFrame |
|--------------------|
|┌───────┬──────────┐|
|│ a │ a_plus_1 │|
|│ int32 │ int32 │|
|├───────┼──────────┤|
|│ 1 │ 2 │|
|│ 3 │ 4 │|
|└───────┴──────────┘|
└────────────────────┘
sort(by: str | Iterable[str], *more_by: str, descending: bool | Sequence[bool] = False, nulls_last: bool = False) -> Self
Sort the LazyFrame by the given columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
by
|
str | Iterable[str]
|
Column(s) names to sort by. |
required |
*more_by
|
str
|
Additional columns to sort by, specified as positional arguments. |
()
|
descending
|
bool | Sequence[bool]
|
Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans. |
False
|
nulls_last
|
bool
|
Place null values last. A single boolean that applies to all by columns. |
False
|
Returns:
Type | Description |
---|---|
Self
|
The sorted LazyFrame. |
Warning
Unlike Polars, it is not possible to specify a sequence of booleans for nulls_last in order to control per-column behaviour. Instead a single boolean is applied for all by columns.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
... "SELECT * FROM VALUES (1, 6.0, 'a'), (2, 5.0, 'c'), (NULL, 4.0, 'b') df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.sort("a")
┌──────────────────────────────────┐
| Narwhals LazyFrame |
|----------------------------------|
|┌───────┬──────────────┬─────────┐|
|│ a │ b │ c │|
|│ int32 │ decimal(2,1) │ varchar │|
|├───────┼──────────────┼─────────┤|
|│ NULL │ 4.0 │ b │|
|│ 1 │ 6.0 │ a │|
|│ 2 │ 5.0 │ c │|
|└───────┴──────────────┴─────────┘|
└──────────────────────────────────┘
tail(n: int = 5) -> Self
Get the last n rows.
Warning
LazyFrame.tail is deprecated and will be removed in a future version. Note: this will remain available in narwhals.stable.v1. See stable api for more information.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n
|
int
|
Number of rows to return. |
5
|
Returns:
Type | Description |
---|---|
Self
|
A subset of the LazyFrame of shape (n, n_columns). |
to_native() -> FrameT
Convert the Narwhals LazyFrame to its native counterpart.
Returns:
Type | Description |
---|---|
FrameT
|
Object of the class that the user started with. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 2), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).to_native()
┌───────┬───────┐
│ a │ b │
│ int32 │ int32 │
├───────┼───────┤
│ 1 │ 2 │
│ 3 │ 4 │
└───────┴───────┘
unique(subset: str | list[str] | None = None, *, keep: Literal['any', 'none'] = 'any', maintain_order: bool | None = None) -> Self
Drop duplicate rows from this LazyFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subset
|
str | list[str] | None
|
Column name(s) to consider when identifying duplicate rows.
If set to |
None
|
keep
|
Literal['any', 'none']
|
{'any', 'none'} Which of the duplicate rows to keep.
|
'any'
|
maintain_order
|
bool | None
|
Has no effect and is kept around only for backwards-compatibility. |
None
|
Returns:
Type | Description |
---|---|
Self
|
The LazyFrame with unique rows. |
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 1), (3, 4) df(a, b)")
>>> nw.from_native(lf_native).unique("a").sort("a", descending=True)
┌──────────────────┐
|Narwhals LazyFrame|
|------------------|
|┌───────┬───────┐ |
|│ a │ b │ |
|│ int32 │ int32 │ |
|├───────┼───────┤ |
|│ 3 │ 4 │ |
|│ 1 │ 1 │ |
|└───────┴───────┘ |
└──────────────────┘
unpivot(on: str | list[str] | None = None, *, index: str | list[str] | None = None, variable_name: str = 'variable', value_name: str = 'value') -> Self
Unpivot a LazyFrame from wide to long format.
Optionally leaves identifiers set.
This function is useful to massage a LazyFrame into a format where one or more columns are identifier variables (index) while all other columns, considered measured variables (on), are "unpivoted" to the row axis leaving just two non-identifier columns, 'variable' and 'value'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
on
|
str | list[str] | None
|
Column(s) to use as values variables; if |
None
|
index
|
str | list[str] | None
|
Column(s) to use as identifier variables. |
None
|
variable_name
|
str
|
Name to give to the |
'variable'
|
value_name
|
str
|
Name to give to the |
'value'
|
Returns:
Type | Description |
---|---|
Self
|
The unpivoted LazyFrame. |
Notes
If you're coming from pandas, this is similar to pandas.DataFrame.melt, but with index replacing id_vars and on replacing value_vars. In other frameworks, you might know this operation as pivot_longer.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> df_native = duckdb.sql(
... "SELECT * FROM VALUES ('x', 1, 2), ('y', 3, 4), ('z', 5, 6) df(a, b, c)"
... )
>>> df = nw.from_native(df_native)
>>> df.unpivot(on=["b", "c"], index="a").sort("a", "variable").to_native()
┌─────────┬──────────┬───────┐
│ a │ variable │ value │
│ varchar │ varchar │ int32 │
├─────────┼──────────┼───────┤
│ x │ b │ 1 │
│ x │ c │ 2 │
│ y │ b │ 3 │
│ y │ c │ 4 │
│ z │ b │ 5 │
│ z │ c │ 6 │
└─────────┴──────────┴───────┘
with_columns(*exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr) -> Self
Add columns to this LazyFrame.
Added columns will replace existing columns with the same name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*exprs
|
IntoExpr | Iterable[IntoExpr]
|
Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals. |
()
|
**named_exprs
|
IntoExpr
|
Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used. |
{}
|
Returns:
Type | Description |
---|---|
Self
|
A new LazyFrame with the columns added. |
Note
Creating a new LazyFrame using this method does not create a new copy of existing data.
Examples:
>>> import duckdb
>>> import narwhals as nw
>>> lf_native = duckdb.sql("SELECT * FROM VALUES (1, 4.5), (3, 2.) df(a, b)")
>>> nw.from_native(lf_native).with_columns(c=nw.col("a") + 1)
┌────────────────────────────────┐
| Narwhals LazyFrame |
|--------------------------------|
|┌───────┬──────────────┬───────┐|
|│ a │ b │ c │|
|│ int32 │ decimal(2,1) │ int32 │|
|├───────┼──────────────┼───────┤|
|│ 1 │ 4.5 │ 2 │|
|│ 3 │ 2.0 │ 4 │|
|└───────┴──────────────┴───────┘|
└────────────────────────────────┘
with_row_index(name: str = 'index') -> Self
Insert column which enumerates rows.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name
|
str
|
The name of the column as a string. The default is "index". |
'index'
|
Returns:
Type | Description |
---|---|
Self
|
The original object with the column added. |
Examples:
>>> import dask.dataframe as dd
>>> import narwhals as nw
>>> lf_native = dd.from_dict({"a": [1, 2], "b": [4, 5]}, npartitions=1)
>>> nw.from_native(lf_native).with_row_index().collect()
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| index a b |
| 0 0 1 4 |
| 1 1 2 5 |
└──────────────────┘