Generating SQL
Suppose you want to write Polars syntax and translate it to SQL. For example, what's the SQL equivalent to:
import narwhals as nw
from narwhals.typing import FrameT
def avg_monthly_price(df: FrameT) -> FrameT:
return (
df.group_by(nw.col("date").dt.truncate("1mo"))
.agg(nw.col("price").mean())
.sort("date")
)
?
Narwhals provides you with a narwhals.sql module to do just that!
Info
narwhals.sql currently requires DuckDB to be installed.
narwhals.sql
You can generate SQL directly from DuckDB.
import narwhals as nw
from narwhals.sql import table
prices = table("prices", {"date": nw.Date, "price": nw.Float64})
result = (
prices.group_by(nw.col("date").dt.truncate("1mo"))
.agg(nw.col("price").mean())
.sort("date")
)
print(result.to_sql())
SELECT * FROM (SELECT _date_tmp AS date, price FROM (SELECT _date_tmp, mean(price) AS price FROM (SELECT date, price, date_trunc('month', date) AS _date_tmp FROM main.prices) AS prices GROUP BY _date_tmp) AS prices) AS prices ORDER BY date NULLS FIRST
To make it look a bit prettier, you can pass pretty=True, but
note that this currently requires sqlparse to be installed.
print(result.to_sql(pretty=True))
SELECT *
FROM
(SELECT _date_tmp AS date,
price
FROM
(SELECT _date_tmp,
mean(price) AS price
FROM
(SELECT date, price,
date_trunc('month', date) AS _date_tmp
FROM main.prices) AS prices
GROUP BY _date_tmp) AS prices) AS prices
ORDER BY date NULLS FIRST
Note that the generated SQL follows DuckDB's dialect. To translate it to other dialects, you may want to look into sqlglot, or use one of the solutions below (which also use sqlglot).
Via Ibis
You can also use Ibis or SQLFrame to generate SQL:
import ibis
df = nw.from_native(ibis.table({"date": "date", "price": "double"}, name="prices"))
print(ibis.to_sql(avg_monthly_price(df).to_native()))
Via SQLFrame
You can also use SQLFrame:
from sqlframe.standalone import StandaloneSession
session = StandaloneSession.builder.getOrCreate()
session.catalog.add_table("prices", column_mapping={"date": "date", "price": "float"})
df = nw.from_native(session.read.table("prices"))
print(avg_monthly_price(df).to_native().sql(dialect="duckdb"))
WITH "t20239543" AS (
SELECT
CAST(DATE_TRUNC('MONTH', CAST("prices"."date" AS TIMESTAMPTZ)) AS TIMESTAMPTZ) AS "_date_tmp",
AVG("prices"."price") AS "price"
FROM "prices" AS "prices"
GROUP BY
CAST(DATE_TRUNC('MONTH', CAST("prices"."date" AS TIMESTAMPTZ)) AS TIMESTAMPTZ)
)
SELECT
"t20239543"."_date_tmp" AS "date",
"t20239543"."price" AS "price"
FROM "t20239543" AS "t20239543"
ORDER BY
"date" NULLS FIRST