Prompt library

SQL prompts for everyday work

100 ready-to-use prompts for the tasks SQL developers and data analysts hit daily — joins, window functions, dedupe, cohorts, performance, JSON and more. Copy one, swap in your own {{table}} and column names, and paste it into the AI assistant or your editor.

100 prompts

Count rows per group

Aggregation & GROUP BY

Get the number of rows in each category.

Write a query that returns each distinct {{group_column}} from {{table}} along with the count of rows in that group. Order by the count descending, and include a final total row using ROLLUP or a separate UNION ALL if you want a grand total.

#aggregation#count#group-by

Sum, average, min, max per group

Aggregation & GROUP BY

Compute several summary metrics in one grouped query.

Write a query that, for each {{group_column}} in {{table}}, returns COUNT(*), SUM({{amount}}), AVG({{amount}}), MIN({{amount}}) and MAX({{amount}}). Round the average to 2 decimals and order by SUM({{amount}}) descending.

#aggregation#sum#average

Filter groups with HAVING

Aggregation & GROUP BY

Keep only groups whose aggregate meets a threshold.

Write a query that groups {{table}} by {{group_column}} and returns only the groups where COUNT(*) is greater than {{n}} (or SUM({{amount}}) exceeds {{threshold}}). Use HAVING for the post-aggregation filter and WHERE for any row-level filter, and explain the difference in a comment.

#aggregation#having#group-by

Count distinct values per group

Aggregation & GROUP BY

Measure how many unique values each group contains.

Write a query that returns, per {{group_column}} in {{table}}, the number of distinct {{column}} values using COUNT(DISTINCT {{column}}). If the table is very large, add a comment suggesting an approximate count via a HyperLogLog extension or a sampled estimate.

#aggregation#distinct#count

Conditional aggregates with FILTER

Aggregation & GROUP BY

Compute multiple conditional totals in a single pass.

Write a query that, per {{group_column}} in {{table}}, returns COUNT(*) FILTER (WHERE {{status_column}} = 'active') as active_count and SUM({{amount}}) FILTER (WHERE {{date_column}} >= {{start_date}}) as recent_total. Use the FILTER clause rather than CASE expressions for clarity.

#aggregation#filter#conditional

Subtotals and grand totals with ROLLUP

Aggregation & GROUP BY

Produce subtotal and total rows in one grouped result.

Write a query that aggregates SUM({{amount}}) from {{table}} grouped by ROLLUP ({{group_column}}, {{sub_group_column}}) so you get per-subgroup rows, per-group subtotals, and a grand total. Use GROUPING() to label which rows are subtotals.

#aggregation#rollup#subtotals

Concatenate group members into a list

Aggregation & GROUP BY

Roll up the values of each group into one delimited string.

Write a query that, per {{group_column}} in {{table}}, returns a comma-separated list of {{column}} values using STRING_AGG({{column}}, ', ' ORDER BY {{column}}). Optionally use STRING_AGG(DISTINCT ...) to remove duplicates within each group.

#aggregation#string-agg#group-by

Cohort retention analysis

Analytics & Reporting

Measure how many users from each signup cohort stay active over time.

Write a PostgreSQL query that builds a cohort retention table from {{table}}: define each {{user_id}}'s cohort as the month of their first event (DATE_TRUNC('month', MIN({{date_column}}))), then count distinct users active in each subsequent month offset (months since cohort). Output cohort_month, month_number, and retained_users so it can be charted as a triangle.

#analytics#cohort#retention

Funnel conversion rates

Analytics & Reporting

Track how users progress through ordered steps.

Write a query that computes a conversion funnel from {{events_table}} for steps in order: '{{step_1}}', '{{step_2}}', '{{step_3}}'. For each {{user_id}}, determine the furthest step reached, then output per-step counts and the step-to-step conversion percentage. Count a user at a step only if they completed all prior steps.

#analytics#funnel#conversion

Week-over-week growth

Analytics & Reporting

Compare each week's metric to the prior week.

Write a query that aggregates {{amount}} by week from {{table}} (DATE_TRUNC('week', {{date_column}})), then uses LAG to compute the prior week and the week-over-week percent change. Round to 1 decimal and order chronologically.

#analytics#growth#wow

Sessionize event streams

Analytics & Reporting

Group consecutive events into sessions by an idle gap.

Write a query that sessionizes {{events_table}}: order events per {{user_id}} by {{date_column}}, start a new session whenever the gap from the previous event exceeds {{gap_minutes}} minutes (compare via LAG), then assign a session id with a cumulative SUM of the new-session flag. Return user, session id, and session start/end.

#analytics#sessionization#window

Gaps and islands

Analytics & Reporting

Collapse consecutive runs into ranges, or find missing gaps.

Write a query that solves the gaps-and-islands problem on {{table}}: group consecutive {{date_column}} values (or sequential {{id}}s) into islands using the difference between ROW_NUMBER() and the value, then return each island's start and end. Add a variant that returns the gaps between islands.

#analytics#gaps-islands#window

DAU, WAU, MAU and stickiness

Analytics & Reporting

Compute active-user metrics and the DAU/MAU ratio.

Write a query over {{events_table}} that computes daily active users (distinct {{user_id}} per day), 7-day rolling weekly active users, and 28-day rolling monthly active users, plus the stickiness ratio DAU/MAU. Use COUNT(DISTINCT ...) over date windows and present one row per day.

#analytics#active-users#stickiness

RFM customer segmentation

Analytics & Reporting

Score customers by recency, frequency, and monetary value.

Write a query that computes RFM scores per {{user_id}} from {{table}}: recency = days since MAX({{date_column}}), frequency = COUNT(*), monetary = SUM({{amount}}). Bucket each into quintiles with NTILE(5), then concatenate into an RFM segment code. Order by monetary descending.

#analytics#rfm#segmentation

UPSERT with ON CONFLICT

DML & Upserts

Insert a row or update it if the key already exists.

Write an INSERT INTO {{table}} ... ON CONFLICT ({{unique_column}}) DO UPDATE SET statement that inserts a row and, if {{unique_column}} already exists, updates {{column_1}} and {{column_2}} to the new values using EXCLUDED.{{column_1}}. Also set updated_at = now(), and show the DO NOTHING variant for insert-or-ignore.

#dml#upsert#on-conflict

Update from another table

DML & Upserts

Bulk-update rows using values looked up in a second table.

Write an UPDATE {{table}} SET {{column}} = src.{{source_column}} FROM {{other_table}} src WHERE {{table}}.{{id}} = src.{{foreign_key}} statement that copies values from {{other_table}} into {{table}}. Add a matching WHERE to limit scope, and tell me to run a SELECT join first to preview affected rows.

#dml#update#join

Insert rows from a SELECT

DML & Upserts

Populate a table from the result of a query.

Write an INSERT INTO {{table}} ({{column_1}}, {{column_2}}) SELECT ... FROM {{other_table}} WHERE {{condition}} statement that copies/transforms rows from {{other_table}} into {{table}}. Make the column lists line up explicitly, and add ON CONFLICT DO NOTHING if duplicates are possible.

#dml#insert-select#bulk

Delete rows safely with a filter

DML & Upserts

Remove a targeted set of rows, with safeguards.

Write a DELETE FROM {{table}} WHERE {{condition}} statement (e.g. {{date_column}} < NOW() - INTERVAL '{{n}} days'). Remind me to run the equivalent SELECT COUNT(*) first to confirm the row count, wrap it in a transaction, and consider batching large deletes to avoid bloat and long locks.

#dml#delete#safety

Return affected rows with RETURNING

DML & Upserts

Get back the inserted/updated/deleted rows in one round trip.

Write an INSERT/UPDATE/DELETE on {{table}} that uses RETURNING to return the affected rows (e.g. RETURNING {{id}}, {{column}}). Explain how RETURNING avoids a follow-up SELECT and is ideal for capturing generated identity keys after an insert.

#dml#returning#insert

Update large tables in batches

DML & Upserts

Apply a bulk update without one giant long-locking transaction.

Write a batched UPDATE strategy for {{table}}: update rows in chunks of {{batch_size}} using a loop or repeated UPDATE ... WHERE {{id}} IN (SELECT {{id}} FROM {{table}} WHERE {{condition}} AND not_yet_updated LIMIT {{batch_size}}). Explain why batching reduces lock contention and bloat versus one massive UPDATE.

#dml#batch#update

MERGE for conditional insert/update/delete

DML & Upserts

Synchronize a target table from a source in one statement.

Write a MERGE INTO {{table}} t USING {{other_table}} s ON t.{{id}} = s.{{foreign_key}} statement (PostgreSQL 15+) with WHEN MATCHED THEN UPDATE SET, WHEN NOT MATCHED THEN INSERT, and optionally WHEN NOT MATCHED BY SOURCE THEN DELETE. Compare it to ON CONFLICT and note when MERGE is the clearer choice.

#dml#merge#sync

Bucket timestamps by day/week/month

Date & Time

Group time-series rows into calendar periods.

Write a query that aggregates {{table}} by month using DATE_TRUNC('month', {{date_column}}) as period, returning COUNT(*) and SUM({{amount}}) per period ordered chronologically. Mention how to switch the bucket to 'day' or 'week', and to 'week' starting Monday.

#date#date-trunc#bucket

Filter to the last N days

Date & Time

Keep only recent rows relative to now.

Write a query that returns rows from {{table}} where {{date_column}} >= NOW() - INTERVAL '{{n}} days'. Note that this is sargable (index-friendly) because the function is on NOW(), not on {{date_column}}, and avoid wrapping {{date_column}} in a function.

#date#interval#recent

Month-over-month growth

Date & Time

Compare each month's metric to the previous month.

Write a query that aggregates {{amount}} by month from {{table}} (DATE_TRUNC('month', {{date_column}})), then uses LAG over the monthly series to compute the prior month's value and the month-over-month percent change. Round the percent to 1 decimal and handle the first month's NULL.

#date#growth#mom

Fill missing dates with generate_series

Date & Time

Produce a continuous date axis even where data is sparse.

Write a query that builds a complete daily date axis from {{start_date}} to {{end_date}} using generate_series, then LEFT JOINs the aggregated daily counts from {{table}} so days with no rows show 0 instead of being missing. COALESCE the counts to 0.

#date#generate-series#gaps

Time elapsed between two timestamps

Date & Time

Measure duration between two events on the same or related rows.

Write a query that returns the elapsed time between {{start_column}} and {{end_column}} in {{table}}, both as an INTERVAL and as total minutes/hours using EXTRACT(EPOCH FROM ({{end_column}} - {{start_column}})). Round the numeric result and order by duration descending.

#date#duration#interval

Convert and report in a timezone

Date & Time

Display UTC timestamps in a user's local timezone.

Write a query that converts {{date_column}} (stored as timestamptz/UTC) to '{{timezone}}' local time using {{date_column}} AT TIME ZONE '{{timezone}}', then groups by the local calendar day. Add a comment about the difference between timestamp and timestamptz so the conversion is correct.

#date#timezone#timestamptz

Find duplicate rows

Deduplication

Identify which key combinations appear more than once.

Write a query that finds duplicate values in {{table}} by {{column_1}} (and optionally {{column_2}}): GROUP BY those columns HAVING COUNT(*) > 1, returning the duplicated key plus the count. Order by the count descending so the worst offenders surface first.

#dedup#duplicates#group-by

Keep only the latest row per key

Deduplication

Reduce duplicates to a single most-recent record each.

Write a query that returns one row per {{column}} from {{table}}: the most recent by {{date_column}}. Use ROW_NUMBER() OVER (PARTITION BY {{column}} ORDER BY {{date_column}} DESC) in a CTE and keep rows where the row number = 1. Tie-break with {{id}} DESC for determinism.

#dedup#latest#row-number

DISTINCT ON for first row per group

Deduplication

Use PostgreSQL's DISTINCT ON to pick one row per key efficiently.

Write a PostgreSQL query using SELECT DISTINCT ON ({{column}}) * FROM {{table}} ORDER BY {{column}}, {{date_column}} DESC to keep the latest row per {{column}}. Note that the ORDER BY must lead with the DISTINCT ON columns, and this is often the most concise dedupe in Postgres.

#dedup#distinct-on#postgres

Delete duplicate rows safely

Deduplication

Remove duplicate records while keeping one canonical row.

Write a DELETE statement that removes duplicate rows from {{table}}, keeping the row with the lowest ctid (or smallest {{id}}) per ({{column_1}}, {{column_2}}). Use a CTE with ROW_NUMBER() to mark which to delete. Tell me to run the equivalent SELECT first to preview before deleting, and to back up.

#dedup#delete#cleanup

Count unique vs total rows

Deduplication

Quantify how much duplication exists in a column.

Write a query that reports, for {{table}}, the total row count, the COUNT(DISTINCT {{column}}) of unique values, and the difference (number of duplicate rows). This gives a quick measure of how much deduplication is needed.

#dedup#count#distinct

Find near-duplicate (fuzzy) records

Deduplication

Detect records that are almost-but-not-exactly the same.

Write a query that finds near-duplicate rows in {{table}} where {{column}} values are highly similar but not identical, using a self-join on similarity({{a}}.{{column}}, {{b}}.{{column}}) > {{threshold}} with a.{{id}} < b.{{id}} to avoid self and mirror pairs. Requires the pg_trgm extension.

#dedup#fuzzy#pg_trgm

Filter on multiple conditions

Filtering & Search

Return rows that satisfy several combined AND/OR conditions.

Write a PostgreSQL query that returns all rows from {{table}} where {{column}} equals {{value}} AND {{date_column}} is within the last {{n}} days, OR {{status_column}} is in ('active','trial'). Group the OR/AND logic with parentheses so the precedence is explicit, and order the results by {{date_column}} descending.

#filter#where#boolean

Filter by a list of values

Filtering & Search

Keep only rows whose column matches any value in a fixed list.

Write a query against {{table}} that returns rows where {{column}} is in this list: ({{value_1}}, {{value_2}}, {{value_3}}). Also return a count of how many rows matched each value, grouped by {{column}} and ordered by the count descending.

#filter#in#list

Filter a numeric or date range

Filtering & Search

Select rows whose value falls inside an inclusive range.

Write a query that selects rows from {{table}} where {{amount}} is BETWEEN {{low}} AND {{high}} inclusive. Note that BETWEEN is inclusive on both ends; if you need a half-open range use >= {{low}} AND < {{high}} instead, which is safer for timestamps. Return the rows ordered by {{amount}}.

#filter#range#between

Exclude a set of values safely

Filtering & Search

Remove rows matching a list, while staying NULL-safe.

Write a query that returns rows from {{table}} where {{column}} is NOT IN ({{value_1}}, {{value_2}}). Warn me in a comment that NOT IN returns zero rows if any value in the list (or the column) is NULL; rewrite it using NOT EXISTS or an explicit OR {{column}} IS NULL guard if NULLs are possible.

#filter#exclude#null

Case-insensitive text search

Filtering & Search

Find rows where a text column contains a term regardless of case.

Write a PostgreSQL query that returns rows from {{table}} where {{column}} contains the term '{{term}}' case-insensitively. Use ILIKE '%{{term}}%'. Mention that a leading wildcard prevents normal B-tree index use and suggest a trigram (pg_trgm) GIN index if this search is frequent.

#search#ilike#text

Search across multiple columns

Filtering & Search

Match a keyword against several text columns at once.

Write a query that returns rows from {{table}} where the search term '{{term}}' appears (case-insensitively) in any of these columns: {{column_1}}, {{column_2}}, {{column_3}}. Combine the conditions with OR using ILIKE, and return a column indicating which field matched.

#search#multi-column#ilike

Extract a field from JSONB

JSON & Arrays

Pull a scalar value out of a JSONB column.

Write a query that extracts the value at key '{{json_key}}' from the JSONB column {{json_column}} in {{table}}. Use {{json_column}} ->> '{{json_key}}' to get it as text (or -> for JSON), and cast to the right type, e.g. ({{json_column}} ->> '{{json_key}}')::int. Show nested access with the #>> '{path}' operator.

#json#jsonb#extract

Filter rows by a JSONB value

JSON & Arrays

Select rows where a key inside a JSON document matches.

Write a query that returns rows from {{table}} where {{json_column}} ->> '{{json_key}}' = '{{value}}', or where the document contains a key/value using the @> containment operator: {{json_column}} @> '{"{{json_key}}": "{{value}}"}'. Note that @> can use a GIN index for speed.

#json#jsonb#filter

Expand a JSON array to rows

JSON & Arrays

Flatten a JSON array stored in a column into individual rows.

Write a query that expands the JSON array stored in {{json_column}} of {{table}} into one row per element using jsonb_array_elements({{json_column}}) in a LATERAL/cross join, keeping {{id}}. If the elements are objects, extract sub-fields from each element with ->>.

#json#jsonb#array

Build JSON from columns

JSON & Arrays

Assemble a JSON object/array from relational columns.

Write a query that builds a JSON object per row of {{table}} from {{column_1}}, {{column_2}} using jsonb_build_object('{{column_1}}', {{column_1}}, '{{column_2}}', {{column_2}}). Then show jsonb_agg() to roll multiple rows per {{group_column}} into a JSON array for an API response.

#json#jsonb#build

Update a key inside JSONB

JSON & Arrays

Set or change a nested value without rewriting the whole document.

Write an UPDATE on {{table}} that sets the key '{{json_key}}' inside {{json_column}} to '{{value}}' using jsonb_set({{json_column}}, '{{{json_key}}}', '"{{value}}"'::jsonb, true). Explain the path array syntax and how to remove a key with the - operator.

#json#jsonb#update

Array membership and overlap

JSON & Arrays

Filter rows by whether an array contains or overlaps values.

Write a query that returns rows from {{table}} where the array column {{array_column}} contains {{value}} ({{value}} = ANY({{array_column}})), and a variant where it overlaps a set using {{array_column}} && ARRAY[{{value_1}}, {{value_2}}]. Mention a GIN index on the array column for performance.

#array#membership#overlap

Inner join two tables

Joins

Combine matching rows from two related tables.

Write a PostgreSQL query that inner-joins {{table}} to {{other_table}} on {{table}}.{{id}} = {{other_table}}.{{foreign_key}}. Select {{table}}.{{column}} and {{other_table}}.{{column_2}}, alias each table with a short prefix, and qualify every selected column with its alias.

#join#inner

Left join keeping all left rows

Joins

Return every row from the left table, with matches where they exist.

Write a query that LEFT JOINs {{table}} to {{other_table}} on {{table}}.{{id}} = {{other_table}}.{{foreign_key}}, returning all rows from {{table}} even when there is no match. Use COALESCE on the {{other_table}} columns to supply default values for the unmatched rows.

#join#left#coalesce

Anti-join: find rows with no match

Joins

List rows in one table that have no counterpart in another.

Write a query that returns rows from {{table}} that have NO matching row in {{other_table}} on {{table}}.{{id}} = {{other_table}}.{{foreign_key}}. Implement it as a LEFT JOIN ... WHERE {{other_table}}.{{foreign_key}} IS NULL, and also show the equivalent NOT EXISTS version, noting which one the planner usually prefers.

#join#anti-join#missing

Join three or more tables

Joins

Chain joins across several related tables in one query.

Write a query that joins {{table}} to {{other_table}} on {{key_1}}, then to {{third_table}} on {{key_2}}. Choose INNER vs LEFT join per relationship based on whether the related row is mandatory, alias each table, and select a meaningful column from each. Add a one-line comment explaining each join's grain.

#join#multi-table

Self-join for hierarchy or pairs

Joins

Join a table to itself to relate rows like manager/employee.

Write a self-join on {{table}} aliased as child and parent, joining child.{{parent_id}} = parent.{{id}} so each row shows its record alongside its parent record (e.g. employee and manager). Return child.{{name}} and parent.{{name}}, and use a LEFT join so top-level rows with no parent are kept.

#join#self-join#hierarchy

Join only the latest related row

Joins

Attach the single most recent child row to each parent.

Write a query that joins {{table}} to its most recent related row in {{other_table}} (latest by {{date_column}}). Use a LATERAL join: LEFT JOIN LATERAL (SELECT ... FROM {{other_table}} o WHERE o.{{foreign_key}} = {{table}}.{{id}} ORDER BY o.{{date_column}} DESC LIMIT 1) latest ON true. Return the parent columns plus the latest child's {{column}}.

#join#lateral#latest

NULL count per column

NULL & Data Quality

Profile how complete each column is across the table.

Write a query that returns, for {{table}}, the NULL count and NULL percentage of each important column ({{column_1}}, {{column_2}}, {{column_3}}). Use COUNT(*) - COUNT(col) for nulls and divide by COUNT(*); present one row per column if possible via a UNION ALL or a single row with named expressions.

#null#audit#completeness

Replace NULLs with a default

NULL & Data Quality

Substitute a fallback value wherever a column is NULL.

Write a query that returns {{column}} from {{table}} with NULLs replaced by {{default_value}} using COALESCE({{column}}, {{default_value}}). If several columns should fall back in priority order, show COALESCE({{column_1}}, {{column_2}}, {{default_value}}).

#null#coalesce#default

Find rows missing required fields

NULL & Data Quality

Surface records that violate required-field expectations.

Write a query that returns rows from {{table}} that are missing required data: {{column_1}} IS NULL OR {{column_2}} IS NULL OR TRIM({{column_3}}) = ''. Return the {{id}} and a text column listing which fields are missing so the rows can be fixed.

#null#data-quality#required

Validate value ranges and domains

NULL & Data Quality

Catch out-of-range or invalid categorical values.

Write a data-quality query over {{table}} that flags rows where {{amount}} is negative or implausibly large (> {{max_value}}), or where {{status_column}} is not in the allowed set ('active','inactive','pending'). Return the {{id}} and the offending value for each violation.

#data-quality#validation#range

Detect orphaned foreign keys

NULL & Data Quality

Find child rows pointing to a parent that no longer exists.

Write a query that finds rows in {{table}} whose {{foreign_key}} has no matching {{id}} in {{other_table}} (referential integrity gap). Use NOT EXISTS, exclude rows where {{foreign_key}} IS NULL, and return the orphaned {{id}} and {{foreign_key}} values.

#data-quality#orphan#foreign-key

NULL-safe equality comparison

NULL & Data Quality

Compare two columns treating NULL = NULL as true.

Write a query that compares {{column_1}} and {{column_2}} in {{table}} treating two NULLs as equal, using {{column_1}} IS NOT DISTINCT FROM {{column_2}}. Explain why a plain = returns NULL (not true) when both sides are NULL, which silently drops rows.

#null#comparison#is-distinct

Read an EXPLAIN ANALYZE plan

Performance & EXPLAIN

Interpret a query plan to find the bottleneck.

Here is the output of EXPLAIN (ANALYZE, BUFFERS) for my query on {{table}}: {{plan}}. Explain it top-down: identify the most expensive node, where estimated vs actual rows diverge (bad statistics), any sequential scans on large tables, and whether sorts or hashes spill to disk. Then recommend concrete fixes.

#performance#explain#tuning

Suggest an index for a query

Performance & EXPLAIN

Recommend the right index to speed up a slow query.

My query filters {{table}} on {{column_1}} = {{value}} and orders by {{date_column}}. Suggest the best B-tree index (column order matters: equality columns first, then the range/sort column), give the CREATE INDEX statement, and explain whether a covering index with INCLUDE would enable an index-only scan.

#performance#index#btree

Find tables doing sequential scans

Performance & EXPLAIN

Use catalog stats to spot tables that may need indexes.

Write a query against pg_stat_user_tables that lists tables with a high ratio of seq_scan to idx_scan and large seq_tup_read, ordered by seq_scan descending. These are candidates for new indexes. Add a note that this reflects activity since the last stats reset.

#performance#index#pg_stat

Make a WHERE clause sargable

Performance & EXPLAIN

Rewrite filters so an index can actually be used.

My filter wraps the column in a function: WHERE DATE({{date_column}}) = '{{day}}' (or LOWER({{column}}) = ...). Rewrite it to be sargable so an index applies: use a half-open range {{date_column}} >= '{{day}}' AND {{date_column}} < '{{day}}'::date + 1, or add an expression index for the LOWER case.

#performance#sargable#index

Find the slowest queries

Performance & EXPLAIN

Use pg_stat_statements to rank queries by total time.

Write a query against pg_stat_statements that returns the top {{n}} queries by total_exec_time, showing calls, mean_exec_time, total_exec_time, and rows. Order by total_exec_time descending so the highest-impact queries surface. Mention the pg_stat_statements extension must be enabled.

#performance#pg_stat_statements#slow

Inspect table and index sizes

Performance & EXPLAIN

Report disk usage per relation to find bloat or large tables.

Write a query that lists the largest tables in the current database with their total size, table size, and index size using pg_total_relation_size and pg_size_pretty, ordered by total size descending. Add a comment about checking dead tuples (n_dead_tup) and running VACUUM/ANALYZE for bloat.

#performance#size#bloat

Pivot rows into columns

Pivoting & Reshaping

Turn categorical row values into one column each.

Write a query that pivots {{table}} so each value of {{category_column}} becomes its own column of SUM({{amount}}), grouped by {{group_column}}. Use conditional aggregation: SUM({{amount}}) FILTER (WHERE {{category_column}} = '{{value_1}}') AS {{value_1}}, repeated per category.

#pivot#reshape#filter

Crosstab with tablefunc

Pivoting & Reshaping

Pivot using PostgreSQL's crosstab function for dynamic-ish layouts.

Write a query using the crosstab() function from the tablefunc extension to pivot {{table}}: the source query selects {{group_column}}, {{category_column}}, SUM({{amount}}) ordered by 1,2, and the column definition list names the categories. Include the CREATE EXTENSION IF NOT EXISTS tablefunc note.

#pivot#crosstab#tablefunc

Unpivot columns into rows

Pivoting & Reshaping

Convert wide columns into a tall key/value format.

Write a query that unpivots {{table}} from wide to long: turn columns {{column_1}}, {{column_2}}, {{column_3}} into rows of (metric_name, metric_value) keyed by {{id}}. Use a LATERAL VALUES ((' {{column_1}} ', {{column_1}}), ...) join to do it in one pass.

#unpivot#reshape#lateral

Monthly columns matrix

Pivoting & Reshaping

Lay out a metric with one column per month.

Write a query that produces a matrix from {{table}} with {{group_column}} as rows and one column per month (Jan..Dec) of the current year, each cell = SUM({{amount}}). Use SUM({{amount}}) FILTER (WHERE EXTRACT(MONTH FROM {{date_column}}) = 1) AS jan, and so on through December.

#pivot#monthly#matrix

Expand an array column to rows

Pivoting & Reshaping

Flatten an array column so each element is its own row.

Write a query that expands the array column {{array_column}} in {{table}} into one row per element using unnest, e.g. SELECT {{id}}, elem FROM {{table}}, unnest({{array_column}}) AS elem. Use WITH ORDINALITY if you also need each element's position.

#reshape#unnest#array

Top N rows overall

Ranking & Top-N

Return the highest-ranked records across the whole table.

Write a query that returns the top {{n}} rows from {{table}} by {{amount}} descending using ORDER BY {{amount}} DESC LIMIT {{n}}. If ties at the boundary should all be included, show the WITH TIES variant: FETCH FIRST {{n}} ROWS WITH TIES.

#top-n#limit#ranking

Top N rows per group

Ranking & Top-N

Get the leading records within each category.

Write a PostgreSQL query that returns the top {{n}} {{column}} by total {{amount}} for each {{group_column}} in {{table}}, using ROW_NUMBER() OVER (PARTITION BY {{group_column}} ORDER BY SUM({{amount}}) DESC) in a CTE and filtering to rn <= {{n}}. Add a note on using RANK() instead if ties should be kept.

#top-n#per-group#window

Bottom N (worst performers)

Ranking & Top-N

Surface the lowest-ranked records.

Write a query that returns the bottom {{n}} {{group_column}} values from {{table}} by SUM({{amount}}) ascending. Group, order ascending, LIMIT {{n}}, and exclude groups with NULL totals so empty groups do not masquerade as the worst.

#top-n#bottom#ranking

Rows above a percentile

Ranking & Top-N

Select records in the top X% of a metric.

Write a query that returns rows from {{table}} whose {{amount}} is in the top {{pct}} percent, using PERCENT_RANK() OVER (ORDER BY {{amount}} DESC) <= {{fraction}} (e.g. 0.10 for top 10%). Alternatively show a CUME_DIST or NTILE(100) approach.

#top-n#percentile#percent-rank

Nth highest value

Ranking & Top-N

Find a specific ranked position, like the 3rd-highest salary.

Write a query that returns the {{n}}-th highest distinct {{amount}} from {{table}}. Use DENSE_RANK() OVER (ORDER BY {{amount}} DESC) in a subquery and filter to the rank = {{n}}; explain why DENSE_RANK handles duplicate values correctly here.

#top-n#nth#dense-rank

Rank with explicit tie-breakers

Ranking & Top-N

Make ranking deterministic when values are equal.

Write a query that ranks rows in {{table}} by {{amount}} descending but breaks ties deterministically by {{date_column}} ascending then {{id}} ascending, using ROW_NUMBER() OVER (ORDER BY {{amount}} DESC, {{date_column}} ASC, {{id}} ASC). Explain why a stable tie-break matters for reproducible pagination.

#top-n#ties#tiebreak

Create a well-designed table

Schema & DDL

Define a new table with sensible types, keys, and constraints.

Write a CREATE TABLE statement for {{table}} with an identity primary key (id bigint GENERATED ALWAYS AS IDENTITY), appropriate column types for {{column_1}}, {{column_2}}, NOT NULL where required, a UNIQUE constraint on {{unique_column}}, sensible DEFAULTs, and created_at/updated_at timestamptz columns defaulting to now().

#ddl#create-table#schema

Add a column without long locks

Schema & DDL

Safely add a column to a large, busy table.

Write the ALTER TABLE statement to add column {{column}} of type {{type}} to {{table}}. Explain that in modern PostgreSQL adding a nullable column (or one with a constant DEFAULT) is fast and does not rewrite the table, but adding a volatile default does; if backfilling, do it in batches to avoid long locks.

#ddl#alter-table#locks

Add a foreign key constraint

Schema & DDL

Enforce referential integrity between two tables.

Write an ALTER TABLE statement adding a foreign key on {{table}}.{{foreign_key}} referencing {{other_table}}({{id}}), with ON DELETE {{on_delete}} (e.g. CASCADE or SET NULL). Recommend creating an index on the {{foreign_key}} column since Postgres does not index FKs automatically, and adding the constraint as NOT VALID then VALIDATE to avoid a long lock on big tables.

#ddl#foreign-key#constraint

Create an index without blocking writes

Schema & DDL

Build an index on a live table without locking out writers.

Write a CREATE INDEX CONCURRENTLY statement on {{table}}({{column}}). Explain that CONCURRENTLY avoids an exclusive lock so writes continue, but it cannot run inside a transaction block and may leave an INVALID index if it fails (which must be dropped and rebuilt). Suggest a partial index if only a subset of rows is queried.

#ddl#index#concurrent

Add a CHECK constraint

Schema & DDL

Enforce a business rule at the database level.

Write an ALTER TABLE statement adding a CHECK constraint to {{table}} enforcing that {{amount}} >= 0 (or {{status_column}} IN ('active','inactive','pending')). On a large table, add it as NOT VALID first so existing rows are not scanned under lock, then run VALIDATE CONSTRAINT separately.

#ddl#check#constraint

Inspect a table's columns and types

Schema & DDL

Read a table's structure from the information schema.

Write a query against information_schema.columns that lists every column of {{table}} in {{schema}} with its data type, is_nullable, column_default, and ordinal_position, ordered by position. This documents the schema without needing psql's \d.

#ddl#information-schema#inspect

Partition a large table by range

Schema & DDL

Set up declarative range partitioning for a big time-series table.

Write the DDL to create {{table}} as a range-partitioned table on {{date_column}} (PARTITION BY RANGE), plus monthly partition definitions for the current and next month. Explain how partition pruning speeds up date-filtered queries and how to attach/detach partitions for retention.

#ddl#partition#range

Break a query into readable CTEs

Subqueries & CTEs

Refactor one complex query into named, sequential steps.

Refactor my query on {{table}} into a chain of WITH (CTE) steps: a first CTE that filters/cleans, a second that aggregates by {{group_column}}, and a final SELECT that joins or formats the result. Name each CTE for what it produces and keep each step doing one thing.

#cte#refactor#readability

Recursive CTE to walk a hierarchy

Subqueries & CTEs

Traverse parent/child trees to any depth.

Write a recursive CTE over {{table}} that walks the hierarchy from a root where {{parent_id}} IS NULL down through children joined on {{parent_id}} = {{id}}. Return each node with its depth level and a materialized path string. Add a depth guard to prevent infinite loops on cyclic data.

#cte#recursive#hierarchy

Correlated subquery per row

Subqueries & CTEs

Compute a value that depends on the current outer row.

Write a query that, for each row in {{table}}, returns a correlated subquery value such as the count of related rows in {{other_table}} where {{other_table}}.{{foreign_key}} = {{table}}.{{id}}. Then show the same result rewritten as a LEFT JOIN with GROUP BY and note when each form is faster.

#subquery#correlated

Compare each row to an aggregate

Subqueries & CTEs

Filter rows against a single computed value like the overall average.

Write a query that returns rows from {{table}} where {{amount}} is greater than the overall average {{amount}}, using a scalar subquery (SELECT AVG({{amount}}) FROM {{table}}). Also show a window-function alternative that avoids scanning the table twice.

#subquery#scalar#average

Control CTE materialization

Subqueries & CTEs

Force or prevent a CTE from being materialized for performance.

My CTE on {{table}} is being re-evaluated multiple times (or conversely, materialized when it should be inlined). Show me how to use WITH name AS MATERIALIZED (...) and AS NOT MATERIALIZED (...) in PostgreSQL to control this, and explain when each helps performance.

#cte#materialized#performance

Derived table in the FROM clause

Subqueries & CTEs

Build an intermediate result inline and query against it.

Write a query that selects from a derived table: an inner subquery that aggregates {{table}} by {{group_column}} into totals, wrapped as ( ... ) sub, then filter and order the outer query on those totals. Make sure the derived table is aliased.

#subquery#derived-table#from

Trim and normalize whitespace

Text & Strings

Clean leading/trailing and collapsed internal spaces.

Write a query that returns a cleaned version of {{column}} from {{table}}: TRIM the ends, collapse internal runs of whitespace to a single space with REGEXP_REPLACE({{column}}, '\\s+', ' ', 'g'), and convert empty strings to NULL with NULLIF. Show before and after side by side.

#text#trim#clean

Split a string into parts

Text & Strings

Extract a segment from a delimited string column.

Write a query that splits {{column}} in {{table}} on the delimiter '{{delimiter}}' and returns the {{n}}-th part using SPLIT_PART({{column}}, '{{delimiter}}', {{n}}). If you instead need every part as separate rows, show the STRING_TO_ARRAY + unnest version.

#text#split#parse

Extract a pattern with regex

Text & Strings

Pull a substring matching a regular expression.

Write a query that extracts the first match of the pattern '{{pattern}}' from {{column}} in {{table}} using (REGEXP_MATCH({{column}}, '{{pattern}}'))[1]. For example extract an email, a numeric id, or a domain. Return rows where no match was found as NULL.

#text#regex#extract

Build a formatted display string

Text & Strings

Combine several columns into one readable label.

Write a query that builds a display label from {{table}} by combining {{column_1}} and {{column_2}} (e.g. "Last, First") using FORMAT('%s, %s', {{column_1}}, {{column_2}}) or CONCAT_WS. Use CONCAT_WS so NULL fields are skipped cleanly rather than producing 'null' text.

#text#concat#format

Match a prefix, suffix, or contains

Text & Strings

Filter rows by how a string starts, ends, or what it contains.

Write a query returning rows from {{table}} where {{column}} starts with '{{prefix}}' (LIKE '{{prefix}}%'). Show variants for ends-with ('%{{suffix}}') and contains ('%{{term}}%'), and note that only the prefix form can use a standard B-tree index.

#text#like#prefix

Fuzzy match with similarity

Text & Strings

Find approximate string matches tolerant of typos.

Write a query that finds rows in {{table}} whose {{column}} is similar to '{{term}}' using the pg_trgm similarity() function, ordered by similarity descending and filtered to similarity > {{threshold}} (e.g. 0.3). Mention the GIN/GiST trigram index that makes this fast.

#text#fuzzy#pg_trgm

Running total (cumulative sum)

Window Functions

Accumulate a value over an ordered sequence of rows.

Write a PostgreSQL query that computes a running total of {{amount}} from {{table}}, ordered by {{date_column}}, using SUM({{amount}}) OVER (ORDER BY {{date_column}} ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). If totals should reset per group, add PARTITION BY {{group_column}}.

#window#running-total#cumulative

Moving average over N rows

Window Functions

Smooth a metric with a trailing rolling average.

Write a query that computes a {{n}}-row trailing moving average of {{amount}} from {{table}} ordered by {{date_column}}, using AVG({{amount}}) OVER (ORDER BY {{date_column}} ROWS BETWEEN {{n_minus_1}} PRECEDING AND CURRENT ROW). Round to 2 decimals and keep the original value alongside the average.

#window#moving-average#rolling

RANK vs DENSE_RANK vs ROW_NUMBER

Window Functions

Assign ordered positions, choosing the right ranking function.

Write a query against {{table}} that adds three columns ranking rows by {{amount}} descending within each {{group_column}}: ROW_NUMBER(), RANK(), and DENSE_RANK(), all OVER (PARTITION BY {{group_column}} ORDER BY {{amount}} DESC). Add a comment explaining how each handles ties.

#window#rank#row-number

Compare to previous row with LAG

Window Functions

Compute the change between each row and the one before it.

Write a query that, for {{table}} ordered by {{date_column}}, returns each {{amount}} alongside the previous row's value via LAG({{amount}}) OVER (ORDER BY {{date_column}}), plus the absolute and percent change. Partition by {{group_column}} if the comparison should stay within each group.

#window#lag#delta

First and last value in a window

Window Functions

Carry the first or last value of a partition onto every row.

Write a query that, per {{group_column}} in {{table}}, returns each row plus the first and last {{amount}} ordered by {{date_column}}. Use FIRST_VALUE and LAST_VALUE, and set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING so LAST_VALUE sees the whole partition.

#window#first-value#last-value

Each row's share of the total

Window Functions

Express every row's value as a percentage of the group total.

Write a query that returns each {{column}} in {{table}} with {{amount}} and its percent of the total {{amount}}, computed as {{amount}} / SUM({{amount}}) OVER (PARTITION BY {{group_column}}) * 100. Round to 1 decimal and guard against divide-by-zero with NULLIF.

#window#percent#share

Split rows into N equal buckets

Window Functions

Divide an ordered set into quantile buckets (e.g. quartiles).

Write a query that assigns each row in {{table}} to one of {{n}} equal-sized buckets based on {{amount}} using NTILE({{n}}) OVER (ORDER BY {{amount}}). Use 4 for quartiles or 10 for deciles, and return the bucket number alongside each row.

#window#ntile#buckets