100 ready-to-use prompts for the tasks SQL developers and data analysts hit daily — joins, window functions, dedupe, cohorts, performance, JSON and more. Copy one, swap in your own {{table}} and column names, and paste it into the AI assistant or your editor.
100 prompts
Count rows per group
Aggregation & GROUP BY
Get the number of rows in each category.
Write a query that returns each distinct {{group_column}} from {{table}} along with the count of rows in that group. Order by the count descending, and include a final total row using ROLLUP or a separate UNION ALL if you want a grand total.
#aggregation#count#group-by
Sum, average, min, max per group
Aggregation & GROUP BY
Compute several summary metrics in one grouped query.
Write a query that, for each {{group_column}} in {{table}}, returns COUNT(*), SUM({{amount}}), AVG({{amount}}), MIN({{amount}}) and MAX({{amount}}). Round the average to 2 decimals and order by SUM({{amount}}) descending.
#aggregation#sum#average
Filter groups with HAVING
Aggregation & GROUP BY
Keep only groups whose aggregate meets a threshold.
Write a query that groups {{table}} by {{group_column}} and returns only the groups where COUNT(*) is greater than {{n}} (or SUM({{amount}}) exceeds {{threshold}}). Use HAVING for the post-aggregation filter and WHERE for any row-level filter, and explain the difference in a comment.
#aggregation#having#group-by
Count distinct values per group
Aggregation & GROUP BY
Measure how many unique values each group contains.
Write a query that returns, per {{group_column}} in {{table}}, the number of distinct {{column}} values using COUNT(DISTINCT {{column}}). If the table is very large, add a comment suggesting an approximate count via a HyperLogLog extension or a sampled estimate.
#aggregation#distinct#count
Conditional aggregates with FILTER
Aggregation & GROUP BY
Compute multiple conditional totals in a single pass.
Write a query that, per {{group_column}} in {{table}}, returns COUNT(*) FILTER (WHERE {{status_column}} = 'active') as active_count and SUM({{amount}}) FILTER (WHERE {{date_column}} >= {{start_date}}) as recent_total. Use the FILTER clause rather than CASE expressions for clarity.
#aggregation#filter#conditional
Subtotals and grand totals with ROLLUP
Aggregation & GROUP BY
Produce subtotal and total rows in one grouped result.
Write a query that aggregates SUM({{amount}}) from {{table}} grouped by ROLLUP ({{group_column}}, {{sub_group_column}}) so you get per-subgroup rows, per-group subtotals, and a grand total. Use GROUPING() to label which rows are subtotals.
#aggregation#rollup#subtotals
Concatenate group members into a list
Aggregation & GROUP BY
Roll up the values of each group into one delimited string.
Write a query that, per {{group_column}} in {{table}}, returns a comma-separated list of {{column}} values using STRING_AGG({{column}}, ', ' ORDER BY {{column}}). Optionally use STRING_AGG(DISTINCT ...) to remove duplicates within each group.
#aggregation#string-agg#group-by
Cohort retention analysis
Analytics & Reporting
Measure how many users from each signup cohort stay active over time.
Write a PostgreSQL query that builds a cohort retention table from {{table}}: define each {{user_id}}'s cohort as the month of their first event (DATE_TRUNC('month', MIN({{date_column}}))), then count distinct users active in each subsequent month offset (months since cohort). Output cohort_month, month_number, and retained_users so it can be charted as a triangle.
#analytics#cohort#retention
Funnel conversion rates
Analytics & Reporting
Track how users progress through ordered steps.
Write a query that computes a conversion funnel from {{events_table}} for steps in order: '{{step_1}}', '{{step_2}}', '{{step_3}}'. For each {{user_id}}, determine the furthest step reached, then output per-step counts and the step-to-step conversion percentage. Count a user at a step only if they completed all prior steps.
#analytics#funnel#conversion
Week-over-week growth
Analytics & Reporting
Compare each week's metric to the prior week.
Write a query that aggregates {{amount}} by week from {{table}} (DATE_TRUNC('week', {{date_column}})), then uses LAG to compute the prior week and the week-over-week percent change. Round to 1 decimal and order chronologically.
#analytics#growth#wow
Sessionize event streams
Analytics & Reporting
Group consecutive events into sessions by an idle gap.
Write a query that sessionizes {{events_table}}: order events per {{user_id}} by {{date_column}}, start a new session whenever the gap from the previous event exceeds {{gap_minutes}} minutes (compare via LAG), then assign a session id with a cumulative SUM of the new-session flag. Return user, session id, and session start/end.
#analytics#sessionization#window
Gaps and islands
Analytics & Reporting
Collapse consecutive runs into ranges, or find missing gaps.
Write a query that solves the gaps-and-islands problem on {{table}}: group consecutive {{date_column}} values (or sequential {{id}}s) into islands using the difference between ROW_NUMBER() and the value, then return each island's start and end. Add a variant that returns the gaps between islands.
#analytics#gaps-islands#window
DAU, WAU, MAU and stickiness
Analytics & Reporting
Compute active-user metrics and the DAU/MAU ratio.
Write a query over {{events_table}} that computes daily active users (distinct {{user_id}} per day), 7-day rolling weekly active users, and 28-day rolling monthly active users, plus the stickiness ratio DAU/MAU. Use COUNT(DISTINCT ...) over date windows and present one row per day.
#analytics#active-users#stickiness
RFM customer segmentation
Analytics & Reporting
Score customers by recency, frequency, and monetary value.
Write a query that computes RFM scores per {{user_id}} from {{table}}: recency = days since MAX({{date_column}}), frequency = COUNT(*), monetary = SUM({{amount}}). Bucket each into quintiles with NTILE(5), then concatenate into an RFM segment code. Order by monetary descending.
#analytics#rfm#segmentation
UPSERT with ON CONFLICT
DML & Upserts
Insert a row or update it if the key already exists.
Write an INSERT INTO {{table}} ... ON CONFLICT ({{unique_column}}) DO UPDATE SET statement that inserts a row and, if {{unique_column}} already exists, updates {{column_1}} and {{column_2}} to the new values using EXCLUDED.{{column_1}}. Also set updated_at = now(), and show the DO NOTHING variant for insert-or-ignore.
#dml#upsert#on-conflict
Update from another table
DML & Upserts
Bulk-update rows using values looked up in a second table.
Write an UPDATE {{table}} SET {{column}} = src.{{source_column}} FROM {{other_table}} src WHERE {{table}}.{{id}} = src.{{foreign_key}} statement that copies values from {{other_table}} into {{table}}. Add a matching WHERE to limit scope, and tell me to run a SELECT join first to preview affected rows.
#dml#update#join
Insert rows from a SELECT
DML & Upserts
Populate a table from the result of a query.
Write an INSERT INTO {{table}} ({{column_1}}, {{column_2}}) SELECT ... FROM {{other_table}} WHERE {{condition}} statement that copies/transforms rows from {{other_table}} into {{table}}. Make the column lists line up explicitly, and add ON CONFLICT DO NOTHING if duplicates are possible.
#dml#insert-select#bulk
Delete rows safely with a filter
DML & Upserts
Remove a targeted set of rows, with safeguards.
Write a DELETE FROM {{table}} WHERE {{condition}} statement (e.g. {{date_column}} < NOW() - INTERVAL '{{n}} days'). Remind me to run the equivalent SELECT COUNT(*) first to confirm the row count, wrap it in a transaction, and consider batching large deletes to avoid bloat and long locks.
#dml#delete#safety
Return affected rows with RETURNING
DML & Upserts
Get back the inserted/updated/deleted rows in one round trip.
Write an INSERT/UPDATE/DELETE on {{table}} that uses RETURNING to return the affected rows (e.g. RETURNING {{id}}, {{column}}). Explain how RETURNING avoids a follow-up SELECT and is ideal for capturing generated identity keys after an insert.
#dml#returning#insert
Update large tables in batches
DML & Upserts
Apply a bulk update without one giant long-locking transaction.
Write a batched UPDATE strategy for {{table}}: update rows in chunks of {{batch_size}} using a loop or repeated UPDATE ... WHERE {{id}} IN (SELECT {{id}} FROM {{table}} WHERE {{condition}} AND not_yet_updated LIMIT {{batch_size}}). Explain why batching reduces lock contention and bloat versus one massive UPDATE.
#dml#batch#update
MERGE for conditional insert/update/delete
DML & Upserts
Synchronize a target table from a source in one statement.
Write a MERGE INTO {{table}} t USING {{other_table}} s ON t.{{id}} = s.{{foreign_key}} statement (PostgreSQL 15+) with WHEN MATCHED THEN UPDATE SET, WHEN NOT MATCHED THEN INSERT, and optionally WHEN NOT MATCHED BY SOURCE THEN DELETE. Compare it to ON CONFLICT and note when MERGE is the clearer choice.
#dml#merge#sync
Bucket timestamps by day/week/month
Date & Time
Group time-series rows into calendar periods.
Write a query that aggregates {{table}} by month using DATE_TRUNC('month', {{date_column}}) as period, returning COUNT(*) and SUM({{amount}}) per period ordered chronologically. Mention how to switch the bucket to 'day' or 'week', and to 'week' starting Monday.
#date#date-trunc#bucket
Filter to the last N days
Date & Time
Keep only recent rows relative to now.
Write a query that returns rows from {{table}} where {{date_column}} >= NOW() - INTERVAL '{{n}} days'. Note that this is sargable (index-friendly) because the function is on NOW(), not on {{date_column}}, and avoid wrapping {{date_column}} in a function.
#date#interval#recent
Month-over-month growth
Date & Time
Compare each month's metric to the previous month.
Write a query that aggregates {{amount}} by month from {{table}} (DATE_TRUNC('month', {{date_column}})), then uses LAG over the monthly series to compute the prior month's value and the month-over-month percent change. Round the percent to 1 decimal and handle the first month's NULL.
#date#growth#mom
Fill missing dates with generate_series
Date & Time
Produce a continuous date axis even where data is sparse.
Write a query that builds a complete daily date axis from {{start_date}} to {{end_date}} using generate_series, then LEFT JOINs the aggregated daily counts from {{table}} so days with no rows show 0 instead of being missing. COALESCE the counts to 0.
#date#generate-series#gaps
Time elapsed between two timestamps
Date & Time
Measure duration between two events on the same or related rows.
Write a query that returns the elapsed time between {{start_column}} and {{end_column}} in {{table}}, both as an INTERVAL and as total minutes/hours using EXTRACT(EPOCH FROM ({{end_column}} - {{start_column}})). Round the numeric result and order by duration descending.
#date#duration#interval
Convert and report in a timezone
Date & Time
Display UTC timestamps in a user's local timezone.
Write a query that converts {{date_column}} (stored as timestamptz/UTC) to '{{timezone}}' local time using {{date_column}} AT TIME ZONE '{{timezone}}', then groups by the local calendar day. Add a comment about the difference between timestamp and timestamptz so the conversion is correct.
#date#timezone#timestamptz
Find duplicate rows
Deduplication
Identify which key combinations appear more than once.
Write a query that finds duplicate values in {{table}} by {{column_1}} (and optionally {{column_2}}): GROUP BY those columns HAVING COUNT(*) > 1, returning the duplicated key plus the count. Order by the count descending so the worst offenders surface first.
#dedup#duplicates#group-by
Keep only the latest row per key
Deduplication
Reduce duplicates to a single most-recent record each.
Write a query that returns one row per {{column}} from {{table}}: the most recent by {{date_column}}. Use ROW_NUMBER() OVER (PARTITION BY {{column}} ORDER BY {{date_column}} DESC) in a CTE and keep rows where the row number = 1. Tie-break with {{id}} DESC for determinism.
#dedup#latest#row-number
DISTINCT ON for first row per group
Deduplication
Use PostgreSQL's DISTINCT ON to pick one row per key efficiently.
Write a PostgreSQL query using SELECT DISTINCT ON ({{column}}) * FROM {{table}} ORDER BY {{column}}, {{date_column}} DESC to keep the latest row per {{column}}. Note that the ORDER BY must lead with the DISTINCT ON columns, and this is often the most concise dedupe in Postgres.
#dedup#distinct-on#postgres
Delete duplicate rows safely
Deduplication
Remove duplicate records while keeping one canonical row.
Write a DELETE statement that removes duplicate rows from {{table}}, keeping the row with the lowest ctid (or smallest {{id}}) per ({{column_1}}, {{column_2}}). Use a CTE with ROW_NUMBER() to mark which to delete. Tell me to run the equivalent SELECT first to preview before deleting, and to back up.
#dedup#delete#cleanup
Count unique vs total rows
Deduplication
Quantify how much duplication exists in a column.
Write a query that reports, for {{table}}, the total row count, the COUNT(DISTINCT {{column}}) of unique values, and the difference (number of duplicate rows). This gives a quick measure of how much deduplication is needed.
#dedup#count#distinct
Find near-duplicate (fuzzy) records
Deduplication
Detect records that are almost-but-not-exactly the same.
Write a query that finds near-duplicate rows in {{table}} where {{column}} values are highly similar but not identical, using a self-join on similarity({{a}}.{{column}}, {{b}}.{{column}}) > {{threshold}} with a.{{id}} < b.{{id}} to avoid self and mirror pairs. Requires the pg_trgm extension.
#dedup#fuzzy#pg_trgm
Filter on multiple conditions
Filtering & Search
Return rows that satisfy several combined AND/OR conditions.
Write a PostgreSQL query that returns all rows from {{table}} where {{column}} equals {{value}} AND {{date_column}} is within the last {{n}} days, OR {{status_column}} is in ('active','trial'). Group the OR/AND logic with parentheses so the precedence is explicit, and order the results by {{date_column}} descending.
#filter#where#boolean
Filter by a list of values
Filtering & Search
Keep only rows whose column matches any value in a fixed list.
Write a query against {{table}} that returns rows where {{column}} is in this list: ({{value_1}}, {{value_2}}, {{value_3}}). Also return a count of how many rows matched each value, grouped by {{column}} and ordered by the count descending.
#filter#in#list
Filter a numeric or date range
Filtering & Search
Select rows whose value falls inside an inclusive range.
Write a query that selects rows from {{table}} where {{amount}} is BETWEEN {{low}} AND {{high}} inclusive. Note that BETWEEN is inclusive on both ends; if you need a half-open range use >= {{low}} AND < {{high}} instead, which is safer for timestamps. Return the rows ordered by {{amount}}.
#filter#range#between
Exclude a set of values safely
Filtering & Search
Remove rows matching a list, while staying NULL-safe.
Write a query that returns rows from {{table}} where {{column}} is NOT IN ({{value_1}}, {{value_2}}). Warn me in a comment that NOT IN returns zero rows if any value in the list (or the column) is NULL; rewrite it using NOT EXISTS or an explicit OR {{column}} IS NULL guard if NULLs are possible.
#filter#exclude#null
Case-insensitive text search
Filtering & Search
Find rows where a text column contains a term regardless of case.
Write a PostgreSQL query that returns rows from {{table}} where {{column}} contains the term '{{term}}' case-insensitively. Use ILIKE '%{{term}}%'. Mention that a leading wildcard prevents normal B-tree index use and suggest a trigram (pg_trgm) GIN index if this search is frequent.
#search#ilike#text
Search across multiple columns
Filtering & Search
Match a keyword against several text columns at once.
Write a query that returns rows from {{table}} where the search term '{{term}}' appears (case-insensitively) in any of these columns: {{column_1}}, {{column_2}}, {{column_3}}. Combine the conditions with OR using ILIKE, and return a column indicating which field matched.
#search#multi-column#ilike
Extract a field from JSONB
JSON & Arrays
Pull a scalar value out of a JSONB column.
Write a query that extracts the value at key '{{json_key}}' from the JSONB column {{json_column}} in {{table}}. Use {{json_column}} ->> '{{json_key}}' to get it as text (or -> for JSON), and cast to the right type, e.g. ({{json_column}} ->> '{{json_key}}')::int. Show nested access with the #>> '{path}' operator.
#json#jsonb#extract
Filter rows by a JSONB value
JSON & Arrays
Select rows where a key inside a JSON document matches.
Write a query that returns rows from {{table}} where {{json_column}} ->> '{{json_key}}' = '{{value}}', or where the document contains a key/value using the @> containment operator: {{json_column}} @> '{"{{json_key}}": "{{value}}"}'. Note that @> can use a GIN index for speed.
#json#jsonb#filter
Expand a JSON array to rows
JSON & Arrays
Flatten a JSON array stored in a column into individual rows.
Write a query that expands the JSON array stored in {{json_column}} of {{table}} into one row per element using jsonb_array_elements({{json_column}}) in a LATERAL/cross join, keeping {{id}}. If the elements are objects, extract sub-fields from each element with ->>.
#json#jsonb#array
Build JSON from columns
JSON & Arrays
Assemble a JSON object/array from relational columns.
Write a query that builds a JSON object per row of {{table}} from {{column_1}}, {{column_2}} using jsonb_build_object('{{column_1}}', {{column_1}}, '{{column_2}}', {{column_2}}). Then show jsonb_agg() to roll multiple rows per {{group_column}} into a JSON array for an API response.
#json#jsonb#build
Update a key inside JSONB
JSON & Arrays
Set or change a nested value without rewriting the whole document.
Write an UPDATE on {{table}} that sets the key '{{json_key}}' inside {{json_column}} to '{{value}}' using jsonb_set({{json_column}}, '{{{json_key}}}', '"{{value}}"'::jsonb, true). Explain the path array syntax and how to remove a key with the - operator.
#json#jsonb#update
Array membership and overlap
JSON & Arrays
Filter rows by whether an array contains or overlaps values.
Write a query that returns rows from {{table}} where the array column {{array_column}} contains {{value}} ({{value}} = ANY({{array_column}})), and a variant where it overlaps a set using {{array_column}} && ARRAY[{{value_1}}, {{value_2}}]. Mention a GIN index on the array column for performance.
#array#membership#overlap
Inner join two tables
Joins
Combine matching rows from two related tables.
Write a PostgreSQL query that inner-joins {{table}} to {{other_table}} on {{table}}.{{id}} = {{other_table}}.{{foreign_key}}. Select {{table}}.{{column}} and {{other_table}}.{{column_2}}, alias each table with a short prefix, and qualify every selected column with its alias.
#join#inner
Left join keeping all left rows
Joins
Return every row from the left table, with matches where they exist.
Write a query that LEFT JOINs {{table}} to {{other_table}} on {{table}}.{{id}} = {{other_table}}.{{foreign_key}}, returning all rows from {{table}} even when there is no match. Use COALESCE on the {{other_table}} columns to supply default values for the unmatched rows.
#join#left#coalesce
Anti-join: find rows with no match
Joins
List rows in one table that have no counterpart in another.
Write a query that returns rows from {{table}} that have NO matching row in {{other_table}} on {{table}}.{{id}} = {{other_table}}.{{foreign_key}}. Implement it as a LEFT JOIN ... WHERE {{other_table}}.{{foreign_key}} IS NULL, and also show the equivalent NOT EXISTS version, noting which one the planner usually prefers.
#join#anti-join#missing
Join three or more tables
Joins
Chain joins across several related tables in one query.
Write a query that joins {{table}} to {{other_table}} on {{key_1}}, then to {{third_table}} on {{key_2}}. Choose INNER vs LEFT join per relationship based on whether the related row is mandatory, alias each table, and select a meaningful column from each. Add a one-line comment explaining each join's grain.
#join#multi-table
Self-join for hierarchy or pairs
Joins
Join a table to itself to relate rows like manager/employee.
Write a self-join on {{table}} aliased as child and parent, joining child.{{parent_id}} = parent.{{id}} so each row shows its record alongside its parent record (e.g. employee and manager). Return child.{{name}} and parent.{{name}}, and use a LEFT join so top-level rows with no parent are kept.
#join#self-join#hierarchy
Join only the latest related row
Joins
Attach the single most recent child row to each parent.
Write a query that joins {{table}} to its most recent related row in {{other_table}} (latest by {{date_column}}). Use a LATERAL join: LEFT JOIN LATERAL (SELECT ... FROM {{other_table}} o WHERE o.{{foreign_key}} = {{table}}.{{id}} ORDER BY o.{{date_column}} DESC LIMIT 1) latest ON true. Return the parent columns plus the latest child's {{column}}.
#join#lateral#latest
NULL count per column
NULL & Data Quality
Profile how complete each column is across the table.
Write a query that returns, for {{table}}, the NULL count and NULL percentage of each important column ({{column_1}}, {{column_2}}, {{column_3}}). Use COUNT(*) - COUNT(col) for nulls and divide by COUNT(*); present one row per column if possible via a UNION ALL or a single row with named expressions.
#null#audit#completeness
Replace NULLs with a default
NULL & Data Quality
Substitute a fallback value wherever a column is NULL.
Write a query that returns {{column}} from {{table}} with NULLs replaced by {{default_value}} using COALESCE({{column}}, {{default_value}}). If several columns should fall back in priority order, show COALESCE({{column_1}}, {{column_2}}, {{default_value}}).
#null#coalesce#default
Find rows missing required fields
NULL & Data Quality
Surface records that violate required-field expectations.
Write a query that returns rows from {{table}} that are missing required data: {{column_1}} IS NULL OR {{column_2}} IS NULL OR TRIM({{column_3}}) = ''. Return the {{id}} and a text column listing which fields are missing so the rows can be fixed.
#null#data-quality#required
Validate value ranges and domains
NULL & Data Quality
Catch out-of-range or invalid categorical values.
Write a data-quality query over {{table}} that flags rows where {{amount}} is negative or implausibly large (> {{max_value}}), or where {{status_column}} is not in the allowed set ('active','inactive','pending'). Return the {{id}} and the offending value for each violation.
#data-quality#validation#range
Detect orphaned foreign keys
NULL & Data Quality
Find child rows pointing to a parent that no longer exists.
Write a query that finds rows in {{table}} whose {{foreign_key}} has no matching {{id}} in {{other_table}} (referential integrity gap). Use NOT EXISTS, exclude rows where {{foreign_key}} IS NULL, and return the orphaned {{id}} and {{foreign_key}} values.
#data-quality#orphan#foreign-key
NULL-safe equality comparison
NULL & Data Quality
Compare two columns treating NULL = NULL as true.
Write a query that compares {{column_1}} and {{column_2}} in {{table}} treating two NULLs as equal, using {{column_1}} IS NOT DISTINCT FROM {{column_2}}. Explain why a plain = returns NULL (not true) when both sides are NULL, which silently drops rows.
#null#comparison#is-distinct
Read an EXPLAIN ANALYZE plan
Performance & EXPLAIN
Interpret a query plan to find the bottleneck.
Here is the output of EXPLAIN (ANALYZE, BUFFERS) for my query on {{table}}: {{plan}}. Explain it top-down: identify the most expensive node, where estimated vs actual rows diverge (bad statistics), any sequential scans on large tables, and whether sorts or hashes spill to disk. Then recommend concrete fixes.
#performance#explain#tuning
Suggest an index for a query
Performance & EXPLAIN
Recommend the right index to speed up a slow query.
My query filters {{table}} on {{column_1}} = {{value}} and orders by {{date_column}}. Suggest the best B-tree index (column order matters: equality columns first, then the range/sort column), give the CREATE INDEX statement, and explain whether a covering index with INCLUDE would enable an index-only scan.
#performance#index#btree
Find tables doing sequential scans
Performance & EXPLAIN
Use catalog stats to spot tables that may need indexes.
Write a query against pg_stat_user_tables that lists tables with a high ratio of seq_scan to idx_scan and large seq_tup_read, ordered by seq_scan descending. These are candidates for new indexes. Add a note that this reflects activity since the last stats reset.
#performance#index#pg_stat
Make a WHERE clause sargable
Performance & EXPLAIN
Rewrite filters so an index can actually be used.
My filter wraps the column in a function: WHERE DATE({{date_column}}) = '{{day}}' (or LOWER({{column}}) = ...). Rewrite it to be sargable so an index applies: use a half-open range {{date_column}} >= '{{day}}' AND {{date_column}} < '{{day}}'::date + 1, or add an expression index for the LOWER case.
#performance#sargable#index
Find the slowest queries
Performance & EXPLAIN
Use pg_stat_statements to rank queries by total time.
Write a query against pg_stat_statements that returns the top {{n}} queries by total_exec_time, showing calls, mean_exec_time, total_exec_time, and rows. Order by total_exec_time descending so the highest-impact queries surface. Mention the pg_stat_statements extension must be enabled.
#performance#pg_stat_statements#slow
Inspect table and index sizes
Performance & EXPLAIN
Report disk usage per relation to find bloat or large tables.
Write a query that lists the largest tables in the current database with their total size, table size, and index size using pg_total_relation_size and pg_size_pretty, ordered by total size descending. Add a comment about checking dead tuples (n_dead_tup) and running VACUUM/ANALYZE for bloat.
#performance#size#bloat
Pivot rows into columns
Pivoting & Reshaping
Turn categorical row values into one column each.
Write a query that pivots {{table}} so each value of {{category_column}} becomes its own column of SUM({{amount}}), grouped by {{group_column}}. Use conditional aggregation: SUM({{amount}}) FILTER (WHERE {{category_column}} = '{{value_1}}') AS {{value_1}}, repeated per category.
#pivot#reshape#filter
Crosstab with tablefunc
Pivoting & Reshaping
Pivot using PostgreSQL's crosstab function for dynamic-ish layouts.
Write a query using the crosstab() function from the tablefunc extension to pivot {{table}}: the source query selects {{group_column}}, {{category_column}}, SUM({{amount}}) ordered by 1,2, and the column definition list names the categories. Include the CREATE EXTENSION IF NOT EXISTS tablefunc note.
#pivot#crosstab#tablefunc
Unpivot columns into rows
Pivoting & Reshaping
Convert wide columns into a tall key/value format.
Write a query that unpivots {{table}} from wide to long: turn columns {{column_1}}, {{column_2}}, {{column_3}} into rows of (metric_name, metric_value) keyed by {{id}}. Use a LATERAL VALUES ((' {{column_1}} ', {{column_1}}), ...) join to do it in one pass.
#unpivot#reshape#lateral
Monthly columns matrix
Pivoting & Reshaping
Lay out a metric with one column per month.
Write a query that produces a matrix from {{table}} with {{group_column}} as rows and one column per month (Jan..Dec) of the current year, each cell = SUM({{amount}}). Use SUM({{amount}}) FILTER (WHERE EXTRACT(MONTH FROM {{date_column}}) = 1) AS jan, and so on through December.
#pivot#monthly#matrix
Expand an array column to rows
Pivoting & Reshaping
Flatten an array column so each element is its own row.
Write a query that expands the array column {{array_column}} in {{table}} into one row per element using unnest, e.g. SELECT {{id}}, elem FROM {{table}}, unnest({{array_column}}) AS elem. Use WITH ORDINALITY if you also need each element's position.
#reshape#unnest#array
Top N rows overall
Ranking & Top-N
Return the highest-ranked records across the whole table.
Write a query that returns the top {{n}} rows from {{table}} by {{amount}} descending using ORDER BY {{amount}} DESC LIMIT {{n}}. If ties at the boundary should all be included, show the WITH TIES variant: FETCH FIRST {{n}} ROWS WITH TIES.
#top-n#limit#ranking
Top N rows per group
Ranking & Top-N
Get the leading records within each category.
Write a PostgreSQL query that returns the top {{n}}{{column}} by total {{amount}} for each {{group_column}} in {{table}}, using ROW_NUMBER() OVER (PARTITION BY {{group_column}} ORDER BY SUM({{amount}}) DESC) in a CTE and filtering to rn <= {{n}}. Add a note on using RANK() instead if ties should be kept.
#top-n#per-group#window
Bottom N (worst performers)
Ranking & Top-N
Surface the lowest-ranked records.
Write a query that returns the bottom {{n}}{{group_column}} values from {{table}} by SUM({{amount}}) ascending. Group, order ascending, LIMIT {{n}}, and exclude groups with NULL totals so empty groups do not masquerade as the worst.
#top-n#bottom#ranking
Rows above a percentile
Ranking & Top-N
Select records in the top X% of a metric.
Write a query that returns rows from {{table}} whose {{amount}} is in the top {{pct}} percent, using PERCENT_RANK() OVER (ORDER BY {{amount}} DESC) <= {{fraction}} (e.g. 0.10 for top 10%). Alternatively show a CUME_DIST or NTILE(100) approach.
#top-n#percentile#percent-rank
Nth highest value
Ranking & Top-N
Find a specific ranked position, like the 3rd-highest salary.
Write a query that returns the {{n}}-th highest distinct {{amount}} from {{table}}. Use DENSE_RANK() OVER (ORDER BY {{amount}} DESC) in a subquery and filter to the rank = {{n}}; explain why DENSE_RANK handles duplicate values correctly here.
#top-n#nth#dense-rank
Rank with explicit tie-breakers
Ranking & Top-N
Make ranking deterministic when values are equal.
Write a query that ranks rows in {{table}} by {{amount}} descending but breaks ties deterministically by {{date_column}} ascending then {{id}} ascending, using ROW_NUMBER() OVER (ORDER BY {{amount}} DESC, {{date_column}} ASC, {{id}} ASC). Explain why a stable tie-break matters for reproducible pagination.
#top-n#ties#tiebreak
Create a well-designed table
Schema & DDL
Define a new table with sensible types, keys, and constraints.
Write a CREATE TABLE statement for {{table}} with an identity primary key (id bigint GENERATED ALWAYS AS IDENTITY), appropriate column types for {{column_1}}, {{column_2}}, NOT NULL where required, a UNIQUE constraint on {{unique_column}}, sensible DEFAULTs, and created_at/updated_at timestamptz columns defaulting to now().
#ddl#create-table#schema
Add a column without long locks
Schema & DDL
Safely add a column to a large, busy table.
Write the ALTER TABLE statement to add column {{column}} of type {{type}} to {{table}}. Explain that in modern PostgreSQL adding a nullable column (or one with a constant DEFAULT) is fast and does not rewrite the table, but adding a volatile default does; if backfilling, do it in batches to avoid long locks.
#ddl#alter-table#locks
Add a foreign key constraint
Schema & DDL
Enforce referential integrity between two tables.
Write an ALTER TABLE statement adding a foreign key on {{table}}.{{foreign_key}} referencing {{other_table}}({{id}}), with ON DELETE {{on_delete}} (e.g. CASCADE or SET NULL). Recommend creating an index on the {{foreign_key}} column since Postgres does not index FKs automatically, and adding the constraint as NOT VALID then VALIDATE to avoid a long lock on big tables.
#ddl#foreign-key#constraint
Create an index without blocking writes
Schema & DDL
Build an index on a live table without locking out writers.
Write a CREATE INDEX CONCURRENTLY statement on {{table}}({{column}}). Explain that CONCURRENTLY avoids an exclusive lock so writes continue, but it cannot run inside a transaction block and may leave an INVALID index if it fails (which must be dropped and rebuilt). Suggest a partial index if only a subset of rows is queried.
#ddl#index#concurrent
Add a CHECK constraint
Schema & DDL
Enforce a business rule at the database level.
Write an ALTER TABLE statement adding a CHECK constraint to {{table}} enforcing that {{amount}} >= 0 (or {{status_column}} IN ('active','inactive','pending')). On a large table, add it as NOT VALID first so existing rows are not scanned under lock, then run VALIDATE CONSTRAINT separately.
#ddl#check#constraint
Inspect a table's columns and types
Schema & DDL
Read a table's structure from the information schema.
Write a query against information_schema.columns that lists every column of {{table}} in {{schema}} with its data type, is_nullable, column_default, and ordinal_position, ordered by position. This documents the schema without needing psql's \d.
#ddl#information-schema#inspect
Partition a large table by range
Schema & DDL
Set up declarative range partitioning for a big time-series table.
Write the DDL to create {{table}} as a range-partitioned table on {{date_column}} (PARTITION BY RANGE), plus monthly partition definitions for the current and next month. Explain how partition pruning speeds up date-filtered queries and how to attach/detach partitions for retention.
#ddl#partition#range
Break a query into readable CTEs
Subqueries & CTEs
Refactor one complex query into named, sequential steps.
Refactor my query on {{table}} into a chain of WITH (CTE) steps: a first CTE that filters/cleans, a second that aggregates by {{group_column}}, and a final SELECT that joins or formats the result. Name each CTE for what it produces and keep each step doing one thing.
#cte#refactor#readability
Recursive CTE to walk a hierarchy
Subqueries & CTEs
Traverse parent/child trees to any depth.
Write a recursive CTE over {{table}} that walks the hierarchy from a root where {{parent_id}} IS NULL down through children joined on {{parent_id}} = {{id}}. Return each node with its depth level and a materialized path string. Add a depth guard to prevent infinite loops on cyclic data.
#cte#recursive#hierarchy
Correlated subquery per row
Subqueries & CTEs
Compute a value that depends on the current outer row.
Write a query that, for each row in {{table}}, returns a correlated subquery value such as the count of related rows in {{other_table}} where {{other_table}}.{{foreign_key}} = {{table}}.{{id}}. Then show the same result rewritten as a LEFT JOIN with GROUP BY and note when each form is faster.
#subquery#correlated
Compare each row to an aggregate
Subqueries & CTEs
Filter rows against a single computed value like the overall average.
Write a query that returns rows from {{table}} where {{amount}} is greater than the overall average {{amount}}, using a scalar subquery (SELECT AVG({{amount}}) FROM {{table}}). Also show a window-function alternative that avoids scanning the table twice.
#subquery#scalar#average
Control CTE materialization
Subqueries & CTEs
Force or prevent a CTE from being materialized for performance.
My CTE on {{table}} is being re-evaluated multiple times (or conversely, materialized when it should be inlined). Show me how to use WITH name AS MATERIALIZED (...) and AS NOT MATERIALIZED (...) in PostgreSQL to control this, and explain when each helps performance.
#cte#materialized#performance
Derived table in the FROM clause
Subqueries & CTEs
Build an intermediate result inline and query against it.
Write a query that selects from a derived table: an inner subquery that aggregates {{table}} by {{group_column}} into totals, wrapped as ( ... ) sub, then filter and order the outer query on those totals. Make sure the derived table is aliased.
#subquery#derived-table#from
Trim and normalize whitespace
Text & Strings
Clean leading/trailing and collapsed internal spaces.
Write a query that returns a cleaned version of {{column}} from {{table}}: TRIM the ends, collapse internal runs of whitespace to a single space with REGEXP_REPLACE({{column}}, '\\s+', ' ', 'g'), and convert empty strings to NULL with NULLIF. Show before and after side by side.
#text#trim#clean
Split a string into parts
Text & Strings
Extract a segment from a delimited string column.
Write a query that splits {{column}} in {{table}} on the delimiter '{{delimiter}}' and returns the {{n}}-th part using SPLIT_PART({{column}}, '{{delimiter}}', {{n}}). If you instead need every part as separate rows, show the STRING_TO_ARRAY + unnest version.
#text#split#parse
Extract a pattern with regex
Text & Strings
Pull a substring matching a regular expression.
Write a query that extracts the first match of the pattern '{{pattern}}' from {{column}} in {{table}} using (REGEXP_MATCH({{column}}, '{{pattern}}'))[1]. For example extract an email, a numeric id, or a domain. Return rows where no match was found as NULL.
#text#regex#extract
Build a formatted display string
Text & Strings
Combine several columns into one readable label.
Write a query that builds a display label from {{table}} by combining {{column_1}} and {{column_2}} (e.g. "Last, First") using FORMAT('%s, %s', {{column_1}}, {{column_2}}) or CONCAT_WS. Use CONCAT_WS so NULL fields are skipped cleanly rather than producing 'null' text.
#text#concat#format
Match a prefix, suffix, or contains
Text & Strings
Filter rows by how a string starts, ends, or what it contains.
Write a query returning rows from {{table}} where {{column}} starts with '{{prefix}}' (LIKE '{{prefix}}%'). Show variants for ends-with ('%{{suffix}}') and contains ('%{{term}}%'), and note that only the prefix form can use a standard B-tree index.
#text#like#prefix
Fuzzy match with similarity
Text & Strings
Find approximate string matches tolerant of typos.
Write a query that finds rows in {{table}} whose {{column}} is similar to '{{term}}' using the pg_trgm similarity() function, ordered by similarity descending and filtered to similarity > {{threshold}} (e.g. 0.3). Mention the GIN/GiST trigram index that makes this fast.
#text#fuzzy#pg_trgm
Running total (cumulative sum)
Window Functions
Accumulate a value over an ordered sequence of rows.
Write a PostgreSQL query that computes a running total of {{amount}} from {{table}}, ordered by {{date_column}}, using SUM({{amount}}) OVER (ORDER BY {{date_column}} ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). If totals should reset per group, add PARTITION BY {{group_column}}.
#window#running-total#cumulative
Moving average over N rows
Window Functions
Smooth a metric with a trailing rolling average.
Write a query that computes a {{n}}-row trailing moving average of {{amount}} from {{table}} ordered by {{date_column}}, using AVG({{amount}}) OVER (ORDER BY {{date_column}} ROWS BETWEEN {{n_minus_1}} PRECEDING AND CURRENT ROW). Round to 2 decimals and keep the original value alongside the average.
#window#moving-average#rolling
RANK vs DENSE_RANK vs ROW_NUMBER
Window Functions
Assign ordered positions, choosing the right ranking function.
Write a query against {{table}} that adds three columns ranking rows by {{amount}} descending within each {{group_column}}: ROW_NUMBER(), RANK(), and DENSE_RANK(), all OVER (PARTITION BY {{group_column}} ORDER BY {{amount}} DESC). Add a comment explaining how each handles ties.
#window#rank#row-number
Compare to previous row with LAG
Window Functions
Compute the change between each row and the one before it.
Write a query that, for {{table}} ordered by {{date_column}}, returns each {{amount}} alongside the previous row's value via LAG({{amount}}) OVER (ORDER BY {{date_column}}), plus the absolute and percent change. Partition by {{group_column}} if the comparison should stay within each group.
#window#lag#delta
First and last value in a window
Window Functions
Carry the first or last value of a partition onto every row.
Write a query that, per {{group_column}} in {{table}}, returns each row plus the first and last {{amount}} ordered by {{date_column}}. Use FIRST_VALUE and LAST_VALUE, and set the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING so LAST_VALUE sees the whole partition.
#window#first-value#last-value
Each row's share of the total
Window Functions
Express every row's value as a percentage of the group total.
Write a query that returns each {{column}} in {{table}} with {{amount}} and its percent of the total {{amount}}, computed as {{amount}} / SUM({{amount}}) OVER (PARTITION BY {{group_column}}) * 100. Round to 1 decimal and guard against divide-by-zero with NULLIF.
#window#percent#share
Split rows into N equal buckets
Window Functions
Divide an ordered set into quantile buckets (e.g. quartiles).
Write a query that assigns each row in {{table}} to one of {{n}} equal-sized buckets based on {{amount}} using NTILE({{n}}) OVER (ORDER BY {{amount}}). Use 4 for quartiles or 10 for deciles, and return the bucket number alongside each row.