Reference answer
Window functions perform calculations across a set of rows related to the current row without collapsing them into a single result. Unlike GROUP BY, which aggregates rows, window functions retain individual rows while adding computed values.
The general syntax looks like:
function_name() OVER (
    PARTITION BY column
    ORDER BY column
)
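PARTITION BY divides the rows into groups (partitions), and ORDER BY defines how rows are ordered within each partition. Both clauses are optional: without PARTITION BY, the entire result set is treated as a single partition.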
Suppose we have an employees table with employee_name, department, and salary.
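To make the contrast with GROUP BY concrete, here is a minimal sketch that runs both styles against a small in-memory copy of the employees table. It assumes Python's built-in sqlite3 module linked against SQLite 3.25 or newer (the first release with window functions); the names and salaries are invented for illustration.

```python
import sqlite3

# Hypothetical sample data; names and salaries are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ava", "Engineering", 120),
        ("Ben", "Engineering", 100),
        ("Cara", "Sales", 90),
        ("Dan", "Sales", 70),
    ],
)

# GROUP BY collapses each department into a single row.
grouped = conn.execute(
    """
    SELECT department, AVG(salary)
    FROM employees
    GROUP BY department
    ORDER BY department
    """
).fetchall()

# The window version keeps all four rows and attaches the
# department average to each of them.
windowed = conn.execute(
    """
    SELECT employee_name, department, salary,
           AVG(salary) OVER (PARTITION BY department) AS dept_avg
    FROM employees
    ORDER BY department, employee_name
    """
).fetchall()

print(grouped)
for row in windowed:
    print(row)
```

The grouped query returns one row per department, while the windowed query returns one row per employee with the department average repeated alongside it.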
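ROW_NUMBER() assigns a unique sequential number to each row within a partition, starting at 1. Even if two employees have the same salary, they still receive different row numbers; which of the tied rows comes first is arbitrary unless the ORDER BY includes a tie-breaking column.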
SELECT
    employee_name,
    department,
    salary,
    ROW_NUMBER() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS row_num
FROM employees;
This is commonly used when you need to select exactly one row per group, such as removing duplicates or getting the top record per category.
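One common way to get exactly one row per group is to compute ROW_NUMBER() in a subquery and keep only the rows numbered 1. The sketch below is one possible version, assuming Python's sqlite3 module with SQLite 3.25 or newer; the sample rows and the employee_name tie-breaker are illustrative choices, not part of the original query.

```python
import sqlite3

# Hypothetical sample data; Ava and Ben tie on salary on purpose.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ava", "Engineering", 120),
        ("Ben", "Engineering", 120),
        ("Cara", "Sales", 90),
        ("Dan", "Sales", 70),
    ],
)

# A window function cannot be used directly in WHERE, so the row
# number is computed in a subquery and filtered in the outer query.
top_per_dept = conn.execute(
    """
    SELECT employee_name, department, salary
    FROM (
        SELECT employee_name, department, salary,
               ROW_NUMBER() OVER (
                   PARTITION BY department
                   ORDER BY salary DESC, employee_name  -- tie-breaker
               ) AS row_num
        FROM employees
    ) AS ranked
    WHERE row_num = 1
    ORDER BY department
    """
).fetchall()

print(top_per_dept)
```

Because Ava and Ben have the same salary, the extra employee_name sort key is what makes the chosen row repeatable from run to run.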
RANK() also ranks rows within a partition, but if two values tie, they receive the same rank, and the next rank is skipped. For example, rankings might look like 1, 2, 2, 4.
RANK() OVER (
    PARTITION BY department
    ORDER BY salary DESC
)
This is useful when the rank should reflect how many rows are ahead of a given row, as in competition-style standings or performance tiers.
DENSE_RANK() behaves similarly to RANK(), but it does not skip numbers after ties. Rankings would look like 1, 2, 2, 3.
DENSE_RANK() OVER (
    PARTITION BY department
    ORDER BY salary DESC
)
This is useful when you want a continuous ranking without gaps.
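The difference between the three ranking functions is easiest to see side by side on data that contains a tie. The following sketch assumes Python's sqlite3 module with SQLite 3.25 or newer and uses invented salaries; ROW_NUMBER() is given an extra employee_name sort key here only so that its output is deterministic.

```python
import sqlite3

# Hypothetical sample data; Ben and Cara share the same salary.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ava", "Engineering", 150),
        ("Ben", "Engineering", 120),
        ("Cara", "Engineering", 120),
        ("Dan", "Engineering", 100),
    ],
)

rows = conn.execute(
    """
    SELECT employee_name,
           salary,
           ROW_NUMBER() OVER (
               PARTITION BY department
               ORDER BY salary DESC, employee_name
           ) AS row_num,
           RANK() OVER (
               PARTITION BY department
               ORDER BY salary DESC
           ) AS rnk,
           DENSE_RANK() OVER (
               PARTITION BY department
               ORDER BY salary DESC
           ) AS dense_rnk
    FROM employees
    ORDER BY salary DESC, employee_name
    """
).fetchall()

# Columns: employee_name, salary, row_num, rnk, dense_rnk
for row in rows:
    print(row)
```

With this data, row_num runs 1, 2, 3, 4, rnk runs 1, 2, 2, 4, and dense_rnk runs 1, 2, 2, 3.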
Another important set of window functions includes LAG() and LEAD(), which allow you to access values from previous or next rows without joining the table to itself. For example, to calculate month-over-month revenue change:
SELECT
    month,
    revenue,
    revenue - LAG(revenue) OVER (ORDER BY month) AS revenue_change
FROM monthly_sales;
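LAG() retrieves the previous row's value, while LEAD() retrieves the next row's value. When no such row exists (the first row for LAG(), the last row for LEAD()), the function returns NULL by default, so revenue_change is NULL for the first month.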
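A small runnable sketch of LAG() and LEAD() together is shown below. It assumes Python's sqlite3 module with SQLite 3.25 or newer, stores month as 'YYYY-MM' text so that ordering by it is chronological, and uses invented revenue figures.

```python
import sqlite3

# Hypothetical sample data; revenue values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("2024-01", 100), ("2024-02", 130), ("2024-03", 120)],
)

rows = conn.execute(
    """
    SELECT month,
           revenue,
           LAG(revenue)  OVER (ORDER BY month) AS prev_revenue,
           LEAD(revenue) OVER (ORDER BY month) AS next_revenue,
           revenue - LAG(revenue) OVER (ORDER BY month) AS revenue_change
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()

# SQL NULL comes back as Python None at the edges of the window.
for row in rows:
    print(row)
```

The first month has no previous row, so prev_revenue and revenue_change come back as None; the last month has no next row, so next_revenue is None.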
Window functions are widely used for ranking, deduplication, running totals, and time-based comparisons such as month-over-month (MoM) or year-over-year (YoY) growth. They are one of the most important intermediate SQL concepts for data analyst interviews because they allow advanced analytical queries without losing row-level detail.
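Running totals, mentioned above, follow the same pattern with an aggregate function in place of a ranking function. The sketch below is one way to write it, assuming Python's sqlite3 module with SQLite 3.25 or newer and the same hypothetical monthly_sales layout with invented figures; the explicit ROWS frame is a stylistic choice to make the accumulation range visible.

```python
import sqlite3

# Hypothetical sample data; revenue values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [("2024-01", 100), ("2024-02", 130), ("2024-03", 120)],
)

# SUM() over a frame that grows from the first row to the current row
# yields a cumulative total while keeping one output row per month.
running = conn.execute(
    """
    SELECT month,
           revenue,
           SUM(revenue) OVER (
               ORDER BY month
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()

for row in running:
    print(row)
```

Each row keeps its own revenue and gains the cumulative total up to and including that month.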