Let’s say we have the following table Users
, where user_id is the column with unique values for this table.
The goal is to find for each user, the minimum of the two values for col_a and col_b.
+-------------+----------+
| Column Name | Type |
+-------------+----------+
| user_id | int |
| col_a | int |
| col_b | int |
+-------------+----------+
To do this, we can use the LEAST
function (and its counterpart GREATEST
):
SELECT
user_id,
LEAST(col_a, col_b) AS min_val
FROM Users
- What if we want to cap a column to some minimum value
cap_val
?
SELECT
user_id,
GREATEST(10, col_a) AS cap_val
FROM Users
Window functions
Let’s say we have a table called Purchases
, where each row contains a user with user_id purchasing product product_name on buy_date of quantity.
| user_id | product_name | buy_date | quantity |
| ------- | ------------ | -------- | -------- |
| 1 | A | 5 | 10 |
| 1 | B | 6 | 12 |
| 2 | C | 2 | 21 |
| 2 | D | 2 | 42 |
| 2 | F | 1 | 16 |
| 2 | E | 6 | 18 |
| 3 | E | | |
NOTE that the frame used for calculating window function results for a given row depends on the presence of ORDER BY as stated here. To state here:
- If
ORDER BY
present:RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- If
ORDER BY
not present:RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Also, these window functions don’t use frame
but instead the entire partition
: DENSE_RANK(), LAG(), LEAD(), RANK(), ROW_NUMBER().
Here are some common window functions applied with OVER(PARTITION BY user_id ORDER BY buy_date DESC)
, as per the query (notice especially the behavior of LAST_VALUE and SUM):
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY buy_date) AS row_num,
RANK() OVER(PARTITION BY user_id ORDER BY buy_date) AS `rank`,
DENSE_RANK() OVER(PARTITION BY user_id ORDER BY buy_date) AS `dense_rank`,
FIRST_VALUE(product_name) OVER(PARTITION BY user_id ORDER BY buy_date) AS `first_product`,
LAST_VALUE(product_name) OVER(PARTITION BY user_id ORDER BY buy_date) AS `last_product`,
LEAD(product_name) OVER(PARTITION BY user_id ORDER BY buy_date) AS `lead_product`,
LAG(product_name) OVER(PARTITION BY user_id ORDER BY buy_date) AS `lag_product`,
SUM(quantity) OVER(PARTITION BY user_id ORDER BY buy_date) AS `total_quantity`
FROM Purchases;
The output is:
| user_id | product_name | buy_date | quantity | row_num | rank | dense_rank |
| ------- | ------------ | -------- | -------- | ------- | ---- | ---------- |
| 1 | A | 5 | 10 | 1 | 1 | 1 |
| 1 | B | 6 | 12 | 2 | 2 | 2 |
| 2 | F | 1 | 16 | 1 | 1 | 1 |
| 2 | C | 2 | 21 | 2 | 2 | 2 |
| 2 | D | 2 | 42 | 3 | 2 | 2 |
| 2 | E | 6 | 18 | 4 | 4 | 3 |
| 3 | E | | | 1 | 1 | 1 |
| user_id | product_name | buy_date | quantity | first_product | last_product | lead_product | lag_product | total_quantity |
| ------- | ------------ | -------- | -------- | ------------- | ------------ | ------------ | ----------- | -------------- |
| 1 | A | 5 | 10 | A | A | B | | 10 |
| 1 | B | 6 | 12 | A | B | | A | 22 |
| 2 | F | 1 | 16 | F | F | C | | 16 |
| 2 | C | 2 | 21 | F | D | D | F | 79 |
| 2 | D | 2 | 42 | F | D | E | C | 79 |
| 2 | E | 6 | 18 | F | E | | D | 97 |
| 3 | E | | | E | E | | | |
Window function with GROUP BY
Let’s say we have a table called Orders
; where each row (unique order_id) is an order for a certain product along with the quantity of that product bought.
+-------------+----------+
| Column Name | Type |
+-------------+----------+
| order_id | int |
| product_id | int |
| quantity | int |
+-------------+----------+
- We would like to rank the products based on their total quantity purchased.
To do this one would first do an aggregation to count the number of items of each product bought and then rank them using a window function as follows:
SELECT
product_id,
RANK() OVER(ORDER BY total_quantity DESC) AS `rank`
FROM (
SELECT product_id, SUM(quantity) AS total_quantity
FROM Orders
GROUP BY 1
) AS k
However, the window and the group by can be combined as follows:
SELECT product_id,
RANK() OVER(ORDER BY SUM(quantity)) AS `rank`
FROM Orders
GROUP BY 1
The reason being that window function are evaluated in the SELECT clause; which executes after the GROUP BY clause.