Let’s say we have the following table Users, where user_id is the column with unique values for this table. The goal is to find for each user, the minimum of the two values for col_a and col_b.

+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| user_id     | int      |
| col_a       | int      |
| col_b       | int      |
+-------------+----------+

To do this, we can use the LEAST function (and its counterpart GREATEST):

SELECT
  user_id,
  LEAST(col_a, col_b) AS min_val
FROM Users
  • What if we want to cap a column to some minimum value cap_val?
SELECT
    user_id,
    GREATEST(10, col_a) AS cap_val
FROM Users

Window functions

Let’s say we have a table called Purchases, where each row contains a user with user_id purchasing product product_name on buy_date of quantity.


| user_id | product_name | buy_date | quantity |
| ------- | ------------ | -------- | -------- |
| 1       | A            | 5        | 10       |
| 1       | B            | 6        | 12       |
| 2       | C            | 2        | 21       |
| 2       | D            | 2        | 42       |
| 2       | F            | 1        | 16       |
| 2       | E            | 6        | 18       |
| 3       | E            |          |          |

NOTE that the frame used for calculating window function results for a given row depends on the presence of ORDER BY as stated here. To state here:

  • If ORDER BY present: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  • If ORDER BY not present: RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Also, these window functions don’t use frame but instead the entire partition: DENSE_RANK(), LAG(), LEAD(), RANK(), ROW_NUMBER().

Here are some common window functions applied with OVER(PARTITION BY user_id ORDER BY buy_date DESC), as per the query (notice especially the behavior of LAST_VALUE and SUM):

SELECT 
	*,
    ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY buy_date) AS row_num, 
    RANK() OVER(PARTITION BY user_id ORDER BY buy_date) AS `rank`, 
    DENSE_RANK() OVER(PARTITION BY user_id ORDER BY buy_date) AS `dense_rank`,
    FIRST_VALUE(product_name) OVER(PARTITION BY user_id ORDER BY buy_date) AS `first_product`,
    LAST_VALUE(product_name) OVER(PARTITION BY user_id ORDER BY buy_date) AS `last_product`,
    LEAD(product_name) OVER(PARTITION BY user_id ORDER BY buy_date) AS `lead_product`,
    LAG(product_name) OVER(PARTITION BY user_id ORDER BY buy_date) AS `lag_product`,
    SUM(quantity) OVER(PARTITION BY user_id ORDER BY buy_date) AS `total_quantity`
FROM Purchases;

The output is:


| user_id | product_name | buy_date | quantity | row_num | rank | dense_rank |
| ------- | ------------ | -------- | -------- | ------- | ---- | ---------- |
| 1       | A            | 5        | 10       | 1       | 1    | 1          |
| 1       | B            | 6        | 12       | 2       | 2    | 2          |
| 2       | F            | 1        | 16       | 1       | 1    | 1          |
| 2       | C            | 2        | 21       | 2       | 2    | 2          |
| 2       | D            | 2        | 42       | 3       | 2    | 2          |
| 2       | E            | 6        | 18       | 4       | 4    | 3          |
| 3       | E            |          |          | 1       | 1    | 1          |


| user_id | product_name | buy_date | quantity | first_product | last_product | lead_product | lag_product | total_quantity |
| ------- | ------------ | -------- | -------- | ------------- | ------------ | ------------ | ----------- | -------------- |
| 1       | A            | 5        | 10       | A             | A            | B            |             | 10             |
| 1       | B            | 6        | 12       | A             | B            |              | A           | 22             |
| 2       | F            | 1        | 16       | F             | F            | C            |             | 16             |
| 2       | C            | 2        | 21       | F             | D            | D            | F           | 79             |
| 2       | D            | 2        | 42       | F             | D            | E            | C           | 79             |
| 2       | E            | 6        | 18       | F             | E            |              | D           | 97             |
| 3       | E            |          |          | E             | E            |              |             |                |


Window function with GROUP BY

Let’s say we have a table called Orders; where each row (unique order_id) is an order for a certain product along with the quantity of that product bought.

+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| order_id    | int      |
| product_id  | int      |
| quantity    | int      |
+-------------+----------+
  • We would like to rank the products based on their total quantity purchased.

To do this one would first do an aggregation to count the number of items of each product bought and then rank them using a window function as follows:

SELECT
  product_id,
  RANK() OVER(ORDER BY total_quantity DESC) AS `rank`
FROM (
  SELECT product_id, SUM(quantity) AS total_quantity
  FROM Orders
  GROUP BY 1
) AS k

However, the window and the group by can be combined as follows:

SELECT product_id,
       RANK() OVER(ORDER BY SUM(quantity)) AS `rank`
FROM Orders
GROUP BY 1

The reason being that window function are evaluated in the SELECT clause; which executes after the GROUP BY clause.