Restrict results to top N rows per group

Question

You want to find top n rows per group. This answer provides a generic solution using example data that is different from OP.

In MySQL 8 or later you can use the ROW_NUMBER, RANK or DENSE_RANK function depending on the exact definition of top 5. Below are the numbers generated by these functions based on value sorted descending. Notice how ties are handled:

pkid	catid	value	row_number	rank	dense_rank
1	p01	100	*1	*1	*1
2	p01	90	*2	*2	*2
3	p01	90	*3	*2	*2
4	p01	80	*4	*4	*3
5	p01	80	*5	*4	*3
6	p01	80	6	*4	*3
7	p01	70	7	7	*4
8	p01	60	8	8	*5
9	p01	50	9	9	6
10	p01	40	10	10	7

Once you have chosen the function, use it like so:

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY catid ORDER BY value DESC) AS n
    FROM t
) AS x
WHERE n <= 5

DB<>Fiddle

In MySQL 5.x you can use poor man’s rank over partition to achieve desired result: outer join the table with itself and for each row, count the number of rows before it (e.g. the before row could be the one with higher value).

The following will produce results similar to RANK function:

SELECT t.pkid, t.catid, t.value, COUNT(b.value) + 1 AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND b.value > t.value
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.value) + 1 <= 5
ORDER BY t.catid, t.value DESC, t.pkid

Make the following change to produce results similar to DENSE_RANK function:

COUNT(DISTINCT b.value)

Or make the following change to produce results similar to ROW_NUMBER function:

ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)

DB<>Fiddle

Leave a Comment Cancel reply