How to get two random records with Django

The order_by('?')[:2] solution suggested by other answers is an extraordinarily bad idea for tables with a large number of rows. On MySQL, it results in an ORDER BY RAND() SQL query. As an example, here’s how MySQL handles that (the situation is not much different for other databases). Imagine your table has one billion rows:

  1. To accomplish ORDER BY RAND(), it needs a RAND() column to sort on.
  2. To do that, it needs a new table (the existing table has no such column).
  3. To do that, MySQL creates a new, temporary table with the new column and copies the existing ONE BILLION ROWS OF DATA into it.
  4. As it does so, it does as you asked, and runs RAND() for every row to fill in that value. Yes, you’ve instructed MySQL to GENERATE ONE BILLION RANDOM NUMBERS. That takes a while. 🙂
  5. A few hours/days later, when it’s done, it now has to sort it. Yes, you’ve instructed MySQL to SORT THIS ONE-BILLION-ROW, WORST-CASE-ORDERED TABLE (worst-case because the sort key is random).
  6. A few days/weeks later, when that’s done, it faithfully grabs the two measly rows you actually needed and returns them for you. Nice job. 😉
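
The steps above describe MySQL. As a rough illustration that another engine also has to sort the whole table, here is a small sketch using Python's built-in sqlite3 module. The table name my_table and the helper plan_details are made up for this example, and SQLite's function is RANDOM() rather than RAND(). The exact wording of the plan output varies between SQLite versions, so treat it as indicative only.

```python
import sqlite3

# Hypothetical in-memory table standing in for a large production table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO my_table (name) VALUES (?)",
    [(f"row {i}",) for i in range(1000)],
)


def plan_details(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Random ordering: the planner reports a temporary B-tree used for sorting.
random_order_plan = plan_details(
    "SELECT * FROM my_table ORDER BY RANDOM() LIMIT 2"
)

# LIMIT/OFFSET: a plain scan with no sort step.
offset_plan = plan_details("SELECT * FROM my_table LIMIT 1 OFFSET 500")

print(random_order_plan)
print(offset_plan)
```

The first plan should mention a temporary B-tree for the ORDER BY, while the second should not. That sort over every row is the cost the list above is describing.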

Note: just for a little extra gravy, be aware that MySQL will initially try to create that temp table in RAM. When that’s exhausted, it puts everything on hold to copy the whole thing to disk, so you get that extra knife-twist of an I/O bottleneck for nearly the entire process.

Doubters should look at the generated query to confirm that it’s ORDER BY RAND(), then Google for “order by rand()” (with the quotes).

A much better solution is to trade that one really expensive query for three cheap ones: one COUNT query and two LIMIT/OFFSET queries, instead of ORDER BY RAND():

import random
last = MyModel.objects.count() - 1

index1 = random.randint(0, last)
# Here's one simple way to keep an even distribution for
# index2 while still guaranteeing that it never matches index1.
# (This assumes the table has at least two rows.)
index2 = random.randint(0, last - 1)
if index2 == index1:
    index2 = last

# This syntax will generate "LIMIT 1 OFFSET indexN" queries,
# so each returns a single record with no extraneous data.
MyObj1 = MyModel.objects.all()[index1]
MyObj2 = MyModel.objects.all()[index2]
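
For reuse, the same idea can be wrapped in a pair of small functions. This is only a sketch: the names pick_two_indexes and two_random_records are invented here, and the queryset argument is assumed to behave like a Django QuerySet (a count() method plus integer indexing). The explicit check for fewer than two rows is an addition that the snippet above does not have.

```python
import random


def pick_two_indexes(count):
    """Return two distinct indexes in range(count), each ordered pair equally likely."""
    if count < 2:
        raise ValueError("need at least two rows to pick two distinct records")
    last = count - 1
    index1 = random.randint(0, last)
    # Draw from one fewer slot, then remap a collision to the final slot.
    index2 = random.randint(0, last - 1)
    if index2 == index1:
        index2 = last
    return index1, index2


def two_random_records(queryset):
    """Fetch two distinct random rows from anything with count() and indexing.

    `queryset` is assumed to behave like a Django QuerySet, for example
    MyModel.objects.order_by("pk").
    """
    index1, index2 = pick_two_indexes(queryset.count())
    return queryset[index1], queryset[index2]
```

With Django, this would presumably be called as two_random_records(MyModel.objects.order_by("pk")). Passing an explicitly ordered queryset is a cautious choice, since databases do not promise a stable row order for unordered queries, so the two offsets might otherwise not refer to consistent positions. If rows are deleted between the count and the fetches, an index can fall out of range, so callers may want to handle IndexError.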
