How do I read a random line from a file?

This is not built in, but Algorithm R from section 3.4.2 of Knuth’s “The Art of Computer Programming” (Waterman’s “Reservoir Algorithm”) works well. Here it is in a very simplified version:

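import random

def random_line(afile):
    # Start with the first line as the current pick
    # (raises StopIteration if the file is empty).
    line = next(afile)
    # num counts from 2, so aline is the num-th line of the file.
    for num, aline in enumerate(afile, 2):
        # Nonzero with probability (num - 1)/num: keep the current pick.
        if random.randrange(num):
            continue
        # Zero with probability 1/num: the new line replaces the pick.
        line = aline
    return line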

The num, ... in enumerate(..., 2) iterator produces the sequence 2, 3, 4… so randrange(num) returns 0 with probability 1.0/num, and that is exactly the probability with which we must replace the currently selected line. This is the special case of sample size 1 of the referenced algorithm (see Knuth’s book for the proof of correctness), and of course we’re also in the case of a “reservoir” small enough to fit in memory ;-)
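A quick way to convince yourself without the book is to run the function many times over a small in-memory "file" and count how often each line comes back. The sketch below repeats random_line so it runs on its own; tally_picks, the four-line sample text, the trial count and the fixed seed are all made up for illustration and are not part of the original answer.

```python
import io
import random
from collections import Counter

def random_line(afile):
    # Same single-item reservoir as in the answer, repeated so this runs alone.
    line = next(afile)
    for num, aline in enumerate(afile, 2):
        if random.randrange(num):
            continue
        line = aline
    return line

def tally_picks(text, trials, seed=0):
    """Hypothetical helper: count how often each line of `text` is picked."""
    random.seed(seed)  # fixed seed so the tally is reproducible
    counts = Counter()
    for _ in range(trials):
        # io.StringIO iterates line by line, like an open text file.
        counts[random_line(io.StringIO(text))] += 1
    return counts

counts = tally_picks("a\nb\nc\nd\n", trials=40_000)
for line, hits in sorted(counts.items()):
    print(repr(line), hits)  # each count should land near 10000
```

With a real file the call would look like random_line(f) inside a with open(...) as f: block. Note that the returned line still carries its trailing newline.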
