Looking for large text files for testing compression in all sizes

*** Linux users only ***

Arbitrarily large text files can be generated on Linux with the following command:

tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w100|head -n 100000 > bigfile.txt

This command will generate a text file that will contain 100,000 lines of random text and look like this:

NsQlhbisDW5JVlLSaZVtCLSUUrkBijbkc5f9gFFscDkoGnN0J6GgIFqdCLyhbdWLHxRVY8IwDCrWF555JeY0yD0GtgH21NotZAEe
iWJR1A4 bxqq9VKKAzMJ0tW7TCOqNtMzVtPB6NrtCIg8NSmhrO7QjNcOzi4N b VGc0HB5HMNXdyEoWroU464ChM5R Lqdsm3iPo
1mz0cPKqobhjDYkvRs5LZO8n92GxEKGeCtt oX53Qu6T7O2E9nJLKoUeJI6Ul7keLsNGI2BC55qs7fhqW8eFDsGsLPaImF7kFJiz
...
...

On my Ubuntu 18 its size it about 10MB. Bumping up the number of lines, and thereby bumping up the size, is easy. Just increase the head -n 100000 part. So, say, this command:

tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w100|head -n 1000000 > bigfile.txt

will generate a file with 1,000,000 of random lines of text and be around 100MB. On my commodity hardware the latter command takes about 3 seconds to finish.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)