Performance of Redis vs Disk in caching application

This is an apples to oranges comparison.
See http://redis.io/topics/benchmarks

Redis is an efficient remote data store. Each time a command is executed on Redis, a message is sent to the Redis server, and if the client is synchronous, it blocks waiting for the reply. So beyond the cost of the command itself, you will pay for a network roundtrip or an IPC.

On modern hardware, network roundtrips or IPCs are surprisingly expensive compared to other operations. This is due to several factors:

  • the raw latency of the medium (mainly for network)
  • the latency of the operating system scheduler (not guaranteed on Linux/Unix)
  • memory cache misses are expensive, and the probability of cache misses increases while the client and server processes are scheduled in/out.
  • on high-end boxes, NUMA side effects
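As a rough illustration of how expensive a synchronous roundtrip is compared to an in-process operation, here is a hedged sketch that uses a Unix socketpair as a stand-in for the client/server hop (no Redis server needed; absolute numbers vary by machine, but the gap is typically orders of magnitude):

```python
import socket
import threading
import time

def echo_server(sock, n):
    # Echo n one-byte messages back to the client.
    for _ in range(n):
        data = sock.recv(1)
        sock.sendall(data)

N = 10_000
client, server = socket.socketpair()
t = threading.Thread(target=echo_server, args=(server, N))
t.start()

# Cost of N synchronous IPC roundtrips (send, block, receive).
start = time.perf_counter()
for _ in range(N):
    client.sendall(b"x")
    client.recv(1)
ipc_elapsed = time.perf_counter() - start
t.join()

# Cost of N in-process dictionary lookups, for comparison.
cache = {"key": "value"}
start = time.perf_counter()
for _ in range(N):
    cache["key"]
local_elapsed = time.perf_counter() - start

print(f"IPC roundtrip: {ipc_elapsed / N * 1e6:.1f} us/op")
print(f"Local lookup:  {local_elapsed / N * 1e6:.1f} us/op")
```

Each iteration of the IPC loop blocks until the reply arrives, exactly like a synchronous Redis client, so the scheduler and cache-miss costs listed above are all included in the measured time.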

Now, let’s review the results.

The generator-based implementation and the function-based one do not issue the same number of roundtrips to Redis. With the generator you simply have:

    while time.time() - t - expiry < 0:
        yield r.get(fpKey)

So 1 roundtrip per iteration. With the function, you have:

    if r.exists(fpKey):
        return r.get(fpKey)

So 2 roundtrips per iteration. No wonder the generator is faster.
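The EXISTS/GET pair can be collapsed into a single roundtrip by calling GET alone and treating a None reply as a miss. A minimal sketch, using a hypothetical in-memory stub in place of the real redis-py client so the roundtrip count can be observed without a server:

```python
class StubRedis:
    # Hypothetical stand-in for redis.Redis that counts roundtrips.
    def __init__(self, data):
        self.data = data
        self.roundtrips = 0

    def exists(self, key):
        self.roundtrips += 1
        return key in self.data

    def get(self, key):
        self.roundtrips += 1
        return self.data.get(key)

r = StubRedis({"fpKey": b"cached-value"})

# Two roundtrips per hit: EXISTS, then GET.
def fetch_two_trips(key):
    if r.exists(key):
        return r.get(key)
    return None

# One roundtrip per hit: GET alone; None means a miss.
def fetch_one_trip(key):
    return r.get(key)

fetch_two_trips("fpKey")
two = r.roundtrips
r.roundtrips = 0
fetch_one_trip("fpKey")
one = r.roundtrips
print(two, one)  # 2 1
```

The same pattern works against a real Redis connection, since GET on a missing key returns None rather than raising.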

Of course you are supposed to reuse the same Redis connection for optimal performance. There is no point in running a benchmark that systematically connects/disconnects.
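The connect/disconnect overhead can be shown without Redis by timing a plain TCP echo loop; this is a hedged sketch in which a local echo server stands in for the Redis server, comparing one connection per command against a single reused connection:

```python
import socket
import threading
import time

HOST, N = "127.0.0.1", 200

def serve(listener):
    # Accept connections serially and echo one byte per request.
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener closed, stop serving
        with conn:
            while (data := conn.recv(1)):
                conn.sendall(data)

listener = socket.create_server((HOST, 0))
port = listener.getsockname()[1]
threading.Thread(target=serve, args=(listener,), daemon=True).start()

# Reconnect for every command: pay TCP setup/teardown each time.
start = time.perf_counter()
for _ in range(N):
    with socket.create_connection((HOST, port)) as s:
        s.sendall(b"x")
        s.recv(1)
reconnect_elapsed = time.perf_counter() - start

# Reuse one connection for all commands.
start = time.perf_counter()
with socket.create_connection((HOST, port)) as s:
    for _ in range(N):
        s.sendall(b"x")
        s.recv(1)
reuse_elapsed = time.perf_counter() - start
listener.close()

print(f"reconnect: {reconnect_elapsed:.3f}s  reuse: {reuse_elapsed:.3f}s")
```

With redis-py the idiomatic equivalent is to create one client (it keeps a connection pool internally) and reuse it for the whole benchmark, rather than constructing a new client per call.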

Finally, regarding the performance difference between Redis calls and the file reads, you are simply comparing a local call to a remote one. File reads are cached by the OS filesystem, so they are fast memory transfer operations between the kernel and Python. There is no disk I/O involved here. With Redis, you have to pay for the cost of the roundtrips, so it is much slower.
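To see that such file reads are served from the OS page cache rather than the disk, it is enough to read the same file twice: the second read is a pure memory transfer. A minimal sketch (the temporary file and 1 MiB payload size are arbitrary choices):

```python
import os
import tempfile
import time

# Write a small payload to a temporary file.
payload = b"x" * (1 << 20)  # 1 MiB
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload)

def read_file(p):
    with open(p, "rb") as f:
        return f.read()

# First read may touch the disk; it also populates the page cache.
start = time.perf_counter()
first = read_file(path)
t_first = time.perf_counter() - start

# Second read is a memory copy out of the kernel page cache:
# no disk I/O, no network roundtrip.
start = time.perf_counter()
second = read_file(path)
t_second = time.perf_counter() - start

os.remove(path)
print(f"first: {t_first * 1e3:.2f} ms  second: {t_second * 1e3:.2f} ms")
```

This is why the file-based version looks so fast in the benchmark: it is a local syscall against cached memory, while every Redis call carries a full roundtrip on top of the command itself.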
