System design: Strategies for dealing with heavy writes to a DB

Question

I’d say a solution will be highly dependent of what exactly you need to do. A solution to write thousands of records per second might be very different from incrementing a counter in the example you provided. More so, there could be no tables at all to handle such load. Consistency/availability requirements are also missing in your question and depending on them the entire architecture may be very different.

Anyway, back to your specific simplistic case and your options

Option 1 (Master slave replication)

The problem you’ll face here is database locking – every increment would require a record lock to avoid race conditions and you’ll quickly get your processes writing to your db waiting in a queue and your system down. Even under a moderate load )

Option 2 (Sharding the DB)

Your assumption is correct, not much different from p.1

Option 3 (Batched updates)

Very close. A caching layer provided by a light-weight storage providing concurrent atomic incremens/decrements with persistence not to lose your data. We’ve used redis for a similar purpose although any other key-value database would do as well – there are literally dozens of such databases around.

A key-value database, or key-value store, is a data storage paradigm
designed for storing, retrieving, and managing associative arrays, a
data structure more commonly known today as a dictionary or hash table

The solution would look as follows:

incoming requests → your backend server -> kv_storage (atomic increment(product_id))

And you’ll have a “flushing” script running i.e. */5 that does the following (simplified):

for every product_id in kv_storage read its current value
update your db counter (+= value)
decrement the value in kv_storage

Further scaling

if the script fails nothing bad would happen – the updates would arrive on next run
if your backend boxes can’t handle load – you can easily add more boxes
if a single key-value db can’t handle load – most of them support scaling over multiple boxes or a simple sharding strategy in your backend scripts would work fine
if a single “flushing” script doesn’t keep up with increments – you can scale them to multiple boxes and decide what key ranges are handled by each one

Option 1 (Master slave replication)

Option 2 (Sharding the DB)

Option 3 (Batched updates)

Further scaling

Leave a Comment Cancel reply