sharding – Tarik Billa

Multiple consumers per Kinesis shard

January 6, 2024 by Tarik

MongoDB to Use Sharding with $lookup Aggregation Operator

December 26, 2023 by Tarik

As the docs you quote indicate, you can’t use $lookup on a sharded collection. So the best practice workaround is to perform the lookup yourself in a separate query. Perform your aggregate query. Pull the “localField” values from your query results into an array, possibly using Array#map. Perform a find query against the “from” collection, … Read more
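The steps listed in this excerpt can be sketched in application code. This is a minimal sketch under assumptions, not the answer's own code: the excerpt is cut off before it describes the find step, so the sketch assumes a single query that matches all collected values at once. The helper name `manual_lookup`, the `find_foreign` callback, and the sample collections are hypothetical, and an in-memory list stands in for the "from" collection so the example runs without a database.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable


def manual_lookup(
    local_docs: list[dict[str, Any]],
    local_field: str,
    foreign_field: str,
    as_field: str,
    find_foreign: Callable[[list[Any]], Iterable[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Join already-fetched documents to a second collection in application code."""
    # Pull the distinct "localField" values out of the first result set.
    keys = list({doc[local_field] for doc in local_docs if local_field in doc})

    # Run one query against the "from" collection for all of those values
    # and group the results by the field they are matched on.
    matches: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for foreign_doc in find_foreign(keys):
        matches[foreign_doc[foreign_field]].append(foreign_doc)

    # Attach the matching documents to each local document as a list.
    return [
        {**doc, as_field: matches.get(doc.get(local_field), [])}
        for doc in local_docs
    ]


# Hypothetical in-memory stand-in for the "from" collection.
authors = [{"_id": 1, "name": "Ann"}, {"_id": 2, "name": "Bob"}]


def find_authors(ids: list[Any]) -> list[dict[str, Any]]:
    # Assumption: with a real driver this would be a find with an $in filter,
    # for example db.authors.find({"_id": {"$in": ids}}) in pymongo.
    return [author for author in authors if author["_id"] in ids]


# Hypothetical results of the first (aggregate) query.
posts = [{"title": "A", "author_id": 1}, {"title": "B", "author_id": 3}]
joined = manual_lookup(posts, "author_id", "_id", "author", find_authors)
```

Documents without a match get an empty list, so callers can treat every result the same way.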

Are there any REAL advantages to NoSQL over RDBMS for structured data on one machine?

September 23, 2023 by Tarik

If you’re starting off on a single server, then many advantages of NoSQL go out the window. The biggest advantages of the most popular NoSQL databases are high availability and less downtime. Eventual consistency requirements can lead to performance improvements as well. It really depends on your needs. Document-based – If your data fits well … Read more

When do you start additional Elasticsearch nodes? [closed]

September 6, 2023 by Tarik

Let’s clarify the terminology a little first: Node: a running Elasticsearch instance (a Java process). Usually every node runs on its own machine. Cluster: one or more nodes with the same cluster name. Index: more or less like a database. Type: more or less like a database table. Shard: effectively a Lucene index. Every index … Read more

Extreme Sharding: One SQLite Database Per User

July 26, 2023 by Tarik

The place where this will fail is if you have to do what’s called “shard walking” – which is finding out all the data across a bunch of different users. That particular kind of “query” will have to be done programmatically, asking each of the SQLite databases in turn – and will very likely be … Read more
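A rough illustration of the "shard walking" described in this excerpt, assuming one SQLite file per user and a hypothetical `notes` table (neither the schema nor the file naming comes from the original post). The cross-user question cannot be answered by a single SQL statement, so the program opens each database in turn and combines the per-shard results itself.

```python
import sqlite3
import tempfile
from pathlib import Path


def make_user_db(directory, user, notes):
    """Create one per-user database file (hypothetical schema: a notes table)."""
    path = Path(directory) / f"{user}.sqlite"
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE notes (body TEXT)")
        conn.executemany("INSERT INTO notes (body) VALUES (?)", [(n,) for n in notes])
        conn.commit()
    finally:
        conn.close()
    return path


def count_notes_across_users(db_paths):
    """Walk every per-user database and sum the per-shard counts."""
    total = 0
    for path in db_paths:
        conn = sqlite3.connect(path)
        try:
            (count,) = conn.execute("SELECT COUNT(*) FROM notes").fetchone()
            total += count
        finally:
            conn.close()
    return total


with tempfile.TemporaryDirectory() as tmp:
    paths = [
        make_user_db(tmp, "alice", ["first note", "second note"]),
        make_user_db(tmp, "bob", ["only note"]),
    ]
    total_notes = count_notes_across_users(paths)
```

The work grows with the number of users, since every file has to be opened and queried separately.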

MySQL Partitioning / Sharding / Splitting – which way to go?

May 8, 2023 by Tarik

You will definitely start to run into issues on that 42 GB table once it no longer fits in memory. In fact, as soon as it does not fit in memory anymore, performance will degrade extremely quickly. One way to test is to put that table on another machine with less RAM and see how … Read more

Database partitioning – Horizontal vs Vertical – Difference between Normalization and Row Splitting?

April 24, 2023 by Tarik

Partitioning is a rather general concept and can be applied in many contexts. When it concerns the partitioning of relational data, it usually refers to decomposing your tables either row-wise (horizontally) or column-wise (vertically). Vertical partitioning, aka row splitting, uses the same splitting techniques as database normalization, but usually the term (vertical / horizontal) data … Read more
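To make the row-wise versus column-wise distinction concrete, here is a small sketch over an in-memory table. The sample rows, column names, and both function names are hypothetical; a real database would do this with its own partitioning or schema features rather than in application code.

```python
rows = [
    {"id": 1, "name": "Ann", "country": "DE", "bio": "Writes about databases"},
    {"id": 2, "name": "Bob", "country": "US", "bio": "Runs a search cluster"},
    {"id": 3, "name": "Cem", "country": "DE", "bio": "Maintains a sharded store"},
]


def partition_horizontally(table, column):
    """Row-wise split: every partition has all columns but only some rows."""
    partitions = {}
    for row in table:
        partitions.setdefault(row[column], []).append(row)
    return partitions


def partition_vertically(table, key, column_groups):
    """Column-wise split: every partition has all rows but only some columns.

    The key column is repeated in each partition so rows can be rejoined.
    """
    return [
        [{column: row[column] for column in (key, *group)} for row in table]
        for group in column_groups
    ]


by_country = partition_horizontally(rows, "country")
core, details = partition_vertically(rows, "id", [("name", "country"), ("bio",)])
```

Here `by_country["DE"]` holds two complete rows, while `core` and `details` each hold all three rows with a subset of the columns plus `id`.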

MongoDB querying performance for over 5 million records

January 31, 2023 by Tarik

This is searching for a needle in a haystack. We’d need some output of explain() for those queries that don’t perform well. Unfortunately, even that would fix the problem only for that particular query, so here’s a strategy on how to approach this: Ensure it’s not because of insufficient RAM and excessive paging. Enable the DB … Read more

MySQL sharding approaches?

January 9, 2023 by Tarik

The best approach to sharding MySQL tables is to not do it unless it is totally unavoidable. When you are writing an application, you usually want to do so in a way that maximizes velocity, that is, developer speed. You optimize for latency (time until the answer is ready) or throughput (number of answers per … Read more

ElasticSearch: Unassigned Shards, how to fix?

October 19, 2022 by Tarik

By default, Elasticsearch will re-assign shards to nodes dynamically. However, if you’ve disabled shard allocation (perhaps you did a rolling restart and forgot to re-enable it), you can re-enable shard allocation.

    # v0.90.x and earlier
    curl -XPUT 'localhost:9200/_settings' -d '{ "index.routing.allocation.disable_allocation": false }'

    # v1.0+
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient" : { "cluster.routing.allocation.enable" : … Read more