I’m not a specialist, but a read/write master database with read-only slaves is a common pattern, especially for large, read-heavy applications and for data warehouses:
- it lets you scale reads (you add more read-only slaves if required)
- it lets you tune each database differently (for either efficient reads or efficient writes)
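To make the split concrete, here is a minimal sketch of application-level read/write routing. The connection strings and the `ReadWriteRouter` class are hypothetical, and the SQL classification is deliberately naive (anything that isn’t a `SELECT` goes to the master); real routers handle transactions and stickiness too.

```python
import itertools

# Hypothetical connection strings -- in a real application these
# would be DSNs for your actual master and replica hosts.
MASTER_DSN = "db://master:5432/app"
SLAVE_DSNS = ["db://slave1:5432/app", "db://slave2:5432/app"]

class ReadWriteRouter:
    """Route writes to the master, reads round-robin across slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = itertools.cycle(slaves)

    def dsn_for(self, sql):
        # Naive classification: only plain SELECTs go to a slave,
        # everything else (INSERT/UPDATE/DELETE/DDL) to the master.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self.slaves)
        return self.master

router = ReadWriteRouter(MASTER_DSN, SLAVE_DSNS)
print(router.dsn_for("SELECT * FROM users"))          # one of the slaves
print(router.dsn_for("UPDATE users SET active = 1"))  # the master
```

Frameworks often provide hooks for exactly this (e.g. Django’s database routers), so you rarely have to write the dispatch by hand.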
What would be a good resource to find out more about this architecture?
There are good resources available on the Internet. For example:
- Highscalability.com has good examples (e.g. Wikimedia architecture, the master-slave category,…)
- Handling Data in Mega Scale Systems (starting from slide 29)
- MySQL Scale-Out approach for better performance and scalability as a key factor for Wikipedia’s growth
- Chapter 24. High Availability and Load Balancing in PostgreSQL documentation
- Chapter 16. Replication in MySQL documentation
- http://www.google.com/search?q=read%2Fwrite+master+database+and+read-only+slaves
Is it just a question of replicating between two identical schemas, or would the schemas differ depending on the operations? Would normalisation vary too?
I’m not sure – I’m eager to read answers from experts – but I think the schemas are identical in traditional replication scenarios (though the tuning may differ). People may do more exotic things, but I wonder whether they rely on database replication in that case; it sounds more like “real-time ETL”.
How do you ensure that data written to one database is immediately available for reading from the second?
I guess you would need synchronous replication for that (which is of course slower than asynchronous). Some databases support this mode, but not all do, AFAIK. Have a look at this answer, or this one for SQL Server.
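When synchronous replication isn’t available (or is too slow), a common application-level workaround is “read your own writes”: pin a session’s reads to the master for a short grace period after each write, so a just-written row is never read from a slave that may not have it yet. A minimal sketch, assuming a hypothetical 1-second upper bound on replication lag:

```python
import time

class ReadYourWritesRouter:
    """Pin reads to the master shortly after a write.

    With asynchronous replication, a freshly written row may not have
    reached the slaves yet. After each write we remember the time, and
    for PIN_SECONDS afterwards we answer reads from the master.
    """

    PIN_SECONDS = 1.0  # assumed upper bound on replication lag

    def __init__(self):
        # -inf means "no write has happened yet", so reads start on a slave.
        self.last_write = float("-inf")

    def record_write(self):
        self.last_write = time.monotonic()

    def read_target(self):
        if time.monotonic() - self.last_write < self.PIN_SECONDS:
            return "master"  # replicas may still be stale
        return "slave"

session = ReadYourWritesRouter()
session.read_target()    # "slave": no recent write
session.record_write()   # e.g. after an UPDATE
session.read_target()    # "master": inside the pin window
```

The pin window trades a little read scalability for consistency; per-session tracking (rather than a global flag) keeps the master from absorbing everyone’s reads after any single write.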