How can I improve this PHP/MySQL news feed?

Really cool question. I’m actually in the middle of implementing something like this myself. So, I’m going to think out loud a bit.

Here’s the flaws I see in my mind with your current implementation:

  1. You are processing all of the friends for all users, but you will end up processing the same users many times due to the fact that the same groups of people have similar friends.

  2. If one of my friends posts something, it won’t show up on my news feed for at most 5 minutes. Whereas it should show up immediately, right?

  3. We are reading the entire news feed for a user. Don’t we just need to grab the new activities since the last time we crunched the logs?

  4. This doesn’t scale that well.

The newsfeed looks like the exact same data as the activity log, I would stick with that one activity log table.

If you shard your activity logs across databases, it will allow you to scale easier. You can shard your users if you wish as well, but even if you have 10 million user records in one table, mysql should be fine doing reads. So whenever you lookup a user, you know which shard to access the user’s logs from. If you archive your older logs every so often and only maintain a fresh set of logs, you won’t have to shard as much. Or maybe even at all. You can manage many millions of records in MySQL if you are tuned even moderately well.

I would leverage memcached for your users table and possibly even the logs themselves. Memcached allows cache entries up to 1mb in size, and if you were smart in organizing your keys you could potentially retrieve all of the most recent logs from the cache.

This would be more work as far as architecture is concerned, but it will allow you to work in real-time and scale out in the future…especially when you want users to start commenting on each posting. 😉

Did you see this article?

http://bret.appspot.com/entry/how-friendfeed-uses-mysql

Leave a Comment