Overnormalization

In the general sense, I'd say a schema is overnormalized when you're doing so many JOINs to retrieve data that it causes noticeable performance penalties and deadlocks on your database, even after you've tuned the heck out of your indexes. Obviously, for huge applications and sites like MySpace or eBay, denormalization is a scaling requirement.
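As a rough illustration (hypothetical tables and columns, not from any real schema), here is the kind of join chain that starts to hurt, next to the denormalized read that replaces it by copying the shipping city and state onto the order row at write time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states    (id INTEGER PRIMARY KEY, abbrev TEXT);
CREATE TABLE cities    (id INTEGER PRIMARY KEY, name TEXT, state_id INTEGER REFERENCES states(id));
CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT, city_id INTEGER REFERENCES cities(id));
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER REFERENCES addresses(id));
-- ship_city / ship_state are denormalized copies maintained at write time.
CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id),
                        ship_city TEXT, ship_state TEXT);

INSERT INTO states    VALUES (1, 'WA');
INSERT INTO cities    VALUES (1, 'Seattle', 1);
INSERT INTO addresses VALUES (1, '100 Main St', 1);
INSERT INTO customers VALUES (1, 'Acme Corp', 1);
INSERT INTO orders    VALUES (1, 1, 'Seattle', 'WA');
""")

# Overnormalized read: four joins just to answer "where does this order ship?"
print(conn.execute("""
    SELECT ci.name, st.abbrev
    FROM orders o
    JOIN customers c  ON c.id  = o.customer_id
    JOIN addresses a  ON a.id  = c.address_id
    JOIN cities    ci ON ci.id = a.city_id
    JOIN states    st ON st.id = ci.state_id
    WHERE o.id = 1
""").fetchone())

# Denormalized read: the same answer from a single-table lookup, at the cost
# of copying (and keeping in sync) city/state on every order row.
print(conn.execute(
    "SELECT ship_city, ship_state FROM orders WHERE id = 1"
).fetchone())
```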

As a developer for several small businesses, I can tell you that in my experience it has always been easier to go from normalized to denormalized than the other way around. Going back later, to remove the duplicated data once the business requirements have changed a year or so down the road, is much more difficult, because by then the copies have to be reconciled and deduplicated.

When I read general statements such as "you should put the address in your customers table instead of a separate address table so you can avoid the join", I shudder, because you just know that a year from now somebody is going to ask you to do something with addresses that you totally didn't foresee, like maintaining an audit trail or storing multiple addresses per customer. If your database supports indexed views, you can keep the normalized schema and still avoid paying for the join at read time, which sidesteps the issue until your dataset is so large that it can no longer live on, or be served by, a single server or set of servers in a one-write, many-read environment. For most of us, I don't think that scenario happens very often.
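To make that concern concrete, here is a minimal sketch (hypothetical table and column names) of keeping addresses in their own table, so that "multiple addresses per customer" and a simple audit trail are just new rows rather than a schema rewrite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

-- One row per address per customer; address_type plus valid_from/valid_to
-- let the same table cover multiple addresses and a basic history.
CREATE TABLE customer_addresses (
    id           INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customers(id),
    address_type TEXT    NOT NULL DEFAULT 'billing',  -- e.g. billing/shipping
    street       TEXT    NOT NULL,
    city         TEXT    NOT NULL,
    valid_from   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    valid_to     TEXT                                 -- NULL = current address
);

INSERT INTO customers VALUES (1, 'Acme Corp');
INSERT INTO customer_addresses (customer_id, address_type, street, city)
VALUES (1, 'billing',  '100 Main St', 'Seattle'),
       (1, 'shipping', '200 Dock Rd', 'Tacoma');
""")

# Current addresses for a customer: one join, and no schema change needed when
# the business later asks for more address types or for address history.
for row in conn.execute("""
    SELECT c.name, a.address_type, a.street, a.city
    FROM customers c
    JOIN customer_addresses a ON a.customer_id = c.id
    WHERE a.valid_to IS NULL
"""):
    print(row)
```

On engines that persist and index a view's result (SQL Server indexed views, or materialized views elsewhere), that joined read can also be served without paying for the join on every query.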

When in doubt, I aim for third normal form, with some exceptions (for example, letting a field hold a comma-separated list of strings because I know I'll never need to query the data from the other angle). When I need to consolidate for performance, I'll look at my views or indexes first. Hope this helps.
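As a small sketch of that exception (hypothetical tags column, not from the original post): a comma-separated list is tolerable as long as you only ever read it back whole, and it becomes a child table the moment you need to query from the other direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The CSV exception: tags are only ever displayed alongside the article,
-- never searched or aggregated, so a denormalized list column is acceptable.
CREATE TABLE articles (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    tags  TEXT NOT NULL DEFAULT ''   -- e.g. 'sql,normalization,performance'
);

INSERT INTO articles VALUES (1, 'Overnormalization', 'sql,normalization,performance');
""")

title, tags = conn.execute("SELECT title, tags FROM articles WHERE id = 1").fetchone()
print(title, "->", tags.split(","))  # always read back whole, then split in code

# The moment you need "all articles tagged 'sql'", promote tags to 3NF:
# CREATE TABLE article_tags (article_id INTEGER REFERENCES articles(id), tag TEXT);
# ...and query it with a join plus an index on (tag).
```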
