I am currently working on an application which references the Microsoft Business Contact Manager SQL Server database add-on for Outlook. The main Contact table in the db has several sub tables which have identical keys to the main table and a small number of columns; for example, an email address table which has 3 columns: contactID, EmailAddress, and EmailDisplayAs. Because of this approach, you need to use a view with multiple outer joins just to see all the columns which make up each Contact record. I have never seen a database with this kind of structure; it seems messy to say the least. I would be interested to see comments on why one would take this approach.
Basically there are two general approaches to optional data:
- Include them in the tablea s nullable (ie optional) columns; or
- Use one-to-one relationships to child tables that contain logical groupings of attributes.
Personally I think that unless you’re dealing with an extremely large number of optional attributes you should favour nullable columns. The overhead is so low it’s negligible–particulalry when compared to doing lots of joins. Plus if you have child tables via one-to-one relationships you’re storing the primary key again (as a foreign key in the child table) so any storage overhead of nullable columns is in fact less.
But anyway, that’s all that is.
Some reasons could be:
- Most of the queries only run against the initial set of data, making for faster queries in general
- The additional tables were added after the initial development and an ALTER TABLE command may have been prohibitively expensive
- There was an initial intention to make contacts of one specific type or another but descended from a parent type
- There was a need for multiple child rows for a given parent (though this would be in the case of an indexed key as opposed to a unique/primary key, so it might not fly)
I’ve seen systems where a structure like this would actually speed things up a bit because of the smaller queries and the smaller associated tables being kept in a memory cache at the logic layer).
This sounds like basic normalization to me. Contact is a parent entity, and has a number of attributes that make up one-to-many relationships. Novice designers sometimes store these in the same table, and you get columns like phone1, phone2, phone3, email1, email2, etc.
Normalization allows for efficient storage of information, and provides a structure whereby you can get back to the multiple attribute record (the view in the question).
So I would expect to see a contact table(with a primary key of contactId), and a phone table(with a foreign key of contactId) and an email table(with a foreign key of contactId).
If you are not familiar with normalization, primary keys and foreign keys I recommend you pick up a book like Database Design for Mere Mortals.
This is probably either good normalization, or an optimization for faster queries (smaller table width means more rows fit into meory at one time means faster queries), as others have noted.
You’ll still have to insert/update to multiple tables, but if you want an easier way to select the data, write a view that uses left outer joins from the main table to join up all the data.
A design with multiple tables sharing a common key sometimes permits data to be omitted without using the NULL marker. For example, if a contact in the Outlook table has no known email address, it’s possible to simply omit a row from the table you mentioned.
While the design may seem messy to you, keeping NULL markers out of the base tables avoids a number of messes cuased by the three value logic used by SQL in the presence of NULLS. This is sometimes called 6NF.
If you need to see them all as one table joined by outer joins, you can make a view. I’m a little surprised that there isn’t such a view built in.