Database design and database normalization
A good database design is vital for a client/server application. It is important to think about the design of the tables among each other to optimize data storage, i.e. in which table should each quantity of information be placed, and how this table should be linked to the information in other tables. The normalization process helps here as it avoids double data storage as well as unnecessary wastage of space; data access becomes considerably more efficient, at the same time improving database performance and data integrity. Special business problems in the database can be solved with the aid of database design; for example, they enable typical relationships between master and detail tables.
Relational databases work best when data is broken up into different tables that are joined together on common columns. This design results in narrower, longer tables, where the primary key is used to access the data, and indices are used to speed this process.
Database models are generally designed to solve specific business problems: they allow typical business data relationships to be represented. This is particularly important, for example, when many detail rows need to be joined to one master row. This is most often done by splitting the data into two or more tables and joining them on a shared column. When data is represented in this way, some duplication is unavoidable. There are always columns that must appear in each table in order to actually create the join. However database models allow you to minimize unnecessary duplication.
These models also ensure that if a value is updated in one table, the matching values are updated in related tables, known as referential integrity.
The IBExpert Database Designer is an ideal tool for data modeling and design, whether creating a model of an existing database for analysis, or designing a new database.
Database normalization
The goal of normalization is to reduce redundant information. In other words, only store one piece of information one time. A table is said to have repeating groups and to be un-normalized if:
- it contains many repetitions of the same piece of information in the same column
- more than one column contains almost the same type of information
- a column consists of complex information that should be broken into several smaller pieces.
Tables without repetitive values are described as normalized. The transition from one design to the other is called normalization.
Five forms of normalization can be differentiated. The first four normalization forms will be described very briefly here, the fifth being an extremely theoretical demand on tables. There is a wide range of specialist literature available on this subject, for those requiring more in-depth information.
Rule zero
The relational theory requires, as a rule, a unique key in each table, in order to identify information clearly. This is composed from the three following:
- The table, in which the data is stored,
- The field in this table, which needs to be accessed,
- The value of the primary key for this data set.
It is clear that the primary key is important for the identification of a data set. At the same time Firebird/InterBase® automatically creates an index via the primary key, so that searches in multi-table queries are much quicker than those without an index.
A table has only one primary key, although the primary key can consist of several columns. So, a simple rule for normalizing databases is - always key your tables!
First normal norm
The first rule of database design states: eliminate repetitive groups. For each group of related columns, make a separate table and give that table a primary key.
A table is said to be in first normal form if all columns contain atomic (i.e. indivisible) values only. This is another way of saying that there are no repeating groups.
First normal form problems
INSERT
anomalies (e.g. certain master data cannot be recorded until an order or sale is placed), UPDATE
anomalies (it is too easy to miss certain entries when updating) and DELETE
anomalies (whole records disappear from the database, including master data).
Second normal norm
The second rule of database design is: If a table column is only dependent upon part of a multicolumn key, this column should be removed to a separate table.
For a table in the second normal form, it must already be in the first normal form, and all non-key-column contents must be dependent upon the complete primary key. The second normal form avoids double storage of information. Tables become narrower, the more the database is normalized, with less duplication of wide column values. Where duplication is unavoidable, it can be made as small as possible by using an ID number.
Second normal form problems
There are no repetitive groups, and all columns are dependent on their table's primary key. However some irregularities can still be found; from the relational viewpoint, certain fields may have no relationship to each other, e.g. a customer telephone number has nothing to do with an order number. It is a customer feature, not an order feature, and leads to storage of redundant data. For this reason, it makes sense to remove this information to a separate table.
Third normal norm
The third normal form is tantamount to the second normal form, as it is also aimed to avoid UDATE
, DELETE
and INSERT
problems. It is mainly concerned with relationships in tables with a single column primary key.
The rule can be defined: when column contents have no connection to the table's primary key, they should be removed to a separate table.
A table is in the third normal form, when each column describes data corresponding to the primary key.
Most operations are carried out on key fields, ensuring a high performance. Details are maintained in their own tables, secure from UPDATE
, DELETE
, and INSERT
anomalies.
Fourth normal norm
The majority of applications need go no further than the third normal form. There are however certain situations, in which the data segmentation needs to be refined. For example, each sales team order needs to be assigned to the sales person responsible, for a planned monthly sales per person summary. Where should this information be stored? A simple solution is to expand the relevant table to include the field SalesContact
.
The problem becomes clear, when it is considered that often more than one call was necessary to result in one sale. The fourth normal form rule is: isolate independent multiple relationships.
There are one or more calls leading to each order. The order position information has nothing to do with the telephone calls made. Therefore the call information is removed to its own table, to ensure that, here also, the independence of information in each table is warranted.
See also:
Keys
Declarative referential integrity versus triggers
back to top of page
<< Firebird for the database expert: episode 6: Why can't I shrink my databases | Database technology articles | Enterprise-wide data model >>