Database Normalization Detailed , Merits and Demerits




Normalization Concepts

Normalization is a technique for avoiding potential update anomalies, basically by minimizing redundant data in a logical database design. Normalized designs are in a sense “better” designs because they (ideally) keep each data item in only one place. Normalized database designs usually reduce update processing costs but can make query processing more complicated. These trade-offs must be carefully evaluated in terms of the required performance profile of a database. Often, a database design needs to be denormalized to adequately meet operational needs.

Normalizing a logical database design involves a set of formal processes to separate the data into multiple, related tables. The result of each process is referred to as a normal form. Five normal forms have been identified in theory, but most of the time third normal form (3NF) is as far as you need to go in practice. To be in 3NF, a relation (the formal termfor what SQL calls a table and the precise concept on which the mathematical theory of normalization rests) must already be in second normal form(2NF), and 2NF requires a relation to be in first normal form (1NF). Let’s look briefly at what these normal forms mean.

First Normal Form (1NF) In first normal form, all column values are scalar; in other words, they have a single value that can’t be further decomposed in terms of the data model. For example, although individual characters of a string can be accessed through a procedure that decomposes the string, only the full string is accessible by name in SQL, so, as far as the data model is concerned, they aren’t part of the model. Likewise, for a Managers table with a manager column and a column containing a list of employees in Employees table who work for a given manager, the manager and the list would be accessible by name, but the individual employees in the list wouldn’t be. All relations—and SQL tables—are by definition in 1NF since the lowest level of accessibility (known as the table’s granularity) is the column level, and column values are scalars in SQL.

Second Normal Form (2NF) Second normal form requires that attributes (the formal term for SQL columns) that aren’t parts of keys be functionally dependent on a key that uniquely identifies them. Functional dependence basically means that for a given key value, only one value exists in a table for a column or set of columns. For example, if a table contained employees and their titles, and more than one employee could have the same title (very likely), a key that uniquely identified employees wouldn’t uniquely identify titles, so the titles wouldn’t be functionally dependent on a key of the table. To put the table into 2NF,

you’d create a separate table for titles—with its own unique key—and replace the title in the original table with a foreign key to the new table. Note how this reduces data redundancy. The titles themselves now appear only once in the database. Only their keys appear in other tables, and key data isn’t considered redundant (though, of course, it requires columns in other tables and data storage).

Third Normal Form (3NF) Third normal formextends the concept of functional dependence to full functional dependence. Essentially, this means that all nonkey columns in a table are uniquely identified by the whole, not just part of, the primary key. For example, if you revised the hypothetical 1NF Managers-Employees table to have three columns (ManagerName, EmployeeId, and EmployeeName) instead of two, and you defined the composite primary key asManagerName + EmployeeId, the table would be in 2NF (since EmployeeName, the nonkey column, is dependent on the primary key), but it wouldn’t be in 3NF (since EmployeeName is uniquely identified by part of the primary key defined as column named EmployeeId). Creating a separate table for employees and removing EmployeeName from Managers-Employees would put the table into 3NF. Note that even though this table is now normalized to 3NF, the

database design is still not as normalized as it should be. Creating another table for managers using an ID shorter than the manager’s name, though not required for normalization here, is definitely a better approach and is probably advisable for a real-world database.

Drawbacks of Normalization

Database design is an art more than a technology, and applying normalization wisely is always important. On the other hand, normalization inherently increases the number of tables and therefore the number of operations (called joins) required to retrieve data. Because data is not in one table, queries that have a complex join can slow things down. This can cost in the form of CPU usage: the more complex the queries, the more CPU time is required.

Denormalizing one or more tables, by intentionally providing redundant data to reduce

the number or complexity of joins to get quicker query response times, may be necessary. With either normalization or denormalization, the goal is to control redundancy so that the database design adequately (and ideally, optimally) supports the actual use of the database.


Source : Beginning VB 2008 Vidya Vrat Agarwal
and James Huddleston


Please enter your comment!
Please enter your name here