Normalization is the process of efficiently organizing data in a
database. There are two goals of the normalization process:
eliminating
redundant data (
for example, storing the same data in more than one table) and
ensuring data dependencies
make sense (
only storing related data in a table). Both of these are
worthy goals as they reduce the amount of space a database consumes and
ensure that data is logically stored.
The Normal Forms
The database community has developed a series
of guidelines for ensuring that databases are normalized. These are
referred to as normal forms and are numbered from one (the lowest form
of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF).
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
-
Second Normal Form (2NF)
- Second normal form (2NF) further addresses the concept of removing duplicative data:
- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
- Meet all the requirements of the second normal form.
- Remove columns that are not dependent upon the primary key.
- Imagine that we have a table of widget orders that contains the following attributes:
- Order Number
- Customer Number
- Unit Price
- Quantity
- Total
Remember, our first requirement is that the table must satisfy the
requirements of 1NF and 2NF. Are there any duplicative columns? No.
Do we have a primary key? Yes, the order number. Therefore, we satisfy
the requirements of 1NF. Are there any subsets of data that apply to
multiple rows? No, so we also satisfy the requirements of 2NF.
Now, are all of the columns fully dependent upon the primary key? The
customer number varies with the order number and it doesn't appear to
depend upon any of the other fields. What about the unit price? This
field could be dependent upon the customer number in a situation where
we charged each customer a set price. However, looking at the data
above, it appears we sometimes charge the same customer different
prices. Therefore, the unit price is fully dependent upon the order
number. The quantity of items also varies from order to order, so we're
OK there.
What about the total? It looks like we might be in trouble here.
The total can be derived by multiplying the unit price by the quantity,
therefore it's not fully dependent upon the primary key. We must remove
it from the table to comply with the third normal form. Perhaps we use
the following attributes:
- Order Number
- Customer Number
- Unit Price
- Quantity
Now our table is in 3NF. But, you might ask, what about the total?
This is a derived field and it's best not to store it in the database at
all. We can simply compute it "on the fly" when performing database
queries. For example, we might have previously used this query to
retrieve order numbers and totals:
SELECT OrderNumber, Total
FROM WidgetOrders
We can now use the following query:
SELECT OrderNumber, UnitPrice * Quantity AS Total
FROM WidgetOrders
Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one more requirement:
- Meet all the requirements of the third normal form.
- Every determinant must be a candidate key.
Examples:
Conisder a database table that stores
employee information and has the attributes employee_id, first_name,
last_name, title. In this table, the field employee_id determines
first_name and last_name. Similarly, the tuple (first_name, last_name)
determines employee_id.
Examples with Explanation:
1. 1NF (Eliminating Repeating Groups)
• Make a separate table for each set of related attributes and give each table a primary key.
In simple words I can say it as
• There are no repeating or duplicate fields.
• Each cell contains only a single value.
• Each record is unique and identified by primary key
Let’s see the example below :
Prior to Normalization
Item Colors Price Tax
Pen red, blue 2.0 0.20
Scale red, yellow 2.0 0.20
Pen red, blue 2.0 0.20
Bag blue, black 150.00 7.80
This table is not in first normal form because:
• A. There are multiple fields in color lab.
• B. Records are repeating (Duplicate records) or no primary key.
First Normal Form (1NF)
Item Colors Price Tax
Pen red 2.0 0.20
Pen blue 2.0 0.20
Scale red 2.0 0.20
Scale yellow 2.0 0.20
Bag blue 150.00 7.80
Bag black 150.00 7.80
• This table is now in first normal form.
2.
2 NF (Eliminating Redundant Data)
• If an attribute is dependent on only part of the multivalued key, then remove it to a separate table.
In simple words,
• It should meet all the requirements of the first normal form.
. It should remove subsets of data that apply to multiple rows of a table and place them in separate tables.
. It create relationships between these new tables and their predecessors through the use of foreign keys.
The First Normal form deals with the atomicity whereas the Second Normal
Form deals with the relationship between the composite key columns and
non-key columns. To achieve the next progressive level your table should
satisfy the requirement of First Normal Form then move towards the
Second Normal Form.
Let’s introduce a Review table as an example :
Item Colors Price Tax
Pen red 2.0 0.20
Pen blue 2.0 0.20
Scale red 2.0 0.20
Scale yellow 2.0 0.20
Bag blue 150.00 7.80
Bag black 150.00 7.80
Table is not in Second Normal Form because the price and tax depends on the item, but not color.
Item Colors
Pen red
Pen blue
Scale red
Scale yellow
Bag blue
Bag black
Item Price Tax
Pen 2.0 0.20
Scale 2.0 0.20
Bag 150.00 7.80
Tables are now in Second Normal Form.
3. 3NF (eliminate Columns not dependent on the Key)
• If attributes do not contribute description of the key,
then remove it to the separate table. All the attributes must be
directly dependent on the primary key. In simple words,
• It should meet all the requirements of the second normal form.
• It should remove columns that are not dependent upon the primary key.
In the Third Normal Form all columns depend upon the primary
key. When one column depends upon the other column, tables break the
rule and turn into the dependency on the primary key.
Item Colors
Pen red
Pen blue
Scale red
Scale yellow
Bag blue
Bag black
Item Price Tax
Pen 2.0 0.20
Scale 2.0 0.20
Bag 150.00 7.80
Tables are not in Second Normal Form because tax depends on price, not item.
Item Colors
Pen red
Pen blue
Scale red
Scale yellow
Bag blue
Bag black
Item Price
Pen 2.0
Scale 2.0
Bag 150.00
Price Tax
2.0 0.20
150.00 7.80
Tables are now in Third Normal Form.