Monday, May 13, 2013

Normalization in SQL Server

                       Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The Normal Forms

                        The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF).

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:
  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
  •  
  • Second Normal Form (2NF)

  • Second normal form (2NF) further addresses the concept of removing duplicative data:
  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:
  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.
  • Imagine that we have a table of widget orders that contains the following attributes:
    • Order Number
    • Customer Number
    • Unit Price
    • Quantity
    • Total
 Remember, our first requirement is that the table must satisfy the requirements of 1NF and 2NF. Are there any duplicative columns? No. Do we have a primary key? Yes, the order number. Therefore, we satisfy the requirements of 1NF. Are there any subsets of data that apply to multiple rows? No, so we also satisfy the requirements of 2NF.

Now, are all of the columns fully dependent upon the primary key? The customer number varies with the order number and it doesn't appear to depend upon any of the other fields. What about the unit price? This field could be dependent upon the customer number in a situation where we charged each customer a set price. However, looking at the data above, it appears we sometimes charge the same customer different prices. Therefore, the unit price is fully dependent upon the order number. The quantity of items also varies from order to order, so we're OK there.

What about the total? It looks like we might be in trouble here. The total can be derived by multiplying the unit price by the quantity, therefore it's not fully dependent upon the primary key. We must remove it from the table to comply with the third normal form. Perhaps we use the following attributes:
  • Order Number
  • Customer Number
  • Unit Price
  • Quantity
Now our table is in 3NF. But, you might ask, what about the total? This is a derived field and it's best not to store it in the database at all. We can simply compute it "on the fly" when performing database queries. For example, we might have previously used this query to retrieve order numbers and totals:

 SELECT OrderNumber, Total
 FROM WidgetOrders
 
We can now use the following query:

 SELECT OrderNumber, UnitPrice * Quantity AS Total
 FROM WidgetOrders
 
 

Boyce-Codd Normal Form (BCNF or 3.5NF)

The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one more requirement:
  • Meet all the requirements of the third normal form.
  • Every determinant must be a candidate key.
 
Examples: 
Conisder a database table that stores employee information and has the attributes employee_id, first_name, last_name, title. In this table, the field employee_id determines first_name and last_name. Similarly, the tuple (first_name, last_name) determines employee_id.
 
 
 
Examples with Explanation:
 
 
1.    1NF (Eliminating Repeating Groups)

•    Make a separate table for each set of related attributes and give each table a primary key. 
 
In simple words I can say it as

•    There are no repeating or duplicate fields.

•    Each cell contains only a single value.

•      Each record is unique and identified by primary key

Let’s see the example below :

Prior to Normalization

Item    Colors         Price    Tax

Pen       red, blue        2.0        0.20

Scale     red, yellow     2.0       0.20

Pen        red, blue       2.0        0.20

Bag      blue, black    150.00    7.80 
 
This table is not in first normal form because:

 •      A. There are multiple fields in color lab.

 •      B. Records are repeating (Duplicate records) or no primary key.

First Normal Form (1NF)

Item    Colors    Price    Tax

Pen         red         2.0          0.20

Pen        blue        2.0         0.20

Scale       red       2.0         0.20

Scale      yellow    2.0        0.20

Bag        blue      150.00    7.80

Bag       black      150.00    7.80

•    This table is now in first normal form. 



2.    2 NF (Eliminating Redundant Data)

•    If an attribute is dependent on only part of the multivalued key, then remove it to a separate table.


 In simple words,
•    It should meet all the requirements of the first normal form.
.  It should remove subsets of data that apply to multiple rows of a table and place them in separate tables.
.  It create relationships between these new tables and their predecessors through the use of foreign keys.
The First Normal form deals with the atomicity whereas the Second Normal Form deals with the relationship between the composite key columns and non-key columns. To achieve the next progressive level your table should satisfy the requirement of First Normal Form then move towards the Second Normal Form.
Let’s introduce a Review table as an example :
Item    Colors    Price    Tax
Pen        red           2.0        0.20
Pen        blue          2.0        0.20
Scale       red         2.0         0.20
Scale       yellow    2.0        0.20
Bag         blue       150.00    7.80
Bag        black    150.00    7.80
Table is not in Second Normal Form because the price and tax depends on the item, but not color.
Item    Colors
Pen    red
Pen    blue
Scale    red
Scale    yellow
Bag    blue
Bag    black

Item    Price    Tax
Pen        2.0         0.20
Scale      2.0        0.20
Bag     150.00     7.80
Tables are now in Second Normal Form.


3.    3NF (eliminate Columns not dependent on the Key)
•    If attributes do not contribute description of the key, then remove it to the separate table. All the attributes must be directly dependent on the primary key. In simple words,
•    It should meet all the requirements of the second normal form.
•     It should remove columns that are not dependent upon the primary key.
In the Third Normal Form all columns depend upon the primary key. When one column depends upon the other column, tables break the rule and turn into the dependency on the primary key.


Item    Colors
Pen        red
Pen      blue
Scale    red
Scale    yellow
Bag      blue
Bag      black

Item    Price    Tax
Pen        2.0    0.20
Scale       2.0    0.20
Bag       150.00    7.80
Tables are not in Second Normal Form because tax depends on price, not item.
Item      Colors
Pen         red
Pen         blue
Scale      red
Scale      yellow
Bag       blue
Bag       black

Item       Price
Pen          2.0
Scale       2.0
Bag          150.00

Price       Tax
2.0           0.20
150.00     7.80

Tables are now in Third Normal Form.



 
 
 
 
 
 





No comments:

Post a Comment