Thursday, 3 July 2014

Difference between normalization and denormalization


Denormalization is the process of optimizing the read performance of a database by adding redundant data or by grouping data. This comes at a cost:
  • More disk space is used, as the same data is duplicated in more than one table.
  • DML operations are more expensive, as the same data must be maintained in more than one table.
  • You run the risk that the duplicated copies get "out of sync", as sketched below.
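A minimal sketch of that last risk, using Python's built-in sqlite3 module (the schema and names here are hypothetical): an orders table deliberately duplicates the customer's name, a rename updates only one of the two copies, and the copies drift apart.

import sqlite3

# Hypothetical denormalized schema: customer_name is copied into orders
# so that order listings need no join.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, customer_name TEXT)")
con.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
con.execute("INSERT INTO orders VALUES (100, 1, 'Acme Ltd')")

# A rename that updates only one of the two copies...
con.execute("UPDATE customers SET name = 'Acme Plc' WHERE id = 1")

# ...leaves the other stale, which this consistency check exposes.
mismatch = con.execute("""
    SELECT c.name, o.customer_name
    FROM customers c JOIN orders o ON o.customer_id = c.id
    WHERE c.name <> o.customer_name
""").fetchone()
print(mismatch)  # ('Acme Plc', 'Acme Ltd') -- the copies are out of sync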
Denormalized databases fare well under heavy read load, when the application is read-intensive, for the following reasons:
  • The data is present in a single table, so selects need no joins and are very fast.
  • A single table with all the required data also allows much more efficient index usage: if the columns are indexed properly, one composite index can both filter and sort the results, as the sketch after this list illustrates. In a normalized schema the same data is spread across several tables, so no single index can cover the whole query.
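One way to see the index point in practice, again with sqlite3 (the table and index names are my own): a composite index on (companycode, records) lets the engine filter on the company and hand back the rows already in sorted order, with no separate sort step.

import sqlite3

# Hypothetical denormalized table with a composite index that can satisfy
# both the WHERE filter and the ORDER BY.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (companycode TEXT, division TEXT, customer TEXT, records INTEGER)")
con.execute("CREATE INDEX idx_company_records ON sales (companycode, records)")

plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT customer, records FROM sales
    WHERE companycode = '1001'
    ORDER BY records DESC
""").fetchall()
for step in plan:
    print(step)  # SQLite reports a SEARCH ... USING INDEX idx_company_records
                 # step, and no 'USE TEMP B-TREE FOR ORDER BY' sort step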
Although selects can be very fast on denormalized tables for the reasons above, the duplicated data makes updates and inserts complex and costly.
Example (a denormalized table, with the company/division pairing repeated on every customer row):

COMPANYCODE   DIVISION   CUSTOMER   RECORDS
1001          LFA        115        5
1001          POC        121        9
1002          USA        006        2
1002          IN         568        7
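The trade-off in this example can be sketched like this (sqlite3 again; the table name is mine): reading everything for a company is a single-table query, but one logical change, renaming a division, has to rewrite every customer row that repeats it.

import sqlite3

# The denormalized example table, loaded as-is.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE company_customers (companycode TEXT, division TEXT, customer TEXT, records INTEGER)")
con.executemany(
    "INSERT INTO company_customers VALUES (?, ?, ?, ?)",
    [("1001", "LFA", "115", 5), ("1001", "POC", "121", 9),
     ("1002", "USA", "006", 2), ("1002", "IN", "568", 7)],
)

# The fast path: one table holds everything, so no joins are needed.
for row in con.execute("SELECT * FROM company_customers WHERE companycode = '1001'"):
    print(row)

# The costly path: renaming a division must rewrite every row repeating it
# (one row here, but thousands in a table with many customers per division).
cur = con.execute(
    "UPDATE company_customers SET division = 'LF-A' "
    "WHERE companycode = '1001' AND division = 'LFA'"
)
print(cur.rowcount, "row(s) rewritten for one logical change")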
Normalized databases fare very well when the application is write-intensive and the write load exceeds the read load, for the following reasons:
  • Normalized tables are usually smaller and have a smaller footprint because the data is divided vertically among many tables. This allows them to perform better, as they are small enough to fit into the buffer.
  • Updates are very fast because the data being updated lives in a single place and there are no duplicates.
  • Similarly, inserts are very fast because the data has to be inserted in a single place only and does not have to be duplicated.
  • Selects are fast when the data has to be fetched from a single table, because normalized tables are normally small enough to fit into the buffer.
  • Because the data is not duplicated, there is less need for heavy-duty GROUP BY or DISTINCT queries.


Normalization is the process of decomposing a relation with anomalies into smaller, well-structured relations.
Example (the same data split into two normalized tables):

COMPANYCODE   CUSTOMER   RECORDS
1001          115        5
1001          121        9
1002          006        2
1002          568        7

COMPANYCODE   DIVISION
1001          LFA
1001          POC
1002          USA
1002          IN
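For contrast, a sketch of this normalized version (the table names are my own): a division rename now touches exactly one row, while reads that need columns from both tables pay for a join instead.

import sqlite3

# The same data split into two normalized tables: each fact lives in one place.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer_records (companycode TEXT, customer TEXT, records INTEGER)")
con.execute("CREATE TABLE company_divisions (companycode TEXT, division TEXT)")
con.executemany("INSERT INTO customer_records VALUES (?, ?, ?)",
                [("1001", "115", 5), ("1001", "121", 9),
                 ("1002", "006", 2), ("1002", "568", 7)])
con.executemany("INSERT INTO company_divisions VALUES (?, ?)",
                [("1001", "LFA"), ("1001", "POC"),
                 ("1002", "USA"), ("1002", "IN")])

# Writes are cheap: a rename is one row, with no duplicate copies to sync.
con.execute("UPDATE company_divisions SET division = 'LF-A' WHERE division = 'LFA'")

# Reads spanning both tables now need a join: here, the customers of the
# company that has a division named 'POC'.
for row in con.execute("""
    SELECT r.customer, r.records
    FROM customer_records r
    JOIN company_divisions d ON d.companycode = r.companycode
    WHERE d.division = 'POC'
"""):
    print(row)  # ('115', 5) and ('121', 9)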
