NORMALISATION
A large database defined as a single relation may result in data duplication. This repetition of data may result in:
Making relations very large.
It isn't easy to maintain and update data as it would involve searching many records in relation.
Wastage and poor utilization of disk space and resources.
The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the relations with redundant data into smaller, simpler, and well-structured relations that are satisfy desirable properties. Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Normalization divides the larger table into smaller and links them using relationships.
The normal form is used to reduce redundancy from the database table.
Database Normalization is used for the following Purpose:
To Eliminate the redundant or useless data
To Reduce the complexity of the data
To Ensure the relationship between tables as well as data in the tables
To Ensure data dependencies and data is logically stored.
Problem in existing system
Example: Student and Course
In a flatfile this would be represented as
STUDENT( Student number, student name, DOB, course number, course name, lecturer
number, lecturer name)
Because this is a many-to-many relationship many course entries would need to be entered . We
do not want repeating attributes such as course1, course2, course3.
1NF (First Normal Form) No repeating attributes
To be in the first normal form, a table should contain no repeating attributes. This avoids data inconsistency.
Divide the information into two entities in standard notation, course number becomes part of the composite key in the student table.
STUDENT( Student number, student name, DOB, Sex, course number)
COURSE (course number , course name, lecturer number, lecturer name)
It contains only automic values.
Automic values:- The Single cell have only single value
2NF (Second Normal Form) Partial key dependence test
For a table to be in the Second Normal Form,
It should be in the First Normal form.
And, it should not have Partial Dependency.
Student name is only dependent on Student number and not on course number)
A third table is needed to link between the two entities.
STUDENT( Student number, student name, DOB, Sex)
STUDENT-TAKES(Student number, course number)
COURSE (course number , course name, lecturer number, lecturer name)
3NF (Third Normal Form) Non-key dependence test
No non-key dependencies are allowed. No field should be dependant on another field (unless the
field is a key field). This prevents any duplication of attributes.
In addition to being in second normal form (2NF), each non-key item in the record is fully
independent of any other non-key data item in the record.
Even though lecturer no. is not a key there are issues here – it is not dependent on course number
which is the key in this table.
We therefore need to create a new relation and table.
A table is said to be in the Third Normal Form when,
- It is in the Second Normal form.
- And, it doesn't have Transitive Dependency.
STUDENT( Student number, student name, DOB, Sex)
STUDENT-TAKES(Student number, course number)
COURSE (course number , course name, lecturer number)
LECTURER(lecturer number, lecturer name)
Boyce and Codd Normal Form (BCNF)
Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF, following conditions must be satisfied:
R must be in 3rd Normal Form
and, for each functional dependency ( X → Y ), X should be a super Key.
Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when,
- It is in the Boyce-Codd Normal Form.
- And, it doesn't have Multi-Valued Dependency.
Advantages of Normalization
Normalization helps to minimize data redundancy.
Greater overall database organization.
Data consistency within the database.
Much more flexible database design.
Enforces the concept of relational integrity.
Disadvantages of Normalization
You cannot start building the database before knowing what the user needs.
The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
It is very time-consuming and difficult to normalize relations of a higher degree.
Careless decomposition may lead to a bad database design, leading to serious problems.
0 Comments