Normalization in DBMS: Ensuring Data Integrity and Efficiency
Learn about the importance of normalization in database design and its role in reducing redundancy and preventing anomalies. Explore the concept of functional dependencies and understand how normalization techniques (1NF, 2NF, 3NF, BCNF) help achieve a well-structured and efficient database.
DBMS - Normalization
Normalization is a technique used in database design to reduce redundancy and eliminate undesirable characteristics such as update, deletion, and insertion anomalies. By applying normalization, databases can achieve a consistent state, making them easier to maintain and manage.
Functional Dependency
A functional dependency (FD) represents constraints between two sets of attributes in a relation. In an FD, if two tuples have identical values for attributes A1, A2, ..., An, then they must also have identical values for attributes B1, B2, ..., Bn.
It is denoted by an arrow (→), where X → Y means that the attributes on the left-hand side (X) determine the values on the right-hand side (Y).
Armstrong's Axioms
Armstrong's Axioms provide a foundation for generating the closure of a set of functional dependencies, denoted as F+, which includes all dependencies implied by F. These axioms consist of:
- Reflexive Rule – If α is a set of attributes and β is a subset of α, then α determines β.
- Augmentation Rule – If a → b holds, then adding attributes (y) results in ay → by.
- Transitivity Rule – If a → b and b → c hold, then a → c also holds.
Trivial and Non-Trivial Functional Dependency
- Trivial FD – If X → Y holds where Y is a subset of X, then it is called a trivial FD, which always holds.
- Non-Trivial FD – If X → Y holds where Y is not a subset of X, then it is non-trivial.
- Completely Non-Trivial FD – If X → Y holds where X ∩ Y = Φ, then it is completely non-trivial.
Normalization
When database design is imperfect, anomalies can occur:
- Update Anomalies – Data updates in one place may leave duplicates with outdated information, leading to inconsistencies.
- Deletion Anomalies – Attempting to delete information may leave some parts undeleted, resulting in incomplete removal.
- Insertion Anomalies – It may be difficult to add new data due to missing referenced records.
Normalization is used to eliminate these anomalies, ensuring data consistency.
First Normal Form (1NF)
A relation is in First Normal Form if it contains only atomic (indivisible) values. Each attribute must hold a single value from a predefined domain.
Second Normal Form (2NF)
To achieve Second Normal Form, a relation must be in First Normal Form, and every non-prime attribute must be fully dependent on the primary key. A non-prime attribute is not part of any candidate key.
Example: In a Student_Project relation, Stu_ID and Proj_ID together are the primary key. For 2NF compliance, non-key attributes like Stu_Name and Proj_Name should depend on both Stu_ID and Proj_ID combined, rather than individually. This prevents partial dependencies.
Third Normal Form (3NF)
A relation is in Third Normal Form if it is in Second Normal Form, and no non-prime attribute is transitively dependent on the primary key. For any non-trivial FD X → A, either X is a superkey, or A is a prime attribute.
Example: In the Student_detail relation, Stu_ID is the primary key. Since Zip is determined by Stu_ID and in turn determines City, we have a transitive dependency (Stu_ID → Zip → City), which violates 3NF. To achieve 3NF, we split the relation accordingly.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form is a stricter version of 3NF, requiring that for any FD X → A, X must be a superkey. In BCNF, the relations are split so that:
- Stu_ID is the superkey for the Student_Detail relation.
- Zip is the superkey for the ZipCodes relation.
This ensures both relations adhere to BCNF standards.