Database Normalization
01Why normalization exists
Imagine storing a student's course and the course instructor in the same row as the student record. If an instructor changes departments, you must update every row for every student in that course — miss one and your database now holds two different "truths." Normalization eliminates this by ensuring each fact lives in exactly one place.
02First Normal Form (1NF)
1NF requires that every column contains only indivisible (atomic) values and that each row is unique. A column storing "Math, Physics, Chemistry" as a single string violates 1NF. The fix is to split those values into separate rows or a separate table. This is the foundation — all higher normal forms assume 1NF is already satisfied.
-- Violates 1NF (multiple values in one cell)
Student | Courses
--------|--------------------------
Alice | Math, Physics, Chemistry
-- 1NF compliant
Student | Course
--------|--------
Alice | Math
Alice | Physics
Alice | Chemistry03Second Normal Form (2NF)
2NF applies only when the primary key is composite (multiple columns). A non-key attribute that depends on only one part of that composite key is called a partial dependency — and 2NF eliminates them. For example, if (StudentID, CourseID) is the key but InstructorName only depends on CourseID, InstructorName belongs in a separate Courses table, not the enrollment table.
04Third Normal Form (3NF)
A transitive dependency exists when a non-key column A determines another non-key column B. Example: a table has ZipCode and City — ZipCode determines City, but neither is the primary key. City is transitively dependent on the primary key via ZipCode. 3NF removes this by moving ZipCode → City into its own lookup table. The informal rule: "every non-key attribute must depend on the key, the whole key, and nothing but the key."
05BCNF and when to stop
Boyce-Codd Normal Form closes a loophole in 3NF where anomalies can still occur with certain multi-valued dependencies. In practice, most real-world schemas are designed to 3NF because BCNF can sometimes force lossless decompositions that make queries harder. Knowing when to denormalize intentionally (for read performance) is just as important as knowing how to normalize.