Normalization
Normalization is one of the most crucial skills for anyone planning to design and maintain databases. It's a design technique that helps eliminate undesirable characteristics such as insert, update, and delete anomalies and reduces data redundancy. Insert anomalies can come from the lack of primary keys, or the presence of functional dependency. Simply put, you will have duplicate records when there should be none.
If you have a big table with millions of records, the lookup, update, and deletion operations are very time-consuming. The first thing you can do is to give more resources to the server, but that does not scale well. The next thing to do is to normalize the table. This means you try to break up the big table you have into smaller ones and link the smaller tables by relationships using the primary and foreign keys.
This technique was first invented by Edgar Codd, and it has seven distinct forms called normal forms. The list goes from First Normal Form (1NF) to Sixth Normal Form (6NF), and one extra one, which is Boyce-Codd Normal Form (BCNF).
The first normal form states that each cell should contain a single value and each record should be unique. For example, suppose you have a database that stores information about employees. The first normal form implies that each column in your table contains a single piece of information, as shown here.
The second normal form means the database is in first normal, and it must also have a single-column primary key. With the previous example, you don't currently have a single unique column, since the employee name could duplicate, as well as the title and location. To convert it into a second normal form, you can add an ID as a unique identifier.
The third normal form requires the database to be in the second normal form and it is forbidden to have transitive functional dependencies. A transitive functional dependency is when a column in one table is dependent on a different column that is not a primary key. This means that every relationship in the database is between primary keys only. A database is considered normalized if it reaches the third normal form. The table here is in the third normal form, as it has a primary key that can be used to relate to any other tables, without the need for a non-key field:
For further details, you can visit the following site: https://docs.microsoft.com/en-us/office/troubleshoot/access/database-normalization-description.
Now that you have learned all about working with datasets, let's perform an activity to recap everything we have learned so far in this chapter.