Understanding Data
For most companies, storing and retrieving data is a day-to-day activity. Based on how it is organized, data can be broadly classified as structured or unstructured. Unstructured data, simply put, is data that does not follow a consistent, predefined organization. Documents, PDFs, and videos fall into this category: they contain a mixture of different data types (text, images, audio, video, and so on) with no consistent relationship between them. Media and publishing are examples of industries that deal with unstructured data of this kind.
In this book, our focus will be on structured data. Structured data follows a consistent format, so it can be easily organized into tables. Thanks to this consistency, structured data is easier to work with and can be processed more effectively. A table is a collection of rows and columns: each row (also called a tuple) describes one entity, and each column describes one attribute of that entity.
For example, consider the following table:
For each row, there is a clear relationship; a given student takes a particular subject and achieves a specific score in that subject. The columns are also known as fields, while the rows are known as records.
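To make the idea of fields and records concrete, here is a minimal Python sketch of a table with student, subject, and score columns. The names and scores are invented purely for illustration and are not taken from the book's table; the variable names (`fields`, `records`, `rows_as_dicts`) are likewise our own choices.

```python
# Hypothetical data: the names, subjects, and scores below are invented.
fields = ("student", "subject", "score")  # columns, also known as fields

records = [  # rows, also known as records
    ("Alice", "Maths", 88),
    ("Bob", "Maths", 75),
    ("Alice", "Physics", 92),
]

# Every record holds exactly one value per field; this consistency is
# what makes the data structured.
assert all(len(record) == len(fields) for record in records)

# Pair each value with its field name so a record reads as "field: value".
rows_as_dicts = [dict(zip(fields, record)) for record in records]

for row in rows_as_dicts:
    print(f"{row['student']} scored {row['score']} in {row['subject']}")
```

Because every record has the same fields in the same order, any record can be read in exactly the same way, which is what a table guarantees.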
Data that is presented in tabular form can be stored in a relational database. Relational databases, as the name suggests, store pieces of data that are related to one another, such as the values within a row or rows in different tables. A Relational Database Management System (RDBMS) is software used to manage relational data. SQL (Structured Query Language) is the standard language for working with relational data. Popular RDBMSs include Microsoft SQL Server, MySQL, and Oracle. Throughout this book, we will be working with MySQL. We can use various SQL commands to work with data in relational databases. We'll have a brief look at them in the next section.
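As a small preview of what working with a relational database looks like, the following sketch stores a few rows in a table and retrieves some of them with SQL. It rests on several assumptions: it uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in RDBMS (not MySQL, which the rest of the book uses), and the table name `scores`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for an RDBMS here; the book
# itself works with MySQL.
conn = sqlite3.connect(":memory:")

# Hypothetical table and column names, chosen for illustration.
conn.execute("CREATE TABLE scores (student TEXT, subject TEXT, score INTEGER)")

# Store three invented records in the table.
conn.executemany(
    "INSERT INTO scores (student, subject, score) VALUES (?, ?, ?)",
    [("Alice", "Maths", 88), ("Bob", "Maths", 75), ("Alice", "Physics", 92)],
)

# Retrieve only the records for one subject, highest score first.
maths_rows = conn.execute(
    "SELECT student, score FROM scores WHERE subject = ? ORDER BY score DESC",
    ("Maths",),
).fetchall()

print(maths_rows)
conn.close()
```

The statements used here (`CREATE TABLE`, `INSERT`, and `SELECT`) are standard SQL, although column types and other details can differ between SQLite and MySQL.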