Key data concept – basic row/column structure of a data table
If you already understand the difference between rows and columns, great! You can skip this section. If you are not sure, then read on because your understanding of this concept is essential for data analysis and report authoring.
The columns in the dataset represent how the information has been categorized. They exist even if there is no data. Most people these days are familiar with Microsoft Excel. When you start a new spreadsheet, one of the first things you probably do is decide what types of information you are going to add; for example, First Name, Last Name, and Department in a simple human resources spreadsheet.
Once you have structured your spreadsheet in this way with column headings, you can begin to add the actual information, row by row. Your columns don't change in number or description, but your rows grow and shrink in number, and changes might be made to the information at any time.
One important distinction between data tables and spreadsheet workbooks is the way in which almost everyone manipulates the visual layout of the rows in a spreadsheet. You might, for instance, not repeat a department value until it changes; you might merge cells to improve the look and feel. You cannot do this with data tables. If the department column value for the first three rows is Marketing, then Marketing must be repeated in each row.
Data content (the values in the cells) can change, but it is always filtered and selected through references to the column names. For example, the request, show me all the records for Marketing, might produce zero rows or several million rows, depending on how many records the query finds with the word Marketing in the Department column.
In a spreadsheet, it is easy to build calculations that reference any cell in the matrix. If you want to make a calculation in a data table, such as sale amount - cost amount, you can only do that across each row. You cannot subtract the cost amount in one row from the sale amount in another. This is a key distinction between data tables and spreadsheets. It might seem like a limitation in a data table, but the discipline of that structural integrity ultimately allows you to create very powerful analyses, and there are ways to change the structure of a data table into new forms to support a particular calculation requirement.
Reports and visualizations are built around column names. For example, you might want to create a simple sales by region visualization. What you are doing is putting the sales column against the region column and asking the visualization engine to populate the chart or graph with whatever row values are present beneath those columns in the given data set.