Data points and datasets
In data analysis, it is convenient to think of the data as points of information. For example, in a collection of biographical data, each data point would contain information about one person. Consider the following data point:
("Adams", "John", "M", 26, 704601929)
It could represent a 26-year-old male named John Adams with ID number 704601929.
We call the individual data values in a data point fields (or attributes). Each of these values has its own type. The preceding example has five fields: three text and two numeric.
The sequence of data types for the fields of a data point is called its type signature. The type signature for the preceding example is (text, text, text, numeric, numeric). In Java, that type signature would be (String, String, String, int, int).
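One way to make the idea concrete is to model the data point as a Java record and read its type signature back through reflection. The following is only a sketch under stated assumptions: the class name DataPointSketch, the record name Person, its field names (lastName, firstName, sex, age, id), and the helper typeSignature are all hypothetical, chosen here for illustration, and records require Java 16 or later.

```java
import java.lang.reflect.RecordComponent;
import java.util.Arrays;
import java.util.stream.Collectors;

public class DataPointSketch {
    // Hypothetical data point type; the field names are assumptions,
    // since the text gives only the values and their types.
    public record Person(String lastName, String firstName, String sex, int age, int id) {}

    // Builds the type signature of a record class by listing the
    // simple type name of each field in declaration order.
    public static String typeSignature(Class<? extends Record> type) {
        return Arrays.stream(type.getRecordComponents())
                .map((RecordComponent c) -> c.getType().getSimpleName())
                .collect(Collectors.joining(", ", "(", ")"));
    }

    public static void main(String[] args) {
        // The data point from the text: three text fields, two numeric fields.
        Person p = new Person("Adams", "John", "M", 26, 704601929);
        System.out.println(p);
        System.out.println(typeSignature(Person.class));
    }
}
```

A record suits this purpose because its fields are fixed and ordered, so every Person instance necessarily shares the same type signature.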
A dataset is a set of data points, all of which have the same type signature. For example, we could have a dataset that represents a group of people, each point representing a unique member of the group. Since...