Processing Data in Machine Learning Systems
We talked about data in Chapter 3, where we introduced the types of data that are used in machine learning systems. In this chapter, we’ll dive deeper into the ways in which data and algorithms are entangled. So far, we have talked about data in generic terms; in this chapter, we’ll explain what kind of data machine learning systems need. I’ll explain that all kinds of data are used in numerical form, either as feature vectors or as more complex feature matrices. Then, I’ll explain the need to transform unstructured data (for example, text) into structured data. This chapter lays the foundations for diving deeper into each type of data, which is the subject of the next few chapters.
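To make the idea of numerical form concrete, here is a minimal sketch of turning text into feature vectors by counting words. The function names and the two example sentences are hypothetical, chosen only for illustration, and simple word counting is just one of many possible transformations, not necessarily the one used later in the book.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return a sorted list of the distinct lowercase words in all documents."""
    words = set()
    for doc in documents:
        words.update(doc.lower().split())
    return sorted(words)


def to_feature_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Two made-up pieces of unstructured text (illustrative data only).
documents = ["the code works", "the code fails the test"]

# Each column of the matrix corresponds to one word in the vocabulary.
vocabulary = build_vocabulary(documents)

# One feature vector per document; together they form a feature matrix.
feature_matrix = [to_feature_vector(doc, vocabulary) for doc in documents]

print(vocabulary)
for row in feature_matrix:
    print(row)
```

Each row is a feature vector for one sentence, and stacking the rows gives a feature matrix, which is the structured, numerical form that an algorithm can work with.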
In this chapter, we will do the following:
- Discuss the process of measurement (obtaining numerical data) and the measurement instruments that are used in that process
- Visualize numerical data...