Big data and the context of large-scale Machine learning
I have covered some of the core aspects of big data in my previous Packt book titled Getting Started with Greenplum for Big Data Analytics. In this section, we will quickly recap some of the core aspects of big data and their impact on the field of Machine learning:
- Large-scale here means data on the order of terabytes, petabytes, exabytes, or higher: volumes that typically cannot be handled by traditional database engines. The following chart lists the orders of magnitude that represent data volumes:
Multiples of bytes

| Name (Symbol)  | SI decimal value | Binary value |
|----------------|------------------|--------------|
| Kilobyte (KB)  | 10^3             | 2^10         |
| Megabyte (MB)  | 10^6             | 2^20         |
| Gigabyte (GB)  | 10^9             | 2^30         |
| Terabyte (TB)  | 10^12            | 2^40         |
| Petabyte (PB)  | 10^15            | 2^50         |
| Exabyte (EB)   | 10^18            | 2^60         |
| Zettabyte (ZB) | 10^21            | 2^70         |
| Yottabyte (YB) | 10^24            | 2^80         |
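The difference between the SI (decimal) and binary interpretations of these units can be illustrated with a short Python sketch. The helper function below is a hypothetical example, not part of any library discussed in this book; it converts a raw byte count into a human-readable string using either 10^3 or 2^10 as the step between units:

```python
def human_readable(num_bytes, binary=False):
    """Convert a byte count to a human-readable string.

    binary=False steps by the SI decimal factor (1 KB = 10**3 bytes);
    binary=True steps by the binary factor (2**10 bytes per unit).
    Note: strictly, binary multiples use the KiB/MiB/... symbols,
    but we reuse the symbols from the table above for simplicity.
    """
    base = 1024 if binary else 1000
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
    value = float(num_bytes)
    for unit in units:
        if value < base or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= base

# One terabyte in each interpretation:
print(human_readable(10**12))              # 1.0 TB (decimal)
print(human_readable(2**40, binary=True))  # 1.0 TB (binary)
```

Note that the two calls above describe different absolute quantities: 2^40 bytes is roughly 10% more than 10^12 bytes, and the gap widens at each higher order of magnitude.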
- The data formats referred to in this context are diverse; they are generated and consumed in many forms and need not be structured (for example,...