Artificial Intelligence and Big Data
In this chapter, we will learn what big data is and how big data technologies can be used in the context of artificial intelligence. We will discuss how big data can help accelerate machine learning pipelines and, working through some examples, when big data techniques are a good idea and when they are overkill. We will then look at the building blocks of a machine learning pipeline that uses big data and the challenges involved, and we will set up an environment in Python to see how it works in practice. By the end of this chapter, we will have covered:
- Big data basics
- The three V's of big data
- Big data as it applies to artificial intelligence and machine learning
- A machine learning pipeline using big data
- Apache Hadoop
- Apache Spark
- Apache Impala
- NoSQL databases
Let's begin with the basics of big data.