Location comes in different forms, but what if it comes in a simple structured data format and we overlooked it all this time? Most machine learning algorithms, such as random forests, are geared toward creating insights from structured data in tabular form. In this chapter, we will discuss how to leverage spatial data that is masquerading as tabular data and apply machine learning techniques to it as any data scientist would. For this chapter, we will be using New York taxi trip data to predict trip duration for any given New York taxi trip. We are choosing this dataset because of the following reasons:
- Predicting trip duration has the right mix of geospatial analytics and machine learning
- Finding the time it takes to travel from point A to point B is a routing problem, which will be dealt with in Chapter 6, Let's Build a Routing...