Processing, analyzing, and summarizing data using visualizations
We're working in real estate now, and since we want to do well, we want to build an algorithm that helps us analyze data and predict housing prices. But let's think about that for a second. We can define the problem very broadly or very narrowly: we could run a pricing analysis for every house in a state, or only for houses with three or more bedrooms in a single neighborhood. Does the scope of the analysis matter? Maybe. But isn't working that out part of why we want to look at this problem in the first place?
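To make the broad-versus-narrow distinction concrete, here is a minimal pandas sketch of how scope turns into a filter on the data. It assumes the listings sit in a DataFrame with `bedrooms` and `zipcode` columns (using ZIP code as a stand-in for "neighborhood"); those column names and the helper `scope_listings` are hypothetical, chosen only for illustration.

```python
from typing import Optional

import pandas as pd


def scope_listings(df: pd.DataFrame,
                   min_bedrooms: Optional[int] = None,
                   zipcode: Optional[int] = None) -> pd.DataFrame:
    """Return the rows that fall inside the chosen analysis scope.

    Hypothetical helper: with no filters the scope is broad (every row);
    each filter passed in narrows it. The 'bedrooms' and 'zipcode'
    column names are assumptions about the dataset.
    """
    mask = pd.Series(True, index=df.index)  # start with every listing selected
    if min_bedrooms is not None:
        mask &= df["bedrooms"] >= min_bedrooms  # e.g. three bedrooms or more
    if zipcode is not None:
        mask &= df["zipcode"] == zipcode  # ZIP code as a proxy for neighborhood
    return df[mask]
```

Calling `scope_listings(houses)` keeps the broad view, while passing `min_bedrooms=3` together with a `zipcode` gives the narrow one. Keeping scope as explicit parameters makes it easy to rerun the same analysis at different scopes and compare the results.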
Let's take a look at how we can process the data first.
Processing data
Let's start by gathering some data. For this problem, we're using the kv_house_data.csv dataset, which is available in our GitHub repository. To look at this dataset, we'll need quite a few libraries. We've mostly been talking about pandas, but we also want to create visualizations and perform some analysis, so we'll need Seaborn, SciPy, and Scikit...
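Before reaching for any plotting library, it usually pays to load the file and check its size, missing values, and which columns are numeric. The sketch below does that with pandas alone. It assumes kv_house_data.csv has been downloaded from the repository into the working directory; the helper names `load_house_data` and `first_look` are hypothetical.

```python
import os

import pandas as pd


def load_house_data(path: str = "kv_house_data.csv") -> pd.DataFrame:
    """Read the housing CSV into a DataFrame.

    Assumes the file was downloaded from the repository into the
    working directory; pass a different path otherwise.
    """
    return pd.read_csv(path)


def first_look(df: pd.DataFrame) -> dict:
    """Collect a few quick facts worth checking before any visualization."""
    return {
        "shape": df.shape,  # (rows, columns)
        "missing_per_column": df.isna().sum().to_dict(),  # NaN count per column
        "numeric_columns": df.select_dtypes(include="number").columns.tolist(),
    }


if __name__ == "__main__":
    path = "kv_house_data.csv"  # assumed location of the downloaded file
    if os.path.exists(path):
        houses = load_house_data(path)
        print(houses.head())  # first five rows
        print(first_look(houses))
    else:
        print(f"{path} not found; download it from the repository first.")
```

The numeric columns reported by `first_look` are the natural candidates for the histograms, scatter plots, and correlation checks that Seaborn and SciPy make easy.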