Introduction to data modeling
Data is often provided to you in a form that isn't directly suitable for analysis and modeling. As an example, suppose you are summarizing and analyzing the sales of students who are selling cookies to raise money for a school trip. You would like an estimate of the expected sales per student per week, so that you can recognize students who are putting in effort and achieving higher sales. Unfortunately, each student's sales are reported at irregular times, which makes comparisons difficult. You decide to take each student's sales and fill in the missing days by interpolating between the days for which you have data. The process is tedious, and partway through you realize that you also have to go back and rescale the filled-in daily values so that they add up to the reported totals; otherwise, you inflate the total sales. Pandas provides the .resample() method you saw in Chapter 9, Data Modeling – Preprocessing, and by combining that with a .rolling...
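As a minimal sketch, here is one way to combine .resample() with .rolling() for this kind of problem. The student names, dates, and box counts are made up. The column names `student`, `date`, and `boxes` are hypothetical. Each row is assumed to record the boxes a student sold on that date. Rather than interpolating, the sketch sums each student's sales into daily bins, so days with no report become 0 and the totals are not inflated. It then takes a trailing 7-day sum per student.

```python
import pandas as pd

# Hypothetical sales log: each row records the boxes one student sold on one date.
# Reports arrive on irregular dates, as in the cookie-sale example; values are made up.
sales = pd.DataFrame(
    {
        "student": ["Ana", "Ana", "Ana", "Ben", "Ben"],
        "date": pd.to_datetime(
            ["2023-03-01", "2023-03-04", "2023-03-09", "2023-03-02", "2023-03-08"]
        ),
        "boxes": [5, 3, 4, 6, 2],
    }
)

# Resample each student's sales to a daily frequency. Summing (instead of
# interpolating) puts 0 on days with no report, so the overall total is unchanged.
daily = (
    sales.set_index("date")
    .groupby("student")["boxes"]
    .resample("D")
    .sum()
)

# Trailing 7-row (here, 7-day) sum within each student: "boxes sold in the last week".
# min_periods=1 lets the first few days report a partial-week total instead of NaN.
weekly_rolling = daily.groupby(level="student").transform(
    lambda s: s.rolling(window=7, min_periods=1).sum()
)

print(weekly_rolling)
```

Because the data are daily after resampling, a count-based window of 7 rows corresponds to one week. If the series were not evenly spaced, a time-based window such as `"7D"` on a `DatetimeIndex` would be the safer choice.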