Data Science for Marketing Analytics

Achieve your marketing goals with the data analytics power of Python

Product type: Paperback
Published: Mar 2019
ISBN-13: 9781789959413
Length: 420 pages
Edition: 1st Edition
Authors: Tommy Blanchard, Debasish Behera, Pranshu Bhatnagar
Table of Contents

Preface
1. Data Preparation and Cleaning
2. Data Exploration and Visualization
3. Unsupervised Learning: Customer Segmentation
4. Choosing the Best Segmentation Approach
5. Predicting Customer Revenue Using Linear Regression
6. Other Regression Techniques and Tools for Evaluation
7. Supervised Learning: Predicting Customer Churn
8. Fine-Tuning Classification Algorithms
9. Modeling Customer Choice
Appendix

Chapter 5: Predicting Customer Revenue Using Linear Regression


Activity 8: Examining Relationships between Storefront Locations and Features about their Area

  1. Load the data from location_rev.csv and then take a look at it:

    import pandas as pd
    
    df = pd.read_csv('location_rev.csv')
    df.head()
  2. Use seaborn's pairplot function to visualize the data and its relationships:

    import seaborn as sns
    %matplotlib inline
    
    sns.pairplot(df)
  3. Use correlations to investigate the relationship between the different variables and the location revenue:

    df.corr()

The resulting correlations should make sense. The more competitors in the area, the lower the revenue of a location, while median income, number of loyalty members, and population density are all positively correlated with revenue. A location's age is also positively correlated with revenue, which suggests that the longer a location is open, the better known it is and the more customers it attracts (or perhaps that only locations that do well last a long time).
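With several columns, the full correlation matrix can be hard to scan. One option is to pull out only the column of correlations with revenue and sort it. The sketch below is a minimal illustration on synthetic data: the numbers are made up rather than taken from location_rev.csv, and only two of the predictors are simulated, using column names assumed to match the real file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for location_rev.csv (made-up numbers, not the book's data)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    'num_competitors': rng.integers(0, 10, n),
    'median_income': rng.normal(30000, 5000, n),
})

# Revenue is constructed to fall with competitors and rise with income
demo['revenue'] = (
    -2000 * demo['num_competitors']
    + 1.5 * demo['median_income']
    + rng.normal(0, 3000, n)
)

# Correlation of each predictor with revenue, strongest positive first
revenue_corr = (
    demo.corr()['revenue']
    .drop('revenue')
    .sort_values(ascending=False)
)
print(revenue_corr)
```

On the real data, the same last expression applied to df should give a ranked view of the relationships described above.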

Activity 9: Building a Regression Model to Predict Storefront Location Revenue

  1. Import the data from location_rev.csv and view the first few rows:

    import pandas as pd
    df = pd.read_csv('location_rev.csv')
    df.head()
  2. Create a variable, X, with the predictors in it, and store the outcome (revenue) in a separate variable, y:

    X = df[['num_competitors',
           'median_income',
           'num_loyalty_members',
           'population_density',
           'location_age'
           ]]
    y = df['revenue']
  3. Split the data into training and test sets. Use random_state = 100:

    from sklearn.model_selection import train_test_split
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 100)
  4. Create a linear regression model and fit it on the training data:

    from sklearn.linear_model import LinearRegression
    
    model = LinearRegression()
    model.fit(X_train,y_train)
  5. Print out the model coefficients:

    model.coef_
  6. Print out the model intercept:

    model.intercept_
  7. Produce a prediction for a location that has 3 competitors; a median income of 30,000; 1,200 loyalty members; a population density of 2,000; and a location age of 10:

    single_location = pd.DataFrame({
        'num_competitors': [3],
        'median_income': [30000],
        'num_loyalty_members': [1200],
        'population_density': [2000],
        'location_age': [10]
    })
    
    model.predict(single_location)
  8. Plot the model predictions versus the true values on the test data:

    import matplotlib.pyplot as plt
    %matplotlib inline
    
    plt.scatter(model.predict(X_test),y_test)
    plt.xlabel('Model Predictions')
    plt.ylabel('True Value')
    plt.plot([0, 100000], [0, 100000], 'r-')
    plt.show()
  9. Calculate the correlation between the model predictions and the true values of the test data:

    from scipy.stats import pearsonr
    
    pearsonr(model.predict(X_test),y_test)

The first number is the correlation coefficient, which is extremely high (just over 0.9, where 1.0 would be a perfect correlation). The second number is an extremely small p-value, meaning that a correlation this strong would be very unlikely to arise by chance if the predictions and true values were unrelated. Taken together, these results indicate that the model performs very well on the test data.
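To see how the coefficients and intercept from steps 5 and 6 produce the prediction in step 7, it can help to reproduce a prediction by hand: a linear regression prediction is the intercept plus the sum of each coefficient multiplied by its feature value. The following sketch shows this on synthetic data. The data, the two-predictor setup, and the names X_demo, y_demo, and demo_model are assumptions made for illustration and are not the book's dataset or results.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the storefront data (made-up numbers)
rng = np.random.default_rng(100)
n = 500
X_demo = pd.DataFrame({
    'num_competitors': rng.integers(0, 10, n),
    'median_income': rng.normal(30000, 5000, n),
})
y_demo = (
    -2000 * X_demo['num_competitors']
    + 1.5 * X_demo['median_income']
    + rng.normal(0, 3000, n)
)

# Same workflow as the activity: split, then fit on the training set
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, random_state=100
)
demo_model = LinearRegression().fit(X_train, y_train)

# One hypothetical location, with columns in the same order as X_demo
single = pd.DataFrame({'num_competitors': [3], 'median_income': [30000]})

# Prediction from scikit-learn versus intercept + coefficients . features
by_predict = demo_model.predict(single)[0]
by_hand = demo_model.intercept_ + np.dot(
    demo_model.coef_, single.iloc[0].to_numpy()
)
print(by_predict, by_hand)

# Correlation between test-set predictions and true values, as in step 9
r, p_value = pearsonr(demo_model.predict(X_test), y_test)
print(r, p_value)
```

The two printed predictions should agree up to floating-point error, which confirms that model.predict is applying nothing more than the fitted linear equation.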
