R Data Analysis Projects: Build end to end analytics systems to get deeper insights from your data

Product type Paperback
Published in Nov 2017
Publisher Packt
ISBN-13 9781788621878
Length 366 pages
Edition 1st Edition
Author: Gopi Subramanian
Table of Contents

Preface
1. Association Rule Mining
2. Fuzzy Logic Induced Content-Based Recommendation
3. Collaborative Filtering
4. Taming Time Series Data Using Deep Neural Networks
5. Twitter Text Sentiment Classification Using Kernel Density Estimates
6. Record Linkage - Stochastic and Machine Learning Approaches
7. Streaming Data Clustering Analysis in R
8. Analyze and Understand Networks Using R

The cross-selling campaign

Let's get back to our retailer and use what we have built so far to recommend a cross-selling strategy to him.

This can be implemented using the following code:

###########################################################################
#
# R Data Analysis Projects
#
# Chapter 1
#
# Building Recommender System
# A step-by-step approach to build Association Rule Mining
#
#
# Script:
# Generating rules for cross sell campaign.
#
#
# Gopi Subramanian
###########################################################################
library(arules)
library(igraph)
get.txn <- function(data.path, columns){
# Get transaction object for a given data file
#
# Args:
# data.path: data file name location
# columns: transaction id and item id columns.
#
# Returns:
# transaction object
transactions.obj <- read.transactions(file = data.path, format = "single",
sep = ",",
cols = columns,
rm.duplicates = FALSE,
quote = "", skip = 0,
encoding = "unknown")
return(transactions.obj)
}
get.rules <- function(support, confidence, transactions){
# Get Apriori rules for given support and confidence values
#
# Args:
# support: support parameter
# confidence: confidence parameter
# transactions: transaction object
#
# Returns:
# rules object
parameters = list(
support = support,
confidence = confidence,
minlen = 2, # Minimal number of items per item set
maxlen = 10, # Maximal number of items per item set
target = "rules"

)

rules <- apriori(transactions, parameter = parameters)
return(rules)
}
find.rules <- function(transactions, support, confidence, topN = 10){
# Generate and prune the rules for given support confidence value
#
# Args:
# transactions: Transaction object, list of transactions
# support: Minimum support threshold
# confidence: Minimum confidence threshold
# Returns:
# A data frame with the best set of rules and their support and confidence values


# Get rules for given combination of support and confidence
all.rules <- get.rules(support, confidence, transactions)

rules.df <- data.frame(rules = labels(all.rules),
quality(all.rules))

# Compute only the two extra interest measures we need
other.im <- interestMeasure(all.rules, measure = c("conviction", "leverage"),
transactions = transactions)

rules.df <- cbind(rules.df, other.im[,c('conviction','leverage')])


# Keep the best rule based on the interest measure
best.rules.df <- head(rules.df[order(-rules.df$leverage),],topN)

return(best.rules.df)
}
plot.graph <- function(cross.sell.rules){
# Plot the associated items as graph
#
# Args:
# cross.sell.rules: Set of final rules recommended
# Returns:
# None
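# Split each "{lhs} => {rhs}" label on " => " so an item gets the same
# vertex name whether it appears on the left or the right of a rule
edges <- unlist(strsplit(as.character(cross.sell.rules$rules), split = " => ", fixed = TRUE))

# Consecutive (lhs, rhs) pairs become directed edges lhs -> rhs
g <- graph_from_edgelist(matrix(edges, ncol = 2, byrow = TRUE), directed = TRUE)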
plot(g)

}
support <- 0.01
confidence <- 0.2
columns <- c("order_id", "product_id") ## columns of interest in data file
data.path = '../../data/data.csv' ## Path to data file
transactions.obj <- get.txn(data.path, columns) ## create txn object
cross.sell.rules <- find.rules( transactions.obj, support, confidence )
cross.sell.rules$rules <- as.character(cross.sell.rules$rules)
plot.graph(cross.sell.rules)
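To make the measures that find.rules collects concrete without needing arules or the retailer's data file, here is a minimal base-R sketch. It assumes a small, entirely hypothetical list of baskets and the thresholds `min.support` and `min.confidence`, which are illustrative names and values. It brute-forces every single-item rule A => B and computes support, confidence, lift, leverage, and conviction from their definitions. It then filters and ranks the rules by leverage, as find.rules does. This is a plain enumeration for illustration, not the Apriori algorithm that `apriori()` runs.

```r
# Hypothetical toy baskets (not the retailer's data)
baskets <- list(
  c("avocado", "bananas"),
  c("bananas"),
  c("avocado", "bananas", "strawberries"),
  c("strawberries"),
  c("avocado")
)

# Fraction of baskets that contain every item in `set`
supp <- function(set) {
  mean(vapply(baskets, function(b) all(set %in% b), logical(1)))
}

items <- sort(unique(unlist(baskets)))
# All ordered (lhs, rhs) pairs with lhs != rhs, i.e. single-item rules
rule.grid <- expand.grid(lhs = items, rhs = items, stringsAsFactors = FALSE)
rule.grid <- rule.grid[rule.grid$lhs != rule.grid$rhs, ]

rules <- do.call(rbind, lapply(seq_len(nrow(rule.grid)), function(i) {
  a <- rule.grid$lhs[i]
  b <- rule.grid$rhs[i]
  s.ab <- supp(c(a, b))
  s.a <- supp(a)
  s.b <- supp(b)
  conf <- s.ab / s.a
  data.frame(
    rules      = paste0("{", a, "} => {", b, "}"),
    support    = s.ab,
    confidence = conf,
    lift       = conf / s.b,
    leverage   = s.ab - s.a * s.b,
    # Conviction is Inf when confidence is 1 (the rule is never violated)
    conviction = (1 - s.b) / (1 - conf),
    stringsAsFactors = FALSE
  )
}))

# Same pruning idea as find.rules: apply thresholds, then rank by leverage
min.support <- 0.1
min.confidence <- 0.2
kept <- rules[rules$support >= min.support & rules$confidence >= min.confidence, ]
best <- head(kept[order(-kept$leverage), ], 2)
print(best)
```

In this toy data, {avocado} => {bananas} and {bananas} => {avocado} share the same lift and leverage, because both measures are symmetric. Conviction differs between {avocado} => {strawberries} and {strawberries} => {avocado}, because it depends on rule direction.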

After exploring the dataset with different support and confidence values, we set support to 0.01 and confidence to 0.2.

We have written a function called find.rules, which internally calls get.rules. Given the transactions and the support and confidence thresholds, it returns the top N rules ranked by leverage, along with their support, confidence, lift, conviction, and leverage values. We are interested in the top 10 rules. As discussed, we use lift to judge whether the items in a rule are associated, while the rules themselves are ranked by leverage. The following are our top 10 rules:

   rules                                               support    confidence lift     conviction leverage
59 {Organic Hass Avocado} => {Bag of Organic Bananas}  0.03219805 0.3086420  1.900256 1.211498   0.01525399
63 {Organic Strawberries} => {Bag of Organic Bananas}  0.03577562 0.2753304  1.695162 1.155808   0.01467107
64 {Bag of Organic Bananas} => {Organic Strawberries}  0.03577562 0.2202643  1.695162 1.115843   0.01467107
52 {Limes} => {Large Lemon}                            0.01846022 0.2461832  3.221588 1.225209   0.01273006
53 {Large Lemon} => {Limes}                            0.01846022 0.2415730  3.221588 1.219648   0.01273006
51 {Organic Raspberries} => {Bag of Organic Bananas}   0.02318260 0.3410526  2.099802 1.271086   0.01214223
50 {Organic Raspberries} => {Organic Strawberries}     0.02003434 0.2947368  2.268305 1.233671   0.01120205
40 {Organic Yellow Onion} => {Organic Garlic}          0.01431025 0.2525253  4.084830 1.255132   0.01080698
41 {Organic Garlic} => {Organic Yellow Onion}          0.01431025 0.2314815  4.084830 1.227467   0.01080698
58 {Organic Hass Avocado} => {Organic Strawberries}    0.02432742 0.2331962  1.794686 1.134662   0.01077217

The first entry has a lift value of 1.9, indicating that the two products are not independent. This rule has a support of about 3 percent and a confidence of about 31 percent: roughly 31 percent of the transactions that contain an {Organic Hass Avocado} also contain a {Bag of Organic Bananas}. We recommend that the retailer use these two products in his cross-selling campaign. Given the lift value, a transaction that contains an {Organic Hass Avocado} is about 1.9 times as likely as an arbitrary transaction to also contain a {Bag of Organic Bananas}.
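The lift reading can be checked with a little arithmetic on the first row of the table. The following minimal R sketch copies the printed support, confidence, and lift values. It assumes the standard definitions confidence(A => B) = Support(A => B) / Support(A) and lift(A => B) = confidence(A => B) / Support(B), and inverts them to recover the implied individual supports. The variable names are illustrative only.

```r
# Values copied from the first row of the top-10 table
supp.ab <- 0.03219805   # Support({Organic Hass Avocado} => {Bag of Organic Bananas})
conf.ab <- 0.3086420    # confidence of the same rule
lift.ab <- 1.900256     # lift of the same rule

# confidence = Support(A => B) / Support(A), so Support(A) = support / confidence
supp.a <- supp.ab / conf.ab
# lift = confidence / Support(B), so Support(B) = confidence / lift
supp.b <- conf.ab / lift.ab

cat(sprintf("P(avocado)           ~ %.3f\n", supp.a))
cat(sprintf("P(bananas)           ~ %.3f\n", supp.b))
cat(sprintf("P(bananas | avocado) ~ %.3f\n", conf.ab))
cat(sprintf("lift = ratio         ~ %.3f\n", conf.ab / supp.b))
```

About 16 percent of all transactions contain a Bag of Organic Bananas, but about 31 percent of the transactions with an Organic Hass Avocado do. The ratio of the two is the lift of 1.9.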

We have also included two other interest measures: conviction and leverage.

Leverage 

Leverage answers the question: how much more often are A and B sold together than we would expect from their individual sales? With lift, we said that there is a strong association between the {Bag of Organic Bananas} and {Organic Hass Avocado} products. With leverage, we can quantify that association in terms of transactions. A leverage of about 0.0153 means that the two products appear together in about 1.5 percent more transactions than they would if customers bought them independently. That is roughly 1.5 additional joint purchases per 100 transactions. For a given rule A => B:

Leverage(A => B) = Support(A => B) - Support(A)*Support(B)

Leverage measures the difference between how often A and B appear together in the dataset and how often they would be expected to appear together if A and B were statistically independent.
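As a minimal R sketch of the leverage formula, the following reuses the printed values of the first rule. It recovers Support(A) and Support(B) from support, confidence, and lift, under the same standard definitions as before, and then applies Leverage(A => B) = Support(A => B) - Support(A)*Support(B). The variable names are illustrative.

```r
# Values copied from the first row of the top-10 table
supp.ab <- 0.03219805   # Support({Organic Hass Avocado} => {Bag of Organic Bananas})
conf.ab <- 0.3086420    # confidence of the rule
lift.ab <- 1.900256     # lift of the rule

supp.a <- supp.ab / conf.ab   # implied Support({Organic Hass Avocado})
supp.b <- conf.ab / lift.ab   # implied Support({Bag of Organic Bananas})

# Observed joint support minus the joint support expected under independence
leverage <- supp.ab - supp.a * supp.b
cat(sprintf("leverage = %.8f\n", leverage))

# Read as counts: extra joint baskets per 100 transactions versus independence
cat(sprintf("extra joint baskets per 100 transactions ~ %.2f\n", 100 * leverage))
```

The result matches the leverage column of the table. Multiplying by 100 gives the "about 1.5 extra joint purchases per 100 transactions" reading.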

Conviction

Conviction helps us ascertain the direction of a rule. Unlike lift, which is symmetric, conviction is sensitive to rule direction: Conviction(A => B) is generally not the same as Conviction(B => A).

For a rule A => B:

Conviction(A => B) = (1 - Support(B)) / (1 - Confidence(A => B))
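To see the direction sensitivity, the following minimal R sketch evaluates the conviction formula in both directions for the avocado and banana pair. Supports are again recovered from the first row of the table under the standard definitions. The reverse rule, {Bag of Organic Bananas} => {Organic Hass Avocado}, is not printed in the table, so its confidence and conviction here are derived values, not output from find.rules.

```r
# Values copied from the first row of the top-10 table
supp.ab <- 0.03219805   # Support of {Organic Hass Avocado, Bag of Organic Bananas}
conf.ab <- 0.3086420    # Confidence(A => B), A = avocado, B = bananas
lift.ab <- 1.900256     # Lift(A => B)

supp.a <- supp.ab / conf.ab   # implied Support(A)
supp.b <- conf.ab / lift.ab   # implied Support(B)

# Conviction(X => Y) = (1 - Support(Y)) / (1 - Confidence(X => Y))
conviction <- function(conf, supp.rhs) (1 - supp.rhs) / (1 - conf)

conv.ab <- conviction(conf.ab, supp.b)   # A => B
conf.ba <- supp.ab / supp.b              # Confidence(B => A)
conv.ba <- conviction(conf.ba, supp.a)   # B => A

cat(sprintf("Conviction(A => B) = %.4f\n", conv.ab))
cat(sprintf("Confidence(B => A) = %.4f, Conviction(B => A) = %.4f\n", conf.ba, conv.ba))
```

The forward rule has the higher conviction. The derived confidence of the reverse rule comes out just under 0.2, which would be consistent with it not appearing among rules mined at a 0.2 confidence threshold.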

Because conviction is directional, it hints that targeting customers who buy an {Organic Hass Avocado} with a cross-selling offer will yield more sales of the {Bag of Organic Bananas} than the other way round.

Thus, using lift, leverage, and conviction, we have provided our retailer with all the empirical details he needs to design his cross-selling campaign. In our case, we have recommended the top 10 rules to the retailer based on leverage. To present the results more intuitively and to show which items could go together in a cross-selling campaign, a graph visualization of the rules is appropriate.

The plot.graph function visualizes the rules that we shortlisted based on their leverage values. It uses the igraph package to build a directed graph in which each rule A => B becomes an edge from A to B, and then plots that graph.

Our suggestion to the retailer is the largest subgraph, on the left of the plot. The items in that subgraph are good candidates for his cross-selling campaign. Depending on profit margins and other factors, the retailer can now design the campaign using this output.
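The step inside plot.graph that turns rule labels into graph edges can be checked in base R without igraph. The following minimal sketch uses three labels from the table in the "{lhs} => {rhs}" form that arules prints. It shows why the split pattern matters: splitting on "=>" alone leaves stray spaces, so the same item would become two differently named vertices. The variable names are illustrative. A two-column character matrix like `edge.list` is the kind of input that igraph's `graph_from_edgelist()` accepts.

```r
# A few rule labels in the "{lhs} => {rhs}" form shown in the table
rule.labels <- c("{Organic Hass Avocado} => {Bag of Organic Bananas}",
                 "{Organic Strawberries} => {Bag of Organic Bananas}",
                 "{Bag of Organic Bananas} => {Organic Strawberries}")

# Splitting on "=>" alone keeps the surrounding spaces in the item names
loose <- unlist(strsplit(rule.labels, split = "=>", fixed = TRUE))
# Splitting on " => " gives identical names on both sides of a rule
tight <- unlist(strsplit(rule.labels, split = " => ", fixed = TRUE))

# One row per directed edge: column 1 is the lhs, column 2 the rhs
edge.list <- matrix(tight, ncol = 2, byrow = TRUE)
print(edge.list)

print(length(unique(loose)))  # distinct vertex names with the loose split
print(length(unique(tight)))  # distinct vertex names with the tight split
```

With the loose split, the three products produce five distinct names. With the tight split they produce three, so rules that share an item connect into one subgraph.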
