
How to make machine learning based recommendations using Julia [Tutorial]

  • 08 Feb 2019


In this article, we will look at machine learning based recommendations using Julia. We will make recommendations using a Julia package called 'Recommendation'.

This article is an excerpt from a book written by Adrian Salceanu titled Julia Programming Projects. In this book, you will learn how to build simple-to-advanced applications through examples in Julia Lang 1.x using modern tools.


In order to ensure that your code will produce the same results as described in this article, it is recommended to use the same package versions. Here are the external packages used in this tutorial and their specific versions:

CSV@v0.4.3
DataFrames@v0.15.2
Gadfly@v1.0.1
IJulia@v1.14.1
Recommendation@v0.1.0+

In order to install a specific version of a package you need to run:


pkg> add PackageName@vX.Y.Z

For example:


pkg> add IJulia@v1.14.1

Alternatively, you can install all the packages used here by downloading the Project.toml file provided on GitHub, activating the project, and running pkg> instantiate:


julia> download("https://raw.githubusercontent.com/PacktPublishing/Julia-Projects/master/Chapter07/Project.toml", "Project.toml")
pkg> activate . 
pkg> instantiate


Julia's ecosystem provides access to Recommendation.jl, a package that implements a multitude of algorithms for both personalized and non-personalized recommendations. For model-based recommenders, it has support for SVD, MF, and content-based recommendations using TF-IDF scoring algorithms.

There's also another very good alternative: the ScikitLearn.jl package (https://github.com/cstjean/ScikitLearn.jl). It implements Python's very popular scikit-learn interface and algorithms in Julia, supporting both models from the Julia ecosystem and those of the scikit-learn library (via PyCall.jl). The scikit-learn website and documentation can be found at http://scikit-learn.org/stable/. ScikitLearn.jl is very powerful and definitely worth keeping in mind, especially for building highly efficient recommenders for production usage. For learning purposes, we'll stick to Recommendation, as it allows for a simpler implementation.

Making recommendations with Recommendation


For our learning example, we'll use Recommendation. It is the simplest of the available options, and it's a good teaching device, as it will allow us to further experiment with its plug-and-play algorithms and configurable model generators.

Before we can do anything interesting, though, we need to make sure that we have the package installed:

 pkg> add Recommendation#master  
 julia> using Recommendation

Please note that I'm using the #master version, because the tagged version, at the time of writing this book, was not yet fully updated for Julia 1.0.


The workflow for setting up a recommender with Recommendation involves three steps:

  1. Setting up the training data
  2. Instantiating and training a recommender using one of the available algorithms
  3. Once the training is complete, asking for recommendations


Let's implement these steps.

Setting up the training data


Recommendation uses a DataAccessor object to set up the training data. This can be instantiated with a set of Event objects. A Recommendation.Event is an object that represents a user-item interaction. It is defined like this:

struct Event 
    user::Int 
    item::Int 
    value::Float64 
end


In our case, the user field will represent the UserID, the item field will map to the ISBN, and the value field will store the Rating. However, a bit more work is needed to bring our data into the format required by Recommendation:

  1. First of all, our ISBN data is stored as a string and not as an integer.
  2. Second, internally, Recommendation builds a sparse user x item matrix of the corresponding values and expects sequential IDs to index it. However, our actual user IDs are large numbers, so Recommendation would set up a very large, sparse matrix, with rows running all the way from 1 up to the maximum user ID.


What this means is that, for example, we only have 69 users in our dataset (as confirmed by unique(training_data[:UserID]) |> size), with the largest ID being 277,427, while for books we have 9,055 unique ISBNs. If we used the original IDs as they are, Recommendation would create a 277,427 x 9,055 matrix instead of a 69 x 9,055 matrix. That matrix would be very large, mostly empty, and inefficient.
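To see where the extra rows come from, here's a minimal sketch using Julia's SparseArrays standard library (it is not Recommendation's own code). The ratings are a hypothetical three-row sample. sparse(I, J, V) sizes the matrix from the largest row and column indices it receives, so a single large raw user ID is enough to inflate the row count, while remapping the same users to 1 and 2 keeps the matrix small.

```julia
using SparseArrays

# Hypothetical sample: two users with large, non-sequential IDs rating three books
# (book IDs are already sequential here to keep the example small)
raw_user_ids = [277427, 8, 277427]
book_ids     = [1, 2, 3]
ratings      = [10.0, 8.0, 9.0]

# sparse(I, J, V) takes its size from the largest row and column index,
# so the raw user IDs produce 277,427 rows even though only 2 users exist
R_raw = sparse(raw_user_ids, book_ids, ratings)
println(size(R_raw))       # (277427, 3)

# Remapping 277427 => 1 and 8 => 2 gives a 2 x 3 matrix holding the same 3 ratings
compact_user_ids = [1, 2, 1]
R_compact = sparse(compact_user_ids, book_ids, ratings)
println(size(R_compact))   # (2, 3)

# Cell counts for the dataset sizes quoted above
full_cells    = 277_427 * 9_055   # 2_512_101_485
compact_cells = 69 * 9_055        # 624_795
```

Both matrices store only three values, but every algorithm that walks over rows or allocates per-user factors pays for the full row count, which is why the remapping matters.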

Therefore, we'll need to do a bit more data processing to map the original user IDs and the ISBNs to sequential integer IDs, starting from 1.

We'll use two Dict objects that will store the mappings from the UserID and ISBN columns to the recommender's sequential user and book IDs. Each entry will be of the form dict[original_id] = sequential_id:

julia> user_mappings, book_mappings = Dict{Int,Int}(), Dict{String,Int}()


We'll also need two counters to keep track of, and increment, the sequential IDs:

julia> user_counter, book_counter = 0, 0


We can now prepare the Event objects for our training data:

julia> events = Event[]
julia> for row in eachrow(training_data)
           global user_counter, book_counter
           user_id, book_id, rating = row[:UserID], row[:ISBN], row[:Rating]
           haskey(user_mappings, user_id) || (user_mappings[user_id] = (user_counter += 1))
           haskey(book_mappings, book_id) || (book_mappings[book_id] = (book_counter += 1))
           push!(events, Event(user_mappings[user_id], book_mappings[book_id], rating))
       end


This will fill up the events array with instances of Recommendation.Event, each representing a unique UserID, ISBN, and Rating combination. To give you an idea, it will look like this:

julia> events 
10005-element Array{Event,1}: 
 Event(1, 1, 10.0) 
 Event(1, 2, 8.0) 
 Event(1, 3, 9.0) 
 Event(1, 4, 8.0) 
 Event(1, 5, 8.0) 
 # output omitted #

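Please remember this very important aspect: in Julia, the for loop introduces a new local scope. Variables defined outside the loop can still be read inside it, but an assignment such as user_counter += 1 in the loop's body would refer to a new local variable rather than to the global one. To update the global counters from within the loop, we need to declare them as global. (Julia 1.5 and later relax this rule for code typed directly into the REPL.)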
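If you'd rather avoid global declarations altogether, one option is to do the mapping inside a function, where the dictionaries are ordinary local variables. The sketch below uses a hypothetical helper, build_events, fed with a small list of (UserID, ISBN, Rating) tuples instead of the DataFrame, and it defines a stand-in Event struct with the same fields as Recommendation.Event so that it runs without the package. It also swaps the counters for get!, which stores length(dict) + 1 only when a key is new, so each dictionary's length doubles as its counter.

```julia
# Stand-in with the same fields as Recommendation.Event, so the sketch runs without the package
struct Event
    user::Int
    item::Int
    value::Float64
end

# Hypothetical helper: map raw user IDs and ISBN strings to sequential IDs starting from 1
function build_events(rows)
    user_mappings = Dict{Int,Int}()
    book_mappings = Dict{String,Int}()
    events = Event[]
    for (user_id, isbn, rating) in rows
        # get! stores length(dict) + 1 only if the key is not there yet,
        # otherwise it returns the ID assigned earlier
        u = get!(user_mappings, user_id, length(user_mappings) + 1)
        b = get!(book_mappings, isbn, length(book_mappings) + 1)
        push!(events, Event(u, b, rating))   # the Int rating is converted to Float64
    end
    return events, user_mappings, book_mappings
end

# Hypothetical sample rows: (UserID, ISBN, Rating); the ISBNs are placeholders
rows = [(277427, "ISBN-A", 10), (277427, "ISBN-B", 8), (8, "ISBN-A", 7)]
events, user_mappings, book_mappings = build_events(rows)
# events == [Event(1, 1, 10.0), Event(1, 2, 8.0), Event(2, 1, 7.0)]
```

With the real data you could pass it the three columns zipped together, for example zip(training_data[:UserID], training_data[:ISBN], training_data[:Rating]), and use length(user_mappings) and length(book_mappings) where the loop above uses user_counter and book_counter.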


Now, we are ready to set up our DataAccessor:

julia> da = DataAccessor(events, user_counter, book_counter)
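As a rough mental model (an assumption for illustration, not a description of how DataAccessor works internally), the events describe a user_counter x book_counter matrix in which each Event fills one cell. The sketch below builds such a matrix with SparseArrays from a few hypothetical events, again using a stand-in Event struct. Unrated cells read back as zeros here, which is why matrix factorization only fits the cells that actually hold ratings.

```julia
using SparseArrays

# Stand-in with the same fields as Recommendation.Event
struct Event
    user::Int
    item::Int
    value::Float64
end

# Hypothetical events that already use sequential IDs
events = [Event(1, 1, 10.0), Event(1, 2, 8.0), Event(2, 1, 7.0)]
user_counter, book_counter = 2, 2

# Passing the dimensions explicitly fixes the matrix at users x books,
# no matter which IDs happen to appear in the events
R = sparse([e.user for e in events], [e.item for e in events],
           [e.value for e in events], user_counter, book_counter)

println(size(R))   # (2, 2)
println(R[2, 2])   # 0.0, user 2 has not rated book 2
```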

Building and training the recommender


At this point, we have all that we need to instantiate our recommender. A very efficient and common implementation uses matrix factorization (MF); unsurprisingly, this is one of the options provided by the Recommendation package, so we'll use it.

Matrix Factorization


The idea behind MF is that, if we're starting with a large sparse matrix like the one used to represent user x book ratings, then we can approximate it as the product of smaller and denser matrices. The challenge is to find these smaller matrices so that their product is as close as possible to the known ratings in our original matrix. Once we have them, we can fill in the blanks in the original matrix so that the predicted values are consistent with the existing ratings:

[Figure: the user x books rating matrix represented as the product of smaller, denser users and books matrices]
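One common way to write this idea down (standard textbook notation, not taken from the Recommendation documentation): with m users, n books, and k latent factors, the rating matrix R is approximated by a user matrix P and a book matrix Q, where p_u is row u of P and q_i is row i of Q. A predicted rating is the dot product of a user row and a book row, and the factors are chosen to minimize the squared error over the set K of known ratings only:

```latex
R \approx P Q^{\top}, \qquad P \in \mathbb{R}^{m \times k}, \quad Q \in \mathbb{R}^{n \times k}, \quad k \ll \min(m, n)

\hat{r}_{ui} = \mathbf{p}_u^{\top} \mathbf{q}_i

\min_{P,\,Q} \sum_{(u,i) \in \mathcal{K}} \bigl( r_{ui} - \mathbf{p}_u^{\top} \mathbf{q}_i \bigr)^2
```

Because the sum runs over known ratings only, the empty cells play no part in training; they are exactly the blanks the product P Q^T fills in afterwards.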


To perform the matrix factorization, we can use a couple of algorithms, among which the most popular are SVD and Stochastic Gradient Descent (SGD). Recommendation uses SGD to perform matrix factorization.
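In textbook form, with r_ui a known rating of book i by user u, p_u and q_i the corresponding user and book factor vectors, and the goal of minimizing the squared prediction error on known ratings, a plain SGD step picks one known rating at a time and moves both factor vectors against the gradient. Here η is the learning rate and the factor of 2 from differentiating the square is folded into η; many implementations also add a regularization term, and this form is not claimed to match Recommendation's exact update:

```latex
e_{ui} = r_{ui} - \mathbf{p}_u^{\top} \mathbf{q}_i

\mathbf{p}_u \leftarrow \mathbf{p}_u + \eta \, e_{ui} \, \mathbf{q}_i

\mathbf{q}_i \leftarrow \mathbf{q}_i + \eta \, e_{ui} \, \mathbf{p}_u
```

Both updates use the values of p_u and q_i from before the step, so the order in which they are written does not matter.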

The code for this looks as follows:

julia> recommender = MF(da) 
julia> build(recommender)


We instantiate a new MF recommender and then we build it—that is, train it. The build step might take a while (a few minutes on a high-end computer using the small dataset that's provided on GitHub).

Since SGD is an iterative approach to matrix factorization, we can tweak the training process by passing a max_iter argument to the build function, which sets the maximum number of iterations. In theory, the more iterations we do, the better the recommendations, but the longer it will take to train the model. If you want to speed things up, you can invoke the build function with a max_iter of 30 or less: build(recommender, max_iter = 30).

We can also pass an optional argument for the learning rate, for example, build(recommender, learning_rate=15e-4, max_iter=100). The learning rate controls how large a step the optimization takes at each iteration. If the learning rate is too small, the optimization will need many more iterations to converge. If it's too big, the optimization might overshoot and fail to converge, generating worse results than the previous iterations.
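To make max_iter and learning_rate more concrete, here's a minimal, self-contained sketch of matrix factorization trained with plain SGD, using only Julia's standard libraries. mf_sgd and rmse are hypothetical helpers, the six ratings are made-up sample data with sequential IDs, and the keyword names simply mirror the build arguments above; this is not Recommendation's implementation. Each iteration visits the known ratings in random order and applies the textbook SGD update to one user row and one book row.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not Recommendation's code): factorize the known ratings
# into user factors P (n_users x k) and book factors Q (n_items x k) with plain SGD
function mf_sgd(ratings, n_users, n_items; k = 2, learning_rate = 0.01,
                max_iter = 1000, rng = MersenneTwister(42))
    P = 0.1 .* rand(rng, n_users, k)   # small random starting values
    Q = 0.1 .* rand(rng, n_items, k)
    for _ in 1:max_iter                          # one pass over the data per iteration
        for (u, i, r) in shuffle(rng, ratings)   # visit known ratings in random order
            pu = P[u, :]                         # copy of the old user row
            err = r - dot(pu, Q[i, :])           # error of the current prediction
            P[u, :] .+= learning_rate .* err .* Q[i, :]
            Q[i, :] .+= learning_rate .* err .* pu
        end
    end
    return P, Q
end

# Root-mean-square error over the known ratings only
rmse(P, Q, ratings) =
    sqrt(sum((r - dot(P[u, :], Q[i, :]))^2 for (u, i, r) in ratings) / length(ratings))

# Hypothetical sample: (user, book, rating) triples with sequential IDs
ratings = [(1, 1, 5.0), (1, 2, 3.0), (2, 1, 4.0), (2, 3, 1.0), (3, 2, 4.0), (3, 3, 2.0)]

P0, Q0 = mf_sgd(ratings, 3, 3; max_iter = 0)   # untrained starting factors
P, Q = mf_sgd(ratings, 3, 3; max_iter = 2000)
println(rmse(P0, Q0, ratings), " -> ", rmse(P, Q, ratings))
```

Calling mf_sgd with max_iter = 0 returns the random starting factors, so comparing the two RMSE values shows how much training helped. A larger learning_rate speeds up early progress, but with values that are too large the errors grow instead of shrinking, which is the failure mode described above.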

Making recommendations


Now that we have successfully built and trained our model, we can ask it for recommendations. These are provided by the recommend function, which takes as its arguments an instance of a recommender, a user ID (from the ones available in the training matrix), the number of recommendations, and an array of book IDs from which to make recommendations:

julia> recommend(recommender, 1, 20, [1:book_counter...])


With this line of code, we retrieve the recommendations for the user with the recommender ID 1, which corresponds to the UserID 277427 in the original dataset. We're asking for up to 20 recommendations that have been picked from all the available books.

We get back an array of Pairs, each mapping a book ID to its recommendation score:

20-element Array{Pair{Int64,Float64},1}: 
 5081 => 19.1974 
 5079 => 19.1948 
 5078 => 19.1946 
 5077 => 17.1253 
 5080 => 17.1246 
 # output omitted #
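Conceptually, a natural score for an MF model is the predicted rating, that is, the dot product of the user's and the book's factor vectors, and recommending means sorting the candidate books by that score. The sketch below illustrates this with made-up factor matrices and a hypothetical top_n helper; it is not the package's recommend function, whose exact scoring is not covered here. It also shows one way to translate the returned sequential book IDs back into ISBNs by inverting a book_mappings dictionary, here filled with placeholder ISBN strings.

```julia
using LinearAlgebra

# Hypothetical helper: score every candidate book for one user and keep the n best
function top_n(P, Q, user, n, candidates)
    scores = [item => dot(P[user, :], Q[item, :]) for item in candidates]
    sort!(scores; by = last, rev = true)   # highest predicted score first
    return scores[1:min(n, length(scores))]
end

# Hypothetical trained factors: 2 users x 2 factors, 3 books x 2 factors
P = [1.0 0.0; 0.0 1.0]
Q = [0.5 0.0; 2.0 0.0; 1.0 0.0]
recs = top_n(P, Q, 1, 2, 1:3)   # [2 => 2.0, 3 => 1.0]

# The returned IDs are the sequential ones; invert the mapping to recover the ISBNs
book_mappings = Dict("ISBN-A" => 1, "ISBN-B" => 2, "ISBN-C" => 3)
reverse_books = Dict(id => isbn for (isbn, id) in book_mappings)
isbn_recs = [reverse_books[id] => score for (id, score) in recs]
# ["ISBN-B" => 2.0, "ISBN-C" => 1.0]
```

The same inversion applied to user_mappings recovers the original UserID behind a recommender user ID.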


In this article, we learned how to make recommendations with machine learning in Julia. To learn more about machine learning recommendations in Julia and about testing the model, check out the book Julia Programming Projects.

YouTube to reduce recommendations of ‘conspiracy theory’ videos that misinform users in the US

How to Build a music recommendation system with PageRank Algorithm

How to build a cold-start friendly content-based recommender using Apache Spark SQL