In this article, we will look at machine learning-based recommendations using Julia. We will make recommendations using a Julia package called Recommendation.
In order to ensure that your code will produce the same results as described in this article, it is recommended to use the same package versions. Here are the external packages used in this tutorial and their specific versions:
CSV@v0.4.3
DataFrames@v0.15.2
Gadfly@v1.0.1
IJulia@v1.14.1
Recommendation@v0.1.0+
In order to install a specific version of a package you need to run:
pkg> add PackageName@vX.Y.Z
For example:
pkg> add IJulia@v1.14.1
Alternatively, you can install all the packages used by downloading the Project.toml file provided on GitHub and then running pkg> instantiate, as follows:
julia> download("https://raw.githubusercontent.com/PacktPublishing/Julia-Projects/master/Chapter07/Project.toml", "Project.toml")
pkg> activate .
pkg> instantiate
Julia's ecosystem provides access to Recommendation.jl, a package that implements a multitude of algorithms for both personalized and non-personalized recommendations. For model-based recommenders, it has support for SVD, matrix factorization (MF), and content-based recommendations using TF-IDF scoring algorithms.
There's also another very good alternative—the ScikitLearn.jl package (https://github.com/cstjean/ScikitLearn.jl). This implements Python's very popular scikit-learn interface and algorithms in Julia, supporting both models from the Julia ecosystem and those of the scikit-learn library (via PyCall.jl). The Scikit website and documentation can be found at http://scikit-learn.org/stable/. It is very powerful and definitely worth keeping in mind, especially for building highly efficient recommenders for production usage. For learning purposes, we'll stick to Recommendation, as it provides for a simpler implementation.
For our learning example, we'll use Recommendation. It is the simplest of the available options, and it's a good teaching device, as it will allow us to further experiment with its plug-and-play algorithms and configurable model generators.
Before we can do anything interesting, though, we need to make sure that we have the package installed:
pkg> add Recommendation#master
julia> using Recommendation
The workflow for setting up a recommender with Recommendation involves three steps:

1. Setting up the training data
2. Instantiating and building the recommender
3. Making the recommendations
Let's implement these steps.
Recommendation uses a DataAccessor object to set up the training data. This can be instantiated with a set of Event objects. A Recommendation.Event is an object that represents a user-item interaction. It is defined like this:
struct Event
    user::Int
    item::Int
    value::Float64
end
In our case, the user field will represent the UserID, the item field will map to the ISBN, and the value field will store the Rating. However, a bit more work is needed to bring our data into the format required by Recommendation:
What this means is that, for example, we only have 69 users in our dataset (as confirmed by unique(training_data[:UserID]) |> size), with the largest ID being 277,427, while for books we have 9,055 unique ISBNs. If we go with this, Recommendation will create a 277,427 x 9,055 matrix instead of a 69 x 9,055 matrix. This matrix would be very large, sparse, and inefficient.
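A quick back-of-the-envelope check makes the difference concrete (the counts below are the ones quoted above for our dataset):

```julia
# Rough size comparison for the user x book rating matrix.
# Indexing rows by the raw maximum UserID versus by sequential user IDs:
naive_cells   = 277_427 * 9_055   # matrix sized by the largest raw UserID
compact_cells = 69 * 9_055        # matrix sized by the number of unique users

println(naive_cells)    # about 2.5 billion cells
println(compact_cells)  # about 625 thousand cells
```

Over 99.99% of the naive matrix's cells could never hold a rating, since they correspond to user IDs that don't exist in our data.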
Therefore, we'll need to do a bit more data processing to map the original user IDs and the ISBNs to sequential integer IDs, starting from 1.
We'll use two Dict objects that will store the mappings from the UserID and ISBN columns to the recommender's sequential user and book IDs. Each entry will be of the form dict[original_id] = sequential_id:
julia> user_mappings, book_mappings = Dict{Int,Int}(), Dict{String,Int}()
We'll also need two counters to keep track of, and increment, the sequential IDs:
julia> user_counter, book_counter = 0, 0
We can now prepare the Event objects for our training data:
julia> events = Event[]

julia> for row in eachrow(training_data)
           global user_counter, book_counter
           user_id, book_id, rating = row[:UserID], row[:ISBN], row[:Rating]
           haskey(user_mappings, user_id) || (user_mappings[user_id] = (user_counter += 1))
           haskey(book_mappings, book_id) || (book_mappings[book_id] = (book_counter += 1))
           push!(events, Event(user_mappings[user_id], book_mappings[book_id], rating))
       end
This will fill up the events array with instances of Recommendation.Event, which represents a unique UserID, ISBN, and Rating combination. To give you an idea, it will look like this:
julia> events
10005-element Array{Event,1}:
 Event(1, 1, 10.0)
 Event(1, 2, 8.0)
 Event(1, 3, 9.0)
 Event(1, 4, 8.0)
 Event(1, 5, 8.0)
# output omitted #
Now, we are ready to set up our DataAccessor:
julia> da = DataAccessor(events, user_counter, book_counter)
At this point, we have all that we need to instantiate our recommender. A very efficient and common implementation uses MF—unsurprisingly, this is one of the options provided by the Recommendation package, so we'll use it.
The idea behind MF is that, if we're starting with a large sparse matrix like the one used to represent the user x item ratings, then we can represent it as the product of multiple smaller and denser matrices. The challenge is to find these smaller matrices so that their product is as close to our original matrix as possible. Once we have these, we can fill in the blanks in the original matrix so that the predicted values will be consistent with the existing ratings in the matrix:
Our user x books rating matrix can be represented as the product between smaller and denser users and books matrices.
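A toy example makes the idea tangible. The two factor matrices below are illustrative numbers invented for this sketch, not anything Recommendation computes:

```julia
# Toy illustration of low-rank factorization (made-up numbers, not
# Recommendation's internals): a 4 x 5 ratings matrix expressed as the
# product of a 4 x 2 user-factor matrix and a 2 x 5 book-factor matrix.
U = [1.0 0.5;    # 4 users x 2 latent factors
     0.2 1.0;
     0.9 0.1;
     0.4 0.8]
B = [5.0 1.0 4.0 2.0 3.0;   # 2 latent factors x 5 books
     1.0 4.0 2.0 5.0 3.0]

R = U * B   # 4 x 5 matrix of predicted ratings, with every cell filled in
size(R)     # (4, 5)
```

Here, 10 + 8 = 18 stored numbers generate all 20 cells of R; with realistic dimensions (thousands of users and items, a few dozen latent factors), the savings become dramatic, and the product provides a prediction even for user-book pairs that were never rated.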
To perform the matrix factorization, we can use a couple of algorithms, among which the most popular are SVD and Stochastic Gradient Descent (SGD). Recommendation uses SGD to perform matrix factorization.
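To see what SGD-based factorization looks like in principle, here is a minimal sketch. This is an illustration of the general technique only; the function name, hyperparameters, and update rule are our own, and Recommendation's actual implementation differs (for instance, it adds regularization):

```julia
using LinearAlgebra
using Random

# Minimal SGD matrix factorization sketch (illustrative only).
# Learns U (users x k) and B (k x items) so that U * B approximates
# the observed ratings, given as (user, item, rating) tuples.
function sgd_mf(ratings::Vector{Tuple{Int,Int,Float64}},
                n_users::Int, n_items::Int;
                k::Int = 2, learning_rate::Float64 = 0.05,
                max_iter::Int = 2000)
    U = rand(n_users, k) .* 0.1
    B = rand(k, n_items) .* 0.1
    for _ in 1:max_iter
        for (u, i, r) in ratings
            pu, qi = U[u, :], B[:, i]
            err = r - dot(pu, qi)              # error on this observed cell
            U[u, :] = pu .+ learning_rate .* err .* qi   # nudge user factors
            B[:, i] = qi .+ learning_rate .* err .* pu   # nudge item factors
        end
    end
    return U, B
end

Random.seed!(42)
ratings = [(1, 1, 5.0), (1, 2, 3.0), (2, 1, 4.0), (2, 3, 1.0)]
U, B = sgd_mf(ratings, 2, 3)
```

After training, `dot(U[1, :], B[:, 1])` lands close to the observed rating of 5.0, while cells such as `dot(U[1, :], B[:, 3])` hold the model's predictions for unrated pairs.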
The code for this looks as follows:
julia> recommender = MF(da)
julia> build(recommender)
We instantiate a new MF recommender and then we build it—that is, train it. The build step might take a while (a few minutes on a high-end computer using the small dataset that's provided on GitHub).
If we want to tweak the training process, since SGD implements an iterative approach for matrix factorization, we can pass a max_iter argument to the build function, asking it for a maximum number of iterations. The more iterations we do, in theory, the better the recommendations—but the longer it will take to train the model. If you want to speed things up, you can invoke the build function with a max_iter of 30 or less—build(recommender, max_iter = 30).
We can pass another optional argument for the learning rate, for example, build(recommender, learning_rate=15e-4, max_iter=100). The learning rate specifies how aggressively the optimization should adjust the model at each iteration. If the learning rate is too small, the optimization will need many iterations to converge. If it's too large, the optimization might diverge, producing worse results than the previous iterations.
Now that we have successfully built and trained our model, we can ask it for recommendations. These are provided by the recommend function, which takes as its arguments an instance of a recommender, a user ID (from the ones available in the training matrix), the number of recommendations, and an array of book IDs from which to make recommendations:
julia> recommend(recommender, 1, 20, [1:book_counter...])
With this line of code, we retrieve the recommendations for the user with the recommender ID 1, which corresponds to the UserID 277427 in the original dataset. We're asking for up to 20 recommendations that have been picked from all the available books.
We get back an array of a Pair of book IDs and recommendation scores:
20-element Array{Pair{Int64,Float64},1}:
 5081 => 19.1974
 5079 => 19.1948
 5078 => 19.1946
 5077 => 17.1253
 5080 => 17.1246
# output omitted #
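These book IDs are the sequential ones we generated earlier, so to present real ISBNs to a user we need to invert the book_mappings Dict. The sketch below uses two stand-in entries so it runs on its own; in the actual session, the book_mappings built during training would already be in scope:

```julia
# Invert the ISBN -> sequential-ID mapping built earlier so we can
# translate the recommender's book IDs back into ISBNs.
# The two entries below are stand-in sample values for illustration.
book_mappings = Dict("034545104X" => 1, "0155061224" => 2)

isbn_for = Dict(id => isbn for (isbn, id) in book_mappings)

isbn_for[1]  # "034545104X"
```

With the real mappings in scope, the output of recommend can then be translated in one pass, for example `[isbn_for[id] => score for (id, score) in recommend(recommender, 1, 20, [1:book_counter...])]`.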
In this article, we learned how to make recommendations with machine learning in Julia. To learn more about machine learning-based recommendations in Julia, and about testing the model, check out the book Julia Programming Projects.