In the previous post (Machine learning behind the digital revolution), we gave a brief overview of machine learning theory and of various real world applications implemented with machine learning algorithms.
Recommendation systems are one of the most prevalent applications of machine learning. Facebook, YouTube, Amazon and Netflix rely heavily on machine learning algorithms to recommend their products to users and customers. This post introduces two commonly used methods for building recommendation systems: content based recommendation and collaborative filtering.
The implementation of real world recommendation systems may not strictly follow the presented methods. Instead, many companies develop their recommendation systems with their own methods to achieve their business goals. Such an approach is often better suited to capturing their users’ preferences and purchasing habits. However, the fundamental idea behind these systems usually follows one of the techniques explained in this post.
Why are recommendation systems everywhere?
Recommender problems are everywhere. As users, we may notice some product recommendations explicitly, but recommendations can also be implicit: we perceive them as normal product listings without noticing that they are actually tailored to us. Recommendation systems are prevalent simply because we face too many products in our daily lives. We have a large variety of options among movies, books and news articles, so it is hard to find what we really like. When it comes to building recommendation systems, the difficulty originates mainly from the lack of user feedback on the products users have consumed. Collaborative filtering is one way to build recommendation systems that overcomes this lack of user feedback data.
Methods for implementing recommendation systems
When we build recommendation systems, there are in general two ways to formalize the problem. One is content based recommendation and the other is collaborative filtering. There is also an approach that combines both, called a hybrid model. The main difference between content based recommendation and collaborative filtering is the kind of information (data) the machine learning algorithm uses to find the products users might potentially purchase.
Among all available products, we typically know ratings only for the small number that users have rated. This user feedback data, along with other product information (represented in a format called a feature vector), constitutes the training data for building a good recommendation system. Ultimately, our goal is to predict the ratings of all products that users have not yet purchased. If the predicted ratings of non-purchased products are high enough, we can confidently recommend those products to users.
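To make the setup concrete, here is a minimal sketch of how sparse rating data might be stored and how the prediction targets can be listed. The user names, product names and helper functions are hypothetical and only serve as an illustration.

```python
# Hypothetical toy data: known ratings on a 1-5 scale.
# Every user-product pair that is absent is a rating we would like to predict.
ratings = {
    "alice": {"book_a": 5, "book_b": 3},
    "bob": {"book_a": 4, "book_c": 2},
    "carol": {"book_b": 1, "book_c": 5},
}


def all_products(ratings):
    """Return the sorted list of products that appear in any user's ratings."""
    return sorted({p for user_ratings in ratings.values() for p in user_ratings})


def unrated_products(ratings, user):
    """Return the products this user has not rated yet (the prediction targets)."""
    return [p for p in all_products(ratings) if p not in ratings[user]]


def sparsity(ratings):
    """Return the fraction of user-product pairs that have no known rating."""
    total_pairs = len(ratings) * len(all_products(ratings))
    known_pairs = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1 - known_pairs / total_pairs


targets_for_alice = unrated_products(ratings, "alice")
missing_fraction = sparsity(ratings)
```

In real systems the missing fraction is usually far higher than in this toy example, which is exactly why predicting the gaps is the core task.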
Content Based Recommendation
In the content based recommendation approach, the model represents a product as a feature vector. A feature vector contains product attributes (i.e. product characteristics) which describe the product itself. Product attributes can be anything: the name of the product, its size or its color are a few examples. Any characteristic that is relevant for describing a product can be selected or newly created to build the recommendation system. Given this product information, and especially the existing ratings of products, we can build a regression model. In this scenario, the regression model estimates the rating of a product from the values of its attributes.
When we build a regression model, the more features we add to the product’s feature vector, the more sophisticated the preferences we can model. In general, a richer feature vector is considered a good way to make this regression method more successful in prediction. However, it may also cause overfitting. We therefore need to build a model that generalizes well in a real prediction setting instead of fitting only the training data. For that, we need more ratings available during the training stage.
However, getting more data is not always as easy as it sounds. As argued in the paper “The Unreasonable Effectiveness of Data”, co-authored by Peter Norvig, data often matters more than algorithms. In reality, small or medium sized datasets are still very common and it is not easy to acquire a large set of training data. In recommendation problems, the typical obstacle is that there is not enough data describing rated products. In this situation, the problem can be solved by using information from other users who have already rated products. This approach is called collaborative filtering.
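As an illustration of the regression idea, the sketch below fits a linear model for a single user with plain batch gradient descent. The two product features (how much "action" and "romance" a movie contains), the ratings and the learning rate are all made-up assumptions; a real system would use many more features and most likely an existing library.

```python
def predict(weights, bias, features):
    """Estimate a rating as a weighted sum of the product's feature values."""
    return bias + sum(w * x for w, x in zip(weights, features))


def fit_linear_regression(feature_vectors, known_ratings, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean squared error."""
    n = len(feature_vectors)
    d = len(feature_vectors[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for features, rating in zip(feature_vectors, known_ratings):
            error = predict(weights, bias, features) - rating
            for j in range(d):
                grad_w[j] += error * features[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical products one user has already rated: [action, romance] -> rating.
rated_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]
rated_scores = [5.0, 1.0, 3.0, 5.0, 1.0]

weights, bias = fit_linear_regression(rated_features, rated_scores)

# Estimate the rating of a product this user has not seen yet.
predicted = predict(weights, bias, [0.8, 0.2])
```

Because this user's ratings rise with the "action" feature and ignore "romance", the learned weights end up reflecting that preference, and an unseen action-heavy product receives a high predicted rating.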
In collaborative filtering, the unknown product ratings of a user are predicted from the way other consumers have rated those products. Relying on other consumers’ feedback, rather than only on the consumer’s own purchasing information, is the main difference between collaborative filtering and content based recommendation. The basic idea behind collaborative filtering is that the patterns of ratings by other users may contain relevant information about the products. If the prediction model can find consumers who liked similar products in the past, it can “borrow” their known ratings for other products as well.
There are two approaches to collaborative filtering. One approach is to find the nearest neighbors among other consumers in terms of purchasing. This means finding consumers who are similar to the target user and have already rated products, and then using their ratings. Imagine a situation where we don’t have much information about a product except its id. How can we find products that a user is likely to buy in the future? In this situation, a recommendation system based on content based recommendation will not work properly, since there is not enough information to construct relevant product features. To overcome this difficulty we use collaborative filtering, which considers all users instead of focusing on individual users. The key idea is to leverage the purchasing experience of other users. The more users we have, the easier the prediction becomes, since we are more likely to find other users who made similar purchases of certain products.
The prediction of a product purchase is driven by the similarity between users. This approach is called nearest neighbor prediction. The nearest neighbor approach is a useful technique in collaborative filtering, mainly because it is easy to understand and implement. One open question when using it is how to define the similarity measure, since there are different ways to measure the similarity of users.
The nearest neighbor approach solves the problem by focusing on individual users: it visits the users one by one and calculates each user’s similarity with the other users.
However, the nearest neighbor approach doesn’t work well if a given user does not have enough neighbors. Another approach overcomes this issue by considering the whole of the consumer and product data. This method is called matrix factorization. Matrix factorization tries to fill in the missing rating values by applying a regression-style fit to the user-product matrix. Simply put, estimating the missing ratings amounts to finding the model that best represents the relationship between consumer data and product data.
In real world systems, the numbers of products and consumers are often very large. As an example, Amazon and YouTube have millions of users. The feature set required for their regression models can easily run into the millions, which makes computing the parameters of the regression model a big burden. The commonly used approach in these cases is factorization, which reduces the degrees of freedom. “Factorizing” in this context means decomposing the original consumer-product matrix into two smaller matrices. Compared to the original matrix, the decomposed (or factorized) matrices have fewer degrees of freedom when it comes to finding good parameters for the recommendation model. This reduces the computation overhead. The ultimate task of matrix factorization is the same as that of regression models, namely estimating the parameter values of the recommendation model. The estimation is done by iteratively minimizing the error between predicted ratings and actual ratings.
We have briefly talked about two approaches to building recommendation systems. One approach focuses on the rated products and the other uses all the rating information in the user-product matrix. The performance of recommendation systems depends on the ratings that users have already given. However, in the real world there are many users who don’t express their feedback explicitly. Therefore, the interesting part remains how to overcome the missing data by using clever algorithms.
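To illustrate why the choice of similarity measure matters, the sketch below compares two common candidates, cosine similarity and Pearson correlation, computed over the products that both users have rated. The post does not prescribe either measure; the function names and the toy ratings are assumptions made for this example.

```python
from math import sqrt


def co_rated(a, b):
    """Return the products rated by both users."""
    return sorted(set(a) & set(b))


def cosine_similarity(a, b):
    """Cosine of the angle between the two users' co-rated rating vectors."""
    common = co_rated(a, b)
    if not common:
        return 0.0
    dot = sum(a[p] * b[p] for p in common)
    norm_a = sqrt(sum(a[p] ** 2 for p in common))
    norm_b = sqrt(sum(b[p] ** 2 for p in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_similarity(a, b):
    """Pearson correlation of the co-rated ratings (each user's mean removed)."""
    common = co_rated(a, b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[p] for p in common) / len(common)
    mean_b = sum(b[p] for p in common) / len(common)
    cov = sum((a[p] - mean_a) * (b[p] - mean_b) for p in common)
    var_a = sum((a[p] - mean_a) ** 2 for p in common)
    var_b = sum((b[p] - mean_b) ** 2 for p in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical users: the second rates in the same order as the first,
# the third rates in exactly the opposite order.
user_1 = {"p1": 1, "p2": 2, "p3": 3}
user_2 = {"p1": 2, "p2": 4, "p3": 6}
user_3 = {"p1": 3, "p2": 2, "p3": 1}

same_taste_cosine = cosine_similarity(user_1, user_2)
same_taste_pearson = pearson_similarity(user_1, user_2)
opposite_taste_cosine = cosine_similarity(user_1, user_3)
opposite_taste_pearson = pearson_similarity(user_1, user_3)
```

Both measures agree on the first pair, but they disagree on the second: cosine similarity still scores the opposite-taste pair as fairly similar because all ratings are positive, while Pearson correlation treats the two users as opposites.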
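A minimal sketch of this idea, assuming cosine similarity over co-rated products and a similarity-weighted average of the neighbors' ratings (one of several reasonable choices). The data and the `predict_rating` helper are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity computed over the products both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[p] * b[p] for p in common)
    norm_a = sqrt(sum(a[p] ** 2 for p in common))
    norm_b = sqrt(sum(b[p] ** 2 for p in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user, product, k=2):
    """Predict a rating from the k most similar users who rated the product."""
    candidates = []
    for other, other_ratings in ratings.items():
        if other == user or product not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity > 0:
            candidates.append((similarity, other_ratings[product]))
    neighbors = sorted(candidates, reverse=True)[:k]
    if not neighbors:
        return None  # no usable neighbor, so no prediction is possible
    total_similarity = sum(similarity for similarity, _ in neighbors)
    weighted_sum = sum(similarity * rating for similarity, rating in neighbors)
    return weighted_sum / total_similarity


# Hypothetical ratings; alice has not rated p3 yet.
ratings = {
    "alice": {"p1": 5, "p2": 1},
    "bob": {"p1": 5, "p2": 1, "p3": 4},
    "carol": {"p1": 1, "p2": 5, "p3": 2},
    "dave": {"p4": 3},
}

closest_only = predict_rating(ratings, "alice", "p3", k=1)
two_neighbors = predict_rating(ratings, "alice", "p3", k=2)
no_neighbors = predict_rating(ratings, "dave", "p1")
```

With `k=1` the prediction simply copies the rating of the most similar user (bob), while `k=2` blends in carol's rating with a lower weight. For dave, who shares no rated product with anyone, the function returns `None`, which shows the limitation of this approach when a user lacks neighbors.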
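The factorization described above can be sketched with stochastic gradient descent: every user and every product gets a small vector of latent factors, and the vectors are nudged repeatedly to shrink the error on the known ratings. The number of factors, the learning rate, the number of epochs and the toy data are assumptions chosen for illustration, and production systems typically add refinements (such as regularization) that are left out here.

```python
import random


def dot(a, b):
    """Inner product of two equally sized factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def factorize(observed, n_users, n_products, n_factors=2, lr=0.01,
              epochs=2000, seed=0):
    """Learn user and product factor vectors from (user, product, rating) triples."""
    rng = random.Random(seed)
    user_factors = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)]
                    for _ in range(n_users)]
    product_factors = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)]
                       for _ in range(n_products)]
    for _ in range(epochs):
        for u, p, rating in observed:
            error = rating - dot(user_factors[u], product_factors[p])
            for f in range(n_factors):
                uf, pf = user_factors[u][f], product_factors[p][f]
                # Move both factors in the direction that reduces the squared error.
                user_factors[u][f] += lr * error * pf
                product_factors[p][f] += lr * error * uf
    return user_factors, product_factors


def rmse(observed, user_factors, product_factors):
    """Root mean squared error on the known ratings."""
    squared = [(rating - dot(user_factors[u], product_factors[p])) ** 2
               for u, p, rating in observed]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical known ratings: users 0-1 prefer products 0-1, users 2-3 prefer 2-3.
observed = [
    (0, 0, 5), (0, 2, 1), (0, 3, 1),
    (1, 0, 5), (1, 1, 5), (1, 2, 1),
    (2, 0, 1), (2, 1, 1), (2, 2, 5),
    (3, 1, 1), (3, 2, 5), (3, 3, 5),
]

initial = factorize(observed, 4, 4, epochs=0)  # random starting point only
trained = factorize(observed, 4, 4)
error_before = rmse(observed, *initial)
error_after = rmse(observed, *trained)

# Predictions for two ratings that were never observed.
liked_guess = dot(trained[0][0], trained[1][1])     # user 0, product 1
disliked_guess = dot(trained[0][1], trained[1][3])  # user 1, product 3
```

Instead of one parameter per user-product pair, the model only stores `n_factors` numbers per user and per product, which is the reduction in degrees of freedom mentioned above. A missing rating is then estimated as the inner product of the corresponding user and product vectors.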
Latest posts by Soojung Hong
- Collaborative Filtering: the secret behind recommendation systems - September 27, 2017