Recommender Systems

A Recommender System aims to find and suggest items of likely interest based on a user's preferences.

Types

There are various types of recommender systems, each with their own pros and cons.


Terms

There are various terms used in this field that we will come across frequently.

| Term | Description |
| --- | --- |
| Long Tail | Business strategy of selling niche products to many customers |
| Cold Start | A new user does not have any available data yet, so we can't make a personalised recommendation |
| Serendipity | Recommending an item that a customer likes, even though it is not sought by them |
| Implicit | Data inferred from e.g. page views, purchases etc. |
| Explicit | Data given directly, e.g. a user's input rating |
| Sparsity | Recommending items in the long tail is hard as there are very few user ratings |

Explicit vs Implicit Data

While it might be good to have explicit data to build a recommender, it comes with its own inherent problems.

  • Very sparse - users tend to be lazy and often don't bother to rate
  • Users may not always tell the truth, e.g. they may be influenced by peer/friend opinions
  • May not be available at all if no system is in place to collect it
  • Watching what the user actually does (e.g. what they view or buy) may be more reliable/accurate than ratings

Hence, we might want to consider using implicit data at times. However, we need to note that it often does not come with a like or dislike scale, but rather just a degree of liking (e.g. the number of views of a page).

We usually treat implicit data as binary, e.g. a purchase = a like. As a result, we cannot use Cosine Similarity (computed over the commonly rated items, it will always be 1), and use Jaccard Similarity instead.
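As a sketch of the idea, Jaccard Similarity compares two users' item sets directly; the users and item IDs below are made up for illustration.

```python
# Jaccard similarity for binary/implicit data: |A ∩ B| / |A ∪ B|.
def jaccard(a: set, b: set) -> float:
    """Similarity between two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical purchase histories of two users.
user_a = {"item1", "item2", "item3"}
user_b = {"item2", "item3", "item4"}

print(jaccard(user_a, user_b))  # 2 shared items out of 4 distinct -> 0.5
```

Note that unlike cosine similarity on co-rated items, Jaccard also penalises users for the items they do *not* share, so it stays informative on unary data.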


Dealing with Big Data

Recommender Systems often involve huge datasets that require preprocessing or manipulation. Below are some ways to achieve this.

  1. User Sampling - e.g. only choose active users
  2. Item Sampling - e.g. cut off "super" long tail items
  3. Item Categorisation
  4. Use a different Algorithm, e.g. Item-CF, instead of User-CF if too many users

Evaluation

Mean Absolute Error (MAE) measures the mean absolute difference between predicted and actual ratings.

Mean Average Precision at K (MAP@K) is calculated by averaging, across users, the Average Precision of the relevant items within the top K recommendations.

Lift is the number of hits of the model in the top K, divided by the number of hits expected from random recommendations.

Mean Percentage Ranking (MPR) measures the average percentile rank of the items a user actually consumed within the recommendation list; lower is better, and MPR < 50% means better than a random guess.
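MAE and MAP@K can be sketched as below; the per-user list layout is an assumption, not a fixed API.

```python
# Sketch of MAE and MAP@K for evaluating a recommender.
def mae(predicted: list, actual: list) -> float:
    """Mean absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def average_precision_at_k(recommended: list, relevant: set, k: int) -> float:
    """Precision accumulated at the rank of each hit in the top-k list."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k)

def map_at_k(all_recommended: list, all_relevant: list, k: int) -> float:
    """MAP@K: mean of the per-user average precision at k."""
    aps = [average_precision_at_k(rec, rel, k)
           for rec, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps)
```

For example, recommending `["a", "b", "c"]` when the relevant items are `{"a", "c"}` gives hits at ranks 1 and 3, so AP@3 = (1/1 + 2/3) / 2 ≈ 0.83.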