Recommender Systems

A Recommender System aims to find and suggest items of likely interest based on a user's preferences.

Types

There are various types of recommender systems, each with their own pros and cons.


Terms

There are various terms used in this field that we will come across frequently.

| Term | Description |
| --- | --- |
| Long Tail | Business strategy of selling niche products to many customers |
| Cold Start | A new user does not have any available data yet, so we can't make a personalised recommendation |
| Serendipity | Recommending an item that a customer likes, even though it is not sought by them |
| Implicit | Data inferred from e.g. page views, purchases etc. |
| Explicit | Data given directly, e.g. a user's input rating |
| Sparsity | Recommending items in the long tail is hard as there are very few user ratings |

Explicit vs Implicit Data

While it might be good to have explicit data to build a recommender, it comes with its own inherent problems.

  • Very sparse - users tend to be lazy and often don't bother to rate
  • Users may not always tell the truth, e.g. they may be influenced by peer/friend opinions
  • May not be available at all if no system is in place to collect it
  • Watching what the user actually does (e.g. what they view or buy) may be more reliable/accurate than ratings

Hence, we might want to consider using implicit data at times. However, we need to note that it often does not come with a like or dislike scale, but rather just a degree of liking (e.g. the number of views of a page).

We usually treat implicit data as binary, e.g. a purchase = a like. As a result, we cannot use Cosine Similarity (computed over the commonly rated items, it will always be 1), and use Jaccard Similarity instead.
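As a sketch of the idea, Jaccard Similarity compares two users' item sets directly; the users and item IDs below are made up for illustration.

```python
# Jaccard similarity for binary/implicit data: |A ∩ B| / |A ∪ B|.
def jaccard(a: set, b: set) -> float:
    """Similarity between two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical purchase histories of two users.
user_a = {"item1", "item2", "item3"}
user_b = {"item2", "item3", "item4"}

print(jaccard(user_a, user_b))  # 2 shared items out of 4 distinct -> 0.5
```

Note that unlike cosine similarity on co-rated items, Jaccard also penalises users for the items they do *not* share, so it stays informative on unary data.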


Dealing with Big Data

Recommender Systems often involve huge datasets that require preprocessing or manipulation. Below are some ways to achieve this.

  1. User Sampling - e.g. only choose active users
  2. Item Sampling - e.g. cut off "super" long tail items
  3. Item Categorisation
  4. Use a different Algorithm, e.g. Item-CF, instead of User-CF if too many users

Evaluation

Mean Absolute Error (MAE) measures the mean absolute difference between predicted and actual ratings.

Mean Average Precision at K (MAP@K) is calculated by averaging, across users, the Average Precision of the relevant items within the top K recommendations.

Lift is the number of hits of the model in the top K, divided by the number of hits expected from random recommendations.

Mean Percentage Ranking (MPR) measures the average percentile rank of the items a user actually consumed within the recommendation list; lower is better, and MPR < 50% means better than a random guess.
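MAE and MAP@K can be sketched as below; the per-user list layout is an assumption, not a fixed API.

```python
# Sketch of MAE and MAP@K for evaluating a recommender.
def mae(predicted: list, actual: list) -> float:
    """Mean absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def average_precision_at_k(recommended: list, relevant: set, k: int) -> float:
    """Precision accumulated at the rank of each hit in the top-k list."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k)

def map_at_k(all_recommended: list, all_relevant: list, k: int) -> float:
    """MAP@K: mean of the per-user average precision at k."""
    aps = [average_precision_at_k(rec, rel, k)
           for rec, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps)
```

For example, recommending `["a", "b", "c"]` when the relevant items are `{"a", "c"}` gives hits at ranks 1 and 3, so AP@3 = (1/1 + 2/3) / 2 ≈ 0.83.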