Build Your Own Movie Recommender System Using BERT4Rec

Step-by-Step implementation of a Transformer-based recommender system in PyTorch

Recommendation algorithms are a core part of many services that we use every day, from video recommendations on YouTube to shopping items on Amazon, not to mention Netflix. In this post, we will implement a simple but powerful recommendation system called BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. We will apply this model to movie recommendations on a dataset of around 60,000 movies.

The Task

Our objective is to recommend movies to users given the history of movies they have already watched. This recommendation is learned directly from the data and is personalized for each user.


The Data

We will use the MovieLens-25m dataset (https://grouplens.org/datasets/movielens/25m/). It is a dataset that logs interactions between 162,541 users and 62,423 movies.

For each user, we can construct the time-sorted sequence of movies that they interacted with. We will use these sequences to train our recommendation system.
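To make the sequence construction concrete, here is a minimal sketch in plain Python. It assumes rows in the (userId, movieId, rating, timestamp) layout of the dataset's ratings.csv file; the toy rows and the helper name build_sequences are made up for illustration and are not taken from the original implementation.

```python
from collections import defaultdict

# Toy rows in the (userId, movieId, rating, timestamp) layout of ratings.csv.
# The values are illustrative, not read from the real file.
ratings = [
    (1, 296, 5.0, 1147880044),
    (1, 306, 3.5, 1147868817),
    (2, 1, 3.5, 1141415820),
    (1, 307, 5.0, 1147868828),
    (2, 62, 0.5, 1141417130),
]


def build_sequences(rows):
    """Group interactions by user and order each group by timestamp."""
    by_user = defaultdict(list)
    for user_id, movie_id, _rating, timestamp in rows:
        # Store the timestamp first so that sorting the tuples sorts by time.
        by_user[user_id].append((timestamp, movie_id))
    return {
        user_id: [movie_id for _ts, movie_id in sorted(events)]
        for user_id, events in by_user.items()
    }


sequences = build_sequences(ratings)
print(sequences)
```

On the full dataset, the same grouping and sorting would more likely be done with a dataframe library, but the logic is the same: one list of movie ids per user, ordered from oldest to most recent interaction.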

The Model

BERT4Rec is a lot like regular BERT for NLP. It is a Transformer network that is trained to predict “masked” movies from a user’s history.

The first step is to construct the user’s history in the form of a time-sorted list of movies.


Some of those movies are then replaced by a [MASK] token.
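The masking step can be sketched as follows. This is only an illustration under stated assumptions: the reserved id 0 for [MASK], the masking probability, and the function name mask_sequence are hypothetical choices, and the actual masking scheme used for training may differ.

```python
import random

# Hypothetical id reserved for the [MASK] token; assumes real movie ids start at 1.
MASK = 0


def mask_sequence(sequence, mask_prob=0.2, rng=None):
    """Return (masked_sequence, labels) for one user history.

    labels holds the original movie id at masked positions and None
    elsewhere, so that a loss would only be computed on hidden movies.
    mask_prob is an assumed value, not one taken from the article.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for movie_id in sequence:
        if rng.random() < mask_prob:
            masked.append(MASK)       # hide the movie from the model
            labels.append(movie_id)   # remember what it should predict
        else:
            masked.append(movie_id)
            labels.append(None)
    return masked, labels


history = [306, 307, 296, 1, 62]
masked, labels = mask_sequence(history, mask_prob=0.4, rng=random.Random(0))
print(masked)
print(labels)
```

Keeping the labels aligned with the input positions makes it straightforward to compare the model's predictions against the original movies only where a [MASK] was inserted.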


The BERT4Rec model is then trained to predict the correct values of th