
Build Your Own Movie Recommender System Using BERT4Rec

Step-by-Step implementation of a Transformer-based recommender system in PyTorch

Recommendation algorithms are a core part of many services that we use every day, from video recommendations on YouTube to shopping items on Amazon, not to mention Netflix. In this post, we will implement a simple but powerful recommendation system called BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. We will apply this model to movie recommendations on a database of around 60,000 movies.

The Task

Our objective is to recommend movies to users given the history of the movies they already watched in the past. This recommendation is learned directly from the data and is personalized for each user.

The Data

We will use the MovieLens-25m dataset. It logs interactions between 162541 users and 62423 movies.

For each user, we can construct the time-sorted sequence of movies that they interacted with. We will use these sequences to train our recommendation system.
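As a rough illustration of this step, here is a minimal sketch that groups interaction rows by user and sorts them by time. The function name `build_user_sequences`, the `(user_id, movie_id, timestamp)` row layout, and the toy rows are assumptions made for this example, not part of the original code.

```python
from collections import defaultdict


def build_user_sequences(interactions):
    """Group (user_id, movie_id, timestamp) rows into time-sorted movie lists."""
    by_user = defaultdict(list)
    for user_id, movie_id, timestamp in interactions:
        by_user[user_id].append((timestamp, movie_id))
    # Sort each user's events by timestamp, then keep only the movie ids.
    return {
        user_id: [movie_id for _, movie_id in sorted(events)]
        for user_id, events in by_user.items()
    }


# Toy rows invented for illustration (not taken from MovieLens).
rows = [
    (1, 10, 300),
    (1, 20, 100),
    (2, 30, 50),
    (1, 30, 200),
]
sequences = build_user_sequences(rows)
print(sequences)
```

Each value in `sequences` is one user's history, oldest movie first, which is the form the model consumes.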

The Model

BERT4Rec is a lot like regular BERT for NLP. It is a Transformer network that is trained to predict “masked” movies from a user’s history.

The first step is to construct the user’s history in the form of a time-sorted list of movies.

Some of those movies are replaced by a token [MASK].

The BERT4Rec model is then trained to predict the correct values of the [MASK] items. By doing this, the model learns useful representations for each movie as well as important patterns that exist between movies.

Then, for inference, we can just add a [MASK] at the end of a user’s sequence to predict the movie that they will most likely want to watch in the future.
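The masking step could look roughly like the sketch below. The special ids `PAD = 0` and `MASK = 1`, the masking probability, and the helper name `mask_sequence` are assumptions chosen for illustration; the original post does not specify them.

```python
import random

PAD, MASK = 0, 1  # assumed special ids; real movie ids would start at 2


def mask_sequence(items, mask_prob=0.2, rng=random):
    """Return (model_input, labels); labels are PAD where nothing was masked."""
    model_input, labels = [], []
    for item in items:
        if rng.random() < mask_prob:
            model_input.append(MASK)
            labels.append(item)  # the model must recover this movie
        else:
            model_input.append(item)
            labels.append(PAD)  # this position carries no training signal
    return model_input, labels


history = [12, 45, 7, 300, 89, 51]
masked, labels = mask_sequence(history, mask_prob=0.3, rng=random.Random(0))
print(masked)
print(labels)
```

Only the positions where `labels` holds a real movie id contribute to training; everywhere else the model simply sees the original movie as context.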
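A minimal sketch of this inference step is shown below. The `MASK` id, the `max_len` truncation, and the choice to filter out already-watched movies are assumptions for this example; the scores dictionary is a stand-in for what a trained model would output at the [MASK] position.

```python
import heapq

MASK = 1  # assumed id of the [MASK] token


def build_inference_input(history, max_len=5):
    """Append [MASK] and keep only the most recent max_len positions."""
    return (history + [MASK])[-max_len:]


def top_k_unseen(scores, history, k=3):
    """scores maps movie id -> score predicted at the [MASK] position."""
    seen = set(history)
    candidates = {m: s for m, s in scores.items() if m not in seen}
    return heapq.nlargest(k, candidates, key=candidates.get)


history = [12, 45, 7, 300, 89]
model_input = build_inference_input(history, max_len=5)
# Stand-in scores; in practice they come from the trained model.
scores = {12: 0.9, 45: 0.1, 500: 0.7, 501: 0.2, 502: 0.8}
recommended = top_k_unseen(scores, history, k=2)
print(model_input, recommended)
```

Here movie 12 has the highest score but is dropped because the user already watched it, so the two recommendations are the best-scoring unseen movies.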

BERT4Rec is a regular Transformer architecture like the one used in NLP:

Transformer Layer

Each movie in the sequence is mapped to an embedding vector.

src_items = self.item_embeddings(src_items)

Then, the self-attention is what allows this architecture to model long-range dependencies between elements of the input sequence.

The order is modeled by position embeddings, where we learn a “position vector” at each time step.

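batch_size, in_sequence_len = src_items.size(0), src_items.size(1)
# Position ids 0..in_sequence_len-1, repeated for every sequence in the batch
pos_encoder = (
    torch.arange(0, in_sequence_len, device=src_items.device)
    .repeat(batch_size, 1)
)
# Look up one learned vector per position
pos_encoder = self.input_pos_embedding(pos_encoder)

src_items += pos_encoder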
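To see how these pieces could fit together, here is a small self-contained sketch of an encoder that sums item and position embeddings, applies a Transformer encoder, and outputs one score per item at each time step. The class name `TinySequenceEncoder`, all layer sizes, and the use of `nn.TransformerEncoder` are assumptions for illustration; only the `item_embeddings` and `input_pos_embedding` names come from the snippets above.

```python
import torch
from torch import nn


class TinySequenceEncoder(nn.Module):
    """Hypothetical minimal encoder: item + position embeddings -> Transformer -> scores."""

    def __init__(self, vocab_size, max_len=32, dim=16, heads=2, layers=1):
        super().__init__()
        self.item_embeddings = nn.Embedding(vocab_size, dim)
        self.input_pos_embedding = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.output = nn.Linear(dim, vocab_size)

    def forward(self, src_items):
        batch_size, seq_len = src_items.size(0), src_items.size(1)
        positions = torch.arange(seq_len, device=src_items.device).repeat(batch_size, 1)
        x = self.item_embeddings(src_items) + self.input_pos_embedding(positions)
        x = self.encoder(x)  # no causal mask: every position attends to all others
        return self.output(x)  # shape: (batch, seq_len, vocab_size)


model = TinySequenceEncoder(vocab_size=100)
batch = torch.randint(0, 100, (4, 10))  # 4 random sequences of 10 item ids
scores = model(batch)
print(scores.shape)
```

Because no attention mask is passed, each position can look at movies both before and after it, which is what makes the encoder bidirectional.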

Finally, at each time step, the model outputs prediction scores for each possible option from the pool of 62423 movies. We use those scores to optimize the categorical cross-entropy loss.
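To make the loss concrete, the sketch below computes categorical cross-entropy in plain Python, averaged over masked positions only. Restricting the loss to masked positions and using `PAD = 0` as the "not masked" label are assumptions of this example. In PyTorch, the same effect is usually obtained by passing `ignore_index` to `torch.nn.functional.cross_entropy`.

```python
import math

PAD = 0  # assumed label for positions that were not masked


def masked_cross_entropy(scores, labels):
    """Average -log softmax(scores)[label] over positions whose label is not PAD."""
    total, count = 0.0, 0
    for row, label in zip(scores, labels):
        if label == PAD:
            continue  # unmasked position: contributes no loss
        top = max(row)  # subtract the max for numerical stability
        log_norm = top + math.log(sum(math.exp(s - top) for s in row))
        total += log_norm - row[label]
        count += 1
    return total / count if count else 0.0


# Three positions, four candidate ids (0 = PAD); only the middle one was masked.
scores = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
    [5.0, 1.0, 1.0, 1.0],
]
labels = [PAD, 2, PAD]
loss = masked_cross_entropy(scores, labels)
print(loss)
```

With uniform scores over four candidates, the loss at the masked position is log(4), the value expected from a model that has not learned anything yet.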

Training logs

Model Usage Examples

Now we will use our trained model to make recommendations based on three scenarios:

Scenario 1: Adventure/Fantasy

User History ->

  • Harry Potter and the Sorcerer’s Stone (a.k.a. Harry Potter and the Philosopher’s Stone) (2001)

  • Harry Potter and the Chamber of Secrets (2002)

  • Harry Potter and the Prisoner of Azkaban (2004)

  • Harry Potter and the Goblet of Fire (2005)

Model recommendation ->

['Ice Age (2002)',
 "Pirates of the Caribbean: Dead Man's Chest (2006)",
 'Avatar (2009)',
 'Star Wars: Episode III - Revenge of the Sith (2005)',
 'Shrek 2 (2004)',
 'Ratatouille (2007)',
 'Bruce Almighty (2003)',
 'I, Robot (2004)',
 'Last Samurai, The (2003)',
 'Up (2009)',
 'Matrix Revolutions, The (2003)',
 'Men in Black II (a.k.a. MIIB) (a.k.a. MIB 2) (2002)',
 'Iron Man (2008)',
 'Spirited Away (Sen to Chihiro no kamikakushi) (2001)',
 '300 (2007)',
 'Big Fish (2003)',
 "Bridget Jones's Diary (2001)",
 'My Big Fat Greek Wedding (2002)',
 'Pianist, The (2002)',
 'Interstellar (2014)',
 'Shaun of the Dead (2004)',
 'Moulin Rouge (2001)',
 'Juno (2007)',
 'WALL·E (2008)',
 'Casino Royale (2006)',
 'School of Rock (2003)',
 '40-Year-Old Virgin, The (2005)',
 'Harry Potter and the Order of the Phoenix (2007)',
 'Bourne Supremacy, The (2004)',
 'Miss Congeniality (2000)']

We can see that the model makes some interesting recommendations in the Adventure/Fantasy genre. Note that the model does not have access to the genre of movies.

Scenario 2: Action/Adventure

User History ->

  • Black Panther (2017)

  • Avengers, The (2012)

  • Avengers: Infinity War - Part I (2018)

  • Logan (2017)

  • Spider-Man (2002)

  • Spider-Man 3 (2007)

  • Spider-Man: Far from Home (2019)

Model recommendation ->

['Avengers: Infinity War - Part II (2019)',
 'Deadpool 2 (2018)',
 'Thor: Ragnarok (2017)',
 'Spider-Man: Into the Spider-Verse (2018)',
 'Captain Marvel (2018)',
 'Incredibles 2 (2018)',
 'Untitled Spider-Man Reboot (2017)',
 'Ant-Man and the Wasp (2018)',
 'Guardians of the Galaxy 2 (2017)',
 'Iron Man 2 (2010)',
 'Thor (2011)',
 'Guardians of the Galaxy (2014)',
 'Captain America: The First Avenger (2011)',
 'X-Men Origins: Wolverine (2009)',
 "Ocean's 8 (2018)",
 'Wonder Woman (2017)',
 'Iron Man 3 (2013)',
 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)',
 'Amazing Spider-Man, The (2012)',
 'Aquaman (2018)',
 'Dark Knight, The (2008)',
 'Mission: Impossible - Fallout (2018)',
 'Avengers: Age of Ultron (2015)',
 'Jurassic World: Fallen Kingdom (2018)',
 'Iron Man (2008)',
 'Coco (2017)',
 'Lord of the Rings: The Two Towers, The (2002)',
 'Rogue One: A Star Wars Story (2016)',
 'X-Men: The Last Stand (2006)',
 'Venom (2018)']

The recommendations are spot on! Most of them are from the Marvel universe, just like the user’s history.

Scenario 3: Comedy

User History ->

  • Zootopia (2016)

  • Toy Story 3 (2010)

  • Toy Story 4 (2019)

  • Finding Nemo (2003)

  • Ratatouille (2007)

  • The Lego Movie (2014)

  • Ghostbusters (a.k.a. Ghost Busters) (1984)

  • Ace Ventura: When Nature Calls (1995)

Model recommendation ->

['Home Alone (1990)',
 "Bug's Life, A (1998)",
 'Toy Story 2 (1999)',
 'Nightmare Before Christmas, The (1993)',
 'Babe (1995)',
 'Inside Out (2015)',
 'Mask, The (1994)',
 'Toy Story (1995)',
 'Back to the Future (1985)',
 'Back to the Future Part II (1989)',
 'Simpsons Movie, The (2007)',
 'Forrest Gump (1994)',
 'Austin Powers: International Man of Mystery (1997)',
 'Monty Python and the Holy Grail (1975)',
 'Cars (2006)',
 'Kung Fu Panda (2008)',
 'Groundhog Day (1993)',
 'American Pie (1999)',
 'Men in Black (a.k.a. MIB) (1997)',
 'Dumb & Dumber (Dumb and Dumber) (1994)',
 'Back to the Future Part III (1990)',
 'Big Hero 6 (2014)',
 'Mrs. Doubtfire (1993)',
 'Clueless (1995)',
 'Bruce Almighty (2003)',
 'Corpse Bride (2005)',
 'Deadpool (2016)',
 'Up (2009)',
 "Ferris Bueller's Day Off (1986)"]

In this case, the model was able to suggest some great movies, like Toy Story 1 or Home Alone, which are in line with the theme of the user’s history.


In this project, we built a powerful movie recommendation system called BERT4Rec. It is a model based on transformer layers and is trained using a very similar scheme to BERT, where we mask some elements of a user’s movie history sequence and then try to predict the true value of those items.

Source: Towards Data Science - Youness Mansar

The Tech Platform
