Building a personalized movie recommendation system with Surprise and Flask: from algorithm to full-stack implementation

1. Introduction: the magic and practical value of recommendation systems

Behind Netflix's reported $1 billion in annual savings on content procurement, and behind the recommendation algorithms that account for 80% of viewing time on YouTube, recommendation systems are quietly changing how content is consumed. This article walks you through building, from scratch, a movie recommendation system with a user profile display: it captures user preferences with a collaborative filtering algorithm and uses the Flask framework for visual interaction. By the end of the project, you will understand the core principles of recommendation systems and will have worked through the entire process from data preprocessing to web deployment.

2. Technology stack analysis and project architecture

Core algorithm layer: the Surprise library implements SVD matrix factorization;
Data processing layer: Pandas handles data cleaning and feature engineering;
Interactive display layer: the Flask framework provides RESTful-style routes and front-end templates;
Data source: MovieLens 100k dataset (100,000 ratings from 943 users on 1,682 movies).
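To see why a factorization model suits this data, it helps to work out how sparse the rating matrix is, using only the counts listed above. A quick calculation:

```python
# Counts taken from the MovieLens 100k description above
n_users = 943
n_movies = 1682
n_ratings = 100_000

cells = n_users * n_movies   # size of the full user-item matrix
density = n_ratings / cells  # fraction of cells that actually hold a rating

print(f"matrix cells: {cells:,}")
print(f"observed: {density:.1%}, missing: {1 - density:.1%}")
```

Only about 6.3% of the 1,586,126 cells are observed, so roughly 94% of the matrix has to be predicted rather than looked up.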

3. Environment preparation and dataset loading

# Install dependencies (run in a terminal; inside a Jupyter notebook, prefix the command with !)
pip install scikit-surprise pandas flask

# Data loading script
import pandas as pd
from surprise import Dataset, Reader

# Load the rating data (tab-separated: user id, item id, rating, timestamp)
ratings = pd.read_csv('ml-100k/',
                      sep='\t',
                      names=['user_id', 'item_id', 'rating', 'timestamp'])

# Define the Surprise data format (ratings run from 1 to 5)
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[['user_id', 'item_id', 'rating']], reader)
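The loader above assumes a tab-separated ratings file with four fields per line: user id, item id, rating and timestamp. As a dependency-free illustration of that layout, the sketch below parses such lines into a nested user → {item: rating} mapping, which is the sparse form of the rating matrix. The sample lines and the `load_ratings` helper are hypothetical, and the range check simply reuses the (1, 5) bounds passed to `Reader`:

```python
import csv
import io

# Hypothetical sample in the same tab-separated layout: user, item, rating, timestamp
SAMPLE = "1\t10\t4\t874965758\n1\t20\t3\t876893171\n2\t10\t5\t888550871\n"


def load_ratings(text, rating_scale=(1, 5)):
    """Parse tab-separated ratings into {user_id: {item_id: rating}}."""
    low, high = rating_scale
    matrix = {}
    for user, item, rating, _timestamp in csv.reader(io.StringIO(text), delimiter="\t"):
        r = float(rating)
        if not low <= r <= high:
            raise ValueError(f"rating {r} outside {rating_scale}")
        matrix.setdefault(int(user), {})[int(item)] = r  # timestamp is dropped
    return matrix


ratings_by_user = load_ratings(SAMPLE)
print(ratings_by_user)
```

Only observed ratings are stored, which is why this representation stays small even though the full matrix is mostly empty.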

4. Collaborative filtering core: implementing SVD matrix factorization

4.1 Brief analysis of algorithm principles

SVD (singular value decomposition) factorizes the user-item rating matrix R as:


$$R \approx P \Sigma Q^{T}$$

where:

P: user latent feature matrix
Q: item latent feature matrix
Σ: diagonal matrix of singular values

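Multiplying the factors back together estimates the missing entries of R, and these predicted ratings drive the recommendations. Note that Surprise's SVD class is not a classical SVD, which cannot be computed directly on a matrix with missing entries. It is the matrix factorization popularized by Simon Funk during the Netflix Prize: Σ is absorbed into the factor vectors, user and item bias terms are added, and the vectors are learned by stochastic gradient descent on the observed ratings only. A rating is predicted as $\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T} p_u$, where $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, and $p_u$ and $q_i$ are the rows of P and Q for user u and item i.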
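To make the training procedure concrete, here is a dependency-free sketch of biased matrix factorization fitted with SGD. It predicts μ + b_u + b_i + q_i·p_u and applies per-rating updates of the form "parameter += lr × (error term − reg × parameter)", in the style of the updates documented for Surprise's SVD. The function name `train_mf`, the toy ratings and the hyperparameters are assumptions for illustration; this is not Surprise's implementation and omits its options and optimizations:

```python
import math
import random


def train_mf(ratings, n_factors=2, n_epochs=20, lr=0.005, reg=0.02, seed=0):
    """Toy biased matrix factorization fitted with SGD.

    ratings: list of (user, item, rating) triples; returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating (kept fixed)
    users = list(dict.fromkeys(u for u, _, _ in ratings))  # first-seen order, deterministic
    items = list(dict.fromkeys(i for _, i, _ in ratings))
    bu = {u: 0.0 for u in users}  # user biases
    bi = {i: 0.0 for i in items}  # item biases
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}  # user factors
    qi = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}  # item factors

    def predict(u, i):
        # mu + b_u + b_i + q_i . p_u; unknown users or items add nothing beyond mu
        dot = sum(a * b for a, b in zip(pu[u], qi[i])) if u in pu and i in qi else 0.0
        return mu + bu.get(u, 0.0) + bi.get(i, 0.0) + dot

    order = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                p, q = pu[u][f], qi[i][f]  # use old values for both updates
                pu[u][f] += lr * (err * q - reg * p)
                qi[i][f] += lr * (err * p - reg * q)
    return predict


# Hypothetical toy ratings: alice and bob like m1 and dislike m2; carol only rated m1
toy = [("alice", "m1", 5), ("alice", "m2", 1), ("bob", "m1", 4),
       ("bob", "m2", 2), ("carol", "m1", 5)]
predict = train_mf(toy, n_epochs=500, lr=0.05)
rmse = math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in toy) / len(toy))
print(f"training RMSE: {rmse:.3f}")
print(f"carol / m2 (unseen): {predict('carol', 'm2'):.2f}")
```

The larger learning rate and epoch count in the usage line only compensate for the tiny dataset; the unseen (carol, m2) pair is filled in from the learned biases and factors, which is exactly how missing cells become recommendations.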

4.2 Surprise implementation code

from surprise import SVD, accuracy
from surprise.model_selection import train_test_split

# Split into training and test sets
trainset, testset = train_test_split(data, test_size=0.25)

# Initialize the SVD model
model = SVD(n_factors=100,  # Number of latent factors
            n_epochs=20,    # Number of SGD iterations
            lr_all=0.005,   # Learning rate for all parameters
            reg_all=0.02)   # Regularization term for all parameters

# Train the model
model.fit(trainset)

# Evaluate the model on the held-out ratings
predictions = model.test(testset)
accuracy.rmse(predictions)  # Prints and returns the RMSE
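The evaluation step reports RMSE, the root-mean-square error between true and estimated ratings on the test set. As a plain-Python sketch of what that number measures (the `rmse`/`mae` helpers and the sample pairs are illustrative, not Surprise code), RMSE and MAE can be computed directly:

```python
import math


def rmse(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    return sum(abs(t - e) for t, e in pairs) / len(pairs)


# Hypothetical (true, estimated) pairs
pairs = [(4, 3.5), (3, 3.0), (5, 4.0)]
print(f"RMSE: {rmse(pairs):.4f}")  # errors 0.5, 0, 1 -> sqrt(1.25 / 3)
print(f"MAE:  {mae(pairs):.4f}")   # (0.5 + 0 + 1) / 3
```

Because errors are squared, RMSE weights the single one-star miss more heavily than MAE does. With Surprise predictions, the same pairs can be built as `[(p.r_ui, p.est) for p in predictions]`.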

5. User profile construction and similarity calculation

5.1 User feature extraction

def get_user_features(user_id):
    # Obtain the user's rating history
    user_ratings = ratings[ratings['user_id'] == user_id]

    # Compute rating distribution statistics
    avg_rating = user_ratings['rating'].mean()
    rating_counts = user_ratings['rating'].value_counts().sort_index()

    # Look up the user's latent vector; Surprise indexes its factor matrices
    # by inner id, so the raw user id must be converted first
    inner_uid = model.trainset.to_inner_uid(user_id)
    user_vector = model.pu[inner_uid]

    return {
        'avg_rating': avg_rating,
        'rating_distribution': rating_counts.to_dict(),
        'latent_factors': user_vector
    }
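The profile stores raw counts per star value, and the profile template turns each count into a bar width of count / total_ratings × 100 percent. A small standalone sketch of that arithmetic, assuming a plain list of one user's ratings (the `rating_profile` helper and its `bar_widths` key are hypothetical):

```python
from collections import Counter


def rating_profile(user_ratings):
    """Average rating, count per star value, and each star's percentage share."""
    counts = Counter(user_ratings)
    total = len(user_ratings)
    return {
        "avg_rating": sum(user_ratings) / total,
        "rating_distribution": dict(sorted(counts.items())),
        "bar_widths": {star: 100 * n / total for star, n in sorted(counts.items())},
    }


profile = rating_profile([5, 4, 4, 3])
print(profile)
```

Computing the percentages once in Python keeps the template free of arithmetic; the template version needs `total_ratings` passed in for the same reason.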

5.2 User similarity calculation

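import numpy as np

def find_similar_users(target_user, n=5):
    # All user latent vectors, one row per inner user id
    users = model.pu
    target_vector = users[model.trainset.to_inner_uid(target_user)]

    # Cosine similarity between the target vector and every user vector
    norms = np.linalg.norm(users, axis=1) * np.linalg.norm(target_vector)
    sims = users @ target_vector / norms

    similarities = []
    for inner_uid, sim in enumerate(sims):
        raw_uid = model.trainset.to_raw_uid(inner_uid)
        if raw_uid != target_user:  # Skip the target user itself
            similarities.append((float(sim), raw_uid))

    # Return the n most similar users as (similarity, raw user id) pairs
    return sorted(similarities, reverse=True, key=lambda x: x[0])[:n]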
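The similarity search ranks users by the cosine of the angle between their latent vectors. A dependency-free sketch of the same ranking on hypothetical 2-factor vectors (the `cosine` and `top_similar` helpers are illustrative, not part of Surprise):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_similar(vectors, target, n=2):
    """Rank other users by cosine similarity to `target`; returns (sim, user) pairs."""
    scores = [(cosine(vectors[target], vec), user)
              for user, vec in vectors.items() if user != target]
    return sorted(scores, reverse=True)[:n]


# Hypothetical latent vectors keyed by user id
vectors = {
    1: [1.0, 0.0],
    2: [2.0, 0.1],   # almost the same direction as user 1, but longer
    3: [0.0, 1.0],   # orthogonal to user 1
    4: [-1.0, 0.0],  # opposite direction
}
print(top_similar(vectors, 1))
```

Cosine ignores vector length, so user 2 ranks first despite its larger norm; only the direction of taste in latent space matters.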

6. Flask recommendation service implementation

6.1 Web service architecture design

/ -> Home page (user input interface)
/recommend/<user_id> -> Recommendation results page
/user/<user_id> -> User profile page

6.2 Core routing implementation

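from flask import Flask, render_template, request

app = Flask(__name__)

# Column layout of the MovieLens 100k item file: five descriptive fields
# followed by 19 binary genre flags (the file has no header row)
GENRES = ['unknown', 'Action', 'Adventure', 'Animation', "Children's",
          'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir',
          'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
          'War', 'Western']
ITEM_COLUMNS = ['movie id', 'movie title', 'release date',
                'video release date', 'IMDb URL'] + GENRES

@app.route('/')
def index():
    return render_template('')

@app.route('/recommend/<int:user_id>')
def recommendation(user_id):
    # Generate Top-N recommendations from movies the user has not rated
    user_items = set(ratings[ratings['user_id'] == user_id]['item_id'])
    all_items = ratings['item_id'].unique()

    predictions = []
    for item in all_items:
        if item not in user_items:
            # Raw ids keep the DataFrame's integer type, so pass ints rather than str
            pred = model.predict(user_id, item)
            predictions.append((item, pred.est))

    # Sort by predicted rating and keep the top 10
    recommendations = sorted(predictions, key=lambda x: x[1], reverse=True)[:10]

    # Load movie metadata
    movies = pd.read_csv('ml-100k/',
                         sep='|',
                         encoding='latin-1',
                         header=None,
                         names=ITEM_COLUMNS)

    # Merge recommendation results with movie information
    recommended_movies = []
    for item_id, score in recommendations:
        movie = movies[movies['movie id'] == item_id].iloc[0]
        recommended_movies.append({
            'title': movie['movie title'],
            'year': movie['release date'],
            'genres': [g for g in GENRES if movie[g] == 1],
            'score': round(score, 2)
        })

    return render_template('',
                           movies=recommended_movies,
                           user_id=user_id)

@app.route('/user/<int:user_id>')
def user_profile(user_id):
    # Obtain the user's profile data
    profile = get_user_features(user_id)

    # Find similar users
    similar_users = find_similar_users(user_id)

    return render_template('',
                           profile=profile,
                           similar_users=similar_users,  # name must match the template
                           user_id=user_id,
                           total_ratings=sum(profile['rating_distribution'].values()))

if __name__ == '__main__':
    app.run(debug=True)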
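The recommendation route sorts every candidate prediction and then keeps the first ten. Since only the top N are needed, `heapq.nlargest` can make the same selection without sorting the whole list. A minimal sketch with hypothetical predicted scores (in the route these would be the `(item, score)` pairs built in the loop; `top_n` is an illustrative helper):

```python
import heapq


def top_n(predictions, rated_items, n=10):
    """Return the n highest-scoring (item, score) pairs for items not yet rated."""
    candidates = ((item, score) for item, score in predictions if item not in rated_items)
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


# Hypothetical predicted ratings for one user
predictions = [(1, 3.2), (2, 4.8), (3, 4.1), (4, 2.5), (5, 4.6)]
print(top_n(predictions, rated_items={2}, n=2))
```

For 1,682 movies the difference from a full sort is negligible, but the heap-based selection scales better as the catalog grows, and the result is already ordered from highest to lowest score.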

7. Front-end template design (Jinja2 example)

7.1 User profile template ()

<div class="profile-card">
   <h2>User profile: User {{ user_id }}</h2>
   <p>Average rating: {{ profile.avg_rating | round(2) }}</p>
   <div class="rating-distribution">
     {% for rating, count in profile.rating_distribution.items() %}
       <div class="rating-bar">
         <span class="rating-label">★{{ rating }}</span>
         <div class="bar-container">
           <div class="bar" style="width: {{ (count / total_ratings) * 100 }}%"></div>
         </div>
         <span class="count">{{ count }}</span>
       </div>
     {% endfor %}
   </div>
  
   <h3>Similar users:</h3>
   <ul class="similar-users">
     {% for sim, user in similar_users %}
       <li>User {{ user }} (Similarity: {{ sim | round(3) }})</li>
     {% endfor %}
   </ul>
 </div>

7.2 Recommendation results template ()

<div class="recommendations">
   <h2>Recommended for you (User {{ user_id }})</h2>
   {% for movie in movies %}
     <div class="movie-card">
       <h3>{{ movie.title }} ({{ movie.year }})</h3>
       <p>Genres: {% for genre in movie.genres %}<span class="genre">{{ genre }}</span>{% endfor %}</p>
       <div class="score">Predicted rating: ★{{ movie.score }}</div>
     </div>
   {% endfor %}
 </div>

8. System optimization directions

Cold start: integrate content-based filtering (using movie metadata)
Real-time updates: add an incremental training module
Deep learning extensions: try Neural Collaborative Filtering
Performance optimization: use Faiss for approximate nearest neighbor search
Visualization enhancements: add a rating-distribution heat map and a user-item relationship graph
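For the cold-start direction, one simple form of content-based filtering scores movies by how well their genres match genres a new user says they like, so no rating history is needed. The sketch below uses Jaccard similarity over genre sets; the catalog, the `cold_start_recommend` helper and the choice of Jaccard are illustrative assumptions, not part of the system built above:

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cold_start_recommend(liked_genres, movie_genres, n=2):
    """Rank movies by genre overlap with the genres a new user picked."""
    liked = set(liked_genres)
    scored = [(jaccard(liked, set(genres)), title) for title, genres in movie_genres.items()]
    top = sorted(scored, key=lambda x: -x[0])[:n]
    return [title for score, title in top if score > 0]  # drop movies with no overlap


# Hypothetical catalog; genre names follow the MovieLens 100k genre list
movie_genres = {
    "Movie A": ["Action", "Sci-Fi"],
    "Movie B": ["Comedy", "Romance"],
    "Movie C": ["Action", "Thriller"],
    "Movie D": ["Sci-Fi"],
}
print(cold_start_recommend(["Action", "Sci-Fi"], movie_genres))
```

Jaccard rewards an exact genre match (Movie A) over a partial one and penalizes extra unrelated genres, which is why Movie D outranks Movie C here. Once the user has rated a few movies, the collaborative filtering model can take over.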

9. Complete project deployment guide

Download the MovieLens dataset:/datasets/movielens/

Create a project directory structure:

movie_rec_system/
├── 
├── templates/
│   ├── 
│   ├── 
│   └── 
├── static/
│   ├── css/
│   └── js/
└── ml-100k/
    ├── 
    ├── 
    └── ...

Start the service: python
Access: http://localhost:5000/

10. Conclusion: the future of recommendation systems

Following the success of the Transformer architecture in natural language processing, recommendation systems are shifting from collaborative filtering toward sequence modeling. Future work could treat each user's behavior as a time-ordered sequence, use a Transformer to capture long-term interests, and build a richer user profile from multimodal data such as poster images and plot summaries.

Note: a real deployment should add production-level features such as exception handling and logging.

Through this project, you not only learn the core techniques of recommendation systems but also complete the full engineering workflow from algorithm implementation to web service. This full-stack capability is a key competitive advantage when building intelligent applications.