1. Introduction: The appeal and practical value of recommendation systems
Recommendation systems are quietly reshaping how content is consumed: Netflix's recommender is reported to save the company about $1 billion a year, and YouTube's recommendation algorithm is reported to account for about 80% of viewing time. This article walks you through building a movie recommendation system with a user profile display from scratch. It captures user preferences with a collaborative filtering algorithm and uses the Flask framework for visual interaction. By the end of the project you will understand the core principles of recommendation systems and have worked through the whole process, from data preprocessing to web deployment.
2. Technology stack analysis and project architecture
- Core algorithm layer: the Surprise library provides the SVD matrix factorization model;
- Data processing layer: Pandas handles data cleaning and feature engineering;
- Interactive display layer: the Flask framework serves the routes and front-end templates;
- Data source: the MovieLens 100K dataset (100,000 ratings from 943 users on 1,682 movies).
3. Environment preparation and dataset loading
# Install dependencies (run in a terminal)
pip install scikit-surprise pandas flask
# Data loading script
import pandas as pd
from surprise import Dataset, Reader
# Load the rating data
ratings = pd.read_csv('ml-100k/',
                      sep='\t',
                      names=['user_id', 'item_id', 'rating', 'timestamp'])
# Define Surprise data format
reader = Reader(rating_scale=(1,5))
data = Dataset.load_from_df(ratings[['user_id', 'item_id', 'rating']], reader)
4. Collaborative filtering core: SVD matrix factorization
4.1 Brief analysis of algorithm principles
SVD (singular value decomposition) decomposes the user-item rating matrix into:
R ≈ P * Σ * Q^T
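where:
- P: User latent feature matrix
- Q: Item latent feature matrix
- Σ: Diagonal matrix of singular values
Multiplying the factors back together yields predicted scores for the missing entries of R, and these predictions drive the recommendations. Note that the SVD class in Surprise is the Funk-style variant: it learns P and Q directly by stochastic gradient descent on the observed ratings, adds user and item bias terms, and has no explicit Σ factor.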
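For reference, the prediction rule and regularized objective that the Surprise documentation gives for its SVD class can be written as below. Here μ is the global mean rating, b_u and b_i are the user and item biases, and p_u and q_i are the latent vectors of user u and item i. Using a single λ for every term is an assumption that matches setting one shared regularization value.

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u

\min_{b,\,p,\,q} \sum_{r_{ui} \in R_{\text{train}}}
  \left( r_{ui} - \hat{r}_{ui} \right)^2
  + \lambda \left( b_i^2 + b_u^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)
```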
4.2 Surprise implementation code
from surprise import SVD, accuracy
from surprise.model_selection import train_test_split
# Split into training and test sets
trainset, testset = train_test_split(data, test_size=0.25)
# Initialize the SVD model
model = SVD(n_factors=100,  # Number of latent factors
            n_epochs=20,    # Number of SGD epochs
            lr_all=0.005,   # Learning rate
            reg_all=0.02)   # Regularization coefficient
# Train the model
model.fit(trainset)
# Evaluate the model
predictions = model.test(testset)
accuracy.rmse(predictions)  # Prints the RMSE on the test set
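To make the four hyperparameters above less abstract, here is a minimal pure-Python sketch of the kind of SGD loop a Funk-style SVD runs. It is an illustration under stated assumptions, not Surprise's actual implementation: the function name `train_funk_svd`, the toy ratings, and the smaller defaults are all hypothetical.

```python
import math
import random


def train_funk_svd(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit a biased matrix factorization by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}                       # user biases
    bi = {i: 0.0 for i in items}                       # item biases
    # Latent vectors start as small random values
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))

    def rmse():
        squared = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
        return math.sqrt(squared / len(ratings))

    rmse_before = rmse()
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Move each parameter along the error gradient, shrunk by reg
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return predict, rmse_before, rmse()


# Hypothetical toy data: (user, item, rating)
toy_ratings = [
    ("alice", "m1", 5), ("alice", "m2", 3), ("alice", "m3", 1),
    ("bob", "m1", 4), ("bob", "m3", 1),
    ("carol", "m2", 2), ("carol", "m3", 5),
]
predict, rmse_before, rmse_after = train_funk_svd(toy_ratings)
print(f"training RMSE: {rmse_before:.3f} -> {rmse_after:.3f}")
print(f"predicted rating for bob on m2: {predict('bob', 'm2'):.2f}")
```

More factors and more epochs fit the training ratings more closely, while a larger regularization value pulls the parameters back toward zero. That is why the RMSE that matters is the one measured on the held-out test set.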
5. User profile construction and similarity calculation
5.1 User feature extraction
def get_user_features(user_id):
    # Get the user's rating history
    user_ratings = ratings[ratings['user_id'] == user_id]
    # Compute rating distribution features
    avg_rating = user_ratings['rating'].mean()
    rating_counts = user_ratings['rating'].value_counts().sort_index()
    # Get the user's latent vector. Surprise assigns inner ids in order of
    # appearance in the trainset, so map the raw id instead of using user_id - 1.
    # to_inner_uid raises ValueError if the user is not in the training set.
    inner_uid = model.trainset.to_inner_uid(user_id)
    user_vector = model.pu[inner_uid]
    return {
        'avg_rating': avg_rating,
        'rating_distribution': rating_counts.to_dict(),
        'latent_factors': user_vector
    }
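The same summary statistics can be illustrated without Pandas. The sketch below is only an example on made-up ratings; the function name `summarize_ratings` and the extra `rating_percent` key are hypothetical additions, shown because a template that draws rating bars needs each count as a share of the total.

```python
from collections import Counter


def summarize_ratings(user_ratings):
    """Return the average, per-star counts and per-star percentages."""
    counts = Counter(user_ratings)
    total = len(user_ratings)
    ordered = sorted(counts.items())  # ascending star value
    return {
        "avg_rating": sum(user_ratings) / total,
        "rating_distribution": dict(ordered),
        # Share of each star value, usable directly as a bar width in percent
        "rating_percent": {star: 100 * n / total for star, n in ordered},
    }


summary = summarize_ratings([5, 4, 4, 3, 5, 1, 4, 5])
print(summary["avg_rating"])           # 3.875
print(summary["rating_distribution"])  # {1: 1, 3: 1, 4: 3, 5: 3}
print(summary["rating_percent"][4])    # 37.5
```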
5.2 User similarity calculation
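import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two latent vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_similar_users(target_user, n=5):
    # Latent vectors of all users, indexed by Surprise inner id
    users = model.pu
    target_inner = model.trainset.to_inner_uid(target_user)
    # Compute cosine similarity to every other user
    similarities = []
    for inner_uid, vector in enumerate(users):
        if inner_uid == target_inner:
            continue  # Skip the target user
        sim = cosine_similarity(users[target_inner], vector)
        # Store the raw id minus 1 so the template can display user + 1
        raw_uid = model.trainset.to_raw_uid(inner_uid)
        similarities.append((sim, raw_uid - 1))
    # Return the n most similar users
    return sorted(similarities, reverse=True, key=lambda x: x[0])[:n]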
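A small worked example may help show what cosine similarity measures. The sketch below uses made-up two-dimensional vectors in place of learned latent factors; the names `cosine`, `most_similar` and `toy_vectors` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(vectors, target, n=2):
    """Return the n (similarity, name) pairs closest to the target, excluding itself."""
    scored = [(cosine(vectors[target], vec), name)
              for name, vec in vectors.items() if name != target]
    return sorted(scored, reverse=True)[:n]


toy_vectors = {
    "u1": [1.0, 0.0],
    "u2": [2.0, 0.0],   # same direction as u1, twice the length
    "u3": [0.0, 1.0],   # orthogonal to u1
    "u4": [-1.0, 0.0],  # opposite direction to u1
}
neighbours = most_similar(toy_vectors, "u1")
print(neighbours)  # [(1.0, 'u2'), (0.0, 'u3')]
```

Because cosine similarity depends only on direction, u2 scores a perfect 1.0 against u1 even though its vector is twice as long.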
6. Flask recommendation service implementation
6.1 Web service architecture design
/                    -> Home page (user input form)
/recommend/<user_id> -> Recommendation results page
/user/<user_id>      -> User profile page
6.2 Core routing implementation
from flask import Flask, render_template, request
app = Flask(__name__)
@app.route('/')
def index():
    return render_template('')
@app.route('/recommend/<int:user_id>')
def recommendation(user_id):
    # Generate Top-N recommendations from the items the user has not rated
    user_items = set(ratings[ratings['user_id'] == user_id]['item_id'])
    all_items = ratings['item_id'].unique()
    predictions = []
    for item in all_items:
        if item not in user_items:
            # Raw ids are ints here because the data was loaded from a DataFrame,
            # so they must not be converted to strings
            pred = model.predict(user_id, item)
            predictions.append((item, pred.est))
    # Sort by predicted rating
    recommendations = sorted(predictions, key=lambda x: x[1], reverse=True)[:10]
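    # Get movie metadata. Assuming the MovieLens 100K item file layout: no header
    # row, column 0 is the movie id, 1 the title, 2 the release date, and the
    # last 19 columns are 0/1 genre flags.
    genre_names = ['unknown', 'Action', 'Adventure', 'Animation', "Children's",
                   'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
                   'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance',
                   'Sci-Fi', 'Thriller', 'War', 'Western']
    movies = pd.read_csv('ml-100k/',
                         sep='|',
                         encoding='latin-1',
                         header=None)
    # Merge recommendation results and movie information
    recommended_movies = []
    for item_id, score in recommendations:
        movie = movies[movies[0] == item_id].iloc[0]
        flags = movie.iloc[-19:]
        recommended_movies.append({
            'title': movie[1],
            'year': movie[2],
            'genres': [g for g, flag in zip(genre_names, flags) if flag == 1],
            'score': round(score, 2)
        })
    return render_template('',
                           movies=recommended_movies,
                           user_id=user_id)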
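@app.route('/user/<int:user_id>')
def user_profile(user_id):
    # Get the user profile data
    profile = get_user_features(user_id)
    # Get similar users
    similar_users = find_similar_users(user_id)
    # The template also needs user_id and the total rating count for the bar widths
    return render_template('',
                           user_id=user_id,
                           profile=profile,
                           total_ratings=sum(profile['rating_distribution'].values()),
                           similar_users=similar_users)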
if __name__ == '__main__':
    app.run(debug=True)
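The recommendation route scores every unseen movie and then sorts the whole list to keep ten entries. As a hedged alternative, the selection step can be sketched with `heapq.nlargest`, which keeps only the best n candidates. The helper name `top_n`, the toy scores, and the `predict` callable (standing in for a model call that returns a predicted rating) are hypothetical.

```python
import heapq


def top_n(candidates, seen, predict, n=3):
    """Return (item, score) for the n unseen items with the highest predicted score."""
    scored = ((predict(item), item) for item in candidates if item not in seen)
    return [(item, score) for score, item in heapq.nlargest(n, scored)]


# Made-up predicted ratings; the dict lookup plays the role of the model
toy_scores = {"m1": 4.6, "m2": 3.1, "m3": 4.9, "m4": 2.0, "m5": 4.2}
picks = top_n(toy_scores, seen={"m3"}, predict=toy_scores.get, n=2)
print(picks)  # [('m1', 4.6), ('m5', 4.2)]
```

The movie with the highest score, m3, is excluded because the user has already rated it, which mirrors the `if item not in user_items` check in the route.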
7. Front-end template design (Jinja2 example)
7.1 User profile template ()
<div class="profile-card">
<h2>User profile: User {{ user_id }}</h2>
<p>Average rating: {{ profile.avg_rating | round(2) }}</p>
<div class="rating-distribution">
{% for rating, count in profile.rating_distribution.items() %}
<div class="rating-bar">
<span class="rating-label">★{{ rating }}</span>
<div class="bar-container">
<div class="bar" style="width: {{ (count / total_ratings) * 100 }}%"></div>
</div>
<span class="count">{{ count }}</span>
</div>
{% endfor %}
</div>
<h3>Similar users:</h3>
<ul class="similar-users">
{% for sim, user in similar_users %}
<li>User {{ user + 1 }} (Similarity: {{ sim | round(3) }})</li>
{% endfor %}
</ul>
</div>
7.2 Recommendation results template ()
<div class="recommendations">
<h2>Recommended for you (User {{ user_id }})</h2>
{% for movie in movies %}
<div class="movie-card">
<h3>{{ movie.title }} ({{ movie.year }})</h3>
<p>Genres: {% for genre in movie.genres %}<span class="genre">{{ genre }}</span>{% endfor %}</p>
<div class="score">Predicted rating: ★{{ movie.score }}</div>
</div>
{% endfor %}
</div>
8. System optimization direction
- Cold start problem: integrate content-based filtering (using movie metadata)
- Real-time updates: add an incremental training module
- Deep learning extensions: try Neural Collaborative Filtering
- Performance optimization: use Faiss for approximate nearest neighbor search
- Visualization enhancements: add a rating distribution heat map and a user-item relationship graph
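As one possible illustration of the cold start item above, a content-based fallback can rank movies by how much their genre sets overlap with a movie the new user says they like. Jaccard similarity is an assumption chosen here for simplicity, not something the article prescribes, and the titles and function names are made up.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_by_genre(genres, liked, n=2):
    """Return the n (similarity, title) pairs whose genres best match the liked movie."""
    scored = [(jaccard(genres[liked], g), title)
              for title, g in genres.items() if title != liked]
    return sorted(scored, reverse=True)[:n]


# Hypothetical genre sets, e.g. derived from the movie metadata
toy_genres = {
    "Movie A": {"Action", "Sci-Fi"},
    "Movie B": {"Action", "Sci-Fi", "Thriller"},
    "Movie C": {"Comedy", "Romance"},
    "Movie D": {"Action", "Comedy"},
}
fallback = similar_by_genre(toy_genres, "Movie A")
print([title for _, title in fallback])  # ['Movie B', 'Movie D']
```

This needs no rating history at all, so it can serve users or movies that the collaborative filtering model has never seen.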
9. Complete project deployment guide
- Download the MovieLens dataset: /datasets/movielens/
- Create the project directory structure:
  movie_rec_system/
  ├── 
  ├── templates/
  │   ├── 
  │   ├── 
  │   └── 
  ├── static/
  │   ├── css/
  │   └── js/
  └── ml-100k/
      ├── 
      ├── 
      └── ...
- Start the service:
  python
- Open http://localhost:5000/ in a browser.
10. Conclusion: Future prospects of the recommendation system
With the success of the Transformer architecture in natural language processing, recommendation systems are undergoing a paradigm shift from collaborative filtering to sequence modeling. Future work could treat each user's behavior as a time-ordered sequence, use a Transformer to capture long-term interests, and build a richer user profile from multimodal data such as poster images and plot summaries.
Note: Production-level functions such as exception handling and logging should be added during actual deployment.
By completing this project you have not only learned the core techniques behind a recommendation system but also carried out a full engineering exercise, from algorithm implementation to web service. This full-stack capability is a key advantage when building intelligent applications.