In the world of machine learning, classification problems are everywhere, andMulti-categorized questionsIt is a common challenge.
Whether it is identifying handwritten numbers, classifying news topics, or predicting product categories purchased by customers, multi-categorized questions play an important role.
With its simplicity and efficiency, the linear model has become one of the powerful tools to deal with multi-classification problems.
This article will discussLinear ModelsolveMulti-categorized questionsThe principles, strategies, advantages and disadvantages of the , and demonstrate their implementation through code examples.
1. Overview of principles
Linear ModelThe core idea is to fit data through linear equations to achieve classification or prediction of data.
existMulti-categorized questionsIn the linear model, the goal is to divide the data into categories.
Specifically, the linear model will build a linear decision boundary for each category, and determine which category the data points belong to by calculating the distance or positional relationship between the data points and these boundaries.
Simply put, it is toMulti-categorized questionsDecompose into a seriesBiclassification questions, or directly handle multi-categories by adjusting the output layer of the model.
Although this classification method based on linear relationships is simple, it can achieve good results in many practical problems.
2. Common strategies
The common strategies for using linear models to solve multi-classification problems are mainly as follows three categories.
2.1. One-to-many strategy
One-to-many strategy is used to solve multi-classification problems by splitting the problem into multiple binary classification problems.
forK
A category of tasks will be createdK
indivualBinary classifier, each specializes in distinguishing between one category and all other categories.
For example, in three categories(A、B、C)
In the case of , three binary classifiers are established: respectively used to distinguish between A and non-A, B and non-B, C and non-C.
During prediction, data points are processed through all classifiers and the category with the highest probability is selected as the final result.
from import load_iris
from sklearn.linear_model import LinearRegression
from import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from import StandardScaler
from import accuracy_score
# Load the dataset
iris = load_iris()
X =
y =
# Data standardization
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Divide the training set and the test set
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
# Use LinearRegression to implement one-to-many policy
model = OneVsRestClassifier(LinearRegression())
(X_train, y_train)
# predict
y_pred = (X_test)
# Output prediction results
print("Predict result:", y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(f"Classification Accuracy: {accuracy:.2f}")
accuracy = accuracy_score(y_test, y_pred)
print(f"Classification Accuracy: {accuracy:.2f}")
## Output result
'''
Predictive results: [1 0 2 2 1 0 2 2 1 1 2 0 0 0 0 2 2 1 1 2 0 2 0 2 2 2 1 2 0 0 0 0 2 0 0 2 2
0 0 0 2 2 2 0 0]
Classification accuracy: 0.82
'''
In this example, first useLinearRegression
Do binary classification and passOneVsRestClassifier
Combining binary classifications to implement one-to-many strategy.
In this way, we can easily decompose multi-classification problems into multiple binary classification problems and solve them using linear models.
There are 3 categories in the iris data set in the example. Judging from the running results, the accuracy is82%
, there is still room for continued optimization.
2.2. Multi-output linear regression
Multi-output linear regressionIt is a method that directly treats multi-classification problems as multi-output regression problems.
In this method, each category is encoded as oneSingle heat vector(One-Hot Encoding
), that is, a category corresponds to a specific vector, and only one element in the vector is1
, the rest of the elements are0
。
For example, for a question that contains three categories,A
Can be encoded as[1, 0, 0]
,categoryB
Encoded as[0, 1, 0]
,categoryC
Encoded as[0, 0, 1]
。
The linear regression model then tries to learn the linear relationship between the input features and these one-hot vectors.
When making predictions, the model outputs a vector, and we can determine the category of data points by selecting the index corresponding to the maximum value in the vector.
from import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from import StandardScaler, OneHotEncoder
from import accuracy_score
import numpy as np
# Load the dataset
iris = load_iris()
X =
y =
# Data standardization
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Encode the category tags with one hot color
encoder = OneHotEncoder()
y_encoded = encoder.fit_transform((-1, 1))
# Divide the training set and the test set
X_train, X_test, y_train_encoded, y_test_encoded = train_test_split(X_scaled, y_encoded, test_size=0.3, random_state=42)
# Use LinearRegression to implement multi-output linear regression
model = LinearRegression()
(X_train, y_train_encoded.toarray())
# predict
y_pred_encoded = (X_test)
# Convert prediction results from single-hot encoding back to category tags
y_pred = (y_pred_encoded, axis=1)
# Output prediction results
print("Predict result:", y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(f"Classification Accuracy: {accuracy:.2f}")
## Output result
'''
Predictive results: [1 0 2 2 1 0 2 2 1 1 2 0 0 0 0 2 2 1 1 2 0 2 0 2 2 2 1 2 0 0 0 0 2 0 0 2 2
0 0 0 2 2 2 0 0]
Classification accuracy: 0.82
'''
In this example, first useOneHotEncoder
Convert category tags to single hot encoding and useLinearRegression
The model is trained and predicted.
The prediction result is aHot codeVector, we passThe function selects the index corresponding to the maximum value in the vector to obtain the final category prediction result.
After running, the accuracy is also82%
,andOne-to-many strategyThe effect is the same.
2.3. Linear regression followed by Softmax function
Linear regression followed bySoftmax
Functions are a classic multi-classification method.
ItsCore ideaFirst, use a linear regression model to perform linear transformation of the input features to obtain a linear output vector.
Then, the linear output vector is converted into a probability distribution through the Softmax function.
Softmax
The output of the function is a probability vector, and each element in the vector represents the probability that the data point belongs to the corresponding category.
Ultimately, we select the category with the highest probability as the prediction result.
Softmax
The formula of the function is: $Softmax(z_i) = \frac{e{z_i}}{\sum_{j=1}{e^{z_i}}} $
where $z$ is the output vector of the linear regression model, $K$ is the total number of categories, $z_i\(is the \ in the vector)i $ elements.
Softmax
The function is to convert the linear output vector into a probability distribution so that the value of each element is0
arrive1
between, and the sum of all elements is1
。
from import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from import StandardScaler
from import accuracy_score
# Load the dataset
iris = load_iris()
X =
y =
# Data standardization
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Divide the training set and the test set
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
# Use LogisticRegression to implement linear regression followed by Softmax function
model = LogisticRegression(solver='lbfgs')
(X_train, y_train)
# predict
y_pred = (X_test)
# Output prediction results
print("Predict result:", y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(f"Classification Accuracy: {accuracy:.2f}")
## Output result
'''
Predictive results: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
0 0 0 2 1 1 0 0]
Classification accuracy: 1.00
'''
In this example, theLogisticRegression
Model, inscikit-learn v1.5
Before the version, you need to set itmulti_class='multinomial'
Parameters to realize linear regression and follow-upSoftmax
function.
However, fromscikit-learn v1.5
The version starts and no longer needs to be setmulti_class
Parameters, automatic use of multi-classification problemsmultinomial
method.
parametersolver='lbfgs'
Specifies an optimizer suitable for handling multi-classification problems, which can be optimizedSoftmax
The loss function of the function.
In this way, we can directly utilize linear regression models andSoftmax
Functions to solve multi-classification problems.
Judging from the operation results, the accuracy of this method is much higher than that of the first two methods, and has achieved100%
。
3. Pros and cons analysis
Linear Modeldeal withMulti-categorized questionsThe advantages are:
- Simple and efficient: The linear model has a simple structure, low computational complexity, and fast training and prediction speed. The advantages of linear models are particularly evident when dealing with large-scale datasets.
- Easy to understand and explain: The decision-making process of linear models is based on linear relationships and is easy to understand and explain. We can intuitively view the coefficients of the model and understand the impact of each feature on the classification results.
- Strong scalability: Linear models can be optimized and extended through technologies such as feature engineering and regularization to adapt to different problems and data sets.
but,Multi-categorized questionsNotLinear ModelThe best area we have to notice it inMulti-categorized questionsLimitations on:
- Assuming linear relationship: The linear model assumes that there is a linear relationship between the data, which may not hold true in many practical problems. If the distribution of the data is nonlinear, the linear model may not capture the inherent laws of the data well, resulting in poor classification results.
- The importance of feature selection: The linear model has high requirements for the quality of input features and requires sufficient feature engineering to extract effective features. If feature selection is improper, it may cause model performance to decline.
- Category imbalance problem: In multi-classification problems, if the sample size of some categories is much smaller than others, the linear model may tend to be more biased towards the majority category, thus affecting the classification effect of a few categories.
4. Summary
This paper explores several strategies to solve multi-classification problems using linear models, including one-to-many, multi-output linear regression (theory introduction) and linear regression followed by Softmax function (implemented through logistic regression).
Each strategy has its own unique advantages and applicable scenarios.
In practical applications, appropriate strategies should be selected based on the characteristics of specific problems and data distribution.
Because of its simplicity and interpretability, linear models still have a place in multi-classification problems, but they also need to pay attention to their limitations and combine other technologies to improve model performance.