Logistic Regression Model

Core: linear regression + sigmoid mapping.

I. Overview

Despite its name, Logistic Regression (LR) is used mainly for classification, especially binary classification, rather than regression. This is not a contradiction: the direct output of the model is a continuous value, and we split the range of that value at a threshold, assigning outputs below the threshold to one class and outputs at or above it to the other. In summary, the data is first fitted with a linear model, the linear output is mapped into the interval between 0 and 1 by the sigmoid function, and the resulting S-shaped curve is split into an upper and a lower interval that determine the class.

II. Principles of the algorithm

The core of the algorithm is linear regression followed by a sigmoid mapping. Given a sample, a set of weights, and a bias, the model first computes a linear output and then passes it through the sigmoid function, which maps it into the interval between 0 and 1. Outputs of at least 0.5 are assigned to the positive class and outputs below 0.5 to the negative class. The model can be summarized as follows.

The linear part is \(z=w\cdot x+b\) and the sigmoid function is \(y=\frac{1}{1+e^{-z}}\), so the logistic regression model is \(y=\frac{1}{1+e^{-(w\cdot x+b)}}\).
The classification rule of logistic regression can be expressed as

\[\hat{y}=\left\{ \begin{aligned} &0, \quad \frac{1}{1+e^{-(w\cdot x+b)}}<0.5\\ &1, \quad \frac{1}{1+e^{-(w\cdot x+b)}}\geq 0.5 \end{aligned} \right. \]
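
The rule above can be sketched in a few lines of NumPy. This is only an illustration: the weights `w`, the bias `b`, and the two samples are hypothetical values chosen by hand rather than fitted parameters, and the negative class is encoded as 0 here as an assumption of the sketch.

```python
import numpy as np

def sigmoid(z):
    """Map any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b, threshold=0.5):
    """Return (probability, label); label is 1 for the positive class, else 0."""
    z = np.dot(x, w) + b                  # linear part: z = w . x + b
    p = sigmoid(z)                        # squashed into (0, 1)
    label = (p >= threshold).astype(int)  # split the S-curve at the threshold
    return p, label

# Hypothetical parameters and samples, for illustration only
w = np.array([0.5, -1.0])
b = 0.25
x = np.array([[2.0, 1.0],   # z = 1.0 - 1.0 + 0.25 =  0.25, so p > 0.5
              [0.0, 1.0]])  # z = 0.0 - 1.0 + 0.25 = -0.75, so p < 0.5

p, label = predict(x, w, b)
print(p, label)
```

Because the sigmoid equals exactly 0.5 at \(z=0\), thresholding the probability at 0.5 is the same as checking the sign of \(w\cdot x+b\).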

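The logistic regression model is trained with the cross-entropy loss function. During optimization, the parameters \(\theta=(w,b)\) are chosen to minimize the following cost over the \(m\) training samples, where \(h(x^{(i)})\) is the model output for the \(i\)-th sample and \(y^{(i)}\in\{0,1\}\) is its label

\[J\left( \theta \right)=-\frac{1}{m} \sum_{i=1}^{m}\left[ y^{(i)}\log\left(h(x^{(i)})\right) +(1-y^{(i)})\log\left(1-h(x^{(i)})\right)\right] \]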
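
To make the cost function concrete, here is a minimal sketch that evaluates \(J\) and reduces it with plain batch gradient descent. The toy data, the learning rate, the iteration count, and the helper names `cross_entropy` and `fit` are assumptions made for this example; the article does not say which optimizer is used, and scikit-learn uses different solvers and adds regularization by default.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(w, b, X, y, eps=1e-12):
    """Cross-entropy cost J averaged over the m samples; y holds 0/1 labels."""
    h = np.clip(sigmoid(X @ w + b), eps, 1 - eps)  # clip to avoid log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def fit(X, y, lr=0.1, n_iter=1000):
    """Batch gradient descent on J (lr and n_iter are arbitrary choices)."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(n_iter):
        err = sigmoid(X @ w + b) - y   # dJ/dz for each sample
        w -= lr * (X.T @ err) / m      # gradient step for the weights
        b -= lr * err.mean()           # gradient step for the bias
    return w, b

# Toy one-dimensional data made up for this sketch
X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

loss_before = cross_entropy(np.zeros(1), 0.0, X, y)  # equals log(2) at w = 0, b = 0
w, b = fit(X, y)
loss_after = cross_entropy(w, b, X, y)
print(loss_before, loss_after)
```

With all parameters at zero every sample gets probability 0.5, so the cost starts at \(\log 2\); each gradient step then moves \(w\) and \(b\) in the direction that lowers it.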

III. Python implementation

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
## 1. Define the data set
# Training data
train_x = [
    [4.8,3,1.4,0.3],
    [5.1,3.8,1.6,0.2],
    [4.6,3.2,1.4,0.2],
    [5.3,3.7,1.5,0.2],
    [5,3.3,1.4,0.2],
    [7,3.2,4.7,1.4],
    [6.4,3.2,4.5,1.5],
    [6.9,3.1,4.9,1.5],
    [5.5,2.3,4,1.3],
    [6.5,2.8,4.6,1.5]
]

# Training labels
train_y = [
    'A',
    'A',
    'A',
    'A',
    'A',
    'B',
    'B',
    'B',
    'B',
    'B'
]


# Test Data
test_x = [
    [3.1,3.5,1.4,0.2],
    [4.9,3,1.4,0.2],
    [5.1,2.5,3,1.1],
    [6.2,3.6,3.4,1.3]
]

# Test labels
test_y = [
    'A',
    'A',
    'B',
    'B'
]

# Convert the Python lists to NumPy arrays
train_x = np.array(train_x)
train_y = np.array(train_y)
test_x = np.array(test_x)
test_y = np.array(test_y)

## 2. Model training
clf_lr = LogisticRegression()
rclf_lr = clf_lr.fit(train_x, train_y)

## 3. Prediction and evaluation
pre_y = rclf_lr.predict(test_x)
accuracy = metrics.accuracy_score(test_y, pre_y)

print('The predicted results are:', pre_y)
print('The accuracy is:', accuracy)
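
As a follow-up to the listing, the sketch below checks that the fitted scikit-learn model really is "linear part + sigmoid": it recomputes the probability of class 'B' by hand from `coef_` and `intercept_` and compares it with `predict_proba`. The data is copied from the listing so the block runs on its own; the variable names `p_manual` and `p_sklearn` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same training and test samples as in the listing above
train_x = np.array([
    [4.8, 3.0, 1.4, 0.3], [5.1, 3.8, 1.6, 0.2], [4.6, 3.2, 1.4, 0.2],
    [5.3, 3.7, 1.5, 0.2], [5.0, 3.3, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5], [5.5, 2.3, 4.0, 1.3],
    [6.5, 2.8, 4.6, 1.5],
])
train_y = np.array(['A'] * 5 + ['B'] * 5)
test_x = np.array([
    [3.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2],
    [5.1, 2.5, 3.0, 1.1], [6.2, 3.6, 3.4, 1.3],
])

clf = LogisticRegression().fit(train_x, train_y)

# Linear part z = w . x + b, using the fitted weights and bias
z = test_x @ clf.coef_.ravel() + clf.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))   # sigmoid mapping

# predict_proba columns follow clf.classes_, so column 1 is P(class 'B')
p_sklearn = clf.predict_proba(test_x)[:, 1]

print(clf.classes_)
print(np.allclose(p_manual, p_sklearn))
print(np.where(p_manual >= 0.5, 'B', 'A'))
```

The string labels are sorted by scikit-learn, so 'A' plays the role of the negative class and 'B' the positive class in the formulas of section II.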

