
Learning Machine Learning from Scratch - Understanding Regression

Popularity:948 ℃/2024-09-25 09:15:48

First of all, I'd like to introduce you to a very useful study address:/columns

regression analysis

Today we look at regression analysis, an important family of methods in statistics. Two of its most widely used forms are linear regression and logistic regression, which suit different kinds of data and different analysis needs. To get an intuitive feel for where each applies and what it produces, let's start with a chart for each:

Linear regression is a statistical method that predicts the value of an unknown variable from known, related data. It models the relationship between the unknown (dependent) variable and the known (independent) variables as a linear equation.
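To make the idea of a "linear equation" concrete, here is a minimal sketch that fits a straight line to a handful of points with NumPy. The data (hours studied and exam scores) and the helper name predict_score are made up purely for illustration.

```python
import numpy as np

# Made-up example data: hours studied (known) and exam score (to predict).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 54.0, 56.0, 58.0, 60.0])

# Fit score = slope * hours + intercept by least squares.
# A degree-1 polynomial is exactly a straight line.
slope, intercept = np.polyfit(hours, scores, deg=1)


def predict_score(h):
    """Apply the fitted linear equation to a new value (hypothetical helper)."""
    return slope * h + intercept


print("slope:", round(float(slope), 2))
print("intercept:", round(float(intercept), 2))
print("predicted score for 6 hours:", round(float(predict_score(6.0)), 2))
```

Because the made-up points lie exactly on a line, the fit recovers that line. Real data is noisy, and the fitted line is then the one that minimizes the squared vertical distances to the points.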

image

Logistic regression is a commonly used data analysis technique that models the relationship between two data factors mathematically and then uses that relationship to predict the likelihood of one factor from the value of the other. The prediction usually has a limited number of possible outcomes, most often two (yes or no).
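As a rough illustration of the yes/no case, the sketch below trains scikit-learn's LogisticRegression on a tiny made-up dataset (hours studied versus whether an exam was passed). The data and variable names are assumptions for this example only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: hours studied, and whether the exam was passed (1) or not (0).
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(hours, passed)

# predict_proba returns one column per class; column 1 is P(passed = 1).
prob_low = clf.predict_proba([[1.0]])[0, 1]
prob_high = clf.predict_proba([[4.5]])[0, 1]

# predict turns the probability into a yes/no label.
labels = clf.predict([[1.0], [4.5]])

print("P(pass | 1.0 hours):", prob_low)
print("P(pass | 4.5 hours):", prob_high)
print("predicted labels:", labels)
```

Unlike linear regression, the output here is a probability between 0 and 1, which is then turned into one of two categories.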

image

data visualization

In machine learning, data visualization is crucial for understanding the distribution of the data, the relationships between features, and the performance of a model. Matplotlib is a powerful plotting library that makes it easy to create many types of charts, including line graphs, scatter plots, and histograms. These charts help analysts quickly see the structure and trends in the data.
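For a quick feel of the three chart types just mentioned, here is a small sketch that draws each of them from randomly generated data. The function name make_overview_figure, the seed, and the data are all made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def make_overview_figure(seed=0):
    """Draw a line graph, a scatter plot, and a histogram side by side."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 10, 50)
    noisy = 2 * x + rng.normal(0, 2, size=x.size)  # a trend plus random noise

    fig, (ax_line, ax_scatter, ax_hist) = plt.subplots(1, 3, figsize=(12, 3.5))
    ax_line.plot(x, 2 * x)
    ax_line.set_title("Line graph: trend")
    ax_scatter.scatter(x, noisy, s=10)
    ax_scatter.set_title("Scatter plot: relationship")
    ax_hist.hist(noisy, bins=10)
    ax_hist.set_title("Histogram: distribution")
    fig.tight_layout()
    return fig


fig = make_overview_figure()
```

Call plt.show() afterwards to display the figure in a window, or fig.savefig("overview.png") to write it to a file.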

Data visualization also plays an important role in presentation: intuitive charts and graphs make data easier for readers to understand and make the information easier to read and communicate.

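import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, model_selection

# Load the built-in diabetes dataset as a feature matrix X and a target vector y
X, y = datasets.load_diabetes(return_X_y=True)
# Keep only column 2 (bmi); np.newaxis keeps X two-dimensional, as scikit-learn expects
X = X[:, np.newaxis, 2]
# Hold out 33% of the samples as a test set
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33)
# Train the linear regression model on the training set
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
# Predict the target values for the test set
y_pred = model.predict(X_test)
# Black dots: actual test data; blue line: the model's predictions
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.show()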

Like some other Python libraries, scikit-learn ships with a set of built-in datasets. You import the datasets module before loading the specific dataset you want. The example above loads the diabetes dataset, which contains data from people with diabetes, including features such as BMI (body mass index), age, blood pressure, and glucose level.
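If you want to check what the dataset contains before modeling, a short sketch like the following prints its shape and feature names and looks up the BMI column by name rather than by a hard-coded index. The variable names are chosen here for illustration.

```python
from sklearn import datasets

# Without return_X_y=True, load_diabetes returns an object that also
# carries metadata such as the feature names.
diabetes = datasets.load_diabetes()

print(diabetes.data.shape)     # (number of samples, number of features)
print(diabetes.feature_names)  # column names, in column order

# Find the BMI column by name and select it as a 2-D array.
bmi_index = diabetes.feature_names.index("bmi")
bmi_column = diabetes.data[:, [bmi_index]]
print(bmi_index, bmi_column.shape)
```

The BMI feature sits at index 2, which is the column the example above selects.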

A walkthrough of the key code:

  • Use model_selection.train_test_split() to split the dataset into a training set (X_train, y_train) and a test set (X_test, y_test), where the test set makes up 33% of the data.
  • Train the linear regression model on the training set by calling model.fit(X_train, y_train).
  • Use the trained model to predict on the test set with model.predict(X_test), which gives the predictions y_pred.
  • Use matplotlib's plt.scatter() to draw a scatter plot of the test set data points.
  • Use plt.plot() to draw the model's predictions on the test set; this is the fitted straight line, shown in blue.
  • Finally, display the graph with plt.show().
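The steps above can also be sketched without the plot, with two additions that are not part of the original example: a fixed random_state (the value 42 is arbitrary) so the split is the same on every run, and two standard scikit-learn metrics to put a number on how well the line fits.

```python
from sklearn import datasets, linear_model, model_selection
from sklearn.metrics import mean_squared_error, r2_score

X, y = datasets.load_diabetes(return_X_y=True)
X = X[:, [2]]  # same selection as X[:, np.newaxis, 2]: column 2 (bmi), kept 2-D

# random_state is an addition for reproducibility; 42 is an arbitrary choice.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("train/test sizes:", len(X_train), len(X_test))
print("slope:", model.coef_[0], "intercept:", model.intercept_)
# Lower MSE is better; R^2 closer to 1 means more of the variance is explained.
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```

With a single feature, do not expect a high score; the point is only to show where the fitted slope and intercept live and how a fit can be measured.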

As this example shows, data presented as a chart is more intuitive and easier to understand.

image

summary

In this article, we explored the role of regression analysis in statistics and data analysis. Linear regression and logistic regression, two of the most widely used regression methods, suit different kinds of data modeling and prediction needs. By modeling the relationships between variables mathematically, they make it possible to predict unknown values in practice.

Data visualization plays a key role in gaining a deeper understanding of data characteristics and model performance. Through charts such as line graphs, scatter plots, and histograms, we are able to visualize data distributions and trends, helping analysts gain quick insights into the structure and patterns of data. Especially in machine learning, these visualization techniques not only improve the efficiency of data analysis, but also enhance the communication and understanding of information.

In this article, we covered the basic ideas behind regression analysis and saw, through an example, how to use Python libraries for data modeling and visual analysis.


I'm Rain, a Java server-side coder, studying the mysteries of AI technology. I love technical communication and sharing, and I am passionate about open source community. I am also a Tencent Cloud Creative Star, Ali Cloud Expert Blogger, Huawei Cloud Enjoyment Expert, and Nuggets Excellent Author.

💡 I won't be shy about sharing my personal explorations and experiences on the path of technology, in the hope that I can bring some inspiration and help to your learning and growth.

🌟 Welcome to the effortless drizzle! 🌟