Gaussian Blur and SIFT Algorithms in Image Processing

Gaussian Blur

Mathematical definition of Gaussian blur

Gaussian blur is performed by convolving the image with a Gaussian kernel. The two-dimensional Gaussian function is defined as

\[G(x, y, \sigma) = \frac{1}{2\pi \sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \]

where:

- $(x, y)$ are the coordinates of the pixel, measured from the kernel center
- $\sigma$ is the standard deviation of the Gaussian kernel and controls the degree of blur: the larger $\sigma$ is, the blurrier the image
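
To make the formula concrete, here is a minimal NumPy sketch that evaluates $G(x, y, \sigma)$ directly. The function names `gaussian_2d` and `total_mass` are illustrative, not part of any library, and the grid extent and step are arbitrary assumptions. It shows that the peak value $\frac{1}{2\pi\sigma^2}$ drops as $\sigma$ grows while the total weight stays close to 1.

```python
import numpy as np

def gaussian_2d(x, y, sigma):
    """Evaluate the 2-D Gaussian G(x, y, sigma) from the definition above."""
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def total_mass(sigma, half_width=6.0, step=0.05):
    """Approximate the integral of G over the plane with a Riemann sum.

    The grid covers +/- half_width * sigma; both defaults are assumptions.
    """
    t = np.arange(-half_width * sigma, half_width * sigma + step, step)
    xx, yy = np.meshgrid(t, t)
    return gaussian_2d(xx, yy, sigma).sum() * step**2

for sigma in (1.0, 2.0):
    peak = gaussian_2d(0.0, 0.0, sigma)
    print(f"sigma={sigma}: peak={peak:.5f}, total={total_mass(sigma):.5f}")
```

A larger $\sigma$ spreads the same total weight over a wider area, which is why each output pixel averages over more neighbors and the image looks blurrier.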

Python implementation of Gaussian blur

The following code applies Gaussian blur at several scales using OpenCV

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Read the image (the file path is empty in the source; supply your own).
# IMREAD_COLOR_RGB requires a recent OpenCV release.
img = cv2.imread('', cv2.IMREAD_COLOR_RGB)

# Scales (σ values) of the Gaussian kernel
sigma_values = [1.0, 1.6, 2.0, 2.5, 3.0]  # example σ values

# Apply Gaussian blur for each σ value
blurred_images = []
for sigma in sigma_values:
    # ksize=(0, 0) lets OpenCV derive the kernel size from σ
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=sigma, sigmaY=sigma)
    blurred_images.append(blurred)

# Show results
plt.figure(figsize=(15, 8))
# Original image
plt.subplot(2, 3, 1)
plt.imshow(img)
plt.title('original')
plt.axis('off')
# Blurred images
for i, (sigma, blurred) in enumerate(zip(sigma_values, blurred_images)):
    plt.subplot(2, 3, i + 2)
    plt.imshow(blurred)
    plt.title(f'σ={sigma}')
    plt.axis('off')
plt.tight_layout()
plt.show()

cv2.GaussianBlur(img, ksize, sigmaX)
- When ksize=(0,0), OpenCV computes the kernel size automatically from $\sigma$, usually about $6\sigma + 1$
- sigmaX and sigmaY are the standard deviations of the Gaussian kernel in the X and Y directions, usually set to the same value
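
The $6\sigma + 1$ rule can be sketched as a small helper. The name `kernel_size_for_sigma` is hypothetical, and the rounding followed by forcing an odd size is an assumption about how an odd width near $6\sigma + 1$ is chosen; OpenCV's internal rule may differ, for example by image depth.

```python
def kernel_size_for_sigma(sigma, radius_factor=3.0):
    """Odd kernel width covering about +/- radius_factor * sigma.

    Illustrative rule of thumb only: 2 * 3 * sigma + 1, rounded, then
    forced odd so the kernel has a well-defined center pixel.
    """
    size = int(round(2 * radius_factor * sigma + 1))
    return size | 1  # set the lowest bit to make the size odd

for sigma in (1.0, 1.6, 2.0, 2.5, 3.0):
    print(sigma, kernel_size_for_sigma(sigma))
```

Covering about three standard deviations on each side keeps almost all of the Gaussian's weight inside the kernel, so truncating it there changes the result very little.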

Gaussian kernel matrix

In practice, the Gaussian kernel has to be discretized into a two-dimensional matrix. For example, for $\sigma = 1.0$ a 3×3 Gaussian kernel may be approximated as follows

\[K = \frac{1}{16} \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \\ \end{bmatrix} \]
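
To illustrate how such a discrete kernel is applied, here is a minimal, deliberately slow NumPy sketch of 2-D convolution. The function name `convolve2d_same` and the edge-replication padding are assumptions made for this example; library routines offer other border modes and are much faster.

```python
import numpy as np

def convolve2d_same(img, kernel):
    """Naive 2-D convolution returning an output the same size as img.

    Borders are handled by replicating edge pixels (an assumption).
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    # True convolution flips the kernel; this has no effect for symmetric kernels.
    flipped = kernel[::-1, ::-1]
    out = np.zeros(img.shape, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * flipped)
    return out

# The 3x3 kernel K from the text
K = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]], dtype=float) / 16

# Blurring a single bright pixel reproduces the kernel itself
impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
blurred = convolve2d_same(impulse, K)
print(blurred)
```

Because the kernel entries sum to 1, a constant image passes through unchanged and overall brightness is preserved.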

Example of manually calculating Gaussian kernels

import numpy as np

# Generate a two-dimensional Gaussian kernel
def gaussian_kernel(size, sigma):
    kernel = np.zeros((size, size))
    # // is integer (floor) division, so center is the index of the middle element
    center = size // 2
    for x in range(size):
        for y in range(size):
            dx, dy = x - center, y - center
            kernel[x, y] = np.exp(-(dx**2 + dy**2) / (2 * sigma**2))
    kernel /= kernel.sum()  # normalize so the weights sum to 1
    return kernel

# Generate a 5x5 Gaussian kernel with σ=1.5
kernel = gaussian_kernel(5, 1.5)
print(kernel)

Plotted as heights, the matrix elements form a Gaussian surface: highest at the center and falling off in a bell shape toward the edges.

SIFT (Scale-Invariant Feature Transform) algorithm

The SIFT (Scale-Invariant Feature Transform) algorithm is a local feature extraction method used in image processing. Its features are invariant to scale and rotation and robust to illumination changes, and because its results are stable and accurate it is widely used in image matching. Its main drawback is high computational cost, so in scenarios that require real-time processing it is often replaced by faster algorithms such as SURF or ORB. In COLMAP, feature extraction and matching are based on the SIFT algorithm.

1. Scale-Space Extrema Detection

Purpose: Find keypoints (candidate feature points) in multi-scale space

Building the Gaussian pyramid
- Blur the image with Gaussian kernels $G(x,y,\sigma)$ at different scales to generate several groups (octaves) of images. Each octave contains multiple layers (intervals), and the scale grows by a constant factor $k$ from layer to layer (such as $\sigma, k\sigma, k^2\sigma$).
  - Each scale is obtained by convolving the image with a Gaussian kernel of the corresponding standard deviation $\sigma$
- The next octave is obtained by downsampling the previous one (for example, halving its size).
Building the Difference-of-Gaussians (DoG) pyramid
- Subtract the Gaussian images of adjacent scales within the same octave to obtain $D(x,y,\sigma) = L(x,y,k\sigma) - L(x,y,\sigma)$, where $L(x,y,\sigma)$ is the image blurred at scale $\sigma$
- DoG approximates the Laplacian of Gaussian (LoG) and is cheaper to compute.
Detecting extrema
- Compare each pixel with its 8 neighbors in the same layer and the 18 pixels in the adjacent layers above and below (26 in total) to determine whether it is a local maximum or minimum.
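
The three steps above can be sketched for a single octave in plain NumPy. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the defaults `sigma0=1.6` and `intervals=3` with $k = 2^{1/\text{intervals}}$ follow commonly quoted SIFT settings rather than anything stated in the text, and each layer is blurred directly from the input instead of incrementally.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur built from 1-D convolutions (reflect padding)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    padded = np.pad(np.asarray(img, dtype=float), radius, mode="reflect")
    # Filter along rows, then columns; mode="valid" trims the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def dog_octave(img, sigma0=1.6, intervals=3):
    """Return the DoG layers of one octave as an array [layer, row, col]."""
    k = 2 ** (1.0 / intervals)
    # intervals + 3 Gaussian layers yield intervals + 2 DoG layers
    gauss = [gaussian_blur(img, sigma0 * k**i) for i in range(intervals + 3)]
    return np.stack([gauss[i + 1] - gauss[i] for i in range(len(gauss) - 1)])

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly above or below all 26 neighbors."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = dog[s, y, x]
    neighbors = np.delete(cube.ravel(), 13)  # index 13 is the center of the 3x3x3 cube
    return bool(center > neighbors.max() or center < neighbors.min())

rng = np.random.default_rng(0)
image = rng.random((32, 32))
dog = dog_octave(image)
print(dog.shape)
```

Only interior positions (not on the first or last layer, row, or column) can be tested with `is_extremum`, since every candidate needs a full 3×3×3 neighborhood.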

2. Keypoint Localization

Purpose: Locate keypoints accurately and remove low-contrast points and edge responses

Precise localization by Taylor expansion
- Fit the DoG function with a Taylor expansion around each candidate to find the sub-pixel position of the extremum (offset $ \hat{x} $)
- If $ |D(\hat{x})| $ is less than a threshold (such as 0.03, for pixel values scaled to $[0, 1]$), the point is considered low contrast and is removed
Edge response removal
- Use the Hessian matrix $H$ of the DoG image to estimate the principal curvatures and eliminate points with a strong edge response (points with a large ratio between the principal curvatures)
- If $ \frac{\text{Tr}(H)^2}{\text{Det}(H)} > \frac{(r+1)^2}{r} $ (usually $ r=10 $), the point is removed
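
Both tests can be sketched on a 3×3×3 block of DoG samples around a candidate. In this illustration the function names are hypothetical, derivatives are estimated with central finite differences (an assumption about the discretization), and the block is indexed as `[scale, row, col]`. The offset solves $\hat{x} = -H^{-1}\nabla D$ and the refined value is $D(\hat{x}) = D + \frac{1}{2}\nabla D^{T}\hat{x}$.

```python
import numpy as np

def refine_extremum(cube):
    """Return (offset, value): sub-pixel offset (dx, dy, ds) and D at the offset."""
    c = cube[1, 1, 1]
    grad = 0.5 * np.array([
        cube[1, 1, 2] - cube[1, 1, 0],  # dD/dx
        cube[1, 2, 1] - cube[1, 0, 1],  # dD/dy
        cube[2, 1, 1] - cube[0, 1, 1],  # dD/ds
    ])
    dxx = cube[1, 1, 2] + cube[1, 1, 0] - 2 * c
    dyy = cube[1, 2, 1] + cube[1, 0, 1] - 2 * c
    dss = cube[2, 1, 1] + cube[0, 1, 1] - 2 * c
    dxy = 0.25 * (cube[1, 2, 2] - cube[1, 2, 0] - cube[1, 0, 2] + cube[1, 0, 0])
    dxs = 0.25 * (cube[2, 1, 2] - cube[2, 1, 0] - cube[0, 1, 2] + cube[0, 1, 0])
    dys = 0.25 * (cube[2, 2, 1] - cube[2, 0, 1] - cube[0, 2, 1] + cube[0, 0, 1])
    hess = np.array([[dxx, dxy, dxs],
                     [dxy, dyy, dys],
                     [dxs, dys, dss]])
    offset = -np.linalg.solve(hess, grad)
    value = c + 0.5 * grad @ offset
    return offset, value

def passes_edge_test(cube, r=10.0):
    """Keep the point only if Tr(H)^2 / Det(H) < (r + 1)^2 / r on the 2x2 spatial Hessian."""
    c = cube[1, 1, 1]
    dxx = cube[1, 1, 2] + cube[1, 1, 0] - 2 * c
    dyy = cube[1, 2, 1] + cube[1, 0, 1] - 2 * c
    dxy = 0.25 * (cube[1, 2, 2] - cube[1, 2, 0] - cube[1, 0, 2] + cube[1, 0, 0])
    trace, det = dxx + dyy, dxx * dyy - dxy**2
    # A non-positive determinant means curvatures of opposite sign: reject.
    return bool(det > 0 and trace**2 / det < (r + 1) ** 2 / r)

# Synthetic quadratic with its true maximum at (x, y, s) = (0.2, -0.1, 0)
s, y, x = np.meshgrid([-1, 0, 1], [-1, 0, 1], [-1, 0, 1], indexing="ij")
blob = 1.0 - (x - 0.2) ** 2 - (y + 0.1) ** 2 - s**2
offset, value = refine_extremum(blob)
print(offset, value)
```

In a full detector a candidate would be kept only if `abs(value)` exceeds the contrast threshold and `passes_edge_test` returns true; an offset larger than 0.5 in any dimension usually means the extremum lies closer to a neighboring sample.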

3. Orientation Assignment

Purpose: Assign each keypoint a reference orientation to achieve rotation invariance

Calculate gradient magnitude and direction
- On the Gaussian-blurred image at the keypoint's scale, calculate the gradient of each pixel in the neighborhood window:
  
  \[m(x,y) = \sqrt{(L(x+1,y)-L(x-1,y))^2 + (L(x,y+1)-L(x,y-1))^2} \]
  
  \[\theta(x,y) = \tan^{-1}\left( \frac{L(x,y+1)-L(x,y-1)}{L(x+1,y)-L(x-1,y)} \right) \]
Generate the orientation histogram
- Divide 360° into 36 bins (10° per bin) and accumulate each pixel's contribution, weighted by its gradient magnitude and by a Gaussian window centered on the keypoint.
- Take the main peak (the highest bin) and any secondary peak higher than 80% of the main peak as keypoint orientations.
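
A minimal NumPy sketch of the orientation step follows. The function names and the default `window_sigma=4.0` are assumptions for illustration; the input `patch` is assumed to be the blurred neighborhood around the keypoint, indexed as `[row, col]`, and the peak interpolation used by full implementations is omitted.

```python
import numpy as np

def orientation_histogram(patch, window_sigma=4.0, num_bins=36):
    """36-bin histogram of gradient directions, weighted by magnitude and a Gaussian window."""
    patch = np.asarray(patch, dtype=float)
    # Central differences on the interior pixels, matching m(x, y) and theta(x, y)
    dx = patch[1:-1, 2:] - patch[1:-1, :-2]
    dy = patch[2:, 1:-1] - patch[:-2, 1:-1]
    mag = np.hypot(dx, dy)
    theta = np.degrees(np.arctan2(dy, dx)) % 360.0
    h, w = mag.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    window = np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * window_sigma**2))
    bins = (theta // (360.0 / num_bins)).astype(int) % num_bins
    return np.bincount(bins.ravel(), weights=(mag * window).ravel(), minlength=num_bins)

def dominant_orientations(hist, peak_ratio=0.8):
    """Indices of local peaks (circular) that reach peak_ratio of the highest bin."""
    left, right = np.roll(hist, 1), np.roll(hist, -1)
    is_peak = (hist > left) & (hist > right) & (hist >= peak_ratio * hist.max())
    return np.flatnonzero(is_peak)

# A diagonal intensity ramp: every gradient points at 45 degrees (bin 4)
t = np.arange(12.0)
ramp = t[None, :] + t[:, None]
hist = orientation_histogram(ramp)
print(dominant_orientations(hist))
```

Using `arctan2` instead of a plain inverse tangent resolves the full 0° to 360° range, which the 36-bin histogram requires.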

4. Keypoint Descriptor (Descriptor Generation)

Purpose: After orientation normalization, build a feature vector for each keypoint by dividing its neighborhood into sub-blocks.

Rotating the coordinate axes: Rotate the neighborhood window to the main orientation of the keypoint.
Dividing regions: Divide the 16×16 window into a 4×4 grid of sub-blocks (16 blocks in total, each 4×4 pixels).
Calculating sub-block gradient histograms:
- Calculate a gradient histogram with 8 directions in each sub-block (8 dimensions per block).
- 16 sub-blocks × 8 directions = 128-dimensional feature vector.
Normalization:
- Normalize the feature vector to unit length to reduce the effect of illumination, truncate values greater than 0.2, and normalize again to improve robustness.
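
The descriptor layout can be sketched as follows. This is a simplified illustration: the function name `sift_like_descriptor` is hypothetical, the input is assumed to be an 18×18 patch already rotated to the keypoint orientation (the extra border allows central differences on the inner 16×16 window), and the Gaussian weighting and interpolation between bins used by full implementations are left out.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Build a 128-D vector from 4x4 cells with 8 orientation bins each."""
    patch = np.asarray(patch, dtype=float)
    dx = patch[1:-1, 2:] - patch[1:-1, :-2]
    dy = patch[2:, 1:-1] - patch[:-2, 1:-1]
    mag = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 360.0
    bins = (ang // 45.0).astype(int) % 8  # 8 directions of 45 degrees each
    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # Pixel (r, c) falls into cell (r // 4, c // 4)
            desc[r // 4, c // 4, bins[r, c]] += mag[r, c]
    desc = desc.ravel()
    desc /= max(np.linalg.norm(desc), 1e-12)  # unit length
    desc = np.minimum(desc, 0.2)              # truncate large values
    desc /= max(np.linalg.norm(desc), 1e-12)  # renormalize
    return desc

# Horizontal intensity ramp: all gradients point along +x
patch = np.tile(np.arange(18.0), (18, 1))
desc = sift_like_descriptor(patch)
print(desc.shape, np.count_nonzero(desc))
```

Truncating at 0.2 limits the influence of a few very strong gradients, which tend to change under non-linear lighting, before the final renormalization.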

5. Keypoint Matching

Compare the SIFT descriptors of two images by Euclidean distance (for example, with a nearest-neighbor search).
Use the nearest neighbor distance ratio (NNDR, such as $\frac{d_1}{d_2} < 0.8$, where $d_1$ and $d_2$ are the distances to the closest and second-closest descriptors) to filter matches and improve matching accuracy.
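
The ratio test can be sketched with a brute-force distance matrix in NumPy. The function name `ratio_test_matches` is hypothetical and the approach assumes the second descriptor set has at least two rows; for real descriptor sets, OpenCV's `BFMatcher.knnMatch` with `k=2` provides the two nearest neighbors more conveniently.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Return (index_in_desc1, index_in_desc2) pairs that pass d1 / d2 < ratio."""
    # Pairwise Euclidean distances, shape (len(desc1), len(desc2))
    dist = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    matches = []
    for i, (j1, j2) in enumerate(order[:, :2]):
        # Written as d1 < ratio * d2 to avoid dividing by zero
        if dist[i, j1] < ratio * dist[i, j2]:
            matches.append((i, int(j1)))
    return matches

# Tiny 2-D "descriptors" for illustration only
desc_a = np.array([[0.1, 0.0],   # clearly closest to desc_b[0]
                   [5.0, 0.0]])  # equally far from desc_b[0] and desc_b[1]
desc_b = np.array([[0.0, 0.0],
                   [10.0, 0.0],
                   [0.0, 10.0]])
matches = ratio_test_matches(desc_a, desc_b)
print(matches)
```

The second descriptor is rejected because its best and second-best candidates are equally close, which is exactly the ambiguous case the ratio test is meant to discard.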

Using the SIFT algorithm through OpenCV in Python

Extracting keypoints

import cv2
import matplotlib.pyplot as plt

# Read the image as grayscale (the file path is empty in the source; supply your own)
img = cv2.imread('', cv2.IMREAD_GRAYSCALE)

# Initialize the SIFT detector
sift = cv2.SIFT_create()

# Detect keypoints and compute descriptors
keypoints, descriptors = sift.detectAndCompute(img, None)

# Draw keypoints (rich mode also shows each keypoint's size and orientation)
img_with_keypoints = cv2.drawKeypoints(
    img,
    keypoints,
    None,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS
)

# Show results
plt.figure(figsize=(10, 6))
plt.imshow(img_with_keypoints, cmap='gray')
plt.title('SIFT Keypoints')
plt.axis('off')
plt.show()

Keypoint matching between two images

import cv2
import matplotlib.pyplot as plt

# Read two images as grayscale (the file paths are empty in the source; supply your own)
img1 = cv2.imread('', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('', cv2.IMREAD_GRAYSCALE)

# Initialize SIFT
sift = cv2.SIFT_create()

# Compute keypoints and descriptors
kp1, desc1 = sift.detectAndCompute(img1, None)
kp2, desc2 = sift.detectAndCompute(img2, None)

# Use BFMatcher (Brute-Force Matcher) with L2 distance;
# crossCheck=True keeps only mutually best matches
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = bf.match(desc1, desc2)

# Sort by distance so the best matches come first
matches = sorted(matches, key=lambda x: x.distance)

# Draw the 10 best matches
matched_img = cv2.drawMatches(
    img1, kp1,
    img2, kp2,
    matches[:10],
    None,
    flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)

# Show matching results
plt.figure(figsize=(15, 8))
plt.imshow(matched_img, cmap='gray')
plt.title('SIFT Feature Matching')
plt.axis('off')
plt.show()