
CSEC: City University of * Proposes SOTA Exposure Correction Algorithm | CVPR 2024


Images captured under improper lighting conditions may contain both overexposed and underexposed regions. Existing approaches focus mainly on adjusting image brightness, which can exacerbate tonal distortion in underexposed regions and fails to restore accurate colors in overexposed regions. The paper proposes to enhance images with both overexposure and underexposure by learning to estimate and correct such color shifts. A UNet-based network first derives the color feature maps of brightened and darkened versions of the input image, and a pseudo-normal feature generator then produces a pseudo-normal color feature map. The proposed COlor Shift Estimation (COSE) module estimates the color shifts between the brightened (or darkened) color feature maps and the pseudo-normal color feature map, and corrects the estimated shifts for the overexposed and underexposed regions separately. Finally, the proposed COlor MOdulation (COMO) module modulates the separately corrected colors of the overexposed and underexposed regions to generate the enhanced image.

Source: WeChat public account [Xiaofei's Algorithm Engineering Notes]

Paper: Color Shift Estimation-and-Correction for Image Enhancement

  • Paper address: /abs/2405.17725
  • Code: /yiyulics/CSEC

Introduction


  Real-world scenes often span a wide range of lighting conditions, which poses a significant photographic challenge. Although cameras provide automatic exposure modes that determine the "ideal" exposure setting from scene brightness, adjusting exposure uniformly across the entire image can still leave regions that are too bright or too dark, and such underexposed and overexposed regions can exhibit significant color distortion. Relatively high noise levels in underexposed regions alter the data distribution and cause color shifts, while overexposed regions lose their original colors. Enhancing such images therefore typically involves both brightness adjustment and color-shift correction.

  In recent years, many efforts have been made to enhance incorrectly exposed images. These methods can be broadly categorized into two groups.

  1. The first class focuses on enhancing either overexposed or underexposed images. Some methods propose to learn exposure-invariant representation spaces, in which different exposure levels can be mapped to a normalized, invariant representation. Other methods propose to integrate frequency information with spatial information, which helps model the inherent structural characteristics of the image and thereby correct brightness and structural distortions. However, these methods usually assume that overexposure or underexposure occurs over the entire image, and they are ineffective for images containing both (e.g., Fig. 1(b)).
  2. The second line of work aims to enhance images that are simultaneously overexposed and underexposed, using local color distributions as a prior to guide the enhancement. However, despite its pyramidal local color distribution prior, it still tends to produce results with significant color shifts in large homogeneous regions (e.g., Fig. 1(c)).

  This paper aims to correct both the brightness and the color distortions of images that are simultaneously overexposed and underexposed. As a preliminary study, Figs. 1(f) and 1(g) show the PCA results of pixels randomly sampled from two related datasets (MSEC and LCDP). Each scene in the MSEC dataset contains input images at five different exposure values (EVs), while each scene in the LCDP dataset has only one input image, which contains both overexposure and underexposure. Two observations can be drawn from this study.

  1. In both datasets, the underexposed pixels (green dots) tend to have a distribution shift opposite to that of the overexposed pixels (red dots).
  2. Unlike the MSEC dataset, whose 0 EV input images serve as reference images for exposure normalization, the LCDP dataset has no such "normally exposed" pixels.

  The first observation motivates estimating and correcting such color shifts, while the second motivates creating pseudo-normal exposure feature maps as references for color shift estimation and correction.

  To this end, the paper proposes a new method that jointly adjusts image brightness and corrects color distortion. A UNet-based network first extracts the color feature maps of the overexposed and underexposed regions from brightened and darkened versions of the input image. A pseudo-normal feature generator then creates pseudo-normal color feature maps from these derived color feature maps. Next, the paper proposes a novel COlor Shift Estimation (COSE) module, which extends deformable convolution into the color feature domain, to estimate and correct the color shifts between the brightened (or darkened) color feature maps and the created pseudo-normal color feature maps, respectively. Further, the paper proposes a novel COlor MOdulation (COMO) module, which performs a customized cross-attention mechanism on the input image and the estimated darkening/brightening color shifts, to modulate the separately corrected colors of the overexposed and underexposed regions and generate the enhanced image. Fig. 1(d) shows that the method can produce visually pleasing results.

  The main contributions of the paper can be summarized as follows:

  1. A novel neural network approach is proposed to enhance images that are simultaneously overexposed and underexposed by modeling changes in color distribution.

  2. A novel neural network is proposed that includes two new modules: a COlor Shift Estimation (COSE) module that estimates and corrects the colors of overexposed and underexposed regions separately, and a COlor MOdulation (COMO) module that modulates the corrected colors to generate the enhanced image.

  3. Extensive experiments demonstrate that the proposed network is lightweight and outperforms existing image enhancement methods on popular benchmarks.

Proposed Method


  The paper's approach is motivated by two observations. First, overexposed pixels tend to have distribution shifts opposite to those of underexposed pixels, suggesting the need to capture and correct such color shifts separately. Second, since most, if not all, pixels are affected by either overexposure or underexposure, pseudo-normal exposure information must be created to guide the estimation of the color shifts of the overexposed or underexposed pixels. Based on these two observations, the paper proposes a new network that includes two new modules, the COlor Shift Estimation (COSE) module and the COlor MOdulation (COMO) module, for enhancing images with both overexposure and underexposure.

Network Overview

  Given an input image \(I_x\in \mathcal{R}^{3\times H\times W}\) with both overexposure and underexposure, the aim is to generate an enhanced image \(I_y\in \mathcal{R}^{3\times H\times W}\) with corrected brightness and recovered details and colors; the model structure is shown in Fig. 2. Given the input image \(I_x\), its inverted version \(\hat{I}_x=1-I_x\) is first computed, and both are fed into a UNet-based network to extract two illumination maps, \(F_L^U\in \mathcal{R}^{1\times H\times W}\) and \(F_L^O\in \mathcal{R}^{1\times H\times W}\), which indicate the regions affected by underexposure and overexposure, respectively. Next, the brightening feature map \(F_B\) and the darkening feature map \(F_D\) are computed as follows:

\[\begin{align} F_B = \frac{I_x}{F_L^U} &= \frac{I_x}{f(I_x)}, \\ F_D = 1-\frac{1-I_x}{F_L^O} &= 1 - \frac{1-I_x}{f(1 - I_x)}, \end{align} \]

  where \(f(\cdot)\) denotes the UNet feature extractor. The brightening and darkening feature maps \(F_B, F_D \in \mathbb{R}^{3 \times H \times W}\) are then used to model the color shifts.
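  As a concrete reference, the following is a minimal sketch of how Eqs. 1-2 could be computed, assuming `f` is the UNet extractor returning a 1-channel illumination map; the `eps` guard and all names are illustrative assumptions, not the authors' code.

```python
import torch

def exposure_feature_maps(i_x: torch.Tensor, f):
    """Sketch of Eqs. (1)-(2): derive the brightening map F_B and the
    darkening map F_D from the two illumination maps. `f` stands for the
    UNet extractor (assumed to output a 1-channel map in (0, 1])."""
    eps = 1e-6                    # numerical guard (assumption)
    f_l_u = f(i_x)                # illumination map of under-exposed regions
    f_l_o = f(1.0 - i_x)          # illumination map of the inverted image
    f_b = i_x / (f_l_u + eps)                  # brightened version, Eq. (1)
    f_d = 1.0 - (1.0 - i_x) / (f_l_o + eps)    # darkened version, Eq. (2)
    return f_b, f_d
```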

  Given \(F_B\) and \(F_D\), a pseudo-normal feature generator first fuses them with the input image \(I_x\) to generate the pseudo-normal feature map \(F_N\), as follows:

\[\begin{align} F_N = g(F_B, F_D, I_x), \end{align} \]

  where \(g(\cdot)\) denotes the pseudo-normal exposure generator. \(F_N\) then serves as a reference for two separate COSE modules, which estimate the color shifts between \(F_B\) and \(F_N\) and between \(F_D\) and \(F_N\), respectively. The darkening offset \(O_D\) and the brightening offset \(O_B\) produced by these two COSE modules model the brightness and color changes of the input image \(I_x\). Finally, \(O_D\), \(O_B\), and \(I_x\) are fed into the COMO module, which adjusts the image brightness and corrects the color shifts to generate the final image \(I_y\).
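  Putting the overview together, a hedged sketch of the end-to-end data flow of Fig. 2 might look as follows; `g`, `cose_b`, `cose_d`, and `como` are placeholders for the components described in the following sections, not the released implementation.

```python
def csec_forward(i_x, f, g, cose_b, cose_d, como):
    """Data flow of Fig. 2 as described in the text (illustrative only)."""
    f_b, f_d = exposure_feature_maps(i_x, f)   # Eqs. (1)-(2)
    f_n = g(f_b, f_d, i_x)                     # Eq. (3): pseudo-normal features
    o_b = cose_b(f_n, f_b)                     # brightening color shift O_B
    o_d = cose_d(f_n, f_d)                     # darkening color shift O_D
    return como(i_x, o_b, o_d)                 # enhanced output I_y
```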

Color Shift Estimation (COSE) Module

  Unlike brightness adjustment, color shift correction is more challenging because it inherently requires the network to model the directions of pixels in RGB color space rather than the magnitudes of pixel intensities. Although some works use cosine similarity regularization to help preserve image colors during training, such a strategy typically fails in large underexposed or overexposed regions, where pixels of small or large values are expected to have different colors.

  The paper proposes the COSE module, based on deformable convolution, to solve this problem. Deformable convolution (DConv) extends ordinary convolution by introducing spatial offsets \(\Delta p_n\), allowing the kernel to sample at adaptively shifted positions rather than on a fixed grid. A modulation term \(\Delta m_n\) assigns different weights to the kernel positions, allowing the convolution operator to focus on important pixels. While deformable convolution can predict offsets that capture changes in color distribution, previous approaches apply it only in the spatial (pixel) domain. The paper therefore proposes to extend deformable convolution to both the spatial domain and the color space, so as to jointly model brightness changes and color shifts.

  As shown in Fig. 3, the COSE module first concatenates the pseudo-normal feature map \(F_N\) and the brightening/darkening feature map \(F_B\)/\(F_D\) along the channel dimension, and then uses three separate \(3\times 3\) convolutions to extract the positional offsets \(\Delta p_n\in \mathcal{R}^{B\times 2N\times H\times W}\), the color offsets \(\Delta c_n\in \mathcal{R}^{B\times 3N\times H\times W}\), and the modulation terms \(\Delta m_n\in \mathcal{R}^{B\times N\times H\times W}\). The positional offsets \(\Delta p_n\) and modulation terms \(\Delta m_n\) operate in the spatial domain, aggregating spatial context over the deformed, irregular receptive field of the convolution. The color offsets \(\Delta c_n\) represent the per-channel color shift at each kernel position; they are designed with \(3N\) channels to model the color shifts of the 3-channel sRGB input.

  The computation of deformable convolution in the spatial domain and color space can be written as:

\[\begin{align} y = \sum_{p_n\in \mathcal{R}} (w_n\cdot x(p_0 + p_n + \Delta p_n) + \Delta c_n) \cdot \Delta m_n,\label{eq:cdc} \end{align} \]

  where \(x\) denotes the input features of the convolution, and \(p_0\), \(p_n\), and \(\Delta p_n\) are two-dimensional variables representing spatial locations. \(y\) (or \(y(p_0)\)) denotes the output of the color-space deformable convolution at each pixel \(p_0\) of the input. The set \(\mathcal{R} = \{(-1, -1), (-1, 0), \dots, (1, 1)\}\) denotes the grid of a regular \(3\times 3\) convolution kernel, \(n\) enumerates the positions in \(\mathcal{R}\), and \(N\) is the length of \(\mathcal{R}\) (for a regular \(3\times 3\) kernel, \(N=9\)). Since the offsets \(\Delta p_n\) are typically fractional in practice, sampling is computed with bilinear interpolation, consistent with spatially deformable convolution.
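  To make Eq. 4 concrete, here is a minimal PyTorch sketch of such a color-space deformable convolution. It uses the fact that Eq. 4 decomposes into a spatial term \(\sum_n w_n\, x(p_0+p_n+\Delta p_n)\,\Delta m_n\), which matches `torchvision.ops.deform_conv2d` with a modulation mask, plus a color term \(\sum_n \Delta c_n\, \Delta m_n\). The layer sizes, initialization, and channel layout of \(\Delta c_n\) are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class ColorDeformConv(nn.Module):
    """Sketch of the COSE deformable convolution over space and color (Eq. 4).
    Assumes out_ch == in_ch so the color term matches the spatial term."""

    def __init__(self, in_ch: int = 3, out_ch: int = 3, k: int = 3):
        super().__init__()
        self.k = k
        N = k * k
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        # Three separate 3x3 convs predict dp (2N ch), dc (3N ch), dm (N ch)
        # from the concatenated [F_N, F_B or F_D] features.
        self.offset_conv = nn.Conv2d(2 * in_ch, 2 * N, k, padding=k // 2)
        self.color_conv = nn.Conv2d(2 * in_ch, in_ch * N, k, padding=k // 2)
        self.mod_conv = nn.Conv2d(2 * in_ch, N, k, padding=k // 2)

    def forward(self, f_n: torch.Tensor, f_bd: torch.Tensor) -> torch.Tensor:
        feat = torch.cat([f_n, f_bd], dim=1)
        dp = self.offset_conv(feat)              # spatial offsets, (B, 2N, H, W)
        dc = self.color_conv(feat)               # color offsets,  (B, 3N, H, W)
        dm = torch.sigmoid(self.mod_conv(feat))  # modulation,     (B, N, H, W)
        # Spatial term: sum_n w_n * x(p0 + p_n + dp_n) * dm_n
        spatial = deform_conv2d(f_bd, dp, self.weight, mask=dm,
                                padding=self.k // 2)
        # Color term: sum_n dc_n * dm_n, assuming dc channels are grouped
        # as N blocks of `in_ch` channels each.
        B, _, H, W = dc.shape
        N = self.k * self.k
        color = (dc.view(B, N, -1, H, W) * dm.view(B, N, 1, H, W)).sum(dim=1)
        return spatial + color
```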

Color Modulation (COMO) Module

  The COMO module adjusts the brightness and color of the input image to generate the final output image \(I_y\), based on the learned offsets \(O_B\)/\(O_D\) between the brightening features \(F_B\) (or darkening features \(F_D\)) and the pseudo-normal features \(F_N\). Since aggregating global information is crucial for generating corrected images with harmonious colors, the paper draws inspiration from non-local context modeling and develops COMO by extending the self-affinity computation to a cross-affinity computation, enabling COMO to enhance the input image under the guidance of \(O_B\) and \(O_D\).

  As shown in Fig. 4, three branches are assigned to process the input image \(I_x\), the darkening offset \(O_D\), and the brightening offset \(O_B\), each containing three \(1\times 1\) convolutional layers (denoted \(Conv_\psi\), \(Conv_\phi\), and \(Conv_Z\)). In each branch, a self-affinity matrix \(A_i\) is then computed as follows:

\[\begin{align} A_i = \psi_i \otimes \phi_i,\ for\ i\in \{I, B, D\}, \end{align} \]

  where \(\otimes\) denotes matrix multiplication, and \(\psi_i\) and \(\phi_i\) are the feature maps obtained from \(Conv_\psi\) and \(Conv_\phi\), respectively. \(A_i\) is then symmetrized and normalized to ensure real eigenvalues and to stabilize backpropagation. Each row of \(A_i\) is used as a spatial attention map, and \(Z_i\) (obtained via \(Conv_Z\)) serves as the weights of the attention map. Next, the correlation between \(I_x\) and \(O_B\)/\(O_D\) is modeled by matrix multiplication, and the self-affinity features are fused as follows:

\[\begin{align} f_j = w_1 A_j \otimes Z_j + w_2 A_j \otimes Z_I, \end{align} \]

  where \(j \in \{B, D\}\) indexes the affinity matrix \(A_j\) and feature map \(Z_j\) of the brightening or darkening branch, and \(w_1\) and \(w_2\) are weight matrices generated by \(1\times 1\) convolutions. In Eq. 6, the first term discovers the salient color-shift regions in \(O_B\) and \(O_D\) obtained from COSE, while the second term leverages the learned weights \(Z_I\) of the input, focusing on the attention maps of \(O_B\) and \(O_D\) to capture the shifts in the salient regions of the input.

  Finally, \(f_B\), \(f_D\), and the input image \(I_x\) are combined, using the color shifts as guidance for exploring the input image, to generate the final result \(I_y\), as follows:

\[\begin{align} I_y = w_4(BN(f_B) + BN(f_D) + w_3A_I\otimes Z_I) + I_x, \end{align} \]

  where \(BN(\cdot)\) denotes batch normalization, and \(w_3\) and \(w_4\) are weight matrices generated by \(1\times 1\) convolutions.
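  The following is a minimal sketch of the COMO computation (Eqs. 5-7). The placement of \(w_1\)-\(w_4\), whether they are shared across branches, and the exact symmetrization/normalization are assumptions; the quadratic \(HW \times HW\) affinity also limits this sketch to small inputs.

```python
import torch
import torch.nn as nn

class COMOSketch(nn.Module):
    """Illustrative COMO module: cross-affinity modulation of Eqs. (5)-(7)."""

    def __init__(self, ch: int = 3, mid: int = 16):
        super().__init__()
        # One (Conv_psi, Conv_phi, Conv_Z) triple per branch: I, B, D.
        self.psi = nn.ModuleDict({b: nn.Conv2d(ch, mid, 1) for b in "IBD"})
        self.phi = nn.ModuleDict({b: nn.Conv2d(ch, mid, 1) for b in "IBD"})
        self.zed = nn.ModuleDict({b: nn.Conv2d(ch, ch, 1) for b in "IBD"})
        self.w1 = nn.Conv2d(ch, ch, 1)   # weights of Eq. (6)
        self.w2 = nn.Conv2d(ch, ch, 1)
        self.w3 = nn.Conv2d(ch, ch, 1)   # weights of Eq. (7)
        self.w4 = nn.Conv2d(ch, ch, 1)
        self.bn_b = nn.BatchNorm2d(ch)
        self.bn_d = nn.BatchNorm2d(ch)

    def affinity(self, branch: str, x: torch.Tensor) -> torch.Tensor:
        """Eq. (5): A_i = psi_i x phi_i, then symmetrize and normalize."""
        psi = self.psi[branch](x).flatten(2)      # (B, mid, HW)
        phi = self.phi[branch](x).flatten(2)      # (B, mid, HW)
        a = psi.transpose(1, 2) @ phi             # (B, HW, HW)
        a = 0.5 * (a + a.transpose(1, 2))         # symmetrize
        return torch.softmax(a, dim=-1)           # row-normalize (assumption)

    def forward(self, i_x, o_b, o_d):
        B, C, H, W = i_x.shape
        inp = {"I": i_x, "B": o_b, "D": o_d}
        A = {b: self.affinity(b, t) for b, t in inp.items()}
        Z = {b: self.zed[b](t).flatten(2) for b, t in inp.items()}  # (B, C, HW)

        def att(a, z):   # rows of `a` act as attention over the pixels of `z`
            return (a @ z.transpose(1, 2)).transpose(1, 2).reshape(B, C, H, W)

        f_b = self.w1(att(A["B"], Z["B"])) + self.w2(att(A["B"], Z["I"]))
        f_d = self.w1(att(A["D"], Z["D"])) + self.w2(att(A["D"], Z["I"]))
        base = self.w3(att(A["I"], Z["I"]))
        return self.w4(self.bn_b(f_b) + self.bn_d(f_d) + base) + i_x  # Eq. (7)
```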

Loss Function

  The network is trained with two loss functions, \(\mathcal{L}_{pseudo}\) and \(\mathcal{L}_{output}\). Since a pseudo-normal feature map must be generated to help identify the color shifts, \(\mathcal{L}_{pseudo}\) provides intermediate supervision of the generation process:

\[\begin{align} \mathcal{L}_{pseudo} = ||F_N - GT||_1. \end{align} \]

\(\mathcal{L}_{output}\) contains four terms that supervise the network's generation of the enhanced image: an \(L_1\) loss \(\mathcal{L}_{L1}\), a cosine similarity loss \(\mathcal{L}_{cos}\), a structural similarity (SSIM) loss \(\mathcal{L}_{ssim}\), and a VGG loss \(\mathcal{L}_{vgg}\). \(\mathcal{L}_{output}\) can be expressed as:

\[\begin{align} \mathcal{L}_{output} = \lambda_1 \mathcal{L}_{L1} + \lambda_2 \mathcal{L}_{cos} + \lambda_3 \mathcal{L}_{ssim} + \lambda_4 \mathcal{L}_{vgg}, \end{align} \]

  where \(\lambda_1\), \(\lambda_2\), \(\lambda_3\), and \(\lambda_4\) are four balancing hyperparameters. The overall loss function is:

\[\begin{align} \mathcal{L} = \lambda_p \mathcal{L}_{pseudo} + \lambda_o \mathcal{L}_{output}, \end{align} \]

  where \(\lambda_p\) and \(\lambda_o\) are two balancing hyperparameters.
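  A hedged sketch of the training objective (Eqs. 8-10) follows; the `ssim_fn` and `vgg_fn` helpers are injected stand-ins, and all weight values are placeholders rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def csec_loss(f_n, i_y, gt, ssim_fn, vgg_fn,
              lambdas=(1.0, 1.0, 1.0, 1.0), lambda_p=1.0, lambda_o=1.0):
    """Overall training loss, Eqs. (8)-(10). `ssim_fn` and `vgg_fn` are
    caller-supplied helpers (e.g. pytorch_msssim.ssim and a VGG-feature
    L1 loss); the lambda defaults are assumptions."""
    l_pseudo = F.l1_loss(f_n, gt)                              # Eq. (8)
    l_l1 = F.l1_loss(i_y, gt)
    # Cosine term penalizes per-pixel RGB direction (color) mismatch.
    l_cos = 1.0 - F.cosine_similarity(i_y, gt, dim=1).mean()
    l_ssim = 1.0 - ssim_fn(i_y, gt)                            # structural term
    l_vgg = vgg_fn(i_y, gt)                                    # perceptual term
    l1_, l2_, l3_, l4_ = lambdas
    l_output = l1_ * l_l1 + l2_ * l_cos + l3_ * l_ssim + l4_ * l_vgg  # Eq. (9)
    return lambda_p * l_pseudo + lambda_o * l_output           # Eq. (10)
```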

Experiments



