Demosaic is often rendered literally in Chinese as "removing the mosaic", but its essence is not removing a mosaic at all, which leads many people who meet the term for the first time to read too much into it. The background is this: to reduce the amount of data collected and to lower cost, a typical sensor captures only one of the three RGB channels at each position. Because human vision is most sensitive to green, the green component generally gets twice as many samples as the red and blue components. Depending on how the RGB positions are laid out, this produces the four commonly used patterns RGGB, GBRG, GRBG and BGGR; the RGGB pattern, for example, is shown below:
[Figure: the four Bayer layouts: RGGB, BGGR, GBRG, GRBG. In the RGGB layout the rows alternate R G R G ... and G B G B ...]
The purpose of demosaicing is to recover the original full-color information as faithfully as possible from this incomplete information. For example, for the first pixel in the figure above, only the red component is exact, so we have to find a way to reconstruct the green and blue components of that pixel from the surrounding information.
There is already a lot of material on this topic; here I describe the four algorithms I have optimized so far and share some of the code.
I. Bilinear processing
The most direct and simple idea is to use neighborhood information to make up the color components missing at the current pixel by interpolation. Let's take the RGGB layout as an example and refer to the figure above (coordinates start from 0, with the X axis increasing from left to right and the Y axis increasing from top to bottom).
Ignore the edges for a moment.
For example, the pixel at coordinate (1,1) currently has the B component and lacks R and G. However, it has an R component at each of its four diagonal neighbors, so the average of those four pixels can serve as the R component of this pixel; likewise there are exactly four G pixels in the four directions up, down, left and right, so in the same way the average of those four can serve as the G value of the current pixel.
Now consider the pixel at coordinate (1,2): it already has the G component and lacks R and B. It is not as lucky as the pixel at (1,1), because it is surrounded by only two valid R pixels and two valid B pixels, so only the average of each pair can be used to estimate its R and B components.
Other points are handled similarly.
As for the edges, since edge pixels always lack neighbors in one or two directions, the missing side can be supplemented with pixels from the opposite side using a mirroring relationship.
A simple and efficient C++ code is shown below:
int IM_DecodeBayerRG8ToBGR_Bilinear_PureC(unsigned char* Src, BitmapData Dest)
{
int Status = IM_STATUS_OK;
int Width = Dest.Width, Height = Dest.Height, Stride = Dest.Stride;
if (((Width & 1) == 1) || ((Height & 1) == 1)) return IM_STATUS_INVALIDPARAMETER;
if ((Width < 8) || (Height < 8)) return IM_STATUS_INVALIDPARAMETER;
unsigned char* RowCopy = (unsigned char*)malloc((Width + 2) * 3 * sizeof(unsigned char));
if (RowCopy == NULL) return IM_STATUS_OUTOFMEMORY;
unsigned char* First = RowCopy;
unsigned char* Second = RowCopy + (Width + 2);
unsigned char* Third = RowCopy + (Width + 2) * 2;
unsigned char* SrcP = (unsigned char*)Src;
Second[0] = SrcP[0];
memcpy(Second + 1, SrcP, Width * sizeof(unsigned char));
Second[Width + 1] = SrcP[Width - 1];
memcpy(First, Second, (Width + 2) * sizeof(unsigned char)); // The first line is the same as the second
Third[0] = SrcP[Width];
memcpy(Third + 1, SrcP + Width, Width * sizeof(unsigned char));
Third[Width + 1] = SrcP[Width + Width - 1];
for (int Y = 0; Y < Height; Y++)
{
unsigned char* LinePD = Dest.Scan0 + Y * Stride;
if (Y != 0)
{
unsigned char* Temp = First; First = Second; Second = Third; Third = Temp;
}
if (Y == Height - 1)
{
memcpy(Third, Second, (Width + 2) * sizeof(unsigned char));
}
else
{
Third[0] = SrcP[(Y + 1) * Width];
memcpy(Third + 1, SrcP + (Y + 1) * Width, Width * sizeof(unsigned char));
Third[Width + 1] = SrcP[(Y + 1) * Width + Width - 1];
}
if ((Y & 1) == 0) // even rows
{
for (int X = 0; X < Width; X++, LinePD += 3)
{
int P0 = First[X], P1 = First[X + 1], P2 = First[X + 2];
int P3 = Second[X], P4 = Second[X + 1], P5 = Second[X + 2];
int P6 = Third[X], P7 = Third[X + 1], P8 = Third[X + 2];
if ((X & 1) == 0)
{
LinePD[0] = (P0 + P2 + P6 + P8 + 2) >> 2;
LinePD[1] = (P1 + P3 + P5 + P7 + 2) >> 2;
LinePD[2] = P4;
}
else
{
LinePD[0] = (P1 + P7 + 1) >> 1;
LinePD[1] = P4;
LinePD[2] = (P3 + P5 + 1) >> 1;
}
}
}
else
{
for (int X = 0; X < Width; X++, LinePD += 3)
{
int P0 = First[X], P1 = First[X + 1], P2 = First[X + 2];
int P3 = Second[X], P4 = Second[X + 1], P5 = Second[X + 2];
int P6 = Third[X], P7 = Third[X + 1], P8 = Third[X + 2];
if ((X & 1) == 0)
{
LinePD[0] = (P3 + P5 + 1) >> 1;
LinePD[1] = P4;
LinePD[2] = (P1 + P7 + 1) >> 1;
}
else
{
LinePD[0] = P4;
LinePD[1] = (P1 + P3 + P5 + P7 + 2) >> 2;
LinePD[2] = (P0 + P2 + P6 + P8 + 2) >> 2;
}
}
}
}
free(RowCopy);
return IM_STATUS_OK;
}
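For reference, here is a minimal harness for calling the function. The BitmapData layout below is my assumption, modeled on the GDI+-style structure the code appears to target (Width/Height/Stride/Scan0); adapt it to your actual definition:

#include <stdlib.h>

// Assumed layout, mirroring the GDI+-style BitmapData the code appears to use.
struct BitmapData
{
    int Width, Height, Stride;
    unsigned char* Scan0;
};

int main()
{
    const int Width = 1920, Height = 1080;
    unsigned char* Bayer = (unsigned char*)malloc(Width * Height);        // raw RGGB frame from the sensor
    unsigned char* BGR = (unsigned char*)malloc(Width * Height * 3);      // decoded 24-bit output
    if ((Bayer == NULL) || (BGR == NULL)) return -1;
    // ... fill Bayer with sensor data here ...
    BitmapData Dest = { Width, Height, Width * 3, BGR };
    int Status = IM_DecodeBayerRG8ToBGR_Bilinear_PureC(Bayer, Dest);
    free(Bayer);
    free(BGR);
    return Status;
}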
This algorithm is very efficient and extremely amenable to instruction-set optimization. On an ordinary PC (12th Gen Intel(R) Core(TM) i7-12700F, 2.10 GHz) it takes only about 2ms to process a 1920*1080 image, and an SSE version can get down to 0.6ms. However, this method ignores edge structure and the correlation between channels, which easily leads to color artifacts and image blurring; for the classic test image below, the decoded result is a bit miserable.
Color rendering of the RGGB-format map (i.e., with the other channels' components set to 0) and the result after bilinear decoding
Noticeable color anomalies can be seen on both the horizontal and the vertical parts of the decoded fence.
However, due to the extreme efficiency of this algorithm, it can still be used in some less demanding situations.
II. Hamilton-Adams algorithm
This is also a very classical algorithm; before explaining it, it is necessary to first introduce the color difference constancy theory.
Both the color difference constancy criterion and the color ratio constancy criterion are based on the correlation between color channels, and both aim to bring that inter-channel correlation into the color interpolation algorithm to improve its accuracy. Color difference has two advantages over color ratio:
First, color difference arithmetic is simpler and easier to implement. Second, the color ratio has a large error when the G channel is close to 0, a problem the color difference does not have. Therefore, the vast majority of color interpolation algorithms use the color difference.
The most central idea of the color difference constancy theory is: for two pixels close to each other, the change in each color component from one pixel to the other should be approximately the same across channels, expressed by the following formula:
R(X,Y) - R(X-1,Y) = G(X,Y) - G(X-1,Y) = B(X,Y) - B(X-1,Y)
So if we have already obtained the values of one color channel at all positions, we can easily use this formula to derive the missing values of the other channels.
Again take the coordinate point (1,1) as an example and assume we have already obtained the complete G-channel data; this point is then only missing its R-channel data (its B data is native). By the color difference constancy theory we should have
R(1,1) - R(0,0) = G(1,1) - G(0,0) ------> R(1,1) = G(1,1) + R(0,0) - G(0, 0)
There are actually three other points that satisfy a similar equation: (0,2), (2,0) and (2,2) (the red component at these three points is exact), so for better accuracy we can finalize the value of R(1,1) with the following equation.
R(1,1) = (G(1,1) + R(0,0) - G(0, 0) + G(1,1) + R(0,2) - G(0, 2) + G(1,1) + R(2,0) - G(2, 0) + G(1,1) + R(2,2) - G(2, 2)) /4
Rearranged, that is:
R(1,1) = G(1,1) + (R(0,0) + R(0,2) + R(2, 0) + R(2,2) - G(0,0) -G(0,2) -G(2, 0) -G(2,2)) / 4
For the point (1, 2), the G-channel data would have been there, missing R and B. Then, according to the theory of constant color difference, there should be
R(1,2) - G(1,2) = R(0,2) - G(0,2)
B(1,2) - G(1,2) = B(1,1) - G(1,1)
By the same token, there are
R(1,2) - G(1,2) = R(2,2) - G(2,2)
B(1,2) - G(1,2) = B(1,3) - G(1,3)
Similar to (1,1), combining them gives a more precise result:
R(1,2) = G(1,2) + (R(0,2) + R(2,2) - G(0,2) - G(2,2)) / 2
B(1,2) = G(1,2) + (B(1,1) + B(1,3) - G(1,1) - G(1,3)) / 2
The above uses the color difference constancy theory to correlate the data between channels. Note that all of these calculations share one precondition: the green channel data must already be known at every position.
As we said earlier, the green channel is only missing half of its data, and every missing point can be filled with the average of the four surrounding points. So what happens if we handle the green channel this way and then reconstruct red and blue from it with the help of the color difference constancy theory? Let's call this algorithm CDCT for short; a simple and straightforward implementation of it is shown below:
int IM_DecodeBayerRG8ToBGR_CDCT_PureC(unsigned char* Src, BitmapData Dest)
{
int Status = IM_STATUS_OK;
int Width = Dest.Width, Height = Dest.Height, Stride = Dest.Stride;
if (((Width & 1) == 1) || ((Height & 1) == 1)) return IM_STATUS_INVALIDPARAMETER;
if ((Width < 8) || (Height < 8)) return IM_STATUS_INVALIDPARAMETER;
unsigned char* Blue = (unsigned char*)malloc(Width * Height * sizeof(unsigned char));
unsigned char* Green = (unsigned char*)malloc(Width * Height * sizeof(unsigned char));
unsigned char* Red = (unsigned char*)malloc(Width * Height * sizeof(unsigned char));
if ((Blue == NULL) || (Green == NULL) || (Red == NULL))
{
Status = IM_STATUS_OUTOFMEMORY;
goto FreeMemory;
}
// First, copy the data directly; there is no need to extract each channel's valid values separately, because the invalid positions will be filled in later
memcpy(Blue, Src, Height * Width * sizeof(unsigned char));
memcpy(Green, Src, Height * Width * sizeof(unsigned char));
memcpy(Red, Src, Height * Width * sizeof(unsigned char));
// Because the Green component occupies 1/2 of the pixels, fill the Green plane first
// The Green interpolation below involves a 3*3 neighborhood, so the edge pixels are handled separately with the average of the available neighbors; the amount of computation there is small and needs no acceleration
for (int X = 0; X < Width; X++)
{
IM_CalcBorderGreen_CDCT(Green, Width, Height, X, 0);
IM_CalcBorderGreen_CDCT(Green, Width, Height, X, Height - 1);
}
for (int Y = 1; Y < Height - 1; Y++)
{
IM_CalcBorderGreen_CDCT(Green, Width, Height, 0, Y);
IM_CalcBorderGreen_CDCT(Green, Width, Height, Width - 1, Y);
}
// Data to fill the invalid positions of the Green channel
for (int Y = 1; Y < Height - 1; Y++)
{
int Index = Y * Width;
for (int X = 1; X < Width - 1; X++)
{
// Even rows and even columns or odd rows and odd columns, the green component is invalid.
if (((X + Y) & 1) == 0)
{
Green[Index + X] = (Green[Index + X - Width] + Green[Index + X + Width] + Green[Index + X - 1] + Green[Index + X + 1] + 2) / 4;
}
}
}
IM_RGGB_CalcRed_CDCT_PureC(Red, Green, Width, Height);
IM_RGGB_CalcBlue_CDCT_PureC(Blue, Green, Width, Height);
Status = IM_CombineRGB_PureC(Blue, Green, Red, Dest.Scan0, Width, Height, Stride, Width);
if (Status != IM_STATUS_OK) goto FreeMemory;
FreeMemory:
if (Blue != NULL) free(Blue);
if (Green != NULL) free(Green);
if (Red != NULL) free(Red);
return Status;
}
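The border helper IM_CalcBorderGreen_CDCT is not listed here; as a minimal sketch of what it plausibly does (only the name comes from the code above, the body is my assumption: average the four cross neighbors, mirroring coordinates at the border, which keeps every neighbor on a valid green position):

// Hypothetical sketch: fill one border pixel of the Green plane.
// In the RGGB layout only positions with (X + Y) even lack green; for those,
// average the four cross neighbors, mirroring coordinates at the image border.
void IM_CalcBorderGreen_CDCT(unsigned char* Green, int Width, int Height, int X, int Y)
{
    if (((X + Y) & 1) != 0) return;                   // green is already valid at this position
    int Left = (X > 0) ? X - 1 : X + 1;               // mirroring keeps the neighbor on a
    int Right = (X < Width - 1) ? X + 1 : X - 1;      // position where green is valid
    int Up = (Y > 0) ? Y - 1 : Y + 1;
    int Down = (Y < Height - 1) ? Y + 1 : Y - 1;
    int Sum = Green[Y * Width + Left] + Green[Y * Width + Right] + Green[Up * Width + X] + Green[Down * Width + X];
    Green[Y * Width + X] = (unsigned char)((Sum + 2) / 4);
}

The red and blue border helpers presumably follow the same pattern, with the color difference term added.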
The code of IM_RGGB_CalcRed_CDCT_PureC is shown below:
void IM_RGGB_CalcRed_CDCT_PureC(unsigned char* Red, unsigned char* Green, int Width, int Height)
{
// R G R G R G
// G B G B G B
// R G R G R G
// G B G B G B
// R G R G R G
// G B G B G B
// Color difference constancy principle: the color difference is the difference between the chrominance signal (R and B components) and the luminance signal (G component). Within a small image region, the color difference of the current pixel is roughly the same as that of its surrounding points, i.e., approximately
// R(I,J) - G(I,J) = R(I,J + 1) - G(I,J + 1), or equivalently R(I,J) - R(I,J + 1) = G(I,J) - G(I,J + 1). In this way the remaining unknown R and B components can be reconstructed from the G component that was fully reconstructed earlier.
for (int X = 0; X < Width; X++)
{
IM_RGGB_CalcBorderRed_CDCT(Red, Green, Width, Height, X, 0);
IM_RGGB_CalcBorderRed_CDCT(Red, Green, Width, Height, X, Height - 1);
}
for (int Y = 1; Y < Height - 1; Y++)
{
IM_RGGB_CalcBorderRed_CDCT(Red, Green, Width, Height, 0, Y);
IM_RGGB_CalcBorderRed_CDCT(Red, Green, Width, Height, Width - 1, Y);
}
// Data to fill the invalid positions of the Red channel
for (int Y = 1; Y < Height - 1; Y++)
{
int Index = Y * Width;
for (int X = 1; X < Width - 1; X++)
{
// Even rows and odd columns, horizontally filled with red components
if ((Y & 1) == 0 && (X & 1) == 1)
{
Red[Index + X] = IM_ClampToByte((Red[Index + X - 1] + Red[Index + X + 1] - Green[Index + X - 1] - Green[Index + X + 1] + 1) / 2 + Green[Index + X]);
}
// Odd rows and even columns, vertically filled with red components
else if ((Y & 1) == 1 && (X & 1) == 0)
{
Red[Index + X] = IM_ClampToByte((Red[Index + X - Width] + Red[Index + X + Width] - Green[Index + X - Width] - Green[Index + X + Width] + 1) / 2 + Green[Index + X]);
}
// Odd rows and odd columns, filled with the red components from the four diagonals
else if ((Y & 1) == 1 && (X & 1) == 1)
{
Red[Index + X] = IM_ClampToByte((Red[Index + X - Width - 1] + Red[Index + X - Width + 1] + Red[Index + X + Width - 1] + Red[Index + X + Width + 1] - Green[Index + X - Width - 1] - Green[Index + X - Width + 1] - Green[Index + X + Width - 1] - Green[Index + X + Width + 1] + 2) / 4 + Green[Index + X]);
}
}
}
}
IM_RGGB_CalcBlue_CDCT_PureC is similar.
After testing, this gives basically no visible difference from direct bilinear interpolation, so used this way alone it still won't do.
Hamilton and Adams et al. therefore proposed the following algorithm, which interpolates the green component by combining the first-order derivatives of the green channel with the second-order derivatives of the surrounding red or blue channel, a process that takes the edge information between pixels into account:
When the gradient in the horizontal direction is greater than the gradient in the vertical direction, the value computed from the pixels in the vertical direction is used; otherwise the value from the horizontal direction is used; if the two are equal, the average of both is used.
In practice a threshold is defined: if the absolute difference between the horizontal and vertical gradients is below the threshold, the average is used, and only outside the threshold is the horizontal/vertical relationship considered. This gives more reasonable results.
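For reference, here are the interpolation formulas in the usual G5-centered notation (my reconstruction, written to match the code below; R5 is the center red pixel, G4/G6 its left/right green neighbors, G2/G8 its upper/lower green neighbors, and R3/R7 and R1/R9 the red pixels two steps away horizontally and vertically):

GradH = |G4 - G6| + |2*R5 - R3 - R7|
GradV = |G2 - G8| + |2*R5 - R1 - R9|

If GradH < GradV: G5 = (G4 + G6) / 2 + (2*R5 - R3 - R7) / 4
If GradH > GradV: G5 = (G2 + G8) / 2 + (2*R5 - R1 - R9) / 4
Otherwise: G5 = (G2 + G4 + G6 + G8) / 4 + (4*R5 - R1 - R3 - R7 - R9) / 8

At a blue center the same formulas apply with B in place of R.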
In fact, if you look closely, the calculation of G5 in each direction above also makes use of the color difference constancy theory; it just uses only the horizontal or only the vertical pixels at a time.
The C++ code I wrote along these lines is shared below:
int IM_DecodeBayerRG8ToBGR_HamiltonAdams_PureC(unsigned char* Src, BitmapData Dest, int Threshold)
{
int Status = IM_STATUS_OK;
int Width = Dest.Width, Height = Dest.Height, Stride = Dest.Stride;
// Both width and height must be even
if (((Width & 1) == 1) || ((Height & 1) == 1)) return IM_STATUS_INVALIDPARAMETER;
if ((Width < 8) || (Height < 8)) return IM_STATUS_INVALIDPARAMETER;
unsigned char* Blue = (unsigned char*)malloc(Width * Height * sizeof(unsigned char));
unsigned char* Green = (unsigned char*)malloc(Width * Height * sizeof(unsigned char));
unsigned char* Red = (unsigned char*)malloc(Width * Height * sizeof(unsigned char));
if ((Blue == NULL) || (Green == NULL) || (Red == NULL))
{
Status = IM_STATUS_OUTOFMEMORY;
goto FreeMemory;
}
// First, copy the data directly; there is no need to extract each channel's valid values separately, because the invalid positions will be filled in later
memcpy(Blue, Src, Height * Width * sizeof(unsigned char));
memcpy(Green, Src, Height * Width * sizeof(unsigned char));
memcpy(Red, Src, Height * Width * sizeof(unsigned char));
// Because the Green component occupies 1/2 of the pixels, fill the Green plane first
// The Green interpolation below involves a 5*5 neighborhood, so the two rows and columns at each edge are handled separately with the average of the available neighbors; this part of the computation is very small and needs no acceleration
for (int X = 0; X < Width; X++)
{
IM_CalcBorderGreen_CDCT(Green, Width, Height, X, 0);
IM_CalcBorderGreen_CDCT(Green, Width, Height, X, 1);
IM_CalcBorderGreen_CDCT(Green, Width, Height, X, Height - 2);
IM_CalcBorderGreen_CDCT(Green, Width, Height, X, Height - 1);
}
for (int Y = 2; Y < Height - 2; Y++)
{
IM_CalcBorderGreen_CDCT(Green, Width, Height, 0, Y);
IM_CalcBorderGreen_CDCT(Green, Width, Height, 1, Y);
IM_CalcBorderGreen_CDCT(Green, Width, Height, Width - 2, Y);
IM_CalcBorderGreen_CDCT(Green, Width, Height, Width - 1, Y);
}
// The remaining interior pixels can safely access the full 5*5 neighborhood and are handled by the algorithm below
for (int Y = 2; Y < Height - 2; Y++)
{
int IndexC = Y * Width;
int IndexN1 = (Y + 1) * Width, IndexN2 = (Y + 2) * Width;
int IndexP1 = (Y - 1) * Width, IndexP2 = (Y - 2) * Width;
unsigned char* Sample = (Y & 1) == 0 ? Red : Blue;
for (int X = 2; X < Width - 2; X++)
{
// Only if X and Y are both even or both odd need to be handled
if (((X + Y) & 1) == 0)
{
// Second-order derivatives of the surrounding blue or red component
int SecDH = 2 * Sample[IndexC + X] - Sample[IndexC + X + 2] - Sample[IndexC + X - 2];
int SecDV = 2 * Sample[IndexC + X] - Sample[IndexN2 + X] - Sample[IndexP2 + X];
// Adding the first order derivative of the green component yields the gradient
int GradH = IM_Abs(Green[IndexC + X - 1] - Green[IndexC + X + 1]) + IM_Abs(SecDH);
int GradV = IM_Abs(Green[IndexP1 + X] - Green[IndexN1 + X]) + IM_Abs(SecDV);
// If the vertical and horizontal gradients are about the same, use the average of the neighbors in both directions
if (IM_Abs(GradV - GradH) < Threshold)
Green[IndexC + X] = IM_ClampToByte((Green[IndexP1 + X] + Green[IndexN1 + X] + Green[IndexC + X - 1] + Green[IndexC + X + 1] + 2) / 4 + (SecDH + SecDV + 4) / 8);
// If the horizontal differences are smaller, the average value in the horizontal direction is utilized
else if (GradH < GradV)
Green[IndexC + X] = IM_ClampToByte((Green[IndexC + X - 1] + Green[IndexC + X + 1] + 1) / 2 + (SecDH + 2) / 4);
// Otherwise use the average of the vertical
else
Green[IndexC + X] = IM_ClampToByte((Green[IndexP1 + X] + Green[IndexN1 + X] + 1) / 2 + (SecDV + 2) / 4);
}
}
}
IM_RGGB_CalcRed_CDCT_SSE(Red, Green, Width, Height);
IM_RGGB_CalcBlue_CDCT_SSE(Blue, Green, Width, Height);
Status = IM_CombineRGB_PureC(Blue, Green, Red, Dest.Scan0, Width, Height, Stride, Width);
if (Status != IM_STATUS_OK) goto FreeMemory;
FreeMemory:
if (Blue != NULL) free(Blue);
if (Green != NULL) free(Green);
if (Red != NULL) free(Red);
return Status;
}
Below are the results of the simple CDCT algorithm and the Hamilton-Adams algorithm described above:
Simple CDCT result / Hamilton-Adams result
As the right-hand result above shows, compared with bilinear the color anomalies at the horizontal fence are no longer visible, and the anomalies at the vertical fence are greatly reduced, though slight imperfections remain.
In terms of speed, direct bilinear can fill the data into the destination image directly by exploiting the regular layout, and the arithmetic is very simple. After introducing the color difference constancy theory, the ordering requirements and some coding considerations make intermediate memory necessary and the amount of computation considerably larger, so it is much slower: the C++ code above needs about 7ms for a 1080P image, and the SSE-optimized version reaches about 4ms, which still has practical value in scenarios with real-time requirements.
To further improve the quality, many people have since proposed improved algorithms based on the HA algorithm. On IPOL you can find the article A Mathematical Analysis and Implementation of Residual Interpolation Demosaicking Algorithms, which lists several kinds of processing; the site also provides an online DEMO and code, so you can compare the respective results, as shown in the figure below:
However, these are still fairly traditional algorithms. Having looked at their code and results, my feeling is that they are rather theoretical: in actual operation their efficiency may be completely insufficient. IPOL also provides a CNN-based deep learning algorithm; its results, which I have looked at, are indeed very good, but its speed feels rather worrisome. If interested, you can browse A Study of Two CNN Demosaicking Algorithms.
I then followed up with another IPOL article, Zhang-Wu Directional LMMSE Image Demosaicking. Testing showed that its results are very good and stable, and after analyzing its code I felt that a great deal of optimization could be done.
This article treats the R and B channels in the same way as the HA algorithm: all the G-channel data must be obtained first, and then R and B are computed with the color difference constancy principle. For the G channel itself, it starts by computing the color differences (G-R) and (G-B) in the horizontal and vertical directions. These directional estimates are then treated as noisy measurements of the true color difference and are optimally combined within a linear minimum mean square error (LMMSE) framework.
The "optimal combination" can be understood this way: when computing the G channel, the traditional HA algorithm decides between the horizontal and the vertical value based on the gradients in the two directions, whereas the Zhang-Wu algorithm does not commit entirely to either direction; it computes a weight for each of the horizontal and vertical estimates according to a certain procedure and then fuses them.
The formula for color noise estimation is briefly described here:
These formulas can be viewed as a simple denoising process: f represents the original noisy signal and s is a simple low-pass filtering of f, for example with a Gaussian filter. Formula 9 then computes the mean of s over a neighborhood of a certain size, formula 10 the variance of s within that neighborhood, and formula 11 the mean of the squared differences between f and s.
Using the above information, the denoised estimate u (Eq. 12) and its estimation error (Eq. 13) can then be obtained.
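Written out in the plain notation used earlier (my reconstruction from the description above; f is the noisy directional color difference signal, s its low-pass filtered version, and the window holds 2M+1 samples centered on the current position):

m = sum(s[k]) / (2M + 1) (Eq. 9)
Var_s = sum((s[k] - m)^2) / (2M + 1) (Eq. 10)
Var_n = sum((f[k] - s[k])^2) / (2M + 1) (Eq. 11)
u = m + Var_s / (Var_s + Var_n) * (f[0] - m) (Eq. 12)
e = Var_s * Var_n / (Var_s + Var_n) (Eq. 13)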
The color differences in the horizontal and vertical directions are each filtered this way, and then fused with the above quantities according to the following formula:
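In the same notation (again my reconstruction, consistent with the LMMSE framework; subscripts h and v denote the horizontal and vertical estimates u and their errors e from Eq. 12 and Eq. 13):

u_fused = (e_v * u_h + e_h * u_v) / (e_h + e_v)

That is, the direction with the smaller estimation error receives the larger weight in the fusion.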
The above formulas describe a 1D denoising process; I'll adapt it to 2D image denoising sometime and see how it works.
These formulas all have corresponding implementations in the paper's companion code; interested friends can take a look there. After my reorganization, the function is roughly composed as follows:
// Zhang-Wu Directional LMMSE Image Demosaicking
int IM_DecodeBayerRG8ToBGR_ZhangWu_SSE(unsigned char* Src, BitmapData Dest)
{
    int Status = IM_STATUS_OK;
    int Width = Dest.Width, Height = Dest.Height, Stride = Dest.Stride;
    // Both width and height must be even
    if (((Width & 1) == 1) || ((Height & 1) == 1)) return IM_STATUS_INVALIDPARAMETER;
    // Some neighborhood accesses will overflow when width or height is less than 8
    if ((Width < 8) || (Height < 8)) return IM_STATUS_INVALIDPARAMETER;
    unsigned char* Blue = (unsigned char*)malloc(Width * Height * sizeof(unsigned char));
    unsigned char* Green = (unsigned char*)malloc(Width * Height * sizeof(unsigned char));
    unsigned char* Red = (unsigned char*)malloc(Width * Height * sizeof(unsigned char));
    short* DiffH = (short*)malloc(Width * Height * sizeof(short));
    short* DiffV = (short*)malloc(Width * Height * sizeof(short));
    short* LowPassH = (short*)malloc(Width * Height * sizeof(short));
    short* LowPassV = (short*)malloc(Width * Height * sizeof(short));
    if ((Blue == NULL) || (Green == NULL) || (Red == NULL) || (DiffH == NULL) || (DiffV == NULL) || (LowPassH == NULL) || (LowPassV == NULL))
    {
        Status = IM_STATUS_OUTOFMEMORY;
        goto FreeMemory;
    }
    // First, copy the data directly; there is no need to extract each channel's valid values separately, because the invalid positions will be filled in later
    memcpy(Blue, Src, Height * Width * sizeof(unsigned char));
    // memcpy(Green, Src, Height * Width * sizeof(unsigned char));    // The green channel, because of its specificity, is fully populated later in the code and needs no separate copy
    memcpy(Red, Src, Height * Width * sizeof(unsigned char));
    // Obtain the difference between the accurate and the interpolated signal in the horizontal direction
    IM_GetHoriDiffSignal_SSE(Src, DiffH, Width, Height);
    // Obtain the difference between the accurate and the interpolated signal in the vertical direction
    IM_GetVertDiffSignal_SSE(Src, DiffV, Width, Height);
    // 1*9 Gaussian filtering of the horizontal differences
    IM_HoriLowPass1X9_SSE(DiffH, LowPassH, Width, Height);
    // 9*1 Gaussian filtering of the vertical differences
    IM_VertLowPass9X1_SSE(DiffV, LowPassV, Width, Height);
    // Compute the complete green channel with the LMMSE algorithm
    Status = IM_RGGB_CalcGreen_ZW_SSE(Src, Green, DiffH, DiffV, LowPassH, LowPassV, Width, Height, 4);
    if (Status != IM_STATUS_OK) goto FreeMemory;
    // Compute the Red channel using the color difference constancy principle
    IM_RGGB_CalcRed_CDCT_SSE(Red, Green, Width, Height);
    // Compute the Blue channel using the color difference constancy principle
    IM_RGGB_CalcBlue_CDCT_SSE(Blue, Green, Width, Height);
    // Combine the single-channel B/G/R planes into a color image
    Status = IM_CombineRGB_SSE(Blue, Green, Red, Dest.Scan0, Width, Height, Stride, Width);
    if (Status != IM_STATUS_OK) goto FreeMemory;
FreeMemory:
    if (Blue != NULL) free(Blue);
    if (Green != NULL) free(Green);
    if (Red != NULL) free(Red);
    if (DiffH != NULL) free(DiffH);
    if (DiffV != NULL) free(DiffV);
    if (LowPassH != NULL) free(LowPassH);
    if (LowPassV != NULL) free(LowPassV);
    return Status;
}
I won't explore the specific implementation too deeply here. The original author's C code is very inefficient: a simple test on a 1080P image took about 1 minute, which is completely impractical, so deep optimization is needed. For example, for the horizontal filtering the original author uses a 1*9 filter; my slightly improved and SSE-optimized version is as follows:
// In-place operation is not supported
void IM_HoriLowPass1X9_SSE(short* Src, short* Dest, int Width, int Height)
{
    for (int Y = 0; Y < Height; Y++)
    {
        int Index = Y * Width;
        // At the left edge the taps are mirrored: 4 3 2 1 0 1 2 3 4 5 6 7
        Dest[Index + 0] = ((Src[Index + 4] + Src[Index + 4]) * 4 + (Src[Index + 3] + Src[Index + 3]) * 8 + (Src[Index + 2] + Src[Index + 2]) * 16 + (Src[Index + 1] + Src[Index + 1]) * 23 + Src[Index + 0] * 26 + 64) / 128;
        Dest[Index + 1] = ((Src[Index + 3] + Src[Index + 5]) * 4 + (Src[Index + 2] + Src[Index + 4]) * 8 + (Src[Index + 1] + Src[Index + 3]) * 16 + (Src[Index + 0] + Src[Index + 2]) * 23 + Src[Index + 1] * 26 + 64) / 128;
        Dest[Index + 2] = ((Src[Index + 2] + Src[Index + 6]) * 4 + (Src[Index + 1] + Src[Index + 5]) * 8 + (Src[Index + 0] + Src[Index + 4]) * 16 + (Src[Index + 1] + Src[Index + 3]) * 23 + Src[Index + 2] * 26 + 64) / 128;
        Dest[Index + 3] = ((Src[Index + 1] + Src[Index + 7]) * 4 + (Src[Index + 0] + Src[Index + 6]) * 8 + (Src[Index + 1] + Src[Index + 5]) * 16 + (Src[Index + 2] + Src[Index + 4]) * 23 + Src[Index + 3] * 26 + 64) / 128;
        // At the right edge: W-8 W-7 W-6 W-5 W-4 W-3 W-2 W-1 W-2 W-3 W-4 W-5
        Dest[Index + Width - 4] = ((Src[Index + Width - 8] + Src[Index + Width - 2]) * 4 + (Src[Index + Width - 7] + Src[Index + Width - 1]) * 8 + (Src[Index + Width - 6] + Src[Index + Width - 2]) * 16 + (Src[Index + Width - 5] + Src[Index + Width - 3]) * 23 + Src[Index + Width - 4] * 26 + 64) / 128;
        Dest[Index + Width - 3] = ((Src[Index + Width - 7] + Src[Index + Width - 3]) * 4 + (Src[Index + Width - 6] + Src[Index + Width - 2]) * 8 + (Src[Index + Width - 5] + Src[Index + Width - 1]) * 16 + (Src[Index + Width - 4] + Src[Index + Width - 2]) * 23 + Src[Index + Width - 3] * 26 + 64) / 128;
        Dest[Index + Width - 2] = ((Src[Index + Width - 6] + Src[Index + Width - 4]) * 4 + (Src[Index + Width - 5] + Src[Index + Width - 3]) * 8 + (Src[Index + Width - 4] + Src[Index + Width - 2]) * 16 + (Src[Index + Width - 3] + Src[Index + Width - 1]) * 23 + Src[Index + Width - 2] * 26 + 64) / 128;
        Dest[Index + Width - 1] = ((Src[Index + Width - 5] + Src[Index + Width - 5]) * 4 + (Src[Index + Width - 4] + Src[Index + Width - 4]) * 8 + (Src[Index + Width - 3] + Src[Index + Width - 3]) * 16 + (Src[Index + Width - 2] + Src[Index + Width - 2]) * 23 + Src[Index + Width - 1] * 26 + 64) / 128;
        int BlockSize = 8, Block = (Width - 8) / BlockSize;
        for (int X = 4; X < 4 + Block * BlockSize; X += BlockSize)
        {
            // (V0 * 4 + V1 * 8 + V2 * 16 + V3 * 23 + V4 * 26 + V5 * 23 + V6 * 16 + V7 * 8 + V8 * 4 + 64) / 128
            __m128i V0 = _mm_loadu_si128((__m128i*)(Src + Index + X - 4));
            __m128i V1 = _mm_loadu_si128((__m128i*)(Src + Index + X - 3));
            __m128i V2 = _mm_loadu_si128((__m128i*)(Src + Index + X - 2));
            __m128i V3 = _mm_loadu_si128((__m128i*)(Src + Index + X - 1));
            __m128i V4 = _mm_loadu_si128((__m128i*)(Src + Index + X + 0));
            __m128i V5 = _mm_loadu_si128((__m128i*)(Src + Index + X + 1));
            __m128i V6 = _mm_loadu_si128((__m128i*)(Src + Index + X + 2));
            __m128i V7 = _mm_loadu_si128((__m128i*)(Src + Index + X + 3));
            __m128i V8 = _mm_loadu_si128((__m128i*)(Src + Index + X + 4));
            __m128i V08 = _mm_slli_epi16(_mm_add_epi16(V0, V8), 2);                      // (V0 + V8) * 4
            __m128i V17 = _mm_slli_epi16(_mm_add_epi16(V1, V7), 3);                      // (V1 + V7) * 8
            __m128i V26 = _mm_slli_epi16(_mm_add_epi16(V2, V6), 4);                      // (V2 + V6) * 16
            __m128i V35 = _mm_mullo_epi16(_mm_add_epi16(V3, V5), _mm_set1_epi16(23));    // (V3 + V5) * 23
            __m128i V44 = _mm_mullo_epi16(V4, _mm_set1_epi16(26));                       // V4 * 26
            __m128i Sum = _mm_add_epi16(_mm_add_epi16(_mm_add_epi16(V08, V17), _mm_add_epi16(V26, V35)), V44);
            __m128i Avg = _mm_srai_epi16(_mm_add_epi16(Sum, _mm_set1_epi16(64)), 7);
            _mm_storeu_si128((__m128i*)(Dest + Index + X), Avg);
        }
        for (int X = 4 + Block * BlockSize; X < Width - 4; X++)
        {
            Dest[Index + X] = ((Src[Index + X - 4] + Src[Index + X + 4]) * 4 + (Src[Index + X - 3] + Src[Index + X + 3]) * 8 + (Src[Index + X - 2] + Src[Index + X + 2]) * 16 + (Src[Index + X - 1] + Src[Index + X + 1]) * 23 + Src[Index + X] * 26 + 64) / 128;
        }
    }
}
This greatly improves the speed of operation.
In the LMMSE computation, the author's window size M is taken to be 4, i.e. 9 pixels are involved, and the code implements this with a hard loop. In fact this is just ordinary sliding-window statistics and can certainly be optimized; in particular, the loop in the vertical direction jumps a full row of pixels at each step, so the cache misses are heavy, and by restructuring appropriately the speed can be greatly improved. After optimization, this algorithm takes about 12ms for a 1080P image in the SSE version, while my own optimized C++ code takes about 27ms (compiled with the compiler's auto-vectorization, not the original author's C code). This speed should be acceptable in a real-time system, and the process can also be parallelized across multiple threads; with two threads the SSE version is expected to do about 8ms per frame.
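As a generic illustration of the kind of restructuring meant here (a minimal sketch, not the actual optimized code): a column-wise accumulation that walks down the rows strides Width elements per access, while swapping the loops keeps every access sequential and cache-friendly.

#include <string.h>

// Naive: for each column, walk down the rows; each access jumps Width elements.
void ColSumNaive(const short* Data, int* ColSum, int Width, int Height)
{
    for (int X = 0; X < Width; X++)
    {
        int Sum = 0;
        for (int Y = 0; Y < Height; Y++) Sum += Data[Y * Width + X];
        ColSum[X] = Sum;
    }
}

// Restructured: traverse rows sequentially and accumulate all columns at once.
void ColSumByRow(const short* Data, int* ColSum, int Width, int Height)
{
    memset(ColSum, 0, Width * sizeof(int));
    for (int Y = 0; Y < Height; Y++)
    {
        const short* Row = Data + Y * Width;
        for (int X = 0; X < Width; X++) ColSum[X] += Row[X];
    }
}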
Compared with the HA algorithm, this algorithm gives more complete results with fewer flaws, as shown below:
Zhang-Wu algorithm result / Original image
Here is another set of test images:
Mosaic / Bilinear
HA / Zhang-Wu
Overall, the best result comes from the Zhang-Wu algorithm, followed by HA, with Bilinear the plainest. Still, don't write Bilinear off completely: many scenes are not very complex, its results can still meet the demand there, and the key point is that it is really fast.
Of course, if one wants a balance between execution efficiency and quality, the HA algorithm is probably the more appropriate choice.
To make it easier, I wrote a simple DEMO for algorithm verification.
/files/Imageshop/?t=1725238639&download=true
If you want to keep up with my latest articles, you can also follow my official account: