
Hands-on practice: calibrating your own cell phone camera


Contents
  • 1. Introduction
  • 2. Preparation
    • 2.1 Calibration field
    • 2.2 Taking the photos
  • 3. Fundamentals
    • 3.1 Imaging Principles
    • 3.2 Distortion correction
  • 4. Calibration solution
    • 4.1 Code Implementation
    • 4.2 Detailed Analysis
      • 4.2.1 Solving process
      • 4.2.2 Extracting points
    • 4.3 Solution results
  • 5. Additional questions

1. Introduction

I have to say that computer vision has now matured considerably. I remember that when I first started working, camera calibration was still a rather mysterious technique that only a few professionals could perform, and no relevant material could be found online. Today camera calibration is commonplace and plenty of references are available, so a thought struck me: if all those serious cameras can be calibrated, then our cell phone cameras certainly can be too. In this article I record the concrete steps of calibrating my own phone's camera, partly to make up for not learning the technique back then; after all, the best way to learn a technique is to practice it yourself.

2. Preparation

2.1 Calibration field

I have seen a number of formal calibration fields. Some are very large, with many vertical bars carrying markers and rails for moving the camera equipment. The most popular and cheapest approach, however, is a checkerboard calibration board, used in what is known as Zhang Zhengyou's calibration method.

So where do you get a checkerboard calibration board? Printing one on paper is an option, but it has two problems: first, the printed size of each square must be converted from pixels to metric units and may not come out to a round number; second, you need a wall to paste it on, and pasting it perfectly smooth and flat is quite difficult. So I did not take this route and instead bought a calibration board online. Since a phone camera's sensor is not very large, the board does not need to be large either; the dimensions of the board I finally chose are shown below:

Figure 1: Dimensions of the calibration board

Each square is 5 mm, with 12×9 squares in total; the overall size is fairly small, about that of a palm. The board is on a glass substrate and cost about 50 yuan. In my actual experience this size is still a little on the small side, but larger boards get expensive quickly, so readers with the budget may want to choose a somewhat larger one.

2.2 Taking the photos

The next step is to photograph the checkerboard calibration board with the phone camera. In theory only 6 pairs of control points are needed to solve the calibration, but because the detected control points contain error, more point pairs are used to improve accuracy. The control points from a single photo are also not quite enough: points from multiple photos are usually needed to avoid falling into a local optimum and to make the solution more reliable. If possible, photograph the board from multiple viewpoints and at different distances, and try to have the board cover different regions of the image plane, so that distortion and the other parameters can be estimated well.

Here I took 6 photos of the checkerboard calibration board from 6 different positions and viewpoints: front, back, left, right, top and bottom:

Figure 2: The checkerboard photos

You can see that the board sits too close to the image center in every shot, but that could not be helped; the board I used really is a bit small. Once the phone gets very close, the camera app automatically switches to close-up (macro) mode, and since I was not sure whether that changes the camera parameters, I kept some distance. From too far away, though, the photos come out slightly blurry, so I had to make do with the current results.

Many phones nowadays apply automatic corrections when taking photos, such as filters and wide-angle correction; turn such features off or avoid them as much as possible. Also, do not zoom while shooting: the camera app offers magnifications such as 0.6x, 1x, 2x and 3x, and calibration should be done at the native magnification (1x). Autofocus should likewise be turned off, so that the lens and the focal plane stay in a fixed position.

Another question is whether it is better to hold the phone still and move the calibration board, or to keep the board still and move the phone. In principle both work, but keeping the board fixed and moving the camera is the more common choice because it is simpler to set up. I stuck the checkerboard to the wall with double-sided tape, which amounts to the lowest-cost miniature calibration field.

I also tried laying the board on a desk, but indoor shots then easily pick up shadows, so fixing it to the wall works better. Ideally the wall should be well lit; shooting in the daytime with plenty of sunlight gives the best results.

3. Fundamentals

3.1 Imaging Principles

Although camera calibration is said to solve for the intrinsic parameters, the extrinsic parameters are in fact solved as well, because calibration is built on the camera imaging model, and the intrinsics and extrinsics enter the solution together. Ignoring distortion, the camera's imaging model can be expressed as the following equation (1):

\[s \begin{bmatrix} u\\ v\\ 1\\ \end{bmatrix} = K \begin{bmatrix} R|t\\ \end{bmatrix} \begin{bmatrix} X_w\\ Y_w\\ Z_w\\ 1\\ \end{bmatrix} \tag{1} \]

In this equation:

  • \({\begin{bmatrix}X_w & Y_w & Z_w\\\end{bmatrix}}^T\) represents a three-dimensional point in world space, also called an object-space point.
  • \({\begin{bmatrix}u & v\\\end{bmatrix}}^T\) represents the coordinates of a pixel on the image plane, also called the image point.
  • \(\begin{bmatrix}R|t\\\end{bmatrix}\) is the camera's extrinsic matrix, the combination of a rotation and a translation: \(R\) is a 3x3 rotation matrix and \(t\) is a 3-dimensional translation vector. Since a rotation can be expressed with Euler angles, it too reduces to a 3-dimensional vector; 3 rotations plus 3 translations is where the camera's 6 extrinsic parameters come from.
  • \(K\) is the camera's intrinsic matrix, usually written as the following equation (2):

\[K = \begin{bmatrix} f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1\\ \end{bmatrix} \tag{2} \]

  • \(f_x\) and \(f_y\) are the focal lengths in pixels in the horizontal and vertical directions, respectively.
  • \(c_x\) and \(c_y\) are the pixel coordinates of the principal point (the intersection of the optical axis with the imaging plane).
  • \(s\) is the scale factor; it carries out the homogeneous-coordinate conversion, reducing the homogeneous three-dimensional coordinates to two-dimensional pixel coordinates.
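To make equation (1) concrete, below is a minimal sketch that projects a single object-space point by hand, step by step against the formula. The intrinsics and pose here are made-up values for illustration only, not calibration results:

#include <iostream>
#include <opencv2/opencv.hpp>

int main() {
  // Made-up intrinsic matrix in the shape of equation (2)
  cv::Matx33d K(2900.0, 0, 1536.0,
                0, 2900.0, 2048.0,
                0, 0, 1);
  // Made-up extrinsics: no rotation, camera 0.5 m in front of the board
  cv::Matx33d R = cv::Matx33d::eye();
  cv::Vec3d t(0, 0, 0.5);
  // An object-space point on the board plane (Z_w = 0), in metres
  cv::Vec3d Xw(0.010, 0.005, 0);

  // Equation (1): s * [u v 1]^T = K * (R * Xw + t)
  cv::Vec3d pc = R * Xw + t; // point in camera coordinates
  cv::Vec3d uvs = K * pc;    // homogeneous pixel coordinates, scaled by s
  std::cout << "u = " << uvs[0] / uvs[2]  // dividing by uvs[2] removes s
            << ", v = " << uvs[1] / uvs[2] << std::endl;
  return 0;
}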

In my view, the imaging model above has parallels with knowledge from other disciplines:

  1. Computer graphics. The geometric transformations in graphics rendering comprise the model, view and projection transformations, together commonly known as the MVP matrices. The model transformation consists of rotations and translations, and the view transformation is the inverse of a model transformation; together they correspond to the extrinsic matrix \(\begin{bmatrix}R|t\\\end{bmatrix}\) in equation (1). The projection matrices differ, however: the intrinsic matrix \(K\) in equation (1) converts points from the camera coordinate system to the image coordinate system, while the projection matrix in graphics rendering converts points from the camera coordinate system to the clip coordinate system.

  2. Photogrammetry. In photogrammetry this imaging model is summarized as the collinearity equations. Apart from the different notation, the most notable difference is that there are only three intrinsics: the focal length and the two-dimensional coordinates of the principal point. Personally I find that formulation less intuitive, but it is easier to use for adjustment computations.

Readers with experience in either of the above can understand the model by comparison. The formulations look a little different, but I am certain they express the same principle, geometric transformations in space, merely described differently for different settings.

3.2 Distortion correction

The imaging model above does not account for distortion. Why does distortion occur? Quite simply, camera lenses are not ideal optics; light bends in complicated ways as it passes through them, which makes straight lines in the scene appear curved near the edges of the image. The common types are radial distortion and tangential distortion.

Distortion correction may sound mysterious, but put bluntly it is quite simple. All we need to understand is that the correction uses a rational-function model: the distortion is represented with higher-order polynomials (of the form \(y=ax^3+bx^2+cx+d\)). There is no physical principle behind it; it is purely mathematical fitting, and the output is the coefficient (a, b, c, d) of each higher-order term.
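To make this less abstract, here is a minimal sketch of the radial-plus-tangential polynomial model that OpenCV fits, applied to normalized image coordinates; the coefficient values are invented purely for illustration:

#include <iostream>

// Radial + tangential distortion applied to normalized image
// coordinates (x, y), following OpenCV's polynomial model. The
// coefficients k1, k2, k3 (radial) and p1, p2 (tangential) are what
// calibrateCamera later outputs in distCoeffs.
void distort(double x, double y, double &xd, double &yd) {
  const double k1 = 0.1, k2 = -0.2, k3 = 0.05; // made-up radial terms
  const double p1 = 1e-4, p2 = 1e-4;           // made-up tangential terms
  const double r2 = x * x + y * y;             // squared radius from center
  const double radial = 1 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
  xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x);
  yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y;
}

int main() {
  double xd, yd;
  distort(0.1, 0.2, xd, yd);
  std::cout << "(0.1, 0.2) -> (" << xd << ", " << yd << ")" << std::endl;
  return 0;
}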

Beyond this sketch, distortion correction adds considerable complexity to the calibration solution, so it is not discussed further here. For beginners, understanding equation (1) of the imaging model is the more critical part.

4. Calibration solution

4.1 Code Implementation

The principles introduced above are enough to perform the calibration solution, but the process is fairly involved, so it is best explained alongside a concrete implementation. The code is shown below; it mainly uses the OpenCV library:

#include <filesystem>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

using namespace cv;
using namespace std;

int main() {
#ifdef _WIN32
  SetConsoleOutputCP(65001);
#endif

  // Placeholder file names for the six checkerboard photos
  vector<std::filesystem::path> imgPaths = {
      "C:/Work/CalibrateCamera/Data/1.jpg",
      "C:/Work/CalibrateCamera/Data/2.jpg",
      "C:/Work/CalibrateCamera/Data/3.jpg",
      "C:/Work/CalibrateCamera/Data/4.jpg",
      "C:/Work/CalibrateCamera/Data/5.jpg",
      "C:/Work/CalibrateCamera/Data/6.jpg"};
  size_t imageNum = imgPaths.size();

  // Define the size of the checkerboard grid (number of inner corners)
  int boardWidth = 11;  // columns
  int boardHeight = 8;  // rows
  cv::Size boardSize(boardWidth, boardHeight);

  double cellSize = 0.005; // side length of one square in metres

  Size imageSize(3072, 4096); // image size

  // Prepare the object and image points for calibration.
  vector<vector<Point3f>> objectPoints(imageNum); // 3D object points per image
  vector<vector<Point2f>> imagePoints(imageNum);  // 2D pixel points per image

  for (size_t ii = 0; ii < imageNum; ++ii) {
    // Load checkerboard image
    cv::Mat image = cv::imread(imgPaths[ii].string().c_str());
    if (image.empty()) {
      std::cerr << "Error: Could not load image!" << std::endl;
      return -1;
    }

    // Storing corner coordinates
    std::vector<cv::Point2f> corners;

    // Convert image to grayscale
    cv::Mat grayImage;
    cv::cvtColor(image, grayImage, cv::COLOR_BGR2GRAY);

    // Finding the corner of the checkerboard grid
    // cv::CALIB_CB_ADAPTIVE_THRESH | cv::CALIB_CB_NORMALIZE_IMAGE
    bool found = cv::findChessboardCorners(grayImage, boardSize, corners,
                                           cv::CALIB_CB_FAST_CHECK);

    // If the corners were found, process them further
    if (found) {
      std::cout << "Chessboard corners found!" << std::endl;

      // Increase the accuracy of corner points
      cv::cornerSubPix(
          grayImage, corners, cv::Size(11, 11), cv::Size(-1, -1),
          cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::MAX_ITER,
                           30, 0.001));

      // Draw the detected corners and save the visualization
      std::string cornerImgPath = imgPaths[ii].parent_path().generic_string() +
                                  "/corner/" + imgPaths[ii].stem().string() +
                                  "_corner" + imgPaths[ii].extension().string();
      cv::drawChessboardCorners(image, boardSize, corners, found);
      cv::imwrite(cornerImgPath.c_str(), image);

      cout << corners.size() << endl;
      imagePoints[ii].resize(corners.size());

      for (size_t ci = 0; ci < corners.size(); ++ci) {
        imagePoints[ii][ci] = corners[ci];
      }

      objectPoints[ii].resize(corners.size());
      for (int hi = 0; hi < boardHeight; ++hi) {
        for (int wi = 0; wi < boardWidth; ++wi) {
          int ci = hi * boardWidth + wi;
          objectPoints[ii][ci].x = cellSize * wi;
          objectPoints[ii][ci].y = cellSize * hi;
          objectPoints[ii][ci].z = 0;
        }
      }
    } else {
      std::cerr << "Chessboard corners not found!" << std::endl;
    }
  }

  // Intrinsic matrix and distortion coefficients
  Mat cameraMatrix = Mat::eye(3, 3, CV_64F); // initialized to the identity matrix
  Mat distCoeffs = Mat::zeros(8, 1, CV_64F); // initialized to zero

  // Extrinsic rotation and translation vectors
  vector<Mat> rvecs, tvecs;

  // Perform calibration
  double reprojectionError =
      calibrateCamera(objectPoints, imagePoints, imageSize, cameraMatrix,
                      distCoeffs, rvecs, tvecs);

  cout << u8"re-projection error:" << reprojectionError << endl;
  cout << u8"internal reference matrix:" << cameraMatrix << endl;
  cout << u8"distortion factor:" << distCoeffs << endl;

  return 0;
}

4.2 Detailed Analysis

4.2.1 Solving process

The steps implemented in the code are quite simple: extract the corner points of the checkerboard images with the function findChessboardCorners, pass them into the calibrateCamera function, and the final result of the solution, the intrinsic matrix, is obtained. The key lies in calibrateCamera; let us look at its prototype:

CV_EXPORTS_W double calibrateCamera( InputArrayOfArrays objectPoints,
                                     InputArrayOfArrays imagePoints, Size imageSize,
                                     InputOutputArray cameraMatrix, InputOutputArray distCoeffs,
                                     OutputArrayOfArrays rvecs, OutputArrayOfArrays tvecs,
                                     int flags = 0, TermCriteria criteria = TermCriteria(
                                        TermCriteria::COUNT + TermCriteria::EPS, 30, DBL_EPSILON) );

Its parameters are detailed below:

  • objectPoints: the sets of coordinates of object points in 3D space, i.e. the \({\begin{bmatrix}X_w & Y_w & Z_w\\\end{bmatrix}}^T\) of equation (1). Since it holds multiple sets of points from multiple images, its type is actually std::vector<std::vector<cv::Point3f>>.
  • imagePoints: the sets of pixel coordinates on the images, corresponding to the \({\begin{bmatrix}u & v\\\end{bmatrix}}^T\) of equation (1); likewise a nested array, of type std::vector<std::vector<cv::Point2f>>.
  • imageSize: the size (width and height) of the input images in pixels.
  • cameraMatrix: the output camera intrinsic matrix, the \(K\) of equation (1), a 3x3 matrix.
  • distCoeffs: the output distortion coefficients, usually a 1x5 or 1x8 vector containing the radial and tangential distortion coefficients.
  • rvecs: the set of output rotation vectors, which can be converted to the \(R\) of equation (1). Each rotation vector corresponds to one image, so the type is std::vector<cv::Mat>.
  • tvecs: the set of output translation vectors, corresponding to the \(t\) of equation (1). Each translation vector corresponds to one image; the type is std::vector<cv::Mat>.
  • Return value: the reprojection error of the calibration, used to measure the accuracy of the result; the smaller the error, the more accurate the calibration (see the sketch just after this list).
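As a sanity check on that return value, the reported error can be reproduced with cv::projectPoints. A minimal sketch, assuming the variables from the listing in section 4.1 are still in scope:

  // Reproduce the RMS reprojection error returned by calibrateCamera:
  // project each image's object points with the solved parameters and
  // compare against the detected corners.
  double sumSq = 0;
  size_t total = 0;
  for (size_t ii = 0; ii < imageNum; ++ii) {
    std::vector<cv::Point2f> projected;
    cv::projectPoints(objectPoints[ii], rvecs[ii], tvecs[ii],
                      cameraMatrix, distCoeffs, projected);
    for (size_t ci = 0; ci < projected.size(); ++ci) {
      cv::Point2f d = projected[ci] - imagePoints[ii][ci];
      sumSq += d.x * d.x + d.y * d.y;
      ++total;
    }
  }
  cout << "RMS reprojection error: " << sqrt(sumSq / total) << endl;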

Having walked through calibrateCamera like this, I believe the reader will easily understand why I presented the imaging model of equation (1) first: the inputs and outputs of the solver all revolve around it. But this raises another question: where do the input object-space points and image points come from?

4.2.2 Extracting points

The answer is simple: the corner points on the checkerboard. The checkerboard consists of black and white squares, so its corners are easy to extract; moreover the board is regular, and as long as every square has the same size, the object coordinates are easy to determine. In principle we could extract all corner points from the image and then discard the non-checkerboard ones to obtain the image points for calibration. However, OpenCV provides an interface that goes one step further: give findChessboardCorners the number of inner corners of the checkerboard, and the image points are detected automatically. The figure below shows the image points I extracted from one photo:

Figure 3: Extracted checkerboard corners used as image points

As the figure above shows, findChessboardCorners extracts the inner corners: for a 12×9 checkerboard, it finds 11×8 points, and the results are sorted from left to right, top to bottom. Why this ordering? Because it makes it easy to work out the object-space points. In this calibration application the camera extrinsics are unimportant, so we can simply take the upper-left corner of the checkerboard as the origin of the world coordinate system: the 1st point is (0,0,0), the 2nd is (0.005,0,0), the 3rd is (0.010,0,0)... the 12th is (0,0.005,0), the 13th is (0.005,0.005,0)... and so on, yielding the world coordinates of all the corner points.

One more reminder about findChessboardCorners: here I configured it with cv::CALIB_CB_FAST_CHECK, a fast algorithm. The flags cv::CALIB_CB_ADAPTIVE_THRESH and cv::CALIB_CB_NORMALIZE_IMAGE preprocess the image and can make checkerboard corner extraction more robust; however, when I actually used them the program appeared to hang, and I do not know whether they are merely very slow or whether it is an OpenCV problem, so I left those two options out.

4.3 Solution results

My final solution results are shown below:

Reprojection error: 0.166339
Intrinsic matrix: [2885.695162446343, 0, 1535.720945173723;
 0, 2885.371543143629, 2053.122840953737;
 0, 0, 1]
Distortion coefficients: [0.181362004467736;
 -3.970106972775221;
 0.0005157812878172198;
 0.0004644406171824815;
 23.559069196518]

The resulting reprojection error is 0.166339, meaning that when each object point is reprojected onto the image, it lands 0.166339 pixels on average from the actually detected corner position. An error of this size is generally considered very small, indicating that the calibration results are fairly accurate.

But I also pondered a question: the error is 0.166339 pixels, so how many metres is that? When I used to work on surveying and mapping software, adjustment results were likewise reported in pixels, and customers would always ask me: how many metres is that? This time I looked into the issue, and I believe that in an application scenario like camera calibration the accuracy indeed cannot be stated directly in physical units, because the algorithm measures itself by the pixel differences of reprojections on the image, which is a different thing from measuring the error of the camera extrinsics in position and orientation.

From the intrinsic matrix, the solved focal lengths are \(f_x=2885.695\) and \(f_y=2885.372\), again in pixel units. So how many millimetres is this focal length in physical units? According to the material I found, the formula for converting the focal length between pixels and millimetres is:

\[\text{focal length (mm)} = \frac{\text{focal length (px)} \times \text{sensor size (mm)}}{\text{image resolution (px)}} \]

In other words, it depends on the camera sensor size, but sensor-size specifications are rather ambiguous. For example, the internet says the sensor in my phone's camera is 1/1.49 inches, which usually denotes the diagonal length of the sensor. From the diagonal plus the aspect ratio (e.g. 4:3 or 16:9) you can work out the sensor's physical dimensions, and from those the focal length in concrete physical units. However, owing to industry practice and historical standards, the nominal diagonal differs from the true physical size, so the arithmetic may not come out right; to be certain, it is best to consult the manufacturer's specifications. That said, the calibrated focal length in pixel units is already sufficient for subsequent usage scenarios; here I am merely getting to the bottom of the matter.
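As a worked example under those caveats, take the nominal 1/1.49-inch diagonal at face value and assume a 4:3 sensor matching the 3072x4096 image; the 3:4:5 ratio then gives the side lengths:

\[ d \approx \frac{25.4\,\text{mm}}{1.49} \approx 17.0\,\text{mm}, \qquad w \approx \frac{3}{5}d \approx 10.2\,\text{mm}, \qquad f \approx \frac{2885.7 \times 10.2\,\text{mm}}{3072} \approx 9.6\,\text{mm} \]

Here \(w\) is the sensor's short side, corresponding to the 3072-pixel direction. Since the nominal diagonal tends to overstate the true sensor size, the actual focal length is likely somewhat smaller than this estimate.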

5. Additional questions

Finally, some questions that were not covered above or that I have not yet figured out:

  1. Regarding the intrinsic matrix in equation (1) of the imaging model, I have not actually figured out why the focal length is split into an x-direction \(f_x\) and a y-direction \(f_y\). Some sources do not write the intrinsic matrix this way, and the collinearity equations listed in photogrammetry textbooks carry only a single focal length value \(f\).
  2. I seem to remember an operation, sometimes called camera rectification, that fixes the camera to a single focal length \(f\), moves the principal point to the image center, and removes the distortion by reprojection, which simplifies subsequent spatial computations. Time constraints leave it for later study, but a small sketch of the undistortion part follows this list.
  3. This article deliberately does not explain the algorithmic principles behind the solution, because they cannot be made clear in a sentence or two. In surveying and mapping the process has a dedicated term, adjustment; elsewhere it goes by state estimation, maximum likelihood estimation, nonlinear optimization, and so on. At the very least one needs the principle of least squares to continue the discussion, so I leave it for subsequent articles.
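On point 2, the undistortion half of that operation is readily available in OpenCV. A minimal sketch, reusing the intrinsic matrix and distortion coefficients solved in section 4.3 (the file names are placeholders):

#include <opencv2/opencv.hpp>

int main() {
  // Placeholder input path; use one of the calibration photos
  cv::Mat image = cv::imread("input.jpg");
  if (image.empty()) return -1;

  // Intrinsics and distortion coefficients from the results in section 4.3
  cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) <<
      2885.695162446343, 0, 1535.720945173723,
      0, 2885.371543143629, 2053.122840953737,
      0, 0, 1);
  cv::Mat distCoeffs = (cv::Mat_<double>(5, 1) <<
      0.181362004467736, -3.970106972775221,
      0.0005157812878172198, 0.0004644406171824815, 23.559069196518);

  // New intrinsic matrix for the corrected image; alpha = 0 crops away
  // the invalid border pixels that the correction leaves behind
  cv::Mat newK = cv::getOptimalNewCameraMatrix(cameraMatrix, distCoeffs,
                                               image.size(), 0);
  cv::Mat undistorted;
  cv::undistort(image, undistorted, cameraMatrix, distCoeffs, newK);
  cv::imwrite("undistorted.jpg", undistorted);
  return 0;
}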


Source code and data address for this article