
An Introduction to Computer Graphics (I): Fundamentals and Spatial Transformations


A few words up front

Some time ago the author started a new series, "Wgpu Graphics in Detail", and while writing it I realized that using wgpu only covers the application level: to introduce wgpu well, some theoretical knowledge of graphics has to be explained first. Putting that material inside the "Wgpu Graphics in Detail" series would make it noisy, so I decided to start a separate series for graphics content. Also, this series is not written in any particular order; I write about whichever topics come to mind, so treat it as a set of knowledge-popularizing notes.

The author is not a graphics professional; this article only organizes my current understanding of the material. If there are problems with the content, readers are welcome to point them out.

The content of this article is mainly conceptual, so we will not build a complete program; at most a few small, illustrative code sketches appear where they help clarify a concept.

Foundations of the three-dimensional world

We all know that two points determine a line and three points (not on one line) determine a plane. So suppose there are three points in three-dimensional space: (0, 1, 0), (-1, 0, 0), and (1, 0, 0). With a little thought, we know they can form a triangle with no thickness. Of course, we can use more points to form more faces and create a more three-dimensional object:

010-things

For a three-dimensional object, we generally "perceive" it in two stages.

The first stage is to "create" the object. We first define some "vertices" somewhere in space; then, by connecting these "vertices" pairwise with straight lines, we get the frame of the object; finally, we "paste" a layer of "paper" onto each face of the frame, and we get an opaque three-dimensional object.

020-cube

The second stage is to "observe" the object. In the first stage we merely created a three-dimensional object that exists somewhere in space. In order to perceive it, we either look at it with our eyes from some angle, or we use a camera to capture it and present it as a photograph. Either way, we are "projecting" a three-dimensional object in space onto a "surface", i.e., we are reducing a three-dimensional object down to a two-dimensional one.

When projecting a three-dimensional object onto a two-dimensional plane, there are generally two projection methods: orthographic projection and perspective projection. The biggest difference between them is that perspective projection takes into account the imaging rule that nearby objects appear larger and distant objects appear smaller. For example, in the cube scene above, when we stand on the z-axis and look toward the cube, orthographic projection and perspective projection produce different results:

030-projection

As you can see from the figure above, perspective projection is clearly the one that best matches our everyday perception. With this basic understanding of the three-dimensional world, let's move on to some core concepts of computer graphics.

Elements of Graphics

In the previous section, we briefly described how a three-dimensional object is constructed and observed in the real three-dimensional world. Computer graphics does not depart completely from this process: it also has points, lines, and surfaces, and ultimately a "projection" that is rendered on the screen. In this section, we'll take a closer look at some important concepts in computer graphics and the central roles they play.

Vertex

What is a vertex? On first encountering the word "vertex", the reader may think that vertices are simply the "points" we defined in the first step of constructing the three-dimensional object above. That is true, but not entirely: the points on a geometric object are a specialization of vertices, representing nothing more than positions in three-dimensional space.

040-only-points

A vertex in computer graphics, on the other hand, is a bundle of data that contains much more, including but not limited to the point's position coordinates, color information, and so on (to avoid confusing the reader, we'll only mention these two more intuitive attributes for now). In other words, a vertex in geometry is almost equivalent to a position coordinate, while a vertex in computer graphics can contain other information such as color in addition to the position coordinate.

050-vertex-more-info

The reader should remember that from now on, whenever we talk about a "vertex", we mean the whole vertex datum with position, color, and other additional information; if we only mean the position, we will use the full name "vertex position".
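To make this concrete, here is a minimal sketch of what such a vertex could look like in Rust, limited to the two attributes discussed so far. The layout mirrors what one would eventually feed to a GPU pipeline (e.g. via wgpu), but the struct itself is purely illustrative and not taken from any particular API.

```rust
// A vertex as a bundle of data: a geometric position plus extra attributes.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct Vertex {
    // The geometric part: a position in three-dimensional space.
    position: [f32; 3],
    // The extra data that makes it more than a geometric point: an rgba color.
    color: [f32; 4],
}

fn main() {
    // The triangle from the beginning of the article, now with per-vertex colors.
    let _triangle = [
        Vertex { position: [0.0, 1.0, 0.0], color: [1.0, 0.0, 0.0, 1.0] },
        Vertex { position: [-1.0, 0.0, 0.0], color: [0.0, 1.0, 0.0, 1.0] },
        Vertex { position: [1.0, 0.0, 0.0], color: [0.0, 0.0, 1.0, 1.0] },
    ];
}
```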

What is the significance of a vertex containing color data? When the author first came into contact with computer graphics, it was easy to accept that a vertex contains position coordinates, but I was puzzled about why it also carries color information (or even normals, etc.). Only later, as my understanding grew, did the reason gradually become clear. For now I will keep you in suspense; once primitives and fragments have been introduced, we will come back to this and it will be easy to understand.

Primitive

In computer graphics, Primitives are the basic elements that make up an image, and they are the most fundamental geometric shapes in the graphics rendering process. Primitives can be points, lines, polygons (e.g., triangles or quadrilaterals), and so on.

Primitive assembly

To obtain a primitive, we take the "vertices" introduced in the previous section as input and perform a series of operations on them; this process is called primitive assembly. Let's use a more visual example to understand it.

Suppose we now have three points: (0, 1), (-1, 0), and (1, 0). What shapes can these three points form? It depends on the assembly mode:

  1. Points: each vertex is rendered as a separate point.
  2. Lines: every two consecutive vertices form a line segment.
  3. Triangles: every three vertices form a triangle.

The following figure shows the shapes obtained by assembling the three points above in each of these three ways:

060-point-line-face

Primitive assembly offers more than just the three modes mentioned above; other assembly forms exist, depending on the vertex data.

So the reader should now be able to see that the core logic of primitive assembly is to take n vertices and, through some assembly mode, obtain the final shapes (different assembly modes usually come with different algorithmic logic).
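Here is a minimal conceptual sketch of that grouping logic in Rust. The topology names mirror those commonly found in graphics APIs such as wgpu (point list, line list, triangle list), but the grouping function below is only an illustration of the idea, not how a GPU actually performs assembly.

```rust
#[derive(Clone, Copy, Debug)]
struct Vertex2D {
    position: [f32; 2],
}

#[derive(Debug)]
enum Primitive {
    Point(Vertex2D),
    Line(Vertex2D, Vertex2D),
    Triangle(Vertex2D, Vertex2D, Vertex2D),
}

enum Topology {
    PointList,
    LineList,
    TriangleList,
}

// Group the same vertex list into different primitives depending on the topology.
fn assemble(vertices: &[Vertex2D], topology: Topology) -> Vec<Primitive> {
    match topology {
        // Each vertex becomes one point primitive.
        Topology::PointList => vertices.iter().map(|&v| Primitive::Point(v)).collect(),
        // Every two consecutive vertices become one line segment.
        Topology::LineList => vertices
            .chunks_exact(2)
            .map(|c| Primitive::Line(c[0], c[1]))
            .collect(),
        // Every three consecutive vertices become one triangle.
        Topology::TriangleList => vertices
            .chunks_exact(3)
            .map(|c| Primitive::Triangle(c[0], c[1], c[2]))
            .collect(),
    }
}

fn main() {
    let verts = [
        Vertex2D { position: [0.0, 1.0] },
        Vertex2D { position: [-1.0, 0.0] },
        Vertex2D { position: [1.0, 0.0] },
    ];
    // With TriangleList, the three points above assemble into one triangle.
    println!("{:?}", assemble(&verts, Topology::TriangleList));
}
```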

Of course, primitive generation also involves a number of other operations, but to help the reader understand more easily, we set them aside for now and explain them together later.

Fragment

Before introducing fragments, we need to mention an operational concept: rasterization. Rasterization is the process of converting geometric data, after a series of transformations, into pixels to be presented on a display device. A typical display device consists of physical pixels arranged with certain width and height values to form a complete screen. In other words, the pixels on the screen are not "continuous". Our geometric shapes, however, are "continuous": for a line, especially one that is neither horizontal nor vertical, every point on it has to be approximated to obtain the coordinates of a physical pixel on the screen.

Let's take rendering a triangle as an example. Suppose we have the following triangle, whose hypotenuse is perfectly ordinary from a geometric point of view:

070-a-triangle

However, the pixels of our physical device are integer-valued and finite. Assuming a screen with a resolution of 20x20, to render the slanted edge we may need to find the corresponding pixels and fill in their colors roughly as follows:

080-rasterization-triangle

Notice that the author has marked one point on the geometric triangle, with geometric coordinates (0.5, 0.5); on a 20x20 screen, its corresponding screen coordinates are (10, 10).

The logic of rasterization is the process of finding, for each "point" on a geometric figure, the corresponding pixel on the screen device. How rasterization is implemented is beyond the scope of this article; interested readers can consult the relevant material for further study.
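As a minimal sketch of just the coordinate mapping behind this idea, the following assumes the triangle is described in a normalized 0.0..1.0 space, so that (0.5, 0.5) lands on pixel (10, 10) of a 20x20 screen, as in the figure. Real rasterizers work per primitive (e.g. with edge functions for triangles); this only shows the point-to-pixel mapping.

```rust
// Map a continuous coordinate in 0.0..1.0 to an integer pixel on a width x height screen.
fn to_screen(x: f32, y: f32, width: u32, height: u32) -> (u32, u32) {
    // Scale to the pixel grid and round to the nearest pixel, clamped to the screen.
    let px = (x * width as f32).round() as u32;
    let py = (y * height as f32).round() as u32;
    (px.min(width - 1), py.min(height - 1))
}

fn main() {
    // The marked point from the figure: (0.5, 0.5) maps to pixel (10, 10) on a 20x20 screen.
    assert_eq!(to_screen(0.5, 0.5, 20, 20), (10, 10));
}
```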

After a brief look at rasterization, let's get back to the heart of this section: the fragment, which is actually one or more pixel-sized samples of the primitive after it has been rasterized. Two things are worth noting here:

  1. Although it is called a fragment (singular), rasterizing a primitive usually produces many of them. That is, a primitive is a complete geometric figure, and rasterization breaks it down into many fragments.
  2. A fragment obtained from rasterization only approximates a pixel; it is not exactly the same as one. A fragment is the collection of data needed to produce a pixel, including color, depth, texture coordinates, and so on (depth and texture coordinates can for now simply be understood as additional data; they will be explained later). This is quite similar to what we said earlier about vertices: a vertex in graphics is not simply a geometric point, but a collection of data containing position, color, and other information.

A fragment is not a pixel; it is only close to one. So there is usually a further processing step that eventually converts the fragment into a pixel on the screen (which is essentially a dot with an rgba color).
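A minimal sketch of the data a fragment might bundle, under the assumption (from the text above) that it carries color, depth, and texture coordinates for one pixel-sized sample; the field names are illustrative and not from any particular API.

```rust
// One pixel-sized sample produced by rasterization, waiting to be processed into a pixel.
#[derive(Debug)]
struct Fragment {
    // Which pixel on the screen this sample maps to.
    pixel: (u32, u32),
    // Interpolated color for this sample (rgba).
    color: [f32; 4],
    // Depth value, later used for visibility tests.
    depth: f32,
    // Texture coordinates, later used to sample textures.
    tex_coords: [f32; 2],
}

fn main() {
    let frag = Fragment {
        pixel: (10, 10),
        color: [1.0, 0.0, 0.0, 1.0],
        depth: 0.5,
        tex_coords: [0.5, 0.5],
    };
    println!("{:?}", frag);
}
```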

At this point, we have briefly covered the three elements of graphics: vertices, primitives, and fragments. The natural next topic would be the rendering pipeline. However, after some thought, I feel that throwing those concepts out directly is not intuitive enough and would easily scare beginners away. So before introducing the rendering pipeline, I decided to first introduce spatial transformations in graphics.

Spatial transformations in graphics

Model space and world space

Suppose we have made a cube with side length 2, as shown below:

090-simple-cube-model

At this point, we have created this cube in its own coordinate system, so its points are positioned as shown above (e.g., the three points identified in the picture: (1, 0, 1), (1, 1, -1), and (-1, 1, 1), etc.).

We then place the cube into a World scene, which already has a sphere in it:

100-a-ball-in-world

To keep them from overlapping, we first reduce the cube's side length from the original 2 units to 1 unit, and then place it as follows:

110-cube-and-ball

Note that in this world shared with the sphere, the coordinates of the three points A, B, and C of our cube are no longer their original coordinates; they are now their coordinates in this "world" (A(3, 0.5, -0.5), B(3, 0, 0.5), C(2, 1, 0.5)). This process is the transformation from "model space" coordinates to "world space" coordinates.

"Model space" means that each 3D object lives in its own coordinate system (also called "local space"); when we define the 3D data of the object, it does not depend on anything external, it is purely the space of the current object.

However, after creating an object, we generally need to place it somewhere together with other objects to form a scene and give it more meaning; this place is "world space". In this process, we usually translate, rotate, and scale the individual model object so that it fits harmoniously into the "world space".
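A minimal sketch of this model-to-world transform, using the glam crate (commonly paired with wgpu) as an assumed math library; the scale and translation values are illustrative, loosely matching the cube example above rather than reproducing its figures exactly.

```rust
use glam::{Mat4, Vec3};

fn main() {
    // Shrink the 2x2x2 cube by half, then move it away from the sphere.
    let scale = Mat4::from_scale(Vec3::splat(0.5));
    let translate = Mat4::from_translation(Vec3::new(2.5, 0.0, 0.0));
    // The model matrix applies the scale first, then the translation.
    let model = translate * scale;

    // A vertex position defined in model space...
    let p_model = Vec3::new(1.0, 0.0, 1.0);
    // ...becomes a position in world space.
    let p_world = model.transform_point3(p_model);
    println!("{:?}", p_world); // (3.0, 0.0, 0.5)
}
```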

Observation space

Once we have the coordinates in world space, we further transform them into "observation space". Before introducing observation space, we need to introduce a new character: the camera. Now that the world is ready (e.g., the scene above with a cube and a sphere), we still need to "observe" it, otherwise it is meaningless. For that we need something like a camera. A camera involves three elements: 1) the camera's position; 2) the direction in which the camera looks at its target; 3) the camera's up direction. The position and the look-at direction are easy to understand; for the up direction, the following example should make it clear as well:

120-camera-direction

In the figure above, we first place a camera somewhere in space and let it "look" at a tree; we use the blue vector to represent this direction. Using the direction of the blue vector and the camera's position, we can determine the unique plane through the camera that is perpendicular to the blue vector. In this plane, we can find infinitely many pairs of mutually perpendicular vectors (e.g., the red and green vectors of the left and right cameras in the figure; rotating around the blue vector's direction yields many more such red and green pairs). Here we define the red vector as the camera's right vector, and the green vector perpendicular to the plane of the red and blue vectors as the up vector. A difference in the camera's up direction results in a different orientation of the object in the captured image.

Note that the red, green, and blue vectors here do not quite match the conventions used in some tutorials (e.g., [Camera - LearnOpenGL CN](/01 Getting started/09 Camera/)). This article only introduces the concepts in an easy-to-understand way; it does not fully cover the computational details.

Once we have a defined camera, we do the following: displace the camera and the world together, moving the camera to the origin so that its viewing direction coincides with the Z-axis and its up direction coincides with the Y-axis, while keeping the objects in the world unchanged relative to the camera:

130-camera-transform

140-camera-transform-animation

After the move, the original objects have new coordinates in the coordinate space where the camera sits at the origin (e.g., the topmost point of our sphere was originally at (0, 2, 0); after transforming world space into observation space it becomes (0, 2, -2)). This process is the transformation from "world space" to "observation space", and the coordinate space with the camera at the origin is the "observation space".

Why do we need an observation space? Because placing the camera at the origin makes the subsequent projection step much easier to handle.
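A minimal sketch of this world-to-observation-space (view) transform, again using glam as an assumed math library. The camera position (0, 0, 2) below is an assumption chosen so that the sphere's top point (0, 2, 0) maps to (0, 2, -2), matching the example in the text.

```rust
use glam::{Mat4, Vec3};

fn main() {
    let eye = Vec3::new(0.0, 0.0, 2.0); // camera position in world space (assumed)
    let target = Vec3::ZERO;            // the point the camera looks at
    let up = Vec3::Y;                   // the camera's up direction

    // Right-handed look-at matrix: moves the camera to the origin, looking down -Z.
    let view = Mat4::look_at_rh(eye, target, up);

    let sphere_top_world = Vec3::new(0.0, 2.0, 0.0);
    let sphere_top_view = view.transform_point3(sphere_top_world);
    println!("{:?}", sphere_top_view); // approximately (0.0, 2.0, -2.0)
}
```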

After a series of operations, we have transformed a 2x2x2 cube, from model space to observation space:

150-point-transform-flow

Note, on the left of the figure, the result of applying the series of transformations to point A of the cube above.

Based on the observation space (at this point the camera is at the origin of the observation space), we will start the projection process. As introduced at the beginning, projections are generally speaking divided into two types: 1) orthographic projections and 2) perspective projections.

For the sphere and cube we created above, the camera is placed at the origin, and the results of using orthographic projection and perspective projection respectively are roughly as follows:

160-projection
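For reference, this is roughly how the two projection matrices are typically built, sketched with glam; the near/far planes, field of view, and volume extents below are illustrative assumptions rather than values taken from the scene in the figure.

```rust
use glam::Mat4;

fn main() {
    // Orthographic: a box-shaped viewing volume; no "near large, far small" effect.
    let _ortho = Mat4::orthographic_rh(-4.0, 4.0, -4.0, 4.0, 0.1, 100.0);

    // Perspective: a frustum-shaped volume; distant objects appear smaller.
    let _persp = Mat4::perspective_rh(45f32.to_radians(), 16.0 / 9.0, 0.1, 100.0);
}
```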

It is easy to see that, after projection, a three-dimensional object is transformed into a two-dimensional image: every point in the original three-dimensional space is "projected" onto a two-dimensional space. If we treat this two-dimensional space as our monitor screen, then clearly we need to take the three-dimensional coordinates of a point in "observation space" and, through some calculation, obtain a specific pixel location on the screen.

170-projection-to-screen

In the figure above, the point A, originally in observation space, is transformed through a calculation involving certain context (camera distance, FOV field of view, etc.) to obtain A', and A' is the coordinate of a pixel on the screen, whose x and y are integers.
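A minimal sketch of this last step: a point in observation space goes through a perspective projection, the perspective divide, and a viewport mapping to produce integer pixel coordinates. The screen size, camera parameters, and the point used below are assumptions for illustration; it is not the exact point A from the figure.

```rust
use glam::{Mat4, Vec4};

fn main() {
    let (width, height) = (800.0_f32, 600.0_f32);
    let proj = Mat4::perspective_rh(45f32.to_radians(), width / height, 0.1, 100.0);

    // A point in observation space (camera at origin, looking down -Z).
    let a_view = Vec4::new(0.5, 1.0, -5.0, 1.0);

    // Projection gives clip-space coordinates; dividing by w gives normalized
    // device coordinates (NDC) in roughly -1.0..1.0.
    let clip = proj * a_view;
    let ndc_x = clip.x / clip.w;
    let ndc_y = clip.y / clip.w;

    // Viewport transform: NDC to pixel coordinates (y is flipped because screen
    // coordinates usually grow downward).
    let px = ((ndc_x + 1.0) * 0.5 * width).round() as u32;
    let py = ((1.0 - ndc_y) * 0.5 * height).round() as u32;
    println!("A' lands on pixel ({px}, {py})");
}
```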

A few words at the end

This article has given a general introduction to some basic elements of graphics and to spatial transformations. In later articles, further parts of computer graphics, such as the rendering pipeline, will be introduced step by step.