I. Introduction
This article explores a classic question: "What does the computer do after you enter a URL in the browser?" An interviewer asked me this question a few years ago. I was poorly prepared and my answer was a mess, so today I am taking it out again to organize it and relearn it. Let's make progress together!
The question itself is not difficult. What makes it clever is that it strings together the network programming knowledge we have learned, so with a single question the interviewer can gauge how well we have mastered this area. So today we will use it as the starting point for our formal study of network programming!
II. What happens under the hood when we visit a web page
As shown in the figure above, when we type "" into the browser's address bar and press Enter, the browser jumps to the Baidu home page. What does the computer do during this process?
1. We enter the URL in the browser.
2. The browser obtains the IP address for the domain name through DNS (the Domain Name System).
3. Using the IP address and port it obtained, the browser sends a TCP connection request to the target server.
4. After the three-way handshake, the TCP connection is established, and the browser sends an HTTP request message over it.
5. The server processes the HTTP request and returns a response message to the browser.
6. The browser receives the HTTP response message, parses the HTML in the response body, and renders the structure and style of the page. At the same time, it issues further HTTP requests for the other resources referenced in the HTML (images, CSS, JS, and so on) until the page is fully loaded and displayed.
7. When the browser no longer needs to communicate with the server, the TCP connection is closed with the four-way wave (the four-step connection teardown).
III. A closer look at each step
In Part II we split the browser's work into seven small steps. Next, we will explain the technologies behind them. Since a single article should not run too long, each topic is only covered briefly here, with the goal of understanding the flow from end to end. Follow-up articles can then analyze each topic, such as TCP/UDP, HTTP, and DNS, individually.
3.1 URL
What we type into the browser to reach the content we want is called a URL (Uniform Resource Locator). It identifies a unique resource on the network and gives a path for locating it. Closely related is the URI (Uniform Resource Identifier), which uniquely identifies a resource.
A URL is a specific kind of URI: it not only uniquely identifies the resource but also provides its location. If a URI is like our ID number, then a URL is like our home address.
The structure of the URL:
Protocol (http://): Hypertext Transfer Protocol. This prefix of the URL names an application-layer protocol, usually HTTP or HTTPS. File transfer URLs are prefixed with ftp, and so on.
Domain name: the host to visit. It can also be written as an IP address. DNS maps the domain name to an IP address, and the domain name is simply easier to remember.
Port (80): if a port is specified, it comes immediately after the domain name, separated by a colon. If the port is omitted, the default port of the protocol is used (80 for HTTP, 443 for HTTPS), so many URLs carry no port number.
Resource path (/path/to/): the part from the first / after the domain name (and port) up to the ?. It is the address of the specific resource to access, resolved starting from the root directory on the server. In the figure above, the file being accessed is under /path/to/ in the server's root directory.
Parameters (key1=value&key2=value2): when HTTP sends a GET request, the parameters are included in the URL. They are separated from the path by ?, written as key=value, and joined with & when there is more than one. Some requests, such as POST, put their parameters in the body instead.
Anchor (#SomewhereInTheDocument): as the name implies, an anchor marks a position within the page being visited. Most pages are longer than one screen, so if an anchor is specified, the client displays the page scrolled to that position, which works like a small bookmark. Note that the anchor begins with # in the URL and is not sent to the server as part of the request.
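The parts listed above can be pulled out of a URL with Python's standard `urllib.parse` module. This is only a small sketch: the host `www.example.com` and the file name `index.html` are placeholders chosen for the example, not values from the original figure.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: "www.example.com" and "index.html" are placeholders.
url = ("http://www.example.com:80/path/to/index.html"
       "?key1=value&key2=value2#SomewhereInTheDocument")

parts = urlsplit(url)
query = parse_qs(parts.query)  # each key maps to a list of values

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 80
print(parts.path)      # /path/to/index.html
print(query)           # {'key1': ['value'], 'key2': ['value2']}
print(parts.fragment)  # SomewhereInTheDocument
```

The fragment is available to the client after parsing, which matches the point that the anchor is handled in the browser rather than being sent to the server.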
3.2 DNS
Above we mentioned the mapping between domain names and IP addresses. The system responsible for this mapping is DNS (Domain Name System). The resolution process is as follows:
Local cache lookup: when we enter a domain name in the browser, the browser first checks its own cache for an existing resolution of that domain name. If one exists, the corresponding IP address is returned. Otherwise we go to the next step.
Local DNS server lookup: if the local cache has no entry, the browser sends a recursive query to the locally configured DNS server. If the local DNS server has no answer either, we continue to the next step.
Root DNS server lookup: if the local DNS server does not have the resolution result, it sends an iterative query to a root DNS server. The root DNS servers manage the IP addresses of the top-level domain name servers, so the root server returns the IP address of the server for the relevant top-level domain (e.g. .com).
Top-level domain lookup: the local DNS server sends a query to the top-level domain name server, which returns the IP address of the next level of domain name server. This process goes down one level at a time until it reaches the authoritative domain name server responsible for the domain.
Authoritative domain name lookup: the local DNS server sends a query to the authoritative domain name server and obtains the IP address for the domain name. The local DNS server caches the result and returns it to the browser.
Return and cache: when the browser receives the IP address from the local DNS server, it stores it in its local cache and starts the network request to that IP address.
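The lookup chain described above can be imitated with a few in-memory tables. This is a toy simulation rather than a real resolver: every server name, the tables `ROOT`, `TLD`, and `AUTH`, and the address `192.0.2.10` are made-up illustration data, and no network traffic is involved.

```python
# All names and addresses below are made-up illustration data.
ROOT = {"com": "tld-server-com"}                          # root: TLD -> TLD server
TLD = {"tld-server-com": {"example.com": "ns-example"}}   # TLD server: domain -> authoritative server
AUTH = {"ns-example": {"www.example.com": "192.0.2.10"}}  # authoritative server: host -> IP

browser_cache = {}    # stands in for the browser's own cache
local_dns_cache = {}  # stands in for the local DNS server's cache


def local_dns_lookup(hostname):
    """Local DNS server: answer from cache, else ask root -> TLD -> authoritative."""
    if hostname in local_dns_cache:
        return local_dns_cache[hostname]
    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]                         # ask the root server
    auth_server = TLD[tld_server][".".join(labels[-2:])]  # ask the TLD server
    ip = AUTH[auth_server][hostname]                      # ask the authoritative server
    local_dns_cache[hostname] = ip                        # cache before returning
    return ip


def browser_resolve(hostname):
    """Browser: check the local cache first, then fall back to the local DNS server."""
    if hostname in browser_cache:
        return browser_cache[hostname]
    ip = local_dns_lookup(hostname)
    browser_cache[hostname] = ip
    return ip


print(browser_resolve("www.example.com"))  # 192.0.2.10
```

In a real Python program this whole chain is hidden behind a single call to the operating system's resolver, for example `socket.getaddrinfo`.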
3.3 TCP
In section 3.2, DNS resolution gave us the IP address of the target host, so the browser can now send a TCP connection request to the target server. TCP is a transport-layer protocol: it establishes a connection before any data is sent, controls the data transmission, guarantees reliability, and supports two-way communication. Protocols such as HTTP and HTTPS are built on top of TCP. (The most classic part of setting up a TCP connection is the three-way handshake!)
Note: TCP and IP are almost always mentioned together as TCP/IP. To send data, TCP hands it down to the IP protocol at the network layer. IP is a packet-switched protocol that does not guarantee reliable delivery. It is only responsible for routing packets from the source host to the target host.
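To make the connection step concrete, here is a minimal sketch that opens a TCP connection on the local machine. It assumes the loopback address 127.0.0.1 is available, and the helper `echo_once` is a hypothetical name invented for this example. The handshake itself is carried out by the operating system inside the connect call, so the code only shows where it happens.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one client and send back whatever it sent."""
    conn, _addr = server_sock.accept()  # returns after the handshake has completed
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)              # TCP is two-way: reply on the same connection


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCK_STREAM means TCP
server.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=echo_once, args=(server,))
worker.start()

# The OS performs the three-way handshake inside this call.
client = socket.create_connection(("127.0.0.1", port), timeout=5)
client.sendall(b"hello")
reply = client.recv(1024)
client.close()                          # begins the connection teardown
worker.join()
server.close()

print(reply)
```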
3.4 HTTP
Once the TCP connection is established, the browser can send HTTP request messages to the target server. Some sites are configured to use HTTPS for better security. We will cover the differences separately later, along with HTTP/1.0, HTTP/1.1, and related topics.
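As a rough illustration of what an HTTP request message looks like on the wire, the sketch below builds a bare-bones HTTP/1.1 GET request by hand. The function `build_get_request` and the host `www.example.com` are hypothetical names used only for this example. A real browser sends many more headers.

```python
def build_get_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # required header in HTTP/1.1
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.example.com", "/path/to/")
print(request.decode("ascii"))
```

These bytes are what would be written to the TCP connection, for example with `sendall` on a connected socket.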
3.5 Server processing and return response
After receiving the HTTP message, the server generates an HTML response based on the requested interface, the parameters, and the cookies, and returns it to the browser. Upon receiving the HTTP response message, the browser parses the HTML in the response body and renders the structure and style of the web page. At the same time, based on the URLs of the other resources referenced in the HTML (such as images, CSS, and JS), the browser issues further HTTP requests to fetch those resources until the web page is fully loaded and displayed.
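The "parse the HTML and find more resources to request" step can be sketched with Python's standard `html.parser`. The response bytes, the class name `ResourceCollector`, and the base URL are all invented for this example. A real browser does far more than this, so treat it only as an outline of the idea.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))


# Made-up response: status line and headers, a blank line, then the HTML body.
raw = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
       b'<html><head><link rel="stylesheet" href="/style.css">'
       b'<script src="app.js"></script></head>'
       b'<body><img src="img/logo.png"></body></html>')

head, _, body = raw.partition(b"\r\n\r\n")
status_line = head.split(b"\r\n")[0].decode("ascii")

collector = ResourceCollector("http://www.example.com/path/to/")
collector.feed(body.decode("utf-8"))

print(status_line)  # HTTP/1.1 200 OK
for resource_url in collector.resources:
    print(resource_url)  # each of these would need its own HTTP request
```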
IV. Summary
The above covers what actually happens at each layer of the four-layer TCP/IP model during a complete network request. It is also the top priority in our study of network programming. As for the details of each layer, we will go through them one by one in later articles!