
k8s calico

Popularity:490 ℃/2024-08-27 20:58:10

What is Calico?

Calico is a pure Layer 3 network scheme based on BGP that assigns a routable IP to each container (pod). In its native BGP mode, communication requires no packet encapsulation or decapsulation. Calico is also feature-rich and can be integrated with Istio.

Calico's IPIP mode is similar to VXLAN in that both are implemented through network tunneling. The difference is that VXLAN wraps the original Ethernet frame in a UDP packet, whereas IPIP wraps the original IP packet directly in another IP packet. It relies on a Linux IP-in-IP tunnel device (tunl0) to add a second IP header around the pod's IP packet, which makes it a kind of overlay. Because IPIP mode adds a layer of encapsulation and decapsulation compared with BGP mode, it costs some performance.

In that case, why not just use BGP mode? Because in BGP mode the nodes exchange container-network routes by advertising them to each other, and each route uses the peer node as the direct next hop, which only works inside a single Layer 2 network. If the worker nodes of a Kubernetes cluster are not in the same subnet, pods on nodes in different subnets cannot communicate properly in BGP mode. The sections below explain the principles of Calico's two network modes, along with their advantages, disadvantages, and application scenarios.

Two modes of Calico:

IPIP mode:

Calico's IPIP mode is for communication between pods whose nodes are in different network segments, i.e., the nodes hosting the pods are not in the same Layer 2 network.

BGP mode:

Calico's BGP mode is for communication between pods whose nodes are on the same network segment. It treats each node as a router and configures routing rules between the nodes. Each node maintains a routing table to the other nodes: Felix programs the routes locally, and BIRD advertises them to the other nodes.

 

 

Design Ideas for Calico Network Modeling:

 

Same-segment forwarding principle (BGP mode):

 

Look at the two physical machines in the diagram. Their physical NICs are in the same Layer 2 network. Since the two machines host different container network segments, we can simply configure both machines as routers and set up their routing tables according to the container segments.

For example, on physical machine A we can configure a route saying that to reach the segment 172.17.9.0/24, the next hop is 192.168.100.101, i.e., physical machine B.

This way, when container A accesses container B and the packet arrives at physical machine A, it matches this routing rule and is sent to the next-hop router, physical machine B. Physical machine B has its own routing rule: traffic for 172.17.9.0/24 simply goes in through the docker0 NIC.

When container B returns the result, a similar configuration applies on physical machine B: to reach the segment 172.17.8.0/24, the next hop is 192.168.100.100, i.e., physical machine A.
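The routing rules on the two physical machines can be sketched as a small longest-prefix-match lookup. This is only an illustration of the idea, not how the Linux kernel implements routing. The addresses come from the example above; the device name eth0 and the docker0 entry on machine A (added by symmetry with machine B) are assumptions.

```python
import ipaddress

# Each route: (destination network, next hop or None if directly attached, device).
# "eth0" is a hypothetical name for the physical NIC.
ROUTES_A = [
    (ipaddress.ip_network("172.17.8.0/24"), None, "docker0"),
    (ipaddress.ip_network("172.17.9.0/24"), "192.168.100.101", "eth0"),
]
ROUTES_B = [
    (ipaddress.ip_network("172.17.9.0/24"), None, "docker0"),
    (ipaddress.ip_network("172.17.8.0/24"), "192.168.100.100", "eth0"),
]


def lookup(routes, dst):
    """Return (next_hop, device) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r[0]]
    if not matches:
        return None
    # Longest prefix wins, as in a real routing table.
    net, next_hop, dev = max(matches, key=lambda r: r[0].prefixlen)
    return next_hop, dev


if __name__ == "__main__":
    # A container on A sends to a (hypothetical) container 172.17.9.5 on B.
    print(lookup(ROUTES_A, "172.17.9.5"))  # forwarded to machine B
    print(lookup(ROUTES_B, "172.17.9.5"))  # delivered locally via docker0
```

On machine A the lookup yields machine B as the next hop; on machine B the same destination is directly attached, so the packet is handed to the local bridge.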

Principle of forwarding on different network segments (IPIP mode):

Cross-segment access issues

The Calico mode described above has another problem: crossing network segments, where cross-segment means that the physical machines themselves are in different segments.

The logic described earlier holds only under the assumption that each physical machine can act as the other's next-hop router. For example, physical machine A tells physical machine B: to reach 172.17.8.0/24, the next hop is me, 192.168.100.100. Likewise, physical machine B tells physical machine A: to reach 172.17.9.0/24, the next hop is me, 192.168.100.101.

This is possible because Physical Machine A and Physical Machine B are on the same network segment and are connected to the same switch. What if Physical Machine A and Physical Machine B are not on the same network segment?

 

For example, suppose physical machine A is 192.168.100.100/24 and physical machine B is 192.168.200.101/24. The two machines cannot reach each other through a Layer 2 switch; a router has to sit between them and forward traffic across the segments.

Originally, physical machine A would tell physical machine B: to reach 172.17.8.0/24, the next hop is me, 192.168.100.100. With a router in between, however, B's next hop is no longer A but the intermediate router, and only the router's next hop is A. So the previous logic no longer holds.

Look at the bottom half of the diagram. A container on physical machine B wants to access a container on physical machine A. The first hop is physical machine B, IP 192.168.200.101; the second hop is the right-hand port of the physical router in the middle, IP 192.168.200.1; the third hop is physical machine A, IP 192.168.100.100.

That is what the topology diagram shows us. The key question is how, inside the system, physical machine A can tell physical machine B how to reach it. Physical machine A simply cannot know what the next hop out of physical machine B is. And this is the simple case with a single router in between; what if there are several? Who tells physical machine B the whole path?

The first approach is to have all the intermediate routers cooperate with Calico. Previously they only exchanged routes for the physical machines' networks; now they would also have to exchange routes for the container segments. In most environments this is not feasible.

The second approach is to build a tunnel between physical machine A and physical machine B. The tunnel has two endpoints that perform encapsulation: the container IP packet travels inside the tunnel as the passenger protocol, while the physical hosts' IPs form the outer header as the carrier protocol. No matter how many hops the outer packet takes through the traditional physical network to reach the target machine, from the tunnel endpoints' point of view the next hop of physical machine A is physical machine B, so the previous logic holds again.
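To make the passenger/carrier idea concrete, here is a minimal sketch that builds an inner IPv4 packet and wraps it in an outer IPv4 header with protocol number 4 (IPv4-in-IPv4). It only manipulates bytes in memory and sends nothing. The host addresses are the ones from the example; the container addresses 172.17.9.2 and 172.17.8.2 and all function names are hypothetical.

```python
import socket
import struct

IPPROTO_ICMP = 1
IPPROTO_IPIP = 4  # IANA protocol number for IPv4 encapsulated in IPv4


def checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_packet(src: str, dst: str, proto: int, payload: bytes, ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header (no options) followed by the payload."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload),  # version/IHL, TOS, total length
        0, 0,                        # identification, flags/fragment offset
        ttl, proto, 0,               # TTL, protocol, checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + payload


def encapsulate(inner: bytes, tunnel_src: str, tunnel_dst: str) -> bytes:
    """Carrier header (host IPs) around the passenger packet (container IPs)."""
    return ipv4_packet(tunnel_src, tunnel_dst, IPPROTO_IPIP, inner)


def decapsulate(outer: bytes) -> bytes:
    """Strip the outer header at the far tunnel endpoint."""
    if outer[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    header_len = (outer[0] & 0x0F) * 4
    return outer[header_len:]


if __name__ == "__main__":
    inner = ipv4_packet("172.17.9.2", "172.17.8.2", IPPROTO_ICMP, b"ping")
    outer = encapsulate(inner, "192.168.200.101", "192.168.100.100")
    print(len(outer) - len(inner), "bytes of tunnel overhead")
    print(decapsulate(outer) == inner)
```

Routers between the two hosts only ever look at the outer header, so they need no knowledge of the container segments.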

Calico architecture diagram:

The role of each component in BGP mode:

Felix: the Calico agent, a process that runs on every host. It is mainly responsible for network interface management and monitoring, routing, ARP management, ACL management and synchronization, and status reporting, ensuring network connectivity between containers on different hosts.

BGP Client (BIRD): listens for the routes that Felix injects on the host and advertises them to the other hosts over BGP, so that the networks can reach each other.

BGP Route Reflector: in a large network, connecting all BGP clients in a full mesh limits scale, because every pair of nodes needs a session, which is on the order of N^2 connections. To solve this, BGP's Route Reflector (RR) approach can be used: every BGP client peers only with specific RR nodes and synchronizes routes through them, which greatly reduces the number of connections.
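The scaling argument can be checked with a little arithmetic. The sketch below counts BGP sessions for a full mesh and for a route-reflector layout, under the assumption that every client peers with every reflector and the reflectors are fully meshed among themselves; real deployments may choose a different topology. The function names are illustrative.

```python
def full_mesh_sessions(nodes: int) -> int:
    """One session per pair of nodes: n * (n - 1) / 2."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


def route_reflector_sessions(nodes: int, reflectors: int) -> int:
    """Clients peer with every reflector; reflectors mesh with each other."""
    if not 0 < reflectors <= nodes:
        raise ValueError("reflectors must be between 1 and nodes")
    clients = nodes - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, full_mesh_sessions(n), route_reflector_sessions(n, 2))
```

With 100 nodes, a full mesh needs 4950 sessions, while two reflectors bring that down to 197 under the stated assumption.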

Advantages of the Calico BGP mode: Calico's BGP mode is a pure Layer 3 implementation, so it avoids the packet encapsulation associated with Layer 2 schemes. There is no intermediate NAT and no overlay, so its forwarding efficiency is probably the highest of any solution, because packets go straight through the native TCP/IP stack. Isolation is also easy to implement for the same reason: the TCP/IP stack comes with a complete firewall rule framework, so more complex isolation logic can be built with iptables rules.

Layer 2 communication relies on broadcast, whose overhead grows rapidly as the number of hosts increases. The Layer 3 routing approach used by Calico suppresses Layer 2 broadcasts entirely and reduces resource overhead.

In addition, Layer 2 networks use VLANs for isolation, which carries an inherent limit of 4096 IDs. VXLAN lifts that limit but introduces tunneling overhead. Calico's BGP mode uses neither VLAN nor VXLAN, resulting in better resource utilization.

 

IPIP mode packet transmission flow:

Test environment:

One master node with IP 172.171.5.95 and one worker node with IP 172.171.5.96:

Create a DaemonSet application; pod1 lands on the master node with IP address 192.168.236.3, and pod2 lands on the worker node with IP address 192.168.190.203:

pod1 ping pod2:

The path of the packet and the network devices involved:

Routing information on pod1:

According to the routing information, ping 192.168.190.203 matches the first route. That route says packets to any network segment are sent to the gateway 169.254.1.1 and leave through the eth0 NIC.

Meaning of the Flags column in the routing table:

U (up): the route is up

H (host): the target is a single host rather than a network

G (gateway): the route goes through a gateway; without this flag the destination is directly connected

D (dynamic): the route was installed dynamically by a routing daemon or an ICMP redirect

M (modified): the route was modified by a routing daemon or an ICMP redirect
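As a small reading aid, the flag letters listed above can be kept in a lookup table and used to decode the Flags column of a routing-table entry. The table covers only the five flags described here; the function name is illustrative.

```python
# Flag letters as described in the list above.
ROUTE_FLAGS = {
    "U": "route is up",
    "H": "target is a host",
    "G": "route uses a gateway",
    "D": "dynamically installed by a daemon or redirect",
    "M": "modified by a daemon or redirect",
}


def describe_flags(flags: str) -> list[str]:
    """Translate a Flags string such as 'UG' into readable descriptions."""
    return [ROUTE_FLAGS.get(ch, f"unknown flag {ch!r}") for ch in flags]


if __name__ == "__main__":
    for flags in ("U", "UG", "UH"):
        print(flags, "->", "; ".join(describe_flags(flags)))
```

A default route through a gateway typically shows up as UG, while a route to a single directly connected address shows up as UH.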

 

Routing information on the master node:

When the ping packet reaches the master node, it matches the route on tunl0. This route says that packets to the segment 192.168.190.192/26 go to gateway 172.171.5.96, because pod1 is on 5.95 and pod2 is on 5.96. So the packet is sent through the tunl0 device to the worker node.
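Why pod2's address falls under that route can be verified with Python's standard ipaddress module. The sketch below computes the range of the /26 block and checks which of the two pod IPs from the test environment it contains; the helper name block_range is illustrative.

```python
import ipaddress


def block_range(cidr: str):
    """Return (first address, last address, size) of an address block."""
    net = ipaddress.ip_network(cidr)
    return str(net[0]), str(net[-1]), net.num_addresses


def in_block(ip: str, cidr: str) -> bool:
    """True if the IP belongs to the block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    print(block_range("192.168.190.192/26"))
    # -> ('192.168.190.192', '192.168.190.255', 64)
    print(in_block("192.168.190.203", "192.168.190.192/26"))  # pod2 -> True
    print(in_block("192.168.236.3", "192.168.190.192/26"))    # pod1 -> False
```

A /26 block holds 64 addresses, so every pod IP from 192.168.190.192 to 192.168.190.255 is reached through the same gateway, 172.171.5.96.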

 

Routing information on the node node:

When the worker node's NIC receives the packet, it finds that the destination IP is 192.168.190.203 and matches the route marked in red. This route says that 192.168.190.203 is a directly connected local device, and packets for it are sent to caliadce112d250.

 

So what is that device? If you have already guessed, your networking fundamentals are solid. It is one end of a veth pair. When pod2 is created, Calico creates a veth pair for it: one end is pod2's NIC, and the other end is the caliadce112d250 device we see here. We can verify this as follows: install ethtool in pod2, then run ethtool -S eth0 to see the interface index of the other end of the veth pair.

The interface index at the other end of pod2's NIC is 18. Looking up the network device with index 18 on the worker node shows that it is caliadce112d250.
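The same check can be sketched without ethtool by reading sysfs, assuming a Linux system where, for a veth device, /sys/class/net/NAME/iflink holds the interface index of its peer. The helper names are hypothetical, and the sysfs_root parameter exists only so the functions can be pointed at a test directory.

```python
from pathlib import Path

DEFAULT_SYSFS = "/sys/class/net"


def _read_int(path: Path) -> int:
    return int(path.read_text().strip())


def peer_ifindex(ifname: str, sysfs_root: str = DEFAULT_SYSFS) -> int:
    """Run inside the pod: index of the interface at the other end of the veth."""
    return _read_int(Path(sysfs_root) / ifname / "iflink")


def find_interface_by_index(index: int, sysfs_root: str = DEFAULT_SYSFS):
    """Run on the node: name of the interface with the given index, or None."""
    for entry in sorted(Path(sysfs_root).iterdir()):
        idx_file = entry / "ifindex"
        if idx_file.is_file() and _read_int(idx_file) == index:
            return entry.name
    return None


if __name__ == "__main__":
    # On a real host this lists each interface with its own and its link index.
    root = Path(DEFAULT_SYSFS)
    if root.is_dir():
        for entry in sorted(root.iterdir()):
            if (entry / "ifindex").is_file() and (entry / "iflink").is_file():
                print(entry.name, _read_int(entry / "ifindex"), _read_int(entry / "iflink"))
```

In the setup described above, peer_ifindex("eth0") inside pod2 would be expected to return 18, and find_interface_by_index(18) on the worker node would be expected to return caliadce112d250.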

So the route on the worker node that sends packets to caliadce112d250 is really sending them to pod2's NIC. At this point the ping packet has reached its destination.

Looking at the routing information in pod2, you will find it is the same as in pod1.

As the name suggests, an IPIP network is an IP network encapsulated in an IP network. Its defining feature is that pod traffic between nodes is sent through the tunnel device tunl0, and tunl0 adds an extra network-layer (IP) header to each packet.

Capture packets on the master NIC to analyze the process:

Open packet 285 (ICMP), the packet of pod1 pinging pod2. You can see that it has five layers in total, two of which are IP (network layer) headers: one for the network between the pods and one for the encapsulation between the hosts:

In encapsulation order, the ICMP packet of pod1 pinging pod2 is wrapped in one extra layer of host-to-host packet:

This is necessary because tunl0 is a tunnel endpoint device: when data reaches it, a layer of encapsulation is added so that the data can be delivered to the tunnel device at the other end.

 

The specifics of two-layer IP encapsulation:

BGP mode packet transmission process:

Test environment:

When installing the Calico network, the default is an IPIP network. In the Calico manifest file, change the value of CALICO_IPV4POOL_IPIP to "off" to switch to a BGP network.

The biggest difference between a BGP network and an IPIP network is the absence of the tunnel device tunl0. In an IPIP network, traffic between pods is sent to tunl0, which forwards it to the tunnel device at the other end. In a BGP network, traffic between pods is sent directly from the NIC to the destination, skipping the tunl0 step.

Routing information on the master node. From the routing information, there is no tunl0 device.

Again, create a DaemonSet with pod1 on the master node and pod2 on the worker node.

The packet specific process is as follows:

pod1 pings pod2.

Based on the routing information in pod1, ping packets are sent to the master node through the eth0 NIC.

Routing information on the master node. The packet matches the 192.168.190.192 route, which says that packets to the segment 192.168.190.192/26 are sent to the gateway 172.171.5.96, and 5.96 is the worker node. So the packet is sent directly to 5.96.

Routing information on the worker node. Based on the matched route in 192.168.190.192/26, the data is sent to the cali6fcd7d1702e device, which, as analyzed above, is one end of pod2's veth pair. The data then goes straight to pod2's NIC.

After pod2 replies to the ping, the reply arrives on the worker node and matches the route to 192.168.236.0, which says that data for the segment 192.168.236.0/26 is sent to the gateway 172.171.5.95. The packet is then sent directly out of the NIC ens160 to the master node.

Capture packets on the master node, filter for ICMP, and find the packets of pod1 pinging pod2.

You can see that in the BGP network no IPIP encapsulation is used; the packets carry a single, ordinary IP header.

The MAC addresses are worth noting. 192.168.236.0 is the IP of pod1 and 192.168.190.198 is the IP of pod2, while the source MAC is that of the master node's NIC and the destination MAC is that of the worker node's NIC. This shows that after the master node receives the packet and makes its routing decision, it rebuilds the frame: it resolves the worker node's MAC via ARP and uses it at the data link layer.
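The MAC rewriting observed in the capture is ordinary Layer 3 forwarding, and can be sketched as follows: the router keeps the IP addresses, decrements the TTL, and rebuilds the Ethernet header using the MAC it resolved for the next hop. All MAC addresses and the ARP table below are hypothetical; the IP addresses are the ones from the capture.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    ttl: int


# Hypothetical values standing in for the master node's NIC and ARP cache.
MASTER_NIC_MAC = "00:50:56:00:00:95"
ARP_TABLE = {"172.171.5.96": "00:50:56:00:00:96"}  # next-hop IP -> MAC


def route_forward(frame: Frame, next_hop_ip: str, out_mac: str, arp_table: dict) -> Frame:
    """Forward at Layer 3: IPs stay the same, the Ethernet header is rebuilt."""
    if frame.ttl <= 1:
        raise ValueError("TTL expired")
    return replace(
        frame,
        src_mac=out_mac,                  # MAC of the outgoing NIC
        dst_mac=arp_table[next_hop_ip],   # MAC of the next hop, learned via ARP
        ttl=frame.ttl - 1,
    )


if __name__ == "__main__":
    # Frame as it arrives from pod1 (pod-side MACs are made up).
    from_pod = Frame("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02",
                     "192.168.236.0", "192.168.190.198", 64)
    print(route_forward(from_pod, "172.171.5.96", MASTER_NIC_MAC, ARP_TABLE))
```

The pod IPs pass through untouched, which is why the capture shows pod addresses at the IP layer but node MAC addresses at the data link layer.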

Comparison of the two models:

There are two main differences:

BGP mode applies to communication between pods on nodes within the same network segment, while IPIP mode applies to communication between pods on nodes in different network segments.

BGP mode turns the node hosting the containers into a router (vRouter) that provides routing, distributes routing rules over BGP, and forwards packets to their destination according to those rules. This process involves none of the encapsulation and decapsulation of IPIP tunnels, only pure route forwarding, so performance is considerably better.