
Several ways to bridge the container network between different nodes via iproute2

  • host-gw
  • ipip
  • vxlan

Background

Previously, I needed to bridge container networks between different nodes, so I modified flannel to add logic for obtaining network-related information, which allowed full use of its backend functionality.
Recently I've been wanting to remove flannel, so I analyzed the backend functionality and simulated it with the ip command.

flannel provides seven backend types; only the following three, which rely on kernel-provided functionality, are simulated here:

  • host-gw
  • ipip
  • vxlan

Basics

Containers use network namespaces to isolate network-related resources; for operational simplicity, network namespaces are used directly as the test environment.

vm1 (192.168.32.245) and vm2 (192.168.32.246) are two virtual machines running CentOS 7.2 on the same physical machine.

The goal is to connect the 172.245.0.0/24 and 172.246.0.0/24 segments.

┌─────────────────────────────┐          ┌─────────────────────────────┐
│ vm1                         │          │ vm2                         │
│     ┌─────────────────┐     │          │     ┌─────────────────┐     │
│     │ ns245           │     │          │     │ ns246           │     │
│     │   172.245.0.2   │     │          │     │   172.246.0.2   │     │
│     │      eth0       │     │          │     │      eth0       │     │
│     └────────|────────┘     │          │     └────────|────────┘     │
│              │              │          │              │              │
│           veth245           │          │           veth246           │
│              │              │          │              │              │
│            br245            │          │            br246            │
│       172.245.0.1/24        │          │       172.246.0.1/24        │
│                             │          │                             │
│                             │          │                             │
│      192.168.32.245/24      │          │      192.168.32.246/24      │
│            eth0             │          │            eth0             │
└──────────────│──────────────┘          └──────────────│──────────────┘
               │                                        │
               └────────────────────────────────────────┘

Environment Configuration

The ip_forward option needs to be enabled so that traffic can be forwarded across the bridge into the namespace:

sysctl -w net.ipv4.ip_forward=1
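
This setting does not survive a reboot; for completeness, it can also be written to /etc/sysctl.conf (a standard approach, not specific to this setup):

# Persist the setting across reboots (optional)
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
sysctl -p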

Setting up the bridge

# Create a bridge
ip link add br245 type bridge

# Enable the bridge
ip link set br245 up

Creating namespaces, setting up virtual NICs

# Create a network namespace
ip netns add ns245

# Create veth-peer and set one end in netns
ip link add veth245 type veth peer name eth0 netns ns245

# Enable veth-peer in netns
ip netns exec ns245 ip link set eth0 up

# Enable veth-peer in host
ip link set veth245 up

# Mount veth-peer on host to bridge br245
ip link set veth245 master br245

Setting the NIC address and routing

# Set the bridge address
ip addr add 172.245.0.1/24 dev br245

# Set veth-peer address in namespace
ip netns exec ns245 ip addr add 172.245.0.2/24 dev eth0

# Set the default route in the namespace (a new container will also have this default route)
ip netns exec ns245 ip route add default via 172.245.0.1 dev eth0

For vm2, simply replace 245 with 246.
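
For reference, the whole per-node setup can be wrapped in a small script parameterized by the node suffix. This is just a sketch collecting the commands above; SUFFIX is a hypothetical parameter, not part of the original setup:

#!/bin/sh
# Usage: ./setup.sh 245   (run with 246 on vm2)
SUFFIX=$1
ip link add br$SUFFIX type bridge
ip link set br$SUFFIX up
ip netns add ns$SUFFIX
ip link add veth$SUFFIX type veth peer name eth0 netns ns$SUFFIX
ip netns exec ns$SUFFIX ip link set eth0 up
ip link set veth$SUFFIX up
ip link set veth$SUFFIX master br$SUFFIX
ip addr add 172.$SUFFIX.0.1/24 dev br$SUFFIX
ip netns exec ns$SUFFIX ip addr add 172.$SUFFIX.0.2/24 dev eth0
ip netns exec ns$SUFFIX ip route add default via 172.$SUFFIX.0.1 dev eth0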

When the setup is complete, the NICs, addresses and routes on vm1 are as follows (unrelated NICs omitted):

$ ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:ce:51:a5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.32.245/24 brd 192.168.32.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fece:51a5/64 scope link 
       valid_lft forever preferred_lft forever
9: br245: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:8d:b1:91:fa:7a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7426:b5ff:fea9:d35/64 scope link 
       valid_lft forever preferred_lft forever
10: veth245@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br245 state UP group default qlen 1000
    link/ether 02:8d:b1:91:fa:7a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::8d:b1ff:fe91:fa7a/64 scope link 
       valid_lft forever preferred_lft forever

$ ip route
default via 192.168.32.1 dev eth0 proto static metric 100 
172.245.0.0/24 dev br245 proto kernel scope link src 172.245.0.1 
192.168.32.0/24 dev eth0 proto kernel scope link src 192.168.32.245 metric 100

$ ip netns exec ns245 ip addr
2: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether be:9c:d4:57:58:5a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.245.0.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::bc9c:d4ff:fe57:585a/64 scope link 
       valid_lft forever preferred_lft forever

$ ip netns exec ns245 ip route
default via 172.245.0.1 dev eth0 
172.245.0.0/24 dev eth0 proto kernel scope link src 172.245.0.2

host-gw

Just set up a route on each vm pointing to the other node as the gateway (the onlink flag marks the next hop as directly reachable on the interface):

# vm1
ip route add 172.246.0.0/24 via 192.168.32.246 dev eth0 onlink

# vm2
ip route add 172.245.0.0/24 via 192.168.32.245 dev eth0 onlink

Test that 172.246.0.2 is reachable from the namespace:

$ ip netns exec ns245 ping -c 1 172.246.0.2
PING 172.246.0.2 (172.246.0.2) 56(84) bytes of data.
64 bytes from 172.246.0.2: icmp_seq=1 ttl=62 time=0.656 ms

--- 172.246.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.656/0.656/0.656/0.000 ms

The sequence diagram for accessing 172.246.0.2 from 172.245.0.2 is as follows (the return path is logically the same). Note the ttl=62 in the ping reply above: the reply starts with a TTL of 64 and is decremented once at each of the two routing hops, vm2 and vm1.

172.245.0.2  ns245:eth0   br245       vm1:eth0    vm2:eth0        br246    ns246:eth0    172.246.0.2
    |           |           |             |           |             |           |             |
    |  default  |           |             |           |             |           |             |
    |  routing  |           |             |           |             |           |             |
    | --------> |           |             |           |             |           |             |
    |           | veth-peer |             |           |             |           |             |
    |           |  master   |             |           |             |           |             |
    |           | --------> |             |           |             |           |             |
    |           |           | 172.246.0.2 |           |             |           |             |
    |           |           |   routing   |           |             |           |             |
    |           |           | ----------> |           |             |           |             |
    |           |           |             |  layer 2  |             |           |             |
    |           |           |             | --------> |             |           |             |
    |           |           |             |           | 172.246.0.2 |           |             |
    |           |           |             |           |   routing   |           |             |
    |           |           |             |           | ----------> |           |             |
    |           |           |             |           |             |  master   |             |
    |           |           |             |           |             | veth-peer |             |
    |           |           |             |           |             | --------> |             |
    |           |           |             |           |             |           | 172.246.0.2 |
    |           |           |             |           |             |           | ----------> |
    |           |           |             |           |             |           |             |
172.245.0.2  ns245:eth0   br245       vm1:eth0    vm2:eth0       br246     ns246:eth0    172.246.0.2

Q&A

Q: Why must the hosts running the containers be on the same network segment (Layer 2 reachable) in host-gw mode?
A: Routing does not rewrite the source IP. If the two hosts were separated by a gateway, the reply packet (with source and destination IPs swapped) would carry a container address that the intermediate gateway has no route for, so it could not determine an outgoing interface from its routing table. With Layer 2 reachability this problem does not arise: no intermediate routing-table lookup is needed, and the frame is delivered directly by MAC address.

ipip

The IPIP (IP-in-IP) protocol is a network-layer tunneling protocol used to transmit IP packets from one IP network over another. Its main purpose is to provide a transparent communication channel between networks, allowing packets from an internal network to be carried across an external network without changing their contents.
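
Depending on the kernel build, the ipip module may need to be loaded before the tunnel can be created (a quick check, assuming the standard module tools are present):

# Load the ipip module if it is not built into the kernel
modprobe ipip
lsmod | grep ipip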

The protocol format is as follows

// Raw TCP packet
+--------------------------------+
| ....                           |
+--------------------------------+
| TCP                            |
+--------------------------------+
| IP                             |
+--------------------------------+
| Ethernet                       |
+--------------------------------+

// ipip encapsulated packet
+--------------------------------+
| ....                           |
+--------------------------------+
| TCP                            |
+--------------------------------+
| IP                             |
+--------------------------------+
/ IP (tunnel)                    /
+--------------------------------+
| Ethernet                       |
+--------------------------------+

Set up the ipip tunnel in vm1

# Create an IPIP tunnel interface
ip tunnel add tun245 mode ipip local 192.168.32.245 remote 192.168.32.246

# Configure the IP address of the IPIP tunnel interface
ip addr add 172.245.1.1/30 dev tun245

# Enable the IPIP tunnel interface
ip link set tun245 up

# Add a route to enable IPIP tunneling for packets in the 172.246.0.0/24 segment.
ip route add 172.246.0.0/24 via 172.245.1.2 dev tun245
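
Only vm1 is shown above; the mirrored setup on vm2 would look like this (a sketch, with the tunnel address 172.245.1.2 implied by the route configured on vm1):

# vm2: mirror of the vm1 tunnel setup
ip tunnel add tun246 mode ipip local 192.168.32.246 remote 192.168.32.245
ip addr add 172.245.1.2/30 dev tun246
ip link set tun246 up
ip route add 172.245.0.0/24 via 172.245.1.1 dev tun246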

From vm2, access 172.245.0.2 (and also 192.168.32.245 for comparison):

$ ping -c 1 192.168.32.245
$ ip netns exec ns246 ping -c 1 172.245.0.2

Capture packets on eth0 in vm1:

$ tcpdump -s0 -i eth0 host 192.168.32.246  -nn
11:28:01.268176 IP 192.168.32.246 > 192.168.32.245: ICMP echo request, id 1730, seq 1, length 64
11:28:01.268299 IP 192.168.32.245 > 192.168.32.246: ICMP echo reply, id 1730, seq 1, length 64
11:28:08.697558 IP 192.168.32.246 > 192.168.32.245: IP 172.246.0.2 > 172.245.0.2: ICMP echo request, id 1708, seq 26, length 64 (ipip-proto-4)
11:28:08.697760 IP 192.168.32.245 > 192.168.32.246: IP 172.245.0.2 > 172.246.0.2: ICMP echo reply, id 1708, seq 26, length 64 (ipip-proto-4)
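
The (ipip-proto-4) marker in the capture indicates IP protocol number 4 (IP-in-IP): the outer IP header carries another complete IP packet directly as its payload.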

Wireshark shows more intuitively that ipip uses an extra IP layer for the tunneled communication:

Frame 3: 118 bytes on wire (944 bits), 118 bytes captured (944 bits)
Ethernet II, Src: 52:54:00:b9:5d:53 (52:54:00:b9:5d:53), Dst: 52:54:00:ce:51:a5 (52:54:00:ce:51:a5)
Internet Protocol Version 4, Src: 192.168.32.246, Dst: 192.168.32.245
Internet Protocol Version 4, Src: 172.246.0.2, Dst: 172.245.0.2
Internet Control Message Protocol

The sequence diagram for accessing 172.246.0.2 from 172.245.0.2 is as follows (the portion inside the namespaces is omitted; the return path is logically the same):

br245        tun245      vm1:eth0    vm2:eth0      tun246         br246
  |             |           |           |             |             |
  | 172.246.0.2 |           |           |             |             |
  |  routing    |           |           |             |             |
  | ----------> |           |           |             |             |
  |             | ipip pack |           |             |             |
  |             | --------> |           |             |             |
  |             |           |  layer 2  |             |             |
  |             |           | --------> |             |             |
  |             |           |           | ipip unpack |             |
  |             |           |           | ----------> |             |
  |             |           |           |             | 172.246.0.2 |
  |             |           |           |             |   routing   |
  |             |           |           |             | ----------> |
  |             |           |           |             |             |
br245        tun245      vm1:eth0    vm2:eth0      tun246         br246

Q&A

Q: What is the biggest difference from host-gw?
A: The nodes hosting the containers (network namespaces) are no longer required to be on the same Layer 2 segment; it is enough that the nodes can reach each other at Layer 3.

vxlan

vxlan is designed to address the limitations of traditional vlan (virtual LAN) technology in large cloud data centers and multi-tenant environments. By encapsulating Layer 2 Ethernet frames in UDP, it extends Layer 2 networks over Layer 3 (IP) networks, allowing the creation of up to 16 million isolated virtual networks (the 24-bit VNI gives 2^24 = 16,777,216), far exceeding vlan's limit of 4096 networks (12-bit VLAN ID).

The protocol format is as follows

// Raw TCP packet
+----------------------------------------+
| ....                                   |
+----------------------------------------+
| TCP                                    |
+----------------------------------------+
| IP                                     |
+----------------------------------------+
| Ethernet                               |
+----------------------------------------+


// vxlan encapsulated packet
+----------------------------------------+
| ....                                   |
+----------------------------------------+
| TCP                                    |
+----------------------------------------+
| IP                                     |
+----------------------------------------+
| Ethernet                               |
+----------------------------------------+
/ VXLAN Header (8 bytes)                 /
+----------------------------------------+
/ UDP (tunnel)                           /
+----------------------------------------+
/ IP (tunnel)                            /
+----------------------------------------+
/ Ethernet (tunnel)                      /
+----------------------------------------+

Set up the vxlan tunnel in vm1

# Create the VXLAN tunnel interface
ip link add vtep245 type vxlan id 1 local 192.168.32.245 dev eth0 dstport 8472 nolearning

# Configure the IP address
ip addr add 172.245.0.0/32 dev vtep245

# Set the MTU
ip link set vtep245 mtu 1450

# Enable the vtep
ip link set vtep245 up

# Configure routing
ip route add 172.246.0.0/24 via 172.246.0.0 dev vtep245 onlink
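
The MTU of 1450 accounts for the vxlan encapsulation overhead (assuming an IPv4 outer header): outer IP (20 bytes) + UDP (8) + VXLAN header (8) + inner Ethernet (14) = 50 bytes, leaving 1500 - 50 = 1450 for the inner packet.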

The configuration in vm2 is reversed from vm1, and 245 and 246 need to be swapped.
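
Spelled out, the vm2 side looks like this (a sketch mirroring the commands above):

# vm2
ip link add vtep246 type vxlan id 1 local 192.168.32.246 dev eth0 dstport 8472 nolearning
ip addr add 172.246.0.0/32 dev vtep246
ip link set vtep246 mtu 1450
ip link set vtep246 up
ip route add 172.245.0.0/24 via 172.245.0.0 dev vtep246 onlink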

Viewing the virtual NIC shows the configured vxlan id, underlying NIC and local IP:

root@vm1# ip -d link show vtep245
12: vtep245: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:3a:8c:c9:08:e8 brd ff:ff:ff:ff:ff:ff promiscuity 0 
    vxlan id 1 local 192.168.32.245 dev eth0 srcport 0 0 dstport 8472 nolearning ageing 300 noudpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

root@vm2# ip -d link show vtep246
10: vtep246: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 02:38:4a:12:3b:43 brd ff:ff:ff:ff:ff:ff promiscuity 0 
    vxlan id 1 local 192.168.32.246 dev eth0 srcport 0 0 dstport 8472 nolearning ageing 300 noudpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

Checking the listening UDP ports shows the kernel listening on 8472:

$ netstat -nulp
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
udp        0      0 0.0.0.0:8472            0.0.0.0:*                           -  
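
netstat comes from net-tools; on newer systems ss gives the same view:

# Equivalent check with ss
ss -nulp | grep 8472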

The current network topology diagram is as follows

┌─────────────────────────────┐          ┌─────────────────────────────┐
│ vm1                         │          │ vm2                         │
│     ┌─────────────────┐     │          │     ┌─────────────────┐     │
│     │ ns245           │     │          │     │ ns246           │     │
│     │   172.245.0.2   │     │          │     │   172.246.0.2   │     │
│     │      eth0       │     │          │     │      eth0       │     │
│     └────────|────────┘     │          │     └────────|────────┘     │
│              │              │          │              │              │
│           veth245           │          │           veth246           │
│              │              │          │              │              │
│            br245            │          │            br246            │
│       172.245.0.1/24        │          │       172.246.0.1/24        │
│                             │          │                             │
│           vtep245           │          │           vtep246           │
│      vni:1 172.245.0.0      │          │      vni:1 172.246.0.0      │
│                             │          │                             │
│      192.168.32.245/24      │          │      192.168.32.246/24      │
│            eth0             │          │            eth0             │
└──────────────│──────────────┘          └──────────────│──────────────┘
               │                                        │
               └────────────────────────────────────────┘

Trying to access 172.246.0.2 from 172.245.0.2 on vm1 fails. Capturing on the virtual NIC vtep245 with tcpdump shows that the MAC address of the gateway 172.246.0.0 is unknown:

$ ip netns exec ns245 ping -c 1 172.246.0.2

$ tcpdump -s0 -i vtep245 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
17:44:26.916277 ARP, Request who-has 172.246.0.0 tell 172.245.0.0, length 28

The gateways' MAC addresses are already shown in the ip -d link output above, so the neighbor entries can simply be set manually (flannel uses etcd for centralized management of this information):

# vm1
ip neigh add 172.246.0.0 dev vtep245 lladdr 02:38:4a:12:3b:43

# vm2
ip neigh add 172.245.0.0 dev vtep246 lladdr 52:3a:8c:c9:08:e8
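
The static entries can be verified with ip neigh (standard iproute2; output omitted here):

# Verify the permanent neighbor entry on vm1
ip neigh show dev vtep245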

After resolving the MAC address issue there is another problem: how the vxlan-encapsulated packets are sent out. Here the fdb is used to specify, for each destination MAC, the address the vxlan packet is sent to.

The fdb (forwarding database) is a table that stores MAC addresses and their corresponding ports so that bridge devices can forward frames efficiently.

# vm1
bridge fdb add 02:38:4a:12:3b:43 dev vtep245 dst 192.168.32.246

# vm2
bridge fdb add 52:3a:8c:c9:08:e8 dev vtep246 dst 192.168.32.245
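
With the neighbor and fdb entries in place, the fdb can be inspected and the end-to-end path retested (same test as in the earlier sections):

# Inspect the fdb entry on vm1 and retest from the namespace
bridge fdb show dev vtep245
ip netns exec ns245 ping -c 1 172.246.0.2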

The sequence diagram for accessing 172.246.0.2 from 172.245.0.2 is as follows (the portion inside the namespaces is omitted; the return path is logically the same):

br245        vtep245      vm1:eth0    vm2:eth0      vtep246         br246
  |             |            |           |              |             |
  | 172.246.0.2 |            |           |              |             |
  |  routing    |            |           |              |             |
  | ----------> |            |           |              |             |
  |             | vxlan pack |           |              |             |
  |             | ---------> |           |              |             |
  |             |            |  udp 8472 |              |             |
  |             |            | --------> |              |             |
  |             |            |           | vxlan unpack |             |
  |             |            |           | -----------> |             |
  |             |            |           |              | 172.246.0.2 |
  |             |            |           |              |   routing   |
  |             |            |           |              | ----------> |
  |             |            |           |              |             |
br245        vtep245      vm1:eth0    vm2:eth0      vtep246         br246

Q&A

Q: Why does the route on vm1 use 172.246.0.0 as the gateway address?
A:

  • The gateway cannot be a local IP or an IP within a local network segment, otherwise the traffic would not be forwarded out through the tunnel device;
  • 172.246.0.0 is the network address: it identifies the entire remote segment and cannot be assigned to a host, so any other choice could conflict with a real container IP;

Summary

Functionally:

  1. host-gw is the simplest, requiring only a single route to bridge the container networks of different nodes, but it works only when the nodes are Layer 2 reachable
  2. ipip builds an IP tunnel on top of the host-gw approach, so the container network can be bridged even when the nodes are only Layer 3 reachable
  3. vxlan, similarly to ipip, builds a Layer 2 overlay network on top of a Layer 3 tunnel; it provides the highest degree of network isolation and is relatively the most complex

In terms of performance:

  1. host-gw is the fastest, since there is no encapsulation overhead
  2. ipip comes next, adding an extra 20-byte IP header to each packet
  3. vxlan is relatively the slowest, with a full extra Ethernet, IP, UDP and VXLAN header encapsulation

References

  1. RFC 7348, Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks