
Introduction to Service in k8s [k8s Series II].


Preface

Each Pod in a k8s cluster has its own IP address, so can clients simply access a Pod directly by that IP?

Pods are not persistent in k8s: when a Pod is destroyed and rebuilt, it gets a new IP, and it is not reasonable to expect clients to chase changing IPs. In addition, workloads are usually load-balanced across multiple Pod replicas, so the question of how a client efficiently chooses which replica to access also arises.

These are the problems the Service object, introduced in this article, was created to solve.

I. The working principle of Service

1.1 Introduction to Service

The relationships between the components involved are shown in the diagram below:

When a Service object is created or modified through the API, the EndpointsController watches the Service object via the Informer mechanism, selects Pods according to the Service's configured selector, and creates an Endpoints object. This object records the Pods' IPs and container ports and is stored in etcd, so the Service only needs to look up the Endpoints object with the same name to find the information about its backing Pods.

When Pods change (e.g., a new Pod is scheduled, the state of an existing Pod becomes non-Running, or the number of Pods is scaled), the API Server notifies the EndpointsController of these changes in the form of events. The EndpointsController then recalculates the endpoint list of the Endpoints object based on the latest Pod status and the Service's label selector, and updates the Endpoints resource object stored in etcd. In this way, the Endpoints associated with a Service dynamically reflect the IP addresses and ports of all Running Pods that match the selector.

The EndpointsController is the controller responsible for generating and maintaining all Endpoints objects. It watches Services and their corresponding Pods and updates the Endpoints object of each Service. After a user creates a Service, the EndpointsController watches the status of the matching Pods; once a Pod is Running and ready, it records the Pod's IP in the Endpoints object. Container discovery for a Service is therefore implemented through Endpoints. kube-proxy, in turn, listens for updates to Services and Endpoints and calls its proxy module to refresh the routing and forwarding rules on the host.

In addition, the k8s Informer is a core component, which is specifically designed to monitor changes to API resources and notify the relevant clients when resources change.

 

1.2 Configuration File Example

Start by creating a Deployment to configure and manage the Pods:

# Deployment manifest
apiVersion: apps/v1
# Specify the type of resource to create as Deployment
kind: Deployment
# Metadata of the resource, e.g., its name
metadata:
  name: deployment-demo
# spec describes the desired state of the object and its basic information
spec:
  selector: # Specifies how to identify the Pods created from the Pod template
    matchLabels: # The selector matches all Pods with the label app: nginx
      app: nginx
  replicas: 3 # Maintain 3 Pod replicas
  # Pod template used to create new Pod instances
  template:
    metadata: # Metadata of the Pods, such as labels
      labels:
        app: nginx
    # The Pod's desired state and basic information
    spec:
      containers: # Containers running in the Pod: name, image, ports
      - name: nginx
        image: mirrorgooglecontainers/serve_hostname
        ports:
        - containerPort: 9376
          protocol: TCP # Specify the port protocol as TCP

serve_hostname is the official k8s debug image, a web server that returns its hostname. This creates three Pods labeled app: nginx, each of which returns its hostname when we access port 9376 of the Pod.

Next is the Service manifest, which specifies the selector as [app: nginx], configured as follows:

# Service manifest
apiVersion: v1
kind: Service
# Metadata of the resource, such as its name
metadata:
  name: service-demo
spec:
  selector: # Identify the backend Pods by the specified labels
    app: nginx
  # Ports exposed by the Service
  ports:
  - name: default # Name the port for easy reference
    protocol: TCP
    # Port the Service exposes to clients
    port: 80 # Service exposes port 80
    # Container port that traffic is forwarded to
    targetPort: 9376 # Target port on the backend Pods is 9376

This results in a Service with a fixed CLUSTER-IP of 10.96.148.206:

Once the Pods start successfully, an Endpoints object with the same name as the Service is created automatically, recording the addresses of the three Pods:
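Both objects can be inspected with kubectl; the commands below are illustrative, and the output is omitted here since it depends on the cluster:

# kubectl get svc service-demo
# kubectl get endpoints service-demo -o yaml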

When the Service does not specify a selector, the Endpoints object that maps network addresses to the Service has to be created manually, for example:

apiVersion: v1
kind: Endpoints
metadata:
  name: endpoints-demo
# Define subsets of the Endpoints; each subset corresponds to a group of backend instances
subsets:
  - addresses: # IP addresses of the backend instances
      - ip: 10.96.148.206
    ports: # Ports the backend instances listen on
      - port: 9376

Now, if we repeatedly access the CLUSTER-IP of the Service:

# curl 10.96.148.206:80
deployment-demo-7d94cbb55f-8mmxb
# curl 10.96.148.206:80
deployment-demo-7d94cbb55f-674ns
# curl 10.96.148.206:80
deployment-demo-7d94cbb55f-lfrm8
# curl 10.96.148.206:80
deployment-demo-7d94cbb55f-8mmxb

You can see that the requests are routed to the backend Pods, each of which returns its hostname; the load balancing method is Round Robin, i.e., polling.

II. Load Balancing for Services

As mentioned above, the actual routing and forwarding for a Service is implemented by the kube-proxy component. The Service itself exists only as a VIP (ClusterIP); kube-proxy implements access to Services from Pods inside the cluster and via nodePort from outside the cluster. kube-proxy's routing rules are implemented through its backend proxy module.

kube-proxy's proxy module currently has four implementations: userspace (not commonly used), iptables (the default), ipvs (used for large clusters), and kernelspace (for Windows environments). The history of their development is as follows:

kubernetes v1.0: Service is only a "layer 4" proxy, and the only proxy module is userspace.
kubernetes v1.1: the Ingress API appears to proxy "layer 7" services, and the iptables proxy module is added
kubernetes v1.2: iptables becomes the default proxy mode.
kubernetes v1.8: introduction of ipvs proxy module
kubernetes v1.9: ipvs proxy module becomes beta version
kubernetes v1.11: ipvs proxy mode GA

Each of these modes has its own load balancing strategy, which is described in more detail below.

2.1 userspace mode, not commonly used

In userspace mode, after a request for a Service reaches a node, it first goes through the kernel's iptables and then returns to userspace, where kube-proxy forwards it to the backend Pod. The performance loss of traffic crossing into and out of the kernel in both directions is unacceptable, hence the iptables mode.

Why does userspace mode still create iptables rules? Because the port kube-proxy listens on lives in userspace, and this port is neither the Service's access port nor the Service's nodePort, a layer of iptables rules is needed to redirect connections to the kube-proxy process.

2.2 iptables mode, the default way

iptables mode is the current default proxy mode. When a client requests a Service's ClusterIP, the request is routed to one of the Pods according to iptables rules. iptables uses DNAT to complete the forwarding and uses random probabilities to achieve load balancing.

Netfilter is a framework in the Linux kernel for processing packets at the network layer. It provides a mechanism for monitoring and modifying network traffic through a series of hooks. These hooks can be inserted at various stages of a packet's lifecycle, such as when it enters, leaves, or passes through the network stack.

DNAT (Destination Network Address Translation) module: a network feature used to redirect traffic arriving at a Service inside the cluster to the correct destination IP address and port. DNAT is typically used in conjunction with load balancing in order to distribute traffic across multiple backends.

The biggest difference between iptables mode and userspace mode is that the iptables module uses DNAT to translate the Service's entry address to the actual address of a Pod, eliminating the kernel-to-userspace switch. Another difference from the userspace proxy mode is that the iptables proxy does not automatically retry another Pod if the one it initially selects does not respond.

The main problem with iptables mode is that it generates too many iptables rules when the number of Services is large, and its non-incremental updates introduce latency, which becomes a significant performance problem at scale.
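For intuition, the iptables rules that kube-proxy programs for the example Service above look roughly like the following simplified sketch; the chain-name suffixes, probabilities, and Pod IPs are illustrative rather than taken from a real cluster:

# Traffic to the ClusterIP:port jumps to the per-Service chain
-A KUBE-SERVICES -d 10.96.148.206/32 -p tcp --dport 80 -j KUBE-SVC-DEMO
# The per-Service chain picks one endpoint at random (statistic module), roughly 1/3 each for three Pods
-A KUBE-SVC-DEMO -m statistic --mode random --probability 0.33333 -j KUBE-SEP-POD1
-A KUBE-SVC-DEMO -m statistic --mode random --probability 0.50000 -j KUBE-SEP-POD2
-A KUBE-SVC-DEMO -j KUBE-SEP-POD3
# Each per-endpoint chain DNATs to the real Pod IP and targetPort
-A KUBE-SEP-POD1 -p tcp -j DNAT --to-destination 10.244.1.10:9376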

2.3 ipvs mode, for large clusters

When the cluster is relatively large, refreshing the iptables rules becomes very slow, making it difficult to support large-scale clusters: the underlying implementation is a linked list, so adding, deleting, or modifying a rule requires traversing the whole list. ipvs was introduced to solve this problem.

ipvs is the load balancing module of LVS. Like iptables, ipvs is implemented on top of netfilter's hook functions, but it uses a hash table as its underlying data structure and works in kernel space. As a result, ipvs has better performance when redirecting traffic and synchronizing proxy rules, and allows almost unlimited scaling.

ipvs supports three load balancing modes: DR mode (Direct Routing), NAT mode (Network Address Translation), and Tunneling (also known as ipip mode). Of the three, only NAT supports port mapping, so kube-proxy uses ipvs in NAT mode.

The Linux kernel's native ipvs only supports DNAT, so ipvs still relies on iptables for packet filtering, SNAT, and support for NodePort-type Services.

In addition, ipvs supports more load balancing algorithms, such as the following (a configuration sketch for choosing one of them appears after the list):

rr: round robin (polling)
lc: least connection
dh: destination hashing
sh: source hashing
sed: shortest expected delay
nq: never queue
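A minimal sketch of a kube-proxy configuration that enables ipvs mode and picks one of these schedulers; the file follows the KubeProxyConfiguration API and is meant as an illustration, not a complete configuration:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"        # use the ipvs proxy module instead of the default iptables
ipvs:
  scheduler: "rr"   # load balancing algorithm, e.g. rr, lc, dh, sh, sed, nq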

For userspace, iptables, and ipvs, the default load balancing strategy is round robin.

Session affinity based on the client IP can be enabled by setting the value of Service.spec.sessionAffinity; the default value is "None", and it can be set to "ClientIP". The session hold time can also be configured via Service.spec.sessionAffinityConfig.clientIP.timeoutSeconds.
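A minimal sketch of those fields inside a Service spec; the timeout shown is the Kubernetes default of 10800 seconds (3 hours):

spec:
  sessionAffinity: ClientIP          # stick each client IP to the same backend Pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800          # maximum session sticky time, in seconds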

Finally, kernelspace mode is mainly used on Windows and is not covered in this article.

III. Types of Services

The Service type also determines how the Service is exposed in k8s. There are four types by default: ClusterIP, NodePort, LoadBalancer, and ExternalName, plus Ingress as a separate mechanism. The usage scenarios for each type are described in detail below.

3.1 ClusterIP, default method, intra-cluster access

ClusterIP is the default way of exposing a Service in a Kubernetes cluster. It can only be used for communication inside the cluster, and Pods access it along the following path:

pod ---> ClusterIP:ServicePort ---> (iptables)DNAT ---> PodIP:containerPort

Configuration example:

apiVersion: v1
kind: Service
metadata:
  name: service-python
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: 443
  selector:
    run: pod-python
  type: ClusterIP # explicitly specify the Service type

Use the command kubectl get svc:

A Service of type ClusterIP has a Cluster-IP, which is actually a VIP; forwarding relies on the kube-proxy component using iptables or ipvs.

In the request to create a Service, you can specify your own cluster IP address through the spec.clusterIP field, for example if you want to reuse an existing DNS entry, or if a legacy system is configured with a fixed IP address that is difficult to change. If you set spec.clusterIP to "None", Kubernetes does not assign an IP address (this is the headless Service described later).

The configured IP address must be a legal IPv4 or IPv6 address that falls within the service-cluster-ip-range CIDR configured on the API server. For a Service configured with an illegal clusterIP, the API server returns HTTP status code 422, indicating that the value is not legal.
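For illustration, a Service that pins its cluster IP to a specific address inside the service CIDR might look like this; the name and the IP are assumptions, not values from the article's cluster:

apiVersion: v1
kind: Service
metadata:
  name: service-fixed-ip
spec:
  clusterIP: 10.96.0.100   # must fall inside the API server's service-cluster-ip-range
  selector:
    run: pod-python
  ports:
  - port: 3000
    protocol: TCP
    targetPort: 443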

 

3.2 NodePort Out-of-Cluster Access

To access services inside the cluster from outside the cluster, you can use this type of Service.

A Service of type NodePort opens a specified port on the cluster nodes where kube-proxy is deployed; all traffic sent to this port on any node is then forwarded to the Service and on to the real backend Pods.

NodePort is built on top of ClusterIP, with the following access path:

client ---> NodeIP:NodePort ---> ClusterIP:ServicePort ---> (iptables)DNAT ---> PodIP:containerPort

Configuration example:

apiVersion: v1
kind: Service
metadata:
  name: service-python
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: 443
    nodePort: 30080
  selector:
    run: pod-python
  type: NodePort

Use the command kubectl get svc:

At this point we can access pod-python via http://4.4.4.1:30080 or http://4.4.4.2:30080 (the node IPs in this example).

The port has a range: by default the k8s control plane assigns ports from the range specified by the --service-node-port-range flag (default: 30000-32767).

The strategy for assigning ports to NodePort Services covers both automatic assignment and manual assignment. When you create a NodePort Service that requests a specific port, that port may conflict with one that has already been assigned. To reduce this problem, the port range for NodePort Services is split into two segments: dynamic assignment uses the higher segment by default, falling back to the lower segment once the higher one is exhausted, while users can request ports from the lower segment, reducing the risk of conflicts.
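The overall range itself is configured on the API server; a sketch of the relevant flag (the value shown is the default):

kube-apiserver --service-node-port-range=30000-32767   # other flags omitted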

 

3.3 LoadBalancer Load Balancing

A LoadBalancer-type Service is another way to access services from outside the cluster. However, not all k8s clusters support it; it is mostly available in clusters hosted on public clouds.

The load balancer is created asynchronously, and information about the provisioned load balancer is published in the Service's status.loadBalancer field.
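Once the cloud provider finishes provisioning, the address appears under the Service's status, roughly like this (the IP shown is illustrative):

status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10   # address handed out by the cloud provider's load balancer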

LoadBalancer configuration example:

apiVersion: v1
kind: Service
metadata:
  name: service-python
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: 443
    nodePort: 30080
  selector:
    run: pod-python
  type: LoadBalancer

Use the command kubectl get svc:

You can see the EXTERNAL-IP, and the Service can then be accessed through that IP.

Each public cloud also supports many other settings. Most load balancer parameters can be set via Service annotations, such as the following AWS examples:

    metadata:
      name: my-service
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
        # Specifies whether access logs are enabled for the load balancer
        service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval: "60"
        # The interval for publishing the access logs; you can specify either 5 or 60 (minutes)
        service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: "my-bucket"
        # The name of the Amazon S3 bucket where the access logs are stored
        service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: "my-bucket-prefix/prod"
        # The logical hierarchy you created for your Amazon S3 bucket, for example `my-bucket-prefix/prod`
 

3.4 ExternalName, maps the Service to an external domain name

A Service of type ExternalName maps the Service to a DNS name rather than to a typical selector such as my-service or cassandra. You specify the name with the spec.externalName parameter.

Note: CoreDNS 1.7 or later is required to use the ExternalName type.

ExternalName Configuration Example:

kind: Service
apiVersion: v1
metadata:
  name: service-python
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: 443
  type: ExternalName
  externalName: my.database.example.com # the external DNS name the Service maps to (example value)

When the Service's host name is looked up, the cluster DNS service returns a CNAME record pointing to the configured external name. service-python is accessed in the same way as other services, but the key difference is that the redirection happens at the DNS level rather than through proxying or forwarding.

 

3.5 Headless Services

When load balancing is not required and a separate Service IP is not needed, a headless Service can be created by explicitly setting the cluster IP (spec.clusterIP) to "None".

Headless Services do not get cluster IPs, kube-proxy does not handle them, and the platform does not provide load balancing or routing support for them.

A headless Service allows clients to connect directly to whichever Pod they prefer. It does not use a virtual IP address or a proxy to configure routing and packet forwarding; instead, it reports the endpoint IP address of each Pod through internal DNS records, which are served by the cluster's DNS service.

In the Service definition, spec.type is set to ClusterIP (which is also the default value for type), and spec.clusterIP is additionally set to None. The string value None is a special case and is not the same as leaving the spec.clusterIP field unset.
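A minimal headless Service sketch; it reuses the article's app: nginx Pods, while the name and ports are assumptions for illustration:

apiVersion: v1
kind: Service
metadata:
  name: headless-demo
spec:
  clusterIP: None      # the explicit "None" is what makes this a headless Service
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 9376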

Two application scenarios:

  • Autonomy. Sometimes the Client wants to decide which Real Server to use on its own, so it can query DNS to get information about the Real Server.
  • Each Endpoints of the Headless Service, i.e., each Pod, will have a corresponding DNS domain name; this allows the Pods to access each other and the cluster to access the Pods individually.

An article with simple worked examples is available at /p/490900452.

IV. Service Discovery

Although a Service's Endpoints solve the container discovery problem, how does a client discover the Service itself without knowing the Service's Cluster IP in advance?

Service currently supports two types of service discovery mechanisms, one via environment variables and the other via DNS; of these two options, the latter is recommended.

4.1 Environment variables

When a Pod runs on a Node, the kubelet adds a set of environment variables for each active Service, for example {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT, where the Service name is converted to uppercase and dashes are converted to underscores.

For example, if a Service redis-primary exposes TCP port 6379 and is assigned the cluster IP address 10.0.0.11, the Service generates the following environment variables:

REDIS_PRIMARY_SERVICE_HOST=10.0.0.11
REDIS_PRIMARY_SERVICE_PORT=6379
REDIS_PRIMARY_PORT=tcp://10.0.0.11:6379
REDIS_PRIMARY_PORT_6379_TCP=tcp://10.0.0.11:6379
REDIS_PRIMARY_PORT_6379_TCP_PROTO=tcp
REDIS_PRIMARY_PORT_6379_TCP_PORT=6379
REDIS_PRIMARY_PORT_6379_TCP_ADDR=10.0.0.11

Note, however, that the environment variables are not registered for Pods created before the Service exists, so it is recommended to use DNS for service discovery between services.

4.2 DNS

It is common to use an add-on to install a DNS service for a Kubernetes cluster.

A DNS server (such as CoreDNS), which is cluster-aware, also monitors the Kubernetes API for new Services and creates a set of DNS records for each Service. If DNS is enabled across the cluster, all Pods should be able to automatically resolve Services by DNS name.

For example, if there is a Service named my-service in the namespace my-ns, the control plane and the DNS service work together to create a DNS record for my-service.my-ns. Pods in the namespace my-ns can find the Service by simply looking up my-service (my-service.my-ns also works). Pods in other namespaces must qualify the name as my-service.my-ns. These names resolve to the cluster IP assigned to the Service.

k8s also supports DNS SRV (Service) records for named ports. If the Service my-service.my-ns has a port named http with the protocol set to TCP, you can perform a DNS SRV query for _http._tcp.my-service.my-ns to discover the port number of http as well as the IP address.
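For example, from a Pod that has dig available, the lookups would look roughly like this; the names follow the my-service/my-ns example above, and cluster.local is the default cluster domain:

# dig +short my-service.my-ns.svc.cluster.local
# dig +short _http._tcp.my-service.my-ns.svc.cluster.local SRV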

The k8s DNS server is the only way to access a Service of type ExternalName.

References: /p/454836610, /p/111244353, /p/157565821, /zh-cn/docs/concepts/services-networking/service/