
ArgoWorkflow Tutorial (IV) - Workflow & Log Archiving


In the previous article, we analyzed artifacts in Argo Workflows, including artifact-repository configuration and how to use artifacts in a Workflow. This article focuses on pipeline GC and archiving, which prevent unbounded space consumption in the cluster's etcd.

1. Overview

Because ArgoWorkflow is implemented as a CRD, it does not require an external storage service to run properly:

  • Run records: stored in the cluster as Workflow CR objects
  • Run logs: stored in the step Pods and viewed with kubectl logs
    • So you need to make sure the Pods are not deleted, otherwise the logs can no longer be viewed

But because all of this data lives in the cluster, a large amount of data puts heavy storage pressure on etcd, which ultimately affects cluster stability.

To solve this problem, ArgoWorkflow provides an archiving feature that moves historical data to external storage, reducing the storage pressure on etcd.

Specifically, archiving works as follows:

  • 1) Workflow objects are stored in PostgreSQL (or MySQL).
  • 2) The logs of the corresponding Pods are stored in S3, because log data can be large and is therefore not written to PostgreSQL directly.

To provide archiving, two external storage services are required:

  • Postgres: an external database that stores the archived Workflow records
  • MinIO: provides S3 storage for the artifacts generated by Workflows and for the Pod logs of archived Workflows

Therefore, if you don't need to keep many Workflow records and have no need to view old logs, you don't need the archiving feature; simply cleaning up the data in the cluster at regular intervals is enough, as sketched below.
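A minimal cleanup sketch, relying on the workflows.argoproj.io/completed=true label that Argo adds to finished Workflows (run it from a cron job or CI schedule as needed):

# delete all completed Workflows in the current namespace
kubectl delete workflows -l workflows.argoproj.io/completed=true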

2. GC

Argo Workflows has a workflow cleanup mechanism, Garbage Collection (GC), which prevents too many execution records from overloading etcd, the backend storage of Kubernetes.

Enabling GC

We can configure how many Workflow execution records to keep in the ConfigMap, and different retention counts can be set for records in different states.

First, check which ConfigMap is specified in the argo-server startup arguments:

# kubectl -n argo get deploy argo-workflows-server -oyaml|grep args -A 5
      - args:
        - server
        - --configmap=argo-workflows-workflow-controller-configmap
        - --auth-mode=server
        - --secure=false
        - --loglevel

As you can see, argo-workflows-workflow-controller-configmap is used here, so that is the ConfigMap to modify.

The configuration is as follows:

apiVersion: v1
data:
  retentionPolicy: |
    completed: 3
    failed: 3
    errored: 3
kind: ConfigMap
metadata:
  name: argo-workflows-workflow-controller-configmap
  namespace: argo
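One way to apply this change from the command line (a sketch; you can also simply edit the ConfigMap):

# edit interactively
kubectl -n argo edit cm argo-workflows-workflow-controller-configmap

# or patch the retentionPolicy key directly
kubectl -n argo patch cm argo-workflows-workflow-controller-configmap \
  --type merge -p '{"data":{"retentionPolicy":"completed: 3\nfailed: 3\nerrored: 3\n"}}'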

Note that the cleanup mechanism here removes redundant Workflow resources from Kubernetes. If you want more history, it is recommended to enable and configure archiving.

Then restart argo-workflows-workflow-controller and argo-server:

kubectl -n argo rollout restart deploy argo-workflows-server
kubectl -n argo rollout restart deploy argo-workflows-workflow-controller

Testing

Run a batch of pipelines and see whether they are cleaned up automatically.

for ((i=1; i<=10; i++)); do
cat <<EOF | kubectl create -f -
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world $i"]
EOF
done

Create 10 Workflows and check whether they are automatically cleaned up after they complete:

[root@lixd-argo archive]# k get wf
NAME                STATUS      AGE   MESSAGE
hello-world-6hgb2   Succeeded   74s
hello-world-6pl5w   Succeeded   37m
hello-world-9fdmv   Running     21s
hello-world-f464p   Running     18s
hello-world-kqwk4   Running     16s
hello-world-kxbtk   Running     18s
hello-world-p88vd   Running     19s
hello-world-q7xbk   Running     22s
hello-world-qvv7d   Succeeded   10m
hello-world-t94pb   Running     23s
hello-world-w79q6   Running     15s
hello-world-wl4vl   Running     23s
hello-world-znw7w   Running     23s

Check again after a while:

[root@lixd-argo archive]# k get wf
NAME                STATUS      AGE    MESSAGE
hello-world-f464p   Succeeded   102s
hello-world-kqwk4   Succeeded   100s
hello-world-w79q6   Succeeded   99s

As you can see, only 3 records were retained and the rest were cleaned up, indicating that the GC function is ok.

3. Pipeline archiving

https://argo-workflows.readthedocs.io/en/stable/workflow-archive/

When GC is enabled, Workflows are cleaned up automatically so that etcd does not fill up, but that also means the old records can no longer be queried.

ArgoWorkflow also provides pipeline archiving functionality to address this issue.

Persistence is achieved by writing Workflow records to an external Postgres database, so that history can still be queried.

Deploying Postgres

First, use Helm to deploy an all-in-one Postgres instance:

REGISTRY_NAME=registry-1.docker.io
REPOSITORY_NAME=bitnamicharts
storageClass="local-path"
# password for the postgres account
adminPassword="postgresadmin"

# the --set keys below follow the Bitnami PostgreSQL chart values
helm install pg-aio oci://$REGISTRY_NAME/$REPOSITORY_NAME/postgresql \
--set global.storageClass=$storageClass \
--set global.postgresql.auth.postgresPassword=$adminPassword \
--set global.postgresql.auth.database=argo
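To verify that the database is up before moving on (assuming the release landed in the default namespace and the standard Bitnami instance label):

kubectl -n default get pods,svc -l app.kubernetes.io/instance=pg-aio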

Configuring pipeline archiving

Similarly, add a persistence section to the argo configuration file:

persistence: 
  archive: true
  postgresql:
    host: pg-aio-postgresql.default.svc.cluster.local  # the Service created by the pg-aio Helm release
    port: 5432
    database: argo
    tableName: argo_workflows
    userNameSecret:
      name: argo-postgres-config
      key: username
    passwordSecret:
      name: argo-postgres-config
      key: password

The complete argo-workflows-workflow-controller-configmap looks like this:

apiVersion: v1
data:
  retentionPolicy: |
    completed: 3
    failed: 3
    errored: 3
  persistence: |
    archive: true
    archiveTTL: 180d
    postgresql:
      host: pg-aio-postgresql.default.svc.cluster.local
      port: 5432
      database: argo
      tableName: argo_workflows
      userNameSecret:
        name: argo-postgres-config
        key: username
      passwordSecret:
        name: argo-postgres-config
        key: password
kind: ConfigMap
metadata:
  name: argo-workflows-workflow-controller-configmap
  namespace: argo

Then create the Secret referenced above:

kubectl create secret generic argo-postgres-config -n argo --from-literal=password=postgresadmin --from-literal=username=postgres

You may also need to grant RBAC permissions, otherwise the controller won't be able to read the Secret:

kubectl create clusterrolebinding argo-workflow-controller-admin --clusterrole=admin --serviceaccount=argo:argo-workflows-workflow-controller
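You can confirm that the controller's ServiceAccount can now read Secrets like so:

kubectl -n argo auth can-i get secrets \
  --as=system:serviceaccount:argo:argo-workflows-workflow-controller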

Then restart argo-workflows-workflow-controller and argo-server:

kubectl -n argo rollout restart deploy argo-workflows-server
kubectl -n argo rollout restart deploy argo-workflows-workflow-controller

When the workflow controller starts with archiving enabled, the following tables are created in the database:

  • argo_workflows
  • argo_archived_workflows
  • argo_archived_workflows_labels
  • schema_history

Archived records GC

The archiveTTL field in the configuration file specifies how long the Workflow records archived into Postgres are kept. The argo controller automatically deletes expired records based on this setting; if the value is not set, archived records are never deleted.

The details are as follows:

func (r *workflowArchive) DeleteExpiredWorkflows(ttl time.Duration) error {
	rs, err := r.session.SQL().
		DeleteFrom(archiveTableName).
		Where(r.clusterManagedNamespaceAndInstanceID()).
		And(fmt.Sprintf("finishedat < current_timestamp - interval '%d' second", int(ttl.Seconds()))).
		Exec()
	if err != nil {
		return err
	}
	rowsAffected, err := rs.RowsAffected()
	if err != nil {
		return err
	}
	log.WithFields(log.Fields{"rowsAffected": rowsAffected}).Info("Deleted archived workflows")
	return nil
}

However, the deletion task runs only once a day by default, so records are not removed immediately even if archiveTTL is set to as little as 1 minute.

func (wfc *WorkflowController) archivedWorkflowGarbageCollector(stopCh <-chan struct{}) {
	defer runtimeutil.HandleCrash(runtimeutil.PanicHandlers...)

	periodicity := env.LookupEnvDurationOr("ARCHIVED_WORKFLOW_GC_PERIOD", 24*time.Hour)
	if wfc.Config.Persistence == nil {
		log.Info("Persistence disabled - so archived workflow GC disabled - you must restart the controller if you enable this")
		return
	}
	if !wfc.Config.Persistence.Archive {
		log.Info("Archive disabled - so archived workflow GC disabled - you must restart the controller if you enable this")
		return
	}
	ttl := wfc.Config.Persistence.ArchiveTTL
	if ttl == config.TTL(0) {
		log.Info("Archived workflows TTL zero - so archived workflow GC disabled - you must restart the controller if you enable this")
		return
	}
	log.WithFields(log.Fields{"ttl": ttl, "periodicity": periodicity}).Info("Performing archived workflow GC")
	ticker := time.NewTicker(periodicity)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			log.Info("Performing archived workflow GC")
			err := wfc.wfArchive.DeleteExpiredWorkflows(time.Duration(ttl))
			if err != nil {
				log.WithField("err", err).Error("Failed to delete archived workflows")
			}
		}
	}
}

To adjust this period, set the ARCHIVED_WORKFLOW_GC_PERIOD environment variable by adding an env entry to the argo-workflows-workflow-controller Deployment, like this:

        env:
        - name: ARCHIVED_WORKFLOW_GC_PERIOD
          value: 1m
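Equivalently, you can set the variable from the command line (this also triggers a rollout of the Deployment):

kubectl -n argo set env deploy/argo-workflows-workflow-controller ARCHIVED_WORKFLOW_GC_PERIOD=1m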

Testing

Next, create a few Workflows to test it:

for ((i=1; i<=10; i++)); do
cat <<EOF | kubectl create -f -
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world $i"]
EOF
done

Check to see if an archive record has been created in postgres.

export POSTGRES_PASSWORD=postgresadmin

kubectl run postgresql-dev-client --rm --tty -i --restart='Never' --namespace default --image docker.io/bitnami/postgresql:14.1.0-debian-10-r80 --env="PGPASSWORD=$POSTGRES_PASSWORD" --command -- psql --host pg-aio-postgresql -U postgres -d argo -p 5432

Press Enter and, once inside the Pod, run the queries directly:

# list the tables
argo-# \dt
                      List of relations
 Schema |              Name              | Type  |  Owner
--------+--------------------------------+-------+----------
 public | argo_archived_workflows        | table | postgres
 public | argo_archived_workflows_labels | table | postgres
 public | argo_workflows                 | table | postgres
 public | schema_history                 | table | postgres
(4 rows)

# query the archived records
argo=# select name,phase from argo_archived_workflows;
       name | phase
-------------------+-----------
 hello-world-s8v4f | Succeeded
 hello-world-6pl5w | Succeeded
 hello-world-qvv7d | Succeeded
 hello-world-vgjqr | Succeeded
 hello-world-g2s8f | Succeeded
 hello-world-jghdm | Succeeded
 hello-world-fxtvk | Succeeded
 hello-world-tlv9k | Succeeded
 hello-world-bxcg2 | Succeeded
 hello-world-f6mdw | Succeeded
 hello-world-dmvj6 | Succeeded
 hello-world-btknm | Succeeded
(12 rows)

# quit with \q
argo=# \q

As you can see, Postgres now stores the archived Workflows, so if you need to query history you can query Postgres directly.
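Besides querying Postgres directly, the argo CLI can also list and inspect archived records (assuming the argo CLI is installed and pointed at this cluster):

# list Workflows that have been archived to the database
argo archive list
# show one archived Workflow by its UID
argo archive get <uid>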

Now change archiveTTL to 1 minute, restart argo, wait one to two minutes, and query again:

argo=#  select name,phase from argo_archived_workflows;
 name | phase
------+-------
(0 rows)

argo=#

As you can see, all records have been cleaned up because of the TTL, which ensures that the data in the external Postgres does not keep growing.

4. Pod log archiving

https://argo-workflows.readthedocs.io/en/stable/configure-archive-logs/

Pipeline archiving persists the pipelines themselves: even after a Workflow object is deleted from the cluster, its record, state, and other information can still be queried from Postgres.

However, the pipeline execution logs are scattered across the corresponding Pods; once a Pod is deleted, its logs can no longer be viewed, so we also need log archiving.
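While a Pod still exists, its logs can be read directly from the main container, e.g. for one of the Workflows created earlier:

kubectl logs hello-world-6pl5w -c main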

Configuring Pod log archiving

Global configuration

Turn on Pod log archiving and configure S3 information in the argo configuration file.

The specific configurations are as follows:

It is the same as the artifactRepository configured in the third article, except for the additional archiveLogs: true.

artifactRepository:
  archiveLogs: true
  s3:
    endpoint: minio:9000   # MinIO Service address (adjust to your environment)
    bucket: argo
    insecure: true
    accessKeySecret:
      name: my-s3-secret
      key: accessKey
    secretKeySecret:
      name: my-s3-secret
      key: secretKey

The full configuration is shown below:

apiVersion: v1
data:
  retentionPolicy: |
    completed: 3
    failed: 3
    errored: 3
  persistence: |
    archive: true
    postgresql:
      host: pg-aio-postgresql.default.svc.cluster.local
      port: 5432
      database: argo
      tableName: argo_workflows
      userNameSecret:
        name: argo-postgres-config
        key: username
      passwordSecret:
        name: argo-postgres-config
        key: password
  artifactRepository: |
    archiveLogs: true
    s3:
      endpoint: minio:9000
      bucket: argo
      insecure: true
      accessKeySecret:
        name: my-s3-secret
        key: accessKey
      secretKeySecret:
        name: my-s3-secret
        key: secretKey
kind: ConfigMap
metadata:
  name: argo-workflows-workflow-controller-configmap
  namespace: argo

Note: as analyzed in the third article on artifacts, the artifactRepository in argo can be configured in three ways:

  • 1) Global Configuration
  • 2) Namespace default configuration
  • 3) Specify the configuration in Workflow

If an artifactRepository is also configured at the Namespace level or in the Workflow itself and that configuration turns log archiving off, the logs will not be archived; see the sketch below.
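For example, a namespace-level default repository that explicitly disables log archiving could look like this (a sketch following the artifact-repositories ConfigMap convention; the key name default-v1 is arbitrary, and the S3 values mirror the global configuration above):

cat <<EOF | kubectl -n default apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-repositories
  annotations:
    # marks which key is the default artifact repository for this namespace
    workflows.argoproj.io/default-artifact-repository: default-v1
data:
  default-v1: |
    archiveLogs: false
    s3:
      endpoint: minio:9000
      bucket: argo
      insecure: true
      accessKeySecret:
        name: my-s3-secret
        key: accessKey
      secretKeySecret:
        name: my-s3-secret
        key: secretKey
EOF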

Then restart argo:

kubectl -n argo rollout restart deploy argo-workflows-server
kubectl -n argo rollout restart deploy argo-workflows-workflow-controller

Configuration in Workflow & template

To enable archiving for the entire Workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: archive-location-
spec:
  archiveLogs: true
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]

To enable archiving for a single template within a Workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: archive-location-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
    archiveLocation:
      archiveLogs: true

Wrap-up

Archiving can be turned on or off in all three places, which is a little confusing. According to the official documentation and the code below, the priority is:

workflow-controller config (on) > template (on/off) > workflow spec (on/off) > workflow-controller config (off)

Controller ConfigMap   Workflow Spec   Template   Archive logs?
true                   true            true       true
true                   true            false      true
true                   false           true       true
true                   false           false      true
false                  true            true       true
false                  true            false      false
false                  false           true       true
false                  false           false      false

Corresponding code implementation:

// IsArchiveLogs determines if container should archive logs
// priorities: controller(on) > template > workflow > controller(off)
func (woc *wfOperationCtx) IsArchiveLogs(tmpl *wfv1.Template) bool {
	archiveLogs := woc.artifactRepository.IsArchiveLogs()
	if !archiveLogs {
		if woc.execWf.Spec.ArchiveLogs != nil {
			archiveLogs = *woc.execWf.Spec.ArchiveLogs
		}
		if tmpl.ArchiveLocation != nil && tmpl.ArchiveLocation.ArchiveLogs != nil {
			archiveLogs = *tmpl.ArchiveLocation.ArchiveLogs
		}
	}
	return archiveLogs
}

It is recommended to just configure the global one.

Testing

Next, create a Workflow to test it:

cat <<EOF | kubectl create -f -
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
EOF

Wait for Workflow to finish running

# k get po
NAME                     READY   STATUS      RESTARTS   AGE
hello-world-6pl5w        0/2     Completed   0          53s
# k get wf
NAME                STATUS      AGE   MESSAGE
hello-world-6pl5w   Succeeded   55s

Go to S3 to see if the log archive file is available

As you can see, a log file has been stored in the specified bucket, named using the $bucket/$workflowName/$stepName format.

A typical workflow has multiple steps, and each step's log is stored in its own directory.
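If you have the MinIO client (mc) configured, you can also browse the archived logs from the command line (myminio is an assumed mc alias pointing at the MinIO endpoint configured above):

# list the archived objects for this Workflow
mc ls --recursive myminio/argo/hello-world-6pl5w/
# print one archived log; <step-dir> is a step directory from the listing above
mc cat myminio/argo/hello-world-6pl5w/<step-dir>/main.log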

The contents are the Pod logs, as follows:

 _____________ 
< hello world >
 ------------- 
    \
     \
      \     
                    ##        .            
              ## ## ##       ==            
           ## ## ## ##      ===            
       /""""""""""""""""___/ ===        
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~   
       \______ o          __/            
        \    \        __/             
          \____\______/   

5. Summary


The [ArgoWorkflow Series] is continuously updated; follow the public account [Explore Cloud Native] to read more articles.


To summarize, this article covers three main topics:

  • 1) Enable GC to automatically clean up completed Workflow records and avoid filling up etcd.
  • 2) Enable pipeline archiving to store Workflow records in an external Postgres, so that history can still be queried.
  • 3) Enable Pod log archiving to store the logs of each pipeline step in S3, so they can still be viewed after the Pods are deleted.

For production use, it is generally recommended to enable these cleanup and archiving features; keeping everything in etcd will inevitably affect cluster performance and stability.