In the previous article, we analyzed artifacts in Argo Workflows, including artifact-repository configuration and how to use artifacts in a Workflow. This article focuses on pipeline GC and archiving, which prevent unbounded space consumption in the cluster's etcd.
1. Overview
Because ArgoWorkflow is implemented as a CRD, it does not require an external storage service to run properly:
- Execution records are stored in the cluster as Workflow CR objects
- Logs are stored in the Pods themselves and viewed with kubectl logs, so you must make sure a Pod is not deleted, or its logs can no longer be viewed
But because all of this data lives in the cluster, a large volume of data puts heavy pressure on etcd storage, which ultimately affects cluster stability.
To solve this problem ArgoWorkflow provides an archiving feature to archive historical data to external storage to reduce the storage pressure on etcd.
Specifically:
- 1) Workflow objects are stored in PostgreSQL (or MySQL).
- 2) The logs of the corresponding Pods are stored in S3, because the volume of log data can be large, so they are not stored directly in PostgreSQL.
To provide the archiving functionality, two storage services are required:
- Postgres: an external database that stores archived Workflow records
- MinIO: provides S3 storage for artifacts generated in Workflows and for the Pod logs of archived Workflows
Therefore, if you do not need to keep many Workflow records and have no requirement to view old logs, you do not need the archiving feature at all: just clean up the data in the cluster at regular intervals.
2. GC
Argo Workflows has a workflow cleanup mechanism, known as Garbage Collection (GC), which prevents too many execution records from overloading etcd, the backend storage of Kubernetes.
Enabling GC
We can configure how many Workflow execution records to keep in the ConfigMap, and different retention counts can be set for records in different states.
First, check which ConfigMap is specified in the argo-server startup command:
# kubectl -n argo get deploy argo-workflows-server -oyaml|grep args -A 5
- args:
- server
- --configmap=argo-workflows-workflow-controller-configmap
- --auth-mode=server
- --secure=false
- --loglevel
As you can see, argo-workflows-workflow-controller-configmap is used here, so that is the ConfigMap to modify. The configuration is as follows:
apiVersion: v1
data:
  retentionPolicy: |
    completed: 3
    failed: 3
    errored: 3
kind: ConfigMap
metadata:
  name: argo-workflows-workflow-controller-configmap
  namespace: argo
Note that the cleanup mechanism here removes redundant Workflow resources from Kubernetes. If you want more history, it is recommended to enable and configure archiving.
Then restart argo-workflows-workflow-controller and argo-workflows-server:
kubectl -n argo rollout restart deploy argo-workflows-server
kubectl -n argo rollout restart deploy argo-workflows-workflow-controller
Testing
Run several pipelines and see whether they are cleaned up automatically.
for ((i=1; i<=10; i++)); do
cat <<EOF | kubectl create -f -
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world $i"]
EOF
done
I created 10 Workflows; let's see whether they are cleaned up automatically after they finish running:
[root@lixd-argo archive]# k get wf
NAME STATUS AGE MESSAGE
hello-world-6hgb2 Succeeded 74s
hello-world-6pl5w Succeeded 37m
hello-world-9fdmv Running 21s
hello-world-f464p Running 18s
hello-world-kqwk4 Running 16s
hello-world-kxbtk Running 18s
hello-world-p88vd Running 19s
hello-world-q7xbk Running 22s
hello-world-qvv7d Succeeded 10m
hello-world-t94pb Running 23s
hello-world-w79q6 Running 15s
hello-world-wl4vl Running 23s
hello-world-znw7w Running 23s
Check again after a while:
[root@lixd-argo archive]# k get wf
NAME STATUS AGE MESSAGE
hello-world-f464p Succeeded 102s
hello-world-kqwk4 Succeeded 100s
hello-world-w79q6 Succeeded 99s
As you can see, only 3 records were retained and the rest were cleaned up, so the GC feature works as expected.
3. Pipeline archiving
/en/stable/workflow-archive/
When GC is enabled, Workflows are cleaned up automatically so that etcd does not fill up, but you also lose the ability to query past records.
ArgoWorkflow also provides pipeline archiving functionality to address this issue.
Persistence is achieved by writing Workflow records to an external PostgreSQL database, so that history can still be queried.
Deploying Postgres
First, use helm to deploy an all-in-one Postgres:
REGISTRY_NAME=registry-1.docker.io
REPOSITORY_NAME=bitnamicharts
storageClass="local-path"
# password for the postgres account
adminPassword="postgresadmin"

helm install pg-aio oci://$REGISTRY_NAME/$REPOSITORY_NAME/postgresql \
  --set global.storageClass=$storageClass \
  --set global.postgresql.auth.postgresPassword=$adminPassword \
  --set global.postgresql.auth.database=argo
Configuring pipeline archiving
Similarly, add a persistence section to the argo configuration file:
persistence:
  archive: true
  postgresql:
    host: pg-aio-postgresql.default.svc.cluster.local
    port: 5432
    database: argo
    tableName: argo_workflows
    userNameSecret:
      name: argo-postgres-config
      key: username
    passwordSecret:
      name: argo-postgres-config
      key: password
The complete argo-workflows-workflow-controller-configmap is as follows:
apiVersion: v1
data:
  retentionPolicy: |
    completed: 3
    failed: 3
    errored: 3
  persistence: |
    archive: true
    archiveTTL: 180d
    postgresql:
      host: pg-aio-postgresql.default.svc.cluster.local
      port: 5432
      database: argo
      tableName: argo_workflows
      userNameSecret:
        name: argo-postgres-config
        key: username
      passwordSecret:
        name: argo-postgres-config
        key: password
kind: ConfigMap
metadata:
  name: argo-workflows-workflow-controller-configmap
  namespace: argo
Then create a Secret holding the database credentials:
kubectl create secret generic argo-postgres-config -n argo --from-literal=password=postgresadmin --from-literal=username=postgres
You may also need to grant RBAC permissions, otherwise the controller cannot read the Secret:
kubectl create clusterrolebinding argo-workflow-controller-admin --clusterrole=admin --serviceaccount=argo:argo-workflows-workflow-controller
Then restart argo-workflows-workflow-controller and argo-workflows-server:
kubectl -n argo rollout restart deploy argo-workflows-server
kubectl -n argo rollout restart deploy argo-workflows-workflow-controller
When you start the workflow controller with archiving enabled, the following tables are created in the database:
argo_workflows
argo_archived_workflows
argo_archived_workflows_labels
schema_history
GC for archived records
The archiveTTL field in the configuration file specifies how long Workflow records archived to Postgres are kept. The argo controller automatically deletes expired records according to this setting; if the value is not specified, archived records are never deleted.
The details are as follows:
func (r *workflowArchive) DeleteExpiredWorkflows(ttl time.Duration) error {
	rs, err := r.session.SQL().
		DeleteFrom(archiveTableName).
		Where(r.clusterManagedNamespaceAndInstanceID()).
		And(fmt.Sprintf("finishedat < current_timestamp - interval '%d' second", int(ttl.Seconds()))).
		Exec()
	if err != nil {
		return err
	}
	rowsAffected, err := rs.RowsAffected()
	if err != nil {
		return err
	}
	log.WithFields(log.Fields{"rowsAffected": rowsAffected}).Info("Deleted archived workflows")
	return nil
}
However, the deletion task runs only once a day by default, so records are not removed immediately even if archiveTTL is set to as little as 1 minute.
func (wfc *WorkflowController) archivedWorkflowGarbageCollector(stopCh <-chan struct{}) {
	defer runtimeutil.HandleCrash(runtimeutil.PanicHandlers...)

	periodicity := env.LookupEnvDurationOr("ARCHIVED_WORKFLOW_GC_PERIOD", 24*time.Hour)
	if wfc.Config.Persistence == nil {
		log.Info("Persistence disabled - so archived workflow GC disabled - you must restart the controller if you enable this")
		return
	}
	if !wfc.Config.Persistence.Archive {
		log.Info("Archive disabled - so archived workflow GC disabled - you must restart the controller if you enable this")
		return
	}
	ttl := wfc.Config.Persistence.ArchiveTTL
	if ttl == config.TTL(0) {
		log.Info("Archived workflows TTL zero - so archived workflow GC disabled - you must restart the controller if you enable this")
		return
	}
	log.WithFields(log.Fields{"ttl": ttl, "periodicity": periodicity}).Info("Performing archived workflow GC")
	ticker := time.NewTicker(periodicity)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			log.Info("Performing archived workflow GC")
			err := wfc.wfArchive.DeleteExpiredWorkflows(time.Duration(ttl))
			if err != nil {
				log.WithField("err", err).Error("Failed to delete archived workflows")
			}
		}
	}
}
To adjust the GC period, set the ARCHIVED_WORKFLOW_GC_PERIOD environment variable by adding an env entry to the argo-workflows-workflow-controller Deployment, like this:
env:
- name: ARCHIVED_WORKFLOW_GC_PERIOD
  value: 1m
Testing
Next, create some Workflows to test it:
for ((i=1; i<=10; i++)); do
cat <<EOF | kubectl create -f -
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world $i"]
EOF
done
Check to see if an archive record has been created in postgres.
export POSTGRES_PASSWORD=postgresadmin
kubectl run postgresql-dev-client --rm --tty -i --restart='Never' --namespace default --image docker.io/bitnami/postgresql:14.1.0-debian-10-r80 --env="PGPASSWORD=$POSTGRES_PASSWORD" --command -- psql --host pg-aio-postgresql -U postgres -d argo -p 5432
Press Enter to get a shell in the Pod, then run queries directly:
# list the tables
argo-# \dt
List of relations
Schema | Name | Type | Owner
--------+--------------------------------+-------+----------
public | argo_archived_workflows | table | postgres
public | argo_archived_workflows_labels | table | postgres
public | argo_workflows | table | postgres
public | schema_history | table | postgres
(4 rows)
# query the records
argo=# select name,phase from argo_archived_workflows;
name | phase
-------------------+-----------
hello-world-s8v4f | Succeeded
hello-world-6pl5w | Succeeded
hello-world-qvv7d | Succeeded
hello-world-vgjqr | Succeeded
hello-world-g2s8f | Succeeded
hello-world-jghdm | Succeeded
hello-world-fxtvk | Succeeded
hello-world-tlv9k | Succeeded
hello-world-bxcg2 | Succeeded
hello-world-f6mdw | Succeeded
hello-world-dmvj6 | Succeeded
hello-world-btknm | Succeeded
(12 rows)
# quit with \q
argo=# \q
As you can see, Postgres now holds the archived Workflows, so whenever you need to query history you can simply query Postgres.
Change archiveTTL to 1 minute, restart argo, wait one to two minutes, and check again:
argo=# select name,phase from argo_archived_workflows;
name | phase
------+-------
(0 rows)
argo=#
As you can see, all records were cleaned up once the TTL expired, which ensures that data in the external Postgres does not pile up indefinitely.
4. Pod log archiving
/en/stable/configure-archive-logs/
Pipeline archiving persists the pipelines themselves: even after a Workflow object is deleted from the cluster, its record, state, and other information can still be queried from Postgres.
However, the pipeline execution logs are scattered across the corresponding Pods; once a Pod is deleted its logs can no longer be viewed, so we also need log archiving.
Configuring Pod log archiving
Global configuration
Turn on Pod log archiving and configure the S3 information in the argo configuration file.
The configuration is the same as the artifact configuration in the third article, except for an additional archiveLogs: true. Specifically:
artifactRepository:
  archiveLogs: true
  s3:
    endpoint: :9000
    bucket: argo
    insecure: true
    accessKeySecret:
      name: my-s3-secret
      key: accessKey
    secretKeySecret:
      name: my-s3-secret
      key: secretKey
The full configuration is shown below:
apiVersion: v1
data:
  retentionPolicy: |
    completed: 3
    failed: 3
    errored: 3
  persistence: |
    archive: true
    postgresql:
      host: pg-aio-postgresql.default.svc.cluster.local
      port: 5432
      database: argo
      tableName: argo_workflows
      userNameSecret:
        name: argo-postgres-config
        key: username
      passwordSecret:
        name: argo-postgres-config
        key: password
  artifactRepository: |
    archiveLogs: true
    s3:
      endpoint: :9000
      bucket: argo
      insecure: true
      accessKeySecret:
        name: my-s3-secret
        key: accessKey
      secretKeySecret:
        name: my-s3-secret
        key: secretKey
kind: ConfigMap
metadata:
  name: argo-workflows-workflow-controller-configmap
  namespace: argo
Note: as analyzed in the third article on artifacts, the artifactRepository in argo can be configured in three ways:
- 1) Global configuration
- 2) Namespace default configuration
- 3) Configuration specified in the Workflow
If an artifactRepository is also configured at the Namespace or Workflow level and it explicitly disables log archiving, logs will not be archived.
Then reboot argo.
kubectl -n argo rollout restart deploy argo-workflows-server
kubectl -n argo rollout restart deploy argo-workflows-workflow-controller
Configuration in Workflow & template
Configure the entire Workflow to be archived:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: archive-location-
spec:
  archiveLogs: true
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
Configure a single template in a Workflow to be archived:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: archive-location-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
    archiveLocation:
      archiveLogs: true
Summary
All three places can be set to archive or not, which gets a little confusing. According to the official documentation, the priority of the settings is:
workflow-controller config (on) > template (on/off) > workflow spec (on/off) > workflow-controller config (off)
Controller Config Map | Workflow Spec | Template | Archive logs?
---|---|---|---
true | true | true | true
true | true | false | true
true | false | true | true
true | false | false | true
false | true | true | true
false | true | false | false
false | false | true | true
false | false | false | false
Corresponding code implementation:
// IsArchiveLogs determines if container should archive logs
// priorities: controller(on) > template > workflow > controller(off)
func (woc *wfOperationCtx) IsArchiveLogs(tmpl *wfv1.Template) bool {
	archiveLogs := woc.artifactRepository.IsArchiveLogs()
	if !archiveLogs {
		if woc.execWf.Spec.ArchiveLogs != nil {
			archiveLogs = *woc.execWf.Spec.ArchiveLogs
		}
		if tmpl.ArchiveLocation != nil && tmpl.ArchiveLocation.ArchiveLogs != nil {
			archiveLogs = *tmpl.ArchiveLocation.ArchiveLogs
		}
	}
	return archiveLogs
}
It is recommended to just configure the global one.
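To sanity-check the table, the priority logic can be reproduced as a small standalone function. The shouldArchiveLogs helper below is a hypothetical reimplementation mirroring the rules above, not Argo's code; the *bool parameters model the optional workflow-spec and template settings.

```go
package main

import "fmt"

// shouldArchiveLogs applies the priority
// controller(on) > template > workflow > controller(off).
// A nil pointer means the setting was not specified at that level.
func shouldArchiveLogs(controller bool, workflow, template *bool) bool {
	archive := controller
	if !archive {
		if workflow != nil {
			archive = *workflow
		}
		if template != nil {
			// the template setting wins over the workflow spec
			archive = *template
		}
	}
	return archive
}

func main() {
	on, off := true, false
	fmt.Println(shouldArchiveLogs(true, &on, &on))   // controller on always archives
	fmt.Println(shouldArchiveLogs(false, &on, &off)) // template off overrides workflow on
	fmt.Println(shouldArchiveLogs(false, &off, &on)) // template on enables archiving
}
```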
Testing
Next, create a Workflow to test it:
cat <<EOF | kubectl create -f -
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
EOF
Wait for Workflow to finish running
# k get po
NAME READY STATUS RESTARTS AGE
hello-world-6pl5w 0/2 Completed 0 53s
# k get wf
NAME STATUS AGE MESSAGE
hello-world-6pl5w Succeeded 55s
Go to S3 to see if the log archive file is available
As you can see, a log file has been stored in the specified bucket, named in the format $bucket/$workflowName/$stepName. A normal pipeline has multiple steps, and each step is stored in its own directory. The content is the Pod log, as follows:
_____________
< hello world >
-------------
\
\
\
## .
## ## ## ==
## ## ## ## ===
/""""""""""""""""___/ ===
~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ / ===- ~~~
\______ o __/
\ \ __/
\____\______/
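The key layout described above can be sketched as a small helper. The logObjectKey name and the trailing main.log file name are assumptions for illustration, not details confirmed by this article.

```go
package main

import (
	"fmt"
	"path"
)

// logObjectKey builds the object key under which a step's log is stored
// inside the bucket, following the $workflowName/$stepName layout
// described above ("main.log" is an assumed file name).
func logObjectKey(workflowName, stepName string) string {
	return path.Join(workflowName, stepName, "main.log")
}

func main() {
	fmt.Println(logObjectKey("hello-world-6pl5w", "hello-world-6pl5w"))
}
```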
5. Summary
The [ArgoWorkflow Series] is continuously updated; search for the public account [Explore Cloud Native] to subscribe and read more articles.
To summarize, this article covered three main points:
- 1) Enable GC to automatically clean up completed Workflow records and avoid occupying etcd space.
- 2) Enable pipeline archiving to store Workflow records in an external Postgres for easy history queries.
- 3) Enable Pod log archiving to store the Pod logs of each pipeline step in S3, so that logs can still be queried after the Pods are deleted.
For production use, it is generally recommended to enable the cleanup and archiving features; if everything is kept in etcd, cluster performance and stability will inevitably suffer.