In the previous article, we analyzed the relationship between Workflow, WorkflowTemplate, and template. This article analyzes how to use S3 artifact storage in argo-workflow to share files between steps.
This article addresses two main issues:
- 1) How to configure the artifact-repository
- 2) How to use artifacts in a Workflow
1. artifact-repository configuration
ArgoWorkflow relies on the artifact-repository configuration to interface with S3 for persistence.
There are three ways to set the relevant configuration:
- 1) Global configuration: write the S3 configuration into the workflow-controller's ConfigMap to specify the artifactRepository used globally. This has the lowest priority and can be overridden by the next two approaches.
- 2) Namespace default configuration: ArgoWorkflow looks in the Workflow's namespace for a default configuration for that namespace. This has the second-highest priority and overrides the global configuration.
  - Requirement: the Workflow's namespace must contain a ConfigMap named artifact-repositories, which is used as the configuration.
- 3) Configuration specified in the Workflow: you can also explicitly specify which artifact-repository to use in the Workflow itself; this has the highest priority.
Note 📢: Regardless of how the artifact-repository is specified, the Secret holding the S3 AK/SK must be synchronized to the namespace where the Workflow runs.
Priority: configuration in Workflow > namespace default > global
Global configuration
ArgoWorkflow deployed via helm specifies the configuration in this way by default.
The deployment YAML of the workflow-controller looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
name: argo-workflow-argo-workflows-workflow-controller
namespace: argo-dev
spec:
template:
metadata:
spec:
containers:
- args:
- --configmap
- argo-workflow-argo-workflows-workflow-controller-configmap
- --executor-image
- quay.io/argoproj/argoexec:v3.4.11
- --loglevel
- info
- --gloglevel
- "0"
- --log-format
- text
As you can see, the startup command uses --configmap argo-workflow-argo-workflows-workflow-controller-configmap to specify the ConfigMap from which the configuration is read.
The contents of this ConfigMap are as follows:
apiVersion: v1
data:
# ... omitted
artifactRepository: |
s3:
endpoint: :9000
bucket: argo
insecure: true
accessKeySecret:
name: my-s3-secret
key: accessKey
secretKeySecret:
name: my-s3-secret
key: secretKey
kind: ConfigMap
metadata:
name: argo-workflows-workflow-controller-configmap
namespace: argo
It contains the S3 endpoint, bucket, AK/SK, and other information; with this, a Workflow can access S3.
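The accessKeySecret and secretKeySecret fields reference a Kubernetes Secret by name; as noted above, that Secret has to exist in the namespace where the Workflow runs. A minimal sketch of creating it with kubectl, with placeholder credentials and an illustrative namespace:
# create the Secret referenced by accessKeySecret / secretKeySecret
# (placeholder values - substitute your real S3/MinIO credentials and namespace)
kubectl -n <workflow-namespace> create secret generic my-s3-secret \
  --from-literal=accessKey=<your-access-key> \
  --from-literal=secretKey=<your-secret-key>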
Namespace Default Configuration
According to the current implementation, ArgoWorkflow gives priority to the default artifactRepository configuration in the Workflow's namespace.
By default, a ConfigMap named artifact-repositories in that namespace is used as the artifactRepository configuration for Workflows in the namespace. Its contents look something like this:
Note: the ConfigMap name must be artifact-repositories.
apiVersion: v1
kind: ConfigMap
metadata:
# If you want to use this config map by default, name it "artifact-repositories". Otherwise, you can provide a reference to a
# different config map in artifactRepositoryRef.configMap.
name: artifact-repositories
annotations:
# v3.0 and after - if you want to use a specific key, put that key into this annotation.
workflows.argoproj.io/default-artifact-repository: my-artifact-repository
data:
my-artifact-repository: |
s3:
bucket: lixd-argo
endpoint: :9000
insecure: true
accessKeySecret:
name: my-s3-secret
key: accessKey
secretKeySecret:
name: my-s3-secret
key: secretKey
# It is possible to write more than one Repository
my-artifact-repository2: ...
Each key in data corresponds to a repository, and the workflows.argoproj.io/default-artifact-repository annotation specifies which one to use by default.
Here, for example, my-artifact-repository is specified as the default artifactRepository.
Specify the configuration in Workflow
In addition, you can specify which artifactRepository you want to use directly in Workflow.
spec:
artifactRepositoryRef:
configMap: my-artifact-repository # default is "artifact-repositories"
key: v2-s3-artifact-repository # default can be set by the `workflows.argoproj.io/default-artifact-repository` annotation in config map.
You need to specify both the ConfigMap and the key within it to locate a unique artifactRepository.
Argo only looks in the Workflow's own namespace, so make sure the ConfigMap exists there.
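Putting it together, a minimal sketch of a Workflow using artifactRepositoryRef; the ConfigMap name and key are the illustrative values from the snippet above and must exist in the Workflow's namespace:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-repo-ref-
spec:
  entrypoint: main
  # pick a specific repository entry from a ConfigMap in this namespace
  artifactRepositoryRef:
    configMap: my-artifact-repository
    key: v2-s3-artifact-repository
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        args: [echo, hello]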
Or write the S3 configuration directly to Workflow (not recommended), like this:
templates:
- name: artifact-example
inputs:
artifacts:
- name: my-input-artifact
path: /my-input-artifact
s3:
endpoint: s3.amazonaws.com
bucket: my-aws-bucket-name
key: path/in/bucket/my-input-artifact.tgz
accessKeySecret:
name: my-aws-s3-credentials
key: accessKey
secretKeySecret:
name: my-aws-s3-credentials
key: secretKey
outputs:
artifacts:
- name: my-output-artifact
path: /my-output-artifact
s3:
endpoint: storage.googleapis.com
bucket: my-gcs-bucket-name
# NOTE that, by default, all output artifacts are automatically tarred and
# gzipped before saving. So as a best practice, .tgz or .tar.gz
# should be incorporated into the key name so the resulting file
# has an accurate file extension.
key: path/in/bucket/my-output-artifact.tgz
accessKeySecret:
name: my-gcs-s3-credentials
key: accessKey
secretKeySecret:
name: my-gcs-s3-credentials
key: secretKey
region: my-GCS-storage-bucket-region
container:
image: debian:latest
command: [sh, -c]
args: ["cp -r /my-input-artifact /my-output-artifact"]
Similarly, the referenced credential Secrets must exist in the Workflow's namespace.
Wrap-up
Three ways are included:
- 1) Global Configuration
- 2) Namespace default configuration
- 3) Specify the configuration in Workflow
Note 📢: Since the S3 AK/SK is stored in a Secret, all three approaches require the Secret to be synchronized to the namespace where the Workflow runs; otherwise the Pod cannot read it and the Workflow will not run properly.
It would be nice if ArgoWorkflow handled this automatically; for now, a tool such as mittwald/kubernetes-replicator can be used to synchronize the Secret automatically.
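If running an extra controller is overkill, a one-off copy with kubectl also works; a rough sketch, assuming the Secret lives in the argo namespace and team-a is the namespace where the Workflow runs (both names are illustrative):
# copy the S3 credentials into the Workflow's namespace by re-creating the Secret there
kubectl -n team-a create secret generic my-s3-secret \
  --from-literal=accessKey="$(kubectl -n argo get secret my-s3-secret -o jsonpath='{.data.accessKey}' | base64 -d)" \
  --from-literal=secretKey="$(kubectl -n argo get secret my-s3-secret -o jsonpath='{.data.secretKey}' | base64 -d)"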
Differences between the three approaches:
- Global configuration: a single ConfigMap specifies the S3 information and every Workflow uses it. Simple, but not flexible.
- Namespace default configuration: lets you configure a different S3 for each namespace, but you need to create a ConfigMap in every namespace.
- Configuration specified in Workflow: the most flexible; different Workflows can use different S3 configurations, but you may end up creating many ConfigMaps.
Usage Scenarios:
If there is only one S3 configuration globally, the global configuration is the easiest.
If tenants are separated by namespace and use different S3 buckets, the namespace default configuration is a good fit.
If neither of the above applies, specify the configuration in the Workflow itself.
2. Using artifacts in Workflow
key-only-artifacts
When no S3 configuration is explicitly specified in the Workflow, argo looks up the artifact-repository configuration according to the priority described earlier:
the namespace configuration is preferred, and the global configuration is used if there is none.
A complete demo is shown below:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: artifact-passing-
spec:
entrypoint: artifact-example
templates:
- name: artifact-example
steps:
- - name: generate-artifact
template: whalesay
- - name: consume-artifact
template: print-message
arguments:
artifacts:
# bind message to the hello-art artifact
# generated by the generate-artifact step
- name: message
from: "{{-art}}"
- name: whalesay
container:
image: docker/whalesay:latest
command: [sh, -c]
args: ["cowsay hello world | tee /tmp/hello_world.txt"]
outputs:
artifacts:
# generate hello-art artifact from /tmp/hello_world.txt
# artifacts can be directories as well as files
- name: hello-art
path: /tmp/hello_world.txt
- name: print-message
inputs:
artifacts:
# unpack the message input artifact
# and put it at /tmp/message
- name: message
path: /tmp/message
container:
image: alpine:latest
command: [sh, -c]
args: ["cat /tmp/message"]
In the first step, a file is created with the tee command and exposed via outputs; because artifacts is specified, the file is stored in S3.
The second step then declares an input artifact named message, which is read from S3 and placed at /tmp/message.
So where does the artifact read in step 2 come from? It is the artifact passed in the steps arguments, bound by name.
The whole flow is basically the same as for parameters:
- 1) The whalesay template declares via outputs that it produces an artifact named hello-art.
- 2) The print-message template declares via inputs that it needs an artifact and where to place it.
- 3) The steps entry wires the two together: the argument artifact's source is the output of step 1, referenced with the {{steps.<step-name>.outputs.artifacts.<artifact-name>}} syntax.
If no artifact repository has been configured, the step fails with a status like the following:
artifact-passing-vzp2r-1469537892:
boundaryID: artifact-passing-vzp2r
displayName: generate-artifact
finishedAt: "2024-03-29T08:42:34Z"
hostNodeName: lixd-argo
id: artifact-passing-vzp2r-1469537892
message: 'Error (exit code 1): You need to configure artifact storage. More
information on how to do this can be found in the docs: https://argo-workflows.readthedocs.io/en/release-3.5/configure-artifact-repository/'
name: artifact-passing-vzp2r[0].generate-artifact
Artifact compression
By default, all artifacts are tarred and gzip-compressed. The archive field configures this behavior:
- Default behavior: tar + gzip
- Optionally disable tar + gzip entirely
- Or configure the gzip compression level
<... snipped ...>
outputs:
artifacts:
# default behavior - tar+gzip default compression.
- name: hello-art-1
path: /tmp/hello_world.txt
# disable archiving entirely - upload the file / directory as is.
# this is useful when the container layout matches the desired target repository layout.
- name: hello-art-2
path: /tmp/hello_world.txt
archive:
none: {}
# customize the compression behavior (disabling it here).
# this is useful for files with varying compression benefits,
# e.g. disabling compression for a cached build workspace and large binaries,
# or increasing compression for "perfect" textual data - like a json/xml export of a large database.
- name: hello-art-3
path: /tmp/hello_world.txt
archive:
tar:
# no compression (also accepts the standard gzip 1 to 9 values)
compressionLevel: 0
<... snipped ...>
Artifact Garbage Collection
All artifacts are uploaded to S3, so garbage collection matters to keep S3 from filling up.
The good news is that starting with argo-workflow 3.4, you can add configuration to a Workflow to automatically delete artifacts that are no longer needed.
Two GC strategies are currently offered:
- OnWorkflowCompletion: delete artifacts once the workflow has finished running
- OnWorkflowDeletion: delete artifacts only when the workflow is deleted
You can configure a GC strategy for all artifacts in the Workflow, or override it for each artifact individually.
The demo is as follows:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: artifact-gc-
spec:
entrypoint: main
artifactGC:
strategy: OnWorkflowDeletion # default Strategy set here applies to all Artifacts by default
templates:
- name: main
container:
image: argoproj/argosay:v2
command:
- sh
- -c
args:
- |
echo "can throw this away" > /tmp/
echo "keep this" > /tmp/
outputs:
artifacts:
- name: temporary-artifact
path: /tmp/
s3:
key:
- name: keep-this
path: /tmp/
s3:
key:
artifactGC:
strategy: Never # optional override for an Artifact
The core components are as follows:
spec:
entrypoint: main
# GC strategy applied to all artifacts in this Workflow
artifactGC:
strategy: OnWorkflowDeletion # default Strategy set here applies to all Artifacts by default
# per-artifact override of the GC strategy
outputs:
artifacts:
- name: temporary-artifact
artifactGC:
strategy: Never # optional override for an Artifact
Note: To avoid the problem of artifacts being mistakenly deleted when the same workflow runs concurrently, you can configure different artifact repositories for different workflows.
forceFinalizerRemoval
argo-workflow starts a Pod named <wfName>-artgc-* to perform garbage collection; if it fails, the whole Workflow is marked as failed.
Meanwhile, because of the finalizer, the Workflow is not actually deleted:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
finalizers:
- workflows.argoproj.io/artifact-gc
This prevents the Workflow from being deleted; the finalizer can be removed by running the following command:
kubectl patch workflow my-wf \
--type json \
--patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'
To optimize the experience, argo-workflow version 3.5 adds the forceFinalizerRemoval parameter.
spec:
artifactGC:
strategy: OnWorkflowDeletion
forceFinalizerRemoval: true
As long as forceFinalizerRemoval is set to true, the finalizer is removed even if GC fails.
Popular Artifacts Extensions
In addition to S3 Artifacts, argo-workflow has built-in git and http methods to retrieve artifacts for ease of use.
You can clone code directly from a specific git repository, or download files from a specific url, like this:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: hardwired-artifact-
spec:
entrypoint: hardwired-artifact
templates:
- name: hardwired-artifact
inputs:
artifacts:
# Check out the main branch of the argo repo and place it at /src
# revision can be anything that git checkout accepts: branch, commit, tag, etc.
- name: argo-source
path: /src
git:
repo: https://github.com/argoproj/argo-workflows.git
revision: "main"
# Download kubectl 1.8.0 and place it at /bin/kubectl
- name: kubectl
path: /bin/kubectl
mode: 0755
http:
url: https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl
# Copy an s3 compatible artifact repository bucket (such as AWS, GCS and MinIO) and place it at /s3
- name: objects
path: /s3
s3:
endpoint: s3.amazonaws.com
bucket: my-bucket-name
key: path/in/bucket
accessKeySecret:
name: my-s3-credentials
key: accessKey
secretKeySecret:
name: my-s3-credentials
key: secretKey
container:
image: debian
command: [sh, -c]
args: ["ls -l /src /bin/kubectl /s3"]
3. Demo
Test points:
- 1) Whether an artifact-repository configuration created in the Workflow's namespace can be used properly
- 2) Whether the S3 Secret can live only in the namespace where Argo is deployed, without being synchronized to the Workflow's namespace
For reference, the existing global configuration ConfigMap is:
- Name: argo-workflow-argo-workflows-workflow-controller-configmap
- Namespace: argo-dev
- Key: artifactRepository
Minio Preparation
Deploy the local-path-storage CSI (skip this step if you already have a default StorageClass):
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.24/deploy/local-path-storage.yaml
Then deploy minio:
helm install minio oci://registry-1.docker.io/bitnamicharts/minio
export ROOT_USER=$(kubectl get secret --namespace default minio -o jsonpath="{.data.root-user}" | base64 -d)
export ROOT_PASSWORD=$(kubectl get secret --namespace default minio -o jsonpath="{.data.root-password}" | base64 -d)
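The artifact-repository configuration below expects a bucket named argo, which has to be created up front. A sketch using the MinIO client (mc), assuming the helm release is named minio in the default namespace and MinIO's API port is forwarded locally:
# expose the MinIO API locally
kubectl port-forward --namespace default svc/minio 9000:9000 &

# register the endpoint and create the "argo" bucket used by the artifact repository
mc alias set local http://127.0.0.1:9000 "$ROOT_USER" "$ROOT_PASSWORD"
mc mb local/argo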
Configuring the artifact-repository
The complete content is below:
apiVersion: v1
kind: ConfigMap
metadata:
name: artifact-repositories
annotations:
workflows.argoproj.io/default-artifact-repository: my-artifact-repository
data:
my-artifact-repository: |
s3:
bucket: argo
endpoint: :9000
insecure: true
accessKeySecret:
name: my-s3-secret
key: accessKey
secretKeySecret:
name: my-s3-secret
key: secretKey
The Secret holding the AK/SK is as follows:
apiVersion: v1
stringData:
accessKey: admin
secretKey: minioadmin
kind: Secret
metadata:
name: my-s3-secret
type: Opaque
Create the artifact-repository ConfigMap and the Secret:
kubectl apply -f
kubectl apply -f
Using artifacts in Workflow
Two steps:
- generate: generates a file and writes it to S3 as an output artifact
- consume: reads the file back from S3 and prints its contents
The complete content is below:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: key-only-artifacts-
spec:
entrypoint: main
templates:
- name: main
dag:
tasks:
- name: generate
template: generate
- name: consume
template: consume
dependencies:
- generate
- name: generate
container:
image: argoproj/argosay:v2
args: [ echo, hello, /mnt/file ]
outputs:
artifacts:
- name: file
path: /mnt/file
s3:
key: my-file
- name: consume
container:
image: argoproj/argosay:v2
args: [cat, /tmp/file]
inputs:
artifacts:
- name: file
path: /tmp/file
s3:
key: my-file
Creating a Workflow
kubectl create -f
Waiting for the run to complete
[root@lixd-argo artiface]# kubectl get wf
NAME STATUS AGE MESSAGE
key-only-artifacts-9r84h Succeeded 2m30s
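To confirm that the consume step really read the file back from S3, check its logs; a sketch assuming the argo CLI is installed (plain kubectl logs on the consume Pod works just as well):
# the consume step should print the file content written by the generate step ("hello")
argo logs key-only-artifacts-9r84h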
Check the file in S3
Open S3 to see whether the file exists.
As you can see, under the argo bucket there is a file named my-file, and its content-type is application/gzip, which confirms that argo tars and gzips artifacts before uploading them.
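If you prefer the command line over the MinIO console, the same check can be done with mc, reusing the local alias created earlier:
# object metadata - ContentType should be application/gzip
mc stat local/argo/my-file

# download and unpack the artifact to inspect its content
mc cp local/argo/my-file ./my-file.tgz
tar -xzvf my-file.tgz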
4. Summary
[ArgoWorkflow series] is continuously updated; follow the public account [Explore Cloud Native] to read more articles.
This article analyzed the use of artifacts in argo, including the three ways to configure the artifact-repository:
- 1) Global Configuration
- 2) Namespace default configuration
- 3) Specify the configuration in Workflow
as well as how to use artifacts in a Workflow, demonstrated with a demo.