
ArgoWorkflow Tutorial (III) - Using Artifacts for Inter-Step File Sharing


In the previous article, we analyzed the relationship between Workflow, WorkflowTemplate, and template. This article looks at how to use S3-backed artifacts in argo-workflow to share files between steps.

This article addresses two main questions:

  • 1) How to configure an artifact-repository
  • 2) How to use artifacts in a Workflow

1. artifact-repository configuration

ArgoWorkflow persists artifacts to S3, and this relies on the artifact-repository configuration.

There are three ways to set the relevant configuration:

  • 1) Global configuration: write the S3 configuration into the workflow-controller deployment via a configuration file, specifying the artifactRepository to be used globally. This has the lowest priority and can be overridden by the next two approaches.
  • 2) Namespace default configuration: ArgoWorkflow looks in the Workflow's namespace for a default configuration for that namespace. This has the second priority and overrides the global configuration.
    • Requirement: a Configmap named artifact-repositories must exist in the Workflow's namespace; it is used as the configuration.
  • 3) Configuration specified in the Workflow: you can also explicitly specify in the Workflow which artifact-repository to use; this has the highest priority.

Note 📢: Regardless of how the artifact-repository is specified, the Secret that stores the S3 AK/SK must exist in the Workflow's namespace; otherwise the Pods cannot read it.

Priority: in-Workflow configuration > namespace default > global configuration

Global Configuration

When ArgoWorkflow is deployed with Helm, the configuration is specified in this form by default.

The workflow-controller Deployment YAML looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argo-workflow-argo-workflows-workflow-controller
  namespace: argo-dev
spec:
  template:
    metadata:
    spec:
      containers:
      - args:
        - --configmap
        - argo-workflow-argo-workflows-workflow-controller-configmap
        - --executor-image
        - quay.io/argoproj/argoexec:v3.4.11
        - --loglevel
        - info
        - --gloglevel
        - "0"
        - --log-format
        - text

As you can see, the startup command uses --configmap argo-workflow-argo-workflows-workflow-controller-configmap to specify the Configmap that the configuration file comes from.

The contents of this Configmap are as follows:

apiVersion: v1
data:
  # ... omitted
  artifactRepository: |
    s3:
      endpoint: :9000
      bucket: argo
      insecure: true
      accessKeySecret:
        name: my-s3-secret
        key: accessKey
      secretKeySecret:
        name: my-s3-secret
        key: secretKey
kind: ConfigMap
metadata:
  name: argo-workflows-workflow-controller-configmap
  namespace: argo

It includes the S3 endpoint, bucket, AK/SK, and other details. With this information, Workflows can access S3.

Namespace Default Configuration

According to the current implementation, ArgoWorkflow gives priority to the default artifactRepository configuration found in the Workflow's namespace.

By default, a Configmap named artifact-repositories in the current namespace is used as the artifactRepository configuration for Workflows in that namespace. Its contents look something like this:

Note: The Configmap name must be artifact-repositories.

apiVersion: v1
kind: ConfigMap
metadata:
  # If you want to use this config map by default, name it "artifact-repositories". Otherwise, you can provide a reference to a
  # different config map in `artifactRepositoryRef.configMap`.
  name: artifact-repositories
  annotations:
    # v3.0 and after - if you want to use a specific key, put that key into this annotation.
    workflows.argoproj.io/default-artifact-repository: my-artifact-repository
data:
  my-artifact-repository: |
    s3:
      bucket: lixd-argo
      endpoint: :9000
      insecure: true
      accessKeySecret:
        name: my-s3-secret
        key: accessKey
      secretKeySecret:
        name: my-s3-secret
        key: secretKey
  # It is possible to write more than one Repository
  my-artifact-repository2: ...

Each key under data corresponds to a repository, and the workflows.argoproj.io/default-artifact-repository annotation specifies which artifactRepository to use by default.

For example, this specifies my-artifact-repository as the default artifactRepository.

Specify the configuration in Workflow

In addition, you can specify which artifactRepository you want to use directly in Workflow.

spec:
  artifactRepositoryRef:
    configMap: my-artifact-repository # default is "artifact-repositories"
    key: v2-s3-artifact-repository # default can be set by the `workflows.argoproj.io/default-artifact-repository` annotation in the config map.

You need to specify both the Configmap and the key so that a unique artifactRepository can be found.

The lookup only happens in the Workflow's own namespace, so you need to make sure this Configmap exists there.
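For reference, here is a minimal sketch of a complete Workflow using artifactRepositoryRef. It assumes a ConfigMap named my-artifact-repository with a key v2-s3-artifact-repository already exists in the Workflow's namespace (both names are just examples):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-repo-ref-
spec:
  entrypoint: main
  # highest priority: explicitly select the repository to use
  artifactRepositoryRef:
    configMap: my-artifact-repository    # ConfigMap in the Workflow's namespace
    key: v2-s3-artifact-repository       # key inside that ConfigMap
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        args: [ echo, hello, /mnt/file ]
      outputs:
        artifacts:
          # uploaded to the repository selected by artifactRepositoryRef
          - name: file
            path: /mnt/file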

Alternatively, write the S3 configuration directly in the Workflow (not recommended), like this:

  templates:
  - name: artifact-example
    inputs:
      artifacts:
      - name: my-input-artifact
        path: /my-input-artifact
        s3:
          endpoint: s3.amazonaws.com
          bucket: my-aws-bucket-name
          key: path/in/bucket/
          accessKeySecret:
            name: my-aws-s3-credentials
            key: accessKey
          secretKeySecret:
            name: my-aws-s3-credentials
            key: secretKey
    outputs:
      artifacts:
      - name: my-output-artifact
        path: /my-output-artifact
        s3:
          endpoint: storage.googleapis.com
          bucket: my-gcs-bucket-name
          # NOTE that, by default, all output artifacts are automatically tarred and
          # gzipped before saving. So as a best practice, .tgz or .tar.gz
          # should be incorporated into the key name so the resulting file
          # has an accurate file extension.
          key: path/in/bucket/
          accessKeySecret:
            name: my-gcs-s3-credentials
            key: accessKey
          secretKeySecret:
            name: my-gcs-s3-credentials
            key: secretKey
          region: my-GCS-storage-bucket-region
    container:
      image: debian:latest
      command: [sh, -c]
      args: ["cp -r /my-input-artifact /my-output-artifact"]


Wrap-up

There are three configuration methods:

  • 1) Global Configuration
  • 2) Namespace default configuration
  • 3) Specify the configuration in Workflow

Note 📢: Since the S3 AK/SK is stored in a Secret, all three approaches require that the Secret be synchronized to the namespace the Workflow runs in; otherwise it will not be available in the Pod and the Workflow will not run properly.

ArgoWorkflow does not synchronize the Secret for you; a tool such as mittwald/kubernetes-replicator can be used to replicate it across namespaces automatically, as sketched below.
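As a sketch (assuming kubernetes-replicator is installed in the cluster and its push-based replicate-to annotation is enabled), the Secret could be annotated so that it is copied into the namespaces where Workflows run:

apiVersion: v1
kind: Secret
metadata:
  name: my-s3-secret
  namespace: argo-dev
  annotations:
    # assumption: kubernetes-replicator watches this annotation and copies
    # the Secret into the listed namespaces
    replicator.v1.mittwald.de/replicate-to: "tenant-a,tenant-b"
type: Opaque
stringData:
  accessKey: admin
  secretKey: minioadmin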

Differences between the three approaches

  • Global configuration: a single Configmap specifies the S3 information and all Workflows use it. Simple, but not very flexible.
  • Namespace default configuration: lets you configure a different S3 for each namespace, but you need to create a Configmap in every namespace.
  • Specifying the configuration in the Workflow: the most flexible approach; each Workflow can use a different S3, but you may end up creating many Configmaps.

Usage Scenarios

If there is only one S3 configuration globally, it is easiest to use the global configuration method.

If tenants are separated by namespace and each uses a different S3, the namespace default configuration works well.

If neither of those fits, specifying the configuration in the Workflow is recommended.

2. Using artifacts in Workflow

key-only-artifacts

When no S3 configuration is explicitly specified in the Workflow, argo automatically looks up the artifact-repository configuration according to the priority described earlier.

The namespace-level configuration is preferred; if there is none, the global configuration is used.

A complete demo is shown below:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind message to the hello-art artifact
          # generated by the generate-artifact step
          - name: message
            from: "{{-art}}"

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # generate hello-art artifact from /tmp/hello_world.txt
      # artifacts can be directories as well as files
      - name: hello-art
        path: /tmp/hello_world.txt

  - name: print-message
    inputs:
      artifacts:
      # unpack the message input artifact
      # and put it at /tmp/message
      - name: message
        path: /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]

In the first step, a file is created with the tee command and declared as an output; because it is declared under outputs.artifacts, the file is uploaded to S3.

The second step then declares an input artifact named message, which is fetched from S3 and placed at /tmp/message.

Where does the artifact read in step 2 come from? It is the artifact passed in the steps arguments, associated with the output of step 1 by name.

The overall logic is essentially the same as passing parameters:

  • 1) The whalesay template declares in outputs.artifacts that it produces an artifact.

  • 2) The print-message template declares in inputs.artifacts that it needs an artifact and specifies where to place it.

  • 3) The steps section passes the artifact, whose source is the output from 1), referenced with the {{steps.<step-name>.outputs.artifacts.<artifact-name>}} syntax.

If no artifact-repository has been configured, the step fails and the node status reports an error like this:

    artifact-passing-vzp2r-1469537892:
      boundaryID: artifact-passing-vzp2r
      displayName: generate-artifact
      finishedAt: "2024-03-29T08:42:34Z"
      hostNodeName: lixd-argo
      id: artifact-passing-vzp2r-1469537892
      message: 'Error (exit code 1): You need to configure artifact storage. More
        information on how to do this can be found in the docs: https://argo-workflows.readthedocs.io/en/release-3.5/configure-artifact-repository/'
      name: artifact-passing-vzp2r[0].generate-artifact

artifact compression

By default, all artifacts are tarred and gzip-compressed. The archive field configures this behavior:

  • Default behavior: tar + gzip
  • Optionally disable tar + gzip entirely
  • Or configure the gzip compression level
<... snipped ...>
    outputs:
      artifacts:
        # default behavior - tar+gzip default compression.
      - name: hello-art-1
        path: /tmp/hello_world.txt

        # disable archiving entirely - upload the file / directory as is.
        # this is useful when the container layout matches the desired target repository layout.   
      - name: hello-art-2
        path: /tmp/hello_world.txt
        archive:
          none: {}

        # customize the compression behavior (disabling it here).
        # this is useful for files with varying compression benefits, 
        # e.g. disabling compression for a cached build workspace and large binaries,
        # or increasing compression for "perfect" textual data - like a json/xml export of a large database.
      - name: hello-art-3
        path: /tmp/hello_world.txt
        archive:
          tar:
            # no compression (also accepts the standard gzip 1 to 9 values)
            compressionLevel: 0
<... snipped ...>

Artifact Garbage Collection

All artifacts are uploaded to S3, so garbage collection is needed to keep S3 from filling up.

The good news is that starting with argo-workflow 3.4, it is possible to add configurations to Workflow to enable automatic deletion of unneeded Artifacts.

Two recycling strategies are currently offered:

  • OnWorkflowCompletion: delete the artifact once the workflow has finished running
  • OnWorkflowDeletion: delete the artifact only when the workflow is deleted

You can configure a recycling policy for all artifacts in the Workflow, and you can also override it for individual artifacts.

The demo is as follows:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion  # default Strategy set here applies to all Artifacts by default
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/
            echo "keep this" > /tmp/
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact.txt
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this.txt
            artifactGC:
              strategy: Never   # optional override for an Artifact

The core components are as follows:

spec:
  entrypoint: main
  # Workflow-level configuration that applies to all artifacts by default
  artifactGC:
    strategy: OnWorkflowDeletion # default Strategy set here applies to all Artifacts by default

# per-artifact override of the recycling strategy
      outputs:
        artifacts:
          - name: temporary-artifact
            artifactGC:
              strategy: Never # optional override for an Artifact

Note: to avoid artifacts being deleted by mistake when the same Workflow runs concurrently, you can configure different artifact repositories for different Workflows (see the sketch below for an alternative).
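Another option, sketched below, is to keep a single repository but give each run its own key prefix using a workflow variable such as {{workflow.name}} (this assumes variable substitution is supported in artifact keys, as in argo's examples), so GC for one run cannot touch another run's objects:

      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              # each run writes under its own prefix, so concurrent runs
              # do not overwrite or garbage-collect each other's objects
              key: "{{workflow.name}}/temporary-artifact.txt"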

forceFinalizerRemoval

argo-workflow starts Pods named in the format <wfName>-artgc-* to perform the garbage collection; if they fail, the whole Workflow is marked as failed.

Meanwhile, the following finalizer has not been removed:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  finalizers:
  - workflows.argoproj.io/artifact-gc

This causes the Workflow to be stuck and unable to be deleted. The finalizer can be removed by running the following command:

kubectl patch workflow my-wf \
    --type json \
    --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'

To optimize the experience, argo-workflow version 3.5 adds the forceFinalizerRemoval parameter.

spec:
  artifactGC:
    strategy: OnWorkflowDeletion 
    forceFinalizerRemoval: true

As long as forceFinalizerRemoval is set to true, the finalizer will be removed even if GC fails.

Popular Artifacts Extensions

In addition to S3 artifacts, argo-workflow has built-in support for git and HTTP artifacts for convenience.

You can clone code directly from a specific git repository, or download files from a specific url, like this:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hardwired-artifact-
spec:
  entrypoint: hardwired-artifact
  templates:
  - name: hardwired-artifact
    inputs:
      artifacts:
      # Check out the main branch of the argo repo and place it at /src
      # revision can be anything that git checkout accepts: branch, commit, tag, etc.
      - name: argo-source
        path: /src
        git:
          repo: https://github.com/argoproj/argo-workflows.git
          revision: "main"
      # Download kubectl 1.8.0 and place it at /bin/kubectl
      - name: kubectl
        path: /bin/kubectl
        mode: 0755
        http:
          url: https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl
      # Copy an s3 compatible artifact repository bucket (such as AWS, GCS and MinIO) and place it at /s3
      - name: objects
        path: /s3
        s3:
          endpoint: s3.amazonaws.com
          bucket: my-bucket-name
          key: path/in/bucket
          accessKeySecret:
            name: my-s3-credentials
            key: accessKey
          secretKeySecret:
            name: my-s3-credentials
            key: secretKey
    container:
      image: debian
      command: [sh, -c]
      args: ["ls -l /src /bin/kubectl /s3"]

3. Demo

Test points:

  • 1) Whether a configuration created in the Workflow's own namespace can be used properly
  • 2) Whether the S3 configuration can live only in the namespace where Argo is deployed, without being synchronized to the Workflow's namespace

Configmap:

  • Name: argo-workflow-argo-workflows-workflow-controller-configmap
  • Namespace: argo-dev
  • Key: artifactRepository

Minio Preparation

Deploy the local-path-storage CSI provisioner (skip this step if you already have one):

kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.24/deploy/local-path-storage.yaml

Then deploy minio

helm install minio oci://registry-1.docker.io/bitnamicharts/minio

export ROOT_USER=$(kubectl get secret --namespace default my-release-minio -o jsonpath="{.data.root-user}" | base64 -d)
export ROOT_PASSWORD=$(kubectl get secret --namespace default my-release-minio -o jsonpath="{.data.root-password}" | base64 -d)

Configuring the artifact-repository

The full ConfigMap is below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-repositories
  annotations:
    workflows.argoproj.io/default-artifact-repository: my-artifact-repository
data:
  my-artifact-repository: |
    s3:
      bucket: argo
      endpoint: :9000
      insecure: true
      accessKeySecret:
        name: my-s3-secret
        key: accessKey
      secretKeySecret:
        name: my-s3-secret
        key: secretKey

The Secret holding the AK/SK is below:

apiVersion: v1
stringData:
  accessKey: admin
  secretKey: minioadmin
kind: Secret
metadata:
  name: my-s3-secret
type: Opaque

Create the artifact-repository configuration (save the two manifests above to files and apply them):

kubectl apply -f artifact-repositories.yaml
kubectl apply -f my-s3-secret.yaml

Using artifacts in Workflow

Two steps:

  • generate: generates a file and uploads it to S3 via an output artifact
  • consume: reads the file from S3 via an input artifact and prints its contents

The complete content is below:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: key-only-artifacts-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: generate
            template: generate
          - name: consume
            template: consume
            dependencies:
              - generate
    - name: generate
      container:
        image: argoproj/argosay:v2
        args: [ echo, hello, /mnt/file ]
      outputs:
        artifacts:
          - name: file
            path: /mnt/file
            s3:
              key: my-file
    - name: consume
      container:
        image: argoproj/argosay:v2
        args: [cat, /tmp/file]
      inputs:
        artifacts:
          - name: file
            path: /tmp/file
            s3:
              key: my-file

Create the Workflow (save the manifest above to a file first):

kubectl create -f key-only-artifacts.yaml

Waiting for the run to complete

[root@lixd-argo artiface]# kubectl get wf
NAME                                  STATUS      AGE     MESSAGE
key-only-artifacts-9r84h              Succeeded   2m30s

Viewing the file in S3

Go to S3 to see if the file exists

As you can see, a file named my-file exists under the argo bucket, and its content-type is application/gzip, which verifies that argo tars and gzips the artifact before uploading it.

(screenshot: the my-file artifact in the argo bucket)

4. Summary


The [ArgoWorkflow Series] is continuously updated; search for the WeChat official account [Explore Cloud Native] to subscribe and read more articles.


This article analyzed the use of artifacts in argo, including how to configure the artifact-repository.

Three configuration methods were covered:

  • 1) Global Configuration
  • 2) Namespace default configuration
  • 3) Specify the configuration in Workflow

It also showed how to use artifacts in a Workflow, demonstrated with a small demo.