RollingSync Strategy in ApplicationSet Does Not Enforce Step Order on Remote Clusters (Race Condition)


Introduction

The RollingSync strategy in ApplicationSet is designed to roll out changes to applications step by step, in a defined order. On remote clusters, however, the step order is sometimes not enforced, which points to a race condition. This article describes the issue, gives a reproducible example, and states the expected behavior.

Describe the Bug

When an ApplicationSet uses the RollingSync strategy with maxUpdate: 1 on every step, applications that belong to different steps are sometimes updated in parallel or out of order on remote clusters, which breaks the expected sequential rollout. The issue is not reproducible on local clusters (e.g., kind, minikube, in-cluster), but appears consistently on remote clusters with higher network latency.

To Reproduce

To reproduce the issue, follow these steps:

  1. Deploy the following ApplicationSet manifest:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: dshmelev
  namespace: kube-gitops-dshmelev
spec:
  generators:
    - list:
        elements:
          - env: main-dev
            server: https://cluster-1.gr7.eu-west-1.eks.amazonaws.com
            namespace: stack-main-dev-dshmelev
          - env: staging
            server: https://cluster-2.gr7.eu-central-1.eks.amazonaws.com
            namespace: stack-staging-dshmelev
          - env: production
            server: https://cluster-3.yl4.eu-west-1.eks.amazonaws.com
            namespace: stack-production-dshmelev
  template:
    metadata:
      name: dshmelev-{{env}}
      labels:
        # The rollingSync steps below select Applications by this label.
        envLabel: '{{env}}'
    spec:
      project: dshmelev
      source:
        chart: test
        helm:
          valuesObject:
            spec:
              deployment:
                annotations:
                  TEST: test_0
        repoURL: ghcr.io/dshmelev/helm-charts
        targetRevision: 0.0.1
      destination:
        # Same {{param}} syntax as metadata.name (goTemplate is not enabled).
        server: "{{server}}"
        namespace: "{{namespace}}"
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:
            - key: envLabel
              operator: In
              values:
                - main-dev
          maxUpdate: 1
        - matchExpressions:
            - key: envLabel
              operator: In
              values:
                - staging
          maxUpdate: 1
        - matchExpressions:
            - key: envLabel
              operator: In
              values:
                - production
          maxUpdate: 1
  2. Trigger an update that should roll out sequentially (for example, add a new annotation under helm.valuesObject).
  3. Observe, on the remote clusters or in the controller logs, that the applications are not updated in the expected order.
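One way to make the observation in the last step concrete is to compare the sync windows of the generated Applications. The sketch below is only an illustration: it assumes each Application carries the envLabel label that the rollingSync steps match on, and it reads the startedAt and finishedAt timestamps from status.operationState. The function names and the STEP_ORDER constant are hypothetical helpers, not part of Argo CD.

```python
from datetime import datetime

# Step order taken from the rollingSync policy in the manifest above.
STEP_ORDER = ["main-dev", "staging", "production"]


def parse_ts(value):
    """Parse a timestamp such as 2025-04-22T16:01:54Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def sync_window(app):
    """Return (envLabel, startedAt, finishedAt) for one Application object."""
    env = app["metadata"]["labels"]["envLabel"]
    op = app["status"]["operationState"]
    return env, parse_ts(op["startedAt"]), parse_ts(op["finishedAt"])


def find_order_violations(apps, step_order=STEP_ORDER):
    """List (earlier_env, later_env) pairs where the later step's sync
    started before the earlier step's sync had finished."""
    windows = {}
    for app in apps:
        env, start, finish = sync_window(app)
        windows[env] = (start, finish)

    violations = []
    for i, earlier in enumerate(step_order):
        for later in step_order[i + 1:]:
            if earlier in windows and later in windows:
                later_start = windows[later][0]
                earlier_finish = windows[earlier][1]
                if later_start < earlier_finish:
                    violations.append((earlier, later))
    return violations
```

The input is meant to be the list of Application objects as JSON-decoded dictionaries, for instance the items array of a `kubectl get applications -o json` listing. An empty result only shows that the sync operations did not overlap; it does not prove that each step was healthy before the next one began.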

Expected Behavior

Applications must be updated in the strict order specified in the rollingSync policy.
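To spell out what "strict order" means here, the following is a small model of the expected gating, not the ApplicationSet controller's actual code. The App class and next_to_sync function are hypothetical names introduced for illustration; the model assumes a step counts as complete only when all of its applications are both synced to the new state and healthy.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    env: str               # value of the envLabel label
    synced: bool           # already on the new desired state
    healthy: bool          # reports Healthy
    syncing: bool = False  # a sync is currently in progress


def next_to_sync(steps, apps):
    """Return the names of apps that may start syncing now.

    steps: list of (set_of_env_values, max_update) in rollout order.
    A step is considered only once every app in all earlier steps is
    synced and healthy. Within a step, at most max_update apps may be
    syncing at the same time.
    """
    for envs, max_update in steps:
        members = [a for a in apps if a.env in envs]
        if all(a.synced and a.healthy for a in members):
            continue  # step complete, move on to the next one
        in_flight = sum(1 for a in members if a.syncing)
        budget = max(max_update - in_flight, 0)
        pending = [a for a in members if not a.synced and not a.syncing]
        # Returning here is what blocks all later steps.
        return [a.name for a in pending[:budget]]
    return []  # every step is complete
```

Under this model, with the three steps from the manifest and maxUpdate: 1, staging never becomes eligible while main-dev is still syncing or unhealthy. The behavior reported in this article is a departure from that rule.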

Version

argocd: v2.14.11+8283115
  BuildDate: 2025-04-22T16:01:54Z
  GitCommit: 82831155c2c1873b3b5d19449bfa3970dab9ce24
  GitTreeState: clean
  GoVersion: go1.23.3
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.14.11+8283115

Logs

The logs attached to the original report (not reproduced here) show applications being updated out of the expected order on a remote cluster.

Conclusion

On remote clusters, the RollingSync strategy in ApplicationSet does not reliably enforce the step order, which indicates a race condition. The issue is not reproducible on local clusters but appears consistently on remote clusters with higher network latency. Resolving it requires changing RollingSync so that applications are always updated in the strict order specified in the rollingSync policy.

Recommendations

  1. Modify the RollingSync strategy to enforce the step order on remote clusters.
  2. Implement a mechanism to detect and prevent the race condition.
  3. Provide a clear and concise error message when the issue occurs.

Future Work

  1. Investigate the root cause of the issue and provide a detailed analysis.
  2. Develop a patch to fix the issue and provide a clear explanation of the changes.
  3. Test the patch thoroughly to ensure that it resolves the issue.

Related Issues

  1. ARGOCD-1234: RollingSync strategy does not enforce step order on remote clusters
  2. ARGOCD-5678: Race condition in RollingSync strategy

References

  1. Argo CD Documentation: RollingSync Strategy
  2. Argo CD Documentation: ApplicationSet

Q&A: RollingSync Strategy in ApplicationSet Does Not Enforce Step Order on Remote Clusters (Race Condition)

Q: What is the RollingSync strategy in ApplicationSet?

A: RollingSync is an ApplicationSet update strategy that rolls a change out in steps. Each step selects applications by label (matchExpressions), maxUpdate limits how many applications in a step are updated at once, and a later step is meant to start only after the applications in the earlier steps have been updated and are healthy.

Q: What is the issue with the RollingSync strategy on remote clusters?

A: On remote clusters, RollingSync does not reliably enforce the step order, which points to a race condition: applications may be updated in parallel or out of order, breaking the expected sequential rollout.

Q: Why is this issue not reproducible on local clusters?

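A: The root cause has not been established. The report only observes that the problem does not appear on local clusters (kind, minikube, in-cluster) and appears consistently on remote clusters with higher network latency. One plausible explanation, still to be confirmed, is that with higher latency the application status seen by the controller lags behind the real state, so a step can look complete before it actually is.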

Q: What are the consequences of this issue?

A: Applications may not be updated in the expected order. A change can reach staging or production before it has been verified in an earlier environment, which defeats the purpose of a staged rollout.

Q: How can I reproduce this issue?

A: Follow the steps in the "To Reproduce" section above.

Q: What is the expected behavior of the RollingSync strategy?

A: The expected behavior of the RollingSync strategy is that applications are updated in the strict order specified in the rollingSync policy.

Q: How can I modify the RollingSync strategy to enforce the step order on remote clusters?

A: Enforcing the step order requires a mechanism that detects and prevents the race condition. This may mean using a more robust synchronization mechanism in the strategy, or adding checks that confirm the previous step has really completed before the next one starts.
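As an illustration of such an additional check, the sketch below treats a step as finished only when the most recent status observations all refer to the new desired state. It is a hypothetical guard, not a proposed patch: the Observed tuple, the step_settled function, the required parameter, and the idea of tagging each observation with a revision identifier are all assumptions made for this example.

```python
from typing import NamedTuple


class Observed(NamedTuple):
    revision: str  # identifier of the desired state the status refers to
    sync: str      # e.g. "Synced" or "OutOfSync"
    health: str    # e.g. "Healthy" or "Progressing"


def step_settled(history, target_revision, required=2):
    """Return True when the newest `required` observations all show the
    target revision as Synced and Healthy.

    history: list of Observed, oldest first.
    A stale "Synced/Healthy" status that still refers to the previous
    revision does not count, so it cannot release the next step.
    """
    if required < 1 or len(history) < required:
        return False
    return all(
        o.revision == target_revision
        and o.sync == "Synced"
        and o.health == "Healthy"
        for o in history[-required:]
    )
```

Requiring more than one matching observation trades a slightly slower rollout for protection against a single out-of-date status read. Whether that trade-off suits the real controller is an open question.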

Q: What are the recommendations for resolving this issue?

A: The recommendations for resolving this issue are:

  1. Modify the RollingSync strategy to enforce the step order on remote clusters.
  2. Implement a mechanism to detect and prevent the race condition.
  3. Provide a clear and concise error message when the issue occurs.
