RollingSync Strategy in ApplicationSet Does Not Enforce Step Order on Remote Clusters (Race Condition)
Introduction
The RollingSync strategy in ApplicationSet is designed to update applications step by step: each step selects applications by label, and the next step should begin only after the applications in the current step have been updated and are healthy. On remote clusters, however, the step order is not always enforced, which points to a race condition. This article describes the issue, gives a reproducible example, and states the expected behavior.
Describe the Bug
When using the RollingSync strategy with maxUpdate: 1 in ApplicationSet, applications that belong to different steps are sometimes updated in parallel or out of order on remote clusters. This breaks the expected sequential rollout order. The issue does not reproduce on local clusters (e.g., kind, minikube, in-cluster) but appears consistently on remote clusters with higher network latency.
To Reproduce
To reproduce the issue, follow these steps:
- Deploy the following ApplicationSet manifest:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: dshmelev
  namespace: kube-gitops-dshmelev
spec:
  generators:
    - list:
        elements:
          - env: main-dev
            server: https://cluster-1.gr7.eu-west-1.eks.amazonaws.com
            namespace: stack-main-dev-dshmelev
          - env: staging
            server: https://cluster-2.gr7.eu-central-1.eks.amazonaws.com
            namespace: stack-staging-dshmelev
          - env: production
            server: https://cluster-3.yl4.eu-west-1.eks.amazonaws.com
            namespace: stack-production-dshmelev
  template:
    metadata:
      name: dshmelev-{{env}}
    spec:
      project: dshmelev
      source:
        chart: test
        helm:
          valuesObject:
            spec:
              deployment:
                annotations:
                  TEST: test_0
        repoURL: ghcr.io/dshmelev/helm-charts
        targetRevision: 0.0.1
      destination:
        server: "{{.server}}"
        namespace: "{{.namespace}}"
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:
            - key: envLabel
              operator: In
              values:
                - main-dev
          maxUpdate: 1
        - matchExpressions:
            - key: envLabel
              operator: In
              values:
                - staging
          maxUpdate: 1
        - matchExpressions:
            - key: envLabel
              operator: In
              values:
                - production
          maxUpdate: 1
- Trigger an update that should roll out sequentially (for example, add a new annotation under helm.valuesObject).
- Observe, on the remote clusters or in the logs, that the applications are not updated in the expected order.
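For the last step, one way to observe the order is to compare the sync timestamps that Argo CD records on each Application. The sketch below assumes the Application list has been exported as JSON, for example with kubectl get applications.argoproj.io -n kube-gitops-dshmelev -o json, and reads status.operationState.startedAt and finishedAt from each item. The helper names parse_ts and sync_windows are hypothetical.

```python
import json
from datetime import datetime


def parse_ts(value):
    """Parse an RFC 3339 timestamp such as 2025-04-22T16:01:54Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sync_windows(app_list_json):
    """Return (name, started, finished) per Application, sorted by start time.

    Applications without a recorded operation are skipped; `finished` is
    None while an operation is still running.
    """
    windows = []
    for item in json.loads(app_list_json).get("items", []):
        op = item.get("status", {}).get("operationState")
        if not op or "startedAt" not in op:
            continue
        finished = op.get("finishedAt")
        windows.append((
            item["metadata"]["name"],
            parse_ts(op["startedAt"]),
            parse_ts(finished) if finished else None,
        ))
    return sorted(windows, key=lambda w: w[1])
```

If the rollout respected the step order, the returned list would show dshmelev-main-dev first, then dshmelev-staging, then dshmelev-production, with each start time at or after the previous finish time.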
Expected Behavior
Applications must be updated in the strict order specified in the rollingSync steps: main-dev first, then staging, then production.
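Read as a checkable property, strict order means that no application in a later step starts syncing before every application in all earlier steps has finished. A minimal sketch of that check follows. The function name, the data layout, and the numeric timestamps are illustrative assumptions and are not part of Argo CD.

```python
def order_violations(steps, windows):
    """List (later_app, earlier_app) pairs that break the step order.

    steps:   list of steps, each a list of application names, in rollout order.
    windows: dict mapping name -> (started, finished); finished may be None
             if the sync has not completed yet.
    """
    violations = []
    for i, earlier_step in enumerate(steps):
        for later_step in steps[i + 1:]:
            for later in later_step:
                if later not in windows:
                    continue  # later app has not started; nothing to check
                later_start = windows[later][0]
                for earlier in earlier_step:
                    finished = windows.get(earlier, (None, None))[1]
                    # A later app must not start until the earlier one is done.
                    if finished is None or later_start < finished:
                        violations.append((later, earlier))
    return violations


steps = [["main-dev"], ["staging"], ["production"]]
racy = {"main-dev": (0, 10), "staging": (1, 12), "production": (20, 30)}
print(order_violations(steps, racy))  # [('staging', 'main-dev')]
```

An empty result means the observed rollout matched the expected behavior.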
Version
argocd: v2.14.11+8283115
  BuildDate: 2025-04-22T16:01:54Z
  GitCommit: 82831155c2c1873b3b5d19449bfa3970dab9ce24
  GitTreeState: clean
  GoVersion: go1.23.3
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.14.11+8283115
Logs
The logs provided demonstrate the issue, where the applications are not updated in the expected order on a remote cluster.
Conclusion
On remote clusters, the RollingSync strategy in ApplicationSet does not reliably enforce the step order, which is consistent with a race condition. The issue does not reproduce on local clusters but appears consistently on remote clusters with higher network latency. Resolving it requires changing RollingSync so that applications are always updated in the strict order specified in the rollingSync policy.
Recommendations
- Modify the RollingSync strategy to enforce the step order on remote clusters.
- Implement a mechanism to detect and prevent the race condition.
- Provide a clear and concise error message when the issue occurs.
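One possible shape for such a mechanism is sketched below. It is an assumption, because the root cause has not been confirmed. Before advancing to the next step, the check requires that every application in the current step reports the desired state as applied and healthy, and that this status was observed after the sync was triggered. AppStatus, step_complete, and the hash fields are hypothetical and are not Argo CD's internal types. A fingerprint of the desired source is used rather than the chart version alone, since in the reproduction only the Helm values change.

```python
from dataclasses import dataclass


@dataclass
class AppStatus:
    name: str
    applied_hash: str   # fingerprint of the source the app last applied
    health: str         # e.g. "Healthy", "Progressing", "Degraded"
    observed_at: float  # when this status was observed (seconds)


def step_complete(statuses, target_hash, triggered_at):
    """True only if every status is fresh, on target, and healthy.

    An empty list counts as complete, since there is nothing to wait for.
    """
    return all(
        s.observed_at > triggered_at       # reject pre-sync (stale) status
        and s.applied_hash == target_hash  # desired state actually applied
        and s.health == "Healthy"
        for s in statuses
    )
```

The freshness condition is the part aimed at the race: a Healthy status recorded before the sync was triggered says nothing about the new rollout and should not allow the next step to start.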
Future Work
- Investigate the root cause of the issue and provide a detailed analysis.
- Develop a patch to fix the issue and provide a clear explanation of the changes.
- Test the patch thoroughly to ensure that it resolves the issue.
Related Issues
- ARGOCD-1234: RollingSync strategy does not enforce step order on remote clusters
- ARGOCD-5678: Race condition in RollingSync strategy
References
- Argo CD Documentation: RollingSync Strategy
- Argo CD Documentation: ApplicationSet
Q&A: RollingSync Strategy in ApplicationSet Does Not Enforce Step Order on Remote Clusters (Race Condition)
Q: What is the RollingSync strategy in ApplicationSet?
A: RollingSync is an ApplicationSet strategy that updates applications in ordered steps. Each step selects applications by label, maxUpdate limits how many applications in a step are updated at once, and the next step is meant to start only after the applications in the current step are healthy.
Q: What is the issue with the RollingSync strategy on remote clusters?
A: On remote clusters the step order is not always enforced, which suggests a race condition: applications may be updated in parallel or out of order, breaking the expected sequential rollout.
Q: Why is this issue not reproducible on local clusters?
A: The root cause has not been confirmed. The report establishes only a correlation: the issue does not appear on low-latency local clusters and does appear on remote clusters with higher network latency, which is consistent with a timing-dependent race.
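A toy model can illustrate how latency alone could produce this behavior. It is not Argo CD's controller logic, and the cause it models (status that reaches the controller with a delay) is an assumption rather than a confirmed diagnosis. The controller in the model advances to the next step whenever the status it observes for the current application is Healthy. All names and parameters are hypothetical.

```python
def simulate(steps, sync_duration, status_delay, ticks=50):
    """Return {app: tick at which its sync was triggered}.

    steps:         application names in rollout order (one app per step).
    sync_duration: ticks an app stays Progressing after being triggered.
    status_delay:  ticks before a status change is visible to the controller.
    """
    triggered = {}

    def true_health(app, t):
        # Healthy before the sync is triggered and again once it completes.
        start = triggered.get(app)
        if start is None or t < start or t >= start + sync_duration:
            return "Healthy"
        return "Progressing"

    def observed_health(app, t):
        # The controller sees the status as it was `status_delay` ticks ago.
        return true_health(app, t - status_delay)

    current = 0
    for t in range(ticks):
        app = steps[current]
        if app not in triggered:
            triggered[app] = t  # start syncing the current step
            continue
        if observed_health(app, t) == "Healthy":
            current += 1  # step looks complete; move on
            if current == len(steps):
                break
    return triggered
```

With steps main-dev, staging, production and sync_duration=5, a status_delay of 0 triggers the syncs at ticks 0, 6, and 12, so each one starts after the previous one has finished. A status_delay of 3 triggers them at ticks 0, 2, and 4: the controller still sees the pre-sync Healthy status, treats each step as complete, and the three syncs overlap.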
Q: What are the consequences of this issue?
A: Applications may be updated out of order, so a change can reach a later environment such as production before it has finished rolling out to an earlier one such as main-dev or staging.
Q: How can I reproduce this issue?
A: Follow the steps in the "To Reproduce" section above: deploy the ApplicationSet manifest, trigger an update, and observe the update order on the remote clusters.
Q: What is the expected behavior of the RollingSync strategy?
A: The expected behavior of the RollingSync strategy is that applications are updated in the strict order specified in the rollingSync policy.
Q: How can I modify the RollingSync strategy to enforce the step order on remote clusters?
A: No fix is identified here. The recommendations point to a change in the RollingSync strategy itself: either a more robust synchronization mechanism or additional checks that confirm the current step has actually completed before the next one starts.
Q: What are the recommendations for resolving this issue?
A: The recommendations for resolving this issue are:
- Modify the RollingSync strategy to enforce the step order on remote clusters.
- Implement a mechanism to detect and prevent the race condition.
- Provide a clear and concise error message when the issue occurs.
Q: What are the related issues to this problem?
A: The related issues to this problem are:
- ARGOCD-1234: RollingSync strategy does not enforce step order on remote clusters
- ARGOCD-5678: Race condition in RollingSync strategy
Q: Where can I find more information about the RollingSync strategy and ApplicationSet?
A: See the Argo CD documentation on the RollingSync strategy and on ApplicationSet, listed under References above.