CI/CD Pipeline: Git Push to Production Deployment

    Introduction

    CI/CD (Continuous Integration/Continuous Deployment) automates the software delivery process from code commit to production deployment. This automation reduces manual errors, speeds up releases, and improves software quality.

    This guide visualizes the complete CI/CD pipeline:

    - Code Commit: Developer pushes code
    - Continuous Integration: Automated testing and building
    - Continuous Deployment: Automated deployment to production
    - Quality Gates: Checkpoints ensuring code quality
    - Rollback Mechanisms: Handling deployment failures

    Part 1: Complete CI/CD Pipeline Overview

    End-to-End Flow

    %%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
    flowchart TD
        Start([Developer writes code<br/>commits changes]) --> Push[git push origin main]
        Push --> Webhook[Git Provider Webhook<br/>Triggers CI/CD pipeline]
        Webhook --> Checkout[Stage 1: Checkout<br/>Clone repository<br/>Fetch dependencies]
        Checkout --> Lint[Stage 2: Lint<br/>Check code style<br/>ESLint, Prettier, golangci-lint]
        Lint --> LintResult{Linting<br/>passed?}
        LintResult -->|No| LintFail[❌ Pipeline Failed<br/>Notify developer<br/>Fix linting errors]
        LintResult -->|Yes| UnitTest[Stage 3: Unit Tests<br/>Run test suite<br/>Generate coverage report]
        UnitTest --> TestResult{Tests<br/>passed?}
        TestResult -->|No| TestFail[❌ Pipeline Failed<br/>Some tests failed<br/>Coverage too low]
        TestResult -->|Yes| Build[Stage 4: Build<br/>Compile application<br/>Build Docker image]
        Build --> BuildResult{Build<br/>successful?}
        BuildResult -->|No| BuildFail[❌ Pipeline Failed<br/>Build errors<br/>Dependency issues]
        BuildResult -->|Yes| IntegTest[Stage 5: Integration Tests<br/>Test with real dependencies<br/>Database, APIs, etc.]
        IntegTest --> IntegResult{Integration<br/>tests passed?}
        IntegResult -->|No| IntegFail[❌ Pipeline Failed<br/>Integration issues<br/>Service communication errors]
        IntegResult -->|Yes| Security[Stage 6: Security Scan<br/>Scan for vulnerabilities<br/>OWASP, Snyk, Trivy]
        Security --> SecResult{Security<br/>checks passed?}
        SecResult -->|No| SecFail[❌ Pipeline Failed<br/>Security vulnerabilities found<br/>Fix before deploying]
        SecResult -->|Yes| Push2Registry[Stage 7: Push Image<br/>Tag: myapp:abc123<br/>Push to container registry]
        Push2Registry --> DeployStaging[Stage 8: Deploy to Staging<br/>kubectl apply -f staging/<br/>Run smoke tests]
        DeployStaging --> SmokeTest[Stage 9: Smoke Tests<br/>Test critical paths<br/>Health checks<br/>Basic functionality]
        SmokeTest --> SmokeResult{Smoke tests<br/>passed?}
        SmokeResult -->|No| StagingFail[❌ Pipeline Failed<br/>Staging deployment issues<br/>Rollback staging]
        SmokeResult -->|Yes| Approval{Manual<br/>approval<br/>required?}
        Approval -->|Yes| WaitApproval[⏸️ Waiting for Approval<br/>Notify team lead<br/>Review changes]
        WaitApproval --> ApprovalDecision{Approved?}
        ApprovalDecision -->|No| Rejected[❌ Deployment Rejected<br/>Not ready for production]
        ApprovalDecision -->|Yes| DeployProd
        Approval -->|No| DeployProd[Stage 10: Deploy to Production<br/>Rolling update<br/>Or blue-green deployment]
        DeployProd --> ProdHealth{Production<br/>healthy?}
        ProdHealth -->|No| AutoRollback[❌ Auto-Rollback<br/>Revert to previous version<br/>Alert on-call team]
        ProdHealth -->|Yes| Success[✅ Deployment Successful!<br/>Monitor metrics<br/>Notify team<br/>Update status]
        style LintFail fill:#7f1d1d,stroke:#ef4444
        style TestFail fill:#7f1d1d,stroke:#ef4444
        style BuildFail fill:#7f1d1d,stroke:#ef4444
        style IntegFail fill:#7f1d1d,stroke:#ef4444
        style SecFail fill:#7f1d1d,stroke:#ef4444
        style StagingFail fill:#7f1d1d,stroke:#ef4444
        style AutoRollback fill:#7f1d1d,stroke:#ef4444
        style Success fill:#064e3b,stroke:#10b981
        style WaitApproval fill:#78350f,stroke:#f59e0b

    Part 2: Continuous Integration (CI) Stages

    CI Pipeline Detailed Flow

    %%{init: {'theme':'dark', 'themeVariables':
{'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
sequenceDiagram
    participant Dev as Developer
    participant Git as Git Repository
    participant CI as CI Server
    participant Docker as Docker Registry
    participant Notify as Slack/Email
    Dev->>Git: git push origin feature/new-api
    Note over Git: Webhook triggered<br/>on push event
    Git->>CI: Trigger pipeline:<br/>Branch: feature/new-api<br/>Commit: abc123<br/>Author: [email protected]
    CI->>CI: Create build environment<br/>Ubuntu 22.04 container
    CI->>Git: git clone --depth 1<br/>Checkout abc123
    Note over CI: Stage 1: Setup
    CI->>CI: Install dependencies<br/>npm install<br/>go mod download
    Note over CI: Stage 2: Code Quality
    CI->>CI: Run linter<br/>eslint src/<br/>golangci-lint run
    alt Linting Failed
        CI->>Notify: ❌ Linting failed<br/>26 issues found<br/>Fix formatting
        CI-->>Dev: Pipeline failed
    end
    Note over CI: Stage 3: Unit Testing
    CI->>CI: Run unit tests<br/>npm test<br/>go test ./...
    CI->>CI: Generate coverage report<br/>Coverage: 87%
    alt Tests Failed or Low Coverage
        CI->>Notify: ❌ Tests failed<br/>5 tests failing<br/>Coverage: 72% < 80%
        CI-->>Dev: Pipeline failed
    end
    Note over CI: Stage 4: Build
    CI->>CI: Build application<br/>npm run build<br/>go build -o app
    CI->>CI: Build Docker image<br/>docker build -t myapp:abc123 .
    alt Build Failed
        CI->>Notify: ❌ Build failed<br/>Compilation errors
        CI-->>Dev: Pipeline failed
    end
    Note over CI: Stage 5: Integration Tests
    CI->>CI: Start test dependencies<br/>docker-compose up -d<br/>postgres, redis
    CI->>CI: Run integration tests<br/>Test database connections<br/>Test API endpoints
    CI->>CI: Stop test services<br/>docker-compose down
    alt Integration Tests Failed
        CI->>Notify: ❌ Integration tests failed<br/>Database connection timeout
        CI-->>Dev: Pipeline failed
    end
    Note over CI: Stage 6: Security Scanning
    CI->>CI: Scan dependencies<br/>npm audit<br/>snyk test
    CI->>CI: Scan Docker image<br/>trivy image myapp:abc123
    alt Security Issues Found
        CI->>Notify: ⚠️ Security issues<br/>3 high severity CVEs<br/>Update dependencies
        CI-->>Dev: Pipeline failed
    end
    Note over CI: All checks passed! ✓
    CI->>Docker: docker push myapp:abc123<br/>Tag: myapp:latest
    Docker-->>CI: Image pushed successfully
    CI->>Notify: ✅ Build successful!<br/>Image: myapp:abc123<br/>Ready for deployment
    CI-->>Dev: Pipeline succeeded<br/>Duration: 8m 32s

GitHub Actions CI Configuration

# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # Job 1: Code Quality Checks
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run ESLint
        run: npm run lint
      - name: Run Prettier
        run: npm run format:check

  # Job 2: Unit Tests
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test -- --coverage
      - name: Check coverage threshold
        run: |
          COVERAGE=$(cat coverage/coverage-summary.json | jq '.total.lines.pct')
          if (( $(echo "$COVERAGE < 80" | bc -l) )); then
            echo "Coverage $COVERAGE% is below 80%"
            exit 1
          fi
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3

  # Job 3: Build
  build:
    runs-on: ubuntu-latest
    needs: [lint, test]  # Wait for lint and test to pass
    permissions:
      contents: read
      packages: write  # GITHUB_TOKEN must be allowed to push to ghcr.io
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # The raw full-SHA tag is what the security job scans below;
          # without it no pushed tag matches ${{ github.sha }}.
          tags: |
            type=sha,prefix={{branch}}-
            type=ref,event=branch
            type=ref,event=pr
            type=raw,value=${{ github.sha }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # Job 4: Integration Tests
  integration-test:
    runs-on: ubuntu-latest
    needs: build
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test  # create the database named in DATABASE_URL
        ports:
          - 5432:5432  # the job runs on the runner host, so the port must be published
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test
          REDIS_URL: redis://localhost:6379

  # Job 5: Security Scan
  security:
    runs-on: ubuntu-latest
    needs: build
    permissions:
      contents: read
      security-events: write  # required to upload SARIF results
    steps:
      - uses: actions/checkout@v3
      - name: Run npm audit
        run: npm audit --audit-level=high
      - name: Run Snyk security scan
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
      - name: Scan Docker image with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
      - name: Upload Trivy results to GitHub Security
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'

Part 3: Continuous Deployment (CD) Stages

Deployment Pipeline Flow

%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TD
    Start([CI Pipeline Passed<br/>Image ready: myapp:abc123]) --> DeployDecision{Which<br/>branch?}
    DeployDecision -->|feature/*| SkipDeploy[Skip deployment<br/>CI only for<br/>feature branches]
    DeployDecision -->|develop| DeployDev[Deploy to Dev Environment<br/>Namespace:
devAuto-deploy on push] DeployDecision -->|main| DeployStaging[Deploy to StagingNamespace: stagingAuto-deploy on push] DeployDev --> DevSmoke[Run smoke testsBasic health checks] DevSmoke --> DevDone[✅ Dev deployment complete] DeployStaging --> UpdateManifest[Update Kubernetes manifestsimage: myapp:abc123Apply configuration] UpdateManifest --> ApplyStaging[kubectl apply -f k8s/staging/Create/Update resourcesWait for rollout] ApplyStaging --> WaitReady{All podsready?} WaitReady -->|No timeout| CheckHealth[Check pod statuskubectl get pods -n staging] CheckHealth --> HealthStatus{Healthy?} HealthStatus -->|No| RollbackStaging[❌ Rollback stagingkubectl rollout undodeployment myapp -n staging] RollbackStaging --> NotifyFail[Notify team:Staging deployment failedCheck logs and fix] HealthStatus -->|Yes| StagingSmoke[Run staging smoke tests- Health endpoint- Critical API endpoints- Database connectivity] StagingSmoke --> SmokePass{Smoke testspassed?} SmokePass -->|No| RollbackStaging SmokePass -->|Yes| StagingReady[✅ Staging ReadyAll tests passedReady for production] StagingReady --> ApprovalGate{Require manualapproval?} ApprovalGate -->|Yes| WaitApproval[⏸️ Wait for approvalCreate deployment requestNotify reviewers] WaitApproval --> ReviewDecision{Approvedby reviewer?} ReviewDecision -->|No| Rejected[❌ Deployment rejectedFeedback providedMake changes] ReviewDecision -->|Yes| DeployProd ApprovalGate -->|No| DeployProd[Deploy to ProductionNamespace: productionStrategy: Rolling update] DeployProd --> BackupProd[Create backup:- Current deployment state- Database snapshot- Config backup] BackupProd --> ApplyProd[kubectl apply -f k8s/prod/Rolling update:maxSurge: 1maxUnavailable: 0] ApplyProd --> MonitorRollout[Monitor rollout statuskubectl rollout statusdeployment myapp -n production] MonitorRollout --> ProdHealth{All new podshealthy?} ProdHealth -->|No| AutoRollback[🚨 Auto-rollback triggeredkubectl rollout undoRestore previous version] AutoRollback --> AlertTeam[Alert on-call 
teamPagerDuty notificationProduction incident] ProdHealth -->|Yes| ProdMonitor[Monitor production metrics- Error rates- Latency- Business KPIs] ProdMonitor --> MetricsOK{Metricshealthy for10 minutes?} MetricsOK -->|No| AutoRollback MetricsOK -->|Yes| Complete[✅ Deployment Complete!Production healthyNew version liveUpdate status page] Complete --> CleanupOld[Cleanup old resourcesRemove old replica setsPrune old images] style SkipDeploy fill:#1e3a8a,stroke:#3b82f6 style WaitApproval fill:#78350f,stroke:#f59e0b style RollbackStaging fill:#7f1d1d,stroke:#ef4444 style AutoRollback fill:#7f1d1d,stroke:#ef4444 style Complete fill:#064e3b,stroke:#10b981 style DevDone fill:#064e3b,stroke:#10b981 Part 4: Quality Gates Quality Gate Decision Flow %%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%% flowchart TD Start([Code ready to deploy]) --> Gate1{Quality Gate 1:Code Quality} Gate1 --> CheckLint[Check LintingESLint, Prettier] Gate1 --> CheckComplexity[Check ComplexityCyclomatic complexity< 10 per function] Gate1 --> CheckDuplication[Check DuplicationCode duplication < 3%] CheckLint --> LintScore{Pass?} CheckComplexity --> ComplexScore{Pass?} CheckDuplication --> DupScore{Pass?} LintScore -->|No| Fail1[❌ Gate 1 Failed] ComplexScore -->|No| Fail1 DupScore -->|No| Fail1 LintScore -->|Yes| Gate2{Quality Gate 2:Testing} ComplexScore -->|Yes| Gate2 DupScore -->|Yes| Gate2 Gate2 --> CheckCoverage[Check CoverageLine coverage >= 80%Branch coverage >= 75%] Gate2 --> CheckTests[All Tests PassUnit + Integration] Gate2 --> CheckPerf[Performance TestsResponse time < baseline] CheckCoverage --> CovScore{Pass?} CheckTests --> TestScore{Pass?} CheckPerf --> PerfScore{Pass?} CovScore -->|No| Fail2[❌ Gate 2 Failed] TestScore -->|No| Fail2 PerfScore -->|No| Fail2 CovScore -->|Yes| 
Gate3{Quality Gate 3:Security} TestScore -->|Yes| Gate3 PerfScore -->|Yes| Gate3 Gate3 --> CheckVuln[Scan VulnerabilitiesNo high/critical CVEs] Gate3 --> CheckSecrets[Check for SecretsNo hardcoded credentials] Gate3 --> CheckDeps[Dependency CheckAll deps up-to-date] CheckVuln --> VulnScore{Pass?} CheckSecrets --> SecretScore{Pass?} CheckDeps --> DepScore{Pass?} VulnScore -->|No| Fail3[❌ Gate 3 Failed] SecretScore -->|No| Fail3 DepScore -->|No| Fail3 VulnScore -->|Yes| Gate4{Quality Gate 4:Production Readiness} SecretScore -->|Yes| Gate4 DepScore -->|Yes| Gate4 Gate4 --> CheckHealth[Health ChecksLiveness + Readiness] Gate4 --> CheckResources[Resource LimitsCPU + Memory defined] Gate4 --> CheckDocs[DocumentationREADME + API docs] CheckHealth --> HealthScore{Pass?} CheckResources --> ResScore{Pass?} CheckDocs --> DocScore{Pass?} HealthScore -->|No| Fail4[❌ Gate 4 Failed] ResScore -->|No| Fail4 DocScore -->|No| Fail4 HealthScore -->|Yes| AllGates[✅ All Quality Gates Passed!Ready for deployment] ResScore -->|Yes| AllGates DocScore -->|Yes| AllGates Fail1 --> Block[Block deploymentFix issues first] Fail2 --> Block Fail3 --> Block Fail4 --> Block style Fail1 fill:#7f1d1d,stroke:#ef4444 style Fail2 fill:#7f1d1d,stroke:#ef4444 style Fail3 fill:#7f1d1d,stroke:#ef4444 style Fail4 fill:#7f1d1d,stroke:#ef4444 style AllGates fill:#064e3b,stroke:#10b981 Part 5: GitLab CI/CD Example .gitlab-ci.yml Configuration # .gitlab-ci.yml stages: - lint - test - build - security - deploy-staging - deploy-production variables: DOCKER_DRIVER: overlay2 DOCKER_TLS_CERTDIR: "/certs" IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA # Template for Docker jobs .docker-login: &docker-login before_script: - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY # Stage 1: Linting lint:code: stage: lint image: node:18 script: - npm ci - npm run lint - npm run format:check cache: paths: - node_modules/ # Stage 2: Testing test:unit: stage: test image: node:18 script: - npm ci - npm 
test -- --coverage - | COVERAGE=$(cat coverage/coverage-summary.json | jq '.total.lines.pct') if (( $(echo "$COVERAGE < 80" | bc -l) )); then echo "Coverage $COVERAGE% is below threshold" exit 1 fi coverage: '/Lines\s*:\s*(\d+\.\d+)%/' artifacts: reports: coverage_report: coverage_format: cobertura path: coverage/cobertura-coverage.xml test:integration: stage: test image: node:18 services: - name: postgres:15 alias: postgres - name: redis:7 alias: redis variables: DATABASE_URL: postgresql://postgres:postgres@postgres:5432/test REDIS_URL: redis://redis:6379 script: - npm ci - npm run test:integration # Stage 3: Build build:image: stage: build image: docker:24 services: - docker:24-dind <<: *docker-login script: - docker build -t $IMAGE_TAG . - docker push $IMAGE_TAG - docker tag $IMAGE_TAG $CI_REGISTRY_IMAGE:latest - docker push $CI_REGISTRY_IMAGE:latest only: - main - develop # Stage 4: Security Scanning security:scan: stage: security image: aquasec/trivy:latest script: - trivy image --severity HIGH,CRITICAL --exit-code 1 $IMAGE_TAG allow_failure: true security:sast: stage: security image: node:18 script: - npm audit --audit-level=high - npx snyk test --severity-threshold=high allow_failure: true # Stage 5: Deploy to Staging deploy:staging: stage: deploy-staging image: bitnami/kubectl:latest script: - kubectl config set-cluster k8s --server="$K8S_SERVER" - kubectl config set-credentials admin --token="$K8S_TOKEN" - kubectl config set-context default --cluster=k8s --user=admin - kubectl config use-context default - | kubectl set image deployment/myapp \ myapp=$IMAGE_TAG \ -n staging - kubectl rollout status deployment/myapp -n staging --timeout=5m - kubectl get pods -n staging environment: name: staging url: https://staging.example.com only: - main # Stage 6: Deploy to Production deploy:production: stage: deploy-production image: bitnami/kubectl:latest script: - kubectl config set-cluster k8s --server="$K8S_SERVER" - kubectl config set-credentials admin 
--token="$K8S_TOKEN" - kubectl config set-context default --cluster=k8s --user=admin - kubectl config use-context default - | kubectl set image deployment/myapp \ myapp=$IMAGE_TAG \ -n production - kubectl rollout status deployment/myapp -n production --timeout=10m - | # Check pod health READY=$(kubectl get deployment myapp -n production -o jsonpath='{.status.readyReplicas}') DESIRED=$(kubectl get deployment myapp -n production -o jsonpath='{.spec.replicas}') if [ "$READY" != "$DESIRED" ]; then echo "Deployment unhealthy: $READY/$DESIRED pods ready" kubectl rollout undo deployment/myapp -n production exit 1 fi environment: name: production url: https://example.com when: manual # Require manual approval only: - main Part 6: Pipeline Best Practices Pipeline Optimization Fast Feedback Loop: ...
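The coverage gate that appears above (the `jq`/`bc` shell check in both CI configurations, and the 80% line / 75% branch thresholds in Quality Gate 2) can also be written as a small script. The following is a minimal sketch, assuming the Istanbul-style `coverage-summary.json` layout that the `jq '.total.lines.pct'` call implies. The function name `check_coverage` and the use of the `branches` entry are illustrative assumptions, not part of the original pipeline.

```python
from __future__ import annotations


def check_coverage(
    summary: dict,
    min_lines: float = 80.0,
    min_branches: float = 75.0,
) -> list[str]:
    """Return a list of human-readable failures; an empty list means the gate passes.

    `summary` is the parsed content of coverage/coverage-summary.json
    (assumed shape: {"total": {"lines": {"pct": ...}, "branches": {"pct": ...}}}).
    """
    total = summary["total"]
    failures = []
    for metric, minimum in (("lines", min_lines), ("branches", min_branches)):
        pct = float(total[metric]["pct"])
        if pct < minimum:
            failures.append(
                f"{metric} coverage {pct:.1f}% is below {minimum:.1f}%"
            )
    return failures


# Example: the failing case from the sequence diagram (72% line coverage).
example = {"total": {"lines": {"pct": 72}, "branches": {"pct": 90}}}
for problem in check_coverage(example):
    print(problem)
```

In a CI job the script would load the JSON file with `json.load` and exit with a non-zero status when the returned list is non-empty, which is what makes the stage fail.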

    January 23, 2025 · 11 min · Rafiul Alam

    Deployment Strategies: Blue-Green, Canary, Rolling Updates

    Introduction

    Choosing the right deployment strategy is critical for minimizing downtime and risk when releasing new versions of your application. Different strategies offer different trade-offs between speed, safety, and resource usage.

    This guide visualizes three essential deployment strategies:

    - Rolling Updates: Gradual replacement of instances
    - Blue-Green Deployments: Instant cutover between versions
    - Canary Deployments: Progressive rollout with traffic splitting
    - Comparison and Use Cases: When to use each strategy

    Part 1: Rolling Update Deployment

    Rolling updates gradually replace old version pods with new version pods, ensuring continuous availability. ...
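To make the "gradual replacement" idea concrete, here is a rough simulation of how pod counts could evolve during a rolling update. It is a sketch under stated assumptions rather than a model of the real Kubernetes controller: the parameters `max_surge` and `max_unavailable` mirror the settings of the same name in a Deployment's rolling update strategy, new pods are assumed to become ready instantly, and the names `RolloutStep` and `simulate_rolling_update` are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RolloutStep:
    old: int  # pods still running the previous version
    new: int  # pods running the new version (assumed ready immediately)


def simulate_rolling_update(
    replicas: int, max_surge: int = 1, max_unavailable: int = 0
) -> list[RolloutStep]:
    """Return the sequence of (old, new) pod counts during a rolling update."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")

    old, new = replicas, 0
    steps = [RolloutStep(old, new)]
    while old > 0 or new < replicas:
        # Scale up: never exceed replicas + max_surge pods in total.
        can_add = min(replicas - new, replicas + max_surge - (old + new))
        if can_add > 0:
            new += can_add
            steps.append(RolloutStep(old, new))
        # Scale down: keep at least replicas - max_unavailable pods running.
        can_remove = min(old, (old + new) - (replicas - max_unavailable))
        if can_remove > 0:
            old -= can_remove
            steps.append(RolloutStep(old, new))
    return steps


for step in simulate_rolling_update(replicas=3, max_surge=1, max_unavailable=0):
    print(f"old={step.old} new={step.new} total={step.old + step.new}")
```

With `max_surge=1` and `max_unavailable=0` the total never drops below the desired replica count, which is the "continuous availability" property described above, at the cost of briefly running one extra pod.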

    January 23, 2025 · 11 min · Rafiul Alam

    GitOps Workflow: Git as Single Source of Truth

    Introduction GitOps is a modern approach to continuous deployment where Git serves as the single source of truth for both application code and infrastructure. Changes are made through Git commits, and automated agents ensure the live environment matches the desired state in Git. This guide visualizes the GitOps workflow: Declarative Infrastructure: Everything defined as code in Git Automated Sync: Agents continuously reconcile live state with Git Drift Detection: Automatic detection and correction of manual changes Pull-Based Deployment: Agents pull changes from Git (vs push-based CI/CD) Audit Trail: Complete history of changes in Git Part 1: GitOps Overview GitOps Principles %%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%% flowchart TD Start([GitOps Core Principles]) --> P1[1️⃣ DeclarativeAll config in Git as YAMLDesired state, not steps] Start --> P2[2️⃣ Versioned & ImmutableGit history is truthEvery change tracked] Start --> P3[3️⃣ Pulled AutomaticallyAgents pull from GitNo push access needed] Start --> P4[4️⃣ Continuously ReconciledAgents detect driftAuto-heal to Git state] P1 --> Example1[Example:Kubernetes manifestsTerraform codeHelm charts] P2 --> Example2[Example:git log shows who changed whatgit revert to rollbackgit blame for accountability] P3 --> Example3[Example:ArgoCD polls Git every 3minFluxCD watches Git repoNo CI/CD push needed] P4 --> Example4[Example:Manual kubectl edit detectedReverted to Git stateDrift alert sent] style P1 fill:#064e3b,stroke:#10b981 style P2 fill:#064e3b,stroke:#10b981 style P3 fill:#064e3b,stroke:#10b981 style P4 fill:#064e3b,stroke:#10b981 Part 2: GitOps vs Traditional CI/CD Architecture Comparison %%{init: {'theme':'dark', 'themeVariables': 
{'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%% graph TB subgraph Traditional[Traditional CI/CD PUSH Model] T1([Developer]) --> T2[git push] T2 --> T3[CI ServerGitHub ActionsJenkins] T3 --> T4[Build & Test] T4 --> T5[Docker Build] T5 --> T6[Push Image] T6 --> T7[Deploy Scriptkubectl apply] T7 --> T8[Kubernetes Cluster] Note1[Issues:❌ CI needs cluster credentials❌ Push-based security risk❌ No drift detection❌ Manual changes persist] end subgraph GitOps[GitOps PULL Model] G1([Developer]) --> G2[git push] G2 --> G3[Git RepositoryKubernetes manifestsHelm charts] G4[GitOps AgentArgoCD/FluxRunning IN cluster] G4 --> |Polls every 3min| G3 G4 --> G5{Desired state= Live state?} G5 --> |No| G6[Apply changeskubectl apply] G6 --> G7[Kubernetes Cluster] G5 --> |Yes| G8[No action needed] G7 --> |Detect drift| G4 Note2[Benefits:✅ No external cluster access✅ Pull-based security✅ Automatic drift detection✅ Self-healing✅ Audit trail in Git] end style Traditional fill:#7f1d1d,stroke:#ef4444 style GitOps fill:#064e3b,stroke:#10b981 Part 3: Complete GitOps Flow End-to-End Workflow %%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%% flowchart TD Start([Developer makes change]) --> Change[Update Kubernetes manifestdeployment.yaml:image: myapp:v2.0replicas: 5] Change --> Commit[git commit -m "Update to v2.0"] Commit --> Push[git push origin main] Push --> GitRepo[(Git Repositoryk8s-configs/├─ apps/│ └─ myapp/│ ├─ deployment.yaml│ ├─ service.yaml│ └─ ingress.yaml└─ infrastructure/ └─ namespaces.yaml)] GitRepo --> Webhook{Webhookconfigured?} Webhook -->|Yes| Notify[Notify ArgoCDimmediately] Webhook 
-->|No| Poll[ArgoCD pollsevery 3 minutes] Notify --> ArgoCD[ArgoCD ControllerDetect changes in Git] Poll --> ArgoCD ArgoCD --> Compare[Compare Git statevs Live cluster state] Compare --> Diff{Differencesdetected?} Diff -->|No| InSync[✅ Application in syncNo action neededStatus: Healthy] Diff -->|Yes| OutOfSync[⚠️ Application out of syncGit: image: v2.0Cluster: image: v1.0] OutOfSync --> SyncPolicy{Auto-syncenabled?} SyncPolicy -->|No| WaitManual[⏸️ Waiting formanual sync trigger] WaitManual --> ManualSync[User clicks "Sync"in ArgoCD UI] ManualSync --> ApplyChanges SyncPolicy -->|Yes| ApplyChanges[Apply changes from Gitkubectl apply -f deployment.yaml] ApplyChanges --> RolloutStart[Kubernetes Rolling UpdateCreate new pods with v2.0] RolloutStart --> HealthCheck[Health Check LoopCheck pod statusRun readiness probes] HealthCheck --> HealthStatus{All podshealthy?} HealthStatus -->|No| CheckTimeout{Exceededtimeout?} CheckTimeout -->|No| HealthCheck CheckTimeout -->|Yes| SyncFailed[❌ Sync FailedStatus: DegradedSend alert] SyncFailed --> Rollback{Auto-rollbackenabled?} Rollback -->|Yes| RevertGit[Revert to previousGit commitTrigger new sync] Rollback -->|No| Manual[Manual interventionrequired] HealthStatus -->|Yes| Synced[✅ Sync SuccessfulApplication: HealthyGit ≡ Cluster] Synced --> ContinuousMonitor[Continuous MonitoringDetect driftWatch for manual changes] ContinuousMonitor --> DriftDetect{Manual changedetected?} DriftDetect -->|Yes| DriftAlert[🚨 Drift Detected!Someone ran kubectl editCluster ≠ Git] DriftAlert --> AutoHeal{Self-healingenabled?} AutoHeal -->|Yes| RevertDrift[Revert manual changeRestore Git stateCluster ≡ Git again] AutoHeal -->|No| DriftNotify[Notify teamManual change persistsUpdate Git to match?] 
RevertDrift --> ContinuousMonitor DriftNotify --> ContinuousMonitor DriftDetect -->|No| ContinuousMonitor style InSync fill:#064e3b,stroke:#10b981 style Synced fill:#064e3b,stroke:#10b981 style OutOfSync fill:#78350f,stroke:#f59e0b style SyncFailed fill:#7f1d1d,stroke:#ef4444 style DriftAlert fill:#7f1d1d,stroke:#ef4444 Part 4: ArgoCD Sync Process Detailed Sync Flow %%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%% sequenceDiagram participant Dev as Developer participant Git as Git Repository participant ArgoCD as ArgoCD Controller participant K8s as Kubernetes API participant Pods as Application Pods Dev->>Git: git pushUpdate deployment.yamlimage: myapp:v2.0 Note over Git: Git Repository UpdatedCommit: abc123 alt Webhook Configured Git->>ArgoCD: Webhook: Repo changed else Polling loop Every 3 minutes ArgoCD->>Git: Poll for changes end end ArgoCD->>Git: Fetch latest commitgit pull origin main ArgoCD->>ArgoCD: Parse manifests:deployment.yamlservice.yaml ArgoCD->>K8s: Get live resourceskubectl get deployment myapp -o yaml K8s-->>ArgoCD: Current state:image: myapp:v1.0replicas: 3 Note over ArgoCD: Compare states:Git: v2.0, replicas: 5Live: v1.0, replicas: 3Diff detected! ArgoCD->>ArgoCD: Status: OutOfSyncHealth: Healthy alt Auto-Sync Enabled Note over ArgoCD: Auto-sync triggered else Manual Sync ArgoCD->>ArgoCD: Wait for user to click "Sync" end ArgoCD->>K8s: Apply changeskubectl apply -f deployment.yaml K8s->>K8s: Create new ReplicaSetfor image v2.0 K8s->>Pods: Start new podswith image v2.0 loop Rolling Update K8s->>Pods: Create 1 new pod Pods-->>K8s: Pod starting... 
        K8s->>Pods: Run readiness probe
        Pods-->>K8s: Ready ✓
        K8s->>Pods: Terminate 1 old pod
        Pods-->>K8s: Terminated
        ArgoCD->>K8s: Check sync progress
        K8s-->>ArgoCD: 2/5 pods updated
    end
    K8s-->>ArgoCD: All pods ready<br/>5/5 running v2.0
    ArgoCD->>ArgoCD: Status: Synced ✓<br/>Health: Healthy ✓
    ArgoCD->>Dev: Notification:<br/>✅ Sync successful<br/>myapp updated to v2.0
    Note over ArgoCD,K8s: Continuous monitoring<br/>for drift
    Note over K8s: Someone runs:<br/>kubectl scale deployment myapp --replicas=10
    K8s->>Pods: Scale to 10 pods
    ArgoCD->>K8s: Poll live state
    K8s-->>ArgoCD: Live: replicas: 10<br/>Git: replicas: 5<br/>Drift detected!
    ArgoCD->>ArgoCD: Status: OutOfSync<br/>Health: Healthy<br/>Drift: Yes
    alt Self-Healing Enabled
        ArgoCD->>K8s: Revert to Git state<br/>kubectl apply -f deployment.yaml
        K8s->>Pods: Scale back to 5 pods
        Note over ArgoCD: Drift corrected<br/>Cluster matches Git
    else No Self-Healing
        ArgoCD->>Dev: Alert: Manual change detected<br/>Cluster has 10 replicas<br/>Git has 5 replicas
    end

Part 5: GitOps Repository Structure

Recommended Directory Layout

gitops-repo/
├── apps/
│   ├── production/
│   │   ├── myapp/
│   │   │   ├── deployment.yaml
│   │   │   ├── service.yaml
│   │   │   ├── ingress.yaml
│   │   │   └── kustomization.yaml
│   │   └── database/
│   │       ├── statefulset.yaml
│   │       └── service.yaml
│   │
│   ├── staging/
│   │   └── myapp/
│   │       ├── deployment.yaml
│   │       ├── service.yaml
│   │       └── kustomization.yaml
│   │
│   └── dev/
│       └── myapp/
│           └── ...
│
├── infrastructure/
│   ├── namespaces/
│   │   ├── production.yaml
│   │   ├── staging.yaml
│   │   └── dev.yaml
│   │
│   ├── ingress-controller/
│   │   └── nginx-ingress.yaml
│   │
│   └── monitoring/
│       ├── prometheus/
│       └── grafana/
│
├── argocd/
│   ├── applications/
│   │   ├── myapp-production.yaml
│   │   ├── myapp-staging.yaml
│   │   └── infrastructure.yaml
│   │
│   └── projects/
│       └── default-project.yaml
│
└── README.md

ArgoCD Application Definition

# argocd/applications/myapp-production.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-production
  namespace: argocd
  annotations:
    # Notifications: Argo CD subscribes an Application to triggers through
    # annotations rather than a spec field. Recipient values are placeholders.
    notifications.argoproj.io/subscribe.on-sync-succeeded.slack: "<slack-channel>"
    notifications.argoproj.io/subscribe.on-sync-failed.pagerduty: "<pagerduty-service>"
spec:
  # Required field; assumes the project defined in argocd/projects/default-project.yaml
  project: default

  # Where the app is defined in Git
  source:
    repoURL: https://github.com/myorg/gitops-repo
    targetRevision: main
    path: apps/production/myapp

  # Where to deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: production

  # Sync policy
  syncPolicy:
    automated:
      prune: true       # Delete resources not in Git
      selfHeal: true    # Revert manual changes
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

  # Diff customization
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas  # Ignore HPA changes

Part 6: Drift Detection and Self-Healing

Drift Handling Flow

%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
flowchart TD
    Start([Cluster State]) --> Monitor[ArgoCD monitors<br/>every 3 minutes]
    Monitor --> Compare[Compare:<br/>Git desired state<br/>vs<br/>Cluster live state]
    Compare --> Check{States<br/>match?}
    Check -->|Yes| InSync[✅ In Sync<br/>No action needed]
    Check -->|No| DriftType{Type of<br/>drift?}
    DriftType --> ManualEdit[Manual kubectl edit<br/>Someone changed
replicasfrom 5 to 10] DriftType --> ResourceAdd[New resource addedNot in Gite.g., manual deployment] DriftType --> ResourceDelete[Resource deletedIn Git but not in cluster] ManualEdit --> SelfHeal1{Self-healingenabled?} ResourceAdd --> SelfHeal2{Pruneenabled?} ResourceDelete --> SelfHeal3{Self-healingenabled?} SelfHeal1 -->|Yes| Revert[Revert to Git statekubectl apply -f deployment.yamlReplicas: 10 → 5] SelfHeal1 -->|No| Alert1[🚨 Alert OnlyManual drift detectedReplicas changed] SelfHeal2 -->|Yes| Delete[Delete extra resourcekubectl delete deployment xyzNot in Git = Removed] SelfHeal2 -->|No| Alert2[⚠️ Alert OnlyExtra resource detectedNot managed by Git] SelfHeal3 -->|Yes| Recreate[Recreate resourcekubectl apply -f service.yamlRestore from Git] SelfHeal3 -->|No| Alert3[⚠️ Alert OnlyResource missingExpected in Git] Revert --> Healed[✅ Drift CorrectedCluster ≡ GitLog event] Delete --> Healed Recreate --> Healed Alert1 --> Decision[Team Decision:1. Update Git to match?2. Revert cluster to Git?] 
    Alert2 --> Decision
    Alert3 --> Decision
    Healed --> InSync
    InSync -.->|Continue monitoring| Monitor
    style InSync fill:#064e3b,stroke:#10b981
    style Healed fill:#064e3b,stroke:#10b981
    style Alert1 fill:#78350f,stroke:#f59e0b
    style Alert2 fill:#78350f,stroke:#f59e0b
    style Alert3 fill:#78350f,stroke:#f59e0b

Part 7: GitOps Workflow Best Practices

Git Branching Strategy

%%{init: {'theme':'dark', 'themeVariables': {'primaryTextColor':'#e5e7eb','secondaryTextColor':'#e5e7eb','tertiaryTextColor':'#e5e7eb','textColor':'#e5e7eb','nodeTextColor':'#e5e7eb','edgeLabelText':'#e5e7eb','clusterTextColor':'#e5e7eb','actorTextColor':'#e5e7eb'}}}%%
gitGraph
    commit id: "Initial infrastructure"
    commit id: "Add myapp v1.0"
    branch feature/upgrade-v2
    checkout feature/upgrade-v2
    commit id: "Update to v2.0"
    commit id: "Add new config"
    checkout main
    commit id: "Hotfix: security patch"
    checkout feature/upgrade-v2
    merge main id: "Merge main"
    checkout main
    merge feature/upgrade-v2 id: "PR merged" tag: "Deploy to staging"
    commit id: "Staging validated" tag: "Deploy to prod"

Approval Process

# PR approval workflow
name: GitOps PR Validation

on:
  pull_request:
    paths:
      - 'apps/**'
      - 'infrastructure/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Validate Kubernetes YAML
        run: |
          # Without globstar, bash treats ** like *, which would miss nested manifests
          shopt -s globstar
          kubeval apps/**/*.yaml
          kustomize build apps/production/myapp | kubeval -
      - name: Dry-run in staging
        run: |
          kubectl apply --dry-run=server -k apps/staging/myapp
      - name: Security scan
        run: |
          kubesec scan apps/production/myapp/deployment.yaml
      - name: Policy check
        run: |
          conftest test apps/production/myapp/*.yaml

  require-approval:
    needs: validate
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ github.token }}  # the gh CLI needs a token to query the PR
      PR: ${{ github.event.pull_request.number }}
      REPO: ${{ github.repository }}
    steps:
      - name: Check required approvals
        run: |
          # The pull_request event payload carries no file list, so ask the API
          if ! gh pr view "$PR" --repo "$REPO" --json files -q '.files[].path' | grep -q '^apps/production/'; then
            echo "No production changes; approval check skipped"
            exit 0
          fi
          # Require 2 approvals for production changes (distinct approving reviewers)
          APPROVALS=$(gh pr view "$PR" --repo "$REPO" --json reviews -q '[.reviews[] | select(.state == "APPROVED") | .author.login] | unique | length')
          if [ "$APPROVALS" -lt 2 ]; then
            echo "Production changes require 2 approvals"
            exit 1
          fi

Part 8: Comparison Table

GitOps Tools Comparison

| Feature | ArgoCD | Flux | Jenkins X |
|---|---|---|---|
| Architecture | Controller + UI | Set of controllers | Full platform |
| UI | Rich web UI | No UI (CLI only) | Web UI |
| Multi-cluster | ✅ Native support | ✅ Via Flux controllers | ✅ Supported |
| Helm support | ✅ Native | ✅ Via Helm controller | ✅ Native |
| Kustomize support | ✅ Native | ✅ Via Kustomize controller | ✅ Supported |
| SSO/RBAC | ✅ Built-in | ❌ Use K8s RBAC | ✅ Built-in |
| Notifications | ✅ Slack, email, webhook | ✅ Via providers | ✅ Various channels |
| Drift detection | ✅ Visual in UI | ✅ CLI/metrics | ✅ Supported |
| Learning curve | Medium | Low | High |
| Best for | Teams wanting UI | GitOps purists | Full CI/CD platform |

Conclusion

GitOps provides: ...
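The drift-handling flow in Part 6 (revert modified resources, recreate missing ones, prune extras, or only alert when the relevant switch is off) can be sketched as a pure function over two dictionaries. This is an illustrative model of the reconciliation idea only, not how ArgoCD or Flux are implemented: resources are represented as plain dicts keyed by a made-up `kind/name` string, and the function name `reconcile` and its `self_heal` / `prune` flags are assumptions chosen to echo the `selfHeal` and `prune` options shown earlier.

```python
from __future__ import annotations


def reconcile(
    desired: dict[str, dict],
    live: dict[str, dict],
    self_heal: bool = True,
    prune: bool = True,
) -> tuple[dict[str, dict], list[str]]:
    """Compare Git (desired) with the cluster (live).

    Returns the resulting cluster state and a log of actions or alerts.
    """
    result = dict(live)
    events: list[str] = []

    for name, spec in desired.items():
        if name not in live:
            # Resource exists in Git but is missing from the cluster.
            if self_heal:
                result[name] = spec
                events.append(f"recreated {name}")
            else:
                events.append(f"alert: {name} missing from cluster")
        elif live[name] != spec:
            # Resource was changed by hand (for example replicas 5 -> 10).
            if self_heal:
                result[name] = spec
                events.append(f"reverted {name}")
            else:
                events.append(f"alert: {name} differs from Git")

    for name in live:
        if name not in desired:
            # Resource exists in the cluster but is not tracked in Git.
            if prune:
                del result[name]
                events.append(f"pruned {name}")
            else:
                events.append(f"alert: {name} not managed by Git")

    return result, events


desired = {
    "deployment/myapp": {"image": "myapp:v2.0", "replicas": 5},
    "service/myapp": {"port": 80},
}
live = {
    "deployment/myapp": {"image": "myapp:v2.0", "replicas": 10},
    "deployment/xyz": {"replicas": 1},
}
state, log = reconcile(desired, live)
print(log)
```

Running the same inputs with `self_heal=False, prune=False` leaves the cluster state untouched and produces only alerts, which corresponds to the "Alert Only" branches of the flowchart.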

    January 23, 2025 · 9 min · Rafiul Alam