Introduction
Release management is the process of planning, scheduling, and controlling software releases through different stages and environments. It ensures that software is released reliably, predictably, and with minimal disruption.
This guide visualizes key release management concepts:
- Semantic Versioning: Deciding when to bump major, minor, or patch versions
- Release Train: Structured release cadence with quality gates
- Hotfix Process: Fast-track critical fixes to production
- Release Checklist: Ensuring nothing is missed during deployment
- Environment Promotion: Moving code through dev, staging, and production
Part 1: Semantic Versioning Decision Tree
Understanding Version Numbers: MAJOR.MINOR.PATCH
Semantic versioning (SemVer) uses a three-part version number: MAJOR.MINOR.PATCH
Decision tree (starting point: current version v2.4.7)
1. Does the change break an existing API or functionality?
├─ Yes → 🔴 MAJOR version bump (v2.4.7 → v3.0.0)
│  ├─ Examples: remove an API endpoint, change a function signature, remove a configuration option, change a data format, rename public methods
│  └─ Breaking changes allowed; users must update code; migration guide needed
└─ No → continue with question 2
2. Does the change add new functionality?
├─ Yes, and the new feature is backward compatible → 🟡 MINOR version bump (v2.4.7 → v2.5.0)
│  ├─ Examples: add a new API endpoint, add an optional parameter, add a new feature, enhance an existing feature, add a configuration option
│  └─ New features added; backward compatible; users can upgrade safely
├─ Yes, but the new feature is not backward compatible → 🔴 MAJOR version bump
└─ No → continue with question 3
3. Is this a bug fix or patch?
├─ Yes → 🟢 PATCH version bump (v2.4.7 → v2.4.8)
│  ├─ Examples: fix a crash/error, fix a security issue, fix a performance bug, update dependencies, fix typos/docs
│  └─ Bug fixes only; no new features; backward compatible; safe to auto-update
└─ No → continue with question 4
4. Internal changes only, with no user-facing changes?
├─ Yes → No version bump
│  └─ Examples: refactor code, update dev dependencies, update CI/CD, update comments
└─ No → ⚠️ Review needed: unclear change type; consult the team and check impact
5. After a MAJOR, MINOR, or PATCH bump: is this ready for a stable release?
├─ No → Create a pre-release (v3.0.0-beta.1, v3.0.0-rc.1): for testing only, not production-ready
└─ Yes → Need build metadata?
   ├─ Yes → Add build metadata (v3.0.0+20250124, v3.0.0+build.123): does not affect precedence, informational only; then release
   └─ No → ✅ Release version: tag in Git, update CHANGELOG, publish to registry, notify users
Semantic Versioning Examples
Version History Example:
v1.0.0 - Initial release
v1.0.1 - Fix authentication bug
v1.0.2 - Fix memory leak
v1.1.0 - Add password reset feature
v1.1.1 - Fix password reset email
v1.2.0 - Add OAuth2 support
v1.2.1 - Update dependencies (security)
v2.0.0 - Remove deprecated API v1 endpoints
v2.0.1 - Fix migration script
v2.1.0-beta.1 - GraphQL API beta testing
v2.1.0-rc.1 - GraphQL API release candidate
v2.1.0 - Add GraphQL API (stable release)
Git Tagging:
# Tag a new release
git tag -a v2.5.0 -m "Release v2.5.0: Add notification system"
git push origin v2.5.0
# Tag a pre-release
git tag -a v3.0.0-beta.1 -m "Beta release for v3.0.0"
git push origin v3.0.0-beta.1
# List all version tags
git tag -l "v*" | sort -V
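Before pushing a tag, it can help to check that it follows the vMAJOR.MINOR.PATCH convention used above. The sketch below is one way to do that; the function name `is_release_tag` is hypothetical, and the pattern is deliberately simplified (it accepts an optional pre-release and build suffix but does not enforce every rule of the SemVer grammar).

```bash
# is_release_tag TAG
# Succeeds when TAG looks like vMAJOR.MINOR.PATCH with an optional
# pre-release (-beta.1) and optional build metadata (+build.123).
# Numeric parts may not have leading zeros.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq \
    '^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$'
}

for tag in v2.5.0 v3.0.0-beta.1 v3.0.0+build.123 v2.5 release-2.5.0; do
  if is_release_tag "$tag"; then
    echo "ok:      $tag"
  else
    echo "invalid: $tag"
  fi
done
```

The first three tags in the loop pass the check; `v2.5` and `release-2.5.0` are rejected.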
Part 2: Release Train Model
Structured Release Cadence
The Release Train model establishes a predictable release schedule with quality gates at each stage.
Week 1-2: Development
├─ Active development: new features added, code reviews ongoing
└─ Integration testing: developers test locally, unit tests passing, integration tests running
🎯 Week 3: Feature Freeze
├─ Feature freeze begins: ❄️ no new features, ✅ bug fixes only, 📝 documentation updates, 🧪 test improvements
└─ Code complete: all features implemented, all tests passing, code reviewed, ready for QA
🔍 Week 4: QA Testing
├─ QA testing begins: deploy to QA environment, smoke tests executed, test plan distributed
├─ Execute test plan: functional, regression, performance, and security testing, plus UAT with stakeholders
└─ Bugs found?
   ├─ Critical/High → Fix critical bugs (emergency fixes only, re-test after fix, document changes), then re-run the test plan
   ├─ Minor → Defer to next release (log in backlog; non-critical issues won't delay the release)
   └─ None → ✅ QA sign-off: all tests passed, ready for release
📦 Week 5: Release Preparation
├─ Release preparation: create release branch, update version number, generate CHANGELOG, build release artifacts, create release notes
├─ Deploy to staging: full production replica, run final smoke tests, performance validation
├─ Staging healthy?
│  ├─ No → Rollback & fix: identify issues, fix and re-deploy, re-validate
│  └─ Yes → Release Go/No-Go meeting (👥 engineering lead, product manager, QA lead, DevOps lead)
└─ Go or No-Go?
   ├─ No-Go → Postpone release: fix blocking issues, schedule new date, communicate delay
   └─ Go → ✅ Release approved: schedule deployment, notify stakeholders, prepare rollback plan
🚀 Week 6: Production Release
├─ Production deployment: off-peak hours, rolling deployment, monitor metrics closely
├─ Post-release monitoring: ⏱️ first 15 minutes critical, first 1 hour important, first 24 hours monitor; 📊 error rates, performance metrics, user feedback
└─ Production healthy?
   ├─ No → 🚨 Emergency response: assess severity, rollback or hotfix, incident response
   └─ Yes → ✅ Release successful: update status page, notify users, close release ticket, retrospective meeting, then start the next release train (Week 1 of the next cycle)
🔥 Hotfix Track (Anytime)
├─ Critical production bug: 🔴 security vulnerability, data loss risk, service outage, critical functionality broken
└─ Fast-track hotfix: skip release train, minimal testing, direct to production deployment (see Hotfix Process)
Release Train Calendar
Sprint Calendar (6-week cycle):
Week 1-2: Development Sprint
├─ Monday: Sprint planning
├─ Daily: Stand-ups, development
├─ Wednesday: Mid-sprint review
└─ Friday: Demo day
Week 3: Feature Freeze
├─ Monday: Feature freeze begins
├─ Focus: Bug fixes, tests, docs
└─ Friday: Code complete deadline
Week 4: QA Testing
├─ Monday: Deploy to QA
├─ Daily: Test execution
├─ Wednesday: Bug triage
└─ Friday: QA sign-off
Week 5: Release Prep
├─ Monday: Create release branch
├─ Wednesday: Deploy to staging
├─ Thursday: Go/No-Go meeting
└─ Friday: Final prep
Week 6: Production Release
├─ Tuesday 2 AM: Production deployment
├─ Tuesday-Wednesday: Intensive monitoring
└─ Friday: Retrospective, start next cycle
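Because the cycle repeats every six weeks, the phase for any given week follows from simple modular arithmetic. The helper below is a hypothetical sketch (the name `release_phase` and the idea of counting weeks from the first cycle are assumptions) that maps a running week number onto the calendar above.

```bash
# release_phase WEEK
# Maps a running week number (1, 2, 3, ...) onto the 6-week cycle.
release_phase() {
  week_in_cycle=$(( ($1 - 1) % 6 + 1 ))
  case "$week_in_cycle" in
    1|2) echo "Development Sprint" ;;
    3)   echo "Feature Freeze" ;;
    4)   echo "QA Testing" ;;
    5)   echo "Release Prep" ;;
    6)   echo "Production Release" ;;
  esac
}

release_phase 9   # week 3 of the second cycle: prints Feature Freeze
```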
Part 3: Hotfix Process
Fast-Track Critical Fixes to Production
When critical bugs are discovered in production, the hotfix process bypasses the normal release train.
Example incident timeline (participants: User, On-call, Incident, Dev, QA, Ops, Prod)
🔴 INCIDENT ACTIVE: 47 minutes
1. Alert: 500 errors spiking, payment processing down
2. On-call: severity assessment, then creates a P0 incident (Severity: Critical, Impact: Revenue loss)
3. Incident: assemble war room, page relevant teams, start incident timeline
4. Incident → Dev: emergency page (critical bug in production, needs immediate fix)
5. Incident → QA: emergency page (hotfix needs rapid testing)
6. Incident → Ops: emergency page (prepare for hotfix deployment); war room established, all hands on deck
7. Dev: investigate issue (check logs, review recent changes, identify root cause)
8. Dev → Incident: root cause identified: NPE in payment handler, introduced in v2.4.5
9. Dev: create hotfix branch (git checkout -b hotfix/v2.4.6) from production tag v2.4.5
10. Dev: implement fix (fix null pointer, add null checks, write regression test)
11. Dev: test locally (unit tests pass, integration tests pass, manual verification)
12. Dev → QA: hotfix ready for testing, deployed to hotfix env, reproduction steps attached
13. QA: rapid testing, 15 min (verify fix works, test critical paths, regression check, no new issues)
14. QA → Incident: ✅ QA approved, fix verified, safe to deploy
15. Incident → Ops: approval to deploy hotfix v2.4.5 → v2.4.6, execute with caution
16. Ops → Prod: deploy hotfix as a rolling deployment, 1 instance at a time, monitor carefully
17. Ops: monitor first instance (check logs, verify errors stopped, test payment flow)
18. Ops → Incident: first instance healthy, payments working, proceeding with rollout
19. Ops → Prod: deploy to remaining instances, gradual rollout 10% → 50% → 100%
20. Prod → Ops: all instances updated, version v2.4.6, status healthy
21. Ops → Incident: ✅ deployment complete, all instances on v2.4.6, error rate 0%, payments normal
22. Incident: verify resolution (monitor for 30 min, check user reports, validate metrics)
23. Incident → User: ✅ issue resolved, hotfix deployed, service restored
✅ INCIDENT RESOLVED - Service Normal
Post-incident activities:
24. Incident → Dev: merge hotfix to main, cherry-pick to develop, update release notes
25. Dev: document incident (root cause analysis, timeline, lessons learned, prevention steps)
26. Incident: close incident (duration: 47 minutes, status: resolved), schedule postmortem
Hotfix Branching Strategy
# 1. Create hotfix branch from production tag
git checkout v2.4.5 # Current production version
git checkout -b hotfix/v2.4.6
# 2. Implement fix
# ... make changes ...
git add .
git commit -m "hotfix: Fix null pointer in payment handler"
# 3. Tag hotfix version
git tag -a v2.4.6 -m "Hotfix: Fix payment processing bug"
# 4. Deploy to production
# ... deployment process ...
# 5. Merge back to main branches
git checkout main
git merge hotfix/v2.4.6
git checkout develop
git merge hotfix/v2.4.6
# 6. Push everything
git push origin main develop
git push origin v2.4.6
# 7. Clean up
git branch -d hotfix/v2.4.6
Hotfix Decision Matrix
When to use Hotfix Process:
🔴 CRITICAL (Immediate hotfix required):
├─ Production outage (service down)
├─ Data loss or corruption
├─ Security vulnerability (actively exploited)
├─ Revenue-impacting bug (payments failing)
└─ Compliance violation (legal/regulatory)
🟡 HIGH (Expedite in next release):
├─ Performance degradation (slow but working)
├─ Feature partially broken (workaround exists)
├─ Security issue (not actively exploited)
└─ Incorrect data display (non-critical)
🟢 MEDIUM/LOW (Wait for regular release):
├─ Minor UI issues
├─ Documentation errors
├─ Non-critical feature requests
└─ Nice-to-have improvements
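The matrix above can be encoded as a lookup so that triage tooling routes a bug consistently. This is a hypothetical sketch: the function name `release_path` and the lowercase severity labels and track names are assumptions made for illustration.

```bash
# release_path SEVERITY
# Prints which track a bug should follow, per the decision matrix.
release_path() {
  case "$1" in
    critical)   echo "hotfix" ;;           # immediate hotfix required
    high)       echo "next-release" ;;     # expedite in the next release
    medium|low) echo "regular-release" ;;  # wait for the regular release
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

release_path critical   # prints hotfix
```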
Part 4: Release Checklist Flow
Comprehensive Pre-Deploy to Post-Deploy Checklist
Version: v2.5.0
Scheduled: 2 AM UTC]) --> PreDeploy[📋 PRE-DEPLOY CHECKLIST] PreDeploy --> Check1[✓ Code freeze completed
✓ All tests passing
✓ QA sign-off received
✓ Release notes written] Check1 --> Check2[✓ Version bumped correctly
✓ CHANGELOG updated
✓ Git tag created
✓ Docker images built] Check2 --> Check3[✓ Staging validation passed
✓ Database migrations tested
✓ Rollback plan documented
✓ Monitoring dashboards ready] Check3 --> Check4[✓ Stakeholders notified
✓ Maintenance window scheduled
✓ On-call engineers available
✓ Incident channels ready] Check4 --> PreDeployGate{All pre-deploy
checks passed?} PreDeployGate -->|No| FixIssues[❌ Fix blocking issues
Document problems
Reschedule if needed] FixIssues --> PreDeploy PreDeployGate -->|Yes| DeployPhase[🚀 DEPLOY PHASE] DeployPhase --> Deploy1[📸 Snapshot current state
- Backup databases
- Save current configs
- Document versions
- Record metrics baseline] Deploy1 --> Deploy2[🔒 Enable maintenance mode
- Display status page
- Queue incoming requests
- Graceful degradation
- User notifications] Deploy2 --> Deploy3[💾 Run database migrations
- Take DB snapshot first
- Apply migrations
- Verify schema changes
- Test rollback script] Deploy3 --> MigrationOK{Migrations
successful?} MigrationOK -->|No| RollbackDB[⚠️ Rollback database
Restore from snapshot
Abort deployment
Investigate issue] MigrationOK -->|Yes| Deploy4[🐳 Deploy new version
- Update container images
- Rolling update
- Health checks enabled
- Monitor pod status] Deploy4 --> Deploy5[⚙️ Update configurations
- Update config maps
- Refresh secrets
- Update env variables
- Sync CDN cache] Deploy5 --> Deploy6[🔓 Disable maintenance mode
- Remove status message
- Process queued requests
- Resume normal traffic
- Update status page] Deploy6 --> VerifyPhase[🔍 VERIFY PHASE] VerifyPhase --> Verify1[🏥 Health checks
✓ /health endpoint: 200 OK
✓ /readiness: All services ready
✓ All pods running
✓ No crash loops] Verify1 --> HealthOK{Health checks
passing?} HealthOK -->|No| EmergencyRollback[🚨 EMERGENCY ROLLBACK
Revert to v2.4.9
Restore database
Alert team] HealthOK -->|Yes| Verify2[🧪 Smoke tests
✓ User login works
✓ Critical API endpoints
✓ Database connectivity
✓ External integrations] Verify2 --> SmokeOK{Smoke tests
passing?} SmokeOK -->|No| EmergencyRollback SmokeOK -->|Yes| Verify3[📊 Monitor metrics (15 min)
✓ Error rate < 0.1%
✓ Response time < baseline
✓ CPU/Memory normal
✓ Database queries normal] Verify3 --> MetricsOK{Metrics
healthy?} MetricsOK -->|No| Investigate{Issues
critical?} Investigate -->|Yes| EmergencyRollback Investigate -->|No| Monitor[⚠️ Continue monitoring
Non-critical issues
Document for review] MetricsOK -->|Yes| Verify4[👥 User validation
✓ Test user accounts
✓ Sample transactions
✓ Report generation
✓ Key user workflows] Verify4 --> UserOK{User validation
successful?} UserOK -->|No| Investigate UserOK -->|Yes| Verify5[🔐 Security verification
✓ SSL certificates valid
✓ Authentication working
✓ Authorization rules applied
✓ No exposed secrets] Verify5 --> SecurityOK{Security
checks passed?} SecurityOK -->|No| SecurityIssue[🔒 Security issue found
Assess severity
Apply hotfix if critical] SecurityOK -->|Yes| CommunicatePhase[📢 COMMUNICATE PHASE] CommunicatePhase --> Comm1[✅ Update status page
- Deployment successful
- All systems operational
- Version: v2.5.0
- Timestamp] Comm1 --> Comm2[📧 Notify stakeholders
- Send email to team
- Update Slack channels
- Notify customer success
- Update documentation] Comm2 --> Comm3[📝 Update tracking
- Close release ticket
- Update project board
- Mark version as released
- Archive release notes] Comm3 --> Comm4[📊 Share release report
- Deployment duration
- Issues encountered
- Metrics comparison
- Success metrics] Comm4 --> PostDeploy[📋 POST-DEPLOY ACTIVITIES] PostDeploy --> Post1[🔍 Extended monitoring (24h)
- Error trends
- Performance patterns
- User feedback
- Support tickets] Post1 --> Post2[📚 Documentation updates
- Update API docs
- Update user guides
- Update runbooks
- Update architecture diagrams] Post2 --> Post3[🗑️ Cleanup
- Remove old docker images
- Archive old releases
- Clean up test data
- Update dependencies] Post3 --> Post4[🔄 Retrospective meeting
- What went well?
- What went wrong?
- Action items
- Process improvements] Post4 --> Complete[✅ RELEASE COMPLETE!
Version v2.5.0 in production
All systems healthy
Team notified] EmergencyRollback --> PostMortem[📋 Post-mortem required
- Root cause analysis
- Timeline reconstruction
- Lessons learned
- Prevention measures] style Complete fill:#064e3b,stroke:#10b981 style EmergencyRollback fill:#7f1d1d,stroke:#ef4444 style RollbackDB fill:#7f1d1d,stroke:#ef4444 style SecurityIssue fill:#78350f,stroke:#f59e0b style Monitor fill:#78350f,stroke:#f59e0b
Release Checklist Template
# Release Checklist v2.5.0
## Pre-Deploy (T-24 hours)
- [ ] All features code-complete and merged
- [ ] All tests passing (unit, integration, e2e)
- [ ] QA sign-off received
- [ ] Security scan completed (no critical issues)
- [ ] Performance tests passed
- [ ] Release notes written
- [ ] CHANGELOG.md updated
- [ ] Version bumped in package.json / version files
- [ ] Git tag created: v2.5.0
- [ ] Docker images built and pushed
- [ ] Database migration scripts ready
- [ ] Rollback procedures documented
- [ ] Staging environment validated
- [ ] Monitoring dashboards configured
- [ ] Alerting rules updated
## Team Coordination
- [ ] Stakeholders notified (email sent)
- [ ] Maintenance window scheduled
- [ ] On-call engineers confirmed
- [ ] Incident Slack channel ready
- [ ] War room link shared
- [ ] Rollback team identified
## Deploy Phase
- [ ] Backup current production state
- [ ] Snapshot production databases
- [ ] Enable maintenance mode
- [ ] Run database migrations
- [ ] Deploy new application version
- [ ] Update configurations
- [ ] Disable maintenance mode
- [ ] Verify deployment completed
## Verification (First 15 minutes)
- [ ] Health endpoints returning 200 OK
- [ ] All pods/containers running
- [ ] No crash loops detected
- [ ] Smoke tests passing
- [ ] Critical user flows working
- [ ] Error rate < 0.1%
- [ ] Response times normal
- [ ] Database queries performing well
- [ ] External integrations working
## Communication
- [ ] Update status page to "All Systems Operational"
- [ ] Send success notification to team
- [ ] Notify customer success team
- [ ] Update documentation
- [ ] Close release ticket
- [ ] Share release metrics
## Post-Deploy (First 24 hours)
- [ ] Monitor error rates
- [ ] Monitor performance metrics
- [ ] Review user feedback
- [ ] Check support tickets
- [ ] Archive old releases
- [ ] Schedule retrospective
## Sign-Off
- [ ] Engineering Lead: _______________
- [ ] DevOps Lead: _______________
- [ ] Product Manager: _______________
- [ ] QA Lead: _______________
Deployment Time: _______
Duration: _______
Issues: _______
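A checklist kept as a Markdown file can also act as an automated gate. The sketch below counts unchecked boxes and succeeds only when none remain. The function names and the file name `release-checklist.md` are hypothetical, and the check assumes the `- [ ]` / `- [x]` convention used in the template.

```bash
# unchecked_items FILE
# Prints the number of unchecked "- [ ]" items in a Markdown checklist.
unchecked_items() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  grep -c '^- \[ \]' "$1" || true
}

# checklist_complete FILE
# Succeeds only when no unchecked "- [ ]" items remain.
checklist_complete() {
  [ "$(unchecked_items "$1")" -eq 0 ]
}
```

A deploy script could then start with `checklist_complete release-checklist.md || exit 1` so that an incomplete checklist blocks the release.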
Part 5: Environment Promotion Flow
Progressive Deployment Through Environments
Local Development
├─ 🔧 Write code, 🧪 unit tests, 🐳 Docker Compose
├─ Data: Mocked/Sample
├─ Users: 1 developer
└─ git push → Dev Environment
🔨 Dev Environment (Development Server)
├─ 🚀 Auto-deploy on merge, 🔄 latest develop branch, 🗄️ shared dev database, 🎯 integration testing
├─ Data: Synthetic
├─ Users: Dev team
├─ Gateway: dev.api.internal; User Service: v2.5.0-dev; Note Service: v1.8.0-dev; PostgreSQL: 15; Redis: 7.0
├─ Automated tests: ✓ linting, ✓ unit tests, ✓ integration tests, ✓ code coverage > 80%
└─ Tests pass?
   ├─ Fail → Fix in Dev (debug and repair), then redeploy to Dev
   └─ Pass → Promote to QA: 1. Tag release candidate, 2. Build QA images, 3. Update QA configs, 4. Deploy to QA, 5. Run smoke tests
🧪 QA/Test Environment (QA Server)
├─ 📦 Release candidate, 🎭 production-like, 🗄️ QA database, 🔍 manual testing
├─ Data: Anonymized prod copy
├─ Users: QA team + PMs
├─ Gateway: qa.api.internal; User Service: v2.5.0-rc1; Note Service: v1.8.0-rc1; PostgreSQL: 15; Redis: 7.0
├─ QA test suite: ✓ functional tests, ✓ regression tests, ✓ UAT, ✓ performance tests, ✓ security scan
└─ QA approved?
   ├─ Fail → Fix critical issues, return to dev
   └─ Pass → Promote to Staging: 1. Create release tag, 2. Build prod images, 3. Deploy to staging, 4. Run full test suite, 5. Stakeholder review
🎬 Staging Environment (Staging Server)
├─ 🎯 Production replica, 📊 same infrastructure, 🗄️ staging database, 🚀 final validation
├─ Data: Recent prod snapshot
├─ Users: Stakeholders
├─ Gateway: staging.api.company.com; User Service: v2.5.0; Note Service: v1.8.0; PostgreSQL: 15 (Primary + Replica); Redis: 7.0 (Cluster mode)
├─ Pre-production tests: ✓ smoke tests, ✓ load tests, ✓ disaster recovery, ✓ migration dry-run, ✓ monitoring validation
└─ Staging validated?
   ├─ Fail → Fix & re-validate in staging (critical issues only)
   └─ Pass → Promote to Production: 1. Go/No-Go meeting, 2. Backup production, 3. Maintenance window, 4. Database migration, 5. Rolling deployment, 6. Gradual traffic shift
🌐 Production Environment (Production Servers)
├─ 🚀 Live traffic, 🔒 high security, 🗄️ production database, 📊 full monitoring
├─ Data: Real customer data
├─ Users: All customers
├─ Gateway: api.company.com
│  └─ Load Balancer
│     ├─ User Service: v2.5.0 (×3 replicas)
│     └─ Note Service: v1.8.0 (×3 replicas)
├─ PostgreSQL: 15 (HA cluster), Primary + 2 Replicas
├─ Redis: 7.0 (Sentinel cluster), 3 nodes + 3 sentinels
├─ Production monitoring: 📊 real-time metrics, 🚨 alerting, 📈 business KPIs, 🔍 error tracking, 📱 on-call rotation
└─ Production healthy?
   ├─ No → 🚨 Rollback: 1. Revert deployment, 2. Restore database, 3. Alert team, 4. Incident response
   └─ Yes → ✅ Success: continue monitoring, document release, start next cycle
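The promotion rules can be summarized as a small state machine: a passing gate moves a build one environment forward, a QA failure sends it back to dev, and dev or staging failures are fixed and re-validated in place. The sketch below illustrates that logic; the function names and the lowercase environment labels are hypothetical.

```bash
# next_environment CURRENT
# Prints the environment that follows CURRENT in the promotion order
# local → dev → qa → staging → production.
next_environment() {
  case "$1" in
    local)      echo "dev" ;;
    dev)        echo "qa" ;;
    qa)         echo "staging" ;;
    staging)    echo "production" ;;
    production) echo "production is the final environment" >&2; return 1 ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# promote CURRENT GATE_RESULT
# Prints where the build goes next given the gate result (pass or fail).
promote() {
  if [ "$2" = "pass" ]; then
    next_environment "$1"
  elif [ "$1" = "qa" ]; then
    echo "dev"   # QA failures return to dev for fixing
  else
    echo "$1"    # dev and staging are fixed and re-validated in place
  fi
}

promote qa pass   # prints staging
```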
Part 6: Practical Configuration Examples
Microservices Architecture Setup
Our example includes:
- Gateway: API Gateway (Kong/NGINX)
- User Service: User authentication and management
- Note Service: Note CRUD operations
- PostgreSQL: Primary database
- Redis: Caching and sessions
Docker Compose for Local Development
# docker-compose.yml
version: '3.8'
services:
  # API Gateway
  gateway:
    image: kong:3.4
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: postgres
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: kong
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: 0.0.0.0:8001
    ports:
      - "8000:8000" # Proxy
      - "8001:8001" # Admin API
    depends_on:
      - postgres
      - redis
  # User Service
  user-service:
    build: ./services/user-service
    image: myapp/user-service:dev
    environment:
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/users
      REDIS_URL: redis://redis:6379/0
      JWT_SECRET: dev-secret-key
      LOG_LEVEL: debug
    ports:
      - "3001:3000"
    depends_on:
      - postgres
      - redis
    volumes:
      - ./services/user-service:/app
    command: npm run dev
  # Note Service
  note-service:
    build: ./services/note-service
    image: myapp/note-service:dev
    environment:
      DATABASE_URL: postgresql://postgres:postgres@postgres:5432/notes
      REDIS_URL: redis://redis:6379/1
      USER_SERVICE_URL: http://user-service:3000
      LOG_LEVEL: debug
    ports:
      - "3002:3000"
    depends_on:
      - postgres
      - redis
      - user-service
    volumes:
      - ./services/note-service:/app
    command: npm run dev
  # PostgreSQL Database
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: myapp
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./db/init:/docker-entrypoint-initdb.d
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
  # Redis Cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  postgres-data:
  redis-data:
Kubernetes Manifests for Production
# k8s/production/user-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: production
  labels:
    app: user-service
    version: v2.5.0
    tier: backend
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
        version: v2.5.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      containers:
        - name: user-service
          image: myregistry.com/user-service:v2.5.0
          imagePullPolicy: Always
          ports:
            - containerPort: 3000
              name: http
            - containerPort: 9090
              name: metrics
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: user-service-db-url
            - name: REDIS_URL
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: redis-url
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: jwt-secret
                  key: secret
            - name: LOG_LEVEL
              value: "info"
            - name: VERSION
              value: "v2.5.0"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: production
spec:
  selector:
    app: user-service
  ports:
    - name: http
      port: 80
      targetPort: 3000
    - name: metrics
      port: 9090
      targetPort: 9090
  type: ClusterIP
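After applying a manifest like the one above, the rolling update can be watched and reverted automatically if it does not become healthy. The sketch below uses `kubectl rollout status` and `kubectl rollout undo`. The function name `verify_rollout`, the 120-second timeout, and the `KUBECTL` override variable (which allows a dry run without a cluster) are assumptions for illustration.

```bash
# verify_rollout DEPLOYMENT NAMESPACE
# Waits for the rolling update to finish; on failure or timeout, reverts
# the deployment to its previous revision.
verify_rollout() {
  kubectl_cmd=${KUBECTL:-kubectl}   # hypothetical override hook for testing
  if "$kubectl_cmd" rollout status "deployment/$1" -n "$2" --timeout=120s; then
    echo "rollout healthy"
  elif "$kubectl_cmd" rollout undo "deployment/$1" -n "$2"; then
    echo "rolled back"
    return 1
  else
    echo "rollback failed" >&2
    return 2
  fi
}
```

For the manifest above the call would be `verify_rollout user-service production`. Because `maxUnavailable` is 0, the old pods keep serving traffic while the new ones are being checked.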
Database Migration Configuration
# db/migrations/config.yml
# Flyway database migration configuration
# Flyway expects JDBC URLs (jdbc:postgresql://...); the ${...} values are supplied per environment
development:
  url: jdbc:postgresql://localhost:5432/myapp
  user: postgres
  password: postgres
  schemas:
    - users
    - notes
  locations:
    - filesystem:./migrations
  baselineOnMigrate: true
staging:
  url: ${STAGING_DATABASE_URL}
  user: ${DB_USER}
  password: ${DB_PASSWORD}
  schemas:
    - users
    - notes
  locations:
    - filesystem:./migrations
  validateOnMigrate: true
  outOfOrder: false
production:
  url: ${PRODUCTION_DATABASE_URL}
  user: ${DB_USER}
  password: ${DB_PASSWORD}
  schemas:
    - users
    - notes
  locations:
    - filesystem:./migrations
  validateOnMigrate: true
  outOfOrder: false
  # Create backup before migration
  callbacks:
    - BeforeMigrate: scripts/backup-db.sh
    - AfterMigrateError: scripts/rollback-db.sh
Version Management Script
#!/bin/bash
# scripts/release.sh
# Automated release management script
set -e
CURRENT_VERSION=$(cat VERSION)
echo "Current version: $CURRENT_VERSION"
# Parse semantic version (strip only the leading "v", then split on dots)
IFS='.' read -ra VERSION_PARTS <<< "${CURRENT_VERSION#v}"
MAJOR=${VERSION_PARTS[0]}
MINOR=${VERSION_PARTS[1]}
PATCH=${VERSION_PARTS[2]}
# Function to bump version
bump_version() {
  local bump_type=$1
  case $bump_type in
    major)
      MAJOR=$((MAJOR + 1))
      MINOR=0
      PATCH=0
      ;;
    minor)
      MINOR=$((MINOR + 1))
      PATCH=0
      ;;
    patch)
      PATCH=$((PATCH + 1))
      ;;
    *)
      echo "Invalid bump type: $bump_type"
      echo "Usage: $0 {major|minor|patch}"
      exit 1
      ;;
  esac
  NEW_VERSION="v${MAJOR}.${MINOR}.${PATCH}"
  echo "New version: $NEW_VERSION"
}
# Update version files
update_version_files() {
  echo "$NEW_VERSION" > VERSION
  # Update package.json
  if [ -f package.json ]; then
    npm version "${NEW_VERSION#v}" --no-git-tag-version
  fi
  # Update each service's package.json, running npm inside the service
  # directory and skipping anything under node_modules
  find services -name package.json -not -path "*/node_modules/*" |
    while read -r pkg; do
      (cd "$(dirname "$pkg")" && npm version "${NEW_VERSION#v}" --no-git-tag-version) || exit 1
    done
}
# Generate changelog
generate_changelog() {
  echo "Generating changelog..."
  touch CHANGELOG.md # first release: start from an empty changelog
  # Collect commit subjects since the most recent tag
  git log "$(git describe --tags --abbrev=0)..HEAD" \
    --pretty=format:"- %s (%h)" > CHANGELOG.md.tmp
  {
    echo "## $NEW_VERSION ($(date +%Y-%m-%d))"
    echo ""
    cat CHANGELOG.md.tmp
    echo ""
    echo ""
    cat CHANGELOG.md
  } > CHANGELOG.md.new
  mv CHANGELOG.md.new CHANGELOG.md
  rm CHANGELOG.md.tmp
}
# Create git tag
create_git_tag() {
  git add VERSION CHANGELOG.md package.json services/*/package.json
  git commit -m "chore: Bump version to $NEW_VERSION"
  git tag -a "$NEW_VERSION" -m "Release $NEW_VERSION"
  echo "Created tag: $NEW_VERSION"
  echo "Push with: git push origin main --tags"
}
# Build docker images
build_docker_images() {
  echo "Building Docker images..."
  docker build -t "myregistry.com/user-service:$NEW_VERSION" ./services/user-service
  docker build -t "myregistry.com/note-service:$NEW_VERSION" ./services/note-service
  docker build -t "myregistry.com/gateway:$NEW_VERSION" ./services/gateway
  echo "Built images with tag: $NEW_VERSION"
}
# Main execution
main() {
  if [ $# -eq 0 ]; then
    echo "Usage: $0 {major|minor|patch}"
    exit 1
  fi
  bump_version "$1"
  update_version_files
  generate_changelog
  create_git_tag
  build_docker_images
  echo ""
  echo "✅ Release $NEW_VERSION prepared successfully!"
  echo ""
  echo "Next steps:"
  echo "1. Review changes: git show $NEW_VERSION"
  echo "2. Push to remote: git push origin main --tags"
  echo "3. Push images: docker push myregistry.com/user-service:$NEW_VERSION"
  echo "4. Create GitHub release"
  echo "5. Deploy to staging"
}
main "$@"
Environment-Specific Configurations
# config/environments.yaml
# Environment-specific configuration
local:
  services:
    user-service:
      replicas: 1
      resources:
        memory: 256Mi
        cpu: 100m
    note-service:
      replicas: 1
      resources:
        memory: 256Mi
        cpu: 100m
  database:
    host: localhost
    replicas: 1
  redis:
    mode: standalone
dev:
  services:
    user-service:
      replicas: 1
      resources:
        memory: 512Mi
        cpu: 250m
    note-service:
      replicas: 1
      resources:
        memory: 512Mi
        cpu: 250m
  database:
    host: dev-postgres.internal
    replicas: 1
  redis:
    mode: standalone
  domain: dev.api.internal
staging:
  services:
    user-service:
      replicas: 2
      resources:
        memory: 1Gi
        cpu: 500m
    note-service:
      replicas: 2
      resources:
        memory: 1Gi
        cpu: 500m
  database:
    host: staging-postgres.internal
    replicas: 2 # Primary + 1 replica
  redis:
    mode: cluster
    nodes: 3
  domain: staging.api.company.com
production:
  services:
    user-service:
      replicas: 3
      resources:
        memory: 2Gi
        cpu: 1000m
    note-service:
      replicas: 3
      resources:
        memory: 2Gi
        cpu: 1000m
  database:
    host: prod-postgres.internal
    replicas: 3 # Primary + 2 replicas
    backup:
      enabled: true
      schedule: "0 2 * * *" # Daily at 2 AM
  redis:
    mode: sentinel
    nodes: 3
    sentinels: 3
  domain: api.company.com
  monitoring:
    enabled: true
    retention: 90d
Part 7: Release Management Best Practices
Key Principles
1. Semantic Versioning:
MAJOR.MINOR.PATCH
↓ ↓ ↓
2 . 5 . 0
MAJOR: Breaking changes (v2 → v3)
MINOR: New features, backward compatible (v2.4 → v2.5)
PATCH: Bug fixes, backward compatible (v2.5.0 → v2.5.1)
2. Release Frequency:
Microservices:
├─ Hotfixes: As needed (minutes to hours)
├─ Patch releases: Weekly
├─ Minor releases: Bi-weekly or monthly
└─ Major releases: Quarterly or annually
Monoliths:
├─ Hotfixes: As needed
├─ Regular releases: Weekly or bi-weekly
└─ Major releases: Quarterly
3. Branching Strategy:
main (production)
├─ Tag: v2.5.0
├─ Tag: v2.4.9
└─ Tag: v2.4.8
develop (integration)
├─ feature/user-auth
├─ feature/notifications
└─ bugfix/login-timeout
hotfix/v2.5.1
└─ Critical production fix
4. Rollback Strategy:
Deployment Methods:
├─ Blue-Green: Zero-downtime rollback
├─ Canary: Gradual rollout, partial rollback
└─ Rolling: Progressive rollback
Rollback Triggers:
├─ Error rate > 1%
├─ Response time > 2x baseline
├─ Health checks failing
└─ Critical functionality broken
5. Communication:
Before Release:
├─ Maintenance window notification (24h advance)
├─ Release notes to stakeholders
└─ Team coordination meeting
During Release:
├─ Real-time updates in Slack
├─ Status page updates
└─ War room for incidents
After Release:
├─ Success notification
├─ Release metrics report
└─ Retrospective meeting
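The rollback triggers listed under Rollback Strategy lend themselves to an automated check. The sketch below covers the three measurable triggers (error rate above 1%, response time above 2x baseline, failing health checks). "Critical functionality broken" still needs human judgment. The function name and argument order are hypothetical, and `awk` is used because plain shell arithmetic cannot compare decimals.

```bash
# should_rollback ERROR_RATE_PERCENT RESPONSE_MS BASELINE_MS HEALTHY
# Succeeds (exit 0) when any rollback trigger fires. HEALTHY is "yes" or "no".
should_rollback() {
  if [ "$4" = "no" ]; then
    return 0   # health checks failing
  fi
  # Exit 0 when error rate > 1% or response time > 2x baseline
  awk -v rate="$1" -v resp="$2" -v base="$3" \
    'BEGIN { exit !(rate > 1 || resp > 2 * base) }'
}

if should_rollback 1.5 180 100 yes; then
  echo "rollback"   # printed here: the 1.5% error rate exceeds the 1% trigger
fi
```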
Conclusion
Effective release management combines:
- Clear Versioning: Semantic versioning provides predictable version numbers
- Structured Process: Release trains create predictable cadence
- Fast Response: Hotfix process handles critical issues
- Comprehensive Checks: Checklists prevent missed steps
- Progressive Deployment: Environment promotion reduces risk
Key Benefits:
- Predictable release schedule
- Reduced deployment risk
- Faster time to market
- Better stakeholder communication
- Improved quality and stability
Success Metrics:
- Deployment frequency
- Mean time to recovery (MTTR)
- Change failure rate
- Lead time for changes
The visual diagrams in this guide demonstrate how releases flow from development through production, making the entire release management process transparent and repeatable.
Ship with confidence, release with precision!