Merge Request Quality Gates
Introduction
Merge request quality gates operate at Stage 3 of the CD Model, serving as the primary approval checkpoint before code merges to the main branch. Stage 3 combines automated validation (repeated pre-commit checks plus integration tests) with human judgment (peer review).
This stage is unique because it's the only stage in the CD Model that always requires manual human approval - the peer reviewer's sign-off. In the RA pattern, Stage 3 is the first-level approval (code quality). In the CDe pattern, Stage 3 becomes a combined first and second-level approval (code quality AND production approval).
The Dual Nature of Stage 3
Automated Validation
What runs automatically:
- All pre-commit checks (formatting, linting, unit tests, security)
- Integration tests (service-to-service interactions)
- Code coverage analysis
- Static analysis security testing (SAST)
- Dependency vulnerability scanning
- Container image scanning (if applicable)
- Documentation build verification
Why repeat pre-commit checks: Even though Stage 2 already validated these, Stage 3 repeats them because:
- Developers might bypass local checks
- Environment consistency (clean CI environment)
- Creates formal record for audit trail
- Multiple commits might have been pushed
- Base branch might have changed since local check
Parallel execution: While the peer reviewer examines the code, automated checks run in parallel. By the time review is complete, automated validation results are available.
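A minimal sketch of how a CI helper might fan the Stage 3 checks out concurrently is shown below. The check commands (`make lint`, `make test`, `make sast`) are placeholders for whatever your pipeline actually runs, and most CI systems express this fan-out as parallel jobs in pipeline configuration rather than in code.

```go
package main

import (
	"log"
	"os/exec"
	"sync"
)

func main() {
	// Placeholder commands; substitute the real Stage 3 checks for your project.
	checks := map[string][]string{
		"lint":       {"make", "lint"},
		"unit-tests": {"make", "test"},
		"sast":       {"make", "sast"},
	}

	var wg sync.WaitGroup
	var mu sync.Mutex
	failed := false

	for name, args := range checks {
		wg.Add(1)
		go func(name string, args []string) {
			defer wg.Done()
			out, err := exec.Command(args[0], args[1:]...).CombinedOutput()
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed = true
				log.Printf("check %s failed: %v\n%s", name, err, out)
				return
			}
			log.Printf("check %s passed", name)
		}(name, args)
	}

	wg.Wait()
	if failed {
		log.Fatal("one or more Stage 3 checks failed")
	}
}
```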
Human Judgment
What humans evaluate:
- Design quality and architecture alignment
- Code clarity and maintainability
- Security implications requiring context
- Business logic correctness
- Test coverage adequacy (not just percentage)
- Documentation completeness
- API contract compatibility
- Error handling appropriateness
Why humans are essential: Automation catches mechanical issues (syntax, style, test failures). Humans catch:
- Subtle logic bugs
- Design smells (over-engineering, tight coupling)
- Security vulnerabilities requiring domain knowledge
- Missing edge cases in tests
- Unclear naming or confusing structure
- Inappropriate abstractions
The balance: Automated checks enforce minimum standards. Human review ensures code is not just correct, but maintainable, secure, and well-designed.
Peer Review Philosophy
What Makes a Good Review
Focus on substance over style:
- ❌ "This variable should be named
userDatanotdata" (linter should catch) - ✅ "This function has three responsibilities - should we split it?"
Ask questions, don't demand:
- ❌ "Change this to use Strategy pattern"
- ✅ "Have you considered the Strategy pattern here? It might make testing easier"
Understand intent before critiquing:
- Ask "Why did you approach it this way?" before suggesting alternatives
- Context might justify an unusual approach
Be specific and actionable:
- ❌ "This is confusing"
- ✅ "The variable name
xdoesn't indicate it's a timeout duration. ConsiderrequestTimeout?"
Review Scope
What to review carefully:
- Public APIs: Changes to interfaces, contracts, exported functions
- Security-sensitive code: Authentication, authorization, data validation, encryption
- Error handling: Are errors properly wrapped, logged, and propagated?
- Test coverage: Are critical paths tested? Are tests meaningful?
- Performance implications: Nested loops, N+1 queries, unbounded operations
What to skim:
- Mechanical changes: Formatting, imports, minor refactoring
- Boilerplate: Standard CRUD operations, config files
- Generated code: Unless generator changed
What to trust automation for:
- Code style: Linters enforce this
- Test execution: CI confirms tests pass
- Coverage percentage: Coverage tools report this
- Known vulnerabilities: Security scanners detect these
The "Two Approvals" Rule
Many teams require two approving reviews for sensitive changes:
- Changes to authentication/authorization
- Database schema migrations
- API contract changes
- Security-critical code
- Infrastructure changes
This provides additional safety for high-risk changes.
Automated Checks
Stage 3 runs comprehensive automated validation in the CI/CD pipeline (Build Agents environment).
Pre-commit Checks (Repeated)
Why repeat:
- Enforce for all developers (even those who bypass locally)
- Clean environment (no local configuration drift)
- Formal record for compliance/audit
What's repeated:
- Code formatting verification
- Linting (full, not fast mode)
- Unit tests with coverage
- Secret detection
- Dependency vulnerability scanning
- Build verification
Integration Tests
New at Stage 3: Integration tests validate interactions between components.
Characteristics:
- Test service-to-service communication
- May use database (test instance)
- May use message queues, caches
- Run in isolated test environment
- Slower than unit tests (seconds to minutes)
Examples:
- API endpoint tests (HTTP requests/responses)
- Database query integration
- Message queue pub/sub
- Cache integration
- External service integration (with test doubles)
Why not at Stage 2: Integration tests are too slow for pre-commit (they would blow the 5-10 minute budget) and require infrastructure (databases, other services) not available on developer laptops.
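A minimal sketch of an API endpoint integration test is shown below, using Go's standard `net/http/httptest` package. The `/healthz` handler is a hypothetical stand-in; a real integration test would exercise a deployed test instance with its database and other dependencies behind it.

```go
package api_test

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// healthHandler is a stand-in for a real service endpoint.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	w.Write([]byte(`{"status":"ok"}`))
}

func TestHealthEndpoint(t *testing.T) {
	// Start an in-process test server; a real integration test would
	// target a test environment with its own database and dependencies.
	srv := httptest.NewServer(http.HandlerFunc(healthHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/healthz")
	if err != nil {
		t.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200, got %d", resp.StatusCode)
	}
}
```

Tests like this run in the CI environment at Stage 3, where the required infrastructure can be provisioned per pipeline run.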
Code Coverage Analysis
What's measured:
- Line coverage: % of code lines executed by tests
- Branch coverage: % of conditional branches tested
- Function coverage: % of functions called by tests
Typical thresholds:
- Minimum: 80% line coverage
- Goal: 90%+ line coverage
- Critical code: 100% coverage (authentication, payment, security)
Why at Stage 3: Coverage analysis requires running the full test suite. The results inform the peer reviewer: "Are critical paths tested?"
Coverage is necessary but not sufficient:
- 100% coverage with poor assertions = false confidence
- Reviewers must examine test quality, not just percentage
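The gap between coverage percentage and test quality is easy to see in code. Both tests below execute the same lines and count identically toward coverage, but only the second would catch a broken calculation. `ApplyDiscount` is a hypothetical function used purely for illustration.

```go
package pricing_test

import "testing"

// ApplyDiscount is a hypothetical function under test.
func ApplyDiscount(price, percent float64) float64 {
	return price - price*percent/100
}

// Weak: executes the code (counts toward coverage) but asserts nothing useful.
func TestApplyDiscount_Weak(t *testing.T) {
	_ = ApplyDiscount(100, 10) // "no panic" is the only thing verified
}

// Meaningful: checks actual behavior, including a boundary case.
func TestApplyDiscount_Meaningful(t *testing.T) {
	if got := ApplyDiscount(100, 10); got != 90 {
		t.Errorf("ApplyDiscount(100, 10) = %v, want 90", got)
	}
	if got := ApplyDiscount(100, 0); got != 100 {
		t.Errorf("ApplyDiscount(100, 0) = %v, want 100", got)
	}
}
```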
Static Application Security Testing (SAST)
Tools: semgrep, gosec, bandit, eslint-plugin-security
What's detected:
- SQL injection vulnerabilities
- XSS vulnerabilities
- Insecure deserialization
- Hardcoded secrets (redundant with Stage 2, but deeper scan)
- Insecure cryptography usage
- Command injection risks
Why at Stage 3: SAST tools can be slow (minutes) and noisy (many false positives). Stage 3 is appropriate for comprehensive analysis.
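As a generic illustration (not tied to any specific tool's ruleset), a SAST scanner such as gosec or semgrep will typically flag the string-concatenated query below and accept the parameterized version.

```go
package store

import "database/sql"

// Likely flagged by SAST: user input concatenated into the query enables SQL injection.
func findUserUnsafe(db *sql.DB, email string) (*sql.Rows, error) {
	return db.Query("SELECT id, name FROM users WHERE email = '" + email + "'")
}

// Safe: parameterized query; the driver handles escaping.
func findUser(db *sql.DB, email string) (*sql.Rows, error) {
	return db.Query("SELECT id, name FROM users WHERE email = ?", email)
}
```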
Handling false positives:
- Review and suppress with justification
- Document why flagged code is safe
- Track suppressions for periodic review
Dependency Scanning
Deeper than Stage 2:
- All severity levels (not just critical/high)
- Transitive dependencies
- License compliance
- Deprecated packages
Actions:
- Critical/High: Block merge
- Medium: Warning, allow merge with plan to address
- Low: Informational only
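This policy can be encoded as a small gate in the pipeline. The sketch below assumes the scanner's findings have already been parsed into a list of severity labels; the label names and exit behavior are illustrative rather than tied to any particular scanner's output format.

```go
package main

import (
	"fmt"
	"os"
)

// decide maps scanner findings to a merge decision, mirroring the policy above.
func decide(severities []string) (block bool, warnings int) {
	for _, s := range severities {
		switch s {
		case "CRITICAL", "HIGH":
			block = true
		case "MEDIUM":
			warnings++
		}
		// LOW findings are informational only.
	}
	return block, warnings
}

func main() {
	// Example findings, as if parsed from a scanner report.
	findings := []string{"LOW", "MEDIUM", "HIGH"}

	block, warnings := decide(findings)
	fmt.Printf("medium-severity warnings: %d\n", warnings)
	if block {
		fmt.Println("critical/high vulnerability found: blocking merge")
		os.Exit(1)
	}
}
```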
Documentation Verification
What's checked:
- Documentation builds without errors
- API documentation updated (OpenAPI, Swagger)
- README reflects new features
- Changelog entry added
- Inline code documentation for public APIs
Why at Stage 3: Documentation is part of the deliverable. Don't merge incomplete work.
RA vs CDe Pattern Differences
Stage 3 approval has profoundly different meanings in the two CD Model patterns:
Release Approval (RA) Pattern
Stage 3 = First-Level Approval Only
When a peer reviewer approves in RA pattern, they're saying:
- ✅ "This code is well-designed and correct"
- ✅ "Tests adequately cover the changes"
- ✅ "Security has been considered"
- ❌ NOT saying: "This is ready for production"
Production approval happens later:
- Stage 9 (Release Approval): Release manager evaluates production readiness
- Separate decision: code quality vs. production readiness
Approval scope:
- Code quality
- Design and architecture
- Test coverage
- Security considerations
Typical cycle:
- Code merged at Stage 3
- Validation in Stages 5-6
- Manual approval at Stage 9 before production
Continuous Deployment (CDe) Pattern
Stage 3 = Combined First and Second-Level Approval
When a peer reviewer approves in CDe pattern, they're saying:
- ✅ "This code is well-designed and correct"
- ✅ "Tests adequately cover the changes"
- ✅ "Security has been considered"
- ✅ "This is ready for production deployment"
Production approval happens now:
- By merging, the reviewer approves production deployment
- Stage 9 becomes automated (no human approval)
Approval scope:
- Everything from RA pattern, PLUS
- Production deployment approval
- Business risk assessment
- Release timing consideration (is now a good time to deploy?)
Typical cycle:
- Code merged at Stage 3 = production approved
- Automated validation in Stages 5-6
- Automated progression to production (2-4 hours)
Why this works:
- Comprehensive automated testing (Stages 5-6) provides confidence
- Feature flags decouple deployment from release (Stage 12)
- Team trusts automated validation
- Rollback is fast and automated
Requirements for CDe pattern:
- Robust automated testing (unit, integration, acceptance, performance, security)
- Feature flags for runtime control of feature exposure (see the sketch after this list)
- Automated rollback on threshold breaches
- Team culture that trusts automation and peer review
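As an illustration of the feature flag requirement, the sketch below shows about the smallest possible implementation: a runtime lookup that defaults to "off". The environment-variable lookup and the `FEATURE_` prefix are assumptions made for brevity; production systems usually query a flag service that supports per-user or percentage rollouts.

```go
package flags

import "os"

// Enabled reports whether a feature flag is turned on. Flags are read from
// environment variables here (e.g. FEATURE_NEW_CHECKOUT=on); a production
// system would typically query a flag service instead.
func Enabled(name string) bool {
	return os.Getenv("FEATURE_"+name) == "on"
}
```

Because new code paths stay dark until the flag is flipped, merging (and therefore deploying) at Stage 3 does not expose the feature to users; release remains a separate, reversible decision at Stage 12.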
Code Coverage Thresholds
Why Coverage Matters
Code coverage indicates how much of your code is executed by tests. High coverage reduces (but doesn't eliminate) the risk of undetected bugs.
Benefits:
- Confidence that logic is tested
- Catch regressions (changes that break existing functionality)
- Force developers to consider test cases
- Create safety net for refactoring
Limitations:
- Coverage ≠ quality (can have 100% coverage with meaningless assertions)
- Some code is hard to test (UI, infrastructure)
- Chasing 100% can lead to diminishing returns
Setting Thresholds
Typical thresholds:
- 80%: Minimum for merge approval (catches untested major paths)
- 90%: Goal for mature codebases
- 100%: Only for critical code (payment, security, compliance)
Why 80%:
- Balances thoroughness with pragmatism
- Catches major gaps without perfection paralysis
- Achievable without excessive effort
Why not 100%:
- Diminishing returns (last 5% is hardest)
- Some code is not worth testing (trivial getters/setters)
- Over-focus on percentage misses test quality
Coverage Requirements by Code Type
Critical code (100% required):
- Authentication and authorization
- Payment processing
- Encryption and security
- Data validation at trust boundaries
Business logic (90%+ required):
- Core domain logic
- Calculation engines
- Workflow state machines
- Complex algorithms
Standard code (80%+ required):
- API handlers
- Service layer
- Repository layer
- Utility functions
Infrastructure code (70%+ acceptable):
- Configuration loading
- Logging setup
- Framework boilerplate
- Generated code
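These tiers can be enforced mechanically by mapping code paths to minimum thresholds and comparing them against measured coverage. In the sketch below, the package paths, threshold values, and measured numbers are all illustrative, and the per-package coverage is assumed to have already been extracted from your coverage report.

```go
package main

import (
	"fmt"
	"strings"
)

// thresholds maps path prefixes to minimum line coverage, mirroring the tiers above.
var thresholds = []struct {
	prefix string
	min    float64
}{
	{"internal/auth/", 100},   // critical code
	{"internal/billing/", 90}, // business logic
	{"internal/api/", 80},     // standard code
	{"internal/config/", 70},  // infrastructure code
}

func requiredCoverage(pkg string) float64 {
	for _, t := range thresholds {
		if strings.HasPrefix(pkg, t.prefix) {
			return t.min
		}
	}
	return 80 // default for anything unmatched
}

func main() {
	// Example measured coverage, as if parsed from a coverage report.
	measured := map[string]float64{
		"internal/auth/tokens": 100,
		"internal/billing/tax": 88,
		"internal/config/load": 72,
	}

	failed := false
	for pkg, cov := range measured {
		if req := requiredCoverage(pkg); cov < req {
			failed = true
			fmt.Printf("%s: %.1f%% < required %.1f%%\n", pkg, cov, req)
		}
	}
	if failed {
		fmt.Println("coverage gate failed")
	}
}
```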
Measuring Quality Beyond Coverage
Look for:
- Assertion quality: Do tests actually check behavior?
- Edge case coverage: Null, empty, boundary conditions tested?
- Error path coverage: Failures tested, not just happy paths?
- Test isolation: Tests don't depend on each other?
Red flags:
- Tests with no assertions
- Tests that only check "no exception thrown"
- Tests that test implementation, not behavior
- Flaky tests (sometimes pass, sometimes fail)
Implementation Strategies
Merge Request Template
Provide a template that guides submitter and reviewer:
## Changes
<!-- Describe what changed and why -->
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update
## Testing
<!-- Describe testing performed -->
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manual testing performed
## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review performed
- [ ] Comments added for complex logic
- [ ] Documentation updated
- [ ] Tests added for new functionality
- [ ] All tests passing
- [ ] No new warnings
## Security Considerations
<!-- Any security implications? -->
## Performance Impact
<!-- Any performance considerations? -->
Automated Review Comments
Use bots to provide automated feedback:
- Coverage decreased: "Warning: Coverage dropped from 85% to 82%"
- Large PR: "This PR has 800+ lines changed. Consider splitting."
- Missing tests: "No test files modified. Did you add tests?"
- Breaking change: "This modifies a public API. Update changelog?"
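The decision logic behind such comments is usually simple; the platform-specific part is posting them through the GitHub or GitLab API, which is omitted in the sketch below. The thresholds and message wording are illustrative.

```go
package main

import "fmt"

// reviewComments generates automated feedback for a merge request based on
// simple metrics. Posting the comments to the code host is left out.
func reviewComments(linesChanged int, oldCov, newCov float64, testFilesTouched bool) []string {
	var comments []string
	if newCov < oldCov {
		comments = append(comments,
			fmt.Sprintf("Warning: Coverage dropped from %.0f%% to %.0f%%", oldCov, newCov))
	}
	if linesChanged > 800 {
		comments = append(comments,
			fmt.Sprintf("This PR has %d lines changed. Consider splitting.", linesChanged))
	}
	if !testFilesTouched {
		comments = append(comments, "No test files modified. Did you add tests?")
	}
	return comments
}

func main() {
	for _, c := range reviewComments(850, 85, 82, false) {
		fmt.Println("bot:", c)
	}
}
```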
Review Time Budget
- Small PRs (< 200 lines): 15-30 minutes
- Medium PRs (200-500 lines): 30-60 minutes
- Large PRs (> 500 lines): Consider splitting, or schedule dedicated review time
Why size matters:
- Large PRs are harder to review thoroughly
- Reviewers rush through, missing issues
- Better: smaller, focused PRs reviewed carefully
Review Rotation
Distribute review load:
- Rotate reviewers to spread knowledge
- Avoid bottlenecks (one person reviewing everything)
- Build team expertise
Specialize when needed:
- Security changes → security experts
- Database migrations → database experts
- Performance-critical code → performance experts
Anti-Patterns
Anti-Pattern 1: Rubber-Stamp Reviews
Problem: Approving without actually reviewing ("LGTM" in 30 seconds on 500-line PR)
Impact: Defeats purpose of peer review, quality issues reach production
Solution: Set expectations for review depth, track review time, require meaningful feedback
Anti-Pattern 2: Nitpicking Style
Problem: Focusing on style preferences instead of substance
Impact: Wastes time, frustrates developers, misses real issues
Solution: Automate style checks (linters, formatters), focus review on design and logic
Anti-Pattern 3: Review Bottlenecks
Problem: One person reviews everything, becomes bottleneck
Impact: PRs wait days for review, slows entire team
Solution: Distribute reviews, rotate reviewers, set SLA (review within 24 hours)
Anti-Pattern 4: Massive PRs
Problem: 2000-line changes in one PR
Impact: Impossible to review thoroughly, issues slip through
Solution: Encourage smaller, focused PRs (< 500 lines), break features into increments
Anti-Pattern 5: Ignoring Automated Checks
Problem: Approving despite failing tests or security scans
Impact: Quality gates bypassed, problems reach main branch
Solution: Block approval until automated checks pass, investigate failures
Anti-Pattern 6: Review After Merge
Problem: Merging without review, "reviewing" afterwards
Impact: Defeats purpose of pre-merge review, main branch polluted
Solution: Enforce branch protection (require approval before merge)
Best Practices Summary
- Automated checks in parallel: Run while human reviews
- Focus review on substance: Design, logic, security - not style
- Small, focused PRs: < 500 lines when possible
- Review time budget: 15-60 minutes based on PR size
- Meaningful feedback: Specific, actionable comments
- Trust automation: Let tools handle mechanical issues
- Coverage + quality: Check percentage AND test quality
- RA vs CDe awareness: Understand what approval means in your pattern
- Distribute reviews: Avoid bottlenecks, spread knowledge
- Enforce gates: Don't approve if automated checks fail
Next Steps
- Pre-commit Quality Gates - Stage 2 validation
- Release Quality Gates - Stage 9 validation
- CD Model Stages 1-7 - See Stage 3 in full context
- Implementation Patterns - RA vs CDe approval differences