Testing Strategy: Overview

Introduction

Testing is integrated throughout every stage of the Continuous Delivery Model. Rather than treating testing as a separate phase that happens after development, the CD Model embraces continuous validation through multiple test levels executed at different stages.

This article explains the test taxonomy used in the CD Model, the purpose of each test level, and how the shift-left strategy improves quality while reducing costs.

Why Multiple Test Levels

Different test levels serve different purposes:

  • Speed vs Coverage: Fast unit tests provide rapid feedback, while slower E2E tests validate complete workflows
  • Isolation vs Realism: Isolated tests are reliable and fast, while integrated tests catch real-world issues
  • Early Detection: Shift-left testing finds defects when they're cheapest to fix
  • Confidence: Multiple validation layers build confidence in quality

The CD Model uses a taxonomy of test levels (L0-L4) based on execution environment and scope to balance speed, coverage, and confidence.


Test Taxonomy

Test Taxonomy

This diagram shows the taxonomy in the context of shift-left and shift-right: The left side shows L0-L3 tests (shift-left) running in local/agent/PLTE environments, with scope increasing from unit to vertical testing. The right side shows L4 tests (shift-right) running in production for horizontal validation. The center "out-of-category" area represents the anti-pattern of horizontal pre-production environments to avoid.

Test Taxonomy Breakdown

This diagram maps test levels to execution constraints: it shows how L0-L2 (local/agent), L3 (PLTE, vertical), and L4 (production, horizontal) are categorized by execution environment, scope, and test double usage. The breakdown illustrates the determinism vs domain coherency trade-off as you progress from L0 to L4.

How Tests Are Categorized

The taxonomy defines test categories based on execution environment and scope:


| Level | Name | Shift Direction | Execution Environment | Scope | External Dependencies | Determinism | Domain Coherency |
|-------|------|-----------------|-----------------------|-------|-----------------------|-------------|------------------|
| L0 | Unit Tests | LEFT | devbox or agent | Source and binary | All replaced with test doubles | Highest | Lowest |
| L1 | Unit Tests | LEFT | devbox or agent | Source and binary | All replaced with test doubles | Highest | Lowest |
| L2 | Emulated System Tests | LEFT | devbox or agent | Deployable artifacts | All replaced with test doubles | High | High |
| L3 | In-Situ Vertical Tests | LEFT | PLTE | Deployed system | All replaced with test doubles | Moderate | High |
| L4 | Testing in Production | RIGHT | Production | Deployed system | All production, may use live test doubles | High | Highest |

Out-of-Category (Anti-Pattern):

| Level | Name | Shift Direction | Execution Environment | Scope | External Dependencies | Determinism | Domain Coherency |
|-------|------|-----------------|-----------------------|-------|-----------------------|-------------|------------------|
| Horizontal E2E | Horizontal End-to-End | Non-shifted | Shared testing environment | Deployed system | Tied to non-production "test" deployments | Lowest | High |

Old-school horizontal pre-production environments where multiple teams' pre-prod services are linked together are highly fragile and non-deterministic. This taxonomy explicitly advocates shifting LEFT (to L0-L3) and RIGHT (to L4) to avoid this pattern.

Determinism vs Domain Coherency Trade-off

The taxonomy constrains two key aspects:

  • Execution requirements: What binaries, tooling, and configuration are needed
  • Test scope: Vertical vs horizontal boundaries, and test double usage

Determinism (Lower Lx values):

  • Predictable, repeatable results
  • Fast execution
  • Reliable failure signals
  • Easy to debug
  • Highest in L0-L1 (controlled environments, test doubles)

Domain Coherency (Higher Lx values):

  • Realistic domain language
  • Actual production behavior
  • Real cross-service interactions
  • Business-meaningful validation
  • Highest in L4 (production environment)

Lower Lx values provide higher determinism but lower domain coherency, while higher Lx values provide lower determinism but higher domain coherency.

The Shift-Left and Shift-Right Strategy:

Maximize testing at L0-L3 (left) and L4 (right) to avoid the out-of-category anti-pattern (Horizontal E2E): horizontal pre-production environments where multiple teams' pre-prod services are linked together. These tests are highly fragile and non-deterministic.

Each level builds on the confidence provided by lower levels.



Tag Taxonomy

The testing taxonomy uses tags to categorize tests by level, verification type, and dependencies. See Testing Taxonomy for complete documentation.

Key Tag Categories:

  • Test Level Tags (@L0-@L4) - Execution environment and scope
  • Verification Tags (@ov, @iv, @pv, @piv, @ppv) - REQUIRED for all Gherkin scenarios
  • System Dependencies (@deps:*) - Declare required tooling
  • Test Suites - Tag-based test selection (commit, integration, acceptance, production-verification)

Tag Examples by Test Level:

  • L0-L1: Go tests (no Gherkin verification tags)
  • L2: @L2 @ov (operational verification)
  • L3: @L3 @iv (installation), @L3 @pv (performance), @L3 @ov (operational)
  • L4: @L4 @piv (installation), @L4 @ppv (performance)
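
As an illustration of tag-based suite selection, the sketch below shows how a Godog entry point (Godog is the BDD framework used in the L3 and L4 examples later in this article) might run only the @L3 @iv scenarios. The suite name, feature path, and InitializeScenario function are assumptions for this example, not prescribed names.

import (
    "testing"

    "github.com/cucumber/godog"
)

// Runs only scenarios tagged for L3 installation verification.
func TestL3InstallationVerification(t *testing.T) {
    suite := godog.TestSuite{
        Name:                "acceptance",
        ScenarioInitializer: InitializeScenario, // defined alongside the step implementations
        Options: &godog.Options{
            Format:   "pretty",
            Paths:    []string{"features"},
            Tags:     "@L3 && @iv", // tag expression selecting the suite
            TestingT: t,
        },
    }

    if suite.Run() != 0 {
        t.Fatal("L3 installation verification scenarios failed")
    }
}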

L0: Unit Tests

Purpose: Validate individual functions, methods, and classes in isolation.

Characteristics

Speed:

  • Milliseconds per test
  • Hundreds or thousands of tests complete in seconds
  • Fastest feedback possible

Isolation:

  • No external dependencies
  • No network calls
  • No file system access
  • No database queries
  • All dependencies mocked or stubbed

Scope:

  • Single function or method
  • Single class or module
  • Pure logic validation

Example:

// Unit test for validation logic
func TestValidateEmail(t *testing.T) {
    tests := []struct {
        name    string
        email   string
        wantErr bool
    }{
        {"valid email", "user@example.com", false},
        {"missing @", "userexample.com", true},
        {"empty string", "", true},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            err := ValidateEmail(tt.email)
            if (err != nil) != tt.wantErr {
                t.Errorf("ValidateEmail() error = %v, wantErr %v", err, tt.wantErr)
            }
        })
    }
}
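
ValidateEmail itself is not shown above. A minimal sketch of a function consistent with those test cases, using the standard library's net/mail parser (an illustrative choice, not a prescribed implementation):

import (
    "errors"
    "net/mail"
)

// ValidateEmail returns an error when the input is empty or not a parseable address.
func ValidateEmail(email string) error {
    if email == "" {
        return errors.New("email must not be empty")
    }
    if _, err := mail.ParseAddress(email); err != nil {
        return errors.New("email is not a valid address")
    }
    return nil
}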

Tools and Frameworks

  • Go: testing package, testify
  • JavaScript: Jest, Mocha, Jasmine
  • Python: pytest, unittest
  • Java: JUnit, TestNG

Best Practices

  • Test public APIs, not implementation details
  • Use table-driven tests for multiple cases
  • Aim for 80%+ code coverage
  • Keep tests independent and deterministic
  • Mock all external dependencies

When to Write L0 Tests

  • For all business logic
  • For utility functions
  • For data transformations
  • For validation logic
  • For calculations and algorithms

CD Model Integration

Stages:

  • Stage 2 (Pre-commit): 5-10 minute time-box
  • Stage 3 (Merge Request): PR validation
  • Stage 4 (Commit): Continuous integration

Quality Gates:

  • 100% pass rate required
  • Minimum coverage thresholds
  • No skipped tests

L1: Unit Tests

Purpose: Validate interactions within a component or service, using mocks for external dependencies.

Note: L1 shares the same taxonomy classification as L0 (both are "Unit Tests" running on devbox or agent with all external dependencies replaced by test doubles). L1 typically involves testing interactions between internal modules within a component, while L0 focuses on individual functions or methods.

Characteristics

Speed:

  • Seconds per test
  • Faster than full integration tests
  • Still suitable for pre-commit

Isolation:

  • Tests one component at a time
  • External dependencies mocked
  • Internal component interactions tested
  • No network calls to external services

Scope:

  • Multiple classes/modules within a component
  • Internal APIs and interfaces
  • Data flow through the component

Example:

// Component integration test with mocked repository
func TestUserService_CreateUser(t *testing.T) {
    mockRepo := &MockUserRepository{
        SaveFunc: func(user *User) error {
            return nil
        },
    }

    service := NewUserService(mockRepo)

    user := &User{Name: "John", Email: "john@example.com"}
    err := service.CreateUser(user)

    assert.NoError(t, err)
    assert.True(t, mockRepo.SaveCalled)
}
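
The MockUserRepository used above is a hand-written test double that also acts as a spy, since it records that Save was called. A minimal sketch, assuming a UserRepository interface with a single Save method (the interface shape is an assumption for this example):

// Manual mock: behaviour is injected through SaveFunc, and SaveCalled records
// the interaction so the test can verify it afterwards (spy behaviour).
type UserRepository interface {
    Save(user *User) error
}

type MockUserRepository struct {
    SaveFunc   func(user *User) error
    SaveCalled bool
}

func (m *MockUserRepository) Save(user *User) error {
    m.SaveCalled = true
    return m.SaveFunc(user)
}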

Mocking Strategies

Manual Mocks:

  • Implement interfaces manually
  • Full control over behavior
  • Suitable for simple interfaces

Generated Mocks:

  • Use tools like mockgen, mockery
  • Generated from interfaces
  • Consistent and maintainable
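
A hedged example of wiring generated mocks into the build with gomock's mockgen; the file and package names are illustrative:

// Placed next to the interface definition; `go generate ./...` regenerates the mock.
//go:generate mockgen -source=user_repository.go -destination=mocks/user_repository_mock.go -package=mocks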

Test Doubles:

  • Stubs: Return predetermined values
  • Fakes: Working implementations (e.g., in-memory database)
  • Spies: Record calls for verification

When to Use vs L0

Use L1 when:

  • Testing interactions between internal modules
  • Validating service orchestration logic
  • Testing error handling across layers
  • Verifying data transformation pipelines

Use L0 when:

  • Testing individual functions
  • No interactions to validate
  • Pure logic without dependencies

CD Model Integration

Stages:

  • Stage 2 (Pre-commit): Run alongside L0
  • Stage 4 (Commit): Continuous validation

L2: Emulated System Tests

Purpose: Validate deployable artifacts in an emulated environment with all external dependencies replaced by test doubles.

Characteristics

Speed:

  • Seconds per test
  • Runs on CI agent
  • Suitable for pre-commit and commit stages

Execution Environment:

  • Developer workstation OR CI agent (same as L0-L1)
  • Does NOT require cloud infrastructure or PLTE
  • Runs in local or agent environments only

Isolation:

  • All external dependencies replaced with test doubles
  • In-memory databases or test doubles for persistence
  • No real external service calls

Scope:

  • Deployable artifacts tested in emulated environment
  • Component-to-component communication within the unit-under-test
  • API contract validation using test doubles
  • Data flow through multiple components
  • Message handling with test double queues

Example:

// L2 integration test with test double database
func TestUserService_CreateAndRetrieveUser(t *testing.T) {
    // Test double: in-memory repository
    mockRepo := NewInMemoryUserRepository()

    service := NewUserService(mockRepo)
    user := &User{Name: "Jane", Email: "jane@example.com"}

    // Test the integration between service and repository
    err := service.CreateUser(user)
    assert.NoError(t, err)

    // Verify through the service (tests the full flow)
    retrieved, err := service.GetUserByEmail("jane@example.com")
    assert.NoError(t, err)
    assert.Equal(t, user.Name, retrieved.Name)
}
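
NewInMemoryUserRepository above is a fake in the sense described earlier: a small but genuinely working implementation. A minimal sketch, assuming the service looks users up through a FindByEmail method (the method name is an assumption for this example):

import "errors"

// Fake repository: a real, working implementation backed by a map instead of a database.
type InMemoryUserRepository struct {
    byEmail map[string]*User
}

func NewInMemoryUserRepository() *InMemoryUserRepository {
    return &InMemoryUserRepository{byEmail: make(map[string]*User)}
}

func (r *InMemoryUserRepository) Save(user *User) error {
    r.byEmail[user.Email] = user
    return nil
}

func (r *InMemoryUserRepository) FindByEmail(email string) (*User, error) {
    user, ok := r.byEmail[email]
    if !ok {
        return nil, errors.New("user not found")
    }
    return user, nil
}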

Test Doubles for All External Dependencies

L2 Taxonomy Constraint: All external dependencies must be replaced with test doubles.

Why Test Doubles:

  • Determinism: Predictable, repeatable results
  • Speed: No network latency or external service delays
  • Reliability: No flaky external service failures
  • Isolation: Tests only the unit-under-test logic

Types of Test Doubles:

  • In-memory databases: SQLite in-memory, test repositories
  • Mock message queues: In-memory queue implementations
  • Stubbed external APIs: Return predetermined responses
  • Fake services: Lightweight implementations for testing
  • Test double runtimes: other containers in the test composition can run full test databases or similar backing services
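
For the stubbed external APIs case, Go's net/http/httptest package is a common option: the stub returns canned responses so the test stays deterministic. A minimal sketch, assuming a PaymentClient that takes the external service's base URL (the client and its Charge method are illustrative names):

import (
    "net/http"
    "net/http/httptest"
    "testing"

    "github.com/stretchr/testify/assert"
)

func TestPaymentClient_Charge(t *testing.T) {
    // Stub of the external payment API: always returns the same canned response.
    stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte(`{"status":"approved"}`))
    }))
    defer stub.Close()

    // Point the unit-under-test at the stub instead of the real service.
    client := NewPaymentClient(stub.URL)

    status, err := client.Charge("order-123", 4200)
    assert.NoError(t, err)
    assert.Equal(t, "approved", status)
}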

CD Model Integration

Stages:

  • Stage 4 (Commit): Run after L0/L1 pass
  • Stage 5 (Acceptance Testing): In PLTE environment

L3: In-Situ Vertical Tests

Purpose: Validate a deployable module in-situ in a production-like environment (PLTE) with vertical testing boundaries.

Characteristics

Speed:

  • Minutes per test
  • Deployed to PLTE (cloud environment)
  • Moderate execution time

Isolation:

  • Single deployable module boundaries only (vertical testing)
  • All external dependencies replaced with test doubles
  • Tests the deployed system in production-like infrastructure
  • Validates deployment and configuration

Scope:

  • Deployed system tested in-situ in PLTE
  • Single deployable module behavior in production-like infrastructure
  • Infrastructure validation (networking, load balancing, DNS)
  • Deployment procedure verification
  • Configuration correctness
  • NOT cross-service integration (that's L4)

PLTE Requirement:

L3 tests require a Production-Like Test Environment (PLTE) because they:

  • Validate the deployable module runs correctly in cloud infrastructure
  • Test with production-like networking, storage, and compute
  • Verify deployment procedures work
  • Validate infrastructure configuration (e.g., Kubernetes, load balancers)
  • Do NOT test cross-service interactions (use test doubles for other services)

Example (Godog):

Feature: API Service Deployment Verification

  @L3 @iv
  Scenario: Service deploys successfully to PLTE
    Given the API service is deployed to PLTE
    When I check the health endpoint
    Then the service should respond with status 200
    And all dependencies should report healthy

  @L3 @ov
  Scenario: API handles requests in PLTE infrastructure
    Given the API service is running in PLTE
    And external dependencies are test doubles
    When I send a user creation request
    Then the API should process the request successfully
    And the response should match the expected format
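
A hedged sketch of how the first scenario's steps might be bound to Go code with Godog step definitions. The PLTE_BASE_URL environment variable and the /healthz path are assumptions for this example; the third step shows how a captured value (the expected status code) is passed into the step function:

import (
    "errors"
    "fmt"
    "net/http"
    "os"

    "github.com/cucumber/godog"
)

type healthCheckFeature struct {
    baseURL    string
    statusCode int
}

func (f *healthCheckFeature) theAPIServiceIsDeployedToPLTE() error {
    // The PLTE endpoint would normally come from deployment configuration.
    f.baseURL = os.Getenv("PLTE_BASE_URL")
    if f.baseURL == "" {
        return errors.New("PLTE_BASE_URL is not set")
    }
    return nil
}

func (f *healthCheckFeature) iCheckTheHealthEndpoint() error {
    resp, err := http.Get(f.baseURL + "/healthz")
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    f.statusCode = resp.StatusCode
    return nil
}

func (f *healthCheckFeature) theServiceShouldRespondWithStatus(expected int) error {
    if f.statusCode != expected {
        return fmt.Errorf("expected status %d, got %d", expected, f.statusCode)
    }
    return nil
}

func InitializeScenario(ctx *godog.ScenarioContext) {
    f := &healthCheckFeature{}
    ctx.Step(`^the API service is deployed to PLTE$`, f.theAPIServiceIsDeployedToPLTE)
    ctx.Step(`^I check the health endpoint$`, f.iCheckTheHealthEndpoint)
    ctx.Step(`^the service should respond with status (\d+)$`, f.theServiceShouldRespondWithStatus)
}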

Vertical Testing Scope

What L3 Tests Validate:

  • Deployable module runs in production-like infrastructure
  • Deployment procedures work correctly
  • Infrastructure configuration is correct (networking, DNS, load balancing)
  • Service responds correctly in cloud environment
  • Vertical slice: Single unit boundaries only

What L3 Tests Don't Validate:

  • Cross-service interactions (that's L4 in production)
  • External service integration (use test doubles)
  • Complete end-to-end user workflows (these require horizontal testing in L4)

CD Model Integration

Stages:

  • Stage 5 (Acceptance Testing): In PLTE
  • Stage 6 (Extended Testing): Comprehensive E2E suite

Best Practices:

  • Limit to critical user workflows (5-20 scenarios)
  • Keep tests maintainable and reliable
  • Run in parallel where possible
  • Use BDD frameworks for readability

L4: Testing in Production

Purpose: Validate real-world cross-service interactions in production with horizontal testing.

Characteristics

Speed:

  • Variable (seconds to minutes per test)
  • Runs in production
  • May use synthetic traffic

Isolation:

  • Production environment
  • Cross-service interactions (horizontal testing)
  • All production dependencies, may use live test doubles for specific cases
  • Real production infrastructure and data

Scope:

  • Deployed system tested in production
  • Cross-service workflows in production
  • Real end-to-end user journeys
  • Actual production behavior
  • Business-meaningful validation
  • Real data and real services interacting

Horizontal Testing in Production

What L4 Tests Validate:

  • Real cross-service interactions in production
  • Actual end-to-end workflows with production data
  • Production infrastructure behavior under real load
  • Service mesh / networking in production
  • Real user journeys (via synthetic monitoring or shadowing)

Test Double Usage in Production:

L4 may use test doubles for specific cases:

  • Test payment processors: Route test payments through test-double payment service
  • Test notification services: Capture test emails/SMS without sending to real users
  • Canary users: Route specific test users through test paths

Examples:

Feature: Production Cross-Service Validation

  @L4 @piv
  Scenario: Complete order fulfillment workflow
    Given I am a test user in production
    When I place an order through the API
    Then the order service should create the order
    And the payment service should charge the test payment method
    And the inventory service should reserve the items
    And the notification service should send a test confirmation email

  @L4 @ppv
  Scenario: Synthetic monitoring of critical path
    Given synthetic monitoring is running
    When the synthetic transaction executes every 5 minutes
    Then the complete user journey should succeed
    And response times should be within SLA
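
A minimal Go sketch of the kind of synthetic probe behind the second scenario; the five-minute interval matches the scenario, while the endpoint path and SLA budget are illustrative assumptions. In practice this check would more likely be configured in a synthetic-monitoring tool than hand-rolled, but the sketch makes the timing and SLA check concrete.

import (
    "context"
    "log"
    "net/http"
    "time"
)

// Synthetic probe: exercises the critical path on a fixed interval and logs SLA breaches.
func runSyntheticProbe(ctx context.Context, baseURL string, slaBudget time.Duration) {
    ticker := time.NewTicker(5 * time.Minute)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            start := time.Now()
            resp, err := http.Get(baseURL + "/api/orders/synthetic-check") // illustrative endpoint
            elapsed := time.Since(start)
            if err != nil {
                log.Printf("synthetic check failed: %v", err)
                continue
            }
            resp.Body.Close()
            if resp.StatusCode != http.StatusOK {
                log.Printf("synthetic check returned status %d", resp.StatusCode)
                continue
            }
            if elapsed > slaBudget {
                log.Printf("synthetic check breached SLA: took %v, budget %v", elapsed, slaBudget)
            }
        }
    }
}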

Exploratory and UAT

L4 also includes human-driven validation:

Exploratory Testing:

  • Session-based testing in production (read-only or test accounts)
  • Discovering unexpected behaviors in real environment

UAT (User Acceptance Testing):

  • Real users validating in production (feature flags control exposure)
  • Business stakeholder sign-off based on production behavior

CD Model Integration

Stages:

  • Stage 11 (Live): Primary stage for L4 production testing
  • Stage 12 (Release Toggling): Control test user exposure via feature flags
  • Production environment with monitoring and observability
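
Release toggling for L4 often amounts to a conditional at the service boundary. A minimal sketch, assuming a hypothetical FlagClient interface and an X-User-ID header that identifies the caller (both are illustrative, not part of the CD Model):

import (
    "context"
    "net/http"
)

type FlagClient interface {
    IsEnabled(flag, userID string) bool
}

type ctxKey string

const testPathKey ctxKey = "test-path"

// Illustrative middleware: designated test users and synthetic probes take the
// toggled path, while all other production traffic sees the released behaviour.
func releaseToggle(next http.Handler, flags FlagClient) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        userID := r.Header.Get("X-User-ID")
        if flags.IsEnabled("orders-test-path", userID) {
            r = r.WithContext(context.WithValue(r.Context(), testPathKey, true))
        }
        next.ServeHTTP(w, r)
    })
}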

Horizontal End-to-End Testing (Out-of-Category Anti-Pattern)

Taxonomy Classification:

  • Level: Out-of-Category
  • Name: Horizontal End-to-End
  • Shift direction: Non-shifted
  • Execution environment: Shared testing environment
  • External dependencies: Tied to non-production, team-to-team synchronized test deployments
  • Determinism: Lowest
  • Domain coherency: High

Horizontal End-to-End (Horizontal E2E) testing refers to environments where multiple teams deploy pre-production versions of their services to a shared environment, with services interacting horizontally (service A calls service B calls service C).

Characteristics

  • Multiple teams deploy pre-production versions of their services to shared environment
  • Services interact horizontally across team boundaries
  • Each team controls deployment timing independently
  • Version mismatches are common
  • Difficult to reproduce issues locally
  • Lowest determinism of all test approaches
  • Non-shifted (neither left nor right in the shift strategy)

When It's Acceptable

Horizontal pre-production environments serve valid purposes in specific contexts:

Exploratory Testing:

  • Manual testing to discover unexpected cross-service behaviors
  • Session-based testing by QA teams
  • Investigating integration scenarios before production

Validation and Demonstration:

  • Stakeholder demos of cross-team features
  • Pre-production validation of complex integrations
  • Manual verification of deployment procedures
  • Recommended: call this environment "Demo" to reflect its intended purpose

Important: These environments should be used for manual, exploratory activities - not as automated quality gates.

Anti-Pattern: Using as Automated Quality Gates

Critical: Horizontal E2E environments should NOT be used as automated quality gates in the deployment pipeline.

Problems when used for automated testing:

  • Highly fragile: Any team's broken deployment breaks everyone's tests
  • Non-deterministic (Lowest determinism): Test results vary based on other teams' deployments
  • Slow feedback: Can't test until all dependencies are deployed
  • Difficult debugging: Hard to isolate which service caused failure
  • Blocking: One team's issues block other teams
  • False signals: Tests fail due to environmental issues, not code defects

From the taxonomy: "If you decide to break this fundamental rule, all you need to do is to connect your L3 PLTE to something not a test double and you have automated tests running in L3 with external dependencies. This is not a technical constraint, it's an inherent constraint in the nature of Horizontal E2E: No one can control the variables, so you are not performing verification, you are playing games."

The Solution: Shift-Left and Shift-Right

Instead of relying on Horizontal E2E environments for automated testing, shift testing LEFT (L0-L3) and RIGHT (L4):

Shift LEFT (L0-L3):

  • L0-L1: Unit tests with test doubles on devbox/agent
  • L2: Emulated system tests with test doubles on devbox/agent
  • L3: In-situ vertical tests in PLTE with test doubles for external services
  • Result: Fast feedback, high determinism, no cross-team dependencies

Shift RIGHT (L4):

  • L4: Testing in production with real services
  • Use feature flags, canary deployments, synthetic monitoring
  • Result: Real production validation without pre-prod fragility

Avoid the non-shifted middle ground (Horizontal E2E as automated quality gates) - it combines the worst of both worlds: slow, fragile, and non-deterministic.


Shift-Left Strategy

Shift-Left Testing

This diagram illustrates the shift-left and shift-right strategy: The traditional center approach (shared testing environments with horizontal integration) is replaced by two strategies. Shift-left (left arrow) moves testing to L0-L3 with test doubles in local/agent/PLTE environments for fast, deterministic feedback. Shift-right (right arrow) moves horizontal validation to L4 in production with real services. Avoid the fragile middle ground of pre-production integration environments.

Implementation

Shift-Left (L0-L3): Run tests with test doubles on local/CI/PLTE environments (Stages 2-6) for fast, deterministic feedback - catch defects early when cheapest to fix.

Shift-Right (L4): Run horizontal tests in production (Stages 11-12) with real services for validation of actual production behavior - synthetic monitoring and exploratory testing.

Avoid the Middle: Skip horizontal pre-production integration environments - they're fragile, non-deterministic, and slow.


Summary

The CD Model uses a taxonomy based on execution environment and scope:

L0-L1: Unit Tests (Shift LEFT):

  • Execution: Devbox or agent
  • Scope: Source and binary
  • Dependencies: All replaced with test doubles
  • Tags: Go tests (no Gherkin verification tags)
  • Trade-off: Highest determinism, lowest domain coherency

L2: Emulated System Tests (Shift LEFT):

  • Execution: Devbox or agent
  • Scope: Deployable artifacts
  • Dependencies: All replaced with test doubles
  • Tags: @L2 @ov (operational verification)
  • Trade-off: High determinism, high domain coherency

L3: In-Situ Vertical Tests (Shift LEFT):

  • Execution: PLTE
  • Scope: Deployed system (single deployable module boundaries)
  • Dependencies: All replaced with test doubles
  • Tags: @L3 @iv (installation), @L3 @pv (performance), @L3 @ov (operational)
  • Trade-off: Moderate determinism, high domain coherency

L4: Testing in Production (Shift RIGHT):

  • Execution: Production
  • Scope: Deployed system (cross-service interactions)
  • Dependencies: All production, may use live test doubles
  • Tags: @L4 @piv (installation), @L4 @ppv (performance)
  • Trade-off: High determinism, highest domain coherency

Out-of-Category: Horizontal End-to-End (Anti-Pattern):

  • Execution: Shared testing environment (non-shifted)
  • Scope: Deployed system
  • Dependencies: Tied to non-production "test" deployments
  • Trade-off: Lowest determinism, high domain coherency

The shift-left and shift-right strategy maximizes testing at L0-L3 (left) and L4 (right) to avoid the out-of-category anti-pattern (Horizontal E2E) of fragile horizontal pre-production environments.

Tag Usage:

See Testing Taxonomy for complete documentation of all tags, verification requirements, test suites, and filtering rules.




You are here: Explanation — understanding-oriented discussion that clarifies concepts.