Skip to content

The Wall at 85%: When Tests Aren't the Problem

The test suite was comprehensive. Table-driven tests. Error paths covered. Race detector enabled.

Coverage: 85.8%. Stuck.

We needed 95%. The team wrote more tests. Coverage stayed at 85%.

The problem wasn't the tests. It was the code.


The Comprehensive Test Suite

The project had 200+ tests across all packages:

  • ✅ Table-driven tests for variations
  • ✅ Error path coverage
  • ✅ Race detector in every run
  • ✅ Integration tests for CLI
  • ✅ Mocked file I/O

Coverage by package:

Package Coverage
cmd/readability 80.0%
pkg/analyzer 94.7%
pkg/config 89.8%
pkg/markdown 95.1%
pkg/output 99.6%
Total 85.8%

The CLI package dragged down the total. At 80%, it needed work.

But adding more tests didn't move the needle.


The Complexity Discovery

gocyclo revealed the problem:

cmd/readability/main.go:45: main.run() complexity = 35 (limit: 15)
pkg/markdown/parser.go:78: Parse() complexity = 21 (limit: 15)

Cyclomatic complexity measures decision points. A complexity of 35 means 35 different execution paths through one function.

Testing all 35 paths in a monolithic function is impossible without:

  • Setting up all flag combinations
  • Mocking file I/O for every branch
  • Handling errors at multiple depths
  • Verifying output for each path

The tests would be unmaintainable. The code needed to change first.


The Refactoring

main.run() did everything in one function:

  • Parse flags (+5 complexity)
  • Load config or auto-detect (+8 complexity)
  • Apply flag overrides (+4 complexity)
  • Analyze files (+6 complexity)
  • Format output (+7 complexity)
  • Check thresholds (+5 complexity)

Total: 35 decision points in one function.

We extracted 11 focused functions:

  • loadConfig() - complexity 5
  • applyFlagOverrides() - complexity 5
  • analyzeTarget() - complexity 5
  • outputResults() - complexity 7
  • checkResults() - complexity 3
  • countFailures() - complexity 8
  • printFailureGuidance() - complexity 4
  • Four more guidance functions - complexity 1 each

New run() complexity: 6 (was 35).

Each extracted function: testable independently.


The Breakthrough

After refactoring:

Before tests:

func TestRun_AllCombinations(t *testing.T) {
    // Need to test:
    // - Config from file vs auto-detect
    // - Multiple flag override scenarios
    // - File vs directory analysis
    // - JSON vs table vs markdown output
    // - Pass vs fail thresholds
    // - Error at each layer

    // Result: 50+ test cases, each with complex setup
}

After tests:

func TestLoadConfig(t *testing.T) {
    tests := []struct {
        name    string
        path    string
        want    *Config
        wantErr bool
    }{
        {"explicit file", "test.yml", expectedConfig, false},
        {"auto-detect", "", autoConfig, false},
        {"missing file", "none.yml", nil, true},
    }
    // Simple table-driven tests
}

func TestApplyFlagOverrides(t *testing.T) {
    // Test ONLY flag application logic
}

func TestOutputResults(t *testing.T) {
    // Test ONLY output formatting
}

Each function tested independently. Coverage jumped.

Final coverage:

Package Before After
cmd/readability 80.0% 97.5%
pkg/analyzer 94.7% 99.1%
pkg/config 89.8% 100.0%
pkg/markdown 95.1% 98.8%
pkg/output 99.6% 99.6%
Total 85.8% 99.0%

From 85% to 99% without adding exotic test infrastructure. By making code testable.


The Lesson: Complexity Blocks Coverage

High cyclomatic complexity makes comprehensive testing impractical:

  • Complexity 35: 35 paths to test. Requires combinatorial setup.
  • Complexity 6: 6 paths to test. Table-driven tests work.

The wall wasn't about writing more tests. It was about making code simple enough to test.

Refactoring for testability came first. Coverage followed.


The Enforcement

We raised the threshold from 80% to 95% and made codecov.yml the single source of truth.

Both CI and pre-commit hooks read from it:

THRESHOLD=$(grep -A2 "project:" codecov.yml | grep "target:" | head -1 | sed "s/.*target: *\([0-9]*\).*/\1/")

One definition. Zero drift. Pre-push hook blocks commits below threshold.

This satisfied OpenSSF Best Practices criteria for testing (95%+ coverage, race detection, CI enforcement).


What Changed

Before: "We need more tests to hit 95%."

After: "We need to refactor so tests are possible."

The wall wasn't about test quantity. It was about code quality.

Coverage reached 99% when code became simple enough to test.

Implementation Patterns

See Test Coverage Patterns for refactoring techniques, Codecov configuration, and single-source-of-truth threshold management.



The coverage was stuck. More tests didn't help. Refactoring broke the wall. Code became testable. Coverage reached 99%. The blocker wasn't tests. It was complexity.

Comments