The Wall at 85%: When Tests Aren't the Problem¶
The test suite was comprehensive. Table-driven tests. Error paths covered. Race detector enabled.
Coverage: 85.8%. Stuck.
We needed 95%. The team wrote more tests. Coverage stayed at 85%.
The problem wasn't the tests. It was the code.
The Comprehensive Test Suite¶
The project had 200+ tests across all packages:
- ✅ Table-driven tests for variations
- ✅ Error path coverage
- ✅ Race detector in every run
- ✅ Integration tests for CLI
- ✅ Mocked file I/O
Coverage by package:
| Package | Coverage |
|---|---|
| cmd/readability | 80.0% |
| pkg/analyzer | 94.7% |
| pkg/config | 89.8% |
| pkg/markdown | 95.1% |
| pkg/output | 99.6% |
| Total | 85.8% |
The CLI package dragged down the total. At 80%, it needed work.
But adding more tests didn't move the needle.
The Complexity Discovery¶
gocyclo revealed the problem:
cmd/readability/main.go:45: main.run() complexity = 35 (limit: 15)
pkg/markdown/parser.go:78: Parse() complexity = 21 (limit: 15)
Cyclomatic complexity measures decision points. A complexity of 35 means 35 different execution paths through one function.
Testing all 35 paths in a monolithic function is impossible without:
- Setting up all flag combinations
- Mocking file I/O for every branch
- Handling errors at multiple depths
- Verifying output for each path
The tests would be unmaintainable. The code needed to change first.
The Refactoring¶
main.run() did everything in one function:
- Parse flags (+5 complexity)
- Load config or auto-detect (+8 complexity)
- Apply flag overrides (+4 complexity)
- Analyze files (+6 complexity)
- Format output (+7 complexity)
- Check thresholds (+5 complexity)
Total: 35 decision points in one function.
We extracted 11 focused functions:
loadConfig()- complexity 5applyFlagOverrides()- complexity 5analyzeTarget()- complexity 5outputResults()- complexity 7checkResults()- complexity 3countFailures()- complexity 8printFailureGuidance()- complexity 4- Four more guidance functions - complexity 1 each
New run() complexity: 6 (was 35).
Each extracted function: testable independently.
The Breakthrough¶
After refactoring:
Before tests:
func TestRun_AllCombinations(t *testing.T) {
// Need to test:
// - Config from file vs auto-detect
// - Multiple flag override scenarios
// - File vs directory analysis
// - JSON vs table vs markdown output
// - Pass vs fail thresholds
// - Error at each layer
// Result: 50+ test cases, each with complex setup
}
After tests:
func TestLoadConfig(t *testing.T) {
tests := []struct {
name string
path string
want *Config
wantErr bool
}{
{"explicit file", "test.yml", expectedConfig, false},
{"auto-detect", "", autoConfig, false},
{"missing file", "none.yml", nil, true},
}
// Simple table-driven tests
}
func TestApplyFlagOverrides(t *testing.T) {
// Test ONLY flag application logic
}
func TestOutputResults(t *testing.T) {
// Test ONLY output formatting
}
Each function tested independently. Coverage jumped.
Final coverage:
| Package | Before | After |
|---|---|---|
| cmd/readability | 80.0% | 97.5% |
| pkg/analyzer | 94.7% | 99.1% |
| pkg/config | 89.8% | 100.0% |
| pkg/markdown | 95.1% | 98.8% |
| pkg/output | 99.6% | 99.6% |
| Total | 85.8% | 99.0% |
From 85% to 99% without adding exotic test infrastructure. By making code testable.
The Lesson: Complexity Blocks Coverage¶
High cyclomatic complexity makes comprehensive testing impractical:
- Complexity 35: 35 paths to test. Requires combinatorial setup.
- Complexity 6: 6 paths to test. Table-driven tests work.
The wall wasn't about writing more tests. It was about making code simple enough to test.
Refactoring for testability came first. Coverage followed.
The Enforcement¶
We raised the threshold from 80% to 95% and made codecov.yml the single source of truth.
Both CI and pre-commit hooks read from it:
THRESHOLD=$(grep -A2 "project:" codecov.yml | grep "target:" | head -1 | sed "s/.*target: *\([0-9]*\).*/\1/")
One definition. Zero drift. Pre-push hook blocks commits below threshold.
This satisfied OpenSSF Best Practices criteria for testing (95%+ coverage, race detection, CI enforcement).
What Changed¶
Before: "We need more tests to hit 95%."
After: "We need to refactor so tests are possible."
The wall wasn't about test quantity. It was about code quality.
Coverage reached 99% when code became simple enough to test.
Implementation Patterns
See Test Coverage Patterns for refactoring techniques, Codecov configuration, and single-source-of-truth threshold management.
Related Patterns¶
- OpenSSF Best Practices Badge - High coverage made badge criteria trivial
- Always Works QA - Coverage as quality gate
- SDLC Hardening - Testing in audit context
The coverage was stuck. More tests didn't help. Refactoring broke the wall. Code became testable. Coverage reached 99%. The blocker wasn't tests. It was complexity.