Three-Stage Design¶
Separate concerns into discovery, execution, and reporting phases.
The Pattern¶
graph TB
A[Trigger] -->|Event| B[Stage 1: Discovery]
B -->|Targets| C[Stage 2: Execution]
C -->|Results| D[Stage 3: Summary]
%% Ghostty Hardcore Theme
style A fill:#fd971e,color:#1b1d1e
style B fill:#65d9ef,color:#1b1d1e
style C fill:#a7e22e,color:#1b1d1e
style D fill:#9e6ffe,color:#1b1d1e
Complex workflows become tangled when discovery, execution, and reporting logic interleave. Separating them into distinct stages improves:
- Testability - Each stage can be tested independently
- Debuggability - Failures localize to a specific stage
- Reusability - Stages can be mixed and matched
- Observability - Clear boundaries for logging and metrics
Stage Responsibilities¶
| Stage | Input | Output | Purpose |
|---|---|---|---|
| Discovery | Trigger event | Target list | Determine what to process |
| Execution | Target list | Results | Perform the actual work |
| Summary | Results | Report | Aggregate and communicate |
When to Use¶
Good Fit
- Processing multiple targets (repos, files, services)
- Operations that benefit from parallelization
- Workflows requiring clear audit trails
- Complex logic that needs separation of concerns
Poor Fit
- Simple single-target operations
- Workflows where discovery and execution are tightly coupled
- Real-time operations where staging adds unacceptable latency
Implementation¶
Stage 1: Discovery¶
Query for targets and output a structured list:
jobs:
discover:
runs-on: ubuntu-latest
outputs:
targets: ${{ steps.query.outputs.targets }}
count: ${{ steps.query.outputs.count }}
steps:
- name: Query targets
id: query
run: |
# Discovery logic - API calls, file scans, etc.
TARGETS='[{"name": "target-1"}, {"name": "target-2"}]'
echo "targets=$TARGETS" >> $GITHUB_OUTPUT
echo "count=$(echo $TARGETS | jq 'length')" >> $GITHUB_OUTPUT
Key principles:
- Output structured data (JSON) for downstream consumption
- Include metadata (count) for conditional logic
- Keep discovery fast - defer heavy work to execution
Stage 2: Execution¶
Process each target, typically in parallel:
execute:
needs: discover
if: needs.discover.outputs.count > 0
strategy:
matrix:
target: ${{ fromJson(needs.discover.outputs.targets) }}
fail-fast: false
steps:
- name: Process target
run: |
echo "Processing ${{ matrix.target.name }}"
# Execution logic here
Key principles:
- Use matrix strategy for parallelization
- Set
fail-fast: falseto process all targets even if some fail - Each job should be idempotent
Stage 3: Summary¶
Aggregate results and report:
summary:
needs: [discover, execute]
if: always() # Run even if execution had failures
steps:
- name: Generate report
run: |
echo "## Workflow Summary" >> $GITHUB_STEP_SUMMARY
echo "Processed ${{ needs.discover.outputs.count }} targets" >> $GITHUB_STEP_SUMMARY
if [ "${{ needs.execute.result }}" == "failure" ]; then
echo ":warning: Some targets failed" >> $GITHUB_STEP_SUMMARY
else
echo ":white_check_mark: All targets succeeded" >> $GITHUB_STEP_SUMMARY
fi
Key principles:
- Use
if: always()to run regardless of execution outcome - Write to
$GITHUB_STEP_SUMMARYfor visibility - Aggregate success/failure counts when possible
Data Flow¶
flowchart LR
subgraph S1["Stage 1"]
A[Query API] --> B[Build JSON]
end
subgraph S2["Stage 2"]
C[Parse JSON] --> D[Matrix Jobs]
D --> E1[Job 1]
D --> E2[Job 2]
D --> E3[Job N]
end
subgraph S3["Stage 3"]
F[Collect Results] --> G[Generate Report]
end
B -->|"outputs.targets"| C
E1 & E2 & E3 -->|"needs.execute.result"| F
%% Ghostty Hardcore Theme
style A fill:#65d9ef,color:#1b1d1e
style B fill:#65d9ef,color:#1b1d1e
style C fill:#a7e22e,color:#1b1d1e
style D fill:#a7e22e,color:#1b1d1e
style E1 fill:#5e7175,color:#f8f8f3
style E2 fill:#5e7175,color:#f8f8f3
style E3 fill:#5e7175,color:#f8f8f3
style F fill:#9e6ffe,color:#1b1d1e
style G fill:#9e6ffe,color:#1b1d1e
Variations¶
Two-Stage (No Summary)¶
For simple workflows where reporting isn't needed:
Four-Stage (With Validation)¶
Add a validation stage before execution:
jobs:
discover:
# Find targets
validate:
needs: discover
# Verify targets are valid, check permissions, etc.
execute:
needs: validate
# Process validated targets
summary:
needs: [discover, execute]
Nested Stages¶
For hierarchical targets (orgs → repos → branches):
jobs:
discover-orgs:
# Find organizations
discover-repos:
needs: discover-orgs
strategy:
matrix:
org: ${{ fromJson(needs.discover-orgs.outputs.orgs) }}
# Find repos per org
execute:
needs: discover-repos
# Process all repos
Anti-Patterns¶
Interleaved Logic¶
# Bad: discovery and execution mixed
- name: Find and process
run: |
for target in $(find-targets); do
process "$target"
done
This loses parallelization and makes failures harder to diagnose.
Heavy Discovery¶
# Bad: doing too much in discovery
- name: Discover
run: |
TARGETS=$(expensive-api-call)
VALIDATED=$(validate-all "$TARGETS") # Should be separate stage
ENRICHED=$(enrich-all "$VALIDATED") # Should be separate stage
Keep discovery lightweight. Defer expensive operations to execution.
Ignoring Partial Failures¶
Always use if: always() on summary stages.
Real-World Applications¶
- File Distribution - Discover repos, distribute files, report results
- Dependency Updates - Find outdated deps, create PRs, summarize changes
- Security Scanning - Enumerate targets, scan each, aggregate findings
- Configuration Drift - Query resources, compare to desired state, report drift
See File Distribution for a complete implementation example.
Summary¶
Key Takeaways
- Separate concerns - Discovery, execution, and reporting are distinct
- Use job outputs - Pass structured data between stages
- Parallelize execution - Matrix strategy for horizontal scaling
- Always summarize - Use
if: always()on summary stages - Keep discovery light - Defer heavy work to execution