Three-Stage Design

Separate concerns into discovery, execution, and reporting phases.

Key Insight

Complex workflows become tangled when discovery, execution, and reporting logic interleave. Separating them into distinct stages improves testability, debuggability, and observability.

The Pattern

graph TB
    A[Trigger] -->|Event| B[Stage 1: Discovery]
    B -->|Targets| C[Stage 2: Execution]
    C -->|Results| D[Stage 3: Summary]

    %% Ghostty Hardcore Theme
    style A fill:#fd971e,color:#1b1d1e
    style B fill:#65d9ef,color:#1b1d1e
    style C fill:#a7e22e,color:#1b1d1e
    style D fill:#9e6ffe,color:#1b1d1e

Giving each stage its own job, with an explicit input and output, improves:

Testability - Each stage can be tested independently
Debuggability - Failures localize to a specific stage
Reusability - Stages can be mixed and matched
Observability - Clear boundaries for logging and metrics

Stage Responsibilities

| Stage | Input | Output | Purpose |
| --- | --- | --- | --- |
| Discovery | Trigger event | Target list | Determine what to process |
| Execution | Target list | Results | Perform the actual work |
| Summary | Results | Report | Aggregate and communicate |

When to Use

Good Fit

Processing multiple targets (repos, files, services)
Operations that benefit from parallelization
Workflows requiring clear audit trails
Complex logic that needs separation of concerns

Poor Fit

Simple single-target operations
Workflows where discovery and execution are tightly coupled
Real-time operations where staging adds unacceptable latency

Implementation

Stage 1: Discovery

Query for targets and output a structured list:

jobs:
  discover:
    runs-on: ubuntu-latest
    outputs:
      targets: ${{ steps.query.outputs.targets }}
      count: ${{ steps.query.outputs.count }}
    steps:
      - name: Query targets
        id: query
        run: |
          # Discovery logic - API calls, file scans, etc.
          # Keep the JSON on one line: a plain key=value output cannot span lines
          TARGETS='[{"name": "target-1"}, {"name": "target-2"}]'
          echo "targets=$TARGETS" >> "$GITHUB_OUTPUT"
          echo "count=$(echo "$TARGETS" | jq 'length')" >> "$GITHUB_OUTPUT"

Key principles:

Output structured data (JSON) for downstream consumption
Include metadata (count) for conditional logic
Keep discovery fast - defer heavy work to execution
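
As one hedged illustration of these principles, the sketch below derives the target list from the repository layout instead of a hard-coded string. The `services/` directory and the one-target-per-subdirectory convention are assumptions made for the example, not part of the pattern.

```yaml
jobs:
  discover:
    runs-on: ubuntu-latest
    outputs:
      targets: ${{ steps.query.outputs.targets }}
      count: ${{ steps.query.outputs.count }}
    steps:
      - uses: actions/checkout@v4

      - name: Query targets
        id: query
        run: |
          # Assumption: every directory directly under services/ is one target.
          # jq -R reads raw lines, -n with `inputs` collects them into an array,
          # and -c keeps the JSON on a single line so it fits a key=value output.
          TARGETS=$(find services -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
            | sort \
            | jq -Rnc '[inputs | {name: .}]')
          echo "targets=$TARGETS" >> "$GITHUB_OUTPUT"
          echo "count=$(echo "$TARGETS" | jq 'length')" >> "$GITHUB_OUTPUT"
```

An empty `services/` directory produces `[]` with a count of 0, which a downstream `if:` guard can use to skip execution.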

Stage 2: Execution

Process each target, typically in parallel:

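execute:
  needs: discover
  # Skip when nothing was found: an empty matrix is an error
  if: needs.discover.outputs.count > 0
  runs-on: ubuntu-latest
  strategy:
    matrix:
      target: ${{ fromJson(needs.discover.outputs.targets) }}
    fail-fast: false
  steps:
    - name: Process target
      env:
        # Pass the value through env rather than inlining it into the script
        TARGET_NAME: ${{ matrix.target.name }}
      run: |
        echo "Processing $TARGET_NAME"
        # Execution logic here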

Key principles:

Use matrix strategy for parallelization
Set fail-fast: false to process all targets even if some fail
Each job should be idempotent
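
A sketch of how these principles might look when each matrix job also records its own outcome, so that a later stage can count successes and failures. The `scripts/process.sh` script, the `max-parallel` value, and the `result-<name>` artifact naming are hypothetical choices; the sketch also assumes target names are valid artifact names.

```yaml
execute:
  needs: discover
  if: needs.discover.outputs.count > 0
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    max-parallel: 4   # assumption: cap concurrency to protect downstream systems
    matrix:
      target: ${{ fromJson(needs.discover.outputs.targets) }}
  steps:
    - uses: actions/checkout@v4

    - name: Process target
      id: process
      env:
        TARGET_NAME: ${{ matrix.target.name }}
      run: ./scripts/process.sh "$TARGET_NAME"   # hypothetical script

    - name: Record result
      if: always()   # write the record even when processing failed
      env:
        TARGET_NAME: ${{ matrix.target.name }}
        OUTCOME: ${{ steps.process.outcome }}
      run: |
        jq -n --arg name "$TARGET_NAME" --arg outcome "$OUTCOME" \
          '{name: $name, outcome: $outcome}' > result.json

    - name: Upload result
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: result-${{ matrix.target.name }}   # artifact names must be unique per run
        path: result.json
```

Because `fail-fast: false` lets every target run to completion, re-running only the failed jobs is safe as long as the processing script is idempotent.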

Stage 3: Summary

Aggregate results and report:

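summary:
  needs: [discover, execute]
  if: always()  # Run even if execution had failures
  runs-on: ubuntu-latest
  steps:
    - name: Generate report
      env:
        COUNT: ${{ needs.discover.outputs.count }}
        EXECUTE_RESULT: ${{ needs.execute.result }}
      run: |
        echo "## Workflow Summary" >> "$GITHUB_STEP_SUMMARY"
        echo "Processed ${COUNT:-0} targets" >> "$GITHUB_STEP_SUMMARY"

        # needs.execute.result is success, failure, cancelled, or skipped
        case "$EXECUTE_RESULT" in
          success) echo ":white_check_mark: All targets succeeded" >> "$GITHUB_STEP_SUMMARY" ;;
          failure) echo ":warning: Some targets failed" >> "$GITHUB_STEP_SUMMARY" ;;
          *)       echo ":information_source: Execution was $EXECUTE_RESULT" >> "$GITHUB_STEP_SUMMARY" ;;
        esac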

Key principles:

Use if: always() to run regardless of execution outcome
Write to $GITHUB_STEP_SUMMARY for visibility
Aggregate success/failure counts when possible
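
`needs.execute.result` only gives one verdict for the whole matrix. To report counts, one option is to read per-target result files. The sketch below assumes each execution job uploaded a `result.json` of the form `{"name": ..., "outcome": ...}` as an artifact named `result-<target>`; that naming and file layout are assumptions, not something the pattern prescribes.

```yaml
summary:
  needs: [discover, execute]
  if: always()
  runs-on: ubuntu-latest
  steps:
    - name: Download per-target results
      uses: actions/download-artifact@v4
      with:
        pattern: result-*
        path: results          # each artifact lands in results/<artifact-name>/
      continue-on-error: true  # tolerate runs where nothing was uploaded

    - name: Count outcomes
      run: |
        mapfile -t files < <(find results -name result.json 2>/dev/null)
        total=${#files[@]}
        if [ "$total" -eq 0 ]; then
          echo "No per-target results were found" >> "$GITHUB_STEP_SUMMARY"
          exit 0
        fi
        # Slurp all result files into one array, then count and list by outcome
        passed=$(jq -s 'map(select(.outcome == "success")) | length' "${files[@]}")
        failed=$(jq -rs 'map(select(.outcome != "success") | .name) | join(", ")' "${files[@]}")
        {
          echo "## Workflow Summary"
          echo "| Succeeded | Not succeeded | Total |"
          echo "| --- | --- | --- |"
          echo "| $passed | $((total - passed)) | $total |"
          if [ -n "$failed" ]; then echo "Not succeeded: $failed"; fi
        } >> "$GITHUB_STEP_SUMMARY"
```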

Data Flow

flowchart LR
    subgraph S1["Stage 1"]
        A[Query API] --> B[Build JSON]
    end

    subgraph S2["Stage 2"]
        C[Parse JSON] --> D[Matrix Jobs]
        D --> E1[Job 1]
        D --> E2[Job 2]
        D --> E3[Job N]
    end

    subgraph S3["Stage 3"]
        F[Collect Results] --> G[Generate Report]
    end

    B -->|"outputs.targets"| C
    E1 & E2 & E3 -->|"needs.execute.result"| F

    %% Ghostty Hardcore Theme
    style A fill:#65d9ef,color:#1b1d1e
    style B fill:#65d9ef,color:#1b1d1e
    style C fill:#a7e22e,color:#1b1d1e
    style D fill:#a7e22e,color:#1b1d1e
    style E1 fill:#5e7175,color:#f8f8f3
    style E2 fill:#5e7175,color:#f8f8f3
    style E3 fill:#5e7175,color:#f8f8f3
    style F fill:#9e6ffe,color:#1b1d1e
    style G fill:#9e6ffe,color:#1b1d1e
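
The same data flow, assembled into a single minimal workflow file as a sketch. The workflow name, the `workflow_dispatch` trigger, and the placeholder target data are assumptions chosen to keep the example short.

```yaml
name: Three-stage example   # hypothetical workflow name
on: workflow_dispatch

jobs:
  discover:
    runs-on: ubuntu-latest
    outputs:
      targets: ${{ steps.query.outputs.targets }}
      count: ${{ steps.query.outputs.count }}
    steps:
      - id: query
        run: |
          TARGETS='[{"name":"target-1"},{"name":"target-2"}]'   # placeholder data
          echo "targets=$TARGETS" >> "$GITHUB_OUTPUT"
          echo "count=$(echo "$TARGETS" | jq 'length')" >> "$GITHUB_OUTPUT"

  execute:
    needs: discover                      # Stage 1 -> Stage 2 via outputs.targets
    if: needs.discover.outputs.count > 0
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        target: ${{ fromJson(needs.discover.outputs.targets) }}
    steps:
      - env:
          TARGET_NAME: ${{ matrix.target.name }}
        run: echo "Processing $TARGET_NAME"

  summary:
    needs: [discover, execute]           # Stage 2 -> Stage 3 via needs.execute.result
    if: always()
    runs-on: ubuntu-latest
    steps:
      - env:
          COUNT: ${{ needs.discover.outputs.count }}
          RESULT: ${{ needs.execute.result }}
        run: |
          echo "Targets: $COUNT, execute result: $RESULT" >> "$GITHUB_STEP_SUMMARY"
```

A job can only read `needs.<job>` for jobs listed in its own `needs`, which is why `summary` names both `discover` and `execute`.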

Variations

Two-Stage (No Summary)

For simple workflows where reporting isn't needed:

jobs:
  discover:
    # ...
  execute:
    needs: discover
    # ...

Four-Stage (With Validation)

Add a validation stage before execution:

jobs:
  discover:
    # Find targets
  validate:
    needs: discover
    # Verify targets are valid, check permissions, etc.
  execute:
    needs: validate
    # Process validated targets
  summary:
    needs: [discover, execute]
    if: always()  # Still report when validation or execution fails
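
A minimal sketch of what the validation stage might do: filter the discovered list and republish it, so that execution consumes only validated targets. The naming rule used as the validity check is a made-up example; real checks (permissions, existence) would replace it.

```yaml
validate:
  needs: discover
  runs-on: ubuntu-latest
  outputs:
    targets: ${{ steps.filter.outputs.targets }}
    count: ${{ steps.filter.outputs.count }}
  steps:
    - name: Drop invalid targets
      id: filter
      env:
        TARGETS: ${{ needs.discover.outputs.targets }}
      run: |
        # Hypothetical rule: keep names made of lowercase letters, digits, hyphens
        VALID=$(echo "$TARGETS" | jq -c 'map(select(.name | test("^[a-z0-9-]+$")))')
        echo "targets=$VALID" >> "$GITHUB_OUTPUT"
        echo "count=$(echo "$VALID" | jq 'length')" >> "$GITHUB_OUTPUT"

execute:
  needs: validate
  if: needs.validate.outputs.count > 0
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      # Read from validate, not discover: only direct needs are visible here
      target: ${{ fromJson(needs.validate.outputs.targets) }}
  steps:
    - env:
        TARGET_NAME: ${{ matrix.target.name }}
      run: echo "Processing $TARGET_NAME"
```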

Nested Stages

For hierarchical targets (orgs → repos → branches):

jobs:
  discover-orgs:
    # Find organizations
  discover-repos:
    needs: discover-orgs
    strategy:
      matrix:
        org: ${{ fromJson(needs.discover-orgs.outputs.orgs) }}
    # Find repos per org
  execute:
    needs: discover-repos
    # Process all repos
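
Job outputs from a matrix are not merged across its instances, so the per-org repo lists need another route to the execution stage. One possible approach, sketched here, is to upload one artifact per org and merge them in a small collector job. The `collect-repos` job, the `ORG_READ_TOKEN` secret, the `--limit` value, and the assumption that `orgs` is a JSON array of plain org names are all hypothetical.

```yaml
jobs:
  # discover-orgs omitted: assumed to output a JSON array of org names as "orgs"

  discover-repos:
    needs: discover-orgs
    runs-on: ubuntu-latest
    strategy:
      matrix:
        org: ${{ fromJson(needs.discover-orgs.outputs.orgs) }}
    steps:
      - name: List repos for one org
        env:
          ORG: ${{ matrix.org }}
          GH_TOKEN: ${{ secrets.ORG_READ_TOKEN }}   # hypothetical secret
        run: gh repo list "$ORG" --json nameWithOwner --limit 1000 > "repos-$ORG.json"

      - uses: actions/upload-artifact@v4
        with:
          name: repos-${{ matrix.org }}
          path: repos-${{ matrix.org }}.json

  collect-repos:
    needs: discover-repos
    runs-on: ubuntu-latest
    outputs:
      repos: ${{ steps.merge.outputs.repos }}
    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: repos-*
          path: repos
          merge-multiple: true   # place every file directly under repos/

      - name: Merge per-org lists
        id: merge
        run: |
          # Slurp the per-org arrays and concatenate them into one compact array
          echo "repos=$(jq -sc 'add // []' repos/*.json)" >> "$GITHUB_OUTPUT"

  execute:
    needs: collect-repos
    if: needs.collect-repos.outputs.repos != '[]'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        repo: ${{ fromJson(needs.collect-repos.outputs.repos) }}
    steps:
      - env:
          REPO: ${{ matrix.repo.nameWithOwner }}
        run: echo "Processing $REPO"
```

A matrix can generate at most 256 jobs per workflow run, so large hierarchies may need batching inside each execution job.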

Anti-Patterns

Interleaved Logic

# Bad: discovery and execution mixed
- name: Find and process
  run: |
    for target in $(find-targets); do
      process "$target"
    done

This loses parallelization and makes failures harder to diagnose.

Heavy Discovery

# Bad: doing too much in discovery
- name: Discover
  run: |
    TARGETS=$(expensive-api-call)
    VALIDATED=$(validate-all "$TARGETS")  # Should be separate stage
    ENRICHED=$(enrich-all "$VALIDATED")   # Should be separate stage

Keep discovery lightweight. Move validation and enrichment into their own stage, or defer them to execution.

Ignoring Partial Failures

# Bad: no summary when execution fails
summary:
  needs: execute
  # Won't run if execute fails!

Always use if: always() on summary stages.
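
A sketch of the corrected job. The step body is a placeholder; the comment notes an alternative condition that may suit workflows where a manual cancel should also skip the report.

```yaml
# Better: summary runs whatever happened in execute
summary:
  needs: execute
  # always() also runs after a cancellation; use ${{ !cancelled() }} to skip then
  if: always()
  runs-on: ubuntu-latest
  steps:
    - env:
        RESULT: ${{ needs.execute.result }}
      run: echo "Execute finished with: $RESULT" >> "$GITHUB_STEP_SUMMARY"
```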

Real-World Applications

File Distribution - Discover repos, distribute files, report results
Dependency Updates - Find outdated deps, create PRs, summarize changes
Security Scanning - Enumerate targets, scan each, aggregate findings
Configuration Drift - Query resources, compare to desired state, report drift

See File Distribution for a complete implementation example.

Summary

Key Takeaways

Separate concerns - Discovery, execution, and reporting are distinct
Use job outputs - Pass structured data between stages
Parallelize execution - Matrix strategy for horizontal scaling
Always summarize - Use if: always() on summary stages
Keep discovery light - Defer heavy work to execution
