🤖 Automated PR Build Debugging

How Claude Code Fixes Your Failing Tests Automatically

Vibed by Jake Gilfix using Claude Sonnet 4.5

The Problem

📝 You push code to a PR

⏳ Buildkite runs tests... they fail

🔍 You check GitHub, click build link, read logs

🐛 You identify the issue, write a fix

🔄 Push, wait, repeat...

This loop is repetitive and time-consuming, and it interrupts flow.

The Vision

What if your PRs could debug themselves?

The Solution

An autonomous system that:

  • 🔎 Detects when your PRs have failing builds
  • 📊 Analyzes Buildkite logs using Claude
  • 🔧 Creates fix commits automatically
  • 💬 Notifies you for approval
  • 🚀 Pushes when you approve

System Architecture

🎯 /debug-build Skill

Claude Code skill that analyzes Buildkite failures

👁️ PR Monitor

Daemon that watches your PRs for failures

🎁 Debug Wrapper

Orchestrates Claude invocation & approval flow

📋 Pending Fixes Manager

Queue system for fixes awaiting review

Component 1: /debug-build Skill

Input: Buildkite build URL

Process:

  1. Fetches build details via Buildkite MCP
  2. Retrieves and analyzes log output
  3. Identifies root cause (compilation error, test failure, etc.)
  4. Generates fix and creates local commit

Output: Git commit with fix (not pushed yet)
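
As a rough illustration of step 3, the sketch below sorts a log excerpt into a failure category using a few regular expressions. In the real skill this analysis is done by Claude, not by pattern matching. The function name `classify_failure`, the category names, and the patterns are all hypothetical and only show the kind of signal being looked for.

```python
import re

# Hypothetical patterns, checked in order; real build logs vary by toolchain.
PATTERNS = [
    # Kotlin compiler errors are reported on lines that start with "e: ".
    ("compilation_error", re.compile(r"^e: .+\.kt", re.MULTILINE)),
    ("test_failure", re.compile(r"\bFAILED\b|AssertionError")),
    ("timeout", re.compile(r"timed out", re.IGNORECASE)),
]


def classify_failure(log_text: str) -> str:
    """Return the first matching category, or "unknown" if nothing matches."""
    for category, pattern in PATTERNS:
        if pattern.search(log_text):
            return category
    return "unknown"
```

A category like this could at most serve as a hint in the prompt; it does not replace reading the log.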

Component 2: PR Monitor

Runs every 2 minutes via launchd

What it does:

  1. Queries GitHub for your open PRs
  2. Checks status of buildkite/* checks
  3. Detects new failures (tracks processed state)
  4. Launches debug wrapper for each failure
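
Steps 2 and 3 can be sketched as a pure function over check results. This is a minimal illustration, not the monitor's actual shell code. The dictionary keys (`pr`, `name`, `state`, `url`) and the idea of keying processed failures by build URL are assumptions made for the example.

```python
def new_failures(checks, processed: set) -> list:
    """Return failing buildkite/* checks not seen before.

    `processed` is updated in place so a second poll does not
    report the same build again.
    """
    found = []
    for check in checks:
        if not check["name"].startswith("buildkite/"):
            continue  # ignore non-Buildkite checks
        if check["state"] != "failure":
            continue
        # Assumed key: a re-run gets a new build URL, hence a new key.
        key = f'{check["pr"]}:{check["name"]}:{check["url"]}'
        if key not in processed:
            processed.add(key)
            found.append(check)
    return found
```

Keying on the build URL rather than the check name means a fresh failure on the same check after a new push is treated as new work.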

PR Monitor: Smart Features

🎯 Intelligent Batching

Waits for all CI checks, processes multiple failures in one session
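
One way to picture the batching rule: hold off while any check is still running, then hand every failure to a single session. The state names (`pending`, `failure`, `success`) and the function name `batch_if_ready` are assumptions for this sketch.

```python
def batch_if_ready(checks):
    """Return all failed checks once nothing is pending, else None."""
    if any(c["state"] == "pending" for c in checks):
        return None  # keep waiting; another check may still fail
    return [c for c in checks if c["state"] == "failure"]
```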

🚦 Concurrency Control

Max 2 concurrent sessions with queueing
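
A minimal in-memory sketch of "max 2 with queueing" follows. The real monitor is a shell daemon and presumably tracks sessions on disk, so the class `SessionLimiter` and its methods are purely illustrative.

```python
from collections import deque


class SessionLimiter:
    """Allow up to `max_sessions` jobs at once; queue the rest in FIFO order."""

    def __init__(self, max_sessions: int = 2):
        self.max_sessions = max_sessions
        self.active = set()
        self.waiting = deque()

    def request(self, job) -> bool:
        """Start the job if a slot is free, otherwise queue it."""
        if len(self.active) < self.max_sessions:
            self.active.add(job)
            return True
        self.waiting.append(job)
        return False

    def finish(self, job):
        """Release a slot and promote the oldest waiting job, if any."""
        self.active.discard(job)
        if self.waiting and len(self.active) < self.max_sessions:
            promoted = self.waiting.popleft()
            self.active.add(promoted)
            return promoted
        return None
```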

📋 Draft PR Handling

Prompts once per draft PR for opt-in

🔕 Skip Markers

[skip-auto-debug] disables automation
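
The marker check itself is a substring test. The deck does not say where the marker is read from, so looking in the PR title and body is an assumption here, as is the function name `is_skipped`.

```python
SKIP_MARKER = "[skip-auto-debug]"


def is_skipped(pr_title: str, pr_body: str = "") -> bool:
    """True if the PR opts out of automation via the skip marker."""
    return SKIP_MARKER in pr_title or SKIP_MARKER in (pr_body or "")
```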

Component 3: Debug Wrapper

Orchestrates a single debugging session:

  1. Registers session (prevents duplicate work)
  2. Invokes Claude CLI with /debug-build
  3. Captures full output to log file
  4. Checks if commit was created
  5. Shows approval dialog with confidence level
  6. Handles Push/View/Skip actions
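
Step 1 (registering a session so the same failure is not debugged twice) is commonly done with an exclusively created lock file. The sketch below shows that pattern. The lock directory, the `.lock` suffix, and both function names are hypothetical, and `session_key` is assumed to be safe to use as a file name.

```python
import os


def register_session(lock_dir: str, session_key: str) -> bool:
    """Atomically claim a session; return False if it is already claimed."""
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, session_key + ".lock")
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_session(lock_dir: str, session_key: str) -> None:
    """Remove the lock; a missing lock is not an error."""
    try:
        os.remove(os.path.join(lock_dir, session_key + ".lock"))
    except FileNotFoundError:
        pass
```

Because creation is atomic, two monitor runs racing on the same failure cannot both get `True`.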

The Approval Dialog

Shows you:

  • 🟢/🟡/🔴 Confidence level
  • PR/check details
  • Commit message
  • Fix summary

Three options:

  • Push: Deploys fix, posts PR comment
  • View Output: Opens debug log
  • Skip/Dismiss: Saves to queue
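
To make the dialog contents and the three options concrete, here is a small sketch that assembles the text and maps each button to its follow-up work. The field names, the step descriptions, and the medium/low icon pairings are assumptions; only the green/HIGH pairing appears in this deck.

```python
# Only the green/HIGH pairing is shown in the deck; the other two are assumed.
CONFIDENCE_ICONS = {"high": "🟢", "medium": "🟡", "low": "🔴"}


def dialog_text(fix: dict) -> str:
    """Assemble the fields the approval dialog is described as showing."""
    icon = CONFIDENCE_ICONS[fix["confidence"]]
    return "\n".join([
        f'{icon} {fix["confidence"].upper()} CONFIDENCE',
        f'PR #{fix["pr"]} / {fix["check"]}',
        f'Commit: {fix["commit_message"]}',
        f'Summary: {fix["summary"]}',
    ])


def steps_for(choice: str) -> list:
    """Map a dialog button to the follow-up work listed above."""
    steps = {
        "Push": ["push commit", "post PR comment"],
        "View Output": ["open debug log"],
        "Skip": ["save fix to pending queue"],
    }
    return steps[choice]
```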

Component 4: Pending Fixes Manager

CLI tool for managing the queue:


pr-pending-fixes list           # Show all pending fixes
pr-pending-fixes review         # Interactive approval workflow
pr-pending-fixes cleanup-passed # Remove fixes for passing checks
pr-pending-fixes list-ignored   # Show ignored PRs
pr-pending-fixes unignore 123   # Re-enable auto-debug for PR

Features:

  • Auto-cleanup when checks pass or PRs close
  • Periodic reminders (escalating intervals)
  • SketchyBar badge integration (optional)
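
The first two features can be sketched as below. The deck does not give the escalation schedule, so doubling from 10 minutes up to a 4-hour cap is an assumption, as are the field names `pr_open` and `check_state`.

```python
def reminder_interval(reminders_sent: int, base_min: int = 10, cap_min: int = 240) -> int:
    """Minutes to wait before the next reminder: 10, 20, 40, ... capped."""
    return min(base_min * 2 ** reminders_sent, cap_min)


def prune(fixes: list) -> list:
    """Keep only fixes whose PR is still open and whose check has not passed."""
    return [
        fix for fix in fixes
        if fix["pr_open"] and fix["check_state"] != "success"
    ]
```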

End-to-End Flow

1. You push commits to PR #123
2. GitHub triggers Buildkite checks
3. Monitor detects failure in github-pr-tests
4. Wrapper launches Claude with PR info + build URL
5. Claude analyzes logs, identifies issue, creates commit
6. 💬 Approval dialog appears with fix summary
7. You choose: Push (→ PR comment) / View / Skip (→ queue)
8. If skipped: SketchyBar badge + periodic reminders
9. Later: run pr-pending-fixes review to process the queue

System Integrations

⚙️ launchd

Monitor runs every 2 min
Reminders every 10 min
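
A 2-minute schedule in launchd is expressed with the `StartInterval` key, in seconds. The sketch below generates such a job definition with Python's `plistlib`. The label matches the plist file name listed under System Files. That the job invokes the monitor script directly, and the `RunAtLoad` setting, are assumptions.

```python
import os
import plistlib


def monitor_plist(script_path: str, interval_seconds: int = 120) -> bytes:
    """Build an XML launchd job that runs the script every `interval_seconds`."""
    job = {
        "Label": "com.user.claude-pr-monitor",
        # launchd does not expand "~", so resolve it up front.
        "ProgramArguments": [os.path.expanduser(script_path)],
        "StartInterval": interval_seconds,
        "RunAtLoad": True,  # assumed: also run once when the job is loaded
    }
    return plistlib.dumps(job)
```

The reminder job would look the same with a different label and `interval_seconds=600`.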

📊 SketchyBar (optional)

Shows pending fix count
Click to review
Nice-to-have UI feature

🔔 Notifications

Session start/complete
Periodic reminders

🐙 GitHub

PR comments with fix details
Auto-posts when pushed
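
Since the system falls back to the gh CLI, posting the comment could come down to a `gh pr comment` call. The sketch only builds the argument list. The comment wording and the function name `pr_comment_command` are hypothetical.

```python
def pr_comment_command(pr_number: int, commit_sha: str, summary: str) -> list:
    """Build the gh invocation that posts fix details on the PR."""
    body = f"Automated fix pushed in {commit_sha}.\n\n{summary}"
    return ["gh", "pr", "comment", str(pr_number), "--body", body]
```

The list can be passed to `subprocess.run(cmd, check=True)` from inside the repository checkout. Passing arguments as a list avoids shell quoting problems in the comment body.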

Real-World Example

Scenario: Kotlin compilation error after upgrading to 2.3

What happened:

  1. Pushed code, tests failed
  2. 10 minutes later: notification appeared
  3. Reviewed fix: "restore necessary !! operators in TaskDAOTest"
  4. Saw detailed explanation about Kotlin smart cast behavior
  5. Clicked "Push"
  6. Fix deployed with co-authored commit + PR comment

Saved ~15 minutes of manual debugging

Key Features

  • Autonomous Detection
  • Intelligent Batching
  • Concurrency Control
  • Human-in-the-Loop
  • State Management
  • Queue System
  • Desktop Integration
  • Auto-Cleanup
  • Confidence Levels
  • PR Comments

System Files


~/bin/
  watch-pr-builds.sh           # Monitor daemon
  claude-debug-wrapper.sh      # Session wrapper
  pr-pending-fixes             # Queue manager
  pr-monitor                   # Control script

~/.claude/
  pr-watch.log                 # Main log
  pr-watch-state.json          # Processed failures
  pending-fixes/               # Queue metadata
  debug-results/               # Session logs
  ignored-prs.json             # Excluded PRs

~/Library/LaunchAgents/
  com.user.claude-pr-monitor.plist
  com.user.claude-pr-reminder.plist

~/.config/sketchybar/
  plugins/claude_fixes.sh      # Badge plugin

Control Commands

Monitor control:


pr-monitor start      # Start the monitor
pr-monitor stop       # Stop the monitor
pr-monitor status     # Check status + recent activity
pr-monitor logs       # Tail the log file
pr-monitor test       # Run once for testing

Pending fixes:


pr-pending-fixes list           # List pending
pr-pending-fixes review         # Interactive review
pr-pending-fixes cleanup-passed # Auto-cleanup

Impact

⏱️ Saves ~10-20 minutes per build failure

🔄 Eliminates context switching

🧠 Reduces cognitive load

🚀 Accelerates feedback loops

Current Limitations

  • Single repository (all-the-things)
  • Buildkite-specific (doesn't handle other CI systems)
  • GitHub MCP has SAML issues (uses gh CLI fallback)
  • Not all failures are auto-fixable (complex schema changes, etc.)

Future Enhancements

  • 🔁 Multi-repo support - Monitor multiple repositories
  • 📊 Success metrics - Track fix accuracy and time saved
  • 🧪 Local test running - Run tests before creating fixes
  • 🎯 Pattern learning - Build skill library from repeated fixes
  • ⚡ Faster detection - GitHub webhook instead of polling
  • 🤝 Team integration - Auto-debug for team members' PRs

Technical Insights

  • State management is critical - Prevents duplicate work and tracks what's been processed
  • Concurrency limits prevent resource exhaustion - Claude sessions can be memory-intensive
  • Intelligent batching reduces costs - One session for multiple failures vs. N sessions
  • Human approval is essential - Not all Claude fixes are correct; review loop maintains trust
  • Auto-cleanup prevents queue bloat - Remove obsolete fixes when checks pass
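
On the state-management point: a state file that is rewritten every 2 minutes should be written atomically, so an interrupted run cannot leave a half-written file behind. The sketch below shows one common pattern: write to a temporary file, then rename it. Storing the state as a JSON list of keys is an assumption about what pr-watch-state.json contains.

```python
import json
import os
import tempfile


def load_state(path: str) -> set:
    """Read the set of processed failure keys; empty if no file exists yet."""
    try:
        with open(path) as state_file:
            return set(json.load(state_file))
    except FileNotFoundError:
        return set()


def save_state(path: str, processed: set) -> None:
    """Write the state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp_file:
        json.dump(sorted(processed), tmp_file)
    os.replace(tmp_path, path)  # atomic when both paths are on one filesystem
```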

Want to see it in action?

Let's walk through an example!

Live Demo: Setup

What we did:

  1. Created PR #170396 with deliberate Kotlin syntax error
  2. Pushed to trigger Buildkite tests
  3. Waited for the monitor to detect the failure...

Error introduced:

open class NotFoundException(
    message: String? = null
 : Exception(message)   // Missing ) before :

Demo: Step 1 - Detection

Monitor detected the failure and started debugging:

[Screenshot: "Started debugging" notification]

What happened:

  • Monitor polled GitHub API (runs every 2 minutes)
  • Found buildkite/all-the-things-github-pr-tests failing
  • Launched Claude debug session with build URL

Demo: Step 2 - Analysis & Fix

Claude analyzed the build logs:

  1. Fetched Buildkite logs via MCP
  2. Identified: Kotlin error in NotFoundException.kt
  3. Root cause: Missing ) before : Exception(message)
  4. Created fix commit • Confidence: 🟢 HIGH

Commit created:

fix: restore missing closing parenthesis in NotFoundException class

Root cause: Previous commit deliberately introduced syntax error...

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Analysis time: ~4 minutes

Demo: Step 3 - Approval Flow

Fix saved to pending queue:

[Screenshot: "Fix pending" notification]

Integration activated:

SketchyBar Badge

[Screenshot: SketchyBar badge]

Demo: Step 4 - Review & Push

Approval dialog with full context:

[Screenshot: approval dialog]

Dialog shows:

  • 🟢 HIGH CONFIDENCE
  • PR/check details
  • Root cause summary
  • Actions: Push / View Output / Remind Later

Demo: Results

📊 What was accomplished:

  • Failure detected automatically (2 min polling)
  • Root cause identified via AI (~4 min)
  • Fix commit created and ready for review
  • Human approval with full context
  • One-click push to deploy

Time saved: ~15 minutes • Context switching: Zero

View PR: github.com/Affirm/all-the-things/pull/170396