🤖 Automated PR Build Debugging

How Claude Code Fixes Your Failing Tests Automatically

Vibed by Jake Gilfix using Claude Sonnet 4.5

The Problem

📝 You push code to a PR

⏳ Buildkite runs tests... they fail

🔍 You check GitHub, click build link, read logs

🐛 You identify the issue, write a fix

🔄 Push, wait, repeat...

This is repetitive and time-consuming, and it interrupts your flow.

The Vision

What if your PRs could debug themselves?

The Solution

An autonomous system that:

🔎 Detects when your PRs have failing builds
📊 Analyzes Buildkite logs using Claude
🔧 Creates fix commits automatically
💬 Notifies you for approval
🚀 Pushes when you approve

System Architecture

🎯 /debug-build Skill

Claude Code skill that analyzes Buildkite failures

👁️ PR Monitor

Daemon that watches your PRs for failures

🎁 Debug Wrapper

Orchestrates Claude invocation & approval flow

📋 Pending Fixes Manager

Queue system for fixes awaiting review

Component 1: /debug-build Skill

Input: Buildkite build URL

Process:

Fetches build details via Buildkite MCP
Retrieves and analyzes log output
Identifies root cause (compilation error, test failure, etc.)
Generates fix and creates local commit

Output: Git commit with fix (not pushed yet)
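
The "identifies root cause" step can be pictured as a first-pass classifier over the log text. This is a minimal sketch, not the skill's actual logic: the function name `classify_failure` and both patterns are assumptions, and real Buildkite logs vary by toolchain.

```python
import re
from typing import Optional, Tuple

# Illustrative patterns only (assumptions, not taken from the skill).
PATTERNS = [
    # Kotlin compiler error line, e.g. "e: file:///src/Foo.kt:3:2 Expecting ')'"
    ("compilation_error", re.compile(r"^e: .*\.kt:\d+:\d+")),
    # Gradle-style failed test line, e.g. "FooTest > doesThing() FAILED"
    ("test_failure", re.compile(r" > .+ FAILED$")),
]


def classify_failure(log_text: str) -> Tuple[str, Optional[str]]:
    """Return (category, first matching line), or ("unknown", None)."""
    for line in log_text.splitlines():
        stripped = line.strip()
        for category, pattern in PATTERNS:
            if pattern.search(stripped):
                return category, stripped
    return "unknown", None
```

The first matching line wins, so a compiler error early in the log is reported ahead of any test failures that follow it.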

Component 2: PR Monitor

Runs every 2 minutes via launchd

What it does:

Queries GitHub for your open PRs
Checks status of buildkite/* checks
Detects new failures (tracks processed state)
Launches debug wrapper for each failure
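
The detection step above could be sketched roughly as follows. The data shape (`number`, `checks`, `name`, `state`, `url`), the `"FAILURE"` state string, and the function name are all hypothetical; the real monitor is a shell script whose internals are not shown here.

```python
def find_new_failures(prs, processed):
    """Return (pr_number, check_name, build_url) for failures not seen before.

    prs: list of {"number": int, "checks": [{"name", "state", "url"}]}  (assumed shape)
    processed: set of keys already handled; updated in place.
    """
    new = []
    for pr in prs:
        for check in pr["checks"]:
            # Only Buildkite checks are of interest.
            if not check["name"].startswith("buildkite/"):
                continue
            if check["state"] != "FAILURE":
                continue
            # Key on the build URL so a later failing build of the same check counts as new.
            key = f'{pr["number"]}:{check["name"]}:{check["url"]}'
            if key not in processed:
                processed.add(key)
                new.append((pr["number"], check["name"], check["url"]))
    return new
```

Running the same poll twice returns each failure only once, which is the "tracks processed state" behavior in miniature.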

PR Monitor: Smart Features

🎯 Intelligent Batching

Waits for all CI checks, processes multiple failures in one session

🚦 Concurrency Control

Max 2 concurrent sessions with queueing

📋 Draft PR Handling

Prompts once per draft PR for opt-in

🔕 Skip Markers

[skip-auto-debug] disables automation
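
A rough sketch of how batching and the skip marker might combine into one "is this PR ready?" decision. It assumes the marker appears somewhere in the PR's title or body text, and the state names are placeholders; neither detail is confirmed by the slides.

```python
SKIP_MARKER = "[skip-auto-debug]"
# Hypothetical names for "check has not finished yet".
PENDING_STATES = {"PENDING", "QUEUED", "IN_PROGRESS"}


def ready_to_debug(pr_text, checks):
    """Return the failing check names to handle in one session, or [] if not ready."""
    if SKIP_MARKER in pr_text:
        return []  # author opted out of automation
    if any(c["state"] in PENDING_STATES for c in checks):
        return []  # wait, so that all failures land in a single batch
    return [c["name"] for c in checks if c["state"] == "FAILURE"]
```

Returning the full list of failures at once is what lets a single Claude session cover several failing checks.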

Component 3: Debug Wrapper

Orchestrates a single debugging session:

Registers session (prevents duplicate work)
Invokes Claude CLI with /debug-build
Captures full output to log file
Checks if commit was created
Shows approval dialog with confidence level
Handles Push/View/Skip actions
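
Two of the steps above lend themselves to a short sketch: registering a session so the same failure is not debugged twice, and checking whether a commit was created. The lock-file approach and every name below are assumptions rather than the wrapper's real mechanism. The commit check simply compares the output of `git rev-parse HEAD` taken before and after the Claude run.

```python
import os


def register_session(lock_dir, pr_number, check_name):
    """Atomically claim a (PR, check) pair. Returns the lock path, or None if already claimed."""
    os.makedirs(lock_dir, exist_ok=True)
    safe_name = check_name.replace("/", "_")
    path = os.path.join(lock_dir, f"{pr_number}-{safe_name}.lock")
    try:
        # O_EXCL makes creation fail if another session already holds the lock.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return path


def release_session(path):
    """Drop the claim once the session finishes."""
    os.remove(path)


def commit_was_created(head_before, head_after):
    """True if HEAD moved, given commit hashes captured before and after the session."""
    return head_before != head_after
```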

The Approval Dialog

Shows you:

🟢/🟡/🔴 Confidence level • PR/check details • Commit message • Fix summary

Three options:

Push: Pushes the fix commit to the PR branch, posts PR comment
View Output: Opens debug log
Skip/Dismiss: Saves to queue

Component 4: Pending Fixes Manager

CLI tool for managing the queue:


pr-pending-fixes list           # Show all pending fixes
pr-pending-fixes review         # Interactive approval workflow
pr-pending-fixes cleanup-passed # Remove fixes for passing checks
pr-pending-fixes list-ignored   # Show ignored PRs
pr-pending-fixes unignore 123   # Re-enable auto-debug for PR

Features:

Auto-cleanup when checks pass or PRs close
Periodic reminders (escalating intervals)
SketchyBar badge integration (optional)
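
Auto-cleanup and escalating reminders might look something like the sketch below. The interval values, record fields (`pr`, `check`), and state strings are invented for illustration; the slides only say that reminders escalate and that fixes are removed when checks pass or PRs close.

```python
# Hypothetical escalation schedule, in minutes.
REMINDER_INTERVALS_MIN = [10, 30, 60, 120]


def next_reminder_delay(reminders_sent):
    """Minutes to wait before the next reminder; stays at the last interval once reached."""
    idx = min(reminders_sent, len(REMINDER_INTERVALS_MIN) - 1)
    return REMINDER_INTERVALS_MIN[idx]


def prune_obsolete(pending, check_states, open_prs):
    """Drop fixes whose PR has closed or whose check now passes."""
    kept = []
    for fix in pending:
        if fix["pr"] not in open_prs:
            continue  # PR closed or merged
        if check_states.get((fix["pr"], fix["check"])) == "SUCCESS":
            continue  # check passes now, fix no longer needed
        kept.append(fix)
    return kept
```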

End-to-End Flow

1. You push commits to PR #123

2. GitHub triggers Buildkite checks

3. Monitor detects failure in github-pr-tests

4. Wrapper launches Claude with PR info + build URL

5. Claude analyzes logs, identifies issue, creates commit

6. 💬 Approval dialog appears with fix summary

7. You choose: Push (→ PR comment) / View / Skip (→ queue)

8. If skipped: SketchyBar badge + periodic reminders

9. Later: pr-pending-fixes review to process queue

System Integrations

⚙️ launchd

Monitor runs every 2 min
Reminders every 10 min

📊 SketchyBar (optional)

Shows pending fix count
Click to review
Nice-to-have UI feature

🔔 Notifications

Session start/complete
Periodic reminders

🐙 GitHub

PR comments with fix details
Auto-posts when pushed
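
For the launchd schedule above, a job definition could be generated with Python's standard `plistlib`. This is a sketch under assumptions: the label is guessed from the plist filename listed under System Files, and the real plist may set other keys. `StartInterval` and `RunAtLoad` are standard launchd keys.

```python
import plistlib


def build_monitor_plist(script_path, interval_seconds=120):
    """Return launchd job XML (bytes) that runs script_path every interval_seconds."""
    job = {
        "Label": "com.user.claude-pr-monitor",  # assumed to match the plist filename
        "ProgramArguments": [script_path],      # launchd runs no shell, so pass an absolute path
        "StartInterval": interval_seconds,      # 120 s = the 2-minute polling cadence
        "RunAtLoad": True,
    }
    return plistlib.dumps(job)
```

The reminder job would follow the same shape with an interval of 600 seconds.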

Real-World Example

Scenario: Kotlin compilation error after upgrading to Kotlin 2.3

What happened:

Pushed code, tests failed
10 minutes later: notification appeared
Reviewed fix: "restore necessary !! operators in TaskDAOTest"
Saw detailed explanation about Kotlin smart cast behavior
Clicked "Push"
Fix pushed as a co-authored commit, with a PR comment

Saved ~15 minutes of manual debugging

Key Features

Autonomous Detection
Intelligent Batching
Concurrency Control
Human-in-the-Loop
State Management
Queue System
Desktop Integration
Auto-Cleanup
Confidence Levels
PR Comments

System Files


~/bin/
  watch-pr-builds.sh           # Monitor daemon
  claude-debug-wrapper.sh      # Session wrapper
  pr-pending-fixes             # Queue manager
  pr-monitor                   # Control script

~/.claude/
  pr-watch.log                 # Main log
  pr-watch-state.json          # Processed failures
  pending-fixes/               # Queue metadata
  debug-results/               # Session logs
  ignored-prs.json             # Excluded PRs

~/Library/LaunchAgents/
  com.user.claude-pr-monitor.plist
  com.user.claude-pr-reminder.plist

~/.config/sketchybar/
  plugins/claude_fixes.sh      # Badge plugin
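
A state file such as pr-watch-state.json is easy to corrupt if the monitor is interrupted mid-write. One common safeguard, sketched here with an invented schema (a single `processed` list), is to write to a temporary file and rename it into place. The actual file format is not shown in these slides.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the set of processed failure keys; empty if the file does not exist yet."""
    try:
        with open(path) as f:
            return set(json.load(f)["processed"])
    except FileNotFoundError:
        return set()


def save_state(path, processed):
    """Write the state atomically: temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"processed": sorted(processed)}, f)
    os.replace(tmp_path, path)
```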

Control Commands

Monitor control:


pr-monitor start      # Start the monitor
pr-monitor stop       # Stop the monitor
pr-monitor status     # Check status + recent activity
pr-monitor logs       # Tail the log file
pr-monitor test       # Run once for testing

Pending fixes:


pr-pending-fixes list           # List pending
pr-pending-fixes review         # Interactive review
pr-pending-fixes cleanup-passed # Auto-cleanup

Impact

⏱️ Saves ~10-20 minutes per build failure

🔄 Eliminates context switching

🧠 Reduces cognitive load

🚀 Accelerates feedback loops

Current Limitations

Single repository (all-the-things)
Buildkite-specific (doesn't handle other CI systems)
GitHub MCP runs into SAML authentication issues, so the system falls back to the gh CLI
Not all failures are auto-fixable (complex schema changes, etc.)

Future Enhancements

🔍 Multi-repo support - Monitor multiple repositories
📊 Success metrics - Track fix accuracy and time saved
🧪 Local test running - Run tests before creating fixes
🎯 Pattern learning - Build skill library from repeated fixes
⚡ Faster detection - GitHub webhook instead of polling
🤝 Team integration - Auto-debug for team members' PRs

Technical Insights

State management is critical - Prevents duplicate work and tracks what's been processed
Concurrency limits prevent resource exhaustion - Claude sessions can be memory-intensive
Intelligent batching reduces costs - One session for multiple failures vs. N sessions
Human approval is essential - Not all Claude fixes are correct; review loop maintains trust
Auto-cleanup prevents queue bloat - Remove obsolete fixes when checks pass
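
The concurrency limit with queueing can be illustrated with a small in-memory scheduler. This is a sketch only: the class and method names are made up, and the real system coordinates separate processes rather than objects in one interpreter.

```python
from collections import deque


class SessionScheduler:
    """Allow at most max_concurrent active sessions; the rest wait in FIFO order."""

    def __init__(self, max_concurrent=2):
        self.max_concurrent = max_concurrent
        self.active = set()
        self.waiting = deque()

    def submit(self, job_id):
        """Start the job if a slot is free, otherwise queue it. Returns True if started."""
        if len(self.active) < self.max_concurrent:
            self.active.add(job_id)
            return True
        self.waiting.append(job_id)
        return False

    def finish(self, job_id):
        """Free the job's slot and start the oldest queued job, returning its id (or None)."""
        self.active.discard(job_id)
        if self.waiting and len(self.active) < self.max_concurrent:
            next_job = self.waiting.popleft()
            self.active.add(next_job)
            return next_job
        return None
```

With the default limit of 2, a third failure waits until one of the first two sessions completes.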

Want to see it in action?

Let's walk through an example!

Live Demo: Setup

What we did:

Created PR #170396 with deliberate Kotlin syntax error
Pushed to trigger Buildkite tests
Waited for the monitor to detect the failure...

Error introduced:

open class NotFoundException(
    message: String? = null
 : Exception(message)   // Missing ) before :

Demo: Step 1 - Detection

Monitor detected the failure and started debugging:

What happened:

Monitor polled GitHub API (runs every 2 minutes)
Found buildkite/all-the-things-github-pr-tests failing
Launched Claude debug session with build URL

Demo: Step 2 - Analysis & Fix

Claude analyzed the build logs:

Fetched Buildkite logs via MCP
Identified: Kotlin error in NotFoundException.kt
Root cause: Missing ) before : Exception(message)
Created fix commit • Confidence: 🟢 HIGH

Commit created:

fix: restore missing closing parenthesis in NotFoundException class

Root cause: Previous commit deliberately introduced syntax error...

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Analysis time: ~4 minutes

Demo: Step 3 - Approval Flow

Fix saved to pending queue.

Integration activated: SketchyBar badge

Demo: Step 4 - Review & Push

Approval dialog with full context:

Dialog shows:

🟢 HIGH CONFIDENCE • PR/check details • Root cause summary
Actions: Push / View Output / Remind Later

Demo: Results

📊 What was accomplished:

Failure detected automatically (2 min polling)
Root cause identified via AI (~4 min)
Fix commit created and ready for review
Human approval with full context
One-click push to deploy

Time saved: ~15 minutes • Context switching: Zero

View PR: github.com/Affirm/all-the-things/pull/170396