Performance Enhancer Agent — Design

Date: 2026-04-22
Status: Approved for planning
Artifact location: docs/performance-enhancer/performance-enhancer.md (committed); docs/performance-enhancer/runs/ (generated, gitignored)

Purpose

A generic, reusable Claude Code agent prompt that:

  1. Restarts the Rails dev server with clean logs.
  2. Logs in as a seeded test user.
  3. Drives a user-specified, replayable set of UI scenarios via MCP Playwright.
  4. Captures per-scenario server-side and client-side performance metrics.
  5. Analyzes the logs, identifies hotspots, researches root causes in the codebase.
  6. Emits a fix-plan prompt (another .md) that — when handed to a fresh Claude session — implements the fixes AND re-executes the same scenarios to produce a full professional before/after comparison report.

The agent itself does not implement fixes. It produces a handoff prompt. The fix prompt enforces comparison as a completion gate.

Non-goals

  • Not a production load/benchmark tool. Dev env only.
  • Not a replacement for APM in production.
  • Not meant to be run in CI. Human-initiated.
  • Does not propose architectural changes beyond perf fixes.

Inputs

The only user-supplied input is the ## Scenarios section of performance-enhancer.md. Users edit this section before each run.

Grammar:

  • One step per line, imperative.
  • Reference routes by path, not human description.
  • Form submissions include the data as an inline hash.
  • Click targets use the visible button/link text.

Example:

- Visit /dashboard
- Visit /goals
- Visit /goals/new; fill {title: "Perf test goal", target_date: "2026-12-31"}; submit
- On the resulting /goals/:id page, click "Edit"; change title to "Perf test goal edited"; submit
- Visit /debts

The scenario list is the contract between the before run and the after run — both must execute the same list verbatim for the comparison to be meaningful.
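
The grammar above is loose enough that a small parser can split a step into its actions. The following is a minimal sketch, not part of the agent prompt itself: `parse_step` and the shape of its return value are hypothetical names chosen for illustration, and it assumes actions are separated by `;` outside double quotes.

```python
import re

# Split on ';' only when an even number of double quotes follows,
# so a ';' inside a quoted form value does not break the step apart.
ACTION_SEPARATOR = re.compile(r';(?=(?:[^"]*"[^"]*")*[^"]*$)')
# A route is the first whitespace-delimited token that starts with '/'.
ROUTE = re.compile(r'(?<!\S)/[^\s;]*')


def parse_step(line: str) -> dict:
    """Turn one scenario line into its raw text, actions, and first route (hypothetical helper)."""
    text = line.strip()
    if text.startswith("- "):
        text = text[2:]
    actions = [part.strip() for part in ACTION_SEPARATOR.split(text) if part.strip()]
    route = ROUTE.search(text)
    return {"raw": text, "actions": actions, "path": route.group(0) if route else None}


step = parse_step('- Visit /goals/new; fill {title: "Perf test goal", target_date: "2026-12-31"}; submit')
```

Keeping `raw` alongside the parsed actions matters here, because the raw text is what the before and after runs must share verbatim.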

Architecture

Tools

  • MCP Playwright (already available in Claude Code sessions with the plugin). Uses browser_navigate, browser_click, browser_fill_form, browser_type, browser_snapshot, etc.
  • Bash for process control, log truncation, log parsing.
  • Rails dev server started via bin/dev (foreman + Procfile.dev; web on port 3004).
  • Existing bullet gem → log/bullet.log (N+1 warnings).
  • Existing log/development.log → server-side request metrics.

No new dependencies. No new gems. No JS test toolchain.

Credentials

Hardcoded in the .md: [email protected] / 123456. This user is created by db/seeds.rb. Users must have seeded the DB before running the agent.

File layout

docs/performance-enhancer/
├── performance-enhancer.md                 (committed; the generic agent)
└── runs/                                   (gitignored; all generated)
    ├── <ts>-before.json                    (raw per-scenario metrics)
    ├── <ts>-findings.md                    (hotspot analysis + root causes)
    ├── <ts>-plan.md                        (the fix prompt)
    ├── <ts>-after.json                     (written by the plan)
    └── <ts>-comparison.md                  (written by the plan)

<ts> is a single timestamp per run (format YYYY-MM-DD-HHMMSS) shared across all five files. This links a before run to its after run deterministically.
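
As an illustration of the naming rule, the five paths can be derived from one timestamp. This is a sketch only; `run_paths` and the `SUFFIXES` tuple are hypothetical names, while the directory and the `YYYY-MM-DD-HHMMSS` format come from the layout above.

```python
from datetime import datetime
from pathlib import Path

RUNS_DIR = Path("docs/performance-enhancer/runs")
SUFFIXES = ("before.json", "findings.md", "plan.md", "after.json", "comparison.md")


def run_paths(now: datetime) -> dict:
    """Build all five artifact paths from a single shared timestamp (hypothetical helper)."""
    ts = now.strftime("%Y-%m-%d-%H%M%S")
    return {suffix: RUNS_DIR / f"{ts}-{suffix}" for suffix in SUFFIXES}


paths = run_paths(datetime(2026, 4, 22, 14, 30, 0))
print(paths["before.json"].as_posix())
# docs/performance-enhancer/runs/2026-04-22-143000-before.json
```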

docs/performance-enhancer/runs/ will be added to .gitignore.

Execution flow

Phase 1 — Setup (clean slate)

  1. Kill any running bin/dev, foreman, or rails server processes.
    • Graceful first (SIGTERM), then force (SIGKILL) if needed.
    • Verify port 3004 is free before proceeding.
  2. Truncate log/development.log and log/bullet.log in place (for example, : > log/development.log).
  3. Start bin/dev in the background.
  4. Poll http://localhost:3004 until it returns a non-5xx response (timeout after 60s).
  5. Open a Playwright browser session via MCP, navigate to the sign-in page, authenticate as [email protected] / 123456.
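
The readiness poll in step 4 can be sketched as follows. This is an assumption-laden illustration rather than the agent's actual commands: `probe_status` and `wait_until_ready` are hypothetical helpers, and the clock and sleep functions are injectable only so the loop can be exercised without a running server.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def probe_status(url: str) -> Optional[int]:
    """Return the HTTP status for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx still count as a response
    except (OSError, http.client.HTTPException):
        return None  # connection refused, timeout, half-open socket, etc.


def wait_until_ready(
    probe: Callable[[], Optional[int]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until probe() reports a non-5xx status or the timeout elapses."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = probe()
        if status is not None and status < 500:
            return True
        sleep(interval_s)
    return False
```

A real call would look like `wait_until_ready(lambda: probe_status("http://localhost:3004"))`; a `False` result corresponds to the "server fails to boot within 60s" abort case.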

Phase 2 — "Before" run

For each step in ## Scenarios, in order:

  1. Record a step-start timestamp (ms precision).
  2. Execute the step via Playwright (browser_navigate, browser_click, browser_fill_form, etc., as the step dictates).
  3. Capture Playwright navigation time (time-to-load-event for navigations; click-to-next-paint for in-page clicks).
  4. Record step-end timestamp.
  5. Tail log/development.log for all Started / Completed request pairs whose Rails timestamp falls in [step-start, step-end]. For each request extract:
    • HTTP method, path, status
    • Total time (ms)
    • View render time (ms)
    • ActiveRecord time (ms) and query count (parsed from ActiveRecord: Xms (N queries))
    • Allocations (Rails reports this on the Completed line as an object count rather than milliseconds)
  6. Collect any lines from log/bullet.log whose timestamp falls in the same window.
  7. Append a structured record to runs/<ts>-before.json.
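
Step 5 can be sketched as a pair of regular expressions plus a timestamp window. The log line shapes below are assumptions (they vary with Rails version and logger configuration), and `parse_requests` is a hypothetical helper. One detail worth noting under these assumptions: the Started line carries second precision, so the sketch floors the step-start timestamp to the whole second before comparing.

```python
import re
from datetime import datetime, timezone

# Assumed line shapes; adjust to what log/development.log actually contains.
STARTED = re.compile(
    r'^Started (?P<method>[A-Z]+) "(?P<path>[^"]+)" for \S+ at '
    r'(?P<at>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})'
)
COMPLETED = re.compile(r'^Completed (?P<status>\d{3}) .*?in (?P<total>\d+)ms')
VIEWS = re.compile(r'Views: ([\d.]+)ms')
ACTIVE_RECORD = re.compile(r'ActiveRecord: ([\d.]+)ms(?: \((\d+) quer)?')


def parse_requests(lines, step_start, step_end):
    """Pair each Started line with the next Completed line and keep pairs inside the window."""
    window_start = step_start.replace(microsecond=0)  # log timestamps have 1s precision
    records, current = [], None
    for line in lines:
        started = STARTED.match(line)
        if started:
            at = datetime.strptime(started["at"], "%Y-%m-%d %H:%M:%S %z")
            current = {"method": started["method"], "path": started["path"], "at": at}
            continue
        completed = COMPLETED.match(line)
        if completed and current is not None:
            if window_start <= current["at"] <= step_end:
                views = VIEWS.search(line)
                ar = ACTIVE_RECORD.search(line)
                records.append({
                    "method": current["method"],
                    "path": current["path"],
                    "status": int(completed["status"]),
                    "total_ms": int(completed["total"]),
                    "view_ms": float(views[1]) if views else None,
                    "db_ms": float(ar[1]) if ar else None,
                    "query_count": int(ar[2]) if ar and ar[2] else None,
                })
            current = None
    return records


# Illustrative log lines, not captured from a real run.
SAMPLE_LOG = [
    'Started GET "/dashboard" for 127.0.0.1 at 2026-04-22 14:29:50 +0000',
    'Completed 200 OK in 95ms (Views: 60.0ms | ActiveRecord: 20.0ms (5 queries))',
    'Started GET "/goals" for 127.0.0.1 at 2026-04-22 14:30:01 +0000',
    'Processing by GoalsController#index as HTML',
    'Completed 200 OK in 387ms (Views: 120.0ms | ActiveRecord: 245.0ms (42 queries))',
]
records = parse_requests(
    SAMPLE_LOG,
    datetime(2026, 4, 22, 14, 30, 1, 250000, tzinfo=timezone.utc),
    datetime(2026, 4, 22, 14, 30, 2, tzinfo=timezone.utc),
)
```

Sequential pairing is the simplest strategy and assumes requests do not interleave in the log; if they do, pairing would need a request identifier, which this sketch does not attempt.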

before.json / after.json schema

{
  "timestamp": "2026-04-22-143000",
  "scenario_source": [
    "Visit /dashboard",
    "Visit /goals",
    "Visit /goals/new; fill {title: \"Perf test goal\", target_date: \"2026-12-31\"}; submit"
  ],
  "scenarios": [
    {
      "index": 1,
      "step": "Visit /goals",
      "playwright_ms": 412,
      "requests": [
        {
          "method": "GET",
          "path": "/goals",
          "status": 200,
          "total_ms": 387,
          "view_ms": 120,
          "db_ms": 245,
          "query_count": 42,
          "allocations_ms": 22
        }
      ],
      "bullet_warnings": [
        "N+1 detected: Goal => [:category] in GoalsController#index"
      ]
    }
  ]
}

Identical schema in both runs is mandatory — the comparison depends on it.

scenario_source is a verbatim copy of the user's ## Scenarios list at the time of the before run. This is the immutable contract the after run must replay. If performance-enhancer.md is edited between runs, the plan prompt uses before.json.scenario_source as the source of truth, not the live .md.
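
A contract check between the two files might look like the sketch below. `check_contract` and `REQUIRED_KEYS` are hypothetical names; the required keys are taken from the example record above, and extra keys are tolerated so that a step record may carry additional notes.

```python
REQUIRED_KEYS = {"index", "step", "playwright_ms", "requests", "bullet_warnings"}


def check_contract(before: dict, after: dict) -> list:
    """Return a list of human-readable problems; an empty list means the runs are comparable."""
    problems = []
    if before["scenario_source"] != after["scenario_source"]:
        problems.append("scenario_source differs between runs")
    if len(before["scenarios"]) != len(after["scenarios"]):
        problems.append("scenario count differs between runs")
    for b, a in zip(before["scenarios"], after["scenarios"]):
        if b["step"] != a["step"]:
            problems.append(f"step {b['index']}: step text differs")
        for name, record in (("before", b), ("after", a)):
            missing = REQUIRED_KEYS - set(record)
            if missing:
                problems.append(f"step {b['index']}: {name} record lacks {sorted(missing)}")
    return problems


def _record(index, step):
    # Minimal record with only the required keys, for illustration.
    return {"index": index, "step": step, "playwright_ms": 0, "requests": [], "bullet_warnings": []}


before_run = {"scenario_source": ["Visit /goals"], "scenarios": [_record(1, "Visit /goals")]}
after_run = {"scenario_source": ["Visit /goals"], "scenarios": [_record(1, "Visit /goals")]}
edited_run = {"scenario_source": ["Visit /debts"], "scenarios": [_record(1, "Visit /debts")]}
```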

Phase 3 — Analysis & artifact generation

  1. Build a per-scenario summary table from before.json.
  2. Flag hotspots using editable thresholds defined at the top of performance-enhancer.md:
    • total_ms > 500
    • query_count > 30
    • any Bullet warning
    • db_ms > 0.6 * total_ms
    • repeated identical queries within a single request (loose dup detection)
  3. For the top hotspots, read the Rails code responsible (controller action, model, view partials) to identify root causes: missing includes, .count on relations in views, unscoped loads, fragment caching opportunities, etc.
  4. Optionally consult context7 for framework-specific guidance on any non-obvious fix.
  5. Write runs/<ts>-findings.md:
    • Summary table (scenario × metric)
    • Ordered hotspot list with root-cause notes and suggested fixes
    • Per-hotspot: file references with line numbers
  6. Write runs/<ts>-plan.md (see next section).

Phase 4 (implementing the fixes and re-running the scenarios for the after comparison) is not executed by this agent. It is defined as instructions inside the plan.md the agent generates.
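
The threshold rules from step 2 of Phase 3 translate directly into a small function. This is a sketch with hypothetical names (`THRESHOLDS`, `flag_hotspots`); it leaves out duplicate-query detection because the schema above does not store query text.

```python
THRESHOLDS = {"total_ms": 500, "query_count": 30, "db_fraction": 0.6}


def flag_hotspots(scenario: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return one message per threshold a scenario record exceeds."""
    flags = []
    for req in scenario["requests"]:
        label = f'{req["method"]} {req["path"]}'
        total = req.get("total_ms") or 0
        db = req.get("db_ms") or 0
        queries = req.get("query_count") or 0
        if total > thresholds["total_ms"]:
            flags.append(f"{label}: total_ms {total} > {thresholds['total_ms']}")
        if queries > thresholds["query_count"]:
            flags.append(f"{label}: query_count {queries} > {thresholds['query_count']}")
        if total and db > thresholds["db_fraction"] * total:
            flags.append(f"{label}: db_ms {db} is more than {thresholds['db_fraction']:.0%} of total_ms")
    for warning in scenario["bullet_warnings"]:
        flags.append(f"bullet: {warning}")
    return flags


# The example record from the schema section.
example = {
    "index": 1,
    "step": "Visit /goals",
    "playwright_ms": 412,
    "requests": [{
        "method": "GET", "path": "/goals", "status": 200,
        "total_ms": 387, "view_ms": 120, "db_ms": 245,
        "query_count": 42, "allocations_ms": 22,
    }],
    "bullet_warnings": ["N+1 detected: Goal => [:category] in GoalsController#index"],
}
flags = flag_hotspots(example)
```

For the example record, total_ms stays under its threshold, while query_count, the database fraction (245 > 0.6 × 387 = 232.2), and the Bullet warning are each flagged.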

The generated plan.md (fix prompt)

The generic performance-enhancer.md holds this template. The agent fills in placeholders at generation time and writes the concrete plan as runs/<ts>-plan.md.

# Performance fix plan — <timestamp>

## Context
<PLACEHOLDER: one-paragraph summary of the app area(s) touched, populated by the agent>

## Before-run artifacts (do not modify)
- Scenarios (contract):    runs/<ts>-before.json → `scenario_source` (frozen copy; authoritative)
- Raw metrics:             docs/performance-enhancer/runs/<ts>-before.json
- Findings:                docs/performance-enhancer/runs/<ts>-findings.md

## Hotspots to fix (ordered by impact)
<PLACEHOLDER: ordered list. Each entry:
  N. <route or action> — <metric that flagged it> — <root cause, with file:line refs> — <proposed fix>
>

## Implementation rules
- Follow AGENTS.md patterns (no app/services, user-owned models, Pundit with user).
- Prefer eager-loading, counter caches, scopes, and fragment caching over new abstractions.
- No behavior changes — perf only. Existing tests must still pass.
- Run `bin/rails test` (or the project's test command) before the verification gate.

## Verification gate (REQUIRED — do not claim completion without this)

After implementing the fixes you MUST:

1. Stop any running bin/dev processes.
2. Truncate log/development.log and log/bullet.log.
3. Start bin/dev; wait until :3004 responds.
4. Log in as [email protected] / 123456 via MCP Playwright.
5. Execute the EXACT scenarios from runs/<ts>-before.json `scenario_source`,
   in order, verbatim. No substitutions, no skipping. Do NOT read the live
   performance-enhancer.md — it may have been edited since the before run.
6. Capture metrics using the schema defined in performance-enhancer.md
   (same shape as <ts>-before.json).
7. Write runs/<ts>-after.json.
8. Write runs/<ts>-comparison.md containing:
   - Header: date, scenario count, and the run <ts> shared by the before and after files.
   - Per-scenario side-by-side table:
     | metric | before | after | delta | delta % | regression? |
     Rows: total_ms, db_ms, query_count, view_ms, allocations_ms, playwright_ms.
   - Aggregate totals row across all scenarios.
   - Regression flag on any metric that got worse by >5%.
   - Per-hotspot status: fixed / improved / unchanged / regressed, with the numbers.
   - "Remaining issues" section listing hotspots not addressed, with justification.
   - Bullet warnings: list of warnings present before but absent after; list of any new ones.
9. Do not report completion until runs/<ts>-comparison.md exists and the numbers are in it.
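
The delta, delta %, and regression columns in the comparison table can be computed as in the sketch below. `compare_metric` is a hypothetical helper; it assumes every metric is lower-is-better and treats a move from zero to a positive value as a regression, which is a choice this document does not specify.

```python
def compare_metric(before: float, after: float, tolerance_pct: float = 5.0) -> dict:
    """Compute one row of the before/after table for a lower-is-better metric."""
    delta = after - before
    if before:
        delta_pct = round(delta * 100.0 / before, 1)
        regression = delta_pct > tolerance_pct
    else:
        delta_pct = None  # percentage is undefined when the baseline is zero
        regression = after > before
    return {"before": before, "after": after, "delta": delta,
            "delta_pct": delta_pct, "regression": regression}


# The "after" values here are invented for illustration, not measured results.
improved = compare_metric(387, 140)
regressed = compare_metric(100, 106)
borderline = compare_metric(100, 105)
```

With these illustrative inputs, `improved` has a delta of -247 (-63.8%), `regressed` is flagged because 6.0% exceeds the 5% tolerance, and `borderline` is not flagged because the rule is strictly greater than 5%.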

Structure of the committed performance-enhancer.md

Top-to-bottom:

  1. Title + one-paragraph purpose.
  2. How to invoke. "Open in Claude Code, edit the Scenarios section below, say 'run the performance enhancer'."
  3. Prerequisites (terse): MCP Playwright plugin active; db:seed run so [email protected] exists; port 3004 free.
  4. Credentials: [email protected] / 123456 (one line).
  5. Hotspot thresholds — editable constants (request ms, query count, db-fraction, etc.).
  6. ## Scenarios — the user-edited section, with a worked example and grammar notes.
  7. ## Agent instructions — do-not-edit. Phases 1–3 in imperative form with exact commands:
    • Phase 1: kill/poll/truncate/start commands.
    • Phase 2: log-parsing hints (regex for Started, Completed, ActiveRecord: lines), timestamp-window strategy, JSON write contract.
    • Phase 3: hotspot rules, findings.md structure, when to read code for root causes.
  8. JSON schema block for before.json / after.json (shared by both runs).
  9. Plan-prompt template — the full block from the previous section with <PLACEHOLDER> markers.
  10. Appendix — troubleshooting:
    • "port 3004 is held" → lsof -ti :3004 | xargs kill -9
    • "bullet.log is empty" → verify BULLET_ENABLED in config/environments/development.rb, and that Bullet.bullet_logger = true is set so warnings are written to log/bullet.log
    • "a scenario step fails mid-run" → record the failure with full error, continue with remaining steps, flag in findings.md; do not abort the run

Error handling

  • Server fails to boot within 60s → abort; surface the last ~50 lines of development.log to the user.
  • Login fails → abort and report the failure; instruct the user to check that db:seed has run.
  • A single scenario step fails → record the error in the step's record, continue with the next step. The findings report includes a "scenario failures" section.
  • Log parsing misses a request → store requests: [] for that step with a parse_note field; don't fabricate numbers.
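
The two non-aborting cases above (a failed step and a missed log parse) can be sketched as one loop. `run_steps`, the injected `execute` callable, and the `error` field name are hypothetical; `parse_note` and the empty `requests` list come from the rules above.

```python
def run_steps(steps, execute):
    """Run every step in order, recording failures instead of aborting the run."""
    records = []
    for index, step in enumerate(steps, start=1):
        record = {"index": index, "step": step, "requests": [], "bullet_warnings": []}
        try:
            record.update(execute(step))
        except Exception as exc:  # keep going; the failure is reported in findings.md
            record["error"] = f"{type(exc).__name__}: {exc}"
        if not record["requests"] and "error" not in record:
            # Never fabricate numbers: leave requests empty and say why.
            record["parse_note"] = "no Started/Completed pair found in the step window"
        records.append(record)
    return records


def fake_execute(step):
    """Stand-in for the Playwright and log-parsing work, for illustration only."""
    if "missing" in step:
        raise RuntimeError("element not found")
    if "quiet" in step:
        return {"playwright_ms": 10, "requests": []}
    return {"playwright_ms": 10, "requests": [{"method": "GET", "path": "/goals"}]}


results = run_steps(["Visit /goals", 'click "missing"', "Visit /quiet"], fake_execute)
```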

Testing strategy

Not unit-tested, since this is a prompt artifact. Validation is end-to-end: run the agent against a page with a known N+1 and confirm:

  1. before.json captures the expected query count and Bullet warning.
  2. findings.md flags the N+1 with correct file:line.
  3. plan.md contains the verification gate block verbatim.
  4. Running the plan prompt (by hand) on a fixed branch produces a comparison.md that shows the N+1 resolved.

This smoke test is documented in the spec but not automated.

Open questions

None at design time. Any ambiguities surface during the writing-plans step.

Out of scope (explicitly)

  • LCP / web vitals (server-side perf is the target; can be added later per page if needed).
  • Running the agent in CI.
  • Cross-run history / trendlines beyond a single before/after pair.
  • Production APM integration.
  • Automated fix implementation — that is the plan prompt's responsibility, by design.