Inside BMAD+: how 5 prompted personas actually delegate (and when they fail)

BMAD+ is the multi-role agent layer baked into Nestor: five named personas (Atlas, Forge, Sentinel, Nexus, Shadow) sharing one codebase, one user, one set of guardrails. It ships, but it ships behind NESTOR_PERSONA_V2=1. That's a deliberate choice, not a "we'll fix it later" choice — and after four weeks of dogfooding it on Nestor itself, I want to write down honestly what I learned.

The short version: personas earn their complexity in two specific use cases — mission decomposition and OSINT — and don't earn it elsewhere. For everything else, a clean single-agent system prompt with a wide tool whitelist is faster, cheaper, and roughly as good. This post is the post-mortem.

I'll walk through the actual code path (cited file:line wherever possible), give a 50-word highlight of each persona's system prompt straight from the markdown, draw the router architecture, then list three things that didn't survive contact with production. If you're considering building a multi-persona stack on top of Claude or any other LLM, the failure modes here are the part that should save you time.

One framing note before we dive in. "Persona" here means: a named role with a 30-to-50-line markdown system prompt, an explicit handoff table to the other roles, a tool whitelist, and a fallback rule. It is not a finetune, not a separate model, not a separate process. Everything compiles to a single layered system prompt assembled at buildSystemPrompt() time and handed to whatever LLM the user has configured. The personas doc covers the public-facing story; this post covers the engineering reality.
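To make that definition concrete, here is a minimal TypeScript sketch of the four parts as one data structure. The type and field names (PersonaSpec, handoffs, fallback), the tool identifiers, and the handoff trigger text are illustrative assumptions, not Nestor's actual types; only the persona names and Atlas's restrictions come from this post.

```typescript
// Hypothetical shape of a persona as described above; all names are illustrative.
type PersonaName = "Atlas" | "Forge" | "Sentinel" | "Nexus" | "Shadow";

interface PersonaSpec {
  name: PersonaName;
  promptFile: string;                             // markdown system prompt
  handoffs: Partial<Record<PersonaName, string>>; // target -> when to recommend it
  toolWhitelist: readonly string[];               // tools the runtime will allow
  fallback: PersonaName;                          // who takes over when no trigger matches
}

// Example instance. Tool identifiers and the handoff trigger are assumed; the
// post only says Atlas gets read-only repo access, web search and the mission
// planner, with no shell and no write.
const atlas: PersonaSpec = {
  name: "Atlas",
  promptFile: "ATLAS.md",
  handoffs: { Forge: "scope is agreed and the task now needs a diff" },
  toolWhitelist: ["repo_read", "web_search", "mission_planner"],
  fallback: "Nexus",
};

// A persona can use a tool only if it appears in its whitelist.
function canUse(spec: PersonaSpec, tool: string): boolean {
  return spec.toolWhitelist.includes(tool);
}
```

Nothing here is a separate process or model: the whole spec is data that gets folded into one system prompt plus one runtime check.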

The 5 personas, sourced from the actual code

The persona definitions live as plain markdown files under packages/agent/src/prompts/agents/. The loader (packages/agent/src/prompts/loader.ts:159) reads them lazily, caches in-process, and concatenates them with SHARED_SOUL.md + USER.md + ROLE_SWITCH.md at every turn where NESTOR_PERSONA_V2=1.
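As a rough illustration of "reads lazily, caches in-process", the sketch below wraps a file reader in a Map-backed cache. loadAgentPrompt is the name used in loader.ts, but its signature here, the makePromptLoader factory, and the injected ReadFile function are assumptions made so the example runs without touching disk.

```typescript
// Sketch of a lazy, cached prompt loader. `ReadFile` is injected so the example
// stays self-contained; the real loader presumably reads from disk.
type ReadFile = (path: string) => string;

function makePromptLoader(read: ReadFile, agentsDir: string) {
  const cache = new Map<string, string>();
  return function loadAgentPrompt(name: string): string {
    const key = name.toUpperCase();
    const hit = cache.get(key);
    if (hit !== undefined) return hit;           // served from the in-process cache
    const text = read(`${agentsDir}/${key}.md`); // lazy: read only on first use
    cache.set(key, text);
    return text;
  };
}

// Usage with an in-memory stand-in for the filesystem.
let reads = 0;
const fakeFiles: Record<string, string> = {
  "packages/agent/src/prompts/agents/ATLAS.md": "# Atlas\nChallenge scope early.",
};
const load = makePromptLoader((path) => {
  reads += 1;
  const text = fakeFiles[path];
  if (text === undefined) throw new Error(`missing prompt file: ${path}`);
  return text;
}, "packages/agent/src/prompts/agents");

load("atlas");
load("atlas"); // second call hits the cache, so the reader runs only once
```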

Atlas — the Strategist

Role: business analyst plus product manager. Tools whitelist: read-only repo access, web search, mission planner, no shell, no write. Best fit: anything that smells like scope, pricing, positioning, or "should we even build this".

The 50-word essence, lifted from ATLAS.md:14-22:

Challenge scope early. If a feature request doesn't name a user, a pain, and a metric — flag it before Forge writes a line. Think in trade-offs. Every "yes" is a "no" to something else. Evidence over opinion. Cite sources when claiming a market fact. Short briefs, one page max.

Atlas's anti-patterns (ATLAS.md:24-29) include "re-describing a problem laurent already knows" and "strategy memos longer than the strategy" — these are there because in early dogfooding Atlas would happily produce four-page market memos for a feature that needed one paragraph.

Forge — the Architect-Dev

Role: architecture, implementation, deploy, docs. Tools whitelist: full filesystem read/write inside the workspace, shell, git (commit only with explicit human green light), npm publish gated behind scripts/sync-versions.mjs. Best fit: any task that produces a diff.

From FORGE.md:14-22:

Read before you write. Every frontend file: read fully first. Plan out loud, briefly — one paragraph max before coding. Diff-shaped thinking: describe changes as diffs, not as rewrites. When a file must be regenerated, start from the current version. Sync versions before any publish.

The "read before you write" line is load-bearing: it's the encoded version of laurent's règle absolue (his "absolute rule") on UX preservation. Empirically, Forge respects it about 95% of the time when the persona stack is loaded; without the persona, the same model under a generic "you are a senior engineer" prompt drops to roughly 70%.

Sentinel — the Quality reviewer

Role: QA plus UX review plus accessibility. Tools whitelist: read-only across the workspace, browser-side a11y tools, screenshot diff. No write access by design. Best fit: PR review, regression hunting, sign-off gates.

From SENTINEL.md:14-22:

Assume regression until proven otherwise. New feature equals new blast radius. Read the current UI before commenting. Accessibility is non-negotiable: contrast, focus ring, ARIA, keyboard-only path. WCAG AA minimum. Repro before opinion. Short verdicts: pass, needs work, blocked — with one line of why.

Sentinel is the persona where role specialization pays off most clearly: a generic agent will rubber-stamp, whereas Sentinel is configured to assume regression by default, which caught roughly twice as many real issues per review hour during dogfooding.

Nexus — the Orchestrator

Role: sprint manager, autopilot, handoff router. Tools whitelist: mission/workflow APIs, scheduler, status read across all agents. No file writes. Best fit: multi-step coordination across the other four personas, plus the explicit fallback when no other trigger matches (ROLE_SWITCH.md:21-23).

From NEXUS.md:16-25:

One thread of truth. Maintain a short written state: open items, owner, blocker, next step. If someone asks "où on en est" (where things stand), answer in 5 lines. Announce handoffs out loud. Parallel by default, serial when it matters. Surface blockers early, never absorb them.

Nexus is the only persona that's allowed to call other personas. The rest can recommend a handoff in plain English; only Nexus can actually invoke the router with a specific target agent. That's a deliberate constraint to prevent every persona from trying to orchestrate.
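A minimal sketch of that constraint, assuming a hypothetical requestHandoff function and result type (neither is taken from the Nestor codebase): every persona can ask, but only a request coming from Nexus becomes an actual switch.

```typescript
type PersonaName = "Atlas" | "Forge" | "Sentinel" | "Nexus" | "Shadow";

type HandoffResult =
  | { kind: "switched"; next: PersonaName }
  | { kind: "recommendation"; from: PersonaName; suggested: PersonaName };

// Hypothetical gate: only Nexus may turn a handoff into an actual switch.
function requestHandoff(active: PersonaName, target: PersonaName): HandoffResult {
  if (active === "Nexus") {
    return { kind: "switched", next: target };
  }
  // Every other persona can only surface a plain-English suggestion.
  return { kind: "recommendation", from: active, suggested: target };
}
```

Keeping the asymmetry in one function makes it easy to audit: there is a single place where a persona change can originate.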

Shadow — the OSINT investigator

Role: investigation, scraping, psychoprofiling, competitor intel. Tools whitelist: web fetch, headless browser, public-records lookups, the five OSINT-specific MCP tools. Hard-blocked from credential harvesting and from anything that breaches site ToS.

From SHADOW.md:14-22:

Source or it didn't happen. Every claim: URL, timestamp, retrieval method. Provenance over speed. Tag each finding with confidence: high, medium, low, speculative. Respect the law, the ToS, and the règles absolues. Redact before returning. Silent when empty — no findings means report no findings.

Shadow is the second use case where personas decisively earn their cost. The strict source-or-it-didn't-happen guardrail, the confidence tagging, the redaction-by-default — these are difficult to bolt onto a generic agent without writing essentially the same system prompt.
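To show what those guardrails look like as data rather than prose, here is a sketch of a finding record with provenance and confidence fields. The Finding type, the field names, and the two helper functions are assumptions for illustration; the sketch does not cover redaction.

```typescript
type Confidence = "high" | "medium" | "low" | "speculative";

interface Finding {
  claim: string;
  url: string;
  retrievedAt: string;     // ISO 8601 timestamp
  retrievalMethod: string; // e.g. "web_fetch" (identifier assumed)
  confidence: Confidence;
}

// "Source or it didn't happen": drop any finding that lacks provenance.
function withProvenance(findings: Finding[]): Finding[] {
  return findings.filter(
    (f) =>
      f.url.trim() !== "" &&
      !Number.isNaN(Date.parse(f.retrievedAt)) &&
      f.retrievalMethod.trim() !== ""
  );
}

// "Silent when empty": report no findings rather than padding the answer.
function report(findings: Finding[]): string {
  const kept = withProvenance(findings);
  if (kept.length === 0) return "No findings.";
  return kept
    .map((f) => `[${f.confidence}] ${f.claim} (${f.url}, ${f.retrievedAt}, ${f.retrievalMethod})`)
    .join("\n");
}
```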

The router architecture

Delegation between personas does not happen via separate agent processes. It happens at prompt-assembly time, inside one runtime loop. Here's the flow as it actually exists in code:

// User turn arrives at runtime.ts:486
// runPreflight() is called at packages/agent/src/runtime/preflight.ts:70

User input
   │
   ▼
┌──────────────────────────────────────┐
│  preflight.ts (per-iteration setup)  │
│  - memory ctx, RAG, experiment A/B   │
│  - if NESTOR_PERSONA_V2=1, call:     │
│    buildSystemPrompt({mode,agent})   │  ← preflight.ts:208-232
└──────────────────┬───────────────────┘
                   │
                   ▼
┌──────────────────────────────────────┐
│  loader.ts:buildSystemPrompt()       │
│  Concatenates with '\n\n---\n\n':    │
│   1. "You are running inside Nestor" │
│   2. SHARED_SOUL.md                  │
│   3. USER.md                         │
│   4. ROLE_SWITCH.md  (dispatch tbl)  │
│   5. agents/<ACTIVE>.md  (only chat) │
│   6. heartbeat or bootstrap (modes)  │
└──────────────────┬───────────────────┘
                   │
                   ▼
        layerManager.setLayer('system', stack)
                   │
                   ▼
┌──────────────────────────────────────┐
│  LLM call with full layered prompt   │
│  Tool whitelist applied per persona  │
└──────────────────┬───────────────────┘
                   │
                   ▼
   Tool calls / text  →  back to loop

Two things matter in this diagram. First, the active persona is selected at the start of an iteration via the NESTOR_PERSONA_AGENT env var or a programmatic call from Nexus; it doesn't change mid-call. Second, the role-switch table (ROLE_SWITCH.md) is always included, so any persona can recommend a handoff in its output, but the actual switch happens on the next iteration.

In practice that means a "Forge → Sentinel handoff" is two LLM calls: one where Forge says "this is a Sentinel job, passing to Sentinel", and one where Sentinel takes over with its own persona prompt loaded. There's no zero-cost delegation. We'll come back to that.
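The flow can be approximated in a few lines. This is a sketch under stated assumptions: the buildSystemPrompt signature, the in-memory files map, and the fakeLlm stand-in are all hypothetical, the heartbeat and bootstrap sections are left out, and the Nexus routing step is collapsed into a direct switch for brevity.

```typescript
type Mode = "chat" | "heartbeat" | "bootstrap";
const SEPARATOR = "\n\n---\n\n";

// Assumed signature; the section order follows the diagram above.
function buildSystemPrompt(
  opts: { mode: Mode; agent: string },
  files: Record<string, string>
): string {
  const sections: Array<string | undefined> = [
    "You are running inside Nestor",
    files["SHARED_SOUL.md"],
    files["USER.md"],
    files["ROLE_SWITCH.md"],
  ];
  // Per the diagram, the agent file is only added in chat mode.
  if (opts.mode === "chat") sections.push(files[`agents/${opts.agent}.md`]);
  return sections.filter((s): s is string => typeof s === "string").join(SEPARATOR);
}

// Stand-in for the model: Forge recommends Sentinel, Sentinel returns a verdict.
type Turn = { text: string; handoffTo?: string };
function fakeLlm(systemPrompt: string): Turn {
  return systemPrompt.includes("# Forge")
    ? { text: "This is a Sentinel job.", handoffTo: "SENTINEL" }
    : { text: "Verdict: pass." };
}

function runHandoff(files: Record<string, string>, firstAgent: string) {
  let active = firstAgent;
  let calls = 0;
  const transcript: string[] = [];
  for (let i = 0; i < 5; i++) {
    // The persona is fixed for the whole iteration.
    const turn = fakeLlm(buildSystemPrompt({ mode: "chat", agent: active }, files));
    calls += 1;
    transcript.push(`${active}: ${turn.text}`);
    if (turn.handoffTo === undefined) break;
    active = turn.handoffTo; // takes effect on the next iteration only
  }
  return { calls, transcript };
}

const promptFiles: Record<string, string> = {
  "SHARED_SOUL.md": "soul",
  "USER.md": "user",
  "ROLE_SWITCH.md": "dispatch table",
  "agents/FORGE.md": "# Forge",
  "agents/SENTINEL.md": "# Sentinel",
};
const handoff = runHandoff(promptFiles, "FORGE"); // handoff.calls is 2
```

Even in this toy version the handoff costs one full model call per persona, each with its own freshly assembled prompt.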

3 things that don't work

Four weeks of dogfooding turned up three failure modes that I genuinely did not predict.

1. Hallucinated tool whitelist violations. Sentinel is supposed to be read-only. About once every fifty turns, Sentinel will confidently emit a file_write tool call anyway — usually in the form of "let me draft the fix and write it to fix.md". The runtime catches it (the tool isn't in Sentinel's whitelist, so the call fails before reaching the filesystem), but the model's belief that it has the tool is genuinely surprising and degrades the conversation. The fix would be to inject the tool whitelist directly into the persona prompt as a hard constraint; right now it's enforced only at the runtime gate, not at the prompt level. On the v4 backlog.

2. Persona context bleed. When you switch active personas mid-conversation (Shadow to Atlas, say), the prior persona's tone leaks for one or two turns. Atlas talking like Shadow ("the data shows, with medium confidence...") happens often enough for users to notice. Root cause: the conversation history from the previous persona is still in the context window, and the new persona's system prompt is only ~40 lines. The prior 4,000 tokens of "Shadow voice" outweigh the persona switch. Mitigation we tested: insert a synthetic [PERSONA SWITCH: now Atlas] assistant message at the boundary. Helps maybe 30%. Not a real fix.

3. Router cost explosion. Every delegation re-injects the full layered system prompt: about 2,800 tokens for a typical chat-mode call (SHARED_SOUL + USER + ROLE_SWITCH + the active persona). A four-step mission going Atlas → Forge → Sentinel → Nexus pays that cost four times, roughly 11,200 input tokens of prompt overhead. With a frontier model at $3/MTok input, that's not nothing on a heavy autopilot run. The stable-prefix layer in buildSystemPrompt() is designed to be cache-eligible (the first three sections never change between turns), but because the agent file changes on every persona switch, each switch invalidates the cache. Real-world cache hit rate during a mission: about 60%, well below what a single-agent run gets.
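The three failure modes above can be sketched together in a few functions: the runtime gate and the backlog idea from item 1, the boundary marker from item 2, and the arithmetic behind item 3. Tool identifiers, function names, and the Message shape are assumptions for illustration, and the cost figure ignores caching and assumes one model call per hop.

```typescript
// 1. Runtime gate. Tool identifiers are assumed for illustration.
const WHITELISTS: Record<string, readonly string[]> = {
  SENTINEL: ["file_read", "a11y_audit", "screenshot_diff"],
  FORGE: ["file_read", "file_write", "shell", "git"],
};

type GateResult = { ok: true } | { ok: false; error: string };

function gateToolCall(persona: string, tool: string): GateResult {
  const allowed = WHITELISTS[persona] ?? [];
  if (allowed.includes(tool)) return { ok: true };
  // Rejected here, before any tool implementation runs.
  return { ok: false, error: `${persona} is not allowed to call ${tool}` };
}

// 1b. Backlog idea: also state the whitelist inside the persona prompt.
function withToolConstraint(persona: string, personaPrompt: string): string {
  const allowed = WHITELISTS[persona] ?? [];
  return `${personaPrompt}\n\nTools you can call: ${allowed.join(", ")}. You have no other tools.`;
}

// 2. Synthetic boundary marker, added only when the persona actually changes.
interface Message { role: "user" | "assistant"; content: string; persona?: string }

function markPersonaSwitch(history: Message[], next: string): Message[] {
  const lastAssistant = [...history].reverse().find((m) => m.role === "assistant");
  if (lastAssistant === undefined || lastAssistant.persona === next) return history;
  return [
    ...history,
    { role: "assistant", content: `[PERSONA SWITCH: now ${next}]`, persona: next },
  ];
}

// 3. Uncached system-prompt overhead for a mission, in USD.
function promptOverheadUsd(hops: number, tokensPerHop: number, usdPerMTok: number): number {
  return (hops * tokensPerHop * usdPerMTok) / 1_000_000;
}

// Four hops at 2,800 tokens and $3/MTok: 11,200 tokens, about $0.0336.
const fourHopMissionUsd = promptOverheadUsd(4, 2800, 3);
```

Per mission the overhead is a few cents at most; it becomes visible when an autopilot run chains many missions, or when each hop itself spans several model calls.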

"Just a system prompt" vs. real architectural choice

This is the debate I keep having with myself, and that I want to put on paper because the answer is genuinely "it depends".

The reductive view: a persona is just a 40-line system prompt. You could replicate Atlas in any agent framework with a copy-paste of ATLAS.md and a tool whitelist. The framework adds nothing. This view is not wrong — for single-shot use of a single persona, it's basically correct. If all you need is "an agent that thinks like a strategist for one task", you don't need BMAD+; you need a good system prompt and the discipline to swap it in deliberately.

The non-reductive view: persona stacks become a real architectural choice when three things are true at once. Persistence: the same persona lives across multiple turns, multiple missions, multiple weeks, with a memory and a history. Tool whitelisting: what the persona can do differs from what the others can do, enforced at the runtime level not just by polite request. Delegation graph: handoffs between personas are explicit, traceable, auditable — not "the same agent decided to think differently for a minute".

Nestor only meets all three for two use cases. Mission decomposition (Nexus orchestrating Atlas + Forge + Sentinel in parallel on a single objective) genuinely benefits — the tool-whitelist isolation prevents Atlas from accidentally writing files during a strategy phase, and the explicit handoff log makes the mission report readable. OSINT (Shadow as a hard-redacting, source-citing investigator with its own tool surface) genuinely benefits — the constraint set is too specific to merge into a generic agent without losing it.

For everything else — interactive shell, code review on a single PR, quick docs edits — the persona stack is overhead. A clean single-agent prompt with a sensible tool whitelist and laurent's règles absolues baked in performs as well, costs less, and doesn't context-bleed. That's the honest verdict from four weeks of trying.

Why behind a flag

So: NESTOR_PERSONA_V2=1. Three reasons.

Cost transparency. Until the cache-hit story is better, the persona stack adds measurable token cost on every turn. Users on a tight Anthropic or OpenAI budget should opt in deliberately, not discover the bill at month end.

Power-user opt-in. The audience that wants role-named personas in their dev loop is a subset, not a majority. The audit deep-dive (.audits/2026-04-26/deep-dives-summary.md:42) was explicit: pull BMAD+ off the hero, keep it in /docs/personas. We took that advice. The hero pitch is now BYOK + missions + workflows; personas are a power-user feature behind a flag.

Evolution path. Shipping behind a flag means we can change the assembly contract, add the tool-whitelist-in-prompt fix, swap the persona-switch boundary marker, all without breaking anyone's main flow. The flag is a real version gate — V2 implies V3 is allowed to look different.

v4 plans for personas

Three things on the roadmap, in priority order.

Adaptive routing. Right now Nexus picks the next persona via the explicit handoff table. The v4 plan: a small classifier (cheap model, three tokens out) that maps the user's last message to the most likely persona, with Nexus retaining override authority. This shrinks the "Atlas accidentally answering an OSINT question" case toward zero.
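One possible shape for that router, sketched with a toy keyword matcher standing in for the cheap model. The Classifier type, the keyword lists, and routeNext are all hypothetical; the only parts taken from the plan are the Nexus override and Nexus as the fallback.

```typescript
type PersonaName = "Atlas" | "Forge" | "Sentinel" | "Nexus" | "Shadow";
type Classifier = (message: string) => PersonaName | null;

// Toy keyword classifier standing in for the small model; keywords are invented.
const keywordClassifier: Classifier = (message) => {
  const m = message.toLowerCase();
  if (/\b(investigate|osint|scrape)\b/.test(m)) return "Shadow";
  if (/\b(review|regression|accessibility)\b/.test(m)) return "Sentinel";
  if (/\b(implement|refactor|deploy)\b/.test(m)) return "Forge";
  if (/\b(pricing|scope|positioning)\b/.test(m)) return "Atlas";
  return null;
};

function routeNext(
  message: string,
  classify: Classifier,
  nexusOverride?: PersonaName
): PersonaName {
  if (nexusOverride !== undefined) return nexusOverride; // Nexus keeps the last word
  return classify(message) ?? "Nexus";                   // fallback when nothing matches
}
```

Because the classifier is just a function argument, swapping the keyword matcher for a model call later would not change the routing contract.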

Per-mission persona auto-selection. When a mission is created with an objective, the planner currently asks the user which personas to enable. v4 should infer it from the objective text — "investigate the pricing of three competitors" auto-enables Shadow + Atlas, "ship the v3.6 release" auto-enables Forge + Sentinel + Nexus.

Custom persona authoring. The five built-in personas reflect laurent's workflow. They're not universal. The v4 surface should let users drop a markdown file into ~/.nestor/personas/, declare a tool whitelist, and have it picked up by the loader. The plumbing is already most of the way there — loadAgentPrompt() at loader.ts:159 accepts any string name. What's missing is the discovery + UX glue.
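The discovery half could be as small as the sketch below, which turns a directory listing into persona entries. discoverPersonas and the CustomPersona shape are hypothetical, the listing is passed in rather than read from disk to keep the example self-contained, and the tool-whitelist declaration is left out.

```typescript
interface CustomPersona {
  name: string;       // what would be passed to the loader
  promptPath: string; // where the markdown lives
}

// Given a directory listing, keep the markdown files and derive persona names.
function discoverPersonas(dir: string, entries: string[]): CustomPersona[] {
  return entries
    .filter((file) => file.toLowerCase().endsWith(".md"))
    .map((file) => ({
      name: file.slice(0, -3).toUpperCase(), // strip ".md"
      promptPath: `${dir}/${file}`,
    }))
    .sort((a, b) => a.name.localeCompare(b.name));
}

// File names are invented examples.
const found = discoverPersonas("~/.nestor/personas", ["scribe.md", "notes.txt", "Ledger.md"]);
```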

If you're building something like BMAD+ on top of your own agent runtime, the things that ate the most time were not the prompts — they were the runtime gates around tool whitelists and the cache-invalidation behavior on persona switches. Plan for both. The prompts themselves are the easy part.

The flag stays on for me locally and stays off by default for everyone else, until I can ship a v4 that closes the three failure modes above. When that happens, I'll write the follow-up. If you want to try the current implementation in the meantime: NESTOR_PERSONA_V2=1 npx nestor-sh shell, and read the docs at nestor.sh/docs/personas.