Structural discipline for agentic development.
A Codex Desktop App plugin that orchestrates multi-stage agentic work through a manifest-driven pipeline. Human-approval gates where decisions actually live. Instrumented audit stages that catch the drift class which silently ships the wrong product.
exists
Agentic work fails in predictable ways.
The agent improvises past the spec. It claims tests pass without running them on a fresh dependency set. It merges in-flight work while a scope question is still open. It makes architectural decisions silently rather than surfacing them for review.
agent-pipeline-codex enforces a structural pattern that catches every
one of those - not by hoping the agent is well-behaved, but by
making the well-behaved path the only path. Every stage produces
a durable artifact under .agent-runs/<run-id>/;
every gate produces a one-question prompt that cannot be bypassed.
- i. Manifest gate - every run begins with a human-approved contract naming goal, allowed paths, forbidden paths, non-goals, expected outputs, and definition-of-done. Fuzzy manifests fail strict schema validation at the gate, before they cascade.
- ii. Director-decisions - the researcher surfaces open questions; the human picks; choices are recorded as binding constraints before the planner runs. Architectural decisions never happen in chat.
- iii. Standing invariants - the cumulative-drift gap that lets feature-scoped manifests silently ship stale top-of-file content is closed by doc-currency invariants the drift-detector checks on every run.
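To make the manifest gate concrete, here is a minimal sketch of strict validation over the six contract items named above. The key names, the types, and the rule that every field must be non-empty are assumptions for illustration; the plugin's actual schema may differ.

```python
# Hypothetical field names and types mirroring the contract items listed above.
REQUIRED_FIELDS = {
    "goal": str,
    "allowed_paths": list,
    "forbidden_paths": list,
    "non_goals": list,
    "expected_outputs": list,
    "definition_of_done": list,
}


def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the manifest passes."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in manifest:
            problems.append(f"missing field: {name}")
            continue
        value = manifest[name]
        if not isinstance(value, expected_type):
            problems.append(f"{name}: expected {expected_type.__name__}")
        elif not value or (isinstance(value, str) and not value.strip()):
            # Assumption: an empty or whitespace-only entry counts as "fuzzy".
            problems.append(f"{name}: must not be empty")
    # Strict mode: unknown keys are rejected rather than silently ignored.
    for name in sorted(set(manifest) - set(REQUIRED_FIELDS)):
        problems.append(f"unknown field: {name}")
    return problems
```

Returning every problem at once, rather than failing on the first, lets the human fix the whole manifest in one pass before the gate is retried.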
thirty seconds
One command. Then orient. Then run.
Install once across all projects. From any project root - empty
directory, fresh clone, or a working repo - run
the pipeline-init skill. It asks one question (PRD path, repo
URL, or description paragraph), produces a project-orientation
summary, and scaffolds .pipelines/,
scripts/policy/, and a starter AGENTS.md
if one is not present.
# Install once, across all Codex Desktop App projects
> git clone https://github.com/scottconverse/agent-pipeline-codex.git ~/agent-pipeline-codex-plugin
> python scripts/verify_plugin_release.py --live
# The live gate checks deterministic install state, then repeats fresh Codex probes.

# Then in any project root
> Use pipeline-init for this project.
> Use intake for <plain-English task>.
> Use new-run for feature <task-slug>.
> ... fill in the manifest the plugin scaffolds ...
> Use validate-manifest for <run-id>.
> Use run-pipeline for feature <run-id>.
> Use show-run-status for <run-id>.
Codex skills
Eight Codex skills cover the full surface: agent-pipeline orients and routes; pipeline-init onboards a project; intake drafts starting artifacts from a plain-English task; new-run initializes a blank run manifest; validate-manifest preflights the manifest schema; run-pipeline orchestrates the eleven stages with resume-from-log; show-run-status gives read-only run orientation; and audit-init scaffolds the v0.3 dual-AI audit-handoff discipline.
Subagent isolation
Each agent stage runs as an isolated Codex subagent with no parent-conversation memory. The orchestrator passes a role file plus run context as the entire prompt. The judge layer (v0.4 opt-in) intercepts proposed executor tool calls and routes high-risk actions through a context-isolated judge subagent before they execute.
history
Each minor release adds a layer the previous version did not catch.
The layers stack. Every version's failure-mode coverage is preserved on upgrade.
Phase 0 audits the release workflow before any product code is touched. Phase 2 rehearses the release sequence locally on fresh state. The CI workflow becomes the execution mechanism, not the discovery mechanism.
audit-init scaffolds the three-artifact discipline for projects where one AI implements and another audits. Implementer runs a hostile 5-lens self-audit before push; auditor runs a 10-section verification protocol; both share an in-repo drift-patterns catalog that grows over time.
Opt-in. Every executor tool call is classified by risk (read_only / reversible_write / external_facing / high_risk). Dangerous actions route to a context-isolated judge subagent with four verdicts - allow, block, revise, escalate.
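The routing described above can be sketched as follows. The four risk classes and four verdicts come from the text; the tool names, the lookup table, the choice of which classes need a judge, and the `judge` callable are all hypothetical stand-ins, not the plugin's real interface.

```python
from enum import Enum
from typing import Callable


class Risk(Enum):
    READ_ONLY = "read_only"
    REVERSIBLE_WRITE = "reversible_write"
    EXTERNAL_FACING = "external_facing"
    HIGH_RISK = "high_risk"


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REVISE = "revise"
    ESCALATE = "escalate"


# Hypothetical classification table; real tool names will differ.
TOOL_RISK = {
    "read_file": Risk.READ_ONLY,
    "edit_file": Risk.REVERSIBLE_WRITE,
    "send_email": Risk.EXTERNAL_FACING,
    "force_push": Risk.HIGH_RISK,
}

# Assumption: only these classes are routed through the judge.
NEEDS_JUDGE = {Risk.EXTERNAL_FACING, Risk.HIGH_RISK}


def classify(tool_name: str) -> Risk:
    # Unknown tools fall into the most restrictive class.
    return TOOL_RISK.get(tool_name, Risk.HIGH_RISK)


def route(tool_name: str, judge: Callable[[str, Risk], Verdict]) -> Verdict:
    """Allow low-risk calls directly; defer everything else to the judge."""
    risk = classify(tool_name)
    if risk not in NEEDS_JUDGE:
        return Verdict.ALLOW
    return judge(tool_name, risk)
```

Defaulting unknown tools to `high_risk` keeps the safe path as the default: a tool nobody classified is judged rather than waved through.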
Critic reads every artifact adversarially in fresh context. Drift-detector compares manifest contract to assembled state and, at v0.5.1, enforces standing doc-currency invariants regardless of manifest scope. Auto-promote scores six structural conditions and collapses the manager gate when clean. Pre-edit fact-forcing in the executor catches blast-radius surprises before they hit the verifier.
Run status now reports skipped malformed log lines, decision ledgers are tested through the production writer and validator together, git classification rules have focused negative cases, and public CI badges make source-only verification visible.
Every product run now carries a scope lock tied to the canonical release plan. Policy checks block missing locks, future-rung paths, contradictory docs, and commit subjects that do not belong to the locked rung. Prompt-plan conflicts stop before edits with SCOPE_CONFLICT.
A run-local directive binds exact manifest and scope-lock content, plan assertions, manager assertions, author provenance, and a SHA-256 hash into run.log; mismatch or tampering falls back to human review.
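One way to picture the hash binding is the sketch below. It assumes the directive hashes the exact manifest and scope-lock text with SHA-256 and compares the result against a recorded digest; the function names, the length-prefixed encoding, and the return labels are illustrative, and the plugin's real run.log layout is not shown here.

```python
import hashlib
import hmac


def directive_hash(manifest_text: str, scope_lock_text: str) -> str:
    """Hash both documents; length prefixes keep the boundary unambiguous."""
    digest = hashlib.sha256()
    for part in (manifest_text, scope_lock_text):
        data = part.encode("utf-8")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def check_directive(recorded_hash: str, manifest_text: str, scope_lock_text: str) -> str:
    """Return "proceed" on a match; any mismatch falls back to human review."""
    current = directive_hash(manifest_text, scope_lock_text)
    if hmac.compare_digest(recorded_hash, current):
        return "proceed"
    return "human_review"
```

Because the digest covers exact content, even a whitespace edit to either file after approval changes the hash and sends the run back to a human.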
Optional plugin hooks add active-run context, warn on stale skill names, preflight risky tools, deny unsafe approval requests, add corrective context after failed tools, and continue invalid stops when plugin_hooks is enabled.
The new intake skill writes intake.md, draft manifest.yaml, draft scope-lock.yaml, and missing-question notes, then stops before validation or agent work.
Active runs now keep memory/events.jsonl, turns.jsonl, decisions.jsonl, open_loops.jsonl, memory_probe.log, and handoff_current.md. SessionStart injects the compact handoff beside active-run context.
local-rehearsal.md Step 2.5 requires a branch-vs-merge-base rerun when verify-release.sh fails or hangs. verifier.md points to the same rule when verifier tests diverge from the implementation report.
space
What the plugin will not do - by construction.
The contract is enforced as much by what is forbidden as by what is required. The hard rules below are baked into the role files, the policy stage, and the orchestrator. They are not toggles.
- Propose autonomous mode. Every gate is explicit.
- Silently expand scope. The policy stage blocks every change outside allowed_paths.
- Skip tests. "Never skip tests" is enforced as a project-level hard rule.
- Promote a run when the verifier marked any criterion as not met.
- Merge in-flight work while a halt is active - including cleanup PRs that "seem obviously safe."
- Stop, defer, skip push, skip CI, or write a stopping handoff because of an unverified risk.
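As an illustration of the scope rule in the list above, a path check against allowed_paths might look like the sketch below. It assumes glob-style patterns, repo-relative POSIX paths, and that forbidden paths win over allowed ones; the plugin's actual matching rules may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_allowed(path: str, allowed_paths: list[str], forbidden_paths: list[str] = ()) -> bool:
    """Return True only if the path is inside scope and not explicitly forbidden."""
    candidate = PurePosixPath(path)
    # Reject absolute paths and parent-directory escapes outright.
    if candidate.is_absolute() or ".." in candidate.parts:
        return False
    text = candidate.as_posix()
    # Assumption: forbidden patterns take precedence over allowed ones.
    if any(fnmatchcase(text, pattern) for pattern in forbidden_paths):
        return False
    return any(fnmatchcase(text, pattern) for pattern in allowed_paths)


def out_of_scope(changed: list[str], allowed_paths: list[str], forbidden_paths: list[str] = ()) -> list[str]:
    """List every changed path that the policy check would block."""
    return [p for p in changed if not is_allowed(p, allowed_paths, forbidden_paths)]
```

A run would be blocked whenever `out_of_scope` returns a non-empty list, which makes "silently expand scope" a failing check rather than a judgment call.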
baked in
Defaults reflect failures, not preferences.
Every default in this plugin earned its place by being the fix for a failure that cost real time on a real project. The following are notable enough to call out.
director-decisions.md, not in chat. The researcher surfaces them; the human picks in writing.
verify_plugin_release.py --live proves a fresh Codex process sees the plugin and the namespaced agent-pipeline-codex:* skills.
Stability
Beta. The structural pattern has shipped across multiple projects and absorbed real failure receipts into its defaults. Semver applies once 0.1.x-beta drops.