Seven agents. Optional human approval gates. One CLI. The builder never grades its own work — a separate, skeptical evaluator does.
Standalone Python CLI. Supports Anthropic, OpenAI, Ollama, and Gemini.
Choose Your Path
The interactive wizard detects what you have installed and lets you pick.
Local models (for example, gpt-oss:20b through Ollama) are free but slower. Each pipeline step takes ~20 minutes on a 20B-parameter model running on consumer hardware, so a full project takes hours. Cloud APIs are significantly faster.
Batch Execution
Submit pipeline jobs from the CLI or dashboard. The daemon runs the full pipeline headlessly with auto-approve — PRD, plan, build, evaluate, document. Monitor progress and inspect logs from the dashboard. Remote gate approval is planned for a future release.
Zero infrastructure. File-based queue. Python stdlib HTTP server. No React, no build step, no npm.
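A file-based queue needs nothing but the filesystem. Here is a minimal sketch of one way it can work: one directory per job state, plus atomic renames to claim a job so two workers never run the same one. The directory names, job format, and functions (`init_queue`, `submit`, `claim_next`, `complete`) are hypothetical. ProductTeam's actual queue layout isn't described here.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path

# Hypothetical layout: <root>/pending, <root>/running, <root>/done.
STATES = ("pending", "running", "done")


def init_queue(root: Path) -> None:
    """Create one directory per job state."""
    for name in STATES:
        (root / name).mkdir(parents=True, exist_ok=True)


def submit(root: Path, job: dict) -> str:
    """Write to a temp file, then rename into pending/ so readers never see a partial job."""
    job_id = f"{time.time_ns():020d}-{uuid.uuid4().hex}"  # timestamp prefix: rough FIFO order
    tmp = root / "pending" / f".{job_id}.tmp"
    tmp.write_text(json.dumps(job))
    tmp.rename(root / "pending" / f"{job_id}.json")
    return job_id


def claim_next(root: Path):
    """Move the oldest pending job into running/; return (job_id, job) or None."""
    for path in sorted((root / "pending").glob("*.json")):
        target = root / "running" / path.name
        try:
            path.rename(target)  # atomic within one filesystem, so only one worker wins
        except FileNotFoundError:
            continue  # another worker claimed this job first
        return path.stem, json.loads(target.read_text())
    return None


def complete(root: Path, job_id: str) -> None:
    """Mark a job finished by moving it into done/."""
    (root / "running" / f"{job_id}.json").rename(root / "done" / f"{job_id}.json")


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    init_queue(root)
    submit(root, {"idea": "CLI todo app", "auto_approve": True})
    claimed = claim_next(root)
    if claimed:
        job_id, job = claimed
        # ...a daemon would run the full pipeline for `job` here...
        complete(root, job_id)
```

A daemon would loop on `claim_next`, run the pipeline for each claimed job, and then call `complete`. Every job state is just a file, so a dashboard can report progress by listing the three directories.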
Most AI coding tools let a single agent build something and then declare it done. That's a student grading their own exam.
ProductTeam separates builder from judge. The Builder writes code and says "ready for review." The Evaluator — a separate agent, separate prompt, skeptical by default — reads the source, runs the tests, tries to break things. It grades PASS, NEEDS_WORK, or FAIL.
If NEEDS_WORK, findings route back to the Builder automatically. Maximum 3 loops. If a sprint still isn't passing after loop 3, the plan is assumed to be wrong, not the implementation. The Builder can never ship its own code.
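The builder/evaluator split can be sketched as a small loop. This is an illustration under stated assumptions, not ProductTeam's code. `build` and `evaluate` stand in for the two agents, and `Grade`, `Verdict`, and `run_sprint` are hypothetical names. The sketch also assumes that a FAIL grade stops the sprint immediately, which the text above doesn't specify.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Grade(Enum):
    PASS = "PASS"
    NEEDS_WORK = "NEEDS_WORK"
    FAIL = "FAIL"


@dataclass
class Verdict:
    grade: Grade
    findings: list[str] = field(default_factory=list)


MAX_LOOPS = 3


def run_sprint(build: Callable[[list[str]], str],
               evaluate: Callable[[str], Verdict]) -> tuple[Grade, int]:
    """Alternate builder and evaluator; only the evaluator's PASS counts as done."""
    findings: list[str] = []
    for loop in range(1, MAX_LOOPS + 1):
        artifact = build(findings)    # the builder only says "ready for review"
        verdict = evaluate(artifact)  # a separate judge grades it
        if verdict.grade is not Grade.NEEDS_WORK:
            return verdict.grade, loop  # PASS ships; FAIL stops (assumed)
        findings = verdict.findings   # route findings back to the builder
    # Still NEEDS_WORK after the last loop: escalate, since the plan is suspect.
    return Grade.NEEDS_WORK, MAX_LOOPS
```

Suppose the builder fixes its work only after it receives findings. `run_sprint` then returns `(Grade.PASS, 2)`. An evaluator that never stops saying NEEDS_WORK ends the sprint at loop 3.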
| Other AI Tools | ProductTeam |
|---|---|
| Agent self-evaluates | Separate skeptical judge |
| "Done" when builder says so | "Done" only when Evaluator grades PASS |
| State in conversation memory | State in files that persist across sessions |
| All agents or nothing | Drop in only the skills you need |
| Complex setup | pip install and run |
| One quality standard | Code evaluator + design evaluator |
The Pipeline
Seven specialized agents pass structured artifacts through a pipeline. In interactive mode, three human approval gates let you confirm intent, scope, and readiness.
Human in the Loop
In interactive mode (productteam run), the pipeline pauses at three gates so you can confirm intent, scope, and readiness. In headless mode (--auto-approve) and Forge, gates are bypassed.
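A gate that respects `--auto-approve` can be very small. In the following sketch, `approval_gate` and its `ask` parameter are hypothetical. `ask` defaults to the built-in `input`, and injecting it keeps the gate testable without a terminal.

```python
from typing import Callable

# The three gates named above; where each sits in the pipeline is not specified here.
GATES = ("intent", "scope", "readiness")


def approval_gate(gate: str, summary: str, auto_approve: bool,
                  ask: Callable[[str], str] = input) -> bool:
    """Pause for a human decision unless running headless."""
    if auto_approve:
        return True  # headless mode: the gate is bypassed without prompting
    answer = ask(f"[{gate} gate] {summary}\nApprove and continue? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```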
Commitments
These aren't marketing claims. They're architectural constraints enforced by the code.
The Doc Writer is a doer stage — it reads every source file via read_file before writing documentation. If a function doesn't exist in the code, it doesn't appear in the docs. No hallucinated APIs. No invented features.
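ProductTeam enforces this by having the Doc Writer read the source before writing. You can also run an independent spot check. The script below flags any backticked call in generated docs that isn't defined anywhere in the Python sources. It is not a description of ProductTeam's mechanism, and the function names are hypothetical. This naive version only recognizes `def` and `class` names, so built-ins such as `print(` would also be flagged.

```python
import ast
import re

# A backticked identifier immediately followed by "(", e.g. `load_config(`.
CALL_IN_BACKTICKS = re.compile(r"`([A-Za-z_][A-Za-z0-9_]*)\(")


def defined_names(sources: dict[str, str]) -> set[str]:
    """Collect every function and class name defined in the given Python sources."""
    names: set[str] = set()
    for text in sources.values():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                names.add(node.name)
    return names


def phantom_references(doc: str, sources: dict[str, str]) -> set[str]:
    """Return names the doc calls that no source file defines."""
    mentioned = {m.group(1) for m in CALL_IN_BACKTICKS.finditer(doc)}
    return mentioned - defined_names(sources)
```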
Only the Evaluator can grade a sprint PASS. The Builder declares "ready for review" — never "done." This is the GAN-inspired insight: separate the generator from the discriminator.
state.json is written on every state change. Crash, timeout, or Ctrl+C at any point — productteam run resumes from exactly where you left off. Passed sprints are skipped.
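The following is a minimal sketch of writing state on every change and resuming from it. It assumes a simple `{"sprints": {id: status}}` schema, which is hypothetical because the real state.json layout isn't documented here. Each write goes to a temp file that is then swapped in with `os.replace`. This is a common way to avoid half-written files, but whether ProductTeam does exactly this is an assumption.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def save_state(path: Path, state: dict) -> None:
    """Write via temp file + replace so a crash mid-write leaves the previous file intact."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".state-", suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on POSIX filesystems


def load_state(path: Path) -> dict:
    """Resume from disk if a previous run left state behind."""
    if path.exists():
        return json.loads(path.read_text())
    return {"sprints": {}}


def run_sprints(path: Path, sprint_ids: list[str],
                run_one: Callable[[str], str]) -> list[str]:
    """Run sprints in order, skipping any that already passed; return the ones executed."""
    state = load_state(path)
    executed = []
    for sid in sprint_ids:
        if state["sprints"].get(sid) == "PASS":
            continue  # passed in an earlier session
        state["sprints"][sid] = "in_progress"
        save_state(path, state)  # record every state change
        state["sprints"][sid] = run_one(sid)
        save_state(path, state)
        executed.append(sid)
    return executed
```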
Sensitive environment variables (*_KEY, *_TOKEN, *_SECRET) are stripped from the subprocess environment before run_bash executes. The Builder writes Python and runs tests — it doesn't need your credentials.
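The filtering step can be sketched with the standard library's `fnmatch` and the `env` argument of `subprocess.run`. `scrubbed_env` and this `run_bash` body are hypothetical stand-ins for ProductTeam's tool. The case-insensitive name matching is also an assumption of the sketch.

```python
import fnmatch
import os
import subprocess

# Name patterns from the text above.
SENSITIVE_PATTERNS = ("*_KEY", "*_TOKEN", "*_SECRET")


def scrubbed_env(env=None) -> dict[str, str]:
    """Copy the environment, dropping any variable whose name matches a sensitive pattern."""
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        # Case-insensitive match is an assumption of this sketch.
        if not any(fnmatch.fnmatchcase(name.upper(), pat) for pat in SENSITIVE_PATTERNS)
    }


def run_bash(command: str, timeout: float = 300) -> subprocess.CompletedProcess:
    """Hypothetical run_bash: the child process never sees the stripped variables."""
    return subprocess.run(["bash", "-c", command], env=scrubbed_env(),
                          capture_output=True, text=True, timeout=timeout)
```

Variables the child still needs, such as `PATH`, pass through unchanged. Only names matching the three patterns are dropped.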
Doer agents get read_file, write_file, run_bash, list_dir. A narrow tool surface means more predictable behavior and a smaller attack surface than frameworks with dozens of tools.
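The sketch below shows how small that surface is: a hypothetical dispatcher that exposes exactly those four tools and rejects anything else. The `Workspace` class, `TOOLS`, `dispatch`, and the decision to confine paths to the project root are all assumptions made for illustration. To keep the sketch short, `run_bash` omits environment scrubbing.

```python
import subprocess
import tempfile
from pathlib import Path


class Workspace:
    """Hypothetical sketch of the four doer tools, confined to one project directory."""

    def __init__(self, root: Path):
        self.root = root.resolve()

    def _resolve(self, rel: str) -> Path:
        # Assumption: paths may not escape the project root.
        path = (self.root / rel).resolve()
        if not path.is_relative_to(self.root):
            raise PermissionError(f"path escapes workspace: {rel}")
        return path

    def read_file(self, path: str) -> str:
        return self._resolve(path).read_text()

    def write_file(self, path: str, content: str) -> str:
        target = self._resolve(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        return f"wrote {len(content)} chars to {path}"

    def list_dir(self, path: str = ".") -> list[str]:
        return sorted(child.name for child in self._resolve(path).iterdir())

    def run_bash(self, command: str) -> str:
        done = subprocess.run(["bash", "-c", command], cwd=self.root,
                              capture_output=True, text=True, timeout=300)
        return done.stdout + done.stderr


# The entire tool surface: any other tool name is rejected.
TOOLS = {
    "read_file": Workspace.read_file,
    "write_file": Workspace.write_file,
    "run_bash": Workspace.run_bash,
    "list_dir": Workspace.list_dir,
}


def dispatch(ws: Workspace, name: str, args: dict):
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](ws, **args)


if __name__ == "__main__":
    ws = Workspace(Path(tempfile.mkdtemp()))
    dispatch(ws, "write_file", {"path": "src/app.py", "content": "print('hi')\n"})
    print(dispatch(ws, "list_dir", {"path": "src"}))  # ['app.py']
```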
Each agent is a standalone markdown skill file. Want just the Evaluator as a QA agent? Just the PRD Writer as a thinking tool? Drop in the skills you need. Skip the rest.
The --budget flag sets a hard dollar limit (default $2.00). When cumulative cost exceeds the limit, BudgetExceededError kills the pipeline mid-loop and saves all work to disk. Tracks cached tokens at correct rates — no blind spots.
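The accounting behind a hard dollar limit can be sketched as follows. `BudgetExceededError` is the name used above, but this class body is hypothetical, and so are `Rates`, `CostTracker`, and `spend_until_stopped`. The sketch assumes that reported input tokens include cached tokens, which varies by provider. Any per-million-token rates you pass in are placeholders, not real prices.

```python
from dataclasses import dataclass


class BudgetExceededError(RuntimeError):
    """Raised once cumulative spend passes the budget."""


@dataclass(frozen=True)
class Rates:
    """USD per million tokens for one model."""
    input: float
    cached_input: float
    output: float


class CostTracker:
    def __init__(self, budget_usd: float = 2.00):  # mirrors the --budget default
        self.budget = budget_usd
        self.spent = 0.0

    def record(self, rates: Rates, input_tokens: int, cached_tokens: int,
               output_tokens: int) -> float:
        """Add one call's cost; assumes input_tokens already includes cached_tokens."""
        uncached = input_tokens - cached_tokens
        cost = (uncached * rates.input
                + cached_tokens * rates.cached_input
                + output_tokens * rates.output) / 1_000_000
        self.spent += cost
        if self.spent > self.budget:
            raise BudgetExceededError(f"spent ${self.spent:.4f}, limit ${self.budget:.2f}")
        return cost


def spend_until_stopped(tracker: CostTracker, rates: Rates,
                        calls: list[tuple[int, int, int]]) -> int:
    """Replay token usage until the budget trips; return how many calls completed."""
    completed = 0
    try:
        for input_tokens, cached, output in calls:
            tracker.record(rates, input_tokens, cached, output)
            completed += 1
    except BudgetExceededError:
        pass  # a real pipeline would save its work to disk at this point
    return completed
```

Billing cached tokens at their own, usually lower, rate keeps the running total honest. If cached tokens were charged at the full input rate or dropped entirely, the limit would trip too early or too late.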
The Team
Each skill is a markdown file. Readable, editable, replaceable.
60 Seconds
Install and run. The wizard handles the rest.
Fit
ProductTeam is an opinionated, auditable idea-to-code operating system for small software teams.
You can describe a product but want structured, auditable AI execution instead of chatting with a coding assistant. ProductTeam gives you a delivery pipeline, not a conversation partner.
You want PRD → Sprint → Build → Evaluate → Document → Ship with optional human gates at strategic decision points. ProductTeam encodes a software delivery pipeline you can configure to your trust level.
The evaluator loop is the difference between "the AI said it's done" and "the AI proved it works." If you've been burned by hallucinated features or rubber-stamped tests, this is for you.