Structured AI software delivery pipeline.

Seven agents. Optional human approval gates. One CLI. The builder never grades its own work — a separate, skeptical evaluator does.

Standalone Python CLI. Supports Anthropic, OpenAI, Ollama, and Gemini.

$ productteam
What are we building? _
Anthropic Claude, OpenAI GPT-4o, Ollama (free, local), Google Gemini, LM Studio, vLLM

Two ways to run

The interactive wizard detects what you have installed and lets you pick.

A: LOCAL AI
Free. Runs on your machine.
Powered by Ollama. No API key needed. Recommended model: gpt-oss:20b.
~20 min per pipeline step on consumer hardware. A full project takes hours.
Just install Ollama and go.

ollama pull gpt-oss:20b
productteam preflight  # verify model

B: CLOUD AI
Fast. Bring your API key.
Anthropic Claude, OpenAI, or Google Gemini.
Use your API key for deeper and faster work. Standard API usage costs apply.
Paste your API key when prompted.

productteam
# Select Cloud, paste key, done

Local models are free but slower. Each pipeline step takes ~20 minutes on a 20B parameter model running on consumer hardware, so a full project takes hours. Cloud APIs are significantly faster.

Forge: Local Job Queue + Dashboard

Submit pipeline jobs from the CLI or dashboard. The daemon runs the full pipeline headlessly with auto-approve — PRD, plan, build, evaluate, document. Monitor progress and inspect logs from the dashboard. Remote gate approval is planned for a future release.

Zero infrastructure. File-based queue. Python stdlib HTTP server. No React, no build step, no npm.

# Start the daemon + dashboard
productteam forge --listen --dashboard
# Dashboard: http://localhost:7654
1. Submit: from the CLI or local dashboard.
2. Daemon runs: PRD, plan, build, evaluate, document, fully headless.
3. Monitor: track progress and inspect logs from the dashboard.
4. Product ready: code written, tests passing, docs generated. Ship it.
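
As a rough illustration of the file-based queue idea, the sketch below stores each job as a JSON file and claims a pending job by renaming it into a `running` directory. The directory layout and the function names (`submit_job`, `claim_next_job`) are assumptions made for this example, not ProductTeam's actual on-disk format or API.

```python
import json
import os
import time
import uuid
from pathlib import Path
from typing import Optional


def submit_job(queue_dir: Path, idea: str) -> Path:
    """Write one job as a JSON file under <queue_dir>/pending (hypothetical layout)."""
    pending = queue_dir / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    # A timestamp prefix keeps file names roughly sortable by submission time.
    job_id = f"{time.time_ns()}-{uuid.uuid4().hex[:8]}"
    job_path = pending / f"{job_id}.json"
    tmp_path = pending / f"{job_id}.tmp"
    tmp_path.write_text(json.dumps({"id": job_id, "idea": idea}), encoding="utf-8")
    # Rename last so a reader never sees a half-written job file.
    os.replace(tmp_path, job_path)
    return job_path


def claim_next_job(queue_dir: Path) -> Optional[dict]:
    """Move the first pending job into <queue_dir>/running and return its payload."""
    pending = queue_dir / "pending"
    running = queue_dir / "running"
    if not pending.is_dir():
        return None
    running.mkdir(parents=True, exist_ok=True)
    for job_path in sorted(pending.glob("*.json")):
        target = running / job_path.name
        try:
            os.replace(job_path, target)
        except FileNotFoundError:
            continue  # another worker claimed this job first
        return json.loads(target.read_text(encoding="utf-8"))
    return None
```

Using a rename as the claim step means no lock file or database is needed, which fits the "zero infrastructure" goal; a real daemon would also need to handle jobs left in `running` after a crash.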

The builder never grades its own work.

Most AI coding tools let a single agent build something and then declare it done. That's a student grading their own exam.

ProductTeam separates builder from judge. The Builder writes code and says "ready for review." The Evaluator — a separate agent, separate prompt, skeptical by default — reads the source, runs the tests, tries to break things. It grades PASS, NEEDS_WORK, or FAIL.

If the grade is NEEDS_WORK, findings route back to the Builder automatically, for a maximum of 3 loops. If the sprint still has not passed after loop 3, the plan is treated as the problem, not the implementation. The Builder can never ship its own code.
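
The build-evaluate loop described above can be sketched roughly as follows. This is a minimal illustration, not ProductTeam's code: the callables, the `REPLAN` label for an exhausted loop, and the choice to stop immediately on FAIL are all assumptions.

```python
from typing import Callable, List, Tuple

MAX_LOOPS = 3  # from the text: maximum 3 build-evaluate loops


def build_evaluate_loop(
    build: Callable[[List[str]], str],
    evaluate: Callable[[str], Tuple[str, List[str]]],
    max_loops: int = MAX_LOOPS,
) -> str:
    """Return "PASS", "FAIL", or "REPLAN" (hypothetical label for an exhausted loop)."""
    findings: List[str] = []
    for _ in range(max_loops):
        # The builder only produces work that is "ready for review".
        artifact = build(findings)
        # Only the evaluator assigns a grade.
        grade, findings = evaluate(artifact)
        if grade in ("PASS", "FAIL"):
            return grade  # assumption: FAIL ends the loop without a retry
        # NEEDS_WORK: the findings feed the next build attempt.
    # Still not passing after max_loops: signal that the plan needs rework.
    return "REPLAN"
```

Because the grade comes only from `evaluate`, there is no code path in which the builder's own output marks the sprint as passed.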

Other AI tools                | ProductTeam
Agent self-evaluates          | Separate skeptical judge
"Done" when builder says so   | "Done" only when Evaluator grades PASS
State in conversation memory  | State in files that persist across sessions
All agents or nothing         | Drop in only the skills you need
Complex setup                 | pip install and run
One quality standard          | Code evaluator + design evaluator

From idea to shipped product

Seven specialized agents pass structured artifacts through a pipeline. In interactive mode, three human approval gates let you confirm intent, scope, and readiness.

PRD Writer (Product Manager)
→ Planner (Tech Lead)
→ Builder (Engineer) and Evaluator (QA Engineer), max 3 loops
→ Doc Writer (Technical Writer)
→ Ship (Done)

Three approval gates (interactive mode)

In interactive mode (productteam run), the pipeline pauses at three gates so you can confirm intent, scope, and readiness. In headless mode (--auto-approve) and Forge, gates are bypassed.

Gate 1: PRD Approval
"Does this capture your intent?" Review the PRD before planning begins.

Gate 2: Sprint Approval
"Does this scope look right?" Review sprint contracts and acceptance criteria.

Gate 3: Ship Approval
"Ready to commit/push/publish?" All evaluations passed. Review and ship.

What we guarantee

These aren't marketing claims. They're architectural constraints enforced by the code.

The Doc Writer reads code. It never fabricates.

The Doc Writer is a doer stage — it reads every source file via read_file before writing documentation. If a function doesn't exist in the code, it doesn't appear in the docs. No hallucinated APIs. No invented features.

The Builder cannot ship its own code.

Only the Evaluator can grade a sprint PASS. The Builder declares "ready for review" — never "done." This is the GAN-inspired insight: separate the generator from the discriminator.

State survives crashes.

state.json is written on every state change. Crash, timeout, or Ctrl+C at any point — productteam run resumes from exactly where you left off. Passed sprints are skipped.
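
One common way to get this kind of crash safety is to write the state to a temporary file and rename it over state.json, then skip passed sprints on resume. The sketch below assumes a simple `{"sprints": {id: grade}}` schema and hypothetical helper names; the real state.json layout is not described here.

```python
import json
import os
from pathlib import Path
from typing import List


def save_state(path: Path, state: dict) -> None:
    """Write state atomically so a crash never leaves a truncated file."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before the rename
    os.replace(tmp, path)  # atomic swap: readers see the old or the new file


def load_state(path: Path) -> dict:
    """Return the saved state, or an empty one on first run (assumed schema)."""
    if not path.exists():
        return {"sprints": {}}
    return json.loads(path.read_text(encoding="utf-8"))


def sprints_to_run(state: dict, sprint_ids: List[str]) -> List[str]:
    """On resume, skip every sprint already graded PASS."""
    return [s for s in sprint_ids if state["sprints"].get(s) != "PASS"]
```

Calling `save_state` after every grade change means an interrupted run loses at most the step that was in flight.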

Your API keys are never exposed to build commands.

Sensitive environment variables (*_KEY, *_TOKEN, *_SECRET) are stripped from the subprocess environment before run_bash executes. The Builder writes Python and runs tests — it doesn't need your credentials.
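
A minimal sketch of this kind of environment scrubbing, using only the standard library. The three patterns come from the text above; the helper names (`scrubbed_env`, `run_scrubbed`) and the case-insensitive matching are assumptions for illustration, not the actual run_bash implementation.

```python
import fnmatch
import os
import subprocess
from typing import Dict, Mapping, Optional

# Patterns named in the text above.
SENSITIVE_PATTERNS = ("*_KEY", "*_TOKEN", "*_SECRET")


def scrubbed_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment without variables matching the patterns."""
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        # Assumption: match case-insensitively by upper-casing the name first.
        if not any(fnmatch.fnmatchcase(name.upper(), p) for p in SENSITIVE_PATTERNS)
    }


def run_scrubbed(command: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a shell command that cannot read the stripped variables (hypothetical helper)."""
    return subprocess.run(
        command,
        shell=True,
        env=scrubbed_env(),  # the child process only ever sees the filtered copy
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Filtering a copy rather than mutating `os.environ` leaves the parent process free to keep using the key for its own API calls.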

Four tools. No more.

Doer agents get read_file, write_file, run_bash, list_dir. A narrow tool surface means more predictable behavior and a smaller attack surface than frameworks with dozens of tools.

Use only what you need.

Each agent is a standalone markdown skill file. Want just the Evaluator as a QA agent? Just the PRD Writer as a thinking tool? Drop in the skills you need. Skip the rest.

Cost circuit breaker. No surprise bills.

The --budget flag sets a hard dollar limit (default $2.00). When cumulative cost exceeds the limit, BudgetExceededError kills the pipeline mid-loop and saves all work to disk. Tracks cached tokens at correct rates — no blind spots.
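
The circuit-breaker behavior can be sketched as a small cost tracker. `BudgetExceededError` is the name used above, but the class body, the `CostTracker` type, and the rate table are illustrative assumptions; the per-million-token rates a caller passes in here are placeholders, not real provider prices.

```python
from dataclasses import dataclass
from typing import Mapping


class BudgetExceededError(RuntimeError):
    """Name taken from the text; this definition is an illustrative stand-in."""


@dataclass
class CostTracker:
    budget_usd: float = 2.00  # default limit stated in the text
    spent_usd: float = 0.0

    def record(
        self,
        input_tokens: int,
        cached_tokens: int,
        output_tokens: int,
        rates: Mapping[str, float],
    ) -> float:
        """Add one call's cost; rates are USD per million tokens (assumed units)."""
        cost = (
            input_tokens * rates["input"]
            + cached_tokens * rates["cached_input"]  # cached tokens priced separately
            + output_tokens * rates["output"]
        ) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.budget_usd:
            raise BudgetExceededError(
                f"spent ${self.spent_usd:.2f} of ${self.budget_usd:.2f} budget"
            )
        return cost
```

A caller would wrap the pipeline loop in `try`/`except BudgetExceededError` and persist its state before exiting, so work completed up to that point is kept.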

8 specialized agents

Each skill is a markdown file. Readable, editable, replaceable.

prd-writer (Product Manager)
Takes a concept, applies sensible defaults, produces a structured PRD with requirements, constraints, and success criteria.

planner (Tech Lead)
Reads PRD, decomposes into sprint contracts with testable acceptance criteria. Writes sprint YAML files to disk. Never writes code.

builder (Engineer)
Implements sprint contracts with production-quality code and tests. Declares "ready for review", never "done."

ui-builder (Frontend Engineer)
Specialized builder for visual work. Landing pages, dashboards, web UIs. Dark theme, responsive, WCAG AA by default.

evaluator (QA Engineer)
Skeptical by default. Reads source, runs tests, verifies acceptance criteria, tries to break things. PASS / NEEDS_WORK / FAIL.

evaluator-design (Design Reviewer)
Grades visual artifacts on Coherence, Originality, Craft, and Functionality. 1-5 scale. 4.0+ to pass.

doc-writer (Technical Writer)
Reads every source file. Produces README, landing page, changelog with real data only. Never fabricates features.

orchestrator (Project Manager)
Routes work between agents, manages build-evaluate loops (max 3), handles approval gates in interactive mode, writes handoff artifacts.
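
To make the evaluator-design rubric above concrete, here is one possible reading of "1-5 scale, 4.0+ to pass": average the four criterion scores and compare against 4.0. The averaging rule and the function name `grade_design` are assumptions; the skill could equally require 4.0 on every criterion.

```python
from typing import Mapping, Tuple

CRITERIA = ("coherence", "originality", "craft", "functionality")
PASS_THRESHOLD = 4.0  # "4.0+ to pass" from the skill description


def grade_design(scores: Mapping[str, float]) -> Tuple[float, bool]:
    """Return (mean score, passed), assuming the pass mark applies to the mean."""
    for name in CRITERIA:
        if name not in scores:
            raise ValueError(f"missing score for {name}")
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    mean = sum(scores[name] for name in CRITERIA) / len(CRITERIA)
    return mean, mean >= PASS_THRESHOLD
```

For example, scores of 4, 4, 5, and 3 average to exactly 4.0 and pass under this reading, while they would fail under a stricter per-criterion rule.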

Getting started

Install and run. The wizard handles the rest.

# Install (Python 3.11+)
pip install productteam

# Launch the interactive wizard
productteam
# The wizard asks:
What are we building? a CLI tool that estimates LLM API costs

How do you want to run?
  [A] Local AI (Ollama) — free, ~20 min/step
  [B] Cloud AI — deeper and faster, requires API key
# Pipeline control
productteam run                  # resume
productteam recover              # unstick
productteam run --auto-approve

# Forge: local job queue
productteam forge "idea"
productteam forge --listen --dashboard
productteam forge status

# Diagnostics
productteam doctor
productteam preflight            # test Ollama model
productteam status
productteam test

Who this is for

ProductTeam is an opinionated, auditable idea-to-code operating system for small software teams.

Solo founders and indie hackers

You can describe a product but want structured, auditable AI execution instead of chatting with a coding assistant. ProductTeam gives you a delivery pipeline, not a conversation partner.

Small product teams

You want PRD → Sprint → Build → Evaluate → Document → Ship with optional human gates at strategic decision points. ProductTeam encodes a software delivery pipeline you can configure to your trust level.

Anyone tired of AI that grades its own homework

The evaluator loop is the difference between "the AI said it's done" and "the AI proved it works." If you've been burned by hallucinated features or rubber-stamped tests, this is for you.