ProductTeam — Structured AI software delivery pipeline

Choose Your Path

Two ways to run

The interactive wizard detects what you have installed and lets you pick.

A — LOCAL AI

Free. Runs on your machine.

Powered by Ollama. No API key needed. Recommended model: gpt-oss:20b.
~20 min per pipeline step on consumer hardware. A full project takes hours.
Just install Ollama and go.

ollama pull gpt-oss:20b
productteam preflight # verify model

B — CLOUD AI

Fast. Bring your API key.

Anthropic Claude, OpenAI, or Google Gemini.
Deeper and faster work, billed at your provider's standard API rates.
Paste your API key when prompted.

productteam
# Select Cloud, paste key, done

Local models are free but slower: at roughly 20 minutes per step for a 20B-parameter model on consumer hardware, a full project takes hours. Cloud APIs are significantly faster.

Batch Execution

Forge: Local Job Queue + Dashboard

Submit pipeline jobs from the CLI or dashboard. The daemon runs the full pipeline headlessly with auto-approve — PRD, plan, build, evaluate, document. Monitor progress and inspect logs from the dashboard. Remote gate approval is planned for a future release.

Zero infrastructure. File-based queue. Python stdlib HTTP server. No React, no build step, no npm.

# Start the daemon + dashboard
productteam forge --listen --dashboard
# Dashboard: http://localhost:7654

1. Submit: from the CLI or the local dashboard.
2. Daemon runs: PRD, plan, build, evaluate, document, fully headless.
3. Monitor: track progress and inspect logs from the dashboard.
4. Product ready: code written, tests passing, docs generated. Ship it.

The builder never grades its own work.

Most AI coding tools let a single agent build something and then declare it done. That's a student grading their own exam.

ProductTeam separates builder from judge. The Builder writes code and says "ready for review." The Evaluator — a separate agent, separate prompt, skeptical by default — reads the source, runs the tests, tries to break things. It grades PASS, NEEDS_WORK, or FAIL.

If the grade is NEEDS_WORK, findings route back to the Builder automatically, for a maximum of 3 loops. If a sprint still hasn't passed after loop 3, the plan is wrong, not the implementation. The Builder can never ship its own code.
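A minimal sketch of that control flow, assuming hypothetical build and evaluate callables and a hypothetical PlanRevisionNeeded exception for the "the plan is wrong" case. It shows the shape of the loop, not ProductTeam's orchestrator code.

```python
from enum import Enum
from typing import Callable


class Grade(Enum):
    PASS = "PASS"
    NEEDS_WORK = "NEEDS_WORK"
    FAIL = "FAIL"


class PlanRevisionNeeded(Exception):
    """Hypothetical: raised when NEEDS_WORK persists past the loop limit."""


MAX_LOOPS = 3


def run_sprint(
    build: Callable[[list[str]], None],
    evaluate: Callable[[], tuple[Grade, list[str]]],
) -> Grade:
    findings: list[str] = []
    for _ in range(MAX_LOOPS):
        build(findings)               # Builder only reaches "ready for review"
        grade, findings = evaluate()  # only the Evaluator assigns a grade
        if grade is not Grade.NEEDS_WORK:
            return grade              # PASS or FAIL ends the loop
    # Still NEEDS_WORK after the last loop: the plan, not the code, is suspect.
    raise PlanRevisionNeeded(findings)
```

The key design point is that build never returns a grade; the only path to PASS goes through evaluate.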

Other AI Tools	ProductTeam
Agent self-evaluates	Separate skeptical judge
"Done" when builder says so	"Done" only when Evaluator grades PASS
State in conversation memory	State in files that persist across sessions
All agents or nothing	Drop in only the skills you need
Complex setup	pip install and run
One quality standard	Code evaluator + design evaluator

The Pipeline

From idea to shipped product

Seven specialized agents, coordinated by the orchestrator, pass structured artifacts through the pipeline. In interactive mode, three human approval gates let you confirm intent, scope, and readiness.

PRD Writer (Product Manager)
→ Planner (Tech Lead)
→ Builder (Engineer) ↔ Evaluator (QA Engineer)   [max 3 loops]
→ Doc Writer (Technical Writer)
→ Ship (Done)

Human in the Loop

Three approval gates (interactive mode)

In interactive mode (productteam run), the pipeline pauses at three gates so you can confirm intent, scope, and readiness. In headless mode (--auto-approve) and Forge, gates are bypassed.

Gate 1: PRD Approval
"Does this capture your intent?" Review the PRD before planning begins.

Gate 2: Sprint Approval
"Does this scope look right?" Review sprint contracts and acceptance criteria.

Gate 3: Ship Approval
"Ready to commit/push/publish?" All evaluations passed. Review and ship.
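The gate behavior reduces to one rule: prompt in interactive mode, pass straight through with --auto-approve. A minimal sketch, where approval_gate and its ask parameter are hypothetical names (ask is injectable so the gate can be tested without a terminal).

```python
from typing import Callable

# The three gates and their questions, as listed above.
GATES = (
    ("PRD Approval", "Does this capture your intent?"),
    ("Sprint Approval", "Does this scope look right?"),
    ("Ship Approval", "Ready to commit/push/publish?"),
)


def approval_gate(question: str, auto_approve: bool,
                  ask: Callable[[str], str] = input) -> bool:
    """Return True to continue the pipeline, False to stop at this gate."""
    if auto_approve:
        return True  # headless (--auto-approve) and Forge: gate bypassed
    answer = ask(f"{question} [y/N] ").strip().lower()
    return answer in ("y", "yes")
```

Defaulting to "no" on an empty answer means an accidental Enter never ships anything.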

Commitments

What we guarantee

These aren't marketing claims. They're architectural constraints enforced by the code.

The Doc Writer reads code. It never fabricates.

The Doc Writer is a doer stage, meaning it has tool access: it reads every source file via read_file before writing documentation. If a function doesn't exist in the code, it doesn't appear in the docs. No hallucinated APIs. No invented features.
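If you want to spot-check that guarantee on a generated README, one simple approach is to compare the function names the docs mention against the names actually defined in the source. This is a hypothetical helper built on Python's ast module, not part of ProductTeam, and it only catches backticked calls such as estimate_cost().

```python
import ast
import re
from pathlib import Path


def defined_names(src_dir: Path) -> set[str]:
    """Every function and class name defined in .py files under src_dir."""
    names: set[str] = set()
    for path in src_dir.rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                names.add(node.name)
    return names


def referenced_names(doc_text: str) -> set[str]:
    """Names written as backticked calls in the docs."""
    return set(re.findall(r"`(\w+)\(", doc_text))


def phantom_references(doc_text: str, src_dir: Path) -> set[str]:
    """Names the docs call that no source file defines."""
    return referenced_names(doc_text) - defined_names(src_dir)
```

An empty result doesn't prove the docs are accurate, but a non-empty one flags a name worth checking.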

The Builder cannot ship its own code.

Only the Evaluator can grade a sprint PASS. The Builder declares "ready for review" — never "done." This is the GAN-inspired insight: separate the generator from the discriminator.

State survives crashes.

state.json is written on every state change. After a crash, timeout, or Ctrl+C at any point, productteam run resumes from exactly where you left off. Passed sprints are skipped.
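A sketch of the two ideas in that paragraph, persisting after every change and skipping passed sprints on resume, assuming a hypothetical state.json schema of {"sprints": {id: grade}}. Writing to a temporary file and swapping it in with os.replace keeps a crash mid-write from leaving a truncated file.

```python
import json
import os
from collections.abc import Callable
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    # Write a sibling temp file, then atomically replace the real one.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)


def load_state(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"sprints": {}}  # fresh run


def run_pipeline(path: Path, sprint_ids: list[str],
                 run_sprint: Callable[[str], str]) -> list[str]:
    """Run sprints in order, returning the IDs actually executed this time."""
    state = load_state(path)
    executed = []
    for sid in sprint_ids:
        if state["sprints"].get(sid) == "PASS":
            continue  # resume: passed sprints are skipped
        state["sprints"][sid] = run_sprint(sid)
        save_state(path, state)  # persist after every state change
        executed.append(sid)
    return executed
```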

Your API keys are never exposed to build commands.

Sensitive environment variables (*_KEY, *_TOKEN, *_SECRET) are stripped from the subprocess environment before run_bash executes. The Builder writes Python and runs tests — it doesn't need your credentials.

Four tools. No more.

Doer agents get read_file, write_file, run_bash, list_dir. A narrow tool surface means more predictable behavior and a smaller attack surface than frameworks with dozens of tools.
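To make the four-tool surface concrete, here is a hypothetical dispatch table in which anything outside the four names is rejected. The parameter names (path, content, command) are assumptions, and credential scrubbing is omitted from run_bash here for brevity.

```python
import subprocess
from pathlib import Path
from typing import Callable


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def list_dir(path: str = ".") -> str:
    return "\n".join(sorted(entry.name for entry in Path(path).iterdir()))


def run_bash(command: str) -> str:
    # Credential scrubbing omitted here for brevity.
    result = subprocess.run(["bash", "-c", command],
                            capture_output=True, text=True, timeout=300)
    return result.stdout + result.stderr


# The whole tool surface: four names, nothing else is callable.
TOOLS: dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "write_file": write_file,
    "run_bash": run_bash,
    "list_dir": list_dir,
}


def dispatch(name: str, **kwargs: str) -> str:
    """Route a model's tool call; reject anything outside the four tools."""
    tool = TOOLS.get(name)
    if tool is None:
        raise ValueError(f"unknown tool: {name!r}")
    return tool(**kwargs)
```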

Use only what you need.

Each agent is a standalone markdown skill file. Want just the Evaluator as a QA agent? Just the PRD Writer as a thinking tool? Drop in the skills you need. Skip the rest.

Cost circuit breaker. No surprise bills.

The --budget flag sets a hard dollar limit (default $2.00). When cumulative cost exceeds the limit, BudgetExceededError stops the pipeline, even mid-loop, and all work is saved to disk. Cached tokens are priced at their actual rates, so cost tracking has no blind spots.
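A sketch of a cost circuit breaker that prices cached input tokens separately. The Rates and CostTracker classes are hypothetical, and real per-million-token prices come from your provider's price list, not from this code. Only the exception name mirrors the text above.

```python
from dataclasses import dataclass


class BudgetExceededError(RuntimeError):
    """Raised when cumulative spend crosses the hard limit."""


@dataclass(frozen=True)
class Rates:
    """USD per million tokens (hypothetical; use your provider's price list)."""
    input: float
    cached_input: float
    output: float


class CostTracker:
    def __init__(self, budget: float = 2.00) -> None:
        self.budget = budget  # mirrors the --budget default of $2.00
        self.spent = 0.0

    def record(self, rates: Rates, uncached_input_tokens: int,
               cached_input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost; raise once the running total exceeds the budget."""
        cost = (
            uncached_input_tokens * rates.input
            + cached_input_tokens * rates.cached_input  # cached tokens at their own rate
            + output_tokens * rates.output
        ) / 1_000_000
        self.spent += cost
        if self.spent > self.budget:
            # The caller catches this, saves state to disk, and stops the pipeline.
            raise BudgetExceededError(
                f"spent ${self.spent:.4f}, limit ${self.budget:.2f}"
            )
        return cost
```

Taking uncached and cached input counts as separate arguments sidesteps the fact that providers differ in whether their reported input total already includes cached tokens.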

The Team

8 specialized agents

Each skill is a markdown file. Readable, editable, replaceable.

prd-writer

Product Manager

Takes a concept, applies sensible defaults, produces a structured PRD with requirements, constraints, and success criteria.

planner

Tech Lead

Reads PRD, decomposes into sprint contracts with testable acceptance criteria. Writes sprint YAML files to disk. Never writes code.

builder

Engineer

Implements sprint contracts with production-quality code and tests. Declares "ready for review" — never "done."

ui-builder

Frontend Engineer

Specialized builder for visual work. Landing pages, dashboards, web UIs. Dark theme, responsive, WCAG AA by default.

evaluator

QA Engineer

Skeptical by default. Reads source, runs tests, verifies acceptance criteria, tries to break things. PASS / NEEDS_WORK / FAIL.

evaluator-design

Design Reviewer

Grades visual artifacts on Coherence, Originality, Craft, and Functionality. 1-5 scale. 4.0+ to pass.

doc-writer

Technical Writer

Reads every source file. Produces a README, landing page, and changelog from real data only. Never fabricates features.

orchestrator

Project Manager

Routes work between agents, manages build-evaluate loops (max 3), handles approval gates in interactive mode, writes handoff artifacts.
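The evaluator-design card doesn't say how the four criterion scores combine against the 4.0 threshold. Assuming a simple mean, a pass check could look like this (design_verdict is a hypothetical name):

```python
from statistics import mean

CRITERIA = ("coherence", "originality", "craft", "functionality")
PASS_THRESHOLD = 4.0


def design_verdict(scores: dict[str, int]) -> tuple[float, bool]:
    """Return (mean score, passed), assuming pass means mean >= 4.0."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    for c in CRITERIA:
        if not 1 <= scores[c] <= 5:
            raise ValueError(f"{c} score {scores[c]} is outside 1-5")
    avg = float(mean(scores[c] for c in CRITERIA))
    return avg, avg >= PASS_THRESHOLD
```

With a mean, a 5 on Craft can offset a 3 on Originality; a stricter variant would require every criterion to reach 4 on its own.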

60 Seconds

Getting started

Install and run. The wizard handles the rest.

# Install (Python 3.11 or 3.12)
pip install productteam

# Launch the interactive wizard
productteam

# The wizard asks:
What are we building? a CLI tool that estimates LLM API costs

How do you want to run?
[A] Local AI (Ollama) — free, ~20 min/step
[B] Cloud AI — deeper and faster, requires API key

# Pipeline control
productteam run # resume
productteam recover # unstick
productteam run --auto-approve # headless, gates bypassed

# Forge: local job queue
productteam forge "idea" # submit a job
productteam forge --listen --dashboard # daemon + dashboard
productteam forge status

# Diagnostics
productteam doctor
productteam preflight # test Ollama model
productteam status
productteam test

Fit

Who this is for

ProductTeam is an opinionated, auditable idea-to-code operating system for small software teams.

Solo founders and indie hackers

You can describe a product but want structured, auditable AI execution instead of chatting with a coding assistant. ProductTeam gives you a delivery pipeline, not a conversation partner.

Small product teams

You want PRD → Sprint → Build → Evaluate → Document → Ship with optional human gates at strategic decision points. ProductTeam encodes a software delivery pipeline you can configure to your trust level.

Anyone tired of AI that grades its own homework

The evaluator loop is the difference between "the AI said it's done" and "the AI proved it works." If you've been burned by hallucinated features or rubber-stamped tests, this is for you.
