OpenAgent

The open reference pipeline for AI agents that think before they act.

Intent → Ambiguity → Clarifier → Planner → Executor. Five typed stages. One streaming pipeline. ~4k lines you can read in an afternoon.

The product in one image

_{Each stage has a typed input and a typed output. The Pydantic schema between any two stages is your test surface — and your debug trail.}

What makes this different

"Most agents fail in one of five places.

OpenAgent is built for all five."

🧠 Five specialists, not one monolith

Intent ▸ Ambiguity ▸ Clarifier ▸ Planner ▸ Executor — each testable in isolation. When something breaks, you know exactly where.

🕸️ Asks humans last

Clarifier searches the web first, auto-fills what it can, and asks you only what it genuinely couldn't find.

One question. Not seven.

🧯 Degrades, never dies

No Exa? Skips web. No Redis? In-memory. No RAG? No problem.

Missing keys are features, not errors.

Quickstart

git clone https://github.com/OpenGraph-AI/OpenAgent.git
cd OpenAgent
pip install -r requirements.txt
cp .env.example .env                # set LLM_API_KEY at minimum
python run.py

Open http://<your-domain>:8000/static/index.html and type a fuzzy request. Watch each phase stream into the UI in real time: intent extraction, ambiguity flags, clarifying questions, the plan, and finally the executor producing the answer step-by-step.

Minimum config — one variable:

Variable	Required	Purpose
`LLM_API_KEY`	✅	Any OpenAI-compatible provider
`LLM_BASE_URL`	—	Defaults to `https://api.openai.com/v1`
`LLM_MODEL`	—	Defaults to `gpt-4o`

Optional providers (Exa for web search, Upstash for session persistence, PageIndex for RAG) are listed in .env.example. The pipeline degrades gracefully when any of them are missing — that's on purpose.
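As a rough illustration of "missing keys are features", the sketch below reads the three documented variables and turns the optional providers into feature flags. The `Settings` class, `load_settings`, and the optional variable names (`EXA_API_KEY`, `UPSTASH_REDIS_URL`, `PAGEINDEX_API_KEY`) are assumptions made for this example; check .env.example for the real names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_api_key: str
    llm_base_url: str
    llm_model: str
    web_search_enabled: bool
    persistence_enabled: bool
    rag_enabled: bool


def load_settings(env=None) -> Settings:
    """Read the documented LLM variables and detect optional providers."""
    env = os.environ if env is None else env
    api_key = env.get("LLM_API_KEY", "")
    if not api_key:
        # The only hard requirement: without a model there is no pipeline.
        raise RuntimeError("LLM_API_KEY is required")
    return Settings(
        llm_api_key=api_key,
        llm_base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        llm_model=env.get("LLM_MODEL", "gpt-4o"),
        # Hypothetical variable names: a missing key switches the feature off
        # instead of raising.
        web_search_enabled=bool(env.get("EXA_API_KEY")),
        persistence_enabled=bool(env.get("UPSTASH_REDIS_URL")),
        rag_enabled=bool(env.get("PAGEINDEX_API_KEY")),
    )
```

With only `LLM_API_KEY` set, every optional flag comes back `False` and the caller can pick the in-memory or no-search path.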

Prefer Docker?

docker compose up

The mental model

Each stage is a specialist. The output of one is the typed input of the next. If any stage misbehaves, you can swap it, mock it, or inspect it without touching the others.
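One way to picture "swap it, mock it, or inspect it" is a runner that treats every stage as an async callable and records each output. This is a simplified sketch, not the repo's orchestrator: `run_stages` and the two fake stages are hypothetical, and plain dicts and lists stand in for the typed schemas.

```python
import asyncio
from typing import Awaitable, Callable

Stage = Callable[[object], Awaitable[object]]


async def run_stages(stages: dict, request: str) -> dict:
    """Run stages in insertion order, recording each stage's output."""
    trail = {}
    value = request
    for name, stage in stages.items():
        value = await stage(value)
        trail[name] = value  # per-stage output doubles as the debug trail
    return trail


# Hypothetical stand-ins: a mock replaces a real stage without touching the rest.
async def fake_intent(text):
    return {"goal": text.strip()}


async def fake_planner(intent):
    return [f"step 1: {intent['goal']}"]


trail = asyncio.run(
    run_stages({"intent": fake_intent, "planner": fake_planner}, " summarize the report ")
)
```

Because the runner only sees callables, replacing `fake_planner` with a real planner changes one dictionary entry.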

The five stages

Read this like a cookbook. Every stage answers a question you'll eventually have to answer yourself:

How do I turn a blurry request into something structured?
How do I know when I don't know enough?
How do I ask without annoying the user?
How do I turn intent into an executable plan?
How do I execute without losing the thread?

Five questions. Five agents. That's the whole book.

#	Stage	In	Out	Mission
01	🧠 Intent	`str`	`IntentSchema`	Turn fuzz into a typed goal
02	❓ Ambiguity	`IntentSchema`	`AmbiguityReport`	Flag the known unknowns
03	🕸️ Clarifier	`AmbiguityReport`	`ClarifiedIntent`	Auto-resolve, ask only the rest
04	🗺️ Planner	`ClarifiedIntent`	`ExecutionPlan`	A DAG of verifiable steps
05	⚡ Executor	`ExecutionPlan`	`ExecutionResult`	Run, stream, trace to goal

1. Intent — "What is the user actually asking?"

The problem. Humans don't type goals. They type fragments, moods, half-sentences. "can you make this better" is not a specification — it's a vibe. Executing on a vibe gives you a confident-sounding wrong answer.

The job. Turn raw text into a typed object the rest of the pipeline can reason about: goal, context, constraints, output format, success criteria, and alternative interpretations. That last field is the one most tutorials skip and the one that saves you.

The mental model. Think of Intent as the function signature. Until you have it, you don't have a problem — you have a feeling.

The pattern. One LLM call, a strict schema, and a prompt that forbids prose. Parse, validate, store.

class IntentSchema(BaseModel):
    goal: str
    context: str
    constraints: list[str]
    expected_output_format: str
    success_criteria: list[str]
    alternative_interpretations: list[str]   # ← your future self will thank you
    confidence: float                         # ← triggers the next stage

Pitfalls.

Skipping alternative_interpretations — you lose the ability to catch ambiguity downstream.
Letting the model hallucinate fields by not pinning the schema.
Treating low-confidence intents as high-confidence ones. Confidence is a signal — use it.

Code: backend/agents/intent_agent.py
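The "parse, validate, store" step might look roughly like the sketch below. To stay dependency-free it uses a stdlib dataclass as a stand-in for the Pydantic `IntentSchema`; `Intent`, `parse_intent`, and the assumption that `confidence` lies in [0, 1] are illustrative, not taken from the repo.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Intent:
    goal: str
    context: str
    constraints: list
    expected_output_format: str
    success_criteria: list
    alternative_interpretations: list
    confidence: float


def parse_intent(raw: str) -> Intent:
    """Parse a model reply into an Intent, rejecting missing or extra fields."""
    data = json.loads(raw)  # raises ValueError if the model answered in prose
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(Intent)}
    if set(data) != expected:
        missing = sorted(expected - set(data))
        extra = sorted(set(data) - expected)
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
    intent = Intent(**data)
    # Assumption: confidence is a probability-like score in [0, 1].
    if not isinstance(intent.confidence, (int, float)) or not 0.0 <= intent.confidence <= 1.0:
        raise ValueError("confidence must be a number in [0, 1]")
    return intent
```

Rejecting extra keys is what "pinning the schema" buys you here: a hallucinated field fails at the boundary instead of travelling downstream.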

2. Ambiguity — "What do I not know yet?"

The problem. Even a cleanly extracted intent can be under-specified. "Write a blog post about our launch" is structured but still missing: audience, length, tone, deadline, distribution channel. An agent that steamrolls past this will produce a polished artifact no one asked for.

The job. Audit the intent along fixed dimensions — scope, audience, depth, format, deadline, domain — and flag each with a severity: low, medium, high. If any flag is medium-or-higher, set needs_clarification = true.

The mental model. This is your agent's epistemic humility layer. Its output is a list of known unknowns. Don't let the agent "figure it out" — let it admit what it can't.

The pattern.

class AmbiguityReport(BaseModel):
    flags: list[AmbiguityFlag]            # dimension, level, impact
    needs_clarification: bool
    reasoning: str

The report is a decision gate. The pipeline branches on needs_clarification, not on a gut feel.

Pitfalls.

Letting the ambiguity agent also write the clarification questions. Separate the "what's missing" from the "how do I ask" — otherwise you optimise both poorly.
Hard-coding severity thresholds in the prompt instead of exposing them as explicit fields you can tune later.

Code: backend/agents/ambiguity_agent.py
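The decision gate can be computed in Python rather than left to the prompt, which keeps the threshold tunable. A minimal sketch, assuming severity levels are ordered low < medium < high; `Level` and the `needs_clarification` helper are hypothetical names, and a dataclass stands in for the Pydantic `AmbiguityFlag`.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class AmbiguityFlag:
    dimension: str  # e.g. "audience", "deadline"
    level: Level
    impact: str


def needs_clarification(flags, threshold: Level = Level.MEDIUM) -> bool:
    """True if any flag is at or above the threshold severity."""
    return any(flag.level >= threshold for flag in flags)


flags = [
    AmbiguityFlag("audience", Level.MEDIUM, "tone depends on the reader"),
    AmbiguityFlag("format", Level.LOW, "markdown is a safe default"),
]
```

Raising the threshold to `Level.HIGH` lets the same flags pass without a pause, and that change never touches the prompt.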

3. Clarifier — "Ask, or look it up?"

The problem. The naive fix for ambiguity is "just ask the user." Do that for every request and your agent becomes a questionnaire. The good fix: answer what the web can answer, ask the user for what only they know.

The job. For each ambiguity flag:

Derive 2–3 targeted web queries from the flag's dimension.
Search (Exa, in this repo — pluggable).
Ask the LLM whether the question can be confidently answered from these results alone, and accept the answer only at or above a confidence threshold (default 0.7).
If yes → auto-resolve, mark the source.
If no → surface to the user as a minimal, decision-oriented question.
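The last two steps above amount to a threshold check per question. The sketch below assumes the LLM judge has already produced an `(answer, confidence, source)` verdict for each question; `Question`, `triage`, and the verdict shape are hypothetical, with only the `auto_resolved` flag and the 0.7 default taken from this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    text: str
    auto_resolved: bool = False
    answer: Optional[str] = None
    source: Optional[str] = None


def triage(questions, verdicts, threshold: float = 0.7):
    """Auto-resolve confident answers in place; return what still needs the user."""
    for q in questions:
        verdict = verdicts.get(q.text)
        if verdict is None:
            continue  # nothing found on the web: only the user can answer
        answer, confidence, source = verdict
        if confidence >= threshold:
            q.auto_resolved = True
            q.answer = answer
            q.source = source  # mark where the answer came from
    return [q for q in questions if not q.auto_resolved]


questions = [Question("What is the product's launch date?"), Question("Who is the audience?")]
verdicts = {
    "What is the product's launch date?": ("2024-05-01", 0.9, "https://example.com/press"),
    "Who is the audience?": ("developers", 0.4, "https://example.com/blog"),
}
unresolved = triage(questions, verdicts)
```

Here the launch date is filled in automatically, and the low-confidence audience guess is discarded so the user gets exactly one question.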

The mental model. This is a cost-triage step. User attention is the most expensive resource your agent has. Spend it only on personal or organizational context — things that are genuinely unknowable without the user.

The pattern.

questions, search_results = await clarifier.generate_questions(ambiguity_report)
questions = await clarifier.auto_resolve_questions(questions, search_results, ambiguity_report)

unresolved = [q for q in questions.questions if not q.auto_resolved]
if unresolved:
    await send_to_user(unresolved)          # pause the pipeline
    answers = await wait_for_user_response()
else:
    answers = clarifier.build_auto_answers(questions)   # sail through

clarified = await clarifier.process_answers(ambiguity_report, answers)

Pitfalls.

Asking the user every question you could have searched for.
Setting the auto-resolve confidence threshold too low (the model will confabulate sources).
Asking more than three questions at once. Cap it at three. Users answer three; they abandon seven.

Code: backend/agents/clarification_agent.py

4. Planner — "What's the smallest set of steps that gets us there?"

The problem. Dropping a clarified goal into a single "do the thing" prompt gives you a brittle monolith. The model can't back up. You can't resume. You can't verify anything until the whole thing finishes — and by then, you're five paragraphs deep into the wrong answer.

The job. Turn a clarified intent into a list of numbered, dependency-aware, independently verifiable steps. Each step declares its inputs, expected output, dependencies (by step number), and a validation criterion.

The mental model. A plan is a contract. Each step is a row; the whole plan is auditable before anything runs.

The pattern.

class PlanStep(BaseModel):
    step_number: int
    description: str
    inputs: list[str]
    expected_output: str
    dependencies: list[int]       # topological execution
    validation: str               # how to know it succeeded

Dependencies let the executor topologically sort steps and, later, parallelize independent branches. Validation turns "done" into a checkable predicate instead of a feeling.
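A dependency-aware ordering can be sketched with the standard library's `graphlib` (Python 3.9+). This is one possible implementation of a `topological_sort` helper, not necessarily the repo's; `Step` is a trimmed stand-in for `PlanStep`, and `parallel_batches` is a hypothetical extra that groups steps able to run concurrently.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    step_number: int
    dependencies: list = field(default_factory=list)


def topological_sort(steps):
    """Return steps ordered so every step follows its dependencies."""
    by_number = {s.step_number: s for s in steps}
    graph = {s.step_number: s.dependencies for s in steps}
    # static_order() raises graphlib.CycleError if the plan contains a cycle.
    return [by_number[n] for n in TopologicalSorter(graph).static_order()]


def parallel_batches(steps):
    """Group step numbers into waves whose members do not depend on each other."""
    sorter = TopologicalSorter({s.step_number: s.dependencies for s in steps})
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


plan = [Step(4, [2, 3]), Step(2, [1]), Step(3, [1]), Step(1)]
order = [s.step_number for s in topological_sort(plan)]
batches = parallel_batches(plan)
```

For this plan the batches come out as `[[1], [2, 3], [4]]`: steps 2 and 3 share a wave because neither depends on the other.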

Pitfalls.

Vague steps ("Analyze the data"). If you can't say how you'd verify it, the model can't execute it.
Flat plans with no dependency graph. You'll miss parallelism opportunities and silently re-execute work.
Coupling planning and execution in one prompt. You lose the ability to inspect, edit, or cache the plan.

Code: backend/agents/planning_agent.py

4.5. Context (optional but powerful) — "Gather before you act"

The problem. An executor that reaches for tools mid-generation is slow and erratic. The model decides while generating what to search for, then context-switches. Quality drops.

The job. Before executing any step, read the whole plan and, for each step, decide what it needs: knowledge-base lookups, web searches, outputs from dependency steps. Fan out all retrievals in parallel. Attach the results to each step as a StepContext.

The mental model. Separate gathering from reasoning. Gathering is embarrassingly parallel. Reasoning is serial. Don't interleave them.

The pattern.

resource_plan = await context_agent.gather(plan, clarified_intent, session_state)
# resource_plan.step_contexts[i] contains everything step i will need

Code: backend/agents/context_agent.py
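The parallel fan-out can be sketched with `asyncio.gather`. The helper below is an illustration under assumptions, not the repo's `context_agent`: `gather_step_context` and the retriever names are hypothetical, and a failed source degrades to `None` to match the "degrades, never dies" stance.

```python
import asyncio


async def gather_step_context(retrievers: dict) -> dict:
    """Run every retriever for one step concurrently; failures become None."""
    names = list(retrievers)
    results = await asyncio.gather(
        *(retrievers[name]() for name in names),
        return_exceptions=True,  # one failing source must not cancel the others
    )
    return {
        name: (None if isinstance(result, Exception) else result)
        for name, result in zip(names, results)
    }


# Hypothetical retrievers standing in for knowledge-base and web lookups.
async def kb_lookup():
    await asyncio.sleep(0.01)
    return ["doc-1"]


async def web_search():
    raise TimeoutError("search timed out")


context = asyncio.run(gather_step_context({"kb": kb_lookup, "web": web_search}))
```

The total wait is roughly that of the slowest retriever rather than the sum of all of them.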

5. Executor — "Run the steps, keep the thread, prove you hit the goal"

The problem. "Execute the plan" is another vibe. A real executor has to: run steps in dependency order, pass prior outputs into dependents, keep streaming chunks to the UI, handle a step that fails without corrupting the rest, and at the end prove the deliverable actually answers the original goal.

The job. Four things:

Topologically sort steps.
For each step, build a message with: the step description, its gathered context, and the outputs of its dependencies.
Stream chunks back as the model generates, so the UI isn't frozen.
At the end, run a final-assembly pass with explicit checks: completeness, clarity, relevance, correctness, and a trace_to_goal that maps the deliverable back to the original intent.

The mental model. Execution is a loop with memory. Each iteration has a narrow, well-scoped prompt. The outer loop holds the state.

The pattern.

step_results = {}    # step_number → result; keep it, it is your debug trail
for step in topological_sort(plan.steps):
    ctx = resource_plan.step_contexts[step.step_number]
    deps = {d: step_results[d] for d in step.dependencies}
    result = await executor.process_step(step, ctx, deps, on_stream=send_chunk)
    step_results[step.step_number] = result

final = await executor.assemble_final(plan, list(step_results.values()))
# final includes: output, completeness_check, clarity_check,
# relevance_check, correctness_check, trace_to_goal

Pitfalls.

One giant "execute everything" prompt. You lose per-step streaming, per-step failure recovery, and the ability to retry just the broken step.
Skipping trace_to_goal. The final check is what catches a technically-correct, goal-irrelevant output.
Throwing away intermediate step results once you have the final. Keep them — that's your debug trail.

Code: backend/agents/execution_agent.py
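One possible reading of "handle a step that fails without corrupting the rest" is to record the failure and skip only the steps that depend on it. The sketch below assumes that policy; `run_plan` and its `(step_number, dependencies)` input shape are hypothetical, and the repo may handle failures differently.

```python
import asyncio


async def run_plan(ordered_steps, run_step):
    """Run steps given in topological order; skip dependents of failed steps."""
    results, failed = {}, set()
    for number, deps in ordered_steps:
        if any(d in failed for d in deps):
            failed.add(number)  # an input is missing, so do not run on bad data
            continue
        try:
            results[number] = await run_step(number, {d: results[d] for d in deps})
        except Exception:
            failed.add(number)  # isolate the failure; independent branches continue
    return results, failed


# Hypothetical step runner: step 2 fails, everything else echoes its inputs.
async def demo_step(number, dep_outputs):
    if number == 2:
        raise RuntimeError("tool call failed")
    return f"out-{number}({sorted(dep_outputs)})"


steps = [(1, []), (2, [1]), (3, [1]), (4, [2]), (5, [3])]
results, failed = asyncio.run(run_plan(steps, demo_step))
```

Steps 2 and 4 end up in `failed`, while the 1 → 3 → 5 branch completes, so a retry only needs to revisit the failed set.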

Recipes

Small patterns that keep recurring once you start building real agents.

Stream by default, not as an afterthought. Every agent here accepts an on_stream callback. If your LLM client doesn't support streaming, wrap it so it does — even if the wrapper just flushes at the end. Your UI code should never have two paths.
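A wrapper that "just flushes at the end" can be very small. This sketch assumes `on_stream` is an async callable that receives text chunks, which the source does not specify; `complete_with_stream` and the fake client are hypothetical.

```python
import asyncio


async def complete_with_stream(complete, prompt: str, on_stream) -> str:
    """Adapt a non-streaming async completion function to the on_stream interface."""
    text = await complete(prompt)
    await on_stream(text)  # single flush: the caller still sees a stream of one chunk
    return text


# Hypothetical non-streaming client and a collector standing in for the UI.
async def fake_complete(prompt):
    return f"answer to: {prompt}"


chunks = []


async def collect(chunk):
    chunks.append(chunk)


reply = asyncio.run(complete_with_stream(fake_complete, "ping", collect))
```

The UI code consumes `on_stream` either way, so swapping in a provider with true streaming later changes nothing on the receiving side.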

Pause the pipeline like a coroutine, not a state machine. Clarification pauses between phases and resumes when the user answers. Model this with a session object that knows its current phase, not a pile of booleans. See SessionState and resume_after_clarification in backend/orchestrator/pipeline.py.
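A session object that knows its phase might look like the sketch below. It is a simplified illustration, not the `SessionState` in backend/orchestrator/pipeline.py: `Phase`, `Session`, `pause_for_user`, and `resume_with_answers` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    INTENT = "intent"
    AMBIGUITY = "ambiguity"
    AWAITING_USER = "awaiting_user"
    PLANNING = "planning"
    EXECUTING = "executing"
    DONE = "done"


@dataclass
class Session:
    phase: Phase = Phase.INTENT
    pending_questions: list = field(default_factory=list)
    answers: dict = field(default_factory=dict)


def pause_for_user(session: Session, questions: list) -> None:
    """Park the pipeline until the user replies."""
    session.pending_questions = list(questions)
    session.phase = Phase.AWAITING_USER


def resume_with_answers(session: Session, answers: dict) -> None:
    """Resume only from the paused phase; anything else is a caller bug."""
    if session.phase is not Phase.AWAITING_USER:
        raise RuntimeError(f"cannot resume from phase {session.phase.value}")
    session.answers.update(answers)
    session.pending_questions = []
    session.phase = Phase.PLANNING


session = Session(phase=Phase.AMBIGUITY)
pause_for_user(session, ["Who is the audience?"])
resume_with_answers(session, {"Who is the audience?": "developers"})
```

A single `phase` field makes illegal transitions easy to reject, which is harder to guarantee with several independent booleans.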

Cache the intent, not the final output. Intent depends only on the user's text and your prompt (plus the model you call), so those inputs make a stable cache key. Cache it. The final output depends on the full session, including clarifications, so caching it will bite you.
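A cache keyed on the inputs to intent extraction could be sketched as follows. `cache_key`, `IntentCache`, and the in-memory dict are assumptions for illustration; including the model name in the key is a precaution added here, not something the repo is known to do.

```python
import asyncio
import hashlib


def cache_key(user_text: str, prompt: str, model: str) -> str:
    """Hash every input that can change the extracted intent."""
    digest = hashlib.sha256()
    for part in (model, prompt, user_text):
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


class IntentCache:
    def __init__(self):
        self._store = {}

    async def get_or_extract(self, user_text, prompt, model, extract):
        """Return the cached intent, calling `extract` only on a miss."""
        key = cache_key(user_text, prompt, model)
        if key not in self._store:
            self._store[key] = await extract(user_text)
        return self._store[key]


# Hypothetical extractor that counts how often the "LLM" is actually called.
calls = []


async def fake_extract(text):
    calls.append(text)
    return {"goal": text}


async def demo():
    cache = IntentCache()
    first = await cache.get_or_extract("fix my essay", "PROMPT v1", "gpt-4o", fake_extract)
    second = await cache.get_or_extract("fix my essay", "PROMPT v1", "gpt-4o", fake_extract)
    third = await cache.get_or_extract("fix my essay", "PROMPT v2", "gpt-4o", fake_extract)
    return first, second, third


first, second, third = asyncio.run(demo())
```

Editing the prompt changes the key, so stale intents from an older prompt version are never served.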

Use typed schemas at every boundary. Pydantic models between agents aren't ceremony — they're your test surface. A bad output gets caught at parse time, not six steps later when something dereferences a missing field.

Run all retrievals for a step in parallel. The context gatherer fans out KB, web, and prior-phase extraction with asyncio.gather. Sequential retrievals leave 3–5× latency on the table.

Keep the LLM out of control flow. The LLM decides content. Python decides flow — which phase runs, when to pause, when to retry, when to fall back. If your agent has an if tool_name == "ask_user": … branch inside the prompt, you've inverted it.

How it compares

	OpenAgent	LangGraph	CrewAI	AutoGen
Mental model	Typed pipeline	Graph of nodes	Role-playing crew	Multi-agent chat
Typed contracts between stages	✅ Pydantic	⚠️ Optional	⚠️ Loose	⚠️ Loose
Auto-resolving clarifier	✅ Built-in	❌	❌	❌
Model lock-in	None	None	None	None
Framework weight	~4k LOC, readable	Heavy	Heavy	Heavy
"Pause for user" as first-class	✅	⚠️ Via interrupts	❌	⚠️ Via prompts
Reads like a cookbook	✅ By design	⚠️ Reference docs, not narrative	❌	❌

When to pick OpenAgent. You want to understand every moving part, control each prompt, and own your agent's reasoning end-to-end — not inherit someone else's abstraction.

When to pick a framework instead. You want to ship fast without thinking about architecture, and the framework's defaults happen to match your domain.

Who this is for


🛠️ First-agent builders	Five `while True: llm()` prototypes in a drawer. None ship. Start here.
🏗️ Framework evaluators	You've read the LangGraph docs twice and still don't trust the abstractions. This is ~4k lines. Read it in an afternoon.
🧪 Production debuggers	"It's doing something weird in prod." OpenAgent tells you exactly which stage lied — with the transcript.

Where to start reading the code

If you're here to learn, open files in this order:

backend/models/schemas.py — the contracts between phases. Read this first; everything else is transformations.
backend/agents/intent_agent.py — the simplest agent. A good template for your own.
backend/agents/clarification_agent.py — the most interesting one. Auto-resolve via web search is the trick worth stealing.
backend/orchestrator/pipeline.py — how the phases are wired, paused, and resumed.
backend/agents/execution_agent.py — step-by-step execution with per-step context injection.

What's next

Typed contracts between all five stages
Anthropic-native tool use (beyond OpenAI-compatible)
Step-level retries with plan-edit capability
First-class observability (OpenTelemetry spans per stage)
Browser extension: capture user intent from any form

Roadmap subject to change. Open an issue if one of these matters to you and we'll bump it.

Contributing

Issues, patches, and hard questions are all welcome. See CONTRIBUTING.md for the short version — fork, keep the change small, smoke-test with python run.py, and open the PR with a one-paragraph why. Every stage is intentionally small enough that a PR can meaningfully change one thing at a time.

About OpenGraph.tech

OpenGraph.tech builds the infrastructure for agents that reason openly, not opaquely. This repo is our reference pipeline — the thing we run, the thing we ship against, and the thing we learn from. If you're building agents in production and want to compare notes, we'd like to hear from you.

Follow us on LinkedIn — that's where we post build notes and what shipped this week.

Leave a ⭐ if this saved you a week

This repo is free. The cookbook is free. The walkthrough is on YouTube, free. The only thing we ask back is a star — it's the one signal that tells us to write more of these, louder.

_{One click. No account prompt. It genuinely helps.}

_{Made with intent, by OpenGraph.tech.}
