Multi-Agent Epistemic Research System

Truth has
two sides.

Two AI agents independently research opposing sides of any question using live web sources, structured belief graphs, and semantic contradiction detection. Not another chatbot — a belief accumulation engine.

Run a Debate → See how it works
50–80
Real sources per debate
3
Collaborative agents
0.41
Honest clarity on contested topics

Watch beliefs form in real time

Every piece of content ingested is parsed into typed, confidence-weighted belief nodes. The system knows what it doesn't know.
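A typed, confidence-weighted belief node might look roughly like this. The shape below is a hypothetical sketch for illustration; the field names are not the actual thinkn.ai SDK schema:

```typescript
// Hypothetical belief node shape; field names are illustrative,
// not the actual thinkn.ai SDK schema.
type BeliefType = "claim" | "goal" | "gap" | "evidence";

interface BeliefNode {
  id: string;
  type: BeliefType;
  statement: string;  // the parsed claim itself
  confidence: number; // 0..1, weight assigned at ingestion
  sources: string[];  // URLs the claim was extracted from
  side: "PRO" | "ANTI" | "SHARED";
}

const node: BeliefNode = {
  id: "b-017",
  type: "claim",
  statement:
    "EVs produce 50-70% less lifecycle CO2 than ICE vehicles on average grids",
  confidence: 0.82,
  sources: ["https://www.nature.com/nenergy"],
  side: "PRO",
};
```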

debate-engine-two.vercel.app — Round 2 / 6
Topic: Are EVs good or bad for the planet?  ·  PRO-EV vs ANTI-EV
PRO-EV — Director: lifecycle emissions
Nature Energy — EV lifecycle carbon study 2024
IEA — Global EV Outlook 2024 report
MIT Climate Portal — battery manufacturing emissions
0.54
ANTI-EV — Director: mining footprint
Science Direct — lithium mining ecosystem impact
Reuters — cobalt supply chain analysis
Yale E360 — battery recycling limitations
0.47
⚡ Semantic contradiction detected
"EVs produce 50–70% less lifecycle CO₂ than ICE vehicles on average grids" ↔ "Manufacturing phase emissions negate EV benefits in coal-heavy grids for up to 8 years"


How a debate runs

Six coordinated phases turn a question into calibrated epistemic understanding — not just a list of arguments.

01
Config Bootstrap
GPT-4o generates two named sides, a goal statement, 4 investigable knowledge gaps, and seed search queries — before any research begins.
02
Shared Namespace
PRO and ANTI agents write independently to the same belief graph. The SDK fuses their outputs automatically — no manual merging required.
03
Information-Gain Loops
Each round, the Director reads ranked "moves": candidate research actions sorted by expected epistemic value, and translates the top ones into Exa search queries.
04
Contradiction Detection
The SDK semantically detects when two beliefs from completely independent sources negate each other — and suppresses clarity accordingly.
05
Early Exit Logic
Research stops when the director declares the evidence sufficient, when clarity plateaus despite open contradictions, or when both sides reach high clarity. Never runs longer than needed.
06
Grounded Verdict
The judge calls beliefs.before() — a structured brief injected into GPT-4o. The verdict is grounded in the belief graph, not raw web text.
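The loop behind phases 03 and 05 can be sketched as follows. The state shape, thresholds, and helper names here are assumptions for illustration, not the actual SDK API:

```typescript
// Illustrative research loop; state shape, thresholds, and names
// are assumptions, not the actual thinkn.ai SDK API.
interface Move { query: string; expectedGain: number; }
interface SideState { clarity: number; contradictions: number; }

const HIGH_CLARITY = 0.75;   // assumed "high clarity" cutoff
const PLATEAU_DELTA = 0.02;  // assumed plateau tolerance

function shouldStop(
  directorSaysSufficient: boolean,
  prev: SideState,
  curr: SideState,
  opponent: SideState,
): boolean {
  // 1. The director has judged the collected evidence sufficient.
  if (directorSaysSufficient) return true;
  // 2. Clarity has plateaued while contradictions remain open.
  const plateaued = Math.abs(curr.clarity - prev.clarity) < PLATEAU_DELTA;
  if (plateaued && curr.contradictions > 0) return true;
  // 3. Both sides have independently reached high clarity.
  return curr.clarity >= HIGH_CLARITY && opponent.clarity >= HIGH_CLARITY;
}

function pickMoves(moves: Move[], n: number): Move[] {
  // Phase 03: rank candidate research actions by expected
  // information gain and take the top n as next search queries.
  return [...moves].sort((a, b) => b.expectedGain - a.expectedGain).slice(0, n);
}
```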

What makes this different

Traditional LLM research accumulates text. This system accumulates understanding.

🧬
Evolving Knowledge Base
Save debates to your personal belief system. Cumulative stats track beliefs, contradictions, and clarity across all your research sessions over time.
⚡
Semantic Contradiction Detection
Finds when two sourced claims from independent publications negate each other — without either source referencing the other. Pure semantic reasoning.
📊
Calibrated Uncertainty
Clarity scores are epistemic readiness, not quality scores. A 0.41 on a genuinely contested topic is correct behavior — the system knows it doesn't know.
📄
Research Report Export
Generate comprehensive, fully cited reports in Academic, Executive, or Technical style. Export as PDF or DOCX with one click after any debate.
🌙
Dark / Light / Sepia Themes
Three carefully designed themes built on a full CSS variable system. Your preference is remembered across sessions.
🔒
Rate Limited by Design
5 debates per hour per IP. Protects API costs while keeping the tool freely accessible. Middleware-level enforcement, not application logic.
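The rate limit described above could be enforced roughly like this. This is a minimal in-memory sliding-window sketch, not the app's actual middleware; a real Vercel deployment would back it with shared storage (e.g. a KV store), since serverless instances do not share memory:

```typescript
// Minimal sliding-window rate limiter: 5 debates per hour per IP.
// In-memory sketch for illustration only; real middleware would use
// shared storage and return HTTP 429 when allowDebate() is false.
const WINDOW_MS = 60 * 60 * 1000;
const MAX_DEBATES = 5;

const hits = new Map<string, number[]>();

function allowDebate(ip: string, now: number = Date.now()): boolean {
  // Keep only timestamps still inside the one-hour window.
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_DEBATES) {
    hits.set(ip, recent);
    return false; // over the limit
  }
  recent.push(now);
  hits.set(ip, recent);
  return true;
}
```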

vs Other research tools

Most tools accumulate text. This one accumulates structured understanding with calibrated uncertainty.

Capability · Belief Engine · ChatGPT Deep Research · Perplexity · Elicit
Live web sources ~
Adversarial agent structure
Belief graph (typed, weighted)
Semantic contradiction detection
Information-gain driven research ~
Calibrated uncertainty output ~
Persistent knowledge base
PDF / DOCX report export ~ ~

The Clarity Score

Clarity is not a quality score. It's epistemic readiness — computed across four independent channels.

clarity = f( decisionResolution, knowledgeCertainty, coherence, coverage )
decisionResolution
Goals met by the research. How many seed objectives have been addressed by the evidence collected.
knowledgeCertainty
Proportion of high-confidence beliefs: beliefs above the 0.70 confidence threshold as a share of the total belief count.
coherence
Inverse of contradictions. Each semantic conflict suppresses this channel — contested topics stay low by design.
coverage
Knowledge gaps closed. Seeded unknowns that have been resolved by evidence during the research loop.
A clarity of 0.41 after 53 ingested sources on a genuinely contested topic is correct behavior — the coherence channel stays suppressed when real epistemic conflict exists. The system knows it doesn't know.
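Under the channel definitions above, the computation might be sketched like this. Only the four channels come from the description; the equal weighting and the exact per-channel formulas are assumptions:

```typescript
// Illustrative clarity computation. The four channels come from the
// description above; equal weighting and exact formulas are assumed.
interface GraphStats {
  goalsMet: number; goalsTotal: number;                // decisionResolution
  highConfidenceBeliefs: number; beliefsTotal: number; // knowledgeCertainty (> 0.70)
  contradictions: number;                              // coherence
  gapsClosed: number; gapsSeeded: number;              // coverage
}

function clarity(s: GraphStats): number {
  const decisionResolution = s.goalsTotal ? s.goalsMet / s.goalsTotal : 0;
  const knowledgeCertainty = s.beliefsTotal
    ? s.highConfidenceBeliefs / s.beliefsTotal
    : 0;
  // Each open semantic contradiction suppresses the coherence channel,
  // so genuinely contested topics stay low by design.
  const coherence = 1 / (1 + s.contradictions);
  const coverage = s.gapsSeeded ? s.gapsClosed / s.gapsSeeded : 0;
  return (decisionResolution + knowledgeCertainty + coherence + coverage) / 4;
}
```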

System architecture

Three agents, one shared namespace, one belief graph. The SDK handles fusion, contradiction detection, and move ranking automatically.

User Question
Any debatable topic → GPT-4o config bootstrap
generateDebateConfig()
GPT-4o generates: two sides, goal node, 4 gap nodes, seed queries for rounds 1–2
Shared Namespace — thinkn.ai beliefs SDK
PRO Agent
beliefs.after(webContent)
ANTI Agent
beliefs.after(webContent)
Judge Agent
beliefs.read() → before()
debateDirector()
GPT-4o reads world.moves[] ranked by information gain → writes Exa queries
Exa Neural Search
Live web retrieval · 50–80 sources per debate · deduplication across all rounds
GPT-4o Verdict
judge.before() injects belief graph brief · streamed to UI via SSE
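The verdict in the diagram is streamed to the UI over Server-Sent Events. Framing one message for that stream looks roughly like this; the event name is illustrative, not the app's actual protocol:

```typescript
// Format a payload as one Server-Sent Events message.
// SSE frames are "event:" and "data:" lines ending in a blank line.
// The event name used below is illustrative only.
function sseMessage(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```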
Built with
thinkn.ai beliefs SDK · Exa Neural Search · GPT-4o · Next.js 15 · React 19 · Tailwind CSS 4 · Server-Sent Events · jsPDF · docx.js · Vercel

Stop accumulating text.
Start accumulating truth.

Run your first debate in under a minute. No account needed. Results stream live as agents research in real time.

Open Debate Engine → View on GitHub