
Debate Chatbot (RAG)

Grounded Q&A over the 2019-2020 U.S. Democratic primary debate transcripts using Pinecone retrieval and Anthropic (Claude) generation, returning citations for every answer.

What it does

This app answers questions by retrieving relevant debate transcript snippets at query time, then asking Claude to respond using only those sources. The response includes citations so the user can see exactly what the answer was grounded on.

Key features

  • Web chat UI at /
  • API endpoint at POST /chat
  • Source citations included in responses
  • One-command ingestion into Pinecone via scripts/ingest.py

Architecture

  1. Ingest transcripts into Pinecone as records with metadata (speaker/date/debate info).
  2. At query time: embed the question and retrieve top_k relevant records.
  3. Inject retrieved sources into a prompt and ask Claude to answer with citations.
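The query-time steps above can be sketched in Python. This is a minimal illustration, not the app's actual implementation: the metadata field names (`speaker`, `date`, `excerpt`) and the prompt wording are assumptions, and the Pinecone query and Anthropic call are left as comments so the sketch stays self-contained.

```python
def build_prompt(question, sources, max_context_chars=6000):
    """Format retrieved snippets as numbered sources and instruct the
    model to answer only from them, citing like [1]. Sources that would
    exceed the character budget are dropped.

    `sources` is assumed to be a list of Pinecone-style matches, each
    with a `metadata` dict (field names here are illustrative).
    """
    blocks, used = [], 0
    for i, src in enumerate(sources, start=1):
        meta = src.get("metadata", {})
        block = (f"[{i}] {meta.get('speaker', '?')} "
                 f"({meta.get('date', '?')}): {meta.get('excerpt', '')}")
        if used + len(block) > max_context_chars:
            break
        blocks.append(block)
        used += len(block)
    context = "\n\n".join(blocks)
    return (
        "Answer the question using ONLY the sources below. "
        "Cite sources inline like [1].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )

# At query time (sketch, omitted here to stay dependency-free):
#   1. vector  = embed(question)                       # embedding model of choice
#   2. matches = index.query(vector=vector, top_k=8,
#                            include_metadata=True)    # Pinecone retrieval
#   3. prompt  = build_prompt(question, matches["matches"])
#   4. answer  = call Claude with `prompt` via the anthropic client
```

Keeping the prompt assembly in a pure function like this makes the context budget (`MAX_CONTEXT_CHARS`) easy to test independently of any network calls.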

API

GET /health

Returns:

{ "status": "ok" }

POST /chat

Request body:

{
  "question": "What did candidates say about Medicare for All?",
  "top_k": 8
}

Response:

  • answer: Markdown-formatted answer with inline citations like [1]
  • citations: the retrieved transcript snippets, each with metadata and an excerpt
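A minimal Python client for these endpoints could look like the sketch below. The citation field names (`speaker`, `date`, `excerpt`) are assumptions about the response shape, not confirmed by the API description above.

```python
import json
from urllib import request

def ask(question, top_k=8, base_url="http://127.0.0.1:8001"):
    """POST a question to /chat and return the parsed JSON response."""
    body = json.dumps({"question": question, "top_k": top_k}).encode()
    req = request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

def render_citations(response):
    """Render the citations list as numbered lines.

    Field names (`speaker`, `date`, `excerpt`) are illustrative
    assumptions about each citation object.
    """
    lines = []
    for i, c in enumerate(response.get("citations", []), start=1):
        lines.append(f"[{i}] {c.get('speaker', '?')} ({c.get('date', '?')}): "
                     f"{c.get('excerpt', '')}")
    return "\n".join(lines)
```

Usage: `print(ask("What did candidates say about Medicare for All?")["answer"])`, then `print(render_citations(...))` to show what the answer was grounded on.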

Quickstart (local)

python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# set:
#   PINECONE_API_KEY
#   ANTHROPIC_API_KEY

# add dataset CSV (not committed in repo):
#   debate_transcripts_v3_2020-02-26.csv

python3 scripts/ingest.py
uvicorn backend.main:app --reload --host 127.0.0.1 --port 8001
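scripts/ingest.py itself is not shown here; the sketch below illustrates what record preparation for Pinecone could look like. The CSV column names (`speech`, `speaker`, `date`, `debate_name`) and the `embed` function are hypothetical placeholders, and the actual upsert call is left as a comment.

```python
from itertools import islice

def rows_to_records(rows, embed, id_prefix="debate"):
    """Turn CSV rows into Pinecone-style records: id, vector, metadata.

    `embed` is a caller-supplied embedding function (text -> list[float]);
    column names below are assumptions about the dataset, not documented.
    """
    records = []
    for i, row in enumerate(rows):
        text = row["speech"]  # hypothetical column name
        records.append({
            "id": f"{id_prefix}-{i}",
            "values": embed(text),
            "metadata": {
                "speaker": row.get("speaker", ""),
                "date": row.get("date", ""),
                "debate": row.get("debate_name", ""),
                "excerpt": text[:300],  # stored so citations can show a snippet
            },
        })
    return records

def batched(records, size=100):
    """Yield fixed-size batches; Pinecone upserts are typically batched."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch

# In scripts/ingest.py (sketch): for each batch, the upsert would be
#   index.upsert(vectors=batch, namespace=NAMESPACE)
# using the Pinecone client.
```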

Deployment

The project includes a Dockerfile and is deployed on AWS App Runner (health check path: /health).

Configuration

Minimum environment variables:

  • PINECONE_API_KEY
  • ANTHROPIC_API_KEY

Common options:

  • PINECONE_INDEX_NAME, PINECONE_NAMESPACE
  • PINECONE_INDEX_HOST (recommended: connecting to the index by host avoids a control-plane lookup)
  • TOP_K, MAX_CONTEXT_CHARS
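One way the app might load these variables is sketched below. The defaults (`debates`, `default`, `8`, `6000`) are illustrative assumptions, not documented values; only the two API keys are treated as required.

```python
import os

def load_settings(env=os.environ):
    """Read configuration from environment variables.

    Required: PINECONE_API_KEY, ANTHROPIC_API_KEY.
    All defaults below are hypothetical, for illustration only.
    """
    missing = [k for k in ("PINECONE_API_KEY", "ANTHROPIC_API_KEY")
               if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
    return {
        "pinecone_api_key": env["PINECONE_API_KEY"],
        "anthropic_api_key": env["ANTHROPIC_API_KEY"],
        "index_name": env.get("PINECONE_INDEX_NAME", "debates"),
        "namespace": env.get("PINECONE_NAMESPACE", "default"),
        # Optional: connecting by host skips the control-plane lookup.
        "index_host": env.get("PINECONE_INDEX_HOST"),
        "top_k": int(env.get("TOP_K", "8")),
        "max_context_chars": int(env.get("MAX_CONTEXT_CHARS", "6000")),
    }
```

Failing fast on missing keys at startup keeps misconfiguration errors out of the request path.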
