# Debate Chatbot (RAG)
Grounded Q&A over the 2019-2020 U.S. Democratic primary debate transcripts using Pinecone retrieval and Anthropic (Claude) generation, returning citations for every answer.
## What it does
This app answers questions by retrieving relevant debate transcript snippets at query time, then asking Claude to respond using only those sources. The response includes citations so the user can see exactly what the answer was grounded on.
## Key features

- Web chat UI at `/`
- API endpoint at `POST /chat`
- Source citations included in responses
- One-command ingestion into Pinecone via `scripts/ingest.py`
## Architecture

- Ingest transcripts into Pinecone as records with metadata (speaker/date/debate info).
- At query time: embed the question and retrieve the `top_k` most relevant records.
- Inject the retrieved sources into a prompt and ask Claude to answer with citations.
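The prompt-assembly step can be sketched as follows. This is a minimal illustration, not the repo's actual code: `build_prompt`, the source-dict fields, and the character budget are assumptions for the sketch.

```python
# Illustrative sketch of turning retrieved records into a grounded prompt.
# The field names (speaker/date/excerpt) mirror the metadata described above,
# but the real helper in backend/ may differ.

def build_prompt(question: str, sources: list[dict], max_chars: int = 8000) -> str:
    """Number each retrieved snippet so the model can cite it as [1], [2], ..."""
    blocks, used = [], 0
    for i, src in enumerate(sources, start=1):
        block = (
            f"[{i}] {src.get('speaker', 'Unknown')} "
            f"({src.get('date', 'n/a')}): {src['excerpt']}"
        )
        if used + len(block) > max_chars:  # respect a context budget
            break
        blocks.append(block)
        used += len(block)
    context = "\n\n".join(blocks)
    return (
        "Answer the question using ONLY the numbered sources below, "
        "citing them like [1].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

The numbered-source convention is what lets the model's `[1]`-style citations be mapped back to the exact transcript snippets returned to the user.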
## API

### GET /health

Returns:

```json
{ "status": "ok" }
```
### POST /chat

Request body:

```json
{
  "question": "What did candidates say about Medicare for All?",
  "top_k": 8
}
```

Response fields:

- `answer`: Markdown-formatted answer with citations like `[1]`
- `citations`: the retrieved transcript snippets with metadata and excerpts
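A client-side call might look like the sketch below. The request shape matches the API above; the `format_citations` helper and the exact citation fields beyond `excerpt` are illustrative assumptions.

```python
import json

# Build the request body for POST /chat (top_k is optional).
payload = {"question": "What did candidates say about Medicare for All?", "top_k": 8}
body = json.dumps(payload)

# With the local server from the quickstart running, this could be sent with
# the stdlib (commented out so the sketch stays self-contained):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://127.0.0.1:8001/chat", data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   resp = json.load(urllib.request.urlopen(req))

def format_citations(resp: dict) -> list[str]:
    """Render each citation as 'N. excerpt' for display (illustrative helper)."""
    return [f"{i}. {c['excerpt']}" for i, c in enumerate(resp["citations"], 1)]
```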
## Quickstart (local)

```bash
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# set:
#   PINECONE_API_KEY
#   ANTHROPIC_API_KEY
# add dataset CSV (not committed in repo):
#   debate_transcripts_v3_2020-02-26.csv
python3 scripts/ingest.py
uvicorn backend.main:app --reload --host 127.0.0.1 --port 8001
```
## Deployment

The project includes a Dockerfile and is deployed on AWS App Runner (health check path: `/health`).
## Configuration

Required environment variables:

- `PINECONE_API_KEY`
- `ANTHROPIC_API_KEY`

Common options:

- `PINECONE_INDEX_NAME`, `PINECONE_NAMESPACE`
- `PINECONE_INDEX_HOST` (recommended, to avoid a control-plane lookup)
- `TOP_K`, `MAX_CONTEXT_CHARS`
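A local `.env` might look like the fragment below. All values are placeholders, and the index/namespace names are illustrative; `.env.example` in the repo is the authoritative template.

```
# Required (placeholder values)
PINECONE_API_KEY=your-pinecone-key
ANTHROPIC_API_KEY=your-anthropic-key

# Optional (names from this README; example values are assumptions)
PINECONE_INDEX_NAME=debates
PINECONE_NAMESPACE=default
# PINECONE_INDEX_HOST=...   # skips the control-plane lookup when set
TOP_K=8
MAX_CONTEXT_CHARS=12000
```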