CitySort
A runnable AI MVP to ingest, OCR, classify, validate, and route city documents with human-in-the-loop review.
Problem
City teams handle high volumes of documents that need to be extracted, classified, validated, and routed quickly. Manual triage is slow and inconsistent, and teams still need auditability and safe fallbacks when OCR/LLM providers are unavailable.
What it is
CitySort AI MVP is a runnable pipeline that can ingest city documents, extract fields, classify document type, validate required fields, and route items into department queues with a human-in-the-loop review workflow.
Document lifecycle
Documents move through explicit states so operators always know what happened and what to do next: ingested, routed, needs_review, approved, corrected, failed.
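As a minimal sketch, the six states above could be modeled as an enum with a transition check. The state names come from the list above; the transition map is purely hypothetical, since only the states themselves are named here, not which moves between them are allowed.

```python
from enum import Enum
from typing import Dict, Set


class DocumentState(str, Enum):
    """The six lifecycle states named in this section."""
    INGESTED = "ingested"
    ROUTED = "routed"
    NEEDS_REVIEW = "needs_review"
    APPROVED = "approved"
    CORRECTED = "corrected"
    FAILED = "failed"


# ASSUMPTION: an illustrative transition map, not CitySort's actual rules.
ALLOWED_TRANSITIONS: Dict[DocumentState, Set[DocumentState]] = {
    DocumentState.INGESTED: {
        DocumentState.ROUTED,
        DocumentState.NEEDS_REVIEW,
        DocumentState.FAILED,
    },
    DocumentState.NEEDS_REVIEW: {DocumentState.APPROVED, DocumentState.CORRECTED},
    DocumentState.FAILED: {DocumentState.INGESTED},  # e.g. after a reprocess
}


def can_transition(current: DocumentState, target: DocumentState) -> bool:
    """Return True if the hypothetical map allows moving current -> target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Keeping the states in one enum means an unknown state string fails loudly at parse time instead of silently entering a queue.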
AI providers (Anthropic + fallback)
OCR and classification are provider-switched. If credentials are missing or calls fail, the system automatically falls back to local processing so the pipeline keeps moving.
- OCR providers: local (native text + PDF parsing), azure_di (Azure Document Intelligence)
- Classification providers: rules (keyword model), openai (JSON classification), anthropic (JSON classification)
- Confidence gate for auto-routing via CITYSORT_CONFIDENCE_THRESHOLD (default 0.82)
- Always-human-review doc types via CITYSORT_FORCE_REVIEW_DOC_TYPES
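The fallback and gating behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the names Classification, classify_with_fallback, and decide_state are hypothetical, and treating a confidence exactly equal to the threshold as passing is a guess.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class Classification:
    doc_type: str
    confidence: float
    provider: str


Classifier = Callable[[str], Classification]


def classify_with_fallback(
    text: str, external: Optional[Classifier], local: Classifier
) -> Classification:
    """Try the external provider; fall back to local on any failure.

    external is None when credentials are missing (hypothetical convention).
    """
    if external is not None:
        try:
            return external(text)
        except Exception:
            # Provider call failed: keep the pipeline moving with local rules.
            pass
    return local(text)


def decide_state(
    result: Classification,
    threshold: float = 0.82,  # default of CITYSORT_CONFIDENCE_THRESHOLD
    force_review: FrozenSet[str] = frozenset(),
) -> str:
    """Auto-route only confident results whose type is not force-reviewed."""
    if result.doc_type in force_review:
        return "needs_review"
    if result.confidence < threshold:  # ASSUMPTION: threshold is inclusive
        return "needs_review"
    return "routed"
```

Checking the force-review list before the confidence gate means a sensitive document type goes to a human even when the classifier is very confident.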
Human-in-the-loop review
The dashboard supports operator review actions end-to-end, with a review pane that shows extracted fields, validation issues, corrected JSON fields, text preview, and per-document audit history.
- Reprocess selected documents to apply the latest rules/providers without re-upload
- Audit trail API per document: /api/documents/{id}/audit
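A small client sketch for reading a document's audit history from the endpoint above, using only the standard library. The GET method, the bearer-token header, and the JSON response are assumptions; only the path /api/documents/{id}/audit is given here.

```python
import json
from typing import Any, Optional
from urllib.parse import quote
from urllib.request import Request, urlopen


def audit_url(base_url: str, document_id: str) -> str:
    """Build the audit URL, percent-encoding the document id."""
    encoded_id = quote(str(document_id), safe="")
    return f"{base_url.rstrip('/')}/api/documents/{encoded_id}/audit"


def fetch_audit(
    base_url: str,
    document_id: str,
    token: Optional[str] = None,
    timeout: float = 10.0,
) -> Any:
    """Fetch and decode the audit trail (ASSUMPTION: GET returning JSON)."""
    request = Request(
        audit_url(base_url, document_id),
        headers={"Accept": "application/json"},
    )
    if token:
        # ASSUMPTION: bearer auth; the real scheme is not described here.
        request.add_header("Authorization", f"Bearer {token}")
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For a local run, a call might look like fetch_audit("http://localhost:8000", "123"), where the host and port are placeholders.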
Rules and routing without code changes
Rules are configurable at runtime. The UI includes a rules editor plus a form-based rules builder (types, keywords, required fields, routing) so most users never need to touch JSON.
- Rules config APIs: GET/PUT /api/config/rules, POST /api/config/rules/reset
- Department queues + analytics APIs for routing visibility
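To make the rules idea concrete, here is a sketch of a keyword-based classifier with a required-field check. The rule shape (keywords, required_fields, route_to) mirrors the builder fields listed above, but the exact JSON schema, the sample document types, and the scoring formula are all hypothetical.

```python
from typing import Any, Dict, List, Mapping, Tuple

# ASSUMPTION: illustrative rules; CitySort's real schema may differ.
RULES: Dict[str, Dict[str, Any]] = {
    "building_permit": {
        "keywords": ["permit", "construction"],
        "required_fields": ["applicant_name", "address"],
        "route_to": "Building Department",
    },
    "public_records_request": {
        "keywords": ["records", "disclosure"],
        "required_fields": ["requester_name"],
        "route_to": "City Clerk",
    },
}


def classify_by_keywords(
    text: str, rules: Mapping[str, Mapping[str, Any]]
) -> Tuple[str, float]:
    """Pick the type with the highest fraction of matched keywords."""
    lowered = text.lower()
    best_type, best_score = "other", 0.0
    for doc_type, rule in rules.items():
        keywords = rule.get("keywords", [])
        if not keywords:
            continue
        hits = sum(1 for keyword in keywords if keyword.lower() in lowered)
        score = hits / len(keywords)
        if score > best_score:
            best_type, best_score = doc_type, score
    return best_type, best_score


def missing_required_fields(
    doc_type: str, fields: Mapping[str, Any], rules: Mapping[str, Mapping[str, Any]]
) -> List[str]:
    """List required fields that are absent or empty for this doc type."""
    required = rules.get(doc_type, {}).get("required_fields", [])
    return [name for name in required if not fields.get(name)]
```

Because the rules are plain data, editing them at runtime changes classification, validation, and routing without touching the functions.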
Platform operations (enterprise-style controls)
CitySort includes platform APIs for connectivity checks, manual deployments, invitations, and API key lifecycle management.
- Connectivity: GET /api/platform/connectivity, POST /api/platform/connectivity/check
- Deployments: POST /api/platform/deployments/manual, GET /api/platform/deployments
- Invitations: POST /api/platform/invitations, GET /api/platform/invitations
- API keys: POST /api/platform/api-keys, GET /api/platform/api-keys, POST /api/platform/api-keys/{id}/revoke
- Platform summary: GET /api/platform/summary
Implemented capabilities
- FastAPI backend with SQLite persistence and audit events
- Bulk database import API (SQLite/PostgreSQL/MySQL) using a SELECT query
- Durable async job queue (worker thread + persisted job state in SQLite)
- Job APIs: list (GET /api/jobs) + detail
- Auth/RBAC APIs (bootstrap admin, login, user management)
- Web dashboard: upload, queue monitoring, analytics, review
- Dashboard topbar: Connect, Manual Deploy, Invite, API Key
- Unit tests for core pipeline logic
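The "worker thread + persisted job state in SQLite" pattern from the list above can be sketched as below. This is a generic illustration, not the contents of backend/app/jobs.py: the JobQueue class, the jobs table layout, and the status names (queued, running, completed, failed) are assumptions.

```python
import sqlite3
import threading
from typing import Callable, Optional, Tuple


class JobQueue:
    """Single worker thread; job state survives restarts when db_path is a file."""

    def __init__(self, db_path: str, handler: Callable[[str], None]) -> None:
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        self._lock = threading.Lock()  # serializes access to the shared connection
        self._handler = handler
        self._wakeup = threading.Event()
        self._stop = threading.Event()
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS jobs ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL, "
                "status TEXT NOT NULL DEFAULT 'queued', error TEXT)"
            )
            # Jobs interrupted by a crash are put back in the queue.
            self._conn.execute("UPDATE jobs SET status='queued' WHERE status='running'")
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._wakeup.set()
        self._thread.join()

    def enqueue(self, payload: str) -> int:
        with self._lock, self._conn:
            job_id = self._conn.execute(
                "INSERT INTO jobs (payload) VALUES (?)", (payload,)
            ).lastrowid
        self._wakeup.set()
        return job_id

    def status(self, job_id: int) -> Optional[str]:
        with self._lock:
            row = self._conn.execute(
                "SELECT status FROM jobs WHERE id=?", (job_id,)
            ).fetchone()
        return row[0] if row else None

    def _claim(self) -> Optional[Tuple[int, str]]:
        """Mark the oldest queued job as running and return it."""
        with self._lock, self._conn:
            row = self._conn.execute(
                "SELECT id, payload FROM jobs WHERE status='queued' ORDER BY id LIMIT 1"
            ).fetchone()
            if row:
                self._conn.execute("UPDATE jobs SET status='running' WHERE id=?", (row[0],))
            return row

    def _finish(self, job_id: int, status: str, error: Optional[str] = None) -> None:
        with self._lock, self._conn:
            self._conn.execute(
                "UPDATE jobs SET status=?, error=? WHERE id=?", (status, error, job_id)
            )

    def _run(self) -> None:
        while not self._stop.is_set():
            job = self._claim()
            if job is None:
                # Sleep until a job is enqueued; the timeout doubles as a poll.
                self._wakeup.wait(timeout=0.5)
                self._wakeup.clear()
                continue
            job_id, payload = job
            try:
                self._handler(payload)
                self._finish(job_id, "completed")
            except Exception as exc:
                self._finish(job_id, "failed", str(exc))
```

Writing the status to SQLite before and after each job is what makes the queue durable: a list endpoint such as GET /api/jobs can then read job state straight from the table.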
Key files
- backend/app/main.py: API routes + orchestration
- backend/app/pipeline.py: pipeline core
- backend/app/providers.py: Azure/OpenAI/Anthropic integrations
- backend/app/auth.py: auth, hashing, RBAC
- backend/app/jobs.py: durable background worker
- backend/app/deployments.py: deploy triggers
- backend/app/document_tasks.py: reusable task logic
- backend/app/rules.py: runtime rules + persistence
- backend/app/config.py: env-based config
- backend/tests/test_platform_api.py: platform API tests
- frontend/index.html: dashboard shell
- frontend/app.v2.js: dashboard behavior
- deploy/k8s/: Kubernetes manifests
- docker-compose.yml: local orchestration
- scripts/run_demo.sh: end-to-end demo runner
Run locally
cd citysort
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
cp .env.example .env
# Default .env values run fully local.
Safer AI rollout profile (recommended)
The system is designed so you can start rules-first, require human review, and only gradually enable external providers when connectivity and security controls are in place.
- Start with CITYSORT_CLASSIFIER_PROVIDER=rules and local OCR, then enable Anthropic for JSON classification when ready
- Keep a high confidence gate via CITYSORT_CONFIDENCE_THRESHOLD
- Force review for sensitive doc types via CITYSORT_FORCE_REVIEW_DOC_TYPES
- Require auth/RBAC: CITYSORT_REQUIRE_AUTH=true and strong secrets
- Use platform connectivity checks before routing real traffic to providers
- Keep audit trails on by default so every decision is traceable
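One way the rollout settings above might be read from the environment is sketched here. The variable names and the 0.82 default come from this page; the comma-separated format for CITYSORT_FORCE_REVIEW_DOC_TYPES, the accepted boolean spellings, and the defaults for the provider and auth flags are assumptions, as are the Settings and load_settings names.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet, Mapping


@dataclass(frozen=True)
class Settings:
    classifier_provider: str
    confidence_threshold: float
    force_review_doc_types: FrozenSet[str]
    require_auth: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Parse the rollout-related variables, validating the threshold."""
    threshold = float(env.get("CITYSORT_CONFIDENCE_THRESHOLD", "0.82"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("CITYSORT_CONFIDENCE_THRESHOLD must be between 0 and 1")

    # ASSUMPTION: doc types are given as a comma-separated list.
    raw_types = env.get("CITYSORT_FORCE_REVIEW_DOC_TYPES", "")
    doc_types = frozenset(part.strip() for part in raw_types.split(",") if part.strip())

    # ASSUMPTION: these spellings count as "true"; auth defaults to off.
    raw_auth = env.get("CITYSORT_REQUIRE_AUTH", "false").strip().lower()

    return Settings(
        # ASSUMPTION: "rules" as the default matches a fully local setup.
        classifier_provider=env.get("CITYSORT_CLASSIFIER_PROVIDER", "rules"),
        confidence_threshold=threshold,
        force_review_doc_types=doc_types,
        require_auth=raw_auth in {"1", "true", "yes"},
    )
```

Validating the threshold at startup surfaces a mistyped value before any document is auto-routed on it.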
Highlights
Provider switching + fallback
Swap OCR providers (local/Azure Document Intelligence) and classification providers (rules/OpenAI/Anthropic), and fall back to local processing when external calls fail.
Durable jobs + reprocess
Async worker with persisted job state, plus a reprocess action to apply updated rules/providers without re-upload.
Ops controls + audit trail
Connectivity checks, deploy history, API keys, invitations, and per-document audit events for traceability.
What I would show in a live demo
- Upload a document: OCR, classification, validation, routing
- Human review: corrections, audit history, approval
- Switch providers + reprocess to apply updated rules