CitySort AI MVP - Case Study

Problem

City teams handle high volumes of documents that must be extracted, classified, validated, and routed quickly. Manual triage is slow and inconsistent, and any automation still needs auditability and safe fallbacks for when OCR or LLM providers are unavailable.

What it is

CitySort AI MVP is a runnable pipeline that can ingest city documents, extract fields, classify document type, validate required fields, and route items into department queues with a human-in-the-loop review workflow.

Document lifecycle

Documents move through explicit states so operators always know what happened and what to do next: ingested, routed, needs_review, approved, corrected, failed.
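
The states can be modeled as an explicit enum with a guarded transition function. This is a minimal sketch: the state names come from the list above, but the case study does not say which moves between states are legal. The ALLOWED table below is therefore a hypothetical assumption, not CitySort's actual rule set.

```python
from enum import Enum


class DocState(str, Enum):
    """Lifecycle states named in the case study."""
    INGESTED = "ingested"
    ROUTED = "routed"
    NEEDS_REVIEW = "needs_review"
    APPROVED = "approved"
    CORRECTED = "corrected"
    FAILED = "failed"


# Hypothetical transition table: the allowed edges are an assumption.
ALLOWED = {
    DocState.INGESTED: {DocState.ROUTED, DocState.NEEDS_REVIEW, DocState.FAILED},
    DocState.NEEDS_REVIEW: {DocState.APPROVED, DocState.CORRECTED},
    DocState.ROUTED: set(),
    DocState.APPROVED: set(),
    DocState.CORRECTED: set(),
    DocState.FAILED: set(),
}


def transition(current: DocState, target: DocState) -> DocState:
    """Return the target state, or raise if the move is not in ALLOWED."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Making transitions explicit means an operator or an audit event can always name the exact step a document took.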

AI providers (Anthropic + fallback)

OCR and classification each run through a switchable provider. If credentials are missing or a provider call fails, the system automatically falls back to local processing so the pipeline keeps moving.

OCR providers: local (native text + PDF parsing), azure_di (Azure Document Intelligence)
Classification providers: rules (keyword model), openai (JSON classification), anthropic (JSON classification)
Confidence gate for auto-routing via CITYSORT_CONFIDENCE_THRESHOLD (default 0.82)
Always-human-review doc types via CITYSORT_FORCE_REVIEW_DOC_TYPES
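
The fallback and the confidence gate combine roughly as follows. This is a hedged sketch, not the code in backend/app/providers.py or pipeline.py. The classifier signature, the comma-separated format of CITYSORT_FORCE_REVIEW_DOC_TYPES, and the rule that a score equal to the threshold auto-routes are all assumptions.

```python
import os
from typing import Callable, Optional, Tuple

# Hypothetical classifier signature: text in, (doc_type, confidence) out.
Classifier = Callable[[str], Tuple[str, float]]


def classify_with_fallback(
    text: str, primary: Optional[Classifier], local: Classifier
) -> Tuple[str, float, str]:
    """Try the external provider; if it is unconfigured or fails, use the local one."""
    if primary is not None:
        try:
            doc_type, confidence = primary(text)
            return doc_type, confidence, "primary"
        except Exception:
            pass  # fall through to local processing so the pipeline keeps moving
    doc_type, confidence = local(text)
    return doc_type, confidence, "local"


def route_decision(doc_type: str, confidence: float) -> str:
    """Auto-route only above the confidence gate and only for non-forced types."""
    threshold = float(os.getenv("CITYSORT_CONFIDENCE_THRESHOLD", "0.82"))
    raw = os.getenv("CITYSORT_FORCE_REVIEW_DOC_TYPES", "")
    forced = {t.strip() for t in raw.split(",") if t.strip()}  # assumed comma-separated
    if doc_type in forced or confidence < threshold:
        return "needs_review"
    return "routed"
```

Recording which provider produced the result (the third tuple element) lets the audit trail show when a fallback happened.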

Human-in-the-loop review

The dashboard supports operator review end-to-end, with a review pane that shows extracted fields, validation issues, corrected JSON fields, a text preview, and per-document audit history.

Reprocess selected documents to apply the latest rules/providers without re-uploading
Audit trail API per document: /api/documents/{id}/audit
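
A script can read the audit trail through the same endpoint using only the standard library. This sketch makes several assumptions: the base URL, the lack of an auth header (when CITYSORT_REQUIRE_AUTH=true, the request would need credentials in a format not documented here), and the response body, which is simply decoded as JSON.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the FastAPI app is reachable at this base URL.
BASE_URL = "http://localhost:8000"


def audit_request(doc_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for /api/documents/{id}/audit, escaping the id."""
    safe_id = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(f"{base_url}/api/documents/{safe_id}/audit", method="GET")


def fetch_audit(doc_id: str, base_url: str = BASE_URL):
    """Return the decoded JSON body; its shape is not documented in this case study."""
    with urllib.request.urlopen(audit_request(doc_id, base_url), timeout=10) as resp:
        return json.load(resp)
```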

Rules and routing without code changes

Rules are configurable at runtime. The UI includes a rules editor plus a form-based rules builder (types, keywords, required fields, routing) so most users never need to touch JSON.

Rules config APIs: GET/PUT /api/config/rules, POST /api/config/rules/reset
Department queues + analytics APIs for routing visibility
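
The builder's four fields (types, keywords, required fields, routing) are enough to drive a keyword classifier, required-field validation, and queue routing. This is a minimal sketch. The RULES shape, the doc type names, the departments, and the hit-ratio score are hypothetical, and the real JSON behind /api/config/rules may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rules shape mirroring the builder fields.
RULES: Dict[str, dict] = {
    "building_permit": {
        "keywords": ["permit", "construction", "zoning"],
        "required_fields": ["applicant_name", "address"],
        "department": "Planning",
    },
    "utility_bill_dispute": {
        "keywords": ["water bill", "meter", "overcharge"],
        "required_fields": ["account_number"],
        "department": "Utilities",
    },
}


def classify(text: str, rules: Dict[str, dict]) -> Tuple[Optional[str], float]:
    """Pick the type whose keywords match best; score = fraction of keywords found."""
    lowered = text.lower()
    best_type, best_score = None, 0.0
    for doc_type, rule in rules.items():
        # Naive substring matching, e.g. "meter" would also match "parameter".
        hits = sum(1 for kw in rule["keywords"] if kw in lowered)
        score = hits / len(rule["keywords"])
        if score > best_score:
            best_type, best_score = doc_type, score
    return best_type, best_score


def validate(doc_type: str, fields: Dict[str, str], rules: Dict[str, dict]) -> List[str]:
    """Return the required fields that are missing or empty."""
    return [f for f in rules[doc_type]["required_fields"] if not fields.get(f)]


def route(doc_type: str, rules: Dict[str, dict]) -> str:
    """Map a doc type to its department queue."""
    return rules[doc_type]["department"]
```

Because the logic only reads the rules dict, editing rules at runtime changes behavior without a code change, and reprocessing applies the new rules to existing documents.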

Platform operations (enterprise-style controls)

CitySort includes platform APIs for connectivity checks, manual deployments, invitations, and API key lifecycle management.

Connectivity: GET /api/platform/connectivity, POST /api/platform/connectivity/check
Deployments: POST /api/platform/deployments/manual, GET /api/platform/deployments
Invitations: POST /api/platform/invitations, GET /api/platform/invitations
API keys: POST /api/platform/api-keys, GET /api/platform/api-keys, POST /api/platform/api-keys/{id}/revoke
Platform summary: GET /api/platform/summary
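
The API key lifecycle (create, list, revoke) is commonly implemented by showing the secret once and storing only its hash. The sketch below illustrates that pattern in memory. It is an assumption about how such endpoints could behave, not a description of CitySort's storage, and the class names and "cs_" key prefix are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ApiKeyRecord:
    key_id: str
    name: str
    key_hash: str
    revoked: bool = False


@dataclass
class ApiKeyStore:
    records: Dict[str, ApiKeyRecord] = field(default_factory=dict)

    def create(self, name: str) -> Tuple[str, str]:
        """Issue a key; the plaintext is returned once and only its hash is kept."""
        key_id = secrets.token_hex(8)
        plaintext = "cs_" + secrets.token_urlsafe(32)  # hypothetical prefix
        digest = hashlib.sha256(plaintext.encode()).hexdigest()
        self.records[key_id] = ApiKeyRecord(key_id, name, digest)
        return key_id, plaintext

    def list_keys(self) -> List[dict]:
        """List metadata only; secrets are never returned."""
        return [
            {"id": r.key_id, "name": r.name, "revoked": r.revoked}
            for r in self.records.values()
        ]

    def revoke(self, key_id: str) -> None:
        self.records[key_id].revoked = True

    def verify(self, plaintext: str) -> bool:
        """Accept a key only if its hash matches a non-revoked record."""
        digest = hashlib.sha256(plaintext.encode()).hexdigest()
        return any(
            not r.revoked and hmac.compare_digest(r.key_hash, digest)
            for r in self.records.values()
        )
```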

Implemented capabilities

FastAPI backend with SQLite persistence and audit events
Bulk database import API (SQLite/PostgreSQL/MySQL) using a SELECT query
Durable async job queue (worker thread + persisted job state in SQLite)
Job APIs: list (GET /api/jobs) + detail
Auth/RBAC APIs (bootstrap admin, login, user management)
Web dashboard: upload, queue monitoring, analytics, review
Dashboard topbar: Connect, Manual Deploy, Invite, API Key
Unit tests for core pipeline logic
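
A durable job queue along these lines keeps job state in SQLite, so queued work survives a restart, and drains it from a single worker thread. This is a simplified sketch and not backend/app/jobs.py. The table schema, the status names, the handler signature, and the choice to re-queue jobs left "running" after a crash are assumptions. Claiming a job with a SELECT followed by an UPDATE is only safe because there is a single worker.

```python
import sqlite3
import threading

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    kind TEXT NOT NULL,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    error TEXT
)
"""


class JobQueue:
    def __init__(self, path: str, handler):
        self.path, self.handler = path, handler  # handler(kind, payload)
        self._stop = threading.Event()
        self._thread = None
        self._execute(SCHEMA)
        # Jobs interrupted mid-run by a crash are re-queued on startup.
        self._execute("UPDATE jobs SET status='queued' WHERE status='running'")

    def _execute(self, sql, params=()):
        """Run one statement in its own connection and transaction."""
        conn = sqlite3.connect(self.path)
        try:
            with conn:  # commits on success, rolls back on error
                cur = conn.execute(sql, params)
                return cur.fetchall(), cur.lastrowid
        finally:
            conn.close()

    def enqueue(self, kind: str, payload: str) -> int:
        _, job_id = self._execute(
            "INSERT INTO jobs (kind, payload) VALUES (?, ?)", (kind, payload))
        return job_id

    def status(self, job_id: int) -> str:
        rows, _ = self._execute("SELECT status FROM jobs WHERE id=?", (job_id,))
        return rows[0][0]

    def run_once(self) -> bool:
        """Process the oldest queued job; return False when the queue is empty."""
        rows, _ = self._execute(
            "SELECT id, kind, payload FROM jobs WHERE status='queued' ORDER BY id LIMIT 1")
        if not rows:
            return False
        job_id, kind, payload = rows[0]
        self._execute("UPDATE jobs SET status='running' WHERE id=?", (job_id,))
        try:
            self.handler(kind, payload)
            self._execute("UPDATE jobs SET status='succeeded' WHERE id=?", (job_id,))
        except Exception as exc:
            self._execute("UPDATE jobs SET status='failed', error=? WHERE id=?",
                          (str(exc), job_id))
        return True

    def _loop(self):
        while not self._stop.is_set():
            if not self.run_once():
                self._stop.wait(0.2)  # idle poll interval

    def start(self):
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```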

Key files

backend/app/main.py: API routes + orchestration
backend/app/pipeline.py: pipeline core
backend/app/providers.py: Azure/OpenAI/Anthropic integrations
backend/app/auth.py: auth, hashing, RBAC
backend/app/jobs.py: durable background worker
backend/app/deployments.py: deploy triggers
backend/app/document_tasks.py: reusable task logic
backend/app/rules.py: runtime rules + persistence
backend/app/config.py: env-based config
backend/tests/test_platform_api.py: platform API tests
frontend/index.html: dashboard shell
frontend/app.v2.js: dashboard behavior
deploy/k8s/: Kubernetes manifests
docker-compose.yml: local orchestration
scripts/run_demo.sh: end-to-end demo runner

Run locally

cd citysort
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt

cp .env.example .env
# Default .env values run fully locally.

Safer AI rollout profile (recommended)

The system is designed so you can start rules-first, require human review, and only gradually enable external providers when connectivity and security controls are in place.

Start with CITYSORT_CLASSIFIER_PROVIDER=rules and local OCR, then enable Anthropic for JSON classification when ready
Keep a high confidence gate via CITYSORT_CONFIDENCE_THRESHOLD
Force review for sensitive doc types via CITYSORT_FORCE_REVIEW_DOC_TYPES
Require auth/RBAC: CITYSORT_REQUIRE_AUTH=true and strong secrets
Use platform connectivity checks before routing real traffic to providers
Keep audit trails on by default so every decision is traceable
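
These profile settings can be checked at startup. The sketch below reads the environment variables named above and flags settings that drift from the recommended profile. It is not backend/app/config.py. The defaults for CITYSORT_CLASSIFIER_PROVIDER and CITYSORT_REQUIRE_AUTH, the comma-separated doc-type list, and the warning rules are assumptions. Only the 0.82 threshold default comes from this case study.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet, List, Mapping


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RolloutConfig:
    classifier_provider: str
    confidence_threshold: float
    force_review_doc_types: FrozenSet[str]
    require_auth: bool


def load_config(env: Mapping[str, str] = os.environ) -> RolloutConfig:
    raw_types = env.get("CITYSORT_FORCE_REVIEW_DOC_TYPES", "")
    return RolloutConfig(
        classifier_provider=env.get("CITYSORT_CLASSIFIER_PROVIDER", "rules"),  # assumed default
        confidence_threshold=float(env.get("CITYSORT_CONFIDENCE_THRESHOLD", "0.82")),
        force_review_doc_types=frozenset(t.strip() for t in raw_types.split(",") if t.strip()),
        require_auth=_as_bool(env.get("CITYSORT_REQUIRE_AUTH", "false")),  # assumed default
    )


def rollout_warnings(cfg: RolloutConfig) -> List[str]:
    """Hypothetical checks against the safer rollout profile."""
    warnings = []
    if cfg.classifier_provider not in {"rules", "openai", "anthropic"}:
        warnings.append(f"unknown classifier provider: {cfg.classifier_provider}")
    if cfg.confidence_threshold < 0.82:
        warnings.append("confidence gate is below the 0.82 default")
    if not cfg.force_review_doc_types:
        warnings.append("no doc types forced to human review")
    if not cfg.require_auth:
        warnings.append("CITYSORT_REQUIRE_AUTH is off")
    return warnings
```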

Highlights

Provider switching + fallback

Swap OCR providers (local/Azure) and classification providers (rules/OpenAI/Anthropic), and fall back to local processing when external calls fail.

Durable jobs + reprocess

Async worker with persisted job state, plus a reprocess action that applies updated rules/providers without re-uploading.

Ops controls + audit trail

Connectivity checks, deploy history, API keys, invitations, and per-document audit events for traceability.

What I would show in a live demo

Upload a document: OCR, classification, validation, routing
Human review: corrections, audit history, approval
Switch providers + reprocess to apply updated rules
