CitySort

A runnable AI MVP to ingest, OCR, classify, validate, and route city documents with human-in-the-loop review.

Problem

City teams handle high volumes of documents that must be extracted, classified, validated, and routed quickly. Manual triage is slow and inconsistent, and teams still need auditability and safe fallbacks when OCR/LLM providers are unavailable.

What it is

CitySort AI MVP is a runnable pipeline that can ingest city documents, extract fields, classify document type, validate required fields, and route items into department queues with a human-in-the-loop review workflow.

Document lifecycle

Documents move through explicit states so operators always know what happened and what to do next: ingested, routed, needs_review, approved, corrected, failed.
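
A minimal sketch of how these states could be modeled. The six state names come from the list above; the transition table is an assumption made for illustration and is not taken from the CitySort code.

```python
from enum import Enum


class DocStatus(str, Enum):
    """The six lifecycle states named in the text."""
    INGESTED = "ingested"
    ROUTED = "routed"
    NEEDS_REVIEW = "needs_review"
    APPROVED = "approved"
    CORRECTED = "corrected"
    FAILED = "failed"


# Hypothetical transition table (assumption): which states may follow which.
ALLOWED_TRANSITIONS = {
    DocStatus.INGESTED: {DocStatus.ROUTED, DocStatus.NEEDS_REVIEW, DocStatus.FAILED},
    DocStatus.ROUTED: {DocStatus.NEEDS_REVIEW},
    DocStatus.NEEDS_REVIEW: {DocStatus.APPROVED, DocStatus.CORRECTED},
    DocStatus.APPROVED: set(),
    DocStatus.CORRECTED: set(),
    DocStatus.FAILED: {DocStatus.INGESTED},  # assumed: reprocessing restarts the document
}


def can_transition(current: DocStatus, new: DocStatus) -> bool:
    """Return True if the assumed table allows moving from current to new."""
    return new in ALLOWED_TRANSITIONS[current]
```

Keeping the states in one enum means a typo such as "need_review" fails loudly when the value is parsed instead of silently creating a seventh state.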

AI providers (Anthropic + fallback)

OCR and classification are provider-switched. If credentials are missing or calls fail, the system automatically falls back to local processing so the pipeline keeps moving.

  • OCR providers: local (native text + PDF parsing), azure_di (Azure Document Intelligence)
  • Classification providers: rules (keyword model), openai (JSON classification), anthropic (JSON classification)
  • Confidence gate for auto-routing via CITYSORT_CONFIDENCE_THRESHOLD (default 0.82)
  • Always-human-review doc types via CITYSORT_FORCE_REVIEW_DOC_TYPES
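
The fallback and the confidence gate can be sketched roughly as follows. The function names, the `Classification` shape, and the "permit" doc type are hypothetical; only the 0.82 default and the idea of a forced-review list come from the text. Whether a score exactly at the threshold auto-routes is also an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    doc_type: str
    confidence: float
    provider: str


def classify_with_fallback(text, primary, fallback):
    """Try the primary provider; on any error, use the local fallback."""
    try:
        return primary(text)
    except Exception:
        return fallback(text)


def next_status(result, threshold=0.82, force_review=()):
    """Auto-route only confident results for doc types not forced into review."""
    if result.doc_type in force_review or result.confidence < threshold:
        return "needs_review"
    return "routed"


# Stand-ins for real providers (hypothetical).
def remote_down(text):
    raise ConnectionError("provider unavailable")


def local_rules(text):
    doc_type = "permit" if "permit" in text.lower() else "other"
    return Classification(doc_type, 0.9, "rules")


result = classify_with_fallback("Building permit application", remote_down, local_rules)
print(result.provider, next_status(result))                          # rules routed
print(next_status(result, force_review={"permit"}))                  # needs_review
```

The gate is deliberately separate from the provider call, so the same routing decision applies no matter which provider produced the classification.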

Human-in-the-loop review

The dashboard supports operator review actions end-to-end, with a review pane that shows extracted fields, validation issues, corrected JSON fields, text preview, and per-document audit history.

  • Reprocess selected documents to apply the latest rules/providers without re-upload
  • Audit trail API per document: /api/documents/{id}/audit
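
A small client sketch for the audit endpoint. The route is the one listed above; the base URL, the assumption that the endpoint answers a plain GET with JSON, and the absence of an auth header are all guesses for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def audit_url(base_url, document_id):
    """Build the per-document audit URL from the route shown above."""
    doc = quote(str(document_id), safe="")
    return f"{base_url.rstrip('/')}/api/documents/{doc}/audit"


def fetch_audit(base_url, document_id, timeout=10):
    # Assumes a plain GET returning a JSON body; a real deployment with
    # auth enabled would also need credentials, which are omitted here.
    with urlopen(audit_url(base_url, document_id), timeout=timeout) as resp:
        return json.load(resp)


# http://localhost:8000 is a placeholder host, not a documented default.
print(audit_url("http://localhost:8000/", "doc 42"))
# http://localhost:8000/api/documents/doc%2042/audit
```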

Rules and routing without code changes

Rules are configurable at runtime. The UI includes a rules editor plus a form-based rules builder (types, keywords, required fields, routing) so most users never need to touch JSON.

  • Rules config APIs: GET/PUT /api/config/rules, POST /api/config/rules/reset
  • Department queues + analytics APIs for routing visibility
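
To make the idea of runtime rules concrete, here is a sketch of a keyword classifier driven by a rules dictionary. The text says rules cover types, keywords, required fields, and routing; the exact JSON shape, the doc types, and the department names below are hypothetical.

```python
# Hypothetical rules payload: one entry per document type.
RULES = {
    "building_permit": {
        "keywords": ["permit", "construction", "contractor"],
        "required_fields": ["applicant_name", "address"],
        "department": "Building Department",
    },
    "records_request": {
        "keywords": ["public records", "records request"],
        "required_fields": ["requester_name"],
        "department": "City Clerk",
    },
}


def classify_by_keywords(text, rules):
    """Pick the doc type with the most keyword hits; 'other' if none match."""
    lowered = text.lower()
    best_type, best_hits = "other", 0
    for doc_type, rule in rules.items():
        hits = sum(1 for kw in rule["keywords"] if kw in lowered)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type


def missing_fields(fields, rule):
    """List required fields that are absent or empty in the extracted fields."""
    return [name for name in rule["required_fields"] if not fields.get(name)]


doc_type = classify_by_keywords("Permit for construction at 12 Main St", RULES)
print(doc_type)                                                       # building_permit
print(missing_fields({"applicant_name": "A. Lee"}, RULES[doc_type]))  # ['address']
```

Because the rules are plain data, editing them through the UI or the config API changes behavior on the next run or reprocess without a deploy.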

Platform operations (enterprise-style controls)

CitySort includes platform APIs for connectivity checks, manual deployments, invitations, and API key lifecycle management.

  • Connectivity: GET /api/platform/connectivity, POST /api/platform/connectivity/check
  • Deployments: POST /api/platform/deployments/manual, GET /api/platform/deployments
  • Invitations: POST /api/platform/invitations, GET /api/platform/invitations
  • API keys: POST /api/platform/api-keys, GET /api/platform/api-keys, POST /api/platform/api-keys/{id}/revoke
  • Platform summary: GET /api/platform/summary

Implemented capabilities

  • FastAPI backend with SQLite persistence and audit events
  • Bulk database import API (SQLite/PostgreSQL/MySQL) using a SELECT query
  • Durable async job queue (worker thread + persisted job state in SQLite)
  • Job APIs: list (GET /api/jobs) + detail
  • Auth/RBAC APIs (bootstrap admin, login, user management)
  • Web dashboard: upload, queue monitoring, analytics, review
  • Dashboard topbar: Connect, Manual Deploy, Invite, API Key
  • Unit tests for core pipeline logic
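
The durable job queue item above (worker thread plus job state persisted in SQLite) can be illustrated with a sketch like this one. The table layout, status names, and class are assumptions for illustration, not the contents of backend/app/jobs.py. It assumes a single worker, so claiming a job needs no locking, and it does not recover jobs left in the 'running' state after a crash.

```python
import os
import sqlite3
import tempfile
import threading


class JobQueue:
    """Single-worker queue whose job state lives in a SQLite file."""

    def __init__(self, db_path):
        self.db_path = db_path
        self._stop = threading.Event()
        self._thread = None
        self._execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "status TEXT NOT NULL DEFAULT 'queued', "
            "error TEXT)"
        )

    def _execute(self, sql, params=()):
        # One short-lived connection per statement keeps thread usage simple.
        conn = sqlite3.connect(self.db_path)
        try:
            with conn:  # commits on success, rolls back on error
                cur = conn.execute(sql, params)
                return cur.lastrowid, cur.fetchall()
        finally:
            conn.close()

    def enqueue(self, payload):
        job_id, _ = self._execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
        return job_id

    def status(self, job_id):
        _, rows = self._execute("SELECT status FROM jobs WHERE id = ?", (job_id,))
        return rows[0][0] if rows else None

    def run_once(self, handler):
        """Process the oldest queued job; return False if none is waiting."""
        _, rows = self._execute(
            "SELECT id, payload FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        )
        if not rows:
            return False
        job_id, payload = rows[0]
        self._execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
        try:
            handler(payload)
        except Exception as exc:
            self._execute(
                "UPDATE jobs SET status = 'failed', error = ? WHERE id = ?",
                (str(exc), job_id),
            )
        else:
            self._execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
        return True

    def start(self, handler):
        def loop():
            while not self._stop.is_set():
                if not self.run_once(handler):
                    self._stop.wait(0.1)  # idle briefly when the queue is empty

        self._thread = threading.Thread(target=loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


def handle(payload):
    if payload == "bad":
        raise ValueError("cannot process")


db_path = os.path.join(tempfile.mkdtemp(), "jobs.db")
queue = JobQueue(db_path)
ok_id = queue.enqueue("doc-1")
bad_id = queue.enqueue("bad")
while queue.run_once(handle):
    pass

# A fresh instance on the same file still sees the recorded outcomes.
reopened = JobQueue(db_path)
print(reopened.status(ok_id), reopened.status(bad_id))  # done failed
```

The demo drains the queue synchronously with `run_once` so the output is deterministic; `start(handle)` runs the same loop on a background thread.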

Key files

  • backend/app/main.py: API routes + orchestration
  • backend/app/pipeline.py: pipeline core
  • backend/app/providers.py: Azure/OpenAI/Anthropic integrations
  • backend/app/auth.py: auth, hashing, RBAC
  • backend/app/jobs.py: durable background worker
  • backend/app/deployments.py: deploy triggers
  • backend/app/document_tasks.py: reusable task logic
  • backend/app/rules.py: runtime rules + persistence
  • backend/app/config.py: env-based config
  • backend/tests/test_platform_api.py: platform API tests
  • frontend/index.html: dashboard shell
  • frontend/app.v2.js: dashboard behavior
  • deploy/k8s/: Kubernetes manifests
  • docker-compose.yml: local orchestration
  • scripts/run_demo.sh: end-to-end demo runner

Run locally

cd citysort
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt

cp .env.example .env
# Default .env values run fully local.

Safer AI rollout profile (recommended)

The system is designed so you can start rules-first, require human review, and only gradually enable external providers when connectivity and security controls are in place.

  • Start with CITYSORT_CLASSIFIER_PROVIDER=rules and local OCR, then enable Anthropic for JSON classification when ready
  • Keep a high confidence gate via CITYSORT_CONFIDENCE_THRESHOLD
  • Force review for sensitive doc types via CITYSORT_FORCE_REVIEW_DOC_TYPES
  • Require auth/RBAC: CITYSORT_REQUIRE_AUTH=true and strong secrets
  • Use platform connectivity checks before routing real traffic to providers
  • Keep audit trails on by default so every decision is traceable
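
A sketch of how the rollout settings above might be read from the environment. The four variable names and the 0.82 threshold default appear in the text; the other defaults, the comma-separated format for doc types, and the accepted truthy strings are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    classifier_provider: str
    confidence_threshold: float
    force_review_doc_types: frozenset
    require_auth: bool


def load_settings(env=os.environ):
    """Read rollout settings from a mapping (defaults to the process env)."""
    # Assumed format: comma-separated doc type names.
    raw_types = env.get("CITYSORT_FORCE_REVIEW_DOC_TYPES", "")
    # Assumed default: auth off unless explicitly enabled.
    raw_auth = env.get("CITYSORT_REQUIRE_AUTH", "false")
    return Settings(
        classifier_provider=env.get("CITYSORT_CLASSIFIER_PROVIDER", "rules"),
        confidence_threshold=float(env.get("CITYSORT_CONFIDENCE_THRESHOLD", "0.82")),
        force_review_doc_types=frozenset(
            t.strip() for t in raw_types.split(",") if t.strip()
        ),
        require_auth=raw_auth.strip().lower() in {"1", "true", "yes"},
    )


# Hypothetical doc type names, used only to show the parsing.
settings = load_settings({
    "CITYSORT_FORCE_REVIEW_DOC_TYPES": "records_request, court_order",
    "CITYSORT_REQUIRE_AUTH": "true",
})
print(settings.classifier_provider, settings.confidence_threshold, settings.require_auth)
# rules 0.82 True
```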

Highlights

Provider switching + fallback

Swap OCR providers (local/Azure Document Intelligence) and classification providers (rules/OpenAI/Anthropic), and fall back to local processing when external calls fail.

Durable jobs + reprocess

Async worker with persisted job state, plus a reprocess action to apply updated rules/providers without re-upload.

Ops controls + audit trail

Connectivity checks, deploy history, API keys, invitations, and per-document audit events for traceability.

What I would show in a live demo

  1. Upload a document: OCR, classification, validation, routing
  2. Human review: corrections, audit history, approval
  3. Switch providers + reprocess to apply updated rules
