OfflineAI
A production-grade offline-first iOS app running small language models completely on-device, with privacy-first design and intelligent resource management.
What it is
OfflineAI is a privacy-first iOS app that runs small language models completely offline. It has no cloud dependencies and incurs no API costs, yet stays fast and usable on real devices.
Features
- 100% offline: all inference runs locally on device.
- Zero API costs: no cloud dependencies.
- Complete privacy: conversations are encrypted at rest with AES-256, and no telemetry is collected.
- Intelligent resource management: device-dependent choice of 4-bit or 8-bit quantised models, on-demand model downloads, LRU model caching with automatic unloading, memory pressure monitoring, and battery-aware processing.
- Context management: semantic chunking with embedding-based relevance.
- Offline-first sync (optional): encrypted cloud sync; the app remains fully usable without it.
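Embedding-based relevance can be sketched as ranking pre-computed chunk embeddings by cosine similarity to the query embedding, then greedily keeping the best chunks that fit a token budget. This is a minimal sketch under assumptions: the `Chunk` type, its `tokenCount` field and the greedy budget rule are hypothetical, and the embeddings are assumed to come from the app's embedding model.

```swift
/// A chunk of conversation or document text with a precomputed embedding.
/// Hypothetical type for illustration.
struct Chunk {
    let text: String
    let embedding: [Float]
    let tokenCount: Int
}

/// Cosine similarity between two equal-length vectors; returns 0 for zero vectors.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embedding dimensions must match")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denom = normA.squareRoot() * normB.squareRoot()
    return denom > 0 ? dot / denom : 0
}

/// Ranks chunks by relevance to the query, then greedily keeps the most
/// relevant ones whose combined token count fits within `tokenBudget`.
func selectContext(query: [Float], chunks: [Chunk], tokenBudget: Int) -> [Chunk] {
    // Score each chunk once, then sort by descending similarity.
    let ranked = chunks
        .map { (chunk: $0, score: cosineSimilarity(query, $0.embedding)) }
        .sorted { $0.score > $1.score }
    var selected: [Chunk] = []
    var used = 0
    for item in ranked where used + item.chunk.tokenCount <= tokenBudget {
        selected.append(item.chunk)
        used += item.chunk.tokenCount
    }
    return selected
}
```

The greedy rule skips a chunk that would overflow the budget but keeps scanning, so a smaller, less relevant chunk can still fill the remaining space.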
Architecture
The app is split into a few core components that keep inference reliable under real-world constraints (memory, battery, and device performance variability).
Model management
Lazy loading with an LRU cache, automatic unloading under memory pressure, preloading while the device is idle and charging, and support for multiple quantisation levels.
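The lazy-loading LRU cache can be sketched as a byte-bounded dictionary plus a recency list. `LoadedModel`, `ModelCache` and the "keep only the most recent model on a memory warning" rule are hypothetical choices for illustration. On iOS the pressure signal could come from `UIApplication.didReceiveMemoryWarningNotification` or a `DispatchSource.makeMemoryPressureSource` handler; that wiring is omitted here.

```swift
/// Placeholder for a loaded model handle (e.g. weights mapped into memory).
/// Hypothetical type: a real app would wrap its inference runtime here.
final class LoadedModel {
    let id: String
    let sizeBytes: Int
    init(id: String, sizeBytes: Int) {
        self.id = id
        self.sizeBytes = sizeBytes
    }
}

/// Least-recently-used cache of loaded models, bounded by a byte budget.
final class ModelCache {
    private var models: [String: LoadedModel] = [:]
    private var recency: [String] = []   // least recently used first
    private(set) var usedBytes = 0
    let budgetBytes: Int
    private let loader: (String) -> LoadedModel

    init(budgetBytes: Int, loader: @escaping (String) -> LoadedModel) {
        self.budgetBytes = budgetBytes
        self.loader = loader
    }

    /// Returns a cached model or lazily loads it, evicting LRU entries to make room.
    /// A model larger than the whole budget is still loaded once the cache is empty.
    func model(for id: String) -> LoadedModel {
        if let cached = models[id] {
            touch(id)
            return cached
        }
        let model = loader(id)
        while usedBytes + model.sizeBytes > budgetBytes, let victim = recency.first {
            unload(victim)
        }
        models[id] = model
        recency.append(id)
        usedBytes += model.sizeBytes
        return model
    }

    /// On a memory warning, unload everything except the most recently used model.
    func handleMemoryPressure() {
        for id in recency.dropLast() { unload(id) }
    }

    var loadedIDs: [String] { recency }

    /// Marks a model as most recently used.
    private func touch(_ id: String) {
        recency.removeAll { $0 == id }
        recency.append(id)
    }

    private func unload(_ id: String) {
        guard let model = models.removeValue(forKey: id) else { return }
        recency.removeAll { $0 == id }
        usedBytes -= model.sizeBytes
    }
}
```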
Device profiling
Memory detection/monitoring, battery state tracking, automatic model selection, and performance benchmarking.
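Battery state tracking can feed simple policies such as preloading only while the device is idle and charging. In this minimal sketch the `BatteryState` enum mirrors the cases of UIKit's `UIDevice.BatteryState` (`unknown`, `unplugged`, `charging`, `full`) so the logic can be tested without UIKit; the `DeviceProfile` type, the 60-second idle threshold and the 20% low-battery cutoff are assumed values, not taken from the app.

```swift
import Foundation

/// Mirrors UIKit's UIDevice.BatteryState so the policy can be tested off-device.
enum BatteryState { case unknown, unplugged, charging, full }

/// Snapshot of the device conditions the scheduler cares about (hypothetical type).
struct DeviceProfile {
    let physicalMemoryBytes: UInt64
    let batteryLevel: Float        // 0.0...1.0; UIDevice reports -1 when unknown
    let batteryState: BatteryState
    let secondsSinceLastInteraction: TimeInterval

    /// Assumed threshold: treat the app as idle after 60 s without user input.
    var isIdle: Bool { secondsSinceLastInteraction >= 60 }

    var isOnPower: Bool { batteryState == .charging || batteryState == .full }

    /// Preload models only when idle and on external power.
    var shouldPreload: Bool { isIdle && isOnPower }

    /// Assumed cutoff: defer optional work below 20% on battery.
    /// An unknown level (-1) is treated as low, which errs on the side of saving power.
    var shouldDeferBackgroundWork: Bool { !isOnPower && batteryLevel < 0.2 }

    /// Reads physical memory from ProcessInfo; battery values are passed in
    /// (on iOS they would come from UIDevice.current).
    static func current(batteryLevel: Float, batteryState: BatteryState,
                        secondsSinceLastInteraction: TimeInterval) -> DeviceProfile {
        DeviceProfile(physicalMemoryBytes: ProcessInfo.processInfo.physicalMemory,
                      batteryLevel: batteryLevel,
                      batteryState: batteryState,
                      secondsSinceLastInteraction: secondsSinceLastInteraction)
    }
}
```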
Inference engine
Streaming token generation, batch processing for efficiency, and embedding generation.
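Streaming token generation can be sketched as a decode loop that hands each token to the UI as soon as it is produced and stops at end of sequence, at a token limit, or when the caller cancels. `TokenSource`, `streamGenerate` and `ScriptedSource` are hypothetical names; `TokenSource` stands in for the real inference runtime and is not the API of any specific library.

```swift
/// Hypothetical abstraction over the inference runtime: returns the next
/// decoded token piece, or nil at end of sequence.
protocol TokenSource {
    mutating func nextToken() -> String?
}

/// Runs the decode loop, delivering each token to `onToken` as it is produced.
/// `onToken` returns false to cancel (e.g. the user tapped Stop).
/// Returns the full generated text.
@discardableResult
func streamGenerate<S: TokenSource>(from source: inout S, maxTokens: Int,
                                    onToken: (String) -> Bool) -> String {
    var output = ""
    for _ in 0..<max(0, maxTokens) {
        guard let token = source.nextToken() else { break }  // end of sequence
        output += token
        if !onToken(token) { break }                         // caller cancelled
    }
    return output
}

/// Scripted source for exercising the loop without a model.
struct ScriptedSource: TokenSource {
    var tokens: [String]
    mutating func nextToken() -> String? {
        tokens.isEmpty ? nil : tokens.removeFirst()
    }
}
```

A SwiftUI front end would more likely wrap the same loop in an `AsyncStream`; the callback form is used here only to keep the sketch synchronous.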
Data layer
SwiftData local persistence with AES-256-GCM encryption and per-conversation encryption keys.
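Per-conversation AES-256-GCM encryption can be sketched with Apple's CryptoKit (the `Crypto` module from swift-crypto exposes the same API off Apple platforms). How the per-conversation keys are produced is not specified above; this sketch assumes they are derived with HKDF-SHA256 from one master key, using the conversation ID as the `info` input. `ConversationCipher`, `CipherError` and the salt string are hypothetical, and keeping the master key in the Keychain is an assumption.

```swift
import Foundation
#if canImport(CryptoKit)
import CryptoKit
#else
import Crypto
#endif

enum CipherError: Error { case missingCombinedRepresentation }

/// Encrypts conversation payloads with a key unique to each conversation.
struct ConversationCipher {
    /// 256-bit master key; assumed to be stored in the Keychain in a real app.
    let masterKey: SymmetricKey

    /// Derives a per-conversation 256-bit key with HKDF-SHA256 (assumed scheme).
    func key(for conversationID: UUID) -> SymmetricKey {
        HKDF<SHA256>.deriveKey(inputKeyMaterial: masterKey,
                               salt: Data("OfflineAI.conversation".utf8),  // hypothetical salt
                               info: Data(conversationID.uuidString.utf8),
                               outputByteCount: 32)
    }

    /// Seals plaintext with AES-256-GCM; output is nonce || ciphertext || tag.
    func encrypt(_ plaintext: Data, conversationID: UUID) throws -> Data {
        let box = try AES.GCM.seal(plaintext, using: key(for: conversationID))
        // `combined` is non-nil whenever the default 12-byte nonce is used.
        guard let combined = box.combined else { throw CipherError.missingCombinedRepresentation }
        return combined
    }

    /// Opens a combined AES-GCM box; throws if the data was tampered with or the key is wrong.
    func decrypt(_ combined: Data, conversationID: UUID) throws -> Data {
        let box = try AES.GCM.SealedBox(combined: combined)
        return try AES.GCM.open(box, using: key(for: conversationID))
    }
}
```

Deriving keys rather than storing one per conversation means only the master key needs protecting, while a leaked conversation key still exposes just that conversation.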
Supported models
Phi-3-Mini (recommended)
- Parameters: 3.8B
- Context: 4096 tokens
- Quantisations: Q4_0 (~2.1 GB), Q8_0 (~3.9 GB)
- Best for: modern devices (iPhone 12+)
TinyLlama (fallback)
- Parameters: 1.1B
- Context: 2048 tokens
- Quantisation: Q4_0 (~0.6 GB)
- Best for: older devices or low memory
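Automatic model selection can be sketched as picking the largest listed variant whose file fits a memory budget. The sizes are the approximate figures above; the `ModelVariant` type, the 60% budget fraction and the rule of skipping 8-bit variants in low-power conditions are assumptions for illustration. A real budget would also need headroom for the KV cache and the rest of the app.

```swift
/// One downloadable model file (hypothetical type). Sizes are the approximate figures listed above.
struct ModelVariant: Equatable {
    let name: String
    let quantisation: String
    let sizeGB: Double
}

let catalogue: [ModelVariant] = [
    ModelVariant(name: "Phi-3-Mini", quantisation: "Q8_0", sizeGB: 3.9),
    ModelVariant(name: "Phi-3-Mini", quantisation: "Q4_0", sizeGB: 2.1),
    ModelVariant(name: "TinyLlama", quantisation: "Q4_0", sizeGB: 0.6),
]

/// Picks the largest variant that fits in an assumed fraction of physical memory.
/// In low-power conditions 8-bit variants are skipped (assumption: 4-bit is cheaper to run).
/// Returns nil if nothing fits.
func selectModel(physicalMemoryGB: Double, lowPower: Bool,
                 budgetFraction: Double = 0.6) -> ModelVariant? {
    let budget = physicalMemoryGB * budgetFraction
    return catalogue
        .filter { !(lowPower && $0.quantisation == "Q8_0") }
        .sorted { $0.sizeGB > $1.sizeGB }
        .first(where: { $0.sizeGB <= budget })
}
```

With these assumptions a 4 GB device gets Phi-3-Mini Q4_0, an 8 GB device gets Q8_0 unless it is in low-power mode, and a 2 GB device falls back to TinyLlama.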
What I would demo
- On-device model selection (Phi-3 vs TinyLlama)
- Quantisation switching (4-bit/8-bit) based on device profile
- Memory pressure handling and LRU cache eviction
- Encrypted local storage for conversations