Projects
[FastAPI, Celery, PostgreSQL, PGVector, MinIO, PaddleOCR-VL, vLLM, Nginx, Docker]
- Architected full migration off AWS (Textract, Bedrock, Lambda, DynamoDB, S3) for an enterprise AI platform processing 1,000 engineering PDFs/day.
- Replaced 14 AWS services with a single Docker host; delivered risk-free migration plan: $132k → $7.7k/yr (94% reduction), 14-day payback.
[Go, llama.cpp, turboquant, SQLite, OpenAI API]
- Single Go binary for self-hosted AI model serving — OpenAI-compatible /v1/ API, web UI with HuggingFace search, systemd setup, self-update.
- Dual backend (turboquant for SafeTensors, llama-server for GGUF); RAM-aware quant selection reads /proc/meminfo to pick optimal tier at load time.
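The RAM-aware selection described above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's Go code: the tier table, its RAM figures, the `headroom` factor, and all function names are assumptions made for the example. Only the `MemAvailable` field of `/proc/meminfo` is a real Linux interface.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical tiers: (quant label, approximate RAM needed in GiB), largest first.
# The labels are common GGUF quant names; the RAM figures are made up for illustration.
QUANT_TIERS: Sequence[Tuple[str, float]] = (
    ("Q8_0", 16.0),
    ("Q5_K_M", 10.0),
    ("Q4_K_M", 8.0),
    ("Q2_K", 4.0),
)


def parse_mem_available_kib(meminfo_text: str) -> Optional[int]:
    """Return the MemAvailable value in KiB from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   16777216 kB"
            return int(line.split()[1])
    return None


def pick_quant_tier(
    available_gib: float,
    tiers: Sequence[Tuple[str, float]] = QUANT_TIERS,
    headroom: float = 0.8,
) -> Optional[str]:
    """Pick the largest tier that fits in a fraction (headroom) of available RAM."""
    budget = available_gib * headroom
    for label, needed_gib in tiers:
        if needed_gib <= budget:
            return label
    return None  # nothing fits; the caller decides how to fail


def read_available_gib(path: str = "/proc/meminfo") -> float:
    """Read available memory in GiB from a Linux meminfo file."""
    with open(path) as f:
        kib = parse_mem_available_kib(f.read())
    if kib is None:
        raise RuntimeError("MemAvailable not found in " + path)
    return kib / (1024 * 1024)
```

On a Linux host, `pick_quant_tier(read_available_gib())` would return a tier label, or `None` when even the smallest tier exceeds the budget.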
[TypeScript, Node.js, Vite, Transformers.js, e5-small-v2]
- npm CLI that converts markdown docs to static sites with client-side semantic search — no backend, no server, deploys anywhere.
- Generates 384-dim embeddings at build time into vector-db.json; browser runs cosine similarity via Transformers.js entirely offline.
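The client-side ranking step above amounts to scoring a query embedding against each precomputed vector. The sketch below shows that step in Python for illustration only; the real tool runs in the browser, and the entry shape (`id`, `embedding`) and function names are assumptions rather than the actual `vector-db.json` schema. Toy 3-dim vectors stand in for the 384-dim embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], entries: List[Dict], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k entries most similar to the query as (id, score), best first."""
    scored = [
        (entry["id"], cosine_similarity(query_vec, entry["embedding"]))
        for entry in entries
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical index entries with toy vectors.
entries = [
    {"id": "install.md", "embedding": [1.0, 0.0, 0.0]},
    {"id": "faq.md", "embedding": [0.0, 1.0, 0.0]},
    {"id": "usage.md", "embedding": [1.0, 1.0, 0.0]},
]
results = top_k([1.0, 0.0, 0.0], entries, k=2)
```

A linear scan like this is usually adequate for documentation-sized indexes, which is part of why no backend is needed.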
[GitHub Actions, Python, Gemini 2.5 Flash]
- Composite GitHub Action — diffs PRs or recent commits against all markdown, opens a PR when docs are factually stale (renamed APIs, changed flags, removed features).
- Zero infrastructure, human-in-the-loop (never auto-merges), ignores style/typos. Works on any repo with a one-file workflow drop-in.
[Python, FastAPI, llama.cpp, SQLCipher, ChromaDB]
- Air-gapped AI journal — SQLCipher-encrypted, zero telemetry, on-device RAG with ChromaDB and quantized GGUF models. No data leaves the machine.