AWS → Self-Hosted

Enterprise AI platform. 94 % cost cut. 14-day pay-back.

The Problem

Client spending $132 k / year on AWS (Textract, Bedrock, Lambda, Dynamo, S3, etc.) to chew through 1 000 engineering PDFs a day. OCR alone = 61 % of the bill.

3-Day Diagnosis

Textract → PaddleOCR-VL on a consumer GPU (12 pages / min, $0)
Bedrock Claude → vLLM + 14 B open model (75-90 % quality, $0)
14 other AWS services → single Ubuntu box + Docker

Risk-Free Route

Cloud GPU try-out – RunPod / Lambda, $700 / mo × 2 mo = $1.4 k
Prove quality – A / B vs Claude on real docs; abort if < 90 %
Buy box only if happy – 3× RTX 5060 Ti, 128 GB RAM, $4.9 k
Parallel run 2 weeks – DNS flip when metrics green

What Stays the Same

API shape, S3-compatible buckets, Postgres instead of Dynamo, JWT instead of Cognito. Users barely notice.

What Drops

Period	Old Bill	New Bill	Savings
Year 1	$132 k	$7.7 k	94 %
Year 2+	$132 k	$2.8 k	98 %

Tech Stack (TL;DR)

FastAPI + Celery, PostgreSQL + PGVector, MinIO, PaddleOCR-VL, vLLM, Nginx, Docker. One 650 W box, $71 / mo power.

If It Breaks

Spare GPU & PSU on shelf, nightly encrypted backups to external drive, feature-flagged fallback to AWS in 5 min.

Next Step

Send 50 ugly PDFs, pick a cloud GPU, validate in 2 weeks. No hardware risk.

Download Full Plan

Download complete migration plan (Markdown) - Includes detailed technical specifications, code samples, deployment instructions, and operational procedures.

Document Version: 1.0 | Last Updated: 2026-01-07
Prepared By: William Welsh | [email protected] | https://wwel.sh

< Back to Portfolio