Project Overview

Cloud Resume Challenge

A cloud resume implementation on GCP and Cloudflare, automated with Terraform and Ansible, and operated with full Datadog observability.

GCP, Terraform, Ansible, Cloudflare, Firestore, Datadog

"Designed, built, automated, operated, and debugged end-to-end with practical tradeoffs in reliability, security, cost, and maintainability."

Quick Facts

Deployment: Terraform + Ansible + CI/CD
Availability: 99% (30-day SLO)
Cost: $3.52/mo
Ownership: Full End-to-End
System Architecture Diagram
Fig. 01 System Topology

Production Architecture at a Glance

Complete Ownership

Provisioned GCP and Cloudflare with Terraform, configured the host with Ansible, and deployed a containerized Flask/Gunicorn app.

Secure System Flow

Cloudflare proxy and WAF in front of Caddy TLS, Cloudflare-only origin firewall rules, and IAM-based Firestore access.

Edge-First Strategy

Traffic enters through Cloudflare for DDoS mitigation and caching, then routes through Caddy to the app while preserving real client IPs.

Day-2 Operations

Operational Maturity & Observability

Monitoring is focused on useful signals: synthetic /healthz checks, APM traces, structured JSON logs, dashboards, monitors, and SLO tracking in Datadog.

99%: 30-day availability SLO
24h: Visitor counter dedupe window
Cloud Monitoring Dashboard
Live System Telemetry

Datadog Dashboard, APM & Logs

Synthetic Monitoring

A Datadog Synthetic API test checks /healthz so uptime monitoring never increments the visitor counter.
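The separation between the health probe and the counter can be sketched as follows. This is a minimal, framework-agnostic illustration rather than the project's actual Flask code: the `App` class, the `/api/visitors` route, and the in-memory counter (standing in for Firestore) are all assumptions made for the example.

```python
class App:
    """Toy request router; the real service is a Flask app (names here are hypothetical)."""

    def __init__(self) -> None:
        # Stand-in for the Firestore-backed visitor counter.
        self.visitor_count = 0

    def handle(self, path: str) -> tuple[int, dict]:
        """Return (HTTP status, JSON-serializable body) for a request path."""
        if path == "/healthz":
            # Liveness only: the probe path never reads or writes the counter.
            return 200, {"status": "ok"}
        if path == "/api/visitors":  # hypothetical counter route
            self.visitor_count += 1
            return 200, {"count": self.visitor_count}
        return 404, {"error": "not found"}


app = App()
for _ in range(3):
    app.handle("/healthz")      # synthetic checks
print(app.visitor_count)        # still 0 after three probes
```

Keeping the probe on its own path means the check frequency can be raised or lowered without skewing visitor numbers.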

Dashboard & SLO Tracking

Monitors, service health, host metrics, traffic, logs, and synthetic performance are consolidated into one daily-use operations view.
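To make the 99% target concrete, the arithmetic below converts it into an error budget. It assumes a time-based SLO over a rolling 30-day window; the page does not say whether the SLO is time-based or request-based, and the function name is illustrative only.

```python
def error_budget_minutes(slo_percent: float, window_days: int) -> float:
    """Minutes of allowed unavailability for a time-based SLO over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0


budget = error_budget_minutes(99.0, 30)
print(budget)        # 432.0 minutes
print(budget / 60)   # 7.2 hours
```

Under these assumptions, a 99% target over 30 days tolerates about 7.2 hours of downtime, which is a realistic budget for a single small host.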

Log Aggregation

Structured JSON logs and trace correlation make it fast to move from a symptom to the exact request that caused it.
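A structured log line of this kind can be produced with the standard library alone. The sketch below is one possible formatter, not the project's actual logging setup; the `trace_id` field name is a placeholder, and the attribute names Datadog expects for trace correlation should be taken from its documentation.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional correlation ID passed via `extra=`; field name is hypothetical.
        trace_id = getattr(record, "trace_id", None)
        if trace_id is not None:
            payload["trace_id"] = trace_id
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("resume-app")  # hypothetical logger name
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("request handled", extra={"trace_id": "abc123"})
```

Because every line is a self-describing JSON object, a log pipeline can index fields such as `level` and the correlation ID without custom parsing rules.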

Engineering Highlights

Automation

Terraform provisions infrastructure, Ansible configures the host and deploys the container, and GitHub Actions drives the release workflow.

01 / CI.CD

Security & Edge

Cloudflare adds DDoS protection, WAF filtering, and edge caching while Caddy automates TLS and forwards trusted client IPs to the app. Fail2ban protects SSH against brute-force attacks.
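One common way to recover the real client IP behind a proxy chain is to trust forwarding headers only when the request arrives from a known proxy, then walk the `X-Forwarded-For` list from right to left until the first untrusted hop. The sketch below illustrates that idea under stated assumptions: the trusted networks are placeholders (loopback for a local Caddy, plus a documentation range standing in for Cloudflare's published ranges), and it is not necessarily how this deployment is configured.

```python
import ipaddress

# Placeholder trusted proxies: loopback (local Caddy) and a documentation
# range standing in for the CDN's real, published address ranges.
TRUSTED_PROXIES = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def _is_trusted(addr, trusted) -> bool:
    return any(addr in net for net in trusted)


def client_ip(peer_addr: str, x_forwarded_for: str, trusted=TRUSTED_PROXIES) -> str:
    """Resolve the client IP, honoring X-Forwarded-For only behind trusted proxies."""
    if not _is_trusted(ipaddress.ip_address(peer_addr), trusted):
        # Direct connection from an unknown peer: ignore any forwarded header.
        return peer_addr
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Rightmost entries were appended by our own proxies; skip those and
    # return the first address that is not one of them.
    for hop in reversed(hops):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop trusting the rest of the chain
        if not _is_trusted(addr, trusted):
            return str(addr)
    return peer_addr


print(client_ip("127.0.0.1", "198.51.100.7, 203.0.113.10"))
```

Reading from the right matters: a client can prepend arbitrary values to the header, but it cannot control the entries that trusted proxies append.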

02 / SEC

Cost-Aware

A 24-hour in-memory dedupe cache cuts Firestore operations, keeping usage within free-tier limits and reducing database cost.
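The dedupe idea can be sketched as a small time-to-live cache: a visitor key is counted at most once per window, and only the first sighting would trigger a database write. This is an illustrative sketch, not the project's code; the class name, the choice of visitor key (for example a hashed client IP), and the injectable clock are assumptions.

```python
import time


class DedupeCache:
    """Remember visitor keys for a fixed window so repeats are not recounted."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for testing
        self._expiry = {}            # visitor key -> time at which it expires

    def should_count(self, key: str) -> bool:
        """Return True only for the first sighting of `key` within the window."""
        now = self._clock()
        # Drop expired entries so memory stays bounded by recent visitors.
        for stale in [k for k, exp in self._expiry.items() if exp <= now]:
            del self._expiry[stale]
        if key in self._expiry:
            return False             # seen recently: skip the database write
        self._expiry[key] = now + self._ttl
        return True


cache = DedupeCache()
print(cache.should_count("visitor-a"))  # True: first visit, increment the counter
print(cache.should_count("visitor-a"))  # False: repeat within 24 hours
```

An in-memory cache like this resets on restart and is per-process, so a restart may recount a few visitors; that is usually an acceptable trade for avoiding extra database operations.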

03 / OPS

Troubleshooting

Diagnosed and resolved severe production iowait that was stalling deploys and degrading availability on a resource-constrained host.

04 / DEBUG
Case Study Highlight

The Production Incident: Diagnosing Severe iowait

Once the stack went live on an e2-micro host, severe iowait started stalling deploys and threatening availability. The eventual root cause wasn’t where I expected, and it changed how I think about constraints and diagnostics.

Read the Lessons Learned
SYSTEM_ALERT: ELEVATED_IOWAIT
[WARN] Deploy latency spiking; host responsiveness degraded.
[WARN] Page timeouts increasing!
[ERROR] Host SSH connection has been lost.......reconnection attempt [15/30]