AI · NLP · Security · API

Email Security Backend

Production email threat detection system that automatically quarantines phishing and malware before they reach the inbox.

Overview

A multi-signal threat-scoring API that monitors IMAP inboxes. It combines SPF/DKIM/DMARC header analysis, URL reputation lookups (VirusTotal, Google Safe Browsing, URLhaus), attachment hash scanning, and a calibrated NLP phishing classifier into a single threat score per message.
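
As a rough illustration of how several signals can be folded into one score and mapped to an action, here is a minimal sketch. The weights, the cutoffs, and the names `Signals`, `threat_score`, and `action_for` are all hypothetical and chosen for the example; they are not the values or identifiers used in the deployed system.

```python
from dataclasses import dataclass

# Hypothetical weights (summing to 1.0) and cutoffs, for illustration only.
WEIGHTS = {"headers": 0.2, "urls": 0.3, "attachments": 0.2, "nlp": 0.3}
# Checked from most to least severe; the first cutoff reached wins.
CUTOFFS = [(0.9, "delete"), (0.7, "quarantine"), (0.4, "tag")]


@dataclass(frozen=True)
class Signals:
    """Per-message risk signals, each normalised to the range 0.0 to 1.0."""
    headers: float      # SPF/DKIM/DMARC failures
    urls: float         # worst URL reputation verdict
    attachments: float  # attachment hash verdict
    nlp: float          # calibrated phishing probability


def threat_score(signals: Signals) -> float:
    """Return the weighted sum of the signals, itself in 0.0 to 1.0."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(signals, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return total


def action_for(score: float) -> str:
    """Map a score to the most severe action whose cutoff it reaches."""
    for cutoff, action in CUTOFFS:
        if score >= cutoff:
            return action
    return "deliver"


if __name__ == "__main__":
    msg = Signals(headers=1.0, urls=0.5, attachments=0.0, nlp=0.9)
    score = threat_score(msg)
    print(f"score={score:.2f} action={action_for(score)}")
```

Keeping the cutoffs ordered from most to least destructive makes it easy to reserve deletion for only the highest scores.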

Key Features

  • NLP phishing classifier: TF-IDF features + LinearSVC, with sigmoid (Platt) calibration so the output is a probability; trained on labelled email datasets
  • Weighted threat score → auto-tag / quarantine / delete pipeline
  • VirusTotal + Google Safe Browsing + URLhaus URL reputation checks
  • Token-bucket rate limiter for free-tier API compliance
  • REST API with a Prometheus metrics endpoint; deployed on Render with a PostgreSQL database
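
The token-bucket limiter mentioned above can be sketched roughly as follows. This is a generic, single-threaded version of the technique, not the project's actual code; the class name `TokenBucket`, its parameters, and the injectable `clock` argument are assumptions made for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock          # injectable so tests can fake time
        self._tokens = capacity      # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last call, capped."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False without blocking."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    # Illustrative budget of 4 requests per minute; check the provider's
    # current free-tier quota before relying on any specific figure.
    bucket = TokenBucket(rate=4 / 60, capacity=4)
    for i in range(6):
        print(i, "allowed" if bucket.try_acquire() else "deferred")
```

A caller that gets `False` can defer the URL lookup to a later polling cycle instead of exceeding the quota. A multi-threaded service would also need a lock around `try_acquire`.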

Architecture

FastAPI REST API + IMAP polling loop + PostgreSQL + multi-signal threat scoring pipeline

What I Learned

Threat scoring is inherently probabilistic, and the two kinds of error do not cost the same: a false positive deletes a legitimate email, while a false negative delivers a phishing email. Tuning the classifier meant weighing those asymmetric error costs rather than optimizing for accuracy alone.
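
One way to make the asymmetric-cost idea concrete is to pick the decision threshold that minimises average cost on a labelled validation set, rather than the one that maximises accuracy. The sketch below assumes calibrated probabilities are already available; the function names, the cost values, and the toy data are hypothetical and only illustrate the technique.

```python
from typing import Optional, Sequence


def expected_cost(
    probs: Sequence[float],
    labels: Sequence[int],   # 1 = phishing, 0 = legitimate
    threshold: float,
    fp_cost: float,
    fn_cost: float,
) -> float:
    """Average cost per email when flagging everything with p >= threshold."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")
    cost = 0.0
    for p, y in zip(probs, labels):
        flagged = p >= threshold
        if flagged and y == 0:
            cost += fp_cost      # legitimate email acted on
        elif not flagged and y == 1:
            cost += fn_cost      # phishing email delivered
    return cost / len(probs)


def best_threshold(
    probs: Sequence[float],
    labels: Sequence[int],
    fp_cost: float,
    fn_cost: float,
    candidates: Optional[Sequence[float]] = None,
) -> float:
    """Return the lowest candidate threshold with minimal expected cost."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return min(
        candidates,
        key=lambda t: expected_cost(probs, labels, t, fp_cost, fn_cost),
    )


if __name__ == "__main__":
    # Toy validation data, invented for the example.
    probs = [0.1, 0.4, 0.6, 0.9]
    labels = [0, 1, 0, 1]
    # If deleting a legitimate email is 10x worse than missing a phish,
    # the chosen threshold moves up; flip the costs and it moves down.
    print(best_threshold(probs, labels, fp_cost=10, fn_cost=1))
    print(best_threshold(probs, labels, fp_cost=1, fn_cost=10))
```

The same classifier yields very different operating points depending on the cost ratio, which is why a calibrated probability is more useful here than a hard label.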