
Postfix Insights with Prometheus, Alloy, and Grafana (2026)

Prometheus, Grafana, and Grafana Alloy form a powerful metrics stack for monitoring infrastructure at scale. Postfix Insights is a mail-specific search and analytics tool over your Postfix logs. They are complementary: the metrics stack answers “something changed” (your bounce rate rose, throughput dropped); Postfix Insights answers “which messages, to whom, with what reason” (the exact domain, recipient, and DSN codes). Together they give you both aggregate visibility and message-level forensics. This guide explains the layers each tool covers and where they integrate.

Metrics & alerting (Prometheus, Grafana, Alloy): answers “something changed” (the bounce rate rose, throughput dropped).

Message-level search & correlation (Postfix Insights): answers “which messages, to whom, with what reason” (the exact domain, recipient, and DSN codes).

Different layers, not competitors. The metrics stack tells you something changed; Postfix Insights tells you which messages and why.

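Prometheus collects and stores time-series metrics by scraping /metrics endpoints at regular intervals. It evaluates alerting rules on a fixed schedule, marks an alert as firing when its PromQL expression returns results (for example, a ratio above a threshold), and forwards it to Alertmanager, which routes the notifications. Apart from a basic expression browser, it does no dashboarding; it is a time-series database and rules engine.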

Grafana is a visualization platform that connects to Prometheus (and many other data sources) and renders dashboards, graphs, and alerts. It queries metric history from Prometheus and lets you build custom visualizations, configure where alert notifications go, and explore data interactively.

Grafana Alloy is an open-source telemetry collector that scrapes, receives, processes, and forwards metrics, logs, traces, and profiles. It acts as a flexible middleman: you can use it to collect metrics from many sources, enrich them, and ship them to Prometheus, Grafana Cloud, or other endpoints. Pipelines are built by wiring components together in its configuration (for example, a prometheus.scrape component feeding a prometheus.remote_write component), so each team can shape its own data flow.

Postfix Insights is a self-hosted FastAPI service that reads the Postfix maillog, indexes it in libSQL, and exposes two surfaces:

  1. Interactive search by recipient, domain, or subject (with optional date range). Results are correlated by queue ID into per-recipient delivery records, showing raw log lines and formatted status.
  2. Delivery-health dashboard (/stats) that aggregates volume, bounce and defer rate, SLA, domain mix, slow domains, DSN breakdown, and calendar/heatmap trends.
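Queue-ID correlation is the core idea behind the search surface: Postfix logs each stage of a message on a separate line tagged with the same queue ID, so grouping by that ID reassembles the message and its per-recipient outcomes. The sketch below illustrates the technique in plain Python; it is not Postfix Insights' implementation. It assumes classic syslog-style lines and Postfix's default short (hexadecimal) queue IDs (long queue IDs use a different alphabet), and the names `correlate`, `Message`, and `Delivery` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# "postfix/<service>[pid]: <QUEUEID>: <rest>" with a short (hex) queue ID
LINE_RE = re.compile(r"postfix[^\[\s]*\[\d+\]: (?P<qid>[0-9A-F]{6,}): (?P<rest>.*)$")
FROM_RE = re.compile(r"\bfrom=<(?P<sender>[^>]*)>")
TO_RE = re.compile(r"\bto=<(?P<rcpt>[^>]*)>")  # \b keeps orig_to=<...> from matching
DSN_RE = re.compile(r"\bdsn=(?P<dsn>\d\.\d{1,3}\.\d{1,3})")
STATUS_RE = re.compile(r"\bstatus=(?P<status>\w+)(?: \((?P<reason>.*)\))?")


@dataclass
class Delivery:
    recipient: str
    status: str = "unknown"
    dsn: str = ""
    reason: str = ""
    lines: list = field(default_factory=list)


@dataclass
class Message:
    queue_id: str
    sender: str = ""
    lines: list = field(default_factory=list)
    deliveries: dict = field(default_factory=dict)


def correlate(log_lines):
    """Group Postfix log lines by queue ID into per-recipient delivery records."""
    messages = {}
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a queue-ID line (connects, warnings, NOQUEUE rejects, ...)
        msg = messages.setdefault(m["qid"], Message(m["qid"]))
        msg.lines.append(line)
        rest = m["rest"]
        if f := FROM_RE.search(rest):
            msg.sender = f["sender"]
        if t := TO_RE.search(rest):
            rec = msg.deliveries.setdefault(t["rcpt"], Delivery(t["rcpt"]))
            rec.lines.append(line)
            if s := STATUS_RE.search(rest):
                # Later lines win, so a deferred-then-sent recipient ends as "sent"
                rec.status, rec.reason = s["status"], s["reason"] or ""
            if d := DSN_RE.search(rest):
                rec.dsn = d["dsn"]
    return messages
```

Keeping every raw line on both the message and the recipient record mirrors what the search results show: a formatted status plus the log lines it was derived from.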

The Prometheus/Grafana/Alloy stack covers the aggregate metrics and alerting layer. Postfix Insights covers the message-level search and correlation layer. They do not compete; they overlap only where Postfix Insights exports its own metrics and dashboards for the metrics stack to consume.

The layer boundary: aggregate metrics vs message-level search

Prometheus + Grafana + Alloy excel at aggregate infrastructure monitoring. You define what you want to measure (bounce rate, message throughput, queue depth), Prometheus scrapes it on a fixed interval, Grafana visualizes it, and you set alerting rules to notify you when anomalies occur. This suits org-wide visibility, trend analysis, and knowing when something changed.

But aggregate metrics do not tell you which specific messages or domains caused the spike. A Prometheus alert says “bounce rate exceeded 5% in the last hour.” It does not tell you whether the bounces are concentrated in a single domain, what DSN codes are driving them, which recipients are affected, or what the root cause is. Finding those answers requires drilling down to message-level data and searching by recipient, domain, or subject.
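A small worked example shows how an aggregate rate hides concentration. The counts are invented for illustration: 154 messages delivered to example.com, plus 12 to xyz.com, of which 10 bounced and 2 were deferred. Globally that is 10/166, about 6%, which looks like a modest, spread-out problem; per domain, xyz.com bounces at 10/12 while example.com bounces at 0%. The helper `bounce_rates` is hypothetical and takes (recipient, status) pairs using Postfix's status= values.

```python
from collections import Counter


def bounce_rates(records):
    """Overall and per-domain bounce rate from (recipient, status) pairs."""
    totals, bounced = Counter(), Counter()
    for recipient, status in records:
        domain = recipient.rsplit("@", 1)[-1].lower()
        totals[domain] += 1
        if status == "bounced":
            bounced[domain] += 1
    overall = sum(bounced.values()) / max(sum(totals.values()), 1)
    per_domain = {d: bounced[d] / n for d, n in totals.items()}
    return overall, per_domain


# Invented counts, for illustration only
records = (
    [(f"user{i}@example.com", "sent") for i in range(154)]
    + [(f"r{i}@xyz.com", "bounced") for i in range(10)]
    + [(f"p{i}@xyz.com", "deferred") for i in range(2)]
)
overall, per_domain = bounce_rates(records)
print(f"overall {overall:.1%}")  # overall 6.0%
for domain, rate in sorted(per_domain.items(), key=lambda kv: -kv[1]):
    print(f"{domain}: {rate:.1%}")  # xyz.com: 83.3%, then example.com: 0.0%
```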

That is the boundary Postfix Insights crosses. It provides:

  • Recipient search: find all messages to alice@example.com and see per-message delivery status, DSN, and raw log lines.
  • Domain search: find all messages to example.com and see volume, bounce/defer breakdown, slow recipients, and which specific messages are pending or failed.
  • Subject search: find messages by subject keyword and correlate them across queue IDs.
  • Per-recipient status: for each message, see whether it was delivered, deferred, or bounced, and if bounced, the exact reason (DSN code, SMTP response).
  • Queue ID tracing: follow a single queue ID from receipt through delivery to all recipients, with per-hop details.
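The DSN codes in those records follow the enhanced status code format from RFC 3463, class.subject.detail: class 2 is success, 4 is a persistent transient failure (Postfix typically defers and retries), and 5 is a permanent failure (Postfix typically bounces). A minimal sketch of decoding them for a per-recipient or per-domain breakdown; the function names are hypothetical, and only the class and subject digits are interpreted.

```python
import re
from collections import Counter

# RFC 3463 enhanced status code: class.subject.detail
DSN_RE = re.compile(r"^([245])\.(\d{1,3})\.(\d{1,3})$")

CLASSES = {
    "2": "success",
    "4": "persistent transient failure",  # Postfix typically defers and retries
    "5": "permanent failure",             # Postfix typically bounces
}

SUBJECTS = {
    "0": "other or undefined status",
    "1": "addressing status",
    "2": "mailbox status",
    "3": "mail system status",
    "4": "network and routing status",
    "5": "mail delivery protocol status",
    "6": "message content or media status",
    "7": "security or policy status",
}


def describe_dsn(code):
    """Return (class description, subject description) for a DSN such as '5.1.1'."""
    m = DSN_RE.match(code)
    if not m:
        raise ValueError(f"not an enhanced status code: {code!r}")
    cls, subject, _detail = m.groups()
    return CLASSES[cls], SUBJECTS.get(subject, "unknown subject")


def dsn_breakdown(codes):
    """Count DSN codes per class, e.g. for a bounce/defer breakdown."""
    return Counter(describe_dsn(c)[0] for c in codes)
```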

Prometheus and Grafana have no message-level search or queue-ID correlation. Postfix Insights has no org-wide alerting engine or external data-source integration.

Postfix Insights bridges the gap by working at three layers:

Layer 1: Prometheus metrics (aggregate). Postfix Insights exposes a /metrics endpoint serving Prometheus-format metrics that cover mail delivery outcomes, queue status, and security observables such as TLS and DKIM. The metrics persist across restarts and are aggregated across worker processes.

Your existing Prometheus can scrape Postfix Insights at /metrics just like any other exporter. These metrics flow into Grafana for trending and alerting.
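To see what a scrape actually retrieves, here is a minimal sketch that fetches a /metrics page and parses the Prometheus text exposition format into (name, labels, value) samples. It is a simplified stand-in for the official parser (`prometheus_client.parser.text_string_to_metric_families` in the Python client); the URL uses the placeholder host from the scrape config in this guide, the helper names are hypothetical, and any metric names you feed it in testing are yours, not Postfix Insights' real names.

```python
import re
import urllib.request

# name{labels} value [timestamp]; '#' lines are HELP/TYPE metadata or comments
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)(?:\s+-?\d+)?$"
)
# key="value" pairs; escape sequences inside values are kept verbatim
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue  # skip anything this simplified parser does not understand
        labels = {lm["key"]: lm["val"] for lm in LABEL_RE.finditer(m["labels"] or "")}
        samples.append((m["name"], labels, float(m["value"])))
    return samples


def fetch_metric_names(url="http://postfix-insights-host:8080/metrics"):
    """Scrape a /metrics endpoint once and return the distinct metric names."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        text = resp.read().decode("utf-8")
    return sorted({name for name, _labels, _value in parse_exposition(text)})
```

Running `fetch_metric_names()` against your instance is also a quick way to find the exact metric names to use in alerting rules.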

Layer 2: Message-level search (correlation). Postfix Insights provides interactive search, queue-ID correlation, and per-recipient status: the layer Prometheus and Grafana cannot reach.

Layer 3: Grafana dashboards. Postfix Insights ships bundled Grafana dashboards that visualize the same data you see in /stats. You can import them into your Grafana instance and monitor mail health alongside your infrastructure metrics on a unified dashboard.

In practice: Prometheus scrapes /metrics from Postfix Insights at regular intervals (e.g., every 30 seconds). Grafana visualizes those Prometheus metrics in an infrastructure-wide dashboard. When a Prometheus alert fires on bounce rate, you click through to Postfix Insights, search for the domain or time range, and drill into the exact messages and DSN codes.
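A bounce-rate alert usually divides the increase of two counters over a window, as a PromQL expression of the form increase(bounced[1h]) / increase(total[1h]) would, where bounced and total are placeholder names. The sketch below shows that arithmetic in Python, assuming Postfix Insights exports delivery outcomes as monotonically increasing counters (check /metrics to confirm). `counter_increase` and `bounce_ratio` are hypothetical helpers; Prometheus' own increase() additionally extrapolates to the window edges, which this omits.

```python
def counter_increase(samples):
    """Increase of a counter across time-ordered (timestamp, value) samples.

    A drop in value is treated as a counter reset (e.g. a process restart),
    so the post-reset value counts as new increase, as Prometheus does.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def bounce_ratio(bounced_samples, total_samples):
    """Bounced / total over the same window; 0.0 when nothing was sent."""
    total = counter_increase(total_samples)
    return counter_increase(bounced_samples) / total if total else 0.0


# Four scrapes 30 s apart; the total counter resets between t=30 and t=60
total = [(0, 100), (30, 130), (60, 10), (90, 40)]
bounced = [(0, 5), (30, 7), (60, 1), (90, 2)]
print(counter_increase(total))            # 70.0
print(round(bounce_ratio(bounced, total), 4))  # 0.0571
```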

The practical pattern: metrics-first alerting, message-level diagnosis


A concrete workflow:

  1. Setup: Prometheus scrapes Postfix Insights /metrics. Grafana connects to Prometheus. You define an alerting rule: “bounce_rate > 5% for 10 minutes.”

  2. Alert fires: At 14:30 UTC, the bounce rate reported by Postfix Insights is 6% and has stayed above 5% for 10 minutes, so the Prometheus rule moves from pending to firing. Alertmanager (or Grafana Alerting, if the rule is managed in Grafana) sends a Slack notification: “Postfix bounce rate alert.”

  3. Diagnosis: You open Postfix Insights (http://your-host:8080). The /stats dashboard already covers the alert window, and it shows that the bounces behind the 6% are concentrated in xyz.com; other domains look normal. You search for “xyz.com” and set the date range to the last hour.

  4. Root cause: The search returns 12 messages to xyz.com: 7 bounced with DSN 5.5.2 (syntax error), 3 bounced with 5.1.1 (bad recipient address), and 2 still pending. You open one of the 5.5.2 bounces and see the full log line: “550 5.5.2 Syntax error in parameters or arguments.”

  5. Action: You contact xyz.com’s postmaster to fix their mail server configuration. Meanwhile, if the exported metrics carry a per-domain label, you can make your alerting more specific, for example a bounce-rate rule scoped to xyz.com instead of a single global threshold.
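Step 2 depends on how the “for 10 minutes” part of the rule behaves. In Prometheus, an alert becomes pending at the first evaluation where the expression is true and moves to firing only once it has stayed true for the whole `for` duration; an evaluation where it is false clears the state. A simplified simulation of that state machine, for a single series and ignoring evaluation jitter and `keep_firing_for`; the helper `firing_times` and the sample values are hypothetical.

```python
def firing_times(evaluations, threshold, hold_seconds):
    """Simulate an alert rule `expr > threshold` with `for: hold_seconds`.

    `evaluations` is a time-ordered list of (timestamp_seconds, value) pairs,
    one per rule evaluation. Returns the timestamps at which the alert fires.
    """
    active_since = None  # set while the alert is pending or firing
    firing = []
    for ts, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = ts  # condition first true: alert becomes pending
            if ts - active_since >= hold_seconds:
                firing.append(ts)  # held for the full duration: firing
        else:
            active_since = None  # condition false: pending/firing state is cleared
    return firing


# Bounce ratio evaluated every 60 s, above 5% from t=60 to t=720 (illustrative)
series = [(t, 0.06 if 60 <= t <= 720 else 0.04) for t in range(0, 900, 60)]
print(firing_times(series, threshold=0.05, hold_seconds=600))  # [660, 720]
```

A short spike that clears before the hold duration never fires, which is exactly why the rule uses `for` rather than alerting on a single bad evaluation.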

Postfix Insights gives you the message-level search and per-recipient detail that Prometheus metrics alone cannot provide. Prometheus gives you the aggregate trending and alerting that message-level search alone cannot provide.

| Layer / capability | Prometheus + Grafana | Postfix Insights |
| --- | --- | --- |
| Aggregate metrics | Yes (time-series trends, alerting rules) | Yes (via /metrics endpoint) |
| Infrastructure-wide visibility | Yes (monitor any exporter) | Mail-specific only |
| Message-level search | No | Yes (by recipient, domain, subject) |
| Queue-ID correlation | No | Yes (per-message, per-recipient status) |
| Per-recipient DSN breakdown | No | Yes |
| TLS/DKIM monitoring | Limited (via exported metrics only) | Native (TLS version, DKIM rate) |
| Remote logs (SSHFS) | No | Yes |
| Alerting engine | Yes | No |
| Dashboard visualization | Yes (generic, flexible) | Yes (mail-specific) |
| Setup effort | Moderate to high (many moving parts) | Moderate (Docker, single service) |

If you run Postfix and want metrics and alerting on mail health, deploy Prometheus and Grafana alongside Postfix Insights:

  1. Set up Postfix Insights: Follow the Quick start guide (Docker, 4 commands).

  2. Add Prometheus scrape target: In your Prometheus config (prometheus.yml), add a scrape job:

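```yaml
# prometheus.yml (excerpt)
scrape_configs:
  - job_name: 'postfix-insights'
    metrics_path: '/metrics'   # the default; shown for clarity
    scrape_interval: '30s'     # overrides the global interval for this job
    static_configs:
      - targets: ['postfix-insights-host:8080']
```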
  3. Connect Grafana to Prometheus: Add Prometheus as a data source in Grafana, and use the bundled Postfix Insights dashboards (available in the Postfix Insights GitHub repo) as templates.

  4. Set alerting rules: Define Prometheus alerting rules for bounce rate, queue depth, or domain-specific thresholds. Check the Postfix Insights /metrics output to see the exact metric names available (look at http://postfix-insights-host:8080/metrics):

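```yaml
# Alerting rules file; load it via rule_files in prometheus.yml
groups:
  - name: postfix-alerts
    rules:
      - alert: HighBounceRate
        # Placeholder: substitute a bounce-rate metric or expression from /metrics
        expr: <metric_name_from_/metrics> > 0.05
        for: 10m   # must hold for 10 minutes before the alert fires
        annotations:
          summary: "Postfix bounce rate above 5%"
```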
  5. Use Postfix Insights for diagnosis: When alerts fire, open Postfix Insights to search and drill into message-level details.
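Once the scrape job from step 2 is in place, Prometheus' own HTTP API can confirm that the target is being scraped: the synthetic `up` series is 1 for each target whose last scrape succeeded and 0 otherwise. A minimal sketch using only the standard library; `prometheus-host:9090` is a placeholder (9090 is Prometheus' default port), and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_url, promql):
    """URL for an instant query against Prometheus' /api/v1/query endpoint."""
    return f"{prometheus_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def targets_up(response_text):
    """Map each returned series' instance label to True when up == 1."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    # Instant-vector results carry value as [unix_timestamp, "string value"]
    return {
        series["metric"].get("instance", "?"): series["value"][1] == "1"
        for series in body["data"]["result"]
    }


def check_postfix_insights_scrape(prometheus_url="http://prometheus-host:9090"):
    """Query up{job="postfix-insights"} and report per-instance scrape health."""
    url = build_query_url(prometheus_url, 'up{job="postfix-insights"}')
    with urllib.request.urlopen(url, timeout=10) as resp:
        return targets_up(resp.read().decode("utf-8"))
```

An empty result usually means Prometheus has not loaded the scrape job or the job_name does not match; a value of False means the job exists but the last scrape failed.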

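Prometheus and Grafana never read the maillog themselves; they only consume the metrics Postfix Insights derives from it, so running the two side by side introduces no conflict.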