Measure delivery SLA (time-to-deliver) from Postfix logs
Postfix logs the time each message took to deliver in the delay= field (total seconds from arrival to delivery event) and a breakdown in delays=a/b/c/d. This guide shows how to read these fields, interpret the breakdown to locate bottlenecks, and compute a delivery SLA from them.
The delay field: total time to deliver
Every successful SMTP or LMTP delivery line in the maillog includes a delay= value showing the total number of seconds from message arrival to the delivery event:

    Jun 17 14:32:18 mailserver postfix/smtp[12345]: ABC123DEF456: to=<user@example.com>, relay=mail.example.com[203.0.113.45], delay=1.23, delays=0.45/0.12/0.15/0.51, dsn=2.0.0, status=sent

Here delay=1.23 means 1.23 seconds elapsed from when the message arrived at Postfix to when it was successfully delivered to the recipient’s mail server. This is the real user-facing SLA metric: how long did it take for the message to reach its destination.
The delay= field is measured in seconds with decimal precision. The clock starts when Postfix accepts the message from a mail submission agent (MSA) or local process, and stops at the delivery event: the handoff to the next hop (the recipient’s MX server, a relay, or a local mailbox). For outbound mail, delay= values appear in the postfix/smtp log lines (external deliveries via SMTP) or the postfix/lmtp log lines (deliveries to a mailbox server or relay over LMTP). Messages delivered by the Postfix local service may show lower delay values because they bypass remote connection overhead.
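As a minimal sketch of reading these fields, the shell function below pulls the status and the total delay out of each delivery line. The name delay_and_status is illustrative and not part of Postfix, and the pattern assumes the field order shown in the example above (delay= before status=).

```shell
# delay_and_status: hypothetical helper, not a Postfix tool.
# Reads maillog lines on stdin and prints "<status> <delay>" for each
# delivery line. Lines without both fields are skipped.
# Assumes delay= appears before status=, as in the examples in this guide.
delay_and_status() {
    sed -n 's/.*[ ,]delay=\([0-9.][0-9.]*\),[^(]*status=\([a-z]*\).*/\2 \1/p'
}

# Example (the log path is an assumption; adjust it to your system):
#   delay_and_status < /var/log/maillog
```

Printing the status next to the delay keeps deferred and bounced attempts visible, so a later step can decide which statuses to count.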
The delays breakdown: where time went
The delays=a/b/c/d field breaks the total delay into four components:
Figure: delays= components show where a delivery's time went. Total here is 1.23s.
- a (0.45 in the example): Time before the queue manager. This includes submission queue processing and initial message intake by Postfix. Low values are normal (usually under 0.1 seconds).
- b (0.12): Time in the queue manager. This is the message sitting in the active queue, waiting for a delivery attempt. High values indicate queueing backlog or scheduling delays; low values mean the queue manager released the message quickly.
- c (0.15): Connection setup time. This includes DNS lookups, SMTP/LMTP handshake (HELO/EHLO), and TLS negotiation. High values reveal slow DNS, unresponsive mail servers, or TLS overhead.
- d (0.51): Message transmission time. The time to send the actual message body to the remote server. High values indicate large messages or slow upload bandwidth.
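To locate the bottleneck mechanically, one option is to report the largest of the four components for each line. The function below is a sketch under that assumption; dominant_delay is a made-up name, and "largest component" is only a first-pass heuristic, not a Postfix feature.

```shell
# dominant_delay: hypothetical helper, not a Postfix tool.
# For each line on stdin with a delays=a/b/c/d field, prints the letter of
# the largest component and its value, e.g. "c 12.34".
dominant_delay() {
    awk '{
        if (!match($0, /delays=[0-9.]+\/[0-9.]+\/[0-9.]+\/[0-9.]+/)) next
        # Skip the 7 characters of "delays=" and split the rest on "/".
        split(substr($0, RSTART + 7, RLENGTH - 7), part, "/")
        top = 1
        for (i = 2; i <= 4; i++)
            if (part[i] + 0 > part[top] + 0) top = i
        print substr("abcd", top, 1), part[top]
    }'
}
```

Piping the output through sort and uniq -c (for example, dominant_delay < maillog | cut -d' ' -f1 | sort | uniq -c) would give a rough count of which component dominates most often.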
Computing delivery SLA
A delivery SLA is the percentage of delivered messages that meet a time threshold. The threshold depends on your use case and mail characteristics. If your organization sends primarily external mail, you might aim for:
- 90% of messages delivered within 5 seconds
- 95% of messages delivered within 30 seconds
- 99% delivered within 60 seconds
If you send mail to slow domains or operate a high-volume relay, you may need looser thresholds (e.g., 95% within 60 seconds). The key is to establish a baseline and monitor for regressions.
To compute SLA from logs, count successfully delivered messages (those with status=sent) and measure what fraction have delay= under your threshold. Here is a basic approach:
    # Count messages delivered within 5 seconds.
    # Adding 0 forces a numeric comparison; the raw field ("1.23,") would
    # otherwise be compared as a string, so "18.92," would sort below "5".
    grep "status=sent" maillog | awk -F'delay=' '$2 + 0 < 5 {c++} END {print "Under 5s:", c + 0}'

    # Total successfully delivered (for percentage)
    grep "status=sent" maillog | wc -l

If 900 of 1000 delivered messages had delay < 5, your 5-second SLA is 90%.
For a more robust calculation, collect delivery metrics over a fixed period (e.g., hourly or daily) and compute the percentile distribution. The 50th percentile (median) shows the typical delivery time. The 95th percentile bounds the slow tail: if P95 is 15 seconds, 95% of deliveries completed within 15 seconds and the remaining 5% took longer. The maximum delay shows the worst-case outlier. Together, these give you a picture of your delivery health.
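The SLA percentage and the percentiles can be sketched as two small shell functions. Both names (sla_percent, percentile) are illustrative, both expect one delay value per line on stdin, and the percentile uses the nearest-rank method, which is one of several common definitions.

```shell
# Both helpers are illustrative sketches. They read one delay value
# (in seconds) per line on stdin.

# sla_percent LIMIT: percentage of values strictly below LIMIT seconds.
sla_percent() {
    awk -v limit="$1" '
        $1 + 0 < limit + 0 { ok++ }
        END { if (NR > 0) printf "%.1f\n", 100 * ok / NR }'
}

# percentile P: nearest-rank percentile for an integer P from 1 to 100.
percentile() {
    LC_ALL=C sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit
            rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100)
            print v[rank]
        }'
}
```

A possible way to feed them from a log file, assuming it is named maillog: grep "status=sent" maillog | awk -F'delay=' '{print $2 + 0}' | percentile 95.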
A worked example: fast vs. slow deliveries
Fast delivery: under 1 second

    Jun 17 14:32:18 mailserver postfix/smtp[12345]: ABC123DEF456: to=<user@fastdomain.com>, relay=mail.fastdomain.com[203.0.113.45], delay=0.87, delays=0.04/0.01/0.18/0.64, dsn=2.0.0, status=sent

- Total delay: 0.87 seconds
- Breakdown: a=0.04 (normal intake), b=0.01 (no queue delay), c=0.18 (DNS + HELO), d=0.64 (message upload)
- What this means: The recipient’s mail server was reachable, responsive, and fast. This is ideal delivery behavior.
Slow delivery: high connection setup (c spike)

    Jun 17 14:33:45 mailserver postfix/smtp[12345]: DEF456GHI789: to=<user@slowdomain.com>, relay=slowdomain-mx.example.com[203.0.113.99], delay=18.92, delays=0.03/0.05/12.34/6.50, dsn=2.0.0, status=sent

- Total delay: 18.92 seconds
- Breakdown: a=0.03 (normal), b=0.05 (no queue), c=12.34 (connection setup), d=6.50 (message send)
- What this means: The spike is in c (connection setup). This is likely DNS latency, a slow SMTP response, or TLS negotiation overhead. The slowdomain.com mail server took 12 seconds to set up a connection. This is operationally significant: the user waits 19 seconds for a delivery confirmation, and queue buildup happens when many messages target slow domains.
High backlog: b spike

    Jun 17 14:35:20 mailserver postfix/smtp[12345]: GHI789JKL012: to=<user@example.org>, relay=mail.example.org[203.0.113.88], delay=52.18, delays=0.02/45.50/1.20/5.46, dsn=2.0.0, status=sent

- Total delay: 52.18 seconds
- Breakdown: a=0.02 (normal), b=45.50 (queue), c=1.20 (setup), d=5.46 (send)
- What this means: The message sat in the queue manager for 45 seconds before delivery was attempted. This indicates that the sending Postfix server is under load or that a concurrency limit has been reached. Raising default_process_limit or the per-destination limit smtp_destination_concurrency_limit may help.
Deferred messages and total time-to-deliver
A critical caveat: if a message is deferred (temporarily rejected, then retried later), the delay= on the final successful delivery line includes all time from the original arrival to the successful delivery, including the defer wait between attempts.

    Jun 17 14:32:00 mailserver postfix/smtp[12345]: ABC123: to=<user@flaky.com>, relay=mail.flaky.com[203.0.113.50], delay=0, delays=0/0/0/0, dsn=4.4.2, status=deferred
    Jun 17 14:37:05 mailserver postfix/smtp[12345]: ABC123: to=<user@flaky.com>, relay=mail.flaky.com[203.0.113.50], delay=305.12, delays=304.8/0.1/0.15/0.07, dsn=2.0.0, status=sent

The first attempt deferred at 14:32:00. The second attempt succeeded at 14:37:05 (5 minutes later). The final delay=305.12 includes the full 5-minute span from arrival to delivery. The large a value (304.8 seconds) reflects the combined time before queue manager activity on the successful attempt, which includes the defer wait. The connection setup (c) and transmission (d) on the successful attempt were fast (0.15 + 0.07 seconds).
When computing SLA, this behavior is important: slow deliveries often include defer time. A message with delay=305 seconds counts toward your SLA metrics as a slow delivery, which correctly reflects the user experience (the message took 5+ minutes to arrive). First-attempt latency is harder to extract from logs but would exclude the deferral loop and show only the direct connection and transmission times.
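One way to keep retried deliveries visible in the numbers is to tag each status=sent line according to whether the same queue ID and recipient was deferred earlier in the log. The sketch below does that; tag_retried is a hypothetical name, and it assumes the deferred line appears earlier in the same input (log rotation can hide it) and that queue IDs are not reused within that input.

```shell
# tag_retried: hypothetical helper, not a Postfix tool.
# Prints "<queue-id> <delay> retried|first-attempt" for each status=sent
# line on stdin. A delivery counts as "retried" when the same queue ID and
# recipient had a status=deferred line earlier in the input.
tag_retried() {
    awk '{
        qid = ""; to = ""; delay = ""; status = ""
        for (i = 1; i <= NF; i++) {
            f = $i
            sub(/,$/, "", f)                      # drop the trailing comma
            if (f ~ /^postfix\// && i < NF) {     # queue ID follows the process tag
                qid = $(i + 1); sub(/:$/, "", qid)
            }
            else if (f ~ /^to=/)     to = f
            else if (f ~ /^delay=/)  delay = substr(f, 7)
            else if (f ~ /^status=/) { status = substr(f, 8); break }
        }
        if (qid == "" || status == "") next
        key = qid " " to
        if (status == "deferred") deferred[key] = 1
        else if (status == "sent")
            print qid, delay, ((key in deferred) ? "retried" : "first-attempt")
    }'
}
```

Filtering on the third column then lets you report an SLA for all deliveries alongside one for first-attempt deliveries only.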
Why delivery SLA matters
Delivery speed is user-visible and actionable. Unlike abstract queue metrics (number of pending messages, bytes in queue), delivery SLA is a concrete, user-facing measure. When a user sends a message, they expect confirmation within seconds, not minutes. Tracking your organization’s delivery SLA helps you spot real problems and communicate their scope to stakeholders.
Slow domains: If 80% of messages to example.com take > 30 seconds, the domain’s mail server has a bottleneck. This could be slow DNS servers (component c spike), an overloaded or misconfigured MX, or network issues on the path. You can flag this domain in your monitoring, alert on it, and investigate with the domain operator. Some mail admins will optimize DNS or upgrade infrastructure if you show them data.
Connection setup spikes: A cluster of high c values reveals DNS, TLS, or network issues on the path to a recipient’s server. If component c is consistently 5+ seconds for a domain, the issue is not your server or queue; it is the outbound connection. Checking the TLS lines logged by postfix/smtp and running dig queries against the recipient’s DNS servers can help confirm where the time goes. TLS handshake delays may indicate the recipient’s server is CPU-bound or the network is congested.
Queueing backlog: High b values (more than 1 second) mean your own mail server is under load or hitting concurrency limits. The message is ready to send but waiting in the active queue. This is actionable: scale up server resources, increase default_process_limit or smtp_destination_concurrency_limit, or identify what is consuming your SMTP process slots. If b is consistently high, your server’s delivery throughput is throttled.
User experience: End users and downstream systems rely on mail delivery speed. If your organization sends automated alerts or notifications, slow delivery degrades the user experience. A 1-minute delay on a password-reset email is frustrating. A 5-minute delay on a transaction confirmation looks like the system failed. Knowing your 95th percentile delivery time (e.g., 12 seconds) lets you set realistic expectations and spot regressions.
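The per-domain check described under "Slow domains" above can be sketched as a single awk pass. The function name slow_domains and its output columns are assumptions made for illustration; it takes the threshold in seconds as its only argument.

```shell
# slow_domains LIMIT: hypothetical helper, not a Postfix tool.
# Reads maillog lines on stdin and, for each recipient domain with
# status=sent deliveries, prints:
#   <domain> <total sent> <sent with delay above LIMIT> <percent slow>
slow_domains() {
    awk -v limit="$1" '/status=sent/ {
        dom = ""; delay = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^to=</) {
                dom = $i
                sub(/^.*@/, "", dom)     # keep the part after "@"
                sub(/>,?$/, "", dom)     # drop the closing ">" and comma
            }
            else if ($i ~ /^delay=/) delay = substr($i, 7) + 0
        }
        if (dom == "") next
        total[dom]++
        if (delay > limit + 0) slow[dom]++
    }
    END {
        for (d in total)
            printf "%s %d %d %.0f\n", d, total[d], slow[d], 100 * slow[d] / total[d]
    }' | LC_ALL=C sort
}
```

For example, slow_domains 30 < maillog lists every domain with the share of its deliveries that took longer than 30 seconds; sorting on the last column instead would put the worst domains first.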
How Postfix Insights helps
Parsing delays= by hand for every message is tedious. Postfix Insights computes delivery SLA automatically and surfaces slow domains. On the /stats dashboard:
- Delivery SLA metric: Shows the percentage of delivered messages under a configurable threshold (e.g., 30 seconds). You can compare SLA across time periods and spot degradation.
- Slow domains table: Lists recipient domains sorted by 95th percentile delay. If example.com consistently has high delays, it appears here, and you can drill into message logs for that domain.
- Delay distribution: A histogram of all delays so you can see whether the distribution is bimodal (fast local messages + slow external), concentrated, or spread.
Without this automation, you would need to parse the maillog manually with grep, awk, or custom scripts. Postfix Insights replaces that with a few clicks.
To get started, see the Quick start guide to install Postfix Insights in four Docker commands. Once running, the /stats dashboard shows SLA, slow domains, and delivery trends. For source code and contributing, visit Postfix Insights on GitHub.