Deliverability Incident Runbook — From Alert to Recovery in AcelleMail

Bounce rate spiked. Complaint volume jumped. A campaign is stuck on Sending. This is the step-by-step runbook for when something breaks — every check runs from the dashboard, in the order that resolves most incidents fastest.

The 5-minute triage flow

When something looks broken — a campaign stuck on Sending, a bounce-rate spike, a customer complaint — work this exact flow in the dashboard first, top to bottom. Most incidents resolve before you reach the last step. Each check tells you where the problem is so you don't guess.

Open the affected campaign. On a regular campaign the report tabs sit under the Sending logs dropdown: Delivery log, Bounce log, Feedback log, Open log, Click log, Unsubscribe log. (On an A/B-test campaign the same dropdown is labelled Logs, and the per-message tabs live under it the same way.) The first three are your triage tools.

Step 1 — Read the Delivery log

Open the campaign → Sending logs → Delivery log:

[Screenshot: Delivery log, per-message audit]

This is the per-message record of what AcelleMail handed to your sending server. Each row shows the recipient and a Status badge — Sent (green) when the message was accepted, or Error (red) when the send failed (hover the badge for the error text). Read the pattern:

  • Recent rows, all marked Sent — AcelleMail handed the messages to the SMTP server without errors. The problem is recipient-side or reputation-side, not in your queue. Go to the Bounce log.
  • Recent rows marked Error — sends are failing at the server. Go straight to the Bounce log to see why.
  • No new rows in the last 30 minutes while the campaign still shows Sending — nothing is draining the queue. Either the campaign is paused, or the background worker that sends mail has stopped. See Why is my campaign stuck in Sending status.
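If you script a periodic check, the three Delivery log patterns above reduce to a small decision. This is a minimal sketch, not an AcelleMail feature: it assumes you already have the rows as hypothetical (logged_at, status) pairs, where status is the badge text "Sent" or "Error", and it reuses the 30-minute window from the third bullet. How you get the rows out of the dashboard is outside the sketch.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical row shape: (time the row was logged, badge text "Sent" or "Error").
Row = Tuple[datetime, str]

def triage_delivery_log(rows: List[Row], now: datetime, campaign_sending: bool,
                        window: timedelta = timedelta(minutes=30)) -> str:
    """Map recent Delivery log rows to the next triage step."""
    recent = [status for logged_at, status in rows if now - logged_at <= window]
    if not recent:
        # Nothing new in the window. While the campaign still shows Sending,
        # either it is paused or the background worker has stopped.
        return "stuck" if campaign_sending else "no-recent-activity"
    if "Error" in recent:
        # At least one failed send in the window: sends are failing at the server.
        return "server-errors"
    # Everything was handed off cleanly: look recipient-side or reputation-side.
    return "recipient-side"
```

Branching on a single Error row is a deliberate simplification; a production check would more likely use a share or count threshold.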

Step 2 — Read the Bounce log

Open the campaign → Sending logs → Bounce log. Every bounce is listed with the recipient, a Bounce type badge, and the raw Reason the receiving server sent back (click a reason to see the full text):

[Screenshot: Campaign bounce log, hard and soft bounces with DSN reason]

Pattern-match what dominates the list:

What you see | Diagnosis
Mostly 5.1.1 "User unknown" | Stale list: the addresses no longer exist. Normal under 5%, a problem above it.
Mostly 5.7.x "Delivery not authorized" | Authentication broken (SPF/DKIM/DMARC) or your IP is blocklisted.
Mostly 4.x.x "Mailbox busy / try again" | The receiving server is throttling you. These are soft bounces and get retried automatically.
All bounces to one specific domain | That one receiver is blocking you. Investigate that domain on its own.
A mix of 5.x bounces + the Delivery log shows no new rows | The worker is struggling or down; circle back to Step 1.

The verbatim reason text in each row is the receiving server's own response. For the full code reference, see Decoding bounce messages.
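Pattern-matching the table mostly means reading the enhanced status code (the class.subject.detail number defined in RFC 3463) out of each reason and counting which bucket wins. A minimal sketch, assuming you have copied the Bounce log into hypothetical (recipient, reason) pairs; the bucket names are invented for this example, and a reason without an enhanced code is left for a human to read.

```python
import re
from collections import Counter
from typing import List, Tuple

# Enhanced status codes (RFC 3463) look like class.subject.detail, e.g. 5.1.1.
# Only class 4 (transient) and class 5 (permanent) show up as bounces.
ENHANCED_CODE = re.compile(r"\b([45])\.(\d{1,3})\.(\d{1,3})\b")

def bucket(reason: str) -> str:
    """Assign one raw Bounce log reason to a row of the diagnosis table."""
    match = ENHANCED_CODE.search(reason)
    if match is None:
        return "unparsed"          # no enhanced code: read it by hand
    cls, subject, detail = match.groups()
    if cls == "4":
        return "soft-throttled"    # 4.x.x: retried automatically
    if subject == "1" and detail == "1":
        return "user-unknown"      # 5.1.1: stale list
    if subject == "7":
        return "not-authorized"    # 5.7.x: authentication or blocklist
    return "other-permanent"

def summarize(bounces: List[Tuple[str, str]], recipients: int) -> dict:
    """bounces: (recipient email, raw reason) pairs; recipients: campaign size."""
    if not bounces:
        return {"dominant": None, "single_domain": None, "stale_list": False}
    buckets = Counter(bucket(reason) for _, reason in bounces)
    domains = Counter(email.rsplit("@", 1)[-1].lower() for email, _ in bounces)
    top_domain, top_count = domains.most_common(1)[0]
    return {
        "dominant": buckets.most_common(1)[0][0],
        # Every bounce going to one domain points at that one receiver.
        "single_domain": top_domain if top_count == len(bounces) else None,
        # The table treats 5.1.1 above 5% of recipients as a problem.
        "stale_list": buckets["user-unknown"] / recipients > 0.05,
    }
```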

Step 3 — Read the Feedback log (complaints)

Same campaign → Sending logs → Feedback log. Each row shows the recipient, a complaint Type badge, and the raw reason:

[Screenshot: Campaign feedback log, complaints]

Complaint volume tells you whether this is a content/list problem or a pure deliverability problem:

  • Spike concentrated on one segment — that segment's consent was shaky, or the content missed the mark for it. Pause sending to it and audit before resuming.
  • Spread evenly across the whole list — content or subject-line mismatch. Audit the campaign before the next send.
  • Zero complaints despite a bounce spike — this is a deliverability issue (auth or IP), not a content or list issue. That narrows your next steps a lot.
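"Concentrated" versus "spread evenly" can be made concrete by comparing each segment's complaint rate with the campaign-wide rate. A minimal sketch, assuming you have per-segment delivered and complaint counts for one campaign; the 2x hotspot factor is a hypothetical cut-off chosen for illustration, not an AcelleMail setting.

```python
from typing import Dict

def classify_complaints(delivered: Dict[str, int], complaints: Dict[str, int],
                        bounce_spike: bool, hotspot_factor: float = 2.0) -> str:
    """delivered / complaints: per-segment counts for one campaign.

    hotspot_factor (hypothetical): a segment whose complaint rate is at least
    this multiple of the campaign-wide rate counts as a hotspot.
    """
    total_delivered = sum(delivered.values())
    total_complaints = sum(complaints.values())
    if total_complaints == 0:
        # No complaints at all: a bounce spike points at auth or IP, not content.
        return "deliverability" if bounce_spike else "no-complaint-signal"
    overall_rate = total_complaints / total_delivered
    hotspots = sorted(
        segment for segment, sent in delivered.items()
        if sent and complaints.get(segment, 0) / sent >= hotspot_factor * overall_rate
    )
    if hotspots:
        return "segment:" + ",".join(hotspots)   # pause and audit these segments
    return "content-mismatch"                    # spread evenly across the list
```

For scale: 14 complaints on 3,000 delivered messages is a campaign-wide rate of about 0.47%, already above the 0.3% pause line in the triage decision tree.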

Step 4 — Check your sending-domain authentication

If the Bounce log is full of 5.7.x "not authorized" lines, your domain authentication is the prime suspect. Open Sending → Sending domains in the left sidebar. Each domain shows a Verified / Unverified status, with SPF, DKIM, and DMARC record chips:

[Screenshot: Sending domains list, SPF/DKIM/DMARC status per domain]

Click into the domain to see every DNS record and its individual check state:

[Screenshot: Sending domain detail, DNS records with verified chips]

If any record shows Unverified, click Verify DNS records to re-check them all at once. AcelleMail tells you which records are still pending — DNS changes can take up to 48 hours to propagate worldwide, so a freshly edited record may need time before it turns green.
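Verify DNS records remains the authoritative check. If you want to sanity-check record text before publishing it, two spec rules can be tested offline. This sketch is not AcelleMail's verification logic and does not query DNS; it only checks syntax: a domain may publish at most one TXT record starting with v=spf1 (RFC 7208 treats more than one as a permanent error), and a DMARC record, published at _dmarc.<your domain>, must begin with the exact tag v=DMARC1 and carry a p= policy (RFC 7489).

```python
from typing import List

def check_spf(txt_records: List[str]) -> str:
    """txt_records: every TXT string published at the sending domain."""
    spf = [r for r in txt_records
           if r.split() and r.split()[0].lower() == "v=spf1"]
    if not spf:
        return "missing"
    if len(spf) > 1:
        return "multiple SPF records (permerror)"
    return "ok"

def check_dmarc(record: str) -> str:
    """record: the TXT string published at _dmarc.<domain>."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            tags.setdefault(key.strip().lower(), value.strip())
    # v=DMARC1 must be the first tag and its value must match exactly.
    first_key, _, first_value = record.split(";", 1)[0].partition("=")
    if first_key.strip().lower() != "v" or first_value.strip() != "DMARC1":
        return "not a DMARC record"
    policy = tags.get("p", "").lower()
    if policy not in {"none", "quarantine", "reject"}:
        return "missing or invalid p= policy"
    return "ok (p=" + policy + ")"
```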

Step 5 — Run a live test send

To confirm whether the problem is reputation or something else, send yourself a real message. Open Sending → Sending servers, click into your active server, and use Send test email in the toolbar:

[Screenshot: Sending server, Send test email modal]

Send one to a Gmail address you own, one to an Outlook address, and one to a Yahoo address, then open each one:

  • All land in the inbox — your reputation is fine. The original issue is content-side or list-side, not your sending setup.
  • Some land in spam — an ISP-specific reputation hit (often just Gmail or just Outlook). Check that provider's postmaster tooling.
  • None arrive at all — your IP is likely blocklisted. Look it up on a public blocklist checker and follow that list's delisting process.
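A public blocklist checker is the simplest route. For reference, most DNS-based blocklists follow the lookup convention in RFC 5782: reverse the IPv4 octets, append the list's zone, and query for an A record; an answer (usually in 127.0.0.x) means listed. A minimal sketch using only the standard library; zen.spamhaus.org is used as an example zone. Some lists, Spamhaus among them, may answer queries that arrive through large public resolvers with special error codes instead of real results, so treat a raw answer with care.

```python
import ipaddress
import socket
from typing import Optional

def dnsbl_query_name(ip: str, zone: str) -> str:
    """RFC 5782 lookup name: the IPv4 octets reversed, then the blocklist zone."""
    addr = ipaddress.IPv4Address(ip)          # raises ValueError on bad input
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def dnsbl_answer(ip: str, zone: str) -> Optional[str]:
    """Return the A record the blocklist answers with, or None.

    None means no record was found, but a network or resolver failure also
    lands here, so None is not proof that the IP is clean.
    """
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None
```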

Triage decision tree

Once the five checks above point you at a cause, this maps the cause to the next action:

Signal | Action
Delivery log shows no new rows + campaign still on Sending for >30 min | The worker isn't draining the queue; see Why is my campaign stuck in Sending status
5.7.x bounces + a DKIM/SPF chip is red | Fix the record at Sending → Sending domains, then click Verify DNS records
4.x.x bounces only, every other check normal | Wait: these soft bounces auto-retry within 24h
5.1.1 bounces above 10% of recipients | List-quality problem: pause, and run email verification on the list before resuming
Complaint spike above 0.3% | PAUSE remaining sends; audit consent + content; do NOT resume to the full list
Test send lands in Spam at Gmail | Gmail-specific reputation: check Google Postmaster Tools
Test send doesn't arrive at all | IP blocklisted: look it up on a blocklist checker and follow its delisting process
Every message to one domain fails | That receiver is blocking you: pause sending to that domain and investigate separately
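Teams that automate alerting sometimes encode this table as a function. The sketch below is one such encoding, assuming a hypothetical Signals record filled in from the five checks; the thresholds (30 minutes, 10%, 0.3%) come from the table, while the field names, labels, and the choice to list the pause action first are illustrative.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Signals:
    """Hypothetical summary of the five checks, filled in from the dashboard."""
    campaign_sending: bool
    minutes_without_new_rows: float   # Delivery log
    dominant_bounce: str              # "5.7", "5.1.1", "4" or "" (Bounce log)
    auth_chip_red: bool               # Sending domains
    user_unknown_rate: float          # 5.1.1 bounces / recipients
    complaint_rate: float             # complaints / delivered
    test_send: str                    # "inbox", "gmail-spam" or "missing"
    blocking_domain: str = ""         # set when every message to one domain fails

def next_actions(s: Signals) -> List[str]:
    actions = []
    if s.complaint_rate > 0.003:
        actions.append("PAUSE remaining sends; audit consent and content")
    if s.campaign_sending and s.minutes_without_new_rows > 30:
        actions.append("Queue not draining: see the stuck-in-Sending article")
    if s.dominant_bounce == "5.7" and s.auth_chip_red:
        actions.append("Fix the DNS record, then click Verify DNS records")
    if s.user_unknown_rate > 0.10:
        actions.append("Pause and verify the list before resuming")
    if s.test_send == "gmail-spam":
        actions.append("Check Google Postmaster Tools")
    elif s.test_send == "missing":
        actions.append("Check blocklists and follow the delisting process")
    if s.blocking_domain:
        actions.append("Pause sending to " + s.blocking_domain + " and investigate")
    # Soft bounces only warrant waiting when nothing else is wrong.
    if not actions and s.dominant_bounce == "4":
        actions.append("Wait: soft bounces auto-retry within 24h")
    return actions
```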

Recovery checklist

After you've identified the root cause, work in this order:

  1. Pause active sending if there's customer-visible impact (queue not draining, complaints climbing).
  2. Fix the source — the DNS record, the list, or the content.
  3. Send a small test to your most-engaged 5% to validate the fix — never the full list.
  4. Monitor for 24 hours — are bounce and complaint rates back to baseline?
  5. Resume normal sending once everything is green.
  6. Write the post-mortem within 48 hours (template below).
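Steps 3 and 4 involve two small calculations: picking the most-engaged 5%, and deciding whether rates are "back to baseline". A minimal sketch, assuming a hypothetical per-subscriber engagement score (for example, recent opens) and a relative tolerance of 20% over baseline; both are illustrative choices, not values AcelleMail defines.

```python
import math
from typing import Dict, List

def most_engaged(scores: Dict[str, float], share: float = 0.05) -> List[str]:
    """Pick the top `share` of subscribers by a hypothetical engagement score."""
    n = max(1, math.ceil(len(scores) * share))
    return sorted(scores, key=scores.get, reverse=True)[:n]

def back_to_baseline(current: Dict[str, float], baseline: Dict[str, float],
                     tolerance: float = 0.2) -> bool:
    """True when every baseline rate is matched within a relative `tolerance`.

    The 20% default is an assumption for illustration only.
    """
    return all(current[name] <= rate * (1 + tolerance)
               for name, rate in baseline.items())
```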

When the problem is below the dashboard

If you've worked every step above and the Delivery log shows no new rows while the campaign sits on Sending, the issue is the background process that actually sends mail — the queue worker — not anything you can fix from a campaign screen. That's a separate, infrastructure-level fix (restarting the worker, checking the queue) and it's covered end to end in Setting Up Queue Workers and Cron Jobs. Hand that one to whoever administers your install.

Reputation recovery cycle

If Step 5 confirms a reputation hit (messages going to spam or not arriving), don't blast the full list to "push through" it — that makes it worse. Ramp back in instead, sending only to your most-engaged subscribers first so receiving servers see strong engagement again:

  • Day 0 — Pause all bulk sending. Re-verify your domain at Sending → Sending domains and confirm SPF, DKIM, and DMARC are all green. Begin sending only to subscribers who opened in the last 7 days (your highest-engagement group).
  • Day 7 — If those sends stayed clean, widen to the last-30-day openers.
  • Day 14 — Widen to the last-90-day openers. Keep checking Google Postmaster Tools and Microsoft SNDS daily.
  • Day 30 — Resume to your full audience once reputation is back to baseline.

A mild dip recovers in one to two weeks. A severe hit (sustained complaint rate above 1%) can take four to six weeks, or a fresh IP and a full warmup — see IP warmup schedule for new sending servers.
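The ramp schedule above can be expressed as a lookup from the day of the recovery cycle to an open-recency window, plus a filter over last-open dates. A minimal sketch, assuming you have each subscriber's last open date; it takes the day as given, so the "only if those sends stayed clean" judgment stays with you (hold the day back if a stage was not clean).

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# (day the stage begins, open-recency window in days); None means full audience.
RAMP = [(0, 7), (7, 30), (14, 90), (30, None)]

def open_window(day: int) -> Optional[int]:
    """Open-recency window for a given day of the recovery cycle."""
    window = RAMP[0][1]
    for start, days in RAMP:
        if day >= start:
            window = days
    return window

def eligible(last_open: Dict[str, Optional[date]], today: date, day: int) -> List[str]:
    """Subscribers allowed into today's send; last_open maps email -> last open date."""
    window = open_window(day)
    if window is None:
        return list(last_open)
    cutoff = today - timedelta(days=window)
    return [email for email, opened in last_open.items()
            if opened is not None and opened >= cutoff]
```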

Post-mortem template

Capture every incident in a shared doc so the team learns from it. Fill in:

INCIDENT: [Date] — [Short description]
DETECTED VIA: [Customer complaint / routine check / alert]
DETECTED AT: [Timestamp]
SEVERITY: [Stalled campaigns / damaged reputation / complaint spike]

ROOT CAUSE: [One sentence — what actually broke]

TIMELINE:
  T+0:   [What happened]
  T+5m:  [What we did]
  T+30m: [What we did]
  T+...: [Resolution]

CONTRIBUTING FACTORS: [What made detection or response slow / impact wider]

PREVENTION: [What we change so it doesn't recur — action items with owners + dates]

LEARNINGS: [Even with no action — what we now know that we didn't before]
