Sending & Deliverability

Deliverability Incident Runbook — From Alert to Recovery in AcelleMail

Bounce rate spiked. Complaint volume jumped. A campaign is queueing and not sending. This is the step-by-step runbook for AcelleMail operators when something breaks — UI checks first, server-side diagnostics last.

May 19, 2026 6 min read Intermediate

The 5-minute triage flow

When something looks broken (campaign stuck / bounce rate spike / customer complaint), don't jump to SSH. Walk this exact flow in the AcelleMail dashboard first. Most incidents resolve here.

Step 1: Read the tracking log

Campaign → Tracking log tab:

[Screenshot: tracking log, per-message audit]

What the rows tell you:

All rows have recent timestamps and "Sent" status: AcelleMail handed the messages to the SMTP server successfully, so the issue is on the recipient side.
Recent timestamps with "Failed" status and a retry counter: receiving servers are rejecting the mail. Open the bounce log next.
No new rows in the last 30 minutes: the queue worker is dead or the campaign is paused. Check the campaign state and the worker (see Advanced).
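
The three readings above can be written down as a small decision function. This is only a sketch: the function name, its two arguments (the dominant status of recent rows and the minutes since the newest row), and the returned labels are hypothetical and are not part of AcelleMail.

```shell
# tracking_verdict STATUS IDLE_MINUTES
#   STATUS        dominant status among recent rows: sent | failed
#   IDLE_MINUTES  minutes since the newest tracking-log row
# Both inputs are read off the Tracking log tab by hand (assumption).
tracking_verdict() {
  status=$1
  idle_min=$2
  if [ "$idle_min" -ge 30 ]; then
    # No new rows for 30 minutes: worker dead or campaign paused
    echo "check-worker-or-paused"
  elif [ "$status" = "failed" ]; then
    echo "open-bounce-log"
  elif [ "$status" = "sent" ]; then
    echo "recipient-side"
  else
    echo "unknown"
  fi
}

tracking_verdict sent 2
```

The idle check comes first on purpose: a stalled log makes the status of older rows irrelevant.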

Step 2: Read the bounce log

Where the bounce signals live

Open the campaign → Bounce log tab. Every failed delivery is listed as a row with the recipient, the bounce type (Hard/Soft chip), and the raw reason from the receiving server:

[Screenshot: campaign bounce log, hard and soft bounces with DSN reason]

The reason text on each row is the receiving server's verbatim response, including its machine-readable DSN status code (5.x.x for permanent failures, 4.x.x for temporary ones). See Decoding bounce messages for the full code reference.

Pattern-match the bounces:

What you see	Diagnosis
Mostly 5.1.1 "User unknown"	Stale list: addresses no longer exist. Normal below 5%, a problem above that.
Mostly 5.7.x "Delivery not authorized"	Authentication is broken (SPF/DKIM/DMARC) or the IP is blocklisted
Mostly 4.x.x "Mailbox busy / try again"	Receiving server is overloaded; AcelleMail retries automatically
ALL bounces to one specific domain	That receiver is blocking you specifically; investigate its feedback loop (FBL) signals
Mix of 5.x + tracking log shows "Pending" rows piling up	Queue worker is not picking up jobs; the worker process is down
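
To pattern-match a large bounce log quickly, the table's code families can be folded into a classifier and tallied. A minimal sketch, assuming you have already pulled the enhanced status codes out of the reason text yourself; the function name and category labels are made up for illustration.

```shell
# classify_dsn CODE
# Maps an enhanced status code (e.g. 5.1.1) to the diagnosis family
# from the table above. Labels are illustrative, not AcelleMail output.
classify_dsn() {
  case "$1" in
    5.1.1) echo "stale-list" ;;
    5.7.*) echo "auth-or-blocklist" ;;
    4.*)   echo "temporary-retry" ;;
    5.*)   echo "other-permanent" ;;
    *)     echo "unrecognized" ;;
  esac
}

# Tally a handful of sample codes to see which family dominates.
for code in 5.1.1 5.1.1 5.7.1 4.2.2; do
  classify_dsn "$code"
done | sort | uniq -c
```

The order of the `case` arms matters: the specific 5.1.1 and 5.7.x arms must come before the catch-all 5.x arm.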

Step 3: Check the feedback log (complaints)

Same campaign → Feedback log tab:

[Screenshot: campaign feedback log, complaints]

Complaint volume tells you:

Spike + concentrated on one segment — that segment's consent was unclear or content was off-target. Pause sending to it; audit.
Spread across the list — content / subject mismatch. Audit the campaign before next send.
Zero complaints despite bounce spike — pure deliverability issue (auth/IP), not content/list.
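
Whether complaints are concentrated or spread out can be estimated from a per-complaint list of segment labels. The sketch below assumes a hypothetical input of one segment label per line with no spaces in the label; how you produce that list from the feedback log is up to you and is not something AcelleMail is known to export in this shape.

```shell
# top_segment_share
# stdin:  one segment label per complaint (hypothetical export format)
# stdout: "<segment> <percent>" for the segment with the most complaints
top_segment_share() {
  awk '
    NF > 0 { count[$1]++; total++ }
    END {
      if (total == 0) { print "none 0"; exit }
      max = 0
      for (seg in count) if (count[seg] > max) { max = count[seg]; top = seg }
      printf "%s %d\n", top, int(100 * max / total)
    }'
}

printf 'trial-users\ntrial-users\ntrial-users\nnewsletter\n' | top_segment_share
# prints: trial-users 75
```

A high share for one segment points at that segment's consent or targeting; a low share across many segments points at the campaign content itself.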

Step 4: Check sending server health

Open Settings → Sending servers → click your active server:

[Screenshot: sending server config with SPF/DKIM/DMARC chips]

Verify all three: SPF green, DKIM green, DMARC green. Any red chip means receiving servers are likely to throttle or reject your mail. Click Verify domain to walk through the DNS records.

Step 5: Run the live diagnostic

Same sending server → toolbar Send test email:

[Screenshot: test send modal]

Send to your own personal Gmail, Outlook, and Yahoo mailboxes, then open each:

All land in the inbox → reputation is OK; the original issue is content-side or list-side.
Some land in the spam folder → reputation hit specific to that mailbox provider (for example Gmail only or Outlook only). Check that provider's postmaster tool.
None arrive → the sending IP is probably blocklisted. Use mxtoolbox.com to identify which blocklist, then follow its delisting process.
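
Blocklist lookups of the kind mxtoolbox.com performs are ordinary DNS queries: the IPv4 octets are reversed and prepended to the blocklist's zone. The sketch below only builds the query name; the function names are illustrative, 192.0.2.10 is a documentation address, and zen.spamhaus.org is used as one well-known example zone.

```shell
# reverse_ipv4 IP  ->  octets in reverse order (prints nothing if not a.b.c.d)
reverse_ipv4() {
  printf '%s\n' "$1" | awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 }'
}

# dnsbl_name IP ZONE  ->  DNS name to query for a listing
dnsbl_name() {
  printf '%s.%s\n' "$(reverse_ipv4 "$1")" "$2"
}

dnsbl_name 192.0.2.10 zen.spamhaus.org
# prints: 10.2.0.192.zen.spamhaus.org

# To run the lookup yourself (needs network access):
#   dig +short "$(dnsbl_name YOUR_SENDING_IP zen.spamhaus.org)"
```

An empty answer generally means the IP is not listed in that zone. A 127.x.x.x answer usually means a listing, but some blocklists also return special 127.x codes for refused or malformed queries (for example when queried through a public resolver), so check the blocklist's own documentation before concluding you are listed.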

Triage decision tree

Signal	Action
Tracking log empty + campaign running >30min	Queue worker dead (escalate to operator)
5.7.x bounces + DKIM red	Re-run Verify domain wizard at sending-server config
4.x.x bounces + nothing else off	Wait — AcelleMail auto-retries within 24h
5.1.1 bounces >10% of recipients	List import error — pause; run email verification on full list
Complaint spike >0.3%	PAUSE remaining sends; audit consent + content; do NOT resume to full list
Test send arrives in Spam at Gmail	Gmail-specific. Check Postmaster Tools.
Test send doesn't arrive at all	IP blocklisted. Check mxtoolbox.com
All sends to one domain fail (e.g. all yahoo.com)	Domain-specific block: pause sending to that domain and investigate via that provider's postmaster signals (Yahoo's, in this example)

Recovery checklist

After identifying root cause:

1. Pause active sending if there is customer-visible impact (campaign log not draining, complaint spike).
2. Fix the source: DNS record, list cleanup, or content edit.
3. Send a small re-engagement campaign to your most-engaged 5% to validate the fix (NOT the full list).
4. Monitor for 24h and confirm that bounce and complaint rates are back to baseline.
5. Resume normal sending if all checks are green.
6. Hold a post-mortem within 48h (template below).

Post-mortem template

For each incident, capture in a shared doc:

INCIDENT: [Date] — [Short description]
DETECTED VIA: [Customer complaint / auto-alert / routine check]
DETECTED AT: [Timestamp]
SEVERITY: [Customer-facing impact: stalled campaigns / damaged reputation / complaint spike]

ROOT CAUSE: [One sentence — what actually broke]

TIMELINE:
  T+0:   [What happened]
  T+5m:  [What we did]
  T+30m: [What we did]
  T+...: [Resolution]

CONTRIBUTING FACTORS: [What made detection slow / response slow / impact wider]

PREVENTION: [What we change to make this not happen again — explicit action items with owners + dates]

LEARNINGS: [Even if no action — what we now know that we didn't before]

Advanced: operator-side server diagnostics + reputation recovery cycles

When dashboard checks point at infrastructure, the operator's checklist:

Worker process health:

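ssh acelle@acellemail.com "
  # Run the artisan commands from the AcelleMail install directory
  # (cd there first if the SSH session starts somewhere else).

  ps aux | grep -E 'queue:work' | grep -v grep | wc -l
  # Expect 1+ per --queue= pool defined in supervisor config

  php artisan queue:size
  # Should drop toward zero; if growing, workers are stuck

  php artisan queue:failed
  # Lists jobs that exhausted their retries; investigate each entry.
  # Read the listing rather than counting lines with wc -l:
  # the command still prints a message when there are no failed jobs.
"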

Restart workers (most common single fix):

ssh acelle@acellemail.com "sudo supervisorctl restart all"

Outbound SMTP smoke from the AcelleMail host:

ssh acelle@acellemail.com "nc -zv email-smtp.us-east-1.amazonaws.com 587"
# Expected: succeeded. Failure = firewall outbound block.

TLS handshake check:

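ssh acelle@acellemail.com "
  openssl s_client -connect smtp.sendgrid.net:587 -starttls smtp -crlf < /dev/null 2>/dev/null \
    | grep -iE '(subject|issuer|verif)'
"
# Case-insensitive match so the capitalized 'Verify return code' line is shown too.
# Expected: certificate subject and issuer, plus Verify return code: 0 (ok).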

DNS check from the AcelleMail host:

ssh acelle@acellemail.com "
  dig TXT yourdomain.com +short | grep -i spf
  dig TXT default._domainkey.yourdomain.com +short
  dig TXT _dmarc.yourdomain.com +short
"

If any query returns nothing, the record has either not propagated yet or is misconfigured.
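
A non-empty answer is not automatically a correct one, so it can help to sanity-check what `dig +short` returned. The sketch below is a rough shape check under stated assumptions: the function name is hypothetical, it checks only the leading tag of SPF and DMARC records and the presence of a `p=` tag in DKIM records, and it does not validate the record contents.

```shell
# check_txt KIND RECORD
#   KIND    spf | dkim | dmarc
#   RECORD  the text returned by dig +short (may include double quotes)
# Prints ok | missing | unexpected. Returns non-zero unless ok.
check_txt() {
  # dig +short wraps TXT data in double quotes; strip them before matching
  clean=$(printf '%s' "$2" | tr -d '"')
  if [ -z "$clean" ]; then
    echo "missing"
    return 1
  fi
  case "$1:$clean" in
    spf:v=spf1*|dmarc:v=DMARC1*|dkim:*p=*) echo "ok" ;;
    *) echo "unexpected"; return 1 ;;
  esac
}

check_txt spf '"v=spf1 include:amazonses.com ~all"'
# prints: ok
```

On the host you would feed it live output, for example `check_txt dmarc "$(dig TXT _dmarc.yourdomain.com +short)"`. An "unexpected" result usually means a different TXT record was matched or the record was published with a typo.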

Reputation recovery cycle (when reputation hit confirmed):

Day 0:  PAUSE all bulk sending. Re-verify domain. Confirm SPF/DKIM/DMARC green.
        Begin sending ONLY to engaged-last-7d segment (highest open rate).
Day 7:  If engagement-segment sends stayed clean, expand to engaged-last-30d.
Day 14: Expand to engaged-last-90d. Daily check of SNDS / Gmail Postmaster.
Day 30: Resume to full segment if reputation back to baseline.

Mild reputation dips recover in 1-2 weeks; severe ones (sustained complaint rate above 1%) take 4-6 weeks or require a fresh IP and a warm-up cycle.
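
If sends are scripted, the schedule above can be encoded so nobody widens the audience too early. A minimal sketch with a hypothetical function name; it returns the widest segment the calendar allows for a given day of the cycle and deliberately ignores the "stayed clean" conditions, which still have to be checked by a person.

```shell
# recovery_segment DAY
# DAY is the number of days since bulk sending was paused (Day 0).
# Prints the widest segment the recovery schedule permits on that day.
recovery_segment() {
  day=$1
  if [ "$day" -lt 7 ]; then
    echo "engaged-last-7d"
  elif [ "$day" -lt 14 ]; then
    echo "engaged-last-30d"
  elif [ "$day" -lt 30 ]; then
    echo "engaged-last-90d"
  else
    echo "full-list"
  fi
}

recovery_segment 10
# prints: engaged-last-30d
```

Treat the result as an upper bound: if the previous stage produced bounces or complaints above baseline, stay on the narrower segment regardless of the day count.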

Post-mortem automation — wire the incident-template to a shared Notion / Confluence page via webhook. Every incident auto-creates a new entry; team fills it in within 48h. Builds institutional knowledge over time.

Pre-incident detection: run a daily audit script that exits non-zero if:

Yesterday's bounce rate > 4%
Yesterday's complaint rate > 0.2%
Any sending server has DKIM/SPF red chip
Queue depth at 8am exceeds 10× the average backlog

Wire to PagerDuty / Slack. Catches creeping problems before customers do.
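
The numeric checks in that list could look roughly like the sketch below. Everything about the inputs is an assumption: the function name is made up, and you would need to supply yesterday's counts and the queue figures from your own database or reporting. The DKIM/SPF chip check is left out because it depends on how you read the sending-server state.

```shell
# audit_check BOUNCED SENT COMPLAINTS QUEUE_DEPTH AVG_BACKLOG
# All arguments are whole numbers supplied by the caller (assumed inputs).
# Prints one line per breached threshold and returns 1 if any was breached.
audit_check() {
  awk -v b="$1" -v s="$2" -v c="$3" -v q="$4" -v avg="$5" 'BEGIN {
    fail = 0
    # Integer cross-multiplication avoids floating-point edge cases
    if (b * 100 > 4 * s)  { print "bounce rate above 4%"; fail = 1 }
    if (c * 1000 > 2 * s) { print "complaint rate above 0.2%"; fail = 1 }
    if (q > 10 * avg)     { print "queue depth above 10x average backlog"; fail = 1 }
    exit fail
  }'
}

# Example: 1% bounces, 0.1% complaints, queue at 2.5x average
audit_check 10 1000 1 50 20 && echo "all clear"
# prints: all clear
```

Run from cron, the non-zero return code is what an alerting wrapper would key on, and the printed lines become the alert text.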

9 comments

hung.nguyen.it 1 month ago

This is the clearest IP warmup schedule I've found. The volume table at the top is what I'm referencing daily

1. admin 1 month ago
  
  appreciate it. if anything in this needs updating, ping us; we revisit articles every few months.

tranminh.devop… 2 months ago

We hit a Spamhaus listing once. Self-service delisting was actually fast (< 24h) but the reputation recovery took weeks. Not the listing itself that hurt, but the user complaints that caused it

1. admin 1 month ago
  
  Great real-world detail. Your point about stale running_pid > 30 min as an alert is something we should add to the diagnostic flow. :)

phuong.mai.hn 3 months ago

Confirming the Postmaster Tools data lag: sometimes 48 hours, sometimes longer. Don't make decisions on a single day's data.

ahmed.hassan.c… 3 months ago

We warmed up a dedicated IP last fall. The 2-week ramp this article describes is on the aggressive side; Gmail in particular punishes anything faster than ~3-4 weeks. We did 4 weeks and had a clean ramp. anyway

lequan.saigon 3 months ago

The Postmaster Tools section is gold. Most senders don't even know it exists.

cmendoza.mx 3 months ago

Does engagement-based segmentation help during warmup? E.g. only sending to the most-engaged 20% during week 1?

1. admin 3 months ago
  
  Good question, and one that comes up often enough we should add an FAQ section. Short answer: yes for the common case; the exception is when you're running custom plugins that override the default behavior...

danrey.dev 3 months ago

If you're warming a new IP after a known issue, consider seeding with transactional mail first (password resets, order confirmations). Higher engagement rate per send than marketing; helps the reputation ramp.

femi.adeyemi 3 months ago

Bookmarked. Going to share with the team, we've been winging warmup and it shows in the numbers...

1. admin 1 month ago (edited)
  
  Thanks for the kind words. We try to keep these source-grounded so they age well...

v.petrova.ru 3 months ago

For very low-volume senders (< 5k/month), does warmup even matter? Or just send and let the provider's shared pool absorb the trickle?

1. admin 3 months ago
  
  we're aware of the silent-bail-out on deleted customers; there's an open issue for it. workaround for now: monitor the campaign:rerun log for absence of expected log lines, alert when silent for > 20 min.

2. admin 3 months ago (edited)
  
  Honest answer: it depends on your provider. SES handles it gracefully; Mailgun is stricter. We'll add a provider-by-provider table in the next revision.

3. admin 1 month ago (edited)
  
  There's no built-in way today. Two workarounds: (1) cron + custom script polling the API every N minutes, (2) webhook-driven if your event source supports it. Most operators go with #2...

4. admin 2 weeks ago (edited)
  
  Short answer: yes. Set the MySQL session variable from your workers .env on boot and you'll get the longer timeout per connection. We'll add an explicit recipe in the next refresh.
