The 5-minute triage flow
When something looks broken (campaign stuck / bounce rate spike / customer complaint), don't jump to SSH. Walk this exact flow in the AcelleMail dashboard first. Most incidents resolve here.
Step 1: Read the tracking log
Open the campaign → Tracking log tab.

What the rows tell you:
- All recent timestamps + "Sent" status: AcelleMail handed the messages to the SMTP server successfully, so the issue is on the recipient side.
- Recent timestamps + "Failed" + retry counter: receiving servers are rejecting the mail. Open the bounce log next.
- No new rows in the last 30 min: the queue worker is dead OR the campaign is paused. Check the campaign state and the worker (see Advanced).
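The three readings above amount to a small decision rule. A minimal sketch, assuming you have noted the newest row's status and its age in minutes by eye; `tracking_next_step` is a hypothetical helper, not an AcelleMail command:

```shell
# Hypothetical helper: map two tracking-log observations to the next triage step.
#   $1 = status of the newest row ("Sent", "Failed", ...)
#   $2 = minutes since the newest row was written
tracking_next_step() {
    status=$1
    minutes_since_last_row=$2
    if [ "$minutes_since_last_row" -ge 30 ]; then
        # No new rows for 30 minutes: worker or campaign state problem.
        echo "check campaign state and queue worker"
    elif [ "$status" = "Failed" ]; then
        echo "open bounce log"
    elif [ "$status" = "Sent" ]; then
        echo "recipient-side issue"
    else
        echo "keep watching"
    fi
}
```

For example, `tracking_next_step Failed 5` prints `open bounce log`.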
Step 2: Read the bounce log
Where the bounce signals live
Open the campaign → Bounce log tab. Every failed delivery is listed as a row with the recipient, the bounce type (Hard/Soft chip), and the raw reason from the receiving server.

The per-row reason text is the receiving server's verbatim response, including any machine-readable DSN status code (5.x.x permanent / 4.x.x temporary). See Decoding bounce messages for the full code reference.
Pattern-match the bounces:
| What you see | Diagnosis |
| --- | --- |
| Mostly 5.1.1 "User unknown" | Stale list: addresses no longer exist. Normal at <5%, problem above. |
| Mostly 5.7.x "Delivery not authorized" | Authentication broken (SPF/DKIM/DMARC) OR IP blocklisted |
| Mostly 4.x.x "Mailbox busy / try again" | Receiving server overloaded; AcelleMail retries automatically |
| ALL bounces to one specific domain | That receiver blocking you specifically; investigate FBL signals |
| Mix of 5.x + tracking log shows "Pending" rows piling up | Queue worker not picking up; worker process down |
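If you copy the status codes out of the bounce log into a text file, the pattern-matching in the table above can be sketched in a few lines of shell. This is an illustration only: `classify_dsn` and `summarize_dsn` are hypothetical helpers, the labels are invented for this example, and the one-code-per-line input is an assumption rather than an AcelleMail export format.

```shell
# Map an enhanced status code (e.g. 5.1.1) to a diagnosis label from the table.
# Labels are illustrative, not AcelleMail output.
classify_dsn() {
    case "$1" in
        5.1.1) echo "stale-list" ;;          # user unknown
        5.7.*) echo "auth-or-blocklist" ;;   # delivery not authorized
        4.*)   echo "transient-retry" ;;     # temporary failure, retried later
        5.*)   echo "other-permanent" ;;
        *)     echo "unknown" ;;
    esac
}

# Read one code per line on stdin and tally the labels, most common first.
summarize_dsn() {
    while IFS= read -r code; do
        classify_dsn "$code"
    done | sort | uniq -c | sort -rn
}
```

Usage: `summarize_dsn < codes.txt` (where `codes.txt` is a file you created by hand) shows which diagnosis dominates.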
Step 3: Check the feedback log (complaints)
Open the same campaign → Feedback log tab.

Complaint volume tells you:
- Spike + concentrated on one segment — that segment's consent was unclear or content was off-target. Pause sending to it; audit.
- Spread across the list — content / subject mismatch. Audit the campaign before next send.
- Zero complaints despite bounce spike — pure deliverability issue (auth/IP), not content/list.
Step 4: Check sending server health
Open Settings → Sending servers and click your active server.

Verify all three: SPF green, DKIM green, DMARC green. A red chip means receiving servers may throttle or reject your mail. Click Verify domain to walk through the DNS records.
Step 5: Run the live diagnostic
On the same sending server, use Send test email in the toolbar.

Send to your own personal Gmail, Outlook, and Yahoo accounts, then open each:
- All in Inbox → reputation is OK; the original issue is content-side or list-side
- Some in the spam folder → ISP-specific reputation hit (for example Gmail or Outlook only). Check that ISP's postmaster tool.
- None arrive → the sending IP is probably blocklisted. Use mxtoolbox.com to identify which blocklist, then follow its delisting process.
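Blocklist checkers such as mxtoolbox.com work by querying DNS-based blocklists: the sending IP's octets are reversed and prepended to the blocklist's zone. A rough sketch of that lookup for IPv4 follows; the function names are hypothetical, and `zen.spamhaus.org` is used only as an example zone.

```shell
# Reverse the octets of an IPv4 address, as DNSBL lookups require.
reverse_ipv4() {
    echo "$1" | awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 }'
}

# Build the DNS name to query for a given IP and blocklist zone.
dnsbl_query_name() {
    echo "$(reverse_ipv4 "$1").$2"
}

# Print the blocklist's answer for an IP (needs dig and network access).
# An empty answer means "not listed"; read any 127.x.x.x answer against the
# blocklist operator's documentation before concluding the IP is listed.
dnsbl_lookup() {
    dig +short A "$(dnsbl_query_name "$1" "$2")"
}
```

Usage: `dnsbl_lookup 192.0.2.10 zen.spamhaus.org`, substituting your sending IP. Some operators return error codes rather than listings when queried through large public resolvers, so treat the result as a hint and confirm on the operator's own lookup page.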
Triage decision tree
| Signal | Action |
| --- | --- |
| Tracking log empty + campaign running >30min | Queue worker dead (escalate to operator) |
| 5.7.x bounces + DKIM red | Re-run Verify domain wizard at sending-server config |
| 4.x.x bounces + nothing else off | Wait: AcelleMail auto-retries within 24h |
| 5.1.1 bounces >10% of recipients | List import error: pause; run email verification on full list |
| Complaint spike >0.3% | PAUSE remaining sends; audit consent + content; do NOT resume to full list |
| Test send arrives in Spam at Gmail | Gmail-specific. Check Postmaster Tools. |
| Test send doesn't arrive at all | IP blocklisted. Check mxtoolbox.com |
| All to one domain fail (e.g. all yahoo.com) | Domain-specific block: pause sending to that domain; investigate via Y!Mail postmaster signals |
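Two rows in the table above depend on percentages (5.1.1 bounces above 10%, complaints above 0.3%). Since plain shell arithmetic is integer-only, a sketch like the following delegates the comparison to `awk`. The helper names are hypothetical, and the counts are assumed to be read off the campaign's logs by hand.

```shell
# rate_pct <count> <total>: print count/total as a percentage, two decimals.
rate_pct() {
    awk -v c="$1" -v t="$2" 'BEGIN {
        if (t == 0) print "0.00"; else printf "%.2f\n", 100 * c / t
    }'
}

# rate_exceeds <count> <total> <threshold_pct>:
# exit 0 when the percentage is strictly above the threshold, 1 otherwise.
rate_exceeds() {
    awk -v c="$1" -v t="$2" -v th="$3" 'BEGIN {
        exit !(t > 0 && 100 * c / t > th)
    }'
}
```

Usage: `rate_exceeds 42 10000 0.3 && echo "PAUSE remaining sends"` flags 42 complaints out of 10,000 deliveries (0.42%).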
Recovery checklist
After identifying root cause:
- Pause active sending if customer-visible impact (campaign log not draining, complaints spike)
- Fix the source — DNS record, list cleanup, content edit
- Send a small re-engagement to your most-engaged 5% to validate (NOT full list)
- Monitor for 24h — bounce + complaint rates back to baseline?
- Resume normal sending if all green
- Post-mortem within 48h (template below)
Post-mortem template
For each incident, capture in a shared doc:
INCIDENT: [Date] — [Short description]
DETECTED VIA: [Customer complaint / auto-alert / routine check]
DETECTED AT: [Timestamp]
SEVERITY: [Customer-facing impact: stalled campaigns / damaged reputation / complaint spike]
ROOT CAUSE: [One sentence — what actually broke]
TIMELINE:
T+0: [What happened]
T+5m: [What we did]
T+30m: [What we did]
T+...: [Resolution]
CONTRIBUTING FACTORS: [What made detection slow / response slow / impact wider]
PREVENTION: [What we change to make this not happen again — explicit action items with owners + dates]
LEARNINGS: [Even if no action — what we now know that we didn't before]
Advanced: operator-side server diagnostics + reputation recovery cycles
When dashboard checks point at infrastructure, the operator's checklist:
Worker process health:
ssh acelle@acellemail.com "
ps aux | grep -E 'queue:work' | grep -v grep | wc -l
# Expect 1+ per --queue= pool defined in supervisor config
php artisan queue:size
# Should drop toward zero; if growing, workers are stuck
php artisan queue:failed
# Any rows listed are jobs that gave up; investigate each one.
# (Do not count lines with wc -l: the command prints a notice even when nothing failed.)
"
Restart workers (most common single fix):
ssh acelle@acellemail.com "sudo supervisorctl restart all"
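Before or after a restart, it can help to see which supervised programs are not in the RUNNING state. The sketch below filters `supervisorctl status` output, which lists one program per line with its name in the first column and its state in the second. `not_running` is a hypothetical helper, and the program names in your supervisor config will differ.

```shell
# Read `supervisorctl status` output on stdin and print every program whose
# state is not RUNNING. Exit status is 1 if any such program was found.
not_running() {
    awk 'NF >= 2 && $2 != "RUNNING" { print $1 ": " $2; bad = 1 }
         END { exit bad + 0 }'
}
```

Usage: `ssh acelle@acellemail.com "sudo supervisorctl status" | not_running`. No output and exit status 0 mean every listed program is running.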
Outbound SMTP smoke test from the AcelleMail host:
ssh acelle@acellemail.com "nc -zv email-smtp.us-east-1.amazonaws.com 587"
# Expected: succeeded. Failure = firewall outbound block.
TLS handshake check:
ssh acelle@acellemail.com "
openssl s_client -connect smtp.sendgrid.net:587 -starttls smtp -crlf < /dev/null 2>/dev/null \
| grep -iE '(subject|issuer|verify)'
"
DNS check from the AcelleMail host:
ssh acelle@acellemail.com "
dig TXT yourdomain.com +short | grep -i spf
dig TXT default._domainkey.yourdomain.com +short
dig TXT _dmarc.yourdomain.com +short
"
If any query returns empty, either the DNS change has not propagated yet or the record is misconfigured.
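A non-empty answer is not necessarily a valid record. As a rough sanity check, the sketch below tests whether a TXT value at least starts with the expected version tag (`v=spf1`, `v=DMARC1`) or, for DKIM, carries a `p=` public-key tag. `check_txt` is a hypothetical helper; it does not validate the policy itself, and the tag match is case-sensitive.

```shell
# check_txt <kind> <record-text>: exit 0 if the text looks like that record kind.
# kind is one of: spf, dkim, dmarc
check_txt() {
    kind=$1
    # dig +short wraps TXT data in double quotes; strip them before matching.
    value=$(printf '%s' "$2" | tr -d '"')
    case "$kind" in
        spf)   case "$value" in "v=spf1"*)   return 0 ;; esac ;;
        dkim)  case "$value" in *"p="*)      return 0 ;; esac ;;
        dmarc) case "$value" in "v=DMARC1"*) return 0 ;; esac ;;
    esac
    return 1
}
```

Usage: `check_txt dmarc "$(dig TXT _dmarc.yourdomain.com +short)" || echo "DMARC record missing or malformed"`.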
Reputation recovery cycle (when reputation hit confirmed):
Day 0: PAUSE all bulk sending. Re-verify domain. Confirm SPF/DKIM/DMARC green. Begin sending ONLY to engaged-last-7d segment (highest open rate).
Day 7: If engagement-segment sends stayed clean, expand to engaged-last-30d.
Day 14: Expand to engaged-last-90d. Daily check of SNDS / Gmail Postmaster.
Day 30: Resume to full segment if reputation back to baseline.
Mild reputation dips typically recover in 1-2 weeks; severe ones (>1% sustained complaint rate) take 4-6 weeks or require a fresh IP plus a warm-up cycle.
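The day-by-day schedule above can be written down as a lookup, which is handy if the segment choice is scripted. This is only a sketch: `recovery_segment` is a hypothetical helper, the segment names are taken from the schedule, and the function returns the widest segment the calendar allows without checking that earlier sends stayed clean.

```shell
# recovery_segment <days_since_pause>: print the widest segment the
# schedule permits on that day, assuming every earlier stage stayed clean.
recovery_segment() {
    day=$1
    if   [ "$day" -ge 30 ]; then echo "full-list"
    elif [ "$day" -ge 14 ]; then echo "engaged-last-90d"
    elif [ "$day" -ge 7 ];  then echo "engaged-last-30d"
    else                         echo "engaged-last-7d"
    fi
}
```

For example, `recovery_segment 10` prints `engaged-last-30d`.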
Post-mortem automation — wire the incident-template to a shared Notion / Confluence page via webhook. Every incident auto-creates a new entry; team fills it in within 48h. Builds institutional knowledge over time.
Pre-incident detection: run a daily audit script that exits non-zero if any of these holds:
- Yesterday's bounce rate > 4%
- Yesterday's complaint rate > 0.2%
- Any sending server has DKIM/SPF red chip
- Queue depth at 8am exceeds 10× the average backlog
Wire to PagerDuty / Slack. Catches creeping problems before customers do.
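A minimal sketch of the threshold logic for such an audit script is below. It assumes the five inputs have already been collected by whatever reporting you have; how to obtain them from AcelleMail is not covered here, and `audit` is a hypothetical function name.

```shell
# audit <bounce_pct> <complaint_pct> <red_chip_count> <queue_depth> <avg_backlog>
# Prints one line per breached threshold; exits 1 if any check failed, else 0.
audit() {
    awk -v b="$1" -v c="$2" -v red="$3" -v q="$4" -v avg="$5" 'BEGIN {
        fail = 0
        if (b > 4)        { print "bounce rate " b "% > 4%"; fail = 1 }
        if (c > 0.2)      { print "complaint rate " c "% > 0.2%"; fail = 1 }
        if (red > 0)      { print red " sending server(s) with a red DKIM/SPF chip"; fail = 1 }
        if (q > 10 * avg) { print "queue depth " q " > 10x average backlog " avg; fail = 1 }
        exit fail
    }'
}
```

Usage: `audit 1.2 0.05 0 120 50 || echo "audit failed"`. Run from cron, the non-zero exit and the printed reasons can be handed to whichever notifier you use.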