Email Delivery Issues — UI Diagnostic Checklist

When messages don't arrive (late, bounced, or missing entirely), the AcelleMail dashboard surfaces nine signals that tell you whether the issue is sending-side or recipient-side. Walk the checklist below; most cases can be fixed from the UI.

The 60-second dashboard check

Before assuming the worst, open AcelleMail and walk these four screens. About 80% of delivery cases resolve from the UI without involving the server operator.

1. Look at the campaign's tracking log

Open the campaign → Tracking log tab. Every send attempt is here with status + timestamp:

[Screenshot: Tracking log, per-message audit]

What the rows tell you:

  • All recent timestamps, all "Delivered" → AcelleMail sent successfully. The issue is recipient-side (spam folder, blocked domain). Skip to step 4 (Run the live diagnostic test).
  • Recent timestamps, mix of "Bounced" + "Deferred" → receiving servers are slow or rejecting. Open the bounce log next.
  • Tracking log is empty / old → the campaign never actually sent. See Campaign stuck in Sending.
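
To apply the three cases above to a large campaign, it can help to tally statuses instead of scrolling. The sketch below assumes you have the log as CSV lines in the form timestamp,recipient,status. That layout is a hypothetical example, not a documented AcelleMail export, and `tally_status` is an illustrative helper name.

```shell
# tally_status reads CSV lines "timestamp,recipient,status" on stdin
# (assumed layout) and prints one "status count" line per status.
tally_status() {
  awk -F, '{ count[$3]++ } END { for (s in count) print s, count[s] }' | sort
}

# Made-up sample rows for illustration.
printf '%s\n' \
  '2024-05-01 10:00,a@example.com,Delivered' \
  '2024-05-01 10:00,b@example.com,Bounced' \
  '2024-05-01 10:01,c@example.com,Delivered' | tally_status
```

All "Delivered" points at the recipient side, a mix of "Bounced" and "Deferred" points at the bounce log, and no rows at all means the campaign never sent.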

2. Read the bounce log

Same campaign → Bounces tab. The DSN code column tells you why each message failed:

[Screenshot: Bounce log, mixed hard/soft bounces]

  • Most bounces are "5.1.1 User unknown" → stale list, addresses no longer exist. Normal at <5%. Run Subscribers → Email verification before next send.
  • Most bounces are "5.7.x" (delivery not authorized) → your sending IP or domain is being blocked. Move to step 3.
  • Most bounces are "4.x.x" (temporary) → AcelleMail retries automatically. Wait — no action needed.

See Decoding bounce messages for the full DSN code reference.
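
The three bounce cases above amount to a small lookup on the DSN code. A minimal sketch, assuming you have the codes as plain strings such as 5.1.1; `classify_dsn` is a hypothetical helper, not part of AcelleMail.

```shell
# classify_dsn CODE prints the next step for an enhanced status code,
# following the three cases in the bounce checklist.
classify_dsn() {
  case "$1" in
    5.1.1) echo "stale-address: run email verification" ;;
    5.7.*) echo "blocked: check sending server health" ;;
    4.*)   echo "temporary: wait for automatic retry" ;;
    *)     echo "other: see the DSN code reference" ;;
  esac
}

classify_dsn 5.7.1
classify_dsn 4.2.2
```

Any code that falls outside these three groups is left for the full DSN reference rather than guessed at.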

3. Check sending server health

Settings → Sending servers → click the active server. The detail page shows the three authentication signals:

[Screenshot: Sending server config, auth chips]

What to verify:

  • SPF, DKIM, DMARC chips all green — receiving servers can confirm you're authorized to send.
  • Daily quota not exhausted — if you have already sent your daily limit, the remaining messages queue until tomorrow.
  • Bounce + complaint rates < 5% — sustained higher rates throttle your IP.

If any chip is red, click Verify domain → AcelleMail walks you through the DNS records to add at your DNS host.
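
The 5% threshold in the list above is easy to check by hand if you have the raw counts. The sketch below shows the arithmetic; `rate_pct` and `over_threshold` are hypothetical helper names, and the counts are made up.

```shell
# rate_pct FAILED SENT prints the failure rate as a percentage (one decimal).
rate_pct() {
  awk -v failed="$1" -v sent="$2" 'BEGIN {
    if (sent == 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * failed / sent
  }'
}

# over_threshold FAILED SENT [LIMIT] exits 0 when the rate is at or above
# LIMIT percent (default 5), and non-zero otherwise.
over_threshold() {
  awk -v failed="$1" -v sent="$2" -v limit="${3:-5}" 'BEGIN {
    exit !(sent > 0 && 100 * failed / sent >= limit)
  }'
}

rate_pct 42 1200
if over_threshold 42 1200; then echo "above limit"; else echo "within limit"; fi
```

With 42 failures out of 1200 sends the rate is 3.5%, which stays inside the limit named in the checklist.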

4. Run the live diagnostic test

Same sending server detail → Test button. Sends a single message to an address you specify:

[Screenshot: Sending server test diagnostic]

Send to your own personal email. Three outcomes:

  • Arrives in Inbox → sending setup is OK. The original issue is content-side (spam-trigger words) or recipient-side (their filter). See Why emails go to spam.
  • Arrives in Spam → authentication or reputation issue. Open the message → Show original (Gmail) → check Authentication-Results header.
  • Doesn't arrive at all → the sending IP is probably blocked outright. Look up the sending IP with the blocklist check at mxtoolbox.com.
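
For the "Arrives in Spam" case, the Authentication-Results header can be long. A minimal sketch that pulls out just the SPF, DKIM and DMARC verdicts from a saved copy of the message source; `auth_results` is a hypothetical helper, and the sample header uses placeholder domains and addresses.

```shell
# auth_results reads raw message source on stdin and prints every
# spf=, dkim= and dmarc= verdict it finds, lowercased and de-duplicated.
# It scans the whole input, so pass only the headers to avoid body matches.
auth_results() {
  grep -oiE '(spf|dkim|dmarc)=[a-z]+' | tr 'A-Z' 'a-z' | sort -u
}

# Illustrative header only.
printf '%s\n' \
  'Authentication-Results: mx.google.com;' \
  '       dkim=pass header.i=@example.com header.s=default;' \
  '       spf=pass (domain of bounce@example.com designates 192.0.2.10 as permitted sender) smtp.mailfrom=bounce@example.com;' \
  '       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=example.com' | auth_results
```

Any verdict other than pass is the one to chase on the sending server's Verify domain screen.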

Common UI fix paths

| Signal in dashboard | Likely cause | What to do |
|---|---|---|
| Tracking log empty hours after launch | Queue worker idle | Operator: restart workers (see Advanced) |
| Most bounces "5.1.1" (User unknown) | Stale list | Run email verification, prune addresses |
| Most bounces "5.7.x" (Delivery not authorized) | IP/domain blocked | Check sending server Verify domain; if SPF/DKIM/DMARC all green, IP blocklist issue |
| SPF or DKIM chip red on sending server | DNS records missing | Click Verify domain → follow wizard → update DNS at registrar |
| Test diagnostic doesn't arrive | IP on blocklist | mxtoolbox.com lookup → delisting request per blocklist's process |
| Sending server quota hit | Daily limit exceeded | Wait for reset OR provision additional sending server in pool |
| All to ONE domain failing (e.g. @example.com only) | That receiver is blocking you specifically | Investigate FBL complaints from that domain; pause sends to it until reputation recovers |
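
To spot the single-domain case in the last row, group bounces by recipient domain. The sketch assumes CSV lines in the form timestamp,recipient,status, which is a hypothetical layout rather than a documented AcelleMail export; `bounces_by_domain` is an illustrative name.

```shell
# bounces_by_domain reads CSV lines "timestamp,recipient,status" on stdin
# (assumed layout) and prints "count domain" for bounced rows, highest first.
bounces_by_domain() {
  awk -F, '$3 == "Bounced" {
    split($2, parts, "@")
    count[tolower(parts[2])]++
  }
  END { for (d in count) print count[d], d }' | sort -rn
}

# Made-up sample rows for illustration.
printf '%s\n' \
  't1,a@example.com,Bounced' \
  't2,b@example.com,Bounced' \
  't3,c@example.org,Bounced' \
  't4,d@example.com,Bounced' \
  't5,e@example.net,Delivered' | bounces_by_domain
```

One domain dominating the top of the list suggests a receiver-specific block rather than a general reputation problem.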

When to escalate to the operator

The dashboard exposes everything most users need. Escalate to the server operator when:

  • Tracking log is empty AND the campaign should have started >30 minutes ago (queue worker dead)
  • Test diagnostic doesn't arrive AND the IP is verified clean on blocklists (deeper SMTP / DNS / firewall issue)
  • Daily quota is large enough but campaigns still stop mid-send (PHP-FPM worker exhaustion, database lock)

Advanced: server-side checks for the operator

When dashboard checks point at the worker or server-side delivery, the operator's checklist:

Worker health:

ps aux | grep "queue:work" | grep -v grep
# Expect at least one process for each --queue= pool the supervisor config defines.

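php artisan queue:size
# Expected: run it a few times; the number should drop toward zero as the worker processes the backlog.

php artisan queue:failed | wc -l
# Rough count only: wc -l counts every output line, including table borders or a "no failed jobs" notice.
# To see the actual jobs that gave up after their tries:
php artisan queue:failed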
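
The ps check above can be turned into a pass/fail test for monitoring. A minimal sketch, assuming you know how many workers your supervisor config should keep running; `check_workers` is a hypothetical helper.

```shell
# check_workers EXPECTED reads `ps aux` output on stdin and compares the
# number of queue:work processes with EXPECTED. Returns non-zero on a shortfall.
check_workers() {
  expected="$1"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  running=$(grep -c '[q]ueue:work' || true)
  if [ "$running" -ge "$expected" ]; then
    echo "ok: $running worker(s)"
  else
    echo "ALERT: $running of $expected worker(s) running"
    return 1
  fi
}

# Usage on a live host: ps aux | check_workers 2
```

Reading from stdin keeps the function usable on saved ps output as well as on the live process list.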

Sending server connection test from the CLI:

# The single quotes already stop the shell from expanding $server, so the PHP variables take no backslash.
php artisan tinker --execute='
  $server = \App\Model\SendingServer::where("status","active")->first();
  echo "Driver: " . $server->type . "\n";
  echo "Host: " . $server->host . "\n";
  echo "Test result: " . json_encode($server->test()) . "\n";
'

SMTP smoke (raw connectivity from the AcelleMail host to the SMTP gateway):

# Replace with your sending server's host + port.
nc -zv email-smtp.us-east-1.amazonaws.com 587
# Expected: Connection succeeded. Failure = firewall blocking outbound SMTP.

TLS handshake verification:

# Replace with your sending server's host + port.
openssl s_client -connect smtp.sendgrid.net:587 -starttls smtp -crlf < /dev/null \
  2>/dev/null | grep -iE "(subject|issuer|verify)"
# Expected: valid cert chain and "Verify return code: 0 (ok)". Self-signed or expired = TLS rejection.

DNS record check (run from the AcelleMail host to verify the records that receivers actually see):

dig TXT yourdomain.com +short | grep -i "v=spf1"
# Replace "default" with the DKIM selector your sending server uses.
dig TXT default._domainkey.yourdomain.com +short
dig TXT _dmarc.yourdomain.com +short

If any of those return empty, the record is not visible from this host: either it has not propagated yet or it was never published. Wait up to 24h after publishing; if it is still empty, the DNS host hasn't accepted the record.
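
A non-empty answer is not always the right record, since a domain can carry several unrelated TXT entries. The sketch below checks the version tag of a returned value; `record_kind` is a hypothetical helper and the sample values are illustrative.

```shell
# record_kind VALUE prints spf, dkim, dmarc or unknown for one TXT record
# value, based on its version tag. Surrounding quotes from dig are fine.
record_kind() {
  case "$1" in
    *v=spf1*)   echo spf ;;
    *v=DKIM1*)  echo dkim ;;
    *v=DMARC1*) echo dmarc ;;
    *)          echo unknown ;;
  esac
}

record_kind '"v=spf1 include:amazonses.com ~all"'
record_kind '"v=DMARC1; p=none"'

# Usage on a live host:
#   dig TXT _dmarc.yourdomain.com +short | while read -r line; do record_kind "$line"; done
```

Note that the v= tag is optional in DKIM key records, so an "unknown" result for a DKIM lookup deserves a manual look before you treat it as missing.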

Log triage:

tail -200 /home/acelle/domains/acellemail.com/storage/logs/laravel.log | grep -iE "(send|smtp|fail|reject)"
tail -200 /var/log/mail.log    # if MTA = local Postfix
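
When the grep above returns a lot of lines, a per-keyword count gives a quicker first read. A minimal sketch; `log_summary` is a hypothetical helper and the sample log lines are made up, not real AcelleMail output.

```shell
# log_summary reads log lines on stdin and prints how many lines mention
# each keyword (case-insensitive). A line can count toward several keywords.
log_summary() {
  awk '{
    line = tolower($0)
    if (line ~ /reject/) n["reject"]++
    if (line ~ /fail/)   n["fail"]++
    if (line ~ /smtp/)   n["smtp"]++
  }
  END { printf "reject=%d fail=%d smtp=%d\n", n["reject"], n["fail"], n["smtp"] }'
}

# Made-up sample lines for illustration.
printf '%s\n' \
  '[2024-05-01 10:00:00] production.ERROR: SMTP connection failed' \
  '[2024-05-01 10:00:05] production.WARNING: recipient rejected by remote host' \
  '[2024-05-01 10:00:09] production.INFO: campaign started' | log_summary
```

On a live host the same function can take the tail output directly, for example by piping the laravel.log tail into it.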

Sending server queue lag — for high-volume installs, the per-server queue depth tells you which sending server is the bottleneck:

# As above, the PHP variables need no backslash inside single quotes.
php artisan tinker --execute='
  foreach (\App\Model\SendingServer::where("status","active")->get() as $s) {
    echo $s->name . ": " . $s->getCurrentDailySendingQuotaUsage() . " / " . $s->daily_quota . " today\n";
  }
'

A server at 100% quota is the cause of queue spillover into the next day.
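
With many sending servers, the exhausted ones are easier to spot as percentages. The sketch below parses lines in the "name: used / quota today" shape that the snippet above prints; `quota_report` is a hypothetical helper and the server names are made up.

```shell
# quota_report reads "name: used / quota today" lines on stdin and prints
# each server's usage as a percentage, flagging those at or above 100%.
# Lines with a zero or negative quota are skipped.
quota_report() {
  awk 'NF >= 4 && $NF == "today" && $(NF-2) == "/" {
    used = $(NF-3); quota = $(NF-1)
    if (quota <= 0) next
    pct = 100 * used / quota
    name = $0
    sub(/: [0-9]+ \/ [0-9]+ today$/, "", name)
    printf "%s %.0f%%%s\n", name, pct, (pct >= 100 ? " EXHAUSTED" : "")
  }'
}

# Made-up sample lines for illustration.
printf '%s\n' \
  'SES primary: 50000 / 50000 today' \
  'Mailgun backup: 1200 / 10000 today' | quota_report
```

Fields are read from the end of the line so that server names containing spaces still parse.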

18 comments

  1. hung.nguyen.it
    Confirming the campaign:rerun auto-fix actually works. We had a worker OOM mid-batch last week, walked away thinking we'd need to manually intervene, came back in 15 minutes and it had recovered itself
  2. tnovak.cz
    Thanks for grounding this in actual source — much better than the generic Laravel advice you find on Stack Overflow.
  3. bos.devops
    When you say to bump wait_timeout to 86400, does that need a MySQL restart or is it dynamic? We're on RDS so restarts are expensive.
  4. m.schmidt78
    if you're on Ubuntu 22.04 with the default cron package, the time zone is UTC even if /etc/timezone says otherwise. Bit us once — the scheduled job timing was 7 hours off.
  5. y.yamamoto
    Cause #2 (dead supervisor) hit us after a kernel-upgrade reboot. The systemctl enable bit was missing. Took 2 hours to figure out because nothing was logging.
    1. admin
      Thanks for the breakdown. Saving for our customer-success team's reference library.
  6. v.petrova.ru
    Cause #2 (dead supervisor) hit us after a kernel-upgrade reboot. The systemctl enable bit was missing. Took 2 hours to figure out because nothing was logging.
  7. cmendoza.mx
    What about the case where campaign:rerun itself crashes silently? We had cron running but :rerun was failing on a deleted customer and just bailing out. No alerts
    1. admin
      currently a manual step. there's a feature request tracking it on the repo if you want to +1.
    2. admin (edited)
      yes, that pattern is supported. the undocumented bit is the order — config:cache must come after the migration, not before. updating the docs to make that explicit.
  8. sofia.costa.pt
    This article saved me about 4 hours of debugging today. The diagnostic order at the top is exactly the workflow I needed.
  9. jmorrison.itop…
    Adding to this: we had a campaign stuck for 6 hours one time. Turned out the running_pid was alive but the worker was deadlocked on a slow MySQL query. ps showed it as running, kill -9 was the only fix. Now we monitor for stale running_pid > 30 min...
  10. emma.whitaker
    Bookmarking this. Wish I had it last month when our queue backed up on a Sunday night.
    1. admin
      Appreciate it. If anything in this needs updating, ping us — we revisit articles every few months.
  11. femi.adeyemi
    sent this to my whole ops team. should be required reading before anyone touches the prod queue.
  12. joel.anders.se
    Is there a way to detect cause #6 (lost DB connection) before workers wedge? Looking for a heartbeat metric to alert on
    1. admin
      we tested this with up to 1m subscribers on a $40/mo vps. past that you start needing query optimization. below that, the defaults are fine.
  13. priya.iyer.ops
    We hit cause #5 last quarter — SES sandbox limits we didn't know about. The 'wait it out' advice is right. We tried aggressive retries first and it just made things worse