Scaling AcelleMail for 100K+ Emails Per Day

At 100K sends/day you cross the threshold where the default AcelleMail config starts to drag — queue depths climb, MySQL slow queries appear, occasional PHP-FPM 502s. This guide walks the bottleneck math, the configs to tune (queue pool size, MySQL InnoDB, PHP-FPM, nginx, Redis), when to split components onto separate hosts, and the cost/benefit at each step.

What this is for

At 100K sends/day, the default AcelleMail config starts to drag. You'll notice:

  • Queue depths that don't drain (1k+ jobs queued even when "nothing is happening")
  • MySQL slow query log filling up
  • Occasional 502 Bad Gateway when admin pages load slowly
  • Workers getting OOM-killed during list imports
  • Customers complaining about "sending taking forever"

None of these are individually catastrophic, but each one compounds. 100K/day is the threshold where one well-tuned box starts to struggle — and where 30 minutes of targeted tuning gets you another 5× headroom. This guide walks the math, the configs, and the architectural decisions, in the order you should hit them.

Step 0 — Do the bottleneck math

100K sends/day = ~70 sends/minute if perfectly even. In reality, you get bursts — a 9 AM Tuesday campaign blast might be 100k sends in the first 20 minutes = 5,000 sends/minute peak.

Each "send" is one job pulled from Redis → one HTTP call to the sending provider (SES/Mailgun/SendGrid) → one row inserted into email_log. A healthy worker handles ~20-50 sends/minute (limited by sending-provider API latency, ~50-200ms per call).

So to handle 5,000 sends/minute peak, you need:

5,000 sends/min ÷ 30 sends/min/worker = ~170 worker-slots needed at peak

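That's significantly above the default 15-worker pool, which at ~30 sends/min/worker drains roughly 450 sends/minute. Either accept that bursts will queue up and drain over time (often fine), or scale the worker pool.

The right answer depends on your customer expectations. For most use cases, a campaign burst that takes a while to drain is acceptable, and the default 15 + auto-scale-to-20 via queue:adjust handles it. Keep in mind that drain time scales with burst size: by the per-worker figure above, a 5-10 minute drain corresponds to a backlog of a few thousand sends, while a full 100k backlog takes hours on the default pool. For latency-sensitive transactional sends (welcome emails, password resets), bump the pool higher.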
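The arithmetic in this step can be sketched as two small shell functions, so you can plug in your own peak and per-worker rates. This is a minimal sketch: the function names are illustrative and not part of AcelleMail, and the 30 sends/min/worker figure is the midpoint assumed above.

```shell
# Worker-slot and drain-time arithmetic from Step 0.
# All rates are sends per minute; results are rounded up to whole units.

# workers_needed PEAK_PER_MIN PER_WORKER_PER_MIN
workers_needed() {
  local peak=$1 per_worker=$2
  echo $(( (peak + per_worker - 1) / per_worker ))
}

# drain_minutes BACKLOG WORKERS PER_WORKER_PER_MIN
drain_minutes() {
  local backlog=$1 workers=$2 per_worker=$3
  local rate=$(( workers * per_worker ))   # pool-wide sends per minute
  echo $(( (backlog + rate - 1) / rate ))
}

workers_needed 5000 30     # worker slots for the 5,000 sends/min peak
drain_minutes 4500 15 30   # minutes for 15 workers to clear a 4,500-job backlog
```

Swap in the backlog and pool size you actually observe; the result is only as good as the per-worker rate you measure on your own provider.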

Step 1 — Queue worker pool sizing

Per the supervisor setup guide, the two-tier pool default is 2 master + 15 worker = 17 total at Medium tier (4 vCPU / 8 GB).

For 100k+/day, the right starting point is Large tier (8 vCPU / 16 GB) with 4 + 30 workers:

# /etc/supervisor/conf.d/acellemail-master.conf
numprocs=4   ; was 2

# /etc/supervisor/conf.d/acellemail-worker.conf
numprocs=30  ; was 15

# Apply the change, then check the process count:
sudo supervisorctl reread && sudo supervisorctl update
sudo supervisorctl status
# Should now show 4 + 30 = 34 RUNNING processes

Memory math: each worker peaks ~256-512 MB. 30 × 384 MB ≈ 11.5 GB just for workers. Add MySQL (3 GB), Redis (1 GB), PHP-FPM web (1 GB), nginx (negligible), and the OS (1 GB), and the worst case is ~17 GB, slightly over a 16 GB box. Workers rarely all peak at the same moment, so this usually fits in practice, but if you're tight, scale to 24 GB.
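To redo this budget for a different pool size, a small helper along these lines may be useful. It is a sketch only: the function name is made up for illustration, and the per-component figures are the estimates quoted above, expressed in MB.

```shell
# mem_budget_mb WORKERS PER_WORKER_MB [OTHER_COMPONENT_MB...]
# Sums the worker pool plus any other fixed components, all in MB.
mem_budget_mb() {
  local workers=$1 per_worker=$2
  shift 2
  local total=$(( workers * per_worker ))
  local other
  for other in "$@"; do
    total=$(( total + other ))
  done
  echo "$total"
}

# 30 workers at 384 MB, plus MySQL 3 GB, Redis 1 GB, PHP-FPM 1 GB, OS 1 GB
mem_budget_mb 30 384 3072 1024 1024 1024
```

Compare the result against the RAM of the tier you are considering and leave some margin for page cache.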

Verify queue drain rate is acceptable:

# Trigger a 10k send-test campaign, then watch the drain:
watch -n 2 'redis-cli llen queues:batch; redis-cli llen queues:high'

You want both to be back to near-zero within 5 minutes.
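If you would rather have a number than watch the counter, the drain rate can be estimated from two depth samples. The helper below is a hypothetical sketch (its name is not an AcelleMail or Redis command), and it reports net change, so jobs that are still being enqueued during the sample will make the rate look lower than the workers' true throughput.

```shell
# drain_rate_per_min DEPTH_BEFORE DEPTH_AFTER SECONDS_BETWEEN
# Net jobs removed from the queue per minute between two samples.
drain_rate_per_min() {
  local before=$1 after=$2 secs=$3
  echo $(( (before - after) * 60 / secs ))
}

# Example with live samples (queue name taken from the watch command above):
#   a=$(redis-cli llen queues:batch); sleep 30
#   b=$(redis-cli llen queues:batch)
#   drain_rate_per_min "$a" "$b" 30
```

Dividing the remaining depth by this rate gives a rough time-to-empty for the current burst.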

Step 2 — MySQL tuning

Edit /etc/mysql/mysql.conf.d/mysqld.cnf:

[mysqld]
# Cache hot indexes + data in RAM. Set to 50-75% of system RAM if MySQL is alone.
# For a co-located AcelleMail + MySQL on 16 GB, 4-6 GB is a good baseline.
innodb_buffer_pool_size = 4G

# Larger redo logs = fewer flushes under heavy write load (campaign blast)
# On MySQL 8.0.30+ this variable is deprecated; set innodb_redo_log_capacity instead
innodb_log_file_size = 512M

# Allow more concurrent connections (30 workers + web + admin + cron)
max_connections = 300

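# Query cache: deprecated in MySQL 5.7.20 and removed in MySQL 8.0.
# Set these only on MySQL 5.7 or MariaDB. On MySQL 8.0+ the variables no longer
# exist and mysqld will refuse to start if they are present.
# query_cache_type = 0
# query_cache_size = 0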

# Trade durability for throughput: up to ~1s of commits can be lost on an OS crash or power failure
innodb_flush_log_at_trx_commit = 2

# Optimize for SSD (default is OK for most installs; tune if you see I/O bottleneck in slow log)
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000

# Per-table tablespaces (newer default, but verify)
innodb_file_per_table = 1

Restart MySQL:

sudo systemctl restart mysql

Verify the buffer pool is actually using the new size:

sudo mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"
# Expect: 4294967296 (4 GB in bytes)

innodb_flush_log_at_trx_commit = 2 trade-off: the default 1 writes and flushes the redo log to disk on every commit (full durability). Setting 2 still writes the log on every commit but flushes it to disk only about once per second, so up to ~1s of commits can be lost on an OS crash or power failure (a crash of mysqld alone loses nothing). For email-sending workloads this is acceptable: at most 1 second of email_log writes might be lost, but the sending itself completed (recipient got the email). The throughput gain is significant (often 3-5×). If you need full durability, leave it at 1 and accept slower writes.

Watch the slow query log

# Enable in mysqld.cnf:
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1.0
log_queries_not_using_indexes = 0

sudo systemctl restart mysql

# Watch:
sudo tail -f /var/log/mysql/slow.log

Common AcelleMail slow queries and their fixes:

  • SELECT ... FROM subscribers WHERE list_id = X AND status = 'subscribed' slow → ensure compound index on (list_id, status) exists (run SHOW INDEX FROM subscribers to verify)
  • SELECT ... FROM email_log WHERE campaign_id = X slow → add index on campaign_id
  • SELECT ... FROM jobs ORDER BY id LIMIT 1 slow → switch queue driver to Redis (it eliminates jobs table polling)

Step 3 — PHP-FPM tuning

Edit /etc/php/8.3/fpm/pool.d/www.conf:

pm = dynamic
pm.max_children = 50      ; default 5
pm.start_servers = 10     ; default 2
pm.min_spare_servers = 5  ; default 1
pm.max_spare_servers = 20 ; default 3
pm.max_requests = 500     ; recycle workers periodically to prevent memory creep

; Important: limit how long a single FPM worker can hang before being killed
; — protects against a slow outbound HTTP call blocking the pool
request_terminate_timeout = 60s

Restart PHP-FPM:

sudo systemctl restart php8.3-fpm

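Memory math: 50 workers × ~80 MB peak = ~4 GB worst case. That is more than the 1 GB budgeted for PHP-FPM in Step 1, but with pm = dynamic the pool only grows to 50 children under sustained web load. If the admin UI is busy while campaigns are sending, plan for more than 16 GB.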
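The same calculation can be run in reverse: pick a RAM budget for the web pool and derive pm.max_children from it. The snippet below is a sketch under stated assumptions; the function name is illustrative, and the php-fpm8.3 process name in the comment is what Debian/Ubuntu packages typically use, so check it on your system.

```shell
# fpm_max_children BUDGET_MB PER_CHILD_MB
# Largest pm.max_children that fits in the given RAM budget (rounded down).
fpm_max_children() {
  local budget_mb=$1 per_child_mb=$2
  echo $(( budget_mb / per_child_mb ))
}

fpm_max_children 4096 80   # 4 GB budget at ~80 MB per child

# To measure the real per-child figure instead of assuming 80 MB
# (process name is an assumption, adjust for your PHP version):
#   ps -o rss= -C php-fpm8.3 | awk '{ s += $1 } END { print s / NR / 1024 " MB" }'
```

Measure under realistic load; idle children report much lower resident memory than children that have just rendered a large report.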

request_terminate_timeout = 60s is a hard lesson from production. Without it, a single web request that hangs (slow remote API call, DNS timeout, blocked syscall) holds a worker indefinitely. Enough such requests to fill pm.max_children (just 5 at the default) = entire FPM pool blocked = site appears down. The 60-second timeout kills the hung worker, and the FPM master process spawns a replacement. Set this on every production install.

Step 4 — Nginx tuning

Edit /etc/nginx/nginx.conf:

worker_processes auto;                # one per CPU core

events {
    worker_connections 2048;          # default 768 — too low for high-traffic sites
    use epoll;                        # default on Linux; explicit for clarity
    multi_accept on;
}

http {
    # gzip on responses
    gzip on;
    gzip_min_length 1024;
    gzip_types text/plain text/css text/javascript application/javascript application/json application/xml;
    gzip_comp_level 5;

    # Reasonable client + buffer settings for the admin UI
    client_max_body_size 300M;
    client_body_buffer_size 128k;
    client_header_buffer_size 8k;
    large_client_header_buffers 4 16k;

    # Keepalives reduce TCP overhead from the admin's repeated AJAX polls
    keepalive_timeout 65;
    keepalive_requests 100;
}

Reload nginx:

sudo nginx -t && sudo systemctl reload nginx

Step 5 — Redis on the same box (still — for now)

At 100k/day, Redis stays on the same box. Splitting it off is premature optimization until you hit ~1M/day or you need HA. Confirm Redis is the queue driver per Redis for Queue Processing:

grep QUEUE_CONNECTION /var/www/acellemail/.env
# Expect: QUEUE_CONNECTION=redis

# Confirm Redis is sized for the workload:
redis-cli config get maxmemory
# Should be at least 2GB for 100k/day; 4GB for headroom

redis-cli config get maxmemory-policy
# MUST be: noeviction (queues require this — see Redis article)

Step 6 — When to split DB to its own host

Co-located DB starts to bottleneck around 5-10M sends/month (~150k-300k/day average). Symptoms:

  • iostat -x 1 shows sustained 80%+ disk %util during campaigns
  • top reports mysqld consistently in the top 2 CPU consumers
  • SHOW PROCESSLIST shows 100+ active connections with worker queries waiting

At that point, move MySQL to its own host:

  • DigitalOcean Managed DB — easy, ~$15/mo for the smallest tier. Adds 1-2ms latency per query (private network). Worth it for the operational simplicity.
  • AWS RDS — same model. db.t3.medium is a reasonable starting point at ~$55/mo on-demand or ~$35 with 1-year Reserved.
  • Self-managed on a separate droplet — cheapest, more ops work. Pick if you already have MySQL expertise in-house.

In AcelleMail's .env:

DB_HOST=10.0.0.5         # private IP of the DB box (NEVER use public IP)
DB_DATABASE=acellemail
DB_USERNAME=acellemail
DB_PASSWORD=...

# Force private network if your cloud provides one

Run php artisan config:clear after.

Step 7 — When to add a second app server

At ~10-20M sends/month (300k-600k/day), one app server starts to struggle even after all the tuning above. The right next step is horizontal scaling:

[Load Balancer]
       ↓
  ┌────┴────┐
[App1]   [App2]   ← stateless — run web UI + workers
       ↓
[MySQL on managed]
[Redis on its own box, replicated]
[Object storage for shared storage/ — S3 / DO Spaces / B2]

Key changes from single-server:

  1. Move storage/ to shared object storage (S3 / DO Spaces). Both app servers must see the same files.
  2. Move sessions to Redis (SESSION_DRIVER=redis in .env). Otherwise customers get logged out when the LB routes them to a different app server.
  3. Add a sticky-session policy on the LB for the WYSIWYG editor (it does background autosaves keyed to session).
  4. Run cron + supervisor on only one node (or guard scheduled tasks with a shared lock, for example Laravel's onOneServer() scheduler option backed by a cache store that both nodes share). Running cron on both = double-firing every scheduled task.
  5. Add a Redis password if Redis is now on a network reachable from multiple boxes.

This is the boundary where you should consider Docker / Kubernetes — orchestrating multiple stateless app instances by hand gets tedious fast. See the Docker deployment guide.

Step 8 — Sending-provider rate limits

You can scale AcelleMail all you want, but your sending provider's rate limit caps your throughput. Watch these:

Provider | Default limit | How to raise
Amazon SES | 1 send/sec in the sandbox (200 messages per 24 hours); production accounts commonly start around 14/s and ramp with reputation | Open AWS Support ticket; usual cadence is +50/s/week as reputation builds
SendGrid | 100 sends/sec on Pro plan | Upgrade plan
Mailgun | 100 sends/sec on starter; higher tiers go to 1000/s | Upgrade plan
Postmark | 10 sends/sec default; up to 100/s on request | Email support

At 100k/day = ~1.16 sends/second average, you're well within any provider's production limits on average. At 1M/day = ~11.6/sec average, you're close to SES's typical starting production rate, and burst peaks will exceed it (the Step 0 blast of 5,000 sends/minute is ~83/sec). Plan ahead: see SES Sending Limits Cookbook.
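To compare your own volume against a provider cap, convert it to sends per second for both the daily average and the burst window. A minimal sketch follows; the function name is hypothetical and the inputs are the figures used in this guide.

```shell
# per_second SENDS WINDOW_SECONDS
# Average sends per second over the window, to two decimals.
per_second() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

per_second 100000 86400    # daily average at 100k/day
per_second 1000000 86400   # daily average at 1M/day
per_second 100000 1200     # 100k sends squeezed into a 20-minute blast
```

The burst figure is the one to hold against the provider's per-second limit, since the daily average hides the peak.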

Quick-reference tuning checklist

After all the above, your config delta from default looks like:

Component | Setting | Default | Tuned (100k/day)
Supervisor master | numprocs | 2 | 4
Supervisor worker | numprocs | 15 | 30
MySQL | innodb_buffer_pool_size | 128M | 4G
MySQL | max_connections | 151 | 300
MySQL | innodb_log_file_size | 48M | 512M
MySQL | innodb_flush_log_at_trx_commit | 1 | 2
PHP-FPM | pm.max_children | 5 | 50
PHP-FPM | request_terminate_timeout | (unset) | 60s
nginx | worker_connections | 768 | 2048
Redis | maxmemory | (unset) | 2-4G
Redis | maxmemory-policy | noeviction (already) | noeviction (verify)
.env | QUEUE_CONNECTION | sync | redis
.env | CACHE_DRIVER | file | redis
.env | SESSION_DRIVER | file | redis

Common issues

Symptom | Cause | Fix
Queue depth grows during campaign blast, doesn't drain | Worker pool too small | Step 1: bump worker numprocs
MySQL CPU pegs at 100% during sends | Buffer pool too small; reading from disk constantly | Step 2: increase innodb_buffer_pool_size
Random 502 Bad Gateway on admin pages | PHP-FPM pool exhausted | Step 3: bump pm.max_children; add request_terminate_timeout
OOM kills during list import | memory_limit too low for big CSV | php.ini memory_limit = 1G (for fpm + cli)
Workers slow even when queue is empty | Misconfigured --sleep (workers spinning) | Verify supervisor configs include --sleep=3
Mail-merge campaigns take minutes per recipient | Heavy template + many merge tags | Profile with xhprof / blackfire; cache merged content where possible
Sending IP getting rate-limited by Gmail/Outlook | Throughput exceeds receiver's per-IP cap | Add IP rotation; see Multi-Server Rotation Pattern
Free disk space dropping fast | email_log table growing without bound | system:cleanup daily task should prune; verify cron is firing
php artisan operations slow | Cached config / view files stale | php artisan optimize:clear after major config changes

When to stop tuning and just scale up

A single tuned 8 vCPU / 16 GB box handles ~1M-2M sends/month comfortably. Above that, add hardware before tuning further:

  • 2M+/mo → 16 GB → 24-32 GB RAM; consider RDS for DB
  • 5M+/mo → Multi-app-server architecture (Step 7)
  • 20M+/mo → Multi-region, dedicated sending IPs, custom partition for email_log, full DBA review

The cost of an additional $30-100/mo of hardware is much less than the cost of 4 hours of your time spent micro-tuning.

FAQ

Should I tune PHP-FPM pm = static or pm = ondemand instead of dynamic? For a busy-most-of-the-time AcelleMail (campaigns going out throughout the day), dynamic is the best balance. static wastes RAM on idle nights. ondemand adds latency on the first request after idle. dynamic is the right default for 100k/day.

Why not Cloudflare in front? Cloudflare can absorb traffic spikes to public AcelleMail pages (tracking pixel endpoints, unsubscribe links). It can't help with worker throughput (those are outbound SES/API calls from the server). Worth adding for the tracking-pixel layer; not a substitute for the tuning above.

Should I tune the MySQL tmp_table_size / max_heap_table_size? Defaults are usually fine. If SHOW STATUS LIKE 'Created_tmp_disk_tables' shows a high number relative to Created_tmp_tables, bump both to 256M.
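To put a number on "high relative to", the two counters can be turned into a percentage. The helper below is a sketch with a made-up name; the source does not give a threshold, so treat the result as a trend to watch over time, not a pass/fail value.

```shell
# tmp_disk_pct CREATED_TMP_DISK_TABLES CREATED_TMP_TABLES
# Percentage of internal temporary tables that spilled to disk, one decimal.
tmp_disk_pct() {
  awk -v d="$1" -v t="$2" 'BEGIN {
    if (t == 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * d / t
  }'
}

tmp_disk_pct 250 1000   # 250 of 1,000 temporary tables went to disk

# Both counters come from:
#   sudo mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%';"
```

The counters are cumulative since the last MySQL restart, so compare readings taken before and after a campaign.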

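What about HTTP/2 / HTTP/3? nginx has supported HTTP/2 since 1.9.5; HTTP/3 needs 1.25.0 or later. Modest improvement on admin UI responsiveness; no impact on send throughput. Worth enabling HTTP/2: listen 443 ssl http2; in the vhost (on nginx 1.25.1+ use listen 443 ssl; plus http2 on; instead, since the listen parameter is deprecated there).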

Does Acelle support read replicas? Yes — Laravel's read / write database config in config/database.php supports separate read endpoints. Useful at 5M+/mo when read-heavy operations (campaign reports, subscriber search) start to slow down. Pre-configured Laravel pattern; AcelleMail honours it.

Can I cap a single campaign's send rate? Yes — set a per-sending-server throttle. See Sending Throttling Strategies for the full configuration.

16 comments

  1. tnovak.cz
    saving this one. we're about to hit the volume tier where we need to think about queue tuning.
  2. m.schmidt78
    Tip for high-volume installs: monitor your failed_jobs table size, not just count. We had a queue migration that left 50k stale failed rows that started slowing reads. Truncate periodically.
    1. admin
      Good tip. The Cloudflare-outbound-rate-limit case is something we hadn't documented.
    2. admin (edited)
      solid addition, adding to the article on the next refresh.
  3. tranminh.devop…
    Have you tried SQS for the queue at scale? We're hesitant about the AWS lock-in but the managed angle is appealing
    1. admin
      We don't recommend that approach in production. It works in dev but has subtle race conditions under concurrent load. Stick with the documented pattern
    2. admin (edited)
      good question. the campaign:rerun audit writes to laravel.log only when the audit decides to force-resume; pure noop runs are silent. we'll add an info-level heartbeat in a future acelle release to make it easier to monitor
    3. admin (edited)
      good catch. the bounds (200/32) are hardcoded in the runtime. we've discussed making them configurable; not a near-term priority but it's tracked.
    4. admin (edited)
      good question, and one that comes up often enough we should add an FAQ section. Short answer: yes for the common case; the exception is when you're running custom plugins that override the default behavior.
    5. admin (edited)
      we tested this with up to 1m subscribers on a $40/mo vps. past that you start needing query optimization. below that, the defaults are fine
    6. admin (edited)
      for your specific case, i'd recommend testing with `--dry-run` first. the behavior under high load isn't 100% deterministic and we want you to see your own pattern before committing.
    7. admin (edited)
      we're aware of the silent-bail-out on deleted customers; there's an open issue for it. workaround for now: monitor the campaign:rerun log for absence of expected log lines, alert when silent for > 20 min...
  4. jmorrison.itop…
    Moved from database queue to Redis last month at ~800k emails/day. Worker throughput went up ~40%. MySQL CPU dropped from 60% to 18% baseline. Highly recommend the migration once you're past 500k.
    1. admin (edited)
      Thanks for the detail. Adding the kernel-reboot edge case to the article on the next update. 👀
  5. i.rossi.mil
    We do automated backups to S3 nightly. wp-cli-style. Restore tested quarterly. The article's emphasis on testing restores cannot be overstated...
    1. admin (edited)
      Solid case study material here. If you're open to it, we'd love to write this up as a blog post, happy to credit you anonymously or otherwise
