Analytics & Reporting

A/B Report Deep Dive: Reading AcelleMail Test Results

After your A/B test finishes, the A/B Report tab tells you the winner — but the real value is in the variant-level numbers. This guide walks through the report, the statistical-significance rules, and what to do for winner / inconclusive / loss verdicts.

December 4, 2025 6 min read Advanced

What this is for

Setting up the A/B test was the easy part (see A/B Test Campaigns). The hard part is reading the result correctly — knowing when "Variant A won by 23%" is signal vs noise, and what to test next based on the verdict.

This is the deep-dive on AcelleMail's A/B Report tab.

Where the report lives

After the test audience portion has sent and the wait period (default 72h) has passed, AcelleMail auto-evaluates the winner and shows the A/B Report tab on the campaign overview.

(Screenshot from A/B Test Campaigns article.)

The report has 5 tabs of its own:

Tab	What's there
A/B Report	Winner banner + headline stats + delivery funnel
All variants	Side-by-side per-variant numbers (Variant A vs B side-by-side)
Overall stats	Combined numbers across all variants
Insights	AcelleMail's plain-language read
Logs	Per-recipient detail

The most decisional view is All variants — that's where you compare A vs B directly.

Reading the winner banner

The green winner banner is AcelleMail's verdict + the headline improvement. Three parts:

"Variant A (A) is the winner!
 Click rate improved by 23.2% over the runner-up.
 Winner sent to remaining 80% of the audience."

Phrase	What to verify
"Variant A is the winner"	Match the variant label (A/B/C…) to what you actually tested. Notes in the setup tab tell you which variant had which subject/from-name.
"improved by 23.2% over the runner-up"	The percentage delta between winning and runner-up variants on the chosen metric. NOT the absolute rate.
"Winner sent to remaining 80%"	Confirms the rollout happened automatically (or didn't, if "Manual selection" was picked).

Is the delta statistically significant?

A "23.2% improvement" sounds great until you realize the test audience was 50 subscribers. Then it's noise.

Rough thresholds for trust (no formal statistical test — just decision rules that work):

Test audience size per variant	Trust delta if
<100	Don't trust any delta. Need bigger test.
100-500	Delta must be >15% AND consistent direction across opens + clicks
500-2,000	Delta >7% likely real
2,000-10,000	Delta >3% probably real
>10,000	Delta >1.5% reliably real

In the screenshot example: 1,176 delivered / 467 opens / 47 clicks. The test audience was ~600 per variant (1,176 total / 2 variants). At that size, 23.2% delta is reliable.

For a precise check, plug the numbers into any online A/B test calculator (search "A/B test significance calculator"). 95% confidence is the standard bar. Below 90% confidence, don't act on it.

The "All variants" tab — the data that matters

This is where you compare variants side-by-side. For each variant, you see:

Metric	What it means
Delivered	How many emails arrived (same-ish across variants since split was random)
Open rate	% opened (caveat: Apple MPP inflates)
Click rate	% clicked (the reliable metric)
Click-to-open rate (CTOR)	% of openers who clicked — best content-quality signal
Bounces	Should be similar across variants
Unsubscribes	If one variant has 2x the unsubs, that variant alienated subscribers

The 4 patterns you'll see:

Pattern	Reading
Variant A: higher opens AND higher clicks	Clear winner. Subject + content both worked.
Variant A: higher opens, similar clicks	Subject pulled more opens but content didn't capitalise. A is "better" if your KPI is opens; B is "as good" if your KPI is clicks.
Variant A: similar opens, higher clicks	Subject was tie; B's content/CTA underperformed. The subject didn't matter; the content did.
Variant A: lower opens, higher clicks	Counter-intuitive — A had a smaller, more engaged audience. Often happens with personalisation tests.

What to do per verdict

Winner (clear, statistically significant)

✅ Promote the winning pattern. Update your subject-line / CTA / from-name standards to match.

Next test: vary a DIFFERENT element. If you tested subject lines this time, test CTA copy or send time next.

Inconclusive (delta <5%, low confidence)

⚠️ Both variants performed similarly. Don't conclude anything — the test didn't have enough power.

Next test: increase the test audience size (from 20% to 40%), OR test a more dramatically different variation.

Loss (variant B significantly underperformed A)

✅ Also valuable! You learned what NOT to do. Cross-reference variant B's pattern against your "stuff that didn't work" notes for future reference.

Next test: test the next hypothesis — don't keep beating up variant B.

Surprise (a variant you thought would lose actually won)

⚠️ The most valuable result. Audit WHY:

Subject line surprised? Maybe length / formality / emoji preference shifted
From-name surprised? Maybe personal-name-vs-brand norms shifted
CTA surprised? Maybe your offer messaging needs an update

Surprises are where you learn the most about your audience.

Statistical-significance worked example

Setup:

Variant A subject: "5 things every marketer should do this quarter"
Variant B subject: "Q2 newsletter"
Audience: 10,000, 50/50 split
Metric: click rate

Result after 72h:

	Delivered	Opened	Clicked	CTR
Variant A	4,820	1,690 (35.1%)	145 (3.01%)	—
Variant B	4,810	1,500 (31.2%)	115 (2.39%)	—

CTR delta: (3.01% − 2.39%) / 2.39% = +25.9% relative improvement for A.

At ~4,800 per variant with 145 vs 115 clicks, this is well above the 95% confidence threshold (any decent calculator confirms ~99.5% confidence).

Verdict: ship Variant A as the winning subject pattern. Next test: hold the subject style; vary the CTA copy.

Common pitfalls

Pitfall	What's wrong	Fix
Calling a winner at 12 hours	Most opens land in the first hour but late-opens skew CTR. Wait the configured period (24-72h).	Patience. The wait period is calibrated.
Testing 4 things at once	Subject + sender + content + send time — you don't know which moved the needle.	One variable at a time.
Re-testing the same hypothesis 3 times in a row	Diminishing returns; audience may also adapt.	Pick a different lever per quarter.
Ignoring unsubscribe rate	A variant with higher clicks AND higher unsubs is poisoning long-term LTV.	Treat unsubscribe rate as a guard metric — never let the winner have 2x the unsubs.
Manual-selection bias	When picking manual winner, you choose the variant you predicted, regardless of data.	If using manual, document the prediction BEFORE seeing data; compare.
Testing on a small list (<2,000)	No statistical power. Even big deltas are noise.	Pause testing on small lists; rely on industry benchmarks or test pooled across campaigns.

What to log per test (a running playbook)

Maintain a doc tracking every A/B test. Each row:

Field	Example
Date	2026-05-19
Campaign	Spring Sale 2026
Variable tested	Subject line length
Variant A	"5 things every marketer..." (45 chars)
Variant B	"Q2 newsletter" (13 chars)
Winner metric	Click rate
Winner	A (3.01% vs 2.39%, +25.9%)
Confidence	~99.5%
Decision applied	Use long-descriptive subjects going forward
Re-test in	Q3 2026

After 10-20 tests, patterns emerge — "our audience consistently prefers question-format subjects on Tuesdays." That's the real long-term value.

Tagged

Acellemail

登录后点赞 3 9 条评论

3 条评论

加入对话。 评论功能向 AcelleMail 社区成员开放。

注册大约需要 10 秒 — 无需邮箱验证。

创建账户登录

ravi.kumar.del… 3 个月前

we had to add custom UTM parameters to get the cross-campaign attribution we wanted — the defaults weren't quite enough.

0
1. admin 2 个月前 (已编辑)
  
  thanks for sharing. the pattern you describe is exactly the use case we built that feature for — glad it landed for you
  
  0
joel.anders.se 3 个月前

Can the analytics be exported to a data warehouse? We feed everything into BigQuery for cross-channel reporting

0
1. admin 3 个月前
  
  Same answer as above for SaaS-tenant — works the same way per-tenant, with the caveat that the cron must be set per-customer (not just system-wide).
  
  0
2. admin 3 个月前 (已编辑)
  
  Good question. The campaign:rerun audit writes to laravel.log only when the audit decides to force-resume — pure noop runs are silent. We'll add an info-level heartbeat in a future Acelle release to make it easier to monitor.
  
  0
3. admin 1 个月前 (已编辑)
  
  There's no built-in way today. Two workarounds: (1) cron + custom script polling the API every N minutes, (2) webhook-driven if your event source supports it. Most operators go with #2.
  
  0
linhpm.devs 3 个月前

The funnel-attribution model explanation is the clearest I've read. The 'last-touch vs first-touch' framing especially.

0
1. admin 2 个月前 (已编辑)
  
  Thanks. Pass it along if it helps your team.
  
  0
2. admin 1 个月前 (已编辑)
  
  Thanks for the kind words. We try to keep these source-grounded so they age well.
  
  0

Analytics & Reporting

Email Campaign ROI: Math + AcelleMail Data Points

Email ROI = (Revenue − Cost) / Cost. This guide shows the math, which AcelleMail data points feed each variable, the costs people forget to...

6 min read Intermediate

6 13

Analytics & Reporting

Read the Campaign Report: AcelleMail Analytics Tour

AcelleMail's campaign report has 6 tabs (Overview / Insights / Links / Map / Sending logs / Email review) and 5 headline metrics. This walkt...

8 min read Beginner

6 10

Analytics & Reporting

Open Rates + Apple MPP: What AcelleMail's Numbers Actually Mean

Apple Mail Privacy Protection inflates 30-45% of your "opens" by pre-fetching pixels. This guide shows how to spot MPP inflation in AcelleMa...

7 min read Intermediate

9 11

A/B Report Deep Dive: Reading AcelleMail Test Results

What this is for

Where the report lives

Reading the winner banner

Is the delta statistically significant?

The "All variants" tab — the data that matters

What to do per verdict

Winner (clear, statistically significant)

Inconclusive (delta <5%, low confidence)

Loss (variant B significantly underperformed A)

Surprise (a variant you thought would lose actually won)

Statistical-significance worked example

Common pitfalls

What to log per test (a running playbook)

Related articles

3 条评论

Email Campaign ROI: Math + AcelleMail Data Points

Read the Campaign Report: AcelleMail Analytics Tour

Open Rates + Apple MPP: What AcelleMail's Numbers Actually Mean

More in Analytics & Reporting

Read the Campaign Report: AcelleMail Analytics Tour

Click Analysis in AcelleMail: Links Tab + Geographic Click Map

UTM Parameters in AcelleMail — Tracking Clicks to Your Analytics Tool

Email Campaign ROI: Math + AcelleMail Data Points

在您自己的服务器上,按您自己的方式运营邮件营销

What this is for#

Where the report lives#

Reading the winner banner#

Is the delta statistically significant?#

The "All variants" tab — the data that matters#

What to do per verdict#

Winner (clear, statistically significant)#

Inconclusive (delta <5%, low confidence)#

Loss (variant B significantly underperformed A)#

Surprise (a variant you thought would lose actually won)#

Statistical-significance worked example#

Common pitfalls#

What to log per test (a running playbook)#

Related articles#

Get more guides like this

Related reading

Email Campaign ROI: Math + AcelleMail Data Points

Read the Campaign Report: AcelleMail Analytics Tour

Open Rates + Apple MPP: What AcelleMail's Numbers Actually Mean

More in Analytics & Reporting

Read the Campaign Report: AcelleMail Analytics Tour

Click Analysis in AcelleMail: Links Tab + Geographic Click Map

UTM Parameters in AcelleMail — Tracking Clicks to Your Analytics Tool

Email Campaign ROI: Math + AcelleMail Data Points

在您自己的服务器上,按您自己的方式运营邮件营销

Get the AcelleMail newsletter

What this is for

Where the report lives

Reading the winner banner

Is the delta statistically significant?

The "All variants" tab — the data that matters

What to do per verdict

Winner (clear, statistically significant)

Inconclusive (delta <5%, low confidence)

Loss (variant B significantly underperformed A)

Surprise (a variant you thought would lose actually won)

Statistical-significance worked example

Common pitfalls

What to log per test (a running playbook)

Related articles