Data Analysis · Intermediate · User Prompt

A/B Test Results Analyzer

March 28, 2026

The A/B Test Results Analyzer takes your experiment data and produces a rigorous statistical analysis with clear business recommendations. It goes beyond "statistically significant: yes/no" to provide confidence intervals, effect sizes, practical significance assessment, and segment-level insights.

Product managers, growth engineers, and data analysts use this template after an A/B test has collected enough data. It is especially valuable when stakeholders pressure you to "just call the test" based on surface-level numbers, because it provides a structured framework for making defensible decisions.

The prompt applies proper hypothesis testing methodology: it checks for adequate sample size, identifies the correct statistical test for your metric type, reports effect sizes alongside p-values, and explicitly addresses the difference between statistical significance and practical significance. This prevents the common mistake of shipping a "significant" change that improves conversion by 0.01%.
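The gap between statistical and practical significance is easy to demonstrate. A minimal, stdlib-only Python sketch with made-up numbers (not from any real experiment): at two million users per arm, even a 0.1-point lift clears p < 0.05 by a wide margin, yet may not be worth shipping.

```python
import math

def two_prop_ztest(x_a, n_a, x_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# 10.0% vs 10.1% conversion at 2M users per arm: a 0.1-point lift.
z, p = two_prop_ztest(200_000, 2_000_000, 202_000, 2_000_000)
# p lands well below 0.05, but whether a 0.1-point lift matters is a
# business question the p-value cannot answer.
```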

This prompt is just the starting point

Score it with AI, optimize it with one click, track versions, and build your prompt library.

AI quality score on 6 criteria
One-click optimization with 3 strategies
Version history to track improvements

The Prompt

Analyze the following A/B test results and provide a recommendation:

**Experiment Name**: [EXPERIMENT NAME]
**Hypothesis**: [WHAT YOU EXPECTED, e.g., "Changing the CTA button from blue to green will increase click-through rate"]
**Test Duration**: [HOW LONG THE TEST RAN, e.g., "14 days, March 1-14, 2026"]
**Traffic Split**: [HOW TRAFFIC WAS SPLIT, e.g., "50/50"]

**Primary Metric**: [YOUR PRIMARY SUCCESS METRIC, e.g., "Click-through rate (CTR)"]
**Results**:
```
Control (A):
- Sample size: [NUMBER]
- Conversions (or metric value): [NUMBER]
- Rate: [PERCENTAGE OR AVERAGE]

Variant (B):
- Sample size: [NUMBER]
- Conversions (or metric value): [NUMBER]
- Rate: [PERCENTAGE OR AVERAGE]
```

**Secondary Metrics** (optional):
```
[LIST ANY SECONDARY METRICS AND THEIR VALUES FOR BOTH GROUPS]
```

**Segment Data** (optional):
```
[IF AVAILABLE: results broken down by device, country, user type, etc.]
```

Perform this analysis:

### 1. Statistical Significance Test
- State the null and alternative hypotheses
- Choose the appropriate test (z-test for proportions, t-test for means, chi-square, etc.) and explain why
- Report: p-value, confidence interval (95%), and whether to reject the null hypothesis
- If the test is underpowered, calculate the minimum sample size needed and how many more days to run

### 2. Effect Size
- Calculate the relative and absolute effect size
- Assess practical significance: is the effect large enough to matter to the business?
- If applicable, estimate the annualized revenue/engagement impact

### 3. Validity Checks
- Sample Ratio Mismatch (SRM): is the actual split close to the intended split?
- Novelty/primacy effect risk: based on the test duration, could the result be temporary?
- Multiple testing concern: if testing multiple metrics, apply Bonferroni correction

### 4. Segment Analysis (if segment data provided)
- Are there segments where the variant performs significantly differently?
- Flag any Simpson's paradox risks

### 5. Recommendation
- Clear verdict: SHIP, DO NOT SHIP, or EXTEND THE TEST
- If SHIP: what to monitor post-launch
- If DO NOT SHIP: what to test next
- If EXTEND: how many more days/users needed
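If you want to sanity-check the model's arithmetic for sections 1 and 2, the core calculations fit in a short stdlib-only Python sketch. The function names and example counts below are illustrative, not part of the template:

```python
import math

def analyze_ab(x_a, n_a, x_b, n_b):
    """Two-proportion z-test, 95% CI on the difference, and effect sizes."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Pooled SE for the hypothesis test (under H0: rates are equal)
    p_pool = (x_a + x_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pool
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    # Unpooled SE for the confidence interval on the difference
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return {
        "absolute_lift": diff,
        "relative_lift": diff / p_a,
        "p_value": p_value,
        "ci_95": (diff - 1.96 * se, diff + 1.96 * se),
        "significant": p_value < 0.05,
    }

def min_sample_per_arm(p_base, mde_abs):
    """Minimum users per arm to detect an absolute lift of mde_abs,
    at 5% significance / 80% power (z values hard-coded)."""
    p_new = p_base + mde_abs
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((1.96 + 0.84) ** 2 * variance / mde_abs ** 2)

result = analyze_ab(x_a=1_210, n_a=24_500, x_b=1_356, n_b=24_480)
needed = min_sample_per_arm(p_base=0.05, mde_abs=0.01)  # -> 8146 per arm
```

This mirrors what the prompt asks the model to do, so large disagreements between the two are a signal to re-check the inputs.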

Usage Tips

  • Include exact numbers, not rounded: Rounding "34.7%" to "about 35%" changes the statistical test results. Provide raw counts whenever possible.
  • Report your traffic split: A 50/50 split maximizes statistical power, but if you ran 90/10, say so. With an unequal split the smaller arm dominates the standard error, which changes the power calculation and how much more data an extension needs.
  • Include secondary metrics: Even if the primary metric improved, a drop in revenue per user or increase in refund rate could reverse the recommendation.
  • Run for at least one full business cycle: If your product has weekly patterns, run for at least 7 days. Mention the duration so the analysis can flag novelty effects.
  • Use segment data to find hidden stories: Upload results by device, country, or user type. A winning variant overall may be losing badly on mobile.
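Some of the validity checks in the prompt, notably Sample Ratio Mismatch, can be run locally before you even paste the data in. A sketch, again stdlib-only Python with hypothetical counts (the strict alpha of 0.001 is a common convention for SRM checks):

```python
import math

def srm_check(n_a, n_b, expected_share_a=0.5, alpha=0.001):
    """Chi-square test (1 df) for Sample Ratio Mismatch.

    A tiny p-value means the observed split deviates from the intended
    split, which usually points to a broken randomizer and invalidates
    the experiment regardless of the metric results.
    """
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 df
    return p_value, p_value < alpha           # (p, SRM detected?)

p_ok, srm_ok = srm_check(24_500, 24_480)    # near-50/50: no mismatch
p_bad, srm_bad = srm_check(26_000, 24_000)  # 52/48 on a 50/50 design: SRM
```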

Tags: analyst, analysis, research, quality-improvement

Get more from this prompt

Save it, score it with AI, optimize it, and track every version. Free to start.
