A/B Test Results Analyzer
The A/B Test Results Analyzer takes your experiment data and produces a rigorous statistical analysis with clear business recommendations. It goes beyond "statistically significant: yes/no" to provide confidence intervals, effect sizes, practical significance assessment, and segment-level insights.
Product managers, growth engineers, and data analysts use this template after an A/B test has collected enough data. It is especially valuable when stakeholders pressure you to "just call the test" based on surface-level numbers, because it provides a structured framework for making defensible decisions.
The prompt applies proper hypothesis testing methodology: it checks for adequate sample size, identifies the correct statistical test for your metric type, reports effect sizes alongside p-values, and explicitly addresses the difference between statistical significance and practical significance. This prevents the common mistake of shipping a "significant" change that improves conversion by 0.01%.
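For readers who want to sanity-check the analysis by hand, the core calculation for a conversion-rate test is a two-proportion z-test. Below is a minimal stdlib-only Python sketch (the function name and the input numbers are hypothetical, chosen only for illustration):

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference of two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    # 95% CI for the difference uses the unpooled standard error
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (p_b - p_a - 1.96 * se_unpooled, p_b - p_a + 1.96 * se_unpooled)
    return z, p_value, ci

# Hypothetical numbers: 5.0% vs 5.5% conversion on 20,000 users per arm
z, p, ci = two_proportion_ztest(conv_a=1000, n_a=20000, conv_b=1100, n_b=20000)
abs_lift = 1100 / 20000 - 1000 / 20000   # 0.005, i.e. 0.5 percentage points
rel_lift = abs_lift / (1000 / 20000)     # 0.10, i.e. a 10% relative lift
print(f"z = {z:.3f}, p = {p:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Note how the example reports the absolute and relative lift alongside the p-value: a result can be statistically significant while the effect is too small to matter.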
This prompt is just the starting point
Score it with AI, optimize it with one click, track versions, and build your prompt library.
The Prompt
Analyze the following A/B test results and provide a recommendation:

**Experiment Name**: [EXPERIMENT NAME]
**Hypothesis**: [WHAT YOU EXPECTED, e.g., "Changing the CTA button from blue to green will increase click-through rate"]
**Test Duration**: [HOW LONG THE TEST RAN, e.g., "14 days, March 1-14, 2026"]
**Traffic Split**: [HOW TRAFFIC WAS SPLIT, e.g., "50/50"]
**Primary Metric**: [YOUR PRIMARY SUCCESS METRIC, e.g., "Click-through rate (CTR)"]

**Results**:
```
Control (A):
- Sample size: [NUMBER]
- Conversions (or metric value): [NUMBER]
- Rate: [PERCENTAGE OR AVERAGE]

Variant (B):
- Sample size: [NUMBER]
- Conversions (or metric value): [NUMBER]
- Rate: [PERCENTAGE OR AVERAGE]
```

**Secondary Metrics** (optional):
```
[LIST ANY SECONDARY METRICS AND THEIR VALUES FOR BOTH GROUPS]
```

**Segment Data** (optional):
```
[IF AVAILABLE: results broken down by device, country, user type, etc.]
```

Perform this analysis:

### 1. Statistical Significance Test
- State the null and alternative hypotheses
- Choose the appropriate test (z-test for proportions, t-test for means, chi-square, etc.) and explain why
- Report: p-value, confidence interval (95%), and whether to reject the null hypothesis
- If the test is underpowered, calculate the minimum sample size needed and how many more days to run

### 2. Effect Size
- Calculate the relative and absolute effect size
- Assess practical significance: is the effect large enough to matter to the business?
- If applicable, estimate the annualized revenue/engagement impact

### 3. Validity Checks
- Sample Ratio Mismatch (SRM): is the actual split close to the intended split?
- Novelty/primacy effect risk: based on the test duration, could the result be temporary?
- Multiple testing concern: if testing multiple metrics, apply Bonferroni correction

### 4. Segment Analysis (if segment data provided)
- Are there segments where the variant performs significantly differently?
- Flag any Simpson's paradox risks

### 5. Recommendation
- Clear verdict: SHIP, DO NOT SHIP, or EXTEND THE TEST
- If SHIP: what to monitor post-launch
- If DO NOT SHIP: what to test next
- If EXTEND: how many more days/users needed
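Two of the checks the prompt asks for, the SRM test and the minimum-sample-size calculation, are easy to verify yourself before trusting any analysis. Here is a stdlib-only Python sketch of both; the function names and example counts are hypothetical, and the sample-size formula is the standard normal approximation for a two-proportion test:

```python
import math

def srm_pvalue(n_a, n_b, expected_ratio=0.5):
    """Chi-square goodness-of-fit test (1 df) for Sample Ratio Mismatch."""
    total = n_a + n_b
    exp_a = total * expected_ratio
    exp_b = total * (1 - expected_ratio)
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))

def required_n_per_arm(p_base, mde_abs, alpha=0.05, power=0.8):
    """Approximate per-arm sample size to detect an absolute lift of mde_abs."""
    z_alpha, z_beta = 1.96, 0.8416  # two-sided alpha = 0.05, power = 0.80
    p1, p2 = p_base, p_base + mde_abs
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / mde_abs ** 2
    return math.ceil(n)

# A 10,000 vs 10,500 split on an intended 50/50 test is a strong SRM signal
print(srm_pvalue(10000, 10500))          # well below the usual 0.001 threshold
# Users per arm needed to detect a lift from 5.0% to 5.5% at 80% power
print(required_n_per_arm(0.05, 0.005))
```

A common convention is to treat an SRM p-value below 0.001 as a broken experiment: no downstream result should be trusted until the assignment bug is found.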
Usage Tips
- Include exact numbers, not rounded: Rounding "34.7%" to "about 35%" changes the statistical test results. Provide raw counts whenever possible.
- Report your traffic split: A 50/50 split is ideal, but if you ran 90/10, mention it. This affects the statistical test choice and power calculation.
- Include secondary metrics: Even if the primary metric improved, a drop in revenue per user or increase in refund rate could reverse the recommendation.
- Run for at least one full business cycle: If your product has weekly patterns, run for at least 7 days. Mention the duration so the analysis can flag novelty effects.
- Use segment data to find hidden stories: Upload results by device, country, or user type. A winning variant overall may be losing badly on mobile.
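The Simpson's paradox risk mentioned above is easiest to grasp with numbers. In this toy Python example (all counts hypothetical), the variant wins inside every segment yet loses in the pooled totals, because its traffic is skewed toward the low-converting mobile segment:

```python
# (conversions, users) per arm, per segment -- hypothetical toy data
data = {
    "desktop": {"control": (90, 900), "variant": (12, 100)},
    "mobile":  {"control": (2, 100),  "variant": (27, 900)},
}

for segment, arms in data.items():
    c_rate = arms["control"][0] / arms["control"][1]
    v_rate = arms["variant"][0] / arms["variant"][1]
    print(f"{segment}: control {c_rate:.1%} vs variant {v_rate:.1%}")
# desktop: control 10.0% vs variant 12.0%  (variant wins)
# mobile:  control 2.0% vs variant 3.0%    (variant wins)

# Pooling the segments reverses the conclusion
c_conv = sum(arms["control"][0] for arms in data.values())
c_n = sum(arms["control"][1] for arms in data.values())
v_conv = sum(arms["variant"][0] for arms in data.values())
v_n = sum(arms["variant"][1] for arms in data.values())
print(f"overall: control {c_conv/c_n:.1%} vs variant {v_conv/v_n:.1%}")
# overall: control 9.2% vs variant 3.9%    (variant "loses")
```

This is why the prompt asks for segment data: whenever the traffic mix differs between arms or between segments, pooled rates can point in the wrong direction.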