Statistical Significance Checker
Check whether the lift in an A/B test is significant using a two-proportion z-test and a real p-value.
Written by Golam Rabbani, Founder & Lead Engineer
How to use this statistical significance checker
- Enter visitors and conversions for Variant A (control) and Variant B (treatment).
- Choose the confidence level (default 95%).
- Press "Check significance" to read the verdict, lift, z, and the two-tail p-value.
- Copy the result or Reset to clear inputs.
About this statistical significance checker
For a binary A/B test (each visitor either converts or does not), the appropriate test is a two-proportion z-test with a pooled standard error. Let pA = xA / nA and pB = xB / nB, where x is the number of conversions and n the number of visitors. The pooled proportion is p̂ = (xA + xB) / (nA + nB). The standard error is SE = √(p̂(1 − p̂) · (1/nA + 1/nB)) and the test statistic is z = (pB − pA) / SE. The two-tail p-value comes from the standard normal CDF (no live table; a bundled approximation accurate to about 7 decimal places). The calculator also reports lift = (pB − pA) / pA and warns when any cell count is below 5, where the normal approximation breaks down.
Worked example. Variant A: 5000 visitors, 250 conversions (pA = 5.00%). Variant B: 5000 visitors, 300 conversions (pB = 6.00%). Pooled p̂ = 550 / 10,000 = 0.055. SE = √(0.055 · 0.945 · (1/5000 + 1/5000)) = √(0.051975 · 0.0004) = √0.00002079 ≈ 0.004560. z = (0.06 − 0.05) / 0.004560 ≈ 2.193. The 95% z-critical value is 1.9600 and the two-tail p-value ≈ 0.0283. Since |z| > 1.96 (equivalently, p < 0.05), the lift of +20% is statistically significant at the 95% confidence level.
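The formulas and worked example above can be reproduced with a short script. This is a minimal sketch in Python, not the calculator's own code: the function name two_proportion_z_test and the keys of the returned dictionary are illustrative assumptions, and the p-value uses the standard library's math.erfc in place of the bundled CDF approximation mentioned above.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x_a, n_a, x_b, n_b, confidence=0.95):
    """Pooled two-proportion z-test (hypothetical helper, not the tool's code)."""
    p_a = x_a / n_a
    p_b = x_b / n_b
    # Pooled proportion and pooled standard error, as defined in the text.
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero (no or all conversions)")
    z = (p_b - p_a) / se
    # Two-tail p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Critical value for the chosen confidence level (about 1.96 at 95%).
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Relative lift is undefined when the control has no conversions.
    lift = (p_b - p_a) / p_a if p_a > 0 else None
    return {
        "z": z,
        "p_value": p_value,
        "z_crit": z_crit,
        "lift": lift,
        "significant": abs(z) > z_crit,
    }


result = two_proportion_z_test(250, 5000, 300, 5000)
print(result)
```

With the worked-example inputs this should report z of about 2.19, a two-tail p-value of about 0.028, and a lift of 0.2 (that is, +20%).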
FAQ
- Is "significant at 95%" the same as a 95% chance B beats A?
- No. It means: if A and B had identical true conversion rates, you would see a difference at least this extreme less than 5% of the time. It is a statement about how surprising the data are under that assumption, not the posterior probability that B wins.
- How is lift calculated?
- Lift = (pB − pA) / pA, expressed as a percentage. A lift of +20% means B converts 20% better in relative terms; absolute uplift is just pB − pA.
- When is the small-cell warning shown?
- When any of {xA, nA − xA, xB, nB − xB} drops below 5. In those cases the normal approximation can mislead — prefer an exact (Fisher) or Bayesian test.
- Does this account for peeking or multiple variants?
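- No. Checking significance repeatedly while the test is still running (peeking) inflates the false-positive rate; if you need interim looks, use a sequential testing framework. Comparing several variants against the control also inflates it; apply a multiple-comparison correction such as Bonferroni.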
- What confidence level should I pick?
- 95% is the conventional default. 99% is stricter (fewer false positives, more data needed). 80–90% may be acceptable for low-stakes optimisations.
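Two of the rules in the FAQ are simple enough to sketch in code: the small-cell warning and the Bonferroni adjustment for multiple variants. The helper names small_cell_warning and bonferroni_alpha are hypothetical, chosen for illustration; the threshold of 5 and the four cell counts come from the FAQ answer, and the Bonferroni rule shown is the standard one (divide the significance level by the number of comparisons).

```python
def small_cell_warning(x_a, n_a, x_b, n_b, threshold=5):
    """Return True when any of the four cell counts is below the threshold."""
    # Cells: conversions and non-conversions for each variant.
    cells = [x_a, n_a - x_a, x_b, n_b - x_b]
    return any(count < threshold for count in cells)


def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    if comparisons < 1:
        raise ValueError("comparisons must be at least 1")
    return alpha / comparisons


# Large test from the worked example: no warning expected.
print(small_cell_warning(250, 5000, 300, 5000))
# Tiny test with only 3 conversions in A: warning expected.
print(small_cell_warning(3, 40, 9, 40))
# Three treatments compared with one control at an overall 5% level.
print(bonferroni_alpha(0.05, 3))
```

Under this sketch, each of the three comparisons would need p below roughly 0.0167 to count as significant at an overall 5% level.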