Is "significant at 95%" the same as a 95% chance B beats A?

No. It means that if A and B had identical true conversion rates, you would see a difference at least this extreme less than 5% of the time. It is a statement about how surprising the data are under that assumption, not a posterior probability that B beats A.

How is lift calculated?

Lift = (pB − pA) / pA, expressed as a percentage. A lift of +20% means B converts 20% better in relative terms; absolute uplift is just pB − pA.
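
As a minimal sketch of the two quantities above, the following Python computes relative lift and absolute uplift from raw counts. The function name lift and its argument names are illustrative assumptions, not part of the tool.

```python
def lift(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple[float, float]:
    """Return (relative lift, absolute uplift) of B over A, both as fractions."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("visitor counts must be positive")
    p_a = x_a / n_a
    p_b = x_b / n_b
    if p_a == 0:
        # Relative lift divides by pA, so it is undefined for a zero baseline.
        raise ValueError("relative lift is undefined when pA is 0")
    return (p_b - p_a) / p_a, p_b - p_a


relative, absolute = lift(250, 5000, 300, 5000)
print(f"lift = {relative:+.1%}, absolute uplift = {absolute:+.2%}")
```

With 250 of 5000 conversions for A and 300 of 5000 for B, the relative lift is 20% while the absolute uplift is one percentage point.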

When is the small-cell warning shown?

When any of {xA, nA − xA, xB, nB − xB} drops below 5. In those cases the normal approximation can mislead — prefer an exact (Fisher) or Bayesian test.
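
The rule above could be sketched roughly as follows. The helper name small_cells and the threshold parameter are assumptions made for illustration; only the four cells and the cutoff of 5 come from the text.

```python
def small_cells(x_a: int, n_a: int, x_b: int, n_b: int, threshold: int = 5) -> list[str]:
    """Return the names of the 2x2 table cells whose count is below the threshold."""
    cells = {
        "xA": x_a,            # conversions in A
        "nA - xA": n_a - x_a, # non-conversions in A
        "xB": x_b,            # conversions in B
        "nB - xB": n_b - x_b, # non-conversions in B
    }
    return [name for name, count in cells.items() if count < threshold]


print(small_cells(250, 5000, 300, 5000))  # no small cells
print(small_cells(3, 40, 12, 40))         # only 3 conversions in A
```

An empty list means the normal approximation is reasonable by this rule; a non-empty list names the cells that would trigger the warning.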

Does this account for peeking or multiple variants?

No. Checking significance repeatedly while a test is still running (peeking) inflates the false-positive rate; a sequential testing framework is designed for that case. For multiple variants, apply a correction such as Bonferroni.
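
For the multiple-variant case, a Bonferroni correction can be sketched as below: the significance threshold is divided by the number of comparisons against the control. The function names and the p-values in the example are hypothetical and chosen only for illustration.

```python
def bonferroni_alpha(confidence: float, comparisons: int) -> float:
    """Per-comparison significance threshold under a Bonferroni correction."""
    if comparisons < 1:
        raise ValueError("need at least one comparison")
    return (1 - confidence) / comparisons


def significant_variants(p_values: dict[str, float], confidence: float = 0.95) -> dict[str, bool]:
    """Flag each variant whose p-value clears the corrected threshold."""
    threshold = bonferroni_alpha(confidence, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical two-tail p-values for three variants, each compared with control A.
verdicts = significant_variants({"B": 0.0284, "C": 0.012, "D": 0.20})
print(verdicts)
```

With three comparisons at 95% confidence the per-comparison threshold is about 0.0167, so a p-value of 0.0284 that would pass on its own no longer does.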

What confidence level should I pick?

95% is the conventional default. 99% is stricter (fewer false positives, more data needed). 80–90% may be acceptable for low-stakes optimisations.
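
To see how the confidence level changes the bar a result must clear, this sketch computes the two-tail critical z for a few levels using Python's standard library. The function name z_critical is an assumption; the tool itself may compute this differently.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-tail critical z: |z| must exceed this to be significant."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1 - confidence
    # Split alpha evenly between the two tails of the standard normal.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: |z| > {z_critical(level):.4f}")
```

The critical value rises from roughly 1.28 at 80% to roughly 2.58 at 99%, which is why stricter levels need more data to detect the same lift.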

Statistical Significance Checker

Check whether the lift in an A/B test is significant using a two-proportion z-test and a real p-value.

Written by Golam Rabbani, Founder & Lead Engineer

How to use this statistical significance checker

Enter visitors and conversions for Variant A (control) and Variant B (treatment).
Choose the confidence level (default 95%).
Press "Check significance" to read the verdict, lift, z, and the two-tail p-value.
Copy the result, or press Reset to clear the inputs.

About this statistical significance checker

For a binary A/B test (each visitor either converted or did not), the appropriate test is a two-proportion z-test with a pooled standard error. Let pA = xA / nA and pB = xB / nB. Then:

Pooled proportion: p̂ = (xA + xB) / (nA + nB)
Standard error: SE = √(p̂(1 − p̂) · (1/nA + 1/nB))
Test statistic: z = (pB − pA) / SE

The two-tail p-value is 2 · (1 − Φ(|z|)), where Φ is the standard normal CDF. The calculator evaluates Φ with a bundled approximation accurate to about 7 decimals rather than a table lookup. It also reports lift = (pB − pA) / pA and warns when any cell count is below 5, where the normal approximation breaks down.

Worked example. Variant A: 5000 visitors, 250 conversions (pA = 5.00%). Variant B: 5000 visitors, 300 conversions (pB = 6.00%). Pooled p̂ = 550 / 10,000 = 0.055. SE = √(0.055 · 0.945 · (1/5000 + 1/5000)) = √(0.051975 · 0.0004) ≈ 0.004560. z = (0.06 − 0.05) / 0.004560 ≈ 2.193. The 95% z-critical is 1.9600 and the two-tail p-value ≈ 0.0283. Since |z| > 1.96 (and p < 0.05), the lift of +20% is statistically significant at the 95% confidence level.
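
The formulas and worked example above can be checked with a short Python sketch. This is not the calculator's own code: the function name is illustrative, and it uses math.erfc from the standard library in place of the bundled normal CDF approximation, relying on the identity 2 · (1 − Φ(|z|)) = erfc(|z| / √2).

```python
import math


def two_proportion_z_test(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-tail p-value)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("visitor counts must be positive")
    p_a = x_a / n_a
    p_b = x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Happens when nobody or everybody converted in both variants.
        raise ValueError("standard error is zero; z is undefined")
    z = (p_b - p_a) / se
    # Two-tail p-value via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(250, 5000, 300, 5000)
print(f"z = {z:.3f}, p = {p:.4f}")
print("significant at 95%" if p < 0.05 else "not significant at 95%")
```

For the example inputs this gives z of about 2.19 and a p-value of about 0.028, so the result is significant at the 95% level.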
