T-Test Calculator

Run one-sample, paired, or two-sample (Welch/pooled) t-tests with bundled t-table critical values.

Written by Golam Rabbani, Founder & Lead Engineer

How to use this t-test calculator

Pick the test: one-sample, two-sample (independent), or paired.
Paste sample A. For two-sample and paired, also paste sample B (same length for paired).
For one-sample, set the hypothesised mean μ₀. For two-sample, leave Welch on or tick "equal variances" for the pooled test.
Choose α and press Calculate to read t, df, the bundled t-critical, and the verdict.

About this t-test calculator

Student's t-test asks whether a difference in means could plausibly arise by chance. The calculator covers three flavours:

One-sample: t = (x̄ − μ₀) / (s / √n), df = n − 1.
Paired: build the per-row differences dᵢ = aᵢ − bᵢ, then test their mean against zero.
Two-sample (Welch by default): t = (x̄A − x̄B) / √(s²A / nA + s²B / nB), with Welch–Satterthwaite df. Toggle "equal variances" for the pooled test, where the pooled SD uses (nA − 1) and (nB − 1) weights and df = nA + nB − 2.

Two-tailed critical values come from the bundled t-table.
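The one-sample and paired formulas above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the calculator's own code; the function names `one_sample_t` and `paired_t` are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for H0: the population mean equals mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, n - 1 in the denominator
    if s == 0:
        raise ValueError("sample has zero variance; t is undefined")
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


def paired_t(a, b):
    """Return (t, df) for paired samples.

    The paired test is a one-sample test on the per-row
    differences d_i = a_i - b_i against a mean of zero.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs, 0.0)
```

As an illustration, `one_sample_t([14, 16, 15, 18, 17], 15)` gives t ≈ 1.414 with df = 4; the μ₀ = 15 here is an arbitrary value chosen for the example, not one taken from the page.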

Worked example (two-sample, Welch). Sample A = 14, 16, 15, 18, 17 (nA = 5, x̄A = 16, sA = 1.5811). Sample B = 12, 13, 14, 11, 15 (nB = 5, x̄B = 13, sB = 1.5811). SE = √(1.5811² / 5 + 1.5811² / 5) = √(0.5 + 0.5) = 1.0. t = (16 − 13) / 1 = 3.000. Welch df = 8 (the variances and the sample sizes are both equal here, so Satterthwaite reduces to nA + nB − 2). At α = 0.05 the two-tailed critical value from the bundled t-table at df = 8 is 2.306, so |t| = 3.000 > 2.306: reject H₀. The means differ significantly at the 5% level.
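The arithmetic in this worked example can be checked with a short Python sketch. It uses the standard Welch–Satterthwaite formula for df and the pooled variance for the "equal variances" option; the function name `two_sample_t` and its `equal_variances` flag are hypothetical and only mirror the calculator's toggle.

```python
import math
import statistics


def two_sample_t(a, b, equal_variances=False):
    """Return (t, df) for two independent samples.

    equal_variances=False gives Welch's test;
    equal_variances=True gives the pooled test.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two observations")
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)

    if equal_variances:
        # Pooled variance weights each sample variance by its df.
        pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
        se = math.sqrt(pooled_var * (1 / na + 1 / nb))
        df = na + nb - 2
    else:
        # Welch: separate variance terms, Satterthwaite df.
        qa, qb = var_a / na, var_b / nb
        se = math.sqrt(qa + qb)
        df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return (mean_a - mean_b) / se, df


sample_a = [14, 16, 15, 18, 17]
sample_b = [12, 13, 14, 11, 15]
t, df = two_sample_t(sample_a, sample_b)
print(f"Welch: t = {t:.3f}, df = {df:.3f}")
```

With these two samples the sketch should reproduce t = 3.000 and df = 8, and passing `equal_variances=True` gives the same pair because both the variances and the sample sizes match. With unequal sizes or variances the Welch df is generally not a whole number.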

FAQ

When should I use Welch vs the pooled t-test?: Use Welch by default; it works whether or not the two groups have equal variances. Use the pooled test only if you have a strong reason to assume equal variances: when that assumption fails, its false-positive rate can drift away from α, especially if the sample sizes also differ.
When is a paired test appropriate?: When each row of A is naturally paired with the same row of B — for example, before/after measurements on the same subject. Pairing removes between-subject variability.
What does df mean?: Degrees of freedom. It controls the t-distribution's shape: the higher the df, the closer the t-distribution is to the normal distribution. The bundled table covers df 1–120; above 120 the calculator falls back to the z-row.
What if my sample sizes differ?: Welch handles unequal sizes naturally; the pooled test also accepts unequal n but assumes equal variances.
Is the verdict the same as a p-value?: It is equivalent to comparing the two-tailed p-value to α: if |t| ≥ t-critical at your α, you would reject H₀; otherwise you would fail to reject it.
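The verdict rule and the z-row fallback described in the FAQ can be sketched as follows. This is an illustration under stated assumptions, not the calculator's implementation: the table is a short excerpt of standard two-tailed critical values at α = 0.05, the names `critical_value` and `verdict` are hypothetical, and rounding a fractional Welch df down to the nearest table row is an assumption (the page does not say how non-integer df is handled).

```python
import math

# Excerpt of standard two-tailed critical values at alpha = 0.05.
# The calculator's bundled table covers every df from 1 to 120;
# only a few rows are reproduced here.
T_CRIT_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    120: 1.980,
}
Z_CRIT_05 = 1.960  # z-row, used above df 120


def critical_value(df):
    """Two-tailed critical value at alpha = 0.05 for the given df.

    Assumption: a fractional df is rounded DOWN, which picks the
    larger (more conservative) critical value.
    """
    if df > 120:
        return Z_CRIT_05
    row = math.floor(df)
    if row not in T_CRIT_05:
        raise ValueError(f"df = {df} is not in this excerpt of the table")
    return T_CRIT_05[row]


def verdict(t, df):
    """Reject H0 when |t| reaches the critical value."""
    if abs(t) >= critical_value(df):
        return "reject H0"
    return "fail to reject H0"
```

For the worked example, `verdict(3.0, 8)` compares 3.000 against 2.306 and returns "reject H0".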