How is similarity calculated?

As Jaccard similarity over lowercased word tokens: |A ∩ B| / |A ∪ B| × 100, where A and B are the sets of distinct tokens in each text, so a repeated word counts once. Two texts with identical vocabularies score 100%; texts with no shared words score 0%.
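The formula can be sketched in a few lines of Python. This is an illustration of the calculation, not the tool's own browser code: the function names, the regular expression used for tokenizing, and the choice to return 0 for two empty texts are all assumptions.

```python
import re

def tokens(text: str) -> set:
    """Return the set of distinct lowercased word tokens in text."""
    # [^\W_]+ matches runs of Unicode letters and digits (underscore excluded).
    return set(re.findall(r"[^\W_]+", text.lower()))

def jaccard_percent(a: str, b: str) -> float:
    """Jaccard similarity of the two vocabularies, as a percentage."""
    set_a, set_b = tokens(a), tokens(b)
    union = set_a | set_b
    if not union:
        # Assumption: two texts with no tokens at all score 0 in this sketch.
        return 0.0
    return len(set_a & set_b) / len(union) * 100

print(jaccard_percent("red green blue", "blue green red"))    # 100.0
print(jaccard_percent("red green", "cyan magenta"))           # 0.0
print(jaccard_percent("red green blue", "green blue black"))  # 50.0
```

In the last call the texts share 2 tokens out of 4 distinct tokens overall, hence 50%.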

Is the tool case-sensitive?

For the word-level similarity and unique-word lists, no — both sides are lowercased and punctuation is stripped first. The character and word counts, however, reflect your input as-is.
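A small Python sketch of that distinction: the word tokens are normalized before comparison, while the raw counts are taken from the input unchanged. The helper name `word_tokens` and the regular expression are assumptions made for illustration.

```python
import re

def word_tokens(text: str) -> list:
    """Lowercase the text, then keep runs of letters and digits only."""
    return re.findall(r"[^\W_]+", text.lower())

a = "Hello, World!"
b = "hello world"

# After normalization both sides yield the same tokens.
print(word_tokens(a))                    # ['hello', 'world']
print(word_tokens(a) == word_tokens(b))  # True

# Raw character counts still differ, because they use the input as typed.
print(len(a), len(b))                    # 13 11
```

This sketch also splits on apostrophes and hyphens; how the tool treats those characters is not stated here.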

How does this differ from the diff checker?

The diff checker shows a line-by-line edit script (additions / deletions). This tool focuses on aggregate statistics and which words occur on only one side. Use them together for a complete picture.
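To make the contrast concrete, here is a rough Python sketch that runs both styles of comparison on the same input. Python's standard `difflib` module stands in for the diff checker, whose actual algorithm is not described here, and the `vocab` helper is a hypothetical stand-in for this tool's tokenizer.

```python
import difflib
import re

old = "alpha beta\ngamma delta\n"
new = "alpha beta\ngamma epsilon\n"

# Line-level edit script: which whole lines were removed or added.
edits = [line
         for line in difflib.ndiff(old.splitlines(), new.splitlines())
         if line.startswith(("+ ", "- "))]
print(edits)  # ['- gamma delta', '+ gamma epsilon']

def vocab(text: str) -> set:
    """Distinct lowercased word tokens (assumed tokenizer)."""
    return set(re.findall(r"[^\W_]+", text.lower()))

# Word-level set comparison: which terms occur on only one side.
print(sorted(vocab(old) - vocab(new)))  # ['delta']
print(sorted(vocab(new) - vocab(old)))  # ['epsilon']
```

The edit script tells you where a change happened; the set comparison tells you which terms changed, regardless of position.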

Does it work for non-English text?

Yes. Tokenization uses the Unicode "letter and number" character classes, so it handles Cyrillic, Greek, Chinese, Arabic, accented Latin, and most other scripts.
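As a rough illustration, a Unicode-aware tokenizer can be approximated in Python as below. Python's built-in `re` module has no `\p{L}` or `\p{N}` classes, so this sketch uses `[^\W_]`, which matches Unicode letters and digits for `str` patterns. It is an approximation, not the tool's actual pattern.

```python
import re

# "Word character that is not an underscore": Unicode letters and digits.
TOKEN = re.compile(r"[^\W_]+")

def tokenize(text: str) -> list:
    """Lowercase the text and split it into letter/digit runs."""
    return TOKEN.findall(text.lower())

print(tokenize("Привет, мир!"))       # ['привет', 'мир']
print(tokenize("Café déjà-vu 2024"))  # ['café', 'déjà', 'vu', '2024']
print(tokenize("我喜欢猫"))            # ['我喜欢猫']
```

Note the last line: text written without spaces comes out as one token per unbroken run in this sketch. Whether the tool segments such text further is not stated.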

Is anything sent to a server?

No. All counting, tokenization, and set arithmetic runs locally in your browser.

Text Comparison Tool

Compare two texts by counts plus a Jaccard similarity score and unique-word lists.

Written by Golam Rabbani, Founder & Lead Engineer


How to use this text comparison tool

Paste your first piece of text into "Text A".
Paste the second piece of text into "Text B".
Press Compare to compute character, word, and line stats plus a Jaccard similarity score.
Review the unique-word lists to see exactly which terms appear on only one side.
Use Copy report to grab a plain-text summary you can paste into an email or ticket.

About this text comparison tool

This text comparison tool gives you a quick, deterministic side-by-side view of two pieces of text. It tallies characters, words, and lines for each side, then computes a Jaccard similarity over the lowercased word tokens — the size of the shared vocabulary divided by the size of the combined vocabulary, expressed as a percentage. It also lists up to 200 words that appear in only one text, so you can spot what was added or removed at the term level.
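The tallies and unique-word lists described above could be sketched in Python as follows. Only the cap of 200 comes from the description; the function names, the whitespace-based word count, the line-count rule, and the alphabetical ordering of the lists are assumptions.

```python
import re

MAX_UNIQUE = 200  # cap on each unique-word list, per the description

def stats(text: str) -> dict:
    """Raw counts on the input as typed (counting rules are assumed)."""
    return {
        "chars": len(text),
        "words": len(text.split()),
        "lines": len(text.splitlines()),
    }

def vocab(text: str) -> set:
    """Distinct lowercased word tokens."""
    return set(re.findall(r"[^\W_]+", text.lower()))

def unique_words(a: str, b: str) -> tuple:
    """Words found on only one side, sorted and capped at MAX_UNIQUE."""
    va, vb = vocab(a), vocab(b)
    return sorted(va - vb)[:MAX_UNIQUE], sorted(vb - va)[:MAX_UNIQUE]

a = "Ship the release on Friday.\nNotify support."
b = "Ship the hotfix on Monday.\nNotify support and sales."

print(stats(a))  # {'chars': 43, 'words': 7, 'lines': 2}
print(stats(b))  # {'chars': 52, 'words': 9, 'lines': 2}
print(unique_words(a, b))
# (['friday', 'release'], ['and', 'hotfix', 'monday', 'sales'])
```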

For example, comparing A = "The quick brown fox" (4 words, 19 characters) and B = "The quick red fox jumps" (5 words, 23 characters) yields a delta of +4 characters (with the strings exactly as shown; extra whitespace changes this), +1 word, and a similarity of 50%: 3 shared tokens ("the", "quick", "fox") out of 6 distinct tokens overall, with "brown" unique to A and "red" and "jumps" unique to B. The tool is intentionally simple: no live API, no diff colouring (use the diff checker for that), no fuzzy matching. It reports plain counts and set comparisons computed in your browser, so your text never leaves the page.
