Text Comparison Tool
Compare two texts by character, word, and line counts, a Jaccard similarity score, and unique-word lists.
Written by Golam Rabbani, Founder & Lead Engineer
How to use this text comparison tool
- Paste your first piece of text into "Text A".
- Paste the second piece of text into "Text B".
- Press Compare to compute character, word, and line stats plus a Jaccard similarity score.
- Review the unique-word lists to see exactly which terms appear on only one side.
- Use Copy report to grab a plain-text summary you can paste into an email or ticket.
About this text comparison tool
This text comparison tool gives you a quick, deterministic side-by-side view of two pieces of text. It tallies characters, words, and lines for each side, then computes a Jaccard similarity over the lowercased word tokens — the size of the shared vocabulary divided by the size of the combined vocabulary, expressed as a percentage. It also lists up to 200 words that appear in only one text, so you can spot what was added or removed at the term level.
For example, comparing A = "The quick brown fox" (4 words, 19 characters) and B = "The quick red fox jumps" (5 words, 23 characters) yields a delta of +4 characters and +1 word, and a similarity of 50%: 3 shared tokens ("the", "quick", "fox") out of 6 distinct tokens overall, with unique tokens "brown" in A and "red" and "jumps" in B. The tool is intentionally simple: no live API, no diff colouring (use the diff checker for that), no fuzzy matching. It offers just honest counts and set comparisons computed in your browser, so your text never leaves the page.
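The comparison described above can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's actual source (the tool itself runs in the browser). The names `tokenize` and `compare`, the token pattern, the whitespace-based word count, and the score for two empty texts are all assumptions made for the example.

```python
import re

# Assumption: a token is a run of Unicode letters or digits.
# [^\W_] is "\w minus underscore" in Python's Unicode-aware re module.
TOKEN_RE = re.compile(r"[^\W_]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text.lower())


def compare(a: str, b: str, limit: int = 200) -> dict:
    """Return counts, Jaccard similarity (percent), and unique-word lists."""
    vocab_a, vocab_b = set(tokenize(a)), set(tokenize(b))
    shared = vocab_a & vocab_b
    union = vocab_a | vocab_b
    # Assumption: two empty texts score 0.0 to avoid dividing by zero.
    similarity = 100.0 * len(shared) / len(union) if union else 0.0
    return {
        # Assumption: words are whitespace-separated, lines split on newlines.
        "chars": (len(a), len(b)),
        "words": (len(a.split()), len(b.split())),
        "lines": (len(a.splitlines()), len(b.splitlines())),
        "similarity": round(similarity, 1),
        "only_a": sorted(vocab_a - vocab_b)[:limit],
        "only_b": sorted(vocab_b - vocab_a)[:limit],
    }


report = compare("The quick brown fox", "The quick red fox jumps")
print(report["similarity"])  # 50.0 (3 shared tokens out of 6 distinct)
print(report["only_a"])      # ['brown']
print(report["only_b"])      # ['jumps', 'red']
```

Because the score is computed over sets, repeating a word does not change the similarity; only the presence or absence of each distinct token matters.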
FAQ
- How is similarity calculated?
- As Jaccard similarity over lowercased word tokens: |A ∩ B| / |A ∪ B| × 100. Two texts with identical vocabularies score 100%; texts with no shared words score 0%.
- Is the tool case-sensitive?
- For the word-level similarity and unique-word lists, no — both sides are lowercased and punctuation is stripped first. The character and word counts, however, reflect your input as-is.
- How does this differ from the diff checker?
- The diff checker shows a line-by-line edit script (additions / deletions). This tool focuses on aggregate statistics and which words occur on only one side. Use them together for a complete picture.
- Does it work for non-English text?
- Yes. Tokenization uses the Unicode "letter and number" character classes, so it handles Cyrillic, Greek, Chinese, Arabic, accented Latin, and most other scripts.
- Is anything sent to a server?
- No. All counting, tokenization, and set arithmetic runs locally in your browser.
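To make the answers about case sensitivity and non-English text concrete, here is a minimal Python sketch of a tokenizer in the same spirit. The `tokenize` name and the exact pattern are assumptions for illustration; the tool's own implementation may differ in detail.

```python
import re

# Assumption: tokens are runs of Unicode letters or digits (underscore excluded).
TOKEN_RE = re.compile(r"[^\W_]+")


def tokenize(text: str) -> list[str]:
    """Lowercase first, then keep only letter/digit runs (punctuation is dropped)."""
    return TOKEN_RE.findall(text.lower())


# Case and punctuation do not affect the tokens.
print(tokenize("Hello, WORLD! hello-world"))  # ['hello', 'world', 'hello', 'world']

# Non-Latin scripts and digits are kept as tokens.
print(tokenize("Привет, мир 2024"))  # ['привет', 'мир', '2024']
print(tokenize("ΑΛΦΑ"))              # ['αλφα']

# Text written without spaces yields one token per unbroken run.
print(tokenize("你好世界"))           # ['你好世界']
```

Under this assumed pattern, scripts that do not separate words with spaces (such as Chinese) produce one token per unbroken run of characters, so word-level similarity is coarser for them than for space-delimited languages.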