Text Cleaner
Remove hidden characters, AI formatting, extra whitespace, and HTML tags.
What does each operation do?
Hidden & zero-width characters
Invisible Unicode characters that word processors, browsers, and AI tools commonly insert into text. They cause unexpected spacing, make identical-looking strings fail to match when searched or compared, and corrupt downstream text processing.
Zero-width space (U+200B)
Zero-width non-joiner (U+200C)
Zero-width joiner (U+200D)
Soft hyphen (U+00AD)
Non-breaking space (U+00A0)
Byte order mark (U+FEFF)
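A minimal Python sketch of this step, offered as an illustration rather than the tool's actual code. The function name `remove_hidden_chars` is hypothetical, and the sketch assumes the non-breaking space is replaced with an ordinary space (rather than deleted) so the words on either side stay separated.

```python
# Characters to delete (mapped to None) and the one to replace.
_HIDDEN = str.maketrans({
    "\u200b": None,  # zero-width space
    "\u200c": None,  # zero-width non-joiner
    "\u200d": None,  # zero-width joiner
    "\u00ad": None,  # soft hyphen
    "\ufeff": None,  # byte order mark
    "\u00a0": " ",   # non-breaking space -> ordinary space (assumption)
})


def remove_hidden_chars(text: str) -> str:
    """Delete invisible characters and turn non-breaking spaces into spaces."""
    return text.translate(_HIDDEN)


print(remove_hidden_chars("\ufeffzero\u200bwidth\u00a0text"))  # zerowidth text
```

One caveat with deleting these everywhere: the zero-width joiner and non-joiner also have legitimate uses, such as building emoji sequences and controlling letter joining in scripts like Persian and Devanagari, so removing them can change how such text renders.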
Smart quotes → straight quotes
Typographic "curly" quotes look beautiful in print but break code, JSON, CSV, and most plain-text formats that expect standard ASCII apostrophes and double-quote marks.
‘ ’ → ' (single quotes)
“ ” → " (double quotes)
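The quote conversion is a one-to-one character mapping, which Python's `str.translate` applies in a single pass. A sketch with a hypothetical function name, `straighten_quotes`, assuming only the four characters listed above are converted:

```python
# Curly quote -> ASCII equivalent.
_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also the typographic apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Replace curly single and double quotes with straight ASCII quotes."""
    return text.translate(_QUOTES)


print(straighten_quotes("\u201cIt\u2019s fine,\u201d she said."))  # "It's fine," she said.
```

Other quotation styles, such as « » guillemets or the low „ quote, are left untouched by this mapping.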
AI-generated formatting
Language models tend to favour certain Unicode punctuation characters that look polished but cause problems in many editors, terminals, and data pipelines that expect plain ASCII.
— em dash → -- (double hyphen)
– en dash → - (hyphen)
… ellipsis → ... (three dots)
• bullet → - (hyphen)
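The same mapping technique covers these replacements; `str.maketrans` accepts multi-character replacement strings, so the em dash can become two hyphens and the ellipsis three dots. A sketch, with the function name `replace_ai_punctuation` being hypothetical:

```python
# Typographic punctuation -> ASCII replacement, as listed above.
_AI_PUNCT = str.maketrans({
    "\u2014": "--",   # em dash
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
    "\u2022": "-",    # bullet
})


def replace_ai_punctuation(text: str) -> str:
    """Swap common typographic punctuation for plain ASCII equivalents."""
    return text.translate(_AI_PUNCT)


print(replace_ai_punctuation("Wait\u2026 pages 3\u20135 \u2014 done"))  # Wait... pages 3-5 -- done
```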
Extra whitespace & blank lines
Runs of spaces and tabs are collapsed to a single space, and more than two consecutive blank lines are reduced to two. Trailing whitespace on each line is trimmed, as is any whitespace at the very start and end of the text. The result is clean, consistently spaced text.
Multiple consecutive spaces or tabs → single space
3+ blank lines → max 2 blank lines
Trailing whitespace on each line → removed
Leading / trailing whitespace of the whole text → removed
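A sketch of the whitespace pass using Python's `re` module. The order of the substitutions matters: trailing spaces are trimmed before blank lines are counted, so a line containing only spaces counts as blank. The function name `normalize_whitespace` is hypothetical, and normalising Windows `\r\n` line endings to `\n` first is an added assumption.

```python
import re


def normalize_whitespace(text: str) -> str:
    # Assumption: Windows line endings are normalised to "\n" first.
    text = text.replace("\r\n", "\n")
    # Collapse every run of spaces and/or tabs to a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Remove trailing spaces on each line; this also empties
    # whitespace-only lines so they count as blank below.
    text = re.sub(r" +$", "", text, flags=re.MULTILINE)
    # Three or more blank lines means four or more newlines in a row;
    # reduce them to exactly two blank lines (three newlines).
    text = re.sub(r"\n{4,}", "\n\n\n", text)
    # Finally trim whitespace at the start and end of the whole text.
    return text.strip()
```

Because the first substitution collapses every run of spaces, it also flattens leading indentation to a single space, which is worth keeping in mind if the text contains code.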
HTML & markdown tags
Strips markup syntax so you're left with plain readable text. Useful when copying from web pages, CMS editors, or markdown files where the raw syntax bleeds through.
<tags> and </tags> → removed
**bold** and *italic* → plain text
# Headings → plain text
[link text](url) → link text only
`code` and ```blocks``` → removed
List markers (- and 1.) → removed
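The rules above can be approximated with a sequence of regular expressions. This is a heuristic sketch, not a real HTML or Markdown parser and not necessarily how the tool does it; the function name `strip_markup` is hypothetical, and it handles only the constructs listed above (for example, only `-` and numbered list markers, and only `*` emphasis).

```python
import re


def strip_markup(text: str) -> str:
    # Fenced code blocks (``` ... ```), including their contents.
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    # Inline code spans (`...`) on a single line.
    text = re.sub(r"`[^`\n]*`", "", text)
    # HTML opening, closing and self-closing tags, with their attributes.
    text = re.sub(r"</?[A-Za-z][^>]*>", "", text)
    # Markdown links: keep the link text, drop the URL.
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # List markers ("- " or "1. ") at the start of a line, even if indented.
    text = re.sub(r"^[ \t]*(?:-|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)
    # ATX headings: one to six "#" followed by whitespace.
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Bold before italic, so "**" is not read as two single "*".
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    text = re.sub(r"\*(.+?)\*", r"\1", text)
    return text
```

Regex stripping can misfire on ordinary text: `2 * 3 * 4` would lose its asterisks, and removing inline code leaves a double space behind. Running the whitespace pass afterwards tidies up the latter.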