Text Encoder Explorer — Unicode, UTF-8, UTF-16 Byte Inspector
Explore how text is encoded in UTF-8, UTF-16, and Unicode code points. View per-character byte breakdowns, hex dumps, and encoding statistics. Free online tool, runs entirely in your browser.
| Char | Code Point | UTF-8 | UTF-16 | Decimal | Name |
|---|---|---|---|---|---|
| H | U+0048 | 48 | 00 48 | 72 | ASCII printable |
| e | U+0065 | 65 | 00 65 | 101 | ASCII printable |
| l | U+006C | 6C | 00 6C | 108 | ASCII printable |
| l | U+006C | 6C | 00 6C | 108 | ASCII printable |
| o | U+006F | 6F | 00 6F | 111 | ASCII printable |
| , | U+002C | 2C | 00 2C | 44 | ASCII printable |
| (space) | U+0020 | 20 | 00 20 | 32 | Space |
| 世 | U+4E16 | E4 B8 96 | 4E 16 | 19990 | U+4E16 |
| 界 | U+754C | E7 95 8C | 75 4C | 30028 | U+754C |
| ! | U+0021 | 21 | 00 21 | 33 | ASCII printable |
| (space) | U+0020 | 20 | 00 20 | 32 | Space |
| 🌍 | U+1F30D | F0 9F 8C 8D | D8 3C DF 0D | 127757 | U+1F30D |
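The UTF-16 column for 🌍 shows four bytes because U+1F30D lies above U+FFFF and has to be stored as a surrogate pair. The arithmetic behind that row can be sketched in Python as follows. The function name `utf16_surrogates` is illustrative and is not part of the tool.

```python
def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return high, low


high, low = utf16_surrogates(0x1F30D)
print(f"{high:04X} {low:04X}")  # prints "D83C DF0D"
```

Written as big-endian bytes, the pair D83C DF0D gives the `D8 3C DF 0D` shown in the table.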
What is Text Encoding?
Text encoding is the process of converting characters into sequences of bytes that computers can store and transmit. Unicode assigns a unique numeric code point to each character it defines, across the world's writing systems. UTF-8 and UTF-16 are the two most widely used encoding forms. UTF-8 uses 1–4 bytes per code point and is the dominant encoding on the web. UTF-16 uses 2 bytes for code points up to U+FFFF and 4 bytes (a surrogate pair) for code points above that, and it is the in-memory string format of JavaScript, Java, and the Windows API. This tool lets you inspect exactly how each character in your text is represented in these encodings, showing code points, byte sequences, and a hex dump view. The UTF-16 bytes in the character table are shown in big-endian order, so `H` appears as `00 48`.
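To reproduce a row of the character table outside the browser, a minimal Python sketch is shown below. It uses only the standard library; the helper name `describe` and the dictionary keys are illustrative choices, and the tool itself may compute its columns differently (in particular, its Name column does not necessarily use the official Unicode character names that `unicodedata` returns).

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the code point and encoded bytes for a single character."""
    return {
        "char": ch,
        "code_point": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8").hex(" ").upper(),
        # "utf-16-be" is big-endian without a byte order mark
        "utf16": ch.encode("utf-16-be").hex(" ").upper(),
        "decimal": ord(ch),
        "name": unicodedata.name(ch, "(unnamed)"),
    }


# Iterating over a Python str yields one code point at a time,
# so an emoji above U+FFFF is handled as a single character.
for ch in "H\u4e16\U0001F30D":
    print(describe(ch))
```

Python strings are sequences of code points, which keeps the loop simple. In languages whose strings are UTF-16 code units, such as JavaScript, the same loop would need to iterate by code point to avoid splitting surrogate pairs.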
How to Use the Text Encoder Explorer
- Enter or paste text into the input field. A default sample with ASCII, CJK, and emoji characters is provided.
- View the character table to see each character's Unicode code point, UTF-8 bytes, UTF-16 bytes, and decimal value.
- Check the summary stats for total character count, UTF-8 byte count, and UTF-16 byte count.
- Switch to the Hex Dump tab to see the raw UTF-8 bytes in a traditional hexdump format similar to xxd.
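For readers who want a comparable hex dump on the command line, here is a rough Python sketch. The layout (offset, space-separated hex bytes, printable-ASCII column) is only loosely modeled on xxd and is an assumption about what the Hex Dump tab shows, not a copy of it. The function name `hexdump` and the `width` parameter are illustrative.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex bytes, and a printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable and non-ASCII bytes are shown as "."
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows
        lines.append(f"{offset:08x}: {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


# The sample text from the table, written with escapes for the non-ASCII characters
sample = "Hello, \u4e16\u754c! \U0001F30D"
print(hexdump(sample.encode("utf-8")))
```

The sample encodes to 19 UTF-8 bytes, so the dump has one full 16-byte row and a second row holding the last three bytes of the emoji.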
Common Use Cases
- Debugging encoding issues — Inspect byte sequences to diagnose mojibake, incorrect character display, or encoding mismatches between systems.
- Understanding multi-byte characters — See exactly how CJK characters, emoji, and accented letters are encoded in UTF-8 vs UTF-16.
- Estimating storage and bandwidth — Compare UTF-8 and UTF-16 byte sizes to choose the most efficient encoding for your data.
- Learning Unicode and encoding fundamentals — A hands-on reference for students and developers learning how text encoding works at the byte level.
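Two of the use cases above, comparing storage sizes and diagnosing mojibake, can be illustrated with a short Python sketch. The helper name `encoding_stats` is hypothetical, and counting characters as code points is an assumption; the tool's own character count may be defined differently.

```python
def encoding_stats(text: str) -> dict:
    """Count code points and encoded sizes for a piece of text."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-le" omits the byte order mark, so only the text is counted
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


# The sample text from the table, written with escapes for the non-ASCII characters
sample = "Hello, \u4e16\u754c! \U0001F30D"
print(encoding_stats(sample))
# {'code_points': 12, 'utf8_bytes': 19, 'utf16_bytes': 26}

# Mojibake: UTF-8 bytes misread as Latin-1 turn one character into two
garbled = "\u00e9".encode("utf-8").decode("latin-1")
print(garbled)  # prints "Ã©"
```

For mostly-ASCII text like this sample, UTF-8 is smaller. Text dominated by CJK characters tends to go the other way, since those take 3 bytes in UTF-8 and 2 in UTF-16.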