Text Encoder Explorer — Unicode, UTF-8, UTF-16 Byte Inspector

Explore how text maps to Unicode code points and how it is encoded in UTF-8 and UTF-16. View per-character byte breakdowns, hex dumps, and encoding statistics. Free online tool, runs entirely in your browser.

Char	Code Point	UTF-8 (hex)	UTF-16 BE (hex)	Decimal	Name
H	U+0048	48	00 48	72	ASCII printable
e	U+0065	65	00 65	101	ASCII printable
l	U+006C	6C	00 6C	108	ASCII printable
l	U+006C	6C	00 6C	108	ASCII printable
o	U+006F	6F	00 6F	111	ASCII printable
,	U+002C	2C	00 2C	44	ASCII printable
	U+0020	20	00 20	32	Space
世	U+4E16	E4 B8 96	4E 16	19990	U+4E16
界	U+754C	E7 95 8C	75 4C	30028	U+754C
!	U+0021	21	00 21	33	ASCII printable
	U+0020	20	00 20	32	Space
🌍	U+1F30D	F0 9F 8C 8D	D8 3C DF 0D	127757	U+1F30D

What is Text Encoding?

Text encoding converts characters into sequences of bytes that computers can store and transmit. Unicode assigns a unique code point to each character it defines, across the world's writing systems. UTF-8 and UTF-16 are the two most widely used Unicode encodings: UTF-8 uses 1 to 4 bytes per character and is the dominant encoding on the web, while UTF-16 uses 2 or 4 bytes per character and is the string representation used by JavaScript, Java, and the Windows API. This tool lets you inspect exactly how each character in your text is represented in these encodings, showing code points, byte sequences, and a hex dump view.
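The per-character breakdown described above can be approximated in a few lines. This is a minimal sketch in Python, not the tool's own JavaScript; the function name `describe` is hypothetical, and UTF-16 is shown in big-endian byte order without a byte order mark, which is an assumption chosen to match the sample table.

```python
def describe(text: str) -> list:
    """Return (char, code point, UTF-8 hex, UTF-16 BE hex, decimal) for each character."""
    rows = []
    for ch in text:  # Python iterates a str by code point, not by UTF-16 code unit
        cp = ord(ch)
        utf8_hex = ch.encode("utf-8").hex(" ").upper()
        utf16_hex = ch.encode("utf-16-be").hex(" ").upper()  # big-endian, no BOM
        rows.append((ch, f"U+{cp:04X}", utf8_hex, utf16_hex, cp))
    return rows


for row in describe("H世🌍"):
    print(row)
```

For the three sample characters this yields 1, 3, and 4 UTF-8 bytes respectively, and 2, 2, and 4 UTF-16 bytes.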

How to Use the Text Encoder Explorer

Enter or paste text into the input field. A default sample with ASCII, CJK, and emoji characters is provided.
View the character table to see each character's Unicode code point, UTF-8 bytes, UTF-16 bytes, and decimal value.
Check the summary stats for total character count, UTF-8 byte count, and UTF-16 byte count.
Switch to the Hex Dump tab to see the raw UTF-8 bytes in a traditional hexdump format similar to xxd.
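For readers who want to reproduce a similar hex dump outside the browser, the following Python sketch approximates the default xxd layout (offset, bytes grouped in pairs, ASCII column). The function name `hexdump` and the exact column padding are assumptions; the tool's actual output may differ in detail.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes roughly like xxd: offset, hex pairs, printable-ASCII column."""
    # Width of the hex column for a full row (assumes an even width):
    # width/2 groups of 4 hex digits, separated by single spaces.
    hex_width = (width // 2) * 5 - 1
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Group the bytes two at a time, as xxd does by default
        groups = [chunk[i:i + 2].hex() for i in range(0, len(chunk), 2)]
        hex_part = " ".join(groups)
        # Printable ASCII is shown as-is; every other byte becomes "."
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{hex_width}}  {ascii_part}")
    return "\n".join(lines)


print(hexdump("Hello, 世界! 🌍".encode("utf-8")))
```

The non-ASCII characters appear as runs of dots in the right-hand column, because each of their UTF-8 bytes falls outside the printable ASCII range.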

Common Use Cases

Debugging encoding issues — Inspect byte sequences to diagnose mojibake, incorrect character display, or encoding mismatches between systems.
Understanding multi-byte characters — See exactly how CJK characters, emoji, and accented letters are encoded in UTF-8 vs UTF-16.
Estimating storage and bandwidth — Compare UTF-8 and UTF-16 byte sizes to choose the most efficient encoding for your data.
Learning Unicode and encoding fundamentals — A hands-on reference for students and developers learning how text encoding works at the byte level.
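As an illustration of the first use case, mojibake often arises when UTF-8 bytes are decoded with a single-byte encoding. The Python sketch below simulates that mistake and reverses it. The function names are hypothetical, and Windows-1252 (`cp1252`) is only an assumed example of the wrong encoding; real-world cases may involve other code pages or be unrecoverable.

```python
def simulate_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode as UTF-8, then decode with the wrong encoding, as a misconfigured system might."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair_mojibake(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


garbled = simulate_mojibake("café")
print(garbled)                           # cafÃ©
print(garbled.encode("utf-8").hex(" "))  # 63 61 66 c3 83 c2 a9
print(repair_mojibake(garbled))          # café
```

A byte sequence like C3 83 C2 A9 where a single accented letter is expected is a typical sign that the text was encoded to UTF-8 twice.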

FAQ

What is the difference between UTF-8 and UTF-16?

UTF-8 encodes each character in 1 to 4 bytes, with ASCII characters taking just 1 byte. UTF-16 uses 2 bytes for characters in the Basic Multilingual Plane (U+0000 to U+FFFF) and 4 bytes for supplementary characters above U+FFFF, which include most emoji. UTF-8 is more compact for Latin-heavy text, while UTF-16 can be more compact for CJK-heavy text, since most CJK characters take 3 bytes in UTF-8 but only 2 in UTF-16.
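The size trade-off is easy to check directly. The following Python sketch compares encoded lengths for a few sample strings; `encoded_sizes` is a hypothetical helper, and the UTF-16 count assumes no byte order mark is written.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text."""
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-le"))  # explicit byte order, so no BOM is added
    return utf8_size, utf16_size


for sample in ["Hello", "世界", "🌍"]:
    print(sample, encoded_sizes(sample))
# Hello (5, 10)
# 世界 (6, 4)
# 🌍 (4, 4)
```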

Why do some characters show multiple bytes?

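Characters outside the ASCII range (code points above U+007F) require 2 to 4 bytes in UTF-8. For example, Chinese characters typically use 3 bytes in UTF-8, and emoji with code points above U+FFFF use 4 bytes in both encodings: a single 4-byte sequence in UTF-8 and a surrogate pair (two 2-byte code units) in UTF-16. In UTF-16, every character takes at least 2 bytes, so even ASCII characters show a leading 00 byte.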
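The byte counts follow from fixed code point ranges, and the surrogate pair from simple arithmetic. Here is a minimal Python sketch of both rules; the function names are hypothetical, and in practice a language's built-in encoder would do this work.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point, by range."""
    # Note: surrogates (U+D800 to U+DFFF) are not valid characters; this sketch does not reject them.
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4


def surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x1F30D)
print(f"{high:04X} {low:04X}")  # D83C DF0D
print(utf8_length(0x4E16))      # 3
```

For U+1F30D the offset is 0xF30D, whose top 10 bits are 0x3C and bottom 10 bits are 0x30D, giving D83C DF0D, the same bytes shown for the globe emoji in the sample table.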

Is my text sent to a server?

No. All encoding analysis happens entirely in your browser using the JavaScript TextEncoder API. No data is transmitted to any server.
