ASCII and Unicode

Unicode is a global standard for character encoding and is the most commonly used character set today.

Basic ASCII Character Set

The basic ASCII character set uses 7 bits for each character, giving 128 possible codes (0 to 127).

2^7 = 128
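The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example, while ord() and chr() are Python built-ins.

```python
# Number of distinct values a 7-bit code can hold.
BITS = 7
code_count = 2 ** BITS

# ord() returns a character's numeric code; chr() converts a code back.
letter_code = ord("A")
letter_bits = format(letter_code, "07b")  # the code written as 7 binary digits

print(code_count)                # 128
print(letter_code, letter_bits)  # 65 1000001
print(chr(97))                   # a
```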

Extended ASCII Character Set

The Extended ASCII character set uses 8 bits for each character, which adds a further 128 characters to the basic set for a total of 256.

2^8 = 256
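One practical point about the upper 128 values: there is no single "extended ASCII". Different 8-bit code pages assign different characters to the same byte. The sketch below decodes the same byte under two code pages; Latin-1 and CP437 were picked only as examples for this illustration.

```python
# One byte can hold 2^8 distinct values.
EXTENDED_COUNT = 2 ** 8

# A single byte above 127, i.e. outside basic ASCII.
raw = bytes([0xE9])

# The same byte decoded under two different 8-bit code pages.
as_latin1 = raw.decode("latin-1")
as_cp437 = raw.decode("cp437")

print(EXTENDED_COUNT)
print(repr(as_latin1), repr(as_cp437))
```

The two decoded characters differ, which is why text saved in one code page can look garbled when opened in another.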

Unicode

Unicode is the modern standard for representing the characters of all the world's languages.

ASCII character encoding is a subset of Unicode.
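The subset relationship can be demonstrated in Python, shown here as a minimal sketch with illustrative variable names: text made only of ASCII characters produces identical bytes in ASCII and UTF-8, and each of the 128 ASCII codes maps to the Unicode code point with the same number.

```python
text = "ASCII"

# ASCII-only text encodes to the same bytes in ASCII and in UTF-8.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Unicode code points of the characters; these equal their ASCII codes.
code_points = [ord(ch) for ch in text]

# Every ASCII code 0..127 decodes to the Unicode character with the same number.
all_match = all(bytes([i]).decode("ascii") == chr(i) for i in range(128))

print(ascii_bytes == utf8_bytes)  # True
print(code_points)                # [65, 83, 67, 73, 73]
print(all_match)                  # True
```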

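The Unicode standard defines three encoding forms: UTF-8, UTF-16 and UTF-32. All three can represent every Unicode code point (U+0000 to U+10FFFF, 1,114,112 in total); they differ in the size of the code units they use.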

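UTF-8 uses 8-bit code units and encodes each character in 1 to 4 bytes; ASCII characters take a single byte with the same value as in ASCII (the most popular encoding on the web).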

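UTF-16 uses 16-bit code units and encodes each character in 2 or 4 bytes. A single unit can take 65,536 values, which covers the most common characters; the rest are encoded as a pair of units called a surrogate pair (used by Java and Windows).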

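UTF-32 uses one 32-bit code unit (4 bytes) for every character. A 32-bit unit could take 4,294,967,296 values, far more than the Unicode code space needs, so every character fits in a single unit (UTF-8 and UTF-32 are used by Linux and various Unix systems).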
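To see how many bytes each encoding form uses per character, the sketch below encodes four sample characters with Python's built-in codecs. The helper name encoded_sizes and the sample characters are assumptions made for this illustration; the little-endian codec variants are used so that no byte order mark is added to the count.

```python
# Sample characters: ASCII letter, accented Latin letter, euro sign, emoji.
samples = ["A", "é", "€", "😀"]

def encoded_sizes(ch):
    """Return the number of bytes ch occupies in each encoding form."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}

for ch in samples:
    # Print the code point in U+XXXX notation, then the byte counts.
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 grows from 1 to 4 bytes across the samples, UTF-16 uses 2 bytes until the emoji needs 4, and UTF-32 stays at 4 bytes throughout.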

Unicode advantages over ASCII

All modern languages, and many historic ones, can be represented in a single character set.

Documents are more portable in Unicode, as each character has a unique representation (its code point) regardless of platform or language.
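Both advantages can be sketched in Python: a single string holds characters from several scripts, each with its own code point and official name, and the text survives a round trip through bytes unchanged. The sample string is an assumption chosen for the example; unicodedata is part of the Python standard library.

```python
import unicodedata

# One string mixing Latin, Greek, Hebrew and Chinese characters.
text = "aαא中"

# Each character has exactly one code point and one official Unicode name.
code_points = [f"U+{ord(ch):04X}" for ch in text]
names = [unicodedata.name(ch) for ch in text]

# Encoding to bytes and decoding again returns the identical text.
round_trip = text.encode("utf-8").decode("utf-8")

for cp, name in zip(code_points, names):
    print(cp, name)
print(round_trip == text)
```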
