Text to Binary and Hexadecimal

This is an example of text converted to binary code and hexadecimal. Each character, even the space between words (binary 00100000), is represented by an 8-bit binary code (its ASCII code).

There are 151 characters in the following Sample Text.

151 characters = 151 bytes

So each character (letter, number, space, etc.) takes exactly 1 byte.
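To double-check the count, a minimal Python 3 sketch can compare the number of characters with the number of bytes after ASCII encoding. The variable name `sample` is only for illustration; the two numbers match here because every character in the sample is plain ASCII.

```python
# Sample text from this post (pure ASCII).
sample = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
          "Ut eu felis augue. Sed non ante arcu. Sed pulvinar erat augue, "
          "ut volutpat est congue lacinia.")

char_count = len(sample)                  # number of characters
byte_count = len(sample.encode("ascii"))  # number of bytes when stored as ASCII

print(char_count, byte_count)  # 151 151
```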

For more details about the ASCII table and binary representation, read the following post.

Sample Text

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut eu felis augue. Sed non ante arcu. Sed pulvinar erat augue, ut volutpat est congue lacinia.

Sample Text to Binary

01001100 01101111 01110010 01100101 01101101 00100000 01101001 01110000 01110011 01110101 01101101 00100000 01100100 01101111 01101100 01101111 01110010 00100000 01110011 01101001 01110100 00100000 01100001 01101101 01100101 01110100 00101100 00100000 01100011 01101111 01101110 01110011 01100101 01100011 01110100 01100101 01110100 01110101 01110010 00100000 01100001 01100100 01101001 01110000 01101001 01110011 01100011 01101001 01101110 01100111 00100000 01100101 01101100 01101001 01110100 00101110 00100000 01010101 01110100 00100000 01100101 01110101 00100000 01100110 01100101 01101100 01101001 01110011 00100000 01100001 01110101 01100111 01110101 01100101 00101110 00100000 01010011 01100101 01100100 00100000 01101110 01101111 01101110 00100000 01100001 01101110 01110100 01100101 00100000 01100001 01110010 01100011 01110101 00101110 00100000 01010011 01100101 01100100 00100000 01110000 01110101 01101100 01110110 01101001 01101110 01100001 01110010 00100000 01100101 01110010 01100001 01110100 00100000 01100001 01110101 01100111 01110101 01100101 00101100 00100000 01110101 01110100 00100000 01110110 01101111 01101100 01110101 01110100 01110000 01100001 01110100 00100000 01100101 01110011 01110100 00100000 01100011 01101111 01101110 01100111 01110101 01100101 00100000 01101100 01100001 01100011 01101001 01101110 01101001 01100001 00101110
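The conversion above can be reproduced with a short Python 3 sketch, assuming the input is pure ASCII (`encode("ascii")` raises an error otherwise). The helper name `text_to_binary` is hypothetical, not part of any library.

```python
def text_to_binary(text: str) -> str:
    """Return the 8-bit binary code of each ASCII character, separated by spaces."""
    # encode("ascii") turns each character into one byte (an int from 0 to 127);
    # format(b, "08b") writes it in binary, zero-padded to 8 digits.
    return " ".join(format(b, "08b") for b in text.encode("ascii"))

print(text_to_binary("Lorem"))
# 01001100 01101111 01110010 01100101 01101101
```

Calling `text_to_binary` on the full sample text yields the 151 groups shown above.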

Binary to Hexadecimal

4C 6F 72 65 6D 20 69 70 73 75 6D 20 64 6F 6C 6F 72 20 73 69 74 20 61 6D 65 74 2C 20 63 6F 6E 73 65 63 74 65 74 75 72 20 61 64 69 70 69 73 63 69 6E 67 20 65 6C 69 74 2E 20 55 74 20 65 75 20 66 65 6C 69 73 20 61 75 67 75 65 2E 20 53 65 64 20 6E 6F 6E 20 61 6E 74 65 20 61 72 63 75 2E 20 53 65 64 20 70 75 6C 76 69 6E 61 72 20 65 72 61 74 20 61 75 67 75 65 2C 20 75 74 20 76 6F 6C 75 74 70 61 74 20 65 73 74 20 63 6F 6E 67 75 65 20 6C 61 63 69 6E 69 61 2E
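Going from binary to hexadecimal works one byte at a time: each group of 4 bits maps to one hex digit, so every 8-bit group becomes exactly two hex digits. A minimal Python 3 sketch, with a hypothetical helper `binary_to_hex`:

```python
def binary_to_hex(bits: str) -> str:
    """Convert space-separated 8-bit groups to space-separated hex bytes."""
    # int(group, 2) parses a binary string; format(n, "02X") prints two
    # uppercase hex digits, matching the style used above.
    return " ".join(format(int(group, 2), "02X") for group in bits.split())

print(binary_to_hex("01001100 01101111 01110010 01100101 01101101"))
# 4C 6F 72 65 6D
```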

ASCII and Unicode

Unicode is a global standard for character encoding and is the most commonly used character set today.

Basic ASCII Character Set

The basic ASCII character set uses 7 bits for each character:

$2^7 = 128$ characters

Extended ASCII Character Set

The extended ASCII character set uses 8 bits, which adds another 128 characters:

$2^8 = 256$ characters in total
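The bit counts above are plain powers of two. A quick Python check, where the largest code is simply the value with every bit set to 1:

```python
# n bits can hold 2**n distinct values, from 0 up to 2**n - 1.
basic = 2 ** 7     # basic ASCII
extended = 2 ** 8  # extended ASCII

print(basic, extended)  # 128 256

# The largest code is the bit pattern made only of 1s.
print(int("1111111", 2), int("11111111", 2))  # 127 255
```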

Unicode

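Unicode is the modern standard for representing the characters of all the world's languages.

ASCII is a subset of Unicode: the 128 ASCII characters are the first 128 Unicode code points (U+0000 to U+007F).

The Unicode standard defines three encodings: UTF-8, UTF-16 and UTF-32. All three can represent every Unicode character (code points U+0000 to U+10FFFF); they differ in the size of their code units.

UTF-8 uses 8-bit units and stores each character in 1 to 4 bytes. ASCII characters take one byte with the same value as in ASCII. It is the most popular encoding on the web.

UTF-16 uses 16-bit units. Characters in the Basic Multilingual Plane (the first 65,536 code points) take 2 bytes, and all other characters take 4 bytes (a surrogate pair). It is used by Java and Windows.

UTF-32 uses a fixed 32 bits (4 bytes) per character. A 32-bit value could distinguish 4,294,967,296 patterns, but Unicode only needs code points up to U+10FFFF, which is enough for all known languages. UTF-8 and UTF-32 are used by Linux and various Unix systems.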
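To see the difference in practice, this Python 3 sketch counts how many bytes each encoding needs for a few sample characters. The helper `encoded_sizes` is hypothetical, and the `-le` (little-endian) codec names are chosen so Python does not prepend a byte order mark (BOM), which would inflate the counts.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(ch: str) -> tuple:
    """Return the number of bytes ch needs in UTF-8, UTF-16 and UTF-32."""
    # The "-le" variants do not add a BOM, so only the character is counted.
    return tuple(len(ch.encode(enc)) for enc in ENCODINGS)

for ch in ("A", "Ö", "€", "\U0001F600"):  # the last one is the 😀 emoji
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# ASCII text gives identical bytes in ASCII and UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
```

This prints (1, 2, 4) for "A", (2, 2, 4) for "Ö", (3, 2, 4) for "€" and (4, 4, 4) for the emoji, which lies outside the Basic Multilingual Plane and therefore needs a surrogate pair in UTF-16.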

Unicode advantages over ASCII

Characters from all modern languages can be represented in a single character set.

Documents are more portable, because each character has one unique representation (code point) in Unicode.

ASCII Table

Computers work with binary code and store all information in binary format.

Computers commonly store characters (letters, numbers, symbols, spaces, etc.) using ASCII codes.

Each ASCII character is stored in 8 bits (1 byte) of information; the 7-bit code fits in the byte with the leftmost bit set to 0.

On Linux and other Unix-like systems, you can view the ASCII character set with:

$ man ascii

Read the following post for more details about the binary number system.

ASCII stands for American Standard Code for Information Interchange.

ASCII was first published in 1963 by the American Standards Association (ASA), the organization that later became ANSI (American National Standards Institute).

Basic ASCII Character Set (Total 128 characters)

The basic ASCII code uses 7 bits for each character, so it can represent only 128 distinct characters.

7 bits give $2^7 = 128$ distinct values

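Extended ASCII Character Set (Total 256 characters)

The extended ASCII character set uses an 8-bit (1 byte) binary code, which adds another 128 characters (codes 128 to 255). The extra characters cover letters from other languages and special symbols such as Ö or €. There is no single extended ASCII standard, though: code pages such as ISO 8859-1 and Windows-1252 assign the extra 128 codes differently, so € exists in Windows-1252 but not in ISO 8859-1.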
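Because extended ASCII depends on the code page, the same byte can decode to different characters, and one character can have different byte values. A Python 3 sketch using the standard codecs `cp1252` (Windows-1252), `cp437` (the original IBM PC code page) and `latin-1` (ISO 8859-1); the helper names are hypothetical:

```python
def decode_byte(value: int, codepage: str) -> str:
    """Decode a single byte value (0-255) using the given code page."""
    return bytes([value]).decode(codepage)

def encode_char(ch: str, codepage: str) -> int:
    """Return the single-byte code of a character in the given code page."""
    return ch.encode(codepage)[0]

# Byte 0x80 is € in Windows-1252 but Ç in code page 437.
print(decode_byte(0x80, "cp1252"), decode_byte(0x80, "cp437"))  # € Ç

# Ö is 0xD6 in ISO 8859-1 and Windows-1252, but 0x99 in code page 437.
for cp in ("latin-1", "cp1252", "cp437"):
    print(cp, format(encode_char("Ö", cp), "02X"))
```

Trying `encode_char("€", "latin-1")` raises `UnicodeEncodeError`, since ISO 8859-1 has no euro sign.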

There are 256 characters in total.

8 bits = 1 byte = 1 character

8 bits give $2^8 = 256$ distinct values

A byte (8 bits) holds exactly one value from 0 to 255, so it represents exactly one character in the ASCII table.

For example:

01100001 = a

01100010 = b

If binary looks too long, you can look up the decimal values in the ASCII table below and write the characters "a" and "b" like this instead:

97 = a

98 = b
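Python's built-in `int`, `chr`, `ord` and `format` make these conversions easy to try. A small sketch; the two helper names are hypothetical:

```python
def binary_to_char(bits: str) -> str:
    """Binary string -> number -> character."""
    return chr(int(bits, 2))

def char_to_binary(ch: str) -> str:
    """Character -> number -> 8-digit binary string."""
    return format(ord(ch), "08b")

print(int("01100001", 2), binary_to_char("01100001"))  # 97 a
print(ord("b"), char_to_binary("b"))                  # 98 01100010
print(format(ord("b"), "02X"))                        # 62 (hexadecimal)
```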

Whichever number system we use to write a character (decimal, hexadecimal, etc.), it is eventually converted to a binary number, because binary is what computers understand 🙂

As you can see in the following table, the characters are listed in the Char column. Each of the 128 characters in the ASCII table is shown in several number systems: "Decimal", "Hexadecimal", "Binary" and "Octal". Computers only understand the binary codes; the other notations exist for humans, because they are shorter and easier for us to read.
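A table like this can also be generated rather than looked up. This Python 3 sketch (the function name `ascii_table` is hypothetical) builds rows of decimal, hexadecimal, octal and binary values; by default it covers the printable characters, codes 32 to 126, because codes 0 to 31 and 127 are non-printing control characters.

```python
def ascii_table(start: int = 32, end: int = 127) -> list:
    """Return (decimal, hex, octal, binary, char) rows for codes start..end-1."""
    return [
        (n, format(n, "02X"), format(n, "03o"), format(n, "08b"), chr(n))
        for n in range(start, end)
    ]

print("Dec  Hex  Oct  Binary    Char")
for dec, hx, octal, binary, ch in ascii_table(65, 70):  # "A" to "E"
    print(f"{dec:>3}  {hx:>3}  {octal:>3}  {binary}  {ch}")
```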

Let's say we want to write the word "Hello" to a file or to memory. We need to map each character to a binary number, and as noted above, 1 character = 8 bits (1 byte):

  • H = 01001000
  • e = 01100101
  • l = 01101100
  • l = 01101100
  • o = 01101111

Hello = 01001000 01100101 01101100 01101100 01101111

There are 5 characters in the word "Hello", so it takes 5 bytes (40 bits) in a computer.
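The same steps as a Python 3 sketch: encode "Hello" to bytes, print the bit patterns, and write the bytes to a file to confirm the size. The temporary file is only a stand-in for "a file"; any path would behave the same way.

```python
import os
import tempfile

word = "Hello"
data = word.encode("ascii")  # one byte per character: b'Hello'

bits = " ".join(format(b, "08b") for b in data)
print(bits)  # 01001000 01100101 01101100 01101100 01101111
print(len(data), "bytes =", len(data) * 8, "bits")  # 5 bytes = 40 bits

# Write the bytes to a temporary file; its size on disk is also 5 bytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
file_size = os.path.getsize(f.name)
os.remove(f.name)
print(file_size)  # 5
```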

You can find a table of characters in the following image.

In the ASCII character set, each value from 0 to 127 (binary 00000000 to 01111111) is assigned a specific character; codes 0 to 31 and 127 are non-printing control characters.
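Checking whether text stays within the 0 to 127 range takes one line. The helper `is_ascii` below is hypothetical; Python 3.7 and later also provide the built-in `str.isascii()` method, which performs the same check.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character code is in the ASCII range 0-127."""
    return all(ord(ch) <= 127 for ch in text)

print(is_ascii("Hello"))  # True
print(is_ascii("Ö"))      # False: Ö is code point 214, outside 0-127
print("Hello".isascii())  # True (built-in, Python 3.7+)
```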
