What’s character encoding?

Character encoding is the method used to represent characters, glyphs, or symbols as numeric values in computer memory. It is necessary because computers can only manipulate numbers, so non-numeric characters must be translated into a numeric form. HTML documents can declare the character encoding they use, and Unicode is now the standard. Early computers stored the basic English characters in 7-bit sequences, a scheme that eventually evolved into the Unicode standard. Different encoding schemes have been developed for different purposes, and character encoding matters for displaying foreign languages, scientific or mathematical symbols, and punctuation. An incorrectly declared character encoding can result in incorrect or unreadable output.

Character encoding, in computer programming, is a method or algorithm for assigning a numeric representation to a character, glyph, or symbol. Encoding is necessary because information in computer memory and on computer-readable media is stored as sequences of bits or numbers, so the non-numeric characters used for human-readable display or output must be translated into a form a computer can manipulate. In a more specific application, HyperText Markup Language (HTML) documents read by web browsers can declare the character encoding they use, which tells the browser which character set to apply when displaying the document's contents. Several encoding schemes remain in use, although many of these proprietary and legacy sets are slowly being replaced by the Unicode® encoding standard.
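As a rough illustration of the idea, the short Python sketch below (the text "café" is just an arbitrary example) shows how the same characters become different sequences of numbers depending on which encoding scheme is chosen.

text = "café"

utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9' -- the é takes two bytes
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'     -- the é takes one byte

print(list(utf8_bytes))    # [99, 97, 102, 195, 169]
print(list(latin1_bytes))  # [99, 97, 102, 233]

# Decoding reverses the mapping, but only if the same scheme is used.
print(utf8_bytes.decode("utf-8"))  # café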

In the early days of computers, when memory space was limited, the basic characters of the English alphabet, along with punctuation and numbers, were stored in 7-bit sequences that allowed for 128 different characters. In this original scheme, each 7-bit value represented one character, numbered sequentially. This encoding was efficient, and it was eventually standardized as ASCII and used in most computers manufactured thereafter. Although the encoding system has since evolved into the Unicode® standard, the underlying concept has remained the same: every character in a language corresponds to a single number within a large standard character set, and that number is what a computer uses to store, process, and index the character.
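This one-character-to-one-number relationship is easy to see in a language like Python, whose built-in ord() and chr() functions expose the code numbers directly; the characters below are arbitrary examples.

for ch in "Az09!":
    print(ch, ord(ch))   # 'A' -> 65, 'z' -> 122, '0' -> 48, '9' -> 57, '!' -> 33

# All 128 of the original values fit in 7 bits (0 through 127); Unicode
# extends the same idea to a much larger range of code points.
print(chr(65))    # 'A'
print(ord("€"))   # 8364 -- a code point well beyond the original 7-bit range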

Other types of character encoding have been developed for different reasons. Some, geared specifically toward the English alphabet and intended for text, mapped their characters only to 7-bit sequences and then packed those sequences tightly across 8-bit bytes, or octets. Saving 1 bit per octet in this way effectively used character encoding as a type of compression. Other coding schemes attempted to encode a base character followed by additional characters representing the accents used when writing in another language, although these have largely been abandoned in favor of simpler one-to-one encoding methods.
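The base-character-plus-accent idea survives in Unicode as combining characters. The Python sketch below (an illustration of the general concept, not of any particular legacy scheme) contrasts a precomposed character with a base letter followed by a combining accent, and shows how normalization converts between the two.

import unicodedata

precomposed = "\u00e9"      # 'é' as a single code point
combined = "e\u0301"        # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == combined)           # False: different code sequences
print(len(precomposed), len(combined))   # 1 2

# Normalization (NFC) folds the base letter and accent into one code point.
print(unicodedata.normalize("NFC", combined) == precomposed)  # True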

In HTML documents, character encoding works much the same way as the broader concept, except that the encoding being declared covers an entire set of characters. This can be important not only for foreign languages but also for documents that use scientific or mathematical symbols not present in every character set. It can also matter for punctuation and other glyphs that may be absent from, or mapped differently between, encoding schemes. Documents that use a non-standard character encoding without declaring it correctly may display incorrectly or be filled with nonsense characters and placeholders instead of human-readable information.
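That "nonsense characters" effect is easy to reproduce: if bytes written under one encoding are read back under another, accented letters turn into garbage. The Python sketch below uses an arbitrary phrase to show the mismatch; in a real HTML document the remedy is to declare the correct encoding rather than leave the browser to guess.

original = "naïve résumé"
stored_bytes = original.encode("utf-8")   # how the text might be saved to disk

correct = stored_bytes.decode("utf-8")    # naïve résumé
garbled = stored_bytes.decode("latin-1")  # naÃ¯ve rÃ©sumÃ© -- mojibake

print(correct)
print(garbled)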



