Frequency lists are used in linguistic analysis to study language structure and usage. They can list words or letters; letter frequencies are typically used in cryptography. Sherlock Holmes uses frequency analysis to decipher a code in The Adventure of the Dancing Men. Word frequency lists are also used in ancient language studies. Zipf’s law is an empirical observation that the frequency of an item is inversely proportional to its rank. Frequency lists are important in machine translation and natural language processing.
A frequency list is a tool for quantitative linguistic analysis: a list of every item (such as a word or letter) that appears in a chosen block of text, together with how often it occurs. Linguistic analysis is an interdisciplinary field that studies the structure of language and how it is used. Combining elements of anthropology, mathematics, computer science and logic, linguistic analysis is used for projects such as machine translation, encryption and deciphering ancient writings.
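As an illustration, a word frequency list can be built in a few lines of Python. This is a minimal sketch, not a standard tool: the function name frequency_list, the sample sentence, and the rule that a word is any run of letters or apostrophes are all assumptions made for the example.

```python
import re
from collections import Counter


def frequency_list(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Lowercase the text and treat each run of letters or apostrophes
    # as one word (a simplifying assumption about what counts as a word).
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


sample = "the cat saw the dog and the dog saw the cat"
for word, count in frequency_list(sample):
    print(word, count)
```

Real projects usually need a more careful definition of a word (hyphens, numbers, non-English alphabets), but the counting step stays the same.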
Frequency lists can be word or letter lists. Letter frequencies are typically used in cryptography. One of the simplest ciphers is a substitution cipher, where each letter is replaced with another letter or symbol. For example, the message “attack at dawn” could be encoded as “zoozhl zo azqp”. The advantage of substitution ciphers is that they don’t require a codebook, but the downside is that they can be cracked by comparing the frequency of letters and letter combinations within the message against a list of typical frequencies for the language.
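The weakness can be seen in a short Python sketch. The partial key below is an assumption reconstructed from the example encoding; a real key would cover the whole alphabet, and the names KEY, encode and letter_frequencies are hypothetical.

```python
from collections import Counter

# Partial key covering only the letters of the example message.
# It is inferred from the example encoding, not a standard key.
KEY = {"a": "z", "t": "o", "c": "h", "k": "l", "d": "a", "w": "q", "n": "p"}


def encode(message, key):
    """Replace each letter via the key; leave other characters unchanged."""
    return "".join(key.get(ch, ch) for ch in message.lower())


def letter_frequencies(text):
    """Count letters only, most frequent first."""
    return Counter(ch for ch in text if ch.isalpha()).most_common()


ciphertext = encode("attack at dawn", KEY)
print(ciphertext)
print(letter_frequencies(ciphertext)[:2])
```

The two most frequent ciphertext letters, z and o, stand for a and t, the two most frequent letters of the plaintext. Substitution changes which symbol appears but not how often, so the counts survive encoding. A message this short is far too small for reliable analysis; with longer texts the counts can be matched against a letter frequency list for the language.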
In Arthur Conan Doyle’s The Adventure of the Dancing Men, the fictional detective Sherlock Holmes uses frequency analysis to decipher a substitution cipher. Historically, codemakers have tried various tricks to make their ciphers harder to crack with a frequency list: using rolling ciphers, in which the substitution depends on the position of a letter within the message; stripping or encoding spaces so that word frequencies could not be used; and keeping messages short and avoiding expected words so that decoders would not have enough samples for frequency analysis. Ultimately, any cipher of this kind can be broken with a large enough sample, which is why more sophisticated encryption protocols have become standard.
Lists of word frequency and word types are also used in ancient language studies. When Jean-François Champollion deciphered the Rosetta Stone in 1822, his process used a blend of frequency comparisons and transliterations to piece together the hieroglyphic language. Studies have shown that for ancient languages, as with modern English, a basic vocabulary of 1,500 to 2,000 words covers 85 to 90 percent of common texts, a level that allows the reader to expand their vocabulary from context.
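Coverage figures of this kind come from adding up the counts of the most frequent words and dividing by the total number of words in the text. The sketch below shows the arithmetic on a toy sentence; the function name coverage and the sample text are assumptions for illustration, and the result says nothing about real corpora.

```python
from collections import Counter


def coverage(tokens, vocabulary_size):
    """Fraction of all tokens accounted for by the most frequent words."""
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    counts = Counter(tokens)
    # Sum the counts of the `vocabulary_size` most frequent words.
    covered = sum(count for _, count in counts.most_common(vocabulary_size))
    return covered / len(tokens)


tokens = "the cat saw the dog and the dog saw the cat".split()
print(coverage(tokens, 2))
```

Here the two most frequent words account for 6 of the 11 tokens. Run over a large body of text with a vocabulary size in the thousands, the same calculation gives the kind of percentage quoted above.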
Zipf’s law, named after Harvard linguist George Kingsley Zipf, is an empirical observation on the behavior of frequency classifications. It states that the frequency of an event is inversely proportional to its rank. The event is usually a word or letter in a linguistic frequency list, but Zipf’s law has been generalized to cover other phenomena such as city populations and corporate profits.
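Written as a formula, the law looks as follows. The symbols are notation assumed here for illustration: f(r) is the frequency of the item at rank r, and C is a constant that depends on the text being counted.

```latex
f(r) \approx \frac{C}{r}, \qquad \text{so} \qquad \frac{f(1)}{f(r)} \approx r
```

For example, if the most frequent word in a text occurs 1,000 times, this idealized relation predicts about 500 occurrences for the second-ranked word and about 100 for the tenth. Real texts follow the pattern only approximately.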
A frequency list is an important tool in projects to help computers make sense of spoken and written language. Machine translation, the use of computers to translate documents from one language to another, is one example. Another example is IBM’s Watson, the natural language computer system that was featured as a contestant on the television show Jeopardy! in February 2011. Frequencies of both words and usage patterns are incorporated into the programming of such systems as a tool for finding meaning.