What’s the distributional hypothesis?

The distributional hypothesis suggests that words that occur in similar contexts tend to have similar meanings. Linguists such as J.R. Firth and Zellig Harris developed the idea, which underpins statistical semantics and methods such as HAL and LSA, and it has implications for AI and language acquisition.

The distributional hypothesis proposes that words that occur in similar contexts tend to have similar or related meanings. To test the idea, a word’s distribution — the set of contexts in which it appears in a text — is compared with the distributions of other words. Words whose distributions overlap heavily are taken to be similar in meaning, since words tend to keep company with other words relevant to what they mean.
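To make this concrete, the hypothesis predicts that words like “tea” and “coffee” will share many context words (“drink,” “hot,” “morning”) and so receive similar distributional profiles. Here is a minimal sketch in Python of how such profiles can be counted and compared; the toy corpus, the window size of two, and all names are illustrative, not from any standard library:

```python
# Minimal sketch of the distributional hypothesis: two words that share
# many context words ("tea" and "coffee") end up with similar
# context-count vectors, while an unrelated word ("cat") does not.
from collections import Counter

corpus = (
    "i drink hot tea every morning . "
    "i drink hot coffee every morning . "
    "the cat sat on the mat ."
).split()

def context_counts(target, tokens, window=2):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for t in tokens[lo:hi] if t != target)
    return counts

tea, coffee, cat = (context_counts(w, corpus) for w in ("tea", "coffee", "cat"))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = lambda c: sum(v * v for v in c.values()) ** 0.5
    return dot / (norm(a) * norm(b))

print(cosine(tea, coffee))  # high: shared contexts ("drink", "hot", ...)
print(cosine(tea, cat))     # low: no shared contexts in this toy corpus
```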

The distributional hypothesis is closely associated with the British linguist J.R. Firth, who is known for the most famous summary of the idea: “You shall know a word by the company it keeps.” Firth, also known for his studies of prosody, believed that no single system would ever explain how a language works; instead, he argued that several overlapping systems would be needed.

The American linguist Zellig Harris worked along similar lines and gave the hypothesis its most systematic treatment. He wanted to use mathematics to study and analyze linguistic data. His ideas about the contribution of mathematics to such studies remain influential, though he is also known for covering a wide range of linguistic topics during his lifetime.

Studies of the distributional hypothesis belong to linguistics, but they rely on mathematical and statistical methods, rather than purely linguistic ones, to sift through large amounts of language data. The hypothesis is therefore part of computational linguistics and statistical semantics. It is also related to the ideas of linguists and philosophers of language about how children develop their native languages, a process known as language acquisition.

Statistical semantics uses mathematical algorithms to study the distribution of words. The results are then related to meaning: words whose distributions resemble each other are grouped as semantically related. There are two main approaches, one based on word co-occurrence within a small local window of text, and one based on word distribution across whole texts or text regions.

The study of word distribution within small local windows is called the Hyperspace Analogue to Language (HAL). HAL examines the relationships of words that occur near each other in a text — within a sentence or a paragraph, but rarely further than that. The semantic similarity of words is estimated by how often, and how closely, they are found next to each other.
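The sketch below illustrates a HAL-style computation under some assumptions: a word-by-word co-occurrence matrix is built with a sliding window, and closer word pairs receive larger weights (HAL weights a co-occurrence by the window size minus the distance, plus one). The corpus, window size, and variable names are illustrative:

```python
# Rough HAL-style sketch: sliding-window co-occurrence counts,
# weighted so that closer neighbours contribute more.
import numpy as np

tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))
index = {w: i for i, w in enumerate(vocab)}
window = 3

M = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(tokens):
    for d in range(1, window + 1):       # look d words ahead
        if i + d < len(tokens):
            # closer pairs get larger weights: window - d + 1
            M[index[w], index[tokens[i + d]]] += window - d + 1

# Each row of M (optionally concatenated with its column, as in HAL)
# serves as the word's distributional vector; similar rows suggest
# similar usage.
print(vocab)
print(M.astype(int))
```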
Full-text studies use latent semantic analysis (LSA), a natural language processing method. LSA starts from a matrix recording how often each word occurs in each text or text region; words with related meanings tend to turn up in the same regions. The matrix is then reduced to its main semantic dimensions using a mathematical method called Singular Value Decomposition (SVD).
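Below is a minimal LSA sketch, assuming Python with NumPy: a term-document count matrix is factored with SVD, and only the top k singular values are kept, projecting words and documents into a small number of latent semantic dimensions. The documents and the choice of k are illustrative:

```python
# Minimal LSA sketch: term-document counts reduced with SVD.
import numpy as np

docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "investors trade stocks",
]
vocab = sorted({w for d in docs for w in d.split()})

# Rows are terms, columns are documents; entries are raw counts.
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # Singular Value Decomposition
k = 2                                             # latent dimensions to keep
terms_k = U[:, :k] * s[:k]                        # term vectors in latent space
docs_k = Vt[:k, :].T * s[:k]                      # document vectors

# Documents 0-1 (pets) and 2-3 (finance) should land near each other
# in the reduced space.
print(np.round(docs_k, 2))
```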

Data gathered from distributional hypothesis studies are used to examine the building blocks of semantics and word relationships. Going beyond a structuralist approach, the hypothesis can be applied to artificial intelligence (AI), helping computer programs better model the relationships and distribution of words. It also has implications for how children process words and form word and phrase associations.



