Chapter published in: Computational Phraseology
Edited by Gloria Corpas Pastor and Jean-Pierre Colson
[IVITRA Research in Linguistics and Literature 24] 2020
pp. 190–206
Statistical significance for measures of collocation strength
Of the commonly used measures of lexical association or collocation strength, only some relate directly to statistical significance: the t-score, chi-squared, log-likelihood, the z-score and Fisher’s exact test. We describe each of these tests. We also describe a computer simulation that yields confidence limits, and hence statistical significance, for any measure of lexical association derived from the contingency table, and we illustrate this approach using pointwise mutual information (PMI). Finally, we describe how the Poisson distribution gives the statistical significance of the raw frequency with which a collocation occurs. We compare all these methods on four collocations of “take”: “take up”, “take place”, “take advantage” and “take stock”.
Keywords: collocation strength, statistical significance, Monte Carlo methods, Poisson distribution
Published online: 08 May 2020
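The abstract combines three computations: association measures read off a contingency table, a simulation giving confidence limits for PMI, and a Poisson test on raw frequency. The Python sketch below illustrates all three under stated assumptions; it is not the chapter's own code, and every function name in it is hypothetical. Assumptions: the table is built in the usual way from pair frequency f12, word frequencies f1 and f2, and corpus size n (O11 = f12, O12 = f1 - f12, O21 = f2 - f12, O22 = the rest); the t-score and z-score follow the common forms (O11 - E11)/sqrt(O11) and (O11 - E11)/sqrt(E11); Fisher's test is taken one-sided; the Poisson test uses the expected count E11 as the Poisson mean. The simulation is a parametric bootstrap that redraws each cell from a Poisson distribution centred on its observed count. This is one plausible design and may differ from the chapter's. Because the Python standard library has no Poisson generator, the sketch samples with Knuth's product method for small means and Hörmann's PTRS transformed-rejection method otherwise.

```python
import math
import random

def contingency(f12, f1, f2, n):
    """2x2 table (O11, O12, O21, O22) from pair frequency f12, word frequencies f1, f2, corpus size n."""
    return f12, f1 - f12, f2 - f12, n - f1 - f2 + f12

def expected(o11, o12, o21, o22):
    """Expected cell counts under independence: E_ij = row_i * col_j / N."""
    n = o11 + o12 + o21 + o22
    r1, r2, c1, c2 = o11 + o12, o21 + o22, o11 + o21, o12 + o22
    return r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n

def pmi(o11, o12, o21, o22):
    """Pointwise mutual information log2(O11 / E11); -inf if the pair is unseen."""
    e11 = expected(o11, o12, o21, o22)[0]
    return math.log2(o11 / e11) if o11 > 0 else -math.inf

def measures(table):
    """t-score, z-score, Pearson chi-squared, log-likelihood G2 and PMI (assumes O11 > 0)."""
    exp_cells = expected(*table)
    o11, e11 = table[0], exp_cells[0]
    pairs = list(zip(table, exp_cells))
    return {
        "t": (o11 - e11) / math.sqrt(o11),
        "z": (o11 - e11) / math.sqrt(e11),
        "chi2": sum((o - e) ** 2 / e for o, e in pairs),
        "G2": 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0),  # 0 * ln 0 taken as 0
        "PMI": pmi(*table),
    }

def _log10_sum(log_terms):
    """log10(sum(exp(t))) for natural-log terms, via log-sum-exp so tiny p-values do not underflow."""
    m = max(log_terms)
    return (m + math.log(math.fsum(math.exp(t - m) for t in log_terms))) / math.log(10)

def _lcomb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_log10_p(o11, o12, o21, o22):
    """One-sided Fisher exact test: log10 P(X >= O11), X hypergeometric with the table's margins."""
    n, r1, c1 = o11 + o12 + o21 + o22, o11 + o12, o11 + o21
    denom = _lcomb(n, r1)
    terms = [_lcomb(c1, k) + _lcomb(n - c1, r1 - k) - denom for k in range(o11, min(r1, c1) + 1)]
    return _log10_sum(terms)

def poisson_log10_p(k_obs, lam):
    """log10 P(X >= k_obs) for X ~ Poisson(lam), lam > 0, summing upper-tail terms in log space."""
    if k_obs <= 0:
        return 0.0
    terms, peak, k = [], -math.inf, k_obs
    while True:
        t = -lam + k * math.log(lam) - math.lgamma(k + 1)
        terms.append(t)
        peak = max(peak, t)
        if k > lam and t < peak - 50:  # terms now shrink geometrically; the rest is negligible
            return _log10_sum(terms)
        k += 1

def poisson_sample(lam, rng):
    """One Poisson(lam) draw: Knuth's product method for lam < 10, Hörmann's PTRS rejection otherwise."""
    if lam < 10:
        limit, k, p = math.exp(-lam), 0, rng.random()
        while p > limit:
            k, p = k + 1, p * rng.random()
        return k
    slam, loglam = math.sqrt(lam), math.log(lam)
    b = 0.931 + 2.53 * slam
    a = -0.059 + 0.02483 * b
    invalpha, vr = 1.1239 + 1.1328 / (b - 3.4), 0.9277 - 3.6224 / (b - 2)
    while True:
        u, v = rng.random() - 0.5, 1.0 - rng.random()  # v in (0, 1] so log(v) is defined
        us = 0.5 - abs(u)
        if us <= 0:
            continue
        k = math.floor((2 * a / us + b) * u + lam + 0.43)
        if us >= 0.07 and v <= vr:  # fast acceptance (squeeze) region
            return k
        if k < 0 or (us < 0.013 and v > us):
            continue
        if math.log(v * invalpha / (a / (us * us) + b)) <= -lam + k * loglam - math.lgamma(k + 1):
            return k

def pmi_confidence_limits(table, reps=2000, level=0.95, seed=1):
    """Assumed design: parametric bootstrap redrawing every cell as Poisson(observed count),
    recomputing PMI each time, and returning the two-sided empirical quantiles."""
    rng = random.Random(seed)
    sims = sorted(pmi(*(poisson_sample(c, rng) for c in table)) for _ in range(reps))
    k = round((1 - level) / 2 * reps)
    return sims[k], sims[reps - 1 - k]
```

For instance, with invented counts (a pair seen 120 times, its words seen 5,000 and 800 times, a corpus of 1,000,000 tokens), `table = contingency(120, 5000, 800, 1_000_000)` gives E11 = 4, so `measures(table)["PMI"]` is log2(30) ≈ 4.91 and the z-score is 58. `poisson_log10_p(120, 4.0)` and `fisher_log10_p(*table)` return base-10 logarithms of p-values, which stay finite where the p-values themselves would underflow to zero. If the lower limit from `pmi_confidence_limits(table)` is above 0, the PMI is positive at roughly the chosen confidence level.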