Chapter published in:
Computational Phraseology
Edited by Gloria Corpas Pastor and Jean-Pierre Colson
[IVITRA Research in Linguistics and Literature 24] 2020
pp. 190–206
Statistical significance for measures of collocation strength
Michael P. Oakes | University of Wolverhampton
Of the commonly used measures of lexical association or
collocation strength, only some directly relate to statistical significance:
the t-score, chi-squared, log-likelihood, the z-score and Fisher’s exact
test. We describe each of these tests, and also describe a computer
simulation by which we can derive confidence limits, and hence the
statistical significance, of any measure of lexical association which is
derived from the contingency table. We illustrate this approach using
pointwise mutual information (PMI). We also describe how the Poisson
distribution enables us to find the statistical significance of the raw
frequency with which a collocation is found. We compare all these methods
using collocates of “take”, namely “take up”, “take place”, “take advantage”
and “take stock”.
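
As a rough illustration of two of the quantities mentioned above, the following Python sketch computes PMI from contingency-table counts and the Poisson probability of seeing at least the observed number of co-occurrences under independence. The abstract does not give the formulas, so the standard textbook definitions are assumed here; the function names and the counts in the usage lines are hypothetical and are not taken from the chapter.

```python
import math


def pmi(o11, f1, f2, n):
    """Pointwise mutual information (log base 2) of a word pair.

    o11: observed co-occurrence count; f1, f2: frequencies of the two
    words; n: corpus size. Assumes the usual definition
    log2(observed / expected), with expected = f1 * f2 / n.
    """
    expected = f1 * f2 / n
    return math.log2(o11 / expected)


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail.

    Summing upward from k avoids the loss of precision that 1 - CDF
    suffers when the tail probability is very small.
    """
    if k <= 0:
        return 1.0
    # Start from P(X = k), computed in log space for stability.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Keep adding terms until past the mode and the terms are negligible.
    while i <= lam or term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


# Hypothetical counts, for illustration only (not from the chapter).
o11, f1, f2, n = 30, 1000, 500, 1_000_000
expected = f1 * f2 / n
print("PMI:", pmi(o11, f1, f2, n))
print("Poisson p-value:", poisson_upper_tail(o11, expected))
```

Under this assumed reading, a small Poisson tail probability means the raw co-occurrence frequency is unlikely to have arisen by chance, whereas PMI on its own yields only an effect size and needs a separate procedure, such as the simulation the chapter describes, to obtain confidence limits.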
Keywords: collocation strength, statistical significance, Monte Carlo methods, Poisson distribution
Published online: 08 May 2020
https://doi.org/10.1075/ivitra.24.10oak