Understanding Statistics in Corpus Linguistics
- Variables
  - categorical
    - binary (2 categories)
    - multiple (n > 2): (a) nominal vs. (b) ordinal
Another aspect of variables:
- explanatory/predictor/independent variables
- response/outcome/dependent variables
Univariate vs. Bivariate analyses
- univariate
  - an examination of a single variable
- bivariate
  - an examination of the relationship between two variables
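To make the univariate/bivariate distinction concrete, here is a minimal Python sketch on hypothetical toy data (the who/whom tokens and register labels are invented for illustration). A univariate analysis tallies one variable on its own, while a bivariate analysis cross-tabulates two variables, which yields the cells of a contingency table.

```python
from collections import Counter

# Hypothetical toy data: each token is (variant, register).
tokens = [
    ("whom", "formal"), ("who", "formal"), ("whom", "formal"),
    ("who", "informal"), ("who", "informal"), ("whom", "informal"),
    ("who", "informal"), ("who", "formal"),
]

# Univariate: the distribution of a single variable (the variant).
variant_counts = Counter(variant for variant, _ in tokens)

# Bivariate: the joint distribution of two variables (register x variant),
# i.e. the cells of a 2x2 contingency table.
joint_counts = Counter((register, variant) for variant, register in tokens)

print(variant_counts)
print(joint_counts)
```

The bivariate counts are what the significance tests below operate on.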
Concept of "significance"
- Difference in proportions
- significance = a difference large enough that it is unlikely to have arisen by chance alone
Null Hypothesis
- There is no difference between the groups being compared; any observed difference is due to chance
Expected frequencies
- the frequencies we WOULD get if the two proportions were identical (i.e. if the null hypothesis were true); in the simplest case of two equally likely outcomes, both probabilities equal 0.5
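For a contingency table, the expected frequency of each cell under the null hypothesis is (row total × column total) / grand total. The Python sketch below assumes a hypothetical 2x2 table (two corpora × word present/absent); the counts and the function name `expected_frequencies` are illustrative, not from any particular library.

```python
def expected_frequencies(table):
    """Expected cell counts under the null hypothesis of no difference.

    E[i][j] = (row total i) * (column total j) / grand total
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts: rows = two corpora, columns = word present / absent.
observed = [[30, 70],
            [50, 50]]
print(expected_frequencies(observed))  # [[40.0, 60.0], [40.0, 60.0]]
```

Both rows get the same expected proportion (40/100 present), which is exactly the "identical proportions" situation described by the null hypothesis.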
Chi-square
- the sum, across all cells, of the squared differences between observed (O) and expected (E) frequencies, each divided by the expected frequency: χ² = Σ (O - E)² / E
- The probability distribution of the chi-square statistic is known for each number of degrees of freedom (df = number of groups - 1; for an r x c contingency table, df = (r - 1)(c - 1))
- Advantages
  - easy to understand
  - widely used
- Disadvantages
  - For small observed frequencies (O) in a 2x2 table, apply Yates's correction or use Fisher's exact test
  - Dunning shows that chi-square is not a reliable test when observed frequencies are small and the corpus size (N) is large
  - The log-likelihood (G²) test does essentially the same job without these limitations
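The chi-square computation described above can be sketched in Python for a 2x2 table. This is a minimal sketch, not a library API: the function name `chi_square_2x2` and the example counts are hypothetical. It sums (O - E)² / E over the four cells, optionally subtracts 0.5 from each |O - E| first (Yates's correction), and, since a 2x2 table has df = 1, computes the p-value from the identity P(χ² > x) = erfc(√(x/2)), so only the standard library is needed.

```python
import math


def chi_square_2x2(table, yates=False):
    """Pearson chi-square test for a 2x2 table.

    Returns (statistic, df, p_value). If yates=True, Yates's continuity
    correction is applied to each |O - E| before squaring.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                # Subtract 0.5, but never let the corrected difference go below 0.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected

    df = (2 - 1) * (2 - 1)  # (rows - 1) * (columns - 1) = 1
    # For df = 1: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, df, p_value


# Hypothetical counts: rows = two corpora, columns = word present / absent.
observed = [[30, 70],
            [50, 50]]
stat, df, p = chi_square_2x2(observed)  # stat = 25/3, about 8.333
stat_y, _, p_y = chi_square_2x2(observed, yates=True)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")
print(f"with Yates: chi2 = {stat_y:.3f}, p = {p_y:.4f}")
```

The Yates-corrected statistic is always somewhat smaller, which makes the test more conservative for small counts.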
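Fisher's exact test avoids the chi-square approximation altogether: with the row and column totals of a 2x2 table held fixed, the top-left cell follows a hypergeometric distribution, so exact probabilities can be computed directly. The Python sketch below assumes the common two-sided convention of summing the probabilities of all tables no more probable than the observed one (other two-sided conventions exist); the function name `fisher_exact_2x2` is hypothetical.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)  # smallest feasible top-left count
    hi = min(row1, col1)      # largest feasible top-left count
    p_obs = prob(a)
    # Small relative tolerance so floating-point ties count as "as extreme".
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


print(fisher_exact_2x2([[3, 1], [1, 3]]))  # about 0.486 (= 34/70)
```

Because it enumerates every table with the same margins, this is practical for the small counts where chi-square is unreliable.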
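The log-likelihood test uses the same expected frequencies as chi-square but a different statistic, G² = 2 Σ O ln(O / E), which is referred to the chi-square distribution with the same degrees of freedom. A minimal Python sketch follows; the function name and the example counts are hypothetical, and cells with O = 0 are taken to contribute 0 (the limit of O ln(O / E) as O approaches 0).

```python
import math


def log_likelihood_g2(table):
    """Log-likelihood ratio statistic G2 for an r x c contingency table.

    Returns (G2, df) with df = (r - 1) * (c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            if observed_count > 0:  # empty cells contribute 0
                expected = row_totals[i] * col_totals[j] / n
                g2 += observed_count * math.log(observed_count / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2 * g2, df


observed = [[30, 70],
            [50, 50]]
g2, df = log_likelihood_g2(observed)
# For df = 1 the p-value is erfc(sqrt(G2 / 2)), as for chi-square.
p = math.erfc(math.sqrt(g2 / 2))
print(f"G2 = {g2:.3f}, df = {df}, p = {p:.4f}")
```

For this table G² is about 8.40, close to the Pearson value of 25/3 (about 8.33); the two statistics diverge more when expected frequencies are small.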
Multivariate Analysis
- Control of variables
- Interaction of multiple predictor variables in their effect on the response variable
- Log-linear analysis
- Generalized linear model
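As a hedged illustration of where interactions enter the two model families listed, the formulas below assume a binary response with a logit link for the generalized linear model (i.e. logistic regression, with hypothetical predictors x₁ and x₂ and p = P(Y = 1)), and a two-way table with hypothetical categorical variables A and B for the log-linear model. The interaction is carried by the product term β₁₂x₁x₂ and by λ^{AB}, respectively.

```latex
\begin{aligned}
\text{GLM (logistic): } & \log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2 \\
\text{Log-linear (two-way table): } & \log \mu_{ij} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{AB}_{ij}
\end{aligned}
```

Dropping the interaction term in either model gives the "main effects only" model; comparing the two fits is one way to test whether the interaction matters.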