[[FrontPage]]

**Understanding Statistics in Corpus Linguistics [#cf852098]
-Variables
--categorical
---binary (2 categories)
---multiple (n > 2): (a) nominal vs. (b) ordinal
--quantitative
---interval
---discrete

**Another aspect of variables: [#x6bfd7ee]
-explanatory/predictor/independent variables
-response/outcome/dependent variables

**Univariate vs. Bivariate analyses [#c85829ac]
-univariate
--an examination of a single variable

**Concept of "significance" [#u10790cf]
-Difference in proportions
-significance = a difference large enough to be trusted, i.e., unlikely to have arisen by chance

***Null Hypothesis [#lcc1139e]
-There is no particular difference

***Expected frequencies [#a23f67b4]
-the frequencies we WOULD get if the two proportions were identical, i.e., if both probabilities equal 0.5

**Chi-square [#t18709d3]
-sum, across all cells, of the squared differences between observed and expected frequencies, each divided by the expected frequency
-The probability distribution of the chi-square statistic is known for each number of degrees of freedom (number of groups - 1)
-Advantages
--easy to understand
--widely used
-Disadvantages
--For small observed frequencies (O) in a 2x2 table, apply Yates's correction or use Fisher's exact test
--Dunning shows that chi-square is not a good test when O is small and N is large
--The log-likelihood test does basically the same job without these limitations

**Multivariate Analysis [#t7720dba]
-Control of variables
-Interaction of multiple predictor variables over response variables
--Log-linear analysis
--Generalized linear model
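
As a rough illustration of the chi-square and log-likelihood notes above, here is a minimal Python sketch. It assumes the simplest case described in these notes: one item counted in two groups, with the null hypothesis that both probabilities equal 0.5. The function names and the counts (60 vs. 40) are hypothetical and chosen only for illustration; they do not come from any real corpus.

```python
import math


def expected_frequencies(observed, probs=None):
    """Expected counts under the null hypothesis.

    By default every group gets the same probability
    (0.5 each when there are two groups).
    """
    k = len(observed)
    n = sum(observed)
    if probs is None:
        probs = [1.0 / k] * k
    return [n * p for p in probs]


def chi_square(observed, probs=None):
    """Sum over all cells of (O - E)^2 / E."""
    expected = expected_frequencies(observed, probs)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def log_likelihood(observed, probs=None):
    """Log-likelihood ratio statistic: 2 * sum of O * ln(O / E).

    Cells with O = 0 contribute nothing to the sum.
    """
    expected = expected_frequencies(observed, probs)
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


def degrees_of_freedom(observed):
    """Number of groups - 1, as in the notes above."""
    return len(observed) - 1


# Hypothetical counts of one word in two equally sized samples.
observed = [60, 40]
print("expected:", expected_frequencies(observed))
print("chi-square:", chi_square(observed))
print("log-likelihood:", log_likelihood(observed))
print("df:", degrees_of_freedom(observed))
```

With these made-up counts the expected frequencies are 50 and 50, and the chi-square statistic is 4.0 with 1 degree of freedom. That is above 3.841, the critical value for 1 degree of freedom at the 0.05 level, so the null hypothesis of no difference would be rejected at that level. The log-likelihood statistic is compared against the same chi-square distribution.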