
**Understanding Statistics in Corpus Linguistics [#cf852098]

-Variables
--categorical
---binary (2 categories)
---multiple (n > 2): (a) nominal vs. (b) ordinal

--quantitative
---interval
---discrete

**Another aspect of variables: [#x6bfd7ee]

-explanatory/predictor/independent variables
-response/outcome/dependent variables

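**Univariate vs. bivariate analyses [#c85829ac]

-univariate
--an examination of a single variable
-bivariate
--an examination of the relationship between two variables (typically one predictor and one response)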
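A univariate analysis of a categorical variable often starts from a frequency table. A bivariate analysis starts from a cross-tabulation of two variables. The sketch below is a minimal Python illustration. The function names and the toy genitive data ("s" vs. "of" in two registers) are hypothetical and were invented only for this example.

```python
from collections import Counter

def univariate_table(values):
    """Univariate: frequency and proportion of each category of ONE variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (freq, freq / n) for category, freq in counts.items()}

def crosstab(pairs):
    """Bivariate: count each (predictor, response) combination of TWO variables."""
    return Counter(pairs)

# Hypothetical toy data: (register, genitive choice) for six observations.
observations = [("spoken", "s"), ("spoken", "s"), ("spoken", "of"),
                ("written", "of"), ("written", "of"), ("written", "s")]

responses = [response for _, response in observations]
print(univariate_table(responses))  # {'s': (3, 0.5), 'of': (3, 0.5)}
print(crosstab(observations))       # counts per (register, genitive) cell
```

The univariate table looks only at the response. The cross-tabulation keeps the pairing with the predictor, which is what the significance tests below operate on.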

**Concept of "significance" [#u10790cf]
-Typical question: is an observed difference in proportions (e.g. between two corpora) real?
-significance = a difference large enough that it is unlikely to be due to chance alone, so we can trust it

***Null Hypothesis [#lcc1139e]
-There is no real difference between the proportions; any observed difference is due to chance (sampling variation)

***Expected frequencies [#a23f67b4]
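-the frequencies we would get if the two proportions were identical, i.e. if the null hypothesis were true: for each cell, E = row total × column total / N. In the simplest case, with two equally likely outcomes, both probabilities equal 0.5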
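For a contingency table of observed counts, the expected frequency of each cell under the null hypothesis is its row total times its column total, divided by the grand total N. The following is a minimal Python sketch. It assumes the table is a list of rows, with groups (e.g. two corpora) as rows and outcomes as columns. The function name and counts are illustrative only.

```python
def expected_frequencies(table):
    """Expected cell counts if the null hypothesis (no difference) is true.

    table: observed counts as a list of rows, e.g. [[a, b], [c, d]].
    E[i][j] = row_total[i] * col_total[j] / N
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Illustrative counts only.
print(expected_frequencies([[10, 20], [30, 40]]))  # [[12.0, 18.0], [28.0, 42.0]]
# Equal column totals: the pooled proportion is 0.5, so each group is split evenly.
print(expected_frequencies([[15, 5], [5, 15]]))    # [[10.0, 10.0], [10.0, 10.0]]
```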

**Chi-square [#t18709d3]
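-the sum, across all cells, of the squared difference between the observed (O) and expected (E) frequency, divided by the expected frequency: χ² = Σ (O − E)² / E
-The probability (p-value) of a given chi-square value is known (tabulated) for each number of degrees of freedom: df = (rows − 1) × (columns − 1), which equals number of groups − 1 when the groups are compared on a binary outcome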
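The statistic and its degrees of freedom can be computed directly from the observed table. This sketch computes the p-value only for the 1-degree-of-freedom case (a 2×2 table). It relies on the fact that a chi-square variable with 1 df is the square of a standard normal variable. For other df one would normally use a library routine such as SciPy's `scipy.stats.chi2.sf`. The function names and counts below are illustrative.

```python
import math

def expected_frequencies(table):
    """E[i][j] = row_total[i] * col_total[j] / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square: sum over all cells of (O - E)^2 / E, plus its df."""
    expected = expected_frequencies(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value_df1(stat):
    """Upper-tail p-value for 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2))

stat, df = chi_square([[10, 20], [30, 40]])  # illustrative counts
print(round(stat, 4), df)                    # 0.7937 1
print(p_value_df1(stat))                     # well above 0.05 (stat < 3.84)
```

With 1 df the 5% critical value is about 3.84, so these counts would not count as a significant difference.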

-Advantages
--easy to understand
--widely used

-Disadvantages
--For small observed frequencies (O) in a 2×2 table, apply Yates's correction or use Fisher's exact test
--Dunning shows that chi-square is not a reliable test when observed frequencies are small and the corpus size N is large (i.e. for rare events)
--The log-likelihood (G²) test does essentially the same job without these limitations
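The alternatives named above can be sketched for a 2×2 table as follows:
-Yates's continuity correction subtracts 0.5 from each |O − E| before squaring.
-Fisher's exact test uses exact hypergeometric probabilities. It is made two-sided by summing all tables no more probable than the observed one.
-The log-likelihood statistic G² = 2 Σ O ln(O / E) is compared against the chi-square distribution, just as chi-square is.

This is a minimal pure-Python sketch with illustrative function names, not a reference implementation. SciPy provides tested versions (`scipy.stats.fisher_exact`, and `scipy.stats.chi2_contingency` with its `correction` and `lambda_` options).

```python
import math

def expected_frequencies(table):
    """E[i][j] = row_total[i] * col_total[j] / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def cells(table):
    """Yield (observed, expected) pairs for every cell."""
    expected = expected_frequencies(table)
    for obs_row, exp_row in zip(table, expected):
        yield from zip(obs_row, exp_row)

def chi_square_yates(table):
    """Chi-square with Yates's continuity correction (2x2 tables).

    The correction is capped so that it never pushes |O - E| below 0.
    """
    return sum(max(0.0, abs(o - e) - 0.5) ** 2 / e for o, e in cells(table))

def log_likelihood(table):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute 0."""
    return 2 * sum(o * math.log(o / e) for o, e in cells(table) if o > 0)

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

table = [[10, 20], [30, 40]]  # illustrative counts
print(round(chi_square_yates(table), 4))                   # 0.4464
print(round(log_likelihood(table), 4))                     # 0.8043
print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 4))  # 0.4857
```

For these counts G² (about 0.80) is close to the uncorrected chi-square (about 0.79). The two statistics diverge mainly when some observed frequencies are small.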




**Multivariate Analysis [#t7720dba]

-Controlling for the effects of other variables
-Modelling how multiple predictor variables, and their interactions, affect a response variable
--Log-linear analysis
--Generalized linear model


