[[FrontPage]]
**Understanding Statistics in Corpus Linguistics [#cf8520...
-Variables
--categorical
---binary (2 categories)
---multiple (n > 2): (a) nominal vs. (b) ordinal
--quantitative
---interval
---discrete
**Another aspect of variables: [#x6bfd7ee]
-explanatory/predictor/independent variables
-response/outcome/dependent variables
**Univariate vs. Bivariate analyses [#c85829ac]
-univariate
--an examination of a single variable
**Concept of "significance" [#u10790cf]
-Difference in proportions
-significance = a difference that is sufficiently large e...
***Null Hypothesis [#lcc1139e]
-There is no particular difference
***Expected frequencies [#a23f67b4]
-the frequencies we WOULD get if the two proportions are ...
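As a minimal sketch of how expected frequencies can be derived, the snippet below computes them from the row and column totals of a contingency table. The function name `expected_frequencies` and the counts are hypothetical illustrations, not taken from the lecture.

```python
def expected_frequencies(table):
    """Expected cell counts if the proportions were identical in every row."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Each expected count is (row total * column total) / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical counts: one word vs. all other tokens in two corpora.
observed = [[30, 970],   # corpus A: the word, other tokens
            [10, 990]]   # corpus B: the word, other tokens
expected = expected_frequencies(observed)
print(expected)
```

Both corpora have the same size here, so the null hypothesis spreads the 40 occurrences evenly across them.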
**Chi-square [#t18709d3]
-sum of the squared differences between observed and exp...
-The probability of chi-square statistic is known for eac...
-Advantages
--easy to understand
--used widely
-Disadvantages
--For small O in 2x2, apply Yates's correction or use Fis...
--Dunning shows chi-square is not a good test when O are ...
--Log-likelihood test does basically the same job without...
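The two statistics can be compared on the same table. The following is a rough sketch using only the standard library; the function names and the counts are hypothetical, and natural logarithms are assumed for the log-likelihood statistic.

```python
import math

def expected_frequencies(table):
    """Expected counts from row and column totals (null hypothesis)."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]

def cell_pairs(table):
    """Yield (observed, expected) for every cell of the table."""
    expected = expected_frequencies(table)
    for obs_row, exp_row in zip(table, expected):
        yield from zip(obs_row, exp_row)

def chi_square(table):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in cell_pairs(table))

def log_likelihood(table):
    """2 * sum of Obs * ln(Obs / Exp); empty cells contribute nothing."""
    return 2 * sum(o * math.log(o / e) for o, e in cell_pairs(table) if o > 0)

# Hypothetical counts: one word vs. all other tokens in two corpora.
observed = [[30, 970],
            [10, 990]]
print(round(chi_square(observed), 2), round(log_likelihood(observed), 2))
```

For a 2x2 table there is one degree of freedom, so either value can be compared with the critical value 3.84 for p < 0.05.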
*Collocation statistics [#n7a538d1]
-Notation borrowed from Stefan Evert
--[[association measures>http://www.collocations.de/]]
-Effect size
--Observed/Expected
---Could be very high effect size with only a single inst...
-Evidence floor
--f(node, collocate)
-Mutual Information
--Formal definition: log2(p(n,c)/(p(n)p(c)))
--MI = log2(Observed/Expected)
--MI of 3 = oft-recommended cut-off = observed 8 times gr...
--frequency floors = recommended value is 10 (Andrew Hard...
-Significance testing
--problems
---random samples from two populations
-Chi-square test
--sum of (Obs-Exp)^2 / Exp
-Log-likelihood test
--2 x sum of (Obs * log(Obs/Exp))
-MI and LL
--grammatical patterns & function words --> LL
--lexical words and semantics --> MI
-LL
--biased towards words where there is lots of evidence du...
-MI
--biased towards words where effect size is huge due to l...
-MI3
--log(Obs^3/Exp)
--it over-corrects MI: its high-frequency focus is too gr...
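A small sketch of the MI and MI3 formulas above, assuming base-2 logarithms (consistent with MI of 3 meaning observed is 8 times expected) and an expected frequency of f(node) x f(collocate) / N with no window-size adjustment. The function names and all frequencies are hypothetical.

```python
import math

def expected_cooccurrence(f_node, f_collocate, corpus_size):
    """Expected co-occurrences if node and collocate were independent."""
    return f_node * f_collocate / corpus_size

def mutual_information(observed, expected):
    """MI = log2(Observed / Expected)."""
    return math.log2(observed / expected)

def mi3(observed, expected):
    """MI3 = log2(Observed^3 / Expected)."""
    return math.log2(observed ** 3 / expected)

# Hypothetical pair: node occurs 1000 times, collocate 500 times,
# in a corpus of 1,000,000 tokens; they co-occur 4 times.
exp_pair = expected_cooccurrence(1000, 500, 1_000_000)
mi_pair = mutual_information(4, exp_pair)
mi3_pair = mi3(4, exp_pair)

# Hypothetical single co-occurrence of two rare words.
exp_rare = expected_cooccurrence(10, 100, 1_000_000)
mi_rare = mutual_information(1, exp_rare)

print(mi_pair, mi3_pair, round(mi_rare, 2))
```

The rare pair scores far higher on MI than the better-attested pair despite resting on a single instance, which is the low-frequency bias that a frequency floor is meant to guard against.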
**Problems [#ua00c807]
-Windows and sentence boundaries
-No hope of seeing how they are related to each other
--The formulae often come in multiple versions...
-Overlapping windows
--Martin Amis problem (Hardie)
-Possibilities (speculative proposal by Hardie)
--sig test like LL
--rank the list by MI or effect size
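One way the two-step idea (significance filter, then effect-size ranking) could be sketched is shown below. This is an illustration under stated assumptions, not Hardie's actual procedure: the 2x2 contingency layout, the LL threshold of 3.84 (p < 0.05, one degree of freedom), the function names, and all frequencies are hypothetical choices.

```python
import math

def contingency(o11, f_node, f_coll, n):
    """Assumed 2x2 layout: rows = node present/absent, cols = collocate present/absent."""
    return [[o11, f_node - o11],
            [f_coll - o11, n - f_node - f_coll + o11]]

def log_likelihood(table):
    """2 * sum of Obs * ln(Obs / Exp) over non-empty cells."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    ll = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:
                ll += obs * math.log(obs / (rows[i] * cols[j] / total))
    return 2 * ll

def rank_collocates(candidates, f_node, n, ll_threshold=3.84):
    """Keep candidates whose LL passes the threshold, then sort by MI (descending)."""
    kept = []
    for word, (o11, f_coll) in candidates.items():
        ll = log_likelihood(contingency(o11, f_node, f_coll, n))
        if ll >= ll_threshold:
            mi = math.log2(o11 * n / (f_node * f_coll))
            kept.append((word, mi, ll))
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical candidates: word -> (co-occurrences with node, corpus frequency).
candidates = {"the": (70, 60000), "strong": (20, 400), "tea": (8, 2000)}
ranked = rank_collocates(candidates, f_node=1000, n=1_000_000)
for word, mi, ll in ranked:
    print(word, round(mi, 2), round(ll, 2))
```

In this toy data "the" co-occurs most often but barely exceeds its expected frequency, so it fails the significance filter; the remaining words are ordered by MI.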
**Multivariate Analysis [#t7720dba]
-Control of variables
-Interaction of multiple predictor variables over respons...
--Log-linear analysis
--Generalized linear model