ALLT2112: David Tugwell
**What is Sketch Engine? [#x1b0af01]
-David is the original author of word sketch at Univ. of ...
-Corpora in 42 languages so far:
--The biggest one is a Russian corpus of 20 billion words.
-65 preloaded corpora
--Balanced corpora: BNC
--Specialized corpora: CHILDES, BASE, BAWE
--Web corpora: de-duplicated, cleaned
---the "TenTen" range: target size of 10^10 (10 billion) words
-Chinese
--Chinese Gigaword, Chinese TaiwanWaC
-Load your own corpora
--automatic lemmatization, tagging, word sketches
-WebBootCat
--for low-density languages and specialized subject areas
--seed-based collection process
--the results are cleaned and processed into a new corpus
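The notes only say that the web corpora are "de-duplicated, cleaned". As a rough illustration of the simplest kind of de-duplication, here is a minimal Python sketch that drops exact duplicate paragraphs after normalizing case and whitespace. This is an assumption for illustration, not Sketch Engine's own pipeline; real web-corpus pipelines typically also remove near-duplicates. The names `normalize` and `deduplicate` are hypothetical.

```python
import hashlib
import re
from typing import List, Set


def normalize(paragraph: str) -> str:
    """Collapse runs of whitespace and lowercase, so trivial formatting
    differences do not hide a duplicate."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def deduplicate(paragraphs: List[str]) -> List[str]:
    """Keep the first occurrence of each paragraph and drop later exact
    duplicates (compared after normalization). Hypothetical helper."""
    seen: Set[str] = set()
    kept: List[str] = []
    for p in paragraphs:
        # Hash the normalized text so the 'seen' set stores short keys.
        key = hashlib.sha1(normalize(p).encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append(p)
    return kept


if __name__ == "__main__":
    docs = [
        "Word sketches summarise collocations.",
        "Word  sketches summarise   collocations.",  # duplicate after normalization
        "WebBootCat builds corpora from the web.",
    ]
    print(deduplicate(docs))
```

Hashing the normalized text keeps memory use small; near-duplicate detection (e.g. shared n-gram overlap) would need a different approach.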
**Uses of SkE [#da605c08]
-Lexicography
--Collins, Macmillan, CUP, OUP, Le Robert, Cornelsen Verl...
-Language Learning
**WebBootCat [#p9446d22]
-Creating a seed-word list
--You can try Google Translate to insert a list of seed...
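WebBootCat is based on the BootCaT approach, in which seed words are combined into random tuples that are sent to a search engine as queries. Below is a minimal Python sketch of the tuple step only, assuming a tuple size of 3; the function names `make_query_tuples` and `to_query` are hypothetical, and the sketch does not call any search engine or reproduce WebBootCat's actual settings.

```python
import itertools
import random
from typing import List, Optional, Tuple


def make_query_tuples(
    seeds: List[str],
    tuple_size: int = 3,  # assumed default; not taken from WebBootCat
    n_queries: int = 10,
    rng_seed: Optional[int] = None,
) -> List[Tuple[str, ...]]:
    """Return up to n_queries distinct random combinations of seed words.

    Blank entries are dropped and repeated seeds are kept once (first
    occurrence wins). Enumerating all combinations is fine for a few dozen
    seeds but grows quickly for long lists.
    """
    unique_seeds = list(dict.fromkeys(s.strip() for s in seeds if s.strip()))
    if tuple_size > len(unique_seeds):
        raise ValueError("tuple_size is larger than the number of distinct seeds")
    all_combos = list(itertools.combinations(unique_seeds, tuple_size))
    rng = random.Random(rng_seed)
    # sample() picks without replacement, so every query tuple is distinct.
    return rng.sample(all_combos, min(n_queries, len(all_combos)))


def to_query(words: Tuple[str, ...]) -> str:
    """Quote each word so a search engine treats it as a required term."""
    return " ".join(f'"{w}"' for w in words)


if __name__ == "__main__":
    seeds = ["corpus", "lemma", "collocation", "concordance", "tagger"]
    for t in make_query_tuples(seeds, n_queries=3, rng_seed=1):
        print(to_query(t))
```

Using random tuples rather than single seeds makes each query more topic-specific, which is the point of starting from a list of domain seed words.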