ALLT2112: David Tugwell

What is Sketch Engine?

  • David is the original author of word sketches (Univ. of Brighton)
  • Corpora in 42 languages so far:
    • The biggest is the Russian corpus, at 20 billion words.
  • 65 preloaded corpora
    • Balanced corpora: BNC
    • Specialized corpora: CHILDES, BASE, BAWE
    • Web corpora: de-duplicated, cleaned
      • the "ten-ten" (TenTen) range: target size of 10^10 = 10 billion words
  • Chinese
    • Chinese GigaWord, Chinese TaiwanWaC
  • Load your own corpora
    • automatic lemmatization, tagging, word sketches
  • WebBootCat
    • for low-density languages & specialized subject areas
    • starts from a collection of seed words
    • results are cleaned and processed into a new corpus
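
The "de-duplicated, cleaned" step mentioned for web corpora and WebBootCat results can be illustrated with a minimal sketch. This is not Sketch Engine's actual pipeline: the function names, the 5-word threshold, and the choice of exact-match hashing are all assumptions made for illustration.

```python
import hashlib
import re


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def deduplicate(paragraphs: list[str], min_words: int = 5) -> list[str]:
    """Keep the first copy of each paragraph and drop very short ones.

    min_words is a hypothetical threshold for discarding menu-like residue.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for paragraph in paragraphs:
        text = normalize(paragraph)
        if len(text.split()) < min_words:
            continue  # likely navigation or menu residue
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        kept.append(paragraph.strip())
    return kept
```

Note that counting words by whitespace only suits languages written with spaces; for Chinese text a character count or a tokenizer would be needed instead.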

Use of SkE

  • Lexicography
    • Collins, Macmillan, CUP, OUP, Le Robert, Cornelsen Verlag, Shogakukan, Instituut voor Nederlandse
  • Language Learning


  • To create a seed word list
    • You can enter a list of seed words in English into Google Translate and translate them into the target language
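
Once a seed word list exists, BootCaT-style tools typically combine the seeds into small random tuples and send each tuple as a web search query. The following is a rough sketch of that step only. The function name, the tuple size of 3, and the default of 10 tuples are assumptions, not WebBootCat's actual settings.

```python
import math
import random


def make_seed_tuples(seeds, tuple_size=3, n_tuples=10, rng_seed=0):
    """Return up to n_tuples distinct random combinations of seed words."""
    # Drop blanks and duplicates; sort so results are reproducible.
    words = sorted({w.strip() for w in seeds if w.strip()})
    if len(words) < tuple_size:
        raise ValueError("need at least tuple_size distinct seed words")
    rng = random.Random(rng_seed)
    # Never ask for more tuples than there are possible combinations.
    target = min(n_tuples, math.comb(len(words), tuple_size))
    tuples = set()
    while len(tuples) < target:
        tuples.add(tuple(sorted(rng.sample(words, tuple_size))))
    return sorted(tuples)
```

Each tuple can then be joined into a query string, for example `" ".join(t)`, before being passed to whatever search interface is available.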

Last-modified: 2012-04-21 (Sat) 12:57:27 (2860d)