[[FrontPage]]

ALLT2112: David Tugwell

**What is Sketch Engine? [#x1b0af01]
-David is the original author of word sketches, developed at the Univ. of Brighton
-Corpora for 42 languages so far:
--The biggest is the 20-billion Russian corpus.
-65 preloaded corpora
--Balanced corpora: BNC
--Specialized corpora: CHILDES, BASE, BAWE
--Web corpora: de-duplicated and cleaned
---The "ten-ten" range: 10^10 = 10 billion
-Chinese
--Chinese GigaWord, Chinese TaiwanWaC
-Load your own corpora
--Automatic lemmatization, tagging, and word sketches
-WebBootCat
--For low-density languages and specialized subject areas
--Seed collection process
--Results are cleaned and processed into a new corpus

**Use of SkE [#da605c08]
-Lexicography
--Collins, Macmillan, CUP, OUP, Le Robert, Cornelsen Verlag, Shogakukan, Instituut voor Nederlandse
-Language Learning

**WebBootCat [#p9446d22]
-To create a seed word list
--You can use Google Translate: enter a list of seed words in English and translate them into the target language
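
Once a seed word list exists, tools of this kind typically combine the seeds into short tuples and send each tuple to a search engine as one query. The notes do not say exactly how WebBootCat does this, so the following is only a rough sketch under that assumption. The function name `build_seed_queries` and its parameters (`tuple_size`, `max_queries`, `rng_seed`) are hypothetical and are not part of Sketch Engine.

```python
import itertools
import random


def build_seed_queries(seed_words, tuple_size=3, max_queries=10, rng_seed=0):
    """Combine seed words into short tuples, one search query per tuple.

    Hypothetical helper for illustration only; not a Sketch Engine API.
    """
    # Normalize the list: strip whitespace, drop empties and duplicates.
    unique = sorted({w.strip() for w in seed_words if w.strip()})
    if len(unique) < tuple_size:
        raise ValueError("need at least tuple_size distinct seed words")

    # Enumerate every unordered combination of tuple_size seeds.
    combos = list(itertools.combinations(unique, tuple_size))

    # Shuffle with a fixed seed so the selection is reproducible,
    # then keep at most max_queries tuples.
    rng = random.Random(rng_seed)
    rng.shuffle(combos)
    return [" ".join(combo) for combo in combos[:max_queries]]


if __name__ == "__main__":
    # Example seeds; in practice these would be in the target language.
    seeds = ["corpus", "lemma", "tagger", "collocation"]
    for query in build_seed_queries(seeds):
        print(query)
```

Enumerating all combinations is fine for a short seed list (a few dozen words); for much longer lists, sampling tuples directly would avoid building the full list in memory.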