ALLT2112: David Tugwell
What is Sketch Engine?
- David is the original author of the word sketch, developed at the Univ. of Brighton
- Corpora in 42 languages so far:
- The biggest one is the Russian corpus (20 billion).
- 65 preloaded corpora
- Balanced corpora: BNC
- Specialized corpora: CHILDES, BASE, BAWE
- Web corpora: de-duplicated, cleaned
- the "ten-ten" range, named for its size: 10^10 = 10 billion
- Chinese
- Chinese GigaWord, Chinese TaiwanWaC
- Load your own corpora
- automatic lemmatization, tagging, word sketches
- WebBootCat
- low-density languages & subject areas
- seed collection process
- results are cleaned and processed in a new corpus
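The "de-duplicated, cleaned" step mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible approach, dropping empty pages and exact repeats after normalizing case and whitespace; the notes do not say how Sketch Engine itself does this, and the function names `normalize` and `deduplicate` are hypothetical.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(documents):
    """Keep the first occurrence of each document; drop empty pages and repeats.

    Assumption: only exact duplicates (after normalization) are removed.
    Real web-corpus pipelines usually also detect near-duplicates.
    """
    seen = set()
    kept = []
    for doc in documents:
        norm = normalize(doc)
        if not norm:
            continue  # skip pages with no text
        key = hashlib.sha1(norm.encode("utf-8")).hexdigest()
        if key in seen:
            continue  # already have an equivalent page
        seen.add(key)
        kept.append(doc)
    return kept
```

Hashing the normalized text keeps memory use low, because only the digests are stored in the `seen` set rather than the full pages.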
Use of SkE
- Lexicography
- Collins, Macmillan, CUP, OUP, Le Robert, Cornelsen Verlag, Shogakukan, Instituut voor Nederlandse
- Language Learning
WebBootCat
- To create a seed word list:
- You can enter a list of seed words in English into Google Translate and translate them into the target language
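Once a seed word list exists, a BootCaT-style tool needs search queries built from it. The sketch below assumes the seeds are combined into small tuples that are then sent to a search engine; the notes do not describe this step, so the function name `build_queries` and its parameters (`tuple_size`, `n_queries`, `rng_seed`) are hypothetical.

```python
import itertools
import random


def build_queries(seeds, tuple_size=3, n_queries=10, rng_seed=0):
    """Return up to n_queries distinct queries, each joining tuple_size seed words.

    Assumption: seeds are cleaned (trimmed, lowercased, de-duplicated) and
    then combined into unordered tuples, sampled in random order.
    """
    unique = sorted({s.strip().lower() for s in seeds if s.strip()})
    combos = list(itertools.combinations(unique, tuple_size))
    rng = random.Random(rng_seed)  # fixed seed so the sample is reproducible
    rng.shuffle(combos)
    return [" ".join(combo) for combo in combos[:n_queries]]


if __name__ == "__main__":
    translated_seeds = ["corpus", "lemma", "Corpus ", "tagger", "collocation"]
    for query in build_queries(translated_seeds):
        print(query)
```

Using several words per query tends to pull in pages that are on topic, while sampling different tuples spreads the results across more sources.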