[[FrontPage]]

* UDPipe [#s406e91d]

** UDPipe Universe: Integration with R [#sb46b284]
-[[Introduction>https://bnosac.github.io/udpipe/docs/doc1.html]]
-Parsing multilingual data with UDPipe and converting it to lemmatized text makes it possible to compute coverage rates.
-UDPipe provides language-agnostic tokenization, tagging, lemmatization and dependency parsing of raw text, which is an essential part of natural language processing.

** Pre-trained models [#h1463ab2]
-Pre-trained models exist for 65 languages; 18 of them are target languages of the CEFR-J x 28 project (the entries shown in bold in the list below).
-Pre-trained models built on Universal Dependencies treebanks are made available for more than 65 languages based on 101 treebanks, namely:
--afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, ''arabic-padt,'' armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, buryat-bdt, catalan-ancora, ''chinese-gsd'', chinese-gsdsimp, classical_chinese-kyoto, coptic-scriptorium, croatian-set, ''czech-cac, czech-cltt, czech-fictree, czech-pdt'', danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, estonian-ewt, finnish-ftb, finnish-tdt, ''french-gsd, french-partut, french-sequoia, french-spoken,'' galician-ctg, galician-treegal, ''german-gsd, german-hdt,'' gothic-proiel, greek-gdt, hebrew-htb, ''hindi-hdtb,'' hungarian-szeged, ''indonesian-gsd,'' irish-idt, ''italian-isdt, italian-partut, italian-postwita, italian-twittiro, italian-vit,'' ''japanese-gsd,'' kazakh-ktb, ''korean-gsd, korean-kaist,'' kurmanji-mg, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-alksnis, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, old_russian-torot, ''persian-seraji, polish-lfg, polish-pdb, polish-sz, portuguese-bosque, portuguese-br, portuguese-gsd,'' romanian-nonstandard, romanian-rrt, ''russian-gsd, russian-syntagrus, russian-taiga,'' sanskrit-ufal, scottish_gaelic-arcosg, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, ''spanish-ancora, spanish-gsd,'' swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, ''turkish-imst,'' ukrainian-iu, upper_sorbian-ufal, ''urdu-udtb,'' uyghur-udt, ''vietnamese-vtb,'' wolof-wtb.
-Users of the package can obtain these models easily by calling "udpipe_download_model". Example:
 udmodel <- udpipe_download_model(language = "chinese")

** Training your own model [#xac701f4]
-Models for new languages can also be trained.
--Language data in CoNLL-U format is required.
---http://universaldependencies.org/#ud-treebanks
--How this is done is detailed in the package '''vignette'''. Example:
 vignette("udpipe-train", package = "udpipe")
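
The coverage calculation mentioned at the top of this page could be sketched in R roughly as follows. This is a minimal sketch, not the project's actual procedure. It assumes that "coverage" means the share of lemma tokens found in a reference word list. The helper name lemma_coverage, the sample sentence, and the tiny word list are all hypothetical. The udpipe calls (udpipe_download_model, udpipe_load_model, udpipe_annotate) come from the udpipe package; that part needs the package installed and network access, so it is guarded and only runs in an interactive session.

```r
# Share of lemma tokens that appear in a reference word list.
# `lemma_coverage` is a hypothetical helper, not part of udpipe.
lemma_coverage <- function(lemmas, wordlist) {
  lemmas <- tolower(lemmas[!is.na(lemmas)])   # drop missing lemmas, ignore case
  if (length(lemmas) == 0) return(NA_real_)   # nothing to measure
  mean(lemmas %in% tolower(wordlist))         # proportion of covered tokens
}

# Annotation step: needs the udpipe package and a model download,
# so it is skipped unless udpipe is installed and R is interactive.
if (requireNamespace("udpipe", quietly = TRUE) && interactive()) {
  library(udpipe)

  # Download and load a pre-trained model (english-ewt is one of the listed models).
  dl    <- udpipe_download_model(language = "english-ewt")
  model <- udpipe_load_model(file = dl$file_model)

  # Annotate a sample sentence (hypothetical input) and get one row per token.
  anno <- as.data.frame(udpipe_annotate(model, x = "The cats were sleeping on the mats."))

  # Ignore punctuation before measuring coverage.
  tokens <- anno[!(anno$upos %in% "PUNCT"), ]

  # Hypothetical reference word list; a real one would come from the project's data.
  wordlist <- c("the", "cat", "be", "sleep", "on")
  print(lemma_coverage(tokens$lemma, wordlist))
}
```

The helper can be tried without udpipe: lemma_coverage(c("the", "cat", "be", "sleep"), c("the", "cat")) returns 0.5, because two of the four lemma tokens are in the word list. Counting tokens rather than distinct lemma types is a design choice in this sketch; applying unique() to the lemmas first would give type-based coverage instead.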