
* UDPipe [#s406e91d]

** UDPipe Universe: Integration with R [#sb46b284]

-[[Introduction>https://bnosac.github.io/udpipe/docs/doc1.html]]
-Analyzing multilingual data with UDPipe and converting it to lemma text lets us calculate coverage rates.
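A minimal sketch of the coverage calculation. It assumes the annotation has already been turned into a data frame with `as.data.frame()` on the output of `udpipe_annotate()`, which has `lemma` and `upos` columns. The helper name `lemma_coverage`, the word list, and the decision to ignore punctuation, symbols, numbers and `X` tokens are illustrative assumptions, not part of udpipe.

```r
# Hypothetical helper (not part of udpipe): share of tokens whose lemma
# appears in a reference word list, e.g. a graded vocabulary list.
lemma_coverage <- function(anno, wordlist,
                           drop_upos = c("PUNCT", "SYM", "NUM", "X")) {
  # Keep only tokens that are not in drop_upos and that have a lemma.
  keep <- !(anno$upos %in% drop_upos) & !is.na(anno$lemma)
  lemmas <- tolower(anno$lemma[keep])
  if (length(lemmas) == 0) return(NA_real_)
  # Proportion of the remaining tokens covered by the word list.
  mean(lemmas %in% tolower(wordlist))
}

# Toy annotation using the same column names udpipe produces.
anno <- data.frame(
  token = c("Cats", "were", "sleeping", "."),
  lemma = c("cat", "be", "sleep", "."),
  upos  = c("NOUN", "AUX", "VERB", "PUNCT"),
  stringsAsFactors = FALSE
)
lemma_coverage(anno, wordlist = c("cat", "be"))  # 2 of 3 tokens covered: 0.6666667
```

Counting tokens rather than distinct lemmas gives a running-text coverage; using `unique(lemmas)` instead would give type coverage.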


- UDPipe provides language-agnostic tokenization, tagging, lemmatization and dependency parsing of raw text, which are essential parts of natural language processing.
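A minimal sketch of that pipeline from R. `udpipe_load_model()`, `udpipe_annotate()` and `as.data.frame()` are udpipe functions; the wrapper names `annotate_text` and `lemma_text` are hypothetical, and `model_file` is assumed to point at a model already downloaded with `udpipe_download_model()`.

```r
# Hypothetical wrapper: annotate raw text with a downloaded udpipe model
# (tokenization, POS tagging, lemmatization and dependency parsing).
annotate_text <- function(text, model_file) {
  model <- udpipe::udpipe_load_model(file = model_file)
  anno <- udpipe::udpipe_annotate(model, x = text)
  as.data.frame(anno)  # one row per token: doc_id, token, lemma, upos, dep_rel, ...
}

# Hypothetical helper: rebuild each document as a space-separated lemma string.
lemma_text <- function(anno) {
  # Drop rows without a lemma, then paste lemmas per doc_id in input order.
  anno <- anno[!is.na(anno$lemma), ]
  docs <- factor(anno$doc_id, levels = unique(anno$doc_id))
  vapply(split(anno$lemma, docs), paste, character(1), collapse = " ")
}

# Usage (needs the udpipe package and network access for the download):
# dl <- udpipe::udpipe_download_model(language = "english-ewt")
# anno <- annotate_text("The cats were sleeping.", dl$file_model)
# lemma_text(anno)
```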

** Pre-trained models [#h1463ab2]

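- Pre-trained models are available for more than 65 languages; 18 of them are target languages of the CEFR-J x 28 project (shown in bold in the list below).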

- Pre-trained models built on Universal Dependencies treebanks are available for more than 65 languages, based on 101 treebanks:

-- afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, ''arabic-padt,'' armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, buryat-bdt, catalan-ancora, ''chinese-gsd'', chinese-gsdsimp, classical_chinese-kyoto, coptic-scriptorium, croatian-set, ''czech-cac, czech-cltt, czech-fictree, czech-pdt'', danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, estonian-ewt, finnish-ftb, finnish-tdt, ''french-gsd, french-partut, french-sequoia, french-spoken,'' galician-ctg, galician-treegal, ''german-gsd, german-hdt,'' gothic-proiel, greek-gdt, hebrew-htb, ''hindi-hdtb,'' hungarian-szeged, ''indonesian-gsd,'' irish-idt, ''italian-isdt, italian-partut, italian-postwita, italian-twittiro, italian-vit,'' ''japanese-gsd,'' kazakh-ktb, ''korean-gsd, korean-kaist,'' kurmanji-mg, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-alksnis, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, old_russian-torot, ''persian-seraji, polish-lfg, polish-pdb, polish-sz, portuguese-bosque, portuguese-br, portuguese-gsd,'' romanian-nonstandard, romanian-rrt, ''russian-gsd, russian-syntagrus, russian-taiga,'' sanskrit-ufal, scottish_gaelic-arcosg, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, ''spanish-ancora, spanish-gsd,'' swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, ''turkish-imst,'' ukrainian-iu, upper_sorbian-ufal, ''urdu-udtb,'' uyghur-udt, ''vietnamese-vtb,'' wolof-wtb.

- Users of the package can easily obtain these models with "udpipe_download_model".

 library(udpipe)
 udmodel <- udpipe_download_model(language = "chinese")  # example: Chinese model
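To fetch one model for each of the 18 project languages in one go, a loop over `udpipe_download_model()` is enough. This is a sketch: picking one treebank per language below is an arbitrary illustrative choice (several languages have more than one model in the list above), and `target_models` / `download_target_models` are hypothetical names.

```r
# Illustrative choice of one treebank per target language (18 in total);
# swap in another treebank from the list above where preferred.
target_models <- c(
  "arabic-padt", "chinese-gsd", "czech-pdt", "french-gsd", "german-gsd",
  "hindi-hdtb", "indonesian-gsd", "italian-isdt", "japanese-gsd",
  "korean-kaist", "persian-seraji", "polish-pdb", "portuguese-bosque",
  "russian-syntagrus", "spanish-ancora", "turkish-imst", "urdu-udtb",
  "vietnamese-vtb"
)

# Hypothetical helper: download each model into model_dir and return the
# local model file paths, named by treebank.
download_target_models <- function(models = target_models, model_dir = getwd()) {
  vapply(models, function(m) {
    dl <- udpipe::udpipe_download_model(language = m, model_dir = model_dir)
    dl$file_model
  }, character(1))
}
```

Each returned path can then be passed to `udpipe_load_model()`.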

** Training your own model [#xac701f4]

- You can also train a model for a new language.
--This requires annotated data for the language in CoNLL-U format.
---http://universaldependencies.org/#ud-treebanks
--How this is done is detailed in the package '''vignette'''.

 vignette("udpipe-train", package = "udpipe")  # example: open the training vignette
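Before training, it can help to sanity-check the treebank file. CoNLL-U stores one token per line as 10 tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `#` comment lines and blank lines between sentences. `bad_conllu_lines` is a hypothetical helper, and `train_udpipe` only sketches a default-settings call to `udpipe_train()` (argument names as we understand the udpipe reference; check `?udpipe_train` and the vignette for the real training and tuning options).

```r
# Hypothetical helper: line numbers of CoNLL-U lines that are neither
# comments, blank sentence separators, nor 10-field token lines.
bad_conllu_lines <- function(lines) {
  is_comment <- startsWith(lines, "#")
  is_blank <- lines == ""
  n_fields <- lengths(strsplit(lines, "\t", fixed = TRUE))
  which(!is_comment & !is_blank & n_fields != 10)
}

# Hypothetical wrapper: validate the file, then train with default settings.
train_udpipe <- function(train_file, model_file = "my_model.udpipe") {
  bad <- bad_conllu_lines(readLines(train_file, encoding = "UTF-8"))
  if (length(bad) > 0) {
    stop("Malformed CoNLL-U lines: ", paste(head(bad), collapse = ", "))
  }
  udpipe::udpipe_train(file = model_file,
                       files_conllu_training = train_file,
                       annotation_tokenizer = "default",
                       annotation_tagger = "default",
                       annotation_parser = "default")
}
```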

