Difference between revisions of "Resources for Japanese"
Latest revision as of 19:40, 11 October 2017
There is a very good list at JAIST: Catalogue of Language Resources and Tools in Japan
Corpora
Proprietary
- Japanese plain text and Co-occurrences at LCC (downloadable and web-searchable, but only for non-commercial use)
- Balanced Corpus of Contemporary Written Japanese (BCCWJ) (subset is web searchable at Kotonoha)
- HamleDT: harmonized dependency treebanks for many languages in a common annotation style
Free/Open Licence
Multilingual
- Tanaka Corpus by Tanaka Yasuhito, edited by Jim Breen, under a CC-BY-SA 3.0 licence
- Tatoeba: updated version of the Tanaka Corpus; ≈150,000 sentence pairs (CC-BY)
- Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles: ≈500,000 pairs of manually translated sentences (CC-BY 3.0)
- National Diet Library Subject Headers: Japanese subject headers, with paraphrases including English translations (non-commercial attribution)
- English-Japanese Translation Alignment Data, aligned by Masao Utiyama (GFDL, CC-BY-NC 1.0)
- WordNet Definitions and Glosses: ≈180,000 sentence/phrase pairs from the Japanese Wordnet (WordNet license, similar to BSD)
- Japanese Translation of SemCor: ≈14,000 sentences from the Japanese Wordnet, easily aligned to the English source (WordNet license, similar to BSD)
- The Kyoto Free Translation Task (KFTT) by Graham Neubig: 1,235 manually word-aligned Japanese-English sentences
- JEC Basic Sentence Data by Kyoto University: 5,304 basic Japanese sentences based on Kyoto University Case Frame Data, translated into Chinese and English
Monolingual
- Kyoto University and NTT Blog Corpus
- Compilation of potential Japanese compound verbs by Jim Breen: a collection of 64,776 verbs with n-gram counts and dictionary references (CC-SA licence)
Grammars
Free/Open Licence
- Jacy HPSG grammar (MIT Licence)
Unknown licence
- KPML generation grammar (downloadable)
Dictionaries
Free/Open Licence
- JMdict/EDICT: Japanese-English and Japanese-Multilanguage dictionary in text and XML formats, by EDRDG (Electronic Dictionary R&D Group); 170,000 entries (CC-BY-SA 3.0 licence)
- ENAMDICT/JMnedict: proper name dictionary in text and XML formats, by EDRDG (Electronic Dictionary R&D Group); 740,000 entries (CC-BY-SA 3.0 licence)
- Japanese version of WordNet by NICT (WordNet license, like BSD)
- Kanjidic/Kanjidic2: kanji dictionaries in text and XML formats covering about 13,000 characters, by EDRDG (Electronic Dictionary R&D Group) (CC-BY-SA 3.0 licence)