Resources for Japanese
Revision as of 07:58, 23 July 2013 by Bond (→Multilingual: update link to Utiyama's data)
There is a very good list at Kyoto University: Catalogue of Language Resources and Tools in Japan
Corpora
Proprietary
- Japanese plain text and Co-occurrences at LCC (downloadable and web-searchable, but only for non-commercial use)
- Balanced Corpus of Contemporary Written Japanese (BCCWJ) (subset is web searchable at Kotonoha)
Free/Open Licence
Multilingual
- Tanaka Corpus by Tanaka Yasuhito, edited by Jim Breen, under a CC-BY-SA 3.0 licence
- Tatoeba: updated version of the Tanaka Corpus; ≈150,000 sentence pairs (CC-BY)
- Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles: ≈500,000 pairs of manually translated sentences (CC-BY 3.0)
- National Diet Library Subject Headers: Japanese subject headers, with paraphrases including English translations (non-commercial, attribution)
- English-Japanese Translation Alignment Data, aligned by Masao Utiyama (GFDL, CC-BY-NC 1.0)
- WordNet Definitions and Glosses: ≈180,000 sentence/phrase pairs from the Japanese Wordnet (WordNet license, similar to BSD)
- Japanese Translation of SemCor: ≈14,000 sentences from the Japanese Wordnet, easily aligned to the English source (WordNet license, similar to BSD)
- The Kyoto Free Translation Task (KFTT) by Graham Neubig: 1,235 Japanese-English sentences, manually word-aligned
- JEC Basic Sentence Data by Kyoto University: 5,304 basic Japanese sentences based on Kyoto University Case Frame Data, translated into Chinese and English
Monolingual
Grammars
Free/Open Licence
- Jacy HPSG grammar (MIT Licence)
Unknown licence
- KPML generation grammar (downloadable)
Dictionaries
Free/Open Licence
- EDICT Japanese-English dictionary, by Jim Breen (CC-BY-SA 3.0 licence)
- ENAMDICT/JMnedict proper name dictionary, by Jim Breen (CC-BY-SA 3.0 licence)
- Japanese version of WordNet, by NICT (WordNet license, similar to BSD)
Unknown licence
- List of Japanese transitive/intransitive verb pairs (dead link?)