Resources for Japanese
Revision as of 03:04, 2 February 2011 by Bond (→Free/Open Licence: Links to some sources of aligned data)
Corpora
Proprietary
- Japanese plain text and Co-occurrences at LCC (downloadable and web-searchable, but only for non-commercial use)
Free/Open Licence
- Tanaka Corpus by Jim Breen, under a CC-BY-SA 3.0 licence
- Tatoeba: updated version of the Tanaka Corpus; ≈150,000 sentence pairs (CC-BY)
- Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles: ≈500,000 pairs of manually-translated sentences (CC-BY 3.0)
- National Diet Library Subject Headers: Japanese subject headers, with paraphrases including English translations (non-commercial attribution)
- English-Japanese Translation Alignment Data aligned by Masao Utiyama (GFDL, CC-BY-NC 1.0)
- WordNet Definitions and Glosses: ≈180,000 sentence/phrase pairs (WordNet license, similar to BSD)
Grammars
Free/Open Licence
- Jacy HPSG grammar (MIT Licence)
Unknown licence
- KPML generation grammar (downloadable)
Dictionaries
Free/Open Licence
- EDICT Japanese-English dictionary, by Jim Breen, under a CC-BY-SA 3.0 licence
- ENAMDICT/JMnedict proper name dictionary, by Jim Breen, under a CC-BY-SA 3.0 licence
Unknown licence
- List of Japanese transitive/intransitive verb pairs (dead link?)