Resources for Japanese
Revision as of 06:58, 23 July 2013
There is a very good list at Kyoto University: Catalogue of Language Resources and Tools in Japan
Corpora
Proprietary
- Japanese plain text and co-occurrences at LCC (downloadable and web-searchable, but for non-commercial use only)
- Balanced Corpus of Contemporary Written Japanese (BCCWJ) (a subset is web-searchable at Kotonoha)
Free/Open Licence
Multilingual
- Tanaka Corpus by Tanaka Yasuhito, edited by Jim Breen, under a CC-BY-SA 3.0 licence
- Tatoeba: updated version of the Tanaka Corpus; ≈150,000 sentence pairs (CC-BY)
- Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles: ≈500,000 pairs of manually translated sentences (CC-BY 3.0)
- National Diet Library Subject Headings: Japanese subject headings, with paraphrases including English translations (non-commercial, attribution)
- English-Japanese Translation Alignment Data aligned by Masao Utiyama (GFDL, CC-BY-NC 1.0)
- WordNet Definitions and Glosses ≈180,000 sentence/phrase pairs from the Japanese Wordnet (WordNet license, similar to BSD)
- Japanese Translation of SemCor ≈14,000 sentences from the Japanese Wordnet, easily aligned to the English source (WordNet license, similar to BSD)
- The Kyoto Free Translation Task (KFTT) by Graham Neubig: 1,235 Japanese-English sentences with manual word alignments
- JEC Basic Sentence Data by Kyoto University: 5,304 basic Japanese sentences based on the Kyoto University Case Frame Data, translated into Chinese and English
Monolingual
Grammars
Free/Open Licence
- Jacy HPSG grammar (MIT licence)
Unknown licence
- KPML generation grammar (downloadable)
Dictionaries
Free/Open Licence
- EDICT Japanese-English dictionary by Jim Breen (CC-BY-SA 3.0 licence)
- ENAMDICT/JMnedict proper name dictionary by Jim Breen (CC-BY-SA 3.0 licence)
- Japanese version of WordNet by NICT (WordNet license, similar to BSD)
Unknown licence
- List of Japanese transitive/intransitive verb pairs (dead link?)