Difference between revisions of "Resources for Japanese"

From ACL Wiki
m (Reverted edits by Creek (talk) to last revision by Bond)
Line 8: Line 8:
 
===Free/Open Licence===
 
===Free/Open Licence===
 
====Multilingual====
 
====Multilingual====
* [http://www.edrdg.org/projects/tanaka/tanakacorpus.html Tanaka Corpus] by Jim Breen, under a CC-BY-SA 3.0 licence
+
* [http://www.edrdg.org/projects/tanaka/tanakacorpus.html Tanaka Corpus] by Tanaka Yasuhito, edited by Jim Breen, under a CC-BY-SA 3.0 licence
 
** [http://tatoeba.org/eng/home Tatoeba] Updated version of the Tanaka Corpus;  ≈150,000 sentence pairs  (CC-BY)
 
** [http://tatoeba.org/eng/home Tatoeba] Updated version of the Tanaka Corpus;  ≈150,000 sentence pairs  (CC-BY)
 
* [http://alaginrc.nict.go.jp/WikiCorpus/index_E.html Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles]  ≈500,000 pairs of manually-translated sentences (CC-BY 3.0)
 
* [http://alaginrc.nict.go.jp/WikiCorpus/index_E.html Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles]  ≈500,000 pairs of manually-translated sentences (CC-BY 3.0)
* [http://id.ndl.go.jp/auth/ndlsh National Diet Library Subject Headers]  Japanese Subject Headers, with paraphrases including English Translations([http://id.ndl.go.jp/auth/docs/about-ndlsh#03 non-commercial attribution])
+
* [http://id.ndl.go.jp/auth/ndlsh National Diet Library Subject Headers]  Japanese Subject Headers, with paraphrases including English Translations ([http://id.ndl.go.jp/auth/docs/about-ndlsh#03 non-commercial attribution])
 
* [http://mastarpj.nict.go.jp/~mutiyama/align/index.html English-Japanese Translation Alignment Data]  aligned by [http://mastarpj.nict.go.jp/~mutiyama/ Masao Utiyama] (GFDL, CC-by-nc 1.0)
 
* [http://mastarpj.nict.go.jp/~mutiyama/align/index.html English-Japanese Translation Alignment Data]  aligned by [http://mastarpj.nict.go.jp/~mutiyama/ Masao Utiyama] (GFDL, CC-by-nc 1.0)
* [http://nlpwww.nict.go.jp/wn-ja/index.en.html WordNet Definitions and Glosses]  ≈180,000 sentence/phrase pairs (WordNet license, similar to BSD)
+
* [http://nlpwww.nict.go.jp/wn-ja/index.en.html WordNet Definitions and Glosses]  ≈180,000 sentence/phrase pairs from the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese Wordnet] (WordNet license, similar to BSD)
* [http://www.phontron.com/kftt/#alignments The Kyoto Free Translation Task (KFTT)] by Graham Neubig, 1235 sentences of Japanese-English manually word-aligned
+
* [http://nlpwww.nict.go.jp/wn-ja/eng/downloads.html#jsemcor Japanese Translation of SemCor] ≈14,000 sentences from the [http://nlpwww.nict.go.jp/wn-ja/index.en.html Japanese Wordnet], easily aligned to the [http://www.cse.unt.edu/~rada/downloads.html#semcor English source]  (WordNet license, similar to BSD)
 +
* [http://www.phontron.com/kftt/#alignments The Kyoto Free Translation Task (KFTT)] by Graham Neubig, 1,235 sentences of Japanese-English manually word-aligned
 
* [http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JEC%20Basic%20Sentence%20Data JEC Basic Sentence Data] by Kyoto University: 5,304 basic Japanese sentences based on Kyoto University Case Frame Data, translated into Chinese and English
 
* [http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JEC%20Basic%20Sentence%20Data JEC Basic Sentence Data] by Kyoto University: 5,304 basic Japanese sentences based on Kyoto University Case Frame Data, translated into Chinese and English
  

Revision as of 00:22, 12 December 2012

There is a very good list at Kyoto University: Catalogue of Language Resources and Tools in Japan

Corpora

Proprietary

Free/Open Licence

Multilingual

Monolingual

Grammars

Free/Open Licence

Unknown licence

Dictionaries

Free/Open Licence

Unknown licence