Difference between revisions of "Resources for Turkish"

Revision as of 15:35, 10 January 2013

TRMorph "is a relatively complete morphological analyzer for Turkish. It is implemented using SFST, and uses a lexicon based on (but heavily modified) the wordlist of Zemberek spell checker. The morphological analyzer is distributed under the GPL."

Southeast European Times (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish — approximately 4.5 million words per language)
TS Corpus (PoSTagged Turkish Corpus. The corpus also presents morphological and lemma tags of the data. Consists of 491 Million tokens)

K. Oflazer, "Two-level Description of Turkish Morphology," Literary and Linguistic Computing, vol. 9, no. 2, pp. 137-148, 1994. Backwards PDF

@@ Line 14: / Line 14: @@
 * [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish &mdash; approximately 4.5 million words per language)
+* [http://tscorpus.com/ TS Corpus] (PoSTagged Turkish Corpus. The corpus also presents morphological and lemma tags of the data. Consists of 491 Million tokens)
 ===Proprietary===
@@ Line 29: / Line 30: @@
 * [http://www.hlst.sabanciuniv.edu Sabancı University Natural Language Processing Tools (Turkish Morphological Analyzer, BalkaNET)]
 * [http://ddi.ce.itu.edu.tr Istanbul Technical University Natural Language Processing Research Group]
-* [http://nooj4nlp.net/turkish Mersin University Turkish National Corpus Project]
+* [http://nooj4nlp.net/pages/turkish.html NooJ_TR by Mersin University Turkish National Corpus Project Team]
 [[Category:Resources by language|Tajik]]