Resources for Turkish

Latest revision as of 07:40, 17 June 2015

TRMorph "is a relatively complete morphological analyzer for Turkish. It is implemented using SFST, and uses a lexicon based on (but heavily modified) the wordlist of Zemberek spell checker. The morphological analyzer is distributed under the GPL."

Southeast European Times (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish — approximately 4.5 million words per language)

Proprietary

HamleDT: harmonized dependency treebanks for many languages, using a common annotation style.
TS Corpus (a POS-tagged Turkish corpus that also provides morphological and lemma tags; 491 million tokens)
METU-Sabanci Turkish treebank
Turkish plain text and Co-occurrences at LCC

K. Oflazer, "Two-level Description of Turkish Morphology," Literary and Linguistic Computing, vol. 9, pp. 137-148, 1995. Backwards PDF
