Difference between revisions of "Resources for Macedonian"

Revision as of 12:10, 25 March 2010

Southeast European Times (sentence-aligned corpus: Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish; approximately 4.5 million words per language)

Vojnovski, V., Džeroski, S., Erjavec, T. (2005) "Learning PoS tagging from a tagged Macedonian text corpus". Proceedings of SiKDD 2005 (Conference on Data Mining and Data Warehouses), Ljubljana, Slovenia, pp. 199-202.

A PoS tagger for Macedonian is trained on the Macedonian translation of George Orwell's Nineteen Eighty-Four.

Ivanovska, A., Zdravkova, K., Džeroski, S., Erjavec, T. (2005) "Learning Rules for Morphological Analysis and Synthesis of Macedonian Nouns". Proceedings of IS 2005, the 8th International Multiconference on the Information Society, 11-17 October 2005, Ljubljana, pp. 195-198.

Presents a machine learning approach to learning rules for the morphological analysis and synthesis of Macedonian nouns.

@@ Line 9: / Line 9: @@
 ===Free===
+* [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish &mdash; approximately 4.5 million words per language)
-* [http://xixona.dlsi.ua.es/~fran/setimes/ Southeast European Times] (paragraph aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish &mdash; 9,678 paragraphs, 92,450&mdash; 122,912 words per language)
 ==Bibliography==