Email is the number one activity that people do on the internet: 74% of internet users check their email on an average day. Email use in offices has more than doubled since 2000 and is now over 8 hours a week. There are many great NLP problems for email, such as automatic clustering and foldering, search, prioritization, automatically finding keywords within messages, finding addresses, and summarization. Spam is the number one problem for email. I'll talk about how spam filters work and what the current open problems are, as well as other kinds of abuse like chat spam (Spat), IM spam (Spim), and blog comment spam (Blat), all of which make great NLP problems.
Email and abuse problems like spam can be some of the most exciting for research: they inspire us to work on new problems we would otherwise not have found. We are exploring areas like adversarial learning, learning with unbalanced costs, and learning with partial user feedback. Shipping solutions to these problems is both surprisingly hard and surprisingly fun. For NLP researchers, the hardest constraint is that products ship in about 20 languages. By carefully choosing tools like word clustering, which are easy to build in many languages, instead of similar tools like taggers, which may not exist everywhere, we increase the chance of shipping. When we have actually built complete systems and given them to users, we have found several new and interesting problems in the most exciting way: by shipping solutions that don't work the first time around.
Joshua Goodman is a Principal Researcher in the Machine Learning and Applied Statistics Group at Microsoft Research, where he runs a team focused on Learning for Messaging and Adversarial Problems. Spam filters he helped develop stop over a billion spam messages per day. He has also worked on language modeling and machine learning, and he received a Ph.D. in Computer Science from Harvard University for his work on statistical parsing. He helped found, and is now President of, the Conference on Email and Anti-Spam.
Keynote Speech II
In recent years, the development of intelligent tutoring dialogue systems has become more prevalent, in an attempt to close the performance gap between human and computer tutors. With advances in speech technology, several systems have begun to incorporate spoken language capabilities, on the hypothesis that adding speech technology will promote student learning by enhancing communication richness. Tutoring applications differ in many ways, however, from the types of applications for which spoken dialogue systems are typically developed. This talk will illustrate some of the opportunities and challenges in this area, focusing on issues such as affective reasoning, discourse analysis, error handling, and performance evaluation.
Diane Litman is Professor of Computer Science, as well as Research Scientist with the Learning Research and Development Center, at the University of Pittsburgh. Previously, Dr. Litman was a member of the Artificial Intelligence Principles Research Department, AT&T Labs - Research (formerly Bell Laboratories); she was also an Assistant Professor of Computer Science at Columbia University. Dr. Litman received her Ph.D. degree in Computer Science from the University of Rochester. Her current research focuses on enhancing the effectiveness of tutorial dialogue systems through the use of spoken language processing, affective computing, and machine learning. She has collaborated on the development of spoken dialogue systems in multiple application areas, including intelligent tutoring (ITSPOKE), chat (CobotDS), and database/web access (NJFun and TOOT). Dr. Litman has been Chair of the North American Chapter of the Association for Computational Linguistics, a member of the Executive Committee of the Association for Computational Linguistics, and a member of the editorial boards of Computational Linguistics and User Modeling and User-Adapted Interaction.