Kyungduk Kim


2021

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Boseop Kim | HyoungSeok Kim | Sang-Woo Lee | Gichang Lee | Donghyun Kwak | Jeon Dong Hyeon | Sunghyun Park | Sungju Kim | Seonhoon Kim | Dongpil Seo | Heungsub Lee | Minyoung Jeong | Sungjae Lee | Minsub Kim | Suk Hyun Ko | Seokhun Kim | Taeyong Park | Jinuk Kim | Soyoung Kang | Na-Hyeon Ryu | Kang Min Yoo | Minsuk Chang | Soobin Suh | Sookyo In | Jinseong Park | Kyungduk Kim | Hiun Kim | Jisu Jeong | Yong Goo Yeo | Donghoon Ham | Dongju Park | Min Young Lee | Jaewook Kang | Inho Kang | Jung-Woo Ha | Woomyoung Park | Nako Sung
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

GPT-3 demonstrates the remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billions of tokens. Here we address some remaining issues less reported in the GPT-3 paper, such as a non-English LM, the performance of differently sized models, and the effect of recently introduced prompt optimization on in-context learning. To this end, we introduce HyperCLOVA, an 82B-parameter Korean variant of GPT-3 trained on a Korean-centric corpus of 560B tokens. Aided by our Korean-specific tokenization, HyperCLOVA with our training configuration achieves state-of-the-art in-context zero-shot and few-shot learning performance on various downstream tasks in Korean. We also show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. We then discuss the possibility of realizing the No Code AI paradigm by providing AI prototyping capabilities to ML non-experts through HyperCLOVA Studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.
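The few-shot in-context setting mentioned in the abstract can be pictured with a small prompt-construction sketch. The task, examples, and formatting below are illustrative assumptions for exposition only, not the prompts or API used in the HyperCLOVA paper.

```python
# Minimal sketch of few-shot in-context prompting: the LM is given a handful
# of labeled examples inline and asked to continue with the label for a new
# input. The sentiment task and examples here are hypothetical placeholders.

def build_few_shot_prompt(examples, query, instruction="Classify the sentiment."):
    """Concatenate an instruction, labeled examples, and the unlabeled query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model is expected to complete from here
    return "\n".join(lines)


if __name__ == "__main__":
    shots = [
        ("The battery lasts all day.", "positive"),
        ("The screen cracked within a week.", "negative"),
    ]
    print(build_few_shot_prompt(shots, "Shipping was fast and the fit is great."))
```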

2012

A Hierarchical Domain Model-Based Multi-Domain Selection Framework for Multi-Domain Dialog Systems
Seonghan Ryu | Donghyeon Lee | Injae Lee | Sangdo Han | Gary Geunbae Lee | Myungjae Kim | Kyungduk Kim
Proceedings of COLING 2012: Posters

2009

Automatic Agenda Graph Construction from Human-Human Dialogs using Clustering Method
Cheongjae Lee | Sangkeun Jung | Kyungduk Kim | Gary Geunbae Lee
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Hybrid Approach to User Intention Modeling for Dialog Simulation
Sangkeun Jung | Cheongjae Lee | Kyungduk Kim | Gary Geunbae Lee
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

A Frame-Based Probabilistic Framework for Spoken Dialog Management Using Dialog Examples
Kyungduk Kim | Cheongjae Lee | Sangkeun Jung | Gary Geunbae Lee
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

An Integrated Dialog Simulation Technique for Evaluating Spoken Dialog Systems
Sangkeun Jung | Cheongjae Lee | Kyungduk Kim | Gary Geunbae Lee
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications

2006

MMR-based Active Machine Learning for Bio Named Entity Recognition
Seokhwan Kim | Yu Song | Kyungduk Kim | Jeong-Won Cha | Gary Geunbae Lee
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2005

POSBIOTM/W: A Development Workbench for Machine Learning Oriented Biomedical Text Mining System
Kyungduk Kim | Yu Song | Gary Geunbae Lee
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations