Author List Clean-up Code
Revision as of 10:01, 17 April 2016
Media:author_name_normalization.tar.gz
A major challenge in automatically building an anthology from publications is correcting author names. The same author often appears under several different name variants across publications.
For example, the ACL Anthology contains 5 different versions of the author name "Rosé, Carolyn Penstein", as shown below.
- Rose, Carolyn P.
- Rosé, Carolyn Penstein
- Rosé, Carolyn P.
- Penstein Rosé, Carolyn P.
- Rosé, Carolyn
To resolve this, we have created a semi-automatically cleaned list of all author names in the ACL Anthology. This "master list" contains 13,692 distinct authors. In addition to the master list, we provide code for the following tasks:
1. Finding the canonical version of an author name from the field of computational linguistics, using many different heuristics, provided the name exists in the master list (available as part of the package).
2. Automatically changing variant versions of a name to the suggested canonical name, incorporating any manual corrections supplied by the user.
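The two tasks above can be illustrated with a minimal Python sketch. It is not the code shipped in the package, and its single heuristic (compare the accent-stripped last word of the surname plus the first given-name initial) is an assumption chosen for illustration; the package itself is described as using many different heuristics. The function names `name_key`, `build_index`, `find_canonical`, and `normalize_names`, as well as the `manual` corrections dictionary, are hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks so that 'Rosé' and 'Rose' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_key(name):
    """Reduce 'Last, First Middle' to (last word of surname, first initial).

    This single heuristic is an illustrative assumption, not the package's method.
    """
    last, _, given = strip_accents(name).lower().partition(",")
    last_tokens = last.split()
    given_tokens = given.replace(".", " ").split()
    surname = last_tokens[-1] if last_tokens else ""
    initial = given_tokens[0][0] if given_tokens else ""
    return surname, initial


def build_index(master_list):
    """Map each key to the canonical names in the master list that share it."""
    index = {}
    for canonical in master_list:
        index.setdefault(name_key(canonical), []).append(canonical)
    return index


def find_canonical(name, index, manual=None):
    """Task 1: return the canonical name, or None if unknown or ambiguous."""
    if manual and name in manual:
        return manual[name]  # manual corrections take priority
    candidates = index.get(name_key(name), [])
    return candidates[0] if len(candidates) == 1 else None


def normalize_names(names, index, manual=None):
    """Task 2: replace each name by its canonical form when one is found."""
    return [find_canonical(n, index, manual) or n for n in names]


# Toy master list (the real one has 13,692 authors).
master = ["Rosé, Carolyn Penstein", "Radev, Dragomir R."]
index = build_index(master)
variants = ["Rose, Carolyn P.", "Penstein Rosé, Carolyn P.", "Rosé, Carolyn"]
cleaned = normalize_names(variants, index)
```

A key shared by more than one canonical name is treated as ambiguous and left unchanged, which is where a user-supplied entry in `manual` would settle the case.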
- Contact radev@umich.edu for assistance with this software.