= Program Committee =
Our report is here:   https://www.aclweb.org/adminwiki/index.php?title=File:ACL_Program_Co-Chairs_Report_July_2019.pdf
 
 
== Organising Committee ==
 
 
 
=== General Chair ===
 
Jill Burstein, Educational Testing Service, USA
 
 
 
=== Program Co-Chairs ===
 
Christy Doran, Interactions LLC, USA <br />
 
Thamar Solorio, University of Houston, USA
 
 
 
=== Industry Track Co-chairs ===
 
Rohit Kumar<br />
 
Anastassia Loukina, Educational Testing Service, USA<br />
 
Michelle Morales, IBM, USA
 
 
 
=== Workshop Co-Chairs ===
 
Smaranda Muresan, Columbia University, USA<br />
 
Swapna Somasundaran, Educational Testing Service, USA<br />
 
Elena Volodina, University of Gothenburg, Sweden
 
 
 
=== Tutorial Co-Chairs ===
 
Anoop Sarkar, Simon Fraser University, Canada<br />
 
Michael Strube, Heidelberg Institute for Theoretical Studies, Germany
 
 
 
=== System Demonstration Co-Chairs ===
 
Waleed Ammar, Allen Institute for AI, USA<br />
 
Annie Louis, University of Edinburgh, Scotland<br />
 
Nasrin Mostafazadeh, Elemental Cognition, USA
 
 
 
=== Publication Co-Chairs ===
 
Stephanie Lukin, U.S. Army Research Laboratory<br />
 
Alla Roskovskaya, City University of New York, USA
 
 
 
=== Handbook Chair ===
 
Steve DeNeefe, SDL, USA
 
 
 
=== Student Research Workshop Co-Chairs & Faculty Advisors ===
 
Sudipta Kar, University of Houston, USA<br />
 
Farah Nadeem, University of Washington, USA<br />
 
Laura Wendlandt, University of Michigan, USA<br />
 
Greg Durrett, University of Texas at Austin, USA<br />
 
Na-Rae Han, University of Pittsburgh, USA
 
 
 
=== Diversity & Inclusion Co-Chairs ===
 
Jason Eisner, Johns Hopkins University, USA<br />
 
Natalie Schluter, IT University, Copenhagen, Denmark
 
 
 
=== Publicity & Social Media Co-Chairs ===
 
Yuval Pinter, Georgia Institute of Technology, USA <br />
 
Rachael Tatman, Kaggle, USA
 
 
 
=== Website & Conference App Chair ===
 
Nitin Madnani, Educational Testing Service, USA
 
 
 
=== Student Volunteer Coordinator ===
 
Lu Wang, Northeastern University, USA
 
 
 
=== Video Chair ===
 
Spencer Whitehead, Rensselaer Polytechnic Institute, USA
 
 
 
=== Remote Presentation Co-Chairs ===
 
Meg Mitchell, Google, USA<br />
 
Abhinav Misra, Educational Testing Service, USA
 
 
 
=== Local Sponsorship Co-Chairs ===
 
Chris Callison-Burch, University of Pennsylvania, USA<br />
 
Tonya Custis, Thomson Reuters, USA
 
 
 
=== Local Organization ===
 
Priscilla Rasmussen, ACL
 
 
 
= Area Chairs =
 
'''Biomedical NLP & Clinical Text Processing'''<br />
Bridget McInnes, Virginia Commonwealth University, USA<br />
Byron C. Wallace, Northeastern University, USA

'''Cognitive Modeling – Psycholinguistics'''<br />
Serguei Pakhomov, University of Minnesota, USA<br />
Emily Prud’hommeaux, Boston College, USA

'''Dialog and Interactive systems'''<br />
Nobuhiro Kaji, Yahoo Japan Corporation, Japan<br />
Zornitsa Kozareva, Google, USA<br />
Sujith Ravi, Google, USA<br />
Michael White, Ohio State University, USA

'''Discourse and Pragmatics'''<br />
Ruihong Huang, Texas A&M University, USA<br />
Vincent Ng, University of Texas at Dallas, USA

'''Ethics, Bias and Fairness'''<br />
Saif Mohammad, National Research Council Canada, Canada<br />
Mark Yatskar, University of Washington, USA

'''Generation'''<br />
He He, Amazon Web Services, USA<br />
Wei Xu, Ohio State University, USA<br />
Yue Zhang, Westlake University, China

'''Information Extraction'''<br />
Heng Ji, Rensselaer Polytechnic Institute, USA<br />
David McClosky, Google, USA<br />
Gerard de Melo, Rutgers University, USA<br />
Timothy Miller, Boston Children’s Hospital, USA<br />
Mo Yu, IBM Research, USA

'''Information Retrieval'''<br />
Sumit Bhatia, IBM’s India Research Laboratory, India<br />
Dina Demner-Fushman, US National Library of Medicine, USA

'''Machine Learning for NLP'''<br />
Ryan Cotterell, Johns Hopkins University, USA<br />
Daichi Mochihashi, The Institute of Statistical Mathematics, Japan<br />
Marie-Francine Moens, KU Leuven, Belgium<br />
Vikram Ramanarayanan, Educational Testing Service, USA<br />
Anna Rumshisky, University of Massachusetts Lowell, USA<br />
Natalie Schluter, IT University of Copenhagen, Denmark

'''Machine Translation'''<br />
Rafael E. Banchs, HLT Institute for Infocomm Research A*Star, Singapore<br />
Daniel Cer, Google Research, USA<br />
Haitao Mi, Ant Financial US, USA<br />
Preslav Nakov, Qatar Computing Research Institute, Qatar<br />
Zhaopeng Tu, Tencent, China

'''Mixed Topics'''<br />
Ion Androutsopoulos, Athens Univ. of Economics and Business, Greece<br />
Steven Bethard, University of Arizona, USA

'''Multilingualism, Cross lingual resources'''<br />
Željko Agić, IT University of Copenhagen, Denmark<br />
Ekaterina Shutova, University of Amsterdam, Netherlands<br />
Yulia Tsvetkov, Carnegie Mellon University, USA<br />
Ivan Vulic, Cambridge University, UK

'''NLP Applications'''<br />
T. J. Hazen, Microsoft, USA<br />
Alessandro Moschitti, Amazon, USA<br />
Shimei Pan, University of Maryland Baltimore County, USA<br />
Wenpeng Yin, University of Pennsylvania, USA<br />
Su-Youn Yoon, Educational Testing Service, USA

'''Phonology, Morphology and Word Segmentation'''<br />
Ramy Eskander, Columbia University, USA<br />
Grzegorz Kondrak, University of Alberta, Canada

'''Question Answering'''<br />
Eduardo Blanco, University of North Texas, USA<br />
Christos Christodoulopoulos, Amazon, USA<br />
Asif Ekbal, Indian Institute of Technology Patna, India<br />
Yansong Feng, Peking University, China<br />
Tim Rocktäschel, Facebook, USA<br />
Avi Sil, IBM Research, USA

'''Resources and Evaluation'''<br />
Torsten Zesch, University of Duisburg-Essen, Germany<br />
Tristan Miller, Technische Universität Darmstadt, Germany

'''Semantics'''<br />
Ebrahim Bagheri, Ryerson University, Canada<br />
Samuel Bowman, New York University, USA<br />
Matt Gardner, Allen Institute for Artificial Intelligence, USA<br />
Kevin Gimpel, Toyota Technological Institute at Chicago, USA<br />
Daisuke Kawahara, Kyoto University, Japan<br />
Carlos Ramisch, Aix Marseille University, France

'''Sentiment Analysis'''<br />
Isabelle Augenstein, University of Copenhagen, Denmark<br />
Wai Lam, The Chinese University of Hong Kong, Hong Kong<br />
Soujanya Poria, Nanyang Technological University, Singapore<br />
Ivan Vladimir Meza Ruiz, UNAM, Mexico

'''Social Media'''<br />
Dan Goldwasser, Purdue University, USA<br />
Michael J. Paul, University of Colorado Boulder, USA<br />
Sara Rosenthal, IBM Research, USA<br />
Paolo Rosso, Universitat Politècnica de València, Spain<br />
Chenhao Tan, University of Colorado Boulder, USA<br />
Xiaodan Zhu, Queen’s University, Canada

'''Speech'''<br />
Keelan Evanini, Educational Testing Service, USA<br />
Yang Liu, LAIX Inc, USA

'''Style'''<br />
Beata Beigman Klebanov, Educational Testing Service, USA<br />
Manuel Montes, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico<br />
Joel Tetreault, Grammarly, USA

'''Summarization'''<br />
Mohit Bansal, University of North Carolina Chapel Hill, USA<br />
Fei Liu, University of Central Florida, USA<br />
Ani Nenkova, University of Pennsylvania, USA

'''Tagging, Chunking, Syntax and Parsing'''<br />
Adam Lopez, University of Edinburgh, Scotland<br />
Roi Reichart, Technion – Israel Institute of Technology, Israel<br />
Agata Savary, University of Tours, France<br />
Guillaume Wisniewski, Université Paris Sud, France

'''Text Mining'''<br />
Kai-Wei Chang, University of California Los Angeles, USA<br />
Anna Feldman, Montclair State University, USA<br />
Shervin Malmasi, Harvard Medical School, USA<br />
Verónica Pérez-Rosas, University of Michigan, USA<br />
Kevin Small, Amazon, USA<br />
Diyi Yang, Carnegie Mellon University, USA

'''Theory and Formalisms'''<br />
Valia Kordoni, Humboldt University Berlin, Germany<br />
Andreas Maletti, University of Stuttgart, Germany

'''Vision, Robotics and other grounding'''<br />
Francis Ferraro, University of Maryland Baltimore County, USA<br />
Vicente Ordóñez, University of Virginia, USA<br />
William Yang Wang, University of California Santa Barbara, USA
 
 
 
= Main Innovations =
 
== Conference theme ==
 
The CFP made a special request for papers addressing the tension between data privacy and model bias in NLP, including: using NLP for surveillance and profiling, balancing the need for broadly representative data sets with protections for individuals, understanding and addressing model bias, and where bias correction becomes censorship. The three invited speakers were all selected to tie into the theme, and a Best Thematic Paper was selected.
 
 
 
== Land Acknowledgement ==

Similar to what has been done at recent *CL conferences, the opening session included a land acknowledgement to recognize and honor Indigenous Peoples.
 
 
 
== Video Poster Highlights ==

New this year: one-minute slides with pre-recorded audio showcasing the posters to be presented that day. The goal was to give posters more visibility. These were shown during the welcome reception, breakfasts, and breaks.
 
 
 
== Remote Presentations ==

Remote presentations were supported for both talks and posters; presenters applied via a form submitted to the committee.
 
 
 
== Diversity & Inclusion Organization ==

The new Diversity & Inclusion team piloted a number of new initiatives, including:

* additional questions on the registration form to identify any needed accommodations
* preferred pronouns (optionally) added to badges
* “I’m hiring” / “I’m looking for a job” / “I’m new” badge stickers
* <bunch of others, pull from their report>?
 
 
 
= Submissions =
 
This year we followed a two-stage submission process, in which abstracts were due one week before full papers. Our goal was to get a head start on assigning papers to areas, and recruiting additional area chairs where submissions exceeded our predicted volume.
 
Pro: early response to areas with larger than predicted number of papers
 
Con: too much overhead for PCs, as authors repeatedly contacted chairs to request that papers be moved between long and short, or asked about changes to authorship, titles and abstracts.
 
 
 
Full papers available for bidding: reviewers loved it, authors did not
 
 
 
== 3.1 An overview of statistics ==
 
Authors were permitted to switch format (long/short) when they submitted the full papers, so the chart below uses 2271 as the total number of submissions, discounting the 103 abstracts that never became full papers in the second phase. Seventy-nine papers were desk-rejected due to anonymity, formatting, or dual-submission violations; 456 papers were withdrawn before acceptance decisions were sent (some part-way through the review process); and an additional 11 papers were withdrawn after acceptance notifications had been sent. Keeping the acceptance rate consistent with past years meant 5 parallel tracks were needed to fit more papers into 3 days; as the conference grows, decisions will have to be made about continuing to add tracks, adding days to the main conference, or lowering the acceptance rate. The overall technical program consisted of 423 main conference papers, plus 9 TACL papers, 23 SRW papers, 28 Industry papers, and 24 demos. The TACL and SRW papers were integrated into the program and marked SRW or TACL accordingly.
 
 
 
Acceptance break-down:
 
{| class="wikitable"
|-
!  !! Long !! Short !! Total !! TACL
|-
| Reviewed || 1067 || 666 || 1733 ||
|-
| Accepted as talk || 140 || 72 || 212 || 4
|-
| Accepted as poster || 141 || 70 || 211 || 5
|-
| Total accepted || 281 (26.3%) || 142 (21.3%) || 423 (24.4%) || 9
|}
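(The percentages are acceptance rates computed against the number of papers reviewed in each column: 281/1067 ≈ 26.3% for long papers, 142/666 ≈ 21.3% for short papers, and 423/1733 ≈ 24.4% overall.)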
 
== 3.2 Detailed statistics by area ==

{| class="wikitable"
|-
! Area !! Long (%) !! Short (%)
|-
| Bio and clinical NLP || 7 (57) || 28 (17)
|-
| Question Answering || 73 (36) || 41 (17)
|-
| Cognitive modeling || 24 (29) || 14 (14)
|-
| Resources and Evaluation || 33 (27) || 20 (20)
|-
| Dialog and Interactive systems || 64 (20) || 18 (27)
|-
| Semantics || 80 (13) || 42 (11)
|-
| Discourse and Pragmatics || 38 (21) || 11 (36)
|-
| Sentiment Analysis || 32 (28) || 40 (20)
|-
| Ethics, Bias and Fairness || 16 (25) || 12 (50)
|-
| Social Media || 44 (18) || 41 (36)
|-
| Generation || 46 (14) || 19 (23)
|-
| Speech || 19 (31) || 9 (33)
|-
| Information Extraction || 46 (28) || 16 (12)
|-
| Style || 24 (25) || 16 (25)
|-
| Information Retrieval || 22 (22) || 13 (30)
|-
| Summarization || 22 (27) || 28 (28)
|-
| Machine Learning for NLP || 100 (29) || 22 (22)
|-
| Syntax || 36 (52) || 54 (13)
|-
| Machine Translation || 49 (30) || 53 (18)
|-
| Text Mining || 101 (18) || 29 (24)
|-
| Multilingual NLP || 43 (25) || 28 (10)
|-
| Theory and Formalisms || 12 (58) || 12 (16)
|-
| NLP Applications || 60 (30) || 41 (17)
|-
| Vision & Robotics || 41 (12) || 22 (36)
|-
| Phonology || 24 (33) || 24 (25)
|}
 
== 3.3 Conference tracks ==
 
The Industry Track, in its second year, had 28 accepted papers (10 oral and 18 posters; acceptance rate ~28%), and ran a lunchtime Careers in Industry panel that was very well attended. Panelists were Judith Klavans, Yunyao Li, Owen Rambow, and Joel Tetreault; the moderator was Phil Resnik.
 
 
 
The Student Research Workshop had 23 accepted papers, distributed throughout the conference, and 19 submissions received pre-submission mentoring. For the first time, both archival and non-archival submissions were offered, meaning that authors who opted for the non-archival version will not have a paper available in the archive and are free to publish elsewhere.
 
 
 
There were 25 accepted Demos, which were spread across several of the poster sessions.
 
 
 
= Review Process =
 
We issued a wide call for volunteers for Area Chairs (ACs) and reviewers. Volunteers were screened by the PCs and assigned AC or reviewer roles.

The PCs created 25 specific areas plus one for “Mixed Topics” and assigned at least 2 ACs per area. After the abstract deadline we added more ACs to areas with larger-than-predicted submission counts.
 
 
 
We used a hybrid structured review form combining elements of the EMNLP 2018, NAACL-HLT 2018, and ACL 2018 forms (see Section 4.5 for details).
 
 
 
Authors were blind to Area Chairs.

Review assignment:
* Criteria: fairness, expertise, interest
* Method: area chair expertise + Toronto Paper Matching System (TPMS) + reviewer bids
* Many reviewers did not have TPMS profiles
* Goal was no more than 5 papers per reviewer; some reviewers agreed to handle more
* First-round accept/reject suggestions were made by area chairs
* Final decisions were made by the program chairs
 
 
 
 
 
* No author response: dropped due to time constraints and the NAACL 2018 finding that it had little impact. Authors were unhappy about this; they really want to be able to respond to reviews.
* Video poster highlights: replaced the 1-minute madness session; A/V failures made it hard to assess their effectiveness.
* SRW papers integrated into sessions: positive feedback from participants, a better experience for students.
* Test of Time awards from 2018 were not repeated; should this happen every N years to allow for a sliding window?
 
 
 
 
 
== 4.1 Recruiting area chairs (ACs) and reviewers ==
 
 
 
{| class="wikitable"
|-
! Response !! Area Chairs (%) !! Reviewers (%)
|-
| Female || 24.4 || 25.2
|-
| Male || 73 || 71.7
|-
| Prefer not to answer || 2.6 || 3.1
|}
 
 
 
 
 
 
 
== 4.2 Assigning papers to areas and reviewers ==
 
Assignment to areas was based on keywords and manual inspection of the paper. Assignment of papers to reviewers followed a combination of TPMS, reviewer bidding, and manual tweaking.
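For illustration only, the sketch below shows one way such a combination could look in code: it greedily ranks reviewers for each paper by a weighted sum of a TPMS-style affinity score and the reviewer’s bid, while respecting the stated goal of at most 5 papers per reviewer. All names and the 0.25 bid weight are hypothetical; the actual assignment used TPMS and bids inside START, followed by manual tweaking by the area chairs.

<syntaxhighlight lang="python">
# Illustrative sketch only (not the actual NAACL-HLT 2019 pipeline): greedily
# rank reviewers for each paper by affinity + bid, capping reviewer load at 5.
from collections import defaultdict

REVIEWS_PER_PAPER = 3
MAX_LOAD = 5  # stated goal: no more than 5 papers per reviewer


def score(paper, reviewer, tpms_affinity, bids):
    """Combine a TPMS affinity (0-1) with the reviewer's bid (0-3) for one paper."""
    affinity = tpms_affinity.get((paper, reviewer), 0.0)  # 0.0 if no TPMS profile
    bid = bids.get((paper, reviewer), 1)                   # 1 = "no preference"
    return affinity + 0.25 * bid


def assign(papers, reviewers, tpms_affinity, bids):
    """Return a dict mapping each paper to its list of assigned reviewers."""
    load = defaultdict(int)         # number of papers already given to each reviewer
    assignment = defaultdict(list)
    for paper in papers:
        ranked = sorted(reviewers,
                        key=lambda r: score(paper, r, tpms_affinity, bids),
                        reverse=True)
        for reviewer in ranked:
            if len(assignment[paper]) == REVIEWS_PER_PAPER:
                break
            if load[reviewer] < MAX_LOAD:
                assignment[paper].append(reviewer)
                load[reviewer] += 1
    return assignment
</syntaxhighlight>

In practice, the manual tweaking step (area chairs moving papers between reviewers) matters at least as much as the automatic scores, which a greedy sketch like this cannot capture.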
 
 
 
== 4.3 Deciding on the reject-without-review papers ==
 
Our process for identifying desk rejects was very similar to what other PCs have done in the past. First, the area chairs checked their batch of assigned papers and reported any issues to us. As reviewing began, reviewers sometimes identified issues that were not caught by ACs, which they flagged to the ACs or directly to the PCs. We then reviewed each of these issues and made a final decision, to ensure that papers were handled consistently. This means each paper was reviewed for non-content issues by at least three people.
 
The major categories for desk rejects were:

* violations of the dual submission policy specified in the call for papers
* violations of the anonymity policy specified in the call for papers
* “format cheating”: submissions not following the clearly stated format and style guidelines for either LaTeX or Word (thanks to Emily and Leon for introducing the concept)
 
As of February 7th, out of 2378 submissions there had been 44 rejections for format issues, 24 for anonymity violations, and 11 for dual submissions; that is, 79 papers, or roughly 3% of the submissions, were desk-rejected.
 
 
 
== 4.4 A large pool of reviewers ==
 
Similar to what other PCs have done in the past, we distributed a wide call for volunteers to recruit Area Chairs and reviewers: we seeded the areas with volunteers who responded, and the Area Chairs then filled out the remainder of their respective committees. Our goal was to increase diversity by including in each area some participants who had not previously been involved, and who therefore would not have been invited if the committees had been built from lists of previous reviewers. 390 of the 1321 reviewers were reviewing for NAACL for the first time.
 
 
 
== 4.5 Structured review form ==
 
We used a hybrid review form combining elements of the EMNLP 2018, NAACL-HLT 2018, and ACL 2018 forms. It featured a 6-point overall rating scale (so there was no “easy out” mid-point), distinct sections for summary, strengths, and weaknesses to make it easy to scan and compare the relevant parts of each review, and START’s minimum-length feature, enabled to elicit more consistently substantive content for the authors. The form received excellent feedback from authors, but some reviewers complained about it and others outright circumvented it via HTML tags or repeated filler content.
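As a purely illustrative sketch (not the actual START configuration), the form can be thought of as a small schema with per-field constraints; the field names and the 200-character minimum below are invented for the example:

<syntaxhighlight lang="python">
# Hypothetical schema mirroring the review form described above:
# separate summary / strengths / weaknesses fields with a minimum length,
# and a 6-point overall rating with no neutral mid-point.
REVIEW_FORM = {
    "summary":        {"min_length": 200},
    "strengths":      {"min_length": 200},
    "weaknesses":     {"min_length": 200},
    "overall_rating": {"scale": [1, 2, 3, 4, 5, 6]},
}


def validate(review):
    """Return a list of problems with a submitted review dict."""
    problems = []
    for field, spec in REVIEW_FORM.items():
        value = review.get(field)
        if "min_length" in spec and len(value or "") < spec["min_length"]:
            problems.append(f"'{field}' is shorter than {spec['min_length']} characters")
        if "scale" in spec and value not in spec["scale"]:
            problems.append(f"'{field}' must be one of {spec['scale']}")
    return problems
</syntaxhighlight>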
 
 
 
== 4.6 Abstract submissions ==
 
 
 
As noted above, we followed a two-stage submission process in which abstracts were due one week before full papers, so that we could get a head start on assigning papers to areas and recruiting additional area chairs where submissions exceeded our predicted volume. Relative to the projected numbers from NAACL-HLT 2018, several areas received a higher-than-predicted number of submissions: Biomedical/Clinical, Dialogue, and Vision. Text Mining ended up with the largest number of submissions overall.
 
 
 
== 4.7 Review process ==
 
 
 
 
 
 
X reviews were received by the end of the review period, Y others within the next week.
 
 
 
Importance of double blind reviewing
 
 
 
== 4.9 Statistics ==
 
 
 
= Best paper awards =
 
 
 
Best Thematic Paper
 
What’s in a Name? Reducing Bias in Bios Without Access to Protected Attributes
 
Alexey Romanov, Maria De-Arteaga, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky and Adam Kalai
 
 
 
Best Explainable NLP Paper
 
CNM: An Interpretable Complex-valued Network for Matching
 
Qiuchi Li, Benyou Wang and Massimo Melucci
 
 
 
Best Long Paper
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova
 
 
 
Best Short Paper
 
Probing the Need for Visual Context in Multimodal Machine Translation
 
Ozan Caglayan, Pranava Madhyastha, Lucia Specia and Loïc Barrault
 
 
 
Best Resource Paper
 
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
 
Alon Talmor, Jonathan Herzig, Nicholas Lourie and Jonathan Berant
 
 
 
= Presentations =

* Long-paper presentations: 22 sessions in total (4 sessions in parallel); 15-minute talk + 3 minutes for questions; plus 2 dedicated Industry Track sessions
* Short-paper presentations: 12 sessions in total (4 sessions in parallel); 12-minute talk + 3 minutes for questions
* Best-paper presentation: 1 session at the end of the last day
* Posters: 8 sessions in total (1 session in parallel with every non-plenary talk session) + 1 dedicated Industry Poster session
 
 
 
= Timeline =
 
 
 
= Issues and recommendations =
 
 
 
TBD
 
