Conference on Empirical Methods in Natural Language Processing 2017


EMNLP 2017, one of the top conferences in NLP, will be held on September 7-11, 2017 in Copenhagen, Denmark. The lists of accepted papers and best papers are now available on the official website. There were 1509 submissions this year, a 40% increase over last year, including 836 long papers and 582 short papers. Of those, only 323 were accepted: 216 long papers (an acceptance rate of 25.8%) and 107 short papers (an acceptance rate of 18.4%).
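As a quick sanity check, the acceptance rates can be recomputed from the counts quoted above. This is only a minimal sketch: the variable and function names are made up for illustration, and the overall rate it prints is derived here from the long and short counts, not a figure reported by the conference. Note that 836 + 582 = 1418, which is 91 fewer than the 1509 total submissions; the figures above do not say how the remaining submissions were handled.

```python
# Counts quoted in the paragraph above (long and short paper tracks).
submitted = {"long": 836, "short": 582}
accepted = {"long": 216, "short": 107}


def acceptance_rate(n_accepted, n_submitted):
    """Return the acceptance rate as a percentage, rounded to one decimal."""
    return round(100 * n_accepted / n_submitted, 1)


# Per-track rates, recomputed from the raw counts.
rates = {kind: acceptance_rate(accepted[kind], submitted[kind]) for kind in submitted}

# Derived figures (assumption: "overall" means long + short tracks combined).
total_accepted = sum(accepted.values())
overall = acceptance_rate(total_accepted, sum(submitted.values()))

print(rates)           # {'long': 25.8, 'short': 18.4}
print(total_accepted)  # 323
print(overall)         # 22.8
```

The recomputed per-track rates match the 25.8% and 18.4% quoted above, and the accepted counts sum to the stated 323.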

Best papers

Best long papers

Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

Abstract: Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively.

Depression and Self-Harm Risk Assessment in Online Forums

Best short paper

Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog

Abstract: A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, all learned without any human supervision! In this paper, using a Task and Talk reference game between two agents as a testbed, we present a sequence of ‘negative’ results culminating in a ‘positive’ one – showing that while most agent-invented languages are effective (i.e. achieve near-perfect task rewards), they are decidedly not interpretable or compositional. In essence, we find that natural language does not emerge ‘naturally’, despite the semblance of ease of natural-language-emergence that one may gather from recent literature. We discuss how it is possible to coax the invented languages to become more and more human-like and compositional by increasing restrictions on how two agents may communicate.

Best resource paper

Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

Abstract: Concept maps can be used to concisely represent important information and bring structure into large document collections. Therefore, we study a variant of multi-document summarization that produces summaries in the form of concept maps. However, suitable evaluation datasets for this task are currently missing. To close this gap, we present a newly created corpus of concept maps that summarize heterogeneous collections of web documents on educational topics. It was created using a novel crowdsourcing approach that allows us to efficiently determine important elements in large document collections. We release the corpus along with a baseline system and proposed evaluation protocol to enable further research on this variant of summarization.

Keynote speeches

There will be three keynote speeches.

"Does This Vehicle Belong to You"? Processing the Language of Policing for Improving Police-Community Relations, by Dan Jurafsky, Stanford University;
Towards more universal language technology: unsupervised learning from speech, by Sharon Goldwater, University of Edinburgh;
Physical simulation, learning and language, by Nando de Freitas, Google DeepMind.

Written on August 18, 2017