Of the vast wealth of information unlocked by the Internet, most is plain text. The data necessary to answer myriad questions – about, say, the correlations between the industrial use of certain chemicals and incidents of disease, or between patterns of news coverage and voter-poll results – may all be online. But extracting it from plain text and organizing it for quantitative analysis may be prohibitively time-consuming.
Information extraction – or automatically classifying data items stored as plain text – is thus a major topic of artificial-intelligence research.
Last week, at the Association for Computational Linguistics’ Conference on Empirical Methods in Natural Language Processing, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory won a best-paper award for a new approach to information extraction that turns conventional machine learning on its head.