Summarizing Wikipedia Articles
A common analogy holds that data is to this century what oil was to the previous one. Human-generated text is part of this valuable resource, and unlike oil, it keeps increasing. Undoubtedly, the amount of textual data available from various sources has exploded. With the advent of Web 2.0, online users ceased to be mere consumers of this material and became content creators themselves, further adding to the abundance of online text. But the more content is available online, the harder it becomes to discover and consume the most important information efficiently. Automatically distilling the gist of longer texts into an accurate summary, thus eliminating irrelevant content, is urgently needed. Once more, machines can undertake this role.
This chapter introduces another challenging topic in natural language processing (NLP) and demystifies methods for text summarization. To implement the pertinent systems, we exploit data coming from the web...