Modern R Programming Cookbook: Recipes to simplify your statistical applications

Product type Paperback

Published in Oct 2017

Publisher Packt

ISBN-13 9781787129054

Length 236 pages

Edition 1st Edition

Author (1):

Jaynal Abedin

Table of Contents (10) Chapters

Preface

1. Installing and Configuring R and its Libraries

2. Data Structures in R

3. Writing Customized Functions

4. Conditional and Iterative Operations

5. R Objects and Classes

6. Querying, Filtering, and Summarizing

7. R for Text Processing

8. R and Databases

9. Parallel Processing in R

Extracting unstructured text data from a plain web page

Web pages are among the most popular sources of text data: newspaper articles, blog posts, personal biography pages, and much more. Wikipedia articles are another source of plain text. In this recipe, you will learn how to extract text data from a plain web page.

Getting ready

This recipe extracts unstructured text data from the following web page:

https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R

This page contains a brief description of programming with big data in R. The objective is to read the web page using R and store the text data...
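As a rough illustration of reading a page and keeping only its text, here is a minimal base R sketch. It is not necessarily the approach the recipe itself takes: the helper names `html_to_text` and `fetch_page_text` are hypothetical, and stripping tags with regular expressions is a simplification that a dedicated HTML parser would handle more robustly.

```r
# Hypothetical helper: turn raw HTML lines into a single plain-text string.
html_to_text <- function(html_lines) {
  html <- paste(html_lines, collapse = "\n")
  # Drop <script> and <style> blocks together with their contents.
  html <- gsub("(?is)<(script|style)\\b.*?</\\1>", " ", html, perl = TRUE)
  # Replace every remaining tag with a space.
  text <- gsub("<[^>]+>", " ", html)
  # Decode two common entities (a full decoder would cover many more).
  text <- gsub("&nbsp;", " ", text, fixed = TRUE)
  text <- gsub("&amp;", "&", text, fixed = TRUE)
  # Collapse runs of whitespace and trim the ends.
  trimws(gsub("\\s+", " ", text))
}

# Hypothetical wrapper: download a page and return its text.
# It needs network access, so it is defined here but not called.
fetch_page_text <- function(page_url) {
  html_to_text(readLines(page_url, warn = FALSE, encoding = "UTF-8"))
}

page_url <- "https://en.wikipedia.org/wiki/Programming_with_Big_Data_in_R"

# Offline check on a small inline document:
sample_html <- c(
  "<html><head><style>p {color: red;}</style></head>",
  "<body><p>Hello <b>big</b> data &amp; R</p></body></html>"
)
sample_text <- html_to_text(sample_html)
print(sample_text)
```

With a network connection, `fetch_page_text(page_url)` would return the page text as one string. The result would still include navigation and footer text, since this sketch does not try to isolate the article body.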

Authors (1)

Jaynal Abedin

Jaynal Abedin is currently doing research as a PhD student at the Unit for Biomedical Data Analytics (BDA) of INSIGHT at the National University of Ireland Galway. His research focuses on sports science and sports medicine in a targeted project with ORRECO, an Irish startup that provides evidence-based advice to individual athletes using biomarker and GPS data. Before joining INSIGHT as a PhD student, he led a team of statisticians at an international public health research organization (icddr,b). His primary role there was to develop internal statistical capabilities for researchers from various disciplines, and he was involved in designing and delivering statistical training to them. He has bachelor's and master's degrees in statistics, and he has written two books on R programming with Packt: Data Manipulation with R and R Graphs Cookbook (Second Edition). His current research interests are predictive modeling of probable athlete injury and scoring the extremeness of multivariate data to get an early signal of an anomaly. He also has an excellent reputation as a freelance R programmer and statistician on online platforms such as Upwork.
