You're reading from Haskell Data Analysis Cookbook: Explore intuitive data analysis techniques and powerful machine learning methods using over 130 practical recipes

Product type Paperback

Published in Jun 2014

ISBN-13 9781783286331

Length 334 pages

Edition 1st Edition

Languages

Haskell

Concepts

Data Analysis

Author (1):

Nishant Shukla

Table of Contents (14) Chapters

Preface

1. The Hunt for Data

2. Integrity and Inspection

3. The Science of Words

4. Data Hashing

5. The Dance with Trees

6. Graph Fundamentals

7. Statistics and Analysis

8. Clustering and Classification

9. Parallel and Concurrent Design

10. Real-time Data

11. Visualizing Data

12. Exporting and Presenting

Index

Understanding how to perform HTTP GET requests

The web is one of the richest sources of data. A GET request is the most common way to retrieve a resource from an HTTP web server. In this recipe, we will fetch a Wikipedia article with a GET request, grab all the links from it, and print them to the terminal. To do so, we will use a library called HandsomeSoup, which lets us traverse and manipulate a web page through CSS selectors.

Getting ready

We will be collecting all the links from a Wikipedia web page, so make sure you have an Internet connection before running this recipe.

Install the HandsomeSoup CSS selector package, and install the HXT library as well if you do not already have it. To do this, use the following commands:

$ cabal install HandsomeSoup
$ cabal install hxt

How to do it...

This recipe uses hxt to parse HTML and HandsomeSoup for its easy-to-use CSS selectors. Import both, as shown in the following code snippet:
```
import Text.XML.HXT.Core
import Text.HandsomeSoup
```
Define and implement main as follows:
```
main :: IO ()
main = do
```

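Pass the URL as a string to HandsomeSoup's fromUrl function. This only builds an arrow that describes the fetch; the request is sent when runX executes the arrow in the next step: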

    let doc = fromUrl "http://en.wikipedia.org/wiki/Narwhal"

Select all the links inside the element whose ID is bodyContent on the Wikipedia page, extract their href attributes, and print the result as follows:

    links <- runX $ doc >>> css "#bodyContent a" ! "href"
    print links

How it works...

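The HandsomeSoup package adds CSS selector support on top of HXT. In this recipe, fromUrl issues the GET request and parses the response as HTML. The #bodyContent a selector then finds every anchor (a) element that is a descendant of the element with the bodyContent ID, and ! "href" extracts the href attribute of each match. Finally, runX executes the resulting arrow and returns the attribute values as a list of strings, which print writes to the terminal.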
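To see what the selector matches without depending on a network connection, the same pipeline can be run against an HTML string. The following is a minimal sketch, not part of the original recipe: it assumes HandsomeSoup's parseHtml function as the offline counterpart of fromUrl, and the sampleHtml page and the extractLinks helper are hypothetical names introduced only for illustration.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical stand-in for a downloaded page: one link sits outside
-- the #bodyContent element, and two links sit inside it.
sampleHtml :: String
sampleHtml = unlines
  [ "<html><body>"
  , "<a href=\"/outside\">outside</a>"
  , "<div id=\"bodyContent\">"
  , "<p><a href=\"/wiki/Arctic\">Arctic</a></p>"
  , "<a href=\"/wiki/Whale\">Whale</a>"
  , "</div>"
  , "</body></html>"
  ]

-- Hypothetical helper: runs the recipe's selector over an HTML string
-- instead of a URL, returning the href of every matching anchor.
extractLinks :: String -> IO [String]
extractLinks html = runX $ parseHtml html >>> css "#bodyContent a" ! "href"

main :: IO ()
main = do
  -- Every anchor on the page, regardless of where it appears
  allLinks <- runX $ parseHtml sampleHtml >>> css "a" ! "href"
  -- Only the anchors that descend from the #bodyContent element
  bodyLinks <- extractLinks sampleHtml
  putStrLn "All links:"
  mapM_ putStrLn allLinks
  putStrLn "Links inside #bodyContent:"
  mapM_ putStrLn bodyLinks
```

Under these assumptions, the second list should contain only the two links inside the div, including the one nested in a paragraph, because the space in the selector matches descendants at any depth rather than direct children only.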