Downloading data from a social network
We are first going to download a corpus of data from Twitter and use it to sort out spam from useful content. Twitter provides a robust API for collecting information from its servers and this API is free for small-scale usage. It is, however, subject to some conditions that you'll need to be aware of if you start using Twitter's data in a commercial setting.
First, you'll need to sign up for a Twitter account (which is free). Go to http://twitter.com and register an account if you do not already have one.
Next, you'll need to stay within the API's rate limit, which caps the number of requests you can make in a given time window. This limit is currently 15 requests per 15 minutes (it depends on the exact API). It can be tricky to ensure that you don't breach this limit, so it is highly recommended that you use a library to talk to Twitter's API.
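To make the idea of a request limit concrete, here is a minimal sketch of a sliding-window rate limiter. It is only an illustration of the bookkeeping a client library would otherwise handle for you, not Twitter's own mechanism. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and the defaults simply mirror the 15 requests per 15 minutes figure quoted above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any rolling window of `period` seconds.

    Hypothetical helper for illustration; the defaults assume the
    15-requests-per-15-minutes limit mentioned in the text.
    """

    def __init__(self, max_calls=15, period=15 * 60,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested
        self.sleep = sleep    # without actually waiting
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Return the number of seconds to wait before the next call."""
        now = self.clock()
        # Forget calls that have already left the rolling window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        # The window is full: wait until the oldest call expires.
        return self.period - (now - self.calls[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            self.sleep(delay)
            delay = self.wait_time()
        self.calls.append(self.clock())
```

You would call `acquire()` immediately before each web request, so that the sixteenth request inside any 15-minute span waits until the oldest one has aged out. The actual limits vary by endpoint, so treat the defaults as placeholders to be checked against the API documentation.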
Note
If you are using your own code (that is, making the web calls yourself) to connect with a web-based API, ensure that you read the documentation...