Accessing an API with R
As we mentioned before, an ever-increasing proportion of our data resides on the Web and is made available through web APIs.
Note
In computer programming, application programming interfaces (APIs) are groups of procedures, protocols, and tools used for building software applications. APIs expose software in terms of input, output, and processes.
Web APIs are developed as an interface between web applications and third parties.
The typical structure of a web API is composed of a set of HTTP request messages that have answers with a predefined structure, usually in the XML or JSON format.
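As a minimal sketch of what such a predefined structure looks like from R, the following parses a small JSON answer into ordinary R objects. The payload and its field names are invented for illustration, and the jsonlite package is assumed to be installed.

```r
library(jsonlite)

# Hypothetical JSON answer, shaped like what a web API might return
answer <- '{"user": "alice", "followers": 42, "repos": ["a", "b"]}'

# fromJSON() turns the JSON text into ordinary R objects
parsed <- fromJSON(answer)

parsed$user       # a character string
parsed$followers  # a number
parsed$repos      # a character vector
```

Because the structure is fixed in advance by the API, the same field names can be relied upon across requests.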
Typical examples of data exposed through APIs are data regarding web and mobile applications, for instance, Google Analytics data or data regarding social networking activities.
The successful web application If This Then That (IFTTT), for instance, lets you link together different applications, making them share data with each other and building powerful and customizable workflows.
This useful job is done by leveraging the applications' APIs (if you don't know IFTTT, just navigate to https://ifttt.com, and I will see you there).
Using R, it is possible to authenticate and get data from APIs that adhere to the OAuth 1 and OAuth 2 standards, which are nowadays the most popular standards (even though opinions about these protocols are changing; refer to this popular post by Eran Hammer, former lead author of the OAuth 2.0 specification, at http://hueniverse.com/2012/07/26/oauth-2-0-and-the-road-to-hell/). Moreover, specific packages have been developed for a lot of APIs.
This recipe shows how to access custom APIs and leverage packages developed for specific APIs.
In the There's more... section, suggestions are given on how to develop custom functions for frequently used APIs.
Getting ready
The httr package, once again a product of our benefactor Hadley Wickham, provides a complete set of functionalities for sending and receiving data through the HTTP protocol on the Web. Take a look at the quick-start guide hosted on GitHub to get a feel for httr functionalities (https://github.com/r-lib/httr).
Among those functionalities, functions for dealing with APIs are provided as well.
Both OAuth 1.0 and OAuth 2.0 interfaces are implemented, making this package really useful when working with APIs.
Let's look at how to get data from the GitHub API. Along the way, I will point out which small sections to change in order to apply the same steps to whatever API you are interested in.
Let's now actually install and load the httr package:
install.packages("httr")
library(httr)
How to do it…
- The first step to connect with the API is to define the API endpoint. Specifications for the endpoint are usually given within the API documentation. For instance, GitHub gives this kind of information at http://developer.github.com/v3/oauth/.
In order to set the endpoint information, we are going to use the oauth_endpoint() function, which requires us to set the following arguments:
  - request: This is the URL that is required for the initial unauthenticated token. It is not used in OAuth 2.0, so you can leave it NULL in this case, since the GitHub API is based on this protocol.
  - authorize: This is the URL where it is possible to gain authorization for the given client.
  - access: This is the URL where the exchange for an authenticated token is made.
  - base_url: This is an optional URL prefix; when it is supplied, the preceding URLs can be given as paths relative to it.
In the GitHub example, this will translate to the following lines of code:
github_api <- oauth_endpoint(request = NULL,
                             authorize = "authorize",
                             access = "access_token",
                             base_url = "https://github.com/login/oauth")
- Create an application to get a key and secret token. Moving on with our GitHub example, in order to create an application, you will have to navigate to https://github.com/settings/applications/new (assuming that you are already authenticated on GitHub).
Be aware that no particular URL is needed as the homepage URL, but a specific URL is required as the authorization callback URL.
This is the URL that the API will redirect to after the method invocation is done.
As you would expect, since we want to establish a connection from GitHub to our local PC, you will have to redirect the API to your machine, setting the Authorization callback URL to http://localhost:1410.
After creating your application, you can get back to your R session to establish a connection with it and get your data.
- After getting back to your R session, you now have to set your OAuth credentials through the oauth_app() and oauth2.0_token() functions and establish a connection with the API, as shown in the following code snippet:
app <- oauth_app("your_app_name", key = "your_app_key", secret = "your_app_secret")
API_token <- oauth2.0_token(github_api, app)
- This is where you actually use the API to get data from your web-based software. Continuing on with our GitHub-based example, let's request some information about API rate limits:
request <- GET("https://api.github.com/rate_limit", config(token = API_token))
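Once a response has come back, its body still has to be turned into something usable. With a live response, httr's stop_for_status() and content() functions are the usual way to check for errors and obtain the parsed body. The sketch below only illustrates the step after that, using a hand-written list that mimics the shape of the rate limit answer; the helper name summarise_rate_limit() and all the numbers are assumptions made for illustration.

```r
# Sample list mimicking the parsed body of the rate limit answer;
# the numbers are made up for illustration
sample_limits <- list(
  resources = list(
    core = list(limit = 5000, remaining = 4999, reset = 1372700873)
  )
)

# Hypothetical helper: summarise the "core" rate limit from a parsed body
summarise_rate_limit <- function(parsed) {
  core <- parsed$resources$core
  list(
    limit = core$limit,
    remaining = core$remaining,
    # reset is expressed in seconds since the Unix epoch
    resets_at = as.POSIXct(core$reset, origin = "1970-01-01", tz = "UTC")
  )
}

limits_summary <- summarise_rate_limit(sample_limits)
limits_summary$remaining
limits_summary$resets_at
```

Keeping the parsing logic in a function that takes a plain list makes it possible to try it out without consuming any API calls.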
How it works...
Be aware that defining the endpoint is required for both OAuth 1.0 and OAuth 2.0 APIs. As far as the endpoint definition is concerned, the only difference between them is the absence of a request URL in OAuth 2.0, as we noted earlier.
Note
Endpoints for popular APIs
The httr package comes with a set of endpoints that are already implemented for popular APIs, and specifically for the following websites:
- Vimeo
- GitHub
For these APIs, you can substitute the call to oauth_endpoint() with a call to the oauth_endpoints() function, for instance:
oauth_endpoints("github")
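To get a feel for what a predefined endpoint holds, it can be inspected like a list. This is a small sketch assuming the httr package is installed; no network access is needed, since the URLs are stored in the package itself.

```r
library(httr)

# Retrieve the endpoint definition shipped with httr for GitHub
github_api <- oauth_endpoints("github")

# The endpoint is a list-like object holding the URLs discussed earlier
github_api$request    # NULL, since OAuth 2.0 has no request URL
github_api$authorize  # URL where authorization is granted
github_api$access     # URL where the token exchange happens
```

The resulting object can be passed to oauth2.0_token() in place of a hand-built endpoint.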
The core feature of the OAuth protocol is secure authorization. On the client side, this relies on a key and a secret token, which are to be kept private.
The typical way to get a key and a secret token to access an API involves creating an app within the service providing the API.
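One common way to keep the key and secret out of your scripts is to read them from environment variables. The following is only a sketch: the variable names GITHUB_KEY and GITHUB_SECRET and the helper read_credentials() are hypothetical, and the demo values are set within the session solely so that the example runs as it stands.

```r
# Hypothetical helper: read the app key and secret from environment variables
read_credentials <- function(key_var = "GITHUB_KEY",
                             secret_var = "GITHUB_SECRET") {
  key <- Sys.getenv(key_var, unset = NA)
  secret <- Sys.getenv(secret_var, unset = NA)
  if (is.na(key) || is.na(secret)) {
    stop("Set ", key_var, " and ", secret_var, " before calling the API")
  }
  list(key = key, secret = secret)
}

# Demo values set in-session only so that the sketch runs as-is;
# in practice they would live in ~/.Renviron or the shell environment
Sys.setenv(GITHUB_KEY = "your_app_key", GITHUB_SECRET = "your_app_secret")

creds <- read_credentials()
names(creds)
```

The returned values can then be handed to oauth_app() through its key and secret arguments instead of being typed into the script.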
The callback URL
Within the web API domain, a callback URL is the URL that is called by the API after the answer is given to the request. A typical example of a callback URL is the URL of the page navigated to after completing an online purchase.
In this example, when we finish at the checkout on the online store, an API call is made to the payment circuit provider.
After completing the payment operation, the API will navigate again to the online store at the callback URL, usually to a thank you page.
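In an OAuth 2.0 authorization code flow, the redirect to the callback URL typically carries its information as query parameters, such as code and state. The sketch below takes apart such a URL with httr's parse_url() function; the URL and its parameter values are made up for illustration.

```r
library(httr)

# Hypothetical redirect received on the callback URL; values are made up
callback <- "http://localhost:1410/?code=abc123&state=xyz789"

# parse_url() splits the URL into its components
parts <- parse_url(callback)

parts$hostname     # the machine the API redirected to
parts$query$code   # the temporary authorization code
parts$query$state  # the value used to match the answer to the request
```

When you call oauth2.0_token(), httr takes care of listening on the callback URL and reading these values for you, so this parsing is shown only to make the mechanism visible.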
There's more...
You can also write custom functions to handle APIs. When frequently dealing with a particular API, it can be useful to define a set of custom functions in order to make it easier to interact with.
Basically, the interaction with an API can be summarized with the following three categories:
- Authentication
- Getting content from the API
- Posting content to the API
Authentication can be handled by leveraging the httr package's authenticate() function and writing a function as follows:
api_auth <- function(user, password) {
  # Build an HTTP basic authentication configuration
  authenticate(user = user, password = password)
}
You can get the content from the API through the GET() function of the httr package:
api_get <- function(user, password, path = "api_path") {
  auth <- api_auth(user, password)
  # Return the response object so that the caller can inspect it
  GET("https://api.com", path = path, auth)
}
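Posting content will be done in a similar way through the POST() function:
api_post <- function(post_body, user, password, path = "api_path", ...) {
  auth <- api_auth(user, password)
  # The body must be a list so that it can be serialized to JSON
  stopifnot(is.list(post_body))
  body_json <- jsonlite::toJSON(post_body)
  # Declare the JSON content type and pass any extra options on to POST()
  POST("https://api.application.com", path = path, body = body_json,
       auth, content_type_json(), ...)
}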
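When a list is serialized with jsonlite::toJSON(), length-one vectors become JSON arrays by default, which some APIs may not accept. The following sketch compares the default output with the one obtained through the auto_unbox argument; the field names in the list are hypothetical, and the jsonlite package is assumed to be installed.

```r
library(jsonlite)

# Hypothetical content to be posted
post_body <- list(title = "Hello", public = TRUE)

# Default behaviour: every value is wrapped in a JSON array
boxed <- toJSON(post_body)

# With auto_unbox = TRUE, length-one vectors are written as scalars
unboxed <- toJSON(post_body, auto_unbox = TRUE)

boxed
unboxed
```

Which of the two forms is appropriate depends on what the API you are posting to expects, so it is worth checking its documentation before choosing.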