Harvesting additional results from the Google+ API using pagination
By default, the Google+ API returns a maximum of 25 results per request, but we can extend the previous scripts by raising that maximum and harvesting more results through pagination. As before, we will communicate with the Google+ API through a URL and the urllib2 library. We will keep a loop counter that increases with each request and pass the page token from each response into the next request, so we can move across pages and gather more results.
How to do it
The following script shows how you can harvest additional results from the Google+ API:
import urllib2
import json

GOOGLE_API_KEY = "{Insert your Google API key}"
target = "packtpub.com"
token = ""
loops = 0

# Request up to 10 pages of results
while loops < 10:
    api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50&pageToken="+token).read()
    json_response = json.loads(api_response)
    # Pointer to the next page, sent back as pageToken in the next request
    token = json_response['nextPageToken']
    if len(json_response['items']) == 0:
        break
    for result in json_response['items']:
        name = result['displayName']
        print name
        # Strip the query string from the image URL before downloading
        image = result['image']['url'].split('?')[0]
        f = open(name+'.jpg','wb+')
        f.write(urllib2.urlopen(image).read())
        f.close()
    loops+=1
How it works
The first big change in this script is that the main code has been moved into a while loop:
token = ""
loops = 0
while loops < 10:
Here, the number of loops is capped at 10 to avoid sending too many requests to the API servers. This value can of course be changed to any positive integer. The next change is to the request URL itself; it now contains two additional trailing parameters, maxResults and pageToken. Each response from the Google+ API contains a nextPageToken value, which is a pointer to the next set of results; we send it back as pageToken in the following request. Note that if there are no more results, a nextPageToken value is still returned. The maxResults parameter is self-explanatory, but can only be increased to a maximum of 50:
api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50&pageToken="+token).read()
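As a minimal sketch of an alternative to concatenating the URL by hand, the query string can be assembled with urlencode, which also escapes any special characters in the search term. The helper name build_people_url and the constants BASE_URL and MAX_RESULTS_LIMIT are hypothetical names introduced for illustration. The sketch leaves pageToken out of the first request instead of sending it empty; that is an assumption, not something the recipe does.

```python
try:
    from urllib import urlencode        # Python 2, as used in this recipe
except ImportError:
    from urllib.parse import urlencode  # Python 3

BASE_URL = "https://www.googleapis.com/plus/v1/people"
MAX_RESULTS_LIMIT = 50  # upper bound for maxResults stated in the text


def build_people_url(query, api_key, max_results=50, page_token=""):
    """Return the search URL for one page of results (hypothetical helper)."""
    params = [
        ("query", query),
        ("key", api_key),
        # Never ask for more than the documented maximum.
        ("maxResults", min(max_results, MAX_RESULTS_LIMIT)),
    ]
    # Assumption: only send pageToken once a previous response supplied one.
    if page_token:
        params.append(("pageToken", page_token))
    return BASE_URL + "?" + urlencode(params)


first_page = build_people_url("packtpub.com", "API_KEY")
next_page = build_people_url("packtpub.com", "API_KEY", page_token="abc123")
```

Passing a list of tuples rather than a dictionary keeps the parameters in a fixed order, which makes the resulting URLs easier to compare while debugging.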
The next part parses the JSON response as before, but this time it also extracts the nextPageToken value:
json_response = json.loads(api_response)
token = json_response['nextPageToken']
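Indexing the dictionary directly raises a KeyError if a key is missing, so a more defensive variant might read both fields with get(). The following is a sketch only: parse_page is a hypothetical helper, and the sample strings are hand-written for illustration, not real API output.

```python
import json

# Hand-written sample for illustration only; not real API output.
sample = '{"nextPageToken": "abc123", "items": [{"displayName": "Example"}]}'


def parse_page(raw):
    """Return (items, next_token) from a raw JSON response body."""
    data = json.loads(raw)
    # get() supplies a default instead of raising KeyError on a missing key.
    return data.get("items", []), data.get("nextPageToken", "")


items, token = parse_page(sample)
last_items, last_token = parse_page('{"items": []}')
```

With this shape, an empty items list or an empty token can both be treated as a signal that there is nothing more to fetch.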
The main while loop stops once the loops variable reaches 10, but sometimes you may only get one page of results. The next part of the code checks how many results were returned; if there were none, it exits the loop early:
if len(json_response['items']) == 0:
    break
Finally, we ensure that we increase the value of the loops integer on each iteration. A common coding mistake is to leave this out, meaning the loop will continue forever:
loops+=1
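The loop logic described above can be sketched separately from the network code, which makes it possible to try out without an API key. In this sketch, harvest and fetch_page are hypothetical names: fetch_page stands in for the urlopen call and is assumed to take a page token and return the decoded response as a dictionary. FAKE_PAGES is invented test data. Using a for loop over range() instead of a manual counter means the increment cannot be forgotten.

```python
def harvest(fetch_page, max_loops=10):
    """Collect items from successive pages.

    fetch_page is a hypothetical callable: it takes a page token and
    returns the decoded JSON response as a dict.
    """
    results = []
    token = ""
    for _ in range(max_loops):       # bounded, so the loop always terminates
        page = fetch_page(token)
        items = page.get("items", [])
        if not items:                # no results on this page: stop early
            break
        results.extend(items)
        token = page.get("nextPageToken", "")
        if not token:                # no pointer to a further page
            break
    return results


# Stand-in for the network call: three fake pages keyed by token.
FAKE_PAGES = {
    "": {"items": ["a", "b"], "nextPageToken": "p2"},
    "p2": {"items": ["c"], "nextPageToken": "p3"},
    "p3": {"items": [], "nextPageToken": "p4"},
}

names = harvest(lambda token: FAKE_PAGES[token])
```

The third fake page mirrors the behaviour noted earlier: it still carries a token but has no items, so the empty-items check is what ends the loop.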