Running PhantomJS with a disk cache
In this recipe, we will learn about running PhantomJS with an on-disk cache that is enabled using the disk-cache and max-disk-cache-size command-line arguments. We can use this to test how browsers cache our static assets.
Getting ready
To run this recipe, we will need a script to run with PhantomJS that accesses a website with cacheable assets. Optionally, we will also need to decide how large the on-disk cache should be (in kilobytes).
The script in this recipe is available in the downloadable code repository as recipe06.js under chapter01. If we run the provided example script, we must change to the root directory for the book's sample code.
Lastly, the script in this recipe runs against the demo site that is included with the cookbook's sample code repository. To run that demo site, we must have Node.js installed. In a separate terminal, change into the phantomjs-sandbox directory (in the sample code's directory) and start the app with the following command:
node app.js
How to do it…
Given the following script:
var page = require('webpage').create(),
    count = 0,
    until = 2;

// Print every property of each resource once it has finished loading.
page.onResourceReceived = function(res) {
  if (res.stage === 'end') {
    console.log(JSON.stringify(res, undefined, 2));
  }
};

// Count each page load and report which run this is.
page.onLoadStarted = function() {
  count += 1;
  console.log('Run ' + count + ' of ' + until + '.');
};

// Reload until we reach the target number of runs, then exit.
page.onLoadFinished = function(status) {
  if (status === 'success') {
    if (count < until) {
      console.log('Go again.\n');
      page.reload();
    } else {
      console.log('All done.');
      phantom.exit();
    }
  } else {
    console.error('Could not open page! (Is it running?)');
    phantom.exit(1);
  }
};

page.open('http://localhost:3000/cache-demo');
Enter the following command at the command line:
phantomjs --disk-cache=true --max-disk-cache-size=4000 chapter01/recipe06.js
The script will print out details about each resource in the response as JSON.
How it works…
Our preceding example script performs the following actions:
- It creates a webpage object and sets two variables, count and until.
- We assign an event handler function to the webpage object's onResourceReceived callback. This callback will print out every property of each resource received, once that resource has finished loading (that is, when its stage is 'end').
- We assign an event handler function to the webpage object's onLoadStarted callback. This callback will increment count when the page load starts and print a message indicating which run it is.
- We assign an event handler function to the webpage object's onLoadFinished callback. This callback checks the status of the response and takes action accordingly as follows:
  - If status is not 'success', then we print an error message and exit from PhantomJS
  - If the callback's status is 'success', then we check to see if count is less than until, and if it is, then we call reload on the webpage object; otherwise, we exit PhantomJS
- Finally, we open the target URL (http://localhost:3000/cache-demo) using webpage.open.
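The onResourceReceived handler described above dumps every property of each resource, which can be a lot to read. As a rough variation, the following sketch prints one line per finished resource instead. The helper name summarizeResource is our own invention for illustration and is not part of PhantomJS; the status, url, contentType, and stage properties are read from the resource object that PhantomJS passes to the callback. The PhantomJS-specific wiring is guarded so that the helper can also be exercised outside PhantomJS.

```javascript
// Hypothetical helper (not a PhantomJS API): reduce a resource object
// to a single line containing its status, URL, and content type.
function summarizeResource(res) {
  var type = res.contentType || 'unknown type';
  return res.status + ' ' + res.url + ' (' + type + ')';
}

// Only touch PhantomJS objects when the script is run by PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();

  page.onResourceReceived = function (res) {
    // Each resource is reported in stages; print it once, at the end.
    if (res.stage === 'end') {
      console.log(summarizeResource(res));
    }
  };

  page.onLoadFinished = function (status) {
    phantom.exit(status === 'success' ? 0 : 1);
  };

  page.open('http://localhost:3000/cache-demo');
}
```

Keeping the formatting in a plain function makes it easy to change what is printed without touching the event handlers.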
There's more…
Even though the disk cache is off by default, PhantomJS still performs some in-memory caching. This detail becomes important in later explorations, as it produces some otherwise difficult to explain results. For example, in our preceding sample script, we used webpage.reload for our second request of the URL, and in that second request, we saw all of the images re-requested. However, if we had used a second call to webpage.open (instead of webpage.reload), then the onResourceReceived callback would have shown a second request to the URL but none of the images would have been re-requested. (As an interesting aside, we would also see that behavior if we set the disk-cache argument to false; the in-memory cache cannot be disabled.)
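To try the webpage.open variation described above, we could restructure the script roughly as follows. This is only a sketch: createRunner and its exit parameter are hypothetical names introduced here so that the handlers can be wired to any page-like object, and the only behavioral change from the recipe's script is that the second request uses open rather than reload.

```javascript
// Hypothetical helper (not a PhantomJS API): attach the recipe's
// handlers to `page`, but re-request the URL with page.open.
function createRunner(page, url, until, exit) {
  var count = 0;

  page.onLoadStarted = function () {
    count += 1;
    console.log('Run ' + count + ' of ' + until + '.');
  };

  page.onLoadFinished = function (status) {
    if (status !== 'success') {
      console.error('Could not open page! (Is it running?)');
      exit(1);
    } else if (count < until) {
      console.log('Go again.\n');
      page.open(url); // a second open instead of page.reload()
    } else {
      console.log('All done.');
      exit(0);
    }
  };

  return {
    start: function () { page.open(url); },
    runs: function () { return count; }
  };
}

// Only touch PhantomJS objects when the script is run by PhantomJS.
if (typeof phantom !== 'undefined') {
  createRunner(
    require('webpage').create(),
    'http://localhost:3000/cache-demo',
    2,
    function (code) { phantom.exit(code); }
  ).start();
}
```

Adding the onResourceReceived handler from the recipe to this sketch would let us compare which resources are reported on the second run.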
Another interesting observation is that PhantomJS always reports an HTTP response status of 200 OK for every successfully retrieved asset. If we look at the Node.js console output for the demo app while our sample script runs, we can see the discrepancy. PhantomJS reports an HTTP status code of 200 for each of the images during both the first and second request/response cycles. However, the output from the Node.js app looks something like this:
GET /cache-demo 200 1ms - 573b
GET /images/583519989_1116956980_b.jpg 200 4ms - 264.64kb
GET /images/152824439_ffcc1b2aa4_b.jpg 200 8ms - 615.21kb
GET /images/357292530_f225d7e306_b.jpg 200 6ms - 497.98kb
GET /images/391560246_f2ac936f6d_b.jpg 200 5ms - 446.68kb
GET /images/872027465_2519a358b9_b.jpg 200 5ms - 766.94kb
GET /cache-demo 200 1ms - 573b
GET /images/152824439_ffcc1b2aa4_b.jpg 304 3ms
GET /images/357292530_f225d7e306_b.jpg 304 3ms
GET /images/391560246_f2ac936f6d_b.jpg 304 2ms
GET /images/583519989_1116956980_b.jpg 304 3ms
GET /images/872027465_2519a358b9_b.jpg 304 3ms
We can see that the server responds with 304 Not Modified for each of the image assets. This is exactly what we would expect for a second request to the same URL when the assets are served with Cache-Control headers that specify a max-age, and for assets that are also cached to disk.
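To make the difference between the first and second cycles easier to see, we could group the server's log lines by path. The sketch below is plain JavaScript that can be run with Node.js; it assumes each log line has the form "METHOD path status ...", as in the demo app's output, and statusesByPath is a hypothetical helper name. The sample lines are copied from that output.

```javascript
// A few lines copied from the demo app's console output.
var sampleLog = [
  'GET /cache-demo 200 1ms - 573b',
  'GET /images/583519989_1116956980_b.jpg 200 4ms - 264.64kb',
  'GET /cache-demo 200 1ms - 573b',
  'GET /images/583519989_1116956980_b.jpg 304 3ms'
];

// Hypothetical helper: collect, in order, the status codes that the
// server logged for each requested path.
function statusesByPath(lines) {
  var result = {};
  lines.forEach(function (line) {
    // Assumed line shape: "<method> <path> <3-digit status> ..."
    var match = /^(\S+)\s+(\S+)\s+(\d{3})\b/.exec(line);
    if (!match) {
      return; // skip anything that does not look like a request log line
    }
    var path = match[2];
    result[path] = result[path] || [];
    result[path].push(Number(match[3]));
  });
  return result;
}

console.log(JSON.stringify(statusesByPath(sampleLog), null, 2));
```

For the sample lines, the image path maps to a 200 followed by a 304, while the page itself maps to two 200 responses.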
disk-cache
We can enable the disk cache by setting the disk-cache argument to true or yes. By default, the disk cache is disabled, but we can also explicitly disable it by providing false or no to the command-line argument. When the disk cache is enabled, PhantomJS will cache assets to the on-disk cache, which it stores at the desktop services cache storage location. Caching these assets has the potential to speed up future script runs against URLs that share those assets.
max-disk-cache-size
Optionally, we may also wish to limit the size of the disk cache (for example, to simulate the small caches on some mobile devices). To limit the size of the disk cache, we use the max-disk-cache-size command-line argument and provide an integer that determines the size of the cache in kilobytes. By default (if we do not use the max-disk-cache-size argument), the cache size is unbounded. Most of the time, we will not need to use the max-disk-cache-size argument.
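If we launch PhantomJS from another program (a test runner, for example), it can help to build these two arguments in one place. The following is a minimal sketch in plain JavaScript; buildCacheArgs and its diskCache and maxSizeKb options are hypothetical names of our own, and only the two flags themselves come from PhantomJS.

```javascript
// Hypothetical helper: turn an options object into the PhantomJS
// cache-related command-line arguments described in this recipe.
function buildCacheArgs(options) {
  var args = ['--disk-cache=' + (options.diskCache ? 'true' : 'false')];

  // Leave the size flag off entirely when no limit is given, so the
  // default cache size applies.
  if (typeof options.maxSizeKb === 'number') {
    args.push('--max-disk-cache-size=' + Math.floor(options.maxSizeKb));
  }
  return args;
}

var args = buildCacheArgs({ diskCache: true, maxSizeKb: 4000 })
  .concat('chapter01/recipe06.js');

// Prints: phantomjs --disk-cache=true --max-disk-cache-size=4000 chapter01/recipe06.js
console.log('phantomjs ' + args.join(' '));
```

The resulting array could be passed to whatever process-launching mechanism we prefer; the sketch only assembles and prints the command.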
Cache locations
If we need to inspect the cached data that is persisted to disk, PhantomJS writes to the desktop services cache storage location for the platform it's running on. These locations are listed as follows:
Platform | Location |
---|---|
Windows | |
Mac OS X | |
Linux | |
Note
These locations may not exist until after we have run PhantomJS with the disk-cache argument enabled.
See also
- The Opening a URL within PhantomJS recipe in Chapter 3, Working with webpage Objects