Getting screenshots of websites with QtWebKit
They say a picture is worth a thousand words. During the intelligence gathering phase, it is often useful to capture screenshots of websites. We may want to scan an IP range to see which IPs are serving web pages and, more importantly, what those pages look like, which helps us pick out interesting sites to focus on. For the same reason, we might also want to quickly scan the ports on a particular IP address. We will look at how to accomplish this using the QtWebKit Python library.
Getting ready
QtWebKit is a bit of a pain to install. The easiest way is to get the binaries from http://www.riverbankcomputing.com/software/pyqt/download. Windows users should pick the binary that matches their Python version and architecture. For example, I will use the PyQt4-4.11.3-gpl-Py2.7-Qt4.8.6-x32.exe binary to install PyQt4 on my 32-bit Windows virtual machine, which has Python 2.7 installed. If you are planning to compile PyQt4 from source, make sure you have already installed SIP.
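Before moving on, it can be worth confirming that the installation is visible to the interpreter you intend to use. The following is only a minimal sketch: have_module is a hypothetical helper name, and the module names listed are the ones the recipe's script imports, plus sip.

```python
import importlib


def have_module(name):
    """Return True if the named module can be imported (hypothetical helper)."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# Modules the screenshot script relies on, plus the SIP bindings.
for name in ("sip", "PyQt4.QtCore", "PyQt4.QtGui", "PyQt4.QtWebKit"):
    print("%-16s %s" % (name, "ok" if have_module(name) else "MISSING"))
```

If any line reports MISSING, the binary probably does not match the Python version or architecture in use.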
How to do it…
Once you've got PyQt4 installed, you're pretty much ready to go. The following script is what we will use as the base for our screenshot class:
import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *


class Screenshot(QWebView):
    def __init__(self):
        # A QApplication must exist before any widget is created
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def wait_load(self, delay=0):
        # Keep processing Qt events until the page reports it has loaded
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True

    def get_image(self, url):
        self.load(QUrl(url))
        self.wait_load()
        # Resize the viewport to the full page so nothing is cropped
        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())
        # Render the frame into an image and return it
        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        return image
Save the preceding script as screenshot.py in the Python Lib folder. We can then import it from our other scripts.
How it works…
The script uses QWebView to load the URL and then creates an image using QPainter. The get_image function takes a single parameter: the URL of our target. Knowing this, we can simply import the class into another script and expand on its functionality.
Let's break down the script and see how it works.
Firstly, we set up our imports:
import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *
Then, we define our class, which inherits from QWebView:
class Screenshot(QWebView):
Next, we create our initialization method, along with the two helper methods that track whether the page has loaded:
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def wait_load(self, delay=0):
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True
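The initialization method creates the QApplication, sets the self._loaded property to False, and connects the loadFinished signal to _loadFinished. The property is used along with the _loadFinished and wait_load functions to track the state of the page load: wait_load keeps processing application events until the signal fires and _loadFinished sets the flag, so the screenshot is only taken once the site has loaded. The actual screenshot code is contained in the get_image function: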
    def get_image(self, url):
        self.load(QUrl(url))
        self.wait_load()
        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())
        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        return image
Within this get_image function, we load the URL, wait for it to finish loading, and set the size of the viewport to the size of the contents within the main frame. We then create an image of that size in the ARGB32 format, attach a painter object to the image, and render the frame using the painter. Finally, we return the rendered image.
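One thing to be aware of is that wait_load, as written, loops until the loadFinished signal fires, so a host that never answers could stall a scan. The following is a minimal sketch of the same polling idea with a timeout added. It is not part of the original class: wait_until and its parameters are hypothetical names, and it is written without any Qt dependency so it can be tried on its own.

```python
import time


def wait_until(is_done, pump_events, timeout=30.0, delay=0.05):
    """Poll is_done() until it returns True or `timeout` seconds pass.

    pump_events() is called on every pass of the loop. Returns True if
    is_done() became true, and False if the timeout was reached first.
    (Hypothetical helper, not part of PyQt4 or the original script.)
    """
    deadline = time.time() + timeout
    while not is_done():
        if time.time() >= deadline:
            return False
        pump_events()
        time.sleep(delay)
    return True
```

Inside the Screenshot class, wait_load could delegate to a helper like this by passing lambda: self._loaded as is_done and self.app.processEvents as pump_events, and get_image could then skip rendering when False comes back. A small non-zero delay also keeps the loop from spinning the CPU while it waits.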
There's more…
To use the class we've just made, we just import it into another script. For example, if we wanted to just save the image we get back, we could do something like the following:
import screenshot

s = screenshot.Screenshot()
image = s.get_image('http://www.packtpub.com')
image.save('website.png')
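The example above always writes to website.png, so capturing a second site would overwrite the first. A minimal sketch of one way around this is to derive the file name from the URL. The helper name filename_for and its character rules are assumptions made for illustration, not something taken from the original recipe.

```python
import re


def filename_for(url, ext=".png"):
    """Build a filesystem-friendly file name from a URL (hypothetical helper)."""
    # Drop the scheme, for example "http://"
    name = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
    # Replace anything other than letters, digits, dots, and hyphens
    name = re.sub(r"[^A-Za-z0-9.-]+", "_", name).strip("_")
    return (name or "screenshot") + ext


print(filename_for("http://www.packtpub.com"))
print(filename_for("http://192.168.1.10:8080/admin/"))
```

The result can be passed straight to image.save, for example image.save(filename_for(url)). QImage.save returns a Boolean, so checking it is an easy way to spot a failed write.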
That's all there is to it. In the next script, we will create something a little more useful.