Python Web Penetration Testing Cookbook

Python Web Penetration Testing Cookbook: Over 60 indispensable Python recipes to ensure you always have the right code on hand for web application testing

Authors: Dave Mound, Cameron Buchanan, Benjamin May, Terry Ip, Andrew Mabbitt, +1 more
$48.99
3.5 (4 Ratings)
Paperback Jun 2015 224 pages 1st Edition
eBook: $27.98 (discounted from $39.99)
Paperback: $48.99
Subscription: Free Trial, renews at $19.99/month

What do you get with Print?

  • Instant access to your digital eBook copy whilst your Print order is shipped
  • Paperback book shipped to your preferred address
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - read whenever, wherever and however you want

Python Web Penetration Testing Cookbook

Chapter 1. Gathering Open Source Intelligence

In this chapter, we will cover the following topics:

  • Gathering information using the Shodan API
  • Scripting a Google+ API search
  • Downloading profile pictures using the Google+ API
  • Harvesting additional results using the Google+ API pagination
  • Getting screenshots of websites using QtWebKit
  • Screenshots based on port lists
  • Spidering websites

Introduction

Open Source Intelligence (OSINT) is the process of gathering information from Open (overt) sources. When it comes to testing a web application, that might seem a strange thing to do. However, a great deal of information can be learned about a particular website before even touching it. You might be able to find out what server-side language the website is written in, the underpinning framework, or even its credentials. Learning to use APIs and scripting these tasks can make the bulk of the gathering phase a lot easier.

In this chapter, we will look at a few of the ways we can use Python to leverage the power of APIs to gain insight into our target.

Gathering information using the Shodan API

Shodan is essentially a vulnerability search engine. Provide it with a name, an IP address, or even a port, and it returns all the systems in its databases that match. This makes it one of the most effective sources for intelligence when it comes to infrastructure. It's like Google for internet-connected devices. Shodan constantly scans the Internet and saves the results into a public database. Whilst this database is searchable from the Shodan website (https://www.shodan.io), the results and the services reported on are limited, unless you access it through the Application Programming Interface (API).

Our task for this section will be to gain information about the Packt Publishing website by using the Shodan API.

Getting ready

At the time of writing this, Shodan membership is $49, and this is needed to get an API key. If you're serious about security, access to Shodan is invaluable.

If you don't already have an API key for Shodan, visit www.shodan.io/store/member and sign up for it. Shodan has a really nice Python library, which is also well documented at https://shodan.readthedocs.org/en/latest/.

To get your Python environment set up to work with Shodan, all you need to do is install the library from PyPI (the Python package index, affectionately known as the cheeseshop). Either of the following will do:

$ easy_install shodan
$ pip install shodan

How to do it…

Here's the script that we are going to use for this task:

import shodan
import requests

SHODAN_API_KEY = "{Insert your Shodan API key}" 
api = shodan.Shodan(SHODAN_API_KEY)

target = 'www.packtpub.com'

dnsResolve = 'https://api.shodan.io/dns/resolve?hostnames=' + target + '&key=' + SHODAN_API_KEY

try:
    # First we need to resolve our target's domain to an IP
    resolved = requests.get(dnsResolve)
    hostIP = resolved.json()[target]

    # Then we need to do a Shodan search on that IP
    host = api.host(hostIP)
    print "IP: %s" % host['ip_str']
    print "Organization: %s" % host.get('org', 'n/a')
    print "Operating System: %s" % host.get('os', 'n/a')

    # Print all banners
    for item in host['data']:
        print "Port: %s" % item['port']
        print "Banner: %s" % item['data']

    # Print vuln information
    for item in host['vulns']:
        CVE = item.replace('!','')
        print 'Vulns: %s' % item
        exploits = api.exploits.search(CVE)
        for item in exploits['matches']:
            if item.get('cve')[0] == CVE:
                print item.get('description')
except Exception as e:
    print 'An error occurred: %s' % e

The preceding script should produce an output similar to the following:

IP: 83.166.169.231
Organization: Node4 Limited
Operating System: None

Port: 443
Banner: HTTP/1.0 200 OK
Server: nginx/1.4.5
Date: Thu, 05 Feb 2015 15:29:35 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: public, s-maxage=172800
Age: 1765
Via: 1.1 varnish
X-Country-Code: US

Port: 80
Banner: HTTP/1.0 301 https://www.packtpub.com/
Location: https://www.packtpub.com/
Accept-Ranges: bytes
Date: Fri, 09 Jan 2015 12:08:05 GMT
Age: 0
Via: 1.1 varnish
Connection: close
X-Country-Code: US

Vulns: !CVE-2014-0160
The (1) TLS and (2) DTLS implementations in OpenSSL 1.0.1 before 1.0.1g do not properly handle Heartbeat Extension packets, which allows remote attackers to obtain sensitive information from process memory via crafted packets that trigger a buffer over-read, as demonstrated by reading private keys, related to d1_both.c and t1_lib.c, aka the Heartbleed bug.

I've just chosen a few of the available data items that Shodan returns, but you can see that we get a fair bit of information back. In this particular instance, we can see that there is a potential vulnerability identified. We also see that this server is listening on ports 80 and 443 and that according to the banner information, it appears to be running nginx as the HTTP server.

How it works…

  1. Firstly, we set up our static strings within the code; this includes our API key:
    SHODAN_API_KEY = "{Insert your Shodan API key}" 
    target = 'www.packtpub.com'
    
    dnsResolve = 'https://api.shodan.io/dns/resolve?hostnames=' + target + '&key=' + SHODAN_API_KEY
  2. The next step is to create our API object:
    api = shodan.Shodan(SHODAN_API_KEY)
  3. In order to search for information on a host using the API, we need to know the host's IP address. Shodan has a DNS resolver but it's not included in the Python library. To use Shodan's DNS resolver, we simply have to make a GET request to the Shodan DNS Resolver URL and pass it the domain (or domains) we are interested in:
    resolved = requests.get(dnsResolve)
    hostIP = resolved.json()[target] 
  4. The returned JSON data will be a dictionary of domains to IP addresses; as we only have one target in our case, we can simply pull out the IP address of our host using the target string as the key for the dictionary. If you were searching on multiple domains, you would want to iterate over this dictionary to obtain all the IP addresses (see the sketch after this list).
  5. Now that we have the host's IP address, we can use the Shodan library's host function to obtain information on our host. The returned JSON data contains a wealth of information about the host, though in our case we will just pull out the IP address, organization, and if possible the operating system that is running. Then we will loop over all of the ports that were found to be open and their respective banners:
        host = api.host(hostIP)
        print "IP: %s" % host['ip_str']
        print "Organization: %s" % host.get('org', 'n/a')
        print "Operating System: %s" % host.get('os', 'n/a')
    
        # Print all banners
        for item in host['data']:
            print "Port: %s" % item['port']
            print "Banner: %s" % item['data']
  6. The returned data may also contain potential Common Vulnerabilities and Exposures (CVE) numbers for vulnerabilities that Shodan thinks the server may be susceptible to. This could be really beneficial to us, so we will iterate over the list of these (if there are any) and use another function from the Shodan library to get information on the exploit:
    for item in host['vulns']:
        CVE = item.replace('!','')
        print 'Vulns: %s' % item
        exploits = api.exploits.search(CVE)
        for item in exploits['matches']:
            if item.get('cve')[0] == CVE:
                print item.get('description')

    That's it for our script. Try running it against your own server.
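
As a quick illustration of the multiple-domain case mentioned in step 4, here is a minimal sketch (the second hostname is just a placeholder): the resolver endpoint accepts a comma-separated list of hostnames and returns a dictionary of domain-to-IP mappings that we can iterate over:

import requests

SHODAN_API_KEY = "{Insert your Shodan API key}"
# The resolver accepts a comma-separated list of hostnames
targets = ['www.packtpub.com', 'www.example.com']
dnsResolve = 'https://api.shodan.io/dns/resolve?hostnames=' + ','.join(targets) + '&key=' + SHODAN_API_KEY

resolved = requests.get(dnsResolve).json()
# The response maps each domain to its IP address
for domain, ip in resolved.items():
    print "%s resolves to %s" % (domain, ip)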

There's more…

We've only really scratched the surface of the Shodan Python library with our script. It is well worth reading through the Shodan API reference documentation and playing around with the other search options. You can filter results based on "facets" to narrow down your searches. You can even use searches that other users have saved using the "tags" search.
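
As a small, hedged example of faceting (the query string here is arbitrary), the library's count method can summarize results without downloading them all; it accepts a list of facets such as the following:

import shodan

api = shodan.Shodan("{Insert your Shodan API key}")

# count() summarizes matches rather than returning full results
result = api.count('nginx', facets=[('country', 5)])
print "Total results: %s" % result['total']
for country in result['facets']['country']:
    print "%s: %s" % (country['value'], country['count'])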

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Scripting a Google+ API search

Social media is a great way to gather information on a target company or person. Here, we will show you how to script a Google+ API search to find contact information for a company within the Google+ social network.

Getting ready

Some Google APIs require authorization to access them, but if you have a Google account, getting the API key is easy. Just go to https://console.developers.google.com and create a new project. Click on API & auth | Credentials. Click on Create new key and Server key. Optionally enter your IP or just click on Create. Your API key will be displayed and ready to copy and paste into the following recipe.

How to do it…

Here's a simple script to query the Google+ API:

import urllib2

GOOGLE_API_KEY = "{Insert your Google API key}" 
target = "packtpub.com"
api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY).read()
api_response = api_response.split("\n")
for line in api_response:
    if "displayName" in line:
        print line

How it works…

The preceding code makes a request to the Google+ search API (authenticated with your API key) and searches for accounts matching the target, packtpub.com. Similarly to the preceding Shodan script, we set up our static strings, including the API key and target:

GOOGLE_API_KEY = "{Insert your Google API key}" 
target = "packtpub.com"

The next step does two things: first, it sends the HTTP GET request to the API server, then it reads in the response and stores the output into an api_response variable:

api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY).read()

This request returns a JSON formatted response; each item in it contains fields such as displayName, which is what we will pick out next.

In our script, we convert the response into a list so it's easier to parse:

api_response = api_response.split("\n")

The final part of the code loops through the list and prints only the lines that contain displayName.

See also…

In the next recipe, Downloading profile pictures using the Google+ API, we will look at improving the formatting of these results.

There's more…

By starting with a simple script to query the Google+ API, we can extend it to be more efficient and make use of more of the data returned. Another key aspect of the Google+ platform is that users may also have a matching account on another of Google's services, which means you can cross-reference accounts. Most Google products have an API available to developers, so a good place to start is https://developers.google.com/products/. Grab an API key and plug the output from the previous script into it.

Downloading profile pictures using the Google+ API

Now that we have established how to use the Google+ API, we can design a script to pull down pictures. The aim here is to put faces to names taken from web pages. We will send a request to the API through a URL, handle the response through JSON, and create picture files in the working directory of the script.

How to do it…

Here's a simple script to download profile pictures using the Google+ API:

import urllib2
import json

GOOGLE_API_KEY = "{Insert your Google API key}"
target = "packtpub.com"
api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY).read()

json_response = json.loads(api_response)
for result in json_response['items']:
    name = result['displayName']
    print name
    image = result['image']['url'].split('?')[0]
    f = open(name+'.jpg','wb+')
    f.write(urllib2.urlopen(image).read())
    f.close()

How it works…

The first change is to store the display name into a variable, as this is then reused later on:

name = result['displayName']
print name

Next, we grab the image URL from the JSON response:

image = result['image']['url'].split('?')[0]

The final part of the code does a number of things in three simple lines: firstly, it opens a file on the local disk, with the filename set to the name variable. The wb+ flag here indicates to the OS that it should create the file if it doesn't exist and write the data in a raw binary format. The second line makes an HTTP GET request to the image URL (stored in the image variable) and writes the response into the file. Finally, the file is closed to flush the data to disk and release the file handle:

f = open(name+'.jpg','wb+')
f.write(urllib2.urlopen(image).read())
f.close()

After the script is run, the console output will be the same as before, with the display names shown. However, your local directory will now also contain all the profile images, saved as JPEG files.
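
One caveat worth noting (an addition, not part of the original recipe): displayName values can contain characters that are not valid in filenames on some operating systems. A small sketch of one possible sanitization scheme before opening the file:

import re

def safe_filename(name):
    # Replace anything other than letters, digits, underscores,
    # spaces and hyphens with an underscore (one possible scheme)
    return re.sub(r'[^\w \-]', '_', name)

f = open(safe_filename(name)+'.jpg','wb+')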

Harvesting additional results from the Google+ API using pagination

By default, the Google+ APIs return a maximum of 25 results, but we can extend the previous scripts by increasing the maximum value and harvesting more results through pagination. As before, we will communicate with the Google+ API through a URL and the urllib2 library. We will keep a counter that increases as requests go ahead, so that we can move across pages and gather more results.

How to do it…

The following script shows how you can harvest additional results from the Google+ API:

import urllib2
import json

GOOGLE_API_KEY = "{Insert your Google API key}"
target = "packtpub.com"
token = ""
loops = 0

while loops < 10:
  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50&pageToken="+token).read()

  json_response = json.loads(api_response)
  token = json_response['nextPageToken']

  if len(json_response['items']) == 0:
    break

  for result in json_response['items']:
    name = result['displayName']
    print name
    image = result['image']['url'].split('?')[0]
    f = open(name+'.jpg','wb+')
    f.write(urllib2.urlopen(image).read())
    f.close()
  loops+=1

How it works…

The first big change in this script is that the main code has been moved into a while loop:

token = ""
loops = 0

while loops < 10:

Here, the number of loops is set to a maximum of 10 to avoid sending too many requests to the API servers. This value can of course be changed to any positive integer. The next change is to the request URL itself; it now contains two additional parameters, maxResults and pageToken. Each response from the Google+ API contains a pageToken value, which is a pointer to the next set of results. Note that if there are no more results, a pageToken value is still returned. The maxResults parameter is self-explanatory, but can only be increased to a maximum of 50:

  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50&pageToken="+token).read()

The next part reads in the JSON response as before, but this time it also extracts the nextPageToken value:

  json_response = json.loads(api_response)
  token = json_response['nextPageToken']

The main while loop stops once the loops variable reaches 10, but sometimes you may only get one page of results. The next part of the code checks how many results were returned; if there were none, it exits the loop prematurely:

  if len(json_response['items']) == 0:
    break

Finally, we ensure that we increase the value of the loops integer each time. A common coding mistake is to leave this out, meaning the loop will continue forever:

  loops+=1
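
If you find yourself reusing this pattern, one possible refactor (a sketch, not part of the original recipe) is to wrap the pagination in a generator that yields one result at a time and hides the token handling:

def plus_results(target, api_key, max_loops=10):
    token = ""
    for _ in range(max_loops):
        api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+api_key+"&maxResults=50&pageToken="+token).read()
        json_response = json.loads(api_response)
        token = json_response['nextPageToken']
        if len(json_response['items']) == 0:
            break
        for result in json_response['items']:
            yield result

for result in plus_results(target, GOOGLE_API_KEY):
    print result['displayName']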

Getting screenshots of websites with QtWebKit

They say a picture is worth a thousand words. Sometimes, it's good to get screenshots of websites during the intelligence gathering phase. We may want to scan an IP range and get an idea of which IPs are serving up web pages, and more importantly what they look like. This could assist us in picking out interesting sites to focus on and we also might want to quickly scan ports on a particular IP address for the same reason. We will take a look at how we can accomplish this using the QtWebKit Python library.

Getting ready

QtWebKit is a bit of a pain to install. The easiest way is to get the binaries from http://www.riverbankcomputing.com/software/pyqt/download. For Windows users, make sure you pick the binaries that match your Python version and architecture. For example, I will use the PyQt4-4.11.3-gpl-Py2.7-Qt4.8.6-x32.exe binary to install Qt4 on my 32-bit Windows virtual machine that has Python version 2.7 installed. If you are planning on compiling Qt4 from the source files, make sure you have already installed SIP.
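
A quick sanity check (an addition, not from the original recipe) is to import the WebKit module from the command line; if this prints without a traceback, the installation worked:

$ python -c "from PyQt4.QtWebKit import QWebView; print 'PyQt4 WebKit OK'"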

How to do it…

Once you've got PyQt4 installed, you're pretty much ready to go. The following script is what we will use as the base for our screenshot class:

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class Screenshot(QWebView):
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def wait_load(self, delay=0):
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True

    def get_image(self, url):
        self.load(QUrl(url))
        self.wait_load()

        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())

        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        return image

Create the preceding script and save it in the Python Lib folder. We can then reference it as an import in our scripts.

How it works…

The script makes use of QWebView to load the URL and then creates an image using QPainter. The get_image function takes a single parameter: our target. Knowing this, we can simply import it into another script and expand the functionality.

Let's break down the script and see how it works.

Firstly, we set up our imports:

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

Then, we create our class definition; the class we are creating extends from QWebView by inheritance:

class Screenshot(QWebView):

Next, we create our initialization method:

def __init__(self):
    self.app = QApplication(sys.argv)
    QWebView.__init__(self)
    self._loaded = False
    self.loadFinished.connect(self._loadFinished)

def wait_load(self, delay=0):
    while not self._loaded:
        self.app.processEvents()
        time.sleep(delay)
    self._loaded = False

def _loadFinished(self, result):
    self._loaded = True

The initialization method sets the self._loaded property. This is used along with the _loadFinished and wait_load functions to check the state of the application as it runs. It waits until the site has loaded before taking a screenshot. The actual screenshot code is contained in the get_image function:

def get_image(self, url):
    self.load(QUrl(url))
    self.wait_load()

    frame = self.page().mainFrame()
    self.page().setViewportSize(frame.contentsSize())

    image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
    painter = QPainter(image)
    frame.render(painter)
    painter.end()
    return image

Within this get_image function, we set the size of the viewport to the size of the contents within the main frame. We then set the image format, assign the image to a painter object, and then render the frame using the painter. Finally, we return the processed image.

There's more…

To use the class we've just made, we just import it into another script. For example, if we wanted to just save the image we get back, we could do something like the following:

import screenshot
s = screenshot.Screenshot()
image = s.get_image('http://www.packtpub.com')
image.save('website.png')
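
Since get_image returns a QImage, you can also post-process the screenshot before saving it. As a small sketch (the width of 320 pixels is an arbitrary choice, not part of the original recipe), the following saves a thumbnail alongside the full image:

import screenshot
from PyQt4.QtCore import Qt

s = screenshot.Screenshot()
image = s.get_image('http://www.packtpub.com')
image.save('website.png')
# scaledToWidth keeps the aspect ratio while resizing
thumb = image.scaledToWidth(320, Qt.SmoothTransformation)
thumb.save('website_thumb.png')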

That's all there is to it. In the next script, we will create something a little more useful.

Screenshots based on a port list

In the previous script, we created our base function to return an image for a URL. We will now expand on that to loop over a list of ports that are commonly associated with web-based administration portals. This will allow us to point the script at an IP and automatically run through the possible ports that could be associated with a web server. This is to be used in cases when we don't know which ports are open on a server, rather than cases where we are specifying the port and domain.

Getting ready

In order for this script to work, we'll need the script created in the Getting screenshots of websites with QtWebKit recipe. This should be saved in the Pythonxx/Lib folder and named something clear and memorable. Here, we've named that script screenshot.py. The naming of your script is particularly important as we reference it in an import declaration.
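
If you would rather not copy files into the Lib folder, an alternative (a sketch; the directory path is a placeholder) is to append the location of screenshot.py to sys.path before importing it:

import sys
# Replace with wherever you saved screenshot.py
sys.path.append('/path/to/scripts')
import screenshot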

How to do it…

This is the script that we will be using:

import screenshot
import requests

portList = [80,443,2082,2083,2086,2087,2095,2096,8080,8880,8443,9998,4643,9001,4489]

IP = '127.0.0.1'

http = 'http://'
https = 'https://'

def testAndSave(protocol, portNumber):
    url = protocol + IP + ':' + str(portNumber)
    try:
        r = requests.get(url,timeout=1)

        if r.status_code == 200:
            print 'Found site on ' + url 
            s = screenshot.Screenshot()
            image = s.get_image(url)
            image.save(str(portNumber) + '.png')
    except:
        pass

for port in portList:
    testAndSave(http, port)
    testAndSave(https, port)

How it works…

We first create our import declarations. In this script, we reuse the screenshot script we created before, along with the requests library. The requests library is used so that we can check the status of a request before trying to convert it to an image; we don't want to waste time trying to convert sites that don't exist:

import screenshot
import requests

The next step sets up the array of common port numbers that we will be iterating over. We also set up a string with the IP address we will be using:

portList = [80,443,2082,2083,2086,2087,2095,2096,8080,8880,8443,9998,4643,9001,4489]

IP = '127.0.0.1'

Next, we create strings to hold the protocol part of the URL that we will be building later; this just makes the code later on a little bit neater:

http = 'http://'
https = 'https://'

Next, we create our method, which will do the work of building the URL string. After we've created the URL, we check whether we get a 200 response code back for our get request. If the request is successful, we convert the web page returned to an image and save it with the filename being the successful port number. The code is wrapped in a try block because if the site doesn't exist when we make the request, it will throw an error:

def testAndSave(protocol, portNumber):
    url = protocol + IP + ':' + str(portNumber)
    try:
        r = requests.get(url,timeout=1)

        if r.status_code == 200:
            print 'Found site on ' + url 
            s = screenshot.Screenshot()
            image = s.get_image(url)
            image.save(str(portNumber) + '.png')
    except:
        pass

Now that our method is ready, we simply iterate over each port in the port list and call our method. We do this once for the HTTP protocol and then with HTTPS:

for port in portList:
    testAndSave(http, port)
    testAndSave(https, port)

And that's it. Simply run the script and it will save the images to the same location as the script.

There's more…

You might notice that the script takes a while to run. This is because it has to check each port in turn. In practice, you would probably want to make this a multithreaded script so that it can check multiple URLs at the same time. Let's take a quick look at how we can modify the code to achieve this.

First, we'll need a couple more import declarations:

import Queue
import threading

Next, we need to create a new function that we will call threader. This new function will handle putting the results of our testAndSave calls into the queue:

def threader(q, port):
    q.put(testAndSave(http, port))
    q.put(testAndSave(https, port))

Now that we have our new function, we just need to set up a new Queue object and make a few threading calls. We will take the testAndSave calls out of our for loop over the portList variable and replace them with this code:

q = Queue.Queue()

for port in portList:
    t = threading.Thread(target=threader, args=(q, port))
    t.daemon = True
    t.start()

s = q.get()

So, our new script in total now looks like this:

import Queue
import threading
import screenshot
import requests

portList = [80,443,2082,2083,2086,2087,2095,2096,8080,8880,8443,9998,4643,9001,4489]

IP = '127.0.0.1'

http = 'http://'
https = 'https://'

def testAndSave(protocol, portNumber):
    url = protocol + IP + ':' + str(portNumber)
    try:
        r = requests.get(url,timeout=1)

        if r.status_code == 200:
            print 'Found site on ' + url 
            s = screenshot.Screenshot()
            image = s.get_image(url)
            image.save(str(portNumber) + '.png')
    except:
        pass

def threader(q, port):
    q.put(testAndSave(http, port))
    q.put(testAndSave(https, port))

q = Queue.Queue()

for port in portList:
    t = threading.Thread(target=threader, args=(q, port))
    t.daemon = True
    t.start()

s = q.get()

If we run this now, we will get a much quicker execution of our code as the web requests are now being executed in parallel with each other.
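
One thing to be aware of: q.get() only blocks until the first thread puts something on the queue, so the script can exit before every port has been checked. A more robust variation (a sketch, not the book's original code) keeps a list of the threads and joins them all before exiting:

threads = []
for port in portList:
    t = threading.Thread(target=threader, args=(q, port))
    t.daemon = True
    t.start()
    threads.append(t)

# Wait for every worker to finish before exiting
for t in threads:
    t.join()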

You could try to further expand the script to work on a range of IP addresses too; this can be handy when you're testing an internal network range.
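
As a starting point, here is a minimal sketch for a range (192.168.0.x is an example network; you would also want to include the IP in the saved filename to avoid collisions between hosts):

# Iterate over a /24 range, reusing the module-level IP variable
for last_octet in range(1, 255):
    IP = '192.168.0.%d' % last_octet
    for port in portList:
        testAndSave(http, port)
        testAndSave(https, port)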

Spidering websites

Many tools provide the ability to map out websites, but often you are limited to the style of output or the location in which the results are provided. This base script for spidering allows you to map out websites in short order, with the ability to alter it as you please.

Getting ready

In order for this script to work, you'll need the BeautifulSoup library, which is installable on Debian-based systems with apt-get install python-bs4 or, alternatively, with pip install beautifulsoup4. It's as easy as that.

How to do it…

This is the script that we will be using:

import urllib2
from bs4 import BeautifulSoup
import sys

urls = []
urls2 = []

tarurl = sys.argv[1]

url = urllib2.urlopen(tarurl).read()
soup = BeautifulSoup(url)
for line in soup.find_all('a'):
    newline = line.get('href')
    try:
        if newline[:4] == "http":
            if tarurl in newline:
                urls.append(str(newline))
        elif newline[:1] == "/":
            combline = tarurl+newline
            urls.append(str(combline))
    except:
        pass

for uurl in urls:
    url = urllib2.urlopen(uurl).read()
    soup = BeautifulSoup(url)
    for line in soup.find_all('a'):
        newline = line.get('href')
        try:
            if newline[:4] == "http":
                if tarurl in newline:
                    urls2.append(str(newline))
            elif newline[:1] == "/":
                combline = tarurl+newline
                urls2.append(str(combline))
        except:
            pass

urls3 = set(urls + urls2)
for value in urls3:
    print value

How it works…

We first import the necessary libraries and create two empty lists called urls and urls2. These will allow us to run through the spidering process twice. Next, we take the target URL as a command-line argument, so the script will be run like this:

$ python spider.py http://www.packtpub.com

We then open the provided URL and pass the contents to the BeautifulSoup tool:

url = urllib2.urlopen(tarurl).read() 
soup = BeautifulSoup(url) 

The beautifulsoup tool splits the content into parts and allows us to only pull the parts that we want to:

for line in soup.find_all('a'):
    newline = line.get('href')

We then pull all of the content that is marked as an a tag in HTML and grab the href attribute within that tag. This allows us to grab all the URLs listed in the page.

The next section handles relative and absolute links. If a link is relative, it starts with a slash to indicate that it is a page hosted locally to the web server. If a link is absolute, it contains the full address including the domain. What we do with the following code is ensure that we can, as external users, open all the links we find and list them as absolute links:

if newline[:4] == "http":
    if tarurl in newline:
        urls.append(str(newline))
elif newline[:1] == "/":
    combline = tarurl+newline
    urls.append(str(combline))

We then repeat the process once more with the urls list that we identified from that page by iterating through each element in the original url list:

for uurl in urls:

Other than a change in the referenced lists and variables, the code remains the same.

We combine the two lists and finally, for ease of output, we take the full list of the urls list and turn it into a set. This removes duplicates from the list and allows us to output it neatly. We iterate through the values in the set and output them one by one.

There's more…

This tool can be tied in with any of the functionality shown earlier and later in this book. It can be tied to the Getting screenshots of websites with QtWebKit recipe to allow you to take screenshots of every page. You can tie it to the email address finder in Chapter 2, Enumeration, to gain email addresses from every page, or you can find another use for this simple technique to map web pages.

The script can be easily changed to add in levels of depth, going from the current level of 2 links deep to any value set by a system argument. The output can be changed to add in URLs present on each page, or to turn it into a CSV to allow you to map vulnerabilities to pages for easy notation. A sketch of the depth change follows.
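
Here is one possible shape for that change (a sketch, not the book's code; the second system argument and the variable names are illustrative). It takes the depth from the command line and repeats the crawl level by level, feeding each level's new URLs into the next:

depth = int(sys.argv[2])  # e.g. python spider.py http://www.packtpub.com 3

found = set([tarurl])
current = [tarurl]
for level in range(depth):
    nextlevel = []
    for uurl in current:
        try:
            soup = BeautifulSoup(urllib2.urlopen(uurl).read())
        except:
            continue
        for line in soup.find_all('a'):
            newline = line.get('href')
            if not newline:
                continue
            if newline[:4] == "http" and tarurl in newline:
                link = str(newline)
            elif newline[:1] == "/":
                link = tarurl + newline
            else:
                continue
            if link not in found:
                found.add(link)
                nextlevel.append(link)
    current = nextlevel

for value in found:
    print value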


Description

This book is for testers looking for quick access to powerful, modern tools and customizable scripts to kick-start the creation of their own Python web penetration testing toolbox.


What you will learn

  • Enumerate users on web apps through Python
  • Develop complicated header-based attacks through Python
  • Deliver multiple XSS strings and check their execution success
  • Handle outputs from multiple tools and create attractive reports
  • Create PHP pages that test scripts and tools
  • Identify parameters and URLs vulnerable to Directory Traversal
  • Replicate existing tool functionality in Python
  • Create basic dialback Python scripts using reverse shells and basic Python PoC malware
Estimated delivery fee (to South Africa):

Standard delivery (10 - 13 business days): $12.95

Premium delivery (3 - 6 business days): $34.95 (includes tracking information)

Product Details

Publication date : Jun 24, 2015
Length : 224 pages
Edition : 1st
Language : English
ISBN-13 : 9781784392932


Packt Subscriptions

See our plans and pricing
$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

Frequently bought together

Learning Penetration Testing with Python
$54.99
Kali Linux: Wireless Penetration Testing Beginner's Guide, Second Edition
$48.99
Python Web Penetration Testing Cookbook
$48.99
Total: $152.97

Table of Contents

10 Chapters
1. Gathering Open Source Intelligence
2. Enumeration
3. Vulnerability Identification
4. SQL Injection
5. Web Header Manipulation
6. Image Analysis and Manipulation
7. Encryption and Encoding
8. Payloads and Shells
9. Reporting
Index

Customer reviews

Rating distribution: 3.5 (4 Ratings)
5 star: 50%
4 star: 0%
3 star: 0%
2 star: 50%
1 star: 0%

Perry Nally, Aug 26, 2015 (5 stars)
Overall an excellent read! Easy-to-follow scripts presented from the point of view of a hacker (including subtle remarks toward those that use these techniques for ill-fated purposes). Cameron presents an idea, shows the python script and the corresponding source contents that the script works against, describes the script's steps, and then goes on to describe additional related things about the script. You could say that each script could build upon each other, but that's not totally true. The author makes sure they are really cut-and-paste recipes. He gives you the recipe and often a way to include it in a bigger, more comprehensive script that builds upon each step as the book progresses. Being a cookbook, there are plenty of code examples for you to try out. This is not a book about theory, but rather implementation, so all the fluff is cut out and it gets right to the point.

The book also focuses most of the direct web page vulnerability testing (2-3 chapters) on php script as the web pages' source. It would have been nice to have a corresponding discussion related to aspx, jsp, etc. There is some discussion of technology other than php, and I get that the book would have probably doubled in size if more common page source was discussed, but it is something to think about when reading. Create the same page in aspx or jsp and test whether there is a similar vulnerability.

Don't worry though, there are plenty of scripts related to SQL injection, header processing, encryption, encoding, payloads, shells, and even how to report your findings. These items are not necessarily exclusive to a single technology, so you are not pigeon-holed into testing only a certain type of website/server.

This book is not about learning python, so if you're new to it and you really want to understand how to manipulate each recipe, then I suggest searching for a beginner python book. That being said, most average-level programmers can understand the scripts presented without needing to reach out for a python book/video.

Being in the industry for over 15 years, I've seen a lot of tools you can buy off the shelf that tout the ability to do this same thing just by running a program. I think knowing what the tool actually does is key to really understanding your vulnerabilities, rather than trusting someone else's process, because attack vectors change all the time, and with this information you can easily change your scripts.

In conclusion, this book is perfect for a web application developer wanting to test her application, or an IT person ready to see just how vulnerable their application is, all with the ability to report the findings to those who need to know where to plug the holes. This is a book I will be referring to during and after each project I work on.
Amazon Verified review

Denver Water - Dawson, Mar 04, 2016 (5 stars)
Wonderful published book. Great vendor!!
Amazon Verified review

Always Helpful, Oct 30, 2015 (2 stars)
I had high hopes for this, but the code only works on certain systems and a lot of the examples are of a poor quality. If you compare this with other books on the market, then it leaves a lot to be desired!
Amazon Verified review

Jeff Char, May 24, 2016 (2 stars)
So many of the examples don't work; you have to search online to fix them.
Amazon Verified review

FAQs

What is the delivery time and cost of the print book?

Shipping Details

USA:


Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is a customs duty/charge?

Customs duties are charges levied on goods when they cross international borders: taxes imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries that are listed under the EU27 will not bear customs charges; these are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A customs duty or localized taxes may be applicable to shipments going to recipient countries outside of the EU27. These are charged by the recipient country, should be paid by the customer, and are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin, and several other factors, such as the total invoice amount, dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (that is, where Packt Publishing agrees to replace your printed book because it arrives damaged or with a material defect); otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work, or is unacceptably late, please contact our Customer Relations Team at customercare@packt.com with the order number and issue details, as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items (damaged, defective, or incorrect).
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team at customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage, and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal