Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases now! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Python Web Penetration Testing Cookbook
Python Web Penetration Testing Cookbook

Python Web Penetration Testing Cookbook: Over 60 indispensable Python recipes to ensure you always have the right code on hand for web application testing

Arrow left icon
Profile Icon Dave Mound Profile Icon Cameron Buchanan Profile Icon Benjamin May Profile Icon Terry Ip Profile Icon Andrew Mabbitt +1 more Show less
Arrow right icon
₹800 per month
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.5 (4 Ratings)
Paperback Jun 2015 224 pages 1st Edition
eBook
₹799.99 ₹2919.99
Paperback
₹3649.99
Subscription
Free Trial
Renews at ₹800p/m
Arrow left icon
Profile Icon Dave Mound Profile Icon Cameron Buchanan Profile Icon Benjamin May Profile Icon Terry Ip Profile Icon Andrew Mabbitt +1 more Show less
Arrow right icon
₹800 per month
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.5 (4 Ratings)
Paperback Jun 2015 224 pages 1st Edition
eBook
₹799.99 ₹2919.99
Paperback
₹3649.99
Subscription
Free Trial
Renews at ₹800p/m
eBook
₹799.99 ₹2919.99
Paperback
₹3649.99
Subscription
Free Trial
Renews at ₹800p/m

What do you get with a Packt Subscription?

Free for first 7 days. ₹800 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

Python Web Penetration Testing Cookbook

Chapter 1. Gathering Open Source Intelligence

In this chapter, we will cover the following topics:

  • Gathering information using the Shodan API
  • Scripting a Google+ API search
  • Downloading profile pictures using the Google+ API
  • Harvesting additional results using the Google+ API pagination
  • Getting screenshots of websites using QtWebKit
  • Screenshots based on port lists
  • Spidering websites

Introduction

Open Source Intelligence (OSINT) is the process of gathering information from Open (overt) sources. When it comes to testing a web application, that might seem a strange thing to do. However, a great deal of information can be learned about a particular website before even touching it. You might be able to find out what server-side language the website is written in, the underpinning framework, or even its credentials. Learning to use APIs and scripting these tasks can make the bulk of the gathering phase a lot easier.

In this chapter, we will look at a few of the ways we can use Python to leverage the power of APIs to gain insight into our target.

Gathering information using the Shodan API

Shodan is essentially a vulnerability search engine. By providing it with a name, an IP address, or even a port, it returns all the systems in its databases that match. This makes it one of the most effective sources for intelligence when it comes to infrastructure. It's like Google for internet-connected devices. Shodan constantly scans the Internet and saves the results into a public database. Whilst this database is searchable from the Shodan website (https://www.shodan.io), the results and services reported on are limited, unless you access it through the Application Programming Interface (API).

Our task for this section will be to gain information about the Packt Publishing website by using the Shodan API.

Getting ready

At the time of writing this, Shodan membership is $49, and this is needed to get an API key. If you're serious about security, access to Shodan is invaluable.

If you don't already have an API key for Shodan, visit www.shodan.io/store/member and sign up for it. Shodan has a really nice Python library, which is also well documented at https://shodan.readthedocs.org/en/latest/.

To get your Python environment set up to work with Shodan, all you need to do is simply install the library using cheeseshop:

$ easy_install shodan

How to do it…

Here's the script that we are going to use for this task:

import shodan
import requests

SHODAN_API_KEY = "{Insert your Shodan API key}" 
api = shodan.Shodan(SHODAN_API_KEY)

target = 'www.packtpub.com'

dnsResolve = 'https://api.shodan.io/dns/resolve?hostnames=' + target + '&key=' + SHODAN_API_KEY

try:
    # First we need to resolve our targets domain to an IP
    resolved = requests.get(dnsResolve)
    hostIP = resolved.json()[target]

    # Then we need to do a Shodan search on that IP
    host = api.host(hostIP)
    print "IP: %s" % host['ip_str']
    print "Organization: %s" % host.get('org', 'n/a')
    print "Operating System: %s" % host.get('os', 'n/a')

    # Print all banners
    for item in host['data']:
        print "Port: %s" % item['port']
        print "Banner: %s" % item['data']

    # Print vuln information
    for item in host['vulns']:
        CVE = item.replace('!','')
        print 'Vulns: %s' % item
        exploits = api.exploits.search(CVE)
        for item in exploits['matches']:
            if item.get('cve')[0] == CVE:
                print item.get('description')
except:
    'An error occured'

The preceding script should produce an output similar to the following:

IP: 83.166.169.231
Organization: Node4 Limited
Operating System: None

Port: 443
Banner: HTTP/1.0 200 OK

Server: nginx/1.4.5

Date: Thu, 05 Feb 2015 15:29:35 GMT

Content-Type: text/html; charset=utf-8

Transfer-Encoding: chunked

Connection: keep-alive

Expires: Sun, 19 Nov 1978 05:00:00 GMT

Cache-Control: public, s-maxage=172800

Age: 1765

Via: 1.1 varnish

X-Country-Code: US


Port: 80
Banner: HTTP/1.0 301 https://www.packtpub.com/

Location: https://www.packtpub.com/

Accept-Ranges: bytes

Date: Fri, 09 Jan 2015 12:08:05 GMT

Age: 0

Via: 1.1 varnish

Connection: close

X-Country-Code: US

Server: packt


Vulns: !CVE-2014-0160
The (1) TLS and (2) DTLS implementations in OpenSSL 1.0.1 before 1.0.1g do not properly handle Heartbeat Extension packets, which allows remote attackers to obtain sensitive information from process memory via crafted packets that trigger a buffer over-read, as demonstrated by reading private keys, related to d1_both.c and t1_lib.c, aka the Heartbleed bug.

I've just chosen a few of the available data items that Shodan returns, but you can see that we get a fair bit of information back. In this particular instance, we can see that there is a potential vulnerability identified. We also see that this server is listening on ports 80 and 443 and that according to the banner information, it appears to be running nginx as the HTTP server.

How it works…

  1. Firstly, we set up our static strings within the code; this includes our API key:
    SHODAN_API_KEY = "{Insert your Shodan API key}" 
    target = 'www.packtpub.com'
    
    dnsResolve = 'https://api.shodan.io/dns/resolve?hostnames=' + target + '&key=' + SHODAN_API_KEY
  2. The next step is to create our API object:
    api = shodan.Shodan(SHODAN_API_KEY)
  3. In order to search for information on a host using the API, we need to know the host's IP address. Shodan has a DNS resolver but it's not included in the Python library. To use Shodan's DNS resolver, we simply have to make a GET request to the Shodan DNS Resolver URL and pass it the domain (or domains) we are interested in:
    resolved = requests.get(dnsResolve)
    hostIP = resolved.json()[target] 
  4. The returned JSON data will be a dictionary of domains to IP addresses; as we only have one target in our case, we can simply pull out the IP address of our host using the target string as the key for the dictionary. If you were searching on multiple domains, you would probably want to iterate over this list to obtain all the IP addresses.
  5. Now, we have the host's IP address, we can use the Shodan libraries host function to obtain information on our host. The returned JSON data contains a wealth of information about the host, though in our case we will just pull out the IP address, organization, and if possible the operating system that is running. Then we will loop over all of the ports that were found to be open and their respective banners:
        host = api.host(hostIP)
        print "IP: %s" % host['ip_str']
        print "Organization: %s" % host.get('org', 'n/a')
        print "Operating System: %s" % host.get('os', 'n/a')
    
        # Print all banners
        for item in host['data']:
            print "Port: %s" % item['port']
            print "Banner: %s" % item['data']
  6. The returned data may also contain potential Common Vulnerabilities and Exposures (CVE) numbers for vulnerabilities that Shodan thinks the server may be susceptible to. This could be really beneficial to us, so we will iterate over the list of these (if there are any) and use another function from the Shodan library to get information on the exploit:
    for item in host['vulns']:
            CVE = item.replace('!','')
            print 'Vulns: %s' % item
            exploits = api.exploits.search(CVE)
            for item in exploits['matches']:
                if item.get('cve')[0] == CVE:
                    print item.get('description')

    That's it for our script. Try running it against your own server.

There's more…

We've only really scratched the surface of the Shodan Python library with our script. It is well worth reading through the Shodan API reference documentation and playing around with the other search options. You can filter results based on "facets" to narrow down your searches. You can even use searches that other users have saved using the "tags" search.

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Scripting a Google+ API search

Social media is a great way to gather information on a target company or person. Here, we will be showing you how to script a Google+ API search to find contact information for a company within the Google+ social sites.

Getting ready

Some Google APIs require authorization to access them, but if you have a Google account, getting the API key is easy. Just go to https://console.developers.google.com and create a new project. Click on API & auth | Credentials. Click on Create new key and Server key. Optionally enter your IP or just click on Create. Your API key will be displayed and ready to copy and paste into the following recipe.

How to do it…

Here's a simple script to query the Google+ API:

import urllib2

GOOGLE_API_KEY = "{Insert your Google API key}" 
target = "packtpub.com"
api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY).read()
api_response = api_response.split("\n")
for line in api_response:
    if "displayName" in line:
        print line

How it works…

The preceding code makes a request to the Google+ search API (authenticated with your API key) and searches for accounts matching the target; packtpub.com. Similarly to the preceding Shodan script, we set up our static strings including the API key and target:

GOOGLE_API_KEY = "{Insert your Google API key}" 
target = "packtpub.com"

The next step does two things: first, it sends the HTTP GET request to the API server, then it reads in the response and stores the output into an api_response variable:

api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY).read()

This request returns a JSON formatted response; an example snippet of the results is shown here:

How it works…

In our script, we convert the response into a list so it's easier to parse:

api_response = api_response.split("\n")

The final part of the code loops through the list and prints only the lines that contain displayName, as shown here:

How it works…

See also…

In the next recipe, Downloading profile pictures using the Google+ API, we will look at improving the formatting of these results.

There's more…

By starting with a simple script to query the Google+ API, we can extend it to be more efficient and make use of more of the data returned. Another key aspect of the Google+ platform is that users may also have a matching account on another of Google's services, which means you can cross-reference accounts. Most Google products have an API available to developers, so a good place to start is https://developers.google.com/products/. Grab an API key and plug the output from the previous script into it.

Downloading profile pictures using the Google+ API

Now that we have established how to use the Google+ API, we can design a script to pull down pictures. The aim here is to put faces to names taken from web pages. We will send a request to the API through a URL, handle the response through JSON, and create picture files in the working directory of the script.

How to do it

Here's a simple script to download profile pictures using the Google+ API:

import urllib2
import json

GOOGLE_API_KEY = "{Insert your Google API key}"
target = "packtpub.com"
api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY).read()

json_response = json.loads(api_response)
for result in json_response['items']:
      name = result['displayName']
      print name
      image = result['image']['url'].split('?')[0]
  f = open(name+'.jpg','wb+')
  f.write(urllib2.urlopen(image).read())
  f.close()

How it works

The first change is to store the display name into a variable, as this is then reused later on:

      name = result['displayName']
      print name

Next, we grab the image URL from the JSON response:

image = result['image']['url'].split('?')[0]

The final part of the code does a number of things in three simple lines: firstly it opens a file on the local disk, with the filename set to the name variable. The wb+ flag here indicates to the OS that it should create the file if it doesn't exist and to write the data in a raw binary format. The second line makes a HTTP GET request to the image URL (stored in the image variable) and writes the response into the file. Finally, the file is closed to free system memory used to store the file contents:

  f = open(name+'.jpg','wb+')
  f.write(urllib2.urlopen(image).read())
  f.close()

After the script is run, the console output will be the same as before, with the display names shown. However, your local directory will now also contain all the profile images, saved as JPEG files.

Harvesting additional results from the Google+ API using pagination

By default, the Google+ APIs return a maximum of 25 results, but we can extend the previous scripts by increasing the maximum value and harvesting more results through pagination. As before, we will communicate with the Google+ API through a URL and the urllib library. We will create arbitrary numbers that will increase as requests go ahead, so we can move across pages and gather more results.

How to do it

The following script shows how you can harvest additional results from the Google+ API:

import urllib2
import json

GOOGLE_API_KEY = "{Insert your Google API key}"
target = "packtpub.com"
token = ""
loops = 0

while loops < 10:
  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50& pageToken="+token).read()

  json_response = json.loads(api_response)
  token = json_response['nextPageToken']

  if len(json_response['items']) == 0:
    break

  for result in json_response['items']:
        name = result['displayName']
        print name
        image = result['image']['url'].split('?')[0]
    f = open(name+'.jpg','wb+')
    f.write(urllib2.urlopen(image).read())
  loops+=1

How it works

The first big change in this script that is the main code has been moved into a while loop:

token = ""
loops = 0

while loops < 10:

Here, the number of loops is set to a maximum of 10 to avoid sending too many requests to the API servers. This value can of course be changed to any positive integer. The next change is to the request URL itself; it now contains two additional trailing parameters maxResults and pageToken. Each response from the Google+ API contains a pageToken value, which is a pointer to the next set of results. Note that if there are no more results, a pageToken value is still returned. The maxResults parameter is self-explanatory, but can only be increased to a maximum of 50:

  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50& pageToken="+token).read()

The next part reads the same as before in the JSON response, but this time it also extracts the nextPageToken value:

  json_response = json.loads(api_response)
  token = json_response['nextPageToken']

The main while loop can stop if the loops variable increases up to 10, but sometimes you may only get one page of results. The next part in the code checks to see how many results were returned; if there were none, it exits the loop prematurely:

  if len(json_response['items']) == 0:
    break

Finally, we ensure that we increase the value of the loops integer each time. A common coding mistake is to leave this out, meaning the loop will continue forever:

  loops+=1

Getting screenshots of websites with QtWebKit

They say a picture is worth a thousand words. Sometimes, it's good to get screenshots of websites during the intelligence gathering phase. We may want to scan an IP range and get an idea of which IPs are serving up web pages, and more importantly what they look like. This could assist us in picking out interesting sites to focus on and we also might want to quickly scan ports on a particular IP address for the same reason. We will take a look at how we can accomplish this using the QtWebKit Python library.

Getting ready

The QtWebKit is a bit of a pain to install. The easiest way is to get the binaries from http://www.riverbankcomputing.com/software/pyqt/download. For Windows users, make sure you pick the binaries that fit your python/arch path. For example, I will use the PyQt4-4.11.3-gpl-Py2.7-Qt4.8.6-x32.exe binary to install Qt4 on my Windows 32bit Virtual Machine that has Python version 2.7 installed. If you are planning on compiling Qt4 from the source files, make sure you have already installed SIP.

How to do it…

Once you've got PyQt4 installed, you're pretty much ready to go. The following script is what we will use as the base for our screenshot class:

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class Screenshot(QWebView):
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def wait_load(self, delay=0):
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True

    def get_image(self, url):
        self.load(QUrl(url))
        self.wait_load()

        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())

        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        return image

Create the preceding script and save it in the Python Lib folder. We can then reference it as an import in our scripts.

How it works…

The script makes use of QWebView to load the URL and then creates an image using QPainter. The get_image function takes a single parameter: our target. Knowing this, we can simply import it into another script and expand the functionality.

Let's break down the script and see how it works.

Firstly, we set up our imports:

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

Then, we create our class definition; the class we are creating extends from QWebView by inheritance:

class Screenshot(QWebView):

Next, we create our initialization method:

def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

def wait_load(self, delay=0):
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

def _loadFinished(self, result):
        self._loaded = True

The initialization method sets the self.__loaded property. This is used along with the __loadFinished and wait_load functions to check the state of the application as it runs. It waits until the site has loaded before taking a screenshot. The actual screenshot code is contained in the get_image function:

def get_image(self, url):
        self.load(QUrl(url))
        self.wait_load()

        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())

        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        return image

Within this get_image function, we set the size of the viewport to the size of the contents within the main frame. We then set the image format, assign the image to a painter object, and then render the frame using the painter. Finally, we return the processed image.

There's more…

To use the class we've just made, we just import it into another script. For example, if we wanted to just save the image we get back, we could do something like the following:

import screenshot
s = screenshot.Screenshot()
image = s.get_image('http://www.packtpub.com')
image.save('website.png')

That's all there is to it. In the next script, we will create something a little more useful.

Screenshots based on a port list

In the previous script, we created our base function to return an image for a URL. We will now expand on that to loop over a list of ports that are commonly associated with web-based administration portals. This will allow us to point the script at an IP and automatically run through the possible ports that could be associated with a web server. This is to be used in cases when we don't know which ports are open on a server, rather than when where we are specifying the port and domain.

Getting ready

In order for this script to work, we'll need to have the script created in the Getting screenshots of a website with QtWeb Kit recipe. This should be saved in the Pythonxx/Lib folder and named something clear and memorable. Here, we've named that script screenshot.py. The naming of your script is particularly essential as we reference it with an important declaration.

How to do it…

This is the script that we will be using:

import screenshot
import requests

portList = [80,443,2082,2083,2086,2087,2095,2096,8080,8880,8443,9998,4643, 9001,4489]

IP = '127.0.0.1'

http = 'http://'
https = 'https://'

def testAndSave(protocol, portNumber):
    url = protocol + IP + ':' + str(portNumber)
    try:
        r = requests.get(url,timeout=1)

        if r.status_code == 200:
            print 'Found site on ' + url 
            s = screenshot.Screenshot()
            image = s.get_image(url)
            image.save(str(portNumber) + '.png')
    except:
        pass

for port in portList:
    testAndSave(http, port)
    testAndSave(https, port)

How it works…

We first create our import declarations. In this script, we use the screenshot script we created before and also the requests library. The requests library is used so that we can check the status of a request before trying to convert it to an image. We don't want to waste time trying to convert sites that don't exist.

Next, we import our libraries:

import screenshot
import requests

The next step sets up the array of common port numbers that we will be iterating over. We also set up a string with the IP address we will be using:

portList = [80,443,2082,2083,2086,2087,2095,2096,8080,8880,8443,9998,4643, 9001,4489]

IP = '127.0.0.1'

Next, we create strings to hold the protocol part of the URL that we will be building later; this just makes the code later on a little bit neater:

http = 'http://'
https = 'https://'

Next, we create our method, which will do the work of building the URL string. After we've created the URL, we check whether we get a 200 response code back for our get request. If the request is successful, we convert the web page returned to an image and save it with the filename being the successful port number. The code is wrapped in a try block because if the site doesn't exist when we make the request, it will throw an error:

def testAndSave(protocol, portNumber):
    url = protocol + IP + ':' + str(portNumber)
    try:
        r = requests.get(url,timeout=1)

        if r.status_code == 200:
            print 'Found site on ' + url 
            s = screenshot.Screenshot()
            image = s.get_image(url)
            image.save(str(portNumber) + '.png')
    except:
        pass

Now that our method is ready, we simply iterate over each port in the port list and call our method. We do this once for the HTTP protocol and then with HTTPS:

for port in portList:
    testAndSave(http, port)
    testAndSave(https, port)

And that's it. Simply run the script and it will save the images to the same location as the script.

There's more…

You might notice that the script takes a while to run. This is because it has to check each port in turn. In practice, you would probably want to make this a multithreaded script so that it can check multiple URLs at the same time. Let's take a quick look at how we can modify the code to achieve this.

First, we'll need a couple more import declarations:

import Queue
import threading

Next, we need to create a new function that we will call threader. This new function will handle putting our testAndSave functions into the queue:

def threader(q, port):
    q.put(testAndSave(http, port))
    q.put(testAndSave(https, port))

Now that we have our new function, we just need to set up a new Queue object and make a few threading calls. We will take out the testAndSave calls from our FOR loop over the portList variable and replace it with this code:

q = Queue.Queue()

for port in portList:
    t = threading.Thread(target=threader, args=(q, port))
    t.deamon = True
    t.start()

s = q.get()

So, our new script in total now looks like this:

import Queue
import threading
import screenshot
import requests

portList = [80,443,2082,2083,2086,2087,2095,2096,8080,8880,8443,9998,4643, 9001,4489]

IP = '127.0.0.1'

http = 'http://'
https = 'https://'

def testAndSave(protocol, portNumber):
    url = protocol + IP + ':' + str(portNumber)
    try:
        r = requests.get(url,timeout=1)

        if r.status_code == 200:
            print 'Found site on ' + url 
            s = screenshot.Screenshot()
            image = s.get_image(url)
            image.save(str(portNumber) + '.png')
    except:
        pass

def threader(q, port):
    q.put(testAndSave(http, port))
    q.put(testAndSave(https, port))

q = Queue.Queue()

for port in portList:
    t = threading.Thread(target=threader, args=(q, port))
    t.deamon = True
    t.start()

s = q.get()

If we run this now, we will get a much quicker execution of our code as the web requests are now being executed in parallel with each other.

You could try to further expand the script to work on a range of IP addresses too; this can be handy when you're testing an internal network range.

Spidering websites

Many tools provide the ability to map out websites, but often you are limited to style of output or the location in which the results are provided. This base plate for a spidering script allows you to map out websites in short order with the ability to alter them as you please.

Getting ready

In order for this script to work, you'll need the BeautifulSoup library, which is installable from the apt command with apt-get install python-bs4 or alternatively pip install beautifulsoup4. It's as easy as that.

How to do it…

This is the script that we will be using:

import urllib2 
from bs4 import BeautifulSoup
import sys
urls = []
urls2 = []

tarurl = sys.argv[1] 

url = urllib2.urlopen(tarurl).read()
soup = BeautifulSoup(url)
for line in soup.find_all('a'):
    newline = line.get('href')
    try: 
        if newline[:4] == "http": 
            if tarurl in newline: 
            urls.append(str(newline)) 
        elif newline[:1] == "/": 
            combline = tarurl+newline urls.append(str(combline)) except: 
               pass

    for uurl in urls: 
        url = urllib2.urlopen(uurl).read() 
        soup = BeautifulSoup(url) 
        for line in soup.find_all('a'): 
            newline = line.get('href') 
            try: 
                if newline[:4] == "http": 
                    if tarurl in newline:
                        urls2.append(str(newline)) 
                elif newline[:1] == "/": 
                    combline = tarurl+newline 
                    urls2.append(str(combline)) 
                    except: 
                pass 
            urls3 = set(urls2) 
    for value in urls3: 
    print value

How it works…

We first import the necessary libraries and create two empty lists called urls and urls2. These will allow us to run through the spidering process twice. Next, we set up input to be added as an addendum to the script to be run from the command line. It will be run like:

$ python spider.py http://www.packtpub.com

We then open the provided url variable and pass it to the beautifulsoup tool:

url = urllib2.urlopen(tarurl).read() 
soup = BeautifulSoup(url) 

The beautifulsoup tool splits the content into parts and allows us to only pull the parts that we want to:

for line in soup.find_all('a'): 
newline = line.get('href') 

We then pull all of the content that is marked as a tag in HTML and grab the element within the tag specified as href. This allows us to grab all the URLs listed in the page.

The next section handles relative and absolute links. If a link is relative, it starts with a slash to indicate that it is a page hosted locally to the web server. If a link is absolute, it contains the full address including the domain. What we do with the following code is ensure that we can, as external users, open all the links we find and list them as absolute links:

if newline[:4] == "http": 
if tarurl in newline: 
urls.append(str(newline)) 
  elif newline[:1] == "/": 
combline = tarurl+newline urls.append(str(combline))

We then repeat the process once more with the urls list that we identified from that page by iterating through each element in the original url list:

for uurl in urls:

Other than a change in the referenced lists and variables, the code remains the same.

We combine the two lists and finally, for ease of output, we take the full list of the urls list and turn it into a set. This removes duplicates from the list and allows us to output it neatly. We iterate through the values in the set and output them one by one.

There's more…

This tool can be tied in with any of the functionality shown earlier and later in this book. It can be tied to Getting Screenshots of a website with QtWeb Kit to allow you to take screenshots of every page. You can tie it to the email address finder in the Chapter 2, Enumeration, to gain email addresses from every page, or you can find another use for this simple technique to map web pages.

The script can be easily changed to add in levels of depth to go from the current level of 2 links deep to any value set by system argument. The output can be changed to add in URLs present on each page, or to turn it into a CSV to allow you to map vulnerabilities to pages for easy notation.

Left arrow icon Right arrow icon

Description

This book is for testers looking for quick access to powerful, modern tools and customizable scripts to kick-start the creation of their own Python web penetration testing toolbox.

Who is this book for?

This book is for testers looking for quick access to powerful, modern tools and customizable scripts to kick-start the creation of their own Python web penetration testing toolbox.

What you will learn

  • Enumerate users on web apps through Python
  • Develop complicated headerbased attacks through Python
  • Deliver multiple XSS strings and check their execution success
  • Handle outputs from multiple tools and create attractive reports
  • Create PHP pages that test scripts and tools
  • Identify parameters and URLs vulnerable to Directory Traversal
  • Replicate existing tool functionality in Python
  • Create basic dialback Python scripts using reverse shells and basic Python PoC malware

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jun 24, 2015
Length: 224 pages
Edition : 1st
Language : English
ISBN-13 : 9781784392932
Category :
Languages :

What do you get with a Packt Subscription?

Free for first 7 days. ₹800 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Jun 24, 2015
Length: 224 pages
Edition : 1st
Language : English
ISBN-13 : 9781784392932
Category :
Languages :

Packt Subscriptions

See our plans and pricing
Modal Close icon
₹800 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
₹4500 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just ₹400 each
Feature tick icon Exclusive print discounts
₹5000 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just ₹400 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 11,396.97
Learning Penetration Testing with Python
₹4096.99
Kali Linux: Wireless Penetration Testing Beginner's Guide, Second Edition
₹3649.99
Python Web Penetration Testing Cookbook
₹3649.99
Total 11,396.97 Stars icon

Table of Contents

10 Chapters
1. Gathering Open Source Intelligence Chevron down icon Chevron up icon
2. Enumeration Chevron down icon Chevron up icon
3. Vulnerability Identification Chevron down icon Chevron up icon
4. SQL Injection Chevron down icon Chevron up icon
5. Web Header Manipulation Chevron down icon Chevron up icon
6. Image Analysis and Manipulation Chevron down icon Chevron up icon
7. Encryption and Encoding Chevron down icon Chevron up icon
8. Payloads and Shells Chevron down icon Chevron up icon
9. Reporting Chevron down icon Chevron up icon
Index Chevron down icon Chevron up icon

Customer reviews

Rating distribution
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.5
(4 Ratings)
5 star 50%
4 star 0%
3 star 0%
2 star 50%
1 star 0%
Perry Nally Aug 26, 2015
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Overall an excellent read! Easy to follow scripts presented in the point of view as a hacker (including subtle remarks toward those that use these techniques for ill-fated purposes). Cameron presents an idea, shows the python script and corresponding source contents for which this script works against, then describes the scripts steps, and then goes on to describe additional related things about this script. You could say that each script could build upon each other, but that's not totally true. The author makes sure they are really cut and paste recipes. He gives you the recipe and often a way to include it into a bigger, more comprehensive script - that builds upon each step as the book progresses. Being a cookbook, there is plenty of code examples for you to try out. This is not a book about theory, but rather implementation - so all the fluff is cut out and it gets right to the point.The book also focuses most of the direct web page vulnerability testing (2-3 chapters) at php script as the web pages' source. This would have been nice to have a corresponding discussion related to aspx, jsp, etc. There is some discussion of other technology other than php, and I get that the book would have probably doubled in size if more common page source was discussed, but it is something to think about when reading. Create the same page in aspx or jsp and attempt if there is a similar vulnerability.Don't worry though, there's plenty of scripts related to SQL injection, header processing, encryption, encoding, payloads, shells, and even how to report your findings. These items are not necessarily exclusive to a single technology, so you are not pigeon-holed into testing only a certain type of website/server.This book is not about learning python so if you're new to it, and you really want to understand how to manipulate each recipe, then I suggest searching for a beginner python book. However, that being said, most average level programmers can understand the scripts presented without needing to reach out for a python book/video.Being in the industry for over 15 years, I've seen a lot of tools you can buy off the shelf that tout the ability to do this same thing just by running a program. I think knowing what it actually does is key to really understanding your vulnerabilities rather than trusting someone else's process because after all, attack vectors change all the time and with this information you can easily change your scripts.In conclusion, this book is perfect for a web application developer wanting to test her application or an IT person ready to see just how vulnerable their application is - all with the ability to report the findings to those who need to know where to plug the holes. This is a book I will be referring to during and after each project I work on.
Amazon Verified review Amazon
Denver Water - Dawson Mar 04, 2016
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Wonderful published book. Great vendor!!
Amazon Verified review Amazon
Always Helpful Oct 30, 2015
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2
I had high hopes for this but the code only works on certain system and a lot of the examples are of a poor quality.If you compare this with other books on the market then it leaves a lot to be desired!
Amazon Verified review Amazon
Jeff Char May 24, 2016
Full star icon Full star icon Empty star icon Empty star icon Empty star icon 2
So many of examples don't work, have to search online to fix.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.