Effective Python Penetration Testing

You're reading from Effective Python Penetration Testing: Pen test your system like a pro and overcome vulnerabilities by leveraging Python scripts, libraries, and tools

Product type: Paperback
Published in: Jun 2016
Publisher: Packt
ISBN-13: 9781785280696
Length: 164 pages
Edition: 1st Edition
Author (1):
Rejah Rehim

Table of Contents (11 chapters)

Preface
1. Python Scripting Essentials
2. Analyzing Network Traffic with Scapy
3. Application Fingerprinting with Python
4. Attack Scripting with Python
5. Fuzzing and Brute-Forcing
6. Debugging and Reverse Engineering
7. Crypto, Hash, and Conversion Functions
8. Keylogging and Screen Grabbing
9. Attack Automation
10. Looking Forward

Web scraping


Although some sites offer APIs, most websites are designed for human readers and provide only HTML pages. If we want a program to fetch data from such a website, we have to parse the markup to get the information we need. Web scraping is the technique of using a program to analyze a web page and extract the data we need.
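As a rough illustration of what "parsing the markup" means, the sketch below pulls every link target out of a small HTML snippet. It uses the standard library's html.parser rather than a dedicated scraping library, purely so that it runs without extra installs. The LinkExtractor class, the extract_links helper, and the sample page are hypothetical names and data made up for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content, standing in for HTML fetched from a site.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Catalogue</h1>
    <p>Browse our <a href="/books">books</a> or read about
       <a href="http://scrapy.org">Scrapy</a>.</p>
    <a name="anchor-without-href">not a link</a>
  </body>
</html>
"""


def extract_links(html):
    """Return the href of each link in the given HTML string, in order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


print(extract_links(SAMPLE_HTML))
```

Real pages are usually messier than this snippet, which is one reason dedicated parsers are popular for scraping work.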

There are several ways to fetch and parse a site's content with Python modules:

  • Use urllib/urllib2 to create an HTTP request that fetches the web page, and use BeautifulSoup to parse the HTML

  • To crawl an entire website, use Scrapy (http://scrapy.org), which helps create web spiders

  • Use the requests module to fetch the page and lxml to parse it

urllib / urllib2 module

urllib is a high-level module that lets us script access to different services such as HTTP, HTTPS, and FTP.
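The following is a minimal sketch of fetching a resource by URL. It assumes Python 3, where the Python 2 urllib and urllib2 modules were merged into the urllib package; the Python 2 counterpart of urlopen lives in urllib2. The fetch helper, its parameters, and the user-agent string are hypothetical names chosen for illustration. The demonstration call uses a data: URL only so that the sketch runs without network access.

```python
from urllib.request import Request, urlopen


def fetch(url, timeout=10.0, user_agent="example-scraper/0.1"):
    """Open the URL and return the response body as text.

    The URL scheme (http, https, ftp, ...) selects the handler that
    urlopen uses; the calling code stays the same.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Offline demonstration: a data: URL carries its own content.
print(fetch("data:text/plain;charset=utf-8,hello"))

# Against a real site the call looks the same (placeholder URL):
# html = fetch("http://example.com/")
```

Setting a timeout is a deliberate choice here: without one, a stalled server can block the script indefinitely.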

Useful methods of urllib/urllib2

urllib/urllib2 provide methods for getting resources from URLs, including opening web pages, encoding arguments, manipulating...
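To make "encoding arguments" concrete, here is a small sketch, assuming Python 3, where urlencode lives in urllib.parse and Request in urllib.request (in Python 2 the equivalents are urllib.urlencode and urllib2.Request). The base URL, paths, field names, and both helper functions are hypothetical. Nothing is sent over the network; the requests are only constructed and inspected.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

BASE_URL = "http://example.com/"  # placeholder site, not from the book


def build_search_request(base_url, path, params):
    """Build a GET request with params encoded into the query string."""
    query = urlencode(params)
    return Request(urljoin(base_url, path) + "?" + query)


def build_form_post(base_url, path, fields):
    """Build a POST request with fields encoded as a form body."""
    body = urlencode(fields).encode("ascii")
    # Supplying data makes urllib treat the request as a POST.
    return Request(urljoin(base_url, path), data=body)


search = build_search_request(BASE_URL, "search", {"q": "web scraping", "page": 2})
print(search.get_method(), search.full_url)

login = build_form_post(BASE_URL, "login", {"user": "alice", "note": "a&b"})
print(login.get_method(), login.data)
```

Letting urlencode handle escaping avoids hand-built query strings, where characters such as spaces and ampersands are easy to get wrong.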
