Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering Python High Performance

You're reading from   Mastering Python High Performance Learn how to optimize your code and Python performance with this vital guide to Python performance profiling and benchmarking

Arrow left icon
Product type Paperback
Published in Sep 2015
Publisher
ISBN-13 9781783989300
Length 260 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Fernando Donglio Fernando Donglio
Author Profile Icon Fernando Donglio
Fernando Donglio
Arrow right icon
View More author details
Toc

The initial code base


Let's now list all of the code that we'll optimize in future, based on the earlier description.

The first of the following points is quite simple: a single file script that takes care of scraping and saving in JSON format like we discussed earlier. The flow is simple, and the order is as follows:

  1. It will query the list of questions page by page.

  2. For each page, it will gather the question's links.

  3. Then, for each link, it will gather the information listed from the previous points.

  4. It will move on to the next page and start over again.

  5. It will finally save all of the data into a JSON file.

The code is as follows:

from bs4 import BeautifulSoup
import requests
import json


SO_URL = "http://scifi.stackexchange.com"
QUESTION_LIST_URL = SO_URL + "/questions"
MAX_PAGE_COUNT = 20

global_results = []
initial_page = 1 #first page is page 1

def get_author_name(body):
  link_name = body.select(".user-details a")
  if len(link_name) == 0:
    text_name = body.select(".user-details")
...
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image