There are situations where you need to create only one instance of data throughout the lifetime of a program: a class instance, a list, or a dictionary, for example. Creating a second instance is undesirable because it can lead to logical errors or a malfunctioning program. The design pattern that guarantees only one instance is called singleton. In this article, you will learn about module-level, classic, and borg singletons: how they work and when to use them. You will also build a two-threaded web crawler that uses a singleton to access a shared resource.
A singleton is the best candidate when you need to control access to a shared resource and guarantee a single point of access to it. Typical use cases include application-wide logging, configuration objects, caches, and connection pools.
There are several ways to implement singletons. We will look at the module-level singleton, the classic singleton, and the borg singleton.
All modules are singletons by nature because of the way Python imports them: on the first import, Python executes the module and caches the resulting module object in sys.modules; every subsequent import of the same module simply returns that cached object.
So, if you want to quickly make a singleton, use a plain module and keep the shared data as module attributes, as in the following example.
singletone.py:
only_one_var = "I'm only one var"
module1.py:
import singletone
print singletone.only_one_var
singletone.only_one_var += " after modification"
import module2
module2.py:
import singletone
print singletone.only_one_var
Here, module1 imports the singletone module and modifies its only_one_var attribute; when module2 imports singletone afterwards, it sees the modified value.
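The behavior of the two modules above can be simulated in a single file. This is only a sketch: singletone_demo is a hypothetical module object built by hand and registered in sys.modules, which is exactly what a real import of singletone.py would do.

```python
import sys
import types

# Build a module object by hand and register it, simulating singletone.py
mod = types.ModuleType('singletone_demo')
mod.only_one_var = "I'm only one var"
sys.modules['singletone_demo'] = mod

# Both imports return the very same cached module object
import singletone_demo as first
import singletone_demo as second
assert first is second

# A change made through one reference is visible through the other,
# just as module2 sees the change made by module1
first.only_one_var += " after modification"
print(second.only_one_var)
```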
This approach is quick and sometimes is all that you need; however, a module gives you no control over instantiation, nothing prevents client code from rebinding the shared attributes, and you cannot use inheritance or add behavior to the shared state.
In classic singleton in Python, we check whether an instance is already created. If it is created, we return it; otherwise, we create a new instance, assign it to a class attribute, and return it.
Let's try to create a dedicated singleton class:
class Singleton(object):
    def __new__(cls):
        # Check the class's own __dict__, not inherited attributes,
        # so that a subclass does not reuse the parent's instance
        if 'instance' not in cls.__dict__:
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance
Here, we override the special __new__ method, which Python calls before __init__, to check whether the class has already created an instance. If not, we create one, store it on the class, and return it; otherwise, we return the previously created instance.
Let's check how it works:
>>> singleton = Singleton()
>>> another_singleton = Singleton()
>>> singleton is another_singleton
True
>>> singleton.only_one_var = "I'm only one var"
>>> another_singleton.only_one_var
"I'm only one var"
Now let's try to subclass the Singleton class:
class Child(Singleton):
    pass
Since Child inherits from Singleton, one might expect its instances to be the same singleton and share its state. But this doesn't work, as the following code illustrates:
>>> child = Child()
>>> child is singleton
False
>>> child.only_one_var
AttributeError: 'Child' object has no attribute 'only_one_var'
To avoid this situation, the borg singleton is used.
Borg is also known as monostate. In the borg pattern, all of the instances are different, but they share the same state.
In the following code, the shared state is kept in the _shared_state class attribute, and every new instance of the Borg class gets that state assigned in the __new__ method.
class Borg(object):
    _shared_state = {}

    def __new__(cls, *args, **kwargs):
        obj = super(Borg, cls).__new__(cls)
        obj.__dict__ = cls._shared_state
        return obj
Generally, Python stores an instance's state in its __dict__ dictionary, and normally every instance gets its own. Here, we deliberately replace each new instance's __dict__ with the class-level _shared_state dictionary, so all instances read and write the same state.
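A quick way to see that this assignment works (repeating the Borg class from above) is to check that every instance holds the very same dictionary object:

```python
class Borg(object):
    _shared_state = {}

    def __new__(cls, *args, **kwargs):
        obj = super(Borg, cls).__new__(cls)
        obj.__dict__ = cls._shared_state
        return obj

a = Borg()
b = Borg()

# The instances are distinct objects...
assert a is not b
# ...but their state lives in one shared dictionary
assert a.__dict__ is b.__dict__ is Borg._shared_state

a.only_one_var = "I'm the only one var"
print(b.only_one_var)
```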
Here is how it works with subclassing:
class Child(Borg):
    pass
>>> borg = Borg()
>>> another_borg = Borg()
>>> borg is another_borg
False
>>> child = Child()
>>> borg.only_one_var = "I'm the only one var"
>>> child.only_one_var
"I'm the only one var"
So, despite the fact that you can't compare the objects by identity using the is operator, all child objects share the parent's state.
If you want to have a class that is a descendant of the Borg class but has a different state, you can reset _shared_state as follows:
class AnotherChild(Borg):
    _shared_state = {}
>>> another_child = AnotherChild()
>>> another_child.only_one_var
AttributeError: 'AnotherChild' object has no attribute 'only_one_var'
Which type of singleton should be used is up to you. If you expect that your singleton will not be inherited, you can choose the classic singleton; otherwise, it's better to stick with borg.
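The difference can be summarized in a few lines. This sketch uses the classic singleton with the __dict__ check so that each subclass gets its own instance:

```python
class Singleton(object):
    def __new__(cls):
        # Check the class's own __dict__ so a subclass does not
        # silently reuse the parent's instance
        if 'instance' not in cls.__dict__:
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance

class Borg(object):
    _shared_state = {}

    def __new__(cls):
        obj = super(Borg, cls).__new__(cls)
        obj.__dict__ = cls._shared_state
        return obj

class ChildSingleton(Singleton):
    pass

class ChildBorg(Borg):
    pass

# Classic: a subclass gets its own instance and its own state
assert ChildSingleton() is not Singleton()

# Borg: instances differ by identity but share the parent's state
Borg().only_one_var = "I'm the only one var"
assert ChildBorg() is not Borg()
assert ChildBorg().only_one_var == "I'm the only one var"
```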
As a practical example, we'll create a simple web crawler that scans the website you point it at, follows all the links that lead to other pages of the same website, and downloads all of the images it finds.
To do this, we'll need two functions: one that scans a website for links leading to other pages, building a set of pages to visit, and one that scans a page for images and downloads them.
To make it quicker, we'll download images in two threads. These threads should not interfere with each other: a page must not be scanned if another thread has already scanned it, and an image must not be downloaded twice.
So, a set with downloaded images and scanned web pages will be a shared resource for our application, and we'll keep it in a singleton instance.
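Note that the classic singleton's check-then-create logic in __new__ is itself a race window when called from several threads at once. A lock-guarded variant closes it; this is a sketch, and ThreadSafeSingleton is a hypothetical name, not part of the crawler code below:

```python
import threading

class ThreadSafeSingleton(object):
    _lock = threading.Lock()

    def __new__(cls):
        # Hold the lock so two threads cannot both find no instance
        # and each create one
        with cls._lock:
            if 'instance' not in cls.__dict__:
                cls.instance = super(ThreadSafeSingleton, cls).__new__(cls)
        return cls.instance

instances = []

def make():
    instances.append(ThreadSafeSingleton())

threads = [threading.Thread(target=make) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread got the very same object
assert all(obj is instances[0] for obj in instances)
```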
In this example, you will need BeautifulSoup, a library for parsing and screen-scraping websites, and httplib2, an HTTP client library. Installing both with pip should be sufficient:

$ pip install BeautifulSoup httplib2
First of all, we'll create a Singleton class. Let's use the classic singleton in this example:
import httplib2
import os
import re
import threading
import urllib
from urlparse import urlparse, urljoin
from BeautifulSoup import BeautifulSoup


class Singleton(object):
    def __new__(cls):
        if 'instance' not in cls.__dict__:
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance
It will return the same singleton object to every part of the code that requests it.
Next, we'll create a class for creating a thread. In this thread, we'll download images from the website:
class ImageDownloaderThread(threading.Thread):
    """A thread for downloading images in parallel."""

    def __init__(self, thread_id, name, counter):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        print 'Starting thread ', self.name
        download_images(self.name)
        print 'Finished thread ', self.name
The following function traverses the website using the BFS (breadth-first search) algorithm, finds links, and adds them to a set for further downloading. We can specify the maximum number of links to follow in case the website is too large.
def traverse_site(max_links=10):
    link_parser_singleton = Singleton()

    # While we have pages to parse in the queue
    while link_parser_singleton.queue_to_parse:
        # If we collected enough links to download images, return
        if len(link_parser_singleton.to_visit) == max_links:
            return

        url = link_parser_singleton.queue_to_parse.pop()

        http = httplib2.Http()
        try:
            status, response = http.request(url)
        except Exception:
            continue

        # Skip if not a web page; the content-type header may also
        # carry a charset, for example 'text/html; charset=utf-8'
        if 'text/html' not in status.get('content-type', ''):
            continue

        # Add the link to the queue for downloading images
        link_parser_singleton.to_visit.add(url)
        print 'Added', url, 'to queue'

        bs = BeautifulSoup(response)

        for link in bs.findAll('a'):
            link_url = link.get('href')

            # The <a> tag may not contain an href attribute
            if not link_url:
                continue

            parsed = urlparse(link_url)

            # If the link leads to an external website, skip it
            if parsed.netloc and parsed.netloc != parsed_root.netloc:
                continue

            # Construct a full url from a link which can be relative
            link_url = (parsed.scheme or parsed_root.scheme) + '://' + \
                (parsed.netloc or parsed_root.netloc) + (parsed.path or '')

            # If the link was added previously, skip it
            if link_url in link_parser_singleton.to_visit:
                continue

            # Add the link for further parsing
            link_parser_singleton.queue_to_parse = \
                [link_url] + link_parser_singleton.queue_to_parse
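Stripped of the HTTP and HTML details, the traversal above is a plain breadth-first search. Here is a sketch of the same queue logic over a hypothetical in-memory link graph:

```python
# A hypothetical site: each page maps to the links found on it
links = {
    '/': ['/about', '/docs'],
    '/about': ['/', '/docs'],
    '/docs': ['/tutorial'],
    '/tutorial': [],
}

def traverse_site(root, max_links=10):
    to_visit = []
    # Prepending new links and popping from the end gives FIFO
    # order, that is, breadth-first traversal
    queue_to_parse = [root]
    while queue_to_parse:
        if len(to_visit) == max_links:
            break
        url = queue_to_parse.pop()
        if url in to_visit:
            continue
        to_visit.append(url)
        for link_url in links[url]:
            if link_url in to_visit:
                continue
            queue_to_parse = [link_url] + queue_to_parse
    return to_visit

print(traverse_site('/'))
```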
The following function pops pages from the singleton.to_visit set and downloads the images found on each, saving them to the images directory. Here, we use the singleton to synchronize the shared data, the set of pages to visit, between the two threads:
def download_images(thread_name):
    singleton = Singleton()

    # While there are pages whose images we have not downloaded
    while singleton.to_visit:
        try:
            url = singleton.to_visit.pop()
        except KeyError:
            # The other thread emptied the set between the check and the pop
            break

        http = httplib2.Http()
        print thread_name, 'Starting downloading images from', url

        try:
            status, response = http.request(url)
        except Exception:
            continue

        bs = BeautifulSoup(response)

        # Find all <img> tags
        images = bs.findAll('img')

        for image in images:
            # Get the image source url, which can be absolute or relative
            src = image.get('src')

            # Construct a full url. If the image url is relative,
            # it will be prepended with the webpage domain.
            # If the image url is absolute, it will remain as is
            src = urljoin(url, src)

            # Get the base name, for example 'image.png', to name the file locally
            basename = os.path.basename(src)

            if src not in singleton.downloaded:
                singleton.downloaded.add(src)
                print 'Downloading', src

                # Download the image to the local filesystem
                urllib.urlretrieve(src, os.path.join('images', basename))

        print thread_name, 'finished downloading images from', url
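The urljoin behavior that the comments above rely on can be verified in isolation. The checks below use Python 3's urllib.parse; in the Python 2 code of this example, the same functions live in the urlparse module:

```python
from urllib.parse import urljoin

# A relative source is resolved against the page url
assert urljoin('http://python.org/about/', 'img/logo.png') == \
    'http://python.org/about/img/logo.png'

# A root-relative source is resolved against the domain
assert urljoin('http://python.org/about/', '/img/logo.png') == \
    'http://python.org/img/logo.png'

# An absolute source remains as is
assert urljoin('http://python.org/about/',
               'http://cdn.example.com/logo.png') == \
    'http://cdn.example.com/logo.png'
```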
Our client code is as follows:
if __name__ == '__main__':
    root = 'http://python.org'
    parsed_root = urlparse(root)

    singleton = Singleton()
    singleton.queue_to_parse = [root]
    # A set of urls to download images from
    singleton.to_visit = set()
    # Downloaded images
    singleton.downloaded = set()

    traverse_site()

    # Create the images directory if it does not exist
    if not os.path.exists('images'):
        os.makedirs('images')

    # Create new threads
    thread1 = ImageDownloaderThread(1, "Thread-1", 1)
    thread2 = ImageDownloaderThread(2, "Thread-2", 2)

    # Start the new threads
    thread1.start()
    thread2.start()
Run a crawler using the following command:
$ python crawler.py
The crawler will print which thread downloads images from which page; the exact output varies from run to run because the order in which the threads access the shared resources is not predictable.
If you go to the images directory, you will find the downloaded images there.