Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Modern Python Standard Library Cookbook

You're reading from   Modern Python Standard Library Cookbook Over 100 recipes to fully leverage the features of the standard library in Python

Arrow left icon
Product type Paperback
Published in Aug 2018
Publisher Packt
ISBN-13 9781788830829
Length 366 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Alessandro Molina Alessandro Molina
Author Profile Icon Alessandro Molina
Alessandro Molina
Arrow right icon
View More author details
Toc

Table of Contents (16) Chapters Close

Preface 1. Containers and Data Structures 2. Text Management FREE CHAPTER 3. Command Line 4. Filesystem and Directories 5. Date and Time 6. Read/Write Data 7. Algorithms 8. Cryptography 9. Concurrency 10. Networking 11. Web Development 12. Multimedia 13. Graphical User Interfaces 14. Development Tools 15. Other Books You May Enjoy

Prioritizing entries

Picking the first/top entry of a set of values is a pretty frequent need; this usually means defining one value that has priority over the other and involves sorting.

But sorting can be expensive and re-sorting every time you add an entry to your values is certainly not a very convenient way to pick the first entry out of a set of values with some kind of priority.

How to do it...

Heaps are a perfect match for everything that has priorities, such as a priority queue:

import time
import heapq

class PriorityQueue:
    def __init__(self):
        self._q = []

    def add(self, value, priority=0):
        heapq.heappush(self._q, (priority, time.time(), value))

    def pop(self):
        return heapq.heappop(self._q)[-1]

Then, our PriorityQueue can be used to retrieve entries given a priority:

>>> def f1(): print('hello')
>>> def f2(): print('world')
>>>
>>> pq = PriorityQueue()
>>> pq.add(f2, priority=1)
>>> pq.add(f1, priority=0)
>>> pq.pop()()
hello
>>> pq.pop()()
world

How it works...

PriorityQueue works by storing everything in an heap. Heaps are particularly efficient at retrieving the top/first element of a sorted set without having to actually sort the whole set.

Our priority queue stores all the values in a three-element tuple: priority, time.time(), and value.

The first entry of our tuple is priority (lower is better). In the example, we recorded f1 with a better priority than f2, which ensures than when we use heap.heappop to fetch tasks to process, we get f1 and then f2, so that we end up with the hello world message and not world hello.

The second entry, timestamp, is used to ensure that tasks that have the same priority are processed in their insertion order. The oldest task will be served first as it will have the smallest timestamp.

Then, we have the value itself, which is the function we want call for our task.

There's more...

A very common approach to sorting is to keep a list of entries in a tuple, where the first element is key for which we are sorting and the second element is the value itself.

For a scoreboard, we can keep each player's name and how many points they got:

scores = [(123, 'Alessandro'),
          (143, 'Chris'),
          (192, 'Mark']

Storing those values in tuples works because comparing two tuples is performed by comparing each element of the first tuple with the element in the same index position in the other tuple:

>>> (10, 'B') > (10, 'A')
True
>>> (11, 'A') > (10, 'B')
True

It's very easy to understand what's going on if you think about strings. 'BB' > 'BB' is the same as ('B', 'B') > ('B', 'A'); in the end, a string is just a list of characters.

We can use this property to sort our scores and retrieve the winner of a competition:

>>> scores = sorted(scores)
>>> scores[-1]
(192, 'Mark')

The major problem with this approach is that every time we add an entry to our list, we have to sort it again, or our scoreboard would became meaningless:

>>> scores.append((137, 'Rick'))
>>> scores[-1]
(137, 'Rick')
>>> scores = sorted(scores)
>>> scores[-1]
(192, 'Mark')

This is very inconvenient because it's easy to miss re-sorting somewhere if we have multiple places appending to the list, and sorting the whole list every time can be expensive.

The Python standard library offers a data structure that is a perfect match when we're interested in finding out the winner of a competition.

In the heapq module, we have a fully working implementation of a heap data structure, a particular kind of tree where each parent is smaller than its children. This provides us with a tree that has a very interesting property: the root element is always the smallest one.

And being implemented on top of a list, it means that l[0] is always the smallest element in a heap:

>>> import heapq
>>> l = []
>>> heapq.heappush(l, (192, 'Mark'))
>>> heapq.heappush(l, (123, 'Alessandro'))
>>> heapq.heappush(l, (137, 'Rick'))
>>> heapq.heappush(l, (143, 'Chris'))
>>> l[0]
(123, 'Alessandro')

You might have noticed, by the way, that the heap finds the loser of our tournament, not the winner, and we were interested in finding the best player, with the highest value.

This is a minor problem we can easily solve by storing all scores as negative numbers. If we store each score as * -1, the head of the heap will always be the winner:

>>> l = []
>>> heapq.heappush(l, (-143, 'Chris'))
>>> heapq.heappush(l, (-137, 'Rick'))
>>> heapq.heappush(l, (-123, 'Alessandro'))
>>> heapq.heappush(l, (-192, 'Mark'))
>>> l[0]
(-192, 'Mark')
lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime