
Polymorphism and type-pattern matching in Python [Tutorial]

  • 11 min read
  • 13 Aug 2018


Some functional programming languages offer clever approaches to the problem of working with statically typed function definitions. The problem is that many functions we'd like to write are entirely generic with respect to data type. For example, most of our statistical functions are identical for int or float numbers, as long as the division returns a value that is a subclass of numbers.Real (for example, Decimal, Fraction, or float).

In many functional languages, sophisticated type or type-pattern matching rules are used by the compiler to make a single generic definition work for multiple data types. Python doesn't have this problem and doesn't need pattern matching. In this article, we'll look at how to achieve polymorphism and type-pattern matching in Python.

This Python tutorial is an extract taken from the 2nd edition of the bestseller, Functional Python Programming, authored by Steven Lott.


Instead of the (possibly) complex features of statically typed functional languages, Python changes the approach dramatically. Python uses dynamic selection of the final implementation of an operator based on the data types being used. In Python, we always write generic definitions. The code isn't bound to any specific data type. The Python runtime will locate the appropriate operations based on the types of the actual objects in use. The 3.3.7 Coercion rules section of the language reference manual and the numbers module in the library provide details on how this mapping from operation to special method name works.

This means that the compiler doesn't certify that our functions are expecting and producing the proper data types. We generally rely on unit testing and the mypy tool for this kind of type checking.
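As a sketch of this duck-typed style (the mean() function here is illustrative, not from the text, and its type hint is deliberately loose), a single generic definition serves int, float, and Fraction values alike:

```python
from fractions import Fraction
from typing import Iterable

def mean(data: Iterable) -> object:
    # One generic definition; the Python runtime dispatches __add__ and
    # __truediv__ based on the actual types of the items supplied.
    total, count = 0, 0
    for item in data:
        total += item
        count += 1
    return total / count

print(mean([1, 2, 3, 4]))                      # int inputs -> 2.5
print(mean([Fraction(1, 2), Fraction(3, 2)]))  # Fraction inputs -> 1
```

The same body works for every numeric type because the operations are resolved per object, not per declaration.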

In rare cases, we might need to have different behavior based on the types of data elements. We have two ways to tackle this:

  • We can use the isinstance() function to distinguish the different cases
  • We can create our own subclass of numbers.Number or NamedTuple and implement proper polymorphic special method names.


In some cases, we'll actually need to do both so that we can include appropriate data type conversions for each operation. Additionally, we'll also need to use the cast() function to make the types explicit to the mypy tool.
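A minimal sketch combining the two techniques (as_pair() is a hypothetical helper, not from the text): isinstance() selects the runtime behavior, while cast() tells mypy which type the branch has established:

```python
from typing import Any, Tuple, cast

def as_pair(value: Any) -> Tuple[float, float]:
    # Technique 1: isinstance() distinguishes the cases at runtime.
    if isinstance(value, tuple):
        # Technique 2: cast() makes the narrowed type explicit to mypy;
        # it has no runtime effect.
        return cast(Tuple[float, float], value)
    # Data type conversion for the scalar case.
    return (float(value), float(value))

print(as_pair((2.0, 7.0)))  # -> (2.0, 7.0)
print(as_pair(3))           # -> (3.0, 3.0)
```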

The ranking example in the previous section is tightly bound to the idea of applying rank-ordering to simple pairs. While this is the way the Spearman correlation is defined, a multivariate dataset requires rank-order correlations among all of the variables.

The first thing we'll need to do is generalize our idea of rank-order information. The following is a NamedTuple value that handles a tuple of ranks and a raw data object:

from typing import NamedTuple, Tuple, Any

class Rank_Data(NamedTuple):
    rank_seq: Tuple[float, ...]
    raw: Any


A typical use of this kind of class definition is shown in this example:

 >>> data = {'key1': 1, 'key2': 2}
 >>> r = Rank_Data((2, 7), data)
 >>> r.rank_seq[0]
 2
 >>> r.raw
 {'key1': 1, 'key2': 2}


The row of raw data in this example is a dictionary. There are two rankings for this particular item in the overall list. An application can get the sequence of rankings as well as the original raw data item.

We'll add some syntactic sugar to our ranking function. In many previous examples, we've required either an iterable or a concrete collection. The for statement is graceful about working with either one. However, we don't always use the for statement, and for some functions, we've had to explicitly use iter() to make an iterable out of a collection. We can handle this situation with a simple isinstance() check, as shown in the following code snippet:

from typing import Iterator, Sequence, Union

def some_function(seq_or_iter: Union[Sequence, Iterator]) -> Iterator:
    if isinstance(seq_or_iter, Sequence):
        yield from some_function(iter(seq_or_iter))
        return
    # Do the real work of the function using the Iterator


This example includes a type check to handle the small difference between a Sequence object and an Iterator. Specifically, the function uses iter() to create an Iterator from a Sequence, and calls itself recursively with the derived value.

For rank-ordering, the Union[Sequence, Iterator] will be supported. Because the source data must be sorted for ranking, it's easier to use list() to transform a given iterator into a concrete sequence. The essential isinstance() check will be used, but instead of creating an iterator from a sequence (as shown previously), the following examples will create a sequence object from an iterator.
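A sketch of that inverse check (materialize() is a hypothetical name, not from the text): given an Iterator, build a concrete list; given a Sequence, pass it through unchanged:

```python
from typing import Iterator, Sequence, Union

def materialize(seq_or_iter: Union[Sequence, Iterator]) -> Sequence:
    # Inverse of the earlier sketch: guarantee a concrete, sortable sequence.
    if isinstance(seq_or_iter, Iterator):
        return list(seq_or_iter)
    return seq_or_iter

print(materialize(iter((3, 1, 2))))  # -> [3, 1, 2]
print(materialize([3, 1, 2]))        # -> [3, 1, 2]
```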

In the context of our rank-ordering function, we can make the function somewhat more generic. The following two expressions define the inputs:

Source = Union[Rank_Data, Any]
Union[Sequence[Source], Iterator[Source]]


There are four combinations defined by these two types:

  • Sequence[Rank_Data]
  • Sequence[Any]
  • Iterator[Rank_Data]
  • Iterator[Any]

Handling four combination data types


Here's the rank_data() function with three cases for handling the four combinations of data types:

from typing import (
    Any, Callable, Iterable, Iterator,
    Sequence, Tuple, TypeVar, Union, cast
)

K_ = TypeVar("K_")  # Some comparable key type used for ranking.
Source = Union[Rank_Data, Any]

def rank_data(
        seq_or_iter: Union[Sequence[Source], Iterator[Source]],
        key: Callable[[Rank_Data], K_] = lambda obj: cast(K_, obj)
    ) -> Iterable[Rank_Data]:

    if isinstance(seq_or_iter, Iterator):
        # Iterator? Materialize a sequence object.
        yield from rank_data(list(seq_or_iter), key)
        return

    data: Sequence[Rank_Data]
    if isinstance(seq_or_iter[0], Rank_Data):
        # Collection of Rank_Data is what we prefer.
        data = seq_or_iter
    else:
        # Convert to Rank_Data and process.
        empty_ranks: Tuple[float, ...] = cast(Tuple[float, ...], ())
        data = list(
            Rank_Data(empty_ranks, raw_data)
            for raw_data in cast(Sequence[Source], seq_or_iter)
        )

    for r, rd in rerank(data, key):
        new_ranks = cast(
            Tuple[float, ...],
            rd.rank_seq + cast(Tuple[float, ...], (r,)))
        yield Rank_Data(new_ranks, rd.raw)


We've decomposed the ranking into three cases to cover the four different types of data. The following are the cases defined by the union of unions:

  • Given an Iterator (an object without a usable __getitem__() method), we'll materialize a list object to work with. This will work for Rank_Data as well as any other raw data type. This case covers objects which are Iterator[Rank_Data] as well as Iterator[Any].
  • Given a Sequence[Any], we'll wrap the unknown objects into Rank_Data tuples with an empty collection of rankings to create a Sequence[Rank_Data].
  • Finally, given a Sequence[Rank_Data], we add yet another ranking to the tuple of ranks inside each Rank_Data container.


The first case calls rank_data() recursively. The other two cases both rely on a rerank() function that builds a new Rank_Data tuple with additional ranking values. This contains several rankings for a complex record of raw data values.

Note that a relatively complex cast() expression is required to disambiguate the use of generic tuples for the rankings. The mypy tool offers a reveal_type() function that can be incorporated to debug the inferred types.

The rerank() function follows a slightly different design from the rank() function shown previously. It yields two-tuples with the rank and the original data object:

def rerank(
        rank_data_iter: Iterable[Rank_Data],
        key: Callable[[Rank_Data], K_]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    sorted_iter = iter(
        sorted(
            rank_data_iter, key=lambda obj: key(obj.raw)
        )
    )
    # Apply ranker to head, *tail = sorted(rank_data_iter)
    head = next(sorted_iter)
    yield from ranker(sorted_iter, 0, [head], key)


The idea behind rerank() is to sort a collection of Rank_Data objects. The first item, head, is used to provide a seed value to the ranker() function. The ranker() function can examine the remaining items in the iterable to see if they match this initial value; this allows it to compute a proper rank for a batch of matching items.

The ranker() function accepts a sorted iterable of data, a base rank number, and an initial collection of items of the minimum rank. The result is an iterable sequence of two-tuples with a rank number and an associated Rank_Data object:

def ranker(
        sorted_iter: Iterator[Rank_Data],
        base: float,
        same_rank_seq: List[Rank_Data],
        key: Callable[[Rank_Data], K_]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    try:
        value = next(sorted_iter)
    except StopIteration:
        dups = len(same_rank_seq)
        yield from yield_sequence(
            (base+1+base+dups)/2, iter(same_rank_seq))
        return
    if key(value.raw) == key(same_rank_seq[0].raw):
        yield from ranker(
            sorted_iter, base, same_rank_seq+[value], key)
    else:
        dups = len(same_rank_seq)
        yield from yield_sequence(
            (base+1+base+dups)/2, iter(same_rank_seq))
        yield from ranker(
            sorted_iter, base+dups, [value], key)


This starts by attempting to extract the next item from the sorted_iter collection of sorted Rank_Data items. If this fails with a StopIteration exception, there is no next item: the source was exhausted. The output is then the final batch of equal-valued items in the same_rank_seq sequence.

If the sequence has a next item, the key() function extracts the key value. If this new value matches the key of the items in the same_rank_seq collection, it is accumulated into the current batch of same-valued keys. The final result is based on the rest of the items in sorted_iter, the current value for the rank, a larger batch of same_rank items that now includes the head value, and the original key() function.

If the next item's key doesn't match the current batch of equal-valued items, the final result has two parts. The first part is the batch of equal-valued items accumulated in same_rank_seq.  This is followed by the reranking of the remainder of the sorted items. The base value for these is incremented by the number of equal-valued items, a fresh batch of equal-rank items is initialized with the distinct key, and the original key() extraction function is provided.
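The (base+1+base+dups)/2 expression used in both branches computes the midrank: the average of the ranks base+1 through base+dups that the tied batch occupies. A quick arithmetic check:

```python
# Two items tied for ranks 2 and 3 (base=1, dups=2) share the midrank
# (base+1 + base+dups) / 2 == (2 + 3) / 2 == 2.5.
base, dups = 1, 2
midrank = (base + 1 + base + dups) / 2
print(midrank)  # -> 2.5
```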

The output from ranker() depends on the yield_sequence() function, which looks as follows:

def yield_sequence(
        rank: float,
        same_rank_iter: Iterator[Rank_Data]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    head = next(same_rank_iter)
    yield rank, head
    yield from yield_sequence(rank, same_rank_iter)


We've written this in a way that emphasizes the recursive definition. For any practical work, this should be optimized into a single for statement.

When doing tail-call optimization to transform a recursion into a loop, define unit test cases first. Be sure the recursion passes those tests before optimizing.
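A sketch of that optimization (yield_sequence_iter() is a hypothetical name for the loop form; it is behaviorally equivalent to the recursive version, so it works for Rank_Data items or any other objects):

```python
from typing import Iterator, Tuple

def yield_sequence_iter(
        rank: float,
        same_rank_iter: Iterator
    ) -> Iterator[Tuple[float, object]]:
    # Iterative form of the recursive yield_sequence(): pair every
    # remaining item with the shared rank value.
    for item in same_rank_iter:
        yield rank, item

print(list(yield_sequence_iter(2.5, iter(["a", "b"]))))
# -> [(2.5, 'a'), (2.5, 'b')]
```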


The following are some examples of using this function to rank (and rerank) data. We'll start with a simple collection of scalar values:

>>> scalars = [0.8, 1.2, 1.2, 2.3, 18]
>>> list(rank_data(scalars))
[Rank_Data(rank_seq=(1.0,), raw=0.8), 
 Rank_Data(rank_seq=(2.5,), raw=1.2),
 Rank_Data(rank_seq=(2.5,), raw=1.2), 
 Rank_Data(rank_seq=(4.0,), raw=2.3),
 Rank_Data(rank_seq=(5.0,), raw=18)]


Each value becomes the raw attribute of a Rank_Data object.

When we work with a slightly more complex object, we can also have multiple rankings. The following is a sequence of two tuples:

 >>> pairs = ((2, 0.8), (3, 1.2), (5, 1.2), (7, 2.3), (11, 18))
 >>> rank_x = list(rank_data(pairs, key=lambda x: x[0]))
 >>> rank_x
 [Rank_Data(rank_seq=(1.0,), raw=(2, 0.8)),
 Rank_Data(rank_seq=(2.0,), raw=(3, 1.2)),
 Rank_Data(rank_seq=(3.0,), raw=(5, 1.2)),
 Rank_Data(rank_seq=(4.0,), raw=(7, 2.3)),
 Rank_Data(rank_seq=(5.0,), raw=(11, 18))]

>>> rank_xy = list(rank_data(rank_x, key=lambda x: x[1]))
>>> rank_xy
[Rank_Data(rank_seq=(1.0, 1.0), raw=(2, 0.8)),
Rank_Data(rank_seq=(2.0, 2.5), raw=(3, 1.2)),
Rank_Data(rank_seq=(3.0, 2.5), raw=(5, 1.2)),
Rank_Data(rank_seq=(4.0, 4.0), raw=(7, 2.3)),
Rank_Data(rank_seq=(5.0, 5.0), raw=(11, 18))]


Here, we defined a collection of pairs. Then, we ranked the two tuples, assigning the sequence of Rank_Data objects to the rank_x variable. We then ranked this collection of Rank_Data objects, creating a second rank value and assigning the result to the rank_xy variable.

The resulting sequence can be used for a slightly modified rank_corr() function to compute the rank correlations of any of the available values in the rank_seq attribute of the Rank_Data objects. We'll leave this modification as an exercise for you.
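One possible shape for that exercise (this sketch is ours, not the book's; it uses the classic sum-of-squared-differences form of Spearman's formula, which is only approximate in the presence of ties):

```python
from typing import Any, NamedTuple, Sequence, Tuple

class Rank_Data(NamedTuple):
    rank_seq: Tuple[float, ...]
    raw: Any

def rank_corr(items: Sequence[Rank_Data], i: int = 0, j: int = 1) -> float:
    # Spearman rho between two rank positions of each Rank_Data item:
    # rho = 1 - 6 * sum(d**2) / (n * (n**2 - 1))
    n = len(items)
    d_2 = sum((rd.rank_seq[i] - rd.rank_seq[j]) ** 2 for rd in items)
    return 1 - 6 * d_2 / (n * (n ** 2 - 1))

# The rank_xy result from the example above, written out literally.
data = [
    Rank_Data((1.0, 1.0), (2, 0.8)),
    Rank_Data((2.0, 2.5), (3, 1.2)),
    Rank_Data((3.0, 2.5), (5, 1.2)),
    Rank_Data((4.0, 4.0), (7, 2.3)),
    Rank_Data((5.0, 5.0), (11, 18)),
]
print(rank_corr(data))  # -> 0.975
```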

If you found this tutorial useful and would like to learn more such techniques, head over to get Steven Lott's bestseller, Functional Python Programming.
