Recent developments in Python

Python is a dynamically typed, interpreted language that was initially well suited for scripting tasks that are boring and repetitive day-to-day tasks. But as the years progressed, the language gained a number of new features and the huge backing of its community, which propelled its development to make it a language that is now well suited to performing tasks that range from very simple applications, such as web scraping, to analyzing large amounts of data for training machine learning models that themselves are written in Python. Let's take a look at some of the major things that have changed over the years and see what the latest release of Python, Python 3, brings to the table.

Dropping backward compatibility

Python as a language has evolved a lot over the years, but despite this fact, a program written in Python 1.0 will still be able to run in Python 2.7, which is a version that was released 19 years after Python 1.0.

Though a great benefit for the developers of Python applications, this backward compatibility of the language is also a major hurdle in the growth and development of major improvements in the language specification, since a great amount of the older code base will break if major changes are made to the language specification.

With the release of Python 3, this chain of backward compatibility was broken. The language in version 3 dropped the support for programs that were written in earlier versions in favor of allowing a larger set of long-overdue improvements to the language. However, this decision disappointed quite a lot of developers in the community.

It's all Unicode

In the days of Python 2, the text data type str was used to support ASCII data, and for Unicode data, the language provided a unicode data type. When someone wanted to deal with a particular encoding, they took a string and encoded it into the required encoding scheme.

Also, the language inherently supported an implicit conversion of the string type to the unicode type. This is shown in the following code snippet:

str1 = 'Hello'
type(str1)        # type(str1) => 'str'
str2 = u'World'
type(str2)        # type(str2) => 'unicode'
str3 = str1 + str2
type(str3)        # type(str3) => 'unicode'

This used to work, because here, Python would implicitly decode the byte string str1 into Unicode using the default encoding and then perform a concatenation. One thing to note here is that if this str1 string contained any non-ASCII characters, then this concatenation would have failed in Python, raising a UnicodeDecodeError.

With the arrival of Python 3, the data types that dealt with text changed. Now, the default data type str which was used to store text supports Unicode. With this, Python 3 also introduced a binary data type, called bytes, which can be used to store binary data. These two types, str and bytes, are incompatible and no implicit conversion between them will happen, and any attempt to do so will give rise to TypeError, as shown in the following code:

str1 = 'I am a unicode string'
type(str1) # type(str1) => 'str'
str2 = b"And I can't be concatenated to a byte string"
type(str2) # type(str2) => 'bytes'
str3 = str1 + str2
-----------------------------------------------------------
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't concat str to bytes

As we can see, an attempt to concatenate a unicode type string with a byte type string failed with TypeError. Although an implicit conversion of a string to a byte or a byte to a string is not possible, we do have methods that allow us to encode a string into a bytes type and decode a bytes type to a string. Look at the following code:

str1 = '₹100'
str1.encode('utf-8')
#b'\xe2\x82\xb9100'
b'\xe2\x82\xb9100'.decode('utf-8')
# '₹100'

This clear distinction between a string type and binary type with restrictions on implicit conversion allows for more robust code and fewer errors. But these changes also mean that any code that used to deal with the handling of Unicode in Python 2 will need to be rewritten in Python 3 because of the backward incompatibility.

Here, you should focus on the encoding and decoding format used to convert string to bytes and vice versa. Choosing a different formatting for encoding and decoding can result in the loss of important information, and can result in corrupt data.

Support for type hinting

Python is a dynamically typed language, and hence the type of a variable is evaluated at runtime by the interpreter once a value has been assigned to the variable, as shown in the following code:

a = 10
type(a)        # type(a) => 'int'
a = "Joe"      
type(a)        # type(a) => 'str'

Though dynamic interpretation of the type of a variable can be handy while writing small programs where the code base can be easily tracked, the feature of the language can also become a big problem when working with very large code bases, which spawn a lot of modules, and where keeping track of the type of a particular variable can become a challenge and silly mistakes related to the use of incompatible types can happen easily. Look at the following code:

def get_name_string(name):
    return name['first_name'] + name['last_name']

username = "Joe Cruze"

print(get_name_string(username))

Let's see what happens if we try to execute the preceding program:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in get_name_string
TypeError: string indices must be integers

The program exited with a TypeError because we passed a string type to the get_name_string() method and then tried to access the keys inside a string, which is not the correct solution.

With the release of Python 3.5, the community added support for type hinting that was built into the language. This was not an effort to enforce a method, but was rather provided to support the users who may want to use modules that can catch errors that are related to a variable changing its type in between the execution flow.

The general syntax to mark the type of a variable is as follows:

<variable>: <type> = <value>

To mark the types in a method, the following syntax can be used:

def method_name(parameter: type) -> return_type:
    # method body

One of the examples of how to use type hinting in the program code is shown in the following code:

from typing import Dict

def get_name_string(name: Dict[str, str]) -> str:
    return name['first_name'] + name['last_name']

username = "Joe Cruze"

print(get_name_string(username))

When the preceding code is written in an IDE, the IDE can use these type hints to mark the possible violations of the type change of variable in the code base, which can prove out to be really handy by helping to avoid errors that are related to an incorrect type change when dealing with large code bases.

An important point that needs to be reiterated here is that using type hinting does not guarantee that the interpreter will raise an error if you pass a parameter with an incorrect type to the method. The type hinting support is not enforceable and should only be used either with other tools that can help check type violations or with IDEs to support the development process.