Lists, Sets, Strings, Tuples, and Dictionaries
Now that we have learned why Python is important, we will start exploring its basic data structures and learn techniques for handling data, which are invaluable for a data practitioner.
To start a new Jupyter server, type the following command into the Command Prompt window:
This will start a Jupyter server, which you can visit at http://localhost:8888. Use the passcode dw_4_all to access the main interface.
Lists are fundamental Python data structures that are stored in contiguous memory locations, can host different data types, and can be accessed by index.
We will start with lists and list comprehensions. We will generate a list of numbers and then examine which ones among them are even. We will sort and reverse lists and check them for duplicates. We will also look at the different ways of accessing list elements, iterating over them, and checking whether an element is a member of a list.
The following is an example of a simple list:
The following is also an example of a list:
As you can see, a list can contain any number of elements of any data type, such as int, float, string, and Boolean, and a single list can also mix different data types (including nested lists).
If you are coming from a statically typed language, such as C, C++, or Java, then this will probably seem strange, because those languages do not allow you to mix different data types in a single array. Lists are somewhat like arrays, in the sense that both are based on contiguous memory locations and can be accessed using indexes. But the power of Python lists comes from the fact that they can host different data types and can be freely manipulated.
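As a quick illustration (the values are made up for this sketch and are not the book's own examples), the following creates a homogeneous list and a mixed list that includes a nested list:

```python
# A simple list of integers
list_1 = [1, 2, 3, 4]

# A list mixing an int, a float, a string, a Boolean, and a nested list
list_2 = [1, 2.5, "Python", True, [10, 20]]

print(len(list_2))        # 5: the nested list counts as one element
print(type(list_2[4]))    # <class 'list'>
print(list_2[4][1])       # 20: index into the nested list
```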
Note
Be careful, though, as the very power of lists, and the fact that you can mix different data types in a single list, can actually create subtle bugs that can be very difficult to track.
Exercise 1: Accessing the List Members
In the following exercise, we will be creating a list and then observing the different ways of accessing the elements:
Define a list called list_1 with four integer members, using the following command:
The indices will be automatically assigned, as follows:
Access the first element from list_1 using its forward index:
Access the last element from list_1 using its forward index:
Access the last element from list_1 using the len function:
The len function in Python returns the length of the specified list.
Access the last element from list_1 using its backward index:
Access the first three elements from list_1 using forward indices:
This is also called list slicing, as it returns a smaller list from the original list by extracting only a part of it. To slice a list, we need two integers: the first denotes the start of the slice and the second denotes its end. The end index is exclusive, so the slice stops at the element just before it (end - 1).
Note
Notice that the slice did not include the element at index 3, the end index. This is how list slicing works.
Access the last two elements from list_1 by slicing:
Access the first two elements using backward indices:
When we leave one side of the colon (:) blank, we are basically telling Python either to go until the end or start from the beginning of the list. It will automatically apply the rule of list slices that we just learned.
Reverse the elements in the list:
Note
The last bit of code is not very readable: it is not obvious at a glance what it is doing, which goes against Python's philosophy. So, although this kind of code may look clever, we should resist the temptation to write code like this.
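The steps of this exercise can be sketched as follows. The four values in list_1 are assumptions, since the exercise's own values are not reproduced here. The text does not show the reversing code either; a slice with a step of -1 is the usual idiom, so it is used here next to the more readable built-in reversed():

```python
# Assumed example values for list_1
list_1 = [34, 12, 89, 1]

first = list_1[0]                      # forward index of the first element
last_forward = list_1[3]               # forward index of the last element
last_by_len = list_1[len(list_1) - 1]  # last element computed with len
last_backward = list_1[-1]             # backward index of the last element
print(first, last_forward, last_by_len, last_backward)  # 34 1 1 1

first_three = list_1[0:3]  # indices 0, 1, 2; the end index 3 is excluded
last_two = list_1[2:]      # blank end: go until the end of the list
first_two = list_1[:-2]    # blank start: begin at index 0, stop before index -2
print(first_three, last_two, first_two)  # [34, 12, 89] [89, 1] [34, 12]

# A step of -1 walks backwards; concise, but not very readable
reversed_by_slice = list_1[::-1]
# The built-in reversed() states the intent more plainly
reversed_explicit = list(reversed(list_1))
print(reversed_by_slice, reversed_by_slice == reversed_explicit)  # [1, 89, 12, 34] True
```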
Exercise 2: Generating a List
We are going to examine various ways of generating a list:
Create a list using the append method:
The output will be as follows:
Here, we started by declaring an empty list and then used a for loop to append values to it. append is a method provided by Python's list data type.
Generate a list using the following command:
The partial output is as follows:
This is list comprehension, which is a very powerful tool that we need to master. The power of list comprehension comes from the fact that we can use conditionals inside the comprehension itself.
Use a while loop to iterate over a list to understand the difference between a while loop and a for loop:
The partial output will be as follows:
Create list_3 with numbers that are divisible by 5:
The output will be a list of numbers up to 100 in increments of 5:
Generate a list by adding the two lists:
The output is as follows:
Extend a list using the extend method:
The partial output is as follows:
The second operation changes the original list (list_1) and appends all the elements of list_2 to it. So, be careful when using it.
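The five ways of generating lists in this exercise can be sketched as below. The ranges follow the text (numbers from 0 to 99); the names evens and first_three, and the choice to loop only three times in the while example, are assumptions made to keep the sketch short:

```python
# 1. append: start empty and add one element per loop iteration
list_1 = []
for x in range(0, 100):
    list_1.append(x)

# 2. list comprehension, with a conditional that keeps only even numbers
list_2 = [x for x in range(0, 100)]
evens = [x for x in list_2 if x % 2 == 0]
print(list_1 == list_2, evens[:5])  # True [0, 2, 4, 6, 8]

# 3. while loop: we initialize, check, and increment the index ourselves
i = 0
first_three = []
while i < 3:
    first_three.append(list_1[i])
    i += 1
print(first_three)  # [0, 1, 2]

# 4. numbers divisible by 5; range's end is exclusive, so this stops at 95
list_3 = [x for x in range(0, 100) if x % 5 == 0]
print(list_3[:3], list_3[-1])  # [0, 5, 10] 95

# 5. + returns a new list; extend modifies the list it is called on
list_4 = list_1 + list_3
print(len(list_4), len(list_1))  # 120 100
list_1.extend(list_3)
print(len(list_1))  # 120
```

Use range(0, 101) in step 4 if you want 100 itself to be included.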
Exercise 3: Iterating over a List and Checking Membership
We are going to iterate over a list and test whether a certain value exists in it:
Iterate over a list:
The output is as follows:
However, this is not very Pythonic. Being Pythonic means following a set of best practices and conventions that have been developed over the years by thousands of very able developers. In this case, it means using the in keyword: a for ... in loop handles index initialization, bounds checking, and index incrementing for us, unlike the loops of traditional languages. The Pythonic way of iterating over a list is as follows:
The output is as follows:
Notice that, in the second method, we no longer need a counter to access the list index; instead, the for ... in construct gives us each element directly.
Check whether the integers 25 and -45 are in the list using the in operator:
The output is True.
The output is False.
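Here is a minimal sketch of both iteration styles and the membership checks. The list of multiples of 5 is an assumed stand-in for the exercise's list, chosen so that 25 is present and -45 is not:

```python
# Assumed stand-in list: multiples of 5 from 0 to 95
list_1 = [x for x in range(0, 100) if x % 5 == 0]

# Counter-based iteration, as in C-like languages
total_indexed = 0
for i in range(len(list_1)):
    total_indexed += list_1[i]

# Pythonic iteration: for ... in hands us each element directly
total_direct = 0
for x in list_1:
    total_direct += x
print(total_indexed == total_direct)  # True

# Membership tests with the in operator
print(25 in list_1)   # True
print(-45 in list_1)  # False
```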
Exercise 4: Sorting a List
We generated a list called list_1 in the previous exercise. We are going to sort it now:
Note
The difference between the sort method and the reverse method is that sort accepts a custom key function for custom sorting, whereas reverse can only reverse a list. Both methods work in place, so be aware of this while using them.
As the list was originally a list of numbers from 0 to 99, we will sort it in the reverse direction. To do that, we will use the sort method with reverse=True:
The partial output is as follows:
We can use the reverse method directly to achieve this result:
The output is as follows:
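The sketch below contrasts the two in-place methods on a list of numbers from 0 to 99 and adds a small, made-up example of the key argument that only sort accepts:

```python
# A list of numbers from 0 to 99, as described in the exercise
list_1 = list(range(0, 100))

# sort works in place (it returns None); reverse=True sorts in descending order
list_1.sort(reverse=True)
print(list_1[:3])          # [99, 98, 97]

# reverse also works in place; it just flips the current order
list_2 = list(range(0, 100))
list_2.reverse()
print(list_1 == list_2)    # True, because list_2 started out sorted

# Only sort accepts a key function for custom ordering, e.g. by last digit
nums = [13, 21, 5, 40]
nums.sort(key=lambda n: n % 10)
print(nums)                # [40, 21, 13, 5]
```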
Exercise 5: Generating a Random List
In this exercise, we will be generating a list with random numbers:
Import the random library:
Use the randint function to generate random integers and add them to a list:
Print the list using print(list_1). Note that list_1 will most likely contain duplicate values:
The sample output is as follows:
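A minimal sketch of this exercise, assuming 100 numbers drawn between 0 and 30 (the bounds are an assumption). random.randint includes both endpoints, and with only 31 possible values in 100 draws, duplicates are guaranteed:

```python
import random

# 100 random integers between 0 and 30, inclusive (assumed bounds).
# There are only 31 possible values, so duplicates are guaranteed.
list_1 = [random.randint(0, 30) for _ in range(100)]
print(list_1)
```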
There are many ways to get a list of unique numbers, and while you may be able to write a few lines of code using a for loop and another list (you should actually try doing it!), let's see how we can do this without a for loop and with a single line of code. This will bring us to the next data structure, sets.
Activity 1: Handling Lists
In this activity, we will generate a list of random numbers and then generate another list from the first one, which only contains numbers that are divisible by three. Repeat the experiment three times. Then, we will calculate the average difference of length between the two lists.
These are the steps for completing this activity:
Create a list of 100 random numbers.
Create a new list from this random list, with numbers that are divisible by 3.
Calculate the length of these two lists and store the difference in a new variable.
Using a loop, perform steps 2 and 3 three times, storing the difference each time.
Find the arithmetic mean of these three difference values.
Note
The solution for this activity can be found on page 282.
A set, mathematically speaking, is just a collection of well-defined distinct objects. Python gives us a straightforward way to deal with them using its set datatype.
With the last list that we generated, we are going to revisit the problem of getting rid of duplicates from it. We can achieve that with the following line of code:
If we print this, we will see that it only contains unique numbers. We used the set data type to turn the first list into a set, thus getting rid of all duplicate elements, and then we used the list function on it to turn it into a list from a set once more:
The output will be as follows:
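The round trip through set can be sketched like this; the random bounds and the name unique_number_list are hypothetical. Because sets are unordered, the resulting list has no guaranteed order:

```python
import random

# A random list with duplicates (assumed bounds)
list_1 = [random.randint(0, 30) for _ in range(100)]

# set() drops duplicates; list() turns the result back into a list
unique_number_list = list(set(list_1))
print(len(list_1), len(unique_number_list))

# Sets are unordered, so sort explicitly if you need a predictable order
print(sorted(unique_number_list))
```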
Union and Intersection of Sets
This is what a union between two sets looks like:
This simply means take everything from both sets but take the common elements only once.
We can create this using the following code:
To find the union of the two sets, the following instructions should be used:
The output would be as follows:
Notice that the common element, Banana, appears only once in the resulting set. The common elements between two sets can be identified by obtaining the intersection of the two sets, as follows:
We get the intersection of two sets in Python as follows:
This will give us a set with only one element. The output is as follows:
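Since the original sets are not reproduced here, the following sketch assumes two small sets of fruit names that share only Banana. It shows both the operator and the method forms of union and intersection:

```python
# Assumed example sets; "Banana" is the only common element
set1 = {"Apple", "Orange", "Banana"}
set2 = {"Pear", "Peach", "Mango", "Banana"}

# Union: everything from both sets, common elements only once
print(set1 | set2)
print(set1.union(set2))          # method form, same result

# Intersection: only the elements present in both sets
print(set1 & set2)               # {'Banana'}
print(set1.intersection(set2))   # {'Banana'}
```

The printed order of the union can vary between runs, because sets are unordered.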
You can create a null set (an empty set) by creating a set containing no elements, using the following code:
The output is as follows:
However, to create an empty dictionary, use the following command:
The output is as follows:
We are going to learn about this in detail in the next topic.
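The distinction matters because both literals use curly braces. In this sketch (variable names are illustrative), set() with no arguments gives an empty set, while a bare {} gives an empty dictionary:

```python
# set() with no arguments creates an empty (null) set
null_set = set()
print(null_set)          # set()

# Empty curly braces create an empty dict, not a set
a_dict = {}
print(type(a_dict))      # <class 'dict'>
```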
A dictionary is like a list in that it is a collection of several elements. However, a dictionary is a collection of key-value pairs, where the key can be any hashable object. Generally, we use numbers or strings as keys.
To create a dictionary, use the following code:
The output is as follows:
This is also a valid dictionary:
The output is as follows:
The keys must be unique in a dictionary.
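The following sketch, with made-up keys and values, shows which objects can serve as keys and what happens when a key is repeated in a dictionary literal:

```python
# Keys here are strings; values can be any object
dict_1 = {"key1": "value1", "key2": ["list_element1", 34], "key3": 4.5}

# Any hashable object can be a key, e.g. an int or a tuple
dict_2 = {1: "one", ("a", "b"): "tuple key"}
print(dict_2[("a", "b")])   # tuple key

# Unhashable objects such as lists cannot be keys
try:
    bad = {[1, 2]: "list key"}
except TypeError as err:
    print("TypeError:", err)

# Keys are unique: a repeated key keeps only the last value given
dict_3 = {"a": 1, "a": 2}
print(dict_3)               # {'a': 2}
```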
Exercise 6: Accessing and Setting Values in a Dictionary
In this exercise, we are going to access and set values in a dictionary:
Access a particular key in a dictionary:
This will return the value associated with it as follows:
Assign a new value to the key:
Define a blank dictionary and then use the key notation to assign values to it:
The output is as follows:
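A minimal sketch of this exercise, with assumed keys and values. It also uses dict.get, which returns None instead of raising a KeyError when a key is missing:

```python
dict_1 = {"key1": "value1", "key2": "value2"}   # assumed contents
print(dict_1["key1"])          # value1

dict_1["key1"] = "new value"   # assigning to an existing key overwrites its value
print(dict_1["key1"])          # new value
print(dict_1.get("key3"))      # None: get avoids a KeyError for missing keys

dict_2 = {}                    # a blank dictionary
dict_2["key1"] = "value1"      # assigning to a missing key creates it
dict_2["key2"] = "value2"
print(dict_2)                  # {'key1': 'value1', 'key2': 'value2'}
```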
Exercise 7: Iterating Over a Dictionary
In this exercise, we are going to iterate over a dictionary:
Create dict_1:
Use the looping variables k and v:
The output is as follows:
Note
Notice the difference between how we did the iteration on the list and how we are doing it here.
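The following is one way to iterate with the looping variables k and v, assuming a small made-up dict_1. It relies on the items method; since Python 3.7, dictionaries preserve insertion order, so the loops visit keys in the order they were added:

```python
dict_1 = {"key1": 1, "key2": 2}  # assumed contents

# items() yields (key, value) tuples, which we unpack into k and v
for k, v in dict_1.items():
    print(f"{k}: {v}")    # key1: 1, then key2: 2

# Iterating over the dict itself yields only its keys
for k in dict_1:
    print(k)              # key1, then key2
```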
Exercise 8: Revisiting the Unique Valued List Problem
We will use the fact that dictionary keys cannot be duplicated to generate the unique valued list:
First, generate a random list with duplicate values:
Create a unique valued list from list_1:
The sample output is as follows:
Here, we have used two useful methods of Python's dict data type: fromkeys and keys. fromkeys creates a dict whose keys come from an iterable (in this case, a list) and whose values default to None; keys returns the keys of a dict.
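A sketch of the fromkeys approach, with assumed random bounds and a hypothetical name, unique_list. Unlike the set-based approach, it keeps the order in which each value first appears, because dictionaries preserve insertion order from Python 3.7 onward:

```python
import random

# A random list with duplicates (assumed bounds)
list_1 = [random.randint(0, 30) for _ in range(100)]

# fromkeys maps every element of list_1 to None; repeated elements
# collapse into a single key because dict keys are unique
unique_list = list(dict.fromkeys(list_1).keys())
print(unique_list)
```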
Exercise 9: Deleting a Value from a Dict
In this exercise, we are going to delete a value from a dict:
Create a dictionary with five elements:
The output is as follows:
We will use the del statement and specify the key:
The output is as follows:
Note
The del statement can also be used to delete the element at a specific index of a list.
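The following sketch uses an assumed five-element dictionary and a short list to show del on both data types:

```python
dict_1 = {"key1": 1, "key2": ["list_element1", 34], "key3": "value3",
          "key4": {"subkey1": "v1"}, "key5": 4.5}   # assumed five elements
del dict_1["key2"]      # removes the key and its value (KeyError if absent)
print(list(dict_1))     # ['key1', 'key3', 'key4', 'key5']

list_1 = [10, 20, 30]
del list_1[1]           # del also removes the element at a list index
print(list_1)           # [10, 30]
```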
Exercise 10: Dictionary Comprehension
In this final exercise on dict, we will go over a less commonly used kind of comprehension than the list comprehension: the dictionary comprehension. We will also examine two other ways to create a dict, which will be useful in the future.
A dictionary comprehension works exactly the same way as the list one, but we need to specify both the keys and values:
Generate a dict that has 0 to 9 as the keys and the square of the key as the values:
The output is as follows:
Can you generate a dict using dict comprehension where the keys are from 0 to 9 and the values are the square root of the keys? This time, we won't use a list.
Generate a dictionary using the dict function:
The output is as follows:
You can also generate a dictionary using the dict function in a different way, as follows:
The output is as follows:
The dict function is quite versatile: both of the preceding commands generate valid dictionaries.
The strange-looking pair of values that we just saw, ('Harry', 300), is called a tuple. This is another important fundamental data type in Python. We will learn about tuples in the next topic.
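The three construction styles from this exercise can be sketched as follows. Only the ('Harry', 300) pair comes from the text; the other names and numbers are illustrative assumptions:

```python
# Dictionary comprehension: keys 0 to 9, values are the squares of the keys
square_dict = {x: x ** 2 for x in range(0, 10)}
print(square_dict[3], square_dict[9])   # 9 81

# dict() from an iterable of (key, value) pairs
dict_2 = dict([("Tom", 100), ("Dick", 200), ("Harry", 300)])

# dict() from keyword arguments (keys must be valid identifiers)
dict_3 = dict(Tom=100, Dick=200, Harry=300)
print(dict_2 == dict_3)                 # True
```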
A tuple is another data type in Python. It is sequential in nature and similar to lists.
A tuple consists of values separated by commas, as follows:
Notice that, unlike lists, we did not open and close square brackets here.
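For example (with made-up values), it is the commas that create a tuple; wrapping the values in parentheses is optional:

```python
tuple_1 = 24, 42, 2.3456, "Hello"    # the commas create the tuple
print(type(tuple_1))                 # <class 'tuple'>

tuple_2 = (24, 42, 2.3456, "Hello")  # parentheses are optional but common
print(tuple_1 == tuple_2)            # True
```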
Creating a Tuple with Different Cardinalities
This is how we create an empty tuple:
And this is how we create a tuple with only one value:
Notice the trailing comma here.
We can nest tuples, similar to lists and dicts, as follows:
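The following sketch (variable names are illustrative) covers the empty, single-value, and nested cases, and shows why the trailing comma matters for a one-value tuple:

```python
empty_tuple = ()
one_tuple = "Hello",      # the trailing comma makes this a tuple
not_a_tuple = ("Hello")   # parentheses alone just group a string
print(type(one_tuple), type(not_a_tuple))  # <class 'tuple'> <class 'str'>

tuple_1 = "hello", "there"
tuple_12 = tuple_1, 45, "Sam"   # a tuple nested inside another tuple
print(tuple_12)                 # (('hello', 'there'), 45, 'Sam')
print(tuple_12[0][1])           # there
```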
One special thing about tuples is the fact that they are an immutable data type. So, once created, we cannot change their values. We can just access them, as follows:
The last line of code will result in a TypeError as a tuple does not allow modification.
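A minimal sketch of that behavior, with assumed values: reading an element works, while assigning to one raises a TypeError that we catch here so the script keeps running:

```python
tuple_1 = "Hello", "World!"
print(tuple_1[1])          # World!: reading an element is fine

modification_failed = False
try:
    tuple_1[1] = "Universe!"   # item assignment is not allowed
except TypeError as err:
    modification_failed = True
    print("TypeError:", err)
print(tuple_1)             # ('Hello', 'World!'): unchanged
```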
This makes the use cases for tuples a bit different from those for lists, although the two look and behave very similarly in a few respects.
Unpacking a tuple simply means extracting the values contained in the tuple into separate variables:
The output is as follows:
Of course, as soon as we do that, we can modify the values contained in those variables.
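For instance, with a made-up two-element tuple, unpacking assigns one variable per element. Reassigning those variables afterwards does not touch the tuple:

```python
tuple_1 = "Hello", "World"
hello, world = tuple_1      # one variable per element, in order
print(hello, world)         # Hello World

hello = "Hi"                # the new variables can be reassigned freely
print(tuple_1)              # ('Hello', 'World'): the tuple is unchanged
```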
Exercise 11: Handling Tuples
Create a tuple to demonstrate how tuples are immutable. Unpack it to read all elements, as follows:
The output is as follows:
Try to overwrite an element of the tupleE tuple:
This step will result in TypeError as the tuple does not allow modification.
Try to assign a series to the tupleE tuple:
Print the output:
The output is as follows:
We have mainly seen two different types of data so far. One is represented by numbers; another is represented by textual data. Whereas numbers have their own tricks, which we will see later, it is time to look into textual data in a bit more detail.
In the final part of this topic, we will learn about strings. Strings in Python are similar to strings in any other programming language.
This is a string:
A string can also be declared in this manner:
You can use either single quotes or double quotes to define a string.
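Both quote styles produce the same kind of string. Choosing one style lets the other appear inside the string without escaping. The example strings are illustrative:

```python
str_1 = "Hello World!"   # double quotes
str_2 = 'Hello World!'   # single quotes
print(str_1 == str_2)    # True: both styles produce the same string

# Using one quote style lets the other appear inside the string
str_3 = "It's a string"
str_4 = 'She said "hi"'
print(str_3, str_4)      # It's a string She said "hi"
```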
Exercise 12: Accessing Strings
Strings in Python behave similarly to lists, apart from one big caveat: strings are immutable, whereas lists are mutable data structures:
Create a string called str_1:
Access the elements of the string by specifying the location of the element, like we did in lists.
Access the first member of the string:
The output is as follows:
Access the fourth member of the string:
The output is as follows:
Access the last member of the string using the len function:
The output is as follows:
Access the last member of the string using a negative index:
The output is as follows:
Each of the preceding operations will give you the character at the specific index.
Note
The method for accessing the elements of a string is like accessing a list.
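The accesses in this exercise can be sketched as follows, assuming str_1 is "Hello World!" (the exercise's own string is not shown). The last part demonstrates the immutability caveat:

```python
str_1 = "Hello World!"   # assumed example string

print(str_1[0])                 # H
print(str_1[4])                 # o: index 4 is the fifth character
print(str_1[len(str_1) - 1])    # !: last character via len
print(str_1[-1])                # !: last character via a negative index

# Unlike lists, strings are immutable
modification_failed = False
try:
    str_1[0] = "J"
except TypeError:
    modification_failed = True
print(modification_failed)      # True
```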
Exercise 13: String Slices
Just like lists, we can slice strings:
Create a string, str_1:
Specify the slicing values and slice the string:
The output is this:
'llo Worl'
Slice the string by leaving out one of the slice values:
The output is as follows:
Use negative numbers to slice the string:
The output is as follows:
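A sketch of these slices, assuming str_1 is "Hello World! I am learning data wrangling". That value is an inference: it matches both the 'llo Worl' output above and the 41-character length mentioned below, but the exercise's own string is not shown:

```python
# Assumed value, consistent with the 'llo Worl' output and a length of 41
str_1 = "Hello World! I am learning data wrangling"

print(str_1[2:10])   # llo Worl: indices 2 through 9
print(str_1[:5])     # Hello: a blank start means "from the beginning"
print(str_1[-9:])    # wrangling: the last nine characters
```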
To find out the length of a string, we simply use the len function:
The length of the string is 41. To convert a string's case, we can use the lower and upper methods:
The output is as follows:
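Using the same assumed 41-character string, len and the case-conversion methods behave as follows; lower and upper return new strings and leave str_1 unchanged:

```python
# Assumed value, consistent with the 41-character length in the text
str_1 = "Hello World! I am learning data wrangling"

print(len(str_1))     # 41
print(str_1.lower())  # hello world! i am learning data wrangling
print(str_1.upper())  # HELLO WORLD! I AM LEARNING DATA WRANGLING
```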
To search for a string within a string, we can use the find method:
The output is -1. Can you figure out whether the find method is case-sensitive or not? Also, what do you think the find method returns when it actually finds the string?
To replace one string with another, we have the replace method. Since a string is an immutable data structure, replace returns a new string rather than modifying the original one:
The output is as follows:
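Continuing with the same assumed string, this sketch shows what find returns for a match and for a miss, and confirms that replace leaves the original untouched. The search and replacement strings are illustrative:

```python
str_1 = "Hello World! I am learning data wrangling"   # assumed value

print(str_1.find("World"))      # 6: index where the first match starts
print(str_1.find("Python"))     # -1: no match

# replace returns a new string; str_1 itself is left unchanged
str_2 = str_1.replace("data wrangling", "Python")
print(str_2)   # Hello World! I am learning Python
print(str_1)   # Hello World! I am learning data wrangling
```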
You should look up string methods in the standard documentation of Python 3 to discover more about these methods.
Exercise 14: Split and Join
These two string methods need separate introductions, as they enable you to convert a string into a list and vice versa:
Create a string and convert it to a list using the split method:
The preceding code will give you a list similar to the following:
Combine this list into another string using the join method:
This code will give you a string like this:
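Here is a minimal round trip with an assumed comma-separated string. Note that join is called on the separator string, not on the list:

```python
str_1 = "Name, Age, Sex, Address"   # assumed example string
list_1 = str_1.split(", ")          # split on a comma followed by a space
print(list_1)                       # ['Name', 'Age', 'Sex', 'Address']

str_2 = " | ".join(list_1)          # join is called on the separator string
print(str_2)                        # Name | Age | Sex | Address
```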
With this, we are at the end of the second topic of this chapter. We now have the motivation to learn data wrangling and a solid introduction to the fundamentals of data structures in Python. There is more to this topic, which will be covered in future chapters.
We have designed an activity for you so that you can practice all the skills you just learned. This small activity should take around 30 to 45 minutes to finish.
Activity 2: Analyze a Multiline String and Generate the Unique Word Count
This section will ensure that you have understood the various basic data structures and their manipulation. We will do that by going through an activity that has been designed specifically for this purpose.
In this activity, we will do the following:
Get multiline text and save it in a Python variable
Get rid of all new lines in it using string methods
Get all the unique words and their occurrences from the string
Repeat the step to find all unique words and occurrences, without considering case sensitivity
These are the steps to guide you through solving this activity:
Create a multiline_text variable by copying the text from the first chapter of Pride and Prejudice.
Find the type and length of the multiline_text string using the type and len functions.
Remove all new lines and symbols using the replace method.
Find all of the words in multiline_text using the split function.
Create a list from this list that will contain only the unique words.
Count the number of times each unique word appears in the list, using a dict with the words as keys and their counts as values.
Find the top 25 words from the unique words that you have found, using slicing.
You just created, step by step, a unique word counter using all the neat tricks that you learned about in this chapter.
Note
The solution for this activity can be found on page 285.