Modern Python Cookbook

You're reading from Modern Python Cookbook: The latest in modern Python recipes for the busy modern programmer

Product type Paperback
Published in Nov 2016
Publisher Packt
ISBN-13 9781786469250
Length 692 pages
Edition 1st Edition

Building complex strings from lists of characters

How can we make very complex changes to an immutable string? Can we assemble a string from individual characters?

In most cases, the recipes we've already seen give us a number of tools for creating and modifying strings. There are yet more ways in which we can tackle the string manipulation problem. We'll look at using a list object. This will dovetail with some of the recipes in Chapter 4, Built-in Data Structures – list, set, dict.

Getting ready

Here's a string that we'd like to rearrange:

>>> title = "Recipe 5: Rewriting an Immutable String"

We'd like to do two transformations:

  • Remove the part up to and including the :
  • Replace the whitespace and punctuation with _, and make all the characters lowercase

We'll make use of the string module:

>>> from string import whitespace, punctuation

This has two important constants:

  • string.whitespace lists all of the common whitespace characters, including space and tab
  • string.punctuation lists the common ASCII punctuation marks. Unicode defines many more punctuation marks; they are not part of this constant, which has the same value regardless of locale settings
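
To get a feel for what these two constants cover, a quick check along the following lines may help. This is only a sketch: the use of unicodedata to spot non-ASCII punctuation is an addition for illustration and not part of the recipe, and the sample quotation mark is an arbitrary choice.

```python
import unicodedata
from string import whitespace, punctuation

# Both constants are ordinary str objects, so the in operator works on them.
space_is_whitespace = ' ' in whitespace
colon_is_punctuation = ':' in punctuation

# A typographic quote is punctuation in Unicode, but not in the ASCII-only constant.
left_quote = '\u201c'  # LEFT DOUBLE QUOTATION MARK, an arbitrary sample
in_ascii_constant = left_quote in punctuation

# Unicode general categories that start with 'P' are punctuation categories.
is_unicode_punctuation = unicodedata.category(left_quote).startswith('P')

print(space_is_whitespace, colon_is_punctuation)
print(in_ascii_constant, is_unicode_punctuation)
```

If non-ASCII punctuation matters for our data, a category test like this one could stand in for the membership test against string.punctuation.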

How to do it...

We can work with a string exploded into a list. We'll look at lists in more depth in Chapter 4, Built-in Data Structures – list, set, dict.

  1. Explode the string into a list object:
      >>> title_list = list(title)
  2. Find the partition character. The index() method for a list has the same semantics as the index() method for a string. It locates the position with the given value:
      >>> colon_position = title_list.index(':')
  3. Delete the characters no longer needed. The del statement can remove items from a list. Lists are mutable data structures:
      >>> del title_list[:colon_position+1]
     We don't need to carefully work with the useful piece of the original string. We can remove items from a list.
  4. Replace punctuation by stepping through each position. In this case, we'll use a for statement to visit every index in the list:
      >>> for position in range(len(title_list)):
      ...     if title_list[position] in whitespace+punctuation:
      ...         title_list[position] = '_'
     The expression range(len(title_list)) generates all of the values between 0 and len(title_list)-1. This assures us that the value of position will be each valid index in the list.
  5. Join the list of characters to create a new string. It seems a little odd to use a zero-length string, '', as a separator when concatenating strings together. However, it works perfectly:
      >>> title = ''.join(title_list)
      >>> title
      '_Rewriting_an_Immutable_String'

We assigned the resulting string back to the original variable. The original string object, which had been referred to by that variable, is no longer needed: once nothing else refers to it, it's removed from memory. The new string object replaces the value of the variable.
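
The steps above can also be gathered into a single function. The following is a minimal sketch, not part of the original recipe: the name rewrite_title is hypothetical, and enumerate() is swapped in for range(len(...)) as one possible way to visit each position.

```python
from string import whitespace, punctuation

def rewrite_title(title: str) -> str:
    """Drop everything through the first ':' and replace
    whitespace and punctuation with '_'. (Hypothetical helper.)"""
    chars = list(title)                  # explode the string into a list
    colon_position = chars.index(':')    # raises ValueError if there is no ':'
    del chars[:colon_position + 1]       # remove the prefix and the colon itself
    for position, char in enumerate(chars):
        if char in whitespace + punctuation:
            chars[position] = '_'        # replace the item in place
    return ''.join(chars)                # back to an immutable string

result = rewrite_title("Recipe 5: Rewriting an Immutable String")
print(result)
```

Replacing items while iterating is safe here because the length of the list never changes inside the loop.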

How it works...

This is a change-in-representation trick. Since a string is immutable, we can't update it. We can, however, convert it into a mutable form; in this case, a list. We can make whatever changes are required to the mutable list object. When we're done, we can change the representation from a list back to a string.
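
A small experiment can make the difference concrete. This sketch, with variable names chosen only for illustration, tries to assign to one position of a string, and then makes the same change through a list.

```python
title = "Recipe 5: Rewriting an Immutable String"

message = None
try:
    title[0] = 'r'            # strings do not support item assignment
except TypeError as error:
    message = str(error)

chars = list(title)           # change the representation to a list
chars[0] = 'r'                # the same assignment works on a list
updated = ''.join(chars)      # and change it back to a string

print(message)
print(updated)
```

The original title object is untouched; updated is a separate, new string.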

Lists provide a number of features that strings don't. Conversely, strings provide a number of features a list doesn't have. We can't convert a list to lowercase the way we can convert a string.
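
The lowercase transformation mentioned in Getting ready is a handy illustration of this. The sketch below shows two possible approaches, neither of which comes from the recipe itself: join first and use the string's lower() method, or lowercase each one-character string while it is still in the list.

```python
title_list = list("_Rewriting_an_Immutable_String")

# A list has no lower() method of its own.
list_has_lower = hasattr(title_list, 'lower')

# Option 1: go back to a string, then use the string method.
lowered = ''.join(title_list).lower()

# Option 2: apply lower() to each one-character string in the list.
lowered_list = [char.lower() for char in title_list]

print(list_has_lower)
print(lowered)
```

Both options arrive at the same characters; the first is usually the simpler choice when a string is the final goal.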

There's an important trade-off here:

  • Strings are immutable, which makes them very fast. Strings are focused on Unicode characters. When we look at mappings and sets, we can use strings as keys for mappings and items in sets because the value is immutable.
  • Lists are mutable. Operations are slower. Lists can hold any kind of item. We can't use a list as a key for a mapping or an item in a set because the value could change.
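
The key and set-item restriction is easy to see in a short sketch. The variable names are illustrative, and the final conversion to a tuple is an addition that the recipe doesn't discuss; it's shown only as one immutable alternative.

```python
title = "_Rewriting_an_Immutable_String"
title_list = list(title)

counts = {title: 1}    # a string works as a mapping key
seen = {title}         # and as an item in a set

message = None
try:
    counts[title_list] = 2      # a list can't be hashed, so this fails
except TypeError as error:
    message = str(error)

# A tuple of the same characters is immutable, so it can serve as a key.
counts[tuple(title_list)] = 2

print(message)
print(len(counts))
```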

Strings and lists are both specialized kinds of sequences. Consequently, they have a number of common features. The basic item indexing and slicing features are shared. Similarly, a list uses the same kind of negative index values that a string does: list[-1] is the last item in a list object.
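
As a quick illustration of these shared features, the following sketch applies the same index and slice expressions to a string and to the list built from it. The names text and chars are chosen only for this example.

```python
text = "_Rewriting_an_Immutable_String"
chars = list(text)

# Positive and negative indexes pick out the same items in both.
first_matches = text[0] == chars[0]
last_matches = text[-1] == chars[-1]

# Slicing uses the same notation; a string slice is a string,
# and a list slice is a list.
word_from_text = text[1:10]
word_from_chars = chars[1:10]

print(first_matches, last_matches)
print(word_from_text)
print(word_from_chars)
```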

We'll return to mutable data structures in Chapter 4, Built-in Data Structures – list, set, dict.

There's more...

Once we've started working with a list of characters instead of a string, we no longer have the string processing methods. We do have a number of list-processing techniques available to us. In addition to being able to delete items from a list, we can append an item, extend a list with another list, and insert a character into the list.
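
Here is a brief sketch of those list operations on a small list of characters. The starting value and the characters being added are arbitrary choices for illustration.

```python
chars = list("String")

chars.append('s')            # add one item at the end
chars.extend(list("_v2"))    # add every item from another list
chars.insert(0, '_')         # put an item in front of position 0
del chars[-3:]               # delete the last three items again

result = ''.join(chars)
print(result)
```

Each of these operations mutates the list in place, which is exactly what a string won't let us do.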

We can also change our viewpoint slightly, and look at a list of strings instead of a list of characters. The technique of doing ''.join(list) will work when we have a list of strings as well as a list of characters. For example, we might do this:

>>> title_list.insert(0, 'prefix')
>>> ''.join(title_list)
'prefix_Rewriting_an_Immutable_String'

Our title_list object will be mutated into a list that contains a six-character string, prefix, plus 30 individual characters.
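
One point worth checking is that join() expects every item to be a string, whatever its length. The sketch below rebuilds the list from the example and then shows, as an assumption-labeled aside with a made-up parts list, what could be done when an item isn't a string.

```python
title_list = list("_Rewriting_an_Immutable_String")
title_list.insert(0, 'prefix')

item_count = len(title_list)      # one six-character string plus 30 characters
joined = ''.join(title_list)

# Hypothetical mixed list: join() rejects the integer.
parts = ['recipe', 5]
message = None
try:
    ''.join(parts)
except TypeError as error:
    message = str(error)

# Converting each item with str() first avoids the problem.
joined_parts = '_'.join(str(part) for part in parts)

print(item_count, joined)
print(joined_parts)
```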

See also

  • We can also work with strings using the internal methods of a string. See the Rewriting an immutable string recipe for more techniques.
  • Sometimes, we need to build a string, and then convert it into bytes. See the Encoding strings – creating ASCII and UTF-8 bytes recipe for how we can do this.
  • Other times, we'll need to convert bytes into a string. See the Decoding Bytes - How to get proper characters from some bytes recipe.