Handling text and strings
We've glossed over Python's use of string objects. Expressions such as Decimal('247.616') and input("GRD conversion: ") involve string literal values. Python gives us several ways to put strings into our programs; there's a lot of flexibility available.
Here are some examples of strings:
>>> "short"
'short'
>>> 'short'
'short'
>>> """A multiple line,
... very long string."""
'A multiple line,\nvery long string.'
>>> '''another multiple line
... very long string.'''
'another multiple line\nvery long string.'
We've used double quotes (") and apostrophes (') to create short strings. These must be complete within a single line of programming. We've used triple quotes and triple apostrophes to create long strings. These strings can stretch over multiple lines of a program.
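A small sketch of the quoting rules just described, written as a script rather than typed at the >>> prompt. The variable names are illustrative only. It also shows one common reason to pick one delimiter over the other: an apostrophe can sit inside a double-quoted string without any escape.

```python
# Quotes and apostrophes create the same kind of string object.
a = "short"
b = 'short'

# Choosing the other delimiter lets us embed a quote or apostrophe directly.
possessive = "the agent's receipt"
quoted = 'she said "lunch"'

# Triple-quoted strings keep the line break typed in the source.
long_text = """A multiple line,
very long string."""

print(a == b)                  # True
print(long_text.count("\n"))   # 1
```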
Note that Python echoes the strings back to us with a \n character to show the line break. This is called a character escape. The \ character escapes the normal meaning of n: the sequence \n doesn't mean n; it means the often invisible newline character. Python has a number of escapes. The newline character is perhaps the most commonly used escape.
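The following sketch shows a few more of Python's escapes alongside \n; the names are illustrative. It contrasts print(), which shows the effect of an escape, with repr(), which shows the escape itself, the same way the >>> prompt echoes strings.

```python
newline = "\n"
tab = "\t"
backslash = "\\"      # a literal backslash must itself be escaped
apostrophe = 'it\'s'  # an escaped apostrophe inside an apostrophe-delimited string

# Each escape sequence produces a single character.
print(len(newline), len(tab), len(backslash))  # 1 1 1

# print() shows the line break; repr() shows the \n escape.
two_lines = "first\nsecond"
print(two_lines)
print(repr(two_lines))
```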
Sometimes we'll need to use characters which aren't present on our computer keyboards. For example, we might want to print one of the wide variety of Unicode special characters.
The following example works well when we know the Unicode number for a particular symbol:
>>> "\u2328"
'⌨'
The following example is better because we don't need to know the obscure code for a symbol:
>>> "\N{KEYBOARD}"
'⌨'
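To move between a symbol, its Unicode number, and its name, the built-in ord() and chr() functions and the standard unicodedata module can help. A brief sketch using the same keyboard symbol:

```python
import unicodedata

symbol = "\N{KEYBOARD}"

# The \u escape and the \N{...} escape name the same code point.
print(symbol == "\u2328")          # True

# ord() and chr() convert between a character and its code point number.
print(hex(ord(symbol)))            # 0x2328
print(chr(0x2328) == symbol)       # True

# unicodedata looks up a character's name, and a character from its name.
print(unicodedata.name(symbol))    # KEYBOARD
print(unicodedata.lookup("KEYBOARD") == symbol)  # True
```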
Converting between numbers and strings
We have two kinds of interesting string conversions: strings to numbers and numbers to strings.
We've seen functions such as Decimal() to convert a string to a number. We also have the functions int(), float(), fractions.Fraction(), and complex(). When we have numbers that aren't in base 10, we can also use int() to convert those, as shown in the following code:
>>> int('dead', 16)
57005
>>> int('0b1101111010101101', 2)
57005
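A sketch, using only the standard library, that exercises each of the conversion functions named above. The sample strings other than '247.616' and the 57005 value are illustrative. It also shows int() with base 0, which infers the base from a 0x, 0o, or 0b prefix.

```python
from decimal import Decimal
from fractions import Fraction

# Each constructor parses a string into a different numeric type.
rate = Decimal('247.616')   # exact decimal value
approx = float('247.616')   # binary floating-point approximation
ratio = Fraction('3/4')     # exact rational number
z = complex('1+2j')         # complex number; spaces around '+' are not allowed

# int() with an explicit base handles non-decimal digits.
from_hex = int('dead', 16)
from_oct = int('157255', 8)
from_bin = int('1101111010101101', 2)

# Base 0 asks int() to infer the base from the prefix.
inferred = int('0xdead', 0)

print(from_hex, from_oct, from_bin, inferred)  # 57005 57005 57005 57005
```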
We can create strings from numbers too. We can use functions such as hex(), oct(), and bin() to create strings in base 16, 8, and 2. We also have the str() function, which is the most general-purpose function to convert any Python object into a string of some kind.
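A quick check of these conversions using the same 57005 value as above. The prefixed strings they produce can be read back with int() and base 0, and format() with the x code gives the hex digits without the prefix.

```python
value = 57005

as_hex = hex(value)   # '0xdead'
as_oct = oct(value)   # '0o157255'
as_bin = bin(value)   # '0b1101111010101101'
as_str = str(value)   # '57005'

# The prefixed strings round-trip through int() with base 0.
round_trip = [int(s, 0) for s in (as_hex, as_oct, as_bin)]

# The format() function can produce the digits without a prefix.
bare_hex = format(value, "x")  # 'dead'
```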
More valuable than these is the format() method of a string. This performs a variety of value-to-string conversions. It uses a conversion format specification, or template string, to define what the resulting string will look like.
Here's an example of using format() to convert several values into a single string. It uses a rather complex format specification string:
>>> "{0:12s} {1:6.2f} USD {2:8.0f} GRD".format("lunch", lunch_usd, lunch_grd)
'lunch         52.10 USD    12900 GRD'
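The example above relies on lunch_usd and lunch_grd having been computed earlier. To run it on its own, here is a sketch that assumes lunch_grd is Decimal('12900') and the exchange rate is 247.616 GRD per USD; both values are assumptions chosen because they reproduce the output shown. Note that the result keeps the padding spaces the format specifications add.

```python
from decimal import Decimal

# Assumed values: 12900 GRD converted at an assumed 247.616 GRD per USD.
lunch_grd = Decimal('12900')
lunch_usd = lunch_grd / Decimal('247.616')

line = "{0:12s} {1:6.2f} USD {2:8.0f} GRD".format("lunch", lunch_usd, lunch_grd)
print(repr(line))
```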
The format string has three conversion specifications: {0:12s}, {1:6.2f}, and {2:8.0f}. It also has some literal text, mostly spaces, but USD and GRD are also part of the background literal text into which the data will be merged.
Each conversion specification has two parts: the item to convert and the format for that item. These two parts are separated by a : inside {}. We'll look at each conversion:
- The item 0 is converted using the 12s format. This format produces a twelve-position string. The string lunch was padded out to 12 positions.
- The item 1 is converted using the 6.2f format. This format produces a six-position string, with two positions to the right of the decimal point. The value of lunch_usd was formatted using this specification.
- The item 2 is converted using the 8.0f format. This format produces an eight-position string with no positions to the right of the decimal point. The value of lunch_grd was formatted using this specification.
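Each specification can also be tried on its own with the built-in format() function, which accepts the same specification that follows the : inside {}. A sketch, reusing the assumed lunch values from before. The last line shows that the width is a minimum: a value too wide for six positions is not truncated.

```python
from decimal import Decimal

# Assumed values: 12900 GRD at an assumed rate of 247.616 GRD per USD.
lunch_grd = Decimal('12900')
lunch_usd = lunch_grd / Decimal('247.616')

label = format("lunch", "12s")   # strings are padded on the right to 12 positions
usd = format(lunch_usd, "6.2f")  # numbers are padded on the left to 6 positions
grd = format(lunch_grd, "8.0f")  # 8 positions, nothing after the decimal point

# A wider value simply uses more positions than the width asks for.
wide = format(Decimal('1234567.891'), "6.2f")

print(repr(label), repr(usd), repr(grd), repr(wide))
```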
We can do something like the following to improve our receipt:
receipt_1 = "{0:12s} {1:6.2f} USD"
receipt_2 = "{0:12s} {1:8.0f} GRD {2:6.2f} USD"
print(receipt_2.format("Lunch", lunch_grd, lunch_usd))
print(receipt_2.format("Bribe", bribe_grd, bribe_usd))
print(receipt_1.format("Cab", cab_usd))
print(receipt_1.format("Total", lunch_usd+bribe_usd+cab_usd))
We've used two parallel format specifications. The receipt_1 string can be used to format a label and a single dollar value. The receipt_2 string can be used to format a label and two numeric values: one in Greek Drachma and the other in dollars.
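A self-contained version of the receipt, as a sketch. The bribe and cab amounts are hypothetical, invented only so the code runs, and the 247.616 GRD per USD rate is assumed from the Decimal('247.616') expression at the start of this section.

```python
from decimal import Decimal

# Assumed exchange rate, in GRD per USD.
GRD_PER_USD = Decimal('247.616')

# Hypothetical expense amounts, for illustration only.
lunch_grd = Decimal('12900')
bribe_grd = Decimal('50000')
cab_usd = Decimal('23.50')

lunch_usd = lunch_grd / GRD_PER_USD
bribe_usd = bribe_grd / GRD_PER_USD

receipt_1 = "{0:12s} {1:6.2f} USD"
receipt_2 = "{0:12s} {1:8.0f} GRD {2:6.2f} USD"

# Collect the lines first, then print them.
lines = [
    receipt_2.format("Lunch", lunch_grd, lunch_usd),
    receipt_2.format("Bribe", bribe_grd, bribe_usd),
    receipt_1.format("Cab", cab_usd),
    receipt_1.format("Total", lunch_usd + bribe_usd + cab_usd),
]
for line in lines:
    print(line)
```

Collecting the lines in a list before printing is only a design choice; it makes the output easy to check, or to write to a file instead of the screen.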
This makes a better-looking receipt. That should keep the accountants off our back and let us focus on the real work: working on data files and folders.
Parsing strings
String objects can also be decomposed or parsed into substrings. We could easily write an entire chapter on all the various parsing methods that string objects offer. A common transformation is to strip extraneous whitespace from the beginning and end of a string. The idea is to remove spaces and tabs (and a few other nonobvious characters). It looks like this:
entry = input("GRD conversion: ").strip()
We've applied the input() function to get a string from the user. Then we've applied the strip() method of that string object to create a new string, stripped bare of whitespace characters. We can try it from the >>> prompt like this:
>>> " 123.45 ".strip()
'123.45'
This shows how a string with junk was pared down to the essentials. This can simplify a user's life; a few extra spaces won't be a problem.
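A few more variations on strip(), sketched on a string with a tab and a trailing newline. lstrip() and rstrip() trim only one end, passing an argument strips those characters instead of whitespace, and spaces in the interior of the string are never touched.

```python
raw = "\t 123.45  \n"

stripped = raw.strip()   # both ends
left = raw.lstrip()      # leading whitespace only
right = raw.rstrip()     # trailing whitespace only

print(repr(stripped))    # '123.45'
print(repr(left))        # '123.45  \n'
print(repr(right))       # '\t 123.45'

# An argument names the characters to strip from the ends.
dollars = "$$123.45$".strip("$")  # '123.45'

# Interior spaces survive.
inner = " 12 34 ".strip()         # '12 34'
```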
Another transformation might be to split a string into pieces. Here's just one of the many techniques available:
>>> amount, space, currency = "123.45 USD".partition(" ")
>>> amount
'123.45'
>>> space
' '
>>> currency
'USD'
Let's look at this in detail. First, it's a multiple-assignment statement, where three variables are going to be set: amount, space, and currency.
The expression "123.45 USD".partition(" ") works by applying the partition() method to a literal string value. We're going to partition the string on the space character. The partition() method returns a tuple of three things: the substring in front of the separator, the separator itself, and the substring after the separator.
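Because partition() returns a three-item tuple, it unpacks neatly into three names. A sketch comparing it with rpartition(), which splits at the last occurrence of the separator rather than the first, and with split(); the receipt-style text is illustrative.

```python
text = "Lunch 12900 GRD"

# partition() splits at the first separator; rpartition() at the last.
first = text.partition(" ")    # ('Lunch', ' ', '12900 GRD')
last = text.rpartition(" ")    # ('Lunch 12900', ' ', 'GRD')

# Unpack the tuple, then partition the remainder again.
label, _, rest = first
amount, _, currency = rest.partition(" ")
print(label, amount, currency)  # Lunch 12900 GRD

# split() with no argument breaks on every run of whitespace.
words = text.split()            # ['Lunch', '12900', 'GRD']
```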
The separator item may also be an empty string, ''. Try this:
amount, space, currency = "word".partition(" ")
What are the values for amount, space, and currency?
If you use help(str), you'll see all of the various kinds of things a string can do. Many of the names that have __ around them map to Python operators. __add__(), for example, is how the + operator is implemented.
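A brief sketch confirming the operator mapping for a few string operators; the variable names are illustrative.

```python
left, right = "secret ", "agent"

# The + operator and the __add__() method give the same result.
print(left + right == left.__add__(right))             # True

# * maps to __mul__(), and the in operator maps to __contains__().
print("ab" * 3 == "ab".__mul__(3))                     # True
print(("gen" in right) == right.__contains__("gen"))   # True

# List the double-underscore names that str defines.
dunders = [name for name in dir(str) if name.startswith("__")]
print("__add__" in dunders, "__contains__" in dunders)  # True True
```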