Dealing with encodings
Text files can be stored in different encodings. In recent years the situation has greatly improved, as a few encodings have become de facto standards, but compatibility problems still appear when exchanging files between different systems.
There's a difference between the raw data in a file and a string object in Python. When a file is read, its bytes are decoded from whatever encoding the file uses into a native Unicode string. Once the text is in this form, it can be encoded again into a different encoding when it is stored. By default, Python reads and writes text files using the encoding defined by the operating system's locale, which on most modern systems is UTF-8. This is a highly compatible encoding, but you may need to save files in a different one, depending on your specific requirements.
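As an illustration of the difference between a native string and its encoded bytes, the following is a minimal sketch using only the standard library. It encodes the string 20£ as UTF-8 and as ISO 8859-1, shows what happens when bytes are decoded with the wrong encoding, and writes a file with an explicit `encoding` argument. The temporary file and the name `example_iso.txt` are hypothetical and used only for illustration.

```python
import locale
import os
import tempfile

text = "20£"  # a native Unicode string (str)

# Encode the same string into two different byte representations
utf8_bytes = text.encode("utf-8")         # the £ sign takes two bytes
latin1_bytes = text.encode("iso-8859-1")  # the £ sign takes one byte

# Decoding with the matching encoding recovers the original string
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("iso-8859-1") == text

# Decoding with the wrong encoding can silently produce garbled text...
mojibake = utf8_bytes.decode("iso-8859-1")
# ...or fail outright, because 0xA3 is not a valid UTF-8 start byte
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Write and read back a file using an explicit, non-default encoding
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_iso.txt")  # hypothetical filename
    with open(path, "w", encoding="iso-8859-1") as f:
        f.write(text)
    with open(path, "rb") as f:
        raw = f.read()  # the raw bytes actually stored on disk
    with open(path, encoding="iso-8859-1") as f:
        roundtrip = f.read()

# The encoding open() falls back to when no encoding argument is given
default_encoding = locale.getpreferredencoding(False)
print(utf8_bytes, latin1_bytes, mojibake, default_encoding)
```

Passing `encoding=` explicitly, rather than relying on the locale default, keeps the behavior the same across systems whose default encoding is not UTF-8.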
Getting ready
We prepared two files in the GitHub repository that store the string 20£ in two different encodings: one in the usual UTF-8 and another in ISO 8859-1, a different common encoding...