Output encoding
Beautiful Soup also handles encoding on the output side. Some output methods, such as prettify(), produce UTF-8 by default, even if the original document used a different encoding such as ISO 8859-2. For example, the following HTML markup declares ISO8859-2 as its encoding:
from bs4 import BeautifulSoup

html_markup = """
<html>
<meta http-equiv="Content-Type" content="text/html;charset=ISO8859-2"/>
<p>cédille (from French), is a hook or tail ( ž ) added under certain letters as a diacritical mark to modify their pronunciation </p>"""
soup = BeautifulSoup(html_markup,"lxml")
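For markup passed in as a byte string, soup.original_encoding reports the encoding as ISO8859-2, which matches the preceding HTML snippet (for a Unicode string it is None, because nothing had to be decoded). However, print(soup.prettify()) gives the output in UTF-8, and the charset in the meta tag is rewritten to match: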
<html> <head> <meta content="text/html;charset=utf-8" http-equiv="Content-Type"/> </head> <...
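The behaviour described above can be tried with a minimal sketch like the following. It rests on a few assumptions that are not in the original example: the markup is explicitly encoded to ISO 8859-2 bytes to simulate a document arriving in that encoding, the built-in "html.parser" is used instead of "lxml" so that no extra package is needed, and the names raw_bytes, utf8_text and latin2_bytes are purely illustrative. It also shows that prettify() accepts an optional encoding argument for cases where UTF-8 output is not wanted.

```python
from bs4 import BeautifulSoup

html_markup = """<html>
<meta http-equiv="Content-Type" content="text/html;charset=ISO8859-2"/>
<p>cédille (from French), is a hook or tail ( ž ) added under certain letters</p>
</html>"""

# Assumption: simulate a document that arrives as ISO 8859-2 encoded bytes.
raw_bytes = html_markup.encode("iso-8859-2")

# Assumption: "html.parser" stands in for "lxml" so the sketch has no extra dependency.
soup = BeautifulSoup(raw_bytes, "html.parser")

# The encoding Beautiful Soup settled on while decoding the input bytes.
print(soup.original_encoding)

# Default output: a Unicode string whose meta tag now declares utf-8.
utf8_text = soup.prettify()
print(utf8_text)

# Passing an encoding returns bytes in that encoding instead,
# and the meta tag is rewritten to name it.
latin2_bytes = soup.prettify(encoding="iso-8859-2")
print(latin2_bytes.decode("iso-8859-2"))
```

Calling soup.encode("iso-8859-2") should work in a similar way when pretty-printing is not needed; in both cases the result is a byte string rather than Unicode text.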