Unicode Transformation Format-8 (UTF-8) is a character encoding that can represent every Unicode character, using one to four eight-bit bytes per character. It is the byte-oriented encoding form of Unicode. UTF-8 has been the predominant encoding for web pages since 2009.
Here are some characteristics of UTF-8:
- It can encode all 1,112,064 valid Unicode code points
- It uses one to four eight-bit bytes
- It is the encoding used by nearly 90% of all web pages
- It is backward compatible with ASCII
- It is reversible: decoding the bytes yields the original code points
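To make these characteristics concrete, here is a minimal sketch using only the standard `String.getBytes` method and `StandardCharsets`. The class and method names (`Utf8Demo`, `utf8Length`, `roundTrip`, `matchesAscii`) are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Demo {

    // Number of bytes UTF-8 needs to encode the given string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode to UTF-8 and decode again; for well-formed text the
    // result equals the input, which is what "reversible" means here.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // For pure ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
    static boolean matchesAscii(String s) {
        return Arrays.equals(
                s.getBytes(StandardCharsets.UTF_8),
                s.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign),
        // U+1F600 (an emoji, written as a surrogate pair).
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(utf8Length(s) + " byte(s), round trip ok: "
                    + s.equals(roundTrip(s)));
        }
        System.out.println("ASCII compatible: " + matchesAscii("Hello"));
    }
}
```

The four samples need one, two, three, and four bytes respectively, which covers the full range of UTF-8 sequence lengths.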
The pervasive use of UTF-8 underscores the importance of full UTF-8 support in the Java platform. Java applications can specify property files that are encoded in UTF-8, and the Java platform includes changes to the ResourceBundle API to support them.
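As a rough illustration of a UTF-8 property file, the sketch below writes a small file to a temporary location and reads it back through `PropertyResourceBundle` with an explicit UTF-8 `Reader`. The class name `Utf8BundleDemo`, the method `readGreeting`, the file name, and the `greeting` key are all hypothetical. Passing a reader is just one way to control the encoding, and it is not necessarily the approach the API changes mentioned above rely on.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8BundleDemo {

    // Writes a one-line UTF-8 properties file, loads it as a bundle
    // through a UTF-8 reader, and returns the value of "greeting".
    static String readGreeting() {
        try {
            Path file = Files.createTempFile("messages", ".properties");
            try {
                // The value contains non-ASCII characters (u-umlaut, sharp s).
                String content = "greeting=Gr\u00FC\u00DF Gott\n";
                Files.write(file, content.getBytes(StandardCharsets.UTF_8));

                // The reader decodes the bytes as UTF-8 before the
                // bundle parses the key/value pairs.
                try (Reader reader =
                        Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    ResourceBundle bundle = new PropertyResourceBundle(reader);
                    return bundle.getString("greeting");
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readGreeting());
    }
}
```

Because the reader performs the decoding, the non-ASCII characters in the value survive intact regardless of the platform's default encoding.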
Let's take a look at the premodern Java (Java 8 and earlier) ResourceBundle class, followed...