Support for UTF-8
Unicode Transformation Format-8 (UTF-8) is a variable-width character encoding that can represent every Unicode character using one to four 8-bit bytes. It is the byte-oriented encoding form of Unicode, and it has been the dominant encoding for web pages since 2008. Here are some characteristics of UTF-8:
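- Can encode all 1,112,064 valid Unicode code points
- Uses one to four 8-bit bytes per code point
- Was used by nearly 90% of all web pages at the time of writing
- Is backward compatible with ASCII: every ASCII character is encoded as the same single byte
- Is reversible: decoding UTF-8 bytes recovers exactly the original code points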
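The byte counts, ASCII compatibility, and reversibility listed above are easy to observe from Java itself. The following minimal sketch encodes one sample character of each encoded length with String.getBytes(StandardCharsets.UTF_8) and decodes the result back; the class and method names are illustrative only and are not part of any JDK API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Bytes {

    // Returns the number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes the text to UTF-8 and decodes it again; the result equals the input.
    static String roundTrip(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // One sample per encoded length: ASCII, Latin-1 Supplement, CJK, emoji outside the BMP
        String[] samples = {"A", "\u00E9", "\u4E2D", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", s.codePointAt(0), utf8Length(s));
        }

        // ASCII compatibility: pure ASCII text has identical US-ASCII and UTF-8 encodings
        String ascii = "Hello";
        System.out.println(Arrays.equals(
                ascii.getBytes(StandardCharsets.US_ASCII),
                ascii.getBytes(StandardCharsets.UTF_8)));

        // Reversibility: encoding then decoding returns the original string
        System.out.println(roundTrip("\u4E2D\uD83D\uDE00").equals("\u4E2D\uD83D\uDE00"));
    }
}
```

Running it reports 1, 2, 3, and 4 bytes for U+0041, U+00E9, U+4E2D, and U+1F600 respectively. Note that the emoji occupies two chars (a surrogate pair) in a Java String but a single four-byte sequence in UTF-8.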
The pervasive use of UTF-8 underscores the importance of ensuring that the Java platform fully supports it. This mindset led to JDK Enhancement Proposal (JEP) 226, UTF-8 Property Files. Starting with Java 9, applications can supply property files encoded in UTF-8; previously, such files were read as ISO-8859-1, so any other character had to be written as a \uXXXX escape sequence. To support this, the Java 9 platform changed the ResourceBundle API, specifically the way PropertyResourceBundle reads property files.
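As a hedged illustration of what the change means in practice, the sketch below builds bundles from in-memory bytes rather than real files; the class name, method names, and keys are hypothetical. It assumes Java 9 or later, where the PropertyResourceBundle(InputStream) constructor decodes its input as UTF-8 and falls back to ISO-8859-1 when the bytes are not valid UTF-8. The loadAsLatin1 helper passes an ISO-8859-1 Reader explicitly to approximate how earlier releases interpreted the same bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class Utf8BundleDemo {

    // Builds a bundle from raw bytes, as if they had been read from a .properties file.
    // On Java 9 and later this constructor decodes the bytes as UTF-8 and falls back
    // to ISO-8859-1 if the bytes are not valid UTF-8.
    static ResourceBundle load(byte[] propertiesBytes) {
        try {
            return new PropertyResourceBundle(new ByteArrayInputStream(propertiesBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Forces ISO-8859-1 decoding, approximating how pre-Java 9 releases read the bytes.
    static ResourceBundle loadAsLatin1(byte[] propertiesBytes) {
        try {
            return new PropertyResourceBundle(new InputStreamReader(
                    new ByteArrayInputStream(propertiesBytes), StandardCharsets.ISO_8859_1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical bundle content; the Unicode escapes in these literals only keep
        // this source file ASCII, while the bytes handed to the bundle are real UTF-8.
        String content = "greeting=Gr\u00FC\u00DF Gott\nfarewell=\u3055\u3088\u3046\u306A\u3089\n";
        byte[] utf8 = content.getBytes(StandardCharsets.UTF_8);

        System.out.println(load(utf8).getString("farewell"));         // decoded as UTF-8
        System.out.println(loadAsLatin1(utf8).getString("greeting")); // garbled by Latin-1 decoding

        // A legacy ISO-8859-1 file is not valid UTF-8, so the fallback decodes it correctly
        byte[] latin1 = "greeting=Gr\u00FC\u00DF Gott\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(load(latin1).getString("greeting"));
    }
}
```

The fallback keeps existing ISO-8859-1 property files working unchanged. The JDK also documents a java.util.PropertyResourceBundle.encoding system property for selecting the encoding explicitly.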
Let's take a look at the pre-Java 9 ResourceBundle class, followed by the changes made to it in the Java 9 platform.