Internationalization - Core Java for the Impatient (2013)

Chapter 13. Internationalization

Topics in This Chapter

13.1 Locales

13.2 Number Formats

13.3 Currencies

13.4 Date and Time Formatting

13.5 Collation and Normalization

13.6 Message Formatting

13.7 Resource Bundles

13.8 Character Encodings

13.9 Preferences

Exercises

There’s a big world out there, and hopefully many of its inhabitants will be interested in your software. Some programmers believe that all they need to do to internationalize their application is to support Unicode and translate the messages in the user interface. However, as you will see, there is a lot more to internationalizing programs. Dates, times, currencies, even numbers are formatted differently in different parts of the world. In this chapter, you will learn how to use the internationalization features of Java so that your programs present and accept information in a way that makes sense to your users, wherever they may be.

At the end of this chapter, you will find a brief overview of the Java Preferences API for storing user preferences.

The key points of this chapter are:

1. Translating an application for international users requires more than translating messages. In particular, formatting for numbers and dates varies widely across the world.

2. A locale describes language and formatting preferences for a population of users.

3. The NumberFormat and DateTimeFormatter classes handle locale-aware formatting of numbers, currencies, dates, and times.

4. The MessageFormat class can format message strings with placeholders, each of which can have its own format.

5. Use the Collator class for locale-dependent sorting of strings.

6. The ResourceBundle class manages localized strings and objects for multiple locales.

7. The Preferences class can be used for storing user preferences in a platform-independent way.

13.1 Locales

When you look at an application that is adapted to an international market, the most obvious difference is the language. But there are many more subtle differences; for example, numbers are formatted quite differently in English and in German. The number

123,456.78

should be displayed as

123.456,78

for a German user. That is, the roles of the period and the comma as decimal separator and grouping separator are reversed. There are similar variations in the display of dates. In the United States, dates are displayed as month/day/year; Germany uses the more sensible order of day/month/year, whereas in China, the usage is year/month/day. Thus, the American date

3/22/61

should be presented as

22.03.1961

to a German user. If the month names are written out explicitly, then the difference in languages becomes even more apparent. The English

March 22, 1961

should be presented as

22. März 1961

in German, or

1961年3月22日

in Chinese.

A locale specifies the language and location of a user, which allows formatters to take user preferences into account. The following sections show you how to specify a locale and how to control the locale settings of a Java program.

13.1.1 Specifying a Locale

A locale is made up of up to five components:

1. A language, specified by two or three lowercase letters, such as en (English), de (German), or zh (Chinese). Table 13–1 shows common codes.

Table 13–1 Common Language Codes

2. Optionally, a script, specified by four letters with an initial uppercase, such as Latn (Latin), Cyrl (Cyrillic), or Hant (traditional Chinese characters). This can be useful because some languages, such as Serbian, are written in Latin or Cyrillic, and some Chinese readers prefer the traditional over the simplified characters.

3. Optionally, a country or region, specified by two uppercase letters or three digits, such as US (United States) or CH (Switzerland). Table 13–2 shows common codes.

Table 13–2 Common Country Codes

4. Optionally, a variant.

5. Optionally, an extension. Extensions describe local preferences for calendars (such as the Japanese calendar), numbers (Thai instead of Western digits), and so on. The Unicode standard specifies some of these extensions. Extensions start with u- and a two-letter code specifying whether the extension deals with the calendar (ca), numbers (nu), and so on. For example, the extension u-nu-thai denotes the use of Thai numerals. Other extensions are entirely arbitrary and start with x-, such as x-java.


Note

Variants are rarely used nowadays. There used to be a “Nynorsk” variant of Norwegian, but it is now expressed with a different language code, nn. What used to be variants for the Japanese imperial calendar and Thai numerals are now expressed as extensions.


Rules for locales are formulated in the “Best Current Practices” memo BCP 47 of the Internet Engineering Task Force (http://tools.ietf.org/html/bcp47). You can find a more accessible summary at www.w3.org/International/articles/language-tags.


Note

The codes for languages and countries seem a bit random because some of them are derived from local languages. German in German is Deutsch, Chinese in Chinese is zhongwen; hence de and zh. And Switzerland is CH, deriving from the Latin term Confoederatio Helvetica for the Swiss confederation.


Locales are described by tags—hyphenated strings of locale elements such as en-US.

In Germany, you would use a locale de-DE. Switzerland has four official languages (German, French, Italian, and Rhaeto-Romance). A German speaker in Switzerland would want to use a locale de-CH. This locale uses the rules for the German language, but currency values are expressed in Swiss francs, not euros.

If you only specify the language, say, de, then the locale cannot be used for country-specific issues such as currencies.

You can construct a Locale object from a tag string like this:

Locale usEnglish = Locale.forLanguageTag("en-US");

The toLanguageTag method yields the language tag for a given locale. For example, Locale.US.toLanguageTag() is the string "en-US".
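
As a minimal sketch of how a tag maps to the locale components listed above, the following class parses a few tags and reads back their parts. The class name LocaleTagDemo and the sample tags are chosen for illustration only.

```java
import java.util.Locale;

public class LocaleTagDemo {
    // Returns "language/script/country" for a BCP 47 tag; absent parts are empty strings.
    public static String describe(String tag) {
        Locale loc = Locale.forLanguageTag(tag);
        return loc.getLanguage() + "/" + loc.getScript() + "/" + loc.getCountry();
    }

    // Returns the type of a Unicode extension key such as "nu" or "ca", or null if absent.
    public static String extension(String tag, String key) {
        return Locale.forLanguageTag(tag).getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        System.out.println(describe("de-CH"));                    // language and country only
        System.out.println(describe("sr-Latn-RS"));               // language, script, and country
        System.out.println(extension("th-TH-u-nu-thai", "nu"));   // numbering system extension
        System.out.println(Locale.forLanguageTag("de-CH").toLanguageTag()); // round trip
    }
}
```

A locale built from a tag without a script or country simply reports empty strings for those components.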

For your convenience, there are predefined locale objects for various countries:

Locale.CANADA
Locale.CANADA_FRENCH
Locale.CHINA
Locale.FRANCE
Locale.GERMANY
Locale.ITALY
Locale.JAPAN
Locale.KOREA
Locale.PRC
Locale.TAIWAN
Locale.UK
Locale.US

A number of predefined locales specify just a language without a location:

Locale.CHINESE
Locale.ENGLISH
Locale.FRENCH
Locale.GERMAN
Locale.ITALIAN
Locale.JAPANESE
Locale.KOREAN
Locale.SIMPLIFIED_CHINESE
Locale.TRADITIONAL_CHINESE

Finally, the static getAvailableLocales method returns an array of all locales known to the virtual machine.

13.1.2 The Default Locale

The static getDefault method of the Locale class initially gets the default locale as stored by the local operating system.

Some operating systems allow the user to specify different locales for displayed messages and for formatting. For example, a French speaker living in the United States can have French menus but currency values in dollars.

To obtain these preferences, call

Locale displayLocale = Locale.getDefault(Locale.Category.DISPLAY);
Locale formatLocale = Locale.getDefault(Locale.Category.FORMAT);


Note

In Unix, you can specify separate locales for numbers, currencies, and dates, by setting the LC_NUMERIC, LC_MONETARY, and LC_TIME environment variables. Java does not pay attention to these settings.



Tip

For testing, you might want to switch the default locale of your program. Supply the language and region properties when you launch your program. For example, here we set the default locale to German (Switzerland):

java -Duser.language=de -Duser.country=CH MainClass

You can also change the script and variant, and you can have separate settings for the display and format locales, for example, -Duser.script.display=Hant.


You can change the default locale of the virtual machine by calling one of

Locale.setDefault(newLocale);
Locale.setDefault(category, newLocale);

The first call changes the locales returned by Locale.getDefault() and Locale.getDefault(category) for all categories.
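
The difference between the two setDefault calls can be sketched as follows. The class name DefaultLocaleDemo is an assumption made for this illustration; the sketch restores the previous overall default when it is done, which is a simplification if the display and format defaults differed beforehand.

```java
import java.util.Locale;

public class DefaultLocaleDemo {
    // Changes only the FORMAT default and reports whether DISPLAY stayed as it was.
    public static boolean formatOnlyChange() {
        Locale saved = Locale.getDefault();
        try {
            Locale.setDefault(Locale.US); // one-argument form: resets every category
            Locale.setDefault(Locale.Category.FORMAT, Locale.GERMANY); // only FORMAT changes
            return Locale.getDefault(Locale.Category.FORMAT).equals(Locale.GERMANY)
                && Locale.getDefault(Locale.Category.DISPLAY).equals(Locale.US);
        } finally {
            Locale.setDefault(saved); // restore the overall default for the rest of the program
        }
    }

    public static void main(String[] args) {
        System.out.println(formatOnlyChange());
    }
}
```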

13.1.3 Display Names

Suppose you want to allow a user to choose among a set of locales. You don’t want to display cryptic tag strings; the getDisplayName method returns a string describing the locale in a form that can be presented to a user, such as

German (Switzerland)

Actually, there is a problem here. The display name is issued in the default locale. That might not be appropriate. If your user already selected German as the preferred language, you probably want to present the string in German. You can do just that by giving the German locale as a parameter. The code

Locale loc = Locale.forLanguageTag("de-CH");
System.out.println(loc.getDisplayName(Locale.GERMAN));

prints

Deutsch (Schweiz)

This example shows why you need Locale objects. You feed them to locale-aware methods that produce text to be presented to users in different locations. You will see many examples in the following sections.

13.2 Number Formats

The NumberFormat class in the java.text package provides three factory methods for formatters that can format and parse numbers: getNumberInstance, getCurrencyInstance, and getPercentInstance. For example, here is how you can format a currency value in German:

Locale loc = Locale.GERMANY;
NumberFormat formatter = NumberFormat.getCurrencyInstance(loc);
double amt = 123456.78;
String result = formatter.format(amt);

The result is

123.456,78€

Note that the currency symbol is € and that it is placed at the end of the string. Also, note the reversal of decimal points and decimal commas.

Conversely, to read in a number that was entered or stored with the conventions of a certain locale, use the parse method:

String input = ...;
NumberFormat formatter = NumberFormat.getNumberInstance();
// Get the number formatter for default format locale
Number parsed = formatter.parse(input);
double x = parsed.doubleValue();

The return type of parse is the abstract type Number. The returned object is either a Double or a Long wrapper object, depending on whether the parsed number was a floating-point number. If you don’t care about the distinction, you can simply use the doubleValue method of the Number class to retrieve the wrapped number.

If the text for the number is not in the correct form, the method throws a ParseException. For example, leading whitespace in the string is not allowed. (Call trim to remove it.) However, any characters that follow the number in the string are simply ignored, and no exception is thrown.
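
If trailing characters should count as an error, one possible approach is the parse overload that takes a ParsePosition, which reports how far parsing got. The following is a sketch under that assumption; the class and method names (StrictNumberParser, tryParse) are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;
import java.util.OptionalDouble;

public class StrictNumberParser {
    // Parses the whole (trimmed) input as a number in the given locale.
    // Returns an empty result if parsing fails or stops before the end of the text.
    public static OptionalDouble tryParse(String input, Locale locale) {
        String trimmed = input.trim(); // leading whitespace would otherwise fail
        NumberFormat formatter = NumberFormat.getNumberInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number parsed = formatter.parse(trimmed, pos); // returns null on failure, no exception
        if (parsed == null || pos.getIndex() < trimmed.length()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(parsed.doubleValue());
    }

    public static void main(String[] args) {
        System.out.println(tryParse(" 123.456,78", Locale.GERMANY));
        System.out.println(tryParse("12abc", Locale.US));
    }
}
```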

13.3 Currencies

To format a currency value, you can use the NumberFormat.getCurrencyInstance method. However, that method is not very flexible: it returns a formatter for a single currency. Suppose you prepare an invoice for an American customer in which some amounts are in dollars and others are in euros. You can’t just use two formatters:

NumberFormat dollarFormatter = NumberFormat.getCurrencyInstance(Locale.US);
NumberFormat euroFormatter = NumberFormat.getCurrencyInstance(Locale.GERMANY);

Your invoice would look very strange, with some values formatted like $100,000 and others like 100.000€. (Note that the euro value uses a period, not a comma, as the grouping separator.)

Instead, use the Currency class to control the currency used by the formatters. You can get a Currency object by passing a currency identifier to the static Currency.getInstance method. Table 13–3 lists common identifiers. The static method Currency.getAvailableCurrencies yields a Set<Currency> with the currencies known to the virtual machine.

Table 13–3 Common Currency Identifiers

Once you have a Currency object, call the setCurrency method for the formatter. Here is how to format euro amounts for your American customer:

NumberFormat formatter = NumberFormat.getCurrencyInstance(Locale.US);
formatter.setCurrency(Currency.getInstance("EUR"));
System.out.println(formatter.format(euros));

If you need to display localized names or symbols of currencies, call

getDisplayName()
getSymbol()

These methods return strings in the default display locale. You can also provide an explicit locale parameter.
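
Putting these pieces together, a small helper might format an amount in any currency while keeping the number conventions of a fixed locale. This is a sketch only; the class name CurrencyDemo and the sample amounts are assumptions, and the exact symbol printed for a foreign currency depends on the locale data of the Java runtime.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class CurrencyDemo {
    // Formats an amount in the given currency using the separators of the given locale.
    public static String format(double amount, String currencyCode, Locale locale) {
        NumberFormat formatter = NumberFormat.getCurrencyInstance(locale);
        formatter.setCurrency(Currency.getInstance(currencyCode));
        return formatter.format(amount);
    }

    public static void main(String[] args) {
        System.out.println(format(1023.95, "USD", Locale.US));
        System.out.println(format(1023.95, "EUR", Locale.US));
        Currency euro = Currency.getInstance("EUR");
        System.out.println(euro.getDisplayName(Locale.US)); // localized currency name
        System.out.println(euro.getSymbol(Locale.GERMANY)); // localized currency symbol
    }
}
```

Note that setCurrency does not adjust the number of fraction digits, so a currency without minor units needs an explicit setMaximumFractionDigits call.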

13.4 Date and Time Formatting

When formatting date and time, there are four locale-dependent issues:

1. The names of months and weekdays should be presented in the local language.

2. There will be local preferences for the order of year, month, and day.

3. The Gregorian calendar might not be the local preference for expressing dates.

4. The time zone of the location must be taken into account.

Use the DateTimeFormatter from the java.time.format package, and not the legacy java.text.DateFormat. Decide whether you need the date, time, or both. Pick one of four formats (see Table 13–4). If you format date and time, you can pick them separately.

Table 13–4 Locale-Specific Formatting Styles

Then get a formatter:

FormatStyle style = ...; // One of FormatStyle.SHORT, FormatStyle.MEDIUM, ...
DateTimeFormatter dateFormatter = DateTimeFormatter.ofLocalizedDate(style);
DateTimeFormatter timeFormatter = DateTimeFormatter.ofLocalizedTime(style);
DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofLocalizedDateTime(style);
// or DateTimeFormatter.ofLocalizedDateTime(style1, style2)

These formatters use the current format locale. To use a different locale, use the withLocale method:

DateTimeFormatter dateFormatter =
    DateTimeFormatter.ofLocalizedDate(style).withLocale(locale);

Now you can format a LocalDate, LocalDateTime, LocalTime, or ZonedDateTime:

ZonedDateTime appointment = ...;
String formatted = formatter.format(appointment);

To parse a string, use one of the static parse methods of LocalDate, LocalDateTime, LocalTime, or ZonedDateTime.

LocalTime time = LocalTime.parse("9:32 AM", formatter);

If the string cannot be successfully parsed, a DateTimeParseException is thrown.
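
The formatting and parsing steps can be sketched as a round trip through a localized date formatter. The class name LocalizedDateDemo is hypothetical, and the exact text produced for each locale depends on the locale data of the Java runtime, so no output is shown here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class LocalizedDateDemo {
    // Formats a date in the given style and locale.
    public static String format(LocalDate date, FormatStyle style, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(style).withLocale(locale).format(date);
    }

    // Parses text that was produced with the same style and locale.
    public static LocalDate parse(String text, FormatStyle style, Locale locale) {
        return LocalDate.parse(text, DateTimeFormatter.ofLocalizedDate(style).withLocale(locale));
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(1961, 3, 22);
        for (Locale loc : new Locale[] { Locale.US, Locale.GERMANY, Locale.CHINA }) {
            String text = format(date, FormatStyle.LONG, loc);
            System.out.println(loc.toLanguageTag() + ": " + text
                + " -> " + parse(text, FormatStyle.LONG, loc));
        }
    }
}
```

The sketch uses the LONG style because it writes the year with four digits; a two-digit year in the SHORT style may not parse back to the same century.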


Caution

These methods are not suitable for parsing human input, at least not without preprocessing. For example, the short time formatter for the United States will parse "9:32 AM" but not "9:32AM" or "9:32 am".



Caution

Date formatters parse nonexistent dates, such as November 31, and adjust them to the last date in the given month.


Sometimes, you need to display just the names of weekdays and months, for example, in a calendar application. Call the getDisplayName method of the DayOfWeek and Month enumerations.

for (Month m : Month.values())
    System.out.println(m.getDisplayName(textStyle, locale) + " ");

Table 13–5 shows the text styles. The STANDALONE versions are for display outside a formatted date. For example, in Finnish, January is “tammikuuta” inside a date, but “tammikuu” standalone.

Table 13–5 Values of the java.time.format.TextStyle Enumeration

13.5 Collation and Normalization

Most programmers know how to compare strings with the compareTo method of the String class. Unfortunately, when interacting with human users, this method is not very useful. The compareTo method uses the values of the UTF-16 encoding of the string, which leads to absurd results, even in English. For example, the following five strings are ordered according to the compareTo method:

Athens
Zulu
able
zebra
Ångström

For dictionary ordering, you would want to consider upper case and lower case equivalent, and accents should not be significant. To an English speaker, the sample list of words should be ordered as

able
Ångström
Athens
zebra
Zulu

However, that order would not be acceptable to a Swedish user. In Swedish, the letter Å is different from the letter A, and it is collated after the letter Z! That is, a Swedish user would want the words to be sorted as

able
Athens
zebra
Zulu
Ångström

To obtain a locale-sensitive comparator, call the static Collator.getInstance method:

Collator coll = Collator.getInstance(locale);
words.sort(coll);
// Collator implements Comparator<Object>

There are a couple of advanced settings for collators. You can set a collator’s strength to adjust how selective it should be. Character differences are classified as primary, secondary, or tertiary. For example, in English, the difference between e and f is considered primary, the difference between e and é is secondary, and between e and E is tertiary.

For example, when processing city names, you may not care about the differences between

San José
San Jose
SAN JOSE

In that case, configure the collator by calling

coll.setStrength(Collator.PRIMARY);
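
As a sketch of both ideas, the following class sorts the word list from this section with an English collator and then compares city names at primary strength. The class and method names (CollationDemo, sorted, samePrimary) are assumptions for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationDemo {
    // Returns a copy of the words, sorted with the collation rules of the locale.
    public static List<String> sorted(List<String> words, Locale locale) {
        List<String> result = new ArrayList<>(words);
        result.sort(Collator.getInstance(locale)); // Collator is a Comparator<Object>
        return result;
    }

    // True if the strings differ at most in accents and letter case.
    public static boolean samePrimary(String a, String b, Locale locale) {
        Collator coll = Collator.getInstance(locale);
        coll.setStrength(Collator.PRIMARY);
        return coll.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        List<String> words =
            Arrays.asList("Athens", "Zulu", "able", "zebra", "\u00C5ngstr\u00F6m");
        System.out.println(sorted(words, Locale.US));
        System.out.println(sorted(words, Locale.forLanguageTag("sv-SE")));
        System.out.println(samePrimary("San Jos\u00E9", "SAN JOSE", Locale.US));
    }
}
```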

A more technical setting is the decomposition mode which deals with the fact that a character or sequence of characters can sometimes be described in more than one way in Unicode. For example, an é (U+00E9) can also be expressed as a plain e (U+0065) followed by a ´ (combining acute accent U+0301). You probably don’t care about that difference, and by default, it is not significant. If you do care, you need to configure the collator as follows:

coll.setStrength(Collator.IDENTICAL);
coll.setDecomposition(Collator.NO_DECOMPOSITION);

Conversely, if you want to be very lenient and consider the trademark symbol ™ (U+2122) the same as the character combination TM, then set the decomposition mode to Collator.FULL_DECOMPOSITION.

You might want to convert strings into a normalized form even when you don’t do collation, for example, for persistent storage or communication with another program. The Unicode standard defines four normalization forms (C, D, KC, and KD); see www.unicode.org/unicode/reports/tr15/tr15-23.html. In the normalization form C, accented characters are always composed. For example, a sequence of e and a combining acute accent ´ is combined into a single character é. In form D, accented characters are always decomposed into their base letters and combining accents: é is turned into e followed by ´. Forms KC and KD also decompose characters such as the trademark symbol ™. The W3C recommends that you use normalization form C for transferring data over the Internet.

The static normalize method of the java.text.Normalizer class carries out the normalization process. For example,

String city = "San Jose\u0301";
String normalized = Normalizer.normalize(city, Normalizer.Form.NFC);
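
To make the effect of the forms visible, this sketch normalizes the same text in several forms and compares string lengths. The class name NormalizationDemo and its helper methods are hypothetical wrappers around Normalizer.normalize.

```java
import java.text.Normalizer;

public class NormalizationDemo {
    public static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String decomposed = "San Jose\u0301"; // e followed by a combining acute accent
        System.out.println(decomposed.length());      // 9 code units
        System.out.println(nfc(decomposed).length()); // 8: the accent is composed into one character
        System.out.println(nfd(nfc(decomposed)).length()); // 9 again after decomposing
        System.out.println(nfkc("\u2122"));           // TM: compatibility decomposition
    }
}
```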

13.6 Message Formatting

When you internationalize a program, you often have messages with variable parts. The static format method of the MessageFormat class takes a template string with placeholders, followed by the placeholder values, like this:

String template = "{0} has {1} messages";
String message = MessageFormat.format(template, "Pierre", 42);

Of course, instead of hardcoding the template, you should look up a locale-specific one, such as "Il y a {1} messages pour {0}" in French. You will see how to do that in Section 13.7, “Resource Bundles.”

Note that the ordering of the placeholders may differ among languages. In English, the message is “Pierre has 42 messages”, but in French, it is “Il y a 42 messages pour Pierre”. The placeholder {0} is the first argument after the template in the call to format, {1} is the next argument, and so on.
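
The reordering of placeholders can be sketched with the two templates from this section. The helper class LocalizedMessageDemo is an assumption for illustration; it passes an explicit locale so that the result does not depend on the default format locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class LocalizedMessageDemo {
    // Formats the template with the given locale; args fill {0}, {1}, and so on.
    public static String format(String template, Locale locale, Object... args) {
        return new MessageFormat(template, locale).format(args);
    }

    public static void main(String[] args) {
        // Same arguments in the same order; only the template decides where they appear.
        System.out.println(format("{0} has {1} messages", Locale.US, "Pierre", 42));
        System.out.println(format("Il y a {1} messages pour {0}", Locale.FRANCE, "Pierre", 42));
    }
}
```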

You can format numbers as currency amounts by adding a suffix number,currency, to the placeholder, like this:

String template = "Your current total is {0,number,currency}.";

In the United States, a value of 1023.95 is formatted as $1,023.95. The same value is displayed as 1.023,95€ in Germany, using the local currency symbol and decimal separator convention.

The number indicator can be followed by currency, integer, percent, or a number format pattern of the DecimalFormat class, such as $,##0.

You can format values of the legacy java.util.Date class with an indicator date or time, followed by the format short, medium, long, or full, or a format pattern of the SimpleDateFormat such as yyyy-MM-dd.

Note that you need to convert java.time values; for example,

String message = MessageFormat.format("It is now {0,time,short}.", Date.from(Instant.now()));

Finally, a choice formatter lets you generate messages such as

No files copied
1 file copied
42 files copied

depending on the placeholder value.

A choice format is a sequence of pairs, each containing a lower limit and a format string. The limit and format string are separated by a # character, and the pairs are separated by | characters.

String template = "{0,choice,0#No files|1#1 file|2#{0} files} copied";

Note that {0} occurs twice in the template. When the message format applies the choice format to the {0} placeholder and the value is 42, the choice format returns "{0} files". That string is then formatted again, and the result is spliced into the message.
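
Applied to a few values, the choice template above behaves as sketched here. The class name ChoiceFormatDemo and the method filesCopied are hypothetical; the locale is fixed to Locale.US so that the number formatting is predictable.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ChoiceFormatDemo {
    private static final String TEMPLATE =
        "{0,choice,0#No files|1#1 file|2#{0} files} copied";

    // Picks the format string whose lower limit matches the count, then formats it.
    public static String filesCopied(int count) {
        return new MessageFormat(TEMPLATE, Locale.US).format(new Object[] { count });
    }

    public static void main(String[] args) {
        System.out.println(filesCopied(0));  // No files copied
        System.out.println(filesCopied(1));  // 1 file copied
        System.out.println(filesCopied(42)); // 42 files copied
    }
}
```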


Note

The design of the choice format is a bit muddleheaded. If you have three format strings, you need two limits to separate them. (In general, you need one fewer limit than you have format strings.) The MessageFormat class actually ignores the first limit!


Use the < symbol instead of # to denote that a choice should be selected if the lower bound is strictly less than the value. You can also use the ≤ symbol (U+2264) as a synonym for #, and specify a lower bound of -∞ (a minus sign followed by U+221E) for the first value. This makes the format string easier to read:

-∞<No files|0<1 file|2≤{0} files


Caution

Any text in single quotes ' . . . ' is included literally. For example, '{0}' is not a placeholder but the literal string {0}. If the template has single quotes, you must double them.

String template = "<a href=''{0}''>{1}</a>";


The static MessageFormat.format method uses the current format locale to format the values. To format with an arbitrary locale, you have to work a bit harder because there is no “varargs” method that you can use. You need to place the values to be formatted into an Object[] array, like this:

MessageFormat mf = new MessageFormat(template, locale);
String message = mf.format(new Object[] { arg1, arg2, ... });

13.7 Resource Bundles

When localizing an application, it is best to separate the program from the message strings, button labels, and other texts that need to be translated. In Java, you can place them into resource bundles. Then, you can give these bundles to a translator who can edit them without having to touch the source code of the program.


Note

Chapter 4 describes a concept of JAR file resources, whereby data files, sounds, and images can be placed in a JAR file. The getResource method of the class Class finds the file, opens it, and returns a URL to the resource. That is a useful mechanism for bundling files with a program, but it has no locale support.


13.7.1 Organizing Resource Bundles

When localizing an application, you produce a set of resource bundles. Each bundle is either a property file or a special class, with entries for a particular locale or set of matching locales.

In this section, I only discuss property files since they are much more common than resource classes. A property file is a text file with extension .properties that contains key/value pairs. For example, a file messages_de_DE.properties might contain

computeButton=Rechnen
cancelButton=Abbrechen
defaultPaperSize=A4

You need to use a specific naming convention for the files that make up these bundles. For example, resources specific to Germany go into a file bundleName_de_DE, whereas those shared by all German-speaking countries go into bundleName_de. For a given combination of language, script, and country, the following candidates are considered:

bundleName_language_script_country
bundleName_language_script
bundleName_language_country
bundleName_language

If bundleName contains periods, then the file must be placed in a matching subdirectory. For example, files for the bundle com.mycompany.messages are com/mycompany/messages_de_DE.properties, and so on.

To load a bundle, call

ResourceBundle res = ResourceBundle.getBundle(bundleName);

for the default locale, or

ResourceBundle bundle = ResourceBundle.getBundle(bundleName, locale);

for the given locale.


Caution

The first getBundle method does not use the default display locale, but the overall default locale. If you look up a resource for the user interface, be sure to pass Locale.getDefault(Locale.Category.DISPLAY) as the locale.


To look up a string, call the getString method with the key.

String computeButtonLabel = bundle.getString("computeButton");

The rules for loading bundle files are a bit complex and involve two phases. In the first phase, a matching bundle is located. This involves up to three steps.

1. First, all candidate combinations of bundle name, language, script, country, and variant are attempted, in the order given above, until a match is found. For example, if the target locale is de-DE and there is no messages_de_DE.properties but there is messages_de.properties, that becomes the matching bundle.

2. If there is no match, the process is repeated with the default locale. For example, if a German bundle is requested but there is none, and the default locale is en-US, then messages_en_US.properties is accepted as a match.

3. If there is no match with the default locale either, then the bundle with no suffixes (for example, messages.properties) is a match. If that is not present either, the search fails.


Note

There are special rules for variants, Chinese simplified and traditional scripts, and Norwegian languages. See the Javadoc for ResourceBundle.Control for details.


In the second phase, the parents of the matching bundle are located. The parents are those in the candidate list below the matching bundle, and the bundle without suffixes. For example, the parents of messages_de_DE.properties are messages_de.properties and messages.properties.

The getString method looks for keys in the matching bundle and its parents.
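
One way to inspect the candidate list for a locale is to ask the default ResourceBundle.Control for it. The sketch below does that without loading any files; the class name BundleCandidatesDemo is an assumption, and it shows only the candidates for a single locale, not the fallback to the default locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleCandidatesDemo {
    // Lists the bundle names that are tried for the locale, most specific first.
    public static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidateNames("messages", Locale.forLanguageTag("de-DE")));
        System.out.println(candidateNames("messages", Locale.forLanguageTag("sr-Latn-RS")));
    }
}
```

For de-DE, the names after the first one in the list correspond to the parents described above.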


Note

If the matching bundle was found in the first phase, then its parents are never taken from the default locale.



Caution

Property files use the ASCII character set. All non-ASCII characters must be encoded using the \uxxxx encoding. For example, to specify a value of Préférences, use

preferences=Pr\u00E9f\u00E9rences

You can use the native2ascii tool to generate these files.
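
To check how such an escaped entry is decoded, you can load property text from memory instead of a file. This is a sketch for experimentation; the class name PropertiesEscapeDemo is hypothetical, and a real application would load its bundles with ResourceBundle.getBundle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class PropertiesEscapeDemo {
    // Parses property-file text and returns the value stored under the key.
    public static String lookup(String propertiesText, String key) {
        try {
            ResourceBundle bundle =
                new PropertyResourceBundle(new StringReader(propertiesText));
            return bundle.getString(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes as literal text, as in a file.
        String text = "preferences=Pr\\u00E9f\\u00E9rences\n";
        System.out.println(lookup(text, "preferences"));
    }
}
```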


13.7.2 Bundle Classes

To provide resources that are not strings, define classes that extend the ResourceBundle class. Use a naming convention similar to that of property resources, for example

com.mycompany.MyAppResources_en_US
com.mycompany.MyAppResources_de
com.mycompany.MyAppResources

To implement a resource bundle class, you can extend the ListResourceBundle class. Place all your resources into an array of key/value pairs and return it in the getContents method. For example,

package com.mycompany;

import java.awt.Color;
import java.util.ListResourceBundle;

public class MyAppResources_de extends ListResourceBundle {
    // Returns all resources as an array of key/value pairs
    public Object[][] getContents() {
        return new Object[][] {
            { "backgroundColor", Color.BLACK },
            { "defaultPaperSize", new double[] { 210, 297 } }
        };
    }
}

To get objects out of such a resource bundle, call the getObject method:

ResourceBundle bundle = ResourceBundle.getBundle("com.mycompany.MyAppResources",
    locale);
Color backgroundColor = (Color) bundle.getObject("backgroundColor");
double[] paperSize = (double[]) bundle.getObject("defaultPaperSize");


Caution

The ResourceBundle.getBundle method gives preference to classes over property files when it finds both a class and a property file with the same bundle name.


13.8 Character Encodings

The fact that Java uses Unicode doesn’t mean that all your problems with character encodings have gone away. Fortunately, you don’t have to worry about the encoding of String objects. Any string you receive, be it a command-line argument, console input, or input from a GUI text field, will be a UTF-16 encoded string that contains the text provided by the user.

When you display a string, the virtual machine encodes it for the local platform. There are two potential problems. It could happen that a display font does not have a glyph for a particular Unicode character. In a Java GUI, such characters are displayed as hollow boxes. For console output, if the console uses a character encoding that cannot represent all output characters, missing characters are displayed as ?. Users can correct these issues by installing appropriate fonts or by switching the console to UTF-8.

The situation gets more complex when your program reads plain text files produced by users. Simple-minded text editors often produce files in the local platform encoding. You can obtain that encoding by calling

Charset platformEncoding = Charset.defaultCharset();

This is a reasonable guess for the user’s preferred character encoding, but you should allow your users to override it.
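
Why the choice of encoding matters can be sketched by decoding the same bytes twice. The class name EncodingDemo and the sample word are assumptions for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Interprets the bytes as text in the given character encoding.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] bytes = "caf\u00E9".getBytes(StandardCharsets.UTF_8); // the accented e takes two bytes
        System.out.println(bytes.length);
        System.out.println(decode(bytes, StandardCharsets.UTF_8));      // the original text
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1)); // garbled: two characters for one
        System.out.println(Charset.defaultCharset()); // the platform encoding
    }
}
```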

If you want to offer a choice of character encodings, you can obtain localized names as

String displayName = encoding.displayName(locale);
// Yields names such as UTF-8, ISO-8859-6, or GB18030

Unfortunately, these names aren’t really suitable for end users who would want to have choices between Unicode, Arabic, Chinese Simplified, and so on.


Tip

Java source files are also text files. Assuming you are not the only programmer on a project, don’t store source files in the platform encoding. You could represent any non-ASCII characters in code or comments with \uxxxx escapes, but that is tedious. Instead, use UTF-8. Set your text editor and console preference to UTF-8, or compile with

javac -encoding UTF-8 *.java


13.9 Preferences

I close this chapter with an API that is tangentially related to internationalization—the storage of user preferences (which might include the preferred locale).

Of course, you can store preferences in a property file that you load on program startup. However, there is no standard convention for naming and placing configuration files, which increases the likelihood of conflicts as users install multiple Java applications.

Some operating systems have a central repository for configuration information. The best-known example is the registry in Microsoft Windows. The Preferences class, which is the standard mechanism in Java for storing user preferences, uses the registry on Windows. On Linux, the information is stored in the local file system instead. The specific repository implementation is transparent to the programmer using the Preferences class.

The Preferences repository holds a tree of nodes. Each node in the repository has a table of key/value pairs. Values can be numbers, boolean values, strings, or byte arrays.


Note

No provision is made for storing arbitrary objects. You are, of course, free to store a serialized object as a byte array if you aren’t worried about using serialization for long-term storage.


Paths to nodes look like /com/mycompany/myapp. As with package names, you can avoid name clashes by starting the paths with reversed domain names.

There are two parallel trees. Each program user has one tree. An additional tree, called the system tree, is available for settings that are common to all users. The Preferences class uses the operating system notion of the “current user” for accessing the appropriate user tree. To access a node in the tree, start with the user or system root:

Preferences root = Preferences.userRoot();

or

Preferences root = Preferences.systemRoot();

Then access nodes through their path names:

Preferences node = root.node("/com/mycompany/myapp");

Alternatively, provide a Class object to the static userNodeForPackage or systemNodeForPackage method, and the node path is derived from the package name of the class.

Preferences node = Preferences.userNodeForPackage(obj.getClass());
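
To see how the two lookup styles relate, the following minimal sketch prints the absolute path of a node reached by an explicit path name and of a node derived from a class. The class name NodePaths and its helper methods are made up for this illustration, and String.class merely stands in for one of your own classes. Keep in mind that looking up a node creates it if it does not exist yet.

```java
import java.util.prefs.Preferences;

public class NodePaths {
    /** Absolute path of the user node reached through an explicit path name. */
    public static String pathFor(String pathName) {
        return Preferences.userRoot().node(pathName).absolutePath();
    }

    /** Absolute path of the user node derived from the package of a class. */
    public static String pathFor(Class<?> cl) {
        return Preferences.userNodeForPackage(cl).absolutePath();
    }

    public static void main(String[] args) {
        // Explicit path: the node path is exactly what was passed in
        System.out.println(pathFor("/com/mycompany/myapp"));
        // Derived path: the package java.lang becomes /java/lang
        System.out.println(pathFor(String.class));
    }
}
```

In a real application you would pass one of your own classes, so that the node path mirrors your reversed domain name.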

Once you have a node, you can access the key/value table. Retrieve a string with

String preferredLocale = node.get("locale", "");

For other types, use one of these methods:

String get(String key, String defval)
int getInt(String key, int defval)
long getLong(String key, long defval)
float getFloat(String key, float defval)
double getDouble(String key, double defval)
boolean getBoolean(String key, boolean defval)
byte[] getByteArray(String key, byte[] defval)

You must specify a default value when reading the information. It is returned when the key has no value or the repository data is not available.

Conversely, you can write data to the repository with put methods such as

void put(String key, String value)
void putInt(String key, int value)

and so on.
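
The earlier note mentioned storing a serialized object as a byte array. Here is one way this might look, as a rough sketch only: the class name, the node path /com/mycompany/myapp/demo, and the key recentLocales are hypothetical. A String[] is used because arrays of strings are serializable. Be aware that the Preferences API limits the size of stored values (see Preferences.MAX_VALUE_LENGTH), so this approach is suitable only for small objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.prefs.Preferences;

public class SerializedPreference {
    // Hypothetical node path and key, chosen only for this illustration
    static final String DEMO_PATH = "/com/mycompany/myapp/demo";
    static final String KEY = "recentLocales";

    /** Serializes the array, stores it with putByteArray, and reads it back. */
    public static String[] storeAndLoad(String[] values) {
        Preferences node = Preferences.userRoot().node(DEMO_PATH);
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(values);
            }
            node.putByteArray(KEY, bytes.toByteArray());

            // The default (an empty array) is returned only if the key is missing
            byte[] stored = node.getByteArray(KEY, new byte[0]);
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(stored))) {
                return (String[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        } finally {
            node.remove(KEY); // tidy up the demo entry
        }
    }

    public static void main(String[] args) {
        String[] loaded = storeAndLoad(new String[] { "de-DE", "fr-FR" });
        System.out.println(String.join(", ", loaded));
    }
}
```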

To remove an entry from a node, call

void remove(String key)

Call node.removeNode() to remove the entire node and its children.
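
The get, put, and remove methods can be put together as in the following sketch. The class name, the node path /com/mycompany/myapp/demo, and the keys locale and fontSize are assumptions made for this example, not names that the API prescribes.

```java
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

public class PreferencesRoundTrip {
    // Hypothetical node path, chosen only for this illustration
    static final String DEMO_PATH = "/com/mycompany/myapp/demo";

    /** Stores a locale tag, reads it back, then removes the entry again. */
    public static String storeAndReadLocale(String languageTag) {
        Preferences node = Preferences.userRoot().node(DEMO_PATH);
        node.put("locale", languageTag);
        String stored = node.get("locale", "");
        node.remove("locale");
        return stored;
    }

    /** Reads a key that has no value, so the supplied default comes back. */
    public static int readMissingFontSize(int defval) {
        Preferences node = Preferences.userRoot().node(DEMO_PATH);
        node.remove("fontSize"); // make sure no earlier run left a value behind
        return node.getInt("fontSize", defval);
    }

    public static void main(String[] args) throws BackingStoreException {
        System.out.println(storeAndReadLocale("de-DE"));
        System.out.println(readMissingFontSize(12));
        // Remove the demo node and its children so that nothing is left behind
        Preferences.userRoot().node(DEMO_PATH).removeNode();
    }
}
```

Note that removeNode can throw a BackingStoreException, which is why main declares it.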

You can enumerate all keys stored in a node, and all child paths of a node, with the methods

String[] keys()
String[] childrenNames()
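
As an illustration, the following sketch fills a node with two keys and two children and then lists them. The class name, the node path, and the key and child names are invented for the example. Both methods can throw a BackingStoreException, which the sketch wraps in an unchecked exception. The arrays are sorted here because the sketch does not rely on any particular order from the repository.

```java
import java.util.Arrays;
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

public class ListPreferences {
    // Hypothetical node path, chosen only for this illustration
    static final String DEMO_PATH = "/com/mycompany/myapp/listing-demo";

    /** Populates the demo node and returns its sorted keys and child names. */
    public static String describeDemoNode() {
        Preferences node = Preferences.userRoot().node(DEMO_PATH);
        try {
            node.clear(); // drop any keys left over from an earlier run
            node.put("locale", "fr-FR");
            node.putInt("fontSize", 12);
            node.node("window"); // relative paths create child nodes
            node.node("recent");

            String[] keys = node.keys();
            String[] children = node.childrenNames();
            Arrays.sort(keys);
            Arrays.sort(children);
            return Arrays.toString(keys) + " " + Arrays.toString(children);
        } catch (BackingStoreException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeDemoNode());
    }
}
```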


Note

There is no way to find out the type of the value of a particular key.


You can export the preferences of a subtree by calling the method

void exportSubtree(OutputStream out)

on the root node of the subtree.

The data is saved in XML format. You can import it into another repository by calling

InputStream in = Files.newInputStream(path);
Preferences.importPreferences(in);
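
The following sketch shows an export followed by an import. To stay self-contained, it writes the XML to a byte array in memory instead of a file; the class name, node path, and key are hypothetical. The entry is removed between the two steps, so that the value read at the end can only have come from the imported data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.prefs.BackingStoreException;
import java.util.prefs.InvalidPreferencesFormatException;
import java.util.prefs.Preferences;

public class ExportImportDemo {
    // Hypothetical node path, chosen only for this illustration
    static final String DEMO_PATH = "/com/mycompany/myapp/export-demo";

    /** Exports the demo subtree, deletes the entry, and restores it by importing. */
    public static String exportAndRestore(String languageTag) {
        Preferences node = Preferences.userRoot().node(DEMO_PATH);
        try {
            node.put("locale", languageTag);

            // Export the subtree rooted at the demo node as XML
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            node.exportSubtree(out);

            // Remove the entry, so that only the import can bring it back
            node.remove("locale");

            try (InputStream in = new ByteArrayInputStream(out.toByteArray())) {
                Preferences.importPreferences(in);
            }
            return node.get("locale", "");
        } catch (IOException | BackingStoreException
                | InvalidPreferencesFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            node.remove("locale"); // tidy up the demo entry
        }
    }

    public static void main(String[] args) {
        System.out.println(exportAndRestore("de-DE"));
    }
}
```

With a real file, you would pass Files.newOutputStream(path) to exportSubtree and Files.newInputStream(path) to importPreferences, ideally in try-with-resources statements so that the streams are closed.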

Exercises

1. Write a program that demonstrates the date and time formatting styles in France, China, and Thailand (with Thai digits).

2. Which of the locales in your JVM don’t use Western digits for formatting numbers?

3. Which of the locales in your JVM use the same date convention (month/day/year) as the United States?

4. Write a program that prints the names of all languages of locales in your JVM in all available languages. Collate them and suppress duplicates.

5. Repeat the preceding exercise for currency names.

6. Write a program that lists all currencies that have different symbols in at least two locales.

7. Write a program that lists the display and standalone month names in all locales in which they differ, excepting those where the standalone names consist of digits.

8. Write a program that lists all Unicode characters that are expanded to two or more ASCII characters in normalization form KC or KD.

9. Take one of your programs and internationalize all messages, using resource bundles in at least two languages.

10. Provide a mechanism for showing available character encodings with a human-readable description, like in your web browser. The language names should be localized. (Use the translations for locale languages.)

11. Provide a class for locale-dependent display of paper sizes, using the preferred dimensional unit and default paper size in the given locale. (Everyone on the planet, with the exception of the United States and Canada, uses ISO 216 paper sizes. Only three countries in the world have not yet officially adopted the metric system: Liberia, Myanmar (Burma), and the United States.)