Java Unicode System: Supporting Global Characters and Languages

Explore Java’s Unicode system, which allows developers to work with an extensive range of characters and symbols across multiple languages. Learn how Unicode enhances Java's platform independence, enabling internationalization and localization in applications.



Java - Unicode System

Unicode is an international character-encoding standard that assigns a unique number, called a code point, to each character, symbol, and script element across the world's languages.

Unicode System in Java

The Java programming language, known for its platform independence, provides built-in support for Unicode characters. This allows developers to create applications that seamlessly work with various languages and scripts.

Before Unicode was introduced, multiple standards existed for character encoding:

  • ASCII – used in the United States
  • ISO 8859-1 – for Western European languages
  • KOI-8 – for Russian
  • GB2312 and Big5 – for Chinese

Under these legacy schemes, some characters used a single byte while others used two, so the same byte value could represent different characters depending on the language and encoding in use. Unicode was developed to solve this by giving every character a single, unique code point. Java adopted Unicode from its first release: the char type is an unsigned 16-bit value, ranging from \u0000 (the lowest value) to \uFFFF (the highest). When Java was designed, this 16-bit range covered all of Unicode. The standard has since grown to code points up to U+10FFFF, and Java represents a character above U+FFFF (such as many emoji) as a pair of char values called a surrogate pair.
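The following sketch makes the 16-bit limit concrete. It prints the range of char, then builds a String for U+1F600 (a grinning-face emoji, chosen only as an example of a character above U+FFFF) and shows that it occupies two char values but counts as one code point. The class CharRangeDemo and its charsNeeded helper are illustrative names; the Character and String methods it calls are standard JDK methods.

```java
public class CharRangeDemo {
    // Number of UTF-16 code units (char values) needed for one code point: 1 or 2
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        // The char type covers 0x0000..0xFFFF
        System.out.println((int) Character.MIN_VALUE);   // 0
        System.out.println((int) Character.MAX_VALUE);   // 65535
        // Unicode code points go up to 0x10FFFF
        System.out.println(Character.MAX_CODE_POINT);    // 1114111

        // U+1F600 lies outside the 16-bit range, so it is stored as a surrogate pair
        String face = new String(Character.toChars(0x1F600));
        System.out.println(face.length());                              // 2 char values
        System.out.println(face.codePointCount(0, face.length()));      // 1 code point
        System.out.println(Character.isHighSurrogate(face.charAt(0)));  // true
        System.out.println(charsNeeded('A') + " " + charsNeeded(0x1F600)); // 1 2
    }
}
```

Code that must handle arbitrary text, including emoji, is generally safer working with int code points (for example via String.codePointAt or String.codePoints) than with individual char values.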

Approaches: Working with Unicode Characters & Values

There are two primary approaches for working with Unicode characters in Java:

  1. Using Unicode Escape Sequences
  2. Directly Storing Unicode Characters

The first approach represents Unicode characters with escape sequences, which is useful when characters cannot be typed or displayed directly in Java code. The second approach stores Unicode characters in variables directly, which is more convenient when the characters can be typed and displayed. Both approaches produce exactly the same char value; they differ only in how the character is written in the source file, so the choice depends on the program's specific requirements.

1. Using Unicode Escape Sequences

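One way to store Unicode characters in Java is through Unicode escape sequences. An escape sequence is a series of characters that stands for a single character that may be hard to type. In Java, a Unicode escape sequence starts with '\u' followed by four hexadecimal digits giving the character's value; for characters up to U+FFFF, this is simply the character's Unicode code point. The compiler translates these escapes before it parses the rest of the source. One consequence is that \u000A inside a char literal becomes a real line break and fails to compile, so write '\n' instead.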
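Because javac translates Unicode escapes before it breaks the source into tokens, an escape is equivalent to typing the character at that spot, even outside string and character literals. The small demonstration below (class and method names are hypothetical) uses an escape inside a string literal and, purely to illustrate the translation step, inside a variable name. Writing identifiers this way compiles, but it is not recommended style.

```java
public class EscapeTranslationDemo {
    static int escapedIdentifier() {
        // The compiler turns \u0078 into the letter x before parsing,
        // so this line declares an ordinary variable named x
        int \u0078 = 42;
        return x;
    }

    public static void main(String[] args) {
        // An escape inside a string literal is the same as typing the character
        System.out.println("\u0041BC".equals("ABC")); // true
        System.out.println(escapedIdentifier());       // 42
    }
}
```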

Example: Use of Unicode Escape Sequences

Syntax

package com.tutorialsarena;

public class UnicodeCharacterDemo {
    public static void main(String[] args) {
        // Unicode escape sequence
        char unicodeChar = '\u0041'; // Unicode for 'A'
        System.out.println("Stored Unicode Character: " + unicodeChar);
    }
}
    
Output

Stored Unicode Character: A
    

In the code snippet above, the Unicode escape sequence '\u0041' represents the character 'A'. The escape sequence is assigned to the variable unicodeChar, and the stored character is printed to the console.
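Casting between char and int is a convenient way to see which code point a char holds, or to turn hexadecimal digits read at runtime into a character. The sketch below assumes the four-digit U+XXXX notation used by the Unicode standard; label and fromHex are illustrative helper names, not JDK methods.

```java
public class CodePointDemo {
    // Formats a char as a U+XXXX label (illustrative helper)
    static String label(char c) {
        return String.format("U+%04X", (int) c);
    }

    // Converts four hex digits, such as "03A3", into the matching char (illustrative helper)
    static char fromHex(String hexDigits) {
        return (char) Integer.parseInt(hexDigits, 16);
    }

    public static void main(String[] args) {
        char unicodeChar = '\u0041';
        System.out.println((int) unicodeChar);  // 65
        System.out.println(label(unicodeChar)); // U+0041
        // Prints the Greek capital sigma (U+03A3) if the console supports it
        System.out.println(fromHex("03A3"));
    }
}
```

As written, fromHex only makes sense for values up to FFFF: a larger value such as 1F600 would be silently truncated by the cast, so a more general version would return an int code point instead.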

2. Directly Storing Unicode Characters

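Alternatively, you can store a Unicode character directly in a char variable by enclosing it in single quotes. This method is less suitable for characters that are hard to type on a keyboard or are invisible, such as control characters. It also requires the source file to be saved and compiled with an encoding that contains the character: UTF-8 is the default for javac since JDK 18, while older JDKs may need the -encoding UTF-8 option. Finally, a char literal holds only one 16-bit value, so characters above U+FFFF cannot be written as a char literal at all.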
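When a string may contain characters that are invisible or easy to confuse, one practical approach is to render it with escapes while debugging. The sketch below is one way to do that, assuming a simple policy of escaping every char outside printable ASCII (0x20 to 0x7E); the class VisibleEscapeDemo and its toVisible helper are hypothetical names.

```java
public class VisibleEscapeDemo {
    // Rewrites control and non-ASCII chars as Java-style escapes so they become visible
    static String toVisible(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= 0x20 && c <= 0x7E) {
                out.append(c); // printable ASCII stays as is
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A tab (a control character) and a no-break space are hard to type or see
        String text = "Tab:\tNBSP:\u00A0";
        System.out.println(Character.isISOControl('\t')); // true
        System.out.println(toVisible(text));
    }
}
```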

Example 1: Assigning Unicode Character to Variable

Syntax

package com.tutorialsarena;

public class UnicodeCharacterDemo {
    public static void main(String[] args) {
        // Storing Unicode character directly
        char unicodeChar = 'A'; // Directly storing the character 'A'
        System.out.println("Stored Unicode Character: " + unicodeChar);
    }
}
    
Output

Stored Unicode Character: A
    

In this example, the character 'A' is directly enclosed in single quotes and assigned to the variable unicodeChar. The stored character is printed to the console.

Example 2: Assigning Unicode Values to Variables

Syntax

package com.tutorialsarena;

public class UnicodeCharacterDemo {
    public static void main(String[] args) {
        // Storing Unicode characters using escape sequences
        char letterA = '\u0041';
        char letterSigma = '\u03A3';
        char copyrightSymbol = '\u00A9';

        // Storing Unicode characters directly
        char letterZ = 'Z';
        char letterOmega = 'Ω';
        char registeredSymbol = '®';

        // Printing the stored Unicode characters
        System.out.println("Stored Unicode Characters using Escape Sequences:");
        System.out.println("Letter A: " + letterA);
        System.out.println("Greek Capital Letter Sigma: " + letterSigma);
        System.out.println("Copyright Symbol: " + copyrightSymbol);
        System.out.println("\nStored Unicode Characters Directly:");
        System.out.println("Letter Z: " + letterZ);
        System.out.println("Greek Capital Letter Omega: " + letterOmega);
        System.out.println("Registered Symbol: " + registeredSymbol);
    }
}
    
Output

Stored Unicode Characters using Escape Sequences:
Letter A: A
Greek Capital Letter Sigma: Σ
Copyright Symbol: ©

Stored Unicode Characters Directly:
Letter Z: Z
Greek Capital Letter Omega: Ω
Registered Symbol: ®
    

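In this example, Unicode characters are stored both with escape sequences and by direct assignment, and all of them are printed to the console. Whichever form is used, the resulting char holds the same 16-bit value; for instance, '\u03A9' and 'Ω' are identical.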
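To confirm which characters a string really contains, regardless of how they were written in the source, you can list its code points. This sketch assumes Java 8 or later for String.codePoints; the class SameCharDemo and its codePoints helper are illustrative names. Because it walks code points rather than char values, a surrogate pair is reported as one character.

```java
import java.util.stream.Collectors;

public class SameCharDemo {
    // Lists each code point of a string as U+XXXX, separated by spaces (illustrative helper)
    static String codePoints(String s) {
        return s.codePoints() // a surrogate pair yields a single code point
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // An escape and a typed literal compile to the same char value
        System.out.println('\u005A' == 'Z');             // true
        System.out.println(codePoints("Z\u03A9\u00AE")); // U+005A U+03A9 U+00AE
        System.out.println(codePoints("\uD83D\uDE00"));  // U+1F600
    }
}
```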

Example 3: Assigning Unicode Characters and Values to Variables

This example demonstrates simple arithmetic on stored Unicode characters. It computes the difference between the capital letter 'A' and the lowercase letter 'a' (-32), obtains the capital letter 'C' by adding 1 to the code point of 'B', and then obtains the lowercase letter 'c' by subtracting that difference from 'C' (which is the same as adding 32). The resulting characters are printed to the console.

Syntax

package com.tutorialsarena;

public class UnicodeCharacterDemo {
    public static void main(String[] args) {
        // Storing Unicode characters using escape sequences
        char letterA = '\u0041';
        char letterSmallA = '\u0061';
        // Storing Unicode characters directly
        char letterB = 'B';

        // Manipulating the stored Unicode characters
        int difference = letterA - letterSmallA;           // 65 - 97 = -32
        char letterC = (char) (letterB + 1);               // the code point after 'B' is 'C'
        char letterSmallC = (char) (letterC - difference); // 'C' + 32 = 'c'

        // Printing the manipulated Unicode characters
        System.out.println("Manipulated Unicode Characters:");
        System.out.println("Difference between A and a: " + difference);
        System.out.println("Calculated Letter C: " + letterC);
        System.out.println("Calculated Letter c: " + letterSmallC);
    }
}
    
Output

Manipulated Unicode Characters:
Difference between A and a: -32
Calculated Letter C: C
Calculated Letter c: c
    

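When this program is compiled and run, it prints the difference between 'A' and 'a' along with the two letters derived from it. The fixed offset of 32 is a property of the ASCII letter layout and does not hold for every Unicode letter, so general-purpose code should use Character.toUpperCase and Character.toLowerCase instead.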
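The fixed offset used in Example 3 can be contrasted with the JDK's case-mapping method. The sketch below (hypothetical class and helper names) applies both to 'c', to y with diaeresis (U+00FF), and to the digit '5', printing each result as a U+XXXX label so the output does not depend on the console's encoding.

```java
public class CaseConversionDemo {
    // ASCII-only trick: each lowercase ASCII letter sits 32 positions after its uppercase form
    static char upperByOffset(char c) {
        return (char) (c - 32);
    }

    static String label(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        // 'c', y with diaeresis (U+00FF), and the digit '5'
        char[] samples = {'c', '\u00FF', '5'};
        for (char c : samples) {
            System.out.println(label(c)
                    + " offset: " + label(upperByOffset(c))
                    + ", Character.toUpperCase: " + label(Character.toUpperCase(c)));
        }
    }
}
```

Running it should print:

U+0063 offset: U+0043, Character.toUpperCase: U+0043
U+00FF offset: U+00DF, Character.toUpperCase: U+0178
U+0035 offset: U+0015, Character.toUpperCase: U+0035

The offset works for 'c' but maps U+00FF to an unrelated letter and the digit to a control character, while Character.toUpperCase returns the correct uppercase form or leaves the character unchanged.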

Conclusion

In Java, you can store Unicode characters in char variables either with Unicode escape sequences or by typing the characters directly; both produce the same value. Characters above U+FFFF do not fit in a single char and must be stored in a String as a surrogate pair. The approach you choose depends on your specific use case and the characters you want to work with.