Tokenization using the Java SDK
Tokenization can be achieved using a number of Java classes, including the String, StringTokenizer, and StreamTokenizer classes. In this recipe, we will demonstrate the use of the Scanner class. While it is frequently used for console input, it can also be used to tokenize a string.
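For comparison with the Scanner approach, here is a minimal sketch of two of the other options mentioned above: the String class's split method and the StringTokenizer class. The class name AlternativeTokenizers and its method names are hypothetical, chosen only for illustration. StreamTokenizer is left out because it is geared toward parsing character streams and needs more configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class; compares two alternatives to Scanner.
public class AlternativeTokenizers {

    // String.split takes a regular expression; trim() avoids an empty
    // first token when the text starts with whitespace.
    public static List<String> splitTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // StringTokenizer splits on space, tab, newline, carriage return,
    // and form feed by default.
    public static List<String> stringTokenizerTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sampleText =
            "In addition, the rook was moved too far to be effective.";
        System.out.println(splitTokens(sampleText));
        System.out.println(stringTokenizerTokens(sampleText));
    }
}
```

On this sample text, both approaches split only on whitespace, so they yield the same tokens as the Scanner version, with punctuation still attached.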
Getting ready
To prepare, we need to create a new Java project.
How to do it...
Let's go through the following steps:
- Add the following import statements to your project's class:
import java.util.ArrayList;
import java.util.Scanner;
- Add the following statements to the main method to declare the sample string, create an instance of the Scanner class, and create a list to hold the tokens:
String sampleText =
"In addition, the rook was moved too far to be effective.";
Scanner scanner = new Scanner(sampleText);
ArrayList<String> list = new ArrayList<>();
- Insert the following loops to populate the list and display the tokens:
while (scanner.hasNext()) {
    String token = scanner.next();
    list.add(token);
}
for (String token : list) {
    System.out.println(token);
}
- Execute the program. You should get the following output:
In
addition,
the
rook
was
moved
too
far
to
be
effective.
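The steps above are fragments meant for a main method. Assembled into one class, they might look like the following sketch. The class name ScannerTokenization and the helper method tokenize are hypothetical. The only changes from the recipe are moving the loop into a method and closing the Scanner with try-with-resources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class; the recipe itself only uses the main method.
public class ScannerTokenization {

    // Collects every whitespace-delimited token produced by the Scanner.
    public static List<String> tokenize(String text) {
        List<String> list = new ArrayList<>();
        // try-with-resources closes the Scanner when the loop finishes
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                list.add(scanner.next());
            }
        }
        return list;
    }

    public static void main(String[] args) {
        String sampleText =
            "In addition, the rook was moved too far to be effective.";
        for (String token : tokenize(sampleText)) {
            System.out.println(token);
        }
    }
}
```

Running main should print the same 11 tokens shown above, one per line.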
How it works...
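The Scanner class's constructor took a string as its argument, which allowed us to apply the Scanner class's methods to that text. The hasNext method returns true as long as another token remains, and the next method returns a single token at a time, delimited by whitespace. Because only whitespace separates the tokens, punctuation stays attached to the adjacent word, as with "addition," and "effective." in the output. While it was not necessary to store the tokens in a list, doing so lets us use them later for other purposes.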
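The whitespace delimiter used by next is only the Scanner default, and the useDelimiter method replaces it with any regular expression. The sketch below assumes a pattern, [\s,.]+, that also treats commas and periods as delimiters so punctuation no longer sticks to words; the pattern, the class name DelimiterDemo, and its tokenize method are illustrative choices, not part of the recipe. The "\\s+" pattern approximates the default, which is based on Java whitespace characters. The last line shows one way to reuse the stored list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class; illustrates the delimiter that next() relies on.
public class DelimiterDemo {

    // Tokenizes text using the given regular-expression delimiter.
    public static List<String> tokenize(String text, String delimiterPattern) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(delimiterPattern);
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sampleText =
            "In addition, the rook was moved too far to be effective.";
        // Whitespace only, approximating the Scanner default
        List<String> byWhitespace = tokenize(sampleText, "\\s+");
        // Assumed pattern: commas and periods also act as delimiters
        List<String> withoutPunctuation = tokenize(sampleText, "[\\s,.]+");
        System.out.println(byWhitespace);
        System.out.println(withoutPunctuation);
        // The stored list can be reused, for example to count tokens
        System.out.println(withoutPunctuation.size() + " tokens");
    }
}
```

A pattern this simple would also split abbreviations and decimal numbers at their periods, so it is a starting point rather than a general-purpose tokenizer.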