Java Regular Expressions: Master Text Searching with Regex
Learn about Java regular expressions (regex), a powerful tool for searching and manipulating text based on specific patterns. Discover how regex allows you to efficiently find, match, and extract data within text using predefined criteria.
Java Regular Expressions
A regular expression (regex) is a sequence of characters that forms a search pattern, allowing you to search for data within text based on specific criteria.
The Java Regex (Regular Expression) API defines patterns for searching or manipulating strings. It is widely used to enforce constraints on strings, such as in password and email validation.
In Java, regular expressions are handled by the java.util.regex package, which includes:
- Pattern Class: Defines the regex pattern to be used in a search.
- Matcher Class: Used to perform searches based on the pattern.
- PatternSyntaxException Class: Indicates syntax errors in a regex pattern.
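As a small illustration of how the three classes fit together, the sketch below compiles a pattern and reports whether compilation failed with a PatternSyntaxException. The class name SyntaxCheckDemo and the helper isValidRegex are hypothetical names chosen for this example only.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of java.util.regex.
class SyntaxCheckDemo {
    // Returns true if the regex compiles, false if its syntax is invalid.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // e.getDescription() and e.getIndex() describe what went wrong and where.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false (unclosed character class)
    }
}
```

PatternSyntaxException is unchecked, so catching it is optional; doing so is mainly useful when the regex comes from user input.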
Matcher class
The Matcher class implements the MatchResult interface and is used to perform match operations on a character sequence.
No. | Method | Description |
---|---|---|
1 | boolean matches() | Tests whether the entire input sequence matches the pattern. |
2 | boolean find() | Finds the next subsequence of the input that matches the pattern. |
3 | boolean find(int start) | Resets the matcher and finds the next subsequence that matches the pattern, starting at the given index. |
4 | String group() | Returns the input subsequence matched by the previous match. |
5 | int start() | Returns the start index of the previous match. |
6 | int end() | Returns the index just past the last character of the previous match. |
7 | int groupCount() | Returns the number of capturing groups in the pattern. |
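The methods in the table are typically used together in a find() loop. The following is a minimal sketch of that usage; the class MatcherMethodsDemo, the helper describeMatches, and the "text@start-end" output format are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing find(), group(), start(), end(), groupCount().
class MatcherMethodsDemo {
    // Collects "text@start-end" for every match of the regex in the input.
    static List<String> describeMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) { // advance to the next matching subsequence
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [10-20@0-5, 300-4@10-15]
        System.out.println(describeMatches("(\\d+)-(\\d+)", "10-20 and 300-4"));

        Matcher m = Pattern.compile("(\\d+)-(\\d+)").matcher("10-20");
        System.out.println(m.groupCount()); // 2 (capturing groups in the pattern)
        System.out.println(m.matches());    // true (the whole input matches)
    }
}
```

Note that end() is exclusive: the match "10-20" occupies indices 0 through 4, so end() returns 5.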
Pattern class
The Pattern class is the compiled version of a regular expression. It defines a pattern for the regex engine.
No. | Method | Description |
---|---|---|
1 | static Pattern compile(String regex) | Compiles the given regex and returns an instance of the Pattern. |
2 | Matcher matcher(CharSequence input) | Creates a matcher that will match the given input against this pattern. |
3 | static boolean matches(String regex, CharSequence input) | Compiles the regular expression and tests whether the entire input matches it. |
4 | String[] split(CharSequence input) | Splits the given input string around matches of the given pattern. |
5 | String pattern() | Returns the regex pattern. |
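To make split() and pattern() from the table concrete, here is a short sketch. The class PatternMethodsDemo, the helper splitCsv, and the comma-separated sample input are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class for Pattern.split(), pattern() and the static matches().
class PatternMethodsDemo {
    // Splits a line on commas, ignoring whitespace around each comma.
    static String[] splitCsv(String line) {
        Pattern comma = Pattern.compile("\\s*,\\s*");
        return comma.split(line);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\s*,\\s*");
        System.out.println(p.pattern());                                    // \s*,\s*
        System.out.println(Arrays.toString(splitCsv("red , green,blue"))); // [red, green, blue]
        System.out.println(Pattern.matches("[a-z]+", "green"));             // true
    }
}
```

When the same regex is applied many times, compiling it once with Pattern.compile() and reusing the Pattern avoids recompiling on every call, which the static Pattern.matches() does.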
Example
Syntax
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("Movie", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("I am watching a Movie!");
        boolean matchFound = matcher.find();
        if (matchFound) {
            System.out.println("Match found");
        } else {
            System.out.println("Match not found");
        }
    }
}
Output
Match found
Explanation
In the example above:
- A pattern is created using Pattern.compile(), searching for the word "Movie" with the case-insensitive flag.
- matcher() creates a Matcher that searches the input string for the pattern.
- find() returns true if the pattern is found, otherwise false.
Example of Java Regular Expressions
There are three ways to write the regex example in Java.
import java.util.regex.*;

public class RegexExample1 {
    public static void main(String[] args) {
        // 1st way
        Pattern p = Pattern.compile(".s"); // . represents a single character
        Matcher m = p.matcher("as");
        boolean b = m.matches();

        // 2nd way
        boolean b2 = Pattern.compile(".s").matcher("as").matches();

        // 3rd way
        boolean b3 = Pattern.matches(".s", "as");

        System.out.println(b + " " + b2 + " " + b3);
    }
}
Output
true true true
Flags
Flags passed to Pattern.compile() change how the search is performed, such as ignoring case (Pattern.CASE_INSENSITIVE) or treating special characters as literals (Pattern.LITERAL).
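As a rough sketch of how flags behave, the example below combines two flags with the bitwise OR operator. The class FlagsDemo and the helper containsIgnoringCase are hypothetical names used only here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for compile() flags.
class FlagsDemo {
    // Searches text for word, treating word as plain text and ignoring case.
    static boolean containsIgnoringCase(String word, String text) {
        // LITERAL: metacharacters in word lose their special meaning.
        // CASE_INSENSITIVE: upper and lower case letters match each other.
        Pattern p = Pattern.compile(word, Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoringCase("a.b", "X A.B y"));      // true
        System.out.println(containsIgnoringCase("a.b", "axb"));          // false (the dot is literal)
        System.out.println(Pattern.compile("a.b").matcher("axb").find()); // true (the dot matches x)
    }
}
```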
Regular Expression Patterns
The pattern passed to Pattern.compile() specifies what to search for, using brackets for character ranges and metacharacters for special meanings.
Metacharacters
Metacharacters include special characters like |, ., ^, $, \d, \s, \b, and \uxxxx, each with specific search functionalities.
Metacharacter | Description |
---|---|
| | Matches any one of the patterns separated by | as in: cat|dog|fish |
. | Matches any single character (by default, excluding line terminators) |
^ | Matches at the beginning of the string as in: ^Hello |
$ | Matches at the end of the string as in: World$ |
\d | Matches a digit |
\s | Matches a whitespace character |
\b | Matches at the beginning or end of a word like this: \bWORD or WORD\b |
\uxxxx | Matches the Unicode character specified by the hexadecimal number xxxx |
Example of Metacharacters
import java.util.regex.*;

class RegexExample5 {
    public static void main(String[] args) {
        System.out.println("metacharacters d...."); // \\d means digit
        System.out.println(Pattern.matches("\\d", "abc"));    // false (non-digit)
        System.out.println(Pattern.matches("\\d", "1"));      // true (digit and comes once)
        System.out.println(Pattern.matches("\\d", "4443"));   // false (digit but comes more than once)
        System.out.println(Pattern.matches("\\d", "323abc")); // false (digit and char)

        System.out.println("metacharacters D...."); // \\D means non-digit
        System.out.println(Pattern.matches("\\D", "abc"));    // false (non-digit but comes more than once)
        System.out.println(Pattern.matches("\\D", "1"));      // false (digit)
        System.out.println(Pattern.matches("\\D", "4443"));   // false (digits, and more than one character)
        System.out.println(Pattern.matches("\\D", "323abc")); // false (digit and char)
        System.out.println(Pattern.matches("\\D", "m"));      // true (non-digit and comes once)

        System.out.println("metacharacters D with quantifier....");
        System.out.println(Pattern.matches("\\D*", "mak"));   // true (non-digit and may come 0 or more times)
    }
}
Output
metacharacters d....
false
true
false
false
metacharacters D....
false
false
false
false
true
metacharacters D with quantifier....
true
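The example above exercises only \d and \D. The anchors, alternation, and word-boundary metacharacters from the table can be sketched in the same spirit; the class AnchorDemo and its helper found are hypothetical, and find() is used instead of matches() so that a match anywhere in the input counts.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for ^, $, |, and the word-boundary metacharacter.
class AnchorDemo {
    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^Hello", "Hello World"));      // true (input starts with Hello)
        System.out.println(found("World$", "Hello World"));      // true (input ends with World)
        System.out.println(found("^World", "Hello World"));      // false (World is not at the start)
        System.out.println(found("cat|dog|fish", "hot dog"));    // true (dog is one of the alternatives)
        System.out.println(found("\\bcat\\b", "concatenate"));   // false (cat is inside a longer word)
        System.out.println(found("\\bcat\\b", "a cat sat"));     // true (cat stands alone as a word)
        System.out.println(found("\\s", "no_spaces"));           // false (no whitespace present)
    }
}
```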
Regex Character Classes
No. | Character Class | Description |
---|---|---|
1 | [abc] | a, b, or c (simple class) |
2 | [^abc] | Any character except a, b, or c (negation) |
3 | [a-zA-Z] | a through z or A through Z, inclusive (range) |
4 | [a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
5 | [a-z&&[def]] | d, e, or f (intersection) |
6 | [a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
7 | [a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |
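The union, intersection, and subtraction forms in the table can be checked one character at a time. This is only a sketch; the class CharClassDemo and the helper matchesOne are names invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for character class union, intersection and subtraction.
class CharClassDemo {
    // Tests whether a single character belongs to the given character class.
    static boolean matchesOne(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(matchesOne("[^abc]", 'z'));       // true (anything except a, b, c)
        System.out.println(matchesOne("[a-d[m-p]]", 'n'));   // true (n is in m-p)
        System.out.println(matchesOne("[a-d[m-p]]", 'e'));   // false (e is in neither range)
        System.out.println(matchesOne("[a-z&&[def]]", 'e')); // true (e is in both sets)
        System.out.println(matchesOne("[a-z&&[def]]", 'a')); // false (a is not in def)
        System.out.println(matchesOne("[a-z&&[^bc]]", 'b')); // false (b is subtracted)
        System.out.println(matchesOne("[a-z&&[^bc]]", 'd')); // true (d remains)
    }
}
```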
Quantifiers
Quantifiers specify how many instances of a character or group are expected, using symbols like +, *, ?, {x}, {x,y}, and {x,}.
Quantifier | Description |
---|---|
n+ | Matches one or more occurrences of n |
n* | Matches zero or more occurrences of n |
n? | Matches zero or one occurrence of n |
n{x} | Matches exactly x occurrences of n |
n{x,y} | Matches at least x and at most y occurrences of n |
n{x,} | Matches at least x occurrences of n |
Example of Character Classes and Quantifiers
import java.util.regex.*;

class RegexExample4 {
    public static void main(String[] args) {
        System.out.println("? quantifier ....");
        System.out.println(Pattern.matches("[amn]?", "a"));       // true (a or m or n comes one time)
        System.out.println(Pattern.matches("[amn]?", "aaa"));     // false (a comes more than one time)
        System.out.println(Pattern.matches("[amn]?", "aammmnn")); // false (a, m and n come more than one time)
        System.out.println(Pattern.matches("[amn]?", "aazzta"));  // false (a comes more than one time)
        System.out.println(Pattern.matches("[amn]?", "am"));      // false (only one of a, m or n may appear, at most once)

        System.out.println("+ quantifier ....");
        System.out.println(Pattern.matches("[amn]+", "a"));       // true (a or m or n once or more times)
        System.out.println(Pattern.matches("[amn]+", "aaa"));     // true (a comes more than one time)
        System.out.println(Pattern.matches("[amn]+", "aammmnn")); // true (a or m or n comes more than once)
        System.out.println(Pattern.matches("[amn]+", "aazzta"));  // false (z and t do not match the pattern)

        System.out.println("* quantifier ....");
        System.out.println(Pattern.matches("[amn]*", "ammmna"));  // true (a or m or n may come zero or more times)
    }
}
Output
? quantifier ....
true
false
false
false
false
+ quantifier ....
true
true
true
false
* quantifier ....
true
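The example above covers ?, + and *. The counted forms {x}, {x,y} and {x,} work the same way; the sketch below uses a hypothetical class CountedQuantifierDemo with a thin wrapper named fullMatch around Pattern.matches().

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the counted quantifiers {x}, {x,y} and {x,}.
class CountedQuantifierDemo {
    // Tests whether the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d{3}", "123"));     // true (exactly three digits)
        System.out.println(fullMatch("\\d{3}", "12"));      // false (only two digits)
        System.out.println(fullMatch("\\d{2,4}", "123"));   // true (between two and four digits)
        System.out.println(fullMatch("\\d{2,4}", "12345")); // false (five digits is too many)
        System.out.println(fullMatch("a{2,}", "aaaaa"));    // true (at least two a's)
        System.out.println(fullMatch("a{2,}", "a"));        // false (only one a)
    }
}
```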