Regular Expression APIs in Java
Let's learn about the Java regular expressions APIs.
Regular expressions API
The java.util.regex package contains the standard API for working with regular expressions in Java. This API is also popularly known as Java Regex, and it is a powerful tool for pattern matching.
You can use the following import statement to use the regular expressions API in your program.
import java.util.regex.*;
This package contains the following useful classes and interfaces that facilitate the use of regular expressions in Java.
- Pattern class: A compiled representation of a regular expression, used for searching or manipulating strings. It can also express constraints on strings, such as phone number, zip code, password, and email validations. A regular expression string is first compiled into a Pattern object, which is then used to create Matcher objects that check input strings against it.
- Matcher class: This class performs match operations on a character sequence.
- MatchResult interface: This provides query methods to fetch the result of a pattern matching operation against a regular expression.
- PatternSyntaxException class: This is an unchecked exception that is thrown when an invalid or syntactically incorrect regular expression is provided for pattern matching.
Components of a regular expression
A regular expression consists of two parts.
- a pattern string
- a flag specifying how the pattern should be matched (optional)
Flags
Flags provide instructions to regular expression processors about how to look for the pattern.
For example, if CASE_INSENSITIVE is passed when the pattern is compiled, matching is case insensitive.
Below is the list of flags:
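Flags | Description |
---|---|
CASE_INSENSITIVE | Used for case insensitive matching. The case of letters is ignored when performing a search. By default, only US-ASCII letters are affected. |
COMMENTS | Permits whitespace and comments in the pattern. Whitespace is ignored, and a comment starts with # and runs to the end of the line. |
DOTALL | Enables dotall mode, in which . matches any character, including a line terminator. |
MULTILINE | Enables multiline mode, in which ^ and $ match at the start and end of each line instead of only at the start and end of the entire input. |
UNICODE_CASE | Enables Unicode-aware case folding. Use together with the CASE_INSENSITIVE flag to also ignore the case of letters outside of the English alphabet. |
UNIX_LINES | Enables Unix lines mode, in which only \n is recognized as a line terminator by ., ^, and $. |
CANON_EQ | Enables canonical equivalence, so that two characters match if their full canonical decompositions match. |
LITERAL | Literal parsing of the pattern. Special characters in the pattern will not have any special meaning and will be treated as ordinary characters when performing a search. |
UNICODE_CHARACTER_CLASS | Used to enable the Unicode version of predefined character classes and POSIX character classes. |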
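As a small illustration of how flags are supplied, the sketch below passes a single flag to Pattern.compile(), combines two flags with the bitwise OR operator, and uses the embedded flag expression (?i) inside the pattern itself. The class name FlagDemo and its found() helper are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the class and helper names are illustrative only.
class FlagDemo {

    // Returns true if the regex is found anywhere in the text using the given flags.
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "First line\nJAVA regex";

        // No flags: the search is case sensitive, so "java" does not match "JAVA".
        System.out.println(found("java", 0, text)); // false

        // A single flag passed as the second argument of compile().
        System.out.println(found("java", Pattern.CASE_INSENSITIVE, text)); // true

        // Flags are int bit masks, so several can be combined with bitwise OR.
        // MULTILINE lets ^ match at the start of the second line.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(found("^java", flags, text)); // true

        // The embedded flag expression (?i) turns on case-insensitive matching
        // from inside the pattern string.
        System.out.println(found("(?i)java", 0, text)); // true
    }
}
```

Passing 0 as the flags argument behaves like the single-argument compile() call, which makes it easy to compare results with and without a flag.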
Now let’s learn about the Java classes for regular expressions in detail.
The Pattern class
An object of the Pattern class is a compiled representation of a regular expression.
Each object represents a template, that is, a specific sequence of characters that we look for within one or more strings. Because we rarely know every detail of the text in advance, parts of the pattern can act as placeholders. In regular expressions, these placeholders are capturing groups, which can optionally be given unique names. If the pattern matches, each group holds the part of the input string that it matched.
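To illustrate how part of a match can be captured and read back, the sketch below uses named capturing groups, written as (?<name>...). The key=value input format, the group names key and value, and the class name GroupDemo are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the group names "key" and "value" are chosen for this example.
class GroupDemo {

    // (?<name>...) defines a named capturing group.
    private static final Pattern PAIR = Pattern.compile("(?<key>\\w+)=(?<value>\\w+)");

    // Returns the text captured by the "value" group, or null if the input does not match.
    static String valueOf(String input) {
        Matcher matcher = PAIR.matcher(input);
        return matcher.matches() ? matcher.group("value") : null;
    }

    public static void main(String[] args) {
        Matcher matcher = PAIR.matcher("color=blue");
        if (matcher.matches()) {
            System.out.println(matcher.group("key"));   // color
            System.out.println(matcher.group("value")); // blue
            System.out.println(matcher.group(2));       // blue (groups are also numbered from 1)
        }
    }
}
```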
Creating a Pattern object
To create a pattern, use the static compile() method, which has the following form.
static Pattern compile(<reg-exp>);
Here, <reg-exp> is the regular expression string. The method compiles the given regex and returns a Pattern instance.
In the example below, we call compile() with the regular expression as the first argument and use the returned Pattern object to split an IP address at each dot.
import java.util.regex.Pattern;
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String ipAddress = "10.52.255.1";

        // simple string matching
        // the dot is escaped because an unescaped . matches any character
        String separator = "\\.";
        Pattern pattern = Pattern.compile(separator);

        System.out.println(Arrays.toString(pattern.split(ipAddress)));
    }
}
The Matcher class
The Matcher class represents an engine that interprets the pattern and performs the pattern matching operations against the input string. It implements the MatchResult interface.
Creating a Matcher object
To create a Matcher object, call the matcher() method of the Pattern object. This method returns a Matcher object that matches the given input with the pattern.
String text = "text to match";
// creates a matcher that matches the text with the pattern
Matcher matcher = pattern.matcher(text);
We will learn about the replacement methods of the Matcher class later in the course.
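As a rough sketch of how one compiled Pattern can serve several inputs, the example below creates a single Matcher and points it at each new string with reset(CharSequence). The class name MatcherDemo, the helper countWithDigits(), and the sample inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class MatcherDemo {

    // Counts how many inputs contain at least one digit, reusing one Matcher.
    static int countWithDigits(String... inputs) {
        Pattern digits = Pattern.compile("\\d+");
        Matcher matcher = digits.matcher(""); // created once, initially with empty input
        int count = 0;
        for (String input : inputs) {
            matcher.reset(input); // discard previous state and switch to the new input
            if (matcher.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWithDigits("abc", "a1", "22")); // 2
    }
}
```

Calling pattern.matcher(input) for every string works just as well; reusing a matcher simply avoids creating a new object on each iteration.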
The MatchResult interface
The MatchResult interface represents the result of a match operation. It declares the index methods, which report where a match occurred in the input string. The Matcher class implements this interface and adds the study methods, which analyze the input string and return whether or not a match was found.
Index methods
Index methods return the index values that give the location of the match in the input string.
- public int start(): Returns the start index of the previous match.
- public int start(int groupNumber): Returns the start index of the subsequence captured by the given group during the previous match operation.
- public int end(): Returns the offset after the last character matched.
- public int end(int groupNumber): Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
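The index methods can be sketched as follows. The example assumes a hypothetical port=8080 input and a two-group pattern; the class name IndexDemo and the helper firstSpan() are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and input are illustrative only.
class IndexDemo {

    // Returns {start, end} of the first match of regex in text, or null if there is none.
    static int[] firstSpan(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? new int[] {matcher.start(), matcher.end()} : null;
    }

    public static void main(String[] args) {
        String text = "port=8080";
        Matcher matcher = Pattern.compile("(\\w+)=(\\d+)").matcher(text);

        if (matcher.find()) {
            // Whole match: starts at index 0, ends just past the last character.
            System.out.println(matcher.start()); // 0
            System.out.println(matcher.end());   // 9

            // Group 2 is the digits after the equals sign.
            System.out.println(matcher.start(2)); // 5
            System.out.println(matcher.end(2));   // 9

            // end() is exclusive, so it can be passed straight to substring().
            System.out.println(text.substring(matcher.start(2), matcher.end(2))); // 8080
        }
    }
}
```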
Study methods
Study methods, which are defined in the Matcher class, review the input string and return a boolean indicating whether or not the pattern is found.
- boolean lookingAt(): Attempts to match the input sequence, starting at the beginning of the region, against the pattern. Unlike matches(), it does not require the entire region to match.
- boolean find(): Attempts to find the next subsequence of the input sequence that matches the pattern.
- boolean find(int start): Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
- boolean matches(): Attempts to match the entire region against the pattern and returns the result as a boolean.
After a successful match, the matched text can be retrieved with the following method.
- String group(): Returns the input subsequence matched by the previous match. Unlike the study methods, it returns the matched text rather than a boolean.
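The difference between the study methods is easiest to see side by side. The sketch below runs a digits pattern against a hypothetical input, using a fresh Matcher for each call so that one call does not affect the starting position of the next. The class name StudyDemo and the helper countMatches() are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and input are illustrative only.
class StudyDemo {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Counts all non-overlapping matches by calling find() until it returns false.
    static int countMatches(String text) {
        Matcher matcher = DIGITS.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "123abc456";

        // matches(): the whole input must match, and "abc" is not digits.
        System.out.println(DIGITS.matcher(text).matches());   // false

        // lookingAt(): only the beginning has to match, and "123" does.
        System.out.println(DIGITS.matcher(text).lookingAt()); // true

        // find(): a match anywhere in the input is enough.
        System.out.println(DIGITS.matcher(text).find());      // true

        // Repeated find() calls walk through "123" and then "456".
        System.out.println(countMatches(text));               // 2

        // find(int) resets the matcher and starts searching at the given index.
        Matcher matcher = DIGITS.matcher(text);
        System.out.println(matcher.find(3) ? matcher.group() : "none"); // 456
    }
}
```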
Example of pattern matching
The program below demonstrates the simplest form of pattern matching.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final int flag = Pattern.CASE_INSENSITIVE;
        // final int flag = Pattern.UNIX_LINES; // Unix lines mode
        // final int flag = Pattern.LITERAL;
        // final int flag = Pattern.COMMENTS; // permits whitespace and comments
        // final int flag = Pattern.DOTALL; // dotall - '.' matches all characters
        // final int flag = Pattern.MULTILINE;
        // final int flag = Pattern.CANON_EQ; // canonical equivalence
        // final int flag = Pattern.UNICODE_CHARACTER_CLASS; // Unicode versions of predefined and POSIX character classes
        // final int flag = Pattern.UNICODE_CASE; // Unicode-aware case folding

        // creating a pattern with the case-insensitive flag
        Pattern pattern = Pattern.compile("regex", flag);

        // creating a matcher that will match the pattern against the given string
        Matcher matcher = pattern.matcher("Welcome to this course!!."
                + "\n In this course we will learn about Java regex");

        // returns true or false depending on whether the pattern is found
        boolean matchFound = matcher.find();
        if (matchFound) {
            System.out.println("Match found");
        } else {
            System.out.println("Match not found");
        }
    }
}
Try playing around with different flags to see how they affect the pattern matching result.
Explanation
In the above example, we search the text for the word “regex”.
First, we create the pattern object using the Pattern.compile() method. This method takes two parameters:
- The pattern representing the string that is being searched for
- A flag to indicate that the search should be case insensitive (optional)
The matcher() method creates a Matcher object that will match the given text against the pattern. No search is performed at this point.
The find() method performs the search. It returns true if the pattern was found in the string and false otherwise.
The PatternSyntaxException class
PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern.
Below are its methods:
- getDescription(): Fetches the error description.
- getIndex(): Fetches the error index.
- getPattern(): Fetches the erroneous regular expression pattern.
- getMessage(): Returns a multi-line string containing the description of the syntax error, its index, the erroneous regular-expression pattern, and a visual indication of the error index within the pattern.
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        // string to be searched
        String text = "Regular Expression";

        // invalid regular expression: * has nothing to repeat
        String searchText = "*";

        // compiles the given regex represented by searchText
        // and returns a Pattern object instance.
        try {
            Pattern pattern = Pattern.compile(searchText);
        } catch (PatternSyntaxException e) {
            System.out.println(">> Inside catch block");
            System.out.println();
            System.out.println("Description -> " + e.getDescription());
            System.out.println("at Index -> " + e.getIndex());
            System.out.println("with Pattern -> " + e.getPattern());
            System.out.println();
            System.out.println("Message -> " + e.getMessage());
        }
    }
}
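One common use of this exception is checking a regular expression that comes from user input before using it. The sketch below is one possible way to do that, not a prescribed approach; the class name RegexValidator and the method isValid() are hypothetical. It also shows Pattern.quote(), which turns arbitrary text into a pattern that matches that text literally.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class; the names are illustrative only.
class RegexValidator {

    // Returns true if the given string compiles as a regular expression.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("a*b"));  // true
        System.out.println(isValid("*"));    // false: nothing precedes the quantifier
        System.out.println(isValid("[a-z")); // false: the character class is never closed

        // Pattern.quote() escapes the text so that it is always a valid, literal pattern.
        System.out.println(isValid(Pattern.quote("*"))); // true
    }
}
```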