Overview of the character set

A character set, as the name suggests, is a set of characters, any one of which can match at a single position in the text.

First, recall the example from the wildcard character chapter, where you created the RegEx /B..t/ to match the texts “Boat”, “Bent”, “Boot”, and “Beat”. That RegEx was not precise, because it also matches texts like “Best” and “Belt”. Now take another pair of texts: “boat” and “Boat”. To match both, you could use the case-insensitive flag, but that would also match variants such as “BOAT”. A character set lets you match exactly these two texts: /[bB]oat/.
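The difference between the wildcard, the case-insensitive flag, and a character set can be checked with a short JavaScript sketch. The word lists and the patterns /[bB]oat/ and /boat/i are illustrative assumptions chosen for this comparison.

```javascript
// Wildcard pattern from the previous chapter: "B", any two characters, "t".
const wildcardWords = ["Boat", "Bent", "Boot", "Beat", "Best", "Belt"];
const wildcardHits = wildcardWords.filter((w) => /B..t/.test(w));
console.log(wildcardHits); // all six words match, including "Best" and "Belt"

// Illustrative sample words (assumed for this sketch).
const boatWords = ["boat", "Boat", "BOAT", "coat"];

// Character set: the first character must be "b" or "B", the rest is literal.
const setHits = boatWords.filter((w) => /[bB]oat/.test(w));
console.log(setHits); // [ 'boat', 'Boat' ]

// Case-insensitive flag: every letter may be upper or lower case.
const flagHits = boatWords.filter((w) => /boat/i.test(w));
console.log(flagHits); // [ 'boat', 'Boat', 'BOAT' ]
```

The flag relaxes the case of every letter in the pattern, while the set relaxes only the position where it appears.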

How the character set works

Character set property: from the given set of characters, the RegEx engine matches exactly one character of the text at a time. The characters listed inside square brackets are the elements of the character set. For example, [abc] is a character set containing “a”, “b”, and “c”. When the RegEx engine searches the text, it compares the character at the current position against every character in the set, and the set matches if any one of them is equal to it.

Let’s try to understand this with an example. Suppose your text is “This is a book” and the RegEx is /[abc]/g. First, the character “T” is taken from the text and compared against “a”, “b”, and “c”. Since it matches none of the characters in the set, the engine moves on to the next character, and so on through the text. When the engine reaches the character “a”, it is a match. Similarly, the character “b” from “book” also matches. The RegEx engine returns the result “a,b”, because no other character in the text matches the pattern.
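The walk-through above can be reproduced in JavaScript. This is a minimal sketch: the manual loop only imitates, in simplified form, what the engine does for a single-character set, and the variable names are illustrative.

```javascript
const sentence = "This is a book";

// The g flag asks for every match, not just the first one.
const engineMatches = sentence.match(/[abc]/g);
console.log(engineMatches.join(",")); // a,b

// Simplified imitation of the scan: test each character of the text
// against the members of the set, one position at a time.
const members = new Set(["a", "b", "c"]);
const manualMatches = [];
for (const ch of sentence) {
  if (members.has(ch)) {
    manualMatches.push(ch);
  }
}
console.log(manualMatches.join(",")); // a,b
```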

Let’s consider another example. Suppose you want to match “head”, “heat”, and “hear”. One possible RegEx is /hea./. However, this RegEx matches a wide range of texts, such as “heal”, “heaT”, “heaR”, and many more, while you only want to match the three texts mentioned above. The solution is to use a character set: put the required characters inside the set, [drt]. The RegEx now matches “hea” literally, followed by any one character from the set. So the RegEx becomes /hea[drt]/g, and it matches only the required texts, not the whole range. Always aim to write precise RegExes that find only the required texts. Broad wildcard patterns can also make the engine do extra work, for example through backtracking when a wildcard is combined with a quantifier.
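As a quick check of the example above, the following JavaScript sketch compares the wildcard version with the character set version. The candidate list and the sample sentence are assumptions made for illustration.

```javascript
// Illustrative candidates: three wanted texts and three unwanted ones.
const candidates = ["head", "heat", "hear", "heal", "heaT", "heaR"];

// The g flag is omitted here because each word is tested on its own.
const broadHits = candidates.filter((w) => /hea./.test(w));
console.log(broadHits.length); // 6, every candidate matches

const preciseHits = candidates.filter((w) => /hea[drt]/.test(w));
console.log(preciseHits); // [ 'head', 'heat', 'hear' ]

// With the g flag, all occurrences in a longer text are returned.
const found = "I hear the heat hit my head".match(/hea[drt]/g);
console.log(found); // [ 'hear', 'heat', 'head' ]
```

Note that the set is case-sensitive: “heaT” and “heaR” are rejected because “T” and “R” are not members of [drt].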
