...
Parsing Information With Regular Expressions
Learn about the functionalities of regular expressions using the re module.
Let’s now focus on the Python side of things. Regular expression syntax is the furthest thing from object-oriented programming. However, Python’s re module provides an object-oriented interface to the regular expression engine.
We’ve been checking whether the re.match() function returns a valid object or not. If a pattern does not match, that function returns None. If it does match, however, it returns a useful object that we can inspect for information about the match.
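As a minimal sketch of those two outcomes, the snippet below uses a hypothetical pattern and sample strings (none of them come from this lesson) and inspects the returned object with the standard group(), start(), end(), and span() methods:

```python
import re

# Hypothetical pattern and inputs, chosen only for illustration.
pattern = r"hello \w+"

# The pattern does not match at the start of this string, so we get None.
no_match = re.match(pattern, "goodbye world")
print(no_match)  # None

# Here the pattern matches, so we get a match object we can inspect.
match = re.match(pattern, "hello world")
if match is not None:
    print(match.group(0))  # the whole matched text: hello world
    print(match.start())   # index where the match begins: 0
    print(match.end())     # index just past the match: 11
    print(match.span())    # both indices as a tuple: (0, 11)
```

Because None is falsy and a match object is truthy, the result can be tested directly in an if statement.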
So far, our regular expressions have answered questions such as “Does this string match this pattern?” Matching patterns is useful, but in many cases, a more interesting question is “If this string matches this pattern, what is the value of a relevant substring?” If we use groups to identify parts of the pattern that we want to reference later, we can get them out of the match return value, as illustrated in the next example:
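import re
from typing import Optional


def email_domain(text: str) -> Optional[str]:
    # The parentheses capture the domain part (everything after the @ sign).
    email_pattern = r"[a-z0-9._%+-]+@([a-z0-9.-]+\.[a-z]{2,})"
    if match := re.match(email_pattern, text, re.IGNORECASE):
        return match.group(1)
    else:
        return None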
The full specification describing all valid email addresses is extremely complicated, and the regular expression that accurately matches all possibilities is obscenely long. So, we cheated and made a smaller regular expression that matches many common email addresses; the point is that we want to access the domain name (after the @ sign) so we can connect to that address. This is done easily by wrapping that part of the pattern in parentheses and calling the group() method on the object returned by match().
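To make the group numbering concrete, here is a short sketch that reuses the simplified pattern from the example above. The sample address is hypothetical, and the pattern is still not a full email validator:

```python
import re

# Same simplified pattern as in the example above; it is not a full
# email validator.
EMAIL_PATTERN = r"[a-z0-9._%+-]+@([a-z0-9.-]+\.[a-z]{2,})"

# Hypothetical sample address, used only for illustration.
match = re.match(EMAIL_PATTERN, "Some.User@Example.COM", re.IGNORECASE)

if match:
    print(match.group(0))  # entire matched text: Some.User@Example.COM
    print(match.group(1))  # first parenthesized group: Example.COM
    print(match.groups())  # all captured groups: ('Example.COM',)

# A string without an @ sign does not match, so re.match() returns None.
print(re.match(EMAIL_PATTERN, "not an email", re.IGNORECASE))  # None
```

Group 0 is always the whole match, and numbered groups start at 1 in the order their opening parentheses appear in the pattern.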
We’ve used an additional ...