A regular expression, or RegEx, is a sequence of characters that forms a search pattern. RegEx is used to search for and replace specific patterns in text.
Python provides a built-in module, re, which supports regular expressions. The module can be imported as follows:
import re
Metacharacters are characters that are interpreted in a special way rather than matched literally. The following table lists all the metacharacters used in RegEx, along with their functionality:
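A few common metacharacters are easiest to understand by trying them out. The following is a minimal sketch, not an exhaustive reference; the sample string and variable names are assumptions chosen only for illustration.

```python
import re

text = "cat bat rat"  # hypothetical sample string

# . matches any single character except a newline
dot_matches = re.findall(r".at", text)

# ^ anchors the pattern to the start of the string
starts_with_cat = re.findall(r"^cat", text)

# $ anchors the pattern to the end of the string
ends_with_at = re.findall(r"at$", text)

# [] matches any one character from the enclosed set
set_matches = re.findall(r"[cb]at", text)

# | matches either the pattern on its left or the one on its right
either = re.findall(r"cat|rat", text)

print(dot_matches)      # ['cat', 'bat', 'rat']
print(starts_with_cat)  # ['cat']
print(ends_with_at)     # ['at']
print(set_matches)      # ['cat', 'bat']
print(either)           # ['cat', 'rat']
```

The patterns are written as raw strings (the r prefix) so that any backslashes reach the RegEx engine unchanged.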
A special sequence is a backslash (\) followed by one of the following characters, each of which has its own functionality:
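As a rough illustration, three widely used special sequences are \d (a digit), \w (a word character), and \s (a whitespace character). The sketch below assumes a made-up sample string and shows only these three, not every sequence.

```python
import re

text = "Room 42, floor 7"  # hypothetical sample string

# \d matches a single digit
digits = re.findall(r"\d", text)

# \w matches a word character; the + metacharacter means "one or more"
words = re.findall(r"\w+", text)

# \s matches a single whitespace character
spaces = re.findall(r"\s", text)

print(digits)  # ['4', '2', '7']
print(words)   # ['Room', '42', 'floor', '7']
print(spaces)  # [' ', ' ', ' ']
```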
The re module provides a variety of functions for searching for a pattern in a string. The most frequently used functions are described in detail below:
The re.findall() function returns a list of strings containing all matches of the specified pattern.
The function takes as input the pattern to search for and the string to search in.
The following example returns a list of all the instances of the substring "at" in the given string:
import re
string = "at what time?"
# Collect every non-overlapping occurrence of 'at'
matches = re.findall('at', string)
print(matches)  # ['at', 'at']
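re.findall() is not limited to fixed substrings, and it returns an empty list when nothing matches. A short sketch of both cases, reusing the same sample string (the variable names are illustrative):

```python
import re

string = "at what time?"

# A pattern instead of a fixed substring: every run of word characters
all_words = re.findall(r"\w+", string)

# No occurrence of 'z', so the result is an empty list
no_matches = re.findall("z", string)

print(all_words)   # ['at', 'what', 'time']
print(no_matches)  # []
```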
The re.search() function returns a match object if a match is found.
Note:
- In case of more than one match, the first occurrence of the match is returned.
- If no occurrence is found, None is returned.
Suppose you wish to look for the occurrence of a particular sub-string in a string:
import re
string = "at what time?"
# re.search() returns a match object, or None if nothing matches
match = re.search('at', string)
if match:
    print("String found at:", match.start())
else:
    print("String not found!")
Note: The start() method of the match object returns the start index of the matched string.
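Besides start(), a match object offers a few other methods for inspecting the match. The sketch below uses group(), end(), and span() on the same sample string, and shows the None result for a pattern that does not occur; the search terms are arbitrary choices for illustration.

```python
import re

string = "at what time?"

match = re.search("time", string)
matched_text = match.group()  # the text that was matched
start_index = match.start()   # index where the match begins
end_index = match.end()       # index just past the end of the match
start_end = match.span()      # (start, end) as a tuple

# A pattern that does not occur gives None instead of a match object
missing = re.search("day", string)

print(matched_text, start_index, end_index, start_end)  # time 8 12 (8, 12)
print(missing)  # None
```

Because None is returned when nothing matches, it is safest to check the result before calling any of these methods.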
The re.split() function splits the string at every occurrence of the pattern and returns a list of the resulting pieces.
Consider the following example to get a better idea of what this function does:
Suppose we wish to split a string wherever the letter "a" occurs:
import re
string = "at what time?"
# The string starts with 'a', so the first piece is an empty string
pieces = re.split('a', string)
print(pieces)  # ['', 't wh', 't time?']
Note: If there is no match, the original string is returned unchanged as the only element of a list.
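The no-match case can be checked directly, alongside a split on a special sequence rather than a fixed character. This is a small sketch on the same sample string; the variable names are illustrative.

```python
import re

string = "at what time?"

# 'z' never occurs, so the whole string comes back inside a list
unsplit = re.split("z", string)

# Splitting on \s breaks the string at every whitespace character
words = re.split(r"\s", string)

print(unsplit)  # ['at what time?']
print(words)    # ['at', 'what', 'time?']
```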
The re.sub() function is used to replace occurrences of a particular sub-string with another sub-string.
This function takes as input the pattern to search for, the replacement string, and the string to operate on.
Suppose you wish to replace every whitespace character in a string with "!!!". This can be done via the re.sub() function as follows:
import re
string = "at what time?"
# Use a raw string so that \s reaches the RegEx engine unchanged
result = re.sub(r"\s", "!!!", string)
print(result)  # at!!!what!!!time?
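re.sub() also accepts an optional count argument that limits how many occurrences are replaced, and it returns the string unchanged when the pattern does not occur. A brief sketch of both behaviours, with illustrative variable names:

```python
import re

string = "at what time?"

# count=1 replaces only the first whitespace character
first_only = re.sub(r"\s", "!!!", string, count=1)

# No digits in the string, so nothing is replaced
unchanged = re.sub(r"\d", "#", string)

print(first_only)  # at!!!what time?
print(unchanged)   # at what time?
```

Passing count as a keyword argument keeps the call readable and avoids confusing it with the other optional parameters.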