Python provides the re.sub() function to replace text that matches a regular expression pattern with a new value in a string.
Key takeaways:
The re.sub() function in Python is commonly used for substituting patterns in strings based on RegEx.
Group capturing in RegEx allows for the selective replacement of specific parts of a matched pattern while keeping the other parts of the string intact.
The re.sub() function helps remove unnecessary text, convert text case, clean input, correct spelling errors, and more.
re.sub() function
Stephen Cole Kleene introduced regular expressions (RegEx), which are powerful tools used for searching, matching, and manipulating text patterns. They enable us to define complex search patterns using a combination of characters and special symbols.
Python provides the re module, which supports regular expressions and is useful for text processing tasks. The re module offers various functions to search for a pattern in a given string.
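The re.sub() function is one of the re module functions. It is a substitution function that, by default, replaces all occurrences of the specified pattern with a new string. This function has diverse applications, such as removing unnecessary text, converting text case, cleaning input, and correcting spelling errors.
The re.sub() function performs the substitution and returns a new string with the replaced values; the original string is left unchanged. Several different characters can be replaced in a single call by listing them in a character set, such as [aI].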
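The applications mentioned above (cleaning input, removing unnecessary text, and correcting spelling) can be sketched in a few lines. This is a minimal illustration rather than a complete cleaning routine; the sample strings are hypothetical and chosen only for demonstration.

```python
import re

# Hypothetical sample inputs, chosen only for illustration.
messy = "  Hello,    world!  \t How   are you? "
phone = "+1 (555) 010-9999"
typo = "Please recieve the package and recieve the invoice."

# Clean input: collapse each run of whitespace into one space, then trim.
cleaned = re.sub(r"\s+", " ", messy).strip()

# Remove unnecessary text: drop every character that is not a digit.
digits_only = re.sub(r"\D", "", phone)

# Correct spelling: fix a misspelling wherever it appears as a whole word.
corrected = re.sub(r"\brecieve\b", "receive", typo)

print(cleaned)      # Hello, world! How are you?
print(digits_only)  # 15550109999
print(corrected)    # Please receive the package and receive the invoice.
```

Each call returns a new string, so the original inputs stay available for comparison.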
re.sub(pattern, repl, string, count=0, flags=0)
pattern: This denotes the regular expression to search for and replace. It can be a plain string, or it can include regex metacharacters (^, *, +) and special sequences (\w, \s, \d).
repl: This denotes the replacement for each match of pattern. It is usually a string, which may contain backreferences such as \1, but it can also be a function that returns the replacement string.
string: This denotes the string on which the re.sub() operation will be executed.
count (Optional): This denotes the maximum number of replacements that should occur. If we want to replace all matches, we can skip this parameter or set it to 0.
flags (Optional): This serves to modify the behavior of the regular expression operation. For example, the value could be re.IGNORECASE or re.NOFLAG. To use multiple flags, we need to combine them using the bitwise OR operator (|). For example, flags=re.IGNORECASE | re.MULTILINE applies both flags when making substitutions.
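To make the combined-flags case concrete, here is a small sketch that compares a single flag with two flags joined by |. The log-style input text is a hypothetical example, not taken from any real system.

```python
import re

# Hypothetical multi-line input for illustration.
text = "error: disk full\nERROR: network down\ninfo: all good"

# With only re.IGNORECASE, ^ anchors at the start of the whole string,
# so only the first line can match.
one_flag = re.sub(r"^error", "WARNING", text, flags=re.IGNORECASE)

# Adding re.MULTILINE lets ^ match at the start of every line as well.
two_flags = re.sub(r"^error", "WARNING", text,
                   flags=re.IGNORECASE | re.MULTILINE)

print(one_flag)
print(two_flags)
```

With one flag, only the first line changes; with both flags, the first two lines start with WARNING and the third line is untouched.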
re.sub() in action
Let's look at the code snippet below to understand it better.
# Importing the re module
import re

# Given string
s = "I am a human being."

# Performing the sub() operation
res_1 = re.sub('a', 'x', s)
res_2 = re.sub('[aI]', 'x', s)

# Print results
print(res_1)
print(res_2)

# The original string remains unchanged
print(s)
Line 2: We import the re module.
Line 5: We define a sample string.
Line 8: We replace all instances of a with x in the string s.
Line 9: We replace all instances of a and I with x in the string s.
Lines 12–13: We print the results.
Line 16: We print the original string to show that it remains unchanged.
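One detail about character sets is worth a quick check: inside square brackets, every character is a member of the set, so a comma is matched literally rather than acting as a separator. The sketch below uses a hypothetical sample string that contains a comma to show the difference between [a,I] and [aI].

```python
import re

# Hypothetical input that contains a comma.
sample = "a, I"

# The set [a,I] matches "a", "," and "I", so the comma is replaced too.
with_comma = re.sub("[a,I]", "x", sample)

# The set [aI] matches only "a" and "I".
without_comma = re.sub("[aI]", "x", sample)

print(with_comma)     # xx x
print(without_comma)  # x, x
```

For a string without commas, such as "I am a human being.", both patterns give the same result, which is why the difference is easy to overlook.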
count and flags parameters
Let's see another example to understand the usage of the count and flags parameters in the re.sub() function.
# Importing the re module
import re

# Given string
s = "I am a human being."

# Performing the sub() operation with count parameter
res_1 = re.sub('a', 'x', s, count=2)

# Performing the sub() operation with flags parameter
res_2 = re.sub('i', 'x', s, flags=re.IGNORECASE)

# Print results
print(res_1)
print(res_2)
We set count=2 on line 8, which replaces only the first two instances of a with x. On line 11, we pass re.IGNORECASE through the flags parameter, which enables case-insensitive matching, so both i and I are replaced with x.
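A related pitfall is worth sketching: count comes before flags in the signature, so a flag passed as the fourth positional argument is interpreted as count. The example below assumes a hypothetical sample string; because newer Python versions warn about passing count or flags positionally, the warning is silenced for the demonstration.

```python
import re
import warnings

# Hypothetical sample string for illustration.
s = "A a A a a"

# Intended call: case-insensitive replacement of every "a".
intended = re.sub("a", "x", s, flags=re.IGNORECASE)

# Pitfall: the fourth positional argument is count, not flags.
# re.IGNORECASE has the integer value 2, so this call behaves like
# count=2 with no flags at all.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    accidental = re.sub("a", "x", s, re.IGNORECASE)

print(intended)    # x x x x x
print(accidental)  # A x A x a
```

Passing count and flags by keyword, as the example above this sketch does, avoids the mix-up.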
pattern parameter
Capturing groups are parts of a RegEx pattern that are each treated as a single unit, so each group can be manipulated independently of the others. For example, a URL consists of the application protocol part (https://) and the domain part (educative.io); each can be considered a separate group. Now suppose we want to change the protocol and replace it with, say, ftp://. Capturing groups give us the flexibility to replace only that group with the string of our choice.
Capturing groups are defined by parentheses in a RegEx pattern and can be referenced later. In re.sub(), these groups are referenced in the replacement string with backreferences like \1 for the first group and \2 for the second, making substitutions more dynamic.
Let us now see how to use capturing groups in the pattern parameter with the help of a code example:
# Importing the re module
import re

s = 'Welcome to https://educative.com'

print(re.sub(r'(https://)(educative.com)', r'\1educative.io', s))
In line 6, (https://) and (educative.com) are two capturing groups. The first group captures the scheme of the URL (https://), and the second group captures the domain part of the URL (educative.com). Next, r'\1educative.io' is the replacement string, where \1 refers to the first group (https://) and educative.io is the new domain we want to use. The result is Welcome to https://educative.io. Note that the unescaped . in the pattern matches any character; writing it as \. would match only a literal dot.
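Backreferences can also reorder groups, and the re module supports two related features not shown above: named groups, written (?P<name>...) and referenced as \g<name>, and a function passed as repl. The sketch below illustrates all three; the date string and the helper name shout_domain are hypothetical and used only for illustration.

```python
import re

# Hypothetical inputs for illustration.
date = "2024-03-15"
url = "Welcome to https://educative.com"

# Numbered groups: reorder year-month-day into day/month/year.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)

# Named groups: keep the domain and replace only the scheme.
swapped = re.sub(
    r"(?P<scheme>https://)(?P<domain>educative\.com)",
    r"ftp://\g<domain>",
    url,
)

# A function as repl: it receives the match object and returns
# the replacement text for that match.
def shout_domain(match):
    return match.group(1) + match.group(2).upper()

shouted = re.sub(r"(https://)(educative\.com)", shout_domain, url)

print(reordered)  # 15/03/2024
print(swapped)    # Welcome to ftp://educative.com
print(shouted)    # Welcome to https://EDUCATIVE.COM
```

A function as repl is one way to handle case conversion, since a plain replacement string cannot change the case of the text it reuses.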
Quiz!
What is the purpose of the count parameter in the re.sub() operation?
It limits the number of matches in the string.
It determines the length of the replacement string.
It limits the number of substitutions made.
None of the above
In summary, Python's re.sub() function is a great tool for replacing text using regular expression patterns. By leveraging its pattern matching and replacement capabilities, we can effectively manage and transform text to fit our needs.