What is the re.sub() function in Python?

Key takeaways:

  • The re.sub() function in Python is commonly used for substituting patterns in strings based on RegEx.

  • Group capturing in RegEx allows for the selective replacement of specific parts of a matched pattern while keeping the other parts of the string intact.

  • The re.sub() function helps remove unnecessary text, convert text case, clean input, correct spelling errors, and more.

The re.sub() function

Stephen Cole Kleene invented regular expressions (RegEx), which are powerful tools used for searching, matching, and manipulating text patterns. They enable us to define complex search patterns using a combination of characters and special symbols.

Python provides the re module, which supports regular expressions and is useful for text-processing tasks. The module offers various functions for searching for and manipulating patterns in a string.

The re.sub() function is one of the re module functions. It is a substitution function that, by default, replaces all occurrences of the specified pattern with a new string. This function has diverse applications:

  • Remove unnecessary characters
  • Convert the case of characters in a string
  • Standardize formats to prepare data for analysis
  • Correct spelling errors
  • Replace specific words with synonyms
  • Check for valid patterns
  • Clean input to ensure data integrity and prevent errors
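
A few of the uses listed above can be sketched as follows. The sample strings and variable names are made up for illustration, and passing a function as repl is only one possible way to change case.

```python
import re

# Hypothetical sample input with stray punctuation and uneven spacing
raw = "Hello,,,   world!!  How   are you??"

# Remove unnecessary characters: drop everything except letters, digits, and whitespace
no_punct = re.sub(r"[^A-Za-z0-9\s]", "", raw)

# Standardize formats: collapse runs of whitespace into a single space
normalized = re.sub(r"\s+", " ", no_punct).strip()

# Convert case: repl may be a function that receives each match object
title_cased = re.sub(r"\b\w", lambda m: m.group(0).upper(), "clean the input")

print(normalized)   # Hello world How are you
print(title_cased)  # Clean The Input
```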

Syntax

The name sub is short for substitute. The re.sub() function returns a new string in which the matches of the pattern are replaced. Several different characters can be replaced in one call by listing them in a character set, such as [aI].

re.sub(pattern, repl, string, count=0, flags=0)

Parameters

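  • pattern: This denotes the regular expression to search for; its matches are what get replaced. A pattern can contain literal characters, metacharacters (^, *, +), or special sequences (\w, \s, \d).

  • repl: This denotes the replacement for each match. It can be a string, which may contain backreferences such as \1, or a function that takes a match object and returns the replacement string.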

  • string: This denotes the string on which the re.sub() operation will be executed.

  • count (Optional): This denotes the maximum number of replacements that should occur, counted from left to right. If we want to replace all matches, we can skip this parameter or set it to 0.

  • flags (Optional): This modifies the behavior of the regular expression operation. For example, the value could be re.IGNORECASE or re.NOFLAG (available in Python 3.11 and later). To use multiple flags, we combine them using the bitwise OR operator (|). For example, flags=re.IGNORECASE | re.MULTILINE applies both flags when making substitutions.
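
As a minimal sketch of combining flags, the example below uses a made-up multi-line string. The variable names and the replacement text WARN are illustrative assumptions.

```python
import re

# Hypothetical multi-line input
text = "error: disk full\nERROR: network down\nok: all good"

# ^ normally anchors only at the start of the whole string.
# re.MULTILINE lets it match at the start of every line,
# and re.IGNORECASE makes "error" match "ERROR" as well.
both_flags = re.sub(r"^error", "WARN", text, flags=re.IGNORECASE | re.MULTILINE)

# Without re.MULTILINE, only the very first line can match
first_only = re.sub(r"^error", "WARN", text, flags=re.IGNORECASE)

print(both_flags)
print(first_only)
```

With both flags, the first two lines start with WARN. With re.IGNORECASE alone, only the first line changes.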

The re.sub() in action

Let’s look at the code snippet below to understand it better.

# Importing the re module
import re
# Given string
s = "I am a human being."
# Performing the sub() operation
res_1 = re.sub('a', 'x', s)
res_2 = re.sub('[aI]', 'x', s)  # character set: matches a or I (no comma needed)
# Print results
print(res_1)
print(res_2)
# The original string remains unchanged
print(s)

Code explanation

  • Line 2: We import the re module.

  • Line 4: We define a sample string.

  • Line 6: We replace all the instances of a with x in the string s.

  • Line 7: We replace all the instances of a and I with x in the string s.

  • Lines 9–10: We print the results.

  • Line 12: We print the original string to show that it is unchanged.
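
One detail about character sets is worth a quick sketch: every character between the square brackets is a member of the set, so a comma written there is matched literally rather than acting as a separator. The sample string below is a made-up variation that contains a comma.

```python
import re

# Hypothetical sample string that contains a comma
s = "I am, a human"

# The comma inside the brackets is a literal member of the set
with_comma = re.sub('[a,I]', 'x', s)

# Listing the characters without a separator matches only a and I
without_comma = re.sub('[aI]', 'x', s)

print(with_comma)     # x xmx x humxn
print(without_comma)  # x xm, x humxn
```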

Using count and flags parameters

Let's see another example to understand the usage of count and flags parameters in the re.sub() function.

# Importing the re module
import re
# Given string
s = "I am a human being."
# Performing the sub() operation with the count parameter
res_1 = re.sub('a', 'x', s, count=2)
# Performing the sub() operation with the flags parameter
res_2 = re.sub('i', 'x', s, flags=re.IGNORECASE)
# Print results
print(res_1)
print(res_2)

We set count=2 on line 6, which replaces only the first two instances of a with x. On line 8, we use the flags parameter with re.IGNORECASE, which enables case-insensitive matching, so both I and i are replaced.
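
The two parameters can also be used together. The sketch below reuses the same sample string; the variable names are illustrative. Passing count and flags by keyword, as here, keeps them from being mixed up, since the fourth positional argument of re.sub() is count.

```python
import re

s = "I am a human being."

# Case-insensitive match for "i", but stop after the first replacement
first_i = re.sub('i', 'x', s, count=1, flags=re.IGNORECASE)

# count limits substitutions from left to right; later matches are left alone
first_two_a = re.sub('a', 'x', s, count=2)

print(first_i)      # x am a human being.
print(first_two_a)  # I xm x human being.
```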

Using capturing groups in the pattern parameter

A capturing group is a part of a RegEx pattern, enclosed in parentheses, that is treated as a single unit. Capturing groups let us manipulate each part of a match independently of the others. For example, a URL consists of a protocol part (https://) and a domain part (educative.io), and each can be treated as a separate group. Now suppose we want to change the protocol to, say, ftp://. Capturing groups give us the flexibility to replace only that group with the string of our choice while keeping the rest of the match.

Capturing groups are defined by parentheses in a RegEx pattern and can be referenced later. In re.sub(), these groups are referenced in the replacement string with backreferences such as \1 for the first group and \2 for the second, making substitutions more dynamic.

Let's now see how to use capturing groups in the pattern parameter with the help of a code example:

# Importing the re module
import re
s = 'Welcome to https://educative.com'
print(re.sub(r'(https://)(educative\.com)', r'\1educative.io', s))

In line 4, (https://) and (educative\.com) are two capturing groups. The first group captures the scheme of the URL (https://), and the second group captures the domain (educative.com); the dot is escaped as \. so that it matches only a literal period. Next, r'\1educative.io' is the replacement string, where \1 refers to the first group (https://) and educative.io is the new domain we want to use.
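
Two further variations may help. This is a minimal sketch: the first call swaps the protocol for ftp:// while keeping the domain through \2, and the second reorders the parts of a date. The date string, the YYYY-MM-DD layout, and the group names are assumptions made for illustration.

```python
import re

# Replace only the protocol group and keep the domain via \2
url = 'Welcome to https://educative.io'
ftp_url = re.sub(r'(https://)(educative\.io)', r'ftp://\2', url)

# Hypothetical date string; the pattern assumes a YYYY-MM-DD layout
s = "Released on 2024-03-15"

# Three numbered groups (year, month, day), reordered in the replacement
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", s)

# The same idea with named groups, referenced as \g<name>
named = re.sub(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    r"\g<day>/\g<month>/\g<year>",
    s,
)

print(ftp_url)    # Welcome to ftp://educative.io
print(reordered)  # Released on 15/03/2024
print(named)      # Released on 15/03/2024
```

Named groups tend to be easier to read once a pattern has more than two or three groups.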

Quiz!

1. What is the purpose of the count parameter in the re.sub() operation?

  A) It limits the number of matches in the string.
  B) It determines the length of the replacement string.
  C) It limits the number of substitutions made.
  D) None of the above

In summary, Python's re.sub() function is a great tool for replacing text using regular expression patterns. By leveraging its pattern matching and replacement capabilities, we can effectively manage and transform text to fit our needs.


Frequently asked questions



How does Python replace text that matches a regular expression pattern?

Python uses the re.sub() function to replace text that matches a regular expression pattern with a new value in a string.


What is Python RegEx?

In Python, RegEx is a pattern matching tool that helps in finding and modifying text patterns easily with the re module.


What is the purpose of the re module in Python?

The re module in Python provides methods for searching, matching, and replacing text using regular expressions.


What is string parsing?

String parsing means analyzing text to extract information or modify it based on defined criteria.


What are Perl Compatible Regular Expressions (PCRE)?

PCRE is a library that implements regular expression matching with syntax and semantics modeled on Perl's, offering powerful pattern-matching features.


What is a RegEx engine?

A RegEx engine is the software component that executes regular expressions to find or replace patterns in data.


What is the RegEx pattern to remove numbers from a string?

The pattern \d+ matches one or more digits, so re.sub(r'\d+', '', text) removes the numbers from text.
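
As a small sketch with a made-up sentence, removing the digits can leave doubled spaces behind, so a second substitution is used here to tidy them. That second step is an optional addition, not part of the pattern itself.

```python
import re

# Hypothetical input containing several numbers
text = "Order 66 shipped 3 items on day 12"

# Remove every run of digits
no_digits = re.sub(r"\d+", "", text)

# Optional cleanup: collapse the leftover whitespace
tidy = re.sub(r"\s+", " ", no_digits).strip()

print(tidy)  # Order shipped items on day
```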


What is a regular expression generator?

A regular expression generator is a tool that creates regular expressions based on user input or sample patterns.


What are regular expressions in JavaScript?

JavaScript uses the RegExp constructor to create a regular expression object for pattern matching. Check out our Answer on How to write regular expressions in JavaScript.


What is an unterminated regular expression literal?

An unterminated RegEx literal is a RegEx pattern that is missing its closing delimiter, which causes a syntax error.


Which method is used to convert a finite automaton to a regular expression?

Every finite automaton has an equivalent regular expression. The conversion from finite automata to regular expressions is accomplished by state elimination. Please visit our detailed answer on “How to convert finite automata to regular expressions”, where we have covered the state elimination method.

