Remove all the punctuation marks from a sentence using RegEx

Problem Statement

Given a string, remove all punctuation marks from it using a regular expression. The string can contain letters, digits, spaces, and punctuation marks.

Example:

  • Input: hi-^%*(#ans34wer
  • Output: hians34wer

Solution

There are two ways to construct the RegEx:

  1. Retain characters
  2. Replace characters

Retain characters

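We use the following RegEx to match only whitespace and word characters, which are the characters we want to keep. Note that the underscore counts as a word character, so it is retained as well.

[\s\w\d]
  • \s: Whitespace characters (space, tab, newline, and so on)
  • \w: Word characters (letters, digits, and the underscore)
  • \d: Digit characters (already covered by \w, so listing it is redundant but harmless)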

Example

import re
regex = r"[\s\w\d]"
test_str = "hi-^%*(#ans34wer"
matches = re.findall(regex, test_str)  # list of single-character matches
print("String with punctuation - ", test_str)
print("String without punctuation - ", "".join(matches))

Explanation

  • Line 1: We import the re module.
  • Line 2: We define the RegEx.
  • Line 3: We define the input string.
  • Line 4: We use the findall method to collect every character that matches the RegEx.
  • Line 5: We print the text with punctuation.
  • Line 6: We join the matched characters and print the text without punctuation.
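
Because \w already matches digits and the underscore, the class can be shortened to [\s\w]. The sketch below checks this on the sample input. The helper name retain_chars and the second test string are illustrative assumptions, not part of the original solution.

```python
import re

def retain_chars(text, pattern=r"[\s\w]"):
    """Keep only the characters that match `pattern` (hypothetical helper)."""
    return "".join(re.findall(pattern, text))

sample = "hi-^%*(#ans34wer"
print(retain_chars(sample))               # hians34wer
print(retain_chars(sample, r"[\s\w\d]"))  # hians34wer (original class, same result)

# \w also matches the underscore, so it survives the filter.
print(retain_chars("snake_case, please!"))  # snake_case please
```

If underscores should be removed too, this approach needs a narrower class, for example [\sa-zA-Z0-9], which in turn keeps only ASCII letters and digits.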

Replace characters

There are two ways to replace the unwanted characters:

Replace all the punctuation marks

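In this method, we list every punctuation mark in a character class and replace each match with an empty string using the re.sub() method in Python.

[!\"#\$%&\'\(\)\*\+,\-\./:;<=>\?@\[\\\]\^_`{\|}~]

The RegEx above contains all 32 ASCII punctuation marks. The hyphen is escaped so that it is read as a literal character rather than as a range operator inside the character class.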

Example

import re
regex = r"[!\"#\$%&\'\(\)\*\+,\-\./:;<=>\?@\[\\\]\^_`{\|}~]"
test_str = "mjfnd234gsd%@$%*}{:)()@#@#$`~"
subst = ""
result = re.sub(regex, subst, test_str)  # replace every match with ""
print("String with punctuation - ", test_str)
print("String without punctuation - ", result)

Explanation

  • Line 1: We import the re module.
  • Line 2: We define the RegEx.
  • Line 3: We define the input string.
  • Line 4: We define the replacement, which is an empty string.
  • Line 5: We use the sub method to replace the matching text with the replacement string.
  • Line 6: We print the text with punctuation.
  • Line 7: We print the text without punctuation.
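
Typing the character class by hand is error-prone. As a minimal sketch, the same class can be built from Python's string.punctuation constant, which holds the 32 ASCII punctuation characters, with re.escape() taking care of the escaping. The helper name strip_ascii_punctuation is an assumption made for illustration.

```python
import re
import string

# Build the character class from the standard library constant.
# re.escape() backslash-escapes characters such as ], \, ^ and - that are
# special inside a character class.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")

def strip_ascii_punctuation(text):
    """Remove ASCII punctuation from `text` (hypothetical helper name)."""
    return PUNCT_RE.sub("", text)

test_str = "mjfnd234gsd%@$%*}{:)()@#@#$`~"
print(strip_ascii_punctuation(test_str))  # mjfnd234gsd
```

Compiling the pattern once is a small convenience when the same cleanup is applied to many strings.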

Replace with negation

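In this method, we replace every character that is not a whitespace or word character with an empty string using the re.sub() method in Python. Because the underscore is a word character, this RegEx leaves underscores in place.

[^\s\w\d]
  • ^: Negation operator (when it is the first character inside the brackets)
  • \s: Whitespace characters (space, tab, newline, and so on)
  • \w: Word characters (letters, digits, and the underscore)
  • \d: Digit characters (already covered by \w, so listing it is redundant but harmless)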

Example

import re
regex = r"[^\s\w\d]"
test_str = "mjfndgsd%@$%*}{:)()@#@#$`~"
subst = ""
result = re.sub(regex, subst, test_str)  # replace every match with ""
print("String with punctuation - ", test_str)
print("String without punctuation - ", result)

Explanation

  • Line 1: We import the re module.
  • Line 2: We define the RegEx.
  • Line 3: We define the input string.
  • Line 4: We define the replacement, which is an empty string.
  • Line 5: We use the sub method to replace the matching text with the replacement string.
  • Line 6: We print the text with punctuation.
  • Line 7: We print the text without punctuation.
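
The two replace methods do not always give the same result. The sketch below compares them on a string that contains an underscore and a non-ASCII punctuation mark. The sample text and the variable names are assumptions chosen for illustration, and the third pattern is one possible way to also drop underscores.

```python
import re
import string

# Explicit list: removes ASCII punctuation only (the underscore is included).
explicit = re.compile("[" + re.escape(string.punctuation) + "]")
# Negated class: removes anything that is not whitespace or a word character.
negated = re.compile(r"[^\s\w]")
# Negated class plus an alternative that also removes the underscore.
negated_no_underscore = re.compile(r"[^\s\w]|_")

text = "¿user_name: José?"
print(explicit.sub("", text))               # ¿username José
print(negated.sub("", text))                # user_name José
print(negated_no_underscore.sub("", text))  # username José
```

The explicit list misses punctuation outside ASCII, while the negated class keeps underscores, so the better choice depends on the kind of text being cleaned.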

Copyright ©2024 Educative, Inc. All rights reserved