What is SequenceMatcher() in Python?

SequenceMatcher is a class in difflib, a module in the Python standard library.

The difflib module provides classes and functions for comparing sequences. It can be used to compare files and can produce information about file differences in various formats.
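
As a small illustration of the file-comparison side of difflib, the sketch below runs difflib.unified_diff() on two short lists of lines. The file names old.txt and new.txt and the line contents are made up for this example; no files are actually read.

```python
import difflib

# Hypothetical "file" contents, given as lists of lines (assumed for illustration).
old_lines = ["apple\n", "banana\n", "cherry\n"]
new_lines = ["apple\n", "blueberry\n", "cherry\n"]

# unified_diff() yields the diff line by line; the file names are only labels.
diff = list(difflib.unified_diff(old_lines, new_lines,
                                 fromfile="old.txt", tofile="new.txt"))
print("".join(diff))
```

Removed lines are prefixed with "-" and added lines with "+", the same convention used by common diff tools.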

The SequenceMatcher class compares two input sequences, such as strings. In other words, it is useful for finding similarities between two strings at the character level.

The basic idea behind SequenceMatcher() is to find the longest contiguous matching subsequence (LCS) that contains no “junk” elements. The same idea is then applied recursively to the pieces of the sequences to the left and to the right of that match. Junk elements are the things that we don’t want the algorithm to match on, such as blank lines in ordinary text files or <P> lines in HTML files. This does not yield minimal edit sequences, but it does tend to yield matches that “look right” to people.
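
To see the effect of junk, the following sketch calls find_longest_match() on the same pair of strings twice: once with no junk, and once with a rule (chosen here only for illustration) that treats the space character as junk. The strings follow the example in the difflib documentation.

```python
import difflib

a = " abcd"
b = "abcd abcd"

# No junk: the leading space is allowed to take part in the match.
plain = difflib.SequenceMatcher(None, a, b)
no_junk = plain.find_longest_match(0, len(a), 0, len(b))
print(no_junk)    # Match(a=0, b=4, size=5)

# Space treated as junk: the match can no longer start on the space.
spaces_are_junk = difflib.SequenceMatcher(lambda ch: ch == " ", a, b)
with_junk = spaces_are_junk.find_longest_match(0, len(a), 0, len(b))
print(with_junk)  # Match(a=1, b=0, size=4)
```

Without junk, the longest match is " abcd" (five characters); with spaces as junk, it is "abcd" matched against the start of the second string.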

Below is the code used to compare two strings:

import difflib
string1 = "I love to eat apple."
string2 = "I do not like to eat pineapple."
temp = difflib.SequenceMatcher(None, string1, string2)
print(temp.get_matching_blocks())
print('Similarity Score:', temp.ratio())

Explanation:

  • On line 1, we import the required package.

  • On lines 2 and 3, we define the two input strings.

  • On line 4, we instantiate an object of the SequenceMatcher() class. We pass None and the two strings to the constructor of this class. The None value specifies that we do not want any element to be treated as junk.

  • On line 5, we print the contiguous matching blocks. You can see in the output that we get a list of Match objects, each of which contains:

    • a: start index of the match in the first string.
    • b: start index of the match in the second string.
    • size: length of the match found between the two strings.
  • On line 6, we print the similarity score of the two input strings. The ratio() method returns the similarity score (a float in [0, 1]) between the input strings. It sums the sizes of all the matching blocks returned by the get_matching_blocks() method and calculates the ratio as:

    $\text{Ratio} = 2.0 \cdot \frac{M}{T}$,

    where M is the number of matches and T is the total number of elements in both sequences.
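
To make the a, b, and size fields concrete, the sketch below slices both strings with each block and prints the text that was matched. The variable names matcher and matched_pieces are introduced here only for illustration.

```python
import difflib

string1 = "I love to eat apple."
string2 = "I do not like to eat pineapple."
matcher = difflib.SequenceMatcher(None, string1, string2)

matched_pieces = []
for block in matcher.get_matching_blocks():
    # The same text appears at offset a in string1 and at offset b in string2.
    piece_a = string1[block.a:block.a + block.size]
    piece_b = string2[block.b:block.b + block.size]
    assert piece_a == piece_b
    matched_pieces.append(piece_a)
    print(block, repr(piece_a))
```

The last block always has size 0, so it contributes an empty string; it only marks the end of both sequences.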

Now, let’s see how all of this gets calculated:

  1. We get all the matching blocks (each one records the length of a match), as we can see in the output: Match(a=0, b=0, size=2), Match(a=2, b=9, size=1), Match(a=5, b=12, size=9), Match(a=14, b=25, size=6).
  2. We sum the match sizes: M = 2 + 1 + 9 + 6 = 18.
  3. The total length of both sequences is T = 20 + 31 = 51. These two lengths appear in the final block of the output, Match(a=20, b=31, size=0), a zero-size block whose a and b are the lengths of the two strings.
  4. We apply the formula above and get $\text{Ratio} = 2.0 \cdot \frac{18}{51}$.

The answer is 0.7058823529411765, the similarity between the two input sequences.
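
The calculation can be checked with a short sketch that recomputes the ratio by hand from the matching blocks and compares it with ratio(). The names matches, total, and manual_ratio are chosen here for illustration.

```python
import difflib

string1 = "I love to eat apple."
string2 = "I do not like to eat pineapple."
matcher = difflib.SequenceMatcher(None, string1, string2)

# M: sum of the block sizes (the final zero-size block adds nothing).
matches = sum(block.size for block in matcher.get_matching_blocks())
# T: total number of elements in both sequences.
total = len(string1) + len(string2)
manual_ratio = 2.0 * matches / total

print(matches, total)                 # 18 51
print(manual_ratio, matcher.ratio())  # both values should agree
```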

Copyright ©2025 Educative, Inc. All rights reserved