How to use Beautiful Soup's find() method

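Syntax

find(name, attrs, recursive, string, **kwargs)

name: A tag name, or a list of tag names, to search for.
attrs: A dictionary of attribute names and values that the element must have.
recursive: A Boolean that controls the search depth. True (the default) searches all descendants; False searches only direct children.
string: A string or regular expression matched against an element's text. Older Beautiful Soup releases call this argument text.
**kwargs: Attribute filters passed as keyword arguments, such as id='main' or class_='description'. CSS selectors are not accepted here; they go through the separate select() and select_one() methods.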

The sections below cover the main ways to use the find() method:

Finding elements by tag name

To locate an element by its tag name, pass the name as the first argument to the find() method. The examples in this article read the following file:

sample.html

<!DOCTYPE html>
<html>
<head>
    <title>Educative - Learn, Explore, and Grow</title>
</head>
<body>
    <header>
        <h1>Welcome to Educative</h1>
        <nav>
            <ul>
                <li>Courses with Assessments</li>
                <li>Assessments</li>
                <li>Blog</li>
                <li>About Us</li>
            </ul>
        </nav>
    </header>
    <div class='description'>
      Educative provides interactive courses for software developers. We are changing how 
      developers continue their education and stay relevant by providing pre-configured 
      learning environments that adapt to match a developer's skill level.
    </div>
</body>
</html>
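
As a minimal sketch of a tag-name lookup, the snippet below inlines a trimmed copy of sample.html (an assumption made so that it runs without the file on disk) and asks for the first <h1>, the first <li>, and a tag that does not occur.

```python
from bs4 import BeautifulSoup

# Trimmed copy of sample.html, inlined so the snippet is self-contained.
html = """
<html>
<head><title>Educative - Learn, Explore, and Grow</title></head>
<body>
  <header>
    <h1>Welcome to Educative</h1>
    <nav><ul><li>Courses with Assessments</li><li>Assessments</li></ul></nav>
  </header>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
first_heading = soup.find("h1")
first_item = soup.find("li")
missing = soup.find("table")

print(first_heading)          # <h1>Welcome to Educative</h1>
print(first_item.get_text())  # Courses with Assessments
print(missing)                # None
```

Because a failed search returns None, it is usually worth checking the result before calling methods such as get_text() on it.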

Finding elements by a list of tag names

To match any of several tags, pass a list of tag names as the first argument. The find() method returns the first element in the document whose tag is in the list.
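
A short sketch of the list form, again on an inlined and shortened stand-in for sample.html (the shortened markup is an assumption for brevity). It illustrates that document order, not list order, decides which element comes back.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for sample.html.
html = """
<html>
<head><title>Educative - Learn, Explore, and Grow</title></head>
<body>
  <header><h1>Welcome to Educative</h1></header>
  <div class="description">Educative provides interactive courses.</div>
</body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

# <title> appears before <h1> in the document, so it is returned
# even though "h1" is listed first.
first_match = soup.find(["h1", "title"])
print(first_match.name)  # title

# Names that do not occur in the document are simply skipped.
fallback = soup.find(["table", "div"])
print(fallback.get_text())  # Educative provides interactive courses.
```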

Filtering elements by attributes

To narrow the search to elements with particular attribute values, pass a dictionary as the attrs argument, for example attrs={'class': 'description'}.
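
The following sketch shows two equivalent ways to filter on the class attribute. The extra <div class="banner"> is hypothetical markup that is not part of sample.html; it is added only so that the filter has something to exclude.

```python
from bs4 import BeautifulSoup

# The "banner" div is a hypothetical addition, not part of sample.html.
html = """
<body>
  <div class="banner">Welcome to Educative</div>
  <div class="description">Educative provides interactive courses.</div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Dictionary form, as described in the parameter list.
by_dict = soup.find("div", attrs={"class": "description"})

# Keyword form; class_ has a trailing underscore because class is a Python keyword.
by_keyword = soup.find("div", class_="description")

print(by_dict.get_text())     # Educative provides interactive courses.
print(by_dict is by_keyword)  # True
print(by_dict["class"])       # ['description']
```

The dictionary form is the safer choice for attribute names that are not valid Python identifiers, such as data-id.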

Finding within immediate children

By default, find() searches every descendant of the element it is called on. Pass recursive=False to restrict the search to that element's direct children.
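
To make the effect of recursive concrete, this sketch searches from <body> in a shortened stand-in for sample.html (an assumption for brevity). The <h1> sits inside <header>, so it is a descendant of <body> but not a direct child.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for sample.html.
html = """
<body>
  <header>
    <h1>Welcome to Educative</h1>
  </header>
  <div class="description">Educative provides interactive courses.</div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")
body = soup.find("body")

# Default (recursive=True): every descendant of <body> is searched.
nested = body.find("h1")

# recursive=False: only direct children of <body> are considered.
direct_only = body.find("h1", recursive=False)
direct_div = body.find("div", recursive=False)

print(nested.get_text())      # Welcome to Educative
print(direct_only)            # None
print(direct_div["class"])    # ['description']
```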

Finding by text content

To search by text instead of by tag, pass a string or a compiled regular expression as the string argument (text in older releases).
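
A brief sketch of text searches on the navigation list from sample.html, assuming Beautiful Soup 4.4.0 or later, where the argument is named string. It contrasts an exact string, a regular expression, and a text filter combined with a tag name.

```python
import re
from bs4 import BeautifulSoup

# The navigation list from sample.html.
html = """
<ul>
  <li>Courses with Assessments</li>
  <li>Assessments</li>
  <li>Blog</li>
  <li>About Us</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# With only string=, find() returns the matching text node itself.
exact = soup.find(string="Blog")

# A plain string must equal the whole text; a regex may match part of it.
partial = soup.find(string=re.compile("Assess"))

# Adding a tag name returns the enclosing element instead of the text.
item = soup.find("li", string="About Us")

print(exact)    # Blog
print(partial)  # Courses with Assessments
print(item)     # <li>About Us</li>
```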

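Combining all filters

The filters can also be combined in a single call. The following script reads sample.html and searches for a <div> with the class description whose text matches a regular expression:

main.py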

# Import Beautiful Soup and the re module for regular expressions
from bs4 import BeautifulSoup
import re

# Read the HTML content from the local file
file_path = 'sample.html'
with open(file_path, 'r', encoding='utf-8') as file:
    html_content = file.read()

# Parse the HTML content using Python's built-in HTML parser
soup = BeautifulSoup(html_content, 'html.parser')

# Match text containing "software developers" and, later, "skill level." at the end of a line
pattern = re.compile(r"software developers.*skill level\.$", re.MULTILINE | re.DOTALL)

# string= is the current name of the older text= argument
element = soup.find(name='div', attrs={'class': 'description'}, recursive=True, string=pattern)
print("Output:", element)

In the code above, re.compile() builds a regular expression. The pattern r"software developers.*skill level\.$" matches text that contains "software developers" followed, anywhere later, by "skill level." at the end of a line. The re.DOTALL flag lets . match newline characters, so .* can span the line breaks inside the <div>, and re.MULTILINE lets $ match at the end of any line rather than only at the end of the whole string. The find() call then looks for a <div> element with the class 'description' whose text matches the pattern. Passing recursive=True, which is also the default, tells Beautiful Soup to search all descendants rather than only direct children.
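
To see why both flags matter here, the sketch below runs the same pattern with Python's re module alone. The text variable is a shortened, hypothetical stand-in for the <div> contents: it has a line break in the middle and whitespace after the last line, as the real text does.

```python
import re

# Shortened stand-in for the <div> text: line breaks inside,
# and indentation after the final line.
text = (
    "\n  Educative provides interactive courses for software developers. We adapt"
    "\n  to match a developer's skill level."
    "\n    "
)
regex = r"software developers.*skill level\.$"

no_flags = re.search(regex, text)
dotall_only = re.search(regex, text, re.DOTALL)
both_flags = re.search(regex, text, re.MULTILINE | re.DOTALL)

print(no_flags)                # None: "." stops at the first line break
print(dotall_only)             # None: without MULTILINE, "$" is tied to the end of the text
print(both_flags is not None)  # True
```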

Note: The find() method returns only the first matching element, or None if nothing matches. To get every element that meets the criteria, use find_all().
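
The difference between the two methods can be sketched on the navigation list from sample.html, inlined here so that the snippet runs on its own.

```python
from bs4 import BeautifulSoup

# The navigation list from sample.html, on one line.
html = (
    "<ul><li>Courses with Assessments</li><li>Assessments</li>"
    "<li>Blog</li><li>About Us</li></ul>"
)
soup = BeautifulSoup(html, "html.parser")

first_item = soup.find("li")      # a single Tag
all_items = soup.find_all("li")   # a list-like collection of Tags

print(first_item.get_text())                 # Courses with Assessments
print([li.get_text() for li in all_items])   # all four entries, in document order

# When nothing matches, find() gives None and find_all() gives an empty list.
no_tag = soup.find("table")
no_tags = soup.find_all("table")
print(no_tag, no_tags)                       # None []
```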

Conclusion

The find() method of the Beautiful Soup library makes it easy to navigate HTML and XML documents. By understanding its syntax and filtering options, we can extract specific elements and data from web pages efficiently, which makes web scraping tasks more manageable.
