How to handle web tables and dynamic web tables in Selenium

Selenium is a free and open-source suite of tools for automating web browsers, widely used to test web user interfaces. Because it drives real browsers, it can test web applications across different browsers and platforms.
To understand how Selenium works with tables, we first need to understand how a table is structured in HTML.

<table>
  <thead>
    <tr>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td></td>
      <td></td>
    </tr>
  </tbody>
</table>

The <table> element in HTML contains the following components:

  • Table head <thead>: The section of an HTML table that contains the column headers.

  • Table body <tbody>: The main content area of an HTML table containing rows of data.

  • Table heading <th>: A header cell that labels a column or a row. In the structure above, the <th> cells sit in the header row inside <thead>.

  • Table row <tr>: A horizontal grouping of cells within an HTML table representing a single data entry.

  • Table data <td>: A cell within an HTML table that holds individual data values in rows and columns.
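To see how these elements nest, the following sketch walks a small, hypothetical table with Python's standard-library html.parser and groups each <th>/<td> cell under its <tr> row. It does not use Selenium (which reads the live DOM in a browser); it is only a minimal illustration of the structure, and the class name TableCellParser is our own.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by its <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []            # one list of cell strings per <tr>
        self._current_row = None  # cells of the <tr> being parsed
        self._in_cell = False     # True between <th>/<td> and its end tag

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("th", "td") and self._current_row is not None:
            self._in_cell = True
            self._current_row.append("")

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored
        if self._in_cell:
            self._current_row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._in_cell:
            self._in_cell = False
            self._current_row[-1] = self._current_row[-1].strip()
        elif tag == "tr" and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None


# Hypothetical table that follows the skeleton above
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Name</th><th>Age</th></tr>
  </thead>
  <tbody>
    <tr><td>Alice</td><td>30</td></tr>
    <tr><td>Bob</td><td>25</td></tr>
  </tbody>
</table>
"""

parser = TableCellParser()
parser.feed(SAMPLE_HTML)
print(parser.rows)  # → [['Name', 'Age'], ['Alice', '30'], ['Bob', '25']]
```

The header row and the body rows come out the same way, as lists of cell texts; only the tag of the cells (<th> versus <td>) differs.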

In Selenium, we first locate the table element using a locator strategy such as By.XPATH, By.ID, By.CLASS_NAME, or By.TAG_NAME. Once we have the table element, we can traverse each of its rows (<tr>) and, with a simple for loop, print the table data (<td>) cells of every row.
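The traversal described above can be wrapped in a small helper. This is a minimal sketch, not code from the example project: read_table is a hypothetical name, and it assumes table is a WebElement returned by find_element(). Selenium's By.TAG_NAME constant is written as its string value "tag name" so the sketch can be exercised without a browser.

```python
# In Selenium 4, By.TAG_NAME is the string "tag name"; the literal is used here
# only so the sketch can be exercised without a browser.
TAG_NAME = "tag name"


def read_table(table):
    """Return the <td> texts of a table, one list per <tr> row.

    `table` is assumed to be a Selenium WebElement, e.g. the result of
    driver.find_element(By.XPATH, "/html/body/table"), or any object that
    exposes find_elements(by, value) and whose cells expose .text.
    """
    rows = []
    for row in table.find_elements(TAG_NAME, "tr"):
        # Only data cells are collected; <th> header cells are skipped
        cells = row.find_elements(TAG_NAME, "td")
        rows.append([cell.text for cell in cells])
    return rows
```

Because only <td> cells are queried, a header row made entirely of <th> cells comes back as an empty list.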

Below is an example of a static webpage with a table, built with plain HTML and CSS in the index.html file. The Selenium script that scrapes data from the table is in main.py.

Code

# Importing libraries
import time

# Selenium core: driver, locator strategies, and explicit waits
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Chrome-specific options and service, plus automatic driver download
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Setting up automated browser
options = Options()
# disable sandboxing in Chrome 
options.add_argument('--no-sandbox')
# run Chrome in headless mode (without GUI)
options.add_argument('--headless')
# disable the use of the /dev/shm shared memory space in Chrome.
options.add_argument('--disable-dev-shm-usage')
# creating a dictionary to store preferences
prefs = {"download.default_directory": "."}

options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

url = "{{EDUCATIVE_LIVE_VM_URL}}/"
# Opening the URL
driver.get(url)

tableXPATH = '/html/body/table'

try:
    # Wait up to 10 seconds for the table to be present in the DOM
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, tableXPATH))
    )
    # Locate the table element using XPath
    table = driver.find_element(By.XPATH, tableXPATH)
    # Get all rows in the table
    rows = table.find_elements(By.TAG_NAME, 'tr')
    # Iterate through rows
    for row in rows:
        # Get the header (th) and data (td) cells in each row, in document order
        columns = row.find_elements(By.CSS_SELECTOR, 'th, td')
        
        # Create a list to hold the cell data for this row
        row_data = []
        
        # Iterate through columns
        for column in columns:
            cell_text = column.text
            row_data.append(cell_text)
        
        # Format and print the row data
        formatted_row = " | ".join(row_data)
        print(formatted_row)


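except Exception as e:
    print("An error occurred while reading the table:")
    print(e)

# Wait for 10 seconds, then end the browser session (runs after success or failure)
time.sleep(10)
driver.quit()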
Handling tables with Selenium

Note: In the terminal, the left side shows the request sent to the server, and the right side shows the fetched data.

Code explanation

In the static-table example above, we use Selenium's WebDriverWait class together with the presence_of_element_located condition to wait until the table element is present in the DOM before extracting data. With a static table, we can usually retrieve the entire table once and then iterate over each row to display its data.
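Since a static table can be read in a single pass, the extracted rows can be post-processed afterwards. The sketch below, a hypothetical helper named rows_to_records, pairs each data row with the column headers. It assumes the header texts are available as the first row (for example, by also reading the <th> cells) and that every row is a list of strings.

```python
def rows_to_records(rows):
    """Map each data row to a dict keyed by the header texts.

    Assumes rows[0] holds the column headers and each later row holds the
    cell texts of one data row, as lists of strings.
    """
    if not rows:
        return []
    header, *data_rows = rows
    # Skip empty rows; zip() pairs headers and cells up to the shorter list
    return [dict(zip(header, row)) for row in data_rows if row]


# Hypothetical rows extracted from a static table
rows = [["Name", "Age"], ["Alice", "30"], ["Bob", "25"]]
for record in rows_to_records(rows):
    print(record)
```

Keying cells by header text rather than by position keeps later code readable if the column order of the page changes.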

Dynamic tables, by contrast, are filled in by JavaScript, often with data fetched from an API after the page has loaded. In such cases, WebDriverWait is used to poll until the data is present, so that the script does not read the table before its content has finished loading.
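For the simple case of waiting until at least one row exists, the built-in EC.presence_of_all_elements_located((By.XPATH, ...)) condition is enough, because until() keeps polling while the condition returns an empty (falsy) list. WebDriverWait.until() also accepts any callable that takes the driver, so a custom condition can wait for a minimum number of rows. The sketch below assumes the table's XPath is known; table_has_rows and min_rows are hypothetical names, and By.XPATH is written as its string value "xpath" so the sketch runs without a browser.

```python
# In Selenium 4, By.XPATH is the string "xpath"; the literal is used here only
# so the sketch can be exercised without a browser.
XPATH = "xpath"


def table_has_rows(table_xpath, min_rows=1):
    """Build a custom condition for WebDriverWait(...).until().

    The returned callable is called with the driver on every poll. It returns
    the list of data rows once at least `min_rows` are present, and False
    otherwise, so the wait keeps polling until the timeout.
    """
    def condition(driver):
        # Rows containing at least one <td>, with or without a <tbody>
        rows = driver.find_elements(XPATH, table_xpath + "//tr[td]")
        return rows if len(rows) >= min_rows else False

    return condition
```

It would be used as rows = WebDriverWait(driver, 10).until(table_has_rows(tableXPATH, min_rows=5)), where 5 is an arbitrary example value. until() returns the row list produced by the condition, or raises TimeoutException if the rows do not appear within 10 seconds.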

  • Lines 1–12: Importing the required libraries.

  • Lines 14–26: Setting up the automated browser (headless Chrome options and the driver).

  • Line 30: Opening the URL in the automated browser.

  • Lines 36–38: WebDriverWait waits up to 10 seconds until the table element is present in the DOM. If it does not appear in time, a TimeoutException is raised.

  • Lines 39–58: Locating the table element, extracting the text of each row's cells, and printing each row on the console. The find_element() method returns the first element that matches a locator, while find_elements() returns a list of all matching elements.

  • Lines 61–64: The except block catches any exception raised while waiting for or reading the table and prints it on the terminal.

  • Lines 66–67: Finally, the script waits for 10 seconds and then closes the browser.
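The cleanup step can be made independent of whether scraping succeeds. The following is a minimal sketch, assuming the scraping logic is passed in as a callable (run_and_quit and task are hypothetical names). quit() sits in a finally clause, so the session is ended exactly once on both paths. quit() is used rather than close() because close() only closes the current window, while quit() ends the whole WebDriver session.

```python
def run_and_quit(driver, task):
    """Run task(driver), then end the browser session whether or not it raised.

    `task` is a hypothetical callable holding the scraping logic; its return
    value is passed through to the caller.
    """
    try:
        return task(driver)
    finally:
        # Runs after a normal return and after an exception alike
        driver.quit()
```

For example, run_and_quit(driver, lambda d: d.title) returns the page title and then shuts the browser down.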


Copyright ©2024 Educative, Inc. All rights reserved