Scraping Yahoo Finance with Selenium
Explore how to scrape financial news from Yahoo Finance using Selenium. Understand how to handle JavaScript-driven content by leveraging CSS selectors and automated scrolling to capture dynamic data efficiently.
Now that we know how Selenium works, let's use it to extract financial news from Yahoo Finance. Yahoo Finance renders much of its content with JavaScript, so techniques that only download and parse the HTML returned by the server cannot see the content that scripts insert afterward.
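To see why, consider a minimal, hypothetical page (not Yahoo's real markup) whose list items are inserted by a script. A parser that only reads the markup the server sends, such as Python's built-in `html.parser`, never executes that script, so it finds no `<li>` elements. Only the DOM a browser builds after running the JavaScript contains them, and that gap is what Selenium closes by driving a real browser. `STATIC_HTML`, `RENDERED_HTML`, `LiCounter`, and `count_li` are illustrative names introduced for this sketch.

```python
from html.parser import HTMLParser

# Hypothetical markup (not Yahoo's real page): the server sends an empty list,
# and a script fills it in only when a browser executes JavaScript.
STATIC_HTML = """
<html><body>
  <ul id="news"></ul>
  <script>
    const ul = document.getElementById("news");
    for (const title of ["Stocks rally", "Fed holds rates"]) {
      const li = document.createElement("li");
      li.textContent = title;
      ul.appendChild(li);
    }
  </script>
</body></html>
"""

# What the same list looks like in the DOM after a browser has run the script.
RENDERED_HTML = '<ul id="news"><li>Stocks rally</li><li>Fed holds rates</li></ul>'


class LiCounter(HTMLParser):
    """Counts <li> start tags; HTMLParser reads markup but never runs scripts."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.count += 1


def count_li(html):
    parser = LiCounter()
    parser.feed(html)
    parser.close()
    return parser.count


print("static HTML:", count_li(STATIC_HTML))     # static HTML: 0
print("rendered DOM:", count_li(RENDERED_HTML))  # rendered DOM: 2
```

The script body is treated as plain text by the parser, so the `li` elements it would create never appear as tags.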
To begin, we'll extract the first batch of news items rendered on the stock market news page:
```python
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://finance.yahoo.com/topic/stock-market-news/")

# Wait up to 10 s for the news items (the yf-1usaaz9 class is auto-generated and may change).
try:
    news = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'li[class="stream-item story-item yf-1usaaz9"]')))
except TimeoutException:
    raise TimeoutException("Elements are not loaded")

print("len of news: ", len(news))

data = []
for n in news:
    # Headline text and article URL, located relative to the current <li>.
    title = n.find_element(By.CSS_SELECTOR, "section div h3").text
    link = n.find_element(By.CSS_SELECTOR, "section div a").get_attribute("href")
    d = {'title': title, 'link': link}
    data.append(d)

print("len of scraped data: ", len(data))
print("sample: ", data[0])

# The pause is only for demonstration purposes.
time.sleep(2)
driver.quit()  # ends the WebDriver session and closes all browser windows
```
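The selector `li[class="stream-item story-item yf-1usaaz9"]` is an exact attribute match: it matches only when the `class` attribute is that precise string, with the classes in that order. A class selector such as `li.stream-item.story-item` instead matches any `<li>` that carries those classes as whitespace-separated tokens, in any order and alongside other classes. The pure-Python sketch below mimics both rules; the altered class strings are invented for illustration, and `matches_exact_class` and `matches_class_tokens` are hypothetical helper names, not Selenium APIs.

```python
def matches_exact_class(class_attr, expected):
    """Mimics li[class="..."]: the whole attribute string must match exactly."""
    return class_attr == expected


def matches_class_tokens(class_attr, required):
    """Mimics li.a.b: each required class must appear as a whitespace-separated token."""
    tokens = class_attr.split()
    return all(cls in tokens for cls in required)


selector_value = "stream-item story-item yf-1usaaz9"

# Hypothetical class attributes an <li> might carry after a site update.
samples = [
    "stream-item story-item yf-1usaaz9",  # unchanged
    "story-item stream-item yf-1usaaz9",  # same classes, different order
    "stream-item story-item yf-abc123",   # regenerated hash suffix (invented value)
]

for cls in samples:
    print(repr(cls),
          "exact:", matches_exact_class(cls, selector_value),
          "tokens:", matches_class_tokens(cls, ["stream-item", "story-item"]))
```

Only the first sample passes the exact match, while all three pass the token match. In the scraper, the token form would be passed the same way, for example `(By.CSS_SELECTOR, "li.stream-item.story-item")`; whether those two classes alone identify the news items on the live page is an assumption worth checking in the browser's developer tools.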
Note: In the provided code, there is a hidden section that handles the imports and options for the driver, as covered earlier. For the purpose of this lesson, we will solely focus on the scraping part.
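The `WebDriverWait(driver, 10).until(...)` call in the scraping code is an explicit wait: it repeatedly evaluates a condition until the condition returns a truthy value, or raises `TimeoutException` once 10 seconds have passed; by default it re-checks every 0.5 seconds. `EC.presence_of_all_elements_located` returns the list of matching elements, and an empty list is falsy, so the wait keeps polling until at least one news item exists. The following is a simplified pure-Python sketch of that polling loop, not Selenium's actual implementation; `wait_until`, `WaitTimeout`, and the fake `news_items_present` condition are hypothetical names standing in for the driver and the page finishing its JavaScript rendering.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Call condition() until it returns a truthy value or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:  # an empty list ("no elements yet") is falsy, so polling continues
            return value
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Fake condition: no news items for the first two checks, then a list of items,
# mimicking a page whose JavaScript finishes rendering a moment after loading.
calls = {"n": 0}


def news_items_present():
    calls["n"] += 1
    return ["item 1", "item 2"] if calls["n"] >= 3 else []


items = wait_until(news_items_present, timeout=2.0, poll_frequency=0.01)
print(items, "after", calls["n"], "checks")  # ['item 1', 'item 2'] after 3 checks
```

Polling with a deadline, rather than a fixed `time.sleep`, returns as soon as the elements appear and fails loudly when they never do.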
Lines 1–2: We initialize the web driver and send a GET request to the Yahoo Finance stock market news URL.
Lines 5–9: Using the CSS path selector
li[class="stream-item story-item yf-1usaaz9"]...