How to get web page HTML with Puppeteer

Puppeteer, created by Google, is a Node.js library that provides a high-level API for controlling headless and headful browsers over the DevTools Protocol.

Retrieving a page's raw HTML is useful for web scraping, data extraction, and other tasks that involve analyzing or manipulating the page's structure.

Syntax

To get the HTML content of the current page, we use Puppeteer's page.content() method. It returns a Promise that resolves to a string containing the full HTML of the page, including the doctype.

await page.content();

The await keyword pauses the enclosing async function until the Promise returned by page.content() settles, and then evaluates to the resolved HTML string. Only that function is suspended; the rest of the program is not blocked.
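As a minimal sketch of this behavior, the snippet below replaces page.content() with a hypothetical stand-in, fakeContent, so that it runs without a browser. The names fakeContent, readHtml, and events are illustrative assumptions, not part of Puppeteer.

```javascript
// Hypothetical stand-in for page.content(): resolves to an HTML string
// after a short delay, roughly like a real browser round trip.
function fakeContent() {
  return new Promise((resolve) => {
    setTimeout(() => resolve('<html><body>Hello</body></html>'), 10);
  });
}

const events = [];

async function readHtml() {
  events.push('before await');
  // Without await, html would be a pending Promise, not a string.
  const html = await fakeContent();
  events.push('after await');
  return html;
}

// Calling an async function returns a Promise immediately.
const pending = readHtml();
events.push('caller continues');

pending.then((html) => {
  console.log(events.join(' -> ')); // before await -> caller continues -> after await
  console.log(typeof html);         // string
});
```

The recorded order suggests why page.content() is normally awaited inside an async function: the caller keeps running while the HTML is being fetched.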

Code example

The following script opens a page and logs its HTML content to the terminal.

const puppeteer = require('puppeteer');
(async () => {
  // Launch a headless browser
  const browser = await puppeteer.launch({
    args: ['--no-sandbox']
  });
  // Open a new page
  const page = await browser.newPage();
  // Navigate to the desired URL
  await page.goto('https://www.scrapethissite.com/login/');
  // Get the HTML content of the page
  const html = await page.content();
  // Log the extracted HTML content
  console.log(html);
  // Close the browser
  await browser.close();
})();

No browser window appears when this script runs. This is because Puppeteer launches the browser in headless mode (no visible GUI) by default.
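To watch the browser while the script runs, puppeteer.launch() accepts a headless option. The sketch below assumes a machine with a display and uses a hypothetical helper, launchOptions, to build the options object. It takes the target URL from the command line, which is a choice made for this illustration only.

```javascript
// Hypothetical helper: builds the options object for puppeteer.launch().
// headless: false asks Puppeteer to open a visible browser window.
function launchOptions(showBrowser) {
  return {
    headless: !showBrowser,
    args: ['--no-sandbox'], // kept from the original example; may be unnecessary locally
  };
}

async function main(url) {
  // Loaded here so the helper above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(launchOptions(true));
  try {
    const page = await browser.newPage();
    await page.goto(url);
    console.log(await page.content());
  } finally {
    await browser.close();
  }
}

// Runs only when a URL is passed, for example:
//   node script.js https://www.scrapethissite.com/login/
if (/^https?:\/\//.test(process.argv[2] || '')) {
  main(process.argv[2]).catch(console.error);
}
```

Passing false to launchOptions gives headless: true, which matches the default behavior described above.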

Code explanation

  • Line 1: We import the Puppeteer library with Node.js's require function, which makes its API available in the script through the puppeteer variable.

  • Line 2: We define an asynchronous arrow function using the async keyword and invoke it immediately (the trailing () on line 17). Inside this function:

    • On lines 4–6, we launch the browser with Puppeteer.

    • On line 8, we create a new page.

    • On line 10, we open the desired URL.

    • On line 12, we extract the HTML of the opened page.

    • On line 14, we log the HTML of the page.

    • On line 16, we close the browser.
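The steps above can also be sketched as a reusable function. In the linear script, an error during navigation would skip browser.close(), so this variation uses try/finally for cleanup. The helper name getPageHtml and the choice of the waitUntil: 'networkidle2' option are assumptions made for illustration, not part of the original example.

```javascript
// Hypothetical helper: returns the HTML of `url` using an already launched browser.
async function getPageHtml(browser, url) {
  const page = await browser.newPage();
  try {
    // 'networkidle2' waits until network activity has mostly settled,
    // which can help on pages that render content with JavaScript.
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.content();
  } finally {
    // Runs whether or not navigation succeeded.
    await page.close();
  }
}

async function main(url) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when main() runs
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });
  try {
    console.log(await getPageHtml(browser, url));
  } finally {
    await browser.close();
  }
}

// Runs only when a URL is passed on the command line.
if (/^https?:\/\//.test(process.argv[2] || '')) {
  main(process.argv[2]).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Because getPageHtml receives the browser as a parameter, several pages can share one browser instance, and the function can be exercised with a stub object in tests.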

Note: We pass the --no-sandbox argument to puppeteer.launch() to disable the browser's sandbox, which is needed to open the browser on the Educative platform. If you run the script on your local machine, this argument is usually unnecessary and can be omitted.
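One way to keep a single script for both environments is to add the flag only when asked. The sketch below is an assumption-based illustration: the helper chromiumArgs and the environment variable DISABLE_SANDBOX are hypothetical names, and Puppeteer itself does not read that variable.

```javascript
// Hypothetical helper: returns the browser flags to pass to puppeteer.launch().
// DISABLE_SANDBOX is an assumed variable name, not something Puppeteer reads.
function chromiumArgs(env = process.env) {
  return env.DISABLE_SANDBOX === '1' ? ['--no-sandbox'] : [];
}

// Intended usage: const browser = await puppeteer.launch({ args: chromiumArgs() });
console.log(chromiumArgs({ DISABLE_SANDBOX: '1' })); // [ '--no-sandbox' ]
console.log(chromiumArgs({}));                       // []
```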

Copyright ©2025 Educative, Inc. All rights reserved