Cheerio is a JavaScript library used for web scraping on the server side. Web scraping is the scripted extraction of data from a website, and the script can be tailored to your use case. Node.js is the usual server-side platform. With Node.js installed, you can start using Cheerio after installing it with npm:
npm install cheerio
Warning: Only scrape websites that you have permission to scrape. Scraping content from some websites may infringe copyright, violate privacy, and/or breach the site's terms of service.
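Cheerio was originally built on fb55's htmlparser2 (recent versions parse HTML with parse5 by default and can optionally use htmlparser2). It parses an HTML document and lets you traverse and manipulate the resulting data structure. Its syntax implements a subset of core jQuery, and because it works on a simple DOM model instead of rendering the page the way a browser does, parsing and manipulation are fast.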
You can select elements on a web page and analyze their content for your use case. Once selected, the elements can be handled like any other objects in your program: you can count how many instances of an element there are, loop through them to extract useful information, and more. For instance, you may want to extract the text of every <h1> (headline) tag on a web page.
Read more about Cheerio in the official documentation.