This script automatically scrapes data from the real estate website www.sreality.cz. It extracts information about apartments offered for sale in Prague and saves it to a file named data.json.
- cheerio: Used for parsing and extracting data from HTML.
- puppeteer: A library for browser automation (e.g., navigating to a page and fetching its content).
- fs: Node.js module for file system operations.
This asynchronous function is responsible for fetching the web page's content.
- Arguments:
  - pageNumber: The page number to fetch.
- Returns: The HTML content of the page, or null in case of an error.
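
A minimal sketch of what such a fetch function might look like is shown here. The names fetchPage and buildPageUrl, the search URL, and the strana query parameter are assumptions made for illustration; the original does not state them, so adjust them to match the actual script.

```javascript
// Assumed search URL for Prague apartment sales; not taken from the original script.
const BASE_URL = 'https://www.sreality.cz/hledani/prodej/byty/praha';

// Build the URL for a given results page (the "strana" parameter is an assumption).
function buildPageUrl(pageNumber) {
  return `${BASE_URL}?strana=${pageNumber}`;
}

// Fetch the rendered HTML of one results page, or return null on any error.
async function fetchPage(pageNumber) {
  let browser = null;
  try {
    // Required inside the try block so a missing dependency also yields null.
    const puppeteer = require('puppeteer');
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Wait until network activity settles so client-rendered listings are present.
    await page.goto(buildPageUrl(pageNumber), { waitUntil: 'networkidle2' });
    return await page.content();
  } catch (error) {
    console.error(`Failed to fetch page ${pageNumber}:`, error.message);
    return null;
  } finally {
    // Always release the browser, even when navigation failed.
    if (browser) await browser.close();
  }
}
```

Returning null instead of throwing lets the caller decide whether to stop or skip the page, which matches the return contract described above.
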
This asynchronous function performs the main task of data extraction and saving.
- Arguments:
  - pageNumber: The starting page number for data scraping.
- Actions:
  - Loads the HTML content of the specified page.
  - Uses cheerio to parse and extract apartment data.
  - Checks if the data.json file exists and, if so, loads the existing data.
  - Appends the new data to the existing data.
  - Saves the combined data to data.json.
  - Moves to the next page if the current page isn't the last one.
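
The parse, load, append, and save steps above can be sketched roughly as follows. The function names (parseApartments, loadExisting, saveApartments), the CSS selectors, and the record fields are hypothetical placeholders; the site's real markup and the original script's field names are not given here. Detecting the last page is left out because the source does not describe how it is done.

```javascript
const fs = require('fs');

// Output file name as stated in this README.
const DATA_FILE = 'data.json';

// Parse listing cards out of a page's HTML with cheerio.
// The selectors and fields are hypothetical; replace them with the real ones.
function parseApartments(html) {
  // Required here so the file helpers below can be used on their own.
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  const apartments = [];
  $('.property').each((_, element) => {
    apartments.push({
      title: $(element).find('.name').text().trim(),
      price: $(element).find('.price').text().trim(),
    });
  });
  return apartments;
}

// Load previously saved records, or an empty list if the file does not exist.
function loadExisting(filePath = DATA_FILE) {
  if (!fs.existsSync(filePath)) return [];
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Append new records to what is already on disk and write everything back.
// Returns the total number of records now stored.
function saveApartments(newItems, filePath = DATA_FILE) {
  const combined = loadExisting(filePath).concat(newItems);
  fs.writeFileSync(filePath, JSON.stringify(combined, null, 2));
  return combined.length;
}
```

Because the whole file is read and rewritten on every page, this approach suits modest data volumes; a very large crawl might prefer an append-only format such as one JSON object per line.
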
The code automatically starts scraping data from the first page upon execution.
After the code completes its execution, all the scraped data will be saved in the data.json file.