Additional Settings
Once you have added all the properties you need, you can configure additional settings to customize your scraper. These settings include:
Dynamic URLs
Use dynamic URLs to scrape pages that follow a predictable URL pattern, where only a number or another parameter changes from page to page. Add or update a URL in your scraper settings and use the {PARAMETER_NAME} syntax to mark the dynamic parts. For example, changing https://domain.com/page12 to https://domain.com/page{PAGE_NUMBER} lets the scraper iterate through pages by substituting {PAGE_NUMBER} with actual values.
You can specify a range or a list of values for each parameter. For PAGE_NUMBER, a range like [1,200] instructs the scraper to access pages 1 through 200 in sequence. A parameter can also take its values from a semicolon-separated list like example1;example2;example3, in which case the scraper substitutes each value in turn.
The combination of these parameters results in a full list of URLs for the scraper to target.
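As a rough illustration of how such a URL list could be built, the Python sketch below expands a template into concrete URLs. It is not the scraper's actual implementation: the helper names parse_values and expand_urls, the {SECTION} parameter, and the assumption that multiple parameters are combined as a Cartesian product are all hypothetical.

```python
from itertools import product
import re


def parse_values(spec: str) -> list[str]:
    """Turn a range like "[1,200]" or a list like "a;b;c" into a list of strings."""
    match = re.fullmatch(r"\[(\d+),\s*(\d+)\]", spec.strip())
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        # Assumption: the range is inclusive on both ends ("pages 1 through 200").
        return [str(n) for n in range(start, end + 1)]
    return [value for value in spec.split(";") if value]


def expand_urls(template: str, params: dict[str, str]) -> list[str]:
    """Substitute every combination of parameter values into the template."""
    names = list(params)
    value_lists = [parse_values(params[name]) for name in names]
    urls = []
    # Assumption: parameters are combined as a Cartesian product.
    for combo in product(*value_lists):
        url = template
        for name, value in zip(names, combo):
            url = url.replace("{" + name + "}", value)
        urls.append(url)
    return urls


# Hypothetical template with two parameters: a list and a range.
urls = expand_urls(
    "https://domain.com/{SECTION}/page{PAGE_NUMBER}",
    {"SECTION": "example1;example2;example3", "PAGE_NUMBER": "[1,3]"},
)
for url in urls:
    print(url)
```

Under these assumptions, three list values and a three-page range produce nine URLs, from https://domain.com/example1/page1 to https://domain.com/example3/page3.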
Toggle JavaScript
Some websites use JavaScript to load content dynamically. To scrape such a website, enable the Use Javascript setting so that the scraper can collect the content that JavaScript loads.
If the website you want to scrape does not use JavaScript, disable the Use Javascript setting. Your scraper will run faster and consume fewer credits.
Smart Align
Smart Align keeps related data points together in the same row. When you enable Smart Align, the scraper automatically aligns each data point with the most relevant data points from the other properties. This feature is useful when the properties you are scraping may not all return the same number of data points.
For example, if you are scraping a list of products and some products have prices while others don't because they are out of stock, Smart Align keeps the properties of each product together in the same row so that the data is not misaligned.
If you disable Smart Align, the data returned from each property will be listed from top to bottom.
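The Python sketch below illustrates the effect only, using made-up product data. It does not show how Smart Align decides which data points belong together, and the pad helper is hypothetical.

```python
# Hypothetical page content: the second product is out of stock and shows no price.
products = [
    {"name": "Lamp", "price": "$20"},
    {"name": "Chair", "price": None},
    {"name": "Desk", "price": "$150"},
]

# Without alignment: each property is collected on its own, top to bottom,
# so the missing price simply does not appear in the price column.
names = [p["name"] for p in products]
prices = [p["price"] for p in products if p["price"] is not None]


def pad(column, length):
    """Fill a short column with None so that the columns can be zipped into rows."""
    return column + [None] * (length - len(column))


height = max(len(names), len(prices))
unaligned_rows = list(zip(pad(names, height), pad(prices, height)))

# With alignment: each value stays with the product it belongs to.
aligned_rows = [(p["name"], p["price"]) for p in products]

print(unaligned_rows)
print(aligned_rows)
```

In the unaligned rows, the Desk price shifts up next to Chair; in the aligned rows, Chair keeps an empty price and Desk keeps its own.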
Timeouts
There are two types of timeouts you can configure for your scraper:
- Page Timeout: The maximum time the scraper will wait for a page to load before giving up. If the page takes longer to load than the specified time, the scraper will move on to the next page.
- Global Timeout: The maximum time the scraper will run before giving up. If the scraper takes longer to run than the specified time, it will stop and return the data it has collected so far.
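The interaction between the two timeouts can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scraper's real code: run_scraper and fake_fetch are hypothetical names, timeouts are given in seconds, and pages are loaded one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_scraper(urls, fetch, page_timeout, global_timeout):
    """Fetch pages in order, honoring a per-page and an overall time limit."""
    results = []
    deadline = time.monotonic() + global_timeout
    pool = ThreadPoolExecutor(max_workers=4)
    try:
        for url in urls:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # Global timeout: stop and keep what was collected.
            future = pool.submit(fetch, url)
            try:
                # Wait no longer than the page timeout or the time left overall.
                results.append(future.result(timeout=min(page_timeout, remaining)))
            except FutureTimeout:
                continue  # Page timeout: give up on this page, move to the next.
    finally:
        pool.shutdown(wait=False)
    return results


def fake_fetch(url):
    """Stand-in for a real page load; URLs containing "slow" take one second."""
    if "slow" in url:
        time.sleep(1.0)
    return f"data from {url}"


collected = run_scraper(
    ["https://domain.com/page1", "https://domain.com/slow", "https://domain.com/page2"],
    fake_fetch,
    page_timeout=0.2,
    global_timeout=5.0,
)
print(collected)
```

Here the slow page exceeds the page timeout and is skipped, while the other two pages are returned. Setting global_timeout to a value shorter than the whole run would instead end the loop early with whatever had been collected up to that point.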