How to do simple web scraping
When working on performance, we sometimes just need to make a few tweaks to an HTML page, like changing the order in which some scripts load, to test how that affects site speed. That is just one use case; there can be many others. A clean and easy approach is to scrape the site and implement the changes we want to test on the static copy.
Here is the general guideline:
- Install the npm package website-scraper:

  ```bash
  npm install website-scraper
  ```

- Make sure you update your package.json to activate ES modules, since recent versions of website-scraper are published as an ES module:

  ```json
  "type": "module"
  ```
Then create a main.js file with the scraping configuration. Note that because we enabled ES modules, the package is loaded with import rather than require:

```js
import scrape from 'website-scraper';

const options = {
  // Pages to download: plain URLs get the default filename,
  // while objects let you pick the output filename per page
  urls: [
    'http://www.site.com/',
    { url: 'http://www.site.com/about', filename: 'about.html' },
    { url: 'http://www.site.com/product-123', filename: 'product-123.html' }
  ],
  // Output directory for the static copy; it must not exist yet
  directory: './site'
};

// Top-level await works because this file is an ES module
await scrape(options);
```
Finally, run the script:

```bash
node main.js
```
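When the run finishes, the ./site directory holds the three HTML files (the first URL gets the default filename index.html) along with the scripts, styles, and images the pages reference, which website-scraper downloads by default so the local copy renders on its own.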
That's it!
This example only scrapes the pages that are configured, but you can also make the scraper follow all hyperlinks in the HTML pages, making a complete sweep of the site's content.
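website-scraper supports this through its recursive mode. A minimal sketch (the depth limit and the URL filter are illustrative values; tune them to the site you are copying):

```js
import scrape from 'website-scraper';

await scrape({
  urls: ['http://www.site.com/'],
  directory: './site-full',
  // Follow hyperlinks found in the downloaded HTML pages
  recursive: true,
  // Stop following links after this many hops from the start page
  maxRecursiveDepth: 2,
  // Only follow links that stay on the same site
  urlFilter: (url) => url.startsWith('http://www.site.com')
});
```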
This approach removes a lot of overhead since it doesn't care about the backend technology behind the site. It simply creates static content that you can host anywhere. This makes front-end performance testing a lot easier, because we are removing most of the web server's internal latency.
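Any static file server will do for hosting the copy; for instance, with the serve package (one option among many):

```bash
npx serve ./site
```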