Puppeteer
Puppeteer is a Node.js library that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. It can also be configured to run full (non-headless) Chrome or Chromium.
Why Puppeteer?
There are quite a few things we can achieve with Puppeteer. We will go through them one by one.
Let's start. Before writing any code, we need to set up the project:
Download the latest version of Node.js (https://nodejs.org/en/download/).
Create a project folder and move into it: mkdir scraper && cd scraper
Initialise the project directory: npm init
This sets up the working directory as a Node project and presents a sequence of prompts; press Enter to accept the default at each one. Alternatively, run npm init -y to accept all the defaults without prompting. Either way, the settings are saved to a package.json file in the current directory.
Use npm to install Puppeteer: npm i puppeteer
This also downloads a recent version of Chromium that works with the installed Puppeteer release.
Create a file, for example app.js.
Features
Puppeteer enables you to do most of the things that you can do manually in the browser. These features include:
Generate screenshots and PDFs of pages.
Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e., “SSR” (Server-Side Rendering)).
Automate form submission, UI testing, keyboard input, etc.
Create an up-to-date, automated testing environment. Run your tests directly in the latest version of Chrome using the latest JavaScript and browser features.
Capture a timeline trace of your site to help diagnose performance issues.
Test Chrome Extensions.
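To make the first two features concrete, the sketch below saves a screenshot and a PDF of a page and also returns the rendered HTML, which is the pre-rendered content you would want from an SPA. The names capturePage and baseName are hypothetical, and the puppeteer module is passed in as an argument purely as a choice for this example.

```javascript
// A sketch only: `capturePage` and `baseName` are hypothetical names.
// Call it with the real module, e.g. capturePage(require('puppeteer'), ...).
async function capturePage(puppeteer, url, baseName) {
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    // 'networkidle2' treats navigation as finished once there have been no
    // more than two network connections for at least 500 ms, which gives
    // client-side rendering a chance to complete.
    await page.goto(url, { waitUntil: 'networkidle2' });

    const imagePath = `${baseName}.png`;
    const pdfPath = `${baseName}.pdf`;
    await page.screenshot({ path: imagePath, fullPage: true });
    await page.pdf({ path: pdfPath, format: 'A4' });

    // Serialised DOM after the page's scripts have run.
    const html = await page.content();
    return { imagePath, pdfPath, html };
  } finally {
    await browser.close();
  }
}
```

For instance, capturePage(require('puppeteer'), 'https://example.com', 'example') would be expected to write example.png and example.pdf to the current directory.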
Installation
To use Puppeteer in your project, run:
npm i puppeteer
When you install Puppeteer, it downloads a recent version of Chromium by default.
There is also the puppeteer-core package, a version of Puppeteer that doesn’t download any browser by default. To install this package, run:
npm i puppeteer-core
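Because puppeteer-core does not download a browser, you have to tell it where an installed Chrome or Chromium lives. The sketch below shows one way to do that with the executablePath launch option. The CHROME_PATH environment variable and the function names coreLaunchOptions and getTitleWithCore are assumptions made for this example, not part of puppeteer-core.

```javascript
// Sketch for puppeteer-core. No browser is downloaded, so the path to an
// already installed Chrome or Chromium is supplied at launch.
// CHROME_PATH is a hypothetical environment variable chosen for this example.
function coreLaunchOptions(env = process.env) {
  if (!env.CHROME_PATH) {
    throw new Error('Set CHROME_PATH to a Chrome or Chromium executable');
  }
  return { executablePath: env.CHROME_PATH };
}

// `puppeteerCore` is the module itself, e.g. require('puppeteer-core').
async function getTitleWithCore(puppeteerCore, url, env = process.env) {
  const browser = await puppeteerCore.launch(coreLaunchOptions(env));
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

With CHROME_PATH set to the browser's location, getTitleWithCore(require('puppeteer-core'), 'https://example.com') would then behave much like the equivalent puppeteer script.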
Difference between Puppeteer and Selenium
Puppeteer | Selenium |
Puppeteer was developed by Google and runs scripts against Chrome or Chromium | Selenium is an open-source project that automates browsers through the WebDriver protocol |
Is a Node.js library | Is a browser automation framework commonly used for testing web applications |
Primarily targets Chrome and Chromium; Firefox support was added in later releases | Supports multiple browsers such as Chrome, IE, Firefox, and Safari, with cross-platform support across them |
Was released in 2017 | Was released in 2004 |
Supports only Node.js | Supports multiple languages such as Python, Ruby, JavaScript, and Java |
Supports only web automation | Supports web automation; mobile automation is possible through the separate Appium project |
Can save pages as both images (screenshots) and PDFs | Can save pages as images; PDF output is available only from Selenium 4 |
The Tech Platform