As a web developer, you may have needed to generate a PDF file of a web page to share with your clients, use in presentations, or add as a new feature in your web app. Whatever your reason, Puppeteer, Google’s Node API for headless Chrome and Chromium, makes the task quite simple.
In this tutorial, we’ll see how to convert web pages into PDF with Puppeteer and Node.js. Let’s start with a quick introduction to what Puppeteer is.
What is Puppeteer, and why is it awesome?
In Google’s own words, Puppeteer is “a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.”
What’s a headless browser?
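A headless browser is a browser that runs without a visible user interface and is controlled programmatically. Even though Puppeteer runs the browser headless by default, you can configure it to drive a non-headless Chrome or Chromium.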
What can you do with Puppeteer?
Puppeteer’s powerful browser capabilities make it a perfect candidate for web app testing and web scraping.
To name a few use cases where Puppeteer provides the right functionality for web developers:
- Generate PDFs and screenshots of web pages
- Automate form submission
- Scrape web pages
- Perform automated UI tests while keeping the test environment up to date
- Generate pre-rendered content for Single Page Applications (SPAs)
Set up the project environment
You can use Puppeteer on the backend and frontend to generate PDFs. In this tutorial, we’re using a Node backend for the task.
Initialize NPM and set up the usual Express server to get started with the tutorial.
Make sure to install the Puppeteer NPM package before you start by running npm install puppeteer.
Convert web pages to PDF
Now we get to the exciting part of the tutorial. With Puppeteer, we only need a few lines of code to convert web pages into PDF.
First, create a browser instance using Puppeteer’s launch function.
Then, we create a new page instance and visit the given page URL using Puppeteer.
We’ve set the waitUntil option to networkidle0. With the networkidle0 option, Puppeteer considers navigation finished once there have been no network connections for at least 500 ms. It’s a way to determine whether the site has finished loading. It’s not exact, and Puppeteer offers other options, but it is one of the most reliable for most cases.
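The launch, new page, and navigation steps above could be sketched roughly as follows. This is a minimal illustration, not the article’s original code: the helper name openPage is hypothetical, and the optional second parameter exists only so the sketch can be tried with a stand-in object instead of a real browser.

```javascript
// Sketch only: `openPage` is a hypothetical helper, not part of Puppeteer.
// The default parameter loads the real `puppeteer` package when no
// replacement object is passed in.
async function openPage(url, puppeteer = require('puppeteer')) {
  // Start a headless Chrome/Chromium instance.
  const browser = await puppeteer.launch();

  // Open a new tab in that browser.
  const page = await browser.newPage();

  // Navigate and wait until there have been no network connections
  // for at least 500 ms before treating the page as loaded.
  await page.goto(url, { waitUntil: 'networkidle0' });

  // The caller is responsible for closing the browser when done.
  return { browser, page };
}
```

A caller would typically use it as `const { browser, page } = await openPage('https://example.com');` inside an async function, and call `browser.close()` afterwards.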
Finally, we create the PDF from the crawled page content and save it to our device.
The print-to-PDF function is quite complex and allows for a lot of customization, which is fantastic. Here are some of the options we used:
- printBackground: When this option is set to true, Puppeteer prints any background colors or images you have used on the web page to the PDF.
- path: Path specifies where to save the generated PDF file. You can also omit it and keep the PDF in memory to avoid writing to disk.
- format: You can set the PDF format to one of the given options: Letter, A4, A3, A2, etc.
- margin: You can specify a margin for the generated PDF with this option.
When the PDF creation is over, close the browser connection with browser.close().
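Putting the steps together, a complete conversion might look like the sketch below. The helper name savePageAsPdf, the A4 format, and the 20px margins are assumptions chosen for illustration; only the option names (path, format, printBackground, margin) come from the text above.

```javascript
// Sketch only: `savePageAsPdf` is a hypothetical helper name.
// The default parameter loads the real `puppeteer` package when no
// replacement object is passed in.
async function savePageAsPdf(url, outputPath, puppeteer = require('puppeteer')) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });

    // Write the PDF to disk. Margin values here are illustrative.
    await page.pdf({
      path: outputPath,
      format: 'A4',
      printBackground: true,
      margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' },
    });
  } finally {
    // Close the browser even if navigation or PDF generation failed.
    await browser.close();
  }
}
```

Wrapping the work in try/finally is a design choice of this sketch: it keeps a failed conversion from leaving a Chrome process running.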
Build an API to generate and serve PDFs from URLs
With the knowledge we’ve gathered so far, we can now create a new endpoint that receives a URL as a query string and streams the generated PDF back to the client.
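One way such an endpoint could be written is sketched below. It is an illustration under assumptions rather than the original code: the helper names renderPdf and createPdfHandler, the /pdf route, port 3000, and the error messages are all hypothetical. The only detail taken from the text is the target query parameter. Because page.pdf() is called without a path option, the PDF stays in memory.

```javascript
// Sketch only: helper names, route, and port are illustrative assumptions.

// Renders `url` and resolves with the PDF bytes, without writing to disk.
async function renderPdf(url, puppeteer = require('puppeteer')) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    // No `path` option: the PDF is returned in memory.
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    await browser.close();
  }
}

// Builds an Express-style (req, res) handler around a render function.
function createPdfHandler(render = renderPdf) {
  return async (req, res) => {
    const target = req.query.target;

    // Accept only absolute http(s) URLs.
    if (typeof target !== 'string' || !/^https?:\/\//i.test(target)) {
      res.status(400).send('Provide an http(s) URL in the "target" query parameter.');
      return;
    }

    try {
      const pdf = await render(target);
      res.set('Content-Type', 'application/pdf');
      // Normalize to a Buffer so the bytes are sent as binary data.
      res.send(Buffer.from(pdf));
    } catch (err) {
      res.status(500).send('Could not generate the PDF.');
    }
  };
}

// Wiring it into an Express app (requires `npm install express`):
// const express = require('express');
// const app = express();
// app.get('/pdf', createPdfHandler());
// app.listen(3000);
```

This sketch launches a new browser for every request, which is simple but slow. A busier service would probably reuse one browser instance across requests.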
If you start the server and visit the new endpoint with a target query param containing the URL we want to convert, the server will serve the generated PDF directly without ever storing it on disk.
The browser then displays the generated PDF of that page.
That’s it! You have completed the conversion of a web page to PDF. Wasn’t that easy?
As mentioned, Puppeteer offers many customization options, so make sure to play around with them to get different results.
Next, we can change the viewport size to capture websites under different resolutions.
Capture websites with different viewports
In the previously created PDF, we didn’t specify the viewport size for the web page Puppeteer visits, so it used the default viewport size of 800×600 px.
However, we can precisely set the page’s viewport size before crawling the page.
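Setting the viewport before navigation could be sketched like this. The helper name savePdfWithViewport and the 1440×900 size in the usage note are assumptions for illustration. Keep in mind that page.pdf() renders with print CSS media, so how much the viewport changes the resulting PDF depends on the page.

```javascript
// Sketch only: `savePdfWithViewport` is a hypothetical helper name.
// `viewport` is an object such as { width, height, deviceScaleFactor }.
async function savePdfWithViewport(url, outputPath, viewport, puppeteer = require('puppeteer')) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Set the viewport before navigating so the page loads at this size.
    await page.setViewport(viewport);

    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf({ path: outputPath, printBackground: true });
  } finally {
    await browser.close();
  }
}
```

For example, `await savePdfWithViewport('https://example.com', 'page.pdf', { width: 1440, height: 900, deviceScaleFactor: 2 });` would load the page in a 1440×900 viewport at double pixel density.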
In today’s tutorial, we used Puppeteer, a Node API for headless Chrome, to generate a PDF of a given web page. Since you are now familiar with the basics of Puppeteer, you can use this knowledge in the future to create PDFs or even for other purposes like web scraping and UI testing.
This article was originally published on Live Code Stream by Juan Cruz Martinez (twitter: @bajcmartinez), founder and publisher of Live Code Stream, entrepreneur, developer, author, speaker, and doer of things.