The "Navigation failed because browser has disconnected" error usually means that the Node script that launched Puppeteer exits before the Puppeteer actions have completed. So yes, as you suspected, it's a problem with what you are (not) awaiting.
About your script, I made some changes to make it work:
- First of all, you're not awaiting the (async) `stepThru` function, so change `stepThru();` to `await stepThru();`, and

```js
puppeteer.launch({devtools:false}).then(function(browser){
```

to

```js
puppeteer.launch({devtools:false}).then(async function(browser){
```

(note the added `async`).
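To see why the missing `await` disconnects the browser, here is a minimal, Puppeteer-free sketch: `delay` stands in for the page work (`page.goto`, `page.pdf`, ...), and the order of the log entries shows whether the handler actually waited. All names here are illustrative, not part of your script.

```js
// Sketch of the fire-and-forget bug, with setTimeout standing in for
// Puppeteer work. Names (delay, stepThru, ...) are illustrative.
function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function stepThru(log) {
  await delay(50);            // pretend this is load_page() for each paper
  log.push('stepThru done');
}

async function withoutAwait(log) {
  stepThru(log);              // BUG: handler returns before stepThru finishes
  log.push('handler done');
}

async function withAwait(log) {
  await stepThru(log);        // FIX: handler waits for the async work
  log.push('handler done');
}

(async () => {
  const a = [];
  await withoutAwait(a);
  console.log(a);             // ['handler done'], the work is still pending

  const b = [];
  await withAwait(b);
  console.log(b);             // ['stepThru done', 'handler done']
})();
```

In the real script, the "handler done" moment is when your `.then()` callback returns and the process tears the browser down, which is exactly when any still-pending `page.goto`/`page.pdf` fails with the disconnect error.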
- I changed the way you manage the `goto` and `page.once` promises. The PDF promise is now:
```js
new Promise(async function(resolve, reject){
  // create the PDF on the first console message
  page.once("console", async () => {
    await page.pdf({
      path: paper + '.pdf',
      printBackground: true,
      width: '1024px',
      height: '768px',
      margin: {
        top: "0px",
        right: "0px",
        bottom: "0px",
        left: "0px"
      }
    });
    resolve();
  });
})
```
and it has a single responsibility, just the PDF creation.
- Then I managed both the `page.goto` and the PDF promises with a `Promise.all`:
```js
await Promise.all([
  page.goto(url, {"waitUntil": ["load", "networkidle2"]}),
  new Promise(async function(resolve, reject){
    // ... PDF creation as above
  })
]);
```
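The point of the `Promise.all` is that it settles only after both tasks are done, whichever finishes first. A small Puppeteer-free sketch of that ordering (`task` is an illustrative stand-in, not part of your script):

```js
// Sketch: Promise.all resolves only after BOTH tasks are done, just like
// pairing page.goto with the PDF promise before closing the page.
function task(ms, label, log) {
  return new Promise(resolve => setTimeout(() => {
    log.push(label);
    resolve(label);
  }, ms));
}

(async () => {
  const log = [];
  await Promise.all([
    task(30, 'goto finished', log),  // stand-in for page.goto(...)
    task(10, 'pdf finished', log),   // stand-in for the PDF promise
  ]);
  log.push('both done, safe to close the page');
  console.log(log);
  // ['pdf finished', 'goto finished', 'both done, safe to close the page']
})();
```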
- I moved the `page.close` after the `Promise.all`:
```js
await Promise.all([
  // page.goto
  // PDF creation
]);
await page.close();
resolve();
```
And now it works. Here is the full working script:
```js
const puppeteer = require('puppeteer');

// a list of sites to capture
const papers = {
  nytimes: "https://www.nytimes.com/",
  wapo: "https://www.washingtonpost.com/"
};

// launch puppeteer, do everything in the .then() handler
puppeteer.launch({devtools: false}).then(async function(browser){
  // load_page returns a promise which resolves when the PDF is taken
  async function load_page(paper){
    const url = papers[paper];
    return new Promise(async function(resolve, reject){
      const page = await browser.newPage();
      await page.setViewport({width: 1024, height: 768});
      await Promise.all([
        page.goto(url, {"waitUntil": ["load", "networkidle2"]}),
        new Promise(async function(resolve, reject){
          // create the PDF on the first console message
          page.once("console", async () => {
            await page.pdf({
              path: paper + '.pdf',
              printBackground: true,
              width: '1024px',
              height: '768px',
              margin: {top: "0px", right: "0px", bottom: "0px", left: "0px"}
            });
            resolve();
          });
        })
      ]);
      await page.close();
      resolve();
    });
  }

  // step through the list of papers, calling load_page() for each
  async function stepThru(){
    for (const p in papers){
      if (papers.hasOwnProperty(p)){
        // wait to load the page and create the PDF before loading the next one
        await load_page(p);
      }
    }
    await browser.close();
  }

  await stepThru();
});
```
Please note that:

- I changed `networkidle0` to `networkidle2` because the nytimes.com website takes a very long time to reach a state with 0 network requests (because of the ads etc.). You can obviously still wait for `networkidle0`; it's up to you, and out of the scope of your question (increase the `page.goto` timeout in that case).
- The `www.washingtonpost.com` site ends in a `TOO_MANY_REDIRECTS` error, so I changed it to `washingtonpost.com`, but I think you should investigate that further. To test the script I used the `nytimes` site several times, along with other websites. Again: it's out of the scope of your question.
Let me know if you need some more help 😉