Description
I'm using playwright to save PDFs of URLs from RSS feeds. Some of the URLs are actually links to PDFs (mixed with links to "normal" webpages), and I'd like to handle that by downloading the PDFs. I have an implementation which works, and I've included a minimal(ish) version below which accepts a URL and a filename to write it to. You can try for example (where script.py contains the code below):
Convert a web page to PDF: python3 script.py https://arxiv.org/abs/1912.11035 a.pdf
Download a PDF: python3 script.py https://arxiv.org/pdf/1912.11035 b.pdf
The first example (converting a webpage to a PDF) outputs this:
goto success: https://arxiv.org/abs/1912.11035
download exception: https://arxiv.org/abs/1912.11035: Page closed
Future exception was never retrieved
future: <Future finished exception=Error('Target page, context or browser has been closed')>
playwright._impl._api_types.Error: Target page, context or browser has been closed
I can't figure out how to stop the "Future exception was never retrieved" warning being printed. As you can see, the "Page closed" exception has been caught in the exception handler for await download_task.
Am I doing something wrong? Or is this an issue in the playwright code?
import sys
import asyncio
from playwright.async_api import async_playwright

async def download(url, filename):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context(accept_downloads=True, java_script_enabled=False)
        page = await context.new_page()
        # Initialised up front so the final check cannot raise NameError
        # when both attempts fail.
        success = False
        # Start waiting for a download before navigating, in case the URL is a PDF.
        download_task = asyncio.create_task(page.wait_for_event('download'))
        goto_task = asyncio.create_task(page.goto(url, wait_until='networkidle'))
        # Attempt 1: treat the URL as a normal web page and render it to PDF.
        try:
            await goto_task
            await page.pdf(path=filename)
            print('goto success: ' + url)
            await page.close()
            success = True
        except Exception as e:
            print('goto exception: {}: {}'.format(url, e))
        # Attempt 2: treat the URL as a direct PDF link and save the download.
        # This runs even when attempt 1 succeeded, in which case the page is
        # already closed and awaiting download_task raises "Page closed".
        try:
            download = await download_task
            await download.save_as(filename)
            print('download success: ' + url)
            await page.close()
            success = True
        except Exception as e:
            print('download exception: {}: {}'.format(url, e))
        if not success:
            await page.close()

if __name__ == '__main__':
    asyncio.run(download(sys.argv[1], sys.argv[2]))
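
For context on the warning itself, here is a minimal sketch in plain asyncio (no playwright involved) of when "Future exception was never retrieved" is reported. It assumes CPython, where a future is finalised as soon as its last reference is dropped. The names demo, orphan and handled are made up for illustration, and the RuntimeError stands in for the real playwright error. asyncio reports the warning when a future that holds an exception is destroyed without anyone having awaited it or called result() or exception() on it. Since the script above does await download_task inside a try block, the future in the log may be a different one, possibly created inside playwright, but that is only a guess and this sketch does not show where it comes from.

```python
import asyncio
import gc


async def demo():
    loop = asyncio.get_running_loop()
    messages = []
    # Record what asyncio would otherwise log through its default handler.
    loop.set_exception_handler(lambda _loop, ctx: messages.append(ctx["message"]))

    # Case 1: a future fails and nobody ever looks at the exception.
    orphan = loop.create_future()
    orphan.set_exception(RuntimeError("page closed"))  # stand-in error
    del orphan    # last reference dropped, the finaliser reports the warning
    gc.collect()  # extra nudge in case finalisation is deferred

    # Case 2: the same failure, but the exception is retrieved once.
    handled = loop.create_future()
    handled.set_exception(RuntimeError("page closed"))
    handled.exception()  # marks the exception as retrieved
    del handled
    gc.collect()

    return messages


if __name__ == "__main__":
    for message in asyncio.run(demo()):
        print(message)
```

Only the first future triggers the report. Retrieving the exception in the second case is enough to keep asyncio quiet, which is the same thing that awaiting a task inside try/except does.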