Intermittent Headless Timeout Error on Non-Local Environments #491

mamuchastegui · 2024-07-26T18:32:15Z

Describe the bug
Scraping in headless mode fails with this error:
"No HTML body content found, please try setting the 'headless' flag to False in the graph configuration. HTML content: Error: Page.goto: Timeout 30000ms exceeded. Call log: navigating to 'https://clarin.com/', waiting until 'load'."

This only occurs in non-local environments, and it is intermittent: it happened about a month ago and has now occurred again. The error appears when headless mode is enabled, and it seems to have been introduced by the latest changes.

To Reproduce
Domain: "https://clarin.com/"
Prompt: "List me 3 product images with their title, image_url and url."

import asyncio

from scrapegraphai.graphs import SmartScraperGraph

from app.infrastructure.config.env_config import env
from app.infrastructure.logging import logger

graph_config = {
    "llm": {
        "api_key": env.OPENAI_SCRAPEGRAPH_API_KEY,
        "model": "gpt-4o",
    },
    "headless": True,
    "verbose": False,
    "timeout": 60000,
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36",
}


async def scrapegraph_products(domain_url: str, prompt: str):
    logger.info(f"scrapegraph_products - domain: {domain_url}, prompt: {prompt}")
    scrape_graph = SmartScraperGraph(
        prompt=prompt,
        source=domain_url,
        config=graph_config,
    )

    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(None, scrape_graph.run)

    logger.info(f"scrapegraph_products result: {result}")
    return result

Expected results

{'products': [{'title': 'La ceremonia de apertura de los Juegos Olímpicos 2024 en las mejores fotos', 'image_url': 'https://www.clarin.com/img/2024/07/26/KIQdkZVmE_290x290__1.jpg', 'url': 'https://clarin.com/fotogalerias/ceremonia-apertura-juegos-olimpicos-2024-mejores-fotos-paris-2024-jjoo-2024-argentina_5_4CHQW7XKlw.html'}, {'title': 'Santiago Lange y un emotivo reconocimiento previo a la ceremonia de apertura de los Juegos Olímpicos: fue relevo de la antorcha de atletas legendarios', 'image_url': 'https://www.clarin.com/img/2024/07/26/rn-hNvDc4_600x290__2.jpg#1722010337269', 'url': 'https://www.clarin.com/deportes/santiago-lange-emotivo-reconocimiento-previo-ceremonia-apertura-juegos-olimpicos-relevo-antorcha-atletas-legendarios_0_FWiwtIz4xh.html'}, {'title': 'Francis Ford Coppola en problemas: un video besando a extras que estaban en topless de su filme Megalópolis', 'image_url': 'https://www.clarin.com/img/2024/07/26/AcUPKcQmm_600x290__1.jpg', 'url': 'https://www.clarin.com/espectaculos/francis-ford-coppola-problemas-video-besando-extras-topless-filme-megalopolis_0_EJSTNUX8bl.html'}]}

Additional context
Docker base image: python:3.10-slim
Deployment target: Google App Engine
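
Possible stopgap
Because the failure is intermittent, retrying the blocking run call with a short backoff may hide it until the root cause is found. The sketch below is only that, a sketch: run_with_retries is a hypothetical helper, not part of scrapegraphai, and it assumes the timeout surfaces as a raised exception rather than as an error string in the returned result.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task() up to `attempts` times, doubling the delay between tries.

    Hypothetical helper: it assumes the intermittent timeout is raised as an
    exception by the callable that is passed in.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:  # broad on purpose: the exact error type is unknown
            last_error = exc
            if attempt < attempts:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))

    # Every attempt failed; surface the most recent error to the caller.
    assert last_error is not None
    raise last_error
```

In the reproduction code above, the executor call would then become `await loop.run_in_executor(None, run_with_retries, scrape_graph.run)`. Note that each retry repeats the full scrape, including the LLM call, if the failure happens late in the run.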
