SmartScraperGraph only extracts a small part of items requested #710

Open
sillasgonzaga opened this issue Sep 29, 2024 · 4 comments
@sillasgonzaga

Describe the bug
It's not quite an error, but I am trying to scrape this AliExpress search page, which lists 60 products on the first page, and the result only contains data for 10 of them. This is probably due to how the web page is loaded. Is there a parameter I could use to increase the wait time before the page source is extracted?

To Reproduce

from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "api_key": "MY_KEY",
        "model": "openai/gpt-4o-mini",
    },
    "library": "selenium",
    "verbose": False,
    "headless": True
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Return the data about the products listed, including product id and product name",
    source="https://pt.aliexpress.com/w/wholesale-TECIDO-PAET%C3%8A-ROSA.html",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)
@VinciGit00
Collaborator

@sillasgonzaga
Author

@VinciGit00 thanks, but sadly it did not work; it kept returning just 10 results.

@VinciGit00
Collaborator

Have you tried adding a config like this?
graph_config = {
    "llm": {
        "api_key": openai_key,
        "model": "openai/gpt-4o",
    },
    "verbose": True,
    "headless": False,
}
headless should be False.

@djds4rce

djds4rce commented Oct 1, 2024

Tried with headless set to False too. Same behaviour.
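
If the missing products are lazy-loaded as the page is scrolled (an assumption, in line with the original guess that this is "due to how the web page is loaded"), a longer fixed wait would not help. One possible workaround is to scroll until the number of product elements stops growing before the page source is captured. The sketch below is not part of ScrapeGraphAI: `scroll_until_stable` and its parameters are hypothetical names, and the CSS selector in the commented wiring is a placeholder, not taken from the AliExpress page.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_count: Callable[[], int],
    scroll_once: Callable[[], None],
    max_rounds: int = 20,
    pause_s: float = 1.0,
    stable_rounds: int = 2,
) -> int:
    """Scroll until the item count stops growing; return the final count.

    get_count   -- returns how many items are currently in the page
    scroll_once -- performs one scroll step (e.g. jump to the page bottom)
    """
    last = get_count()
    unchanged = 0
    for _ in range(max_rounds):
        scroll_once()
        time.sleep(pause_s)  # give the page time to load the next batch
        current = get_count()
        if current > last:
            last = current
            unchanged = 0
        else:
            unchanged += 1
            # Stop once several consecutive scrolls bring nothing new.
            if unchanged >= stable_rounds:
                break
    return last


# Hypothetical wiring with Selenium (selector is a placeholder):
#
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#
#   driver = webdriver.Chrome()
#   driver.get("https://pt.aliexpress.com/w/wholesale-TECIDO-PAET%C3%8A-ROSA.html")
#   scroll_until_stable(
#       get_count=lambda: len(driver.find_elements(By.CSS_SELECTOR, "a.product-card")),
#       scroll_once=lambda: driver.execute_script(
#           "window.scrollTo(0, document.body.scrollHeight);"
#       ),
#   )
#   html = driver.page_source
```

The fully loaded `html` could then be handed to the extraction step in place of the URL. Whether `SmartScraperGraph` accepts raw HTML as its `source` should be checked against the library documentation before relying on it.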
