[feature] populate the restock & price column with xpath filtered result #2707

Open
starfishbzdf opened this issue Oct 14, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@starfishbzdf

starfishbzdf commented Oct 14, 2024

Version and OS
v0.46.04 on docker

Is your feature request related to a problem? Please describe.
when re-stock & price auto-detection fails, i fall back to an xpath filter that points straight at the price. this works as expected, but the price does not show up in the restock & price column on the home page.

Describe the solution you'd like
maybe a checkbox next to the xpath filter declaring that it yields the price of a single product, so the value can be displayed in the column (even without stock information, that's fine).
alternatively, when auto-detection fails, have somewhere to manually enter the xpath for the price.

Describe the use-case and give concrete real-world examples
alright here's a single-product page of a Fortigate 40F firewall that's not being picked up by the automatic re-stock & price detection feature:
https://baypo.co.il/product/FG-40F/
the error says:

Unable to extract restock data for this page unfortunately. (Got code 200 from server), no embedded stock information was found and nothing interesting in the text, try using this watch with Chrome.

which probably falls on that site implementing their store badly. happens.
so i highlight the price and right click to inspect, copy the xpath:

/html/body/div[3]/section[1]/div/div[2]/div/div[5]/div/span

put it in the CSS/JSONPath/JQ/XPath Filters text box and voila, simply follows the changes in price and nothing else.
(you know how great your project is but it doesn't hurt to praise it)

now it would be nice if it could show up in the column like the rest of the products i follow

@starfishbzdf starfishbzdf added the enhancement New feature or request label Oct 14, 2024
@mechanarchy

Agree, this feature would be an amazing addition. I have quite a few entries that aren't correctly parsed by the restock/price detector, but a manual XPath rule can pull it out and format it however I like.

Having textboxes to enter "XPath for in-stock" and "XPath for price" would be ideal, but at minimum, if we could optionally run the stock/price parsing after standard change detection and filters, that would solve the issue.

Example

Page url: https://www.bunnings.com.au/glitz-5l-citrus-dishwashing-liquid_p4465539
My XPath filters:

xpath:concat(//meta[@property="og:title"]/@content, "<br>")
xpath:concat(//p[@data-locator="product-price"], "<br>")
xpath://p[@data-locator="product-price-comparison"]/concat(., " ", ../*[2])

Rendered result:

Glitz 5L Citrus Dishwashing Liquid
$13.12
$2.62 per litre

Doesn't show the stock level, obviously, but the pricing can certainly be pulled out easily. So a tick-box to run the price check on the filtered output would solve this issue.

@denilsonsa

By looking at processors/restock_diff/processor.py, we can see it detects prices by trying to read data from one of the formats supported by extruct.
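
For reference, JSON-LD is one of the formats extruct can read, and shops that are detected correctly typically embed something like a schema.org Product there. Below is a minimal sketch of such an object. The product name, price and currency are made-up placeholders, and exactly which keys the processor inspects is an assumption here, not something confirmed from the code.

```javascript
// Hypothetical schema.org Product object; every value is a placeholder.
const product = {
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example product",
  "offers": {
    "@type": "Offer",
    "price": "13.12",
    "priceCurrency": "AUD",
    "availability": "https://schema.org/InStock",
  },
};

// A page would carry this text inside <script type="application/ld+json">.
const jsonLd = JSON.stringify(product, null, 2);
console.log(jsonLd);
```

Pages that lack this kind of block are the ones where detection tends to come up empty.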

Thus, we can work around this limitation by injecting our own JavaScript code that gets executed before the processor runs.

  1. Edit the item that cannot yet detect the correct price.
  2. At the "General" tab, choose "Re-stock & Price detection for single product pages".
  3. At the "Request" tab, choose "Playwright Chromium/JavaScript".
  4. At the "Request" tab, click on the "Show advanced options" button.
  5. At the "Execute JavaScript before change detection" box, add your custom script.

For instance, this one works for a major online site:

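// Build a JSON-LD <script> element so the processor (via extruct) has a price to find.
var s = document.createElement('script');
s.type = 'application/ld+json';
s.textContent = JSON.stringify({
  "matches": Array.from(document.querySelectorAll('#apex_desktop .a-price.priceToPay'))
    // Skip price blocks without a whole part; otherwise the price would read "undefined.00".
    .filter(el => el.querySelector('.a-price-whole'))
    .map(el => {
      let curr = el.querySelector('.a-price-symbol');
      let whole = el.querySelector('.a-price-whole');
      let frac = el.querySelector('.a-price-fraction');
      return {
        "currency": curr?.textContent.trim(),
        // Drop thousands separators and the trailing decimal mark, then append the fraction.
        "price": whole.textContent.trim().replace(/[^0-9]/g, '') + '.' + (frac?.textContent.trim() || '00'),
      };
    }),
});
document.body.appendChild(s);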

I'm pretty sure someone can come up with simpler code.
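
One possibly simpler variant is to read a single element by XPath and wrap its text in JSON-LD. This is only a sketch: `PRICE_XPATH` is the example XPath from the first comment, `toJsonLd` and `injectPrice` are hypothetical helper names, the number clean-up assumes "." is the decimal separator, and whether the processor accepts this exact object shape has not been verified.

```javascript
// Example XPath taken from the first comment in this thread; adjust per site.
const PRICE_XPATH = '/html/body/div[3]/section[1]/div/div[2]/div/div[5]/div/span';

// Hypothetical helper: keep only digits and dots (assumes "." is the decimal mark).
function toJsonLd(rawText) {
  const price = rawText.replace(/[^0-9.]/g, '');
  return JSON.stringify({ "@context": "https://schema.org", "@type": "Offer", "price": price });
}

// Hypothetical helper: find the node, then append a JSON-LD script element.
function injectPrice(doc, xpath) {
  // 9 is the value of XPathResult.FIRST_ORDERED_NODE_TYPE.
  const node = doc.evaluate(xpath, doc, null, 9, null).singleNodeValue;
  if (!node) return null; // nothing matched, leave the page untouched
  const s = doc.createElement('script');
  s.type = 'application/ld+json';
  s.textContent = toJsonLd(node.textContent);
  doc.body.appendChild(s);
  return s;
}

// In the "Execute JavaScript before change detection" box, run it on the live page.
if (typeof document !== 'undefined') injectPrice(document, PRICE_XPATH);
```

Passing the document in as a parameter keeps the helper easy to try outside a browser with a stub object.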


Still, it would be much easier if we could nudge the processor into looking at the right elements, for example by telling it to read the text content of elements matching a CSS selector or XPath. That, however, also means the processor needs to understand multiple locales: it must separate the currency from the number and parse the number correctly regardless of the decimal separator.
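
To give an idea of the parsing problem, here is a rough heuristic sketch, not what the processor does today. `parsePrice` is a hypothetical name, and the rule "a separator followed by one or two trailing digits is the decimal mark" is an assumption that misreads three-decimal currencies.

```javascript
// Heuristic, locale-tolerant price parser (sketch only, not exhaustive).
function parsePrice(text) {
  // Grab the first run of digits with separators, spaces or apostrophes.
  const m = text.match(/\d[\d.,\s\u00a0']*/);
  if (!m) return null;
  // Remove grouping spaces/apostrophes and any trailing separator.
  const digits = m[0].replace(/[\s\u00a0']/g, '').replace(/[.,]+$/, '');
  const lastSep = Math.max(digits.lastIndexOf('.'), digits.lastIndexOf(','));
  let whole = digits;
  let frac = '';
  // Assumption: a separator followed by 1 or 2 final digits is the decimal mark.
  if (lastSep !== -1 && digits.length - lastSep - 1 <= 2) {
    whole = digits.slice(0, lastSep);
    frac = digits.slice(lastSep + 1);
  }
  whole = whole.replace(/[.,]/g, ''); // remaining separators are thousands groups
  // Whatever surrounds the number is treated as the currency marker.
  const currency = text.replace(m[0], '').trim() || null;
  return { currency, price: Number(whole + '.' + (frac || '0')) };
}

console.log(parsePrice('$1,299.00'));
console.log(parsePrice('1.299,00 €'));
```

Both calls are meant to yield the same numeric price; an input like "1.234" stays ambiguous and is read as one thousand two hundred thirty-four here.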
