Fix Various Issues and Improve Scrapper Script #162

hoz-efa · 2024-06-13T19:37:43Z

Pull Request Summary

New Updates (Commits on Jun 15, 2024):

Error Handling Improvements

Improved Error Handling: Added comprehensive error handling to the enrol function so the script keeps processing even if an error occurs during execution. Each critical operation is now wrapped in a try-except block, which lets the script skip a problematic iteration and proceed with the next one.
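The per-iteration pattern described above can be sketched as follows. This is a minimal illustration, not the code from the PR: enrol_all and the per-course callable enrol_one are hypothetical names standing in for the real enrolment steps in the enrol function.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("enrol-sketch")


def enrol_all(courses, enrol_one):
    """Try to enrol in every course; a failure on one course never stops the loop.

    `enrol_one` is a hypothetical per-course callable standing in for the real
    enrolment steps. Returns (enrolled, failed) lists of course links.
    """
    enrolled, failed = [], []
    for link in courses:
        try:
            enrol_one(link)
        except Exception as exc:  # broad on purpose: skip this course, keep going
            log.warning("Skipping %s: %s", link, exc)
            failed.append(link)
            continue
        enrolled.append(link)
    return enrolled, failed


if __name__ == "__main__":
    def fake_enrol(link):
        # Hypothetical stand-in for the real enrolment steps; fails for one course.
        if "broken" in link:
            raise ValueError("checkout page changed")

    ok, bad = enrol_all(
        ["https://www.udemy.com/course/a/", "https://www.udemy.com/course/broken/"],
        fake_enrol,
    )
    print("enrolled:", ok)
    print("skipped:", bad)
```

Catching the exception inside the loop, rather than around it, is what allows the remaining courses to be processed after a failure.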

Known Issues

Language Exclusion: The script still subscribes to free courses in languages that are set to be excluded in the settings. For example, it subscribes to Arabic courses even though Arabic is set to false in duce-cli-settings.json. This issue will be addressed in future updates.
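For reference, a language check of the kind this issue calls for might look like the sketch below. It assumes, hypothetically, that duce-cli-settings.json stores a "languages" mapping from language name to true/false; the real file layout and the point where the check should run in the script may differ.

```python
import json


def is_language_allowed(course_language, settings):
    """Return True only if the course's language is enabled in the settings.

    Assumes a hypothetical layout such as
    {"languages": {"English": true, "Arabic": false}}.
    Languages missing from the mapping are treated as excluded.
    """
    languages = settings.get("languages", {})
    return bool(languages.get(course_language, False))


if __name__ == "__main__":
    settings = json.loads('{"languages": {"English": true, "Arabic": false}}')
    for lang in ("English", "Arabic", "Klingon"):
        print(lang, is_language_allowed(lang, settings))
```

Treating unknown languages as excluded errs on the side of not enrolling; the opposite default is equally easy if that is the intended behavior.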

Previous Changes (Commits on Jun 15, 2024):

Improve Link Extraction and Error Handling in Scrapers

Link Extraction:
- Improved logic to handle special cases for click.linksynergy.com URLs, ensuring all valid links are captured by checking both murl= and RD_PARM1 parameters.
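A minimal sketch of unwrapping such affiliate links with the standard library is shown below. The function name extract_linksynergy_target is hypothetical; it checks murl first and falls back to RD_PARM1, decoding repeatedly because some links carry a double-encoded target.

```python
from urllib.parse import parse_qs, unquote, urlparse


def extract_linksynergy_target(url):
    """Return the destination URL wrapped in a click.linksynergy.com link.

    Checks the `murl` parameter first, then `RD_PARM1`. Returns the URL
    unchanged if it is not a linksynergy link or neither parameter is present.
    """
    parsed = urlparse(url)
    if parsed.netloc != "click.linksynergy.com":
        return url
    params = parse_qs(parsed.query)  # parse_qs already percent-decodes once
    for key in ("murl", "RD_PARM1"):
        if key in params:
            target = params[key][0]
            # Some links are percent-encoded twice; decode until nothing changes.
            while True:
                decoded = unquote(target)
                if decoded == target:
                    break
                target = decoded
            return target
    return url


if __name__ == "__main__":
    link = ("https://click.linksynergy.com/deeplink?id=abc&mid=39197"
            "&murl=https%3A%2F%2Fwww.udemy.com%2Fcourse%2Fpython%2F%3FcouponCode%3DFREE")
    print(extract_linksynergy_target(link))
```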
Nonce Extraction and Processing:
- Corrected how the cv function extracts the JSON string containing the nonce from the script tag, and updated the processing to isolate and parse that JSON correctly, so the AJAX requests that fetch course data succeed.
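One way to isolate and parse that JSON is sketched below. It assumes, hypothetically, a WordPress-style inline script of the form var ajax_object = {"ajax_url": "...", "nonce": "..."}; the variable name and the exact markup on the scraped site are not taken from the PR.

```python
import json
import re


def extract_nonce(script_text):
    """Pull the `nonce` out of an inline script that assigns a JSON object.

    Assumes a hypothetical snippet such as
        var ajax_object = {"ajax_url": "...", "nonce": "..."};
    Returns None if no JSON object or no nonce is found.
    """
    # Capture the object literal between "=" and the closing "};".
    match = re.search(r"=\s*(\{.*?\})\s*;", script_text, re.S)
    if not match:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    return data.get("nonce")


if __name__ == "__main__":
    sample = ('var ajax_object = {"ajax_url":"https:\\/\\/example.com\\/wp-admin'
              '\\/admin-ajax.php","nonce":"1a2b3c"};')
    print(extract_nonce(sample))
```

Parsing with json.loads instead of slicing the string by hand also takes care of escaped characters such as the \/ sequences WordPress emits.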
Error Handling:
- Enhanced error handling across all scraper functions so the script continues with the remaining items when an error occurs: network requests are retried, and cases where required page elements are missing are handled instead of stopping the run.
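A bounded retry helper along these lines is sketched below. fetch_with_retries and its defaults are hypothetical; fetch can be any callable such as a session's get method. Capping the number of attempts means a persistently failing site is eventually skipped rather than retried indefinitely.

```python
import time


def fetch_with_retries(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), retrying on any exception up to `retries` extra times.

    The wait doubles after each failure (simple exponential backoff).
    Returns None once all attempts are used, so the caller can skip the item.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == retries:
                print(f"Giving up on {url}: {exc}")
                return None
            time.sleep(delay * (2 ** attempt))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky(url):
        # Fails twice, then succeeds, to exercise the retry path.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("temporary failure")
        return f"content of {url}"

    print(fetch_with_retries(flaky, "https://example.com", delay=0.1))
```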
Progress Tracking:
- Refined progress tracking within each scraper function to provide accurate updates on the scraping process.
Threading:
- Utilized threading to parallelize scraping tasks, ensuring efficient processing of multiple sites.
Data Aggregation:
- Improved the aggregation of scraped data into a unified list, maintaining consistency in the format and structure of the results.
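The threading, progress tracking and aggregation steps above can be sketched together with the standard library. This is an illustration under assumptions: run_scrapers and the per-site callables are hypothetical, and a lock-protected dictionary of per-site counts stands in for the tqdm bars the script actually uses.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_scrapers(scrapers):
    """Run each site scraper in its own thread and merge the results.

    `scrapers` maps a site name to a hypothetical callable returning a list
    of items. A failing scraper contributes nothing instead of aborting the
    others. Results are merged in the order of `scrapers`, so the unified
    list has a stable order regardless of which thread finishes first.
    """
    progress = {}
    lock = threading.Lock()

    def run_one(site, scrape):
        try:
            items = scrape()
        except Exception as exc:
            print(f"{site}: failed ({exc})")
            items = []
        with lock:  # progress is shared between worker threads
            progress[site] = len(items)
        return items

    results = []
    with ThreadPoolExecutor(max_workers=len(scrapers) or 1) as pool:
        futures = {site: pool.submit(run_one, site, fn) for site, fn in scrapers.items()}
        for site in scrapers:
            results.extend(futures[site].result())
    return results, progress


if __name__ == "__main__":
    sites = {
        "Site A": lambda: [("Course 1", "https://example.com/1")],
        "Site B": lambda: [("Course 2", "https://example.com/2")],
        "Site C": lambda: 1 / 0,  # simulates a broken scraper
    }
    courses, counts = run_scrapers(sites)
    print(courses)
    print(counts)
```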

These changes collectively improve the overall reliability, efficiency, and functionality of the script.

Previous Changes (Commits on Jun 10, 2024):

Fixed Cloudscraper Session Error:
- Resolved an error raised when creating the scraper session by passing the existing session s to cloudscraper.create_scraper(sess=s).
Repaired Scrapers for Multiple Sites:
- Fixed the scrapers for Disudemy, Coursevania, and iDownloadCoupon. Previously, the script was unable to start due to issues with these scrapers.
Corrected e-Next API Link:
- Updated the script with the correct link for the e-Next API, ensuring proper API interaction.

The script is now functioning correctly.

However, there are a couple of minor issues that need attention:

- The script sometimes gets stuck retrying repeatedly. I have attached a screenshot of this behavior for reference.
- The tqdm progress bar is slightly glitchy and repeats the name of the website. The backend operations still work correctly, so this only affects how the progress bar is displayed.

Note: I have only tested these changes with the CLI version and have not verified them with the GUI version.

I am not familiar with the GUI version and have not run it, since I prefer the CLI version. For PySimpleGUI, you can register as a "Hobbyist" to get a developer key that you can use for a year; check it out here.

These are the versions of the required libraries in my environment:

bs4                       0.0.2
cloudscraper              1.2.71
colorama                  0.4.6
html5lib                  1.1
requests                  2.31.0
requests-file             2.0.0
requests-toolbelt         1.0.0
tqdm                      4.66.4

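You can compare your versions with the ones above by running this command in PowerShell (/I makes the match case-insensitive so pyOpenSSL is found, and the "." in browser.cookie3 matches either "_" or "-"):
pip list | findstr /R /I "bs4 requests html5lib cloudscraper pyopenssl browser.cookie3 colorama tqdm"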
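As a cross-platform alternative to the PowerShell command, the installed versions can also be read from Python itself with importlib.metadata (Python 3.8+). The package list below mirrors the version list above; installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the version list above.
PACKAGES = ["bs4", "cloudscraper", "colorama", "html5lib", "requests",
            "requests-file", "requests-toolbelt", "tqdm"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<25} {ver or 'not installed'}")
```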

Please review these changes and let me know if any further modifications are needed.


hoz-efa and others added 3 commits June 10, 2024 21:08:

- Update base.py (50bf768)
  Bugs and Fixes
- Improve Link Extraction and Error Handling in Scrapers (c97095e)
- Error Handling Improvements (b2b57db)
  Improved Error Handling: Added comprehensive error handling in the "enrol" function to ensure the script continues processing even if an error occurs during the execution.
