Fix Various Issues and Improve Scrapper Script #162
Pull Request Summary
New Updates (Commits on Jun 15, 2024):
Error Handling Improvements
Added error handling to the `enrol` function to ensure the script continues processing even if an error occurs during execution. Each critical operation is now wrapped in a try-except block, allowing the script to skip problematic iterations and proceed with the next.
Known Issues
[...] `false` in the `duce-cli-settings.json`. This issue will be addressed in future updates.
Previous Changes (Commits on Jun 15, 2024):
Improve Link Extraction and Error Handling in Scrapers
Link Extraction: Improved handling of `click.linksynergy.com` URLs, ensuring all valid links are captured by checking both the `murl=` and `RD_PARM1` parameters.
Nonce Extraction and Processing: Updated nonce extraction in the `cv` function, and updated the processing to properly isolate and parse the JSON data, ensuring successful AJAX requests for fetching course data.
Error Handling:
Progress Tracking:
Threading:
Data Aggregation:
These changes collectively improve the overall reliability, efficiency, and functionality of the script.
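The link extraction change described above (checking both `murl=` and `RD_PARM1`) could look roughly like the following. This is only a minimal sketch using the standard library, not the code from this PR: the function name `extract_target_url`, the example URLs, and the assumption that the destination is stored percent-encoded in the query string are all illustrative.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical helper, not taken from the PR: pulls the destination URL
# out of a click.linksynergy.com redirect link.
def extract_target_url(link: str) -> Optional[str]:
    parsed = urlparse(link)

    # Links that are not linksynergy redirects are returned unchanged.
    if parsed.netloc != "click.linksynergy.com":
        return link

    # parse_qs also percent-decodes the parameter values.
    params = parse_qs(parsed.query)

    # Check both parameters so that neither link style is missed.
    for key in ("murl", "RD_PARM1"):
        values = params.get(key)
        if values:
            return values[0]

    # A redirect link without a recognised parameter has no usable target.
    return None
```

Returning `None` for a redirect with neither parameter lets the caller skip the link instead of following a tracking URL.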
Previous Changes (Commits on Jun 10, 2024):
Fixed Cloudscraper Session Error: The session object `s` was causing an error with `cloudscraper.create_scraper(sess=s)`.
Repaired Scrapers for Multiple Sites:
Corrected e-Next API Link:
The script is now functioning correctly.
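For reviewers, the skip-and-continue error handling described for the `enrol` function can be illustrated with a small, generic sketch. It is not the actual `enrol` implementation: `process_all` and `handler` are hypothetical names, and the real function wraps its own operations.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for the loop inside enrol: each item is handled
# inside its own try-except so one failure does not stop the whole run.
def process_all(
    items: Iterable[str],
    handler: Callable[[str], str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []

    for item in items:
        try:
            succeeded.append(handler(item))
        except Exception as exc:  # broad on purpose: skip and keep going
            # Record the failure so it can be reported after the run.
            failed.append((item, repr(exc)))
            continue

    return succeeded, failed
```

Collecting the failures rather than only printing them makes it easier to see afterwards which iterations were skipped.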
However, there are a couple of minor issues that need attention:
The script sometimes keeps retrying. I have attached a screenshot of this behavior for reference.
The `tqdm` progress bar is slightly glitchy, repeating the name of the website. Despite this, the backend operations work perfectly, so this issue is only with the display of the progress bar.
Note: I have only tested these changes with the CLI version and have not verified them with the GUI version.
I have no idea about the GUI. I haven't run it even once because I mostly use the CLI version. As for PySimpleGUI, you can register as a "Hobbyist" to get a developer key and use it for a year... check it out here
These are the versions of my libraries from the requirements:
You can check and verify your versions with the above by running this command in PowerShell:
pip list | findstr /R "bs4 requests html5lib cloudscraper pyopenssl browser_cookie3 colorama tqdm"
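If you are not on Windows, a rough equivalent using only the Python standard library (`importlib.metadata`, Python 3.8+) might look like this. It is a sketch, with the package names copied verbatim from the command above; a package installed under a different distribution name (for example, `bs4` is commonly provided by `beautifulsoup4`) will show as not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Names copied from the pip command; they are assumed to match the
# installed distribution names.
PACKAGES = [
    "bs4", "requests", "html5lib", "cloudscraper",
    "pyopenssl", "browser_cookie3", "colorama", "tqdm",
]

def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each name to its installed version, or None if it is not found."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```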
Please review these changes and let me know if any further modifications are needed.