This repository has been archived by the owner on Jun 6, 2023. It is now read-only.

⛔ DEPRECATED - A simple scrapy-based spider for checking that redirects are working before a website is migrated


essexcountycouncil/website-link-scraper

 
 


Essex County Council Website Link Scraper

This is a one-off project to test that redirects are in place before a site is migrated to a new location.

Running

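There are four steps: install the dependencies, scrape a list of pages from the existing site, point your hosts file at the new site, and run the redirect test.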

Python

Ensure you have Python 3.9 installed, as well as pipenv. Then run the following:

pipenv install

Scraping a list of pages

This step scrapes the existing site to build a list of page links that should have redirects in place after the migration. Run:

pipenv run scrapy runspider scraper.py -o ./output/output.csv
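To illustrate what "a list of page links" means here, the following is a minimal standard-library sketch that collects same-host links from a single HTML page. It is not the repository's scraper.py (which is a Scrapy spider and is not reproduced here); the names `LinkCollector` and `internal_links` and the example hostnames are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect links on one page that stay on the same host (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop any #fragment.
        absolute = urljoin(self.base_url, href).split("#")[0]
        # Keep only links on the site being migrated.
        if urlparse(absolute).netloc == self.host:
            self.links.add(absolute)


def internal_links(base_url, html):
    """Return the sorted, de-duplicated same-host links found in `html`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return sorted(collector.links)
```

A real crawl would also follow each collected link and repeat the process, which is the part Scrapy handles for you.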

Modifying your hosts file

Add a line to your hosts file (for example with sudo nano /etc/hosts) for the site that you're trying to test, so that its hostname resolves to the new server. This is needed because DNS still points to the old website, so without the override your requests would never reach the new site and there would be nothing to test.
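Before running the tests it can help to confirm that the override has taken effect. The sketch below is an assumption-laden example rather than part of the project: `resolves_to` is a hypothetical helper, and the IP address and hostname are placeholders from the documentation ranges, not the real site.

```python
import socket

# A hosts file entry has the form "<IP address> <hostname>", for example:
#   203.0.113.10  www.example.org
# Both values above are placeholders; substitute the new server and your site.


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if `hostname` currently resolves to `expected_ip`.

    `resolver` is injectable so the check can be exercised without a network.
    """
    try:
        return resolver(hostname) == expected_ip
    except OSError:
        # Covers socket.gaierror, raised when the name cannot be resolved.
        return False
```

Calling `resolves_to("www.example.org", "203.0.113.10")` with your own values should return True once the hosts entry is in place.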

Testing that redirects are working

Run

pipenv run python test_redirects.py

The output will go to ./output/redirect_test_output.xlsx
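For readers curious what a redirect check involves, here is a minimal standard-library sketch. It is not the contents of test_redirects.py, which is not shown here and writes its results to a spreadsheet; the function names and the set of status codes treated as redirects are assumptions made for illustration.

```python
import http.client
from urllib.parse import urlparse

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def fetch_status_and_location(url, timeout=10):
    """Request `url` without following redirects; return (status, Location header)."""
    parts = urlparse(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # http.client never follows redirects, so the raw response is visible.
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_redirect(status, location):
    """Return True if the response is a redirect that names a target."""
    return status in REDIRECT_STATUSES and bool(location)
```

Each scraped URL could be passed through `fetch_status_and_location` and the result through `is_redirect`, with any URL that fails the check flagged for follow-up.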
