Essex County Council Website Link Scraper

This is a one-off project to check that redirects are in place before a site is migrated to a new location.

Running

Running the project takes four steps, in the order below.

Python

Ensure you have Python 3.9 installed, as well as pipenv. Then run the following:

pipenv install

Scraping a list of pages

This scrapes the existing site to build a list of page links, each of which should redirect once the site has moved. Run

pipenv run scrapy runspider scraper.py -o ./output/output.csv
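The contents of scraper.py aren't reproduced here. As a rough illustration of what this step involves, the sketch below pulls same-site links out of one page's HTML using only the standard library (not Scrapy, which the real script uses). The function name same_site_links and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, page_url):
    """Return absolute, de-duplicated links in html that stay on page_url's host."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    seen = set()
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments, which never reach the server.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Keep only http(s) links on the same host (skips mailto:, other domains).
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


if __name__ == "__main__":
    sample = (
        '<a href="/schools">Schools</a> '
        '<a href="https://other.example/">Elsewhere</a> '
        '<a href="/schools#top">Again</a>'
    )
    print(same_site_links(sample, "https://www.example.com/"))
```

A crawler repeats this for every page it discovers; each collected URL is a candidate that should redirect after the migration.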

Modifying your hosts file

Add a line to your hosts file (for example with sudo nano /etc/hosts) that points the hostname of the site you're testing at the IP address of its new hosting. This is needed because public DNS still points to the old website, so without the override your requests would never reach the new site.
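A hosts entry is one line: an IP address followed by one or more hostnames, with anything after # treated as a comment. The hypothetical helper below (not part of this repository) reports which address a hosts file maps a hostname to, which can help confirm the override is in place before testing. The address 203.0.113.10 and the hostname are placeholders.

```python
def hosts_lookup(hosts_text, hostname):
    """Return the IP that hosts_text maps hostname to, or None if there is no entry.

    Text after '#' is ignored as a comment; in this sketch the first matching
    line wins and hostnames are compared case-insensitively.
    """
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return fields[0]
    return None


if __name__ == "__main__":
    sample = (
        "127.0.0.1   localhost\n"
        "# placeholder entry for the migrated site\n"
        "203.0.113.10  www.example.com example.com\n"
    )
    print(hosts_lookup(sample, "www.example.com"))
```

To check a real machine, pass in the text of /etc/hosts. After saving the file, socket.gethostbyname("www.example.com") should normally return the new address too, since the system resolver usually consults the hosts file before DNS.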

Testing that redirects are working

Run

pipenv run python test_redirects.py

The results are written to ./output/redirect_test_output.xlsx.
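test_redirects.py itself isn't reproduced here. The sketch below shows one way such a check could work, under stated assumptions: it requests each URL without following redirects (via http.client, so the 3xx response itself is visible), counts a 301, 302, 303, 307 or 308 response with a Location header as a redirect in place, and writes a CSV rather than the .xlsx the real script produces. The function names, the output path and the assumption that the scraper's CSV has a url column are all hypothetical.

```python
import csv
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def classify(status, location):
    """Label a response: a 3xx with a Location header counts as a redirect."""
    if status in REDIRECT_STATUSES and location:
        return "redirect"
    if 200 <= status < 300:
        return "no redirect"
    return "other"


def fetch_without_following(url, timeout=10):
    """GET url once and return (status, absolute Location or None) without following it."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        location = response.getheader("Location")
        # Resolve relative Location headers against the requested URL.
        return response.status, (urljoin(url, location) if location else None)
    finally:
        conn.close()


def run_checks(urls, fetch=fetch_without_following):
    """Check each URL and return rows of (url, status, location, result)."""
    rows = []
    for url in urls:
        try:
            status, location = fetch(url)
            rows.append((url, status, location or "", classify(status, location)))
        except (OSError, http.client.HTTPException) as exc:
            # Connection refused, DNS failure, timeout, TLS or malformed-response errors.
            rows.append((url, "", "", f"error: {exc}"))
    return rows


def write_report(urls_csv, report_csv, url_column="url", fetch=fetch_without_following):
    """Read URLs from the scraper's CSV and write one result row per URL."""
    with open(urls_csv, newline="") as f:
        urls = [row[url_column] for row in csv.DictReader(f)]
    with open(report_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "status", "location", "result"])
        writer.writerows(run_checks(urls, fetch))
```

With the hosts entry in place, a call such as write_report("./output/output.csv", "./output/redirect_check.csv") would check every scraped URL against the new hosting. http.client is used rather than urllib.request because urllib.request follows redirects by default, which would hide the status code being tested.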
