Overview / Features / Install / Run / Usage / Development / Structures / Links
Command Line Interface (CLI) utility for searching dead URLs inside files
The CLI utility takes a directory, finds all files recursively, and searches them for valid URLs. With --dispatch, an HTTP GET request is sent to every URL found. All returned HTTP status codes are gathered in a list that is written to stdout, where it can be sorted, filtered, and further processed with tools like sed, awk, or grep.
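The directory walk and URL search described above could be sketched roughly as follows. This is an illustration only, not derl's actual code: the function name `find_urls` and the regular expression are assumptions, and the real utility may tokenize and validate URLs differently.

```python
import re
from pathlib import Path

# Assumed, simplified pattern: an http(s) scheme followed by non-space characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def find_urls(directory):
    """Yield (path, line_number, url) for every URL found below directory."""
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary or non-UTF-8 files
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in URL_PATTERN.finditer(line):
                yield path, line_number, match.group(0)
```

Calling `list(find_urls("tests/test-directory/"))` would give one tuple per URL occurrence, with line numbers starting at 1.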
- Iterate over directories and gather a list of all files
- Search for valid URLs (http and https) inside the files and store all found URLs
- Optionally send an HTTP GET request to every URL with a custom timeout and retry count (multi-threading is planned)
- Record all returned HTTP status codes
- Output a list of files, URLs, and line numbers (optionally with up to 3 lines of context)
- Common verbosity arguments (-v|-vv) with additional output for information and debugging
- Collect statistics about processed directories, files, lines, URLs, and sent requests
- Track running time for processing files, searching URLs, and dispatching requests
- The utility's name sounds like one guy hunting for other dead things in the 10th season already ;)
- At the moment only UTF-8 is supported, relative paths are saved, and no binary files are processed.
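As a rough sketch of the request step with timeout and retry, the following uses only the standard library so that it stays self-contained; derl itself may rely on a different HTTP library. The function name `dispatch`, the `opener` parameter, and the reading of "retry" as the total number of attempts are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def dispatch(url, timeout=10, retry=3, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if every attempt failed.

    opener is injectable (hypothetical parameter) so the logic can be
    exercised without network access.
    """
    for _ in range(retry):
        try:
            with opener(url, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as error:
            # The server answered with an error status such as 404: record it.
            return error.code
        except OSError:
            # Connection problems and timeouts (URLError and TimeoutError are
            # both OSError subclasses): try again.
            continue
    return None
```

An error status is returned immediately rather than retried, since the point of the tool is to record what the server says; only attempts that get no answer at all are repeated.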
# Makefile targets without a Python Virtual Environment
make requirements install-user
# Or without Makefile, inside a Python virtual environment
python -m venv .venv_run
source .venv_run/bin/activate
pip install -r requirements.txt
python setup.py install --user --record files.log
deactivate
This installation copies files to $HOME/.local/ and creates files.log, which lists all installed files for convenience. To uninstall, run the following:
# Makefile target
make uninstall
# Or without Makefile, something like this:
xargs rm -rvf < files.log && rm -fv files.log
derl --dispatch directory
$ derl --dispatch tests/test-directory/
tests/test-directory/dir-1/dir-2/test-4-dir-2.txt:1, 200, http://www.python.org/
tests/test-directory/dir-1/dir-2/test-4-dir-2.txt:4, 404, http://docs.python.org/something
# [...]
$ derl --context --dispatch tests/test-directory/
tests/test-directory/dir-1/dir-2/test-4-dir-2.txt:1, 200, http://www.python.org/
Sed condimentum efficitur orci, sed mollis tellus mollis a. Nullam http://www.python.org/
tempus magna ac felis iaculis rhoncus. Ut in sodales lectus. Integer vestibulum malesuada
tests/test-directory/dir-1/dir-2/test-4-dir-2.txt:4, 404, http://docs.python.org/something
ullamcorper. Integer quis ultricies odio. Fusce tincidunt a ligula id blandit. Integer
dignissim blandit turpis ac maximus. Donec http://docs.python.org/something eget justo tempus,
mauris.
# [...]
$ derl --stats --dispatch tests/test-directory/
# [...]
tests/test-directory/test-2-dir-0.txt:3, 404, http://www.dlqx.de/test
Finished checking URLs after 1.00 second(s).
Processed Directories/Files/Lines/Tokens/URLs: 3/7/42/491/7
Sent HTTP GET Requests: 7
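Because each result line has the form `path:line, status, url`, the output is easy to post-process. The sketch below parses such lines and keeps only URLs whose status is outside the 2xx/3xx range. The helper names `parse_line` and `dead_urls` are hypothetical, and the format is inferred only from the `--dispatch` examples above; context lines and the statistics footer are simply skipped.

```python
def parse_line(line):
    """Split 'path:line, status, url' into its parts, or return None."""
    parts = line.rstrip("\n").split(", ", 2)
    if len(parts) != 3 or not parts[1].isdigit():
        return None  # context line, statistics, or anything else
    location, status, url = parts
    path, _, line_number = location.rpartition(":")
    if not path or not line_number.isdigit():
        return None
    return path, int(line_number), int(status), url


def dead_urls(lines, ok=range(200, 400)):
    """Yield parsed records whose status code is not in the ok range."""
    for line in lines:
        record = parse_line(line)
        if record is not None and record[2] not in ok:
            yield record
```

Feeding it `sys.stdin` in a small script would allow something like piping the output of `derl --dispatch tests/test-directory/` straight into the filter.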
derl [-h] [-c] [-d] [-r RETRY] [-s] [-t TIMEOUT] [--version] [-v] [-vv] directory

Dead URL searching utility

positional arguments:
  directory                      directory for looking for dead URLs

optional arguments:
  -h, --help                     show this help message and exit
  -c, --context                  showing up to 3 lines of context
  -d, --dispatch                 dispatching HTTP requests for every found URL
  -r RETRY, --retry RETRY        set how often to retry a request (default is 3)
  -s, --stats                    track and print statistics at the end
  -t TIMEOUT, --timeout TIMEOUT  set timeout for requests in seconds (default is 10)
  --version                      show program's version number and exit
  -v, --verbose                  set loglevel to INFO
  -vv, --very-verbose            set loglevel to DEBUG
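For readers curious how such an interface can be declared, here is a minimal `argparse` sketch reconstructed from the help text above. It is not derl's source: the function name `build_parser`, the `loglevel` destination, the integer types, and the placeholder version string are assumptions.

```python
import argparse
import logging


def build_parser():
    """Build a parser that mirrors the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="derl", description="Dead URL searching utility")
    parser.add_argument("directory", help="directory for looking for dead URLs")
    parser.add_argument("-c", "--context", action="store_true",
                        help="showing up to 3 lines of context")
    parser.add_argument("-d", "--dispatch", action="store_true",
                        help="dispatching HTTP requests for every found URL")
    parser.add_argument("-r", "--retry", type=int, default=3,
                        help="set how often to retry a request (default is 3)")
    parser.add_argument("-s", "--stats", action="store_true",
                        help="track and print statistics at the end")
    parser.add_argument("-t", "--timeout", type=int, default=10,
                        help="set timeout for requests in seconds (default is 10)")
    # Placeholder version string; the real value is not given in this document.
    parser.add_argument("--version", action="version", version="derl (version unknown)")
    parser.add_argument("-v", "--verbose", dest="loglevel", action="store_const",
                        const=logging.INFO, help="set loglevel to INFO")
    parser.add_argument("-vv", "--very-verbose", dest="loglevel", action="store_const",
                        const=logging.DEBUG, help="set loglevel to DEBUG")
    return parser
```

With this sketch, `build_parser().parse_args(["--dispatch", "tests/test-directory/"])` yields a namespace with `dispatch=True`, `retry=3`, and `timeout=10`.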
# Makefile targets
make requirements test develop
# Or without Makefile
python -m venv .venv_run
source .venv_run/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
python setup.py test
python setup.py develop
deactivate
# Linting project
make lint
# Generating report
make report
files: [
  {
    filename,
    urls: [
      (0): {
        url,
        status_code,
        line_number,
        context: [
          "line above matched line",
          "line with found URL",
          "line below matched line"
        ]
      },
      (1): {
        url,
        status_code,
        line_number,
        context: [
          "line above matched line",
          "line with found URL",
          "line below matched line"
        ]
      },
      ...
      (n): {
        url,
        status_code,
        line_number,
        context: [
          "line above matched line",
          "line with found URL",
          "line below matched line"
        ]
      }
    ]
  }
]
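One way to model the structure above in Python is with two small dataclasses. This is a sketch under assumptions, not the project's actual classes: the class names `URL` and `File`, the defaults, and the helper `format_lines` are hypothetical; only the field names come from the outline.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class URL:
    url: str
    line_number: int
    status_code: Optional[int] = None  # assumed to stay unset until a request is sent
    context: List[str] = field(default_factory=list)


@dataclass
class File:
    filename: str
    urls: List[URL] = field(default_factory=list)


def format_lines(file):
    """Render one 'path:line, status, url' line per URL, as in the examples."""
    return [f"{file.filename}:{u.line_number}, {u.status_code}, {u.url}" for u in file.urls]
```

Keeping the context lines on each `URL` rather than on the `File` matches the outline, where every match carries its own surrounding lines.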
test-directory/
├── dir-1
│ ├── dir-2
│ │ ├── test-4-dir-2.txt
│ │ └── test-6-dir-2.txt
│ ├── test-3-dir-1.txt
│ ├── test-5-dir-1
│ └── test-7-dir-1.txt
├── test-1-dir-0.txt
└── test-2-dir-0.txt
# Makefile target
make update-references
# Or without Makefile
derl tests/test-directory/ > tests/references/output-without-context-without-dispatch.out && \
derl tests/test-directory/ --context > tests/references/output-with-context-without-dispatch.out && \
derl tests/test-directory/ -d > tests/references/output-without-context-with-dispatch.out && \
derl tests/test-directory/ --context --dispatch > tests/references/output-with-context-with-dispatch.out
- Blog, eshlox, VS Code - sort Python imports automatically
- Digital Ocean, How-To Use String Formatters in Python 3
- Findwork, Advanced usage of Python requests - timeouts, retries, hooks
- GeeksforGeeks, Testing Output to stdout
- GitHub, Python Primer for Java Developers
- Medium, Testing sys.exit() with pytest
- Medium, What the mock? — A cheatsheet for mocking in Python
- Programiz, Python Tuple
- Pylint Tutorial, A Beginner’s Guide to Code Standards in Python
- PyScaffold, Installation and Examples
- Python How-To, Sorting How-To
- Python Reference, argparse — Parser for command-line options, arguments and sub-commands
- Python Reference, Basic customization to data models
- Python Reference, Mock Object Library
- Python Reference, pathlib — Object-oriented filesystem paths
- Python Reference, re - Regular expression operations
- Python Tips, Enumerate
- Python Tutorial, Errors and Exceptions
- Python, The Python Tutorial
- Readthedocs, Requests for humans, Quickstart
- Real Python, Understanding the Python Mock Object Library
- Stack Overflow, Python mock requests.post to throw exception
- Stack Overflow, Accessing the index in 'for' loops?
- Stack Overflow, Can I set max_retries for requests.request?
- Stack Overflow, Control formatting of the argparse help argument list?
- Stack Overflow, Python str and lists
- Stack Overflow, Remove all the lines before the first line that contains a match?
- Stack Overflow, Why does "return list.sort()" return None, not the list?
- Twilio, HTTP Requests in Python 3
- YouTube, Learn Python in 60 Minutes from Java