Using Raku to scrape and analyze Ukraine Ministry of Defense Data

This content is also rendered on the web.

This repository mainly contains code to extract information from the reports on Russian invader losses published by the Ukrainian Ministry of Defense, as well as the extracted and processed data.

Docker containers

Get the latest version of the data locally:

docker run -p 31415:31415 ghcr.io/jj/ukr-mod-data:latest

and then

wget http://localhost:31415/ -O raw-data.csv
wget http://localhost:31415/deltas -O deltas.csv
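The deltas endpoint presumably serves day-over-day differences of the cumulative figures in the raw data. As a hedged illustration of that relationship, the Python sketch below recomputes such differences from a downloaded raw-data.csv. The column layout it assumes (a date in the first column, cumulative integer counts in the others, rows in date order) is a guess and is not documented in this README; `compute_deltas` is a hypothetical helper, not part of this repo.

```python
"""Sketch: recompute day-over-day deltas from a cumulative CSV.

ASSUMPTIONS (not confirmed by this README): the first column holds the
date, rows are sorted chronologically, and every other column is a
cumulative integer count.
"""
import csv
import io


def compute_deltas(csv_text: str) -> list[dict[str, object]]:
    """Return one row per date after the first, with each count replaced
    by its difference from the previous row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    date_col, count_cols = fields[0], fields[1:]
    deltas = []
    previous = None
    for row in reader:
        # Convert every (assumed) cumulative column to an integer.
        current = {col: int(row[col]) for col in count_cols}
        if previous is not None:
            delta = {date_col: row[date_col]}
            delta.update({col: current[col] - previous[col] for col in count_cols})
            deltas.append(delta)
        previous = current
    return deltas
```

For example, `compute_deltas(open("raw-data.csv", newline="").read())` returns a list of dictionaries that could be compared against the contents of deltas.csv to check whether the assumption holds.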

You can also run the ukr-mod-data-csv image with the current directory mounted as a volume:

docker run --rm -t -v `pwd`:/home/raku/test  ghcr.io/jj/ukr-mod-data-csv

Introduction

News about combat losses of the Russian invaders is periodically published by the Ukrainian Ministry of Defense. This is a Raku module that extracts information from those pages, for instance this one.

Note: English reports are updated less frequently than the Ukrainian ones, which are updated daily. That is left for future work.

Installing

This repo uses Raku as well as Python for the whole downloading/scraping workflow. You will need reasonably recent versions of both. Additionally, install poetry globally.

When that's done, clone this repo or install via zef (when I upload it to the ecosystem, shortly). If you want to run it directly from a clone of this repo, run

zef install --deps-only .

and

poetry install

If you just want to use the Raku part yourself, use zef for installation:

zef install Data::UkraineWar::MoD

Running

You can always check the examples in the t directory. For convenience, an Akefile is also included. It contains several targets that automate some tasks:

ake CSV: generates a CSV file in a fixed location
ake download: invokes the Python script to download data
ake prescrape: checks whether any downloaded file cannot be scraped

Reference

Not a lot of that, I'm afraid. There are two classes: Data::UkraineWar::MoD::Daily, which deals with the content of a single webpage (updated daily-ish), and Data::UkraineWar::MoD::Scrape, which looks in a directory for all HTML pages and tries to extract information from them, bailing out if some page does not contain losses information.
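The fail-fast behaviour described for Data::UkraineWar::MoD::Scrape can be sketched in Python as follows. This is only an illustration of the control flow, not the module's Raku API: `scrape_directory`, `NoLossesData` and the caller-supplied `extract` function are hypothetical names, and because the real page format is not described here, the sketch leaves extraction entirely to the caller (an extractor returns a dict of figures, or None when a page has no losses information).

```python
"""Sketch of a fail-fast directory scraper, loosely modelled on the
behaviour described for Data::UkraineWar::MoD::Scrape.

HYPOTHETICAL: the extractor is supplied by the caller, since this sketch
does not know the real page format. It must return a dict of figures,
or None when a page contains no losses information.
"""
from pathlib import Path
from typing import Callable, Optional

Extractor = Callable[[str], Optional[dict]]


class NoLossesData(Exception):
    """Raised when a page yields no losses information."""


def scrape_directory(directory: str, extract: Extractor) -> dict[str, dict]:
    """Extract figures from every *.html file in `directory`, keyed by
    file name; bail out on the first page that yields nothing."""
    results = {}
    # Sorted so that pages are processed (and failures reported) in a stable order.
    for page in sorted(Path(directory).glob("*.html")):
        data = extract(page.read_text(encoding="utf-8"))
        if data is None:
            raise NoLossesData(f"{page.name}: no losses information found")
        results[page.name] = data
    return results
```

Raising on the first unusable page mirrors the "bail out" behaviour; collecting the failing file names into a list instead would give something closer to what the prescrape target is described as doing.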

Issues

Please raise issues at the repo.

License

This module is licensed under the Artistic 2.0 License (the same as Raku itself). See LICENSE for terms.
