This is the code that powers https://webbkoll.dataskydd.net – an online tool that checks how a webpage is doing with regard to privacy.
It attempts to simulate what happens when a user visits a specified page with a typical browser without clicking on anything, with the browser having no particular extensions installed, and with Do Not Track (DNT) disabled (as this is the default setting in most browsers).
In short: this tool, which runs the user-facing web service (built with Elixir and Phoenix), asks a simple Node.js backend to visit a page with Chromium. The backend uses Puppeteer to control Chromium; it visits and renders the page, collects various data (requests made, cookies, response headers, etc.), and sends it back as JSON to this tool which then analyzes the data and presents the results on a webpage along with explanations and advice.
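Here is one kind of analysis the Phoenix side might run on the backend's JSON: splitting requests and cookies into first-party and third-party. The sketch is in Python for illustration only; Webbkoll's analysis is written in Elixir. The payload shape (final_url, requests, cookies) and the function names are hypothetical and are not the backend's actual format. The "last two labels" domain rule is a deliberate simplification; real code should consult the Public Suffix List.

```python
import json
from urllib.parse import urlsplit


def base_domain(host):
    # Naive "registrable domain": the last two DNS labels.
    # Wrong for suffixes such as .co.uk; real code should use the Public Suffix List.
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def summarize(payload):
    # Hypothetical payload shape (not the real backend format):
    # {"final_url": str, "requests": [{"url": str}], "cookies": [{"name": str, "domain": str}]}
    first_party = base_domain(urlsplit(payload["final_url"]).hostname or "")

    def is_third_party(host):
        return base_domain(host) != first_party

    hosts = {urlsplit(r["url"]).hostname or "" for r in payload["requests"]}
    third_party_hosts = sorted(h for h in hosts if h and is_third_party(h))
    third_party_cookies = [c["name"] for c in payload["cookies"] if is_third_party(c["domain"])]
    return {
        "third_party_hosts": third_party_hosts,
        "third_party_cookies": third_party_cookies,
        "first_party_cookies": len(payload["cookies"]) - len(third_party_cookies),
    }


example = json.loads("""{
  "final_url": "https://www.example.com/",
  "requests": [
    {"url": "https://www.example.com/app.js"},
    {"url": "https://cdn.example.com/logo.png"},
    {"url": "https://tracker.example.net/pixel.gif"}
  ],
  "cookies": [
    {"name": "session", "domain": "www.example.com"},
    {"name": "uid", "domain": ".tracker.example.net"}
  ]
}""")
print(summarize(example))
```

Because the sketch uses this naive rule, cdn.example.com counts as first-party and tracker.example.net counts as third-party.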
Webbkoll is multilingual and currently supports English and Swedish. If you want to help us translate Webbkoll into more languages, see TRANSLATIONS.md.
Honeydew is used for job processing, and some basic rate limiting is done with ex_rated. Multiple backends can be configured. ConCache is used to store results in an in-memory ETS table for a limited time. Other than the Node.js backend, there are no external dependencies, and nothing is saved to disk.
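The rate limiting and the time-limited result cache described above can be illustrated with plain dictionaries. This Python sketch shows a fixed-window counter (one common way to rate-limit) and a TTL cache. It is not the API or implementation of ex_rated or ConCache. The class names, the 60-second window, the limit of 2 and the 10-minute TTL are all hypothetical values. An injectable clock keeps the example deterministic.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` hits per bucket in each `scale_ms` window (illustrative only)."""

    def __init__(self, scale_ms, limit, clock=time.monotonic):
        self.scale_ms = scale_ms
        self.limit = limit
        self.clock = clock
        self.state = {}  # bucket -> (window number, hits in that window)

    def check(self, bucket):
        window = int(self.clock() * 1000) // self.scale_ms
        stored_window, count = self.state.get(bucket, (window, 0))
        if stored_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            self.state[bucket] = (window, count)
            return False
        self.state[bucket] = (window, count + 1)
        return True


class TTLCache:
    """Keep values for `ttl_s` seconds, then treat them as gone (illustrative only)."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self.items = {}  # key -> (expiry time, value)

    def put(self, key, value):
        self.items[key] = (self.clock() + self.ttl_s, value)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.items[key]  # expired: evict lazily on read
            return None
        return value


now = [0.0]  # fake clock, in seconds
limiter = FixedWindowLimiter(scale_ms=60_000, limit=2, clock=lambda: now[0])
first_minute = [limiter.check("203.0.113.7") for _ in range(3)]
now[0] = 61.0
next_minute = limiter.check("203.0.113.7")

cache = TTLCache(ttl_s=600, clock=lambda: now[0])
cache.put("https://example.com/", {"status": "done"})
now[0] += 599
still_cached = cache.get("https://example.com/")
now[0] += 2
expired = cache.get("https://example.com/")
```

A fixed window is the simplest scheme to reason about. The trade-off is that a client can burst up to twice the limit around a window boundary.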
Please note that this is still a work in progress. Expect bugs and messy code in places. Only a few basic tests are in place.
Also note that this tool is mainly meant to be used as a starting point for web developers. For more rigorous and systematic testing we recommend that you check out OpenWPM, which we used to analyze the websites of Sweden's municipalities (site, code). You might also want to have a look at PrivacyScore, which is a bit more comprehensive than Webbkoll (additionally checks e.g. email and TLS/SSL configuration) and also lets you compare/rank lists of sites.
This is a project by Dataskydd.net. See Webbkoll's About page for more information.
We've switched from PhearJS/PhantomJS to a tiny script that makes use of Puppeteer. You'll find it in this repo.
Install Erlang (>= 20) and Elixir (>= 1.7) -- see http://elixir-lang.org/install.html.
Clone this repository and cd into it.
Install dependencies:
mix deps.get
Make sure the backend is running on the host and port specified in config/dev.exs.
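A plain TCP connection attempt is a quick way to check that something is listening where config/dev.exs says the backend should be. The helper name backend_reachable is hypothetical. A successful connection only shows that the port accepts connections, not that the process behind it is the Puppeteer backend.

```python
import socket


def backend_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host, ...
        return False
```

Call it with the host and port from config/dev.exs. A False result usually means the backend hasn't been started yet.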
Compile CSS with sassc and copy static assets (this replaces brunch and 340 node dependencies), and make sure config/dev.secret.exs (imported by config/dev.exs) exists:
mkdir -p priv/static/css priv/static/fonts priv/static/images priv/static/js
sassc --style compressed assets/scss/style.scss priv/static/css/app.css
cat assets/static/js/webbkoll-*.js > priv/static/js/webbkoll.js
rsync -av assets/static/* priv/static
touch config/dev.secret.exs
Start the Phoenix endpoint with mix phx.server (or, to get an interactive shell, iex -S mix phx.server).
Now you can visit localhost:4000 in your browser.
For Webbkoll to be able to download the GeoLite2 country database automatically (for GeoIP lookups), you need to create a (free) account on MaxMind's GeoLite2 page and get a license key. Add this key to config/dev.secret.exs (or config/prod.secret.exs), overriding the default value from the non-secret file:
use Mix.Config

config :webbkoll,
  geoip_db_url: "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country&license_key=YOUR_KEY_HERE&suffix=tar.gz",
  geoip_db_md5_url: "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country&license_key=YOUR_KEY_HERE&suffix=tar.gz.md5"
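The two URLs differ only in their suffix parameter, and the license key is an ordinary query parameter. If you generate them instead of editing them by hand, URL-encoding the key avoids surprises. This is a small Python sketch; the function name geoip_urls is hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://download.maxmind.com/app/geoip_download"


def geoip_urls(license_key, edition_id="GeoLite2-Country"):
    """Build the archive URL and its .md5 companion for a given license key."""
    def url(suffix):
        # dict order is preserved, so the query string matches the config above
        query = {"edition_id": edition_id, "license_key": license_key, "suffix": suffix}
        return BASE + "?" + urlencode(query)

    return {"geoip_db_url": url("tar.gz"), "geoip_db_md5_url": url("tar.gz.md5")}
```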
The GeoLite2 database is downloaded on startup if the file doesn't already exist (it should be at priv/GeoLite2-Country.mmdb). It is then refreshed once per week (the interval can be changed in config/config.exs).
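The download-if-missing and weekly-refresh rules, together with a check against the published .md5 file, can be sketched as follows. This is Python for illustration, not Webbkoll's Elixir code. The function names are hypothetical, and the sketch folds both rules into a single file-age check. It also assumes the .md5 response begins with the hex digest of the archive.

```python
import hashlib
import os
import time

WEEK_S = 7 * 24 * 60 * 60


def needs_download(path, max_age_s=WEEK_S, now=None):
    """True if the database file is missing or at least max_age_s seconds old."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) >= max_age_s


def md5_matches(archive, md5_file_text):
    """Compare the archive's MD5 with the digest published next to it.
    Assumes the first whitespace-separated token of the .md5 file is the digest."""
    tokens = md5_file_text.split()
    return bool(tokens) and hashlib.md5(archive).hexdigest() == tokens[0].lower()
```

Verifying the checksum before replacing the old file means a truncated or failed download leaves the previous database in place.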
To run in production, first get and compile dependencies, and make sure config/prod.secret.exs (imported by config/prod.exs) exists:
mix deps.get --only prod
MIX_ENV=prod mix compile
touch config/prod.secret.exs
Next, repeat the CSS compilation and static-asset copying steps from above. Then digest and compress the static files:
MIX_ENV=prod mix phx.digest
Start the server in the foreground (port must be specified):
MIX_ENV=prod PORT=4001 mix phx.server
Or detached:
MIX_ENV=prod PORT=4001 elixir --detached -S mix phx.server
Or in an interactive shell:
MIX_ENV=prod PORT=4001 iex -S mix phx.server
See also the official Phoenix deployment guides.
To run it as a systemd service (automatic start/restart on boot, crash, etc.), put something like this in e.g. /etc/systemd/system/webbkoll.service (make sure to adjust User, Group, WorkingDirectory, etc.):
[Unit]
Description=Webbkoll
[Service]
Type=simple
ExecStart=/usr/local/bin/mix phx.server
WorkingDirectory=/home/foobar/webbkoll
Environment=MIX_ENV=prod
Environment=PORT=4001
User=foobar
Group=foobar
Restart=always
[Install]
WantedBy=multi-user.target
Run systemctl daemon-reload so that systemd picks up the new unit file, and then try systemctl start webbkoll. (Run systemctl enable webbkoll as well to have it started automatically on boot.)
- Add more suggestions for privacy-friendly alternatives to popular services
- Optionally visit a number of randomly selected internal pages and let the results be based on the collective data from all the pages
- Availability over Tor (e.g. does the visitor have to solve a Cloudflare captcha?)
- HTTPS Everywhere: check for requests that could have been secure
- Check localStorage (Web Storage)
- SSL Labs integration (or testssl.sh?)
- DNSSEC?
- IPv6 support
- Check whether site is in HSTS preload list?
- More translations?
- Generate good HTML versions of the GDPR (based on XML sources) and host them locally instead of linking to third parties of varying quality
- More? Let me know!
- German translation by Tomas Jakobs, with contributions from André Kelpe
- Norwegian translation by Tom Fredrik Blenning - Elektronisk Forpost Norge
- Phoenix Framework (MIT license) by Chris McCord
- Header/content analysis code in lib/webbkoll/header_analysis.ex, lib/webbkoll/content_analysis.ex, test/webkoll/csp_test.exs and test/webkoll/sri_test.exs is based on work by April King for Mozilla HTTP Observatory (Mozilla Public License Version 2.0)
- Bourbon, Neat, Bitters, Refills (assets/scss/{base,bourbon,neat}) (MIT license) by thoughtbot
- tablesort (assets/static/js/tablesort.min.js and assets/scss/tablesort.css) (MIT license) by Tristen Brown
- A11y Toggle (assets/static/js/a11y-toggle.min.js) (MIT license) by Edenspiekermann
- IcoMoon icons (assets/static/fonts) (GPL / CC-BY-4.0) by IcoMoon.io
- Mozilla's version of Disconnect's open source list of trackers (priv/services.json) (GPLv3) by Disconnect, Inc.
- GeoLite2 data created by MaxMind (CC BY-SA 4.0), available from http://www.maxmind.com. (Not included in the repository, but automatically downloaded to priv/GeoLite2-Country.mmdb.gz.)
- JSON for ISO 3166-1 country code i18n from node-i18n-iso-countries (priv/{en,fr,no,sv}.json) (MIT license)
- SVG flags/CSS (assets/scss/flag-icon, assets/static/flags) from flag-icon-css (MIT license) by Panayiotis Lipiridis
- HTML5 Shiv (assets/static/js/html5shiv.min.js) (MIT license) by Alexander Farkas
For the project code in general (things not noted above):
The MIT License (MIT)
Copyright (c) 2016-2020 Anders Jensen-Urstad
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.