-
Notifications
You must be signed in to change notification settings - Fork 5
/
example.html
115 lines (87 loc) · 9.94 KB
/
example.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>robustify.js</title>
<style>
body {margin-left:100px;margin-right:100px;font-family:Arial, Helvetica, sans-serif;line-height:1.2em}
a.example {text-decoration: none}
a.example:hover {text-decoration: underline}
dt, .code {font-family: courier; font-size:0.9em}
h2 {margin-bottom:4px}
h3 {margin-bottom:0}
p {margin-top:0}
dt {margin-top:1em}
</style>
</head>
<body>
<h1><span class="code">robustify.js</span></h1>
<p>
<span class="code">robustify.js</span> (<a href="https://github.com/renevoorburg/robustify.js">https://github.com/renevoorburg/robustify.js</a>) is a javascript add-on for web pages to fight <a href="https://en.wikipedia.org/wiki/Link_rot">link rot</a> or content drift. It is an implementation of Herbert Van de Sompel's <a href="https://robustlinks.mementoweb.org/spec/">Memento Robust Links - Link Decoration</a> specification (based in the <a data-versionurl="https://web.archive.org/web/20141130003524/http://hiberlink.org/" href="https://hiberlink.org">Hiberlink project</a>). With <span class="code">robustify.js</span> active on a page, any clicked hyperlink will test if the linked page is available online. If the page is not online, <span class="code">robustify.js</span> will redirect the user to a version in a web archive, by default using the <a href="https://timetravel.mementoweb.org/">Memento Time Travel service</a>.</p>
<p>
Because of the security model of Javascript, <span class="code">robustify.js</span> needs a server side helper script to be able to see if a page is available. This script, <span class="code">statuscode.php</span>, detects '404 file not found'-errors, and it can also detect so called soft-404 errors (error pages that do not supply the proper 404 status code). This is done by using a technique known as fuzzy hashing.
</p>
<h2>Basic implementation</h2>
<p>The easiest way to use <span class="code">robustify.js</span> is by simply putting this code at the bottom of your pages, preferably just before the <span class="code"></body></span>-tag.</p>
<pre>
<script src="https://digitopia.nl/js/robustify.js"></script>
<script>
robustify({});
</script>
</pre>
<h2>Live examples</h2>
<p>Here are some examples of how <span class="code">robustify.js</span> changes the behaviour of anchor links. <br>
<span class="code">robustify.js</span> will always present an alert before redirecting the user to a page other than specified in the original <span class="code">href</span>-part of the anchor link.</p>
<h3>A basic redirect</h3>
<p><span class="code"><a href="http://www.dds.nl/~krantb/stellingen/" class="example"><a href="http://www.dds.nl/~krantb/stellingen/">DDS stellingen</a></a></span><br>
Following a redirect, this basic hyperlink leads to a 404, "File not found". Therefore, <span class="code">robustify.js</span> will replace the link with a link that might lead to a version of this page in the default web archive. The preferred archive and preferred date of the archived version can be overridden when calling the script (or they can be specified using the <span class="code">data-versionurl</span> of <span class="code">data-versiondate</span> attributes).</p>
<h3>Soft-404 redirection</h3>
<p>Per default, <span class="code">robustify.js</span> calls the <span class="code">statuscode.php</span> server side helper with soft-404 detection enabled. The following example presents a link to a page that is essentially a soft-404, so a redirect to a web archive will follow:<br>
<span class="code"><a href="http://www.trouw.nl/tr/nl/4324/Nieuws/archief/article/detail/1593578/2010/05/12/Een-hel-vol-rijstkoeken-en-insecten.dhtml" data-versiondate="2014-01-01" class="example"><a href="http://www.trouw.nl/tr/nl/4324/Nieuws/archief/article/detail/1593578/2010/05/12/Een-hel-vol-rijstkoeken-en-insecten.dhtml">Een hel vol rijstkoeken en insecten</a></a></span>
</p>
<h3>The data-versiondate attribute</h3>
<p><span class="code"><a href="http://pedagogie.ac-toulouse.fr/histgeo/monog/comminge/sommaire.htm" data-versiondate="2009-01-20" class="example"><a href="http://pedagogie.ac-toulouse.fr/histgeo/monog/comminge/sommaire.htm" data-versiondate="2009-01-01">Sommaire</a></a></span><br>
In this example the date of the desired archived version has been specified explicitly using the <span class="code">data-versiondate</span> attribute. This date will be used when redirecting the user to the archive. Note that the archive might return a page from a date close to this date.</p>
<h3>The data-versionurl attribute</h3>
<p><span class="code"><a href="http://www.heimatverein-butzweiler.de/index.php?option=com_content&view=article&id=59&Itemid=79&lang=de" data-versionurl="http://web.archive.org/web/20090811002411/http://www.heimatverein-butzweiler.de/index.php?option=com_content&view=article&id=59&Itemid=79&lang=de" class="example"><a href="http://www.heimatverein-butzweiler.de/index.php?option=com_content&view=article&id=59&Itemid=79&lang=de" data-versionurl="http://web.archive.org/web/20090811002411/http://www.heimatverein-butzweiler.de/index.php?option=com_content&view=article&id=59&Itemid=79&lang=de">Römische Langmauer</a></a></a></span><br>
For this page that is not available online, a version from a webarchive has been defined using the <span class="code">data-versionurl</span> attribute. If the link has been decorated with both a <span class="code">data-versionurl</span> and a <span class="code">data-versiondate</span> the former takes precedence. </p>
<h3>Local links</h3>
<p><span class="code">robustify.js</span> will not modify links local to the website on which the script has been implemented. Further, additional measures have been taken to prevent it from running inside the context of a web archive.</p>
<h2>Customized implementation</h2>
<p>Per default, <span class="code">robustify.js</span> calls <span class="code"><a href="https://digitopia.nl/services/statuscode.php">http://digitopia.nl/services/statuscode.php</a></span> to obtain JSON formatted header information regarding the url of the clicked link. Of course, this can be customized, as most behaviour of the script. Here is an example of a customized call:</p>
<pre>
<script src="https://digitopia.nl/js/robustify.js"></script>
<script>
robustify({ "archive" : "https://web.archive.org/web/{yyyymmddhhmmss}/{url}",
"dfltVersiondate": "2010-01-01",
"statusservice" : "https://digitopia.nl/services/statuscode.php?url={url}",
"ignoreLinks" : [ "^http.?://[a-z]{2}\.wikipedia\.org",
"^http.?://(www\.)?wikidata\.org"
]
});
</script>
</pre>
<p>
Here is how this call has been customized:
<dl>
<dt>"archive"</dt>
<dd>When a link returns a 404, <span class="code">robustify.js</span> will redirect the user to a web archive. In determining where to send the user, the <span class="code">data-versionurl</span> takes precedence above other options. If no <span class="code">data-versionurl</span> has been given, the user will be redirected to the archive known by <span class="code">robustify.js</span>. Default this is <a href="https://timetravel.mementoweb.org/">timetravel.mementoweb.org</a> (actually an aggregator, not an archive). Using the <span class="code">"archive"</span> option you may specify an url pattern for an other web archive.</dd>
<dt>"dfltVersiondate"</dt>
<dd>When a link is not available and no <span class="code">data-versionurl</span> has been specified, the user will be redirected to the web archive known by <span class="code">robustify.js</span>. When supplied, the value of the <span class="code">data-versiondate</span> attribute of the link will be used as the preferred date for the version in the archive. When the link has not been decorated with a <span class="code">data-versiondate</span>, the <span class="code">dateModified</span>, or in absence of that the <span class="code">datePublished</span>, of the page will be used (following the <a href="https://schema.org/">schema.org</a> serialization, for example <span class="code"><meta itemprop="dateModified" content="2014-12-19"></span>). Otherwise the value of <span class="code">"dfltVersiondate"</span> will be used. If no <span class="code">versiondate</span> whatsoever is available, the current date will be used.</dd>
<dt>"statusservice"</dt>
<dd>Defines how the statuscode service is called. In this example, soft-404 detection is disabled. To enable it, add the parameter <span class="code">"soft404detect"</span> as in <span class="code">"http://digitopia.nl/services/statuscode.php?soft404detect&url={url}"</span>. Of course, you can run this service on your own server. Mind that to be able to do soft-404 detection, the <span class="code">statuscode.php</span> script requires PHP to have the <a href="https://pecl.php.net/package/ssdeep">ssdeep extension</a> loaded. </dd>
<dt>"ignoreLinks"</dt>
<dd><span class="code">robustify.js</span> will not alter the behaviour of local links, regardless whether they are defined relative or absolute. To make the script ignore more links, patterns may be defined here.</dd>
</dl>
</p>
<h2>Internationalization</h2>
<p>
<span class="code">robustify.js</span> has been designed to present alerts and dialogs using the preferred language of the browser. Currently, English and Dutch strings have been supplied. Feel free to add more support for more languages by modifying the code (<a href="https://github.com/renevoorburg/robustify.js">https://github.com/renevoorburg/robustify.js</a>).
</p>
<p><em>René Voorburg, updated October 2024</em></p>
<script src="https://digitopia.nl/js/robustify.js"></script>
<script>
robustify({});
</script>
</body>
</html>