---
layout: post
title: "Maximum Data URI size"
date: 2024-10-05
categories: documentation

authors:
  - name: Nick Nicholas
    email: [email protected]
    social_links:
      - https://github.com/opoudjis

excerpt: >-
  Metanorma images are by default encoded within the generated XML file as Data URIs. In order to prevent processing
  problems, they are also by default constrained to 10 MB in size.
---
Images, audio files, and video files are by default encoded in Metanorma as https://en.wikipedia.org/wiki/Data_URI_scheme[inline Data URIs]:
rather than referencing an external file for the image, the documents generated by Metanorma (including the XML file
that they take as their starting point) represent the image inside the file, as a (very long) URI.
The same is done (though as an XML element rather than a URI) with the potentially even longer representation
of file attachments, which Alex Dyuzhev recently wrote about in link:/_posts/2024-08-20-pdf-attachments/[PDF Attachments].
(Attachments are just as valid for HTML as for PDF output.)

This internal representation of files has an advantage
for distributing Metanorma documents: if you generate an HTML document, you can
send it somewhere else as a single file, without needing to take care of the separate media files or file attachments it references.
After all, you already do so for Word documents and for PDFs.

The disadvantage appears when a media file becomes so big that software starts having trouble
processing those URIs. Browsers think nothing of a URI 100 KB or 1 MB large; but by the time the URI
needs to represent a video file 100 MB or 1 GB in size, as we have found, bad things start happening.

To prevent bad things from happening, we have put the following safeguards in place:

* First of all, the default of representing media files as Data URIs can be turned off, by setting the document attribute
`:data-uri-image: false`. If you do so, then the media files in your document are referenced, in the Metanorma XML files and the HTML output,
as links to those external files, rather than being bundled inside the file. In that case, it is the Word and PDF
outputs that need to convert the media files into internally bundled representations. And you will need to take care
to include those media files when you upload the generated HTML file anywhere.
* You can do the same with file attachments, through `:data-uri-attachments: false`. In that case, again, any file attachments
will be referenced as links, rather than being bundled inside the file, and you will need to handle them the same way you handle
externally referenced media files. The catch is that, unlike with media files, HTML cannot make sense of Data URI encoding for an arbitrary attachment,
so you will have to distribute the HTML file with its attachments as separate files anyway: `:data-uri-attachments: false`
only shortens the XML files; it does not make the HTML any different. (In the case of HTML rendering, any attachments
bundled with the file are exported to a folder called `_{document-name}_attachments`.)
* In order to prevent users from inadvertently generating Data URIs too big for a browser to handle, we set the maximum allowed
Data URI size by default to 14 MB (corresponding to a 10 MB media file). If the Data URI needed to represent a media file is
bigger than that, we now abort execution, with a warning that you need to change your file configuration, to make sure you know what
you are doing. You can deal with this warning in one of three ways:
** Set `:data-uri-attachments: false`.
** Set `data-uri-maxsize` to a byte size big enough to accommodate your file. (Remember that Data URI encodings are about one third larger
than the binary files they encode.) So if you have a 1 GB media file, you will need to set `:data-uri-maxsize: 1400000000`
to prevent aborting.
** Set `:data-uri-maxsize: 0`, if you want to throw caution to the winds, and have no maximum Data URI size for your document.
In which case, we admire your courage...
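
The "one third larger" rule of thumb comes from Base64, which encodes every 3 bytes of binary data as 4 characters. The following Python sketch is not Metanorma's own code: it only estimates how long the Data URI for a file of a given size would be, so that you can pick a `data-uri-maxsize` value before building. The function names, the `video/mp4` MIME type, and the reading of the 14 MB default as 14,000,000 bytes are all assumptions made for illustration.

```python
# Assumption: the 14 MB default is read here as decimal megabytes.
ASSUMED_DEFAULT_MAXSIZE = 14_000_000


def base64_length(n_bytes: int) -> int:
    """Number of characters in the padded Base64 encoding of n_bytes bytes."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


def data_uri_length(n_bytes: int, mime_type: str = "video/mp4") -> int:
    """Estimated length of a Data URI wrapping a file of n_bytes bytes."""
    prefix = f"data:{mime_type};base64,"
    return len(prefix) + base64_length(n_bytes)


def fits(n_bytes: int, max_size: int = ASSUMED_DEFAULT_MAXSIZE) -> bool:
    """True if the estimated Data URI fits under max_size (0 means no limit)."""
    return max_size == 0 or data_uri_length(n_bytes) <= max_size


if __name__ == "__main__":
    for label, size in [("10 MB", 10_000_000), ("1 GB", 1_000_000_000)]:
        print(label, data_uri_length(size), fits(size))
```

Under these assumptions, a 10 MB file yields a Data URI of roughly 13.3 million characters, which fits under the default, while a 1 GB file yields roughly 1.33 billion characters, which fits only once the limit is raised to a value such as `1400000000`.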