diff --git a/_posts/2024-10-05-max-data-uri-size.adoc b/_posts/2024-10-05-max-data-uri-size.adoc
new file mode 100644
index 00000000..51ef858f
--- /dev/null
+++ b/_posts/2024-10-05-max-data-uri-size.adoc
@@ -0,0 +1,58 @@
+---
+layout: post
+title: "Maximum Data URI size"
+date: 2024-10-05
+categories: documentation
+
+authors:
+  - name: Nick Nicholas
+    email: nick.nicholas@ribose.com
+    social_links:
+      - https://github.com/opoudjis
+
+excerpt: >-
+  Metanorma images are by default encoded within the generated XML file as Data URIs. In order to prevent processing
+  problems, they are also by default constrained to 10 MB in size.
+---
+
+Images, audio files, and video files are by default encoded in Metanorma as https://en.wikipedia.org/wiki/Data_URI_scheme[inline Data URIs]:
+rather than referencing an external file for the image, the documents generated by Metanorma (including the XML file
+that it takes as its starting point) represent the image inside the file, as a (very long) URI.
+The same is done (though as an XML element rather than a URI) with the potentially even longer representation
+of file attachments, which Alex Dyuzhev recently wrote about in link:/_posts/2024-08-20-pdf-attachments/[PDF Attachments].
+(Attachments are just as valid for HTML as for PDF output.)
+
+This internal representation of files has an advantage
+for distributing Metanorma documents: if you generate an HTML document, you can
+send it somewhere else as a single file, without needing to take care of the separate media files or file attachments it invokes.
+After all, you already do so for Word documents and for PDFs.
+
+The disadvantage appears when a media file becomes so big that software starts having trouble
+processing those URIs. Browsers think nothing of a URI 100 KB or 1 MB large; but by the time the URI
+needs to represent a video file 100 MB or 1 GB in size, as we have found, bad things start happening.
+
+To prevent bad things from happening, we have put the following safeguards in place:
+
+* First of all, the default of representing media files as Data URIs can be turned off, by setting the document attribute
+`:data-uri-image: false`. If you do so, then the media files in your document are referenced, in the Metanorma XML files and the HTML output,
+as links to those external files, rather than being bundled inside the file. In that case, it is the Word and PDF
+outputs that need to convert the media files into internally bundled representations. And you will need to take care
+to include those media files when you upload the generated HTML file anywhere.
+
+* You can do the same with file attachments, through `:data-uri-attachments: false`. In that case, again, any file attachments
+will be referenced as links, rather than being bundled inside the file, and you will need to handle them the same way you handle
+external media files. The catch is that, unlike media files, HTML cannot make sense of Data URI encoding for an arbitrary attachment,
+so you will have to distribute the HTML file with its attachments as separate files anyway: `:data-uri-attachments: false`
+only shortens the XML files, it does not make the HTML any different. (In the case of HTML rendering, any attachments
+bundled with the file are exported to a folder called `_{document-name}_attachments`.)
+
+* In order to prevent users inadvertently generating Data URIs too big for a browser to handle, we set the maximum allowed
+Data URI size by default to 14 MB (corresponding to a 10 MB media file). If the Data URI needed to represent a media file is
+bigger than that, we now abort execution, with a warning that you need to change file configuration, to make sure you know what
+you are doing. You can deal with this warning in one of three ways:
+** Set `:data-uri-attachments: false`
+** Set `:data-uri-maxsize:` to a byte size big enough to capture your file. (Remember that Data URI encodings are one third larger
+than the binary files they encode.) So if you have a 1 GB media file, you will need to set `:data-uri-maxsize: 1400000000`,
+to prevent aborting.
+** Set `:data-uri-maxsize: 0`, if you want to throw caution to the winds, and have no maximum Data URI size for your document.
+In which case, we admire your courage...
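
The size arithmetic behind the 14 MB default and the 1 GB example can be sketched in Python. This is only an illustration, not Metanorma's own code: the function name `data_uri_length`, the `video/mp4` MIME type, and the reading of "MB" as 1,000,000 bytes are assumptions made here, and whether Metanorma measures the limit against the whole URI or only the Base64 payload is not stated above.

```python
def data_uri_length(file_size: int, mime_type: str = "video/mp4") -> int:
    """Estimate the length in bytes of a Base64 Data URI for a file.

    Hypothetical helper for illustration; not part of Metanorma.
    """
    # A Base64 Data URI has the form: data:<mime type>;base64,<payload>
    prefix = f"data:{mime_type};base64,"
    # Base64 emits 4 characters for every 3 input bytes, rounding the
    # input up to a whole group of 3 (the remainder is padded with "=").
    payload = 4 * ((file_size + 2) // 3)
    return len(prefix) + payload


if __name__ == "__main__":
    # A 10 MB media file (assuming 1 MB = 1,000,000 bytes).
    print(data_uri_length(10_000_000))       # 13333358, under 14 MB
    # A 1 GB media file.
    print(data_uri_length(1_000_000_000))    # 1333333358, under 1400000000
```

Rounding the result up, as the post does with 14 MB and 1400000000, leaves a little headroom above the exact 4/3 ratio.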