Commit

Blog on Max Data URI: #822
opoudjis authored and ronaldtse committed Oct 14, 2024
1 parent 21e5961 commit be91237
Showing 1 changed file with 58 additions and 0 deletions.
58 changes: 58 additions & 0 deletions _posts/2024-10-05-max-data-uri-size.adoc
@@ -0,0 +1,58 @@
---
layout: post
title: "Maximum Data URI size"
date: 2024-10-05
categories: documentation

authors:
- name: Nick Nicholas
email: [email protected]
social_links:
- https://github.com/opoudjis

excerpt: >-
Metanorma images are by default encoded within the generated XML file as Data URIs. In order to prevent processing
problems, they are also by default constrained to 10 MB in size.
---

Images, audio files, and video files are by default encoded in Metanorma as https://en.wikipedia.org/wiki/Data_URI_scheme[inline Data URIs]:
rather than referencing an external file for the image, the documents generated by Metanorma (including the XML file
that it takes as its starting point) represent the image inside of the file, as a (very long) URI.
The same is done (though as an XML element rather than a URI) with the potentially even longer representation
of file attachments, which Alex Dyuzhev recently wrote about in link:/_posts/2024-08-20-pdf-attachments/[PDF Attachments].
(Attachments are just as valid for HTML as for PDF output.)
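
As a rough illustration of what such an inline representation looks like, the following Python sketch builds a Base64 Data URI
from raw bytes. It is not Metanorma's own code: the function name `to_data_uri` and the sample MIME types are hypothetical,
chosen only for this example.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a Base64 Data URI (hypothetical helper, not Metanorma's API)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Three input bytes become four Base64 characters.
print(to_data_uri(b"abc", "text/plain"))  # prints data:text/plain;base64,YWJj

# The URI grows with the file: 300 bytes of payload need 400 Base64 characters.
prefix = "data:image/png;base64,"
print(len(to_data_uri(bytes(300), "image/png")) - len(prefix))  # prints 400
```

The whole file ends up inside the URI string, which is why a large media file produces a very long URI.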

Embedding files this way has an advantage when distributing Metanorma documents: if you generate an HTML document, you can
send it somewhere else as a single file, without needing to take care of the separate media files or file attachments it invokes.
After all, you already do so for Word documents and for PDFs.

The disadvantage appears when a media file becomes so big that software starts having trouble
processing those URIs. Browsers think nothing of a URI 100 KB or 1 MB large; but by the time the URI
needs to represent a video file 100 MB or 1 GB in size, as we have found, bad things start happening.

To prevent bad things from happening, we have put the following safeguards in place:

* First of all, the default of representing media files as Data URIs can be turned off, by setting the document attribute
`:data-uri-image: false`. If you do so, then the media files in your document are referenced, in the Metanorma XML files and the HTML output,
as links to those external files, rather than being bundled inside the file. In that case, it is only the Word and PDF
outputs that need to convert the media files into internally bundled representations. And you will need to take care
to include those media files when you upload the generated HTML file anywhere.

* You can do the same with file attachments, through `:data-uri-attachments: false`. In that case, again, any file attachments
will be referenced as links, rather than being bundled inside the file, and you will need to handle them the same way you handle
external media files. The catch is that, unlike media files, HTML cannot make sense of Data URI encoding for an arbitrary attachment,
so you will have to distribute the HTML file with its attachments as separate files anyway: `:data-uri-attachments: false`
only shortens the XML files; it does not make the HTML any different. (In the case of HTML rendering, any attachments
bundled with the file are exported to a folder called `_{document-name}_attachments`.)

* In order to prevent users inadvertently generating Data URIs too big for a browser to handle, we set the maximum allowed
Data URI size by default to 14 MB (corresponding to a 10 MB media file). If the Data URI needed to represent a media file is
bigger than that, we now abort execution, with a warning that you need to change file configuration, to make sure you know what
you are doing. You can deal with this warning in one of three ways:
** Set `:data-uri-image: false`, so that the media file is linked rather than embedded.
** Set `:data-uri-maxsize:` to a byte size big enough to capture your file. (Remember that Data URI encodings are about one third larger
than the binary files they encode.) So if you have a 1 GB media file, you will need to set `:data-uri-maxsize: 1400000000`
to prevent aborting.
** Set `:data-uri-maxsize: 0`, if you want to throw caution to the winds and have no maximum Data URI size for your document.
In which case, we admire your courage...
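
The "one third larger" rule of thumb can be checked with a little arithmetic. The Python sketch below estimates how long the Data URI
for a file of a given size will be, and whether it would exceed a given limit. It rests on assumptions: the helper names are hypothetical,
the `data:<mime>;base64,` prefix is counted as part of the size (Metanorma may measure differently), "14 MB" is read as 14,000,000 bytes
purely for illustration, and a limit of `0` is treated as "no limit", following the description above.

```python
def base64_length(file_size: int) -> int:
    """Length of the padded Base64 encoding: 4 characters per 3 input bytes, rounded up."""
    return 4 * ((file_size + 2) // 3)


def data_uri_length(file_size: int, mime_type: str = "video/mp4") -> int:
    """Estimated Data URI length, including the 'data:<mime>;base64,' prefix (an assumption)."""
    prefix = f"data:{mime_type};base64,"
    return len(prefix) + base64_length(file_size)


def exceeds_limit(uri_length: int, max_size: int) -> bool:
    """True if the URI is over the limit; a limit of 0 is read as 'no maximum'."""
    return max_size != 0 and uri_length > max_size


ILLUSTRATIVE_LIMIT = 14_000_000  # "14 MB" read as decimal bytes; an assumption for this sketch

ten_mib = 10 * 1024 * 1024
print(base64_length(ten_mib))                                       # prints 13981016
print(exceeds_limit(data_uri_length(ten_mib), ILLUSTRATIVE_LIMIT))  # prints False

one_gb = 10**9
print(base64_length(one_gb))                                        # prints 1333333336
print(exceeds_limit(data_uri_length(one_gb), ILLUSTRATIVE_LIMIT))   # prints True
print(exceeds_limit(data_uri_length(one_gb), 1_400_000_000))        # prints False
print(exceeds_limit(data_uri_length(one_gb), 0))                    # prints False
```

Under these assumptions, a 1 GB file needs a little over 1.33 billion characters, so the `1400000000` setting mentioned above leaves some headroom.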
