Define a max size limit for JSON-LD VCs #379

Open
OR13 opened this issue Apr 16, 2022 · 23 comments
Assignees: mkhraisha
Labels: post-1.0 (This is for issues that are important but should not block 1.0), ready-for-pr

Comments

@OR13 (Collaborator) commented Apr 16, 2022

There must be some recommendation we would make on this front.

@mkhraisha (Collaborator)

This should go to trace interop, no?

@OR13 (Collaborator, Author) commented Aug 16, 2022

We should adopt the MongoDB convention, add some padding, and apply this to "Certificate" types and "TraceablePresentations".

@OR13 (Collaborator, Author) commented Aug 16, 2022

This should happen in the vocabulary; it's a data format issue.

@OR13 (Collaborator, Author) commented Aug 16, 2022

@nissimsan (Collaborator)

Simple Google search:

16 MB
As you know, MongoDB stores data in a document. The limit for one document is 16 MB. You can also use GridFS to store large files that can exceed 16 MB. (18 May 2020)

@OR13 (Collaborator, Author) commented Aug 16, 2022

Suggest we set 16 MB as the max credential and presentation size limit.
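
For illustration, a minimal TypeScript sketch of such a check; the 16 MB cap and the 10% padding factor are values drawn from or assumed for this thread, not normative limits:

```ts
// Sketch only: enforce the 16 MB cap suggested in this thread, minus an
// assumed 10% padding. Neither value is normative for this vocabulary.
const BACKEND_DOCUMENT_LIMIT_BYTES = 16 * 1024 * 1024; // MongoDB-style 16 MB
const PADDING_FACTOR = 0.9;                            // assumed safety margin
const MAX_VC_BYTES = Math.floor(BACKEND_DOCUMENT_LIMIT_BYTES * PADDING_FACTOR);

// Byte length of the UTF-8 serialization, not the JS string length.
function serializedSize(document: unknown): number {
  return new TextEncoder().encode(JSON.stringify(document)).length;
}

function assertWithinLimit(document: unknown): void {
  const size = serializedSize(document);
  if (size > MAX_VC_BYTES) {
    throw new Error(`Document is ${size} bytes, over the assumed ${MAX_VC_BYTES}-byte cap`);
  }
}

// Usage: call assertWithinLimit(credentialOrPresentation) before persisting or transmitting it.
```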

@TallTed (Contributor) commented Aug 16, 2022

TL;DR: We need better justification for taking this action, with clearer presentation of the reasoning behind the limit(s) we're contemplating imposing. Appealing to a debatable "authority" is not sufficient.


I'm wondering why we're imposing one (MongoDB) storage implementation's size limit (which appears not to be absolute, given the comment about GridFS) on VCs and VPs...

This seems especially odd given the likelihood of a CBOR-LD spec to come from the new VCWG. Being a compressed format, CBOR-LD VCs will be able to hold much more data within the same 16MB document size limit than JSON-LD VCs -- and suddenly we've lost the assurance that CBOR-LD VCs can be round-tripped with JSON-LD VCs.

I do not like imposing this arbitrary document size limit, especially because it's based on one implementation's arbitrary (and work-aroundable) limitation. At minimum, I want more justification for imposing this limit on JSON-LD VCs before we do it.

All that said -- This is the Traceability Vocab work item. We are not chartered to impose VC document size limits. Even if we include the Traceability Interop work item, we are still not chartered to impose VC document size limits. Even a recommendation of this sort feels wrong to me, with the current lack of foundational justification.

@OR13 (Collaborator, Author) commented Aug 17, 2022

See https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html

CBOR-LD is not currently used in this document (neither is CBOR).

I don't think document constraints need to be set in stone, but it's wise to test the limits and add a safety margin in any engineering system.
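
As an illustration of the kind of input validation the cheat sheet describes (rejecting an oversized payload before it is parsed), here is a hedged sketch using only Node.js built-ins; the 16 MB figure is the one floated in this thread, not a requirement:

```ts
import { IncomingMessage } from 'node:http';

const MAX_BODY_BYTES = 16 * 1024 * 1024; // illustrative cap, not normative

// Read a request body, rejecting it as soon as it exceeds the cap: both the
// declared Content-Length and the bytes actually received are checked.
function readBodyWithLimit(req: IncomingMessage): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const declared = Number(req.headers['content-length'] ?? 0);
    if (declared > MAX_BODY_BYTES) {
      reject(new Error('Declared Content-Length exceeds the configured limit'));
      return;
    }
    const chunks: Buffer[] = [];
    let received = 0;
    req.on('data', (chunk: Buffer) => {
      received += chunk.length;
      if (received > MAX_BODY_BYTES) {
        req.destroy();
        reject(new Error('Request body exceeds the configured limit'));
        return;
      }
      chunks.push(chunk);
    });
    req.on('end', () => resolve(Buffer.concat(chunks)));
    req.on('error', reject);
  });
}
```

The point of the sketch is ordering: the size check happens before JSON.parse or any JSON-LD processing ever sees the bytes.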

@TallTed (Contributor) commented Aug 17, 2022

[@OR13] There must be some recommendation we would make on [a max size limit for JSON-LD VCs].

Why "must" there be?

This really doesn't seem to me like a limitation that is necessary or even desirable at this stage of the game, if ever, and certainly not in a vocabulary.

It might be relevant for traceability-interop, but I'm not convinced there's a need for this recommendation at all.

[@OR13] See https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html

That's a long page, of which only two bullets within a single small subsection appear to be relevant.

(It would be EXTREMELY helpful if you could provide more specific links in cases like this. Linking just to the whole page says that the time you save by not finding and providing the deeper link is more valuable than the cumulative time all your readers must invest in finding the tiny relevant segment of the linked page.)

Those two bullets:

  • Ensure the uploaded file is not larger than a defined maximum file size.
  • If the website supports ZIP file upload, do validation check before unzip the file. The check includes the target path, level of compress, estimated unzip size.

These are not about imposing limits on the size of files, only about common-sense tests relative to users uploading files to a server of some kind, which can help prevent (though not absolutely eliminate) disk and memory overrun.

Sure, people who are deploying atop MongoDB may want or need to impose a 16MB (decompressed?) filesize limit, or at least know what to do when a submitted file exceeds that size (e.g., fall back to GridFS storage) — but these limits are not relevant if deploying atop Virtuoso or various other datastores, so why should these limits be imposed on those deployers?
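
For the MongoDB case specifically, the fallback mentioned above might look roughly like this sketch, assuming the official Node.js driver; the 16 MB constant is MongoDB's per-document BSON limit, and the collection and bucket names are made up for illustration:

```ts
import { Db, GridFSBucket } from 'mongodb';

// MongoDB's per-document BSON limit (the comparison below is approximate,
// since the stored document also carries field-name and metadata overhead).
const BSON_DOCUMENT_LIMIT = 16 * 1024 * 1024;

// Sketch only: persist a serialized credential, falling back to GridFS when
// it would not fit inside a single BSON document.
async function storeCredential(db: Db, id: string, serialized: Buffer): Promise<void> {
  if (serialized.length < BSON_DOCUMENT_LIMIT) {
    await db.collection('credentials').insertOne({
      credentialId: id,
      body: serialized.toString('utf8'),
    });
    return;
  }
  const bucket = new GridFSBucket(db, { bucketName: 'credentials' });
  await new Promise<void>((resolve, reject) => {
    bucket
      .openUploadStream(id)
      .on('error', reject)
      .on('finish', () => resolve())
      .end(serialized);
  });
}
```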

@OR13 (Collaborator, Author) commented Aug 17, 2022

FAT file system The File Allocation Table (FAT) file system is the original file system used by MS-DOS and other Windows operating systems. It is a data structure Windows creates when a volume is formatted. This structure stores information about each file and directory so that it can be located later. The maximum disk partition size is 4 GB. On floppy disks, this is limited by the capacity of the disk. The maximum supported file size on hard disks is 2 GB.

FAT32 file system FAT32 stands for File Allocation Table32, an advanced version of the FAT file system. The FAT32 file system supports smaller cluster sizes and larger volumes than the FAT file system, which results in more efficient space allocation. FAT32 file systems support a maximum partition size of 32 GB for Windows XP and Windows Server 2003. The maximum file size is 4 GB.

NTFS NTFS, which stands for New Technology File System, is an advanced file system that provides performance, security, reliability, and advanced features not found in FAT and FAT32 file systems. Some of the features of NTFS include guaranteed volume consistency by means of transaction logging and recovery techniques. NTFS uses log file and checkpoint information to restore the consistency of the file system. Other advanced features of NTFS include file and folder permissions, compression, encryption, and disk quotas. You cannot use NTFS on floppy disks due to its limited capacity (Sysinternals has a utility for using NTFS on floppy disks. For more information check out Syngress Publishing's Winternals Defragmentation, Recovery, and Administration Field Guide, ISBN 1-59749-079-2). The maximum supported partition size ranges from 2 TB to 16 TB. The maximum file size can be up to 16 TB minus 16 KB. The minimum and maximum partition sizes vary by the partition style chosen when the operating system was installed.

Is the problem that the limit is too small, or that you think interoperability is achievable without setting limits?

@TallTed (Contributor) commented Aug 18, 2022

@OR13 -- You're pasting great big chunks of irrelevant material. That doesn't help further your argument.

It especially doesn't help when the size limits discussed in the irrelevant material you choose to quote are a minimum of 2 GB — 125x the 16 MB size limit you initially proposed imposing on JSON-LD VCs.

Even more, you seem not to have considered the reasons for the limits on the file systems whose descriptions you quoted — those limits stem from the 16-bit (FAT), later 32-bit (FAT32), and 64-bit (NTFS) integers used to implement those systems, the largest widths available on the computers those file systems were originally (or theoretically) meant to support.

Interop may (but does not always!) require setting limits, on document sizes among other things. However, plucking a document size from all-but-thin-air, based only on one data store implementation's limitation, with no further justification or basis you can apparently state, is not a good way of setting such limits. (That limitation doesn't even appear to cap the size of the user's stored document, only the size of each "document" the implementation uses to store it at the back end, somewhat like a gzip might be broken up into 100 gz.## files, each ~1/100 of the original size, in order to store it across a number of floppies when no suitable HDD or similar is available.)

@OR13 (Collaborator, Author) commented Oct 11, 2022 via email

@TallTed (Contributor) commented Oct 13, 2022

I don't see the point of your question.

@TallTed (Contributor) commented Feb 28, 2023

Guidance is better than restriction, here.

"Keep your Verifiable Credentials as small as possible, and only as large as necessary."

@brownoxford (Collaborator)

@OR13 @mprorock Discussed on the call; suggest guidance, especially in light of possibly moving away from RDF canonicalization.

@OR13 (Collaborator, Author) commented Mar 1, 2023

@mkhraisha (Collaborator)

@TallTed says we should have guidance instead of restriction here.

@BenjaminMoe There is a practical limit because RDF canonicalization time grows with document size.

I think we should add a section that says to keep credentials as small as possible because of canonicalization times, and that outlines best practices for doing so.

mkhraisha self-assigned this on Aug 1, 2023
@TallTed (Contributor) commented Aug 2, 2023

@msporny — You wanted to comment on this.

@mprorock (Collaborator) commented Aug 3, 2023

@brownoxford On interop, a hard max is a good idea; it is very common to do so at the API side.

I agree with the 16 MB suggested above as safe. That is likely too large for LD + RDF canonicalization, so we will want a much smaller max size there to avoid potential denial of service around verification, etc.
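
As one common pattern (a sketch only, assuming an Express-based API, illustrative route paths, and the figures discussed above), a hard cap can be enforced at the body-parsing layer, with a much smaller cap on routes that will run RDF canonicalization:

```ts
import express from 'express';

const app = express();

// Hard maximum at the API boundary (the 16 MB figure from this thread; not normative).
// Oversized bodies are rejected with 413 before any JSON-LD processing runs.
app.post('/credentials/issue', express.json({ limit: '16mb' }), (req, res) => {
  res.status(501).json({ error: 'issuance not implemented in this sketch' });
});

// Much smaller assumed cap where LD proofs / RDF canonicalization would run,
// to bound worst-case canonicalization cost and reduce denial-of-service risk.
app.post('/credentials/verify', express.json({ limit: '1mb' }), (req, res) => {
  res.status(501).json({ error: 'verification not implemented in this sketch' });
});

app.listen(3000);
```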

@mprorock (Collaborator) commented Aug 3, 2023

@brownoxford I personally think that we should ban RDF processing prior to signature verification (e.g., no LD proofs) in the future for security reasons, but I would like to see where standardization in the VC 2.0 working group lands before we give any guidance in this regard.

@OR13 (Collaborator, Author) commented Aug 4, 2023

I also agree the profile should not endorse RDF processing prior to signing or verification.

I think it's fine to do RDF or schema processing after you check the signature, or before you issue a credential, as long as the processing is not "part of the proofing algorithm".
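
A sketch of that ordering; verifyProof, validateAgainstSchema, and canonicalizeToRdf are hypothetical placeholders (not the API of any particular library), shown only to make the sequencing concrete:

```ts
// Hypothetical placeholders -- not the API of any particular library.
type VerifiableCredential = Record<string, unknown>;

async function verifyProof(vc: VerifiableCredential): Promise<{ verified: boolean }> {
  throw new Error('placeholder: wire in a real proof verifier here');
}

function validateAgainstSchema(vc: VerifiableCredential): void {
  throw new Error('placeholder: wire in JSON Schema validation here');
}

async function canonicalizeToRdf(vc: VerifiableCredential): Promise<string> {
  throw new Error('placeholder: wire in RDF canonicalization here');
}

// The ordering suggested here: check the signature first, then (and only then)
// run schema or RDF processing, keeping both outside the proofing algorithm.
export async function acceptCredential(vc: VerifiableCredential): Promise<void> {
  const { verified } = await verifyProof(vc);
  if (!verified) {
    throw new Error('Proof verification failed; skipping further processing');
  }
  validateAgainstSchema(vc); // schema processing after the signature check
  await canonicalizeToRdf(vc); // optional RDF processing after the signature check
}
```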

@TallTed (Contributor) commented Aug 4, 2023

I also wonder about setting "a max size limit for JSON-LD VCs" in the Traceability Vocab, rather than in the VCDM or VC Data Integrity spec. This just seems the wrong place for it.

@OR13 (Collaborator, Author) commented Aug 4, 2023

@TallTed I think it would be wise to set a max here: https://github.com/w3c/json-ld-syntax

and then let profiles (like this repo) further restrict the allowed size of conforming documents.
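
Illustratively (all numbers here are assumptions, not values from either spec), that layering would mean a profile can only tighten whatever ceiling the syntax layer defines:

```ts
// Illustrative sketch of layered limits; a profile may restrict, but never
// exceed, a limit defined at a lower layer. Both numbers are assumptions.
const JSON_LD_SYNTAX_MAX_BYTES = 16 * 1024 * 1024;      // hypothetical base limit
const TRACEABILITY_PROFILE_MAX_BYTES = 2 * 1024 * 1024; // hypothetical profile limit

// A conforming profile can only narrow the base limit.
function effectiveLimit(profileLimit: number, baseLimit: number): number {
  return Math.min(profileLimit, baseLimit);
}

console.log(effectiveLimit(TRACEABILITY_PROFILE_MAX_BYTES, JSON_LD_SYNTAX_MAX_BYTES)); // 2097152
```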

mkhraisha added the post-1.0 label (This is for issues that are important but should not block 1.0) on Jan 23, 2024