Add crate file size upload limits #794
base: master
Conversation
I considered testing this via File::set_len
FWIW I wouldn't expect File::set_len on most filesystems/platforms to actually commit anything to disk, just adjust metadata.
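For illustration, a minimal Unix-only sketch of that behavior (the path is hypothetical): on most filesystems, set_len just updates the logical length, producing a sparse file with almost no blocks actually allocated.

```rust
use std::fs::File;
use std::os::unix::fs::MetadataExt;

fn main() -> std::io::Result<()> {
    // Hypothetical temp path; Unix-only because of MetadataExt::blocks.
    let file = File::create("/tmp/sparse_test")?;
    // Claim a 5 GiB logical length without writing any data.
    file.set_len(5 * 1024 * 1024 * 1024)?;

    let meta = file.metadata()?;
    println!("logical size: {} bytes", meta.len());
    // st_blocks counts 512-byte blocks actually allocated on disk;
    // on filesystems that support sparse files this stays near zero.
    println!("allocated:    {} bytes", meta.blocks() * 512);
    Ok(())
}
```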
According to the docs it allocates zeroed memory, but that doesn't matter, because I can just lower the limit to something small for the test. No idea how I didn't realize that.
Out of the docs I have, the largest file is
there are a few of those for the different platforms; then it drops down to a handful of ~10-15 MB files from other crates. The largest non-src file is
Does 500 MB sound like a reasonable default per-file limit?
I'm not sure this belongs in docs.rs.

a) For files that are part of the tarball, I think filtering based on file size should happen on crates.io, if at all. In particular, I don't like the idea of having crates that can be published but are then automatically rejected by docs.rs; that seems like it should need an RFC at a minimum.

b) For files that are generated, I'm not sure I see the use case. The files that would be blocked are... none, unless rustdoc starts generating enormously larger HTML files, and in that case why do we need the limit? The only exception I see is files created by build.rs, but I think that should be addressed by limiting disk usage during builds, not disk usage during uploading (otherwise the server itself will run out of disk space before finishing the build).
The point of the limit isn't to save space or to ever actually be triggered by rustdoc; it's purely for included files of absurd size, as mentioned in the issue, e.g. a crate that has a massive video file in it. Filtering input files somewhat is part of the 101s of web security; even if crates.io did this for us, I'd still advocate for implementing it here, because it's a very simple fix that helps with security. And I don't really think an RFC would be needed at all, as a) the limit is absurdly high and b) it's easily adjustable (in the unlikely event that someone triggers it in a meaningful way), since I implemented it the same way as the other limits on builds, such as memory and build timeouts. This is also a better solution than limiting disk usage, as it allows crates to generate whatever weird, large files they may need (so the build still works perfectly) while simply not uploading said large, weird files.
I think that should be handled by crates.io, if at all.
That's only relevant if we accept arbitrary uploads. We only accept uploads that were previously accepted by crates.io.
I actually think silently not uploading the file is worse than failing the build, because you don't notice it's missing until you try to visit the page. Also, you haven't addressed the concern about running out of disk space during a build (which is admittedly somewhat of a separate issue).
For the first two comments, I'll restate that even if crates.io did this for us, I'd still advocate for the change. It's incredibly simple to implement, understand, and test (and I've already done it, so it's not a question of future work either), it won't affect performance whatsoever, and it's just general good practice, even though we're getting docs from a trusted source. The actual change itself is a single line of code, just an
It could be implemented to fail the build as well, but I want to reiterate that this should never affect a normal crate. Anyone who triggers it is either malicious (attempting to attack us/crates.io) or at minimum engaging in the terrible practice of placing very large build artifacts outside of the
That sounds like a separate, Docker-based PR adding a limit to the container itself.
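For concreteness, a hedged sketch of what such a per-file check could look like; the names (DEFAULT_MAX_FILE_SIZE, check_file_size) are illustrative, not the PR's actual code:

```rust
use std::path::Path;

/// Illustrative default only; the thread discusses values from 500 MB to 5 GB.
const DEFAULT_MAX_FILE_SIZE: u64 = 5 * 1024 * 1024 * 1024;

/// Hypothetical guard run on each file before upload; returning an error
/// here would fail the build rather than silently skipping the file.
fn check_file_size(path: &Path, max_file_size: u64) -> Result<(), String> {
    let size = std::fs::metadata(path)
        .map_err(|e| format!("failed to stat {}: {}", path.display(), e))?
        .len();
    if size > max_file_size {
        return Err(format!(
            "{} is {} bytes, over the {}-byte per-file limit",
            path.display(),
            size,
            max_file_size
        ));
    }
    Ok(())
}
```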
crates.io does indeed have a limit, but we override it for some crates if they legitimately have a need for more space. Here are the current exceptions:
Unfortunately build scripts and proc macros are a thing :(
We will definitely want to fail the build.
OK, if we're going to implement this, it should be a restriction on the build, not on the uploads, and it should be per-crate, not per-file (or a malicious crate could upload a thousand files each one byte less than the max). @pietroalbini, it doesn't look like Rustwide currently has an API for limiting storage space on https://docs.rs/rustwide/0.7.1/rustwide/cmd/struct.SandboxBuilder.html; could that be added?
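A hedged sketch of the per-crate variant being suggested here (names again illustrative): summing sizes across the whole upload means many files just under a per-file cap still trip the limit.

```rust
use std::path::PathBuf;

/// Hypothetical per-crate check: the total across all files is bounded,
/// so a thousand files each one byte under a per-file cap still fail.
fn check_total_size(paths: &[PathBuf], max_total: u64) -> Result<(), String> {
    let mut total: u64 = 0;
    for path in paths {
        let size = std::fs::metadata(path).map_err(|e| e.to_string())?.len();
        total = total.saturating_add(size);
        if total > max_total {
            return Err(format!(
                "crate exceeds the {}-byte total upload limit",
                max_total
            ));
        }
    }
    Ok(())
}
```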
The reason I'm so adamant about limiting this during the build is that otherwise we could still have DoS attacks that take the site down when the hard drive fills up; higher storage costs are much less of an issue.
I think we should have limits on both the disk space used and the uploads; there's no reason to block this PR only because a different limit isn't enforced yet. For the default limit, we should actually analyze the S3 inventory and see how much storage crates actually use.
Do you know a reasonably efficient way to do that? Just querying the size of each file on S3 will take a very long time because we have so many.
We have the inventory of all the files present in the bucket already.
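As a sketch of that analysis, assuming the csv crate and an inventory report already decompressed to CSV with the object size in the third column (the exact columns depend on how the inventory is configured):

```rust
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // S3 Inventory reports are (typically gzipped) CSV; assume this one
    // has been decompressed to inventory.csv with size in column 3.
    let mut reader = csv::ReaderBuilder::new()
        .has_headers(false)
        .from_path("inventory.csv")?;

    let (mut total, mut largest, mut count) = (0u64, 0u64, 0u64);
    for record in reader.records() {
        let record = record?;
        let size: u64 = record.get(2).unwrap_or("0").parse()?;
        total += size;
        largest = largest.max(size);
        count += 1;
    }
    println!("{} files, {} bytes total, largest {} bytes", count, total, largest);
    Ok(())
}
```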
Limits the maximum file size for uploaded files, configurable in the same way as the existing per-crate build limits (memory, timeouts). The current default I have right now is 5 GB, but I just set that because I didn't know what the preferred number would be.

Closes #588