allow parsing utf8 filenames #15

Open
5c077yP opened this issue May 26, 2017 · 1 comment

Comments

5c077yP commented May 26, 2017

Hey there, first thanks for this great library!

I can see that this library supports generating the Content-Disposition header from a UTF-8 filename and parsing a UTF-8 encoded filename from the header. However, when my browser (Chrome 58) uploads a file with a UTF-8 filename, it does not encode the filename that way; it sends the raw characters instead (it looks like: Content-Disposition: form-data; filename="ö").

My current use case: I'm using this library in a middleware to parse the Content-Disposition header of a file upload to AWS S3, and to set the disposition header there as well. Right now the parsing throws an "Invalid Format" exception. I feel it would be great if the lib would just accept this input and generate a valid UTF-8 encoded header when doing:

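const contentDisposition = require('content-disposition');
// parse() returns an object with `type` and `parameters`
const parsed = contentDisposition.parse(headers['content-disposition']);
const header = contentDisposition(parsed.parameters.filename, { type: parsed.type });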

What do you think about it?
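The request above could be approximated outside the library. The following is a minimal sketch under stated assumptions: `lenientFilename` and `encodeExtFilename` are hypothetical helpers, not part of the content-disposition API, and the sketch assumes the header string has already been decoded correctly as UTF-8, which is not guaranteed for every client.

```javascript
// Hypothetical helper (not part of content-disposition): pulls the raw
// quoted filename out of a multipart part header without validating it.
function lenientFilename(header) {
  const match = /;\s*filename="((?:[^"\\]|\\.)*)"/i.exec(header);
  if (!match) return undefined;
  // Undo backslash escapes inside the quoted string.
  return match[1].replace(/\\(.)/g, '$1');
}

// Hypothetical helper: builds an RFC 5987 style extended parameter,
// filename*=UTF-8''<percent-encoded value>.
function encodeExtFilename(filename) {
  // encodeURIComponent leaves ' ( ) * unescaped; escape them as well.
  const encoded = encodeURIComponent(filename).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
  return "filename*=UTF-8''" + encoded;
}

// The header from the report; \u00f6 is "ö".
const rawPartHeader = 'form-data; filename="\u00f6"';
const extractedName = lenientFilename(rawPartHeader);
const outgoingHeader = 'attachment; ' + encodeExtFilename(extractedName);

console.log(outgoingHeader); // attachment; filename*=UTF-8''%C3%B6
```

This only covers the raw UTF-8 case from the report. It does nothing about clients that send a different byte encoding or percent-encode parts of the name.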

@dougwilson
Contributor

Hi @5c077yP, yes, right now this module only supports the HTTP header Content-Disposition; it does not support the MIME headers contained within multipart bodies like multipart/form-data. The main reason is that the HTTP header actually has a specification that clients and browsers follow, while the MIME header of the same name within multiparts has no specification to even build an implementation against.

I have sat down a few times to read the source code of Chrome and Firefox to work out the obscure ways they encode file names, but never really finished (and of course, for Safari and IE you'd have to do it experimentally). What I generally found: some of them just send raw UTF-8; IE seems to send raw bytes in whatever encoding the user's OS is set to (latin1 for US, Shift_JIS for Japan, etc.); some also URL-encode certain characters but not others. So when you see a %20, it could be a space or a literal %20 in the file name on the user's computer, and you can't tell which.
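The two ambiguities described above can be made concrete with a small sketch. The byte values and the file name `my%20file.txt` are illustrative assumptions, not captured browser output.

```javascript
// The same two bytes on the wire, decoded under two different assumptions
// about what the client sent.
const wireBytes = Buffer.from([0xc3, 0xb6]);
const asUtf8 = wireBytes.toString('utf8');     // "ö" if the client sent UTF-8
const asLatin1 = wireBytes.toString('latin1'); // "Ã¶" if read as latin1

// A percent sequence on the wire has two plausible origins.
const wireName = 'my%20file.txt';
const ifClientEncodedSpace = decodeURIComponent(wireName); // "my file.txt"
const ifLiteralPercent = wireName; // a file really named "my%20file.txt"

console.log(asUtf8 === asLatin1);                       // false
console.log(ifClientEncodedSpace === ifLiteralPercent); // false
```

Nothing in the header itself says which reading is correct, which is why a parser cannot pick one safely for every client.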

I was tracking an attempt to implement this in #3, but getting something that actually works with any browser, any computer, and any character in the file name seems to be a very difficult task, if you want to lend a hand. The issue is that simply accepting raw UTF-8 would work for the specific case you ran into, but you're then likely to keep running into each of the issues I listed above. So really I want to either support multipart well (future) or not support it at all (current).

I just haven't had a lot of motivation to work on true multipart support, so yeah, I guess this is a second invitation to help out here :) My ideal approach is to implement the reverse of the open-source browsers' encoding logic, then experimentally figure out the closed-source browsers, and bundle all of that together as a "multipart mode".
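As a rough illustration of what one piece of such a "multipart mode" might look like, here is a sketch of a byte-level decoding heuristic. It is not the maintainer's design: `guessFilename` is a hypothetical name, the latin1 fallback is an assumption, and a heuristic like this can still guess wrong for clients that use other OS encodings.

```javascript
// Hypothetical heuristic: try strict UTF-8 first, then fall back to a
// configurable single-byte charset (latin1 is an assumed default).
function guessFilename(bytes, fallbackCharset = 'latin1') {
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // silently inserting replacement characters.
    const value = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return { value, charset: 'utf-8' };
  } catch (err) {
    const value = Buffer.from(bytes).toString(fallbackCharset);
    return { value, charset: fallbackCharset };
  }
}

// "ö" sent as raw UTF-8 (two bytes) versus raw latin1 (one byte).
const fromUtf8Client = guessFilename(Buffer.from([0xc3, 0xb6]));
const fromLatin1Client = guessFilename(Buffer.from([0xf6]));

console.log(fromUtf8Client);   // { value: 'ö', charset: 'utf-8' }
console.log(fromLatin1Client); // { value: 'ö', charset: 'latin1' }
```

Returning the charset alongside the value lets a caller see which guess was made. Percent-encoded sequences are deliberately left untouched here because, as noted earlier in the thread, they cannot be told apart from literal text.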
