Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make wget automatically determine the file name. #536

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

Luca0208
Copy link
Contributor

@Luca0208 Luca0208 commented Apr 4, 2018

Sometimes when using wget it is pretty annoying having to write the file name again.
This PR makes that unnecessary by automatically determining the file name.
This is done by stripping all trailing slashes and afterwards taking everything after the last remaining slash.
This is either the domain name if used like this: https://domain.tld or the file name if used liked this: https://domain.tld/folder/file.ext

wget will try to determine the file name by:
1. Stripping all trailing slashes
2. Take everything right of the last remaining slash, meaning either the full domain name or the file or foldername on the server.
@SquidDev
Copy link
Contributor

SquidDev commented Apr 4, 2018

Thank you! This is something I've been meaning to do for a while. Instead of using a loop, it might be a little cleaner to use gsub and match though:

-- Trims everything after ? or # (means foo#bar or foo?bar=qux reduce to foo), 
-- then trims trailing `/`
sUrl = sUrl:gsub( "[#?].*", "" ):gsub( "/+$", "" )

-- Find everything from the last / to the end of the string
return sUrl:match( "/([^/]+)$" ) or sUrl

Note I've only done some limited testing, so this may be more susceptible to breaking.

@Luca0208
Copy link
Contributor Author

Luca0208 commented Apr 4, 2018

foo#bar should be reduced to bar, however I'm not sure if foo?bar=s should be reduced to foo.
Sometimes these URL Parameters matter.

As for the pattern I tested them against a bunch of edge cases and they seem to work.
The or sUrl in the last line isn't even needed because the pattern will match the / of "http://" or "https://" which is required for a valid URL.

@dmarcuse
Copy link
Contributor

dmarcuse commented Apr 4, 2018

@Luca0208
Copy link
Contributor Author

Luca0208 commented Apr 4, 2018

That's a good point. I don't have access to CC in MC right now, but both CCEmuX and CCEmuRedux also emit ? in filenames, = seems to be allowed(However I'm on a mac right now, not sure how it behaves on windows)

@dmarcuse
Copy link
Contributor

dmarcuse commented Apr 4, 2018

On NTFS (the most commonly used Windows filesystem) it will fail.

@Luca0208
Copy link
Contributor Author

Luca0208 commented Apr 4, 2018

Ok it will now strip anchors and url parameters and I also applied the suggestion of SquidDev to use patterns and gsub/find

Copy link
Contributor Author

@Luca0208 Luca0208 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any opinions on this?


local function getFilename( sUrl )
sUrl = sUrl:gsub( "[#?].*" , "" ):gsub( "/+$" , "" )
return sUrl:match( "/([^/]+)$" ) or " "
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the or " " should be here or further down(See comment on line 52)

@@ -43,7 +48,7 @@ end

-- Determine file to download
local sUrl = tArgs[1]
local sFile = tArgs[2]
local sFile = tArgs[2] or getFilename(sUrl)
local sPath = shell.resolve( sFile )
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be replaced by shell.resolve( sFile or " ")

@SquidDev
Copy link
Contributor

SquidDev commented Apr 5, 2018

On my machine (Windows), fs.exists(" ") always returns true, as it's parsed as a directory. Which means wget google.com will result in File already exists instead of URL malformed.

I think it would be better to move the http.checkURL call before the file name extraction. This way one doesn't need the or " " at all, which makes it substantially less ugly. It might produce a tiny bit of delay, as this may require a DNS lookup, but it should be imperceptible (<50ms).

@Luca0208
Copy link
Contributor Author

Luca0208 commented Apr 5, 2018

Ok in that case I will put the http.checkURL before the filename extraction.
Edit: Newest commit should fix this

@Luca0208
Copy link
Contributor Author

Luca0208 commented Apr 5, 2018

I just noticed that the help page of wget should be changed to reflect this change. Going to do that this evening.

--Check if the URL is valid
local ok, err = http.checkURL( sUrl )
if not ok then
if err then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it'd be worth doing printError( err or "Failed. ") instead. Though err shouldn't ever be nil, so neither is really needed.

Copy link
Contributor Author

@Luca0208 Luca0208 Apr 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is done it would probably be better to use "Invalid URL" or something like that instead of "Failed." to distinguish it from the "Failed." message caused when the download failed.
Edit: But you are right in any way the if is not needed. Just copied the checking function from the original

ccserver pushed a commit to ccserver/ComputerCraft that referenced this pull request Sep 16, 2019
Make wget automatically determine the file name.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants