Hacky fixes to archive likes #114
base: master
Conversation
You're a legend!!! Just saved a few thousand posts and likes before everything goes down. You only forgot to define MAX_LIKES there but it works perfectly fine otherwise! Thank you!! |
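For anyone else who hits the same NameError, a minimal guess at the missing definition (assumption only: MAX_LIKES is the per-request page size for the likes endpoint, analogous to MAX_POSTS, and the v2 API caps it at 20 per call):

```python
# Hypothetical definition near the top of tumblr_backup.py; the value 20 assumes
# MAX_LIKES is the per-request page size, which the v2 likes endpoint caps at 20.
MAX_LIKES = 20
```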
Woah, wait a sec... I've heard about the issues Tumblr recently had with its app on Apple's Walled Garden (the App Store), but this is new to me:
Is this official? |
Because of course I did. Didn't pay quite enough attention when |
From this link: https://staff.tumblr.com/post/180758987165/a-better-more-positive-tumblr
I'm reading "steps they can take to appeal or preserve their content outside the community" to mean that, eventually --perhaps not on the 17th two weeks from now, but eventually-- the content will be deleted. And yes, this kind of filtering "is not simple at scale." I'd argue it's not possible at scale. There's already plenty that's getting mis-flagged. |
Damn. Thanks. Agreed, I read it in the same way. A real shame. |
A little question from a beginner: |
Be sure to make your blog explicit first in the settings. Somehow, the code works better that way. |
It "works better" because AFAIK a non-explicit blog will not publicly show explicit likes. |
I have checked. I tried again. But it still does not download all my Likes. Maybe I have too many Likes? I really do not understand. |
Huh, yeah, I still get rate limited after the first couple hundred likes. I got like 40 gig down so far. Cries |
An idea of how to solve the problem? Please ? |
Did you make the code changes (as seen under Files changed)? |
@Hrxn Lol, manually patching? Why not just clone aggroskater/tumblr-utils? |
Yes, obviously. |
I just added this and it works. Great job! |
00dcc50 to d16f632
@aggroskater I used your version, and it works much better for likes than the original, thank you! However, I still only successfully downloaded 5593 of 9626 likes, i.e. I'm missing 4033. While downloading, the program listed 44 times "HTTP Error 403: Forbidden", most of them with URLs referring to the domain I understand that these 44 + 7 + 6 = 57 likes may simply be inaccessible (I tried some of the URLs manually and could verify that), but that accounts only for a tiny fraction of the 4033 likes that were skipped without warning. Is there any chance you can fix this? If I can do something to help, please tell me. |
@allefeld If your behaviour is like what I've seen elsewhere with |
A lot of videos are indeed 'removed', or to be more exact, blocked. That is, |
If only it were possible to just download all of https://www.tumblr.com/likes while logged in via CLI, |
I now think these inconsistencies have nothing to do with @bbolli's or @aggroskater's code, but with the extremely weird and unreliable Tumblr API. I experimented a bit with the API myself. I went through the list first using the query/next value for the next request, and found it skips over likes. I then used the field I used bbolli's code, aggroskater's fork, https://github.com/javierarce/tumblr-liked-photos-export, and my own code, and I never arrive at the 9000+ likes; I get different numbers of recovered posts each time, and on different runs. I'll be turning my back on Tumblr soon, and just wanted to get my stuff out before the impending apocalypse. After both the social and technical blunders they've committed, I have to say: good riddance. Sorry for venting. Thank you for your work! |
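A hedged way to see the same API weirdness outside any of these scripts: page through /likes with the plain offset parameter and compare how many posts actually come back with the liked_count the API itself reports. The key and blog name below are placeholders, not anything from this PR.

```python
import json
import time
import urllib.request

API_KEY = "your-api-key"          # placeholder: use your own registered key
BLOG = "example.tumblr.com"       # placeholder blog name

def get_page(offset):
    url = (f"https://api.tumblr.com/v2/blog/{BLOG}/likes"
           f"?api_key={API_KEY}&limit=20&offset={offset}")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]

first = get_page(0)
reported = first["liked_count"]   # what the API claims the blog has liked
seen = len(first["liked_posts"])
offset = 20
while True:
    batch = get_page(offset)["liked_posts"]
    if not batch:
        break
    seen += len(batch)
    offset += 20
    time.sleep(1)                 # stay well under the rate limit
print(f"API reports {reported} likes, offset pagination returned {seen}")
```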
I made the same thing some months ago but never thought of opening a pull request here. My implementation is #165 and it resolves the first two issues @aggroskater has, and more. Hope it is helpful! |
I'll take a stab at incorporating @qtwyeuritoiy's work into my fork. Between the fixes for the first two issues, and the fact that a tag index feature is now upstream --I've already rebased onto latest upstream-- my original issues are resolved. |
I've got the pieces initially merged. It'll take a few hours to do a full grab and then test after liking some other posts. Sidenote, it seems that the "mark as sensitive" feature, or whatever, is... no longer available in the desktop website's settings. I can't find it anywhere. That also might be playing havoc with downloading likes for all I know. Might break down and try the oauth approach at some point tonight/tomorrow. But that's its own can of worms that'll entail pulling in some library that can support oauth1.0a's HMAC signing mechanism on the requests. |
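If anyone wants to experiment with the OAuth route in the meantime, requests-oauthlib already does OAuth 1.0a HMAC-SHA1 signing. A rough sketch, not part of this PR; the keys and tokens are placeholders you would get from registering an app and completing the OAuth dance:

```python
import requests
from requests_oauthlib import OAuth1

# Placeholders: consumer key/secret come from your registered Tumblr app,
# token/secret from completing the OAuth 1.0a authorization flow.
auth = OAuth1(
    "consumer-key",
    "consumer-secret",
    "oauth-token",
    "oauth-token-secret",
    signature_method="HMAC-SHA1",
)

resp = requests.get(
    "https://api.tumblr.com/v2/blog/example.tumblr.com/likes",
    params={"limit": 20},
    auth=auth,
)
print(resp.json()["response"]["liked_count"])
```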
You need to install the youtube_dl Python module.
On Sun, Dec 16, 2018, 5:07 PM Orion G. wrote:

Hey guys, trying this out, and it works great so far, surpassing the ~950 files I get from bbolli's version.

I want to try it with --save-audio and --save-video this time, so I've got a question for you all: how do I install youtube-dl for Windows? Putting the .exe from the youtube-dl page in the same folder as the tumblr_backup.py file doesn't work (tumblr_backup.py throws an error saying youtube_dl is not installed), even though they're both in a PATH directory.
|
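To expand on that: the script imports the youtube_dl Python module, so the standalone youtube-dl.exe on PATH doesn't help. A quick check (a sketch, nothing specific to this PR) that the module is available to the same interpreter that runs tumblr_backup.py:

```python
# Run this with the same Python interpreter you use for tumblr_backup.py.
try:
    import youtube_dl
    print("youtube_dl module found, version", youtube_dl.version.__version__)
except ImportError:
    print("youtube_dl module missing; install it with: python -m pip install youtube_dl")
```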
@Soundsgoood, actually you shouldn't have had to reinstall Python.
|
According to Tumblr staff's latest post, things aren't getting deleted quite yet, but hidden from view. Perhaps historical explicit likes are still accessible to the user if logged in? I'll try to implement an oauth approach in the coming days to see if that's the case. |
@aggroskater Yeah, can confirm. Sensitive content is hidden by Tumblr's web front-end (and I assume it is the same for the mobile apps), but the API still returns the same results as before, including URLs to the image files or clips not visible in the browser anymore. |
@aggroskater I made another attempt at saving a few more likes today, but now I'm getting an error message:
(blog name changed for privacy) Did they block your API key by any chance? |
Apparently not... I tried it with my own key and the same thing happens. |
@Hrxn, I tried gallery-dl, which downloaded the photos from my posts just fine, but I can't figure out how to download likes. Can you give me a hint? |
@allefeld Well, gallery-dl selects the appropriate extractor by matching the given URL against specific URL patterns; this includes support for different variants. I think this should be enough here: |
That's what I thought, but I'm getting
though I used Pity, but thanks! |
Unfortunately, it is not possible to fetch all likes even with OAuth. I tried using gallery-dl to do that; you need to configure it to make it OAuth-enabled. I also modified its code to use the OAuth-secured API endpoint ( The same likes are also missing from the web interface ( Note: I have ~500 likes on Tumblr and currently I am able to grab only half of them (and this number diminishes over time). |
Well, I guess it's time to say goodbye to tumblr for good... |
FWIW there exists an unofficial I don't know if it's useful for grabbing likes. |
I get the following error when the script is almost finishing:
|
@sldx12 I was able to reproduce this error. The API must've changed because I know this fork worked before. Basically, the API no longer gives the script a pointer to the next (empty) batch of liked posts, but this fork still expects it to be there (and fails when it isn't). |
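In other words, the fork indexes into the response's _links/next pointer unconditionally. A defensive sketch of what the fix amounts to (assuming the standard v2 likes response shape, where _links.next.query_params carries the parameters for the following page):

```python
def next_query_params(response):
    """Return the query params for the next page of likes, or None when the
    API omits the pointer (the failure mode described above)."""
    next_link = (response.get("_links") or {}).get("next")
    if next_link is None:
        return None  # last page: stop paginating instead of raising KeyError
    return next_link.get("query_params")
```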
Based on PR bbolli#114 by @aggroskater
I'll take a look. I haven't edited my script locally much in the past two years, but it has been working as recently as yesterday. |
I just re-ran my script locally. No issues. Judging by the error message, I wonder if the
The |
@aggroskater
I made a test blog that can reproduce this issue - try |
@cebtenzzre When trying to run your most recent commit I stumble upon the following (I have tried to import it but couldn't do it):
I also tried your fix in
|
@cebtenzzre I was able to replicate the issue on the test blog you gave with my version of the script. I confirmed that my local script is basically the same as the one at my fork's master, just with a different API key and an added What you're describing is jogging my memory a bit though. I'm assuming when I first wrote this that the API would always have a I'm going to try again on my own blog after adding only one like. In theory the API should return one page with a single like, no |
Hmmm. I'm guessing that the reason it looks like it's "working" for me is because I was doing an incremental backup of likes, and not a full run. And the like backup is starting from the latest likes and working backward in time. And since I'm doing an incremental backup, I just stop after reaching I'm guessing at some point maybe the API would have given a "yeah, here's the link to 'before the earliest like ever', it's an empty page, have fun" response. And now it doesn't. Regardless, taking off the But it definitely looks like you've done a significant amount of work on the project, @cebtenzzre , so I'll probably look into your fork and/or the original @bbolli repo and see if there's a newer, better way for me to keep clinging to my tumblr likes :P |
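For what it's worth, the incremental behaviour described above can be sketched independently of the script: walk the likes newest-to-oldest and stop at the first one already saved on a previous run, so the missing "next" pointer at the very end is never reached. fetch_pages and save_post here are hypothetical helpers, not functions from this PR:

```python
def backup_likes_incrementally(fetch_pages, save_post, already_saved_ids):
    """Walk pages of likes from newest to oldest and stop at the first post
    that was saved on a previous run (hypothetical helpers, not PR code)."""
    for page in fetch_pages():            # each page is a list of liked posts
        for post in page:
            if post["id"] in already_saved_ids:
                return                    # everything older is already archived
            save_post(post)
```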
@sldx12 You'll have to download my fork as a zip or use |
@cebtenzzre Yeah, my bad, I've tried so much stuff that I get confused. I tried your version and it didn't download your likes, so I opened a new issue (here). |
@sldx12 Basically, there is no way to get that NameError unless you either deleted the line |
@cebtenzzre OK, never mind, I tried again. I got the same error as with your PR, only with different numbers. So whether you want to reply here or in the issue I created on your PR, it's the same to me; I just want it to work.
|
This doesn't actually have to be merged. It's more for anyone else who's looking to archive their years' worth of likes before Tumblr rm -rf's them into oblivion. I intended to make it cleaner, but given Tumblr's two-week deadline before they delete all NSFW content --and I'm sure plenty of other content that's going to get swept up in this accidentally-- I figured someone might find it useful in its present state.
The code is updated to archive all likes, with a ten second pause between API calls to try and avoid hitting API quotas. The previous code would only get the 1000 most recent likes.
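A minimal sketch of that approach (not the literal PR diff): walk backwards through the likes 20 at a time using the "before" timestamp of the oldest like in each batch, sleeping ten seconds between calls. The API key and blog name are placeholders.

```python
import json
import time
import urllib.request

API_KEY = "your-api-key"              # placeholder: use your own key
BLOG = "example.tumblr.com"           # placeholder blog name

def fetch_likes(before=None):
    url = f"https://api.tumblr.com/v2/blog/{BLOG}/likes?api_key={API_KEY}&limit=20"
    if before is not None:
        url += f"&before={before}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]

before = None
while True:
    posts = fetch_likes(before).get("liked_posts", [])
    if not posts:
        break                               # no older likes left
    for post in posts:
        print(post["post_url"])             # the real script downloads each liked post here
    oldest = posts[-1]["liked_timestamp"]   # timestamp of the oldest like in this batch
    if oldest == before:
        break                               # guard against repeating the same page forever
    before = oldest
    time.sleep(10)                          # back off between calls to avoid API quotas
```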
Potential remaining problems:
I was going to get around to those problems "eventually", but my philosophy is that I can also download the json payloads and sort it all out... "eventually".
I've been running this script like so:
Edit: Be sure that the outdir is not where you have your own blog's posts backed up, or you'll overwrite that HTML. As far as this script is concerned, a backup of a blog's posts and a backup of a blog's likes both render to the same HTML layout. So if you want to keep both, you need to separate the two into different paths.
I've saved 70 GiB worth of historical likes this way. Not everything is saved. Some posts caused youtube-dl to choke out (an annoying problem that never goes away given the frenetic update cadence that most video sites seem to adhere to). Some videos that are downloaded don't seem to want to play. And of course some of the liked content has been deleted over the years. But 70 GiB of mostly-good salvage of 30k+ likes is better than 0.
Obviously the likes have to be public, and if you want to gather everything --including stuff that has been NSFW tagged? Or something else? All I know is my original run didn't grab everything-- then the blog will need to, for the time of running the script at least, be marked as containing sensitive content. Or something. I don't remember all of the specifics. Just that a second pass after doing that yielded more results.
Also, I'd advise setting up your own app and using your own API key. Likes can only be iterated over 20 at a time, and if you have tens of thousands of likes, then you could conceivably go over hourly/daily limits. I don't know what quotas the public API key that the script is using has (maybe it's not rate limited?), but it'd probably be best to be a good neighbor and get your own API key to use with these tweaks.
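If you do register your own app, here's a one-request sanity check that the key works before kicking off a long run (assuming the standard v2 likes endpoint; the key and blog name are placeholders):

```python
import json
import urllib.request

API_KEY = "your-consumer-key"          # placeholder: from https://www.tumblr.com/oauth/apps
BLOG = "example.tumblr.com"            # placeholder blog name

url = f"https://api.tumblr.com/v2/blog/{BLOG}/likes?api_key={API_KEY}&limit=1"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)
print("API key works; blog reports", data["response"]["liked_count"], "likes")
```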