
Add script to update creation dates. #61

Draft · wants to merge 3 commits into base: main

Conversation

@ItsKaa (Contributor) commented Nov 9, 2024

Maybe a bit specific to my personal preferences, so feel free to decline.

Added a script that updates the creation dates in the database, since there's no API call for this. The media uploader writes a simple timestamp file, so the upload process won't be affected if the database connection fails. I had to forward some file paths through, because otherwise we only had access to the raw file data.

To get the correct modified time on the files, you'd need to set "mtime": true in the gallery-dl config file, but the script will first attempt to read the metadata file.

I'm running on Python 3.13 myself, so I also updated some packages to be compatible. It still runs fine in docker too.
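A minimal sketch of the sidecar-file approach described above (the function name, the `.timestamp` suffix, and the metadata/mtime fallback order are illustrative assumptions, not the PR's actual code):

```python
from pathlib import Path
from typing import Optional


def write_timestamp_file(media_path: str, out_dir: str,
                         metadata_ts: Optional[int] = None) -> Path:
    """Write a sidecar file holding the upload's creation timestamp.

    Prefers a timestamp taken from gallery-dl metadata; falls back to the
    file's modified time, which only matches the remote creation date when
    "mtime": true is set in the gallery-dl config.
    """
    src = Path(media_path)
    ts = metadata_ts if metadata_ts is not None else int(src.stat().st_mtime)
    sidecar = Path(out_dir) / (src.name + ".timestamp")
    # Plain text file: the DB update step can run later, so a failed
    # database connection never interrupts the upload itself.
    sidecar.write_text(str(ts))
    return sidecar
```

The point of the sidecar is exactly the decoupling mentioned above: the upload never touches the database, and update-db-timestamps can consume these files in a separate pass.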

@reluce (Owner) commented Nov 30, 2024

Interesting, just so I follow correctly: we're writing a timestamp file for each uploaded file to a configured directory, using the timestamp from gallery-dl's metadata (presumably the time the file was created at the remote URL). If no metadata is set, we use the modified date of the file itself.

And when calling the update-db-timestamps script, it reads the timestamp files from the configured directory and updates the timestamps in the database?

So the intention is to "mirror" the creation date from the remote source? And this would reshuffle only posts that were uploaded with upload-media, and only if the options were set in config.toml?

@ItsKaa (Contributor, Author) commented Nov 30, 2024

> Interesting, just so I follow correctly: we're writing a timestamp file for each uploaded file to a configured directory, using the timestamp from gallery-dl's metadata (presumably the time the file was created at the remote URL). If no metadata is set, we use the modified date of the file itself.
>
> And when calling the update-db-timestamps script, it reads the timestamp files from the configured directory and updates the timestamps in the database?
>
> So the intention is to "mirror" the creation date from the remote source? And this would reshuffle only posts that were uploaded with upload-media, and only if the options were set in config.toml?

Precisely. The modified time obviously won't mean much for import-from-url unless --mtime is used (or the config-file variation; see #62). For upload-media we likely rely only on the modified date, unless someone manually ran gallery-dl with metadata options (or did what I did: manually uploaded failed downloads after random crashes, though that hasn't happened to me recently with #60).

Szurubooru sorts on id by default, but when you use sort:date you'll get the mirrored creation dates.

This affects both import-from-url and upload-media, because import_from_url.py calls into upload_media.main, which is handy; this is also the reason for the src_path change. I've been running it for over a week now and it's been working well.

One thing that does come to mind is that the database must accept the timestamps in ISO-8601 format. I believe it always should, but certain locales may cause problems. If anything does go wrong because of that, the errors will only show up during the update-db-timestamps call.
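The locale concern above can be sidestepped by formatting the timestamp explicitly rather than relying on any default string conversion. A sketch (the table and column names in the comment are assumptions about the szurubooru schema, not taken from this PR):

```python
from datetime import datetime, timezone


def to_iso8601(unix_ts: int) -> str:
    # Render a Unix timestamp as an ISO-8601 UTC string. An explicit
    # strftime format is locale-independent, so the output is the same
    # regardless of the system locale.
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S")


# The value would then go into a parameterized UPDATE so the DB driver
# handles quoting, e.g.:
#   cursor.execute("UPDATE post SET creation_time = %s WHERE id = %s",
#                  (to_iso8601(ts), post_id))
```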

@reluce (Owner) commented Dec 1, 2024

I see, so the sorting is affected only when explicitly sorting by date.
The use case is indeed quite niche, but I think it's great for those who want the feature.

The thing is, writing a timestamp file to disk and reading it back later to apply it works, but it seems quite unintuitive from the user's side.

What I'm thinking: you could supply a query to that update script; the script would check the sources of each post returned by the query, fetch the creation date from there, and then update it in the database. In utils.py you can use search_boorus for that. Usually the post's ID is in the source URL, so you can use that for the search query.

Also, when using import-from-url, import-from-booru or auto-tagger, you could trigger the script/module as well (via a CLI/config flag) after the post has been uploaded, since the date is almost always present in the metadata anyway.

With this, the user only needs to specify the DB credentials and decide whether to go that extra step of updating the timestamps. And they'd also have that "backwards" compatibility for posts that were already uploaded.
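The ID extraction reluce suggests could be sketched like this (the URL patterns below are illustrative guesses at common booru URL shapes, not the toolkit's actual parsing, and extract_post_id is a hypothetical name):

```python
import re
from typing import Optional


def extract_post_id(source_url: str) -> Optional[str]:
    """Pull a numeric post ID out of a booru source URL, if present."""
    patterns = (
        r"[?&]id=(\d+)",   # e.g. .../index.php?page=post&s=view&id=456
        r"/posts?/(\d+)",  # e.g. .../posts/123
    )
    for pat in patterns:
        match = re.search(pat, source_url)
        if match:
            return match.group(1)
    return None  # source URL carries no recognizable post ID
```

The returned ID could then feed an `id:<n>` search against the source booru (e.g. via search_boorus in utils.py) to recover the remote creation date.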

@ItsKaa (Contributor, Author) commented Dec 1, 2024

That doesn't sound like a bad idea to me. My initial use case was a newly built instance, so I wasn't too worried about existing posts at the time, but I agree it would be cleaner to handle existing posts as well.

I'll convert this into a draft then. I'm not entirely sure when I'll have the time to update this either, but like you said, it's a niche feature anyway.

@ItsKaa ItsKaa marked this pull request as draft December 1, 2024 15:11