Spotlight search using data in MySQL CNID DB #1409
Comments
The complexity / fragility argument I definitely buy. It's a lot of scaffolding for this one feature. I wonder though if the metadata search can be tweaked to make it more generally useful? Do you have examples from the wild of "useless" metadata filtering?
Regarding the MySQL proposal: Do you have a POC lined up already that demonstrates how this can be set up with the existing CNID backend? It's not immediately obvious to me how the CNID backend can be plugged in as a Spotlight backend as-is, but perhaps it is straightforward.
As a counter-argument to using MySQL for this purpose: We still arguably end up with a "difficult" solution, since setting up and administering a MySQL database is non-trivial for most users. Can a simpler database be used for this, e.g. SQLite?
Finally, as yet another path forward: Build a fully indexed Spotlight backend using Elasticsearch. This is what Samba has done. And since Samba and Netatalk share DNA, it might even be fairly easy to port over their solution.
Perhaps an overload of options in one comment. My apologies. :) |
This already exists for dbd afaict
|
Hi @rdmark, I think you are underselling how easy Netatalk makes it :) Thanks to all the hard work by the Netatalk contributors, getting the MySQL backend working is simple.
For example, for a production setup on FreeBSD and a ZFS pool with an NVMe Special Device/Vdev:
Create a ZFS Vol for MySQL data dir;
If you don't have a special device vdev, remove
Install MySQL
I personally recommend not changing the default MySQL data dir, as changing the default location can be painful. Instead, just mount the ZFS volume over the top of /var/db/mysql as above with
Run insecure setup script and set root password
Connect to MySQL and create cnid DB.
Configure Netatalk to use MySQL
Arguably the hardest part about this is the afp.conf config:
https://netatalk.io/3.2/htmldocs/configuration - says very little, and does not mention anything about how Netatalk does the hard work for you of creating the DB schema etc.
https://netatalk.io/3.2/htmldocs/afp.conf.5 - lists the 4 Global options under Miscellaneous Options, but does not give you any hint that you also need an associated setting for each of the volumes (this caught me out for ages).
Ie, to search for /usr/local/etc/afp.conf
Start/Restart Netatalk
Netatalk will create the required MySQL tables and schema automatically :)
Rebuild/populate the CNID DB on MySQL
https://netatalk.io/3.0/htmldocs/dbd.1
With the default CNID backend, this takes several hours for a share with a few million objects.
Regarding "We still arguably end up with a "difficult" solution since setting up and administering a MySQL database is non-trivial for most users.": I don't see this. If a user can set up a Linux file server with Netatalk, they can also set up MySQL as above. For example, the default tuning values for MySQL 8+ allow for a Netatalk share with several million files, long before you have to increase any memory limits or do any fancy tuning. And even if you do, there are tools like
To be fair, you might have to fiddle with the boot scripts to ensure that the ZFS Vol is mounted before MySQL starts, and that Netatalk starts last, after MySQL. And users should probably have MySQL listen on 127.0.0.1 only.
Regarding use cases: I would guess that most users are using Netatalk for media file shares, where filename search is more useful. Simple search for a simple file share. This is what brought me to the idea of just using the MySQL backend for the simple case. It already has all the filenames, it is fast, and it is updated in real time (as soon as a file is created via Netatalk, it can be found in the MySQL backend).
NB: For files added to the share volumes not via Netatalk, I run the
Regarding a POC: Would you like me to play with some example SQL queries against the cnid DB?
Thanks for your consideration |
Looks like it could be as simple as;
Ie, replacing the '*' wildcard in the Spotlight query with '%' should do the trick, and it is case insensitive too 🙂
There is no index on the Name field (obviously), so it should be O(n) time relative to table length.
Keen to hear your thoughts. |
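To make the wildcard substitution above concrete, here is a minimal sketch in Python. It is not Netatalk code: it uses an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere, and the table name `cnid_demo`, the helper names, and the reduced column set (`Id`, `Name`, `Did`) are assumptions made for illustration only. It also escapes literal `%` and `_` characters in the filename pattern, which a plain `*` to `%` swap would otherwise treat as wildcards.

```python
import sqlite3

ESCAPE_CHAR = "!"  # arbitrary choice for this sketch; must match the ESCAPE clause below


def spotlight_to_like(pattern: str) -> str:
    """Translate a '*' wildcard pattern into a SQL LIKE pattern."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")                # '*' becomes the LIKE multi-character wildcard
        elif ch in ("%", "_", ESCAPE_CHAR):
            out.append(ESCAPE_CHAR + ch)   # keep LIKE metacharacters literal
        else:
            out.append(ch)
    return "".join(out)


def search_names(conn, table: str, pattern: str):
    """Return (Id, Did, Name) rows whose Name matches the wildcard pattern."""
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name")
    # Note: sqlite3 uses '?' placeholders; MySQL drivers typically use '%s'.
    sql = (
        f"SELECT Id, Did, Name FROM `{table}` "
        f"WHERE Name LIKE ? ESCAPE '{ESCAPE_CHAR}' ORDER BY Id"
    )
    return conn.execute(sql, (spotlight_to_like(pattern),)).fetchall()


def demo_connection():
    """Build a tiny hypothetical CNID-like table for experimentation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cnid_demo ("
        "Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Did INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO cnid_demo (Id, Name, Did) VALUES (?, ?, ?)",
        [
            (17, "Holiday_2023.mov", 2),
            (18, "holiday notes.txt", 2),
            (19, "Budget 100%.xlsx", 2),
            (20, "HolidayX2023.mov", 2),
        ],
    )
    return conn
```

Calling `search_names(demo_connection(), "cnid_demo", "*holiday*")` does a case-insensitive substring match in SQLite. In MySQL, case sensitivity depends on the column collation, so that behaviour would need to be checked against the real CNID tables. Because the pattern starts with a wildcard, the query scans every row, which matches the O(n) expectation stated above.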
Impressive research! You say "simple", and then proceed with 3 screens worth of instructions. ;)
Joke aside, I get that it's not "hard" per se, if you're familiar with operating a system like this. A good chunk of Netatalk users today are hobbyists who often don't have sysadmin experience, which is why I'm trying to streamline the installation process as much as possible.
The MySQL backend has a documentation deficit, for certain. Would you be open to filing a PR that adds the example, and any additional context that you think would help? The docs are defined in XML format here: https://github.com/Netatalk/netatalk/tree/main/doc
I do love the real-time updates and fast search of your proposed solution. It's definitely more practically useful than Tracker. What would be the next step? |
Haha, I knew you would make a comment about the length of my example. 😉 It started out short.. Absolutely! I would be happy to build a documentation PR. I can provide some narrative about the auto-setup goodness, and a complete afp.conf example. The next steps are trickier for me, as I have not written any C for over 20 years.. |
I am frankly an amateur myself but am of course offering to assist to the best of my ability. In fact, I would say that the last 3 years of this project have been characterized by a bunch of non-C devs (re)learning C and figuring things out as we go along. :) |
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
I have seen very few, if any, truly useful cases in production of searching via the metadata that Tracker provides. The vast majority of searches are just filename searches, and the metadata often just adds noise to the results. As a result you often have to configure the tracker to not scan for extra data anyway.
A simple name-only search using the existing MySQL CNID database would be fast, simple, stable and efficient.
It would also allow for instant Spotlight searches without having to wait for the index scans to catch up.