Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Person name + instance id is not a unique constraint #5045

Open
5 tasks done
MrKaplan-lw opened this issue Sep 23, 2024 · 8 comments
Open
5 tasks done

[Bug]: Person name + instance id is not a unique constraint #5045

MrKaplan-lw opened this issue Sep 23, 2024 · 8 comments
Labels
area: federation support federation via activitypub bug Something isn't working

Comments

@MrKaplan-lw
Copy link

MrKaplan-lw commented Sep 23, 2024

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Do you agree to follow the rules in our Code of Conduct?
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

Various places assume that the combination of instance_id (typically used as instance domain, but on DB level this would be the instance ID field) and username is a unique set, similar to the actor ID, but this is not actually enforced and can lead to duplicate persons in the DB.

Steps to Reproduce

Unknown at this point, was likely due to incorrect configuration on the remote side generating HTTP URLs instead of HTTPS URLs for actor IDs temporarily.

Technical Details

0.19.3 does not have a unique constraint for the combination of instance_id + name on the person table.
As far as I can tell this is the same in 0.19.5, as the list of indexes is the same for both versions:

Indexes:
    "person__pkey" PRIMARY KEY, btree (id)
    "idx_person_actor_id" UNIQUE CONSTRAINT, btree (actor_id)
    "idx_person_local_instance" btree (local DESC, instance_id)
    "idx_person_lower_actor_id" UNIQUE, btree (lower(actor_id::text))
    "idx_person_lower_name" btree (lower(name::text))
    "idx_person_published" btree (published DESC)
    "idx_person_trigram" gin (name gin_trgm_ops, display_name gin_trgm_ops)
lemmy=# select person.id, name, instance_id, actor_id from person join instance on instance.id = person.instance_id where name = 'Badabinski' and instance.domain = 'kbin.earth';
   id    |    name    | instance_id |            actor_id
---------+------------+-------------+---------------------------------
 9532845 | Badabinski |        6269 | http://kbin.earth/u/Badabinski
 8299094 | Badabinski |        6269 | https://kbin.earth/u/Badabinski
(2 rows)

Comments by the user are still showing up on lemmy.world, such as https://lemmy.world/comment/12501998 and also when looking at the post, but they are not associated with the person that is returned when looking at https://lemmy.world/u/[email protected].

See also https://lemmy.world/api/v3/user?person_id=8299094 and https://lemmy.world/api/v3/user?person_id=9532845.

Version

0.19.3

Lemmy Instance URL

lemmy.world

@MrKaplan-lw MrKaplan-lw added the bug Something isn't working label Sep 23, 2024
@MrKaplan-lw MrKaplan-lw changed the title [Bug]: Person name + instance id is not an unique constraint [Bug]: Person name + instance id is not a unique constraint Sep 23, 2024
@Nutomic
Copy link
Member

Nutomic commented Sep 24, 2024

Lemmy should never fetch anything from an HTTP URL in production mode, this is checked here. And here is the relevant fetch code used by Lemmy 0.19.3 which correctly calls the check.

It seems like this was caused by a misconfiguration on kbin.earth:

  • Either the instance wrongly redirected federation fetches from HTTPS to HTTP
  • Or when fetching data via HTTPS, it would return json containing an HTTP url (a check for this was added in Lemmy 0.19.4)

Can you find out if there are any users from other instances with the same problem? Or ask the admin of kbin.earth if there was such a misconfiguration in the past? At the moment it looks correct.

@MrKaplan-lw
Copy link
Author

I talked to jwr1 on Matrix and there were some config issues a while back related to HTTP/HTTPS and certificates, and during the troubleshooting/mitigation attempts there may have been responses like that indeed.
These duplicated persons are not occurring for recently registered kbin.earth users that signed up after those issues.

With this query I can actually find quite a few more affected users, showing a total of 210 rows:

with duplicate_users as (
    select lower(name) as name, domain
    from person
    join instance on instance.id = person.instance_id
    group by lower(name), domain
    having count(*) > 1
)

select person.id, actor_id, person.name, instance.domain
from person
join instance on instance.id = person.instance_id
join duplicate_users on lower(person.name) = duplicate_users.name and instance.domain = duplicate_users.domain
order by domain asc, name asc, person.id asc

I can see the following types of duplicates in our DB:

@Nutomic Nutomic added the area: federation support federation via activitypub label Sep 24, 2024
@Nutomic
Copy link
Member

Nutomic commented Sep 24, 2024

Most of these should be fixed by the url equality check I linked above. The remaining ones all seem to be caused by instances or users being deleted and then recreated. Though its also possible that a platform would allow multiple local users with the same name as long as they have different urls.

Can you point out where the assumption is made that (name, instance_id) is unique? That seems to be the real problem, but I cant think of any such case.

@MrKaplan-lw
Copy link
Author

The prime example would be lemmy referencing profiles of remote users as name@domain.
IDK is AP allows non-unique names on the same domain, if it does then it seems to be a rather bad idea (yet something done by most AP software I think?) to be referencing users as @user@domain, in Lemmy URLs as /u/user@domain, as there is no way to determine which user this would match.

@Nutomic
Copy link
Member

Nutomic commented Sep 24, 2024

True, that is a simplification to improve usability. Lemmy itself has duplicate usernames when seen from other platforms, as you can have both actors /u/test and /c/test with the same name. Activitypub only specifies that the id url is unique (section 3.1), but the preferredUsername has no uniqueness guarantees (section 4.1).

The only solution I can think of would be to reference remote users by the actor_id, eg /u/social.graves.cl/users/98vargwkp7 or using the db id like /u/[email protected]. But both of those look ugly and will be confusing for new users.

@dessalines
Copy link
Member

If this isn't possible in newer versions anyway, maybe instead of adding a constraint, we could do a migration that uses your query above to delete the duplicates (we've had to do migrations to do that before).

It could potentially be dangerous adding (person name + instance_id) uniqueness, since as @Nutomic said, person name doesn't have to be unique, only the actor_id.

@Nutomic
Copy link
Member

Nutomic commented Sep 25, 2024

A database constraint is not an ideal solution for federation because there can legitimately be multiple accounts with the same name and with their own post histories (eg instance deleted and recreated). It would then be impossible for these accounts to federate with Lemmy. However this is an uncommon case and we may decide not to support it. So then we would add the unique(name, instance_id) constraint and rename duplicate accounts.

This is essentially the same issue as #5052 which other Fediverse platforms experience with Lemmy, as Lemmy can have user and community with the same name. So to be consistent we should resolve that in the same way, by prohibiting creation of users and communities with the same name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: federation support federation via activitypub bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants