Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topic Table Structure #10

Open
harnen opened this issue Apr 3, 2021 · 2 comments
Open

Topic Table Structure #10

harnen opened this issue Apr 3, 2021 · 2 comments
Assignees

Comments

@harnen
Copy link

harnen commented Apr 3, 2021

One of the most fundamental questions that we have is the structure of the topic table. We can assume that a topic table will have a large, fixed size (~50k). We should:

  1. fully utilize the available space
  2. provide fairness across multiple registrant and topic
  3. protect against all kinds of spamming attacks

The main tool that we have to our disposal is the waiting time that we return to registrars. By returning longer or shorter waiting times, we sort incoming requests and decide on the ads that are coming in first. Our initial discussion was to decide between fixed and non-fixed per-topic limit (e.g., we won't accept more than N registration for specific topic), but it seems that there are more things to take into account. Fixed limit doesn't satisfy 1) and 3) if an attacker uses multiple topics. With non-fixed, we have to be careful to provide 2) and avoid starvation of less popular topics.

Another issue to solve is assigning the waiting time in a way that prevents all the registrant, who are waiting to get in, from coming at the same time. If all the registrants do come at the same time (e.g., when a first slot frees up), we can admit only one of them, while the rest came for nothing generating a lot of overhead. On the other head, if we schedule only one node to come at the first available slot (and the rest one by one at later slots), the node might disappear leaving the table underutilized. It also seems difficult to fill this "hole in the queue". Either we contact all the waiting nodes and tell them to come one slot earlier (probably unfeasible and requires keeping a lot of state on the registrar) or we admit without some node that comes to ask for the ticket for the first time (may be exploited by free raiders that will constantly ping registrars hoping to get in without waiting). Finally, one last solution is to schedule more nodes at slots opening at the end of the queue. This will safely fill "the whole", but with a delay (we have to wait until all the initial nodes get in).

Currently, we are leaning towards a 2 step approach for assigning waiting times:

  • determine the order by which we want to admit requests based on the diversity score - the more different the request (in terms of IP, topic, ID) is from what we already have in our table, the higher the score (see details here)
  • assign slots freeing up based on the order from above

If our table is full, but 3 ads are about to expire in 7s, 12s, and 18s; we'll order all the waiting requests by the diversity score and tell the first one to come in 7s, the second one in 12s and the third one in 18s. This should work, but has one major problem - what if there is a queue of 100 waiting registrants with already assigned slots, but suddenly, there are some new nodes coming, with high diversity score, that should be admitted before everyone else? This is exactly the same problem as filling the whole in the paragraph above.

One final question is "when should we start giving tickets with waiting times instead of admitting registrants right away?". Currently, we give tickets only when the queue is full. However, this leads to a straightforward attack, where a malicious nodes spams requests and will be admitted every single time without waiting. The diversity score will kick in when the table is full and put all the legitimate nodes at the beginning of the queue (pushing the attacker at the end), but at this time, the damage has already been done. It will take long time for all the malicious ads to expire and the attacker can repeat his actions as soon as there's no queue of waiting registrants. We probably have to switch to giving ticket even if the table is not full. It causes an additional round-trip, but it's probably worth it.

@harnen
Copy link
Author

harnen commented Apr 6, 2021

Some comments after discussing with Etienne:

  1. If we start giving waiting times before the table gets full, it give us a buffer to work with
  2. There's a chance that new requests getting at the beginning of the queue and disappearing registrants will even up
  3. The waiting time should be based on table occupancy + request diversity score
  4. We might use the initial walk used to join the DHT to bootstrap our table with some registrations (e.g., by asking each encountered node for its topics)
  5. The cost of entry for topic announcements from the same IP (or announcements for the same topic) be mathematically designed to increase polynomially (i.e., not linearly) with the number of entries currently in the table for the same IP/topic
  6. We need to make sure that we don't fully block users who share the same data centre on AWS (and thus IP poll) from registering

@harnen
Copy link
Author

harnen commented Apr 21, 2021

We should prevent nodes from keep coming back hoping for better waiting time. I.e., we should guarantee that the same node, coming back multiple times, won't be admitted faster than initially indicated.

In the current approach this can happen: if a node comes initially it receives an initial waiting time based on the current state of the table. After a few second multiple ads present in the table expire thus lowering average waiting time for newcomers. So the node can come again and again hoping for some ads expire.

We can take into account expiring ads. However, there might be also other registrant that could get admitted faster than node in question (thus inflating the waiting time of the original registrant).

Another approach is to make sure that the initial node, presenting itself as a new node and asking for a new waiting time, looses more on abandoning his original cumulative waiting time then he can get from other ads expiring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants