Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Look into the "create NULL values with JOIN using count(*)" difference attack #164

Open
yoid2000 opened this issue May 21, 2021 · 3 comments
Assignees

Comments

@yoid2000
Copy link
Collaborator

Cristian raised the following difference attack:

I was under the impression this attack was already described in another discussion. But to be more explicit:

We have 2 tables: purchases, with columns product_id and client_id, and clients, with columns id and ssn.
Column ssn is labeled as the AID. Assume the following data is present:

Clients:

id ssn
1 a
2 b
3 c
4 d
5 e
6 f

Purchases:

product_id client_id
1 1
2 2
3 3
4 4
5 5
1 6
2 6
3 6
4 6
5 6

If we issue a query like:

select count(*) from purchases join clients on client_id = clients.id

we get the following input to the count aggregator:

count ssn
1 a
1 b
1 c
1 d
1 e
5 f

If Ne=1 and Nt = 2, this results in a flattening of 4. The output will be 10 - 4 + noise(1) = 6 + noise(1).

For the query:

select count(*) from purchases left join clients on client_id = clients.id and ssn <> 'f'

we get the following input to the count aggregator:

count ssn
1 a
1 b
1 c
1 d
1 e
5 NULL

The last row is put aside as unaccounted.
If Ne=1 and Nt = 2, this results in a flattening of 0 for accounted rows. The output will be 5 - 0 + 5 - 0 + noise(1) = 10 + noise(1).

The difference in the results is large enough to dominate the noise, allowing us to know the entity with ssn = 'f' is an outlier and to detect other attributes of it by conditional inclusion/exclusion in queries.

@yoid2000 yoid2000 self-assigned this May 21, 2021
@sebastian
Copy link
Contributor

sebastian commented May 26, 2021

The last row is put aside as unaccounted.

The last row is not put aside as unaccounted for. The unaccounted for business only applies if there is another AID for us to attach the information to, which in this case there is not.

The output will be 5 - 0 + 5 - 0 + noise(1) = 10 + noise(1).

The result, in this example, would be: 5 + noise(1).

Presumably there would additionally be a noise layer associated with the negative condition excluding ssn f, or LED compensating for the low effect exclusion of AID values.


If you can exclude a victim, then this is clearly not safe...

I think a premise for this attack to work would be if we have a query that meets the following criteria:

  • with multiple AIDs (this results in the unaccounted for functionality kicking in) where at least one of the AIDs has an AID value for the offending row
  • a condition or query construct that excludes the victim that isn't detected or compensated for by LED.

Let's alter the above example such that both tables have an AID to make this attack work. TL;DR: The difference is now between 6 + noise(2) in the first query, and 10 + noise(2) (and presumably extra noise layers?) in the second query.

The tables would be as in the original example, only with product_id being an AID as well.

Clients:

id ssn (AID)
1 a
2 b
3 c
4 d
5 e
6 f

Purchases:

product_id (AID) client_id
1 1
2 2
3 3
4 4
5 5
1 6
2 6
3 6
4 6
5 6

Now the query select count(*) from purchases join clients on client_id = clients.id would yield:

Anonymized based on ssn:

ssn count Flattening by AID value
f 5 4
a 1
b 1
c 1
d 1
e 1

i.e. a count of 10 - 4 + noise(1) = 6 + noise(1)

Anonymized based on product_id:

product_id count Flattening by AID value
1 2 0
2 2
3 2
4 2
5 2

i.e. a count of 10 - 0 + noise(2) = 10 + noise(2)

The final anonymized result would be: 6 + noise(2)

For query select count(*) from purchases left join clients on client_id = clients.id and ssn <> 'f' would produce:

Anonymized based on ssn:

ssn count Flattening by AID value
a 1
b 1
c 1
d 1
e 1
unaccounted for 5

i.e. a count of 5 + 5 + noise(1) = 10 + noise(1)

Anonymized based on product_id:

product_id count Flattening by AID value
1 2
2 2
3 2
4 2
5 2

i.e. a count of 10 + noise(2)

The final anonymized result would therefore be 10 + noise(2)

@cristianberneanu
Copy link
Contributor

The example with a single AID was for illustrative purposes.
In many joins, there would be 2 or more AIDs, with the ones on the left side being non-null.
Plus, if we have an algorithm for handling null AID values, why wouldn't we use it consistently in all cases?

a condition or query construct that excludes the victim that isn't detected or compensated for by LED.

I don't think LED can compensate for not flattening outliers.

@yoid2000
Copy link
Collaborator Author

I don't think LED can compensate for not flattening outliers.

The design I've settled on will adjust for one AID, and this adjustment takes place before noise and flattening are computed. In fact it happens within the selectable containing the isolating condition.

So in the case of this attack, LED would insert the rows with ssn = f back into the computation, and there would be difference in the underlying data for both queries.

Can I suggest that we revisit this attack after LED is implemented in the reference implementation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants