
Privacy and Purpose Constraints #15

Open

eriktaubeneck opened this issue Oct 19, 2022 · 17 comments

Comments

@eriktaubeneck
Collaborator

eriktaubeneck commented Oct 19, 2022

From the introduction:

In the presence of this adversary, APIs should aim to achieve the following goals:
- Privacy: Clients (and, more specifically, the vendors who distribute the clients) trust that (within the threat models), the API is purpose constrained. That is, all parties learn nothing beyond the intended result (e.g., a differentially private aggregation function computed over the client inputs).
- Correctness: Parties receiving the intended result trust that the protocol is executed correctly. Moreover, the amount that a result can be skewed by malicious input is bounded and known.

I suggest instead of Privacy, we make the first bullet:

Purpose Limitation: User-agents are reasonably assured that the API is purpose constrained such that no party can acquire data outputs other than what is intended and expected by the user-agent, given the inputs it provides.

Add bullets for verifiable input and auditability:

Verifiable Input: Parties using the API are reasonably assured that data provided by user-agents is accurate, reliable and honest.

Auditability: Parties providing data to, or receiving data from, the API can receive reports identifying: when, how, by whom and to whom data was communicated; and when, how and by whom data was processed.

Correctness: Parties receiving the intended result can verify that the protocol is executed correctly and that the amount a result can be skewed, whether intentionally by adding noise or by malicious input, is bounded, known and reported.

Originally posted by @bmayd in #14 (comment). Moving to a new issue since it's a distinct issue from the PR opened in issue 14.
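As a purely illustrative aside on the "bounded, known" skew language in the proposed Correctness bullet above: a minimal sketch, assuming a hypothetical `dp_sum` helper with made-up parameter values (nothing here is drawn from an actual proposal), might look like the following. Clamping each contribution bounds how far any single malicious input can move the result, and the Laplace noise scale is a known function of that clamp and epsilon, so both sources of skew can be reported alongside the output.

```python
import numpy as np

def dp_sum(client_values, epsilon=1.0, clamp=1.0):
    """Illustrative differentially private sum (hypothetical, not from any spec).

    Each contribution is clamped to [0, clamp], so a single malicious
    input can skew the true sum by at most `clamp` -- a bounded, known
    amount. Laplace noise with scale clamp / epsilon is then added, so
    the distortion introduced intentionally is also bounded in
    expectation and can be reported alongside the result.
    """
    clamped = [min(max(float(v), 0.0), clamp) for v in client_values]
    true_sum = sum(clamped)
    noise = np.random.laplace(loc=0.0, scale=clamp / epsilon)
    return true_sum + noise

# Example: 1000 honest reports plus one adversarial report of 10**6;
# clamping limits the adversarial report's influence to at most 1.0.
reports = [0.5] * 1000 + [1e6]
print(dp_sum(reports, epsilon=1.0, clamp=1.0))
```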

@npdoty
Collaborator

npdoty commented Oct 19, 2022

This reads like an uncommon use of "purpose limitation". Assurance that no party can acquire data of a certain kind would be access control (or data minimization). Purpose limitation typically refers to an attestation that even though a party can access some data, they will limit the purposes that they use it for.

We should pursue both: limiting the data that's accessible and getting promises that the data that is shared is only used for the specific purpose for which it is provided. But I think we shouldn't confuse terms of art.

@eriktaubeneck
Collaborator Author

@bmayd - thanks for the suggestion. Splitting out "Purpose Limitation" and "Privacy" makes sense to me, as "Purpose Limitation" isn't de facto private. I don't think we can drop Privacy entirely though.

I'm less sure about adding "Verifiable Input" and "Auditability". "Verifiable Input" seems like a tactic for achieving correctness, and "Auditability" seems like a tactic for trusting that a system is purpose limited. Curious what others think here.

@eriktaubeneck
Collaborator Author

@npdoty, I'm not sure it's that far off, though the use of "data outputs" might be a bit confusing, and "information" would be a bit more clear. Something like:

Purpose Limited: The API (and surrounding system) provides sufficient guarantees to allow the user agent to trust that, given the API's output, no party can learn any information beyond the intended result.

@npdoty
Collaborator

npdoty commented Oct 19, 2022

I support the intent of your statement. You may be interested in a goal of developing functionality that will only produce a limited set of data, and that restricting the output data is the intended purpose of the system.

However, "purpose limitation" is a term for a well-known concept in privacy law that refers to something different, that of a party documenting in advance what purpose it will use data for, promising only to use it for that purpose, and collecting data that can be used for multiple things but only using it for that limited purpose.

@eriktaubeneck
Collaborator Author

I believe we are utilizing that well-known concept in privacy law intentionally. The only difference is, because we are working on a web standard, and not a law with some form of enforcement, we cannot rely on a promise and must instead rely on technical guarantees.

The standard documents in advance the purpose for which the data will be used (i.e., attribution measurement); data is collected which, without the technical constraints, could be used for more than that purpose; and the data is then only used for that limited purpose (because of the technical constraints).

@bmayd

bmayd commented Oct 20, 2022

This reads like an uncommon use of "purpose limitation".

@npdoty Appreciate your callout; I agree that terms of art should not be confused, as that leads to unnecessary disambiguation overhead. What do you think of "Purpose Constraint" as an alternative?

We should pursue both: limiting the data that's accessible and getting promises that the data that is shared is only used for the specific purpose for which it is provided.

Although I think both are worth pursuing, I think this group ought to consider promises out of scope. However, technical and data-based evidence that can be used to affirm or refute promise claims should be in scope. For example, we can't enforce a promise that data won't be tampered with, but we can sign the data so that tampering can be detected.
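To make that last point concrete, here is a minimal sketch (stdlib-only Python; the key, payload, and helper names are hypothetical and not part of any proposal) of signing data so that tampering can be detected, even though it cannot be prevented:

```python
import hashlib
import hmac

# Hypothetical shared key; a real design would more likely rely on asymmetric
# signatures or anonymous credentials rather than a shared secret.
SHARED_KEY = b"example-key-known-to-signer-and-verifier"

def sign(payload: bytes) -> bytes:
    """Produce a MAC over the payload so later tampering is detectable."""
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()

def verify(payload: bytes, tag: bytes) -> bool:
    """Return True only if the payload still matches the tag it was signed with."""
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

report = b'{"conversion": 1, "value": 20}'  # hypothetical report payload
tag = sign(report)
assert verify(report, tag)  # untouched data verifies
assert not verify(report.replace(b'"value": 20', b'"value": 90'), tag)  # tampering detected
```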

@eriktaubeneck
Collaborator Author

@bmayd I disagree that this causes confusion or leads to unnecessary disambiguation overhead. I am using this term of art intentionally. Here's a definition for purpose limitation:

A principle that data collected for one specified purpose should not be used for a new, incompatible purpose. The term purpose limitation may have a specific definition in certain jurisdictions. Under the General Data Protection Regulation (GDPR), for example, purpose limitation is a requirement that personal data be collected for specified, explicit, and legitimate purposes, and not be processed further in a manner incompatible with those purposes (Article 5(1)(b), GDPR).

This is our intention, with the only addition being that we are aiming to additionally provide technical guarantees to enforce it. @npdoty are you arguing that adding these technical guarantees makes this definition no longer compatible?

@npdoty
Collaborator

npdoty commented Oct 20, 2022

@eriktaubeneck yes, I still find the use of the term here confusing and in conflict with its meaning elsewhere.

Typically, I think a purpose in this context would be "measurement of purchases resulting from advertising" not "to calculate a result of a certain class of function in aggregate form with a differentially private guarantee". The purpose of the data collected or how it's being used is not specified in explicit terms to the user as part of the technical guarantee of the cryptographic design of the system.

It can be very helpful to privacy goals to provide some technical guarantees. Existing data protection and privacy law is familiar with the concept of providing technical guarantees. For example, the principle in the GDPR following purpose limitation is data minimisation (sic), which includes limiting the data that is collected or processed by a party (for example, the data output of a privacy-preserving measurement aggregate calculation). That's the proposed contribution of this work -- that the data output is minimized such that the recipient never learns or accesses the individual user data at all.

Providing technical designs that limit the amount of output data, and thus how it can subsequently be used in ways that might harm people's privacy, is a valuable contribution; it's just something we use different terms for than purpose limitation. Generally we would like to design Web technology that is more closely driven by use cases and not so easily re-used or abused for other purposes, and we should come up with some terminology for that. I've occasionally used "fit for purpose" in describing that idea informally, but that's also not quite right.

@eriktaubeneck
Collaborator Author

Typically, I think a purpose in this context would be "measurement of purchases resulting from advertising" not "to calculate a result of a certain class of function in aggregate form with a differentially private guarantee".

This makes sense; however, I think we are mixing up the threat model for the PATCG/WG and the specific private measurement spec. For the threat model, our goal is to consider proposals which offer technical means of purpose limitation (for lack of a better name at the moment). For the private measurement spec, the purpose is something like "differentially private measurement of conversions resulting from advertising impressions."

In that sense, the goal of the threat model is to help evaluate whether a specific proposal actually provides purpose limitation of its specified purpose through technical guarantees.

The purpose of the data collected or how it's being used is not specified in explicit terms to the user as part of the technical guarantee of the cryptographic design of the system.

I think this depends on what is meant by "the data collected". For the private measurement spec, if it's the (typically encrypted) data leaving the client, then I'd argue that it is in fact specified as the purpose above (and the system limits it to that). If "the data collected" is the aggregates which come out of the private computation system, then I agree that it's not specified. That said, I think you can always make that argument and it's turtles all the way down.

@bmayd

bmayd commented Oct 20, 2022

...I don't think we can drop Privacy entirely though.

@eriktaubeneck I find Privacy difficult to make meaningful or verifiable assertions about and was endeavoring to find an alternative that could be technically addressed. APIs can make assertions about, and report on, data inputs, processing and outputs, but regarding:

...all parties learn nothing beyond the intended result...

I think suggesting what might or might not be learned is problematic -- an API can only know what data is reported; it can't know how the data is applied, and in most cases the intent is presumably to use the outputs to inform an understanding of a larger context. The question then becomes: are there cases in which increased understanding of the larger context could constitute a violation of privacy? I don't think that question can be answered in the context of the API or a standard; rather, I see it as a policy matter.

I'm less sure about adding "Verifiable Input" and "Auditability". "Verifiable Input" seems like a tactic for achieving correctness, and "Auditability" seems like a tactic for trusting that a system is purpose limited. Curious what others think here.

In the context of the initial statement:

In the presence of this adversary, APIs should aim to achieve the following goals:

I was including them because I believe they are goals that must be achieved in order for the model to be considered reliable and trustworthy in the face of adversaries seeking to corrupt it or apply it in violation of stated terms.

@csharrison
Collaborator

I agree with @npdoty that "purpose limitation" is confusing in this context. I think the definition of "purpose" is broad enough to extend to post-processing of the private output of the API, which we probably don't want to constrain once it's outside of the API's protection.

@alextcone

I sense the miscommunication here is due to relativity. An API like ARA or IPA built for attribution is relatively much more purpose constrained than an API like document.cookie. That said, purposes can always be considered in a more granular fashion. For example: I want to understand how many widget buyers also drink wine, red wine, red wine in Tuscany, red wine in Montalcino during July, etc... Then you get into how that knowledge might be used, and you approach near-infinite combinations.

I suggest that it is a noble aim to set ourselves on a course for purpose limitations even if the purposes are still quite broad.

@bmayd

bmayd commented Oct 24, 2022

Having read through the various responses above and looked at definitions of "purpose limitation" online, I'm still inclined to agree with @npdoty that this is an unconventional use of a term that has a domain-specific meaning and as such could be confusing.

His description:

... "purpose limitation" is a term for a well-known concept in privacy law that refers to ... a party documenting in advance what purpose it will use data for, promising only to use it for that purpose, and collecting data that can be used for multiple things but only using it for that limited purpose.

In other words, it refers to data reuse that is possible, but should not happen, while in our context the intent is to identify that data reuse is not possible and cannot happen due to technical limitations.

(Is this bike shedding?)

@jaylett-annalect

I'm not sure this is bike shedding - I think clarity around the way we talk to each other is critical for having confidence that, when we say we agree, we are actually in agreement. In this vein, avoiding the use of any term in a way such that any significant contributor to the discussion believes there is conflation or confusion seems the right thing to do.

What I believe we're talking about is technical measures in the design of the system (and crucially its data outputs) which inhibit the possible purposes to which data that exits the system can be put.

We need a fairly short and clear phrase for this if it's going to be a criterion we want to assess proposals against. One reason not to conflate it with the common understanding of purpose limitation as articulated above is that we may also want to leverage that concept (for instance by recommending that implementing parties make suitable public attestations, even auditably) where we cannot inhibit certain unwanted uses for data outputs by technical means and system / protocol design alone.

@bmayd

bmayd commented Oct 25, 2022

I think clarity around the way we talk to each other is critical for having confidence that, when we say we agree, we are actually in agreement. In this vein, avoiding the use of any term in a way such that any significant contributor to the discussion believes there is conflation or confusion seems the right thing to do.

Yup, I think making a concerted effort to be specific, concise and consistent when defining key terms within the problem domain leads to more productive discussions and better, faster alignment. Perhaps "usage constrained"?

@martinthomson
Collaborator

technical measures in the design of the system (and crucially its data outputs) which inhibit the possible purposes to which data that exits the system can be put.

I like this, though I might avoid using "purposes" and instead use something like "application". "technical proscription of data application" could just be "technical application proscription" or even "proscription" in colloquial usage.

@jaylett-annalect

I might avoid using "purposes"

+1

and instead use something like "application". "technical proscription of data application" could just be "technical application proscription" or even "proscription" in colloquial usage.

The reason I used 'inhibition' is that it accepts that although some things can be completely prevented by technical design, some things may only be possible to make harder (and ideally less powerful also). I'm not sure 'proscription' conveys that - 'constrained' would work for me. However at this point we may now indeed be bike shedding, so I won't object if others are happy.
