
Specifying Cypher constraints #166

Open: wants to merge 42 commits into base: master


@Mats-SX (Member) commented Dec 15, 2016

Specifies the syntax for constraints, as well as three concrete constraints: node property uniqueness, node property existence, and relationship property existence.

CIP

@petraselmer left a comment:

Looks good! Some comments...

cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc (10 outdated review threads, resolved)
@thobe (Contributor) commented Feb 28, 2017

I thought the idea was that we were going to specify a syntax for constraints without specifying which particular constraints an implementation should support. The syntax definition here explicitly talks only about uniqueness-constraint and existence-constraint.

@Mats-SX (Member, Author) commented Mar 1, 2017

The CIP has now been reworked to fit the model discussed.

cip/1.accepted/CIP2016-12-14-Constraint-syntax.adoc (3 outdated review threads, resolved)
* `UNIQUE` - Ensures that each row has a unique value for the column.
* `PRIMARY KEY` - A combination of `NOT NULL` and `UNIQUE`. Ensures that a column (or a combination of two or more columns) has a unique identity, reducing the resources required to locate a specific record in a table.
* `FOREIGN KEY` - Ensures referential integrity: values in one table must match values in another table.
* `CHECK` - Ensures that the value in a column meets a specific condition.
A contributor commented:

Maybe we should have a leading keyword like that for non-UNIQUE constraints as well?

A contributor replied:

If so, I would propose the word THAT, since it reads nicely:

CREATE CONSTRAINT FOR (x:Foo) REQUIRE THAT ...

@Mats-SX force-pushed the constraints-cip branch 3 times, most recently from 4121971 to 22a7f30, March 6, 2017 08:21
@thobe (Contributor) commented Mar 7, 2017

PRIMARY KEY is not a helpful name for the concept it is used for describing.

At a high level, there are two reasons for this:

  1. The word "primary" does not mean anything that is helpful in this context.
  2. The concept of primary keys carries with it a lot of associations from relational databases, many of which do not apply to the property graph model.
    • It should be noted that in the relational database case the word "primary" does have a relevant meaning.

Diving further into these reasons, starting with the word "primary":

  • It implies that there is such a thing as a "secondary" key as well. In a relational database a secondary key is any index on a table other than the primary key.
  • It implies that this key has higher importance than any other key. While this might be true in many actual domain models, it is not always the case - in some cases there are other keys of equal importance.
  • It carries with it the association that there can be only one primary key. In many implementations - Neo4j being one of them - there is absolutely no need to enforce such a constraint on the ability to model your data.
  • It implies that the key needs to be defined first, before any data is inserted. In many implementations (Neo4j being one of them) this is not the case.

As for the aspects of primary keys in relational databases that do not apply to the property graph model:

  • The notion of there being a primary key implies that there might also be a foreign key - the idea of having foreign keys in a graph is quite silly, since we have direct relationships.
  • Coming from relational databases, I would expect preferential treatment for primary keys over any other (secondary) keys. I would expect lookup by primary key to be faster than by any other key, since I would expect the data in the table to be structured by the primary key. In essence, I would expect the leaf nodes of the primary-key index to be the actual rows, with all of their data, whereas for a secondary key the leaf node would only point to the row in the primary structure. Accessing the full data by a secondary key would therefore involve an indirection, penalizing access by secondary key.
    • Again, this would not be true in, for example, Neo4j, where every key is actually secondary.
  • Relational databases equate the primary key with identity. The property graph (at least in some implementations, for example Neo4j) has a separate notion of identity. While the type of key we are proposing to add to the model would allow you to uniquely identify an entity, it would not necessarily identify that same entity forever: the entity might change some of its key values and thus be identified by a different key, while still having the same identity.
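The access-path asymmetry described above (a primary structure whose leaves are full rows, versus secondary indexes that only point back into it) can be illustrated with a toy Python model. This is a sketch under loud assumptions: `ClusteredTable` and its methods are hypothetical names, dicts stand in for B-trees and pages, and the `lookups` counter simply counts structure probes to make the extra indirection visible.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ClusteredTable:
    """Hypothetical table clustered on its primary key (toy model only).

    `primary` maps primary-key values directly to full rows; each entry in
    `secondary` maps one column's values to primary-key values.
    """
    primary: dict[Any, dict] = field(default_factory=dict)
    secondary: dict[str, dict[Any, Any]] = field(default_factory=dict)
    lookups: int = 0  # number of structure probes performed so far

    def add_secondary_index(self, column: str) -> None:
        # Build the index from rows that already exist.
        self.secondary[column] = {row[column]: pk for pk, row in self.primary.items()}

    def insert(self, pk: Any, row: dict) -> None:
        self.primary[pk] = row
        for column, index in self.secondary.items():
            index[row[column]] = pk  # secondary leaf stores only the primary key

    def get_by_primary(self, pk: Any) -> dict:
        self.lookups += 1  # one probe: the leaf *is* the row
        return self.primary[pk]

    def get_by_secondary(self, column: str, value: Any) -> dict:
        self.lookups += 1  # first probe: secondary index -> primary key
        pk = self.secondary[column][value]
        return self.get_by_primary(pk)  # second probe: primary structure -> row


table = ClusteredTable()
table.add_secondary_index("email")
table.insert(1, {"name": "Alice", "email": "alice@example.com"})

row_a = table.get_by_primary(1)                               # 1 probe
row_b = table.get_by_secondary("email", "alice@example.com")  # 2 probes
```

In this model's terms, a store where every key is secondary has no one-probe path to favour, which is why the "primary" label promises a performance property that would not hold there.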

The only reason I can think of for introducing the concept of a primary key in Cypher is for being able to map Cypher onto a relational database model. If that is the case I would much rather see this proposed from a vendor working on such a mapping, since they would have the insight into what needs to be modeled.

I do think that the notion of a unique indexed key of mandatory properties is helpful, and I see the benefit of elevating such a concept to the status of receiving its own syntax, but I don't think PRIMARY KEY is a good name for it.

==== Mutability

Once a constraint has been created, it may not be amended.
Should a user wish to change its definition, it has to be dropped and recreated with an updated structure.
A contributor commented:

Should we have a note here that transactional implementations could do both the dropping and recreation in the same transaction so that the constraint is atomically mutated? This would of course allow leaving the old constraint in place should the creation of the new constraint fail.
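The atomic drop-and-recreate idea can be sketched with a minimal in-memory model, assuming a hypothetical store where a constraint is just a named tuple of property keys that every node must have, and where a transaction snapshots the constraint definitions and restores them on failure. `GraphSchema`, `redefine` and `ConstraintCreationFailed` are illustrative names, not part of the CIP or of any real API.

```python
from __future__ import annotations

from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


class ConstraintCreationFailed(Exception):
    """Raised when existing data violates a constraint being created."""


@dataclass
class GraphSchema:
    """Hypothetical store: nodes plus named existence constraints."""
    nodes: list[dict]
    constraints: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def create_constraint(self, name: str, required: tuple[str, ...]) -> None:
        # Creation must validate the data already in the graph.
        for node in self.nodes:
            missing = [key for key in required if node.get(key) is None]
            if missing:
                raise ConstraintCreationFailed(f"{name}: {node} lacks {missing}")
        self.constraints[name] = required

    def drop_constraint(self, name: str) -> None:
        del self.constraints[name]

    @contextmanager
    def transaction(self) -> Iterator[GraphSchema]:
        snapshot = dict(self.constraints)  # definitions are immutable tuples
        try:
            yield self
        except Exception:
            self.constraints = snapshot  # roll back: the old definition survives
            raise


def redefine(schema: GraphSchema, name: str, required: tuple[str, ...]) -> None:
    """Change a constraint the only allowed way: drop, then recreate, atomically."""
    with schema.transaction():
        schema.drop_constraint(name)
        schema.create_constraint(name, required)


schema = GraphSchema(nodes=[{"x": 1, "y": 2}, {"x": 3}])
schema.create_constraint("has_x", ("x",))
try:
    redefine(schema, "has_x", ("x", "y"))  # fails: the second node has no y
except ConstraintCreationFailed:
    pass  # the original ("x",) constraint is still in place
```

Without the surrounding transaction, the failed creation would leave the graph with no constraint at all, which is the window the comment above proposes to close.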

@thobe (Contributor) commented Mar 7, 2017

I wonder if the notion of a unique key (which is, at the moment, erroneously called a primary key) should really be a constraint, or if it should have its own syntax, something like:

CREATE KEY FOR (p:Person) AS p.name, p.address

The reason is that it actually implies multiple constraints, and typically also an index. Since it is a composite concept, it might be sensible to elevate it to a syntactic concept of its own.
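To make the "implies multiple constraints" point concrete, here is a minimal Python sketch, assuming that a key over properties such as `name` and `address` decomposes into one existence check per property plus a uniqueness check over the combined values, with a dict standing in for the backing index. Nodes are plain property dicts and all function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def existence_violations(nodes: list[dict], key: str) -> list[int]:
    """Positions of nodes where property `key` is missing (models `IS NOT NULL`)."""
    return [i for i, props in enumerate(nodes) if props.get(key) is None]


def uniqueness_violations(nodes: list[dict], keys: tuple[str, ...]) -> list[int]:
    """Positions of nodes whose combined values for `keys` were already seen.

    The dict plays the role of the index that typically backs a key.
    Nodes missing any of the properties are skipped: uniqueness on its
    own says nothing about absent values.
    """
    index: dict[tuple[Any, ...], int] = {}
    duplicates: list[int] = []
    for i, props in enumerate(nodes):
        values = tuple(props.get(k) for k in keys)
        if None in values:
            continue
        if values in index:
            duplicates.append(i)
        else:
            index[values] = i
    return duplicates


def node_key_violations(nodes: list[dict], keys: tuple[str, ...]) -> dict[str, list[int]]:
    """A key as the conjunction of several simpler constraints; empty dict means valid."""
    report = {f"{k} IS NOT NULL": existence_violations(nodes, k) for k in keys}
    report[f"({', '.join(keys)}) IS UNIQUE"] = uniqueness_violations(nodes, keys)
    return {rule: bad for rule, bad in report.items() if bad}


people = [
    {"name": "Ann", "address": "Main St 1"},
    {"name": "Ann", "address": "Main St 1"},  # same key values as the first node
    {"name": "Bob"},                          # no address
]
# → {'address IS NOT NULL': [2], '(name, address) IS UNIQUE': [1]}
print(node_key_violations(people, ("name", "address")))
```

Each part is independently meaningful as a constraint, which is the argument for treating the key as a composite with its own syntax rather than as one more predicate.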

In the syntax for this, if accepted, we should allow an optional name for the key as well, just like we do for constraints.

@Mats-SX (Member, Author) commented Mar 7, 2017

The CIP now uses ADD, NODE KEY and details a return record. I also took several review comments into account (thanks!).

@Mats-SX force-pushed the constraints-cip branch from 32d1322 to 16c2843, May 4, 2017 12:11
@Mats-SX force-pushed the constraints-cip branch 2 times, most recently from 60c584f to 4ef7b32, July 26, 2019 14:24
@Mats-SX force-pushed the constraints-cip branch from 85f2a88 to 0cca4f6, July 1, 2021 10:13
Mats-SX added 11 commits July 1, 2021 12:15
For improved source readability
- use same EBNF style as @hvub did in opencypher#493
- use links to connect to grammar constructs in openCypher grammar spec
- modify constraint operators to be suffix-modeled, à la IS NOT NULL
- introduce 'grouped-expression' concept
Extend definition to reference grouped expression
- Update definition to reference grouped expression
- Reformulate equivalence example to use IS NOT NULL and IS UNIQUE over a grouped expression
Move ahead of referencing sections to ease readability of CIP
- Including links to these from the CIP
- Also update examples for the parser tests
@Mats-SX force-pushed the constraints-cip branch from 4654832 to 26f22bb, July 2, 2021 12:39
@Mats-SX (Member, Author) commented Jul 2, 2021

This CIP has now been updated. See the commit history for a summary of the new models.

TCK tests cannot be added, as the CIP does not mandate that any particular constraint predicate be enforced by every implementation of Cypher. However, here are example TCK scenarios that one could consider when implementing the NODE KEY constraint:

Feature: CreateConstraint1

  Scenario: [1] Blocking creation of nodes that do not conform to NODE KEY constraint
    Given an empty graph
    And having executed:
      """
      CREATE CONSTRAINT
      FOR (a:A)
      REQUIRE (a.x, a.y) IS NODE KEY
      """
    When executing query:
      """
      CREATE (a:A)
      SET a.x = 1
      """
    Then ConstraintValidationFailed should be raised at runtime: NodeKeyRequired

  Scenario: [2] Allowing creation of nodes that uphold NODE KEY constraint
    Given an empty graph
    And having executed:
      """
      CREATE CONSTRAINT
      FOR (a:A)
      REQUIRE (a.x, a.y) IS NODE KEY
      """
    When executing query:
      """
      CREATE (a:A)
      SET a.x = 1
      SET a.y = 2
      """
    Then the result should be empty
    And the side effects should be:
      | +nodes      | 1 |
      | +properties | 2 |


6 participants