Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved yaml parsing by adding an extended parser subclass able to inline anchors #502

Open
wants to merge 3 commits into
base: 2.19
Choose a base branch
from

Conversation

HeikoBoettger-KarlStorz
Copy link

This pull request proposes a workaround for the lack of being able validate yaml files using anchors. The solution is to simply inline all anchors.

What's the matter with the upstream YAMLParser?

The YAMLParser currently produces the JsonEvents representing String and sets an additional status that a consumer needs to retrieve calling an additional method. However when trying to perform a YAMLSchema validation the current approach is to use the existing JSONSchema validation code which doesn't know about this extra method. As a result validating a yaml-file using aliases and anchors is failing to validate due to lack of anchor support.

The backlog contains already some issue talking about the need of a bigger refactoring to fully support anchors and aliases.

What's is the workaround?

This pull request adds a YAMLParserExt class (I didn't had a good idea how to name it) and a factory to make use of it. When the events are requested the implementation remembers the events produced by yaml content that has an anchor. Later when a alias is found and the anchor exists, it simply returns the same events that were part of the anchored content. As a result the schema validator will see the document as if the anchored content was inlined.

What's the risk pulling this in?

Repeating the events might have some unknown side-effects such as that a document will appear larger in terms of produced events than it actually is or code expecting the events to be unique might not work. I also haven't looked into whether there might be issue determining the position (line & column number) of whatever produced the event in the original document. However it works for use to validate our yaml-files and using a separate class to implement this option doesn't break any existing code.

Some thoughts:

  • I haven't checked whether there are solution using $ref from JSON but I can imaging the yaml aliases can be positioned more flexible.
  • It might be possible to solve this differently by using a smaller ParserImpl interface (a EventSource-interface).
  • The event based approach is not ideal when it comes to handling aliases since there is no DOM to just point to past data structures.
  • Parsing huge yaml files with big anchored content might require a lot of memory

In case you consider pulling this in, please let me know where I can find an example of a unit test, I am more than happy to provide one. The approach I have taken in our internal project was simply running the validation across some yaml documents containing anchors and checking the validation results.

@cowtowncoder
Copy link
Member

Ok, this is interesting but I need to digest it for a bit to know what I think. :)

As to unit tests, they are under yaml/src/test/java/....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants