Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport Snakeyaml update to branch 2.12 #177

Closed
janjwerner-confluent opened this issue Dec 8, 2023 · 25 comments
Closed

Backport Snakeyaml update to branch 2.12 #177

janjwerner-confluent opened this issue Dec 8, 2023 · 25 comments

Comments

@janjwerner-confluent
Copy link

janjwerner-confluent commented Dec 8, 2023

Hi @cowtowncoder
I'm working on CVE cleanup in Druid, where I ran into a wall with usage of Snakeyaml.
As it was mentioned in #134 Druid cannot easily move to newer version of of jaxrs-providers.
Would you be willing to accept a PR and release another 2.12 branch?

@cowtowncoder
Copy link
Member

I could accept a PR (if that made it easier from your side) but would not release a new version, too much work.

But I think that forcing a newer SnakeYAML version might work -- I don't think 2.12 was shading dependency, and SnakeYAML API has remained about same.

@janjwerner-confluent
Copy link
Author

@cowtowncoder
Thank you for the swift response. The forcing of a newer SnakeYAML version unfortunately does not work - the default constructor has changed and just overriding the dependency leads to a failure.

The required change is trivial:

-        _yamlParser = new ParserImpl(new StreamReader(reader));
+        _yamlParser = new ParserImpl(new StreamReader(reader), new LoaderOptions());

and adding an import:
+import org.yaml.snakeyaml.LoaderOptions;
in yaml/src/main/java/com/fasterxml/jackson/dataformat/yaml/YAMLParser.java

@cowtowncoder
Copy link
Member

Ok that is unfortunately a no go then since:

  1. Would cause compatibility problem in patch (since that new parser constructor not available in SnakeYAML version supported by earlier 2.12.x releases)
  2. Requires full release (since more than 1 component), which means 3-4 hour release process

Sorry.

@cowtowncoder
Copy link
Member

Having said that, then overriding jackson-dataformat-yaml dependency to newer one might work. But I guess that could be a rat hole as newer version of YAML format module probably does then require newer jackson-core and perhaps jackson-databind.
So it depends on exactly what prevents use of later JAX-RS provider version.

@janjwerner-confluent
Copy link
Author

Thank you for feedback and the consideration!

@pjfanning
Copy link
Member

Users can simply copy paste code and modify it. There is no reason for Jackson to issue new releases for every minor release every time a dependency jar has a CVE.

@cowtowncoder
Copy link
Member

I do think it's a PITA to maintain a fork -- users typically build with dependency jars from Maven repositories (or similar), except for few that build all from sources -- so I think it was a reasonable thing to ask if this was feasible.

But yes, building new patches for old branches would be potentially tons of work for Jackson project, unfortunately.

@janjwerner-confluent
Copy link
Author

@pjfanning
I agree and where possible, I would try to attempt to upgrade the dependency that brings the vulnerable component, or override the transitive dependency. In this particular case, first solution is very labor intensive, second solution is breaking due to incompatibile API.
I had to ask - the worst thing that can happen is someone telling you no :)

@pjfanning
Copy link
Member

pjfanning commented Dec 8, 2023

Is Druid still using Jersey 1.x? That is the reason Hadoop is stuck on Jackson 2.12. Jackson 2.13 removed the support for Jersey 1.x. If Druid does not use Jersey 1.x, you could probably upgrade to a much newer version of Jackson.

I offered to help get Hadoop upgraded - including doing a fork of the jackson-jaxrs libs that added back Jerset 1.x support but they weren't interested.

@janjwerner-confluent
Copy link
Author

@cowtowncoder
Copy link
Member

Hmmmmh. Oh. If there are multiple high-visibility project with this problem that might change things (Hadoop and Druid are both fairly popular)

My main concern is with SnakeYAML compatibility in that case.
Is the goal to get to SnakeYAML 2.x? If I remember correctly, 1.x / 2.x had compatibility problem (which is fair, it being major version bump).

@pjfanning
Copy link
Member

@cowtowncoder this all goes back to #134 (comment)

That change has made sure that Hadoop and Druid are stuck on Jackson 2.12 and can't upgrade. I know on the Hadoop side that attempts have been made to switch to Jersey 2 but these have failed. They are stuck indefinitely on Jersey 1 and Jackson does not support Jersey 1. So all changes in other Jackson modules - like the use of snakeyaml 2 are unattainable for Hadoop and Druid (and some other Hadoop reliant Big Data projects).

@cowtowncoder
Copy link
Member

@pjfanning Yes, I got that. Which is why I might be more amenable to selected patching of 2.12 branch, not just JAX-RS, and one new full release (but not one per update).
Not that I look forward to doing new release(s), but at least there would be reasons to do so for, f.ex, dependency CVEs. And ideally do as a batch.

But it is definitely risky, due to distance between latest (2.16) and 2.12.

@pjfanning
Copy link
Member

Is there any chance that we could consider undoing #134 ? That would probably be less work that cherry picking changes and doing releases.

@cowtowncoder
Copy link
Member

Looking at change, it's sizable enough that I really do not want to reintroduce it.
At least not to the head; I guess theoretically could consider some other version (2.15? leave out from 2.16+). That does not sound good either as acceptable version range would be discontinuous (2.12, 2.15). :-(

But as to cherry-picking, I would want to limit these to things that downstream projects would request -- like, creating an issue, and if we could find people involved, let them list top CVEs (and provide PRs). But keep the list small. So no new features, minimal behavioral changes. So this would be unusual one actually, otherwise it'd be just updating dependencies from pom.xmls.

Not sure that is a good idea either, just writing out my first thoughts.

@janjwerner-confluent
Copy link
Author

@pjfanning
Thank you for providing the context.
@cowtowncoder
I'd be happy to do the legwork - provide PRs for the dependency changes, test out things locally, if any of that makes a difference.

@cowtowncoder
Copy link
Member

That makes sense. I think YAML one is the tricky one, no matter how we slice it. If I remember correctly, there was no real way to make same code (jackson-dataformat-yaml) work both on latest 1.x and 2.0.

@pjfanning
Copy link
Member

  • Apache Druid only seems to use jackson-dataformat-yaml in test code
  • Releasing a jackson-dataformat-yaml 2.12.7.1 that changes to snakeyaml 2.2 is not a good move from a semver perspective
    • many users will upgrade thinking it's a trivial upgrade and it is not
    • Jackson itself waited till a minor release to this upgrade
  • Can Apache Druid team provide better evidence that they are seriously impacted - in what way can malicious users supply dangerous yaml payloads? @janjwerner-confluent you can provide this privately if this is sensitive - maybe you should notify [email protected]. I am a member of the ASF Security committee and will receive any emails sent to ASF security@ email addresses.
  • The OSS community would be better served by the Apache Druid team upgrading to jersey 2 than focusing on getting other teams hack their releases for them
  • If Apache Druid team absolutely must upgrade snakeyaml, you can almost certainly upgrade the affected pom.xml to use a newer version com.fasterxml.jackson.dataformat while excluding its Jackson transitive dependencies to prevent you from upgrading the core jackson jars.
    • jackson-dataformat-yaml 2.16.0 looks like it has changes that won't work with jackson-core/jackson-databind 2.12
    • but we'll probably find an imtermediate jackson-dataformat-yaml that does use snakeyaml 2.x and that works with jackson-core/jackson-databind 2.12
    • I can help with reviewing any Druid PRs that attempt to go this route

@cowtowncoder
Copy link
Member

cowtowncoder commented Dec 9, 2023 via email

@pjfanning
Copy link
Member

I would really much prefer to see the paper trail on this. I am a member of the ASF. I do not see any Apache discussions about this. I think this is completely cart before horse. The ASF warrants public discussions and peer overview. I don't see any Druid discussion going on about this.

In my fork of Apache Druid, there are 50 dependabot issues - only 1 is related to snakeyaml. What about the 49 other dependency issues?

Upgrading snakeyaml also brings in the annoying 5k limit that the snakeyaml maintainer introduced in v1.32. The snakeyaml v2 upgrade did not appear to add extra security improvements, just made the 'safe' constructor behaviour the default. So snakeyaml 1 users can still use 'safe' constructors, they just need to explicitly use them. Maybe I'm wrong but my understanding is that jackson-dataformat-yaml has long used the 'safe' constructor code path. That is, when we upgraded the snakeyaml jar in jackson-dataformat-yaml, we were just trying to pre-empt the people who think security is about ticking boxes about jar versions and not actually reading the CVEs and understanding if they apply to their use case.

@janjwerner-confluent
Copy link
Author

Thank you both @pjfanning @cowtowncoder

  • I am not aware of any druid discussions - it's my own initiative.
  • If you update your fork you should see that the amount of issues is coming down.
  • Yes, a lot of this effort is a security theatre to satisfy the checkbox ticked requirement.

@pjfanning
Copy link
Member

I updated Hadoop to use snakeyaml 2 - see https://issues.apache.org/jira/browse/HADOOP-18658

Until there is proper Druid discussion and reasonable proof that Druid has a real security issue related to snakeyaml - and a reasonable attempt is made to update snakeyaml independently of requiring a special jackson patch, I think this is a bad way to go forward.

@janjwerner-confluent
Copy link
Author

@pjfanning
I got it, i will look into carving out snakeyaml in Druid - specify version 1.33 as test dependency only, and let other libraries bring their own versions.
Thank you for your insight and patience.

@cowtowncoder
Copy link
Member

Sounds like a good approach for now. If we can avoid need for new 2.12 releases (full or micro), that would be great.

@cowtowncoder
Copy link
Member

Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants