Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed the flaky test testSerdeWithInputFormat by sorting the result #17325

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yugvajani
Copy link

Fixed a flaky test in org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorSpecTest.testSerdeWithInputFormat

Steps to reproduce

To reproduce the problem, first build the module kafka-indexing-service:

mvn install -pl extensions-core/kafka-indexing-service -am -DskipTests

Then, run the regular test:

mvn -pl extensions-core/kafka-indexing-service test -Dtest=org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorSpecTest#testSerdeWithInputFormat

To identify the flaky test, execute the following nondex command:

mvn -pl extensions-core/kafka-indexing-service edu.illinois:nondex-maven-plugin:2.1.7:nondex -Dtest=org.apache.druid.indexing.kafka.supervisor.KafkaSupervisorSpecTest#testSerdeWithInputFormat

Description

The testSerdeWithInputFormat was producing non-deterministic output because the JSON object used in the test did not include a parser that enforced a specific order, unlike other tests. Due to the inherent non-deterministic nature of JSON serialization, this resulted in inconsistent outputs. The fix involves sorting the serialized JSON to ensure a consistent order.

  • sortJsonNode - This function is the sort function to sort the objectNode and arrayNode inside the Json object.
  • sortArrayNode - This function is called to sort the arrayNode.
  • sortJsonString - Deserializes JsonString to JsonNode , sorts it and serializes it back to JsonString.

The sortJsonString function can now be directly used to sort the serialized Json ,which will ensure a deterministic order for all tests where the serialized Json gives a non deterministic order.


Functions added in this PR
  • sortJsonNode
  • sortArrayNode
  • sortJsonString

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -310,71 +317,115 @@ public void testSerdeWithInputFormat() throws IOException
KafkaSupervisorSpec spec2 = mapper.readValue(serialized, KafkaSupervisorSpec.class);

String stable = mapper.writeValueAsString(spec2);
String sortedStableJson = sortJsonString(stable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this seems like an unconditional resort at the Json level for only this test; but I wonder if it only affects this
from the repro steps: nondex have detected that the output here is unstable.

I don't know the Kafka part; but there might be lists/arrays for which order does matter - sorting all parts of a json unconditionally may possible hide issues.

I wonder if it was investigated what causes these mismatches. I suspect that some underlying class may use some unstable constructs - in most cases these used to boil down to a HashSet -> LinkedHashSet change...

Another possible way to fix this in a wider scope is to make the KafkaSupervisorSpec comparable and do the validation at that level; so that the real classes can decide if 2 maps/sets are equal or not ; and lists / arrays will still be validated correctly

what do you think about these approaches?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an underlying class were using HashSet, other tests in the same file would likely have failed as well, since they all use the same underlying class. The key difference with testSerdeWithInputFormat is that it wasn’t using a parser, which is probably what makes the list order deterministic in the other tests.

I tried adding the parser to this test, and while it did fix the order, I ran into an issue where Assert.assertEquals(4, spec.getDataSchema().getAggregators().length); returned a length of 0. I'm not entirely sure why, as I’m not very familiar with the Kafka-specific logic.

The sort function I created could be used as a general-purpose tool to ensure deterministic ordering across tests, if needed. In fact, most of the tests only rely on assert.contains, except for one AssertEquals. Since the sort function only arranges the list elements and doesn’t alter their content, it shouldn’t cause problems, especially since the parser itself was handling ordering in other cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants