
KeyValueList and log attributes with duplicate keys as undefined behavior #533

Closed
wants to merge 9 commits

Conversation

pellared
Member

@pellared pellared commented Mar 12, 2024

Towards open-telemetry/opentelemetry-specification#3931

Related to open-telemetry/opentelemetry-specification#3938

I think this is not a breaking change, as it does not change anything in the encoding. Receivers already have to handle/validate key-values with duplicate keys.

Implementing de-duplication decreases performance. More:

For applications instrumenting their code, performance may be more important than even losing log records that contain duplicates (which are very unlikely).

The OTLP exporters deduplicate key-value pairs before sending the data to satisfy receivers that require unique keys. Alternatively, the SDKs can provide a processor that deduplicates key-value pairs.

@austinlparker
Member

While this might not be an explicit breaking change, it seems like it will certainly be an implicit one. Is there no other way to address this?

Member

@dmitryax dmitryax left a comment


As @mx-psi mentioned, the OTel Collector is built on top of the key-uniqueness requirement. Any key duplication is treated as UB, and the result of processing such data is undetermined.

AFAIR the initial implementation of the proto used a map. It was then changed to the key/value list for performance reasons.

I'm strongly against this change since it can lead to further adoption of this practice.

@pellared
Member Author

pellared commented Mar 12, 2024

It was then changed to the key/value list for performance reasons.

This proposal is also motivated by performance improvements.

the OTel Collector is built on top of the key uniqueness requirement

Is this something that could be changed in the future if needed? In my opinion, it is not necessary for now.

@dmitryax
Member

This proposal is also motivated by performance improvements

I think the data consistency and deterministic processing across the OpenTelemetry project are more important than the performance degradation of one instrumentation library.

@pellared
Member Author

pellared commented Mar 12, 2024

I think the data consistency and deterministic processing across the OpenTelemetry project are more important than the performance degradation of one instrumentation library.

It is not only a performance degradation of one instrumentation library, but of a whole signal's (e.g. logs) processing pipeline.

EDIT: The issue is valid even when no duplication actually exists; the SDK still has to ensure that there are no duplicates. For instance, in the Go Logs API we pass attributes as an array (and an optional slice) to decrease the number of heap allocations. The SDK would need to use a map to remove duplicated attributes.
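To illustrate the overhead described above, here is a minimal Go sketch (the `KeyValue` type and function name are illustrative, not taken from the OTel Go SDK) showing that even just *checking* for duplicates allocates a map on every call, including the common case where the attributes are already unique:

```go
package main

import "fmt"

// KeyValue stands in for a log attribute; the type is illustrative.
type KeyValue struct {
	Key   string
	Value string
}

// hasDuplicateKeys reports whether any key occurs more than once.
// The map is allocated on every call, even when the attributes turn
// out to be unique -- this is the per-record overhead discussed above.
func hasDuplicateKeys(attrs []KeyValue) bool {
	seen := make(map[string]struct{}, len(attrs))
	for _, kv := range attrs {
		if _, dup := seen[kv.Key]; dup {
			return true
		}
		seen[kv.Key] = struct{}{}
	}
	return false
}

func main() {
	unique := []KeyValue{{"a", "1"}, {"b", "2"}}
	dup := []KeyValue{{"a", "1"}, {"a", "2"}}
	fmt.Println(hasDuplicateKeys(unique), hasDuplicateKeys(dup)) // prints: false true
}
```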

@MrAlias
Contributor

MrAlias commented Mar 12, 2024

This proposal is also motivated by performance improvements

I think the data consistency and deterministic processing across the OpenTelemetry project are more important than the performance degradation of one instrumentation library.

There are three known languages that need this currently (Go, Rust, and C++). It is unfair to say this is only about a single instrumentation library.

@MrAlias
Contributor

MrAlias commented Mar 12, 2024

It seems that if the expectation is for the sender to de-duplicate the attributes prior to sending the data, it should be achievable on the receiver side as well, right?

@MrAlias
Contributor

MrAlias commented Mar 12, 2024

cc @jmacd

@pellared
Member Author

pellared commented Mar 12, 2024

@mx-psi @TylerHelmuth @dmitryax

I want to point out that this PR does NOT require making any changes in the Collector, provided that the last (or any) item with a given key is preserved. The main point is just to allow passing duplicate keys and to make sure that doing so does not break the receiver, as this would allow performance improvements in SDKs.

EDIT: It would even be acceptable if the Collector dropped attributes with duplicated keys, but I do not imagine that happening 😉

@cijothomas
Member

This proposal is also motivated by performance improvements

I think the data consistency and deterministic processing across the OpenTelemetry project are more important than the performance degradation of one instrumentation library.

There are three known languages that need this currently (Go, Rust, and C++). It is unfair to say this is only about a single instrumentation library.

OTel .NET too. (OTel .NET does not do de-duplication for logs today.)

@pellared
Member Author

pellared commented Mar 13, 2024

Added to PR description:

The OTLP exporters can have a configuration to deduplicate key-val pairs before sending the data to satisfy receivers that require them to have unique keys.

@mx-psi
Member

mx-psi commented Mar 13, 2024

Thanks for making this more in line with JSON. The new wording allows for a backwards-compatible implementation in the Collector. However, I still don't think we have a satisfactory solution (I don't think there even is one).

Collector components will at most be able to forward duplicate keys, but no matter what implementation we pick, components won't be able to interact with duplicate keys in any meaningful way. If an application wants to support duplicate keys meaningfully, it won't be able to use the Collector as a library for implementing its OTLP server.

There are already a number of applications, from vendors and non-commercial FOSS alike, that use the Collector as a library (Jaeger, Grafana Tempo, Datadog Agent, AWS CloudWatch Agent, Elastic Agent, all vendor-specific Collector distros...). None of these applications will be able to interact with these duplicate keys. If a sizeable part of the OpenTelemetry ecosystem is not able to handle this feature correctly at all, what's the point of making this change?

@pellared
Member Author

pellared commented Mar 13, 2024

If a sizeable part of the OpenTelemetry ecosystem is not able to use this feature at all, what's the point of making this change?

Because it gives significant performance improvements for the SDKs. It would not require them to handle de-duplication.

@mx-psi
Member

mx-psi commented Mar 13, 2024

Personally, I don't think a performance improvement justifies what we may as well call a correctness issue on the Collector side.

@pellared pellared changed the title Allow attributes with duplicate keys Allow attributes with duplicate keys as undefined behaviour Mar 13, 2024
@pellared pellared changed the title Allow attributes with duplicate keys as undefined behaviour Allow attributes with duplicate keys as undefined behavior Mar 13, 2024
@pellared pellared changed the title Allow attributes with duplicate keys as undefined behavior KeyValues with duplicate keys as undefined behavior Mar 13, 2024
@austinlparker
Member

For the sake of discussion, is there any data about the actual performance impact of deduplication in the log pipeline at an SDK level?

Furthermore, the OpenTelemetry model is fundamentally built on blending multiple signals together. While I can certainly believe that existing logging patterns may rely on duplicate keys in log messages, is this a pattern we expect to hold going forward?

@jsuereth
Contributor

I'm very much against this change. This breaks a known invariant and forces all backends to deal with the issue. I don't think the benefit outweighs the overall cost.

@pellared pellared requested a review from a team March 13, 2024 12:35
@austinlparker
Member

I'm very much against this change. This breaks a known invariant and forces all backends to deal with the issue. I don't think the benefit outweighs the overall cost.

To echo this, my interest in this topic is that it would also have implications for every single vendor or tool that ingests OTLP. The scope of this change is massive, which means the burden of proof needs to be commensurate.

@pellared pellared marked this pull request as draft March 13, 2024 12:37
@pellared pellared changed the title KeyValues with duplicate keys as undefined behavior KeyValueList and log attributes with duplicate keys as undefined behavior Mar 13, 2024
@pellared
Member Author

For the sake of discussion, is there any data about the actual performance impact of deduplication in the log pipeline at an SDK level?

Please read the description. There is a benchmark for Rust which indicates a 30% performance improvement.

Side note:

Furthermore, the OpenTelemetry model is fundamentally built on blending multiple signals together. While I can certainly believe that existing logging patterns may rely on duplicate keys in log messages, is this a pattern we expect to hold going forward?

That is a good question. I updated the proto so that it should currently affect only logs. But I see that this change may set a precedent for further requests, e.g. to do the same for metrics metadata.

I am changing the PR to a draft to communicate that at this point in time I do not plan to push this further.

Thanks everyone for your feedback.

Still, the change in the Logs Data Model would be beneficial even if OTLP required unique keys, as other exporters could benefit from it. I will follow up in open-telemetry/opentelemetry-specification#3931.

@pellared
Member Author

pellared commented Mar 15, 2024

I'm very much against this change. This breaks a known invariant and forces all backends to deal with the issue. I don't think the benefit outweighs the overall cost.

I do not understand this statement, as right now it is still possible to send duplicated key-values using OTLP. The backends need to deal with this anyway. The PR just proposes to mark this as undefined behavior, because it is technically possible to send such data. Skipping the duplicate check, and the performance gained by doing so, may be more important for instrumentations than even losing the log records that contain duplicates, especially given that such duplications are rare in the real world and checking for duplicates always causes overhead.

I decided to reopen the PR as I think it is closely coupled to open-telemetry/opentelemetry-specification#3938.

@pellared pellared marked this pull request as ready for review March 15, 2024 13:35
Member

@tigrannajaryan tigrannajaryan left a comment


I am blocking this until there is full clarity on whether this is a breaking change or not.

@tigrannajaryan
Member

The PR just proposes to mark this as undefined behavior, because it is technically possible to send it.

@pellared This is unnecessary. There are many other payloads which are technically possible to send but are wrong because they violate a specified invariant. Unlike other specifications (e.g. the C++ standard), the OTLP spec does not explicitly call out undefined behavior; in this spec a behavior is "undefined" simply by being unspecified.

As an example: we have the AGGREGATION_TEMPORALITY_UNSPECIFIED value of AggregationTemporality. The spec says that it must not be used. Technically nothing prevents senders from using it. However, the spec is silent about what happens when receivers see this value. It is undefined behavior: what receivers do when they see this value is simply not known.

@pellared
Member Author

Closing as unnecessary.

@pellared pellared closed this Mar 15, 2024
@tigrannajaryan
Member

tigrannajaryan commented Mar 15, 2024

Although this is now closed I would like to comment on it to clarify an important principle that we can use to evaluate change proposals like this.

This change says that a payload that was previously non-compliant is now possible. It then gives receivers leeway to behave as they wish with such payloads, including rejecting the payload, or accepting and processing it in some reasonable way.

I think such an approach is fundamentally incompatible with OpenTelemetry's value of vendor neutrality. Allowing such leeway can result in critical differences in behavior between vendor backends, making it impossible for an OpenTelemetry user to switch vendors as they desire. If one vendor happily accepts your data and another outright rejects it, then I think we are failing the interoperability goal. This is bad and goes against OpenTelemetry principles.

That alone I think is sufficient for me to reject the change.

@pellared pellared deleted the patch-1 branch March 15, 2024 17:53
10 participants