-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change map to key-value pair collection in Logs Data Model #3938
Conversation
6b07441
to
32699cb
Compare
@open-telemetry/specs-logs-approvers PTAL |
@@ -444,7 +448,7 @@ is optional. | |||
|
|||
### Field: `Attributes` | |||
|
|||
Type: [`map<string, any>`](#type-mapstring-any). | |||
Type: [`[]keyval<string, any>`](#type-keyvalstring-any). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not spent much time in the spec, but changing the data type of a field in a stabilized data format feels like a breaking change to me. The data model is more explicit than the proto that Attributes
are a map and since the model is stable I feel like we should assume implementations are depending on this fact (even though we know some languages are ignoring this fact).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is changing the data-type as much as renaming it to account for implementations that don't implement this as a map.
Any implementation that was compatible prior to this change, will still be compatible after it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renaming it to account for implementations that don't implement this as a map.
But not implementing as a map was technically against specification right?
Any implementation that was compatible prior to this change, will still be compatible after it.
Agreed, but it allows 2 different implementation to generate incompatible data right? If 1 data source generates attributes as a map and another generates data as a list, when the data comes together, either in the collector or some backend, I anticipate problems, or at least unexpected/undefined behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unresolving per @TylerHelmuth request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But not implementing as a map was technically against specification right?
Hard to say as the spec was not using normative language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a instrumentation-only concern and does not change how the exporters emit the data using OTLP please let me know.
It does feel like a breaking change to me to change a data type. I'd like to stick as closely as possible to semver as possible so that any implementers of our spec are not unexpectedly affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 1 data source generates attributes as a map and another generates data as a list, when the data comes together, either in the collector or some backend, I anticipate problems, or at least unexpected/undefined behavior.
The exporter should deduplicate the key-values if it is required.
Additionally, the user can always use a processor which deduplicates key-values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a instrumentation-only concern and does not change how the exporters emit the data using OTLP please let me know
Yes it is. I explicitly write in the PR description that OTLP exporters still have to send deduplicated key-values.
If the implementation allows having duplicates, then some exporters (e.g. OTLP) | ||
may require deduplication (removing pairs so that none of them have the same key). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My assumption was that the OTLP proto was meant to reflect the Data Model and I assumed the semantics of the OTLP proto were documented in the data model. Is that not the case? Are the semantics of each completely independent?
Specially the sentence "The type representation is language-dependent" makes it seem like this document is exclusively referring to the in-memory representation of logs in the SDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specially the sentence "The type representation is language-dependent" makes it seem like this document is exclusively referring to the in-memory representation of logs in the SDK.
This is a good comment.
I thought that is describing the model for the SDK. I guess I was wrong. Maybe I should change the Logs SDK specification instead? I would need guidance from specification approvers/maintainers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the semantics of each completely independent?
For the SDK it makes sense to support other log exporters than OTLP which can have better performance when there is no de-duplication involved. I believe that there are even backends which are able to support key-values with duplicated keys (without losing the duplicates). There is a lot of software using OTel APIs and SDK for instrumenting application and NOT exporting these via OTLP. I would say that tightly coupling the model to OTLP is against OpenTelemetry "promise" (from https://opentelemetry.io/docs/what-is-opentelemetry/):
OpenTelemetry is vendor- and tool-agnostic, meaning that it can be used with a broad variety of Observability backends
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: I am not against the 'spirit' of this change, I am just not sure what is the best way to word it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also understand why we may want to be against this change as if SDKs would offer such functionalities then users may want to expect similar things from the Collector (from similar reasons). I have a feeling that this change would make more sense if it would be acceptable for the Collector and we would also update the comments OTLP proto: open-telemetry/opentelemetry-proto#533. At last, the value of this change may not be worth its cost. But at the same time, maybe we should design SDKs to make this change possible in future (so that the deduplication is an implementation detail and we would not need to make any public API changes to support duplicated key-values).
allowed (essentially allows to represent an equivalent of a JSON object). | ||
|
||
The type representation is language-dependent. | ||
It is implementation-specific whether the collection can contain duplicated keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a breaking change. We can't allow this. It expands the possible shape of the data that recipients should be able to deal with. This is also not possible to represent in OTLP, so I think this is a no go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm. All possibe receipents (processors, exporters)? Would it also be seen as breaking if in SDKs the OTLP exporters would handle the deduplication?
I have an idea to propose adding an option like WithoutKeyValueDeduplication
to Go Logs SDK for users who favor performance over the danger of potentially losing some log records. By default the SDK would get rid of duplicates.
EDIT: I am leaning towards closing this PR and issue if the idea above is reasonable and acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm. All possibe receipents (processors, exporters)?
This document is the logical data model and applies to anyone who creates the data and who reads the data, so yes it applies to processors, exporters, SDKs, Collector, backends.
Would it also be seen as breaking if in SDKs the OTLP exporters would handle the deduplication?
I think it is fine to do the deduplication anywhere you want as long as externally observable data complies with this document.
I have an idea to propose adding an option like WithoutKeyValueDeduplication to Go Logs SDK for users who favor performance over the danger of potentially losing some log records. By default the SDK would get rid of duplicates.
I think it is fair to provide non-default options like this, with a clear warning that the caller is responsible for ensuring there are no duplicates, assuming there is a need for such high performance.
Provided that such option option is possible to add later in non-breaking manner to Go SDK I would not do it right now and will only add it after we gather evidence from real-world usage that it is really needed by the users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fair to provide non-default options like this, with a clear warning that the caller is responsible for ensuring there are no duplicates, assuming there is a need for such high performance.
Provided that such option option is possible to add later in non-breaking manner to Go SDK I would not do it right now and will only add it after we gather evidence from real-world usage that it is really needed by the users.
@tigrannajaryan, sounds like a good plan 👍 Do you think that the specification should be updated before we introduce such option?
CC @open-telemetry/cpp-maintainers @open-telemetry/rust-maintainers @open-telemetry/dotnet-maintainers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that the specification should be updated before we introduce such option?
Probably, no, if it is only for Go. Likely, yes, if multiple languages will use it.
Changing to draft per #3938 (comment) I plan to close this PR in a few days. |
@pellared Any reason we need to keep this open? |
FYI @open-telemetry/rust-approvers to inform our discussions in the future. |
…map as opt-in (#3987) Fixes #3931 Per agreement: #3931 (comment) > The SDKs should handle the key-value deduplication by default. It is acceptable to add an option to disable deduplication. Previous PR: #3938 > I think it is fine to do the deduplication anywhere you want as long as externally observable data complies with this document. The main purpose of this PR is to have an agreement for following questions (and update the specification to to make it more clear): 1. Is the deduplication required for all log exporters or only OTLP log exporters? Answer: It is required for all exporters. 2. Can the key-value deduplication for log records be opt-in? Answer: Yes, it is OK as long as it is documented that it can cause problems in case maps duplicated keys are exported. Related to: - open-telemetry/opentelemetry-go#5086 - open-telemetry/opentelemetry-dotnet#4324
…map as opt-in (open-telemetry#3987) Fixes open-telemetry#3931 Per agreement: open-telemetry#3931 (comment) > The SDKs should handle the key-value deduplication by default. It is acceptable to add an option to disable deduplication. Previous PR: open-telemetry#3938 > I think it is fine to do the deduplication anywhere you want as long as externally observable data complies with this document. The main purpose of this PR is to have an agreement for following questions (and update the specification to to make it more clear): 1. Is the deduplication required for all log exporters or only OTLP log exporters? Answer: It is required for all exporters. 2. Can the key-value deduplication for log records be opt-in? Answer: Yes, it is OK as long as it is documented that it can cause problems in case maps duplicated keys are exported. Related to: - open-telemetry/opentelemetry-go#5086 - open-telemetry/opentelemetry-dotnet#4324
Why
Fixes #3931
Performance
Checking for duplicates introduces performance overhead which can be undesirable for systems where logging should be high-performant. Changing the data model definition would allow postponing the deduplication to the exporters which require it (e.g. OTLP). Other exporters (e.g. console exporter) can use the key-values as provided.
Example benchmarks
Better bridging of existing logging libraries
Some logging libraries allow duplicate keys. The change is needed for faithfully bridging existing logging systems.
For instance
slog
(the structured logging package from Go the standard library) allows it. See: https://go.dev/play/p/fzIswL0Le7j.Consolidation
This is how C++, Rust, .NET already define
map<string, any>
. This is also how Go SIG wants to define it. The main reason is performance to avoid deduplication until necessary.What
Change
map<string, any>
to[]keyval<string, any>
to allow having duplicated keys.The current definition of
map<string, any>
is not using a normative language that would require to implementation to handle key-uniqueness. Therefore, I do not see this change as breaking.Remarks
Even know the backends accepting OTLP have to somehow handle duplicated key-values using OTLP as it is possible to send such data.
OTLP exporters still have to be able to send deduplicated key-values. There could be an option to deduplicate the key-values using a processor.