-
Notifications
You must be signed in to change notification settings - Fork 891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change map to key-value pair collection in Logs Data Model #3938
Changes from all commits
32699cb
5c3cce5
797ceff
8e1f937
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,7 +11,7 @@ | |
* [Requirements](#requirements) | ||
* [Definitions Used in this Document](#definitions-used-in-this-document) | ||
+ [Type `any`](#type-any) | ||
+ [Type `map`](#type-mapstring-any) | ||
+ [Type `[]keyval`](#type-keyvalstring-any) | ||
* [Field Kinds](#field-kinds) | ||
- [Log and Event Record Definition](#log-and-event-record-definition) | ||
* [Field: `Timestamp`](#field-timestamp) | ||
|
@@ -99,7 +99,7 @@ The Data Model aims to successfully represent 3 sorts of logs and events: | |
|
||
### Definitions Used in this Document | ||
|
||
In this document we refer to types `any` and `map<string, any>`, defined as | ||
In this document we refer to types `any` and `[]keyval<string, any>`, defined as | ||
follows. | ||
|
||
#### Type `any` | ||
|
@@ -112,18 +112,22 @@ Value of type `any` can be one of the following: | |
|
||
- An array (a list) of `any` values, | ||
|
||
- A `map<string, any>`, | ||
- A `[]keyval<string, any>`, | ||
|
||
- [since 1.31.0] An empty value (e.g. `null`). | ||
|
||
#### Type `map<string, any>` | ||
#### Type `[]keyval<string, any>` | ||
|
||
Value of type `map<string, any>` is a map of string keys to `any` values. The | ||
keys in the map are unique (duplicate keys are not allowed). The representation | ||
of the map is language-dependent. | ||
Value of type `[]keyval<string, any>` is a collection of key-value pairs with | ||
string keys and [`any`](#type-any) values. | ||
|
||
Arbitrary deep nesting of values for arrays and maps is allowed (essentially | ||
allows to represent an equivalent of a JSON object). | ||
Arbitrary deep nesting of values for arrays and key-value collections MUST be | ||
allowed (essentially allows to represent an equivalent of a JSON object). | ||
|
||
The type representation is language-dependent. | ||
It is implementation-specific whether the collection can contain duplicated keys. | ||
If the implementation allows having duplicates, then some exporters (e.g. OTLP) | ||
may require deduplication (removing pairs so that none of them have the same key). | ||
Comment on lines
+129
to
+130
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My assumption was that the OTLP proto was meant to reflect the Data Model and I assumed the semantics of the OTLP proto were documented in the data model. Is that not the case? Are the semantics of each completely independent? Specially the sentence "The type representation is language-dependent" makes it seem like this document is exclusively referring to the in-memory representation of logs in the SDK. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is a good comment. I thought that is describing the model for the SDK. I guess I was wrong. Maybe I should change the Logs SDK specification instead? I would need guidance from specification approvers/maintainers. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For the SDK it makes sense to support other log exporters than OTLP which can have better performance when there is no de-duplication involved. I believe that there are even backends which are able to support key-values with duplicated keys (without losing the duplicates). There is a lot of software using OTel APIs and SDK for instrumenting application and NOT exporting these via OTLP. I would say that tightly coupling the model to OTLP is against OpenTelemetry "promise" (from https://opentelemetry.io/docs/what-is-opentelemetry/):
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be clear: I am not against the 'spirit' of this change, I am just not sure what is the best way to word it :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also understand why we may want to be against this change as if SDKs would offer such functionalities then users may want to expect similar things from the Collector (from similar reasons). I have a feeling that this change would make more sense if it would be acceptable for the Collector and we would also update the comments OTLP proto: open-telemetry/opentelemetry-proto#533. At last, the value of this change may not be worth its cost. But at the same time, maybe we should design SDKs to make this change possible in future (so that the deduplication is an implementation detail and we would not need to make any public API changes to support duplicated key-values). |
||
|
||
### Field Kinds | ||
|
||
|
@@ -133,7 +137,7 @@ fields: | |
|
||
- Named top-level fields of specific type and meaning. | ||
|
||
- Fields stored as `map<string, any>`, which can contain arbitrary values of | ||
- Fields stored as `[]keyval<string, any>`, which can contain arbitrary values of | ||
different types. The keys and values for well-known fields follow semantic | ||
conventions for key names and possible values that allow all parties that work | ||
with the field to have the same interpretation of the data. See references to | ||
|
@@ -149,7 +153,7 @@ The reasons for having these 2 kinds of fields are: | |
- Ability to enforce types of named fields, which is very useful for compiled | ||
languages with type checks. | ||
|
||
- Flexibility to represent less frequent data as `map<string, any>`. This | ||
- Flexibility to represent less frequent data as `[]keyval<string, any>`. This | ||
includes well-known data that has standardized semantics as well as arbitrary | ||
custom data that the application may want to include in the logs. | ||
|
||
|
@@ -444,7 +448,7 @@ is optional. | |
|
||
### Field: `Attributes` | ||
|
||
Type: [`map<string, any>`](#type-mapstring-any). | ||
Type: [`[]keyval<string, any>`](#type-keyvalstring-any). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've not spent much time in the spec, but changing the data type of a field in a stabilized data format feels like a breaking change to me. The data model is more explicit than the proto that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure this is changing the data-type as much as renaming it to account for implementations that don't implement this as a map. Any implementation that was compatible prior to this change, will still be compatible after it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
But not implementing as a map was technically against specification right?
Agreed, but it allows 2 different implementation to generate incompatible data right? If 1 data source generates attributes as a map and another generates data as a list, when the data comes together, either in the collector or some backend, I anticipate problems, or at least unexpected/undefined behavior. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. unresolving per @TylerHelmuth request There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Hard to say as the spec was not using normative language. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this is a instrumentation-only concern and does not change how the exporters emit the data using OTLP please let me know. It does feel like a breaking change to me to change a data type. I'd like to stick as closely as possible to semver as possible so that any implementers of our spec are not unexpectedly affected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The exporter should deduplicate the key-values if it is required. Additionally, the user can always use a processor which deduplicates key-values. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes it is. I explicitly write in the PR description that OTLP exporters still have to send deduplicated key-values. |
||
|
||
Description: Additional information about the specific event occurrence. Unlike | ||
the `Resource` field, which is fixed for a particular source, `Attributes` can | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a breaking change. We can't allow this. It expands the possible shape of the data that recipients should be able to deal with. This is also not possible to represent in OTLP, so I think this is a no go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm. All possibe receipents (processors, exporters)? Would it also be seen as breaking if in SDKs the OTLP exporters would handle the deduplication?
I have an idea to propose adding an option like
WithoutKeyValueDeduplication
to Go Logs SDK for users who favor performance over the danger of potentially losing some log records. By default the SDK would get rid of duplicates.EDIT: I am leaning towards closing this PR and issue if the idea above is reasonable and acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This document is the logical data model and applies to anyone who creates the data and who reads the data, so yes it applies to processors, exporters, SDKs, Collector, backends.
I think it is fine to do the deduplication anywhere you want as long as externally observable data complies with this document.
I think it is fair to provide non-default options like this, with a clear warning that the caller is responsible for ensuring there are no duplicates, assuming there is a need for such high performance.
Provided that such option option is possible to add later in non-breaking manner to Go SDK I would not do it right now and will only add it after we gather evidence from real-world usage that it is really needed by the users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tigrannajaryan, sounds like a good plan 👍 Do you think that the specification should be updated before we introduce such option?
CC @open-telemetry/cpp-maintainers @open-telemetry/rust-maintainers @open-telemetry/dotnet-maintainers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, no, if it is only for Go. Likely, yes, if multiple languages will use it.