Drop keyvals with duplicated keys #10
Conversation
```go
import (
	"sync"

	"go.opentelemetry.io/otel/log"
)

// keysPool holds reusable key sets (assumed declaration, implied by
// the type assertion below).
var keysPool = sync.Pool{
	New: func() any {
		m := make(map[string]struct{})
		return &m
	},
}

// hasDuplicatedKey reports whether value is a map containing
// key-values with a duplicated key, recursing into nested maps.
func hasDuplicatedKey(value log.Value) bool {
	keysPtr := keysPool.Get().(*map[string]struct{})
	defer func() {
		clear(*keysPtr) // empty the set before returning it to the pool
		keysPool.Put(keysPtr)
	}()
	keys := *keysPtr

	m := value.AsMap()
	for _, kv := range m {
		if _, ok := keys[kv.Key]; ok {
			return true // key seen before at this level
		}
		keys[kv.Key] = struct{}{}

		if hasDuplicatedKey(kv.Value) {
			return true // duplicate inside a nested map value
		}
	}

	return false
}
```
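For context on the snippet above: the pool lets each call reuse a previously allocated set instead of allocating a fresh map every time, and `clear` removes all entries while keeping the map's underlying storage available for reuse, so steady-state calls avoid per-call map allocations. The deferred cleanup returns the emptied set to the pool even on early returns.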
This code is not even covered by the benchmark 😉
@pellared I am curious how this compares with the "sort/dedup" approach, which can be done in place and requires no maps and no allocations. Did you by any chance try that?
Also, sorting itself would make it O(n log n).
I meant sort/dedup after all attributes are collected and it is time to convert/export them. I am aware of the complexity of sort, but I am still curious about the comparison in Go code. We don't know what the constants of the two approaches are; it may turn out to be cheaper to sort for the input sizes we care about.
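For illustration, here is a minimal sketch of the in-place sort/dedup idea being discussed. The `KeyValue` type is a hypothetical stand-in for the real attribute type, and last-value-wins semantics are an assumption, not something the PR specifies:

```go
package dedup

import (
	"cmp"
	"slices"
)

// KeyValue is a hypothetical stand-in for the real attribute type.
type KeyValue struct {
	Key   string
	Value any
}

// dedupInPlace sorts kvs by key and compacts away entries whose key
// repeats, keeping the last occurrence of each key. It allocates no
// map and no extra slice.
func dedupInPlace(kvs []KeyValue) []KeyValue {
	// A stable sort preserves the original order among equal keys,
	// which makes "last occurrence wins" possible below.
	slices.SortStableFunc(kvs, func(a, b KeyValue) int {
		return cmp.Compare(a.Key, b.Key)
	})
	out := kvs[:0]
	for i := range kvs {
		// Drop this entry if the next one carries the same key;
		// the final duplicate in the run is the one that is kept.
		if i+1 < len(kvs) && kvs[i].Key == kvs[i+1].Key {
			continue
		}
		out = append(out, kvs[i])
	}
	return out
}

// Example: dedupInPlace([]KeyValue{{"a", 1}, {"b", 2}, {"a", 3}})
// returns [{a 3} {b 2}].
```

The trade-off against the map-based check is O(n log n) comparisons versus O(n) hashing plus map allocation (or pooling) costs; which wins depends on the constants and input sizes, which is exactly what a benchmark would settle.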
Why
To demonstrate the cost of deduplication for the sake of open-telemetry/opentelemetry-specification#3938.
What
This PR handles deduplication of key-values that contain duplicated keys.
The benchmarks use only simple attributes (strings, ints, floats, bools) with no duplicated keys; this arguably represents the most common scenario. In these benchmarks, deduplication costs ~25% of CPU performance. The cost difference would be bigger if complex values were involved.
There is also a memory overhead, which is not visible in the benchmark results because we use a pool of sets (maps) to detect duplicated keys.
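For reference, a hypothetical harness for measuring such an approach, assuming the `KeyValue` type and `dedupInPlace` helper from the sketch above live in the same package (the PR's actual benchmarks exercise the SDK's logging path instead, and the input below is illustrative):

```go
package dedup

import "testing"

// input mirrors the benchmarked scenario: simple attributes with no
// duplicated keys. The values and the input size are assumptions.
var input = []KeyValue{
	{"a", 1}, {"b", "x"}, {"c", 2.5}, {"d", true}, {"e", 7},
}

// sink prevents the compiler from optimizing the work away.
var sink []KeyValue

func BenchmarkDedupInPlace(b *testing.B) {
	b.ReportAllocs() // surfaces the memory overhead discussed above
	buf := make([]KeyValue, len(input))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		copy(buf, input) // sorting mutates, so restore the input each iteration
		sink = dedupInPlace(buf)
	}
}
```

With `b.ReportAllocs`, the allocation behavior of each approach shows up directly in the benchmark output, which would make the map-versus-sort comparison concrete.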