Drop keyvals with duplicated keys #10

Closed

@pellared (Owner) commented on Mar 13, 2024:

Why

To demonstrate the cost of deduplication for the sake of open-telemetry/opentelemetry-specification#3938.

What

This PR adds deduplication of key-values that have duplicated keys.

The benchmarks use only simple attributes (strings, ints, floats, bools) and contain no duplicated keys, which arguably represents the most common scenario. In these benchmarks, deduplication costs about 25% of CPU performance. The cost difference would be bigger if complex values were involved.

There is also a memory overhead that is largely not visible in the results because we use a pool of sets (maps) for detecting duplicated keys.
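
For illustration, here is a minimal, self-contained sketch of set-based deduplication. This is not the PR's code: KeyValue is a simplified stand-in for the SDK's attribute type, the dedup helper is hypothetical, and keeping the first occurrence is just an assumption of the sketch.

package main

import "fmt"

// KeyValue is a simplified stand-in for the SDK's attribute type.
type KeyValue struct {
	Key   string
	Value any
}

// dedup drops key-values whose key was already seen, keeping the first
// occurrence. The seen set is the memory overhead mentioned above; the PR
// hides it by reusing pooled maps instead of allocating one per record.
func dedup(kvs []KeyValue) []KeyValue {
	seen := make(map[string]struct{}, len(kvs))
	out := kvs[:0]
	for _, kv := range kvs {
		if _, ok := seen[kv.Key]; ok {
			continue // duplicated key: drop this key-value
		}
		seen[kv.Key] = struct{}{}
		out = append(out, kv)
	}
	return out
}

func main() {
	fmt.Println(dedup([]KeyValue{{"a", 1}, {"b", 2}, {"a", 3}})) // [{a 1} {b 2}]
}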

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/log
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
                    │   old.txt    │               new.txt                │
                    │    sec/op    │    sec/op     vs base                │
/no_attrs/Simple-16   428.9n ±  4%   425.5n ±  5%        ~ (p=0.912 n=10)
/no_attrs/Batch-16    388.3n ±  3%   419.6n ±  5%   +8.06% (p=0.000 n=10)
/3_attrs/Simple-16    598.4n ±  2%   759.5n ±  6%  +26.94% (p=0.000 n=10)
/3_attrs/Batch-16     585.1n ±  6%   721.4n ±  4%  +23.30% (p=0.000 n=10)
/5_attrs/Simple-16    698.7n ±  3%   936.7n ±  5%  +34.06% (p=0.000 n=10)
/5_attrs/Batch-16     711.3n ±  8%   963.3n ±  6%  +35.43% (p=0.000 n=10)
/10_attrs/Simple-16   2.188µ ± 15%   2.587µ ± 17%  +18.26% (p=0.000 n=10)
/10_attrs/Batch-16    2.255µ ±  6%   2.950µ ± 11%  +30.80% (p=0.000 n=10)
/40_attrs/Simple-16   5.963µ ± 15%   7.921µ ±  6%  +32.85% (p=0.000 n=10)
/40_attrs/Batch-16    7.165µ ±  3%   9.473µ ±  5%  +32.21% (p=0.000 n=10)
geomean               1.198µ         1.480µ        +23.53%

                    │    old.txt     │                new.txt                │
                    │      B/op      │     B/op      vs base                 │
/no_attrs/Simple-16     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
/no_attrs/Batch-16      0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
/3_attrs/Simple-16      0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
/3_attrs/Batch-16       0.000 ± 0%       1.000 ± 0%       ? (p=0.000 n=10)
/5_attrs/Simple-16      0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
/5_attrs/Batch-16       1.000 ± 0%       1.000 ± 0%       ~ (p=1.000 n=10) ¹
/10_attrs/Simple-16   1.392Ki ± 0%     1.396Ki ± 0%  +0.28% (p=0.000 n=10)
/10_attrs/Batch-16    1.394Ki ± 0%     1.399Ki ± 0%  +0.42% (p=0.000 n=10)
/40_attrs/Simple-16   6.692Ki ± 0%     6.762Ki ± 0%  +1.04% (p=0.000 n=10)
/40_attrs/Batch-16    6.698Ki ± 0%     6.773Ki ± 0%  +1.12% (p=0.000 n=10)
geomean                            ²                 ?                     ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                    │   old.txt    │               new.txt               │
                    │  allocs/op   │ allocs/op   vs base                 │
/no_attrs/Simple-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/no_attrs/Batch-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/3_attrs/Simple-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/3_attrs/Batch-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/5_attrs/Simple-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/5_attrs/Batch-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/10_attrs/Simple-16   9.000 ± 0%     9.000 ± 0%       ~ (p=1.000 n=10) ¹
/10_attrs/Batch-16    9.000 ± 0%     9.000 ± 0%       ~ (p=1.000 n=10) ¹
/40_attrs/Simple-16   13.00 ± 0%     13.00 ± 0%       ~ (p=1.000 n=10) ¹
/40_attrs/Batch-16    13.00 ± 0%     13.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                          ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Review comment on lines +94 to +112:
// Borrow a reusable set from the pool; clear and return it when done.
keysPtr := keysPool.Get().(*map[string]struct{})
defer func() {
	clear(*keysPtr)
	keysPool.Put(keysPtr)
}()
keys := *keysPtr

m := value.AsMap()
for _, kv := range m {
	// A key seen before means the map value contains a duplicate.
	if _, ok := keys[kv.Key]; ok {
		return true
	}
	keys[kv.Key] = struct{}{}

	// Recurse into nested map values.
	if hasDuplicatedKey(kv.Value) {
		return true
	}
}

return false
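
The hunk above references a keysPool declared elsewhere in the change. Its actual declaration is not part of this excerpt; presumably it is a sync.Pool of pointers to string sets, along these lines:

import "sync"

// Assumed shape of keysPool (not shown in this hunk): a pool of reusable
// sets so that duplicate detection does not allocate a map per record.
var keysPool = sync.Pool{
	New: func() any {
		keys := make(map[string]struct{})
		return &keys
	},
}
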
@pellared (Owner, Author):

This code is not even covered by the benchmark 😉

@tigrannajaryan:

@pellared I am curious how this compares with a "sort/dedup" approach that can be done in place and requires no maps and no allocations. Did you by any chance try that?

@pellared (Owner, Author):

> @pellared I am curious how this compares with a "sort/dedup" approach that can be done in place and requires no maps and no allocations. Did you by any chance try that?

log.Record and Record are different structs, so we need to copy the data anyway. This is the code that converts the API representation of a log record (log.Record) into the SDK representation (Record).

Also, sorting itself would make it O(n log n), while this approach looks O(n).

@tigrannajaryan:

I meant sort/dedup after all attributes are collected and it is time to convert/export them. I am aware of the complexity of sorting, but I am still curious about the comparison in Go code. We don't know what the constants of the two approaches are; it may turn out to be cheaper to sort for the input sizes we care about.
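
For reference, a minimal sketch of the in-place sort/dedup alternative being discussed. Again, KeyValue is a simplified stand-in rather than the SDK type, the sortDedup helper is hypothetical, and keeping the first occurrence of each key is an arbitrary choice of the sketch.

package main

import (
	"fmt"
	"sort"
)

// KeyValue is a simplified stand-in for the SDK's attribute type.
type KeyValue struct {
	Key   string
	Value any
}

// sortDedup sorts the collected key-values by key and drops duplicates in
// place: no map and no extra allocation, at the cost of an O(n log n) sort.
func sortDedup(kvs []KeyValue) []KeyValue {
	// A stable sort preserves the original order among equal keys,
	// so the first occurrence of each key is the one kept below.
	sort.SliceStable(kvs, func(i, j int) bool { return kvs[i].Key < kvs[j].Key })
	out := kvs[:0]
	for _, kv := range kvs {
		if len(out) > 0 && out[len(out)-1].Key == kv.Key {
			continue // duplicated key: drop this key-value
		}
		out = append(out, kv)
	}
	return out
}

func main() {
	fmt.Println(sortDedup([]KeyValue{{"a", 1}, {"b", 2}, {"a", 3}})) // [{a 1} {b 2}]
}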

@pellared changed the title from "[Demo] Drop keyvals with duplicated keys" to "Drop keyvals with duplicated keys" on Mar 19, 2024
@pellared closed this on May 10, 2024