Drop keyvals with duplicated keys #10

Closed

@pellared (Owner) commented on Mar 13, 2024:

Why

To demonstrate the cost of deduplication for the sake of open-telemetry/opentelemetry-specification#3938.

What

This PR adds deduplication of key-values that have duplicated keys.

The benchmarks use only simple attributes (strings, ints, floats, bools) and contain no duplicated keys, which arguably represents the most common scenario. In these benchmarks, deduplication costs about 25% of CPU performance. The cost difference would be bigger if complex values were involved.

There is also a memory overhead that is largely not visible in the results because we use a pool of sets (maps) for detecting duplicated keys.
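
For illustration, here is a minimal, self-contained sketch of set-based deduplication. This is not the PR's code: KeyValue is a simplified stand-in for the SDK's attribute type, the dedup helper is hypothetical, and keeping the first occurrence is just an assumption of the sketch.

package main

import "fmt"

// KeyValue is a simplified stand-in for the SDK's attribute type.
type KeyValue struct {
	Key   string
	Value any
}

// dedup drops key-values whose key was already seen, keeping the first
// occurrence. The seen set is the memory overhead mentioned above; the PR
// hides it by reusing pooled maps instead of allocating one per record.
func dedup(kvs []KeyValue) []KeyValue {
	seen := make(map[string]struct{}, len(kvs))
	out := kvs[:0]
	for _, kv := range kvs {
		if _, ok := seen[kv.Key]; ok {
			continue // duplicated key: drop this key-value
		}
		seen[kv.Key] = struct{}{}
		out = append(out, kv)
	}
	return out
}

func main() {
	fmt.Println(dedup([]KeyValue{{"a", 1}, {"b", 2}, {"a", 3}})) // [{a 1} {b 2}]
}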

goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/log
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
                    │   old.txt    │               new.txt                │
                    │    sec/op    │    sec/op     vs base                │
/no_attrs/Simple-16   428.9n ±  4%   425.5n ±  5%        ~ (p=0.912 n=10)
/no_attrs/Batch-16    388.3n ±  3%   419.6n ±  5%   +8.06% (p=0.000 n=10)
/3_attrs/Simple-16    598.4n ±  2%   759.5n ±  6%  +26.94% (p=0.000 n=10)
/3_attrs/Batch-16     585.1n ±  6%   721.4n ±  4%  +23.30% (p=0.000 n=10)
/5_attrs/Simple-16    698.7n ±  3%   936.7n ±  5%  +34.06% (p=0.000 n=10)
/5_attrs/Batch-16     711.3n ±  8%   963.3n ±  6%  +35.43% (p=0.000 n=10)
/10_attrs/Simple-16   2.188µ ± 15%   2.587µ ± 17%  +18.26% (p=0.000 n=10)
/10_attrs/Batch-16    2.255µ ±  6%   2.950µ ± 11%  +30.80% (p=0.000 n=10)
/40_attrs/Simple-16   5.963µ ± 15%   7.921µ ±  6%  +32.85% (p=0.000 n=10)
/40_attrs/Batch-16    7.165µ ±  3%   9.473µ ±  5%  +32.21% (p=0.000 n=10)
geomean               1.198µ         1.480µ        +23.53%

                    │    old.txt     │                new.txt                │
                    │      B/op      │     B/op      vs base                 │
/no_attrs/Simple-16     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
/no_attrs/Batch-16      0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
/3_attrs/Simple-16      0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
/3_attrs/Batch-16       0.000 ± 0%       1.000 ± 0%       ? (p=0.000 n=10)
/5_attrs/Simple-16      0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
/5_attrs/Batch-16       1.000 ± 0%       1.000 ± 0%       ~ (p=1.000 n=10) ¹
/10_attrs/Simple-16   1.392Ki ± 0%     1.396Ki ± 0%  +0.28% (p=0.000 n=10)
/10_attrs/Batch-16    1.394Ki ± 0%     1.399Ki ± 0%  +0.42% (p=0.000 n=10)
/40_attrs/Simple-16   6.692Ki ± 0%     6.762Ki ± 0%  +1.04% (p=0.000 n=10)
/40_attrs/Batch-16    6.698Ki ± 0%     6.773Ki ± 0%  +1.12% (p=0.000 n=10)
geomean                            ²                 ?                     ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                    │   old.txt    │               new.txt               │
                    │  allocs/op   │ allocs/op   vs base                 │
/no_attrs/Simple-16   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/no_attrs/Batch-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/3_attrs/Simple-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/3_attrs/Batch-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/5_attrs/Simple-16    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/5_attrs/Batch-16     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
/10_attrs/Simple-16   9.000 ± 0%     9.000 ± 0%       ~ (p=1.000 n=10) ¹
/10_attrs/Batch-16    9.000 ± 0%     9.000 ± 0%       ~ (p=1.000 n=10) ¹
/40_attrs/Simple-16   13.00 ± 0%     13.00 ± 0%       ~ (p=1.000 n=10) ¹
/40_attrs/Batch-16    13.00 ± 0%     13.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                          ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Review comment on lines +94 to +112:
// Borrow a reusable set from the pool; clear and return it when done.
keysPtr := keysPool.Get().(*map[string]struct{})
defer func() {
	clear(*keysPtr)
	keysPool.Put(keysPtr)
}()
keys := *keysPtr

m := value.AsMap()
for _, kv := range m {
	// A key seen before means the map value contains a duplicate.
	if _, ok := keys[kv.Key]; ok {
		return true
	}
	keys[kv.Key] = struct{}{}

	// Recurse into nested map values.
	if hasDuplicatedKey(kv.Value) {
		return true
	}
}

return false
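
The hunk above references a keysPool declared elsewhere in the change. Its actual declaration is not part of this excerpt; presumably it is a sync.Pool of pointers to string sets, along these lines:

import "sync"

// Assumed shape of keysPool (not shown in this hunk): a pool of reusable
// sets so that duplicate detection does not allocate a map per record.
var keysPool = sync.Pool{
	New: func() any {
		keys := make(map[string]struct{})
		return &keys
	},
}
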
@pellared (Owner, Author):

This code is not even covered by the benchmark 😉

@tigrannajaryan:

@pellared I am curious how this compares with a "sort/dedup" approach that can be done in place and requires no maps and no allocations. Did you by any chance try that?

@pellared (Owner, Author):

> @pellared I am curious how this compares with a "sort/dedup" approach that can be done in place and requires no maps and no allocations. Did you by any chance try that?

log.Record and Record are different structs, so we need to copy the data anyway. This is the code that converts the API representation of a log record (log.Record) into the SDK representation (Record).

Also, sorting itself would make it O(n log n), while this approach looks O(n).

@tigrannajaryan:

I meant sort/dedup after all attributes are collected and it is time to convert/export them. I am aware of the complexity of sorting, but I am still curious about the comparison in Go code. We don't know what the constants of the two approaches are; it may turn out to be cheaper to sort for the input sizes we care about.
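
For reference, a minimal sketch of the in-place sort/dedup alternative being discussed. Again, KeyValue is a simplified stand-in rather than the SDK type, the sortDedup helper is hypothetical, and keeping the first occurrence of each key is an arbitrary choice of the sketch.

package main

import (
	"fmt"
	"sort"
)

// KeyValue is a simplified stand-in for the SDK's attribute type.
type KeyValue struct {
	Key   string
	Value any
}

// sortDedup sorts the collected key-values by key and drops duplicates in
// place: no map and no extra allocation, at the cost of an O(n log n) sort.
func sortDedup(kvs []KeyValue) []KeyValue {
	// A stable sort preserves the original order among equal keys,
	// so the first occurrence of each key is the one kept below.
	sort.SliceStable(kvs, func(i, j int) bool { return kvs[i].Key < kvs[j].Key })
	out := kvs[:0]
	for _, kv := range kvs {
		if len(out) > 0 && out[len(out)-1].Key == kv.Key {
			continue // duplicated key: drop this key-value
		}
		out = append(out, kv)
	}
	return out
}

func main() {
	fmt.Println(sortDedup([]KeyValue{{"a", 1}, {"b", 2}, {"a", 3}})) // [{a 1} {b 2}]
}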

@pellared changed the title from "[Demo] Drop keyvals with duplicated keys" to "Drop keyvals with duplicated keys" on Mar 19, 2024
@pellared closed this on May 10, 2024