opentelemetry-sdk: speed up exemplars a bit #4260

Open · wants to merge 1 commit into base: main
Conversation

@xrmx (Contributor) commented Nov 8, 2024

Description

Use a sparse dict to allocate ExemplarBucket instances on demand instead of preallocating all of them in FixedSizeExemplarReservoirABC.
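A minimal sketch of the idea (simplified, hypothetical class names, not the actual SDK code):

from collections import defaultdict


class Bucket:
    """Stand-in for ExemplarBucket; real buckets hold exemplar state."""

    def offer(self, value) -> None:
        self.value = value


class EagerReservoir:
    """Before: every bucket is allocated up front, even if it is never touched."""

    def __init__(self, size: int) -> None:
        self._buckets = [Bucket() for _ in range(size)]

    def offer(self, index: int, value) -> None:
        self._buckets[index].offer(value)


class SparseReservoir:
    """After: a dict keyed by bucket index; a bucket is only created on first use."""

    def __init__(self, size: int) -> None:
        self._size = size
        self._buckets: dict[int, Bucket] = defaultdict(Bucket)

    def offer(self, index: int, value) -> None:
        self._buckets[index].offer(value)  # allocates the bucket lazily

Measurements that never touch a given bucket never pay for its allocation, which matters most when exemplar filters such as always_off drop everything.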

This makes the following benchmark complete around 2X more rounds for both the trace_based and always_off exemplar filters:

.tox/benchmark-opentelemetry-sdk/bin/pytest opentelemetry-sdk/benchmarks/metrics/ -k 'test_histogram_record_1000[7]'

I get 8% fewer rounds than with a8aacb0c6f2f06bf19b501d98e62f7c0e667fa4c, the commit before exemplars were introduced. Without this change we are doing 60% fewer rounds.

Refs #4243

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • .tox/benchmark-opentelemetry-sdk/bin/pytest opentelemetry-sdk/benchmarks/metrics/ -k 'test_histogram_record_1000[7]'

Does This PR Require a Contrib Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@xrmx requested a review from a team as a code owner on November 8, 2024 16:27
@xrmx added the "Skip Changelog" label (PRs that do not require a CHANGELOG.md entry) on Nov 8, 2024
@aabmass (Member) commented Nov 8, 2024

Does the dict have any downside vs list in the steady state (measuring the same attribute set repeatedly)?

@fcollonval (Contributor) left a review comment

Thanks @xrmx
This looks good to me.

@fcollonval (Contributor) commented:

> Does the dict have any downside vs list in the steady state (measuring the same attribute set repeatedly)?

Without any proof, I would say accessing a list item by index is faster than looking up a dict value by key. So if there are lots of buckets this may be less optimal; but the number of buckets should never be very high.

@xrmx (Contributor, Author) commented Nov 11, 2024

> Does the dict have any downside vs list in the steady state (measuring the same attribute set repeatedly)?

> Without any proof, I would say accessing a list item by index is faster than looking up a dict value by key. So if there are lots of buckets this may be less optimal; but the number of buckets should never be very high.

With small data sets a list may be faster because it's simpler than a dict, but with more items the O(n) list access time would lose against the dict's O(1) access time.

@aabmass (Member) commented Nov 11, 2024

> but the number of buckets should never be very high.

If we don't already have a benchmark for taking a measurement with a pre-existing attribute set, it would be great to add one. Maybe one with the default buckets and one for an ExponentialHistogram, which defaults to a maximum of 160 buckets.
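A rough sketch of what such a benchmark could look like (illustrative only; the instrument name, attribute set, and view configuration below are made up, and it assumes pytest-benchmark like the existing benchmarks):

from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import InMemoryMetricReader
from opentelemetry.sdk.metrics.view import (
    ExponentialBucketHistogramAggregation,
    View,
)

reader = InMemoryMetricReader()
provider = MeterProvider(
    metric_readers=[reader],
    views=[
        View(
            instrument_name="hist_exp",
            aggregation=ExponentialBucketHistogramAggregation(),
        )
    ],
)
meter = provider.get_meter("benchmark")
hist_exp = meter.create_histogram("hist_exp")

# A single, pre-existing attribute set so the same aggregation point (and its
# exemplar reservoir) is reused on every record() call.
ATTRIBUTES = {"environment": "benchmark"}


def test_histogram_record_existing_attributes(benchmark):
    def record():
        hist_exp.record(1, attributes=ATTRIBUTES)

    benchmark(record)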

@fcollonval (Contributor) commented:

One important note about the exponential histogram case is that the exemplar reservoir has a lower upper bound on buckets than the histogram itself:

> Base2 Exponential Histogram Aggregation SHOULD use a SimpleFixedSizeExemplarReservoir with a reservoir equal to the smaller of the maximum number of buckets configured on the aggregation or twenty (e.g. min(20, max_buckets)).
Xref: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exemplar-defaults
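In other words (illustrative only):

# Exemplar reservoir size implied by the spec text above.
def exponential_reservoir_size(max_buckets: int = 160) -> int:
    return min(20, max_buckets)

assert exponential_reservoir_size() == 20  # default 160 buckets -> capped at 20
assert exponential_reservoir_size(8) == 8  # fewer than 20 buckets -> use max_buckets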

@fcollonval (Contributor) commented:

> the O(n) list access time would lose against dict O(1) access time.

Isn't that the complexity for search rather than for access? If you know the index, as in this case, I doubt the complexity of the list is higher.

@xrmx (Contributor, Author) commented Nov 12, 2024

> the O(n) list access time would lose against dict O(1) access time.

> Isn't that the complexity for search rather than for access? If you know the index, as in this case, I doubt the complexity of the list is higher.

Got nerdsniped by this and yes, getting a list item looks O(1) too:

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (!valid_index(i, Py_SIZE(op))) {
        _Py_DECLARE_STR(list_err, "list index out of range");
        PyErr_SetObject(PyExc_IndexError, &_Py_STR(list_err));
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];
}

So I guess the difference in performance comes just from the lazy instantiation of the ExemplarBucket() objects.
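For the steady-state question, a quick and unscientific check along these lines suggests list indexing and dict lookup are both cheap and roughly constant time, so the win really is in skipping the up-front allocations:

import timeit


class Bucket:
    """Stand-in for ExemplarBucket."""


# Steady-state access: list indexing vs dict lookup on 20 buckets.
print(timeit.timeit("buckets[7]", setup="buckets = [object() for _ in range(20)]"))
print(timeit.timeit("buckets[7]", setup="buckets = {i: object() for i in range(20)}"))

# Construction cost: eager preallocation vs an empty sparse dict.
print(timeit.timeit("[Bucket() for _ in range(20)]", globals=globals()))
print(timeit.timeit("{}"))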

@aabmass (Member) commented Nov 14, 2024

> If we don't already have a benchmark for taking a measurement with a pre-existing attribute set, it would be great to add one.

@xrmx would you be open to adding a benchmark for this to make sure the dict doesn't make this much slower? That seems like the more common case to optimize for vs churning attributes.

@xrmx (Contributor, Author) commented Nov 18, 2024

> If we don't already have a benchmark for taking a measurement with a pre-existing attribute set, it would be great to add one.

> @xrmx would you be open to adding a benchmark for this to make sure the dict doesn't make this much slower? That seems like the more common case to optimize for vs churning attributes.

Need to sort out how to write such a benchmark 😅

@xrmx (Contributor, Author) commented Nov 26, 2024

@aabmass is something like this what you had in mind?

import itertools

import pytest

# hist1000 refers to the histogram instrument already defined at module level
# in the existing benchmark file.
@pytest.mark.parametrize("num_labels", [0, 1, 3, 5, 7])
def test_histogram_record_1000(benchmark, num_labels):
    values = itertools.cycle(range(1, 10))

    def benchmark_histogram_record_1000():
        hist1000.record(next(values))

    benchmark(benchmark_histogram_record_1000)

Before

---------------------------------------------------------------------------------------- benchmark: 5 tests ----------------------------------------------------------------------------------------
Name (time in us)                    Min                Max              Mean            StdDev            Median               IQR             Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_histogram_record_1000[0]     2.9318 (1.15)     13.4856 (1.32)     3.1635 (1.11)     0.3693 (2.12)     3.1190 (1.11)     0.1350 (1.20)         22;69      316.1018 (0.90)       1022           1
test_histogram_record_1000[1]     2.5900 (1.01)     14.5938 (1.43)     2.8457 (1.00)     0.1850 (1.06)     2.8145 (1.00)     0.1146 (1.02)     3821;2833      351.4110 (1.00)      52373           1
test_histogram_record_1000[3]     2.5583 (1.00)     13.5032 (1.33)     2.8566 (1.01)     0.1873 (1.07)     2.8266 (1.01)     0.1201 (1.07)     3910;2536      350.0655 (0.99)      52902           1
test_histogram_record_1000[5]     2.5555 (1.0)      14.0648 (1.38)     2.8462 (1.00)     0.1965 (1.13)     2.8135 (1.00)     0.1127 (1.0)      3217;2935      351.3447 (1.00)      53338           1
test_histogram_record_1000[7]     2.5723 (1.01)     10.1784 (1.0)      2.8408 (1.0)      0.1742 (1.0)      2.8098 (1.0)      0.1136 (1.01)     4774;3252      352.0196 (1.0)       55710           1

After

---------------------------------------------------------------------------------------- benchmark: 5 tests ----------------------------------------------------------------------------------------
Name (time in us)                    Min                Max              Mean            StdDev            Median               IQR             Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_histogram_record_1000[0]     2.6487 (1.02)     13.6066 (1.0)      3.0223 (1.06)     0.3527 (1.82)     2.9970 (1.06)     0.2645 (2.56)        100;41      330.8757 (0.95)       2367           1
test_histogram_record_1000[1]     2.6105 (1.00)     14.9934 (1.10)     2.8952 (1.01)     0.2371 (1.22)     2.8554 (1.01)     0.1248 (1.21)     3357;3879      345.4039 (0.99)      65935           1
test_histogram_record_1000[3]     2.6114 (1.00)     17.8963 (1.32)     2.8626 (1.0)      0.1938 (1.0)      2.8303 (1.0)      0.1034 (1.0)      3534;3742      349.3333 (1.0)       53803           1
test_histogram_record_1000[5]     2.6058 (1.00)     24.4612 (1.80)     2.8805 (1.01)     0.2325 (1.20)     2.8433 (1.00)     0.1183 (1.14)     3105;3773      347.1662 (0.99)      55704           1
test_histogram_record_1000[7]     2.5993 (1.0)      18.7624 (1.38)     2.9343 (1.03)     0.2049 (1.06)     2.8927 (1.02)     0.1760 (1.70)     5919;1259      340.7929 (0.98)      59382           1
