Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove invalid UTF-8 data. #117

Closed
wants to merge 2 commits into from
Closed

Remove invalid UTF-8 data. #117

wants to merge 2 commits into from

Conversation

lyda
Copy link

@lyda lyda commented May 6, 2024

Without this it panics with bad UTF-8 data which means we lose those hits.

@aldas
Copy link
Contributor

aldas commented May 6, 2024

Please provide some executable example for this problem

@aldas
Copy link
Contributor

aldas commented May 6, 2024

and are you sure this is the right place for this fix? any middleware or handler could do something with that "bad UTF8" url value.

and why statuscode needs to be replaced?
could method be bad utf8?

@lyda
Copy link
Author

lyda commented May 7, 2024

// Package main sends an invalid http request
package main

import (
  "fmt"
  "net"
)

func main() {
  conn, _ := net.Dial("tcp", "127.0.0.1:8080")
  data := []byte("GET /\xc0 HTTP/1.1\r\nHost: example.com\r\n\r\n")
  conn.Write(data)
  buf := make([]byte, 5000)
  conn.Read(buf)
  fmt.Println(string(buf[:]))
}

This will result in a panic when this middleware is used. So far I've only seen this for the path so I could change it to only include that. I've seen such requests in random probes of sites I look after. This is the only middleware that crashes that I've seen (but I don't use a lot of them).

@aldas
Copy link
Contributor

aldas commented May 7, 2024

Seems that https://github.com/prometheus/client_golang/blob/853c5de11e8a0ee0742689859f1642996517b479/prometheus/histogram.go#L1197 this method panics because strings are validated for being proper utf8 here https://github.com/prometheus/client_golang/blob/853c5de11e8a0ee0742689859f1642996517b479/prometheus/labels.go#L178

so I propose we change these lines

requestDuration.WithLabelValues(values...).Observe(elapsed)
requestCount.WithLabelValues(values...).Inc()
requestSize.WithLabelValues(values...).Observe(float64(reqSz))
responseSize.WithLabelValues(values...).Observe(float64(c.Response().Size))

to

if obs, tmpErr := requestDuration.GetMetricWithLabelValues(values...); tmpErr == nil {
	obs.Observe(elapsed)
} else {
	return fmt.Errorf("requestDuration observation failed, err: %w", tmpErr)
}

if obs, tmpErr := requestCount.GetMetricWithLabelValues(values...); tmpErr == nil {
	obs.Inc()
} else {
	return fmt.Errorf("requestCount observation failed, err: %w", tmpErr)
}

if obs, tmpErr := requestSize.GetMetricWithLabelValues(values...); tmpErr == nil {
	obs.Observe(float64(reqSz))
} else {
	return fmt.Errorf("requestSize observation failed, err: %w", tmpErr)
}

if obs, tmpErr := responseSize.GetMetricWithLabelValues(values...); tmpErr == nil {
	obs.Observe(float64(c.Response().Size))
} else {
	return fmt.Errorf("responseSize observation failed, err: %w", tmpErr)
}

This will catch and log UTF-8 without causing the middleware to panic.
@lyda
Copy link
Author

lyda commented May 8, 2024

OK, logging the error would be good. You only need to do the first one though. After that you know the values are ok.

Edit: See the latest commit for that.

@lyda
Copy link
Author

lyda commented May 13, 2024

Ping?

Copy link
Contributor

@aldas aldas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add at least one test that hows this problem is fixed.

@@ -273,10 +275,17 @@ func (conf MiddlewareConfig) ToMiddleware() (echo.MiddlewareFunc, error) {
values[2] = c.Request().Host
values[3] = url
for _, cv := range customValuers {
values[cv.index] = cv.valueFunc(c, err)
values[cv.index] = strings.ToValidUTF8(cv.valueFunc(c, err), "�")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is necessary. It should be up to valueFunc creator to handle this.

@lyda
Copy link
Author

lyda commented Jun 25, 2024

In the end I made a number of changes to this to make it "safe by default" in a kubernetes context. It handles broken utf-8 in all fields and it avoids remote-driven cardinality explosions by default. I also simplified the code a bit. I have a bunch of other things to do so if you'd like some or all of those changes feel free to take them. It meets my needs now so I'll just use that. Thanks!

https://gitlab.com/lyda/echoprometheus

@lyda lyda closed this Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants