Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fastapi: fix wrapping of middlewares #3012

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

adriangb
Copy link
Contributor

@adriangb adriangb commented Nov 15, 2024

MRE:

import anyio
import fastapi
import httpx
import uvicorn
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

resource = Resource(attributes={SERVICE_NAME: '🐛'})

traceProvider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(ConsoleSpanExporter())
traceProvider.add_span_processor(processor)
trace.set_tracer_provider(traceProvider)

app = fastapi.FastAPI()

@app.get('/ok')
async def ok() -> int:
    return 123

@app.get('/error')
async def error() -> int:
    raise Exception('An error occurred!')

FastAPIInstrumentor.instrument_app(app)


async def main() -> None:
    config = uvicorn.Config(app=app, port=9999, log_config=None)
    server = uvicorn.Server(config)

    async with anyio.create_task_group() as tg:
        tg.start_soon(server.serve)
        await anyio.sleep(1)  # Wait for the server to start
        async with httpx.AsyncClient(base_url='http://localhost:9999') as client:
            response = await client.get('/ok')
            assert response.status_code == 200, f'{response.status_code}: {response.text}'
            response = await client.get('/error')
            assert response.status_code == 500,  f'{response.status_code}: {response.text}'
            tg.cancel_scope.cancel()


if __name__ == '__main__':
    anyio.run(main)

With this change it shows up properly

Run this and you'll see that there is no "http.status_code": 500 in the logs.

@adriangb adriangb requested a review from a team as a code owner November 15, 2024 15:35
Copy link
Contributor

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice comment and test!

@xrmx
Copy link
Contributor

xrmx commented Nov 19, 2024

@adriangb Please take a look at the failing tests

# are handled.
# This should not happen unless there is a bug in OpenTelemetryMiddleware, but if there is we don't want that
# to impact the user's application just because we wrapped the middlewares in this order.
stack = ServerErrorMiddleware(stack)
Copy link
Contributor

@lzchen lzchen Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, is the purpose of this pr to handle exceptions generated from the OpenTelemetryMiddleware itself (if there is a bug) or unhandled exceptions thrown from the request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! If there is a bug in OpenTelemetryMiddleware without this double wrapping then the server might not send a 500 to the client and instead abruptly disconnect. In practice I believe every ASGI server I know of does also have it's own handling of unhandled exceptions such that this would not be the case, but I'm not sure we want to rely on that + there would still be a noticeable change (different response content at least, maybe different headers, etc.)

@Kludex
Copy link
Contributor

Kludex commented Nov 20, 2024

fixes #795

@@ -170,7 +170,8 @@ def setUp(self):
self._instrumentor = otel_fastapi.FastAPIInstrumentor()
self._app = self._create_app()
self._app.add_middleware(HTTPSRedirectMiddleware)
self._client = TestClient(self._app)
self._client = TestClient(self._app, base_url="https://testserver:443")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the root cause of tests failures. It turns out every request was being made twice because it was getting redirected to https. Before this PR that wasn't being instrumented correctly, so this was not being caught! I think that's another major bug this PR is fixing.

@adriangb adriangb requested a review from lzchen November 27, 2024 00:35
@adriangb
Copy link
Contributor Author

adriangb commented Nov 27, 2024

@adriangb Please take a look at the failing tests

@xrmx see #3012 (comment)

@adriangb
Copy link
Contributor Author

@Kludex could you update the Starlette instrumentations after we finish up this PR? I'm assuming they have the same bugs.

@Kludex
Copy link
Contributor

Kludex commented Nov 27, 2024

The pipeline has been running for some hours... 👀

app = ServerErrorMiddleware(app)
return app

app._original_build_middleware_stack = app.build_middleware_stack
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: do you think it would be possible to use wrapt for monkeypatching instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not. This is simpler and works just fine.

Co-authored-by: Riccardo Magliocchetti <[email protected]>
@vanyae-cqc
Copy link

vanyae-cqc commented Nov 27, 2024

Hello! Just wondering if you'd expect this to enable getting the trace_id in a FastAPI exception_handler?

This has been discussed in more detail here: open-telemetry/opentelemetry-python#3477, but currently OpenTelemetryMiddleware's context seems to be removed by the time an exception gets to the ServerErrorMiddleware.

Would be great to enable users to report back a trace_id when they get an unexpected 500 status response. Have tried to verify this PR provides this by running it locally but haven't had any luck - could be something wrong in my setup though.

[Update]: Have now verified this achieves the above behaviour - this fix is greatly appreciated!

@adriangb
Copy link
Contributor Author

Fixed issue with test shutdown hanging. All checks passing now.

@adriangb adriangb requested a review from xrmx November 27, 2024 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants