Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENHANCEMENT] Improves Error Messaging in Responses #283

Closed
wants to merge 6 commits into from

Conversation

nkpng2k
Copy link
Contributor

@nkpng2k nkpng2k commented Jan 7, 2022

Actually propagate some level of error message to the response.

Initially you just get a 400 - Bad Request error and no reason for the failure. This fixes 2 problems:

  1. with introduction of shapley scoring. Shapley scoring in /model/score endpoint was potentially failing silently
  2. generally this helps reduce the requirement of the users to go to the logs (which could be running in some infrastructure they don't have access to), so they can potentially fix typos and user errors more easily.

@nkpng2k nkpng2k requested a review from orendain January 7, 2022 00:46
Copy link
Member

@orendain orendain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is so close to fulfilling #165, does it make sense to move it in that direction?

Shapley scoring in /model/score endpoint was potentially failing silently

Nice - fixing this would be awesome!

Comment on lines 3 to 6
public class ShapleyScoreException extends Exception {
public ShapleyScoreException(String message) {
super(message);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a future purpose for this custom exception type? Looks like the exception travels just a single frame (e.g., MojoScorer up to ModelsApiController), and is unpacked immediately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think the intent originally was to embed the error without throwing the error (400) so whoops on my part. @orendain wdyt about that though? If shapley scoring fails for /model/score should we return the score and a message saying shapley failed? or should we just fail outright?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If shapley scoring fails for /model/score should we return the score and a message saying shapley failed? or should we just fail outright?

Hmmm, lots of pros/cons floating around in my head ... For me, it helps me to think in terms of requirements and best practices:

  • Generally, there's no guarantee that scoring and contributions use the same backend (in fact, this scorer has separate pipelines for each). It's possible for 100% contribution failure while scoring succeeds 100% of the time. Defending scoring endpoint against contribution failure makes sense. (+1 for not failing b/c contributions)
  • If we apply the single responsibility principle loosely, /model/score should only be focused on scoring, and one could argue that contributions are at most a bonus (and not required as they have their own endpoint). (+1 for not failing b/c contributions)
  • If you do return messages: If contribution fails (say, consistently), adding a message w/o it being expected might mean contribution concerns hijack scoring packet size. Especially since error messages could easily make up 50%+ of response packet for single-row scores. (+1 to user explicitly requesting messages)
  • If a user explicitly asks for contributions as part of /model/score, and you decide to return a message instead of failing outright, requiring users to "opt-in" to warning messages might actually be a form of failing silently. That is, unless the user explicitly asked for messages, they won't get any, even when they're explicitly asked for contributions to be returned and would expect to be notified of failure. (+1 to interpreting users asking for contributions as opt-in for warning-messages).
  • If you do return messages: Modifying API spec to add message-related fields will change expectation of other scorers. Consider that other scorers updating to the latest API spec means they should implement any new features, unless you mark them as implementation-optional in the API.

Just some thoughts in case it helps to think about the problem - no hard opinion floating up there other than there should definitely be some mechanism by which users are made aware of issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm so this is a tricky one....

  1. I like the idea of not failing /model/score because of contributions failing.
  2. I like the idea of users opting into error messages but in reality the only place where messages would be "opt-in" would be for /model/score where shapley is requested.
  3. I would argue that technically this feature is already in the python scorer api, and that due to limitations in swagger this was never implemented.

So there maybe is a better way to do this by implementing a proper exception handler for the controller. I'm going to take another shot at this later. It just really bugs me that we literally have no feedback to the user on errors.

@@ -133,6 +133,7 @@ public ScoreResponse score(ScoreRequest request) {
} catch (Exception e) {
log.info("Failed shapley contribution due to: {}", e.getMessage());
log.debug(" - failure cause: ", e);
throw new ShapleyScoreException(e.getMessage());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's (generally) good practice to keep the stack of original exceptions when throwing. Most exception classes do this by accepting another exception as part of a custom exception's constructor.

Or, if you end up modifying/removing/etc. the custom exception type, you can just wrap the original exception with whichever RuntimeException subtype makes sense.

@@ -133,6 +133,7 @@ public ScoreResponse score(ScoreRequest request) {
} catch (Exception e) {
log.info("Failed shapley contribution due to: {}", e.getMessage());
log.debug(" - failure cause: ", e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: generally, suggestion is to either log or throw, but not both. eventually, whatever part of the code catches all bubbled up exceptions would do the logging. And with all nested exceptions simply tacking on their own messages (e.g., throw new Illegal....Exception("Failed to ...", e)), it all gets wrapped up into a single object that can be logged by one catcher.

Else, error/warning/etc. messages are scattered across logs (often separated by regular, healthy, logs) instead of in a single, trackable, block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure that can be moved, I tried to minimize my impact to the logs in general as much as possible. The goal was to actually give the response something useful that can be used as opposed to always sending 400 and nothing else.

@nkpng2k nkpng2k force-pushed the npng/master/propagate-errors-to-responses branch from e4cd94d to a7fea37 Compare January 11, 2022 23:56
@nkpng2k nkpng2k requested a review from orendain January 12, 2022 03:26
@nkpng2k
Copy link
Contributor Author

nkpng2k commented Jan 12, 2022

@orendain PTAL, I made a change which I like a bit more. Does not require any changes to api and actually allows the user to receive some message from errors.

Copy link
Member

@orendain orendain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to understand PR's new changes/intents, is the following correct?

  • Adds messages to responses of failed requests. So not necessarily propagating warnings like the issue linked somewhere above.
  • Returns valid model scores from /model/score, if any, even if contribution scoring fails.

I like that idea a lot!


FYI I tried scoring (simple request w/ only "rows" and "fields" that would otherwise score fine, except I misspelled "fields"), and got the following back:

Failed scoring request: class ScoreRequest {
    requestShapleyValueType: null
    includeFieldsInOutput: null
    noFieldNamesInOutput: null
    idField: null
    fields: null
    rows: [class Row {
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    }]
}, due to: null

For a contribution failure, I got something like (ignore the cause 😉):

Failed shapley contribution request: class ContributionRequest {
    requestShapleyValueType: TRANSFORMED
    fields: [AGE, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6, EDUCATION, LIMIT_BAL, MARRIAGE, PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6, PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6]
    rows: [class Row {
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    }]
}, due to: No columns in output frame

And for a GET score request (against a file), got something more parsable:

Failed loading CSV file: /tmp/test.csv, due to: Could not find the input CSV file: /tmp/test.csv

None of the above were returned as JSON, included some implementation detail, and only the last wouldn't be hard to parse.

On the other hand, when trying to hit a non-existant endpoint, I got back response:

{"timestamp":1641968530410,"status":404,"error":"Not Found","message":"","path":"/does/not/exist"}

Is there a way to shuffle messages into Spring's JSON response?

Comment on lines +110 to +112
} catch (IllegalArgumentException e) {
String message = String.format("Unsupported operation due to: %s", e.getMessage());
throw new ScoringShapleyException(message, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When catching IllegalArgumentException (thrown by computeContribution() because of user error), should the message be changed from Unsupported...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't throw Unsupported... anywhere anymore, so it didn't make sense to keep it.

Comment on lines +23 to +24
return new ResponseEntity<>(scoringException.getMessage(), HttpStatus.BAD_REQUEST);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an issue with the PR, pointing out something I just now noticed:

A 4xx (client error) is being returned every time even when the process fails because of the backend (e.g., CSV file issue, mojo lib, etc.) (which should be 5xx). This was the case even before this PR, so no regression being introduced, or anything, of course.

Pointing out for if 4xx are fixed to 5xx (where correct) at some point, this class may need slight changes as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true this is a good point. I stuck with the error codes we were already sending.

@nkpng2k
Copy link
Contributor Author

nkpng2k commented Jan 12, 2022

@orendain thanks for taking a look!

wrt to your comments. I didn't really try to make the messaging any better per say... just actually make the messaging go to the user without having to dive into the rest server to get the answer... technically speaking the logs and the rest response will be the same at this point. Which I'd argue is a significant improvement over what we have now.

Also arguably swagger is the one that is defining a json response, and not spring. So while we could do it, I don't think its truly necessary. Given that the only thing you should get back in an error response is a code and a body. I think python requests package handles it like that. 🤷‍♂️ I don't mind sending back a JSON, I just don't really see the point at the moment.

@nkpng2k
Copy link
Contributor Author

nkpng2k commented Jan 12, 2022

@orendain do you think it would make sense to have a generated swagger object ErrorResponse? For me it seems like maybe overengineering?

Copy link
Member

@orendain orendain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really try to make the messaging any better per say... just actually make the messaging go to the user without having to dive into the rest server to get the answer... technically speaking the logs and the rest response will be the same at this point.

Great point - thanks for pointing that out. If that's the new intent, then it's been met and LGTM (I approved).

Friendly heads up, however:

Which I'd argue is a significant improvement over what we have now.

Effectively, we're introducing a new feature: returning the full interpretation of a user's request, along with the raw underlying exception message. Depending on who you ask, returning these values is considered information leakage.

Heads up that the users receiving these values are often not the ones who have deployed the scorer or even have access to the logs (e.g., customers prop up a scorers for use by their customers).

The most common way to avoid this is by setting up a tripwire (try/catch) and masking/hiding the information with a safe user-facing message (which you've done by wrapped exceptions with the message "Failed scoring request ..." or "Failed loading CSV file:..." 👍🏾).

It's the exposure of the raw request and nested exception message that would be an unforeseen issue. They add some heft to the packet size, and TBH I'm not sure how a downstream component (e.g., monitoring apps or processes) could reliably parse this pattern for its own logging or propagation to its users.

Given that the only thing you should get back in an error response is a code and a body. ... I don't mind sending back a JSON, I just don't really see the point at the moment.

Great point - JSON would be nice but not required or even in the spec 👍🏾. I take back my suggestion.

@orendain do you think it would make sense to have a generated swagger object ErrorResponse? For me it seems like maybe overengineering?

Could be 🤷🏾 . IMHO it could make sense if we were returning a known set of fields like in #165 (e.g., client is expecting a warning message at a particular JSON path). But if only always single string, then perhaps overengineering. TBH up to you.

@nkpng2k
Copy link
Contributor Author

nkpng2k commented Jan 15, 2022

Effectively, we're introducing a new feature: returning the full interpretation of a user's request, along with the raw underlying exception message. Depending on who you ask, returning these values is considered information leakage.

Heads up that the users receiving these values are often not the ones who have deployed the scorer or even have access to the logs (e.g., customers prop up a scorers for use by their customers).

The most common way to avoid this is by setting up a tripwire (try/catch) and masking/hiding the information with a safe user-facing message (which you've done by wrapped exceptions with the message "Failed scoring request ..." or "Failed loading CSV file:..." 👍🏾).

It's the exposure of the raw request and nested exception message that would be an unforeseen issue. They add some heft to the packet size, and TBH I'm not sure how a downstream component (e.g., monitoring apps or processes) could reliably parse this pattern for its own logging or propagation to its users.

@orendain this is a good point, although I would hope that our rest scorers responses are not being sent directly to end-users. Typically there's some check:

if response.status_code != 200:
  give some customer error

Additionally, to a degree, this is similar behavior with our python scorers. https://github.com/h2oai/mlops-byom-images/blob/main/python-scorer/src/h2o_scorer/server/exceptions.py ... albeit without the raw request. I would say it doesn't necessarily hurt to get rid of the raw request in the error message.... but we could also get an exception due to: null as the error message given that the java exceptions are not exceptionally verbose here.

@jakubhava
Copy link
Collaborator

Closing as stale

@jakubhava jakubhava closed this Sep 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants