
CE-136 JSON export download logging #3178

Draft · wants to merge 3 commits into base: develop
Conversation

@eapearson (Contributor) commented Jan 26, 2023

Description of PR purpose/changes

This is a set of speculative changes to support logging of JSON downloads from the Narrative front end. It is intended primarily for discussion, and demonstrates one way to implement this feature in a manner quite similar to how front-end logging has been implemented in the past.

The overall effort is to be able to log object JSON downloads, as triggered by the Narrative front end via the data panel's export functionality. These log entries should be available in a timely manner to KBase staff. In our specific case they are needed as part of Narrative object usage statistics.

The logging works by sending Python code from the front end to the back end, which creates and emits a log entry in JSON format via the standard Python logging API.

The log entries are currently appended to /tmp/kbase-narrative-ui.json. This is the same location (/tmp) but a different file name and format than the current logging mechanism uses.
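The mechanism described above can be sketched roughly as follows. The file path matches the one mentioned in this PR, but the helper and class names (`JSONFormatter`, `get_ui_logger`) are illustrative, not the PR's actual code:

```python
import json
import logging
import time

UI_LOG_PATH = "/tmp/kbase-narrative-ui.json"  # path used by this PR


class JSONFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": int(time.time() * 1000),  # epoch milliseconds
            "level": record.levelname,
            "message": record.getMessage(),
        })


def get_ui_logger(name="narrative_ui_json"):
    """Return a logger that appends JSON lines to UI_LOG_PATH."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.FileHandler(UI_LOG_PATH)
        handler.setFormatter(JSONFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

One JSON object per line keeps the file trivially parseable even though it contains multiple entries in sequence.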

I've separated most of the code so that current logging works unimpeded, both in the JS front end and Python back end code.

After getting the initial functionality working without too much trouble, I invested a couple hours trying to get more context into the log. One such piece of information is the narrative object reference. I tried but failed to get the current narrative version (e.g. after it obtains a new version when saved), but left that code in place anyway.

The log format is not something I put a tremendous amount of thought or any research into. At a minimum I want to satisfy our immediate needs - the time, username, and downloaded object ref. Being straightforward to parse is another important property, hence JSON. There is additional structure to make the log entries a bit more robust in the presence of other types of log entries, but clearly it is just a starting point.
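A log entry with that shape might look like the following sketch. Only the time, username, and downloaded object ref are requirements stated above; the envelope fields and all names here are illustrative:

```python
import json
import time
import uuid


def make_log_entry(username, object_ref, event="download_json"):
    """Build one UI log entry as a plain dict.

    The outer "type"/"event" envelope makes entries self-describing if
    other kinds of log entries later share the same file; it is a
    hypothetical structure, not the PR's exact schema.
    """
    return {
        "type": "narrative_ui",        # distinguishes these entries
        "id": str(uuid.uuid4()),       # unique id per entry
        "event": event,
        "time": int(time.time() * 1000),  # epoch milliseconds
        "context": {
            "username": username,
            "object_ref": object_ref,  # e.g. "123/45/6" (ws/obj/version)
        },
    }


# Serialized as one JSON object per line for easy parsing:
line = json.dumps(make_log_entry("some_user", "123/45/6"))
```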

Ultimately, we may want to use a different logging handler to send the log entries to a network endpoint for ingestion into a service and/or database.
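Python's standard `logging.handlers` module already supports that direction. A sketch, where the host and path are placeholders rather than any real KBase endpoint:

```python
import logging
import logging.handlers


def get_network_logger(host="logs.example.org:443", path="/ingest"):
    """Return a logger whose records go to an HTTP endpoint instead of a
    local file. The host and path defaults are hypothetical."""
    logger = logging.getLogger("narrative_ui_network")
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.handlers.HTTPHandler(
            host, path, method="POST", secure=True
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

Because the handler is the only moving part, the code emitting log entries would not need to change when the destination does.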

Jira Ticket / Issue

Related Jira ticket: https://kbase-jira.atlassian.net/browse/DATAUP-X

  • Added the Jira Ticket to the title of the PR (e.g. DATAUP-69 Adds a PR template)

Testing Instructions

  • Details for how to test the PR:
  • Tests pass locally and in GitHub Actions
  • Changes available by spinning up a local narrative and navigating to X to see Y

Dev Checklist:

  • My code follows the guidelines at https://sites.google.com/lbl.gov/trussresources/home?authuser=0
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • (JavaScript) I have run Prettier and ESLint on changed code manually or with a git precommit hook
  • (Python) I have run Black and Flake8 on changed Python code manually or with a git precommit hook
  • Any dependent changes have been merged and published in downstream modules

Updating Version and Release Notes (if applicable)

@sonarcloud
sonarcloud bot commented Jan 26, 2023

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rated A)
  • Vulnerabilities: 0 (rated A)
  • Security Hotspots: 0 (rated A)
  • Code Smells: 4 (rated A)
  • Coverage: no coverage information
  • Duplication: 0.0%

@briehl (Member) left a comment

This is a good start. I think the overall flow of event -> kernel message -> logging -> log destination will work fine.

We could also create a dedicated kernel channel for logging, like we do for job messaging, but I don't really think that's necessary for a starting point.

Also noting that I'd like to see the rather ancient KBError / KBFail messages make use of something like this, but that's out of scope for the prototype.

I guess my only concerns are:

  1. where do the logs go?
  2. what format should logs be?
  3. do we keep a parallel log on the file system / make that an option for running in dev mode?
  4. can we unify how all the logs are sent and processed so we can combine some older stuff and remove some cruft?

I know that expands the scope of this, so we can probably tackle all of that piecemeal. Once we can answer 1 and 2 with some devops input, we can satisfy some metric needs and build up the rest.

```js
 * of which was that returned by the last statement
 * in the Python code.
 */
async function executePython(code) {
```
Member

This seems reasonable. I think there's some other error catching / wait-until-kernel-is-alive code elsewhere that we can either repurpose or just make use of.

Contributor Author

Ok, I'll hunt for that.

```python
# contains multiple objects in sequence.
ui_log = get_logger("narrative_ui_json")
log_id = str(uuid.uuid4())
if not _has_handler_type(ui_log, logging.FileHandler):
```
Member

I think before going too far down this rabbit hole, we should talk to devops about what's the most reasonable logging format for their use. Though some local file logging would be good, too.

Also, file handling / formatting should be a more global thing for all logging.

Contributor Author

Yeah, I just wanted to make sure that this prototyping work actually produced a JSON log in a minimal fashion with the simplest implementation -- a file in the same place we are currently logging (/tmp). And I agree it should be global. I figured, get it working in this case, then later switch over existing logging entry points to use it too.

Still on the table, I'm sure, is logging to a file in a mounted directory and having an external process collect those logs. I don't think that is a good solution for Narrative logging, but it should be considered, as it is a traditional logging technique.

Comment on lines +303 to +304

```python
# If logs are combined, we need to tie log entries to
# a specific version of a service in a specific environment.
```
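One way to attach that context is a logging filter that stamps every record; the version and environment values below are hypothetical placeholders, and in practice would come from the Narrative's deployment configuration:

```python
import logging

# Hypothetical deployment values; in practice these would be read from
# configuration rather than hard-coded.
SERVICE_VERSION = "5.1.0"
ENVIRONMENT = "ci"


class DeploymentContextFilter(logging.Filter):
    """Stamp every record with the service version and environment so
    entries remain attributable after logs from many deployments are
    combined downstream."""

    def filter(self, record):
        record.service_version = SERVICE_VERSION
        record.environment = ENVIRONMENT
        return True  # never suppress the record, only annotate it
```

A JSON formatter could then include `record.service_version` and `record.environment` in every emitted entry.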
Member

+1

@eapearson (Contributor Author)

I guess my only concerns are:

  1. where do the logs go?

Exactly -- I think that informs a lot of the rest. E.g. if we use a 3rd-party logging service, the "where" is pretty much solved (unless we decide to hide it behind a service!), the format will be at least constrained, and the metadata will be taken care of.

  2. what format should logs be?
  3. do we keep a parallel log on the file system / make that an option for running in dev mode?

I don't think in general we want to duplicate the logs. I mean, there is an argument for logging to file + service if the files are actually preserved, to serve as a backup in case the service is down and we could then backfill the logging service. But that also might be a "nice to have".

For development we could use a container to receive logs - if we use a 3rd party logging service, it should be easy to stand up such a service. Complicates development slightly but simplifies the Narrative code + configuration. Logging services should have nice tools for accessing the logs, filtering them, etc. Then again, Python logging allows one to retarget logs easily, though we'd need some sort of configuration to be able to switch it.
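Such retargeting could be driven by a single configuration value; a sketch with hypothetical target names and defaults:

```python
import logging
import logging.handlers


def configure_ui_logging(target="file", **kwargs):
    """Pick a log destination from configuration.

    The target names "file" and "http" are illustrative; a real config
    would also carry paths, hosts, and formatter settings.
    """
    logger = logging.getLogger("narrative_ui_json")
    logger.handlers.clear()  # retarget: drop any previous destination
    if target == "file":
        handler = logging.FileHandler(
            kwargs.get("path", "/tmp/kbase-narrative-ui.json")
        )
    elif target == "http":
        handler = logging.handlers.HTTPHandler(
            kwargs["host"], kwargs["url"], method="POST"
        )
    else:
        raise ValueError(f"unknown log target: {target}")
    logger.addHandler(handler)
    return logger
```

Code that emits log entries would stay unchanged; only this setup call would differ between development and production.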

  4. can we unify how all the logs are sent and processed so we can combine some older stuff and remove some cruft?

I know that expands the scope of this, so we can probably tackle all of that piecemeal. Once we can answer 1 and 2 with some devops input, we can satisfy some metric needs and build up the rest.

It surely is sensible to have all logging be consistent, but we may want to limit the initial implementation to these logs, as it only affects a single activity within the Narrative -- logging the JSON download, which is not a common user action.
