Fix issue with scheduled queries #7111

Open
wants to merge 1 commit into master

Contributor

@ezraodio1 ezraodio1 commented Aug 6, 2024

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • New Query Runner (Data Source)
  • New Alert Destination
  • Other

Description

There's currently an issue in Redash where some queries fail to refresh on their schedule. This happens when the query has no latest_query_data: retrieved_at is then None, so the query is never added to outdated_queries. This PR fixes that.

The problem can be reproduced by running the unit test that's part of this PR (without applying the fix first, of course):

def test_enqueues_scheduled_query_without_latest_query_data(self):
    """
    Queries with a schedule but no latest_query_data will still be reported by Query.outdated_queries()
    """
    query = self.factory.create_query(
        schedule=self.schedule(interval="60"),
        data_source=self.factory.create_data_source(),
    )

    outdated_queries = models.Query.outdated_queries()
    self.assertEqual(query.latest_query_data, None)
    self.assertEqual(len(outdated_queries), 1)
    self.assertIn(query, outdated_queries)
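
To make the failure mode concrete, here is a toy model of a scheduler loop. It is only a sketch, not the Redash implementation: `is_due`, `simulate`, and the `never_run_is_due` flag are hypothetical names invented for illustration. It shows why a query with no result can stay stuck: retrieved_at only becomes non-None after a run, so if None is treated as "not due", the first run never happens.

```python
from datetime import datetime, timedelta


def is_due(retrieved_at, now, interval_seconds, never_run_is_due):
    """Hypothetical stand-in for the scheduling check."""
    if retrieved_at is None:
        # No latest_query_data yet: the outcome depends on the chosen policy.
        return never_run_is_due
    return retrieved_at + timedelta(seconds=interval_seconds) <= now


def simulate(ticks, interval_seconds, never_run_is_due):
    """Run a toy scheduler and return how many times the query executed."""
    now = datetime(2024, 8, 6, 0, 0, 0)
    retrieved_at = None  # the query starts without any result
    runs = 0
    for _ in range(ticks):
        if is_due(retrieved_at, now, interval_seconds, never_run_is_due):
            retrieved_at = now  # executing the query records a result
            runs += 1
        now += timedelta(seconds=interval_seconds)
    return runs


print(simulate(5, 60, never_run_is_due=False))  # 0: the query never runs
print(simulate(5, 60, never_run_is_due=True))   # 5: it runs on every tick
```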

How is this tested?

  • Unit tests (pytest, jest)
  • E2E Tests (Cypress)
  • Manually
  • N/A

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

Collaborator

@eradman left a comment

Tested this manually, seems to check out

@justinclift
Member

@arikfr Any idea if this will interact with the scheduled queries hash issue you were looking at yesterday?

@@ -389,6 +389,8 @@ def groups(self):
def should_schedule_next(previous_iteration, now, interval, time=None, day_of_week=None, failures=0):
    # if time exists then interval > 23 hours (82800s)
    # if day_of_week exists then interval > 6 days (518400s)
    if previous_iteration is None:
        return False

Member

Why not return True here instead of the or not retrieved_at in the calling function?

Contributor Author

I don't remember there being a specific reason. I think both ways are logically equivalent.

Member

While it's true that both will have the same result, it does make the code more confusing: should_schedule_next returns False, but we override it. It will be more readable to have it return True, which is what we want in this case anyway.
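
The two options discussed in this thread can be sketched side by side. This is a simplified illustration, not the PR's code: the helper and caller names are hypothetical, and the interval check ignores the time, day_of_week, and failures parameters of the real should_schedule_next.

```python
from datetime import datetime, timedelta


def helper_returns_false(previous_iteration, now, interval):
    """Variant A: the helper reports 'not due' for a query that never ran."""
    if previous_iteration is None:
        return False
    return previous_iteration + timedelta(seconds=interval) <= now


def caller_a(retrieved_at, now, interval):
    # The caller has to override the helper's answer for the None case.
    return helper_returns_false(retrieved_at, now, interval) or not retrieved_at


def helper_returns_true(previous_iteration, now, interval):
    """Variant B: the helper itself treats a never-run query as due."""
    if previous_iteration is None:
        return True
    return previous_iteration + timedelta(seconds=interval) <= now


def caller_b(retrieved_at, now, interval):
    # No override needed: the decision lives in one place.
    return helper_returns_true(retrieved_at, now, interval)


now = datetime(2024, 8, 6, 12, 0, 0)
for retrieved_at in (None, now - timedelta(hours=2), now - timedelta(seconds=10)):
    assert caller_a(retrieved_at, now, 60) == caller_b(retrieved_at, now, 60)
```

Both variants agree on every input, which matches the "logically equivalent" observation above; variant B simply keeps the rule for never-run queries inside the helper.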

Comment on lines +613 to +614
if all(value is None for value in query.schedule.values()):
    continue

Member

What's this for?

Contributor Author

It's been a little while since I looked at this, but I think this is the answer:

The only way to check whether a query doesn't have a refresh schedule is to ensure that everything in query.schedule.values() is None. For example, interval, until, and time are all keys in query.schedule. A query can have a refresh schedule when just one of these keys is not None.

If a query doesn't have a refresh schedule, then we can just continue to the next query in the loop.
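
As a rough sketch of the check described above, the "no schedule" test can be written as a small predicate. `has_refresh_schedule` is a hypothetical helper, not something in the PR, and the example dictionaries only use the key names mentioned in this comment (interval, until, time).

```python
def has_refresh_schedule(schedule):
    """Return True if at least one schedule value is set.

    Treats None, an empty dict, and a dict whose values are all None
    as 'no schedule'.
    """
    if not schedule:
        return False
    return any(value is not None for value in schedule.values())


print(has_refresh_schedule(None))                                             # False
print(has_refresh_schedule({"interval": None, "until": None, "time": None}))  # False
print(has_refresh_schedule({"interval": "60", "until": None, "time": None}))  # True
```

Note that `all(...)` over an empty dict is True in Python, so the condition in the diff would also skip an empty schedule dict.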

Member

When a query does not have a schedule, the schedule value is null (or empty). It's possible there is some edge case that keeps a schedule object with empty values, but I would assume it's not common?

To keep the code maintainable I would rather avoid this change in this PR and if you still feel it's necessary let's bring it back in a separate one and have the discussion there (and potentially add tests for this case).

Contributor Author

I think this is worth double checking. I could be mistaken, but I remember this edge case popping up during my testing. Perhaps if a query had a schedule and then that schedule was later removed?

Member

Sounds like we'll need to create a follow-up PR to this one if/when we merge this. 😄

Member

@arikfr left a comment

One last review. Please let us know if you have the bandwidth to apply the changes. Thanks!

@ezraodio1
Contributor Author

I lost access to the VM that I was using over the summer for Redash dev work unfortunately, so I think it's easiest if someone else applies the changes.

@justinclift
Member

justinclift commented Oct 9, 2024

@ezraodio1 If you need a new VM for doing Redash dev work on, then I'm happy to spin one up for you. Would that be useful? 😄

@ezraodio1
Contributor Author

I'm taking a tough course load right now, so I'm pretty busy. I'd definitely be interested after this semester ends, though. I'll try to spin up a new VM for myself.

@justinclift
Member

No worries. Keep it in mind as an option if/when it'd be useful.

Good luck with your course load, hopefully it all goes really well. 😄

4 participants