Fix issue with scheduled queries #7111

Open
wants to merge 1 commit into
base: master
22 changes: 15 additions & 7 deletions redash/models/__init__.py
@@ -389,6 +389,8 @@ def groups(self):
 def should_schedule_next(previous_iteration, now, interval, time=None, day_of_week=None, failures=0):
     # if time exists then interval > 23 hours (82800s)
     # if day_of_week exists then interval > 6 days (518400s)
+    if previous_iteration is None:
+        return False
Member

Why not return True here instead of the or not retrieved_at in the calling function?

Contributor Author

I don't remember there being a specific reason. I think both ways are logically equivalent.

Member

While it's true that both will have the same result, it does make the code more confusing: should_schedule_next returns False, but we override it. It will be more readable to have it return True, which is what we want in this case anyway.
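
To illustrate the suggestion in this thread, here is a minimal sketch of the guard returning True instead of False. It is a simplified stand-in, not the Redash implementation: it keeps only the interval branch shown in the diff, drops the time, day_of_week, and failures parameters, and uses made-up timestamps.

```python
import datetime


def should_schedule_next(previous_iteration, now, interval):
    # Simplified stand-in for the function in the diff (interval-only case).
    # A query that has never run is treated as due immediately, so the
    # caller would not need an extra "or not retrieved_at" check.
    if previous_iteration is None:
        return True
    next_iteration = previous_iteration + datetime.timedelta(seconds=int(interval))
    return now > next_iteration


# Illustrative timestamps only.
now = datetime.datetime(2024, 1, 1, 12, 0, 0)
never_ran = should_schedule_next(None, now, "60")
ran_recently = should_schedule_next(now - datetime.timedelta(seconds=30), now, "60")
ran_long_ago = should_schedule_next(now - datetime.timedelta(seconds=90), now, "60")
```

With this shape, the caller could pass retrieved_at straight through and use the return value as-is.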

     if time is None:
         ttl = int(interval)
         next_iteration = previous_iteration + datetime.timedelta(seconds=ttl)
@@ -608,17 +610,23 @@ def outdated_queries(cls):
                     if schedule_until <= now:
                         continue

+                if all(value is None for value in query.schedule.values()):
+                    continue
Comment on lines +613 to +614
Member

What's this for?

Contributor Author

It's been a little while since I looked at this, but I think this is the answer:

The only way to check whether a query doesn't have a refresh schedule is to ensure that everything in query.schedule.values() is None. For example, interval, until, and time are all keys in query.schedule. It's possible to have a query refresh schedule when just one of these keys is not None.

If a query doesn't have a refresh schedule, then we can just continue to the next query in the loop.
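
A small sketch of the check described above. The helper name has_refresh_schedule is hypothetical, the day_of_week key is taken from the should_schedule_next call in the diff, and the sample dictionaries are illustrative only.

```python
def has_refresh_schedule(schedule):
    # Hypothetical helper, not part of Redash.
    # A missing or empty schedule means there is nothing to refresh.
    if not schedule:
        return False
    # The schedule counts as set when at least one value is not None.
    return any(value is not None for value in schedule.values())


# Illustrative schedule dictionaries.
cleared = {"interval": None, "until": None, "time": None, "day_of_week": None}
every_minute = {"interval": "60", "until": None, "time": None, "day_of_week": None}

cleared_has_schedule = has_refresh_schedule(cleared)
minute_has_schedule = has_refresh_schedule(every_minute)
null_has_schedule = has_refresh_schedule(None)
```

The loop in the diff skips a query exactly when this helper would return False for a non-empty schedule dictionary.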

Member

When a query does not have a schedule, the schedule value is null (or empty). It's possible there is some edge case that keeps a schedule object with empty values, but I would assume it's not common?

To keep the code maintainable, I would rather avoid this change in this PR. If you still feel it's necessary, let's bring it back in a separate one and have the discussion there (and potentially add tests for this case).

Contributor Author

I think this is worth double checking. I could be mistaken, but I remember this edge case popping up during my testing. Perhaps if a query had a schedule and then that schedule was later removed?

Member

Sounds like we'll need to create a follow-up PR to this one if/when we merge this. 😄


                 retrieved_at = scheduled_queries_executions.get(query.id) or (
                     query.latest_query_data and query.latest_query_data.retrieved_at
                 )

-                if should_schedule_next(
-                    retrieved_at or now,
-                    now,
-                    query.schedule["interval"],
-                    query.schedule["time"],
-                    query.schedule["day_of_week"],
-                    query.schedule_failures,
+                if (
+                    should_schedule_next(
+                        retrieved_at,
+                        now,
+                        query.schedule["interval"],
+                        query.schedule["time"],
+                        query.schedule["day_of_week"],
+                        query.schedule_failures,
+                    )
+                    or not retrieved_at
                 ):
                     key = "{}:{}".format(query.query_hash, query.data_source_id)
                     outdated_queries[key] = query
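
One way to read this hunk: when a query has never run, retrieved_at is None, so the removed call passed now as the previous iteration, the next iteration landed in the future, and the query was never reported. The sketch below compares the two conditions under that reading. It assumes the interval-only branch and a simplified signature, and the timestamp is made up.

```python
import datetime


def should_schedule_next(previous_iteration, now, interval):
    # Simplified stand-in for the patched function (interval-only case).
    if previous_iteration is None:
        return False
    next_iteration = previous_iteration + datetime.timedelta(seconds=int(interval))
    return now > next_iteration


now = datetime.datetime(2024, 1, 1, 12, 0, 0)  # illustrative timestamp
retrieved_at = None  # the query has no latest_query_data yet

# Removed condition: "now" stands in for the missing timestamp, so the
# next iteration is now + 60s and "now > next_iteration" cannot hold.
old_condition = should_schedule_next(retrieved_at or now, now, "60")

# Added condition: the None guard returns False, then "or not retrieved_at"
# turns the overall result into True.
new_condition = should_schedule_next(retrieved_at, now, "60") or not retrieved_at
```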
14 changes: 14 additions & 0 deletions tests/test_models.py
@@ -216,6 +216,20 @@ def test_enqueues_query_only_once(self):

         self.assertEqual(list(models.Query.outdated_queries()), [query2])

+    def test_enqueues_scheduled_query_without_latest_query_data(self):
+        """
+        Queries with a schedule but no latest_query_data will still be reported by Query.outdated_queries()
+        """
+        query = self.factory.create_query(
+            schedule=self.schedule(interval="60"),
+            data_source=self.factory.create_data_source(),
+        )
+
+        outdated_queries = models.Query.outdated_queries()
+        self.assertEqual(query.latest_query_data, None)
+        self.assertEqual(len(outdated_queries), 1)
+        self.assertIn(query, outdated_queries)
+
     def test_enqueues_query_with_correct_data_source(self):
         """
         Queries from different data sources will be reported by