
[Bug]: fail to search on QueryNode 216: Timestamp lag too large #36960

Open
zhoujiaqi1998 opened this issue Oct 17, 2024 · 4 comments

@zhoujiaqi1998

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4.1
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

[2024/10/17 07:33:52.777 +00:00] [WARN] [proxy/task_scheduler.go:469] ["Failed to execute task: "] [traceID=aafa07524fa2fb9567204a960e5b68f7] [error="failed to search: failed to search/query delegator 216 for channel milvus-cluster-rootcoord-dml_2_453104445904537346v0: fail to search on QueryNode 216: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 216 for channel milvus-cluster-rootcoord-dml_2_453104445904537346v0: fail to search on QueryNode 216: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 216 for channel milvus-cluster-rootcoord-dml_2_453104445904537346v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 216\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"]
[2024/10/17 07:33:52.777 +00:00] [WARN] [proxy/impl.go:2919] ["Search failed to WaitToFinish"] [traceID=aafa07524fa2fb9567204a960e5b68f7] [role=proxy] [db=default] [collection=benchmark_faq_zlkt_] [partitions="[]"] [dsl=] [len(PlaceholderGroup)=4108] [OutputFields="[id]"] [search_params="[{"key":"topk","value":"100"},{"key":"anns_field","value":"dense_vector"},{"key":"metric_type","value":"IP"},{"key":"params","value":"{\"nprobe\":1}"}]"] [guarantee_timestamp=0] [nq=1] [error="failed to search: failed to search/query delegator 216 for channel milvus-cluster-rootcoord-dml_2_453104445904537346v0: fail to search on QueryNode 216: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 216 for channel milvus-cluster-rootcoord-dml_2_453104445904537346v0: fail to search on QueryNode 216: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 216 for channel milvus-cluster-rootcoord-dml_2_453104445904537346v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 216\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"]
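For context on what the error means: the delegator on QueryNode 216 rejects the search because the newest timestamp it has consumed from its DML channel (its tSafe) has fallen too far behind the timestamp the request needs, which is what happens when the channel consumer stops making progress. Below is a minimal, purely illustrative sketch of that kind of lag check, not the actual Milvus source; the queryNode.maxTimestampLag config name mentioned in the comments is an assumption.

```go
// Illustrative sketch only (not the actual Milvus source) of the check behind
// "Timestamp lag too large": the query node tracks the newest timestamp it has
// consumed from the DML channel (its tSafe); when that value stops advancing
// (e.g. the Pulsar consumer is stuck) and lags the timestamp a search needs by
// more than a configured maximum, the request is rejected instead of blocking.
package main

import (
	"fmt"
	"time"
)

// physical extracts the physical part of a Milvus hybrid timestamp
// (milliseconds in the upper bits, 18 bits reserved for the logical counter).
func physical(ts uint64) time.Time {
	return time.UnixMilli(int64(ts >> 18))
}

// checkTSLag rejects a request whose required timestamp is ahead of the
// consumed tSafe by more than maxLag. The knob that would control maxLag
// (queryNode.maxTimestampLag in milvus.yaml) is an assumption here.
func checkTSLag(requiredTS, tSafe uint64, maxLag time.Duration) error {
	if requiredTS <= tSafe {
		return nil // channel consumption has caught up; safe to serve
	}
	if lag := physical(requiredTS).Sub(physical(tSafe)); lag > maxLag {
		return fmt.Errorf("Timestamp lag too large (lag=%s, max=%s)", lag, maxLag)
	}
	return nil
}

func main() {
	toTS := func(t time.Time) uint64 { return uint64(t.UnixMilli()) << 18 }
	now := time.Now()
	// tSafe stuck two hours in the past, e.g. because the consumer stalled.
	fmt.Println(checkTSLag(toTS(now), toTS(now.Add(-2*time.Hour)), time.Hour))
}
```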

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

milvus-log.zip

Anything else?

No response

@zhoujiaqi1998 zhoujiaqi1998 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 17, 2024
@zhoujiaqi1998
Author

milvus-log.zip

@yanliang567
Contributor

/assign @aoiasd
/unassign

@sre-ci-robot sre-ci-robot assigned aoiasd and unassigned yanliang567 Oct 17, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 17, 2024
@yanliang567 yanliang567 added this to the 2.4.14 milestone Oct 17, 2024
@xiaofan-luan
Contributor

@zhoujiaqi1998
It seems that, for some reason, the QueryNode could not consume from Pulsar for a long time.
Try restarting the QueryNode to see if that helps.
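
For a Kubernetes deployment, restarting the QueryNode typically means deleting its pods and letting the controller recreate them (kubectl delete pod works just as well). A minimal client-go sketch, assuming a milvus namespace and a component=querynode pod label, both of which depend on how the cluster was installed (Helm chart vs. milvus-operator):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	namespace := "milvus"             // assumed namespace
	selector := "component=querynode" // assumed label set by the chart/operator

	pods, err := cs.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		// Deleting the pod makes the Deployment/StatefulSet spin up a fresh
		// replica, which re-creates the Pulsar consumers from scratch.
		fmt.Println("restarting", p.Name)
		if err := cs.CoreV1().Pods(namespace).Delete(ctx, p.Name, metav1.DeleteOptions{}); err != nil {
			panic(err)
		}
	}
}
```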

@xiaofan-luan
Contributor

It seems that once Pulsar falls into this state, the consumer cannot recover. Milvus 2.5 upgrades the Pulsar client to 0.12, and hopefully that will help.
