Indexer Error #464

Closed
kinfey opened this issue Jul 9, 2024 · 7 comments
Labels
community_support Issue handled by community members

Comments

@kinfey

kinfey commented Jul 9, 2024

Describe the bug

When I run the indexer, it always gives me this error:


{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: EmptyNetworkError", "stack": "Traceback (most recent call last):\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/datashaper/workflow/workflow.py\", line 410, in _execute_verb\n    result = node.verb.func(**verb_args)\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in cluster_graph\n    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/series.py\", line 4924, in apply\n    ).apply()\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1427, in apply\n    return self.apply_standard()\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/apply.py\", line 1507, in apply_standard\n    mapped = obj._map_values(\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/base.py\", line 921, in _map_values\n    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/pandas/core/algorithms.py\", line 1743, in map_array\n    return lib.map_infer(values, mapper, convert=convert)\n  File \"lib.pyx\", line 2972, in pandas._libs.lib.map_infer\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 61, in <lambda>\n    results = output_df[column].apply(lambda graph: run_layout(strategy, graph))\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py\", line 167, in run_layout\n    clusters = run_leiden(graph, strategy)\n  File 
\"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 26, in run\n    node_id_to_community_map = _compute_leiden_communities(\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graphrag/index/verbs/graph/clustering/strategies/leiden.py\", line 61, in _compute_leiden_communities\n    community_mapping = hierarchical_leiden(\n  File \"<@beartype(graspologic.partition.leiden.hierarchical_leiden) at 0x330776e60>\", line 304, in hierarchical_leiden\n  File \"/Users/lokinfey/conda/envs/pydev/lib/python3.10/site-packages/graspologic/partition/leiden.py\", line 588, in hierarchical_leiden\n    hierarchical_clusters_native = gn.hierarchical_leiden(\nleiden.EmptyNetworkError: EmptyNetworkError\n", "source": "EmptyNetworkError", "details": null}
 

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: phi-3-mini
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 13000
  request_timeout: 2800.0
  api_base: http://localhost:5146/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: jinaai
    request_timeout: 2800.0
    api_base: http://localhost:5146/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  


chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
    
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [event]
  max_gleanings: 0

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0

community_report:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
  model_supports_json: false 

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000

global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 0.1.1
  • Operating System: macOS
  • Python Version: 3.10.12
  • Related Issues:
kinfey added the bug and triage labels on Jul 9, 2024
@sriharshaguthikonda

I got a similar error. Let me know if you find a solution. Thanks in advance!

@AlonsoGuevara
Contributor

Hi

My general rule of thumb when facing this issue is:

  • Check the outputs of the entity extraction step; they will show whether the graph is empty.
  • If the graph is empty, the cause is either faulty (unparseable) LLM responses or failed LLM calls.
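
One way to check whether an extracted graph is empty is to count the nodes and edges in its GraphML text. The sketch below uses only the standard library. Where the GraphML comes from is an assumption about this GraphRAG version's output (for example, a graph column in the extracted-entities artifact), so the sample strings here are hypothetical stand-ins rather than real pipeline output.

```python
import xml.etree.ElementTree as ET


def count_graphml(graphml: str) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML document."""
    root = ET.fromstring(graphml.encode("utf-8"))
    nodes = edges = 0
    for element in root.iter():
        # Strip the XML namespace, e.g. "{http://...}node" -> "node".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "node":
            nodes += 1
        elif local_name == "edge":
            edges += 1
    return nodes, edges


# Hypothetical sample graphs, not taken from a real run.
EMPTY_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected"/></graphml>'
)
SMALL_GRAPH = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="A"/><node id="B"/><edge source="A" target="B"/>'
    "</graph></graphml>"
)

print(count_graphml(EMPTY_GRAPH))  # (0, 0)
print(count_graphml(SMALL_GRAPH))  # (2, 1)
```

A result of (0, 0) would be consistent with the EmptyNetworkError above, since the clustering step then has nothing to partition.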

Can you please check your LLM responses in the cache folder and share some of them?
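
A rough way to judge whether a cached response is parseable is to split it into records and see whether any entity or relationship tuples come out. This is a minimal sketch, not the parser GraphRAG itself uses. The three delimiters are assumed to match the default entity extraction prompt, so adjust them if your `prompts/entity_extraction.txt` differs. The two sample responses are made up for illustration.

```python
TUPLE_DELIMITER = "<|>"  # assumed field separator from the default prompt
RECORD_DELIMITER = "##"  # assumed record separator from the default prompt
COMPLETION_DELIMITER = "<|COMPLETE|>"  # assumed end-of-output marker


def parse_records(response: str) -> list[list[str]]:
    """Split an LLM response into records of fields, skipping malformed chunks."""
    records = []
    cleaned = response.replace(COMPLETION_DELIMITER, "")
    for chunk in cleaned.split(RECORD_DELIMITER):
        chunk = chunk.strip()
        # A well-formed record is wrapped in parentheses.
        if not (chunk.startswith("(") and chunk.endswith(")")):
            continue
        fields = [f.strip().strip('"') for f in chunk[1:-1].split(TUPLE_DELIMITER)]
        # Keep only the two record kinds the extraction prompt asks for.
        if fields[0] in ("entity", "relationship"):
            records.append(fields)
    return records


# Hypothetical responses: one in the expected format, one free-form.
GOOD = (
    '("entity"<|>WORLD CUP<|>EVENT<|>A football tournament)##'
    '("relationship"<|>A<|>B<|>related<|>5)<|COMPLETE|>'
)
BAD = "Sure! Here are the events I found: a football match."

print(len(parse_records(GOOD)))  # 2
print(parse_records(BAD))        # []
```

If most cached responses look like the second sample, the model is answering but not in the format the extractor expects, which would leave the graph empty.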

@kinfey
Author

kinfey commented Jul 9, 2024

@AlonsoGuevara this is my cache folder: cache.zip

@cove9988

Check the log report under output/<latest timestamp>/reports. Most often the LLM was not working.
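
To pull the error entries out of a report log, something like the following sketch may help. It assumes the log holds one JSON object per line, shaped like the error quoted in the issue description ("type", "data", "source" keys). The exact log file name inside the reports folder is not confirmed here, so the function takes any iterable of lines.

```python
import json
from typing import Iterable


def find_errors(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Return (source, message) pairs for every error entry in JSON-lines log text."""
    errors = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not standalone JSON objects
        if isinstance(entry, dict) and entry.get("type") == "error":
            errors.append((str(entry.get("source")), str(entry.get("data"))))
    return errors


# Shortened copy of the error entry from the issue, plus a non-JSON line.
SAMPLE_LOG = [
    '{"type": "error", "data": "Error executing verb \\"cluster_graph\\" in '
    'create_base_entity_graph: EmptyNetworkError", "source": "EmptyNetworkError", '
    '"details": null}',
    "not json",
]

for source, message in find_errors(SAMPLE_LOG):
    print(source, "|", message)
```

With a real file, pass the open file object directly, for example `find_errors(open(path, encoding="utf-8"))`, where `path` points at the log in your own reports folder. Errors logged before `cluster_graph` (timeouts, connection failures, parse failures) are the ones most likely to explain an empty graph.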

@kinfey
Author

kinfey commented Jul 11, 2024

But it can generate something in the backend.

@Nuclear6

Can you post the detailed log file?

@natoverse
Collaborator

Consolidating alternate model issues here: #657

natoverse closed this as not planned on Jul 22, 2024
natoverse added the community_support label and removed the bug and triage labels on Jul 22, 2024